Examples

Explore real-world examples of LangStruct in action across different domains. Each example includes complete code, explanations, and best practices for production use.

Browse complete runnable examples on GitHub

Financial Documents

Extract metrics, dates, and insights from earnings reports, SEC filings, and financial statements

Medical Records

Process clinical notes, lab reports, and medical documents

Legal Contracts

Analyze contracts, agreements, and legal documents for key terms and risks

Scientific Papers

Extract methodology, results, and citations from scientific literature

A basic person-extraction example, perfect for getting started with LangStruct:

```python
from pydantic import BaseModel, Field
from langstruct import LangStruct

class PersonSchema(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="Job title")

extractor = LangStruct(schema=PersonSchema)
result = extractor.extract("Dr. Sarah Johnson, 34, is a data scientist at Google")
print(result.entities)  # {'name': 'Dr. Sarah Johnson', 'age': 34, 'occupation': 'data scientist'}
```

Extract structured product data from descriptions:

```python
from typing import List

from pydantic import BaseModel, Field
from langstruct import LangStruct

class ProductSchema(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Price in USD")
    features: List[str] = Field(description="Key features")
    brand: str = Field(description="Brand name")

extractor = LangStruct(schema=ProductSchema)
text = """
MacBook Pro 16" - $2,399
Features: M2 Pro chip, 16GB RAM, 512GB SSD, Retina display
Brand: Apple
"""
result = extractor.extract(text)
print(result.entities)
```

Financial Documents

Quarterly earnings, SEC filings, balance sheets

Market Research

Consumer surveys, market analysis, competitor data

Sales Data

CRM records, sales reports, customer feedback

Medical Records

Patient records, diagnostic reports, treatment plans

Scientific Papers

Medical literature, clinical trial results, case studies

Lab Reports

Test results, pathology reports, imaging studies

Legal Contracts

Service agreements, NDAs, employment contracts

Regulatory Filings

SEC documents, compliance reports, legal notices

Case Law

Court decisions, legal precedents, case summaries

For the use cases mentioned above that don’t have dedicated pages, here are quick implementation examples:

```python
from typing import Dict, List

from pydantic import BaseModel, Field
from langstruct import LangStruct

class MarketResearchSchema(BaseModel):
    survey_topic: str = Field(description="Main research topic")
    respondents: int = Field(description="Number of survey respondents")
    key_findings: List[str] = Field(description="Primary research findings")
    demographics: List[str] = Field(description="Respondent demographics")
    recommendations: List[str] = Field(description="Business recommendations")

market_extractor = LangStruct(schema=MarketResearchSchema)


class SalesDataSchema(BaseModel):
    customer_name: str = Field(description="Customer or company name")
    deal_value: float = Field(description="Deal value in USD")
    products: List[str] = Field(description="Products or services sold")
    sales_rep: str = Field(description="Sales representative name")
    close_date: str = Field(description="Deal close date")
    pipeline_stage: str = Field(description="Current sales stage")

sales_extractor = LangStruct(schema=SalesDataSchema)


class LabReportSchema(BaseModel):
    patient_id: str = Field(description="Patient identifier")
    test_type: str = Field(description="Type of laboratory test")
    results: List[Dict[str, str]] = Field(description="Test results with values and units")
    reference_ranges: List[str] = Field(description="Normal reference ranges")
    abnormal_flags: List[str] = Field(description="Abnormal or critical values")
    ordering_physician: str = Field(description="Physician who ordered tests")

lab_extractor = LangStruct(schema=LabReportSchema)
```
  • Person Extraction - Names, ages, occupations
  • Product Listings - E-commerce product data
  • Contact Information - Emails, phone numbers, addresses
  • Event Details - Dates, locations, descriptions
  • Financial Documents - Earnings reports with metrics
  • News Articles - Entities, sentiment, key facts
  • Academic Papers - Authors, abstracts, methodologies
  • Customer Reviews - Ratings, sentiment, product aspects
  • Medical Records - Clinical data extraction from medical documents
  • Legal Contracts - Risk analysis and compliance checking
  • Scientific Literature - Complex research data extraction
  • Financial Analysis - Multi-document portfolio analysis
  • Resume Parsing - Extract candidate information
  • Invoice Processing - Line items, totals, vendor details
  • Email Analysis - Sender, intent, action items
  • Document Libraries - Process hundreds of documents
  • Compliance Monitoring - Regular regulatory document analysis
  • Content Migration - Legacy system data extraction
  • Live Chat Analysis - Customer service automation
  • Social Media Monitoring - Real-time sentiment analysis
  • News Feed Processing - Breaking news categorization
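Many of the use cases listed above follow the same schema-definition pattern as the quick examples. As an illustrative sketch for invoice processing (the `InvoiceSchema` and `LineItem` names and fields are assumptions for this example, not part of LangStruct), a schema with nested line items might look like this; an extractor would then be created from it exactly as in the earlier examples, with `LangStruct(schema=InvoiceSchema)`:

```python
from typing import List

from pydantic import BaseModel, Field

class LineItem(BaseModel):
    description: str = Field(description="Line item description")
    quantity: int = Field(description="Quantity ordered")
    amount: float = Field(description="Line total in USD")

class InvoiceSchema(BaseModel):
    vendor: str = Field(description="Vendor or supplier name")
    invoice_number: str = Field(description="Invoice identifier")
    line_items: List[LineItem] = Field(description="Individual billed items")
    total: float = Field(description="Invoice grand total in USD")

# Validate a hand-built instance to confirm the schema's shape
invoice = InvoiceSchema(
    vendor="Acme Corp",
    invoice_number="INV-1001",
    line_items=[LineItem(description="Widget", quantity=2, amount=9.98)],
    total=9.98,
)
print(invoice.total)  # 9.98
```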
```python
from langstruct import LangStruct

class DocumentProcessor:
    def __init__(self, schema):
        self.extractor = LangStruct(schema=schema)

    def process_batch(self, documents):
        # Process multiple documents efficiently using built-in batch processing
        results = self.extractor.extract(documents)
        return results

# Process many documents efficiently
processor = DocumentProcessor(YourSchema)
results = processor.process_batch(document_list)
```
```python
def robust_extraction(text, extractor):
    try:
        result = extractor.extract(text)
        # Validate extraction quality
        if result.confidence < 0.8:
            print(f"Low confidence: {result.confidence}")
            # Handle low-confidence extractions here
        return result
    except Exception as e:
        print(f"Extraction error: {e}")
        # Handle extraction errors; return None so callers can detect failure
        return None
```
```python
# Track basic extraction metrics
def track_extraction_performance(text, extractor):
    result = extractor.extract(text)
    print(f"Extraction confidence: {result.confidence:.2f}")
    print(f"Fields extracted: {len(result.entities)}")
    # Check if source tracking is available
    if hasattr(result, 'sources') and result.sources:
        print(f"Source locations tracked: {len(result.sources)}")
    return result

# Usage
result = track_extraction_performance(document_text, extractor)
```
  • Be Specific - Use detailed field descriptions
  • Use Types - Leverage Python type hints for validation
  • Nested Structure - Model complex relationships with nested schemas
  • Optional Fields - Mark non-essential fields as optional
  • Graceful Degradation - Handle partial extractions
  • Confidence Thresholds - Set minimum confidence levels
  • Fallback Strategies - Define backup extraction approaches
  • Logging - Track errors for debugging and improvement
  • Batch Processing - Process multiple documents together
  • Rate Limits - Respect provider quotas with rate_limit
  • Model Selection - Choose appropriate models for your use case
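The schema-design points above can be made concrete with a minimal sketch (the `ArticleSchema` and `Author` names and fields are illustrative, not part of LangStruct): typed fields with specific descriptions, a nested model for structured relationships, and `Optional` fields with `None` defaults so a partial extraction still validates instead of failing.

```python
from typing import List, Optional

from pydantic import BaseModel, Field

class Author(BaseModel):
    name: str = Field(description="Author full name")
    # Optional: not every document states an affiliation
    affiliation: Optional[str] = Field(default=None, description="Institution, if stated")

class ArticleSchema(BaseModel):
    title: str = Field(description="Article title")
    # Nested structure: a list of Author models, not bare strings
    authors: List[Author] = Field(description="All listed authors")
    doi: Optional[str] = Field(default=None, description="DOI, if present")

# A partial result still validates because non-essential fields are optional
article = ArticleSchema(title="Sample Study", authors=[{"name": "Sarah Johnson"}])
print(article.doi)  # None
```

Marking only truly required fields as non-optional is what allows graceful degradation: missing data yields `None` rather than a validation error.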

Want to contribute an example? We welcome contributions that demonstrate:

  • Novel Use Cases - New domains or applications
  • Best Practices - Production-ready implementations
  • Performance Optimizations - Efficient processing techniques
  • Integration Patterns - Working with other tools and systems

View all examples on GitHub

Ready to dive into specific examples?