Financial Document Processing

Learn how to extract structured financial data from earnings reports, SEC filings, financial statements, and other financial documents with high accuracy and regulatory compliance.

Start with a basic example to get familiar with financial data extraction:

from langstruct import LangStruct

# Create a simple financial extractor from an example
extractor = LangStruct(example={
    "company": "Apple Inc",
    "revenue": 81.8,  # in billions
    "net_income": 14.7,
    "quarter": "Q1 2024"
})

# Extract from a simple earnings snippet
text = """
Apple Inc. reported strong Q1 2024 results with revenue of $81.8 billion
and net income of $14.7 billion, exceeding analyst expectations.
"""

result = extractor.extract(text)
print(result.entities)
# {'company': 'Apple Inc', 'revenue': 81.8, 'net_income': 14.7, 'quarter': 'Q1 2024'}

For production use, define detailed schemas for financial data extraction:

from langstruct import LangStruct, Schema, Field
from typing import List, Optional, Dict
from datetime import datetime


class FinancialMetricsSchema(Schema):
    revenue: Optional[float] = Field(
        description="Total revenue in millions USD"
    )
    net_income: Optional[float] = Field(
        description="Net income/profit in millions USD"
    )
    gross_margin: Optional[float] = Field(
        description="Gross margin as percentage"
    )
    operating_margin: Optional[float] = Field(
        description="Operating margin as percentage"
    )
    ebitda: Optional[float] = Field(
        description="EBITDA in millions USD"
    )
    eps: Optional[float] = Field(
        description="Earnings per share in USD"
    )


class FinancialReportSchema(Schema):
    company_name: str = Field(
        description="Full company name"
    )
    ticker_symbol: Optional[str] = Field(
        description="Stock ticker symbol"
    )
    report_period: str = Field(
        description="Reporting period (e.g., Q3 2024, FY 2023)"
    )
    report_date: Optional[str] = Field(
        description="Report publication date"
    )

    # Financial metrics
    current_metrics: FinancialMetricsSchema = Field(
        description="Current period financial metrics"
    )
    previous_metrics: Optional[FinancialMetricsSchema] = Field(
        description="Previous period comparison metrics"
    )

    # Key highlights
    key_highlights: List[str] = Field(
        description="Key business highlights and achievements"
    )
    business_outlook: Optional[str] = Field(
        description="Management guidance and future outlook"
    )
    risks_concerns: List[str] = Field(
        description="Risk factors and concerns mentioned"
    )

    # Segment performance
    segment_performance: Optional[Dict[str, float]] = Field(
        description="Revenue breakdown by business segment"
    )

    # Geographic breakdown
    geographic_revenue: Optional[Dict[str, float]] = Field(
        description="Revenue breakdown by geographic region"
    )

Extract key information from quarterly earnings reports:

# Create specialized financial document extractor
financial_extractor = LangStruct(
    schema=FinancialReportSchema,
    model="gemini-2.5-flash"  # Fast and cost-effective for financial data
    # Auto-optimization and source grounding enabled by default
)
# Sample earnings report text
earnings_report = """
Apple Inc. (NASDAQ: AAPL) Fiscal Q4 2024 Results
Cupertino, California — October 31, 2024 — Apple today announced financial
results for its fiscal 2024 fourth quarter ended September 28, 2024.
Fourth Quarter Highlights:
- Total net sales of $94.9 billion, up 6% year-over-year
- iPhone revenue of $46.2 billion, up 5.5% year-over-year
- Services revenue reached record $24.2 billion, up 12% year-over-year
- Net income of $23.3 billion, or $1.46 per diluted share
- Generated $27.5 billion in operating cash flow
"We are pleased with our performance in Q4, driven by strong iPhone 15 adoption
and continued growth in our Services business," said Tim Cook, Apple's CEO.
"Our focus on innovation and customer experience continues to drive results."
Geographic Revenue Breakdown:
- Americas: $41.7 billion (44% of total revenue)
- Europe: $24.9 billion (26% of total revenue)
- Greater China: $15.0 billion (16% of total revenue)
- Japan: $7.4 billion (8% of total revenue)
- Rest of Asia Pacific: $5.9 billion (6% of total revenue)
Business Segment Performance:
- iPhone: $46.2 billion
- Mac: $7.0 billion
- iPad: $6.9 billion
- Wearables, Home and Accessories: $9.0 billion
- Services: $24.2 billion
Gross margin for the quarter was 46.2%, compared to 45.2% in the prior year.
Operating margin was 30.7%, compared to 29.8% in the prior year.
Looking ahead, we expect continued growth in Services and are excited about
our product pipeline for 2025. However, we remain cautious about
macroeconomic headwinds and supply chain challenges.
Risk factors include potential regulatory changes, competition in key markets,
and foreign exchange rate fluctuations.
"""
# Extract financial information
result = financial_extractor.extract(earnings_report)

Expected Output:

=== Financial Report Analysis ===
Company: Apple Inc.
Ticker: AAPL
Period: Q4 2024
Report Date: October 31, 2024
=== Current Period Metrics ===
Revenue: $94.9B
Net Income: $23.3B
EPS: $1.46
Gross Margin: 46.2%
Operating Margin: 30.7%
=== Key Highlights ===
1. Total net sales up 6% year-over-year to $94.9 billion
2. iPhone revenue grew 5.5% to $46.2 billion
3. Services revenue reached record $24.2 billion, up 12%
4. Generated $27.5 billion in operating cash flow
5. Strong iPhone 15 adoption driving growth
=== Business Segments ===
iPhone: $46.2B
Services: $24.2B
Wearables, Home and Accessories: $9.0B
Mac: $7.0B
iPad: $6.9B
=== Geographic Revenue ===
Americas: $41.7B
Europe: $24.9B
Greater China: $15.0B
Japan: $7.4B
Rest of Asia Pacific: $5.9B
=== Outlook & Risks ===
Outlook: Expect continued growth in Services and excited about product pipeline for 2025, but cautious about macroeconomic headwinds
Risk Factors:
• Potential regulatory changes
• Competition in key markets
• Foreign exchange rate fluctuations
• Macroeconomic headwinds
• Supply chain challenges
Extraction Confidence: 0.93
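
One cheap sanity check on output like this is to confirm that each revenue breakdown adds up to the reported total. The sketch below is only an illustration: `breakdown_matches_total` is a hypothetical helper, not part of LangStruct, and the figures are copied by hand from the sample report rather than read from an extraction result.

```python
from math import isclose


def breakdown_matches_total(breakdown: dict, total: float, rel_tol: float = 0.01) -> bool:
    """Return True if the breakdown values sum to the total within rel_tol."""
    return isclose(sum(breakdown.values()), total, rel_tol=rel_tol)


# Figures in billions USD, copied from the sample report above
total_revenue = 94.9
geographic = {
    "Americas": 41.7,
    "Europe": 24.9,
    "Greater China": 15.0,
    "Japan": 7.4,
    "Rest of Asia Pacific": 5.9,
}
segments = {
    "iPhone": 46.2,
    "Mac": 7.0,
    "iPad": 6.9,
    "Wearables, Home and Accessories": 9.0,
    "Services": 24.2,
}

geo_ok = breakdown_matches_total(geographic, total_revenue)
seg_ok = breakdown_matches_total(segments, total_revenue)
print(f"geographic consistent: {geo_ok}, segments consistent: {seg_ok}")
```

With the sample figures, the regional breakdown sums to the reported $94.9 billion, while the segment figures sum to $93.3 billion, so a 1% tolerance flags the segment breakdown for review. The tolerance is an arbitrary choice here and would need tuning for rounding in real reports.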

Process SEC filings like 10-K and 10-Q reports:

class SECFilingSchema(Schema):
    company_name: str = Field(description="Company name")
    cik_number: Optional[str] = Field(description="Central Index Key number")
    filing_type: str = Field(description="Type of filing (10-K, 10-Q, 8-K, etc.)")
    filing_date: str = Field(description="Filing date")
    fiscal_period: str = Field(description="Fiscal period covered")

    # Financial position
    total_assets: Optional[float] = Field(
        description="Total assets in millions USD"
    )
    total_liabilities: Optional[float] = Field(
        description="Total liabilities in millions USD"
    )
    shareholders_equity: Optional[float] = Field(
        description="Shareholders equity in millions USD"
    )
    cash_equivalents: Optional[float] = Field(
        description="Cash and cash equivalents in millions USD"
    )

    # Business information
    business_description: str = Field(
        description="Description of company's business and operations"
    )
    competitive_strengths: List[str] = Field(
        description="Company's stated competitive advantages"
    )
    risk_factors: List[str] = Field(
        description="Key risk factors identified in filing"
    )

    # Legal proceedings
    legal_proceedings: Optional[str] = Field(
        description="Summary of material legal proceedings"
    )

    # Management discussion
    md_a_highlights: List[str] = Field(
        description="Key points from Management Discussion & Analysis"
    )


# Process SEC filing
sec_extractor = LangStruct(schema=SECFilingSchema)
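
Because the schema captures total assets, total liabilities, and shareholders' equity, the extracted values can be checked against the basic accounting identity (assets = liabilities + equity). The following is a minimal sketch under stated assumptions: `balance_sheet_balances` is a hypothetical helper, the inputs are made-up values in millions USD, and filings that report items such as noncontrolling interests separately may need a looser check.

```python
from math import isclose
from typing import Optional


def balance_sheet_balances(
    total_assets: Optional[float],
    total_liabilities: Optional[float],
    shareholders_equity: Optional[float],
    rel_tol: float = 0.005,
) -> Optional[bool]:
    """Check assets ~= liabilities + equity; return None if any input is missing."""
    if total_assets is None or total_liabilities is None or shareholders_equity is None:
        return None
    return isclose(total_assets, total_liabilities + shareholders_equity, rel_tol=rel_tol)


# Hypothetical values in millions USD
consistent = balance_sheet_balances(1000.0, 600.0, 400.0)
inconsistent = balance_sheet_balances(1000.0, 600.0, 300.0)
unknown = balance_sheet_balances(1000.0, None, 400.0)
print(consistent, inconsistent, unknown)
```

Returning `None` for missing inputs keeps "could not check" distinct from "checked and failed", which matters when the schema fields are optional.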

Calculate and extract financial ratios and metrics:

class FinancialRatiosSchema(Schema):
    # Profitability ratios
    gross_profit_margin: Optional[float] = Field(
        description="Gross profit margin percentage"
    )
    net_profit_margin: Optional[float] = Field(
        description="Net profit margin percentage"
    )
    return_on_equity: Optional[float] = Field(
        description="Return on equity percentage"
    )
    return_on_assets: Optional[float] = Field(
        description="Return on assets percentage"
    )

    # Liquidity ratios
    current_ratio: Optional[float] = Field(
        description="Current ratio (current assets / current liabilities)"
    )
    quick_ratio: Optional[float] = Field(
        description="Quick ratio (liquid assets / current liabilities)"
    )

    # Leverage ratios
    debt_to_equity: Optional[float] = Field(
        description="Debt to equity ratio"
    )
    debt_to_assets: Optional[float] = Field(
        description="Debt to assets ratio"
    )

    # Efficiency ratios
    asset_turnover: Optional[float] = Field(
        description="Asset turnover ratio"
    )
    inventory_turnover: Optional[float] = Field(
        description="Inventory turnover ratio"
    )

    # Raw financial data for calculations
    revenue: float = Field(description="Total revenue")
    net_income: float = Field(description="Net income")
    total_assets: float = Field(description="Total assets")
    current_assets: float = Field(description="Current assets")
    current_liabilities: float = Field(description="Current liabilities")
    total_debt: float = Field(description="Total debt")
    shareholders_equity: float = Field(description="Shareholders equity")
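
The raw fields at the end of the schema are enough to recompute several of the ratios deterministically, which gives a way to cross-check any ratio values the model extracts. This is a sketch, not LangStruct functionality: `safe_div`, `as_percent`, and `compute_ratios` are hypothetical helpers, the input is a plain dict with made-up numbers, and period-end balances are used where analysts often prefer period averages.

```python
from typing import Optional


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None instead of raising when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def as_percent(value: Optional[float]) -> Optional[float]:
    """Convert a fraction to a percentage rounded to two decimals."""
    return None if value is None else round(value * 100, 2)


def compute_ratios(raw: dict) -> dict:
    """Derive ratios from the raw fields of the schema, using period-end values."""
    return {
        "net_profit_margin": as_percent(safe_div(raw["net_income"], raw["revenue"])),
        "return_on_equity": as_percent(safe_div(raw["net_income"], raw["shareholders_equity"])),
        "return_on_assets": as_percent(safe_div(raw["net_income"], raw["total_assets"])),
        "current_ratio": safe_div(raw["current_assets"], raw["current_liabilities"]),
        "debt_to_equity": safe_div(raw["total_debt"], raw["shareholders_equity"]),
        "debt_to_assets": safe_div(raw["total_debt"], raw["total_assets"]),
        "asset_turnover": safe_div(raw["revenue"], raw["total_assets"]),
    }


# Hypothetical figures, all in the same currency unit
raw = {
    "revenue": 1000.0,
    "net_income": 150.0,
    "total_assets": 2000.0,
    "current_assets": 800.0,
    "current_liabilities": 400.0,
    "total_debt": 500.0,
    "shareholders_equity": 1000.0,
}
ratios = compute_ratios(raw)
print(ratios)
```

Gross margin, quick ratio, and inventory turnover are left out because the schema's raw fields do not include cost of goods sold, liquid assets, or inventory.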

Set up robust financial document processing for production:

class FinancialProcessor:
    """Production-ready financial document processor"""

    def __init__(self):
        self.extractors = {
            'earnings': LangStruct(schema=FinancialReportSchema),
            'sec_filing': LangStruct(schema=SECFilingSchema),
            'ratios': LangStruct(schema=FinancialRatiosSchema)
        }
        # Validation rules for financial data. A missing or None value passes,
        # because the corresponding schema fields are Optional.
        self.validation_rules = {
            'revenue_positive': lambda x: x.get('revenue') is None or x['revenue'] >= 0,
            'margin_reasonable': lambda x: x.get('gross_margin') is None or 0 <= x['gross_margin'] <= 100,
            'eps_format': lambda x: x.get('eps') is None or isinstance(x['eps'], (int, float))
        }

    def detect_document_type(self, text: str) -> str:
        """Detect the type of financial document"""
        text_lower = text.lower()
        if any(term in text_lower for term in ['10-k', '10-q', '8-k', 'sec filing']):
            return 'sec_filing'
        elif any(term in text_lower for term in ['earnings', 'quarterly results', 'q1', 'q2', 'q3', 'q4']):
            return 'earnings'
        elif any(term in text_lower for term in ['balance sheet', 'income statement', 'financial ratios']):
            return 'ratios'
        else:
            return 'earnings'  # Default fallback
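
The class above defines validation rules but does not show them being applied. One way to run such rules over an extracted dict is sketched below. `failed_rules` is a hypothetical helper, and the assumption that metrics arrive as a flat dict is mine; the rules here treat a missing or `None` value as passing, since the schema fields are optional.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


def failed_rules(data: dict, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of rules that fail, or raise, on the given data."""
    failures = []
    for name, rule in rules.items():
        try:
            passed = rule(data)
        except (TypeError, ValueError):
            # A rule that cannot evaluate the value (e.g. a string where a
            # number is expected) is treated as a failure.
            passed = False
        if not passed:
            failures.append(name)
    return failures


rules: Dict[str, Rule] = {
    "revenue_positive": lambda x: x.get("revenue") is None or x["revenue"] >= 0,
    "margin_reasonable": lambda x: x.get("gross_margin") is None or 0 <= x["gross_margin"] <= 100,
}

good = failed_rules({"revenue": 94900.0, "gross_margin": 46.2}, rules)
bad = failed_rules({"revenue": -5.0, "gross_margin": 146.2}, rules)
malformed = failed_rules({"revenue": "94.9B"}, rules)
print(good, bad, malformed)
```

Returning the list of failed rule names, rather than a single boolean, makes it easier to log which check rejected a document.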

High Accuracy

Specialized models and schemas for financial terminology

Comprehensive Data

Extract metrics, ratios, forecasts, and risk factors

Source Grounding

Track exact locations for regulatory compliance

Production Ready

Built-in validation, error handling, and batch processing

  • Quarterly earnings analysis
  • Competitive benchmarking
  • Risk assessment
  • Portfolio screening

  • SEC filing monitoring
  • Risk factor tracking
  • Disclosure analysis
  • Audit trail maintenance

  • Financial health assessment
  • Debt capacity evaluation
  • Cash flow analysis
  • Covenant compliance

  • Use high-resolution financial documents
  • Validate extracted metrics against known benchmarks
  • Implement confidence thresholds for critical data
  • Cross-reference multiple data sources

  • Process similar document types in batches
  • Cache frequently used extractors
  • Use appropriate model sizes for different complexity levels
  • Implement async processing for large volumes

  • Track information sources with source grounding
  • Implement data retention policies
  • Ensure secure handling of confidential financial data
  • Regular validation against regulatory requirements
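
Two of the practices above, confidence thresholds and async processing for large volumes, can be combined in a small batch runner. The sketch below rests on assumptions: `fake_extract` is a stand-in for a real extractor call, the `confidence` key is a hypothetical field on the returned dict, and nothing here reflects LangStruct's own batch or async API.

```python
import asyncio
from typing import Callable, List, Tuple


def fake_extract(text: str) -> dict:
    """Stand-in for a blocking extractor call; returns a dict with a confidence score."""
    return {"source": text, "confidence": 0.95 if "revenue" in text else 0.60}


async def process_batch(
    texts: List[str],
    extract: Callable[[str], dict],
    min_confidence: float = 0.8,
    max_concurrency: int = 4,
) -> Tuple[List[dict], List[dict]]:
    """Run extractions concurrently and split results by confidence threshold."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(text: str) -> dict:
        async with semaphore:
            # Run the blocking call in a worker thread (Python 3.9+)
            return await asyncio.to_thread(extract, text)

    results = await asyncio.gather(*(run_one(t) for t in texts))
    accepted = [r for r in results if r["confidence"] >= min_confidence]
    needs_review = [r for r in results if r["confidence"] < min_confidence]
    return accepted, needs_review


documents = ["Q4 revenue of $94.9 billion", "Untitled scan"]
accepted, needs_review = asyncio.run(process_batch(documents, fake_extract))
print(len(accepted), "accepted;", len(needs_review), "routed to manual review")
```

The semaphore caps the number of in-flight calls, which helps stay within provider rate limits; low-confidence results are set aside for review rather than discarded.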

Financial document processing requires high accuracy. LangStruct's source grounding features help track where in the original document each piece of information was extracted from.