
LangStruct

Turn unstructured text into clean, typed data. No prompt engineering, just examples and automatic optimization.
from langstruct import LangStruct

# Define what you want to extract with a simple example
extractor = LangStruct(example={
    "patient_name": "John Doe",
    "diagnosis": "Type 2 Diabetes",
    "medication": "metformin",
    "dosage": "500mg"
})

# Extract from any unstructured text
text = "Patient John Smith diagnosed with hypertension, prescribed lisinopril 10mg daily."
result = extractor.extract(text)

print(result.entities)
# {"patient_name": "John Smith", "diagnosis": "hypertension",
#  "medication": "lisinopril", "dosage": "10mg"}

print(result.sources)  # Know exactly where each value came from
# {"patient_name": [CharSpan(8, 18, "John Smith")], ...}

No Prompt Engineering

DSPy automatically optimizes prompts for accuracy. Focus on your data, not prompts.
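
Under the hood this is ordinary DSPy optimization rather than hand-tuned prompts. A rough sketch of that machinery at the DSPy level, for intuition only; the signature, metric, and training examples below are invented for illustration, and LangStruct wires all of this up for you:

import dspy

# Pick any backend; LiteLLM-style model strings work.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# The extraction task expressed as a DSPy signature (fields invented here).
class ExtractClinical(dspy.Signature):
    """Extract clinical fields from free text."""
    text: str = dspy.InputField()
    diagnosis: str = dspy.OutputField()
    medication: str = dspy.OutputField()

program = dspy.Predict(ExtractClinical)

# A few labeled examples stand in for prompt engineering.
trainset = [
    dspy.Example(
        text="Patient diagnosed with hypertension, prescribed lisinopril.",
        diagnosis="hypertension",
        medication="lisinopril",
    ).with_inputs("text"),
    # ... more examples ...
]

# MIPROv2 searches for instructions and demos that maximize this metric.
def fields_match(example, pred, trace=None):
    return (pred.diagnosis == example.diagnosis
            and pred.medication == example.medication)

optimizer = dspy.MIPROv2(metric=fields_match, auto="light")
optimized = optimizer.compile(program, trainset=trainset)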

Model Portability

Switch to any LLM (OpenAI, Claude, Gemini, or local models) and re-optimize automatically.
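
Because LangStruct sits on DSPy/LiteLLM, swapping backends is a one-line model-string change at the DSPy layer (shown below; LangStruct's own constructor may expose this differently):

import dspy

# Any LiteLLM-style identifier selects the backend:
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))                       # OpenAI
# dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20240620"))  # Claude
# dspy.configure(lm=dspy.LM("gemini/gemini-1.5-flash"))               # Google Gemini
# dspy.configure(lm=dspy.LM("ollama_chat/llama3"))                    # local via Ollama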

Source Attribution

Know exactly where each extracted value came from in the original text.
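
Continuing the opening example, each span can be checked against the source document. The start, end, and text attribute names on CharSpan are assumed from the printed repr above, not confirmed API:

# Verify every extracted value against the original text.
for field, spans in result.sources.items():
    for span in spans:
        # Assumed CharSpan attributes: start, end, text (per the repr above).
        assert text[span.start:span.end] == span.text
        print(f"{field} at chars {span.start}-{span.end}: {span.text!r}")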

Future-Proof

Never rewrite prompts when new models emerge: change one line and re-optimize.

Perfect for:

  • Document processing: Invoices, medical records, legal contracts, reports
  • Data pipelines: Converting unstructured text to database records
  • RAG enhancement: Adding structured filters to semantic search (see the sketch after this list)
  • Compliance: Extracting required fields with source attribution
  • Research: Processing papers, patents, technical documents
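
For instance, the same example-defined schema can structure both documents at index time and user queries at query time (the comparison table below calls this bidirectional query parsing). A minimal sketch, assuming .extract() accepts short queries as readily as documents, with a plain in-memory list standing in for a real vector store:

from langstruct import LangStruct

extractor = LangStruct(example={"diagnosis": "hypertension", "medication": "lisinopril"})

# Index time: attach extracted fields to each chunk as filterable metadata.
chunks = [
    "Patient A diagnosed with hypertension, prescribed lisinopril 10mg daily.",
    "Patient B diagnosed with type 2 diabetes, prescribed metformin 500mg.",
]
indexed = [{"text": c, "meta": extractor.extract(c).entities} for c in chunks]

# Query time: parse the question with the same schema, then combine the
# parsed fields (hard filters) with whatever semantic ranking you already use.
query = extractor.extract("Which patients take metformin?")
wanted = {k: v for k, v in query.entities.items() if v}
hits = [c for c in indexed
        if all(c["meta"].get(k) == v for k, v in wanted.items())]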

Not ideal for:

  • Simple pattern matching (use regex instead)
  • When you have thousands of labeled examples (train a classifier instead)
  • Sub-100ms latency requirements (LLM calls take time)
  • Streaming/real-time extraction needs

Trade-offs to be aware of:

  • DSPy dependency: Built on DSPy 3.0 for automatic prompt optimization
  • Optimization cost: Initial optimization requires 50-100 example calls
  • LLM costs: Each extraction is an LLM call (cache results; see the sketch after this list)
  • No streaming: Extracts complete documents only
  • Context limits: Large documents need chunking (see the sketch after this list)
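
The last two points compose naturally: split long documents into chunks, cache per-chunk results so repeated text is billed once, and merge the field dictionaries. A minimal sketch; the fixed-width chunking, content-hash cache, and last-writer-wins merge are simplifying assumptions, not LangStruct features:

import hashlib

_cache = {}  # content hash -> extraction result

def extract_cached(extractor, chunk):
    # Identical chunks cost exactly one LLM call.
    key = hashlib.sha256(chunk.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = extractor.extract(chunk)
    return _cache[key]

def extract_long(extractor, document, chunk_chars=4000):
    # Naive fixed-width chunking; production splitting should respect
    # sentence and section boundaries to avoid cutting entities in half.
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    merged = {}
    for chunk in chunks:
        result = extract_cached(extractor, chunk)
        # Last writer wins; a real pipeline would reconcile conflicts.
        merged.update({k: v for k, v in result.entities.items() if v})
    return merged
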
LangStruct vs. LangExtract

Feature            | LangStruct                           | LangExtract
-------------------|--------------------------------------|------------------------------------------------------
Optimization       | ✅ Automatic (DSPy MIPROv2)           | ⚠️ Manual prompts/examples
Refinement         | ✅ Best-of-N + iterative improvement  | ⚠️ Multi-pass extraction; no Best-of-N/judge pipeline
Schema Definition  | ✅ From examples OR Pydantic          | ⚠️ Prompt + examples (no Pydantic models)
Source Grounding   | ✅ Character-level tracking           | ✅ Character-level tracking
Confidence Scores  | ✅ Built-in                           | ⚠️ Not surfaced as scores
Query Parsing      | ✅ Bidirectional (docs + queries)     | ❌ Documents only
Model Support      | ✅ Any LLM (via DSPy/LiteLLM)         | ✅ Gemini, OpenAI, local via Ollama; extensible
Learning Curve     | ✅ Simple (example-based)             | ⚠️ Requires prompt + example design
Performance        | ✅ Self-optimizing                    | ⚠️ Depends on manual tuning
Project Type       | Community open-source                | Google open-source

Comparison verified on 2025-09-10 against the latest LangExtract docs. See LangExtract: https://github.com/google/langextract and example walkthroughs such as https://github.com/google/langextract/blob/main/docs/examples/longer_text_example.md.

Installation
pip install langstruct
# Set up any API key (choose one):
export OPENAI_API_KEY="sk-your-key" # OpenAI
export GOOGLE_API_KEY="your-key" # Google Gemini
export ANTHROPIC_API_KEY="sk-ant-key" # Claude models
# Or use local models with Ollama (no API key needed)

Links: Documentation | GitHub | Examples