Make your extraction more accurate with automatic optimization. LangStruct learns from your data to improve results without any manual prompt engineering.
Create an extractor and call `optimize()` when you're ready:
```python
from langstruct import LangStruct

extractor = LangStruct(
    example={
        "name": "Dr. Sarah Johnson",
        "age": 34,
        "occupation": "data scientist",
    }
)

# Later, once you have training data ready:
# extractor.optimize(texts=training_texts, expected_results=good_results)
```
Quick experiments (skip optimization entirely):

```python
extractor = LangStruct(example={"name": "John", "age": 30})
```
If you have examples of what good extraction looks like, run optimization explicitly:
```python
# Your training examples
training_texts = [
    "Dr. Sarah Johnson, 34, is a data scientist",
    "Prof. Michael Chen, 45, teaches at MIT",
    "Emma Wilson, 28, software engineer",
]

# What the results should look like
good_results = [
    {"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"},
    {"name": "Prof. Michael Chen", "age": 45, "occupation": "professor"},
    {"name": "Emma Wilson", "age": 28, "occupation": "software engineer"},
]

# Train it to be better
extractor.optimize(texts=training_texts, expected_results=good_results)

# Now it's optimized for your specific use case
result = extractor.extract("Jane Smith, 29, works as a designer")
```
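Before calling `optimize()`, it is worth confirming that every training text has a matching expected result - a length mismatch or a missing field is an easy mistake to make. A minimal sanity check (the `check_training_data` helper below is our own illustration, not part of LangStruct):

```python
def check_training_data(texts, expected_results,
                        required_keys=("name", "age", "occupation")):
    """Verify texts and labels line up and every label has the expected fields."""
    if len(texts) != len(expected_results):
        raise ValueError(
            f"{len(texts)} texts but {len(expected_results)} expected results"
        )
    for i, result in enumerate(expected_results):
        missing = [k for k in required_keys if k not in result]
        if missing:
            raise ValueError(f"Example {i} is missing fields: {missing}")

texts = ["Dr. Sarah Johnson, 34, is a data scientist"]
labels = [{"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"}]
check_training_data(texts, labels)  # passes silently when data is consistent
```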
Don’t have labeled training data? No problem! You can optimize using just the texts and let LangStruct use the model’s confidence scores:
```python
# Just provide the texts - no need for expected results
training_texts = [
    "Dr. Sarah Johnson, 34, is a data scientist",
    "Prof. Michael Chen, 45, teaches at MIT",
    "Emma Wilson, 28, software engineer",
    "Dr. Lisa Park, 39, works in research",
    "John Davis, 31, is a consultant",
]

# Optimize using confidence scores
extractor.optimize(texts=training_texts)

# The extractor learns from the patterns in your data
result = extractor.extract("Jane Smith, 29, works as a designer")
```
Use this approach when you don't yet have labeled examples but still want better-than-baseline accuracy.

Note: While confidence-based optimization works well, providing `expected_results` will give you better accuracy if you have the time to create them.
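Conceptually, confidence-driven optimization amounts to trusting only the model's most confident outputs as pseudo-labels. The toy sketch below illustrates that idea in plain Python; it is not LangStruct's actual algorithm, and the `THRESHOLD` cutoff is hypothetical:

```python
# Toy illustration of confidence-based selection (not LangStruct internals).
# Each candidate pairs an extraction with a model-reported confidence in [0, 1].
candidates = [
    ({"name": "Jane Smith", "age": 29}, 0.93),
    ({"name": "Jane", "age": None}, 0.41),   # low confidence, incomplete
    ({"name": "John Davis", "age": 31}, 0.88),
]

THRESHOLD = 0.8  # hypothetical cutoff; real tuning is handled internally

# Keep only high-confidence extractions as pseudo-labels
pseudo_labels = [ext for ext, conf in candidates if conf >= THRESHOLD]
print(len(pseudo_labels))  # 2 of the 3 candidates survive the cutoff
```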
Optimization can significantly improve accuracy on real-world tasks:
Before Optimization
Baseline performance: may miss details or produce inconsistent formatting
After Optimization
Improved performance: better information capture and consistent formatting
Save and load optimized extractors to reuse them without re-running optimization:
```python
# Save after optimization
extractor.save("./my_extractor")

# Load later
from langstruct import LangStruct

loaded = LangStruct.load("./my_extractor")

# Use immediately - optimization is preserved
result = loaded.extract("new text")
```
Most users don’t need this, but if you want more control:
```python
# Fine-tune the optimization process
extractor.optimize(
    texts=training_texts,
    expected_results=good_results,
    validation_split=0.3,  # Use 30% for testing improvements
)
```
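The `validation_split` fraction is held out rather than trained on, so each optimization step can be scored against data it hasn't seen. A rough sketch of the idea in plain Python (this mirrors the concept, not LangStruct's internal splitting logic):

```python
import random

def split_data(texts, results, validation_split=0.3, seed=42):
    """Hold out a fraction of (text, result) pairs for validation."""
    pairs = list(zip(texts, results))
    random.Random(seed).shuffle(pairs)  # shuffle reproducibly before splitting
    n_val = int(len(pairs) * validation_split)
    val, train = pairs[:n_val], pairs[n_val:]
    return train, val

texts = [f"text {i}" for i in range(10)]
results = [{"id": i} for i in range(10)]
train, val = split_data(texts, results, validation_split=0.3)
print(len(train), len(val))  # 7 3
```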
Start Simple
Start without optimization for quick experiments, enable when you need accuracy
Quality Over Quantity
10 good training examples beat 100 poor ones
Test on Real Data
Optimize with data similar to what you’ll use in production
Save Your Work
Always save optimized extractors so you don’t lose progress
Q: Do I always need training data?
A: You need example texts, but not necessarily expected outputs. If you don't provide `expected_results`, LangStruct uses the LLM's confidence ratings to optimize. Providing expected outputs significantly improves accuracy.
Q: How long does optimization take?
A: Usually 1-5 minutes for typical datasets (10-100 examples).

Q: Can I optimize an already optimized extractor?
A: Yes, you can continue optimizing with new data as you collect it.

Q: Will this make my extractions slower?
A: No - optimization happens once during training. Production extraction speed is unchanged.

Q: What happens when I switch models?
A: Change the model and re-optimize with the same training data. No prompt rewriting needed.
Try It Now
Create a LangStruct extractor and enable optimization when you need accuracy.