Make your extraction more accurate with automatic optimization. LangStruct learns from your data to improve results without any manual prompt engineering.
Optimization is enabled by default - you’re already getting better results:
from langstruct import LangStruct
# Create extractor (optimization enabled by default)
extractor = LangStruct(example={
    "name": "Dr. Sarah Johnson",
    "age": 34,
    "occupation": "data scientist"
})
result = extractor.extract("Dr. Sarah Johnson, 34, is a data scientist")
print(result.entities)  # Already optimized results!
That’s it! Your extractions are automatically improving over time.
To disable optimization (not recommended):
# Only if you need faster startup and don't care about accuracy
extractor = LangStruct(
    example={"name": "John", "age": 30},
    optimize=False
)
If you have examples of what good extraction looks like, LangStruct can get even better:
# Your training examples
training_texts = [
    "Dr. Sarah Johnson, 34, is a data scientist",
    "Prof. Michael Chen, 45, teaches at MIT",
    "Emma Wilson, 28, software engineer"
]
# What the results should look like
good_results = [
    {"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"},
    {"name": "Prof. Michael Chen", "age": 45, "occupation": "professor"},
    {"name": "Emma Wilson", "age": 28, "occupation": "software engineer"}
]
# Train it to be better
extractor.optimize(training_texts, good_results)
# Now it's optimized for your specific use case
result = extractor.extract("Jane Smith, 29, works as a designer")
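Because optimization learns from these text/result pairs, a misaligned list or a mislabeled value works against you. The sketch below is a hypothetical pre-flight check (`check_training_data` is not part of LangStruct) that you could run before calling `optimize`: it confirms the two lists have the same length, that every expected result contains the fields you want, and it notes string values that do not appear verbatim in their source text. A note is not necessarily an error: `"professor"` in the MIT example is inferred rather than copied, so it is flagged for a human to confirm.

```python
def check_training_data(texts, expected, required_keys):
    """Hypothetical helper: return a list of problems found in paired training data.

    An empty list means no problems were detected.
    """
    problems = []
    if len(texts) != len(expected):
        problems.append(f"{len(texts)} texts but {len(expected)} expected results")
    for i, (text, result) in enumerate(zip(texts, expected)):
        # Every expected result should contain the fields you want extracted.
        missing = set(required_keys) - set(result)
        if missing:
            problems.append(f"example {i}: missing keys {sorted(missing)}")
        # String values that never appear verbatim in the text are either
        # inferred (which may be intended) or labeling mistakes.
        for key, value in result.items():
            if isinstance(value, str) and value not in text:
                problems.append(f"example {i}: {key}={value!r} not found verbatim in text")
    return problems


if __name__ == "__main__":
    texts = ["Prof. Michael Chen, 45, teaches at MIT"]
    expected = [{"name": "Prof. Michael Chen", "age": 45, "occupation": "professor"}]
    for problem in check_training_data(texts, expected, ["name", "age", "occupation"]):
        print(problem)
```

Run on the MIT example, this prints `example 0: occupation='professor' not found verbatim in text`, which you can accept as an intentional inference.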
Optimization typically improves accuracy by 20-40% on real-world tasks:
Before Optimization
65% accuracy - Misses details, inconsistent format
After Optimization
87% accuracy - Catches more information, consistent results
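The page does not say how these accuracy figures are measured. One common choice, assumed here purely for illustration, is field-level exact match: the share of expected fields the extractor got exactly right. The sketch also shows why a gain can be quoted two ways: going from 65% to 87% is 22 percentage points, or roughly a 34% relative improvement. `field_accuracy` and `relative_improvement` are hypothetical helpers, not LangStruct functions.

```python
def field_accuracy(predictions, expected):
    """Share of expected fields whose predicted value matches exactly (an assumed metric)."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    total = 0
    correct = 0
    for pred, gold in zip(predictions, expected):
        for key, gold_value in gold.items():
            total += 1
            if pred.get(key) == gold_value:
                correct += 1
    return correct / total if total else 0.0


def relative_improvement(before, after):
    """Relative gain over the starting score, e.g. 0.65 -> 0.87 is about 0.34."""
    return (after - before) / before


if __name__ == "__main__":
    gold = [{"name": "Emma Wilson", "age": 28, "occupation": "software engineer"}]
    pred = [{"name": "Emma Wilson", "age": 28, "occupation": "engineer"}]
    print(f"{field_accuracy(pred, gold):.2f}")        # 2 of 3 fields correct: 0.67
    print(f"{relative_improvement(0.65, 0.87):.1%}")  # 33.8%
```

Exact match is strict: "engineer" versus "software engineer" counts as wrong, so a fuzzier comparison may suit free-text fields better.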
Once optimized, save it so you don’t have to re-train:
# After optimization
extractor.save("my_optimized_extractor.json")
# Later, load it back
extractor = LangStruct.load("my_optimized_extractor.json")
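Saving pays off most when your application checks for an existing file before optimizing again. The function below is a hypothetical convenience wrapper, not part of LangStruct. It only uses the constructor, `optimize`, `save`, and `LangStruct.load` as shown on this page, and it takes the extractor class as a parameter so it can be exercised with a stand-in.

```python
from pathlib import Path


def load_or_optimize(path, extractor_cls, example, texts, expected):
    """Hypothetical helper: reuse a saved extractor, or optimize once and save it.

    Assumes extractor_cls behaves like the LangStruct usage on this page:
    extractor_cls(example=...), .optimize(texts, expected), .save(path),
    and extractor_cls.load(path).
    """
    path = Path(path)
    if path.exists():
        # A previous run already paid the optimization cost.
        return extractor_cls.load(str(path))
    extractor = extractor_cls(example=example)
    extractor.optimize(texts, expected)
    extractor.save(str(path))
    return extractor
```

Assuming the API behaves as shown above, you would pass `LangStruct` as `extractor_cls`, for example `load_or_optimize("my_optimized_extractor.json", LangStruct, {"name": "John", "age": 30}, training_texts, good_results)`. The trade-off is staleness: if your training examples change, delete the saved file (or version its name) so the extractor is optimized again.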
Most users don’t need this, but if you want more control:
# Fine-tune the optimization process
extractor.optimize(
    texts=training_texts,
    expected_results=good_results,
    num_trials=50,  # More trials = better results (takes longer)
    validation_split=0.3  # Hold out 30% of the examples to test improvements
)
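The page does not describe how LangStruct chooses which examples to hold out for `validation_split`. As a generic illustration (not the library's internals), a seeded shuffle followed by a split shows what a 0.3 split means in practice: with 10 examples, 3 are held out to score candidate improvements and 7 remain for training. `split_examples` is a hypothetical helper.

```python
import random


def split_examples(texts, expected, validation_split=0.3, seed=0):
    """Generic seeded train/validation split of paired examples (illustrative only)."""
    if len(texts) != len(expected):
        raise ValueError("texts and expected must have the same length")
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    pairs = list(zip(texts, expected))
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(pairs)
    # Hold out at least one example so there is something to validate on.
    n_val = max(1, round(len(pairs) * validation_split)) if pairs else 0
    return pairs[n_val:], pairs[:n_val]  # (train, validation)
```

With small datasets the held-out set is tiny: at 10 examples, each validation example is a third of the validation score, which is one more reason the quality of every example matters.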
Start Simple
Optimization is enabled by default - just create your extractor
Quality Over Quantity
Ten good training examples beat 100 poor ones
Test on Real Data
Optimize with data similar to what you’ll use in production
Save Your Work
Always save optimized extractors so you don’t lose progress
Q: Do I always need training data?
A: No! Optimization works without any training data and still improves results.
Q: How long does optimization take?
A: Usually 1-5 minutes for typical datasets (10-100 examples).
Q: Can I optimize an already optimized extractor?
A: Yes! You can keep optimizing with new data as you get it.
Q: Will this make my extractions slower?
A: No - optimization happens once during training. Production extraction speed is the same.
Try It Now
Create a LangStruct extractor - optimization is already enabled!