
Optimization

Make your extraction more accurate with automatic optimization. LangStruct learns from your data to improve results without any manual prompt engineering.

Optimization is enabled by default, so you’re already getting better results:

from langstruct import LangStruct
# Create extractor (optimization enabled by default)
extractor = LangStruct(example={
    "name": "Dr. Sarah Johnson",
    "age": 34,
    "occupation": "data scientist"
})
result = extractor.extract("Dr. Sarah Johnson, 34, is a data scientist")
print(result.entities) # Already optimized results!

That’s it! Your extractions are automatically improving over time.

To disable optimization (not recommended):

# Only if you need faster startup and don't care about accuracy
extractor = LangStruct(
    example={"name": "John", "age": 30},
    optimize=False
)

If you have examples of what good extraction looks like, LangStruct can get even better:

# Your training examples
training_texts = [
    "Dr. Sarah Johnson, 34, is a data scientist",
    "Prof. Michael Chen, 45, teaches at MIT",
    "Emma Wilson, 28, software engineer"
]
# What the results should look like
good_results = [
    {"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"},
    {"name": "Prof. Michael Chen", "age": 45, "occupation": "professor"},
    {"name": "Emma Wilson", "age": 28, "occupation": "software engineer"}
]
# Train it to be better
extractor.optimize(training_texts, good_results)
# Now it's optimized for your specific use case
result = extractor.extract("Jane Smith, 29, works as a designer")

Optimization typically improves accuracy by 20-40% on real-world tasks:

Before Optimization

65% accuracy - Misses details, inconsistent format

After Optimization

87% accuracy - Catches more information, consistent results
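One way to check a before/after figure like the one above on your own data is to score predictions against known-good results. The sketch below is not part of LangStruct: `field_accuracy` is a hypothetical helper that counts exact field matches. It assumes each prediction has been collected as a plain dict (for instance from `result.entities`), and the sample predictions are made up for illustration.

```python
def field_accuracy(predicted, expected):
    """Return the fraction of expected fields whose values match exactly.

    Hypothetical helper, not part of LangStruct. Both arguments are
    equal-length lists of dicts.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    total = 0
    correct = 0
    for pred, exp in zip(predicted, expected):
        for key, value in exp.items():
            total += 1
            # A missing key counts as a miss, same as a wrong value
            if pred.get(key) == value:
                correct += 1
    return correct / total if total else 0.0


expected = [
    {"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"},
    {"name": "Emma Wilson", "age": 28, "occupation": "software engineer"},
]
# Illustrative predictions: one wrong age and one missing occupation
predicted = [
    {"name": "Dr. Sarah Johnson", "age": 43, "occupation": "data scientist"},
    {"name": "Emma Wilson", "age": 28},
]
print(f"{field_accuracy(predicted, expected):.0%}")
```

Running the same scoring function before and after `optimize` on a held-out set gives a comparison that reflects your data rather than a general estimate.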

Once optimized, save it so you don’t have to re-train:

# After optimization
extractor.save("my_optimized_extractor.json")
# Later, load it back
extractor = LangStruct.load("my_optimized_extractor.json")
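A common way to use save and load together is to optimize only when no saved file exists yet. The following is a minimal sketch of that pattern, not a LangStruct feature: `load_or_build` is a hypothetical helper, and it takes the load and build steps as plain callables so that it makes no assumptions about the library beyond the `save` and `load` calls shown above.

```python
from pathlib import Path


def load_or_build(path, load, build):
    """Reuse a saved extractor when `path` exists; otherwise build and save one.

    Hypothetical helper, not part of LangStruct.
    load:  callable taking a path and returning an extractor,
           e.g. LangStruct.load as shown above
    build: zero-argument callable that creates and optimizes an extractor;
           the returned object must provide .save(path)
    """
    if Path(path).exists():
        return load(path)
    extractor = build()
    extractor.save(path)  # persist so the next run skips optimization
    return extractor
```

With the names from this page, a call might look like `load_or_build("my_optimized_extractor.json", LangStruct.load, build_extractor)`, where `build_extractor` is a function you write that creates the extractor and calls `optimize` on it.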

Most users don’t need this, but if you want more control:

# Fine-tune the optimization process
extractor.optimize(
    texts=training_texts,
    expected_results=good_results,
    num_trials=50,  # More trials = better results (takes longer)
    validation_split=0.3  # Use 30% for testing improvements
)
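To make `validation_split=0.3` concrete: a split like this holds back part of the examples so that improvements are measured on data the optimizer did not tune against. How LangStruct splits the data internally is not described here, so the following is only a sketch of the general idea. `split_examples` and its `seed` parameter are hypothetical.

```python
import random


def split_examples(texts, results, validation_split=0.3, seed=0):
    """Shuffle paired examples and hold back a fraction for validation.

    Hypothetical helper illustrating the idea of a validation split;
    it is not LangStruct's implementation.
    """
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    pairs = list(zip(texts, results))
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split reproducible
    n_val = max(1, round(len(pairs) * validation_split))
    # First n_val shuffled pairs are held back; the rest are used for training
    return pairs[n_val:], pairs[:n_val]


texts = [f"text {i}" for i in range(10)]
results = [{"id": i} for i in range(10)]
train, val = split_examples(texts, results, validation_split=0.3)
print(len(train), len(val))  # 7 3
```

With only a handful of examples the held-back set becomes very small, so a validation score computed on it will be noisy.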

Start Simple

Optimization is enabled by default - just create your extractor

Quality Over Quantity

10 good training examples beat 100 poor ones

Test on Real Data

Optimize with data similar to what you’ll use in production

Save Your Work

Always save optimized extractors so you don’t lose progress
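In the spirit of "Quality Over Quantity", it can help to sanity-check training pairs before passing them to `optimize`. The sketch below is a hypothetical pre-flight check, not a LangStruct API: `check_training_data` only verifies that texts and results line up and that every result uses the same fields as the first one.

```python
def check_training_data(texts, results):
    """Return a list of human-readable problems found in the training pairs.

    Hypothetical helper, not part of LangStruct. The first result dict
    defines which fields every other result is expected to have.
    """
    problems = []
    if len(texts) != len(results):
        problems.append(f"{len(texts)} texts but {len(results)} results")
    expected_keys = set(results[0]) if results else set()
    for i, (text, result) in enumerate(zip(texts, results)):
        if not text.strip():
            problems.append(f"example {i}: empty text")
        missing = expected_keys - set(result)
        extra = set(result) - expected_keys
        if missing:
            problems.append(f"example {i}: missing fields {sorted(missing)}")
        if extra:
            problems.append(f"example {i}: unexpected fields {sorted(extra)}")
    return problems


texts = [
    "Dr. Sarah Johnson, 34, is a data scientist",
    "Emma Wilson, 28, software engineer",
]
results = [
    {"name": "Dr. Sarah Johnson", "age": 34, "occupation": "data scientist"},
    {"name": "Emma Wilson", "age": 28},  # occupation left out by mistake
]
print(check_training_data(texts, results))
# ["example 1: missing fields ['occupation']"]
```

An empty list means the pairs passed these basic checks. It says nothing about whether the expected values themselves are correct, which still needs a manual review.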

Q: Do I always need training data?
A: No! Optimization works without any training data and still improves results.

Q: How long does optimization take?
A: Usually 1-5 minutes for typical datasets (10-100 examples).

Q: Can I optimize an already optimized extractor?
A: Yes! You can keep optimizing with new data as you get it.

Q: Will this make my extractions slower?
A: No - optimization happens once during training. Production extraction speed is the same.

Try It Now

Create a LangStruct extractor - optimization is already enabled!