Save & Load Extractors
LangStruct extractors can be saved and loaded with complete state preservation, including optimized prompts, refinement configurations, and all DSPy module state. This enables:
- Training once, deploying everywhere: Save optimized extractors for production use
- Team collaboration: Share extractors across development teams
- Version control: Track extractor versions alongside code
- Cost efficiency: Avoid re-optimization on every deployment
Quick Start
```python
from langstruct import LangStruct

# Create and configure an extractor
extractor = LangStruct(example={
    "company": "Apple Inc.",
    "revenue": 125.3,
    "quarter": "Q3 2024"
})

# Save the extractor (creates a directory structure)
extractor.save("./my_extractor")

# Load anywhere (API keys must be available)
loaded_extractor = LangStruct.load("./my_extractor")

# Works exactly like the original
result = loaded_extractor.extract("Microsoft reported $56B in Q1 2024")
print(result.entities)
# {'company': 'Microsoft', 'revenue': 56.0, 'quarter': 'Q1 2024'}
```
Saving after optimization preserves the learned state, so training costs are paid once:

```python
from langstruct import LangStruct

# Create the extractor
extractor = LangStruct(
    example={"name": "John", "age": 30, "role": "engineer"},
)

# Train the extractor on domain-specific data
training_texts = ["Your domain-specific texts..."]
expected_results = [{"name": "Expected outputs..."}]

extractor.optimize(
    texts=training_texts,
    expected_results=expected_results
)

# Save the optimized state
extractor.save("./optimized_extractor")

# Loading preserves all optimizations
loaded = LangStruct.load("./optimized_extractor")
# Contains learned prompts and examples from optimization
```
Refinement configuration survives the round trip as well:

```python
from langstruct import LangStruct, Refine

# Create an extractor with a refinement configuration
extractor = LangStruct(
    example={"product": "iPhone", "rating": 4.5, "sentiment": "positive"},
    refine=Refine(
        strategy="bon_then_refine",
        n_candidates=5,
        max_refine_steps=2
    )
)

# Save with the refinement config
extractor.save("./refined_extractor")

# Loading preserves the refinement settings
loaded = LangStruct.load("./refined_extractor")

# Refinement works automatically
text = "The new iPhone is great, easily 4.5 out of 5"  # Placeholder input
result = loaded.extract(text, refine=True)  # Uses the saved config
```
What Gets Saved
LangStruct saves the complete extractor state in a clean directory structure:
```
my_extractor/
├── langstruct_metadata.json   # Schema, model config, versions
├── pipeline.json              # DSPy pipeline state (native format)
├── optimizer_state.json       # Optimizer config (if optimization used)
└── refinement_config.json     # Refinement settings (if configured)
```
Preserved Components
- Schema Definition: Both predefined and dynamically generated schemas
- DSPy Pipeline State: Optimized prompts, learned examples, module parameters
- Model Configuration: Model name and settings (API keys never saved)
- Chunking Configuration: Text processing settings
- Optimization State: Optimizer type and configuration
- Refinement Configuration: Refinement strategy and parameters
- Source Grounding Settings: Whether source tracking is enabled
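Because everything is stored as human-readable JSON, you can inspect a saved extractor without loading it. A minimal sketch using only the standard library (the file layout follows the directory structure above; the keys inside the metadata file depend on your LangStruct version):

```python
import json
from pathlib import Path

# Inspect a saved extractor's metadata for debugging.
# The file name matches the directory layout shown above; the keys it
# contains vary by LangStruct version, so we just print whatever is there.
metadata_path = Path("./my_extractor/langstruct_metadata.json")
metadata = json.loads(metadata_path.read_text())

for key, value in metadata.items():
    print(f"{key}: {value}")
```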
Security & Best Practices
- API keys are never saved, for security
- Version compatibility checking prevents silent failures
- Graceful fallbacks for missing schemas or configuration
- Human-readable formats for easy debugging and inspection
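Although API keys are never written to disk, a quick defense-in-depth check before committing saves to version control costs little. A minimal sketch, assuming OpenAI-style `sk-` key prefixes; adjust the pattern for your provider:

```python
import re
from pathlib import Path

# Defense-in-depth: scan saved extractor files for anything resembling an
# API key before committing them to version control.
# The "sk-" pattern is an assumption based on OpenAI-style keys.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")

def scan_for_secrets(save_dir: str) -> list[str]:
    suspicious = []
    for path in Path(save_dir).rglob("*.json"):
        if KEY_PATTERN.search(path.read_text()):
            suspicious.append(str(path))
    return suspicious

assert not scan_for_secrets("./my_extractor"), "Possible secret in save files!"
```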
Advanced Usage
Schema Types
```python
# Schema generated from examples
extractor = LangStruct(example={
    "name": "Alice",
    "skills": ["Python", "ML"]
})

extractor.save("./dynamic_schema_extractor")

# The schema is reconstructed from the saved JSON schema
loaded = LangStruct.load("./dynamic_schema_extractor")
```
```python
from pydantic import BaseModel, Field

from langstruct import LangStruct

class PersonSchema(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
    location: str = Field(description="Current location")

extractor = LangStruct(schema=PersonSchema)
extractor.save("./predefined_schema_extractor")

# Loading attempts to import the original schema class,
# falling back to reconstruction from the saved JSON schema
loaded = LangStruct.load("./predefined_schema_extractor")
```
Version Compatibility
```python
from langstruct import LangStruct
from langstruct.exceptions import PersistenceError

# Version checking prevents major incompatibilities
try:
    extractor = LangStruct.load("./old_extractor")
except PersistenceError as e:
    print(f"Incompatible version: {e}")
    # Handle migration or recreation
```
Version compatibility rules:
- Major version differences: Not supported (raises error)
- Minor version differences: Warning issued but loading continues
- Patch version differences: Silent compatibility
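The exact check is internal to LangStruct, but the rules above amount to standard semantic-versioning logic. A hypothetical re-implementation, for illustration only:

```python
import warnings

# Hypothetical re-implementation of the compatibility rules above;
# LangStruct's internal check may differ in detail.
def check_compatibility(saved_version: str, current_version: str) -> None:
    s_major, s_minor, _ = (int(p) for p in saved_version.split("."))
    c_major, c_minor, _ = (int(p) for p in current_version.split("."))

    if s_major != c_major:
        # Major mismatch: not supported
        raise ValueError(
            f"Major version mismatch: saved {saved_version}, running {current_version}"
        )
    if s_minor != c_minor:
        # Minor mismatch: warn but continue
        warnings.warn(
            f"Minor version mismatch: saved {saved_version}, running {current_version}"
        )
    # Patch differences pass silently
```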
Error Handling
```python
from langstruct import LangStruct
from langstruct.exceptions import PersistenceError

try:
    extractor = LangStruct.load("./my_extractor")
except PersistenceError as e:
    if "API key" in str(e):
        print("Set the required API key environment variable")
    elif "version" in str(e):
        print("Extractor version incompatible")
    elif "corrupted" in str(e):
        print("Save files corrupted or invalid")
    else:
        print(f"Unknown persistence error: {e}")
```
Common error scenarios:
- Missing API keys for the saved model
- Corrupted or missing save files
- Version incompatibilities
- Schema reconstruction failures
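When loading fails and you control the original configuration, one possible fallback is to rebuild an unoptimized extractor from the original example. A sketch; the fallback loses any learned prompts until you re-optimize:

```python
from langstruct import LangStruct
from langstruct.exceptions import PersistenceError

def load_or_rebuild(path: str) -> LangStruct:
    """Load a saved extractor, falling back to a fresh (unoptimized) one."""
    try:
        return LangStruct.load(path)
    except PersistenceError:
        # Fallback loses optimization state; re-optimize when possible.
        return LangStruct(example={"company": "Apple Inc.", "revenue": 125.3})
```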
Production Deployment
Deployment Workflow
```python
# Development: train and save
extractor = LangStruct(schema=MySchema)
extractor.optimize(texts=training_data, expected_results=expected_results)
extractor.save("./production_extractor")

# Production: load and use
def load_extractor():
    return LangStruct.load("./production_extractor")

# Use in an API or service
extractor = load_extractor()
result = extractor.extract(incoming_text)
```
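Loading is cheap but not free, so in long-running services or serverless functions it is worth loading once and reusing the instance. A minimal sketch using `functools.lru_cache`:

```python
from functools import lru_cache

from langstruct import LangStruct

@lru_cache(maxsize=1)
def get_extractor() -> LangStruct:
    # Loaded once per process; subsequent calls return the cached instance
    return LangStruct.load("./production_extractor")

def handle_request(incoming_text: str):
    return get_extractor().extract(incoming_text)
```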
Docker Integration
```dockerfile
# Dockerfile
COPY ./saved_extractors /app/extractors
# API key is provided at runtime, never baked into the image
ENV OPENAI_API_KEY=""
```
```python
# In the application
extractor = LangStruct.load("/app/extractors/my_extractor")
```
Environment Configuration
```python
import os

# Validate API keys before loading
required_key = "OPENAI_API_KEY"  # Based on the saved model
if not os.getenv(required_key):
    raise EnvironmentError(f"Missing {required_key}")

extractor = LangStruct.load("./extractor")
```
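If you need to know which provider key a save expects before loading it, you can read the model name from the saved metadata. A sketch, assuming `langstruct_metadata.json` exposes a `model` field (the key name and the prefix-to-variable mapping below are assumptions; verify against your own save files):

```python
import json
import os
from pathlib import Path

# Map model-name prefixes to the environment variable each provider needs.
# The "model" metadata key is an assumption; check your save files.
PROVIDER_KEYS = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}

def required_env_var(save_dir: str) -> str | None:
    metadata = json.loads(Path(save_dir, "langstruct_metadata.json").read_text())
    model_name = str(metadata.get("model", ""))
    for prefix, env_var in PROVIDER_KEYS.items():
        if prefix in model_name:
            return env_var
    return None

key = required_env_var("./extractor")
if key and not os.getenv(key):
    raise EnvironmentError(f"Missing {key}")
```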
Best Practices
Save Organization
```python
# Organize saves by version or purpose
extractor.save("./extractors/v1.0/invoice_processor")
extractor.save("./extractors/production/customer_feedback")
extractor.save("./extractors/staging/contract_analyzer")
```
Validation After Load
```python
# Verify the loaded extractor works as expected
loaded = LangStruct.load("./my_extractor")

# Quick validation
test_text = "Known good input text"
result = loaded.extract(test_text)
assert result.confidence > 0.8, "Extractor confidence too low"

# Schema validation
expected_fields = {"field1", "field2", "field3"}
actual_fields = set(loaded.schema.get_field_descriptions().keys())
assert expected_fields == actual_fields, "Schema fields don't match"
```
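For anything beyond a single smoke test, a small golden set of known inputs and expected entities catches regressions after re-optimization or version upgrades. A sketch; the texts and expected values are placeholders drawn from the Quick Start example:

```python
from langstruct import LangStruct

loaded = LangStruct.load("./my_extractor")

# A tiny regression suite of known inputs and expected entities
# (placeholder data; use real examples from your domain).
GOLDEN_SET = [
    ("Microsoft reported $56B in Q1 2024",
     {"company": "Microsoft", "revenue": 56.0, "quarter": "Q1 2024"}),
]

for text, expected in GOLDEN_SET:
    result = loaded.extract(text)
    for field, value in expected.items():
        assert result.entities.get(field) == value, f"Mismatch on {field!r}"
```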
Backup and Recovery
```python
import shutil
from pathlib import Path

# Back up before updates
save_path = Path("./my_extractor")
backup_path = Path("./backups/my_extractor_backup")
shutil.copytree(save_path, backup_path, dirs_exist_ok=True)

# Update the extractor
extractor.optimize(texts=new_training_texts, expected_results=new_expected_results)
extractor.save(str(save_path))

# Roll back if needed
if validation_fails():  # Your own post-update validation check
    shutil.rmtree(save_path)
    shutil.copytree(backup_path, save_path)
```
Migration Guide
Updating Extractors
Section titled “Updating Extractors”When LangStruct versions change:
- Test compatibility with existing saves
- Back up critical extractors before updating
- Re-optimize if needed for best performance
- Update deployment scripts for new API if changed
```python
from langstruct import LangStruct
from langstruct.exceptions import PersistenceError

# Migration script example
def migrate_extractor(old_path, new_path):
    try:
        # Try loading with the new version
        extractor = LangStruct.load(old_path)

        # Re-save in the new format
        extractor.save(new_path)
        print(f"Migrated {old_path} → {new_path}")
    except PersistenceError as e:
        print(f"Migration failed for {old_path}: {e}")
        # Handle manual migration
```
Performance Considerations
- Loading time: Proportional to DSPy pipeline complexity
- Save size: Typically 10-100KB for basic extractors
- Optimization state: Larger saves for heavily optimized extractors
- Network deployment: Consider compression for remote deployment
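For shipping saves over a network, the standard library is enough to pack a save directory into a single compressed archive and unpack it at the destination:

```python
import shutil

from langstruct import LangStruct

# Pack the save directory into a single compressed artifact for transfer
archive = shutil.make_archive("my_extractor", "gztar", root_dir="./my_extractor")

# ...transfer the archive, then unpack at the destination...
shutil.unpack_archive(archive, "./deployed_extractor")

extractor = LangStruct.load("./deployed_extractor")
```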
Save/load operations are designed to be fast and lightweight, suitable for production use cases including serverless deployments.