Process clinical notes, radiology reports, and medical documents to extract structured information for analysis and research.
Start with a basic example to extract key medical information:
```python
from langstruct import LangStruct

# Create a simple medical extractor from an example
extractor = LangStruct(example={
    "patient_age": 45,
    "chief_complaint": "chest pain",
    "diagnosis": "myocardial infarction",
    "medication": "aspirin",
})

# Extract from a clinical note
text = """Patient: 45-year-old male presenting with acute chest pain.
Assessment: Diagnosed with myocardial infarction.
Treatment: Started on aspirin therapy."""

result = extractor.extract(text)
print(result.entities)
# {'patient_age': 45, 'chief_complaint': 'chest pain', 'diagnosis': 'myocardial infarction', 'medication': 'aspirin'}
```
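Because this extractor is built from a single example, it can be useful to check that each extracted value has the same type as the corresponding example value before passing it downstream. The helper below is a hypothetical sketch in plain Python, not part of LangStruct; it assumes `result.entities` is a plain dict, as the printed output above suggests, and checks only keys and Python types, not clinical correctness.

```python
from typing import Any, Dict, List


def check_against_example(entities: Dict[str, Any], example: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list fields whose keys or types diverge from the example."""
    problems: List[str] = []
    for key, sample_value in example.items():
        if key not in entities or entities[key] is None:
            problems.append(f"missing field: {key}")
        elif not isinstance(entities[key], type(sample_value)):
            # e.g. the model returned "45" (str) where the example used 45 (int)
            problems.append(
                f"{key}: expected {type(sample_value).__name__}, "
                f"got {type(entities[key]).__name__}"
            )
    # Fields the model returned that the example never defined
    for key in sorted(entities.keys() - example.keys()):
        problems.append(f"unexpected field: {key}")
    return problems
```

For example, `check_against_example({"patient_age": "45"}, {"patient_age": 45})` returns `['patient_age: expected int, got str']`. An empty list means the shape matches the example.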
For production healthcare systems, define detailed schemas:
```python
from langstruct import LangStruct, Schema, Field
from typing import List, Optional, Dict
from datetime import datetime


class PatientInfoSchema(Schema):
    patient_id: Optional[str] = Field(description="Patient identifier (if present)")
    age: Optional[int] = Field(description="Patient age")
    gender: Optional[str] = Field(description="Patient gender")
    admission_date: Optional[str] = Field(description="Hospital admission date")
    discharge_date: Optional[str] = Field(description="Hospital discharge date")


class DiagnosisSchema(Schema):
    primary_diagnosis: str = Field(description="Primary medical diagnosis")
    secondary_diagnoses: List[str] = Field(description="Additional diagnoses")
    icd_codes: List[str] = Field(description="ICD-10 diagnostic codes")
    severity: Optional[str] = Field(description="Condition severity (mild/moderate/severe)")


class MedicalRecordSchema(Schema):
    # Patient information
    patient_info: PatientInfoSchema = Field(description="Patient demographic information")

    # Clinical findings
    chief_complaint: str = Field(description="Primary reason for visit/admission")
    symptoms: List[str] = Field(description="Reported symptoms")
    vital_signs: Dict[str, str] = Field(description="Vital sign measurements")

    # Diagnoses
    diagnoses: DiagnosisSchema = Field(description="Medical diagnoses")

    # Treatment
    medications: List[str] = Field(description="Prescribed medications")
    procedures: List[str] = Field(description="Medical procedures performed")
    treatment_plan: List[str] = Field(description="Ongoing treatment recommendations")

    # Clinical notes
    assessment: str = Field(description="Clinical assessment summary")
    prognosis: Optional[str] = Field(description="Expected outcome")
    follow_up: List[str] = Field(description="Follow-up instructions")

    # Lab results (if present)
    lab_results: Optional[Dict[str, str]] = Field(description="Laboratory test results")
```
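Fields such as `icd_codes` and `severity` are constrained only by their natural-language descriptions, so the model may return values in unexpected formats. A lightweight post-extraction check can catch obvious problems. The sketch below is a hypothetical helper, not a LangStruct feature: its regular expression checks only the general shape of an ICD-10-CM code (a letter, a digit, an alphanumeric character, then an optional dot and up to four more alphanumerics) and does not confirm that a code exists in the official code set.

```python
import re
from typing import Any, Dict, List

# Shape of an ICD-10-CM code: letter, digit, alphanumeric, optional ".XXXX".
# Format check only; this does not look codes up in the official code set.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?$")

# Allowed values taken from the DiagnosisSchema field description
ALLOWED_SEVERITIES = {"mild", "moderate", "severe"}


def validate_diagnoses(diagnoses: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return problems found in an extracted `diagnoses` dict."""
    problems: List[str] = []
    for code in diagnoses.get("icd_codes") or []:
        # Keep only the first token, since models may append a description
        # such as "I21.19 - ST elevation ..."
        candidate = (code.split() or [""])[0].upper()
        if not ICD10_PATTERN.match(candidate):
            problems.append(f"malformed ICD-10 code: {code!r}")
    severity = diagnoses.get("severity")
    if severity is not None and severity.strip().lower() not in ALLOWED_SEVERITIES:
        problems.append(f"unexpected severity: {severity!r}")
    return problems
```

Accepting a trailing description is a design choice; drop the first-token split if you want the model's output to contain bare codes only.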
Extract key information from clinical notes:
```python
# Create medical data extractor
medical_extractor = LangStruct(
    schema=MedicalRecordSchema,
    model="gemini-2.5-flash",  # Fast model suited to high-volume extraction
    temperature=0.0,           # Zero temperature for more consistent, repeatable output
    use_sources=True,          # Track source locations so extractions can be validated
)

# Sample clinical note (pre-sanitized for privacy)
clinical_note = """PATIENT: [PATIENT_NAME_REDACTED] (MRN: [MRN_REDACTED])
DOB: [DATE_REDACTED] AGE: 67 GENDER: Female
ADMISSION DATE: 2024-03-15

CHIEF COMPLAINT:
Chest pain and shortness of breath

HISTORY OF PRESENT ILLNESS:
67-year-old female presented to ED with acute onset chest pain radiating to left arm,
associated with dyspnea and diaphoresis. Symptoms started 2 hours prior to arrival.
Patient has history of hypertension and hyperlipidemia.

VITAL SIGNS:
BP: 145/95 mmHg, HR: 98 bpm, RR: 22/min, O2 Sat: 94%, Temp: 98.6°F

PHYSICAL EXAMINATION:
Cardiovascular: Irregular rhythm, no murmurs
Respiratory: Bilateral crackles at bases

LABORATORY RESULTS:
Troponin I: 2.4 ng/mL (elevated)
CK-MB: 18 ng/mL (elevated)
BNP: 450 pg/mL (elevated)

ASSESSMENT AND PLAN:
PRIMARY DIAGNOSIS: Acute ST-elevation myocardial infarction (STEMI) - Inferior wall
SECONDARY DIAGNOSES:
- Acute heart failure with preserved ejection fraction
- Hypertension
- Hyperlipidemia

ICD-10 CODES:
- I21.19 - ST elevation (STEMI) myocardial infarction involving other coronary artery of inferior wall
- I50.30 - Unspecified diastolic heart failure

MEDICATIONS:
- Aspirin 325mg daily
- Metoprolol 25mg BID
- Lisinopril 10mg daily
- Atorvastatin 40mg daily

PROCEDURES PERFORMED:
- Cardiac catheterization with PCI to RCA
- Echocardiogram

TREATMENT PLAN:
1. Continue dual antiplatelet therapy
2. Optimize heart failure medications
3. Cardiac rehabilitation referral
4. Follow-up with cardiology in 1 week

PROGNOSIS: Good with appropriate medical management"""

# Extract medical information
result = medical_extractor.extract(clinical_note)

# result.entities is a dict (see the basic example), so nested fields use key access
print("=== Medical Record Analysis ===")
print(f"Primary Diagnosis: {result.entities['diagnoses']['primary_diagnosis']}")
print(f"Medications: {len(result.entities['medications'])} prescribed")
print(f"Procedures: {len(result.entities['procedures'])} performed")
print(f"Confidence: {result.confidence:.2f}")
```
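The extractor above is configured with `use_sources=True` so that results can be checked against the original note. Independently of how LangStruct reports source locations, a simple first-pass check is to confirm that each extracted string actually occurs in the input text. Values that do not appear verbatim (for example, normalized or paraphrased diagnoses) are not necessarily wrong, but they are good candidates for clinician review. This is a hypothetical plain-Python sketch that assumes `result.entities` is a nested dict of strings, numbers, and lists.

```python
from typing import Any, List


def find_ungrounded(entities: Any, source: str) -> List[str]:
    """Hypothetical helper: list field paths whose string values do not occur in `source`.

    Matching is case-insensitive and whitespace-normalized; non-string leaves are skipped.
    """
    haystack = " ".join(source.lower().split())
    missing: List[str] = []

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else str(key))
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{path}[{index}]")
        elif isinstance(value, str) and value.strip():
            needle = " ".join(value.lower().split())
            if needle not in haystack:
                missing.append(path)

    walk(entities, "")
    return missing
```

Calling `find_ungrounded(result.entities, clinical_note)` returns paths such as `medications[2]` or `diagnoses.severity` for values that need a closer look; a verbatim match confirms only that the text exists in the note, not that it was assigned to the right field.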
```python
class LabResultSchema(Schema):
    patient_id: Optional[str] = Field(description="Patient identifier")
    test_name: str = Field(description="Name of laboratory test")
    result_value: str = Field(description="Test result value")
    reference_range: Optional[str] = Field(description="Normal reference range")
    units: Optional[str] = Field(description="Measurement units")
    abnormal_flag: Optional[str] = Field(description="High/Low/Critical flag")


class LabReportSchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    test_date: str = Field(description="Date tests were performed")
    lab_results: List[LabResultSchema] = Field(description="Individual test results")
    ordering_physician: Optional[str] = Field(description="Physician who ordered tests")
    critical_values: List[str] = Field(description="Critical or abnormal results")


lab_extractor = LangStruct(schema=LabReportSchema)
```
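`LabResultSchema` keeps `result_value` and `reference_range` as strings, so an extracted `abnormal_flag` can be cross-checked against the numbers themselves. The hypothetical helper below assumes a simple `low-high` range format such as `0.00-0.04`; anything else (for example `<100` or qualitative results) returns `None` for manual review. It does not compute "Critical", because critical-value thresholds are laboratory-specific.

```python
import re
from typing import Optional

# Simple numeric ranges like "0.00-0.04" or "3.5 - 5.0" (assumed format)
RANGE_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")
# Leading number of a result such as "2.4" or "2.4 ng/mL"
VALUE_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")


def flag_from_range(result_value: str, reference_range: Optional[str]) -> Optional[str]:
    """Hypothetical helper: "High", "Low", or "Normal", or None if either input can't be parsed."""
    if not reference_range:
        return None
    range_match = RANGE_PATTERN.match(reference_range)
    value_match = VALUE_PATTERN.match(result_value)
    if not range_match or not value_match:
        return None
    low, high = float(range_match.group(1)), float(range_match.group(2))
    value = float(value_match.group(1))
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Normal"
```

When the computed flag disagrees with the model's `abnormal_flag`, the record can be routed to review rather than trusting either value.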
```python
class RadiologyReportSchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    study_type: str = Field(description="Type of imaging study")
    study_date: str = Field(description="Date of imaging study")
    indication: str = Field(description="Clinical indication for study")
    technique: str = Field(description="Imaging technique used")
    findings: List[str] = Field(description="Radiological findings")
    impression: str = Field(description="Radiologist's impression/conclusion")
    recommendations: List[str] = Field(description="Recommended follow-up")


radiology_extractor = LangStruct(schema=RadiologyReportSchema)
```
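Radiology reports usually follow a conventional layout with headers such as INDICATION, TECHNIQUE, FINDINGS, and IMPRESSION, which map closely onto `RadiologyReportSchema`. Splitting a report on those headers makes it easy to check, for instance, that the extracted `impression` really comes from the IMPRESSION section. The splitter below is a hypothetical sketch; the header list is an assumption, and real report templates vary between institutions.

```python
import re
from typing import Dict

# Assumed section headers; adjust to your institution's report templates
SECTION_HEADERS = ("INDICATION", "TECHNIQUE", "COMPARISON", "FINDINGS", "IMPRESSION", "RECOMMENDATIONS")
HEADER_PATTERN = re.compile(
    r"^(" + "|".join(SECTION_HEADERS) + r")\s*:\s*",
    re.MULTILINE | re.IGNORECASE,
)


def split_sections(report: str) -> Dict[str, str]:
    """Hypothetical helper: map each header (upper case) to its text, up to the next header."""
    matches = list(HEADER_PATTERN.finditer(report))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).upper()] = report[match.end():end].strip()
    return sections
```

Headers are recognized only at the start of a line and must be followed by a colon, which keeps ordinary sentences such as "Findings were stable" from being treated as section breaks.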
```python
class DischargeSummarySchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    admission_diagnosis: str = Field(description="Admission diagnosis")
    discharge_diagnosis: List[str] = Field(description="Final discharge diagnoses")
    hospital_course: str = Field(description="Summary of hospitalization")
    discharge_medications: List[str] = Field(description="Medications at discharge")
    discharge_instructions: List[str] = Field(description="Patient discharge instructions")
    follow_up_appointments: List[str] = Field(description="Scheduled follow-up care")
    discharge_disposition: str = Field(description="Where patient was discharged to")


discharge_extractor = LangStruct(schema=DischargeSummarySchema)
```
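Because `MedicalRecordSchema.medications` and `DischargeSummarySchema.discharge_medications` are both lists of free-text strings, a rough medication comparison can be done on drug names. The sketch below is a hypothetical simplification that assumes the drug name is the first word of each entry (as in `"Aspirin 325mg daily"`); it ignores dose changes, brand/generic synonyms, and combination products, so it is a triage aid rather than a clinical reconciliation tool.

```python
from typing import Dict, List


def drug_name(entry: str) -> str:
    """Crude normalization (assumption): lowercase first word of a medication string."""
    words = entry.strip().split()
    return words[0].lower() if words else ""


def reconcile(admission: List[str], discharge: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: group drug names into continued, started, and stopped."""
    before = {drug_name(m) for m in admission} - {""}
    after = {drug_name(m) for m in discharge} - {""}
    # Sorted lists keep the output stable for display and testing
    return {
        "continued": sorted(before & after),
        "started": sorted(after - before),
        "stopped": sorted(before - after),
    }
```

A dose change (for example, aspirin 325 mg to 81 mg) shows up as "continued" here; comparing the full strings for continued drugs is one way to surface those cases.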
When working with medical data, implement appropriate privacy protections:
```python
import re


def sanitize_medical_text(text: str) -> str:
    """Basic sanitization of sensitive information in medical text."""
    # Note: This is a basic example; production systems need comprehensive PHI detection
    patterns_to_redact = {
        'dates': r'\b\d{1,2}/\d{1,2}/\d{4}\b',
        'phone': r'\b\d{3}-\d{3}-\d{4}\b',
        'mrn': r'(?i)\bMRN:?\s*\d+\b',            # Case-insensitive only for the MRN label
        'names': r'\b[A-Z][a-z]+ [A-Z][a-z]+\b',  # Two capitalized words (case-sensitive)
    }

    sanitized = text
    for pattern_name, pattern in patterns_to_redact.items():
        # Case-sensitive on purpose: with IGNORECASE the name pattern would match any two words
        sanitized = re.sub(pattern, f'[{pattern_name.upper()}_REDACTED]', sanitized)

    return sanitized


# Example usage
sanitized_text = sanitize_medical_text(clinical_note)
result = medical_extractor.extract(sanitized_text)

print("Extracted medical data from sanitized text:")
print(f"Primary diagnosis: {result.entities['diagnoses']['primary_diagnosis']}")
```
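`sanitize_medical_text` replaces every match with the same generic token, so two different phone numbers become indistinguishable. When downstream analysis needs to know that two mentions refer to the same value without revealing it, a common alternative is consistent pseudonymization: each distinct value gets a numbered placeholder, and the mapping is stored separately under access control. The function below is a hypothetical sketch that takes a pattern dictionary like the one above; like the original function, it is not a complete PHI detector.

```python
import re
from typing import Dict, Tuple


def pseudonymize(text: str, patterns: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: replace each distinct match with a numbered placeholder.

    Returns the pseudonymized text and a placeholder -> original mapping. The mapping
    re-identifies the data, so it must be stored securely and separately.
    """
    mapping: Dict[str, str] = {}
    seen: Dict[Tuple[str, str], str] = {}
    counters: Dict[str, int] = {}

    for label, pattern in patterns.items():
        def replace(match: re.Match, label: str = label) -> str:
            key = (label, match.group(0))
            if key not in seen:
                # First time this exact value appears: allocate the next number
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"[{label.upper()}_{counters[label]}]"
                seen[key] = placeholder
                mapping[placeholder] = match.group(0)
            return seen[key]

        text = re.sub(pattern, replace, text)
    return text, mapping
```

With a phone pattern, `"Call 555-123-4567 or 555-987-6543; callback 555-123-4567."` becomes `"Call [PHONE_1] or [PHONE_2]; callback [PHONE_1]."`, preserving the fact that the first and last numbers are the same.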
Process multiple medical records efficiently:
```python
from datetime import datetime
from pathlib import Path


class MedicalRecordProcessor:
    def __init__(self):
        self.extractor = LangStruct(schema=MedicalRecordSchema)

    def process_medical_records(self, records_folder: Path):
        """Sanitize and extract every .txt medical record in a folder."""
        record_files = sorted(records_folder.glob("*.txt"))

        # Prepare documents for batch processing
        documents = []
        file_names = []

        for record_file in record_files:
            try:
                record_text = record_file.read_text(encoding="utf-8")
            except (OSError, UnicodeDecodeError) as e:
                print(f"Error reading {record_file}: {e}")
                continue
            documents.append(sanitize_medical_text(record_text))
            file_names.append(record_file.name)

        if not documents:
            return []

        # Process all documents in one batch call
        results = self.extractor.extract(documents)

        # Format results, pairing each result with the file it came from
        processed_results = []
        for file_name, result in zip(file_names, results):
            processed_results.append({
                'file': file_name,
                'primary_diagnosis': result.entities['diagnoses']['primary_diagnosis'],
                'medication_count': len(result.entities['medications']),
                'confidence': result.confidence,
                'processed_at': datetime.now(),
            })

        return processed_results


# Usage
processor = MedicalRecordProcessor()
records = processor.process_medical_records(Path("./medical_records/"))
```
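The `confidence` value collected for each file can drive a simple review workflow: results below a threshold go to a clinician queue instead of being accepted automatically. The sketch below works on the list of dicts returned by `process_medical_records`; the helper names are hypothetical and the 0.8 threshold is an arbitrary placeholder that should be calibrated against manually reviewed records. It also serializes a queue to CSV with the standard library.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_for_review(records: List[Dict], threshold: float = 0.8) -> Tuple[List[Dict], List[Dict]]:
    """Hypothetical helper: partition records into (accepted, needs_review) by confidence."""
    accepted: List[Dict] = []
    needs_review: List[Dict] = []
    for record in records:
        confidence = record.get("confidence")
        # Missing confidence is treated as low, so it is never auto-accepted
        if confidence is not None and confidence >= threshold:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


def to_csv(records: List[Dict]) -> str:
    """Serialize records to CSV text, using the first record's keys as columns."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `accepted, needs_review = split_for_review(records)` followed by `to_csv(needs_review)` produces a review queue; `processed_at` timestamps are written using their string form.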
Medical Accuracy
Specialized for medical terminology and clinical concepts
Comprehensive Data
Extract diagnoses, medications, procedures, and lab results
Source Tracking
Track source locations for validation and verification
Batch Processing
Process large volumes of medical records efficiently
Ready to start processing medical records?
Start with Examples
Try the sample medical schemas with anonymized records
Implement Privacy Controls
Set up proper PHI detection and anonymization processes
Build Review Workflows
Create clinical review processes for extracted data
Scale Your Processing
Process large medical record collections with batch processing