Medical Records Processing

Process clinical notes, radiology reports, and medical documents to extract structured information for analysis and research.

Quick Start: Simple Medical Data Extraction

Start with a basic example to extract key medical information:

from langstruct import LangStruct

# Create a simple medical extractor from an example
extractor = LangStruct(example={
    "patient_age": 45,
    "chief_complaint": "chest pain",
    "diagnosis": "myocardial infarction",
    "medication": "aspirin"
})

# Extract from a clinical note
text = """
Patient: 45-year-old male presenting with acute chest pain.
Assessment: Diagnosed with myocardial infarction.
Treatment: Started on aspirin therapy.
"""

result = extractor.extract(text)
print(result.entities)
# {'patient_age': 45, 'chief_complaint': 'chest pain', 'diagnosis': 'myocardial infarction', 'medication': 'aspirin'}

For production healthcare systems, define detailed schemas:

from langstruct import LangStruct, Schema, Field
from typing import List, Optional, Dict
from datetime import datetime


class PatientInfoSchema(Schema):
    patient_id: Optional[str] = Field(description="Patient identifier (if present)")
    age: Optional[int] = Field(description="Patient age")
    gender: Optional[str] = Field(description="Patient gender")
    admission_date: Optional[str] = Field(description="Hospital admission date")
    discharge_date: Optional[str] = Field(description="Hospital discharge date")


class DiagnosisSchema(Schema):
    primary_diagnosis: str = Field(description="Primary medical diagnosis")
    secondary_diagnoses: List[str] = Field(description="Additional diagnoses")
    icd_codes: List[str] = Field(description="ICD-10 diagnostic codes")
    severity: Optional[str] = Field(description="Condition severity (mild/moderate/severe)")


class MedicalRecordSchema(Schema):
    # Patient information
    patient_info: PatientInfoSchema = Field(description="Patient demographic information")

    # Clinical findings
    chief_complaint: str = Field(description="Primary reason for visit/admission")
    symptoms: List[str] = Field(description="Reported symptoms")
    vital_signs: Dict[str, str] = Field(description="Vital sign measurements")

    # Diagnoses
    diagnoses: DiagnosisSchema = Field(description="Medical diagnoses")

    # Treatment
    medications: List[str] = Field(description="Prescribed medications")
    procedures: List[str] = Field(description="Medical procedures performed")
    treatment_plan: List[str] = Field(description="Ongoing treatment recommendations")

    # Clinical notes
    assessment: str = Field(description="Clinical assessment summary")
    prognosis: Optional[str] = Field(description="Expected outcome")
    follow_up: List[str] = Field(description="Follow-up instructions")

    # Lab results (if present)
    lab_results: Optional[Dict[str, str]] = Field(description="Laboratory test results")
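
Because `vital_signs` is typed as `Dict[str, str]`, values such as "145/95 mmHg" arrive as free text and need parsing before any numeric analysis. The following is a minimal sketch of one way to do that, assuming each value starts with a number, optionally followed by a slash and a second number, then a unit. `parse_vital` is a hypothetical helper and not part of LangStruct.

```python
import re
from typing import Dict, List, Optional, Tuple

# Leading number, optional "/second number", then whatever unit text remains.
_VITAL_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)(?:\s*/\s*(\d+(?:\.\d+)?))?\s*(.*?)\s*$")


def parse_vital(raw: str) -> Optional[Tuple[List[float], str]]:
    """Split a vital-sign string into its numeric parts and its unit.

    Returns None when the string does not start with a number, so the
    caller can send the value to manual review instead of guessing.
    """
    match = _VITAL_RE.match(raw)
    if match is None:
        return None
    first, second, unit = match.groups()
    numbers = [float(first)]
    if second is not None:
        numbers.append(float(second))
    return numbers, unit


# Values shaped like the VITAL SIGNS line of a clinical note.
vitals: Dict[str, str] = {"BP": "145/95 mmHg", "HR": "98 bpm", "O2 Sat": "94%"}
parsed = {name: parse_vital(value) for name, value in vitals.items()}
print(parsed["BP"])
```

Keeping the raw string in the schema and parsing afterwards means an unexpected format surfaces as `None` rather than as a silently wrong number.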

Extract key information from clinical notes:

# Create medical data extractor
medical_extractor = LangStruct(
    schema=MedicalRecordSchema,
    model="gemini-2.5-flash",  # Fast and reliable for medical analysis
    temperature=0.0,           # Zero temperature for consistent medical analysis
    use_sources=True           # Track sources for validation
)

# Sample clinical note (pre-sanitized for privacy)
clinical_note = """
PATIENT: [PATIENT_NAME_REDACTED] (MRN: [MRN_REDACTED])
DOB: [DATE_REDACTED] AGE: 67 GENDER: Female
ADMISSION DATE: 2024-03-15
CHIEF COMPLAINT:
Chest pain and shortness of breath
HISTORY OF PRESENT ILLNESS:
67-year-old female presented to ED with acute onset chest pain radiating to left arm,
associated with dyspnea and diaphoresis. Symptoms started 2 hours prior to arrival.
Patient has history of hypertension and hyperlipidemia.
VITAL SIGNS:
BP: 145/95 mmHg, HR: 98 bpm, RR: 22/min, O2 Sat: 94%, Temp: 98.6°F
PHYSICAL EXAMINATION:
Cardiovascular: Irregular rhythm, no murmurs
Respiratory: Bilateral crackles at bases
LABORATORY RESULTS:
Troponin I: 2.4 ng/mL (elevated)
CK-MB: 18 ng/mL (elevated)
BNP: 450 pg/mL (elevated)
ASSESSMENT AND PLAN:
PRIMARY DIAGNOSIS: Acute ST-elevation myocardial infarction (STEMI) - Inferior wall
SECONDARY DIAGNOSES:
- Acute heart failure with preserved ejection fraction
- Hypertension
- Hyperlipidemia
ICD-10 CODES:
- I21.19 - ST elevation (STEMI) myocardial infarction involving other coronary artery of inferior wall
- I50.30 - Unspecified diastolic heart failure
MEDICATIONS:
- Aspirin 325mg daily
- Metoprolol 25mg BID
- Lisinopril 10mg daily
- Atorvastatin 40mg daily
PROCEDURES PERFORMED:
- Cardiac catheterization with PCI to RCA
- Echocardiogram
TREATMENT PLAN:
1. Continue dual antiplatelet therapy
2. Optimize heart failure medications
3. Cardiac rehabilitation referral
4. Follow-up with cardiology in 1 week
PROGNOSIS: Good with appropriate medical management
"""
# Extract medical information
result = medical_extractor.extract(clinical_note)
print("=== Medical Record Analysis ===")
print(f"Primary Diagnosis: {result.entities.diagnoses.primary_diagnosis}")
print(f"Medications: {len(result.entities.medications)} prescribed")
print(f"Procedures: {len(result.entities.procedures)} performed")
print(f"Confidence: {result.confidence:.2f}")
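
Source tracking is meant to let a reviewer confirm that each extracted value comes from the note. As a complementary check that does not rely on LangStruct's own source data, the sketch below flags string fields whose value cannot be found verbatim in the input text. `find_ungrounded` is a hypothetical helper, and the flat `Dict[str, str]` input is an assumption made for illustration; nested results would need flattening first.

```python
from typing import Dict, List


def find_ungrounded(values: Dict[str, str], source_text: str) -> List[str]:
    """Return the names of fields whose value is not found in the source.

    Matching ignores case and collapses whitespace, so line wraps in the
    note do not cause false alarms.
    """
    normalized_source = " ".join(source_text.lower().split())
    missing = []
    for field, value in values.items():
        normalized_value = " ".join(value.lower().split())
        if normalized_value not in normalized_source:
            missing.append(field)
    return missing


note = "PRIMARY DIAGNOSIS: Acute ST-elevation myocardial infarction (STEMI)\nPROGNOSIS: Good"
extracted = {
    "primary_diagnosis": "Acute ST-elevation myocardial infarction",
    "prognosis": "Guarded",  # deliberately wrong, to show a flagged field
}
ungrounded = find_ungrounded(extracted, note)
print(ungrounded)
```

A flagged field is not necessarily wrong, since a model may legitimately normalize or paraphrase a value, but it is a reasonable candidate for human review.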

Define specialized schemas for other document types, such as lab reports:

class LabResultSchema(Schema):
    patient_id: Optional[str] = Field(description="Patient identifier")
    test_name: str = Field(description="Name of laboratory test")
    result_value: str = Field(description="Test result value")
    reference_range: Optional[str] = Field(description="Normal reference range")
    units: Optional[str] = Field(description="Measurement units")
    abnormal_flag: Optional[str] = Field(description="High/Low/Critical flag")


class LabReportSchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    test_date: str = Field(description="Date tests were performed")
    lab_results: List[LabResultSchema] = Field(description="Individual test results")
    ordering_physician: Optional[str] = Field(description="Physician who ordered tests")
    critical_values: List[str] = Field(description="Critical or abnormal results")


lab_extractor = LangStruct(schema=LabReportSchema)
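
Since `LabResultSchema` captures both `result_value` and `reference_range`, an extracted `abnormal_flag` can be cross-checked against them. The following sketch shows one possible check, assuming the range is written as a plain "low-high" pair and the value is a bare number; anything else returns `None` rather than a guess. `flag_from_range` is a hypothetical helper, and the numbers in the example are made up for illustration, not clinical reference ranges.

```python
import re
from typing import Optional

# Accepts simple "low-high" ranges such as "3.5-5.0"; other formats are skipped.
_RANGE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*")
_NUMBER_RE = re.compile(r"\s*-?\d+(?:\.\d+)?\s*")


def flag_from_range(result_value: str, reference_range: Optional[str]) -> Optional[str]:
    """Return "Low", "High", or "Normal", or None if the inputs cannot be parsed."""
    if reference_range is None:
        return None
    range_match = _RANGE_RE.fullmatch(reference_range)
    if range_match is None or _NUMBER_RE.fullmatch(result_value) is None:
        return None
    value = float(result_value)
    low, high = float(range_match.group(1)), float(range_match.group(2))
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Normal"


# Illustrative values only.
computed = flag_from_range("5.9", "3.5-5.0")
print(computed)
```

A mismatch between the computed flag and the extracted `abnormal_flag` is a useful signal that the result should be looked at by a person.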

Radiology reports:

class RadiologyReportSchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    study_type: str = Field(description="Type of imaging study")
    study_date: str = Field(description="Date of imaging study")
    indication: str = Field(description="Clinical indication for study")
    technique: str = Field(description="Imaging technique used")
    findings: List[str] = Field(description="Radiological findings")
    impression: str = Field(description="Radiologist's impression/conclusion")
    recommendations: List[str] = Field(description="Recommended follow-up")


radiology_extractor = LangStruct(schema=RadiologyReportSchema)

Discharge summaries:

class DischargeSummarySchema(Schema):
    patient_info: PatientInfoSchema = Field(description="Patient information")
    admission_diagnosis: str = Field(description="Admission diagnosis")
    discharge_diagnosis: List[str] = Field(description="Final discharge diagnoses")
    hospital_course: str = Field(description="Summary of hospitalization")
    discharge_medications: List[str] = Field(description="Medications at discharge")
    discharge_instructions: List[str] = Field(description="Patient discharge instructions")
    follow_up_appointments: List[str] = Field(description="Scheduled follow-up care")
    discharge_disposition: str = Field(description="Where patient was discharged to")


discharge_extractor = LangStruct(schema=DischargeSummarySchema)

When working with medical data, implement appropriate privacy protections:

import re


def sanitize_medical_text(text: str) -> str:
    """Basic sanitization of sensitive information in medical text"""
    # Note: This is a basic example - production systems need comprehensive detection
    # Each entry maps a label to (pattern, regex flags).
    patterns_to_redact = {
        'dates': (r'\b\d{1,2}/\d{1,2}/\d{4}\b', 0),
        'phone': (r'\b\d{3}-\d{3}-\d{4}\b', 0),
        'mrn': (r'\bMRN:?\s*\d+\b', re.IGNORECASE),
        # Case-sensitive on purpose: with re.IGNORECASE this pattern would match
        # any two adjacent words and redact most of the note. It still
        # over-matches title-cased phrases, so treat it as a placeholder.
        'names': (r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', 0),
    }
    sanitized = text
    for pattern_name, (pattern, flags) in patterns_to_redact.items():
        sanitized = re.sub(pattern, f'[{pattern_name.upper()}_REDACTED]', sanitized, flags=flags)
    return sanitized


# Example usage
sanitized_text = sanitize_medical_text(clinical_note)
result = medical_extractor.extract(sanitized_text)
print("Extracted medical data from sanitized text:")
print(f"Primary diagnosis: {result.entities.diagnoses.primary_diagnosis}")
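
When sanitization runs unattended, it helps to know how much each pattern actually redacted, since a record with zero redactions (or an implausibly high count) may indicate a pattern that is missing or over-matching. The sketch below uses `re.subn` to return per-category counts alongside the cleaned text. The function name and the two patterns are illustrative assumptions, not a complete PHI detector.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; real PHI detection needs far broader coverage.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN:?\s*\d+\b", re.IGNORECASE),
}


def sanitize_with_report(text: str) -> Tuple[str, Dict[str, int]]:
    """Redact each pattern and report how many matches were replaced."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, replaced = pattern.subn(f"[{label}_REDACTED]", text)
        counts[label] = replaced
    return text, counts


clean, report = sanitize_with_report("MRN: 12345. Call 555-123-4567 or 555-987-6543.")
print(clean)
print(report)
```

The counts can be logged per file without logging the redacted values themselves.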

Process multiple medical records efficiently:

from datetime import datetime
from pathlib import Path


class MedicalRecordProcessor:
    def __init__(self):
        self.extractor = LangStruct(schema=MedicalRecordSchema)

    def process_medical_records(self, records_folder: Path):
        """Process multiple medical records"""
        record_files = list(records_folder.glob("*.txt"))

        # Prepare documents for batch processing
        documents = []
        file_names = []
        for record_file in record_files:
            try:
                record_text = record_file.read_text()
                sanitized_text = sanitize_medical_text(record_text)
                documents.append(sanitized_text)
                file_names.append(record_file.name)
            except Exception as e:
                # Skip unreadable files; documents and file_names stay aligned
                print(f"Error reading {record_file}: {e}")

        # Process all documents in batch
        results = self.extractor.extract(documents)

        # Format results, pairing each result with its source file
        processed_results = []
        for file_name, result in zip(file_names, results):
            processed_results.append({
                'file': file_name,
                'primary_diagnosis': result.entities.diagnoses.primary_diagnosis,
                'medication_count': len(result.entities.medications),
                'confidence': result.confidence,
                'processed_at': datetime.now()
            })
        return processed_results


# Usage
processor = MedicalRecordProcessor()
records = processor.process_medical_records(Path("./medical_records/"))
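
The result dictionaries above contain a `datetime` in `processed_at`, which `json.dumps` rejects by default. If the batch output needs to be stored, one option is to write JSON Lines with a fallback serializer, as in this sketch. `write_jsonl` is a hypothetical helper, and the sample row (including its confidence value) is invented for the example.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Iterable


def _to_jsonable(value: Any) -> str:
    """Fallback for json.dumps: render datetimes as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def write_jsonl(rows: Iterable[Dict[str, Any]], path: Path) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, default=_to_jsonable) + "\n")
            count += 1
    return count


# Invented sample row shaped like the output of process_medical_records.
rows = [{"file": "note_001.txt", "confidence": 0.91,
         "processed_at": datetime(2024, 3, 15, 9, 30)}]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "results.jsonl"
    written = write_jsonl(rows, out_path)
    first_line = out_path.read_text(encoding="utf-8").splitlines()[0]

print(first_line)
```

One object per line keeps the file appendable and lets a failed run be resumed without rewriting earlier results.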

  • Medical Accuracy - Specialized for medical terminology and clinical concepts
  • Comprehensive Data - Extract diagnoses, medications, procedures, and lab results
  • Source Tracking - Track source locations for validation and verification
  • Batch Processing - Process large volumes of medical records efficiently

Use Cases

Healthcare providers:

  • Clinical Decision Support - Extract key information for physician review
  • Quality Improvement - Analyze patterns in diagnoses and treatments
  • Research Data Collection - Structure clinical data for medical research
  • Coding and Billing - Extract ICD codes and procedure information

Research organizations:

  • Retrospective Studies - Extract data from historical medical records
  • Clinical Trial Screening - Identify eligible patients from EHR data
  • Epidemiological Research - Analyze disease patterns and outcomes
  • Drug Safety Monitoring - Track medication usage and adverse events

Insurance and claims:

  • Claims Processing - Extract relevant information for claim adjudication
  • Medical Review - Structure clinical information for review processes
  • Risk Assessment - Analyze patient risk factors and health status
  • Fraud Detection - Identify inconsistencies in medical documentation

Healthcare technology:

  • EHR Migration - Extract structured data from legacy systems
  • Data Standardization - Convert unstructured notes to structured formats
  • Clinical Documentation - Assist with clinical note summarization
  • Population Health - Analyze patient populations and health trends

Best Practices

Schema design:

  • Use Medical Terminology - Include standard medical terms and abbreviations
  • Handle Optional Fields - Many medical elements may not be present in all records
  • Validate Against Standards - Use ICD-10, CPT, and other medical coding standards
  • Consider Data Types - Use appropriate types for dates, numeric values, and codes

Model configuration:

  • Use Advanced Models - Medical analysis benefits from GPT-4 or similar capable models
  • Zero Temperature - Set temperature=0.0 for consistent medical interpretations

Quality assurance:

  • Source Tracking - Always enable to validate extracted information
  • Human Review - Implement clinical review workflows for critical applications

Privacy and security:

  • Data Sanitization - Remove or anonymize sensitive information before processing
  • Access Controls - Restrict access to medical data processing systems
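
As one concrete form of "Validate Against Standards", extracted `icd_codes` can be checked for the general shape of an ICD-10-CM code before they are stored. The sketch below checks format only: a letter, a digit, a third alphanumeric character, then an optional dot followed by one to four more alphanumerics. It does not confirm that a code exists in the official code set, and `looks_like_icd10` is a hypothetical helper name.

```python
import re

# Shape of an ICD-10-CM code, e.g. I21.19 or I10. Format check only.
_ICD10_SHAPE = re.compile(r"[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?")


def looks_like_icd10(code: str) -> bool:
    """Return True if the string has the general shape of an ICD-10-CM code."""
    return _ICD10_SHAPE.fullmatch(code.strip().upper()) is not None


extracted_codes = ["I21.19", "I50.30", "21.19", "I21."]
malformed = [code for code in extracted_codes if not looks_like_icd10(code)]
print(malformed)
```

Codes that pass this check would still need a lookup against a current ICD-10-CM release to be considered valid.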

Ready to start processing medical records?

  1. Installation - Set up LangStruct for medical applications
  2. Source Grounding - Essential for medical data validation
  3. Optimization - Improve accuracy for medical terminology
  4. API Reference - Complete technical documentation

  • Start with Examples - Try the sample medical schemas with anonymized records
  • Implement Privacy Controls - Set up proper PHI detection and anonymization processes
  • Build Review Workflows - Create clinical review processes for extracted data
  • Scale Your Processing - Process large medical record collections with batch processing
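
A review workflow can start as simply as splitting batch results by confidence, so that low-confidence extractions go to a clinician before they are used. The sketch below assumes each record is a dictionary with a `confidence` key, as produced by the batch processing example. The threshold value and the function name are hypothetical and would need tuning against manually reviewed samples.

```python
from typing import Any, Dict, List, Tuple

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, not a recommended value


def split_for_review(
    records: List[Dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (accepted, needs_review), preserving the input order."""
    accepted: List[Dict[str, Any]] = []
    needs_review: List[Dict[str, Any]] = []
    for record in records:
        if record["confidence"] < threshold:
            needs_review.append(record)
        else:
            accepted.append(record)
    return accepted, needs_review


# Invented sample records for illustration.
batch = [
    {"file": "note_001.txt", "confidence": 0.95},
    {"file": "note_002.txt", "confidence": 0.62},
    {"file": "note_003.txt", "confidence": 0.85},
]
accepted, needs_review = split_for_review(batch)
print([record["file"] for record in needs_review])
```

For critical applications, confidence alone is unlikely to be sufficient, and sampling a share of the accepted records for review helps detect confident but wrong extractions.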