# Sentedel

> Secure what's private.

Sentedel builds privacy filter models that detect and redact Protected Health Information (PHI) in raw EDI (Electronic Data Interchange) healthcare transactions.

## What We Do

Sentedel fine-tunes language models using proprietary training data and a custom training pipeline to create PHI detection models purpose-built for X12 EDI formatting. General-purpose PII models fail on EDI data because of its unique structure. Our models are trained specifically for this domain. We selectively open-source our models and benchmarks.

## Technical Blog

- **Featured post**: `sentedel_blog_post.html`
- **Title**: PHI Detection for EDI Transactions
- **Summary**: Building the first PHI detection benchmark for payer-side healthcare EDI, and why general-purpose models fail dramatically on structured transaction data.

## Open-Source Model

- **Weights**: `models/sentedel-edi-phi-v1/` (`model.safetensors`, `config.json`, `tokenizer.json`); local only, not on the public site
- **Public playground**: `demo-public.html` (in-browser; OpenAI Privacy Filter via Transformers.js)

## Current Model

**Sentedel EDI-PHI v1**

An MoE (Mixture of Experts) model fine-tuned for healthcare EDI PHI detection.

### Benchmark Results

Evaluated on an open-source benchmark of 500 EDI test transactions across 13 PHI categories.

| Model                 | Strict Recall | Relaxed Recall (50% Overlap) | Strict F1 |
|-----------------------|---------------|------------------------------|-----------|
| Sentedel EDI-PHI v1   | 91.0%         | 100.0%                       | 76.4%     |
| GLiNER PII Base       | 49.0%         | 54.2%                        | 50.3%     |
| NVIDIA GLiNER PII     | 35.9%         | 40.9%                        | 39.8%     |
| OpenAI Privacy Filter | 3.0%          | 64.4%                        | 1.2%      |

### Key Results

- **100% relaxed recall (rounded)**: v1 caught 11,000 of 11,002 PHI elements (99.98%) across 13 categories and 500 transactions.
- **91% strict recall**: S-tag label mapping fixed subword tokenization boundary errors.
- **Training efficiency**: Dynamic sequence packing and LoRA optimization reduced training time by 62%.

## Technical Approach

- Fine-tuning open-source base models with proprietary synthetic EDI training data
- Custom training pipeline with S-tag labeling (instead of standard BIOES) for accurate subword boundary detection
- Diverse synthetic data generation covering hyphenated names, PO boxes, varied CPT arrays, and other EDI-specific patterns
- LoRA optimization and dynamic sequence packing for efficient training

## Contact

Website: https://sentedel.com
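
## Appendix: Strict vs. Relaxed Recall

The benchmark table reports both strict recall and relaxed recall at 50% overlap. The sketch below illustrates one plausible way to compute the two. It is not Sentedel's evaluation code. It assumes that a strict hit requires an exact match on boundaries and label, and that a relaxed hit requires a same-label prediction covering at least half of the gold span's characters. The sample segment, the label names, and the `Span` type are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # PHI category (label names here are hypothetical)


def overlap_fraction(gold: Span, pred: Span) -> float:
    """Fraction of the gold span's characters covered by the predicted span."""
    shared = min(gold.end, pred.end) - max(gold.start, pred.start)
    return max(shared, 0) / (gold.end - gold.start)


def strict_recall(gold: list[Span], pred: list[Span]) -> float:
    """Share of gold spans matched exactly (same boundaries and label)."""
    pred_set = set(pred)
    hits = sum(1 for g in gold if g in pred_set)
    return hits / len(gold) if gold else 0.0


def relaxed_recall(gold: list[Span], pred: list[Span], threshold: float = 0.5) -> float:
    """Share of gold spans covered at least `threshold` by a same-label prediction."""
    hits = sum(
        1
        for g in gold
        if any(p.label == g.label and overlap_fraction(g, p) >= threshold for p in pred)
    )
    return hits / len(gold) if gold else 0.0


# Made-up X12-style segment, for illustration only.
segment = "NM1*IL*1*DOE*JANE****MI*123456789~"

gold = [
    Span(9, 12, "LAST_NAME"),    # "DOE"
    Span(13, 17, "FIRST_NAME"),  # "JANE"
    Span(24, 33, "MEMBER_ID"),   # "123456789"
]
pred = [
    Span(9, 12, "LAST_NAME"),    # exact match: counts for strict and relaxed
    Span(13, 16, "FIRST_NAME"),  # "JAN": 75% of the gold span, relaxed hit only
    Span(24, 27, "MEMBER_ID"),   # "123": 33% of the gold span, a miss for both
]

print(f"strict recall:  {strict_recall(gold, pred):.3f}")
print(f"relaxed recall: {relaxed_recall(gold, pred):.3f}")
```

Under these assumptions, a prediction that finds the right region but clips or overshoots a boundary counts toward relaxed recall and not toward strict recall, which is one way the two columns in the table can diverge.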