# Sentedel

> Secure what's private.

Sentedel builds privacy filter models that detect and redact Protected Health Information (PHI) in raw EDI (Electronic Data Interchange) healthcare transactions.

## What We Do

Sentedel fine-tunes language models on proprietary training data with a custom training pipeline, producing PHI detection models purpose-built for X12 EDI. General-purpose PII models fail on EDI data because it is delimiter-separated and positional: a value's meaning comes from its segment and element position, not from surrounding natural-language context. Our models are trained specifically for this domain.
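To make the structural point concrete, here is a minimal sketch of how PHI sits inside an X12 segment. It assumes the common `~` segment terminator and `*` element separator (real interchanges declare their delimiters in the ISA header). The sample values and the `PHI_POSITIONS` map are hypothetical, and the lookup is only an illustration of positional meaning, not how Sentedel's models work.

```python
# Illustrative only: a positional lookup over X12 segments.
# Assumes "~" terminates segments and "*" separates elements.

SAMPLE = "NM1*IL*1*DOE*JANE****MI*W123456789~DMG*D8*19800214*F~"

# Hypothetical map of (segment id, 1-based element position) -> PHI label.
PHI_POSITIONS = {
    ("NM1", 3): "LAST_NAME",
    ("NM1", 4): "FIRST_NAME",
    ("NM1", 9): "MEMBER_ID",
    ("DMG", 2): "DATE_OF_BIRTH",
}


def parse_segments(raw, seg_term="~", elem_sep="*"):
    """Split raw X12 text into (segment_id, elements) pairs."""
    segments = []
    for chunk in raw.split(seg_term):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final terminator
        parts = chunk.split(elem_sep)
        segments.append((parts[0], parts[1:]))
    return segments


def find_phi(raw):
    """Return (label, value) pairs for non-empty elements at mapped positions."""
    hits = []
    for seg_id, elements in parse_segments(raw):
        for pos, value in enumerate(elements, start=1):
            label = PHI_POSITIONS.get((seg_id, pos))
            if label and value:
                hits.append((label, value))
    return hits


if __name__ == "__main__":
    for label, value in find_phi(SAMPLE):
        print(f"{label}: {value}")
```

Nothing in `DOE` or `19800214` marks the value as a name or a birth date. Only the segment ID and element position do, and that is context a model trained on free text never sees.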

We selectively open-source our models and benchmarks.

## Technical Blog

- **Featured post**: `sentedel_blog_post.html`
- **Title**: PHI Detection for EDI Transactions.
- **Summary**: Building the first PHI detection benchmark for payer-side healthcare EDI — and why general-purpose models fail dramatically on structured transaction data.

## Open-Source Model

- **Weights**: `models/sentedel-edi-phi-v1/` (`model.safetensors`, `config.json`, `tokenizer.json`) — local only, not on public site
- **Public playground**: `demo-public.html` (in-browser; OpenAI Privacy Filter via Transformers.js)

## Current Model

**Sentedel EDI-PHI v1**

An MoE (Mixture of Experts) model fine-tuned for healthcare EDI PHI detection.

### Benchmark Results

Evaluated on an open-source benchmark of 500 EDI test transactions across 13 PHI categories.

| Model                    | Strict Recall | Relaxed Recall (50% Overlap) | Strict F1 |
|--------------------------|---------------|------------------------------|-----------|
| Sentedel EDI-PHI v1      | 91.0%         | 100.0%                       | 76.4%     |
| GLiNER PII Base          | 49.0%         | 54.2%                        | 50.3%     |
| NVIDIA GLiNER PII        | 35.9%         | 40.9%                        | 39.8%     |
| OpenAI Privacy Filter    | 3.0%          | 64.4%                        | 1.2%      |
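The gap between the strict and relaxed columns comes down to how a predicted span is matched to a gold span. The sketch below shows one common way to compute both. It assumes spans are `(start, end, label)` character offsets with an exclusive end, that labels must match, and that overlap is measured against the gold span's length. The benchmark's exact matching rules may differ.

```python
def overlap_fraction(gold, pred):
    """Fraction of the gold span's characters covered by the predicted span."""
    g_start, g_end, _ = gold
    p_start, p_end, _ = pred
    overlap = max(0, min(g_end, p_end) - max(g_start, p_start))
    return overlap / (g_end - g_start)


def recall(gold_spans, pred_spans, min_overlap=None):
    """Strict recall when min_overlap is None, relaxed recall otherwise."""
    found = 0
    for gold in gold_spans:
        for pred in pred_spans:
            if pred[2] != gold[2]:
                continue  # assumption: the label must match in both modes
            if min_overlap is None:
                matched = pred[:2] == gold[:2]  # exact boundaries
            else:
                matched = overlap_fraction(gold, pred) >= min_overlap
            if matched:
                found += 1
                break
    return found / len(gold_spans) if gold_spans else 0.0


if __name__ == "__main__":
    gold = [(0, 8, "NAME"), (10, 18, "DOB")]
    pred = [(0, 8, "NAME"), (12, 18, "DOB")]  # second span starts 2 chars late
    print("strict :", recall(gold, pred))
    print("relaxed:", recall(gold, pred, min_overlap=0.5))
```

In this toy case the second prediction covers 6 of the 8 gold characters, so it counts under the 50% overlap rule but not under exact matching. A model whose boundaries are slightly off can therefore score low on strict recall while still scoring high on relaxed recall.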

### Key Results

- **~100% relaxed recall**: v1 caught 11,000 of 11,002 PHI elements (99.98%, shown as 100.0% in the table after rounding) across 13 categories and 500 transactions.
- **91% strict recall**: S-tag label mapping corrected the span-boundary errors introduced by subword tokenization.
- **Training efficiency**: Dynamic sequence packing and LoRA optimization reduced training time by 62%.
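This README does not define S-tag label mapping, so the following is only one plausible reading. It assumes every token inside a PHI span is tagged `S-<label>`, so the model never has to predict BIOES begin/inside/end positions, and adjacent identical tags are merged back into spans at decode time. The function names and token indices are hypothetical.

```python
def bioes_tags(n_tokens, spans):
    """Standard BIOES tags. spans: (first_token, last_token_inclusive, label)."""
    tags = ["O"] * n_tokens
    for first, last, label in spans:
        if first == last:
            tags[first] = f"S-{label}"
        else:
            tags[first] = f"B-{label}"
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"
            tags[last] = f"E-{label}"
    return tags


def s_only_tags(n_tokens, spans):
    """Assumed S-tag scheme: every token in a span gets the same S-<label> tag."""
    tags = ["O"] * n_tokens
    for first, last, label in spans:
        for i in range(first, last + 1):
            tags[i] = f"S-{label}"
    return tags


def decode_s_only(tags):
    """Merge runs of adjacent identical S- tags back into token spans."""
    spans = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes a final run
        prev = tags[start] if start is not None else None
        if start is not None and tag != prev:
            spans.append((start, i - 1, prev[2:]))
            start = None
        if start is None and tag != "O":
            start = i
    return spans


if __name__ == "__main__":
    # "DOE-SMITH" split into four subword tokens (hypothetical tokenization).
    spans = [(0, 3, "NAME")]
    print(bioes_tags(4, spans))
    print(s_only_tags(4, spans))
    print(decode_s_only(s_only_tags(4, spans)))
```

Under this reading, a single mispredicted `B-` or `E-` tag can no longer shift a span boundary. The trade-off is that two adjacent entities with the same label would be merged into one span.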

## Technical Approach

- Fine-tuning open-source base models with proprietary synthetic EDI training data
- Custom training pipeline with S-tag labeling (instead of standard BIOES) for accurate subword boundary detection
- Diverse synthetic data generation covering hyphenated names, PO boxes, varied CPT arrays, and other EDI-specific patterns
- LoRA optimization and dynamic sequence packing for efficient training
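As a rough illustration of the sequence-packing item above, the sketch below packs variable-length tokenized transactions into fixed-size rows with a greedy first-fit-decreasing heuristic. The heuristic, the `max_len` of 512, and the sample lengths are assumptions for illustration, since the pipeline's actual packing strategy is not described here.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing.

    Returns a list of bins; each bin is a list of indices into `lengths`
    whose total length does not exceed `max_len`.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins = []  # sequence indices per packed row
    free = []  # remaining capacity per packed row
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has length {n} > max_len {max_len}")
        for b, cap in enumerate(free):
            if n <= cap:  # first existing row with enough room
                bins[b].append(i)
                free[b] -= n
                break
        else:  # no row had room, so open a new one
            bins.append([i])
            free.append(max_len - n)
    return bins


def padding_tokens(lengths, bins, max_len):
    """Total pad tokens needed to fill every packed row to max_len."""
    return sum(max_len - sum(lengths[i] for i in b) for b in bins)


if __name__ == "__main__":
    lengths = [300, 120, 500, 80, 200, 410]  # hypothetical token counts
    packed = pack_sequences(lengths, max_len=512)
    unpacked = [[i] for i in range(len(lengths))]
    print("rows:", len(unpacked), "->", len(packed))
    print("padding:", padding_tokens(lengths, unpacked, 512),
          "->", padding_tokens(lengths, packed, 512))
```

With these sample lengths, six padded rows become four packed rows. A real pipeline would also need attention masks or position resets so that sequences sharing a row do not attend to each other.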

## Contact

Website: https://sentedel.com