Sentedel — PHI Detection for EDI Transactions

EDI (X12) transactions contain PHI, but they don’t look like normal text. A patient name lives inside delimited segments such as NM1*IL*1*JOHNSON*MICHAEL*T***MI*XKW123456789~, not inside sentences. We benchmarked modern PII detectors on synthetic but structurally valid payer-side EDI to measure what breaks and what fixes it.
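To make the structure concrete, here is a minimal sketch of how the NM1 segment above breaks into positional elements. The function and field names are ours for illustration, and the separators (* and ~) are taken from the example rather than read from an ISA header, as a real parser would do.

```python
def parse_segment(segment: str, element_sep: str = "*", terminator: str = "~") -> dict:
    """Split one X12 segment into its ID and its positional elements."""
    elements = segment.rstrip(terminator).split(element_sep)
    return {"id": elements[0], "elements": elements[1:]}


seg = parse_segment("NM1*IL*1*JOHNSON*MICHAEL*T***MI*XKW123456789~")

# Position, not surrounding prose, tells us what each value is.
# NM103/NM104/NM105 carry the name parts; NM109 carries the ID
# described by the qualifier in NM108.
fields = {
    "entity_code": seg["elements"][0],   # NM101: "IL" (insured/subscriber)
    "last_name": seg["elements"][2],     # NM103
    "first_name": seg["elements"][3],    # NM104
    "middle_name": seg["elements"][4],   # NM105
    "id_qualifier": seg["elements"][7],  # NM108: "MI" (member ID)
    "member_id": seg["elements"][8],     # NM109
}
print(fields)
```

Every PHI value here is a bare token bounded by delimiters, which is why detectors trained on sentences have so little context to work with.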

Finding

General-purpose models fail on EDI

Baseline PII models split spans at delimiters and qualifiers, so few predictions match an identifier's exact start and end, and strict-boundary performance collapses.

Fix

Domain fine-tuning works fast

With a few thousand EDI examples, performance jumps from near-zero strict F1 to production-usable accuracy (76.4% strict F1 and 91.0% strict recall in the benchmark below).

Approach

Synthetic EDI + S-tag alignment

Real payer EDI is PHI by definition, so we generated valid 837P transactions with realistic identifiers (names, addresses, DOBs, IDs) placed in the correct segments. We fine-tuned an open-source token classifier and used S-tag alignment to reduce boundary errors caused by subword tokenization.
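As a rough illustration of why synthetic data gives perfect labels, the sketch below builds one subscriber NM1 segment and records the exact character span of each identifier as it is written. This is not our generator: the name lists, label names, and member ID format (three letters plus nine digits, modeled on the example above) are hypothetical, and a full 837P needs the envelope, loops, and many more segments.

```python
import random

# Hypothetical value pools; a real generator would draw from much larger ones.
LAST_NAMES = ["JOHNSON", "GARCIA", "NGUYEN"]
FIRST_NAMES = ["MICHAEL", "ANA", "LINH"]
ID_LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"


def make_nm1(rng: random.Random) -> tuple[str, list[dict]]:
    """Return one NM1 segment plus character spans for each PHI element."""
    member_id = "".join(rng.choice(ID_LETTERS) for _ in range(3))
    member_id += f"{rng.randrange(10**9):09d}"
    # (value, label) per element; label is None for non-PHI elements.
    parts = [
        ("NM1", None), ("IL", None), ("1", None),
        (rng.choice(LAST_NAMES), "LAST_NAME"),    # NM103
        (rng.choice(FIRST_NAMES), "FIRST_NAME"),  # NM104
        ("", None), ("", None), ("", None),       # NM105-NM107 left empty
        ("MI", None),                             # NM108 qualifier
        (member_id, "MEMBER_ID"),                 # NM109
    ]
    text, spans = "", []
    for i, (value, label) in enumerate(parts):
        if i:
            text += "*"
        if label:
            # The span is known exactly because we are the ones writing the value.
            spans.append({"start": len(text), "end": len(text) + len(value), "label": label})
        text += value
    return text + "~", spans


segment, labels = make_nm1(random.Random(0))
print(segment)
for span in labels:
    print(span["label"], segment[span["start"]:span["end"]])
```

Because the offsets are recorded at write time, there is no annotation step and no label noise at span boundaries.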

Why synthetic works here

The model is learning structure (which segments/positions contain identifiers), not memorizing clinical content. Perfect labels are more valuable than “realistic” prose.

Benchmark

Strict vs relaxed matching

For compliance, overlap-based NER scores can hide leakage. We report strict boundary metrics (exact start/end) alongside relaxed overlap to show whether any characters remain exposed.
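The difference between the two metrics can be sketched in a few lines. We assume here that "relaxed" means a prediction covers at least 50% of a gold span's characters and that labels are ignored; the function names are illustrative, not our evaluation harness.

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def overlap_fraction(gold: Span, pred: Span) -> float:
    """Fraction of the gold span's characters covered by one prediction."""
    inter = max(0, min(gold[1], pred[1]) - max(gold[0], pred[0]))
    return inter / (gold[1] - gold[0])


def strict_recall(gold: list[Span], pred: list[Span]) -> float:
    """A gold span counts only if some prediction matches start and end exactly."""
    pred_set = set(pred)
    return sum(g in pred_set for g in gold) / len(gold)


def relaxed_recall(gold: list[Span], pred: list[Span], threshold: float = 0.5) -> float:
    """A gold span counts if any prediction covers at least `threshold` of it."""
    hits = sum(any(overlap_fraction(g, p) >= threshold for p in pred) for g in gold)
    return hits / len(gold)


text = "NM1*IL*1*JOHNSON*MICHAEL*T***MI*XKW123456789~"


def span_of(value: str) -> Span:
    start = text.index(value)
    return (start, start + len(value))


gold = [span_of("JOHNSON"), span_of("MICHAEL"), span_of("XKW123456789")]
# One exact match and two partial matches that stop short of the boundary.
pred = [span_of("JOHNSON"), span_of("MICHA"), span_of("123456789")]

print(f"strict recall:  {strict_recall(gold, pred):.2f}")
print(f"relaxed recall: {relaxed_recall(gold, pred):.2f}")
```

In this toy case relaxed recall is perfect while two of the three identifiers are only partly covered, which is the gap the strict column is meant to expose.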


Model	Strict Recall	Relaxed Recall (≥50% overlap)	Strict F1
Sentedel EDI-PHI v1	91.0%	100.0%	76.4%
GLiNER PII Base	49.0%	54.2%	50.3%
NVIDIA GLiNER PII	35.9%	40.9%	39.8%
OpenAI Privacy Filter (baseline)	3.0%	64.4%	1.2%

What we learned

EDI is out-of-distribution

Delimiters, qualifiers, and loop structure break assumptions learned from emails, documents, and chat.

Fine-tuning is high leverage

Thousands of examples are enough to learn consistent PHI locations across segments.

False positives are filterable

Many false positives cluster in predictable segment types (service dates, control numbers) and can be removed with post-hoc rules.
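A post-hoc filter of this kind could look like the sketch below. The rule set is an assumption for illustration: it drops predictions that fall inside envelope and control segments (ISA, GS, ST, SE, GE, IEA) or inside a DTP segment with qualifier 472 (service date). Which segments are safe to suppress is a policy decision and should be reviewed before use.

```python
# Hypothetical rules, not a vetted production list.
CONTROL_SEGMENTS = {"ISA", "GS", "ST", "SE", "GE", "IEA"}
SERVICE_DATE_QUALIFIERS = {"472"}


def enclosing_segment(edi: str, offset: int, terminator: str = "~") -> str:
    """Return the text of the segment that contains the given character offset."""
    start = edi.rfind(terminator, 0, offset) + 1
    end = edi.find(terminator, offset)
    if end == -1:
        end = len(edi)
    return edi[start:end].strip()


def is_filterable(edi: str, span: dict, element_sep: str = "*") -> bool:
    """True if the span sits in a segment the rules treat as non-PHI."""
    elements = enclosing_segment(edi, span["start"]).split(element_sep)
    if elements[0] in CONTROL_SEGMENTS:
        return True
    return (
        elements[0] == "DTP"
        and len(elements) > 1
        and elements[1] in SERVICE_DATE_QUALIFIERS
    )


def filter_predictions(edi: str, preds: list[dict]) -> list[dict]:
    return [p for p in preds if not is_filterable(edi, p)]


edi = (
    "ST*837*0001*005010X222A1~"
    "NM1*IL*1*JOHNSON*MICHAEL*T***MI*XKW123456789~"
    "DMG*D8*19800412*M~"
    "DTP*472*D8*20240105~"
)


def pred_for(value: str) -> dict:
    start = edi.index(value)
    return {"start": start, "end": start + len(value)}


# Two true identifiers plus two typical false positives.
preds = [pred_for("0001"), pred_for("JOHNSON"), pred_for("19800412"), pred_for("20240105")]
kept = filter_predictions(edi, preds)
print([edi[p["start"]:p["end"]] for p in kept])
```

The date of birth in DMG survives while the DTP service date is dropped, even though both are eight-digit dates; the segment and qualifier carry the distinction.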

Strict metrics matter

Relaxed overlap can look “okay” while still leaking characters that re-identify.
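A small sketch of what that leakage looks like after redaction. The helper name and the toy prediction are illustrative; the prediction covers exactly half of the member ID, which would still count as a hit under a 50% overlap threshold.

```python
def exposed_characters(
    text: str, gold: list[tuple[int, int]], pred: list[tuple[int, int]]
) -> list[str]:
    """For each gold span, return the characters no prediction covers."""
    covered = set()
    for start, end in pred:
        covered.update(range(start, end))
    leaks = []
    for start, end in gold:
        leaked = "".join(text[i] for i in range(start, end) if i not in covered)
        if leaked:
            leaks.append(leaked)
    return leaks


text = "NM1*IL*1*JOHNSON*MICHAEL*T***MI*XKW123456789~"


def span_of(value: str) -> tuple[int, int]:
    start = text.index(value)
    return (start, start + len(value))


gold = [span_of("JOHNSON"), span_of("XKW123456789")]
pred = [span_of("JOHNSON"), span_of("XKW123")]  # second span stops halfway

print(exposed_characters(text, gold, pred))
```

Six digits of a member ID remain in the redacted output here, and those digits may be enough to narrow down or recover the member.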

Technical details