maziyarpanahi / openmed

Wednesday, June 10, 2026, 00:00:02

https://github.com/maziyarpanahi/openmed

Open-source healthcare AI

Local-first healthcare AI that never leaves the device

Turn clinical text into structured insight with one line of code.
Entity extraction, PII de-identification, and 1,000+ specialized medical models that run entirely on your own hardware — from a one-liner in Python to a native Swift app on iPhone, powered by Apple MLX. No cloud. No vendor lock-in. No patient data leaving your network.

1,000+ models · 12 languages · 247 PII checkpoints · 100% on-device · Apache-2.0

English · 简体中文 · Español · Français · Deutsch · Italiano · Português · Nederlands · العربية · हिन्दी · తెలుగు · 日本語 · Türkçe · فارسی

See it in action

Real-time PII de-identification: the Nemotron Privacy Filter redacting names, addresses, IDs, and billing data from a clinical discharge packet, entirely on-device. (All values shown are synthetic.)

30-second example

from openmed import analyze_text

result = analyze_text(
    "Patient started on imatinib for chronic myeloid leukemia.",
    model_name="disease_detection_superclinical",
)

for entity in result.entities:
    print(f"{entity.label:<12} {entity.text:<28} {entity.confidence:.2f}")
# DISEASE      chronic myeloid leukemia     0.98
# DRUG         imatinib                     0.95

A state-of-the-art clinical NER model running locally — no API key, no network call.
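
The loop above reads three attributes from each entity: label, text, and confidence. A minimal sketch of post-filtering such results follows. The `Entity` dataclass and the `group_confident` helper are hypothetical stand-ins (not part of OpenMed) so the snippet runs without the library installed, and the 0.90 threshold is arbitrary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in exposing the three attributes used above."""
    label: str
    text: str
    confidence: float


def group_confident(entities, min_confidence=0.90):
    """Group entity texts by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity.confidence >= min_confidence:
            grouped[entity.label].append(entity.text)
    return dict(grouped)


# First two rows copied from the example output above; the third is an
# invented low-score entry, added only to show the threshold at work.
sample = [
    Entity("DISEASE", "chronic myeloid leukemia", 0.98),
    Entity("DRUG", "imatinib", 0.95),
    Entity("DRUG", "started", 0.41),
]
print(group_confident(sample))
# {'DISEASE': ['chronic myeloid leukemia'], 'DRUG': ['imatinib']}
```

In real use, the list passed in would be `result.entities` from `analyze_text`.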

Why OpenMed?

Feature	OpenMed	Cloud medical APIs
Runs on your device / servers	✅	❌
Patient data leaves your network	Never	Sent to the vendor
Cost	Free & open-source	Per-call pricing
Specialized medical models	1,000+	Limited
Languages	12+	Varies
Offline / air-gapped	✅	❌
Apple Silicon (MLX) acceleration	✅	n/a
Native iOS / macOS apps	✅ via OpenMedKit	❌
Vendor lock-in	None — Apache-2.0	Yes

Specialized models — 1,000+ curated biomedical & clinical models, many outperforming proprietary stacks.
HIPAA-aware de-identification — all 18 Safe Harbor identifiers, smart entity merging, format-preserving fakes.
Runs everywhere — CPU, CUDA, Apple Silicon (MLX), and natively in iOS/macOS apps via OpenMedKit.
One-line deployment — Python API, Dockerized REST service, or batch pipelines.
Zero lock-in — Apache-2.0, your infrastructure, your data.

On-device on Apple — Swift, MLX & iOS

OpenMed is built to run where your data already lives. On Apple hardware it accelerates with MLX, and it ships straight into iPhone, iPad, and Mac apps through OpenMedKit — so PII detection and clinical extraction happen fully offline, on the device.

// Add OpenMedKit to your app
dependencies: [
    .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.5.5"),
]

MLX runtime for PII token classification, the Privacy Filter family, and experimental GLiNER-family zero-shot tasks — with a CoreML fallback path.
One model name, every platform — MLX model names automatically fall back to the matching PyTorch checkpoint on non-Apple hardware.
Python on Apple Silicon too: pip install "openmed[mlx]".

Guides: MLX backend · OpenMedKit (Swift) · CoreML export

How it works

flowchart LR
    A["Clinical text"] --> B["OpenMed<br/>(100% on-device)"]
    B --> C["Medical entities"]
    B --> D["PII detected"]
    B --> E["De-identified text"]
    style B fill:#0D6E6E,stroke:#0A5656,stroke-width:2px,color:#ffffff
    style C fill:#D6EBEB,stroke:#0D6E6E,color:#0E1116
    style D fill:#F7DCD8,stroke:#C5453A,color:#0E1116
    style E fill:#F5E27A,stroke:#A9A088,color:#0E1116

Quick start

# Core + Hugging Face runtime (Linux, macOS, Windows; CPU or CUDA)
pip install "openmed[hf]"

# Add the REST service
pip install "openmed[hf,service]"

# Apple Silicon acceleration (MLX)
pip install "openmed[mlx]"

Python API

from openmed import analyze_text

# Detect drug mentions with the pharma model
analyze_text(
    "Patient received 75mg clopidogrel for NSTEMI.",
    model_name="pharma_detection_superclinical",
)

REST service

uvicorn openmed.service.app:app \
  --host 0.0.0.0 --port 8080

Endpoints: GET /health · POST /analyze · POST /pii/extract · POST /pii/deidentify

Batch

from openmed import BatchProcessor

# One processor, reused across many documents
p = BatchProcessor(
    model_name="disease_detection_superclinical",
    group_entities=True,
)
p.process_texts([...])
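
Batch processing generally means splitting the input into fixed-size groups so that each group goes through the model in one pass. The sketch below shows only that chunking step in plain Python. It is a generic illustration, not OpenMed's internal implementation; the `batched` helper is hypothetical, and the default of 16 mirrors the `batch_size=16` value shown in the PII section.

```python
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], batch_size: int = 16) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


# Placeholder documents, used only to show the resulting batch sizes.
notes = [f"note {i}" for i in range(40)]
sizes = [len(chunk) for chunk in batched(notes, batch_size=16)]
print(sizes)  # [16, 16, 8]
```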

Offline / air-gapped? Point model_name (or model_id) at a local directory and OpenMed loads it without contacting the Hugging Face Hub:

from openmed import OpenMedConfig, analyze_text

result = analyze_text(
    "Patient presents with chronic myeloid leukemia and Type 2 diabetes.",
    model_id="./models/OpenMed-NER-DiseaseDetect-SuperClinical-434M",
    config=OpenMedConfig(device="cpu"),
)
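
For air-gapped deployments it can help to fail fast when the local checkpoint is missing, before any model code runs. The sketch below is one way to do that. It assumes the directory follows the usual Hugging Face layout with a `config.json` file; `resolve_local_model` is a hypothetical helper, not an OpenMed function. `HF_HUB_OFFLINE` is the huggingface_hub environment variable that disables network access, and it should be set before that library is imported.

```python
import os
import tempfile
from pathlib import Path


def resolve_local_model(path: str) -> str:
    """Return the absolute model directory, or raise if it looks incomplete."""
    model_dir = Path(path).expanduser().resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json in {model_dir}")
    return str(model_dir)


# Ask huggingface_hub to refuse network access for this process.
os.environ["HF_HUB_OFFLINE"] = "1"

# Demo with a throwaway directory standing in for a downloaded checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    checked = resolve_local_model(tmp)
    print(checked)
```

The returned path could then be passed as `model_id=` in the call above.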

Models

A curated registry of specialized medical NER models — browse the full catalog.

Model	Specialization	Entity types	Size
`disease_detection_superclinical`	Disease & conditions	DISEASE, CONDITION, DIAGNOSIS	434M
`pharma_detection_superclinical`	Drugs & medications	DRUG, MEDICATION, TREATMENT	434M
`pii_superclinical_large`	PII & de-identification	NAME, DATE, SSN, PHONE, EMAIL, ADDRESS	434M
`anatomy_detection_electramed`	Anatomy & body parts	ANATOMY, ORGAN, BODY_PART	109M
`gene_detection_genecorpus`	Genes & proteins	GENE, PROTEIN	109M
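
When choosing a model programmatically, the table above can be treated as a simple lookup from model name to entity types. The sketch below hard-codes the five rows shown here. `REGISTRY` and `models_for` are hypothetical names for illustration, not OpenMed's own registry API.

```python
# Model name -> entity types, copied from the table above.
REGISTRY = {
    "disease_detection_superclinical": {"DISEASE", "CONDITION", "DIAGNOSIS"},
    "pharma_detection_superclinical": {"DRUG", "MEDICATION", "TREATMENT"},
    "pii_superclinical_large": {"NAME", "DATE", "SSN", "PHONE", "EMAIL", "ADDRESS"},
    "anatomy_detection_electramed": {"ANATOMY", "ORGAN", "BODY_PART"},
    "gene_detection_genecorpus": {"GENE", "PROTEIN"},
}


def models_for(entity_type: str) -> list:
    """Return the listed model names that emit the given entity type."""
    wanted = entity_type.upper()
    return sorted(name for name, types in REGISTRY.items() if wanted in types)


print(models_for("drug"))  # ['pharma_detection_superclinical']
print(models_for("gene"))  # ['gene_detection_genecorpus']
```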

Privacy: PII detection & de-identification

from openmed import extract_pii, deidentify

text = "Patient: John Doe, DOB: 01/15/1970, SSN: 123-45-6789"

# Extract PII with smart merging (prevents tokenization fragmentation)
result = extract_pii(text, model_name="pii_superclinical_large", use_smart_merging=True)

# De-identify with the method you need
deidentify(text, method="mask")     # [NAME], [DATE]
deidentify(text, method="replace")  # Faker-backed, locale-aware, format-preserving fakes
deidentify(text, method="hash")     # Cryptographic hashing
deidentify(text, method="shift_dates", date_shift_days=180)

Smart entity merging keeps 01/15/1970 whole instead of fragmenting it.
Faker-backed obfuscation with custom clinical-ID providers (CPF, CNPJ, BSN, NIR, Codice Fiscale, NIE, Aadhaar, Steuer-ID, NPI).
HIPAA: all 18 Safe Harbor identifiers, configurable confidence thresholds.
Batch PII (v1.5.5): extract or de-identify across many documents with BatchProcessor(operation=..., batch_size=16), where operation is "extract_pii" or "deidentify".
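
To make smart merging and the mask, hash, and shift_dates methods concrete, here is a conceptual sketch in plain Python. It is not OpenMed's implementation. The `(start, end, label)` span format, all four helper names, the use of SHA-256, and the 12-character truncation are assumptions chosen for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def merge_adjacent(spans):
    """Merge same-label (start, end, label) spans that touch or overlap."""
    merged = []
    for start, end, label in sorted(spans):
        if merged and merged[-1][2] == label and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


def mask(text, spans):
    """Replace each span with [LABEL], right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def shift_date(value, days, fmt="%m/%d/%Y"):
    """Move a date string forward by a fixed number of days."""
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


def hash_value(value, salt=""):
    """Return a short, stable SHA-256 digest of the value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


text = "DOB: 01/15/1970"
# A tokenizer may split the date into "01", "/", "15", "/", "1970".
fragments = [(5, 7, "DATE"), (7, 8, "DATE"), (8, 10, "DATE"),
             (10, 11, "DATE"), (11, 15, "DATE")]

merged = merge_adjacent(fragments)
print(merged)                            # [(5, 15, 'DATE')]
print(mask(text, merged))                # DOB: [DATE]
print(shift_date("01/15/1970", 180))     # 07/14/1970
print(hash_value("123-45-6789"))
```

Without the merge step, masking the five fragments separately would produce five placeholders for a single date.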

Complete PII notebook · Smart merging · Anonymization

Privacy Filter family — three model families on the OpenAI Privacy Filter architecture

Same model code (gpt-oss-style sparse-MoE transformer with local attention, sink tokens, RoPE+YaRN, tiktoken o200k_base), different training data. All route through the same extract_pii() / deidentify() API — only model_name= changes.

Variant	PyTorch (CPU + CUDA)	MLX (Apple Silicon)	MLX 8-bit
OpenAI Privacy Filter	`openai/privacy-filter`	`OpenMed/privacy-filter-mlx`	`…-mlx-8bit`
Nemotron-PII fine-tune	`OpenMed/privacy-filter-nemotron`	`…-nemotron-mlx`	`…-nemotron-mlx-8bit`
OpenMed Multilingual	`OpenMed/privacy-filter-multilingual`	`…-multilingual-mlx`	`…-multilingual-mlx-8bit`

from openmed import extract_pii

text = "Patient Sarah Connor (DOB: 03/15/1985) at MRN 4471882."

extract_pii(text, model_name="openai/privacy-filter")              # PyTorch baseline
extract_pii(text, model_name="OpenMed/privacy-filter-nemotron")    # same code, different weights
extract_pii(text, model_name="OpenMed/privacy-filter-mlx")         # Apple Silicon (MLX)

On non-Apple-Silicon hosts, MLX model names are automatically substituted with the matching PyTorch checkpoint (with a one-time warning) — ship one model name, run anywhere. See Privacy Filter architecture & backend routing.
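
The substitution described above could be sketched roughly as follows. This is an illustration of the idea under stated assumptions, not OpenMed's routing code. `MLX_TO_PYTORCH`, `is_apple_silicon`, and `resolve_model_name` are hypothetical names, and the mapping contains only the one pair that the table spells out in full.

```python
import platform
import warnings

# Only the fully spelled-out pair from the table above.
MLX_TO_PYTORCH = {"OpenMed/privacy-filter-mlx": "openai/privacy-filter"}

_warned = set()  # model names that have already triggered a warning


def is_apple_silicon(system=None, machine=None):
    """True on macOS running natively on an arm64 CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def resolve_model_name(name, apple_silicon=None):
    """Return name unchanged on Apple Silicon, else its PyTorch counterpart."""
    if apple_silicon is None:
        apple_silicon = is_apple_silicon()
    if apple_silicon or name not in MLX_TO_PYTORCH:
        return name
    fallback = MLX_TO_PYTORCH[name]
    if name not in _warned:  # warn once per model name
        warnings.warn(f"MLX unavailable; using {fallback!r} instead of {name!r}")
        _warned.add(name)
    return fallback


print(resolve_model_name("OpenMed/privacy-filter-mlx", apple_silicon=False))
# openai/privacy-filter
```

Passing `apple_silicon` explicitly makes the routing testable on any host; leaving it as `None` uses the detected platform.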

Multilingual PII (12 languages)

Extraction and de-identification across en, fr, de, it, es, nl, hi, te, pt, ar, ja, and tr — 247 PII checkpoints total.

python -c "from openmed import extract_pii; print([(e.label, e.text) for e in extract_pii('Dr. Pedro Almeida, CPF: 123.456.789-09, email: pedro@hospital.pt', lang='pt').entities])"

Per-language examples (Portuguese, Dutch, Hindi, Arabic, Japanese, Turkish):

from openmed import extract_pii

portuguese = extract_pii("Paciente: Pedro Almeida, CPF: 123.456.789-09, telefone: +351 912 345 678", lang="pt", use_smart_merging=True)
dutch      = extract_pii("Patiënt: Eva de Vries, BSN: 123456782, telefoon: +31 6 12345678", lang="nl", use_smart_merging=True)
hindi      = extract_pii("रोगी: अनीता शर्मा, फोन: +91 9876543210, पता: नई दिल्ली 110001", lang="hi", use_smart_merging=True)
arabic     = extract_pii("المريضة ليلى حسن، الهاتف +20 10 1234 5678، الرقم القومي 29801011234567.", lang="ar", use_smart_merging=True)
japanese   = extract_pii("患者 佐藤 花子、電話 +81 90 1234 5678、マイナンバー 1234 5678 9012.", lang="ja", use_smart_merging=True)
turkish    = extract_pii("Hasta Ayşe Yılmaz, telefon +90 532 123 45 67, TCKN 10000000146.", lang="tr", use_smart_merging=True)

for r in (portuguese, dutch, hindi, arabic, japanese, turkish):
    print([(e.label, e.text) for e in r.entities])

REST API

A Docker-friendly FastAPI service with request validation, shared pipeline preload, and unified error envelopes.

pip install "openmed[hf,service]"
uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080

# or with Docker
docker build -t openmed:1.5.5 .
docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.5.5

curl -X POST http://127.0.0.1:8080/pii/extract \
  -H "Content-Type: application/json" \
  -d '{"text":"Paciente: Maria Garcia, DNI: 12345678Z","lang":"es"}'
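
The same request can be built from Python with only the standard library. The sketch below constructs the POST shown in the curl example and leaves the actual send commented out, since it needs a running service. `build_pii_extract_request` is a hypothetical helper, and the response is assumed to be JSON; its schema is not shown here.

```python
import json
import urllib.request


def build_pii_extract_request(text, lang, base_url="http://127.0.0.1:8080"):
    """Build a POST /pii/extract request with a JSON body."""
    payload = json.dumps({"text": text, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/pii/extract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pii_extract_request("Paciente: Maria Garcia, DNI: 12345678Z", "es")
print(request.get_method(), request.full_url)

# With the service running, send it and decode the body (assumed JSON):
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.loads(response.read())
```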

Model lifecycle (v1.5.5): free memory on demand with GET /models/loaded, POST /models/unload, and a keep_alive idle window:

OPENMED_SERVICE_KEEP_ALIVE=10m uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080
curl -X POST http://127.0.0.1:8080/models/unload -H "Content-Type: application/json" -d '{"all":true}'
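
An idle window of this kind can be pictured as a cache that records when each model was last used and drops entries that have been idle too long. The class below is a hypothetical sketch of that idea, not the service's actual implementation. It takes the window in seconds (600 corresponds to the 10m value above) and accepts an injectable clock so the behavior can be checked without waiting.

```python
import time


class IdleModelCache:
    """Hypothetical cache that drops entries unused for keep_alive seconds."""

    def __init__(self, keep_alive_seconds, clock=time.monotonic):
        self.keep_alive = keep_alive_seconds
        self.clock = clock
        self._entries = {}  # name -> (model, last_used)

    def get(self, name, loader):
        """Return a cached model, loading it on a miss, and refresh its timer."""
        self.evict_idle()
        if name in self._entries:
            model, _ = self._entries[name]
        else:
            model = loader(name)
        self._entries[name] = (model, self.clock())
        return model

    def evict_idle(self):
        """Drop entries idle for longer than the window; return their names."""
        now = self.clock()
        stale = [name for name, (_, used) in self._entries.items()
                 if now - used > self.keep_alive]
        for name in stale:
            del self._entries[name]
        return stale

    def loaded(self):
        """Names currently held in memory."""
        return sorted(self._entries)

    def unload_all(self):
        """Drop everything immediately."""
        self._entries.clear()


now = [0.0]  # fake clock, advanced by hand
cache = IdleModelCache(keep_alive_seconds=600, clock=lambda: now[0])
cache.get("pii_superclinical_large", loader=lambda name: object())
now[0] = 601.0
evicted = cache.evict_idle()
print(evicted, cache.loaded())  # ['pii_superclinical_large'] []
```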

See the full REST service guide.

Documentation

Full guides at openmed.life/docs.


Getting Started	Analyze Text	Model Registry
PII Detection Guide	Anonymization	Batch Processing
Configuration Profiles	REST Service	MLX Backend

Meet the mascot

OpenMed's guardian is a fluffy Persian cat styled as a tiny Avicenna (Ibn Sina) — the great Persian physician whose Canon of Medicine was the world's standard medical text for some 600 years. He keeps watch over the open book of medical knowledge, in a palette built around Persian turquoise (fīrūza): a local-first guardian for your most private data.

Contributing

Contributions welcome — bug reports, feature requests, and PRs alike.

Open an issue
Translations welcome — help complete the other-language READMEs linked in the switcher at the top.

Credits

OpenMed builds on excellent open-source work — particular thanks to OpenAI (the Privacy Filter architecture), NVIDIA (the Nemotron PII dataset), Hugging Face (transformers & the model ecosystem), Apple (MLX), and the Faker maintainers.

License

Released under the Apache-2.0 License.

Citation

@misc{panahi2025openmedneropensourcedomainadapted,
      title={OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets},
      author={Maziyar Panahi},
      year={2025},
      eprint={2508.01630},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.01630},
}

Star History

If OpenMed is useful to you, a star helps others discover it.

Built by the OpenMed team

Website · Docs · X / Twitter · LinkedIn