AI Agents for Healthcare: How to Automate Fraud Detection (Single-Agent with LlamaIndex)
Healthcare fraud teams spend too much time triaging claims that should have been flagged automatically: duplicate billing, upcoding, phantom services, and suspicious provider patterns. A single-agent setup with LlamaIndex gives you a controlled way to ingest claims, policy docs, and historical case notes, then route suspicious records to investigators with evidence attached.
The point is not to replace SIU analysts or compliance staff. It is to reduce manual review volume, shorten detection time, and make every alert traceable back to source data.
The Business Case
- **Reduce manual claim review by 30-50%**
  - In a mid-size payer or provider network, fraud analysts often spend 10-20 minutes per suspicious claim gathering context from EHR exports, billing notes, prior adjudication history, and policy PDFs.
  - A single-agent workflow can cut that to 3-5 minutes by prefetching evidence and summarizing why the claim looks abnormal.
- **Lower avoidable fraud leakage by 5-12% in the pilot population**
  - If your organization is losing $2M-$10M annually to preventable overpayments or abusive billing patterns, even a narrow pilot on high-risk CPT/HCPCS codes can recover $100K-$500K in the first quarter.
  - Focus on high-frequency areas like durable medical equipment, behavioral health, telehealth, and outpatient lab claims.
- **Improve alert precision from ~40-60% to ~70-85%**
  - Most rule-based systems generate noisy queues that burn analyst time.
  - With retrieval over claims history and policy documents, the agent can attach context that separates genuine anomalies from legitimate edge cases like complex oncology care or chronic condition management.
- **Cut investigation turnaround from days to hours**
  - A good target is same-day triage for flagged claims instead of a 2-5 day queue.
  - That matters when you need timely recoupment decisions, provider outreach, or pre-payment suspension under internal control policies.
Architecture
A single-agent design is enough for a first production pilot. Keep it simple: one orchestrator agent, deterministic tools, and strict retrieval boundaries.
- **Ingestion layer**
  - Pull claims data from your core admin system, EHR extracts, SIU case management system, and policy repositories.
  - Normalize ICD-10-CM, CPT, HCPCS, NPI, DRG, place-of-service codes, denial reasons, and prior authorization metadata.
  - Use ETL jobs plus document parsing for PDFs and scanned medical records.
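Normalization is also where identifier sanity checks belong. An NPI's last digit is a Luhn check digit computed over the nine-digit base prefixed with 80840, so malformed provider IDs can be rejected before they poison downstream joins. A minimal stdlib sketch (function names are my own):

```python
def npi_check_digit(base9: str) -> int:
    """Luhn check digit for an NPI: applied to '80840' + the 9-digit base."""
    digits = [int(d) for d in "80840" + base9]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 0:  # double every second digit, starting at the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

def is_valid_npi(npi: str) -> bool:
    """Reject anything that is not 10 digits with a correct check digit."""
    return (
        len(npi) == 10
        and npi.isdigit()
        and int(npi[-1]) == npi_check_digit(npi[:9])
    )
```

The commonly cited test NPI 1234567893 passes this check; a single transposed digit fails it, which is exactly the kind of data-entry error that otherwise produces phantom providers in your joins.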
- **Retrieval layer with LlamaIndex + pgvector**
  - Store claim summaries, provider profiles, policy excerpts, and prior fraud cases in pgvector.
  - Use LlamaIndex as the retrieval engine for semantic search across unstructured evidence.
  - Add metadata filters for payer line of business, state Medicaid plan rules, date range, specialty type, and member eligibility status.
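Whatever retrieval stack you use, the metadata-filter logic amounts to a deterministic pre-filter applied before semantic ranking. A stdlib sketch of that idea, with an assumed metadata schema (`line_of_business`, `state`, and `service_date` are illustrative field names, not a LlamaIndex API):

```python
from datetime import date

def matches(meta: dict, *, lob: str, state: str,
            start: date, end: date) -> bool:
    """Keep only evidence documents inside the approved slice
    (line of business, state, date range) before vector ranking."""
    return (
        meta.get("line_of_business") == lob
        and meta.get("state") == state
        and start <= meta["service_date"] <= end
    )

docs = [
    {"id": "c1", "line_of_business": "medicaid", "state": "OH",
     "service_date": date(2024, 3, 5)},
    {"id": "c2", "line_of_business": "commercial", "state": "OH",
     "service_date": date(2024, 3, 9)},
]
candidates = [d for d in docs
              if matches(d, lob="medicaid", state="OH",
                        start=date(2024, 1, 1), end=date(2024, 6, 30))]
# only c1 survives the pre-filter
```

Hard-filtering before similarity search is what keeps the agent from citing evidence outside the member's plan or date range, which matters for both accuracy and minimum-necessary access.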
- **Single-agent decision layer**
  - Use one LLM-backed agent to classify risk, explain the signal, and recommend the next action: auto-clear, route to analyst, or escalate to compliance.
  - LangChain works well for tool calling; LangGraph is useful if you want a controlled state machine for review steps without turning this into a multi-agent system.
  - Keep the agent on a short leash: retrieval only from approved sources, and no free-form web access.
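The routing step itself should be deterministic once the model emits a score; do not let the LLM free-text its own disposition. A sketch of a threshold router, with illustrative thresholds you would tune on labeled pilot data:

```python
def recommend_action(risk: float, *, clear_below: float = 0.2,
                     escalate_at: float = 0.8) -> str:
    """Map a model risk score in [0, 1] to one of three dispositions.
    Thresholds are illustrative; calibrate them on labeled pilot claims."""
    if risk < clear_below:
        return "auto-clear"
    if risk >= escalate_at:
        return "escalate-to-compliance"
    return "route-to-analyst"
```

Keeping the thresholds in code rather than in the prompt makes the policy auditable and lets compliance review a two-number change instead of a prompt diff.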
- **Audit and governance layer**
  - Log every prompt, retrieved document ID, output score, and analyst override into an immutable audit store.
  - Align controls with HIPAA minimum-necessary access requirements and SOC 2 logging expectations.
  - If you process EU patient data or cross-border member records, add GDPR retention controls and data subject handling workflows.
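One lightweight way to make an audit store tamper-evident is hash chaining: each entry embeds the hash of the previous one, so any later edit breaks verification of everything downstream. A stdlib sketch (the class and field names are assumptions, not a specific product's API):

```python
import hashlib
import json

class AuditLog:
    """Append-only log where each entry chains to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": h})

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(
                    (prev + payload).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True
```

In production you would back this with write-once storage; the chain is a cheap integrity check on top, not a substitute for access control.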
Example flow
- Claim arrives with diagnosis/procedure combinations that look inconsistent.
- The agent retrieves similar historical claims, provider specialty data, prior denials/appeals, and applicable billing rules.
- It scores risk based on patterns like duplicate services within a short window or impossible service-location combinations.
- It returns an explanation with citations so the analyst can verify quickly.
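The duplicate-services check in that flow can be expressed as a sliding-window scan over claim lines. A stdlib sketch with an assumed claim schema (`member`, `npi`, `cpt`, and `date` are illustrative field names):

```python
from collections import defaultdict
from datetime import date

def duplicate_service_flags(claims: list, window_days: int = 7) -> list:
    """Flag claims where the same member/provider/CPT combination
    repeats within `window_days` of a prior service date."""
    seen = defaultdict(list)  # (member, npi, cpt) -> prior service dates
    flagged = []
    for c in sorted(claims, key=lambda c: c["date"]):
        key = (c["member"], c["npi"], c["cpt"])
        if any((c["date"] - d).days <= window_days for d in seen[key]):
            flagged.append(c["claim_id"])
        seen[key].append(c["date"])
    return flagged

claims = [
    {"claim_id": "A", "member": "m1", "npi": "n1",
     "cpt": "80053", "date": date(2024, 1, 1)},
    {"claim_id": "B", "member": "m1", "npi": "n1",
     "cpt": "80053", "date": date(2024, 1, 4)},   # 3 days later: flagged
    {"claim_id": "C", "member": "m1", "npi": "n1",
     "cpt": "80053", "date": date(2024, 2, 1)},   # outside the window
]
```

A rule like this produces the "why" half of the alert: the agent's job is to attach the retrieved context (prior claims, policy language) that tells the analyst whether the repeat is abusive or clinically legitimate.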
What Can Go Wrong
| Risk | Why it matters in healthcare | Mitigation |
|---|---|---|
| Regulatory exposure | The agent may surface PHI in logs or over-share member details in summaries. That creates HIPAA risk immediately. | Enforce role-based access control, redact PHI in prompts where possible, encrypt at rest/in transit, and maintain audit trails. Run privacy reviews before production. |
| Reputation damage | False positives against legitimate providers can trigger complaints or contract disputes. In healthcare billing this gets political fast. | Start with human-in-the-loop review for all adverse actions. Tune thresholds on a narrow set of high-confidence fraud patterns before expanding. |
| Operational failure | Bad ingestion quality leads to missed signals: wrong NPI mapping, incomplete claims history, stale policy versions. | Build data validation checks upstream. Version policy documents and retrain retrieval indexes weekly or daily depending on claim volume. |
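As one concrete backstop for the PHI-in-prompts risk above, obvious identifier patterns can be redacted before any text reaches the model or the logs. This regex sketch is illustrative only (the member-ID format is an assumption); a production system needs a vetted de-identification service, not a pattern list:

```python
import re

# Illustrative patterns only. The SSN pattern is the standard ddd-dd-dddd
# shape; the member-ID format is an assumed example, not a real scheme.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bMBR-\d{6,}\b"), "[MEMBER_ID]"),
]

def redact(text: str) -> str:
    """Replace recognizable identifier patterns with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Run redaction at the boundary where prompts are assembled, so the same sanitized text is what gets logged to the audit store.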
A practical note: do not treat this like Basel III-style capital modeling where you can tolerate slower governance cycles after launch. Healthcare fraud detection touches payment integrity and patient trust at the same time; your controls need to be live before scale-up.
Getting Started
- **Pick one narrow use case**
  - Start with one line of business: Medicare Advantage outpatient claims or commercial DME billing are good candidates.
  - Choose one fraud pattern only: duplicate billing is easier than generalized anomaly detection.
  - Timeline: 2 weeks for scoping with a small team of 1 product owner, 1 data engineer, 1 ML engineer/agent developer, and 1 compliance lead.
- **Build the evidence corpus**
  - Collect six months of claims history plus denial reasons, provider master data, coding policies (CMS manuals if applicable), SIU case notes where allowed by policy, and relevant contract language.
  - Clean identifiers carefully: member ID hashing does not excuse bad joins.
  - Timeline: 2-4 weeks.
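On the hashing point: a keyed, deterministic hash (HMAC) keeps joins intact across extracts, but only if identifiers are normalized first; otherwise the "same" member becomes two different tokens, which is exactly the bad-join failure mode. A stdlib sketch:

```python
import hashlib
import hmac

def pseudonymize(member_id: str, key: bytes) -> str:
    """Keyed, deterministic pseudonym: the same member always maps to the
    same token across extracts, so joins survive pseudonymization.
    Normalizing (strip/upper) BEFORE hashing is what protects the joins.
    Keep the key in a secrets manager, never alongside the data."""
    canonical = member_id.strip().upper()
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()
```

Plain unsalted hashing of low-entropy IDs is reversible by brute force, which is why a secret key matters here, not just determinism.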
- **Pilot the single-agent workflow**
  - Implement retrieval in LlamaIndex with pgvector, then wrap it with LangChain tool calls for structured lookups.
  - Keep outputs constrained:
    - risk score
    - top three reasons
    - cited source IDs
    - recommended action
  - Run it side-by-side with analysts for at least one month on a few thousand claims.
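Constrained outputs are only useful if you reject responses that break the contract. A sketch of a validation gate over those four fields (the class name, field names, and bounds are assumptions):

```python
from dataclasses import dataclass

ALLOWED_ACTIONS = {"auto-clear", "route-to-analyst", "escalate-to-compliance"}

@dataclass
class AgentOutput:
    risk_score: float   # expected in [0, 1]
    reasons: list       # 1-3 short explanations
    source_ids: list    # cited evidence document IDs
    action: str         # one of ALLOWED_ACTIONS

def validate(out: AgentOutput) -> AgentOutput:
    """Reject any agent response that falls outside the output contract."""
    if not 0.0 <= out.risk_score <= 1.0:
        raise ValueError("risk_score out of range")
    if not 1 <= len(out.reasons) <= 3:
        raise ValueError("need 1-3 reasons")
    if not out.source_ids:
        raise ValueError("every flag must cite at least one source")
    if out.action not in ALLOWED_ACTIONS:
        raise ValueError("unknown action")
    return out
```

Failing closed here (raise, re-prompt, or route to a human) is what keeps a malformed model response from silently becoming an adverse action.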
- **Measure hard outcomes before expanding**
  - Track precision/recall on flagged claims, average analyst handling time, overpayment dollars identified, false positive rate by specialty, and percentage of cases resolved without escalation.
  - If the pilot does not save at least one full-time analyst equivalent per month or improve recovery rates materially within 60-90 days, stop and fix the data model before scaling.
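Precision and recall on flagged claims reduce to set arithmetic once analysts label outcomes. A stdlib sketch:

```python
def flag_metrics(flagged: list, confirmed: list) -> dict:
    """Precision/recall over claim IDs.
    flagged   = claim IDs the agent raised
    confirmed = claim IDs analysts confirmed as fraud/abuse"""
    flagged_set, confirmed_set = set(flagged), set(confirmed)
    true_pos = len(flagged_set & confirmed_set)
    precision = true_pos / len(flagged_set) if flagged_set else 0.0
    recall = true_pos / len(confirmed_set) if confirmed_set else 0.0
    return {"precision": round(precision, 3), "recall": round(recall, 3)}
```

Compute these per specialty and per fraud pattern, not just in aggregate; a queue that is precise overall can still be hammering one legitimate provider group.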
For most healthcare organizations I work with, the right first deployment is small: one agent, one queue, one clear fraud pattern, and strict governance from day one. That gets you real operational value without creating another opaque model that compliance has to clean up later.
Keep learning
- The complete AI Agents Roadmap — my full 8-step breakdown
- Free: The AI Agent Starter Kit — PDF checklist + starter code
- Work with me — I build AI for banks and insurance companies
By Cyprian Aarons, AI Consultant at Topiax.
Want the complete 8-step roadmap?
Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.
Get the Starter Kit