By Raman Ramachandran — 15 May 2026

Building a Resilient AI Credentialing Assistant

How I navigated model deprecations, encoding hurdles, and hybrid RAG architectures to build a production-ready healthcare prototype.

In the complex world of healthcare, verifying provider credentials shouldn't be a manual bottleneck. I set out to build an AI assistant that could:

Extract data from medical licenses using multimodal LLMs.
Verify NPI data against federal registries in real-time.
Analyze risk using a custom RAG (Retrieval-Augmented Generation) pipeline fueled by state medical board disciplinary records.

The War Room Lessons

The path to a functional demo on Hugging Face was filled with real-world engineering challenges that separate a "local script" from a "deployed product."

1. The Moving Target of Model IDs

I started with Gemini 2.0, but as the API evolved, I encountered errors.

The Learning: Model aliases like gemini-flash-latest are convenient but can be unstable in production. By pivoting to Gemini 3.1 Flash-Lite, I leveraged the latest sub-second latency models optimized for high-volume data extraction.

2. Data Ingestion & The "Invisible" UTF-8 Bug

A CSV isn't always just a CSV. My RAG pipeline initially crashed because of encoding mismatches from state board exports.

The Learning: Always implement defensive ingestion. Switching to encoding='utf-8-sig' in Pandas allowed the system to handle Excel’s Byte Order Marks (BOM) without crashing the pipeline.

3. Hybrid RAG: Local Embeddings vs. Cloud APIs

When the cloud embedding API hit regional restrictions, I pivoted to a hybrid architecture.

The Learning: I integrated langchain-huggingface to run Sentence-Transformer embeddings locally on the server. This decoupled our semantic search from the Google API, ensuring the "Risk Intelligence" layer remained operational regardless of external API stability.

Technical Stack

Frontend: Streamlit
LLM: Gemini 3.1 Flash-Lite (via Google GenAI SDK)
Vector DB: ChromaDB
Embeddings: all-MiniLM-L6-v2 (Local Transformers)
Deployment: Hugging Face Spaces