Over 250 million Indians engage in code-switched communication every single day. That is not a niche behavior. It is the default. Yet standard monolingual ASR models still suffer roughly 42% word error rates when processing code-switched speech. If your voice AI cannot handle a customer saying "Mujhe loan ki details chahiye, can you email the statement?" without breaking, you are not serving more than half your potential callers.
I have spent years at OnDial working with businesses that wanted "Hindi support" for their voice AI, only to discover that what their customers actually speak is neither Hindi nor English. It is both, blended fluidly, sometimes within the same sentence. The frustration is real. Vendors check the "Hindi" box, demos sound impressive, and then the system collapses the moment a real person from Lucknow or Pune picks up the phone.
Here is what this article covers: what code-switching really means in the context of voice AI, why it breaks most systems, how the technology pipeline actually works when it is done right, and exactly what questions to ask before signing a vendor contract.
What Code-Switching Actually Means (and Why Voice AI Struggles With It)
Code-switching is the practice of alternating between two or more languages during a single conversation, often within a single sentence. In India, this is not slang. It is not laziness. It is how hundreds of millions of bilingual speakers naturally communicate.
Inter-sentential vs. Intra-sentential Switching
There are two main types, and the distinction matters for AI performance.
Inter-sentential switching happens between sentences. A customer might say one sentence in Hindi and the next in English. This is the easier type for voice AI to handle because each sentence stays in one language. The system gets a clean boundary to detect the shift.
Intra-sentential switching is where things get difficult. This is mid-sentence mixing: "Mujhe flight book karni hai" (I need to book a flight), where Hindi grammar wraps around the English words "flight" and "book." There is no clean boundary. No pause. No signal. The speaker's brain handles this effortlessly. Most ASR models do not.
(Here is the part that surprises people: intra-sentential switching is actually the more common type in Indian business conversations.)
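To make the two types concrete, here is a minimal sketch that labels an utterance by where the language change happens. It assumes a hypothetical toy lexicon of romanized Hindi words (HINDI_WORDS) and treats every other word as English. A production system would use trained token-level language identification, not a word list.

```python
import re

# Hypothetical toy lexicon of romanized Hindi words; illustration only.
HINDI_WORDS = {"mujhe", "karni", "hai", "ki", "chahiye", "kab", "milega"}

def sentence_languages(sentence):
    """Return the set of languages ('hi', 'en') seen in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {"hi" if tok in HINDI_WORDS else "en" for tok in tokens}

def switch_type(utterance):
    """Label an utterance as monolingual, inter-sentential, or intra-sentential."""
    sentences = [s for s in re.split(r"[.?!]+", utterance) if s.strip()]
    per_sentence = [sentence_languages(s) for s in sentences]
    # Any single sentence containing both languages is a mid-sentence switch.
    if any(len(langs) > 1 for langs in per_sentence):
        return "intra-sentential"
    # Otherwise, different languages across sentences is a between-sentence switch.
    if len(set().union(*per_sentence)) > 1:
        return "inter-sentential"
    return "monolingual"

print(switch_type("Mujhe flight book karni hai"))         # intra-sentential
print(switch_type("Mujhe chahiye. Send the statement."))  # inter-sentential
```

The inter-sentential case is easy because the sentence break hands the system a boundary. The intra-sentential case only shows up when each word is inspected on its own, which is exactly the work a code-switching ASR has to do on audio, without spaces or punctuation to help.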
Why This Matters for Indian Businesses
A consumer finance brand analyzed its actual call recordings and found that Hindi-English code-switching covered 31% of its customer base. Hindi-only covered just 22%. English-only was a mere 6%. A Hindi-English deployment, if it truly handles code-switching, covers 53% of callers: the 31% who code-switch plus the 22% who speak only Hindi. A Hindi-only deployment? It reaches just that 22%, leaving the majority unserved.
If your contact center or sales operation touches customers in urban or semi-urban India, code-switching is not a feature request. It is a baseline requirement.
The Three-Layer Problem: How AI Voice Agents Process Code-Switching

Most people think of voice AI as a single technology. It is not. Code-switching support in an AI voice agent depends on three distinct layers working together. A failure in any one of them breaks the entire conversation.
ASR: Hearing Two Languages at Once
The ASR (Automatic Speech Recognition) layer converts spoken words into text. For code-switched speech, the ASR must detect language transitions in real time, sometimes mid-word. It needs to handle mixed-language phonetics: Hindi phonological patterns with English loanwords blended in.
Monolingual ASR models, even excellent ones, collapse here. They are trained to expect one language at a time. When a speaker says "payment schedule bhej do by email," the model might transcribe "payment" and "email" accurately but garble the Hindi verbs connecting them.
The industry is moving toward single models trained on multilingual and code-switched data rather than routing calls to separate language-specific models. This avoids the latency and error of language detection as a separate step.
NLU: Understanding Intent Across Languages
Even if ASR transcribes code-switched speech correctly, the NLU (Natural Language Understanding) layer must extract the right intent. When a customer says "Mujhe refund kab milega?" (When will I get my refund?), the key entity "refund" is in English while the question structure is Hindi. The NLU must treat this as one unified request, not two language fragments.
I have personally seen NLU systems split a single Hinglish sentence into two separate intent classifications, producing nonsensical responses. This is what happens when the language model was trained on clean, monolingual data and then deployed on real Indian conversations.
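The failure mode is easy to reproduce in miniature. The sketch below uses hypothetical keyword tables (ENTITIES and QUESTION_WORDS) in place of a trained NLU model, purely to show why the utterance has to be read as a whole.

```python
import re

# Hypothetical keyword tables for illustration; a real NLU layer would use a trained model.
ENTITIES = {"refund": "refund", "emi": "emi", "statement": "statement"}
QUESTION_WORDS = {"kab": "ask_when", "kitni": "ask_amount", "when": "ask_when"}

def parse_intent(utterance):
    """Extract (question_type, entity) from the full utterance, ignoring language boundaries."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    question = next((QUESTION_WORDS[t] for t in tokens if t in QUESTION_WORDS), None)
    entity = next((ENTITIES[t] for t in tokens if t in ENTITIES), None)
    return question, entity

# Whole utterance: the Hindi question word and the English entity are both found.
print(parse_intent("Mujhe refund kab milega?"))  # ('ask_when', 'refund')

# Split into language fragments: neither fragment carries the full request.
print(parse_intent("refund"))                    # (None, 'refund')
print(parse_intent("Mujhe kab milega"))          # ('ask_when', None)
```

Each fragment on its own is ambiguous. Only the combined parse yields "when will the refund arrive," which is why splitting by language before intent classification produces nonsensical responses.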
TTS: Speaking Back in the Same Mix
This is the layer most vendors ignore. Your voice agent does not just need to understand code-switching. It needs to produce it.
A customer speaks in Hinglish. The agent responds in formal English. That mismatch signals to the caller that they are talking to a machine. The TTS must generate "Sir, aapka EMI due hai on the 15th" as a single, natural utterance, not English-voice stitched to Hindi-voice. Research published in Frontiers in Computer Science confirmed that synthesis method significantly affects how bilingual listeners perceive and comprehend code-switched TTS output.
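Before audio quality even comes into it, the text handed to the TTS has to be in the right register. Here is a rough way to check that, assuming a hypothetical toy lexicon of romanized Hindi words (HINDI_WORDS) as a stand-in for real language identification.

```python
import re

# Hypothetical toy lexicon of romanized Hindi words; illustration only.
HINDI_WORDS = {"aapka", "hai", "mujhe", "ki", "chahiye", "kab", "milega"}

def register(text):
    """Classify text as 'hindi', 'english', or 'hinglish' by counting lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return "empty"
    hindi = sum(tok in HINDI_WORDS for tok in tokens)
    if hindi == 0:
        return "english"
    if hindi == len(tokens):
        return "hindi"
    return "hinglish"

def register_matches(caller_text, agent_text):
    """True when the agent's reply is in the same register as the caller's turn."""
    return register(caller_text) == register(agent_text)

caller = "Mujhe loan ki details chahiye"
print(register_matches(caller, "Sir, aapka EMI due hai on the 15th"))  # True
print(register_matches(caller, "Your EMI is due on the 15th"))         # False
```

A check like this says nothing about whether the synthesized voice sounds natural. It only catches the cruder failure where a Hinglish caller gets a formal English reply.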
Where Most Voice AI Systems Break Down
The "Please Choose a Language" Failure
Have you ever called a helpline and heard "Press 1 for English, 2 for Hindi"? That is the old IVR model. But some AI voice agents do something worse. They encounter Hinglish, fail to parse it, and ask the customer to "please choose one language." This is a forced restart. It breaks the conversation flow and tells the customer the system does not understand how they naturally speak.
Production-grade voice AI handles mid-utterance code-switching without any restart, with full intent capture, and responds in the same mixed register the customer used.
Domain Vocabulary in Mixed-Language Contexts
General-purpose multilingual models still struggle with domain-specific vocabulary. A voice agent handling loan collections needs to understand "EMI," "overdue," "principal," and "moratorium" as they appear in Hinglish, not just in clean English.
In projects I have worked on at OnDial, we have found that fine-tuning on domain-specific, code-switched data is what separates voice AI that works in production from demos that work in controlled settings. The AI4Bharat IndicVoices dataset, with 23,700 hours of speech from 51,000 speakers across 22 languages, represents a major step forward, but domain-specific data remains the critical bottleneck.
How to Evaluate Code-Switching in a Voice AI Vendor
This is the section I wish someone had written for me three years ago. If you are evaluating multilingual voice agents for India, do not trust demo scripts. Run these tests instead.
The Five Tests That Reveal the Truth
1. The financial code-switch test. Say: "Loan ki EMI kitni hogi? Can you send the repayment schedule by email?" The system must handle the Hindi financial query, the English channel preference, and the implicit output format instruction as a single request.
2. The stress switch test. Begin in English. Switch to Hindi at the moment of frustration. Does the system recognize the shift? Does it respond in the language you switched to, or does it ignore the change?
3. The dialect probe. Use a speaker from Lucknow and a speaker from Mumbai with the same script. If accuracy drops more than 5 percentage points between them, the model is not dialect-aware.
4. The silence and repair test. Pause for four seconds mid-call, then continue. Does the system hold context or start over?
5. The register matching test. Open in Hinglish. Does the agent respond in Hinglish? Switch to pure Hindi. Does the agent follow? The agent should match the caller's register, not force its own.
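The pass/fail rule in the dialect probe (test 3) is simple enough to automate. The sketch below assumes you have already scored the same script for each speaker. The function name and the accuracy figures are hypothetical.

```python
def dialect_gap_exceeded(accuracy_by_dialect, max_gap_points=5.0):
    """Return (gap, exceeded) for accuracies given in percent.

    gap is the spread between the best and worst dialect, in percentage points.
    """
    gap = max(accuracy_by_dialect.values()) - min(accuracy_by_dialect.values())
    return gap, gap > max_gap_points

# Hypothetical accuracies (percent) for the same script read by two speakers.
gap, exceeded = dialect_gap_exceeded({"lucknow": 91.0, "mumbai": 84.5})
print(gap, exceeded)  # 6.5 True
```

Run it over several speakers per city, not one each, so that an individual voice does not get mistaken for a dialect effect.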
Accuracy Benchmarks That Actually Matter
Do not accept a single "accuracy" number. Ask for Word Error Rate (WER) broken down by language condition: clean Hindi, clean English, and code-switched Hinglish. In 2026, best-in-class Hindi ASR achieves WER of 8-12% on clean telephony audio. For code-switched speech, expect higher numbers, but anything above 20% in a production environment will create noticeable conversation failures.
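If a vendor will share transcripts but not a breakdown, you can compute it yourself. WER is the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The sketch below pools that per language condition. The sample transcripts and condition labels are hypothetical, and a real evaluation would normalize spelling variants of romanized Hindi before scoring.

```python
from collections import defaultdict

def word_edit_distance(ref, hyp):
    """Minimum substitutions, insertions, and deletions to turn hyp into ref (word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer_by_condition(samples):
    """samples: iterable of (condition, reference, hypothesis). Returns pooled WER per condition."""
    errors, words = defaultdict(int), defaultdict(int)
    for condition, reference, hypothesis in samples:
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        errors[condition] += word_edit_distance(ref, hyp)
        words[condition] += len(ref)
    return {c: errors[c] / words[c] for c in words if words[c]}

# Hypothetical reference/hypothesis pairs for illustration.
samples = [
    ("hinglish", "payment schedule bhej do by email", "payment schedule bhejo by email"),
    ("english", "can you email the statement", "can you email the statement"),
]
results = wer_by_condition(samples)
for condition, value in results.items():
    flag = "above 20%" if value > 0.20 else "ok"
    print(condition, round(value, 3), flag)
```

Pooling errors across all utterances in a condition, instead of averaging per-utterance WER, keeps short utterances from dominating the number.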
Should you really invest in code-switching support? If your customer base includes urban or semi-urban Indian callers, the answer is unambiguous: yes. The ROI data supports it. Gartner projects that conversational AI will reduce contact center agent labor costs by $80 billion globally in 2026. Voice AI costs roughly $0.40 per call compared to $7-$12 for human agents. But those savings only materialize if the AI can actually handle how your customers talk.
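A back-of-the-envelope model shows how quickly those savings erode. It assumes every call starts with the AI and any call the AI cannot handle escalates to a human agent. The containment rates are hypothetical inputs, not figures from any study. The per-call costs are the ones quoted above.

```python
def blended_cost_per_call(ai_cost, human_cost, containment_rate):
    """Average cost per call when calls the AI fails on are escalated to a human.

    containment_rate is the assumed fraction of calls the AI resolves on its own.
    """
    if not 0.0 <= containment_rate <= 1.0:
        raise ValueError("containment_rate must be between 0 and 1")
    return ai_cost + (1.0 - containment_rate) * human_cost

AI_COST = 0.40     # per call, as quoted above
HUMAN_COST = 7.00  # low end of the $7-$12 range quoted above

# Hypothetical containment rates, from strong to poor code-switching handling.
for rate in (0.9, 0.5, 0.2):
    print(rate, round(blended_cost_per_call(AI_COST, HUMAN_COST, rate), 2))
```

Under these assumptions, a system that fails on most code-switched calls costs nearly as much per call as staffing humans alone, because you pay for the AI attempt and then for the agent.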
Conclusion
AI voice agents and code-switching are inseparable topics for any business operating in India. Three things matter most: your system must process code-switched speech through all three layers (ASR, NLU, TTS) without forced restarts, your vendor must demonstrate performance on real Hinglish data rather than clean monolingual demos, and domain-specific fine-tuning is the difference between a demo and a deployment.
You do not need to accept vendor claims at face value. You now have the tests to run, the benchmarks to request, and the questions to ask.
At OnDial, we build voice AI solutions grounded in how Indian customers actually communicate, not how textbooks say they should. If you are evaluating voice AI for a multilingual Indian customer base, we would welcome the conversation. Visit OnDial and let us walk through your specific use case together.