Platform · Languages & voice engine

Built for how India actually talks.

Pick a primary language and Vocily AI wires the speech-to-text model, voice, and fallback chain to match. 17+ Indian and English language variants, two code-mix auto-modes, and per-language voice binding, so your bot stops sounding like an English voice attempting Hindi.

Problem · Solution

The problem today

Most voice AI platforms treat Indian languages as an afterthought: a Hindi voice that's really an English voice with an accent, an STT model that can't tell 'Bansal' from 'Bhansali', and zero awareness when a caller code-mixes mid-sentence. The result is bots that sound foreign on the very calls that need to feel local.

Provider config is the other half of the pain: choosing between Sarvam and Deepgram, picking the right Cartesia voice for Tamil, configuring fallbacks if one provider hiccups. Every team rebuilds this stack from scratch.

How Vocily AI handles it

  • Language-first agent setup

    Pick your callers' primary language and Vocily AI recommends the best STT model, the right TTS voice, and a sensible fallback. Override anything; the defaults are coherent.

  • 17+ Indian and English language variants with code-mix auto-modes

    Hindi, English (Indian/US/UK), Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, Urdu, Assamese — plus Hinglish auto and Indian Multilingual auto powered by Sarvam saaras:v3.

  • Mid-call language switching

    When a caller flips from English to Hindi at turn three, the agent follows — hot-swapping STT and the TTS voice mid-conversation. No reconnect, no awkward pause.

  • Per-language TTS voice mapping

    Bind a native Hindi voice for Hindi turns and a separate English voice for English turns inside the same agent. Stops the 'English voice attempting Hindi' problem.

  • Vendor retry + automatic fallback

    When a provider hiccups, Vocily AI retries in the background and plays a spoken 'one moment' so callers never hear dead silence. If Sarvam lags by more than 5 seconds, Deepgram takes over: the buffered audio is replayed to it, and the caller never notices the switchover.

What's in it

What the voice engine ships with.

The configuration surface that lets you tune an agent for real Indian conversations — not just toggle on a generic 'Hindi mode.'

Languages live

17+ Indian and English variants supported out of the box.

Hindi
Native Hindi STT + voices.
English
Indian English, US, UK accents.
Indian languages
Bengali · Gujarati · Kannada · Malayalam · Marathi · Odia · Punjabi · Tamil · Telugu · Urdu · Assamese
Code-mix modes
Hinglish auto (mixes Hindi + English mid-sentence) and Indian Multilingual auto (handles whatever the caller throws at it).
Engine
Sarvam saaras:v3 for code-mix; provider-best for monolingual.

Voice & accent

Voices tuned per language, not borrowed across them.

Per-language voice
Bind a distinct voice per language inside the same agent.
Accent profiles
Indian English accents for calls where sounding local matters to the customer.
Providers routed
Cartesia · ElevenLabs · Sarvam · Smallest — picked per language by default, overridable.
Pronunciation dictionary
Custom word → spoken-form mapping, provider-agnostic. Brand names, customer surnames, technical jargon.
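
To make per-language voice binding and the pronunciation dictionary concrete, here is a minimal Python sketch. It is an illustration only, not Vocily AI's actual configuration schema: the `VoiceConfig` class, its field names, the language codes, and the voice identifiers are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VoiceConfig:
    """Hypothetical per-agent voice settings (not a real Vocily AI schema)."""

    # language code -> voice identifier; every identifier here is invented
    voices: dict
    default_language: str = "en-IN"
    # written form -> spoken form, applied before text reaches any TTS provider
    pronunciations: dict = field(default_factory=dict)

    def voice_for(self, language: str) -> str:
        """Return the voice bound to this language, else the default voice."""
        return self.voices.get(language, self.voices[self.default_language])

    def apply_pronunciations(self, text: str) -> str:
        """Swap whole-word matches for their spoken form in a single pass."""
        if not self.pronunciations:
            return text
        lookup = {k.lower(): v for k, v in self.pronunciations.items()}
        # Longest entries first so a longer phrase wins over a shorter prefix.
        keys = sorted(lookup, key=len, reverse=True)
        pattern = re.compile(
            r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
            re.IGNORECASE,
        )
        return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


config = VoiceConfig(
    voices={"hi-IN": "hindi-voice-1", "en-IN": "english-voice-1"},
    pronunciations={"Cal.com": "cal dot com", "SKU": "S K U"},
)
```

Because the substitution runs on plain text before synthesis, the same dictionary works regardless of which TTS provider handles the turn, which is the sense in which such a mapping can be provider-agnostic.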

Accuracy on your domain

STT biased toward your vocabulary, not just generic Indian speech.

Keyword boosting
Custom vocabulary bias per agent — brand names, product SKUs, customer-name lists, domain terms.
Examples
'Vocily AI', 'Cal.com', customer-name CSV, product codes — never get garbled.
Scope
Per-agent, so different agents in the same workspace can bias different vocabularies.

Number, currency & date formatting

How the agent speaks numbers — natural or formal — per agent.

Phone numbers
Digit-by-digit or grouped.
Currency
'twenty-three fifty' vs 'twenty-three rupees and fifty paise' vs '₹23.50'.
Dates
'March 5th' vs 'the fifth of March' vs '5/3/2026'.
Times
12-hour, 24-hour, conversational ('half past three').
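
The formatting choices above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the platform's implementation: the function names, the style labels ("casual", "formal", "written", "digit", "grouped"), and the default group size of five digits are all hypothetical.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def words_under_100(n: int) -> str:
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def speak_rupees(rupees: int, paise: int, style: str) -> str:
    """Render an amount in one of three hypothetical styles."""
    if style == "written":
        return f"₹{rupees}.{paise:02d}"
    formal = f"{words_under_100(rupees)} rupees and {words_under_100(paise)} paise"
    if style == "formal":
        return formal
    if style == "casual":
        if paise == 0:
            return words_under_100(rupees)
        # 'twenty-three five' would be ambiguous, so small paise stay formal.
        if paise < 10:
            return formal
        return f"{words_under_100(rupees)} {words_under_100(paise)}"
    raise ValueError(f"unknown style: {style}")


def speak_phone(digits: str, style: str, group_size: int = 5) -> str:
    """Read a phone number digit by digit, optionally pausing between groups."""
    names = [ONES[int(d)] for d in digits]
    if style == "digit":
        return " ".join(names)
    if style == "grouped":
        groups = [names[i:i + group_size] for i in range(0, len(names), group_size)]
        return ", ".join(" ".join(g) for g in groups)
    raise ValueError(f"unknown style: {style}")
```

For example, `speak_rupees(23, 50, "casual")` gives 'twenty-three fifty', while the "formal" style gives 'twenty-three rupees and fifty paise'.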

LLM model choice

Pick the reasoning model that fits the use case and budget. STT, TTS, and LLM all run on Vocily-managed provider routing.

OpenAI
GPT-4o-mini · GPT-4o · GPT-5.4-mini — selectable per agent.
Anthropic
Claude on the Vocily-managed routing layer.
Google
Gemini on the Vocily-managed routing layer.
Automatic fallback
Configure a backup model; if the primary errors or times out, the platform switches mid-turn.
Pricing
Per-minute voice and per-message chat — model cost included in the rate. No separate provider bills to manage.
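
As a rough picture of what automatic model fallback involves, the Python sketch below tries a primary model and switches to a backup if the primary errors or times out. It is a generic pattern, not Vocily AI's code: `complete_with_fallback`, the `ModelCall` type, and the 2-second default timeout are assumptions chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# A model call takes a prompt and eventually returns text (hypothetical shape).
ModelCall = Callable[[str], Awaitable[str]]


async def complete_with_fallback(
    prompt: str,
    primary: ModelCall,
    backup: ModelCall,
    timeout_s: float = 2.0,
) -> str:
    """Answer with the primary model; use the backup on error or timeout."""
    try:
        # wait_for cancels the primary call if it exceeds the timeout.
        return await asyncio.wait_for(primary(prompt), timeout=timeout_s)
    except Exception:
        # Covers both provider errors and the timeout raised by wait_for.
        return await backup(prompt)
```

The caller sees a single awaited result either way, which is what lets a switch happen inside one conversational turn rather than surfacing an error to the caller.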

Resilience & low-latency modes

What happens when providers blink — and how to skip the pipeline when you need speed.

Vendor retry
Automatic retry on transient STT/TTS/LLM failures with a spoken 'one moment' so callers don't hear silence.
STT fallback chain
Lag-triggered: if primary STT falls 5s+ behind, a backup provider takes over with buffered audio replayed.
Speech-to-Speech mode
Ultra-low-latency S2S via OpenAI Realtime or Gemini Live. Skips the STT → LLM → TTS pipeline and handles interruptions natively.
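
The lag-triggered STT fallback described above can be sketched as a small router that tracks how far transcripts trail the audio sent so far. This is a simplified illustration, not the platform's implementation: the `LagFallbackRouter` class, the `SttClient` interface, and the callback names are hypothetical, and only the 5-second threshold comes from the description above.

```python
from collections import deque
from typing import Protocol


class SttClient(Protocol):
    """Minimal hypothetical STT interface: accepts raw audio chunks."""

    def send(self, chunk: bytes) -> None: ...


class LagFallbackRouter:
    """Route audio to a primary STT; switch to a backup if it lags too far."""

    def __init__(self, primary: SttClient, backup: SttClient, max_lag_s: float = 5.0):
        self.active = primary
        self.backup = backup
        self.max_lag_s = max_lag_s
        # (end time in seconds, chunk) for audio not yet covered by a transcript
        self.pending = deque()
        self.sent_until_s = 0.0
        self.transcribed_until_s = 0.0
        self.switched = False

    def on_audio(self, chunk: bytes, duration_s: float) -> None:
        """Forward a new chunk and keep it buffered until it is transcribed."""
        self.sent_until_s += duration_s
        self.pending.append((self.sent_until_s, chunk))
        self.active.send(chunk)
        self._maybe_switch()

    def on_transcript(self, audio_end_s: float) -> None:
        """Record transcript progress and drop chunks it already covers."""
        self.transcribed_until_s = max(self.transcribed_until_s, audio_end_s)
        while self.pending and self.pending[0][0] <= self.transcribed_until_s:
            self.pending.popleft()

    def _maybe_switch(self) -> None:
        lag_s = self.sent_until_s - self.transcribed_until_s
        if not self.switched and lag_s > self.max_lag_s:
            self.switched = True
            self.active = self.backup
            # Replay everything the primary never transcribed.
            for _, chunk in self.pending:
                self.active.send(chunk)
```

Keeping untranscribed audio in a buffer is what makes the handover lossless: the backup starts from the last point the primary confirmed, not from the moment of the switch.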

Common questions

What teams ask before they switch.

Can the agent switch languages mid-call?

Yes. If the caller flips from English to Hindi between turns, the agent matches: STT swaps, the TTS voice swaps, the call continues. No reconnect, no replay.