How AI Agents Hand Over: Agent-to-Agent Transfer, Human Escalation, and Live Supervision
Anuprash Gupta
Product & Platform, Zoice

The hardest moment in any customer conversation is the handover. Human call centers learned this decades ago: a cold transfer where the caller repeats everything to a new person is where satisfaction goes to die. AI-run conversations have the same failure mode — plus a new one, where a single overloaded agent tries to be the billing desk, the scheduler, and the complaints team all at once, and does each job worse than a specialist would.
This post covers how handover actually works on the Zoice voice platform: warm AI-to-AI transfers, escalation to human phone numbers, the trigger scenarios that decide when each happens, and the live supervision tools that let a human operator step into a conversation without ending it.
Why One Mega-Prompt Agent Fails at Scale
The first version of most deployments is a single agent with a long prompt: greet the caller, answer product questions, handle billing disputes, book appointments, take complaints. It demos well. Then real traffic arrives and three problems compound:
- Instruction dilution. The more jobs in one prompt, the weaker each instruction's pull on the model. Edge-case billing rules buried under scheduling logic get ignored exactly when they matter.
- Untestable changes. Fixing the refund flow means editing a prompt that also runs scheduling — every change risks regressions in unrelated conversations, so teams stop iterating.
- No clean ownership. When the billing team and the support team share one prompt, nobody owns its behavior, and prompt edits turn into merge conflicts between departments.
The fix mirrors how good call centers are organized: specialists plus a disciplined transfer protocol. Each agent gets a short, focused prompt for one domain, and built-in transfer tools move the conversation to whoever should own it next — another AI agent or a human.
Key Insight
One mega-prompt agent that tries to do everything degrades at scale — specialist agents connected by transfer tools do not. Two built-in tools, transfer_agent and transfer_number, route conversations between AI specialists and human staff based on natural-language trigger scenarios, while live supervision (Whisper, text take-over, voice takeover) lets operators step in mid-conversation without ending the AI session.
The Two Transfer Tools
Zoice agents ship with two built-in tools that the LLM can invoke mid-conversation, the same way it would call any other function.
transfer_agent: Warm AI-to-AI Transfer
This tool hands the conversation to another agent picked from your organization. The transfer is warm — the conversation continues rather than restarting — so a caller who drifts from a delivery question into a billing dispute lands with the billing agent without re-explaining anything. Specialist agents stay small and testable because anything outside their domain is one tool call away.
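A minimal sketch of what "warm" means in practice, assuming the transcript lives on the conversation rather than on whichever agent is active. The `Conversation` class and its method names are hypothetical, chosen for illustration; they are not the Zoice API.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical conversation state; not the Zoice data model."""
    active_agent: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def say(self, speaker: str, text: str) -> None:
        self.history.append((speaker, text))

    def transfer_agent(self, target: str) -> None:
        # Warm transfer: only the active agent changes. The history is
        # kept, so the new agent sees everything said so far.
        self.history.append(("system", f"transfer: {self.active_agent} -> {target}"))
        self.active_agent = target


call = Conversation(active_agent="delivery")
call.say("caller", "Where is my order? Also, I was charged twice.")
call.transfer_agent("billing")
```

After the transfer, `call.active_agent` is the billing agent and the caller's original message is still the first entry in `call.history`, which is what spares the caller from re-explaining.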
transfer_number: Escalation to a Human
Some conversations need a person: an angry customer, a request with legal weight, a judgment call the agent is not authorized to make. transfer_number sends the call to a human phone number, with an optional pre-transfer TTS message — something like "Connecting you to our support team, please hold" — so the caller hears a clear bridge instead of dead air.
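The ordering matters here: the bridge message plays before the call is connected. A rough sketch under that assumption, with a hypothetical `NumberTransfer` config and step names ("speak", "dial") invented for illustration, and a placeholder phone number:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumberTransfer:
    """Hypothetical config for one transfer_number destination."""
    phone_number: str
    pre_transfer_message: Optional[str] = None


def plan_transfer(transfer: NumberTransfer) -> list[tuple[str, str]]:
    """Return the ordered telephony steps for this escalation."""
    steps = []
    if transfer.pre_transfer_message:
        # Speak the bridge first so the caller does not hear dead air.
        steps.append(("speak", transfer.pre_transfer_message))
    steps.append(("dial", transfer.phone_number))
    return steps


support = NumberTransfer(
    phone_number="+910000000000",  # placeholder, not a real number
    pre_transfer_message="Connecting you to our support team, please hold",
)
steps = plan_transfer(support)
```

Leaving `pre_transfer_message` unset yields a plan with only the dial step, which is the silent handoff the message exists to prevent.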
Trigger Scenarios: Teaching the Agent When to Transfer
Both tools are configured with multiple natural-language trigger scenarios. You write plain sentences — "user asks about billing", "caller wants to speak to a manager", "user mentions a refund" — and each scenario maps to a destination. This is the part teams underinvest in: the quality of your handoffs is the quality of your scenario list. Write scenarios the way callers actually talk, including the indirect phrasings ("there's a problem with my bill" as well as "billing question").
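One way to picture how a scenario list reaches the model is as a function definition whose description carries the scenarios. The sketch below uses the generic JSON-Schema-style shape that many function-calling APIs accept; the exact payload Zoice builds is not described in this post, and the agent names are made up.

```python
# Hypothetical scenario list: each plain-language trigger maps to a destination.
SCENARIOS = [
    ("user asks about billing", "billing-agent"),
    ("there's a problem with my bill", "billing-agent"),
    ("user mentions a refund", "billing-agent"),
    ("caller wants to cancel", "retention-agent"),
]


def build_transfer_tool(scenarios):
    """Build a generic function-calling tool definition from scenarios."""
    destinations = sorted({dest for _, dest in scenarios})
    when = "; ".join(f"'{trigger}' -> {dest}" for trigger, dest in scenarios)
    return {
        "name": "transfer_agent",
        "description": f"Transfer the conversation when a scenario matches: {when}",
        "parameters": {
            "type": "object",
            "properties": {
                # Restrict the model to destinations that actually exist.
                "destination": {"type": "string", "enum": destinations},
            },
            "required": ["destination"],
        },
    }


tool = build_transfer_tool(SCENARIOS)
```

Because the scenarios end up as text the model reads, adding the indirect phrasing is a one-line change to the list rather than a prompt rewrite.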
One guardrail worth knowing: transfers are function calls under the hood, so the agent editor warns you when the selected LLM lacks function-calling support. Catch that at design time, not on a live call where the transfer silently never fires.
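The same check is easy to mirror in your own deployment scripts. This is a sketch only: the capability table and model names are hypothetical, and real capabilities have to come from your LLM provider's documentation.

```python
# Hypothetical capability table; model names are placeholders.
MODEL_SUPPORTS_FUNCTIONS = {
    "model-with-tools": True,
    "model-text-only": False,
}

TRANSFER_TOOLS = ("transfer_agent", "transfer_number")


def validate_agent(model: str, tools: list[str]) -> list[str]:
    """Return design-time warnings for an agent configuration."""
    warnings = []
    transfer_tools = [t for t in tools if t in TRANSFER_TOOLS]
    # Unknown models are treated as unsupported, the safer default.
    if transfer_tools and not MODEL_SUPPORTS_FUNCTIONS.get(model, False):
        warnings.append(
            f"{model} lacks function calling; {', '.join(transfer_tools)} will never fire"
        )
    return warnings
```

Running `validate_agent` before publishing an agent turns a silent production failure into a warning you can act on.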
These pieces — the two tools, their scenarios, and the routing and supervision around them — break down as follows.
Core Components
- transfer_agent. A built-in tool for warm AI-to-AI transfer: the conversation moves to another agent picked from your organization, so a billing specialist takes over from the front-desk agent mid-conversation.
- transfer_number. Escalation to a human phone number, with an optional pre-transfer TTS message such as 'Connecting you to our support team' so the caller is never dropped into silence.
- Trigger scenarios. Each transfer tool carries multiple natural-language scenarios ('user asks about billing', 'caller wants to cancel') that tell the LLM exactly when to invoke the transfer.
- Live supervision. Whisper folds a private operator instruction into the agent's next turn; text take-over pauses the bot while the operator replies in-thread; voice takeover rings the operator's phone into the live call, then releases back to the AI.
- Inbound routing. First-match-wins rules on caller prefix, language, business hours, keywords, and channel decide which agent answers in the first place; transfers handle what routing cannot predict.
Live Supervision: Three Levels of Human Intervention
Transfers handle the planned handoffs. Supervision handles the unplanned ones — a high-value caller, a conversation drifting off course, a new agent on its first week of traffic. Operators watching a live conversation have three escalating controls:
| Control | What happens | Caller notices? |
|---|---|---|
| Whisper | Operator sends a private instruction that is folded into the agent's next turn; the AI keeps talking, but steered | No. The AI keeps talking, now following the operator's instruction |
| Text take-over | Bot pauses; the operator replies as a human in the same thread. The transcript shows amber OPERATOR labels and handoff timeline markers | Yes — a human is now answering |
| Voice takeover | The operator's phone rings and joins the live call; when the situation is handled, the operator releases the call back to the AI | Yes — a human voice joins the call |
Whisper is the workhorse: a supervisor who spots the agent missing context can inject "offer the goodwill credit, this is a ten-year customer" without the caller ever knowing. Take-over modes are for moments that need human judgment now — and crucially, voice takeover is reversible. The human resolves the hard part, releases the call, and the AI finishes the routine wrap-up.
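The three controls can be modeled as a small state machine. The sketch below is hypothetical throughout: class and method names are invented, it assumes the bot stays silent during either take-over mode, and it assumes release works the same way for both modes, which this post only states for voice takeover.

```python
from dataclasses import dataclass, field


@dataclass
class SupervisedSession:
    """Hypothetical session state for the three supervision controls."""
    bot_paused: bool = False
    pending_whispers: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)

    def whisper(self, instruction: str) -> None:
        # Queued privately; the caller never sees this text.
        self.pending_whispers.append(instruction)
        self.events.append("whisper")

    def next_turn_context(self) -> list[str]:
        # Fold queued whispers into the agent's next turn, then clear them.
        folded, self.pending_whispers = self.pending_whispers, []
        return folded

    def take_over(self, mode: str) -> None:
        # mode is "text" or "voice"; in this sketch both pause the bot.
        self.bot_paused = True
        self.events.append(f"{mode}_takeover")

    def release(self) -> None:
        # Hand the conversation back to the AI.
        self.bot_paused = False
        self.events.append("release")


session = SupervisedSession()
session.whisper("offer the goodwill credit, this is a ten-year customer")
context = session.next_turn_context()
session.take_over("voice")
paused_during_takeover = session.bot_paused
session.release()
```

The whisper is consumed exactly once by the next turn, while a take-over changes who is speaking until `release` is called.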
Every one of these events — transfers, whispers, take-overs, releases — is logged in the transcript, so QA reviews and analytics can answer the operational questions: which scenarios fire most, which agents escalate most, where humans had to step in.
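If those events are exported as records, the three questions reduce to simple counts. The record shape and field names below are assumptions for illustration, not a documented Zoice export format.

```python
from collections import Counter

# Hypothetical event records; field names are illustrative only.
events = [
    {"type": "transfer_agent", "agent": "front-desk", "scenario": "user asks about billing"},
    {"type": "transfer_agent", "agent": "front-desk", "scenario": "user mentions a refund"},
    {"type": "transfer_number", "agent": "billing", "scenario": "caller wants to speak to a manager"},
    {"type": "whisper", "agent": "billing"},
    {"type": "transfer_agent", "agent": "scheduling", "scenario": "user asks about billing"},
    {"type": "voice_takeover", "agent": "billing"},
]

TRANSFERS = {"transfer_agent", "transfer_number"}
SUPERVISION = {"whisper", "text_takeover", "voice_takeover"}

# Which scenarios fire most?
scenario_counts = Counter(e["scenario"] for e in events if e["type"] in TRANSFERS)

# Which agents escalate to humans most?
escalations_by_agent = Counter(e["agent"] for e in events if e["type"] == "transfer_number")

# Where did operators have to step in?
supervision_by_agent = Counter(e["agent"] for e in events if e["type"] in SUPERVISION)
```

A scenario that dominates `scenario_counts` is a candidate for a routing rule, since those callers could have started with the right agent.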
Routing Decides Who Answers First
Transfers fix mid-conversation misroutes, but the cheapest handoff is the one you avoid. Inbound routing rules match on caller prefix, language, business hours, keywords, and channel — first match wins — so a Hindi-speaking caller dialing after hours about a claim lands on the right specialist immediately. Get routing right and transfer tools become the safety net, not the front door. The roadmap below sequences the whole setup.
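First-match-wins means rule order is part of the configuration. A minimal sketch, assuming a hypothetical `Rule` shape in which an unset field matches anything and business hours are reduced to a single `after_hours` flag; the agent names are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical routing rule; a None field matches anything."""
    agent: str
    caller_prefix: Optional[str] = None
    language: Optional[str] = None
    after_hours: Optional[bool] = None
    keyword: Optional[str] = None
    channel: Optional[str] = None

    def matches(self, call: dict) -> bool:
        return (
            (self.caller_prefix is None or call["caller"].startswith(self.caller_prefix))
            and (self.language is None or call["language"] == self.language)
            and (self.after_hours is None or call["after_hours"] == self.after_hours)
            and (self.keyword is None or self.keyword in call["text"].lower())
            and (self.channel is None or call["channel"] == self.channel)
        )


def route(call: dict, rules: list[Rule], default: str) -> str:
    # Walk the rules in order and stop at the first one that matches.
    for rule in rules:
        if rule.matches(call):
            return rule.agent
    return default


RULES = [
    Rule(agent="claims-hindi-after-hours", language="hi", after_hours=True, keyword="claim"),
    Rule(agent="claims", keyword="claim"),
    Rule(agent="hindi-front-desk", language="hi"),
]

call = {
    "caller": "+910000000000",  # placeholder number
    "language": "hi",
    "after_hours": True,
    "text": "I want to check my claim status",
    "channel": "voice",
}
chosen = route(call, RULES, default="front-desk")
```

With the most specific rule listed first, the after-hours Hindi claim call reaches its specialist; move the generic `claims` rule to the top and the same call lands there instead.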
Implementation Roadmap
1. Split your mega-prompt into specialist agents, one per domain (billing, scheduling, support), each with a short, focused prompt
2. Attach transfer_agent to each specialist with natural-language trigger scenarios covering every out-of-domain request you expect
3. Add transfer_number for the cases that genuinely need a human, and write the pre-transfer TTS message so callers know what is happening
4. Configure inbound routing rules (caller prefix, language, business hours, keywords, channel) so most conversations start with the right agent
5. Train supervisors on the Whisper, text take-over, and voice takeover controls, and review handoff events in transcripts weekly
Designing the Handoff Layer: Practical Guidance
- Start from your call taxonomy, not your org chart. Specialist agents should map to caller intents (billing, scheduling, claims) — the things callers ask for — even if one human team handles several of them today.
- Make every agent a dead-end-free zone. Each specialist needs trigger scenarios covering everything outside its domain. The failure mode to design against is an agent gamely improvising answers it was never given.
- Reserve humans for judgment, not volume. transfer_number should fire on emotion, authority, and ambiguity, not on routine questions a better-scoped AI agent could answer. Every unnecessary human transfer brings back the cost you deployed AI to remove.
- Script the bridge. The pre-transfer TTS message is two seconds of audio that determines whether the caller experiences a handoff or a hang-up. Always set it.
- Supervise new agents like new hires. Put fresh agents under live observation for their first weeks, lean on Whisper to correct course in real time, and graduate them to lighter oversight as transcript reviews come back clean.
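The "dead-end-free" rule above lends itself to an automated check. This sketch assumes you can summarize each agent as the intents it handles plus the domains its transfer scenarios reach; that config shape is hypothetical and would need to be derived from your own agent definitions.

```python
# Hypothetical agent summaries: each agent's own domain plus the
# domains its transfer scenarios can hand off to.
AGENTS = {
    "billing": {"handles": {"billing"}, "transfers_to": {"scheduling", "claims"}},
    "scheduling": {"handles": {"scheduling"}, "transfers_to": {"billing"}},
    "claims": {"handles": {"claims"}, "transfers_to": {"billing", "scheduling"}},
}


def find_dead_ends(agents: dict) -> dict:
    """Map each agent to the intents it can neither handle nor transfer."""
    all_intents = set().union(*(a["handles"] for a in agents.values()))
    gaps = {}
    for name, cfg in agents.items():
        missing = all_intents - cfg["handles"] - cfg["transfers_to"]
        if missing:
            gaps[name] = sorted(missing)
    return gaps


dead_ends = find_dead_ends(AGENTS)
```

Here the scheduling agent has no path to claims, so a caller raising a claim mid-booking would leave it improvising; the check surfaces that gap before a live call does.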
The pattern that emerges is the same one told in the transfer story on our home page: AI handles the routine majority, specialists handle their domains, and humans enter precisely when judgment is needed, with the full conversation context already in place and in whichever of 10+ Indian languages your callers prefer.
FAQ
Is a transfer_agent handoff visible to the caller?
It is a warm transfer — the conversation continues with the new agent rather than restarting, so the caller is not asked to repeat themselves.
How does the agent know when to transfer?
Through the natural-language trigger scenarios you attach to each tool. Multiple scenarios per tool are supported, so "user asks about billing", "caller disputes a charge", and "user mentions a refund" can all route to the same billing agent.
What if my chosen LLM does not support function calling?
Transfers depend on function calling, and the agent editor warns you at configuration time if the selected model lacks it — switch models before going live.
Can an operator intervene without the caller knowing?
Yes — Whisper folds a private operator instruction into the agent's next turn. The caller only perceives a change if the operator escalates to text take-over or voice takeover, both of which are labeled in the transcript with amber OPERATOR markers and handoff timeline events.
What happens after a voice takeover ends?
The operator releases the call back to the AI agent, which resumes the conversation — humans handle the exception, the AI finishes the routine.
Want to see a warm transfer and a live takeover end to end? Book a walkthrough with our team.
Written by
Anuprash Gupta works on the Zoice platform across telephony, WhatsApp, and the agent tooling that powers real customer conversations. He writes about how teams put AI voice and chat agents into production — integrations, onboarding, analytics, and the practical decisions behind shipping conversational AI for Indian businesses.