Introduction
AI phone agents are among the first technologies poised to upend society as we know it. Artificial intelligence agents will automate the trillions of calls made between businesses and their customers. Soon, consumers will save businesses’ numbers in their contacts and will call and text them for support, to get their questions answered, to update their accounts, and even to make new purchases.
During the transition period as this new technology takes hold, enterprises have the opportunity to massively reduce costs while driving better customer experiences. Simultaneously, enterprises must navigate new risks posed by AI phone calling technology with poise, or they’ll face backlash and hurt relationships with their customers.
Bland AI is the infrastructure of AI phone calling, and in this guide we detail everything you need to know about AI phone agents, from how they’re built to how your organization can program, test, and deploy them successfully at scale. By the time you finish reading this guide, you’ll understand the opportunities and challenges associated with the end-to-end implementation of an AI phone agent and will be prepared to automate the calls your business makes today.
What is an AI phone agent and how does it work?
AI phone agents are the combined output of three models: a transcription model, a language model, and a text-to-speech model. The transcription model hears what the person says and writes it down. The language model looks at what’s been said and references the instructions you provide (either in a prompt or in a structured format) to generate a response. Finally, the text-to-speech model converts that response into audio and plays it back to the person.
A layer of conversational intelligence sits on top of the generative AI models. That layer helps the phone agent decide when to interrupt vs. stay silent. The conversation layer also dictates when the phone agent should transfer the call, end it, or use one of the custom tools at its disposal to schedule an appointment, send a text message, or complete any other live action.
Because three models must run successfully to generate every response, there are also three places where the phone agent can fail. If just one of the three models spikes in latency or fails altogether, the entire call experience will be ruined, hurting customer trust and satisfaction.
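To make the three failure points concrete, here is a minimal sketch of a single conversational turn. The stage functions are stand-in stubs, and names such as `run_turn` and `TurnResult` are hypothetical illustrations rather than part of Bland's API. The sketch only shows that the stages run in sequence and that each one adds to the time the caller waits.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TurnResult:
    reply_audio: bytes
    stage_latency_ms: Dict[str, float]


def timed(name: str, fn: Callable, arg, latencies: Dict[str, float]):
    """Run one stage and record how long it took, in milliseconds."""
    start = time.perf_counter()
    out = fn(arg)
    latencies[name] = (time.perf_counter() - start) * 1000
    return out


def run_turn(audio: bytes, transcribe, generate, synthesize) -> TurnResult:
    """Run the three stages in order; an exception in any stage fails the turn."""
    latencies: Dict[str, float] = {}
    text = timed("transcription", transcribe, audio, latencies)
    reply = timed("language_model", generate, text, latencies)
    speech = timed("text_to_speech", synthesize, reply, latencies)
    return TurnResult(speech, latencies)


# Stub models (assumptions for illustration only).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = run_turn(b"hello", fake_transcribe, fake_generate, fake_synthesize)
total_ms = sum(result.stage_latency_ms.values())
```

Because the stages run one after another, the caller waits for the sum of all three latencies, which is why a spike in any single model is felt on the call.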
While it is possible to build on other providers’ APIs, most enterprises find the risk untenable. If the transcription, inference, or text-to-speech provider has any lapse in service, the entire phone calling operation collapses. That’s why enterprises rely on companies like Bland that host their own model stack. Through self-hosting, Bland removes the latency added by calling separate APIs. Bland also removes the risk of waiting in long queues at third-party providers (e.g. OpenAI) by creating easy-to-scale infrastructure. In addition, Bland fine-tunes its models specifically for AI phone calling, so responses are both high quality and fast to generate.
To see an example of an ultra-low-latency AI phone agent, send yourself a call from Bland Turbo.
How do I implement AI agents?
It depends on your use case. If your use case is more assistant-based and conversational, you should build your agent based on a prompt. Alternatively, if your use case includes a stricter script or conversation flow, e.g. a prequalification or customer support call, you should follow a more structured format.
We’ll explore the tradeoffs between each approach below.
Prompt-based phone agents
At the text generation step, prompt-based phone agents reference a big prompt to figure out how to respond. The prompt will tell the phone agent what persona to play (e.g. an assistant or salesperson), the goal for the call, what steps to follow, and when to use the custom tools (e.g. ‘use the scheduling tool after getting the lead’s availability’).
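As a rough illustration of what such a prompt contains, the sketch below assembles the four ingredients named above into one block of text. The `AgentPrompt` class and its field names are hypothetical and do not reflect any required Bland format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentPrompt:
    persona: str
    goal: str
    steps: List[str]
    tool_rules: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the persona, goal, steps, and tool rules into one prompt."""
        lines = [f"You are {self.persona}.", f"Goal: {self.goal}", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, 1)]
        if self.tool_rules:
            lines.append("Tool rules:")
            lines += [f"- {rule}" for rule in self.tool_rules]
        return "\n".join(lines)


prompt = AgentPrompt(
    persona="a friendly scheduling assistant",
    goal="book a demo",
    steps=["Greet the caller", "Ask for availability"],
    tool_rules=["Use the scheduling tool after getting the lead's availability"],
)
rendered = prompt.render()
```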
Prompt-based phone agents are flexible, conversational, and the most human-sounding. They roleplay effectively, playing the part they’re given.
However, they have one crucial flaw: they can hallucinate almost anything. A caller could get the agent to say whatever they wanted by persistently pushing it in the right direction.
For enterprises, this approach creates massive liability. Imagine a malicious caller getting the phone agent to offer a massive discount - or getting it to skip an authorization step before pulling customer-related information from the database. The potential risk is overwhelming.
Conversational pathways
Conversational pathways are the structured alternative for building effective, human-sounding AI phone agents. Using Bland’s conversational pathways, enterprises can create a conversation graph. At every node, the company can define the exact action the phone agent should take: giving a fixed response, using a custom tool, or generating a response based on a prompt.
Each time the phone agent generates a response, it first looks at the pathways available to it. It then decides which pathway to move to and follows the instructions at that node.
This approach has multiple benefits. Enterprises can get consistent results from their agent, choosing where to allow variability and where to enforce consistency. They can force the agent to follow the exact pathway they want, rather than crossing their fingers and hoping that the agent will adhere to the prompt. Finally, once calls are live, enterprises can easily see what decisions their agent made, and why, and can make corrections to specific nodes and pathways instead of having to update the entirety of a long prompt.
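A conversation graph of this kind can be sketched as a small data structure. Everything below is an assumption for illustration: the `Node` class, the node names, and especially the keyword matching, which stands in for the model-driven decision a real agent would make when choosing the next pathway.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    kind: str      # "fixed", "tool", or "prompt"
    content: str   # fixed text, tool name, or prompt, depending on kind
    edges: Dict[str, str] = field(default_factory=dict)  # keyword -> next node


GRAPH = {
    "greeting": Node(
        "greeting",
        "fixed",
        "Thanks for calling. Do you need billing or scheduling?",
        {"billing": "billing", "schedul": "scheduling"},
    ),
    "billing": Node("billing", "prompt", "Answer billing questions politely."),
    "scheduling": Node("scheduling", "tool", "book_appointment"),
}


def next_node(current: str, caller_text: str) -> str:
    """Pick the next node; keyword matching is a stand-in for a model decision."""
    lowered = caller_text.lower()
    for keyword, target in GRAPH[current].edges.items():
        if keyword in lowered:
            return target
    return current  # no pathway matched, so stay on the current node


chosen = next_node("greeting", "I'd like to schedule a visit")
```

Because every transition is an explicit edge, a reviewer can trace exactly which node handled a given turn and correct that node alone.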
To start building with pathways, sign up for a free Bland AI account.
Connecting knowledge bases and taking live actions
Outside of prompting or structuring your phone agent to perform in a certain way, you also need to integrate your own APIs for actions like scheduling, getting order status, and more. This ability is crucial; the phone agent must consistently and effectively receive and send data to and from your company’s backend systems to be capable of fully automating your organization’s phone calls.
Bland provides two functions for collecting and sending information: dynamic data and custom tools. Dynamic data enables you to ping your backend for live data at the start of and during the call, for example to fetch calendar availability in real time so that scheduling occurs seamlessly.
Custom tools, on the other hand, enable enterprises to define specific API requests the phone agent can fire during the call. Such function calling enables Bland’s agent to embed deeply with enterprise systems to dynamically update CRM records, schedule appointments, and take account-level actions.
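The two mechanisms can be sketched side by side. This is a minimal illustration under stated assumptions, not Bland's actual configuration schema: `fill_dynamic_data`, `CustomTool`, the field names, and the `example.com` URL are all hypothetical, and the backend fetch is a stub rather than a real HTTP request.

```python
import json
from dataclasses import dataclass
from string import Template
from typing import Any, Callable, Dict, List


def fill_dynamic_data(template: str, fetchers: Dict[str, Callable[[], str]]) -> str:
    """Fetch live values and substitute them into the agent's instructions."""
    values = {name: fetch() for name, fetch in fetchers.items()}
    return Template(template).substitute(values)


def fetch_availability() -> str:
    # Stand-in for a request to your calendar backend.
    return "Tuesday 10:00, Wednesday 14:30"


@dataclass
class CustomTool:
    name: str
    method: str
    url: str
    required_fields: List[str]

    def build_request(self, args: Dict[str, Any]) -> Dict[str, Any]:
        """Validate the agent's arguments and describe the API request to send."""
        missing = [f for f in self.required_fields if f not in args]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        body = {f: args[f] for f in self.required_fields}
        return {
            "method": self.method,
            "url": self.url,
            "body": json.dumps(body, sort_keys=True),
        }


instructions = fill_dynamic_data(
    "Offer one of these open slots: $open_slots",
    {"open_slots": fetch_availability},
)

book = CustomTool(
    name="book_appointment",
    method="POST",
    url="https://example.com/api/appointments",  # placeholder URL
    required_fields=["customer_id", "slot"],
)
request = book.build_request({"customer_id": "c-42", "slot": "Tuesday 10:00"})
```

Validating required fields before building the request is one way to keep the agent from firing a half-filled call at a backend system.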
Picking a fantastic voice
To the layperson, voice is the only attribute that matters. A high-quality, friendly voice is one of the largest differentiators between AI phone agents and IVRs.
Bland hosts a curated set of ultra-high-quality voices and gives users a wide range of personas to choose from. Additionally, Bland users constantly rate voices, enabling Bland to surface the most-loved voices on the platform.
To deliver an even more personalized feel to your calls, Bland enables you to create voice clones. Record a five-minute audio sample, upload it, and Bland can integrate that voice into your AI phone agent (try it here).
Timeline for building an enterprise-grade AI agent
On Bland you can create your first phone agent in under five minutes. Creating a robust prompt/pathway, configuring custom tools, testing the pathway, going live with customers, and monitoring results can take 2-8 weeks.
Expect to spend at least one week creating the phone agent, equipping it with custom tools, and making sure that the pathways and actions are natural and the agent chooses them correctly. You’ll need at least one more week of heavy internal testing to find flaws in the prompt or pathways and weaknesses in your implementation before going live with customers. Testing and tuning often take longer than expected, even for the simplest use cases, so enterprises should build in a buffer and assume implementation will take 8 weeks.
Required resources
During this period, your organization will be most successful if an engineering leader takes ownership of the project. The minimum requirement is a single engineer who can read API documentation, tweak the various phone agent settings, and build integrations with internal systems.
Additionally, the leader of the organization that currently makes the calls - whoever heads customer support, marketing, etc. - should assign a member of their org to help with prompting/pathways and expect them to be fully focused on implementation.
Going live with your AI agent
Once you’re confident your phone agent will perform, the next step is to go live with customers. Start by diverting some of your call volume to the phone agent. Define a transfer pathway so the phone agent can hand off unsuccessful calls to human agents, and closely monitor call logs to catch any issues that occur.
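One way to divert a fixed share of call volume is to hash each caller's number into a bucket, so the same caller is always routed the same way while the overall split matches the percentage you choose. The sketch below is an assumption about how such routing could be done on your side; `route_call` and the queue names are hypothetical.

```python
import hashlib


def route_call(caller_number: str, ai_percent: int) -> str:
    """Send roughly ai_percent of callers to the AI agent, deterministically."""
    if not 0 <= ai_percent <= 100:
        raise ValueError("ai_percent must be between 0 and 100")
    digest = hashlib.sha256(caller_number.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "ai_agent" if bucket < ai_percent else "human_queue"


destination = route_call("+15555550123", 10)
```

Raising `ai_percent` over time only moves additional callers to the agent; callers already routed to it stay there, which keeps their experience consistent during the rollout.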
To closely monitor, track, and improve your calls, use tools like Bland’s analysis endpoint. Ask questions about calls and extract structured outputs to understand customers’ dispositions, perform sentiment analysis, and generate other qualitative insights.
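Once structured outputs have been extracted for each call, they can be aggregated into simple health metrics. The sketch below assumes each call has already been reduced to a record with a `disposition` field; the record shape and the `summarize_calls` helper are hypothetical and are not the format returned by Bland's analysis endpoint.

```python
from collections import Counter
from typing import Dict, List


def summarize_calls(records: List[Dict[str, str]]) -> Dict[str, object]:
    """Count dispositions and compute the share of calls handed to a human."""
    dispositions = Counter(r["disposition"] for r in records)
    total = len(records)
    transferred = dispositions.get("transferred", 0)
    return {
        "total": total,
        "dispositions": dict(dispositions),
        "transfer_rate": transferred / total if total else 0.0,
    }


calls = [
    {"disposition": "resolved", "sentiment": "positive"},
    {"disposition": "transferred", "sentiment": "negative"},
    {"disposition": "resolved", "sentiment": "neutral"},
    {"disposition": "resolved", "sentiment": "positive"},
]
summary = summarize_calls(calls)
```

A rising transfer rate after a prompt or pathway change is an early signal that the change needs another look.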
Managing risk & reliability
In AI phone calling, reliability is the name of the game. An AI phone agent that’s based on external APIs is destined for outages, latency spikes, and failure.
For any enterprise planning to serve customers with AI phone agents, it’s absolutely necessary to own the underlying infrastructure and have the exclusive ability to freeze the endpoint and push updates on a fixed schedule. Otherwise, if your underlying AI models or conversation layer are constantly changing, your phone agent’s behavior will change too, causing instability, potential outages, and a worse overall customer experience.
Aside from stability benefits, dedicated infrastructure also provides a priority call queue, the ability to bring your own telephony provider, and lower call latency through multi-region hosting.
To learn how your organization can get its own dedicated AI phone agent infrastructure, schedule time with Bland’s team today.
When not to use AI agents
Having read this entire guide, you may be under the impression that AI phone agents are ready to automate every call ever made. This notion couldn’t be further from the truth.
Enterprise-ready AI phone agents should have strict, pre-defined guardrails. Therefore, while they are perfect for automating routine tasks, they’re unsuitable for high-touch, emotionally nuanced conversations.
Your account executives and high-tier support reps make critical, time-sensitive decisions on behalf of your business that AI phone agents most likely should not be making.
Conclusion
AI phone agents will fundamentally change the way businesses communicate with their customers. Enterprises must introduce strict guardrails and build on dedicated infrastructure to automate their customer interactions, or run the risk of failed calls and security breaches. To learn how Bland AI can help your enterprise automate its phone operations, submit an enterprise inquiry today. Alternatively, to start building on Bland immediately, sign up on the development portal.