Infrastructure for AI Phone Agents

New Agents, New Challenges

The artificial intelligence revolution has brought us an entirely new generation of AI phone agents: fully autonomous, AI-powered agents that can hold phone conversations and follow scripts just like their human counterparts.

Automated phone systems aren't new; everyone has suffered through robotic, cheerful prompts before being redirected to a human operator in frustration. But the latest generation of AI solutions has enabled teams like Bland to build robot phone operators that sound just like human agents.

The level of realism and responsiveness AI phone agents can now achieve is incredible. But the journey is far from complete. New capabilities bring new hardware requirements, and as machine learning models and large language models keep improving, the demands on your infrastructure will only increase.

To gain a competitive edge in call centers automated with AI phone agents, a commitment to excellent technical infrastructure is becoming more and more important.

This article identifies the main benefits of self-hosted infrastructure for phone agents. We attempt to answer the question: why should you care about building dedicated infrastructure when so many faster routes (the cloud) are available?

Why Self-Host AI Phone Agents?

There's a lot to be said for the convenience and scalability of hosting language models in the cloud or building your applications around highly available third-party APIs. Initial costs are low, the potential to scale as your application grows is effectively limitless, and rapid development and iteration are frictionless. But that convenience comes with trade-offs, and for phone agents in particular, several benefits of self-hosting are hard to ignore.

Key Benefits of Self-Hosted AI Agents

Benefit 1: Reduced Latency

Latency is an important consideration when implementing any AI solution. But when it comes to AI phone agents, latency is everything. Let's break it down.

Several studies have found that in a fluid conversation between two people, the average response time is around 200ms.

The average response time from a publicly accessible large language model can range from 500ms to well over 1 second, depending on the context.

If we look at an AI phone agent's architecture (in Bland's design), we'll see several model invocations in every turn of the conversation:

  • First, to transcribe the user's speech to text
  • Second, to pass the transcribed text through a second model to generate a proper response
  • Third, to convert that response back to speech and speak it to the user
  • And fourth (during the second stage), to use another model to decide where to go next in the AI agent's "script"

With multiple models involved, these stacked prompt/response cycles often end in high latency and poor performance.
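To make the stacking concrete, here is a minimal sketch of a single conversational turn. It's written in Python with hypothetical stage functions and illustrative latency figures (the sleeps stand in for real model calls); it is not Bland's actual implementation.

  import time

  # Hypothetical stage functions; each would call a real model in production.
  # The sleep() durations are illustrative latencies, not measurements.
  def speech_to_text(audio: bytes) -> str:
      time.sleep(0.15)   # ~150 ms to transcribe the caller's speech
      return "I'd like to check on my order."

  def choose_next_node(transcript: str) -> str:
      time.sleep(0.10)   # ~100 ms to decide where to go next in the script
      return "order_status"

  def generate_response(transcript: str, node: str) -> str:
      time.sleep(0.50)   # ~500 ms of LLM generation
      return "Sure, can I have your order number?"

  def text_to_speech(text: str) -> bytes:
      time.sleep(0.20)   # ~200 ms of speech synthesis
      return b"<audio>"

  def handle_turn(audio: bytes) -> bytes:
      """One turn: speech-to-text -> script routing -> LLM -> text-to-speech."""
      start = time.perf_counter()
      transcript = speech_to_text(audio)
      node = choose_next_node(transcript)
      reply = generate_response(transcript, node)
      audio_out = text_to_speech(reply)
      total_ms = (time.perf_counter() - start) * 1000
      print(f"end-to-end latency: {total_ms:.0f} ms")   # ~950 ms with these figures
      return audio_out

  handle_turn(b"<caller audio>")

Even if each stage is individually fast, the serial chain pushes a single turn well past the roughly 200ms cadence of human conversation, which is why every hop is worth shaving.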

This is why the Bland team has obsessed over latency; through self-hosted infrastructure (and some pretty big breakthroughs in model latency), response times can be pulled back toward the pace of natural human conversation.

Benefit 2: Improved Reliability

The main justification for the operational cost of running your own infrastructure is the ability to significantly increase the reliability of your systems. If something is slow or broken, it's in your power to fix it. For a truly self-reliant engineering team, this is far preferable to endless conversations with enterprise support agents over minutiae.

The ability to customize your infrastructure for your specific problem set is also overlooked in the rush toward today's quick API-style services. While many of these are a 90% fit for the majority of use cases, the missing 10% of functionality or design can wreak havoc on end products down the line. In the case of AI phone agents, their multi-model design requires a special architecture to hum along at full capacity.

All this adds up to the primary goal of customer-service phone systems: a consistent experience users can count on.

Benefit 3: Cost-Effectiveness

What you pay in upfront costs (both in hardware and time), you gain back over a few years of operational and user growth.

The age-old decision point for CTOs and CEOs is "do I invest ahead of the capacity needs for the long term, or do I kick the investment down the road?"

This varies depending on your company and your use case. If you're a startup that's barely raised enough to cover salaries for the next year, a multi-hundred-thousand-dollar physical infrastructure buildout just won't make sense, reliability and performance be damned.

However, a more established organization that's confident it will still be around in two years' time should give serious consideration to making an upfront investment in its AI agent infrastructure.
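As a rough illustration, a simple break-even calculation can frame that decision. The figures below are hypothetical placeholders, not Bland's pricing or any provider's quote:

  # Hypothetical figures for illustration only.
  upfront_hardware = 300_000      # dedicated GPU servers, networking, installation
  self_hosted_monthly = 8_000     # power, colocation, maintenance
  cloud_monthly = 22_000          # equivalent managed inference capacity

  months = 1
  while upfront_hardware + self_hosted_monthly * months > cloud_monthly * months:
      months += 1

  print(f"Self-hosting breaks even after ~{months} months ({months / 12:.1f} years).")
  # With these numbers: breaks even after ~22 months (1.8 years).

Whether that horizon is attractive depends on growth expectations and the cost of capital, which is why the same arithmetic points startups and established organizations in opposite directions.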

What About the Risks?

If there were a clear path to infrastructure scalability, the major cloud providers like AWS, Google Cloud, and Azure wouldn't be growing at well over 20% year over year (Source). So what risks and considerations should you weigh when adopting AI phone agents early?

Ease of Early Implementation

The obvious draw of pre-built, highly available solutions is their plug-and-play nature. For a demo or proof-of-concept build (an MVP), nothing beats having an API endpoint up and running in minutes.

Technical Expertise Shortfalls

Has your team built and managed its own infrastructure before? Many software engineers in today's market learned their craft entirely in the world of easy, cloud-abstracted deployments. Make sure your team has the technical expertise to deliver state-of-the-art dedicated infrastructure for your AI adoption project. Otherwise you'll end up wasting precious development time on unrelated headaches.

Hardware Obsolescence

Moore's law is in full effect when it comes to machine learning chipsets. The innovation curve for the hardware that machine learning models run on is only accelerating. That means a large investment in the "latest" technology for your dedicated infrastructure today may be three generations old in four or five years' time.

With cloud and highly available providers, you benefit from the arms race to have the latest GPUs and processors.

Security and Regulatory Considerations

Depending on the nature of your business, security and data-privacy laws can be non-starters for certain application architectures. Your regulatory department (or your customers') may never accept a solution where their data is shipped off to an external API with fluctuating terms of service. Owning your infrastructure offers a granular level of control that can satisfy regulators in nearly any situation. However, the burden of certifying your own environments can add operational costs, depending on your situation.

Dedicated Infrastructure for AI Phone Agents - The Final Word

While there are certainly plenty of well-defined benefits to using cloud providers to bootstrap your machine learning workloads and reduce operational costs early on, it's clear to the Bland team that the benefits of implementing AI with pure cloud solutions too often come at the cost of service quality.

When long-term investment costs are brought into the equation, the decision becomes obvious for companies planning to stick around and grow high-quality AI services. In the case of AI-powered phone agents, teams like Bland are planning for the long term, and they continue to see operational efficiency increase along with their customer base.