Case Study
This case study explores the challenges Echo AI faced when deploying Large Language Model (LLM) applications at enterprise scale. It also describes how Log10’s end-to-end AI-powered LLMOps workflow enabled Echo AI to quickly move enterprise customers with 6-figure contracts from prototypes to production by resolving critical LLM accuracy issues.
Echo AI is a Conversation Intelligence Platform that uses generative AI to review customer conversations and uncover hidden insights. Companies such as StitchFix, CLEAR, and Wine Enthusiast can now go from manually reviewing 1% of conversations to understanding 100% of what their customers are saying, and can ask and answer nuanced questions about their business that might never have occurred to them before. This is a transformative shift and a net-new use case.
A single conversation might get analyzed 50 different ways, with 50 unique API calls to 10 different LLMs, producing millions of LLM outputs over hundreds of thousands of customer interactions. Echo AI has built a powerful analysis platform that classifies these outputs to drive actionable business insights and trigger downstream workflows.
A major challenge with LLMs today is accuracy. LLMs are notoriously error-prone and can produce different outputs even when given the same prompt, which had downstream implications for Echo AI whenever outputs were tagged erroneously. For example, Echo AI’s customers made financial decisions, such as providing refunds, based on tagging information, so it was critical that LLM outputs were highly accurate and therefore correctly tagged.
Making LLM outputs more accurate began with the initial LLM prompt, which often needed to be tailored through iterative prompt engineering to produce reliable results. Echo AI engineers found that initial prompts could start out with an accuracy as low as 40-50%, and that raising it to the 95%+ required for production was an arduous journey. The process typically required several days of effort to set up, with ongoing manual monitoring that could last indefinitely. They had to:
If models changed, they would see regressions and have to restart the cycle on prompts that they had already tuned.
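The accuracy targets and regression risk described above can be made concrete with a small evaluation harness. The following is a minimal sketch, not Echo AI’s or Log10’s actual tooling. It assumes a hand-labeled set of (conversation, expected tag) pairs, and `keyword_classifier` is a hypothetical stand-in for whatever prompt and model combination is under test. The example conversations are invented for illustration; only the 95% threshold comes from the text.

```python
from typing import Callable, Iterable, Tuple

# Production bar taken from the case study text (95%+ accuracy).
PRODUCTION_THRESHOLD = 0.95


def accuracy(classify: Callable[[str], str],
             labeled: Iterable[Tuple[str, str]]) -> float:
    """Fraction of labeled examples for which classify() returns the expected tag."""
    examples = list(labeled)
    if not examples:
        raise ValueError("labeled set is empty")
    correct = sum(1 for text, expected in examples if classify(text) == expected)
    return correct / len(examples)


def ready_for_production(classify: Callable[[str], str],
                         labeled: Iterable[Tuple[str, str]],
                         threshold: float = PRODUCTION_THRESHOLD) -> bool:
    """Intended to be re-run after every prompt or model change to catch regressions."""
    return accuracy(classify, labeled) >= threshold


# Hypothetical stand-in for a prompt + LLM call; a real version would call the model.
def keyword_classifier(text: str) -> str:
    return "refund" if "refund" in text.lower() else "other"


# Invented examples, for illustration only.
labeled_set = [
    ("I want a refund for my order", "refund"),
    ("Where is my package?", "other"),
    ("Please refund me", "refund"),
    ("I'd like my money back", "refund"),  # missed by the keyword stand-in
]

score = accuracy(keyword_classifier, labeled_set)
is_ready = ready_for_production(keyword_classifier, labeled_set)
print(f"accuracy={score:.2f}, ready={is_ready}")
```

Keeping the labeled set fixed and re-running the check whenever the prompt or the underlying model changes is one way to notice a regression before it reaches customers.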
Because the AI tech stack was emerging and developer tooling was nascent, fixing LLM accuracy issues was prohibitively time-consuming. Echo AI knew that they needed more powerful tooling or their ability to onboard new customers would stall.
Echo AI turned to Log10, which provides an end-to-end LLMOps workflow that supports logging, prompt engineering and optimization, enabling engineers to solve LLM accuracy issues within a single environment.
With just one line of integration, Echo AI engineers began logging and tagging their LLM calls to Log10’s platform. In contrast to other solutions on the market, Log10 could handle direct LLM calls to multiple providers as well as to frameworks such as LangChain without the added latency of a proxy.
from log10.load import OpenAI

# Tags are attached to the calls made through this client so the logs can be filtered later.
client = OpenAI(tags=["customer/activision", "use-case/customer-support"])

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a customer support representative for Activision.",
        },
        {
            "role": "user",
            "content": "How can I reset my password if I no longer have access to the email address associated with my account?",
        },
    ],
    temperature=0,  # reduce sampling randomness
)
print(response)
When a prompt needed improvement, Echo AI engineers could filter and search millions of logs in seconds to isolate a sample set of LLM call logs, open them in a collaborative Playground, and start improving the prompt using AI-powered suggestions along with a rich array of models and hyperparameters.
Whereas previously it took multiple days to improve a prompt, with Log10's workflow Echo AI engineers could optimize a prompt for production in less than 1 hour.
Identifying an opportunity to make use of Echo AI’s massive trove of logs, Log10 built a Prompt Engineering Copilot to further boost accuracy and efficiency.
The copilot continuously analyzed Echo AI’s logs, testing for ways to improve accuracy via prompt engineering, and surfaced optimizations to solutions engineers when performance thresholds were crossed. On critical tasks, the copilot improved accuracy by 10-20 F1 points, resulting in greater user satisfaction and trust in Echo AI’s offering.
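For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, and a "point" is one hundredth of the 0 to 1 scale. The sketch below computes F1 for a single binary tag and the gain between two prompt versions. It is illustrative only: the function name and the prediction lists are assumptions made up for this example, not Echo AI data or Log10’s implementation.

```python
from typing import Sequence


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 for one binary tag: harmonic mean of precision and recall."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if true_pos == 0:
        return 0.0  # no correct positives, so precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Invented labels and predictions, for illustration only.
actual = [True, True, True, True, False, False, False, False]
before = [True, True, False, False, True, False, False, False]  # original prompt
after = [True, True, True, False, True, False, False, False]    # revised prompt

f1_before = f1_score(before, actual)
f1_after = f1_score(after, actual)
gain_points = (f1_after - f1_before) * 100
print(f"F1 before={f1_before:.3f}, after={f1_after:.3f}, gain={gain_points:.1f} points")
```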
With the copilot, Echo AI’s Solutions Engineering team was able to onboard a 10x influx of customers without increasing the size of the team.
As part of a new guard forming The Modern AI Stack, Log10 is bringing state-of-the-art tooling that solves net-new use cases to the marketplace. With Log10’s end-to-end LLMOps workflow, Echo AI efficiently resolved LLM accuracy issues and quickly moved enterprise customers with 6-figure contracts from prototypes to production.