Case Study

Powered by Log10: How Echo AI Conquered LLM Accuracy Issues and Deployed GenAI at Enterprise Scale

This case study explores the challenges Echo AI faced when deploying Large Language Model (LLM) applications at enterprise scale. It also describes how Log10’s end-to-end AI-powered LLMOps workflow enabled Echo AI to quickly move enterprise customers with 6-figure contracts from prototypes to production by resolving critical LLM accuracy issues.

Echo AI Revolutionizes Conversation Intelligence with New GenAI Use Cases

Echo AI is a Conversation Intelligence Platform that uses generative AI to review customer conversations and uncover hidden insights. Companies such as StitchFix, CLEAR, and Wine Enthusiast can now go from manually reviewing 1% of conversations to understanding 100% of what their customers are saying, and they can ask and answer nuanced questions about their business that might never have occurred to them before. This is a transformative shift and a net-new use case.

A single conversation might get analyzed 50 different ways, with 50 unique API calls to 10 different LLMs, producing millions of LLM outputs over hundreds of thousands of customer interactions. Echo AI has built a powerful analysis platform that classifies these outputs to drive actionable business insights and trigger downstream workflows.

Echo AI analyzes every single conversation across every channel with human-level depth and gives leaders the answers to the critical strategic questions that drive growth and retention.

LLM Accuracy Issues Impacted Echo AI’s Ability to Deploy at Scale

A major challenge with LLMs today is accuracy: LLMs are notoriously error-prone and can produce different outputs even when given the same prompt. For Echo AI, this had downstream consequences whenever outputs were tagged erroneously. For example, Echo AI’s customers made financial decisions, such as providing refunds, based on tagging information, so it was critical that LLM outputs were highly accurate and therefore correctly tagged.

From escalations to supply chain issues, Echo AI’s powerful tagging abilities track anything with just a simple prompt.

Making LLM outputs more accurate began with the initial LLM prompt, which often needed to be tailored through an iterative prompt engineering process to produce reliable results. Echo AI engineers found that initial prompts could start out with an accuracy as low as 40-50%, and that raising it to the 95%+ required for production was an arduous journey. The process typically required several days of effort to set up, with ongoing manual monitoring that could last indefinitely. They had to:

  • Write SQL queries in their data warehouse to pull out the LLM call logs of interest.
  • Copy and paste these prompts into the OpenAI playground or into a Jupyter notebook.
  • Iterate on prompts and hyperparameters.
  • Share results by copying and pasting into a spreadsheet so others on the team (PM, engineering, executives) could review and sign off on changes.
  • Reply to their customers with updated prompts.

If the underlying models changed, accuracy could regress, and they would have to start the cycle all over again on prompts that they had already tuned.
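To make the accuracy figures above concrete, here is a minimal sketch of how a prompt could be scored against a small human-labeled set and compared with the 95% production bar. Everything in it is an illustrative assumption rather than Echo AI's or Log10's actual tooling: the `keyword_classifier` function is a hypothetical stand-in for a prompt plus LLM call, and the labeled examples are invented.

```python
from typing import Callable, List, Tuple

PRODUCTION_THRESHOLD = 0.95  # the 95%+ bar mentioned in the text


def accuracy(classify: Callable[[str], str],
             labeled: List[Tuple[str, str]]) -> float:
    """Fraction of examples whose predicted tag matches the human tag."""
    if not labeled:
        raise ValueError("labeled set is empty")
    correct = sum(1 for text, expected in labeled if classify(text) == expected)
    return correct / len(labeled)


def ready_for_production(classify: Callable[[str], str],
                         labeled: List[Tuple[str, str]],
                         threshold: float = PRODUCTION_THRESHOLD) -> bool:
    """True when measured accuracy meets or exceeds the threshold."""
    return accuracy(classify, labeled) >= threshold


# Hypothetical stand-in for "prompt + LLM call"; a real check would call the model.
def keyword_classifier(text: str) -> str:
    return "refund_request" if "refund" in text.lower() else "other"


# Invented examples: (conversation text, human-assigned tag).
LABELED = [
    ("I want a refund for my order", "refund_request"),
    ("Please send my money back", "refund_request"),
    ("Where is my package?", "other"),
    ("Can I get a refund?", "refund_request"),
]

score = accuracy(keyword_classifier, LABELED)
is_ready = ready_for_production(keyword_classifier, LABELED)
print(f"accuracy={score:.2f} ready={is_ready}")
```

Re-running the same check after a model change is one simple way to catch the kind of regression described above before it reaches customers.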

Because the AI tech stack was emerging and developer tooling was nascent, fixing LLM accuracy issues was prohibitively time-consuming. Echo AI knew that they needed a more powerful tooling solution or their ability to onboard new customers would stall.

“We were signing 6-figure contracts, so scaling LLM accuracy from prototype to production became critical to our ability to grow rapidly.”

Alexander Kvamme – CEO, Echo AI

Using Log10’s End-to-End LLMOps Workflow, Echo AI Engineers Rapidly Solved LLM Accuracy Issues

Echo AI turned to Log10, which provides an end-to-end LLMOps workflow that supports logging, prompt engineering and optimization, enabling engineers to solve LLM accuracy issues within a single environment.

With just one line of integration, they began logging and tagging their LLM calls to Log10’s platform. In contrast to other solutions on the market, Log10 could handle direct LLM calls to multiple providers as well as to frameworks such as LangChain without the added latency of a proxy.

from log10.load import OpenAI

# Tags passed to the client label the logged calls,
# so logs can later be filtered by customer or use case.
client = OpenAI(tags=["customer/activision", "use-case/customer-support"])
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a customer support representative for Activision.",
        },
        {
            "role": "user",
            "content": "How can I reset my password if I no longer have access to the email address associated with my account?",
        },
    ],
    temperature=0,  # lower sampling randomness for more repeatable outputs
)

print(response)

Log10 is easy to integrate. With just one line of code, all your LLM calls are captured, whether directly using the provider (OpenAI, Anthropic, Together, MosaicML, etc.) library, or via 3rd-party frameworks such as LangChain or magentic.

When a prompt needed improvement, Echo AI engineers could filter and search millions of logs in seconds to isolate a sample set of LLM call logs, open them in a collaborative Playground, and start improving the prompt using AI-powered suggestions along with a rich array of models and hyperparameters.
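The idea of isolating a sample set by tag can be illustrated in plain Python. This is only a sketch over in-memory records and does not use or represent Log10's search API: the log record shape (`id`, `tags`, `output`) and the helper names `filter_by_tags` and `sample_logs` are hypothetical, though the tag strings mirror the integration example above.

```python
import random
from typing import Dict, List


def filter_by_tags(logs: List[Dict], required_tags: List[str]) -> List[Dict]:
    """Keep only logs that carry every one of the required tags."""
    required = set(required_tags)
    return [log for log in logs if required.issubset(log.get("tags", []))]


def sample_logs(logs: List[Dict], k: int, seed: int = 0) -> List[Dict]:
    """Draw up to k logs at random; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(logs, min(k, len(logs)))


# Invented log records for illustration only.
LOGS = [
    {"id": 1, "tags": ["customer/activision", "use-case/customer-support"], "output": "refund_request"},
    {"id": 2, "tags": ["customer/activision", "use-case/sales"], "output": "other"},
    {"id": 3, "tags": ["customer/other", "use-case/customer-support"], "output": "escalation"},
    {"id": 4, "tags": ["customer/activision", "use-case/customer-support"], "output": "other"},
]

matched = filter_by_tags(LOGS, ["customer/activision", "use-case/customer-support"])
review_set = sample_logs(matched, k=10)
print([log["id"] for log in matched])
```

A reproducible sample matters here because reviewers comparing two prompt versions should be looking at the same set of calls.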

Whereas previously it took multiple days to improve a prompt, with Log10's workflow Echo AI engineers could optimize a prompt for production in less than 1 hour.

Log10 provides a streamlined end-to-end workflow to find and fix inaccurate LLM outputs.

“Log10’s streamlined workflow enabled us to scale to millions of LLM calls without compromising on accuracy.”

Trey Doig – CTO, Echo AI

Log10 Responded to Echo AI’s Needs with Prompt Engineering Copilot that Systematically Boosts Accuracy at Scale

Identifying an opportunity to make use of Echo AI’s massive trove of logs, Log10 built a Prompt Engineering Copilot to further boost accuracy and efficiency.

The copilot continuously analyzed Echo AI’s logs, testing for ways to improve accuracy via prompt engineering, and surfaced optimizations to solutions engineers when performance thresholds were crossed. On critical tasks, the copilot improved accuracy by 10-20 F1 points, resulting in greater user satisfaction and trust in Echo AI’s offering.
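For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, and an "F1 point" is one hundredth of the score. The sketch below computes F1 for a single binary tag before and after a prompt change. The predictions and labels are invented toy data chosen only to show the arithmetic, not Echo AI's results, and the function is a generic implementation rather than the copilot's method.

```python
from typing import List


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """F1 for one binary tag: 2*TP / (2*TP + FP + FN)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))        # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))    # false positives
    fn = sum(not p and a for p, a in zip(predicted, actual))    # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: whether each of 8 conversations truly carries the tag.
ACTUAL = [True, True, True, True, False, False, False, False]
# Tags produced by a hypothetical original prompt and a revised prompt.
BEFORE = [True, True, False, False, True, False, False, False]
AFTER = [True, True, True, False, True, False, False, False]

f1_before = f1_score(BEFORE, ACTUAL)
f1_after = f1_score(AFTER, ACTUAL)
gain_points = (f1_after - f1_before) * 100
print(f"before={f1_before:.3f} after={f1_after:.3f} gain={gain_points:.1f} points")
```

F1 is often preferred over raw accuracy for tagging tasks because it is not inflated when the tag of interest is rare.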

Log10’s Prompt Engineering Copilot provides AI-powered assistance to rapidly improve the accuracy of prompts and models.

With the copilot, Echo AI’s Solution Engineering team was able to onboard a 10x influx of customers without increasing the size of their team.

“Log10 dug deep to understand our customers and delivered straight-from-research solutions that moved the needle.”

Emiliano Colosimo – Head of Solutions and Support, Echo AI

Log10 Empowered Echo AI to Deliver GenAI Apps at Enterprise Scale

As part of a new guard forming The Modern AI Stack, Log10 is bringing state-of-the-art tooling that solves net-new use cases to the marketplace. With Log10’s end-to-end LLMOps workflow, Echo AI efficiently resolved LLM accuracy issues and quickly moved enterprise customers with 6-figure contracts from prototypes to production.

  • Engineers optimized prompts to 95%+ accuracy in hours vs. days.
  • Teams collaboratively solved prompt improvement issues within a single environment while also providing visibility to product and executive stakeholders.
  • Solution Engineers onboarded 10x more customers without increasing their team size.
  • Log10’s Prompt Engineering Copilot systematically improved accuracy on critical, high-stakes tasks by leveraging customer log data.
  • Echo AI’s Conversation Intelligence platform scaled to millions of LLM calls without compromising on quality.

“Log10 is a critical part of our stack – we could not have scaled LLM accuracy to serve our enterprise customers without them.”

Alexander Kvamme – CEO, Echo AI
