Boosting Confidence and Reducing Churn in AI Copilots

This case study examines how OpenBB significantly improved the performance and reliability of its AI-powered Copilot for financial analysts after incorporating Log10's advanced observability tools.

Situation

OpenBB is transforming investment research in the financial services sector by providing an AI-powered Copilot that serves as a trusted partner for financial analysts. Analysts can leverage any data that the OpenBB Terminal Pro platform offers or augment it with their own sources, and then utilize AI to uncover new insights, giving them a competitive advantage in the industry.

OpenBB users have the flexibility to bring their data in various formats — such as CSV/JSON/PDF files, API endpoints, RSS feeds, or data from a data warehouse like Snowflake — and combine it with the vast range of data offered by the company. With the OpenBB Copilot, analysts can ask complex questions about financial data using natural language and receive detailed answers instantly. Copilot covers a wide range of areas including risk management, corporate finance, financial modeling, financial metrics, behavioral finance, and more.

Copilot is a financial analyst’s AI-powered investment research partner. Analysts can discuss an investment thesis with it because it operates across a wealth of financial data, both provided by OpenBB and drawn from the analyst’s own files.

Problem

“One thing I spend a lot of time sweating over is, will someone try Copilot and have their first three interactions be bad, and they conclude it’s just like every other rubbish AI product and they don’t come back,” says Michael Struwig, Head of AI for OpenBB.

While OpenBB’s Copilot uses the most advanced Large Language Models (LLMs) in the world, LLMs have challenges with accuracy that diminish the user experience. Prompts can yield erroneous results, and the same prompt can produce different results from one run to the next. In addition, a blank text input box can mislead users into expecting a coherent answer to any question, no matter what they ask or how they phrase it.

When asked about this, Didier Rodrigues Lopes, OpenBB founder and CEO, said: “It’s even deeper than that. Even if Copilot can answer the user question, how do they know that the answer is accurate? This is why having the widgets with the data that was used by Copilot to reply visibly on the dashboard is critical for our financial terminal.”

Given the number of users interacting with the OpenBB Terminal Pro platform every day, it was not feasible to ask each user about their experience with Copilot. “Lacking visibility into our Copilot’s real-world operation was like flying blind,” says Michael. “We couldn’t know if the product was hitting the mark or not, much less fix things that were wrong or plan new features and enhancements.”

The Log10 Solution

Didier and Michael realized they needed an observability tool that would provide visibility into the functioning of their LLMs as well as into users’ interactions with Copilot. They looked at three or four analytics logging companies that specialized in LLMs before choosing to work with Log10. “It was clear that the folks at Log10 had thought deeply about the problem of visibility from all the angles we needed,” says Michael.

Log10 integrates easily with a variety of LLM providers such as OpenAI, Anthropic, Gemini, Mistral, Together, Mosaic, and self-hosted solutions. It works well with agentic LLM frameworks without enforcing heavy abstractions. And it interacts directly with provider libraries, avoiding the need to use proxies, which increase latency and can cause reliability issues.

With just a single line of integration, OpenBB was operational on the Log10 platform. “Initially, we were using the LangChain LLM framework, but when Copilot development necessitated remote function calling, we transitioned to the Magentic framework,” says Michael. “The Log10 team was highly responsive, adding support for Magentic within two days, enabling us to maintain our momentum.”

“Ever since then it’s been a dream. Whenever we run into bugs — I always seem to find the most obscure stuff — I report it and the Log10 folks have it fixed the next day,” says Michael.

Logging all LLM calls through Log10, and having the ability to open calls in playgrounds, allows the OpenBB team to collaboratively debug Copilot’s complex agentic workflows. Because Log10 tracks when chains start and end, OpenBB can filter entire conversations. When a teammate flags a problem, “I don’t need to reproduce issues,” Michael says. “I just query the logs, look at the conversation, and see things like widgets not getting passed through properly.” The team even began debugging other parts of the codebase by injecting new elements into the system prompt along with the context of the conversation.

Logs for each request show details about the prompt and completion, usage metrics, status information, analytics information, hyperparameters and cost – as well as debugging information such as session scope, prompt provenance and call stack information.

Log10’s powerful tagging system enables the team to focus on specific flows to understand the detailed user experience. “We can look at all the cases where a user was interacting with documents,” explains Michael. By tagging the logs based on specific usage, such as URLs or PDFs, they can drill down into particular interactions. “This is something we couldn't really do before,” Michael continues. “And it’s really important because it means we can get good at all these different user flows.” When users encounter specific issues or use the product in new and interesting ways, OpenBB can reach out and schedule an interview.

Clicking a tag filters the list of logs to a specific usage, such as URLs or PDFs, so that user interactions with those document types can be explored.
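The tag-based drill-down described above can be illustrated with a minimal sketch. It assumes each logged call carries a session identifier and a list of tags. The `LogRecord` shape and both helper functions are hypothetical and are not Log10's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """Hypothetical shape of one logged LLM call (not Log10's actual schema)."""
    session_id: str
    prompt: str
    completion: str
    tags: list = field(default_factory=list)


def filter_by_tags(records, required_tags):
    """Return the records that carry every tag in required_tags."""
    required = set(required_tags)
    return [r for r in records if required.issubset(r.tags)]


def group_by_session(records):
    """Group records into conversations keyed by session_id, preserving order."""
    sessions = {}
    for record in records:
        sessions.setdefault(record.session_id, []).append(record)
    return sessions


# Illustrative data only; the prompts and tags are made up for this sketch.
logs = [
    LogRecord("s1", "Summarize this filing", "...", tags=["pdf"]),
    LogRecord("s1", "What are the risk factors?", "...", tags=["pdf"]),
    LogRecord("s2", "Summarize this article", "...", tags=["url"]),
]

pdf_logs = filter_by_tags(logs, ["pdf"])
pdf_sessions = group_by_session(pdf_logs)
```

Grouping the filtered records by session keeps each conversation together, so a reviewer could read a whole PDF-related exchange rather than isolated calls.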

Given that observability is crucial for deploying new features into production, OpenBB frequently has time-sensitive support requests. “The Log10 team promptly adds new features and capabilities as fast as we need them implemented,” says Michael. Recent new capabilities include support for LLM-based vision models and asynchronous streaming; comparisons of model providers with cost-latency vs. accuracy reports; and feedback collection from non-programmers on the team, such as project managers and domain experts in finance, to measure quality and build online accuracy metrics.

“If we invent something new or need a new capability, Log10 just rolls it in,” says Michael. “We've not had to delay features, which is so good.”

With the Feedback UI, teams of human reviewers can easily add feedback to LLM completions using a JSON-based task schema that enables just about any criteria they can imagine.
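To make the idea of a JSON-based task schema concrete, here is a rough sketch of what such a schema and a feedback check might look like. The schema layout, the criterion names, and `validate_feedback` are all assumptions made for illustration. The source does not describe Log10's actual schema format.

```python
import json

# Hypothetical task schema: each criterion declares a type and its allowed values.
TASK_SCHEMA = json.loads("""
{
  "task": "summary_review",
  "criteria": {
    "accuracy": {"type": "integer", "min": 1, "max": 5},
    "relevance": {"type": "integer", "min": 1, "max": 5},
    "verdict": {"type": "choice", "options": ["accept", "reject"]}
  }
}
""")


def validate_feedback(schema, feedback):
    """Return a list of problems; an empty list means the feedback fits the schema."""
    problems = []
    for name, rule in schema["criteria"].items():
        if name not in feedback:
            problems.append(f"missing criterion: {name}")
            continue
        value = feedback[name]
        if rule["type"] == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{name}: expected an integer")
            elif not rule["min"] <= value <= rule["max"]:
                problems.append(f"{name}: {value} outside {rule['min']}-{rule['max']}")
        elif rule["type"] == "choice":
            if value not in rule["options"]:
                problems.append(f"{name}: {value!r} not in {rule['options']}")
    for name in feedback:
        if name not in schema["criteria"]:
            problems.append(f"unknown criterion: {name}")
    return problems


good = {"accuracy": 4, "relevance": 5, "verdict": "accept"}
bad = {"accuracy": 9, "verdict": "maybe"}
```

Because the criteria live in data rather than code, a reviewer team could add or change criteria by editing the JSON alone, which is the flexibility the text attributes to a schema-driven approach.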

OpenBB’s product roadmap is packed with first-to-market features that are dependent on advanced LLMOps tooling such as Log10’s AutoFeedback system. OpenBB can continuously improve the accuracy of its LLM outputs through this system, which scales human review using proprietary AI to provide overall monitoring and quality alerts, along with curated datasets. “With these metrics and feedback, we can do A/B testing and fine-tune models,” says Michael.

In this summarization use case, AutoFeedback scales human review by grading news summaries on four key metrics, using proprietary AI to mimic human feedback.

Results

By delivering essential observability and being highly responsive to OpenBB's development needs, Log10 has become a trusted partner. “The confidence Log10 gives us when we ship a new version of our financial analyst Copilot is invaluable,” says Didier. “Log10 has been critical in enabling us to react quickly to customer feedback and improve our product.”

Now that they can truly observe the performance of the OpenBB Copilot, the OpenBB team can:

  • Understand user intent and psychology by seeing their initial queries and follow-up prompts.

  • Identify where the model was providing inaccurate or irrelevant responses.

  • Debug issues quickly by searching logs for specific user interactions.

  • Compare model performance across different providers to optimize for accuracy and cost.

  • Leverage Log10's flexible tagging to zoom in on specific user flows and journeys.

In addition, with Log10’s AutoFeedback, which labels every Copilot LLM completion with AI-generated feedback that mimics human review, the OpenBB team can:

  • Build metrics that characterize the overall customer experience.

  • Monitor quality and generate alerts.

  • Detect quality issues and route them to engineering for resolution.

  • Perform A/B testing on prompts and models.

  • Create datasets to fine-tune prompts and models.
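As a rough illustration of how per-completion feedback scores could feed quality monitoring and A/B comparisons like those listed above, the sketch below aggregates scores by prompt variant and flags any variant whose mean falls below a threshold. The variant labels, the 1-to-5 scale, the threshold, and `summarize_variants` are all hypothetical. None of them come from Log10 or OpenBB.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (variant label, feedback score on an assumed 1-5 scale).
scored_completions = [
    ("prompt_a", 4), ("prompt_a", 3), ("prompt_a", 5),
    ("prompt_b", 2), ("prompt_b", 3), ("prompt_b", 4),
]


def summarize_variants(records, alert_threshold=3.5):
    """Aggregate scores per variant and flag variants below alert_threshold."""
    by_variant = defaultdict(list)
    for variant, score in records:
        by_variant[variant].append(score)
    summary = {}
    for variant, scores in by_variant.items():
        average = mean(scores)
        summary[variant] = {
            "n": len(scores),
            "mean_score": average,
            "alert": average < alert_threshold,
        }
    return summary


summary = summarize_variants(scored_completions)
```

A comparison of means on a handful of samples is only a starting point. A real A/B test would need enough traffic per variant and a significance check before one prompt is declared better than another.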

Ultimately, Log10 enables the OpenBB team to actively manage and reduce customer churn caused by poor LLM experiences. “Instead of a few centimeters of visibility, Log10 delivers a long-distance, high-definition view of both our models and our users’ interactions with Copilot,” says Michael.

By incorporating AutoFeedback, which combines proprietary AI with human insight, OpenBB can make Copilot exceptionally accurate and user-friendly, creating an application that delivers resounding value to its financial services users.

“To deliver on the promise of OpenBB, our Copilot accuracy needs to be impeccable,” says Didier. “We’re using AI to build a personalized, trusted financial research analyst, and Log10 is key to making that happen.”