FinChat v. LLMs For Finance
FinChat scores highest among the large language models we tested on FinanceBench
Conversational AI has taken the world by storm. From solving mundane tasks to answering complex prompts, AI is becoming increasingly prevalent across different fields.
But different industries require different competencies from an LLM. In finance, for example, data accuracy is of the highest importance. If an LLM's response contains inaccurate data, the investor relying on it could end up making misinformed decisions.
So to see which AI is truly the best for answering financial questions, we put each one to the test:
Introducing FinanceBench
FinanceBench is an open-source dataset developed by Patronus AI, a company focused on evaluating LLM mistakes.
To build FinanceBench, researchers at Patronus and Stanford teamed up with 15 finance industry experts to compile a list of more than 10,000 questions along with the correct answers for each one.
To evaluate which AI is best for financial questions, we tested the 3 leading general LLMs (ChatGPT-4o, Claude2, and Llama2) against the top 100 questions from FinanceBench.
Here's how each of them scored:
Claude2: 37%
ChatGPT-4o: 31%
Llama2: 19%
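For readers who want to run a similar comparison, here's a minimal sketch of the scoring arithmetic. It assumes each answer is graded simply as correct or incorrect against the FinanceBench reference answer. The function name and the grading list are illustrative only and are not part of FinanceBench or our test setup.

```python
def accuracy_percent(graded):
    """Return the share of correct answers as a percentage.

    `graded` is a list of booleans, one per question:
    True if the model's answer matched the reference answer.
    """
    if not graded:
        raise ValueError("no graded answers")
    return 100 * sum(graded) / len(graded)


# Hypothetical grading results for a 100-question run:
# 37 correct answers and 63 incorrect ones.
example_run = [True] * 37 + [False] * 63
print(f"{accuracy_percent(example_run):.0f}%")
```

With 100 questions, each correct answer is worth exactly one percentage point, so a score of 37% means 37 questions answered correctly.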
Then we put FinChat Copilot to the test.
Here's how it compares:
Why does FinChat perform better?
There are several reasons FinChat performs 2x-3x better than the general LLMs on financial queries, but it primarily comes down to three things.
Domain-Specific Knowledge: Detecting and interpreting finance-specific language isn't easy and requires specific training. This is a big differentiator for FinChat.
FinChat Copilot has access to transcripts, filings, financials, company presentations, real-time news, and more, and it is engineered to answer questions specifically around equity research.
General LLMs, on the other hand, are trained on everything but aren't particularly good at handling questions relating specifically to finance.
Up-to-Date Data: While general-purpose LLMs are able to get some answers right by scouring public websites, they also often pull incorrect data or out-of-date metrics.
FinChat, on the other hand, maintains a best-in-class real-time financial database from which it can extract specific numbers with institutional-grade accuracy. This drastically reduces the number of data hallucinations relative to other LLMs.
Better Ingestion of Structured Data: Financial questions often require responses that blend qualitative insights with quantitative numbers.
Since FinChat has been trained to pull from a variety of sources (conference call transcripts, financial tables, analyst estimates, etc.), it's able to provide more nuanced and valuable answers.
What did FinChat miss?
In the cases where FinChat was unable to provide the correct response, the most common root cause was an inability to access the proper sources. FinChat Copilot is already in the process of remedying this.
Despite already being the highest-performing conversational AI for financial queries globally, FinChat is actively pulling in additional data sources such as 8-K filings, proxy statements, and fixed income data, which should continue to raise its score and ultimately allow it to serve even more user queries.
Additionally, FinChat provides a variety of sources within each response so the user can easily click into the primary documents to find any additional context that might be helpful.