Why tech firms are courting Fidelity’s vast trove of data

As tech companies across the world race to create AI services akin to ChatGPT, the underlying raw material required – data – is suddenly in demand like never before. Fidelity Investments is a case in point: tech start-ups and conglomerates alike are courting the wealth management firm to lay their hands on its vault of financial services data, information chief Mihir Shah said in an interview.

For companies seeking to build AI systems for the finance industry, Fidelity’s decades' worth of online transaction records, customer call transcripts and face-to-face client interaction reports would be a treasure trove. It holds about 8 petabytes of data – equivalent to trillions of pages of printed text.

The US investment company, which oversees more than $11 trillion and has tens of millions of customers, hasn’t engaged with any of the suitors, said Shah, who is leading an effort to harvest value from Fidelity’s data. The firm has considered building its own AI model, although it hasn’t decided whether to go that route. Any data it shared would be anonymised and scrubbed of personal information in keeping with the best security practices.

Services such as ChatGPT are based on large language models, or AI systems that analyse vast quantities of writing from across the internet and other sources to determine how to generate human-sounding text. The technology has spurred excitement across industries as companies seek ways to reduce costs and better serve their customers – with banks from JPMorgan Chase to Morgan Stanley among those leading the way.

ChatGPT creator OpenAI, which is backed by Microsoft, is among the tech leaders in the field, along with Alphabet and Meta Platforms. They mostly use the same public data to train their systems to understand and generate text or code in a human-like fashion.

But proprietary data, such as that owned by Fidelity, would enable an AI service to stand out from the competition, said Shah, who started at Fidelity 29 years ago and oversaw the building of its website – the first for a major financial services company. He’s now directing the creation of Fidelity’s companywide cloud-based warehouse for its data, part of an effort to put it to better use.

“The differentiation will be in combining first-party data with public data to have a vertical large language model for financial services,” Shah, who is based in Boston, said via video.

“We’ve already seen vertical LLMs coming up in scientific research and health care industries.”

A large language model’s value depends largely on the amount and quality of the data it’s trained on. Massive amounts of text, images, sound and other information are required to make the AI models learn patterns and relationships, so they can then generate content based on them.

Fidelity’s data is deemed so attractive that some suitors have proposed building an AI system for the company for free, in exchange for collaboration, Shah said. Much of Fidelity’s data is relatively current, having been saved over the past seven years in line with the latest compliance requirements, he said. Fidelity has more than 42 million customers, and it manages retirement plans and other benefit programmes for tens of thousands of businesses.

As Fidelity decides on how to utilise the data, it needs to take into account AI systems’ challenges such as reliability, bias, and how personally identifiable information is processed, Shah said. Meanwhile, the company is taking steps to tighten its security infrastructure and adding further restrictions on who can access the data, he said.

“We are exercising extreme caution with these new tools,” Shah said. “With generative AI, you can’t fully trust the results.”
