Join us at Microsoft Ignite in Chicago, Nov 19-21. Booth 614! ->->

Blog / Community

NYC Meetup Report: Large Scale Financial Market Analytics with ClickHouse (Bloomberg)

author avatar
ClickHouse Editor
Dec 6, 2022 - 5 minutes read

On December 6, 2022, Baudouin Giard, Engineering Team Lead on the Structured Products team at Bloomberg, presented his team's use case for ClickHouse at the NYC meetup.

The financial sector requires advanced data management and analysis capabilities to provide accurate and timely information to institutional investors. Bloomberg, as a leading data provider for financial markets, faces unique challenges when it comes to optimizing data ingestion and query performance. ClickHouse's columnar storage, ease of use, and ability to write complex calculations with simple mathematics make it an excellent choice for Bloomberg’s use case.

Bloomberg's products include the Bloomberg Terminal and some enterprise APIs that provide the best data continuously to institutional investors. The company's challenge is to optimize for a relatively small number of people querying about a hundred million securities all at once.

"What I have to optimize for is not a hundred million people connecting to Bloomberg all at once. What I have to optimize for is one single person querying data, about a hundred million securities all at once. And what if then a thousand people do the same thing?” said Giard. “People want to get a big view over the entirety of the market, and they want to compare what they're doing to the average of the market.”

Giard's team has been using ClickHouse for two years, and the specific application on the Bloomberg Terminal leveraging ClickHouse is one of their most worked-on projects. The data set is relatively small, with only 5 billion rows and 100 columns, and the usage pattern is pretty basic. Users filter and group columns, and the columns are selected to perform interactive queries that require them to iterate on their queries and perform somewhat complex calculations.

Bloomberg  highlights.png

Giard explained that ClickHouse is an excellent choice for their use case because of its columnar storage, which allows users to query only the necessary columns and not use excessive memory.

“The columnar storage is a very big asset when you have a data set like this one with a hundred columns. You quickly find out that people are only interested in probably 10 of them, but every now and then there is a person who wants to query the hundredth one. So you want to have it available, but you don't want to have a full memory data store that's going to use a ton of memory for all these columns that people use once in a blue moon,” said Giard.

Additionally, ClickHouse is easy to use and allows users to write complex calculations with simple mathematics.

Bloomberg strengths.png

The team's approach to optimizing data ingestion, rather than the speed of data queries, has helped them ingest four terabytes of data in under an hour. Giard created a new ClickHouse table every time they needed to make changes to the data, such as when adding a new column. The use of a buffer table helped to ensure that all the data was in memory, making the process extremely flexible.

Data quality is crucial, and Giard emphasized the importance of updating the data quickly to ensure that clients get accurate data. The use of buffer tables and creating a new table for every change made the process flexible, allowing for quick data updates and accurate data for clients.

“It's very important to me because when you are in the data business, and especially financial data, your edge, what makes you good, is the quality of your data,” said Giard.

Giard also discussed how ClickHouse works with immutable files through the merge tree table, which allows users to clone a table instantly when appending new data to it. However, every query that the client performs puts stress on ZooKeeper, which could cause the entire application to go down if ZooKeeper goes down. Therefore, the team has been testing disaster recovery simulations to ensure that the process works.

Bloomberg currently uses ClickHouse on-premises on the default setup of the machine provided by the company, and they use local storage for data serving to clients. However, Giard would like to experiment with deploying ClickHouse on Kubernetes clusters, using their internal S3 backend as storage, and trying ClickHouse Keeper.

Baudouin Giard's presentation on Bloomberg's unique use case for ClickHouse provides valuable insights for other companies in the financial sector. Giard's team's approach to optimizing data ingestion, rather than the speed of data queries, has helped them efficiently ingest vast amounts of data while ensuring data quality. ClickHouse's columnar storage, ease of use, and ability to write complex calculations with simple mathematics make it an excellent choice for this use case.

More Details

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...
Follow us
Twitter imageSlack imageGitHub image
Telegram imageMeetup imageRss image