We're thrilled to announce that the first ClickHouse research paper has been accepted and is now published at VLDB.
VLDB, the International Conference on Very Large Data Bases, is widely regarded as one of the leading conferences in the field of data management. It typically receives hundreds of submissions and accepts roughly 20% of them.
This year's conference, VLDB 2024, was held in Guangzhou, China, and marked its 50th anniversary, making VLDB one of the longest-running data management conferences.
The conference featured 250 paper presentations and 10 accompanying workshops on the latest research and industry trends.
This year's dominant topic was machine learning in all its shapes and forms, but many papers also covered core database areas such as query engines, storage, and database theory.
A sneak peek into the ClickHouse paper
Our publication is the culmination of a months-long, cross-functional effort to give readers a concise description of the most interesting architectural and system design components that make ClickHouse so fast. And now, for the very first time, it's available.
In the paper, you'll learn about:
The history of ClickHouse
When were major features described in this paper introduced to ClickHouse, and what features and enhancements are planned for the future?
The architecture of ClickHouse
Layers, components, and execution modes.
The storage layer of ClickHouse
On-disk format, data pruning techniques, merge-time data transformations, updates and deletes, idempotent inserts, data replication, and ACID compliance.
The query processing layer of ClickHouse
SIMD parallelization, multi-core parallelization, multi-node parallelization, and performance optimization techniques.
The integration layer of ClickHouse
Native support for 90+ file formats and 50+ integrations with external systems.
Benchmarks
Performance comparison of ClickHouse with other databases frequently used for analytics. Note: lower is better.
ClickHouse at VLDB 2024
Paper presentation
Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper last week in Guangzhou (slides here), followed by a Q&A (that quickly ran out of time!). You can catch the recorded presentation here:
Poster presentation
In addition to the paper presentation, authors of accepted VLDB papers were asked to give a poster presentation.
Bonus meetup talk
And as luck would have it, we also hosted a ClickHouse Guangzhou User Group Meetup just a few days before VLDB. At that meetup, we presented an extended version (slides here) of Alexey's conference talk:
From coast to coast: the journey of our first research paper
We conclude with a bonus section for readers curious about the backstory of our first research paper.
After ClickHouse was open-sourced in 2016, its popularity grew and the pace of development accelerated. Over the past eight years, the ClickHouse team has been so focused on building the world's fastest analytics database that there was no time to publish an academic paper about it.
That changed in October 2023, during a company offsite on the stunning Mediterranean coast of the French Riviera, when Tanya Bragin, our VP of Product and Marketing, raised the idea of finally writing a foundational paper on ClickHouse and submitting it to VLDB. This year, the conference took place in Guangzhou, China, in Guangdong province on the north shore of the South China Sea.
We quickly put together a small team of authors. While some of us had already written research papers as PhD students, others were new to academic writing. An intensive writing process kicked off in November 2023 with almost daily status calls, since the authors live in different locations. We submitted our final version in April 2024.
Summary
We had a blast last week! Besides feasting on delicious Cantonese cuisine, the ClickHouse team attended the special 50th-anniversary VLDB 2024 conference in Guangzhou, China, where our CTO and the creator of ClickHouse, Alexey Milovidov, proudly presented our first ClickHouse research paper to the scientific community.
We hope you enjoy reading the paper and watching the recording of Alexey’s presentation. We would love to hear what you think.
Lastly, for your convenience, here is a list of links to the paper and all its accompanying material mentioned in this post:
- VLDB 2024 research paper "ClickHouse - Lightning Fast Analytics for Everyone" + poster
- Recording of Alexey Milovidov's paper presentation at VLDB 2024 + slide deck
- Recording of extended meetup version of our VLDB 2024 talk + slide deck