Welcome to the October ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
This month, we have the impressions and challenges of ClickHouse from a first-time user, the APPEND clause for Refreshable Materialized Views, the pancake SQL pattern, and more!
Inside this issue
- Featured community member
- Upcoming events
- ClickHouse for Embedded Analytics: First Impressions and Unexpected Challenges
- Using ClickHouse for High-Volume Data Pipeline Processing and Asynchronous Updates
- 24.9 release
- The pancake SQL pattern
- ClickHouse Cloud Live Update: September 2024
- Quick reads
- Post of the month
Featured community member
This month's featured community member is Duc-Canh Le, a Software Engineer at Ahrefs.
Duc-Canh works on data infrastructure at Ahrefs and is responsible for developing and operating ClickHouse on over 600 machines that hold 100 PB of compressed data.
He is a regular contributor to the ClickHouse code base and has made 28 contributions in the calendar year. These include supporting OPTIMIZE on join tables to reduce their memory footprint, fixing a bug when using an empty tuple on the left-hand side of the `IN` function, and a fix for the FINAL clause when run on tables that don’t use adaptive granularity.
Upcoming events
Global events
- Release call 24.10 - Oct 31
- ClickHouse Cloud Live Update - Nov 12
Free training
- BigQuery to ClickHouse Workshop - Oct 23
- Query optimization with ClickHouse workshop - Oct 30
- ClickHouse Fundamentals - Nov 6
- Migrating from Postgres to ClickHouse Workshop
- Asia Pacific - Nov 20
- Europe - Nov 27
Events in AMER
- Coffee with ClickHouse in Santa Monica - Oct 25
- KubeCon North America - Nov 12-15
- Microsoft Ignite - Chicago - Nov 19-22
Events in EMEA
- Meetup in Madrid - Oct 22
- Coffee with ClickHouse - Oct 23
- Meetup in Oslo - Oct 31
- Meetup in Barcelona - Nov 12
- Meetup in Ghent - Nov 19
- Meetup in Dubai - Nov 21
- Meetup in Paris - Nov 26
Events in Asia Pacific
- Data & AI Summit VIC - Oct 22
ClickHouse for Embedded Analytics: First Impressions and Unexpected Challenges
Jorin Vogel recently started using ClickHouse for an embedded analytics project and shared his first thoughts. He also described things he wished he’d known before starting, including how materialized views work and working with duplicate data. This is a good read if you’re just starting your ClickHouse journey.
Using ClickHouse for High-Volume Data Pipeline Processing and Asynchronous Updates
Marais Kruger works at Evinced (a company focused on accessibility compliance for enterprise clients) and has written a blog post about this experience building a ClickHouse-based data pipeline.
Marais explains how they designed their pipeline to handle a large volume of incoming data while also handling infrequent updates to that data. He also describes how they made writes idempotent using ClickHouse’s duplicate block detection and a setting used to ensure similar behavior with dependent materialized views.
This one is a good read for the ClickHouse enthusiast or anyone curious about how to design data pipelines at scale.
24.9 release
The 24.9 release introduced the APPEND clause for working with refreshable materialized views. When configured, the materialized view’s query will append results to the end of the destination table rather than replacing everything. This is useful if you want to capture snapshots of data from other tables or poll data from an external API and store it in ClickHouse.
This release also made response headers available when using the url table function, automatic inference of the Variant data type, and aggregate functions to query the new JSON data type.
The pancake SQL pattern
Jacek Migdal had a tricky problem: One of the Quesma dashboards was sending up to 10 queries to populate a single panel, putting the ClickHouse database under pressure.
Jacek was trying to solve this problem and had a lightbulb moment while feeding his toddler pancakes: Could the dashboard queries be redesigned to look more like pancakes?
Rather than spawning multiple queries, they put everything into one query. The aggregations would be stacked on each other, like a pancake, where each layer is a grouping with a limit, and between layers, they have metric aggregations—our pancake “fillings.”
It worked, and they’re seeing a 50x increase in performance.
ClickHouse Cloud Live Update: September 2024
We had a special guest, Dunith Danushka from Redpanda, join us for our latest ClickHouse Cloud update call. Dunith and Mark Needham showed how to use the combination of Redpanda Serverless, ClickHouse Cloud, and OpenAI to power a sports commentary Copilot application.
We also had updates on some upcoming features in ClickHouse Cloud, including Bring Your Own Cloud, Compute-Compute separation, and the JSON data type.
Quick reads
- Juan S. Carrillo wrote a User Defined Function (UDF) to make it easier to sort software versions.
- Rafal Kwasny explores various options for data storage and focuses on using ClickHouse for high-performance financial data analysis.
- Alexey Milovidov shared his favorite ClickHouse features of 2024 at a recent San Francisco meetup.
- Sai Srirampur and Bryan Clark wrote a blog post explaining how to combine ClickHouse and Neon for real-time analytics on transactional data using PeerDB to sync the data.
Post of the month
Our favorite post this month was by Carl Lindesvärd about ClickHouse’s compression rate, a somewhat underrated feature!