Welcome to the September ClickHouse newsletter, which rounds up what’s happened in real-time data warehouses over the last month. This month, we have the much-awaited JSON data type, our first ClickHouse research paper, a Private Preview of BYOC on AWS, better PyPI stats with Ibis, and more!
Inside this issue
- Featured community member
- Upcoming events
- VLDB 2024: First ClickHouse research paper
- How Reco leverages advanced analytics to detect sophisticated SaaS threats
- 24.8 LTS release
- Better PyPI stats with Ibis, ClickHouse, and Shiny
- ClickHouse Cloud: BYOC AWS in Private Preview
- Quick reads
- Post of the month
Featured community member
beehiiv is a newsletter platform that helps creators, publishers, and businesses build and grow their email audiences. They collect events capturing every time an email is processed, every time it lands in an inbox, every time it’s deferred, every time it’s bounced, every time you open it, every time you click a link, and so on.
Eric has worked at beehiiv for just over a year and was responsible for moving data operations from Postgres to ClickHouse Cloud. There’s a user story on the work he and his team did, and he also presented at the New York meetup this summer.
Eric previously worked as a Tech Lead at Arthur.ai, where he architected and built the company's data ingestion pipeline, storage, and much of the backend infrastructure.
Upcoming events
Global events
- ClickHouse Cloud Live Update - Sep 24
- 24.9 release community call - Sep 26
Free training
- Query optimization with ClickHouse workshop - Sep 25
- In-Person ClickHouse Workshop - Singapore - Oct 3
Events in EMEA
- Meetup in Tel Aviv - Sep 22
- Meetup in Madrid - Oct 22
- Meetup in Barcelona - Oct 29
- Meetup in Oslo - Oct 31
- Meetup in Ghent - Nov 19
- Meetup in Dubai - Nov 21
- Meetup in Paris - Nov 26
Events in Asia Pacific
- DataEngBytes - Sydney - Sep 24
- DataEngBytes - Perth - Sep 27
- DataEngBytes - Melbourne - Oct 1
- DataEngBytes - Auckland - Oct 4
- Big Data & AI World Asia - Oct 10
- Cloud Excellence Summit NSW - Oct 17
- Data & AI Summit VIC - Oct 22
VLDB 2024: First ClickHouse research paper
It’s been almost a year in the making, and at the end of August, we presented our first research paper at VLDB 2024.
VLDB, the International Conference on Very Large Data Bases, is widely regarded as one of the leading conferences in data management. It typically accepts only about 20% of the hundreds of papers submitted.
The paper concisely describes ClickHouse's most interesting architectural and system design components, which make it so fast. We’ve embedded the PDF of the paper in the blog post linked below.
How Reco leverages advanced analytics to detect sophisticated SaaS threats
Reco is a full-lifecycle SaaS security solution that uses ClickHouse as the foundation of its advanced analytics system. Nir Barak explains how ClickHouse gives them a holistic view of data across multiple layers and allows them to detect outliers and anomalies.
24.8 LTS release
The 24.8 release is here, and it has an exciting feature that I (and many of you) have been waiting for - the new JSON data type!
It’s still experimental, but that didn’t stop us from putting it through its paces while exploring structured event data from football (soccer) matches.
This release also introduces the TimeSeries table engine, which can store Prometheus data, and a new Kafka table engine that supports exactly-once event processing.
Better PyPI stats with Ibis, ClickHouse, and Shiny
ClickPy is a ClickHouse-backed application that analyzes downloads of Python packages published on PyPI. In addition to using the front-end application, you can query the underlying data directly, which is exactly what Cody Peterson has done.
Cody shows how to connect to ClickPy using Ibis and then explores the seasonality of downloads of the clickhouse-connect package by day of the week and by month. He visualizes the results with Plotly and then puts everything together into a Shiny application.
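Cody’s seasonality analysis boils down to grouping daily download counts by weekday and by month. This is not his code (he works through Ibis against ClickPy); it’s a minimal standard-library sketch of the same grouping, assuming you’ve already fetched `(date, downloads)` pairs. The `seasonality` function and the sample numbers are hypothetical, made up for illustration.

```python
from collections import Counter
from datetime import date

# Labels indexed by date.weekday(), where Monday == 0 and Sunday == 6.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def seasonality(daily_downloads):
    """Sum hypothetical (date, downloads) pairs by day of the week and by calendar month."""
    by_weekday = Counter()
    by_month = Counter()
    for day, downloads in daily_downloads:
        by_weekday[WEEKDAYS[day.weekday()]] += downloads
        by_month[day.month] += downloads
    return by_weekday, by_month


# Hypothetical daily counts, not real ClickPy data.
sample = [
    (date(2024, 9, 2), 120),   # Monday
    (date(2024, 9, 7), 40),    # Saturday
    (date(2024, 9, 9), 130),   # Monday
    (date(2024, 10, 5), 35),   # Saturday
]
weekday_totals, month_totals = seasonality(sample)
print(weekday_totals)  # Counter({'Mon': 250, 'Sat': 75})
print(month_totals)    # Counter({9: 290, 10: 35})
```

Against a table as large as ClickPy’s, you’d rather push this work to the server, for example by grouping on `toDayOfWeek(date)` and `toMonth(date)` in ClickHouse, and only pull the aggregated rows into Python.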
ClickHouse Cloud: BYOC AWS in Private Preview
ClickHouse Cloud has been running for almost two years and supports all the major cloud platforms: AWS, Azure, and GCP. So far, it’s been a SaaS offering that runs entirely in ClickHouse’s cloud account, which made it a non-starter for users with strict data residency and compliance requirements.
We’re therefore happy to announce the Private Preview release of Bring Your Own Cloud (BYOC) on AWS. BYOC is a fully managed ClickHouse Cloud service deployed to your AWS account.
The waiting list is now open, so sign up and we’ll contact you to get you set up.
Quick reads
- Heng Ma shows how to build a system that enriches shopping cart events with product details. Using RisingWave, he joins a Kafka event stream with a product catalog and writes the enriched events to ClickHouse with the RisingWave-ClickHouse connector.
- Auxten released a new version of chDB, the in-process embedded version of ClickHouse, that can query Pandas DataFrames 87 times faster than the initial version.
- I loved this video from Jess Archer’s talk at Laracon US 2024. It is an excellent introduction to ClickHouse and shows where it’s better than MySQL.
- Sai Srirampur shares his tips for ClickHouse data modeling aimed at Postgres users. He explains various strategies for handling duplicates when using the ReplacingMergeTree table engine, how to handle null values, and the importance of ordering keys.
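ReplacingMergeTree collapses rows that share the same sorting key, keeping the row with the highest version (or the most recently inserted one when versions tie), but it only does so when data parts merge in the background, so queries can still see duplicates in the meantime. Here’s a minimal Python simulation of that rule, assuming a hypothetical `Row` type whose `key`, `version`, and `is_deleted` fields stand in for the table’s ORDER BY key, the optional `ver` column, and the optional `is_deleted` column. It illustrates the semantics only, not how ClickHouse implements merges, and isn’t taken from Sai’s post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int         # hypothetical stand-in for the table's ORDER BY (sorting) key
    version: int     # hypothetical stand-in for the optional `ver` column
    is_deleted: int  # hypothetical stand-in for the optional `is_deleted` column (0 or 1)
    payload: str


def deduplicate(rows):
    """Collapse rows that share a key, ReplacingMergeTree-style.

    Keeps the row with the highest version per key; on a version tie, the row
    that appears later (i.e. was inserted later) wins. Keys whose surviving
    row is marked deleted are dropped from the result.
    """
    latest = {}
    for row in rows:  # assumes `rows` is in insertion order
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return sorted(
        (row for row in latest.values() if not row.is_deleted),
        key=lambda row: row.key,
    )


# Hypothetical inserts: key 1 is updated, key 2 is deleted, key 3 is new.
rows = [
    Row(1, 1, 0, "draft"),
    Row(2, 1, 0, "active"),
    Row(1, 2, 0, "published"),
    Row(2, 2, 1, "active"),
    Row(3, 1, 0, "new"),
]
result = deduplicate(rows)
print([row.payload for row in result])  # ['published', 'new']
```

Querying with `FINAL` is one way to get this deduplicated view at read time in ClickHouse, at the cost of extra work per query.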
Post of the month
Our favorite post this month was by Michael Driscoll about the new JSON data type: