Welcome to the June ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
This month, we have the Dynamic data type in the 24.5 release, why HyperDX chose ClickHouse over Elasticsearch for observability data, and how to use ClickHouse to count unique users at scale.
In this issue
- Featured Community Member
- Upcoming events
- 24.5 release
- Why HyperDX Chose ClickHouse Over Elasticsearch for Storing Observability Data
- Python User-Defined Functions in ClickHouse
- Tweeq Data Platform: Journey and Lessons Learned: ClickHouse, dbt, Dagster, and Superset
- Using ClickHouse to count unique users at scale
- ClickHouse as part of the ETL/ELT process
- Post of the month
Featured Community Member: Michael Driscoll
This month's featured community member is Michael Driscoll, Co-Founder and CEO at Rill Data.
Michael has worked in the tech industry for two decades as a technologist, entrepreneur, and investor. Over the years, he has founded several companies, including Metamarkets, a real-time analytics platform for digital ad firms, which Snap, Inc. acquired in 2017.
His latest company is Rill, a cloud service for operational intelligence. The Rill and ClickHouse worlds collided when Michael met Alexey, ClickHouse’s Co-Founder and CTO, at FOSDEM earlier this year.
Alexey suggested running Rill on top of a ClickHouse-powered data set of Wikipedia traffic. Michael and his team got this working in a couple of days, and Michael joined the 24.2 Community Call to share Rill’s connector for ClickHouse. Michael also presented at the ClickHouse San Francisco meetup two weeks ago.
Upcoming events
- ClickHouse Fundamentals - June 26th & 27th
- AWS Summit D.C. - June 26th
- Amsterdam Meetup - June 27th
- ClickHouse 24.6 release call - June 27th
- Belgium Meetup - July 4th
- ClickHouse Cloud live update - July 9th
- Paris Meetup - July 9th
- New York Meetup - July 9th
- AWS Summit New York Happy Hour - July 10th
- Boston Meetup - July 11th
- Singapore Meetup - July 11th
24.5 release
The journey to add a semi-structured data type to ClickHouse continues with the introduction of the Dynamic type. This release also brings performance improvements for CROSS JOINs and the ability to read files stored inside archives on S3.
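To make the idea of a column that holds values of different types more concrete, here is a minimal Python sketch. It is a conceptual model only, not ClickHouse's implementation: the DynamicColumn class and its fields are hypothetical, and the layout (a per-row type tag plus one value list per type) is an assumption loosely modeled on a variant-style encoding.

```python
from typing import Any


class DynamicColumn:
    """Toy column that stores values of mixed types (conceptual sketch only).

    Each row records which type it holds; the values themselves live in one
    list per type, so values of the same type are stored together.
    """

    def __init__(self) -> None:
        self.discriminators: list[str] = []       # type name for each row
        self.offsets: list[int] = []              # row's position inside its typed list
        self.variants: dict[str, list[Any]] = {}  # one list of values per type

    def append(self, value: Any) -> None:
        type_name = "Null" if value is None else type(value).__name__
        bucket = self.variants.setdefault(type_name, [])
        self.discriminators.append(type_name)
        self.offsets.append(len(bucket))
        bucket.append(value)

    def __getitem__(self, row: int) -> Any:
        # Look up the row's type, then fetch the value from that type's list.
        return self.variants[self.discriminators[row]][self.offsets[row]]

    def type_of(self, row: int) -> str:
        return self.discriminators[row]


col = DynamicColumn()
for v in [42, "hello", 3.5, None, 7]:
    col.append(v)

print([col.type_of(i) for i in range(5)])  # ['int', 'str', 'float', 'Null', 'int']
print(col.variants["int"])                  # [42, 7]
```

Grouping values by type is what lets a columnar engine keep each type's values contiguous and compress them well, even though the logical column mixes types.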
Why HyperDX Chose ClickHouse Over Elasticsearch for Storing Observability Data
Michael Shi works on HyperDX, an open-source observability platform built on OpenTelemetry and ClickHouse. In this blog post, he explains why they use ClickHouse rather than Elasticsearch, arguing that observability has become more of an analytics problem than a search problem. He identifies ClickHouse’s columnar data layout and sparse indexes as key differentiators.
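The sparse index idea can be illustrated with a small Python sketch. Rather than indexing every row, it keeps one entry per granule of sorted rows, uses binary search to pick the granules that might match a range, and scans only those. The granule size of 4 and the function names are illustrative assumptions; ClickHouse's default granule size (index_granularity) is 8192 rows, and its real index also points into compressed column files, which this sketch leaves out.

```python
import bisect

GRANULE_SIZE = 4  # illustrative; ClickHouse's default index_granularity is 8192 rows


def build_sparse_index(sorted_keys):
    """Keep only the first key of every granule: one index entry per GRANULE_SIZE rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULE_SIZE)]


def candidate_granules(index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi]."""
    # Start one granule before the first mark >= lo, since that granule may hold lo.
    first = max(bisect.bisect_left(index, lo) - 1, 0)
    # Granules whose first key is already > hi cannot contain matches.
    last = bisect.bisect_right(index, hi) - 1
    return list(range(first, last + 1))


def range_scan(sorted_keys, index, lo, hi):
    """Read only the candidate granules, then filter their rows exactly."""
    hits = []
    for g in candidate_granules(index, lo, hi):
        granule = sorted_keys[g * GRANULE_SIZE:(g + 1) * GRANULE_SIZE]
        hits.extend(k for k in granule if lo <= k <= hi)
    return hits


keys = list(range(0, 40, 2))      # 20 sorted keys -> 5 granules
index = build_sparse_index(keys)
print(index)                                # [0, 8, 16, 24, 32]
print(candidate_granules(index, 10, 18))    # [1, 2]
print(range_scan(keys, index, 10, 18))      # [10, 12, 14, 16, 18]
```

The index stays tiny (5 entries for 20 rows here, roughly one per 8192 rows at the default setting), so it fits in memory, at the cost of reading a few extra rows from each candidate granule.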
Python User-Defined Functions in ClickHouse
Tom Weisner has written a tutorial on using Python user-defined functions in ClickHouse. He starts with a simple function that reverses a string before moving on to a multi-argument function that adds minutes or hours to a given DateTime. He concludes with a function that detects elevated heart-rate activity in time-series data with help from NumPy and SciPy.
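The first two functions fit ClickHouse's executable UDF model, in which the server pipes rows to a script on stdin and reads one result row per input row back from stdout. The sketch below assumes the TabSeparated format and ignores TSV escaping. The function names, the argument order of add_time, and the "minutes"/"hours" unit strings are hypothetical and may differ from the tutorial. Registering the script with the server (function name, argument types, format) happens in ClickHouse's UDF configuration and is not shown.

```python
import sys
from datetime import datetime, timedelta

# ClickHouse's default text form of a DateTime value, e.g. 2024-06-01 10:00:00
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def reverse_string(line: str) -> str:
    """Single-argument UDF: one value in, the reversed value out.

    Simplification: TabSeparated escape sequences are not decoded.
    """
    return line.rstrip("\n")[::-1]


def add_time(line: str) -> str:
    """Hypothetical multi-argument UDF: 'DateTime<TAB>amount<TAB>unit' in, shifted DateTime out."""
    ts, amount, unit = line.rstrip("\n").split("\t")
    if unit not in ("minutes", "hours"):
        raise ValueError(f"unsupported unit: {unit}")
    shifted = datetime.strptime(ts, TS_FORMAT) + timedelta(**{unit: int(amount)})
    return shifted.strftime(TS_FORMAT)


def serve(handler, stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Answer every input row with exactly one output row, flushing after each."""
    for line in stdin:
        stdout.write(handler(line) + "\n")
        stdout.flush()


# Saved as its own script for ClickHouse to launch, the file would end with:
#     serve(reverse_string)
```

Flushing after every row matters: the server waits for each result, so buffered output could stall the query.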
Tweeq Data Platform: Journey and Lessons Learned: ClickHouse, dbt, Dagster, and Superset
Tweeq is a fintech startup building a highly scalable and flexible payments platform from scratch. ClickHouse is the data warehouse, and Tweeq uses the Kafka table engine to ingest data. In this blog post, Atheer Alabdullatif explains how they chose ClickHouse and the other tools that form part of the data platform.
Using ClickHouse to count unique users at scale
Twilio Engage is an omnichannel customer engagement tool that lets users define their customers’ journeys. The team wanted to show overall stats per journey and provide more accurate step-level stats. Their approach worked well for most users but not for those storing vast amounts of data. In the blog post, they explain how they solved this problem using semantic sharding and the distributed_group_by_no_merge setting, as well as by reducing the size of grouping keys in the database.
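Why semantic sharding helps can be shown in a few lines of Python. Assume events are routed to shards by a hash of the user ID (NUM_SHARDS, the SHA-256 routing, and the function names are illustrative assumptions, not Twilio's code). Then each user lives on exactly one shard, so per-shard distinct counts can simply be added together instead of merging per-shard sets of users. That is the kind of shortcut distributed_group_by_no_merge enables.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 3  # illustrative


def shard_for(user_id: str) -> int:
    """Semantic sharding: every event for a given user goes to the same shard."""
    # A stable hash (unlike Python's per-process randomized hash() for str).
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "little") % NUM_SHARDS


def distribute(events):
    """Route (user_id, step) events to shards by user."""
    shards = defaultdict(list)
    for user_id, step in events:
        shards[shard_for(user_id)].append((user_id, step))
    return shards


def unique_users_per_step(shards):
    """Each shard counts its own distinct users per step; the totals are plain sums.

    Summing is only correct because no user appears on two shards.
    """
    totals = defaultdict(int)
    for rows in shards.values():
        local = defaultdict(set)
        for user_id, step in rows:
            local[step].add(user_id)
        for step, users in local.items():
            totals[step] += len(users)
    return dict(totals)


events = [("u1", "start"), ("u2", "start"), ("u1", "start"),
          ("u3", "email"), ("u1", "email"), ("u2", "start")]
print(unique_users_per_step(distribute(events)))  # counts: start=2, email=2
```

With random placement, a user whose events landed on two shards would be counted on both, and the summed total would overcount. So the shortcut holds only under this sharding scheme.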
ClickHouse as part of the ETL/ELT process
Nikolai Potapov discusses the different ways in which ClickHouse can transform data in a data pipeline. We learn about parameterized views, materialized views, and various table engines.
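Materialized views used for transformation behave like insert triggers. The view's query runs on each newly inserted block, not on the whole source table. The toy Python model below shows the consequence: two inserts produce two partial rows for the same day. The Table class and count_by_day are hypothetical, not a ClickHouse API.

```python
from collections import Counter


class Table:
    """In-memory stand-in for a table: stored rows plus attached views."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.views = []  # (transform, target_table) pairs

    def attach_view(self, transform, target):
        self.views.append((transform, target))

    def insert(self, block):
        self.rows.extend(block)
        # Like a materialized view, each transform sees only the newly
        # inserted block, never the rows already stored in this table.
        for transform, target in self.views:
            target.insert(transform(block))


def count_by_day(block):
    """Hypothetical view query: count events per day within one block."""
    counts = Counter(row["day"] for row in block)
    return [{"day": d, "n": n} for d, n in sorted(counts.items())]


events = Table("events")
daily = Table("daily_counts")
events.attach_view(count_by_day, daily)

events.insert([{"day": "2024-06-01"}, {"day": "2024-06-01"}, {"day": "2024-06-02"}])
events.insert([{"day": "2024-06-02"}])
print(daily.rows)
# [{'day': '2024-06-01', 'n': 2}, {'day': '2024-06-02', 'n': 1}, {'day': '2024-06-02', 'n': 1}]
```

This is why materialized views often write into a SummingMergeTree or AggregatingMergeTree table, which combines such partial rows during background merges.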
Post of the month
Our favorite post this month was by Pascal Senn, who’s having a great time working with ClickHouse.