June 2024 Newsletter

ClickHouse Team

Jun 20, 2024 - 6 minutes read

Welcome to the June ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.

This month, we have the Dynamic data type in the 24.5 release, why HyperDX chose ClickHouse over Elasticsearch for observability data, and how to use ClickHouse to count unique users at scale.

This month's featured community member is Michael Driscoll, Co-Founder and CEO at Rill Data.

Michael has worked in the tech industry for two decades as a technologist, entrepreneur, and investor. Over the years, he has founded several companies, including Metamarkets, a real-time analytics platform for digital ad firms, which Snap, Inc. acquired in 2017.

His latest company is Rill, a cloud service for operational intelligence. The Rill and ClickHouse worlds collided when Michael met Alexey, ClickHouse’s Co-Founder and CTO, at FOSDEM earlier this year.

Alexey suggested running Rill on top of a ClickHouse-powered data set of Wikipedia traffic. Michael and his team got this working in a couple of days, and Michael joined the 24.2 Community Call to share Rill’s connector for ClickHouse. Michael also presented at the ClickHouse San Francisco meetup two weeks ago.

Follow Michael on LinkedIn

ClickHouse Fundamentals - June 26th & 27th
AWS Summit D.C. - June 26th
Amsterdam Meetup - June 27th
ClickHouse 24.6 release call - June 27th
Belgium Meetup - July 4th
ClickHouse Cloud live update - July 9th
Paris Meetup - July 9th
New York Meetup - July 9th
AWS Summit New York Happy Hour - July 10th
Boston Meetup - July 11th
Singapore Meetup - July 11th

The journey to add a semi-structured data type to ClickHouse continues with the introduction of the Dynamic type in the 24.5 release. This release also brought performance improvements for CROSS JOINs and the ability to read files inside archives stored on S3.

Read the release post

Michael Shi works on HyperDX, an open-source observability platform built on OpenTelemetry and ClickHouse. In this blog post, he explains why they use ClickHouse rather than Elasticsearch, pointing out that observability has become more of an analytics problem than a search problem. He identifies ClickHouse’s columnar data layout and sparse indexes as key differentiators.

Read the blog post

Tom Weisner has written a tutorial on using Python user-defined functions (UDFs) in ClickHouse. He starts with a simple function that reverses a string before moving on to a multi-argument function that adds minutes or hours to a provided DateTime. He concludes with a function that detects elevated heart rate activity in time-series data with help from numpy and scipy.

Read the blog post
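
To give a flavor of the first example in that tutorial, here is a minimal sketch of a string-reversing script written in the style of a ClickHouse executable UDF, which reads one input value per line on standard input and writes one result per line on standard output. This is not Tom's code: the names reverse_string and process are hypothetical, and the XML configuration that registers the script with ClickHouse is left out.

```python
import io


def reverse_string(value: str) -> str:
    """Return the input string with its characters in reverse order."""
    return value[::-1]


def process(stdin, stdout) -> None:
    """Read one value per line and write one reversed value per line.

    Assumption: the script receives a single String argument per row,
    one row per line. In a deployed script you would call
    process(sys.stdin, sys.stdout) instead of the demo below.
    """
    for line in stdin:
        stdout.write(reverse_string(line.rstrip("\n")) + "\n")
        # Flush after each row so the caller is not left waiting on a buffer.
        stdout.flush()


# Small in-memory demo that stands in for ClickHouse feeding rows to the script.
demo_in = io.StringIO("hello\nClickHouse\n")
demo_out = io.StringIO()
process(demo_in, demo_out)
demo_output = demo_out.getvalue()
```

Keeping the transformation in its own function makes it easy to test the logic without a running ClickHouse server.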

Tweeq is a fintech startup building a highly scalable and flexible payments platform from scratch. ClickHouse is the data warehouse, and Tweeq uses the Kafka table engine to ingest data. In this blog post, Atheer Alabdullatif explains how they chose ClickHouse and the other tools that form part of the data platform.

Read the blog post

Twilio Engage is an Omnichannel Customer Engagement Tool that lets users define customers’ journeys. They wanted to show their users the overall stats per journey and provide more accurate step-level stats. This worked well for all users except those storing vast amounts of data. In the blog post, they explain how they solved this problem using semantic sharding and the distributed_group_by_no_merge setting, as well as reducing the size of grouping keys in the database.

Read the blog post
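
The idea behind semantic sharding can be illustrated with a small sketch. It assumes, purely for illustration, that rows are sharded by user ID, so every event for a given user lands on the same shard. Under that assumption each shard can count its own unique users and the per-shard counts can simply be added, with no cross-shard merge of intermediate state. The code below is plain Python rather than ClickHouse, and the function names and sample events are hypothetical, not taken from the Twilio post.

```python
import zlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Pick a shard deterministically from the user ID (assumed shard key)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


def count_unique_users_sharded(events, num_shards: int) -> int:
    """Count unique users per shard, then sum the per-shard counts."""
    shards = [set() for _ in range(num_shards)]
    for user_id, _step in events:
        shards[shard_for(user_id, num_shards)].add(user_id)
    # No user appears on two shards, so the sum equals the global count.
    return sum(len(users) for users in shards)


# Hypothetical journey events: (user_id, step)
events = [
    ("alice", "step_1"),
    ("bob", "step_1"),
    ("alice", "step_2"),
    ("carol", "step_1"),
    ("bob", "step_3"),
]

global_count = len({user_id for user_id, _step in events})
sharded_count = count_unique_users_sharded(events, num_shards=3)
```

If the data were sharded on anything other than the user, the same user could appear on several shards and the summed counts would overstate the total, which is why the choice of sharding key matters here.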

Nikolai Potapov discusses the different ways in which ClickHouse can transform data in a data pipeline. We learn about parameterized views, materialized views, and various table engines.

Read the blog post

Our favorite post this month was by Pascal Senn, who’s having a great time working with ClickHouse.

Read the post
