Welcome to the August ClickHouse newsletter, which will round up what’s happened in real-time data warehouses over the last month.
This month, we have exciting news about PeerDB joining ClickHouse, downsampling time series data, join performance improvements in the 24.7 release, and more!
Alexey, ClickHouse creator and CTO, goes on tour!
We are excited to share that Alexey Milovidov, creator and CTO of ClickHouse, will be delivering a series of technical talks around the world. Join these events in person to hear him speak and get a chance to ask him ANY question about ClickHouse! Space is limited, so register below:
- Sun, Aug 25 - China Meetup, Guangzhou - Register
- Tues, Aug 27 - VLDB Talk, Guangzhou - Schedule
- Thur, Sept 5 - San Francisco Meetup (Cloudflare) - Register
- Mon, Sept 9 - Raleigh Meetup (Deutsche Bank) - Register
- Tues, Sept 10 - New York Meetup (Rokt) - Register
- Thur, Sept 12 - Chicago Fireside Chat (Jump Capital) - Register
- Wed, Sept 18 - Warsaw AWS Cloud Day - Register
Inside this issue
- Featured community member: Chase Richards
- Upcoming events
- ClickHouse welcomes PeerDB
- Downsampling time series data
- 24.7 release
- How Maxilect transferred ClickHouse between geographically distant data centers
- Java Client… the SEQUEL?!
- Quick reads
- Post of the month
Featured community member: Chase Richards
This month's featured community member is Chase Richards, VP of Engineering at Corsearch.
Chase Richards previously led engineering efforts at Marketly from a 2011 start-up through its acquisition in 2020 by Corsearch.
Chase recently presented at the Bellevue meetup about his experience replacing MySQL with ClickHouse as the backing database for a client-facing report interface for their search engine protection service. Having done this in 2018, Chase earned his status as a trailblazer in the community.
More recently, Chase and his team have added vector-based analytics to their fraud detection model. They’re also using ClickHouse to monitor their search engine scraping setup.
Upcoming events
- ClickHouse Guangzhou Meetup - Aug 25
- ClickHouse + Melbourne Data Engineering Meetup - Aug 27
- ClickHouse Meetup in Bellevue - Aug 27
- ClickHouse Developer Training - Sep 3
- ClickHouse Meetup in Zurich - Sep 5
- ClickHouse + Sydney Data Engineering Meetup - Sep 5
- ClickHouse Meetup @ Cloudflare - San Francisco - Sep 5
- Kubernetes Community Days - Sydney - Sep 5-6
- ClickHouse Meetup in Raleigh - Sep 9
- ClickHouse Meetup @ Shopify - Toronto - Sep 10
- ClickHouse Admin Workshop - Sep 10
- AWS Summit Toronto - Sep 10
- ClickHouse Meetup @ Rokt - NYC - Sep 10
- Coffee with ClickHouse - Amsterdam - Sep 11
- ClickHouse Fundamentals - Sep 11
- ClickHouse Meetup @ Jump Capital - Sep 12
- ClickHouse Meetup - Austin - Sep 17
- ClickHouse Meetup in London - Sep 17
- AWS Cloud Day - Warsaw - Sep 18
- In-person ClickHouse Fundamentals training - Amsterdam - Sep 18-19
- Big Data LDN (London) - Sep 18-19
- ClickHouse Cloud Live Update - Sep 24
- DataEngBytes - Sydney - Sep 24
- DataEngBytes - Perth - Sep 27
- DataEngBytes - Melbourne - Oct 1
- DataEngBytes - Auckland - Oct 4
ClickHouse welcomes PeerDB
A couple of weeks ago, we were thrilled to announce that ClickHouse had joined forces with PeerDB, a Change Data Capture (CDC) provider focused on Postgres.
Now, users have an easy button to sync their data from the number one transactional database to the number one analytical database.
Downsampling time series data
Phare is a platform for website monitoring, incident management, status pages, analytics, security, and alerting. They wanted to create a chart showing 90 days of monitoring data. As they collect one data point per minute, the chart needed to render roughly 130,000 data points (90 days × 1,440 minutes per day), which was both slow to draw and difficult to interpret.
Enter the largestTriangleThreeBuckets function, added to ClickHouse at the end of 2023. Using this function, they could remove redundant data points, making the chart quicker to create and easier to interpret.
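For readers curious about what this kind of downsampling does, here is a minimal Python sketch of the general Largest-Triangle-Three-Buckets idea. It is not ClickHouse's implementation or Phare's code; the function name `lttb` and its signature are hypothetical and chosen only for illustration. The sketch keeps the first and last points, splits the rest into equal-sized buckets, and from each bucket keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lttb(points: List[Point], n_out: int) -> List[Point]:
    """Downsample (x, y) points sorted by x to n_out points (illustrative sketch)."""
    n = len(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(points)  # nothing to remove

    # Width of each bucket over the interior points (first and last are always kept).
    every = (n - 2) / (n_out - 2)
    sampled = [points[0]]
    a = 0  # index of the most recently kept point

    for i in range(n_out - 2):
        # Average of the NEXT bucket; it acts as the third corner of the triangle.
        avg_start = int((i + 1) * every) + 1
        avg_end = min(int((i + 2) * every) + 1, n)
        count = avg_end - avg_start
        avg_x = sum(p[0] for p in points[avg_start:avg_end]) / count
        avg_y = sum(p[1] for p in points[avg_start:avg_end]) / count

        # Candidates in the CURRENT bucket.
        start = int(i * every) + 1
        end = int((i + 1) * every) + 1
        ax, ay = points[a]

        best, best_area = start, -1.0
        for j in range(start, end):
            px, py = points[j]
            # Area of the triangle (kept point, candidate, next-bucket average).
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) * 0.5
            if area > best_area:
                best, best_area = j, area

        sampled.append(points[best])
        a = best

    sampled.append(points[-1])
    return sampled


# Usage: a flat series with one spike, reduced from 100 points to 10.
series = [(float(x), 0.0) for x in range(100)]
series[50] = (50.0, 100.0)
reduced = lttb(series, 10)
```

Because the spike forms by far the largest triangle in its bucket, it survives the reduction, which is the property that makes this approach attractive for charts: visually important peaks and dips are kept while flat stretches are thinned out.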
24.7 release
The 24.7 release includes many performance improvements. These include a full sorting merge algorithm for ASOF joins, a faster parallel hash join algorithm, and improvements to the “read in order” algorithm when running queries with a high-selectivity filter.
We also have deduplication in materialized views, automatic named tuples, and the percent_rank window function.
How Maxilect transferred ClickHouse between geographically distant data centers
Maxilect, an IT solutions provider for the Adtech and Fintech industries, has written an experience report on moving a ClickHouse cluster from a data center in Miami to another in Detroit.
In this blog post, Igor Ivanov and Denis Palaguta explain how they did this using the clickhouse-copier tool while keeping the service up and serving user requests.
Java Client… the SEQUEL?!
We recently started work on revamping the ClickHouse Java client. The new version has a more intuitive, self-documenting API, and we’ve added more usage examples to the documentation.
It’s still in alpha, but we’d love for you to try it and send us your thoughts.
Quick reads
- Vladimir Ivoninskii shares his best techniques for effectively running a production ClickHouse cluster.
- Denys Golotiuk shows how to do image similarity search using vector embeddings in ClickHouse with the L2Distance function.
- Joe Zhou explores integrating ClickHouse with Dragonfly, an ultra-high-throughput, Redis-compatible in-memory data store.
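As a rough illustration of the similarity search idea mentioned in the quick reads above, here is a small Python sketch of ranking items by L2 (Euclidean) distance between embedding vectors. It is not taken from the linked article and does not call ClickHouse; the helper names `l2_distance` and `nearest`, and the toy three-dimensional "embeddings", are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k embeddings closest to the query vector."""
    ranked = sorted(embeddings, key=lambda name: l2_distance(query, embeddings[name]))
    return ranked[:k]


# Toy data: real image embeddings would have hundreds of dimensions.
images = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.1, 0.9],
}
closest = nearest([0.9, 0.1, 0.0], images, k=2)
```

A smaller distance means more similar vectors, so sorting in ascending order by distance puts the best matches first.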
Post of the month
Our favorite post this month was by Y Combinator about PeerDB joining ClickHouse.