Join us at Microsoft Ignite in Chicago, Nov 19-21. Booth 614! ->->

Blog / Community

Seattle Meetup Report: Petabyte-Scale Website Behavior Analytics using ClickHouse (Microsoft)

author avatar
ClickHouse Editor
Jan 18, 2023 - 4 minutes read

On January 18th, 2023 Microsoft hosted a ClickHouse community meetup at their office in Redmond. The WebXT team presented two of their analytics products using ClickHouse: Microsoft Clarity and Titan. Microsoft Clarity is a free tool that provides website owners with insights to help them make better business decisions, without the need for a data science team or an instrumentation pipeline.

Narendra Rana, Principal Data Scientist on the Clarity Insights team at Microsoft, explained the architecture of Clarity, the challenges faced by the team when integrating ClickHouse, and how they overcame these challenges to create an efficient, robust, and GDPR-compliant analytics platform.

Powering Website Analytics at Petabyte-Scale

Microsoft Clarity is designed to handle billions of page views, petabytes of data, millions of websites, and hundreds of millions of users daily. Clarity is simple to set up and users can visit clarity.microsoft.com, download a JavaScript bundle, and install it on their website. Within 30 minutes, data will begin flowing onto a dashboard dedicated to the user's site. The dashboard offers insights that help website owners improve their website. They can see session recordings, heat maps, scroll maps, and gain deeper insights into their website's performance. The sessions are not sampled, and all data comes in, which is a significant advantage for webmasters.

The presentation included a live demo, where the audience was shown how a webmaster could select a session and see the interaction of their end-users. Microsoft Clarity provides heat maps that help to aggregate and show the interaction of the end-users. Rana explained that creating a heatmap is a complex operation that involves a big query, which is well suited for ClickHouse. “A simple operation, like looking at a heat map is basically a big query. And as this is happening lots of data is being ingested at the same time. We looked at a lot of the options out there…we realized that yes, ClickHouse is the stuff that we are going to bet on”, said Rana.

clarity image 1.png Heat Maps in Clarity, a complex operation involving a big query in ClickHouse.

ClickHouse Backup and Recovery on Azure

The team worked with ClickHouse creator Alexey Milovidov to overcome challenges, primarily with backup and recovery on Azure, which is where they invested a significant amount of time. Rana shared that the team's experience in building this system which can be leveraged by other teams, both internal and external, who want to get efficient, robust, and consistent backups without any downtime.

clarity image 2.png Millions of rows of data are being ingested, while thousands of queries are being executed, at the same time.

Incremental Snapshots is a feature of Azure that enables incremental backups while data ingestion is still occurring. This is a more cost-effective solution than the original snapshots that Azure offered. Freeze and Sync commands are used to synchronize cache and disk, and automated validation is used to test the backup and restore processes periodically.

Satish Manivannan, Senior Director of Data and Analytics at Microsoft, expressed his enthusiasm for ClickHouse, stating, "The key point is, we really love ClickHouse and we hope to have more continued collaboration for years to come." Overall, the meetup was an excellent opportunity to learn about Microsoft's use of ClickHouse and its benefits for website analytics.

More Details

  • This talk was given at the ClickHouse Community Meetup at the Microsoft office in Redmond on January 18, 2023
  • Learn more about ClickHouse within Titan, the internal data analytics tool that enables self-serve analytics for Microsoft in our meetup report.
Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!
Loading form...
Follow us
Twitter imageSlack imageGitHub image
Telegram imageMeetup imageRss image