
Integrating Apache Spark with ClickHouse


Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

There are two main ways to connect Apache Spark and ClickHouse:

  1. Spark Connector - implements Spark's DataSourceV2 API and provides its own catalog management. This is currently the recommended way to integrate ClickHouse and Spark.
  2. Spark JDBC - integrates Spark and ClickHouse through Spark's JDBC data source.


Both approaches have been tested and work with Spark's various APIs, including Java, Scala, PySpark, and Spark SQL.
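
To make the difference between the two approaches concrete, here is a minimal PySpark-oriented sketch of the settings each one typically needs. It is an illustration under stated assumptions, not a reference configuration: the helper names (`connector_catalog_conf`, `jdbc_options`, `read_via_jdbc`) are hypothetical, and the catalog class `com.clickhouse.spark.ClickHouseCatalog`, the driver class `com.clickhouse.jdbc.ClickHouseDriver`, the `spark.sql.catalog.*` keys, and port 8123 are assumed from commonly used connector and driver releases. Check them against the versions you actually deploy.

```python
from typing import Dict


def connector_catalog_conf(
    host: str,
    http_port: int = 8123,
    user: str = "default",
    password: str = "",
    database: str = "default",
    catalog: str = "clickhouse",
) -> Dict[str, str]:
    """Build Spark conf entries that register ClickHouse as a catalog.

    Assumption: the catalog class and key names below match the
    connector release in use; older releases used a different package.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "com.clickhouse.spark.ClickHouseCatalog",  # assumed class name
        f"{prefix}.host": host,
        f"{prefix}.protocol": "http",
        f"{prefix}.http_port": str(http_port),
        f"{prefix}.user": user,
        f"{prefix}.password": password,
        f"{prefix}.database": database,
    }


def jdbc_options(
    host: str,
    table: str,
    port: int = 8123,
    database: str = "default",
    user: str = "default",
    password: str = "",
) -> Dict[str, str]:
    """Build options for Spark's built-in JDBC data source.

    Assumption: the ClickHouse JDBC driver jar is on the Spark classpath
    and exposes the driver class named below.
    """
    return {
        "url": f"jdbc:clickhouse://{host}:{port}/{database}",
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",  # assumed class name
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_via_jdbc(spark, options: Dict[str, str]):
    """Load a ClickHouse table as a DataFrame through the JDBC source.

    `spark` is an existing SparkSession; it is passed in so that this
    module can be imported without PySpark installed.
    """
    return spark.read.format("jdbc").options(**options).load()
```

With the connector, the entries from `connector_catalog_conf` would be applied when building the session (for example via `SparkSession.builder.config(key, value)` for each pair), after which tables can be addressed through the catalog name in Spark SQL. With JDBC, no catalog is registered and each read or write passes the options explicitly, as `read_via_jdbc` does.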