Iceberg integration
Users can integrate with the Iceberg table format via the table function.
iceberg Table Function
Provides a read-only table-like interface to Apache Iceberg tables in Amazon S3, Azure, HDFS or locally stored.
Syntax
icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
icebergS3(named_collection[, option=value [,..]])
icebergAzure(connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
icebergAzure(named_collection[, option=value [,..]])
icebergHDFS(path_to_table, [,format] [,compression_method])
icebergHDFS(named_collection[, option=value [,..]])
icebergLocal(path_to_table, [,format] [,compression_method])
icebergLocal(named_collection[, option=value [,..]])
Arguments
Description of the arguments coincides with description of arguments in table functions s3
, azureBlobStorage
, HDFS
and file
correspondingly.
format
stands for the format of data files in the Iceberg table.
Returned value A table with the specified structure for reading data in the specified Iceberg table.
Example
SELECT * FROM icebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
ClickHouse currently supports reading v1 and v2 of the Iceberg format via the icebergS3
, icebergAzure
, icebergHDFS
and icebergLocal
table functions and IcebergS3
, icebergAzure
, IcebergHDFS
and IcebergLocal
table engines.
Defining a named collection
Here is an example of configuring a named collection for storing the URL and credentials:
<clickhouse>
<named_collections>
<iceberg_conf>
<url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
<access_key_id>test<access_key_id>
<secret_access_key>test</secret_access_key>
<format>auto</format>
<structure>auto</structure>
</iceberg_conf>
</named_collections>
</clickhouse>
SELECT * FROM icebergS3(iceberg_conf, filename = 'test_table')
DESCRIBE icebergS3(iceberg_conf, filename = 'test_table')
Schema Evolution At the moment, with the help of CH, you can read iceberg tables, the schema of which has changed over time. We currently support reading tables where columns have been added and removed, and their order has changed. You can also change a column where a value is required to one where NULL is allowed. Additionally, we support permitted type casting for simple types, namely:
- int -> long
- float -> double
- decimal(P, S) -> decimal(P', S) where P' > P.
Currently, it is not possible to change nested structures or the types of elements within arrays and maps.
Partition Pruning
ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. Now it works with only identity transforms and time-based transforms (hour, day, month, year). To enable partition pruning, set use_iceberg_partition_pruning = 1
.
Aliases
Table function iceberg
is an alias to icebergS3
now.
See Also