
Arrow

Input | Output | Alias

Description

Apache Arrow comes with two built-in columnar storage formats. ClickHouse supports read and write operations for these formats. Arrow is Apache Arrow’s "file mode" format. It is designed for in-memory random access.

Data Types Matching

The table below shows the supported data types and how they correspond to ClickHouse data types in INSERT and SELECT queries.

Arrow data type (INSERT) | ClickHouse data type | Arrow data type (SELECT)
BOOL | Bool | BOOL
UINT8, BOOL | UInt8 | UINT8
INT8 | Int8/Enum8 | INT8
UINT16 | UInt16 | UINT16
INT16 | Int16/Enum16 | INT16
UINT32 | UInt32 | UINT32
INT32 | Int32 | INT32
UINT64 | UInt64 | UINT64
INT64 | Int64 | INT64
FLOAT, HALF_FLOAT | Float32 | FLOAT32
DOUBLE | Float64 | FLOAT64
DATE32 | Date32 | UINT16
DATE64 | DateTime | UINT32
TIMESTAMP, TIME32, TIME64 | DateTime64 | UINT32
STRING, BINARY | String | BINARY
STRING, BINARY, FIXED_SIZE_BINARY | FixedString | FIXED_SIZE_BINARY
DECIMAL | Decimal | DECIMAL
DECIMAL256 | Decimal256 | DECIMAL256
LIST | Array | LIST
STRUCT | Tuple | STRUCT
MAP | Map | MAP
UINT32 | IPv4 | UINT32
FIXED_SIZE_BINARY, BINARY | IPv6 | FIXED_SIZE_BINARY
FIXED_SIZE_BINARY, BINARY | Int128/UInt128/Int256/UInt256 | FIXED_SIZE_BINARY

Arrays can be nested and can take a Nullable type as their element type, for example Array(Nullable(UInt8)). Tuple and Map types can also be nested.

The DICTIONARY type is supported for INSERT queries. For SELECT queries, the output_format_arrow_low_cardinality_as_dictionary setting makes ClickHouse output the LowCardinality type as a DICTIONARY type.
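
As a minimal sketch of the setting described above, the query below enables dictionary output for a single SELECT. The table name some_table is a placeholder and the output file name dictionary_output.arrow is an assumption made for illustration. The block prints the query and runs it only when clickhouse-client is found on the PATH.

```shell
# Placeholder table name; replace it with a table that has LowCardinality columns.
table="some_table"

# The SETTINGS clause applies to this query only.
query="SELECT * FROM ${table} SETTINGS output_format_arrow_low_cardinality_as_dictionary = 1 FORMAT Arrow"
printf '%s\n' "$query"

# Run the query only if the client is installed; report a failure without aborting.
if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --query="$query" > dictionary_output.arrow \
        || echo "query failed (is the server reachable?)" >&2
fi
```

With the setting left at its default of 0, the same query would not write LowCardinality columns as DICTIONARY.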

Unsupported Arrow data types:

  • JSON
  • UUID
  • ENUM

The data types of ClickHouse table columns do not have to match the corresponding Arrow data fields. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to the data type set for the ClickHouse table column.
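
To check which ClickHouse types are derived from an Arrow file before any cast to the table's column types, the file's schema can be inspected with the file table function. This is a sketch under assumptions: it presumes clickhouse-local is installed and that a file named filename.arrow exists in the current directory. When either is missing, the block only prints the query.

```shell
# Assumed file name; point this at a real Arrow file.
file="filename.arrow"

# DESCRIBE on the file table function reports the column names and inferred types.
query="DESCRIBE TABLE file('${file}', 'Arrow')"
printf '%s\n' "$query"

# Run only when the tool and the file are both present.
if command -v clickhouse-local >/dev/null 2>&1 && [ -f "$file" ]; then
    clickhouse-local --query="$query" \
        || echo "schema inspection failed" >&2
fi
```

Comparing this output with the target table's definition shows which columns will be cast on insert.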

Example Usage

Inserting Data

You can insert Arrow data from a file into a ClickHouse table using the following command:

$ cat filename.arrow | clickhouse-client --query="INSERT INTO some_table FORMAT Arrow"

Selecting Data

You can select data from a ClickHouse table and save it to a file in the Arrow format using the following command:

$ clickhouse-client --query="SELECT * FROM some_table FORMAT Arrow" > filename.arrow

Format Settings

Setting | Description | Default
input_format_arrow_allow_missing_columns | Allow missing columns while reading Arrow input formats | 1
input_format_arrow_case_insensitive_column_matching | Ignore case when matching Arrow columns with ClickHouse columns | 0
input_format_arrow_import_nested | Obsolete setting, does nothing | 0
input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference | Skip columns with unsupported types during schema inference for the Arrow format | 0
output_format_arrow_compression_method | Compression method for the Arrow output format. Supported codecs: lz4_frame, zstd, none (uncompressed) | lz4_frame
output_format_arrow_fixed_string_as_fixed_byte_array | Use the Arrow FIXED_SIZE_BINARY type instead of Binary for FixedString columns | 1
output_format_arrow_low_cardinality_as_dictionary | Output the LowCardinality type as the Arrow Dictionary type | 0
output_format_arrow_string_as_string | Use the Arrow String type instead of Binary for String columns | 1
output_format_arrow_use_64_bit_indexes_for_dictionary | Always use 64-bit integers for dictionary indexes in the Arrow format | 0
output_format_arrow_use_signed_indexes_for_dictionary | Use signed integers for dictionary indexes in the Arrow format | 1
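
These settings can be set per invocation. The sketch below passes them to clickhouse-client as command-line options in the --name=value form. The names some_table, filename.arrow and compressed.arrow are placeholders, and zstd is just one of the codecs listed in the table. The commands run only when clickhouse-client is on the PATH.

```shell
table="some_table"          # placeholder table name
input_file="filename.arrow"     # assumed existing Arrow file to load
output_file="compressed.arrow"  # hypothetical output file name

# Settings given as client options apply to the queries of that invocation.
insert_opt="--input_format_arrow_case_insensitive_column_matching=1"
select_opt="--output_format_arrow_compression_method=zstd"

if command -v clickhouse-client >/dev/null 2>&1; then
    # Load the file, matching Arrow column names to table columns without regard to case.
    if [ -f "$input_file" ]; then
        clickhouse-client "$insert_opt" --query="INSERT INTO ${table} FORMAT Arrow" < "$input_file" \
            || echo "insert failed" >&2
    fi
    # Export the table with zstd compression instead of the default lz4_frame.
    clickhouse-client "$select_opt" --query="SELECT * FROM ${table} FORMAT Arrow" > "$output_file" \
        || echo "select failed" >&2
fi
```

Passing a setting as a client option keeps the query text unchanged, which is convenient when the same query is reused with different codecs.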