Protobuf
Input | Output | Alias |
---|---|---|
✔ | ✔ |
Description
The Protobuf
format is the Protocol Buffers format.
This format requires an external format schema, which is cached between queries.
ClickHouse supports:
- both
proto2
andproto3
syntaxes. Repeated
/optional
/required
fields.
Example Usage
Basic Examples
Usage examples:
SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:MessageType'
cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table SETTINGS format_schema='schemafile:MessageType' FORMAT Protobuf"
Where the file schemafile.proto
looks like this:
syntax = "proto3";
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
To find the correspondence between table columns and fields of the Protocol Buffers’ message type, ClickHouse compares their names.
This comparison is case-insensitive and the characters _
(underscore) and .
(dot) are considered as equal.
If the types of a column and a field of the Protocol Buffers’ message are different, then the necessary conversion is applied.
Nested messages are supported. For example, for the field z
in the following message type:
message MessageType {
message XType {
message YType {
int32 z;
};
repeated YType y;
};
XType x;
};
ClickHouse tries to find a column named x.y.z
(or x_y_z
or X.y_Z
and so on).
Nested messages are suitable for input or output of a nested data structures.
Default values defined in a protobuf schema like the one that follows are not applied, rather the table defaults are used instead of them:
syntax = "proto2";
message MessageType {
optional int32 result_per_page = 3 [default = 10];
}
ClickHouse inputs and outputs protobuf messages in the length-delimited
format.
This means that before every message its length should be written as a variable width integer (varint).
See also: how to read/write length-delimited protobuf messages in popular languages.
Using Autogenerated Schema
If you don't have an external Protobuf schema for your data, you can still output/input data in the Protobuf format using an autogenerated schema.
For example:
SELECT * FROM test.hits format Protobuf SETTINGS format_protobuf_use_autogenerated_schema=1
In this case, ClickHouse will autogenerate the Protobuf schema according to the table structure using function structureToProtobufSchema
.
It will then use this schema to serialize data in the Protobuf format.
You can also read a Protobuf file with the autogenerated schema. In this case it is necessary for the file to be created using the same schema:
$ cat hits.bin | clickhouse-client --query "INSERT INTO test.hits SETTINGS format_protobuf_use_autogenerated_schema=1 FORMAT Protobuf"
The setting format_protobuf_use_autogenerated_schema
is enabled by default and applies if format_schema
is not set.
You can also save autogenerated schema in the file during input/output using setting output_format_schema
. For example:
SELECT * FROM test.hits format Protobuf SETTINGS format_protobuf_use_autogenerated_schema=1, output_format_schema='path/to/schema/schema.proto'
In this case autogenerated Protobuf schema will be saved in file path/to/schema/schema.capnp
.
Drop Protobuf Cache
To reload the Protobuf schema loaded from format_schema_path
use the SYSTEM DROP ... FORMAT CACHE
statement.
SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf