Reading data from S3
Using the ClickHouse s3 table function, users can query S3 data in place without persisting it in ClickHouse. The following example reads 10 rows from the NYC Taxi dataset.
SELECT
    trip_id,
    total_amount,
    pickup_longitude,
    pickup_latitude,
    dropoff_longitude,
    dropoff_latitude,
    pickup_datetime,
    dropoff_datetime,
    trip_distance
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
LIMIT 10
SETTINGS input_format_try_infer_datetimes = 0;
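The s3 function also infers column names and types from the files. If you want to inspect that inferred schema before querying, DESCRIBE can be run directly against the table function; a minimal sketch using the same public NYC Taxi files as above:
DESCRIBE TABLE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames');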
Inserting data from S3
To transfer data from S3 into ClickHouse, users can combine the s3 table function with an INSERT statement. Let's first create an empty hackernews table:
CREATE TABLE hackernews
ENGINE = MergeTree
ORDER BY tuple()
EMPTY AS SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz', 'CSVWithNames');
This creates an empty table using the schema inferred from the data; the columns and types that inference produced can be checked as shown below.
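As a quick sanity check (an optional step, not part of the original walkthrough), SHOW CREATE TABLE prints the full table definition, including the inferred columns:
SHOW CREATE TABLE hackernews;
We can then insert the first 1 million rows from the remote dataset: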
INSERT INTO hackernews SELECT *
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz', 'CSVWithNames')
LIMIT 1000000;
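To confirm the load, a simple count should return 1,000,000, assuming the source file contains at least that many rows (a hedged verification query, not part of the original example):
SELECT count() FROM hackernews;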