How to Validate if Two Queries Return the Same Result-sets
· 2 min read
Learn how to validate that two ClickHouse queries produce identical result-sets using hash functions and comparison techniques.
Question
How can I validate that two queries return the same resultsets?
Answer
You can use the below approach:
WITH
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 1 here
-- SELECT ...
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 2 here
-- SELECT ...
)
) AS q2_resultset_hash
SELECT equals(q1_resultset_hash,q2_resultset_hash) as Q1_equals_Q2
The example uses a CTE to calculate sums of the cityHash value of each row in these two queries and will return 1
if the two resultsets are identical.
Using some integers sequence data and some pretty formatting:
WITH
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number DESC
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number ASC
)
) AS q2_resultset_hash
SELECT q1_resultset_hash = q2_resultset_hash AS Q1_equals_Q2
FORMAT Pretty
will return:
┏━━━━━━━━━━━━━━┓
┃ Q1_equals_Q2 ┃
┡━━━━━━━━━━━━━━┩
│ 1 │
└──────────────┘
While this can be handy in many scenarios, it can't be considered as a silver bullet to validate equality of resultsets for all types and there are caveats to using it, for example if any row contains NULL
values the above approach will fail.