In 2021, Alexander Kuzmenkov shared a little bit about the work going into fuzzing ClickHouse database and how our core Engineering team at ClickHouse and many contributors are continuously making improvements with our test coverage. As Alexander said, “Testing is a major problem in software development: there is never enough of it.”.
Fuzzing has always been a big part of ClickHouse. In the same year with Alexander’s blog post, our founder, Alexey, also spoke at the Siberia C++ User Group about our various approaches to fuzzing that the ClickHouse team employs to find bugs (video & slide). The presentation covered many topics, from simple fuzzing techniques to more complex ones using libfuzzer and AST Fuzzer approach.
Beside the works above, we are also actively engaging with other security teams, researchers, and partners to work on fuzzing the ClickHouse database. In this blog post, we will be talking about one of our recent experiences with the WINGFUZZ team. Through this collaboration and the innovative fuzzing technology that the team built, we were able to discover several new issues and interesting edge cases in ClickHouse.
Inside WINGFUZZ Technology
To understand how this works together, let's take a look under the hood of the WINGFUZZ fuzzing engine, as it’s explained by their team.
The WINGFUZZ database fuzzing engine can help generate semantically correct SQL sequences with abundant types. It mainly utilizes two optimization technologies: metadata based grammar-free mutation, and type-affinity based sequence generation.
Basically, every DBMS has its own unique features and SQL dialect, including ClickHouse. Rather than s relying on the grammar, WINGFUZZ summarizes the DB’s state into a metadata graph, a lightweight data structure which improves mutation correctness in fuzzing.
Specifically, it first tracks the metadata of the statements in built-in SQL test cases as they are executed, and constructs the metadata graph to describe the dependencies between metadata and statements iteratively. Based on the metadata graphs, it generates test cases with new sequences by randomly reshuffling existing statements, and operates metadata-guided substitution to correct semantic errors that occur after the reshuffle.
The technical details of the grammar-free fuzzing can be found in the "Griffin: Grammar-Free DBMS Fuzzing" research paper.
In order to explore a larger input space, WINGFUZZ can generate SQL sequences with many more types than other fuzzers. The key idea of sequence generation is type-affinity, which indicates the meaningful occurrence of SQL type pairs (e.g., INSERT and SELECT).
During each fuzzing iteration, WINGFUZZ first proactively explores SQL statements generated by sequence-oriented mutation and analyzes affinities with coverage feedback. The sequence-oriented mutations include substitutions, insertions, and deletions, as shown below. If the mutated seed covers new code regions in DBMS, the type-affinity caused by type changes will be recorded.
Next, when a new type-affinity is recorded, WINGFUZZ progressively generates new SQL type sequences containing the type pairs and instantiates them into actual SQL test cases. The figure below shows how WINGFUZZ finds a meaningful SQL sequence, which consists of CREATE, INSERT, CREATE TRIGGER, INSERT, and SELECT.
More technical details of the sequence-oriented fuzzing can be found in the “Sequence-Oriented DBMS Fuzzing” research paper.
Issues & Fixes
With so many test queries readily available in our extensive test folder, the WINGFUZZ team was able to quickly adapt their unique approach and started fuzzing ClickHouse.
While there were other bugs being reported by the WINGFUZZ team in the past few months, the team recently made some adjustments and reported another list of 8 unique findings. These were immediately addressed by our core Engineering team as soon as they landed as GitHub issues (screenshot below).
Interestingly, some of the findings, such as this, were only introduced four months ago. This shows that with the speed of development at ClickHouse, fuzzing plays a crucial role in ensuring stability and security of our product as we move forward and build new features.
What's next?
The key takeaway from this blog post is the importance of improving fuzzing techniques for every Database technology. The more fuzzing, the more coverage, the better. At ClickHouse, we recognize this and continuously integrate additional fuzzers into our daily development process as we expand our feature set. This proactive approach ensures that each version of ClickHouse is delivered with the utmost focus on security and reliability.
Similar to the previous fuzzing efforts, our team did not stop at just these initial findings. In collaboration with the WINGFUZZ team and the engineering team at ClickHouse, our security team is working on integrating the WINGFUZZ fuzzing engine into our day-to-day tests. This will be covered in the next part of this series of blog posts.
Finally, I would like to share a quote from our founder, Alexey, addressed to the WINGFUZZ team. This quote stands as an open challenge to all fuzzer enthusiasts out there!