ClickHouse is a fast, open-source, column-oriented database for online analytical processing (OLAP). Developed by the Russian IT company Yandex, ClickHouse allows analysis of data that is updated in real time.
It is queried with SQL, and its simplicity and expressiveness match those of PL/pgSQL. It is marketed for its high performance, and it also includes higher-order functions for working with arrays, such as arrayFilter and arrayMap.
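To give a feel for what the higher-order array functions do, here is a rough Python analogue. It is only a sketch of the semantics: the helper names array_map and array_filter and the sample data are hypothetical stand-ins, not part of any ClickHouse client library. In ClickHouse itself the lambda is written inside the SQL query, in the form x -> expression.

```python
def array_map(func, arr):
    """Apply func to every element, in the spirit of arrayMap(x -> ..., arr)."""
    return [func(x) for x in arr]


def array_filter(pred, arr):
    """Keep the elements for which pred is true, in the spirit of arrayFilter(x -> ..., arr)."""
    return [x for x in arr if pred(x)]


# Hypothetical sample data: request latencies in milliseconds.
latencies_ms = [120, 45, 300, 80]

doubled = array_map(lambda x: x * 2, latencies_ms)
slow = array_filter(lambda x: x > 100, latencies_ms)

print(doubled)  # [240, 90, 600, 160]
print(slow)     # [120, 300]
```

The point is that the transformation and the filter are passed in as small functions, so array columns can be processed inside a query without unpacking them into separate rows first.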
What Do We Use ClickHouse For?
In simple words, ClickHouse is used to run fast analytics when you have a large amount of data at hand. It is not ideal as a transactional database, but it can run queries over billions of rows in a few seconds.
It allows teams to store and analyze large data sets of high dimensionality with ease. It outshines its competitors in terms of query complexity and speed.
ClickHouse is suitable only for analytic workloads. If your queries rely heavily on GROUP BY and aggregate functions such as sum, count, and count distinct, ClickHouse will be very fast even when the row count runs to 11 or 12 digits. If you mostly need point queries and large-scale joins, a conventional RDBMS will be a better choice.
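The difference between aggregations and point queries on a column store can be sketched in a few lines of Python. This is a toy model with hypothetical data and helper names (sum_by, fetch_row), not how ClickHouse is implemented: it only shows that a grouped sum touches the two columns it needs, while rebuilding a single row has to visit every column.

```python
from collections import defaultdict

# A tiny table stored column by column (hypothetical data).
columns = {
    "user_id": [1, 2, 1, 3, 2, 1],
    "country": ["DE", "US", "DE", "FR", "US", "DE"],
    "amount": [10, 25, 5, 40, 15, 20],
}


def sum_by(columns, key, value):
    """Rough equivalent of SELECT key, sum(value) ... GROUP BY key.

    Only the two columns involved are read.
    """
    totals = defaultdict(int)
    for k, v in zip(columns[key], columns[value]):
        totals[k] += v
    return dict(totals)


def fetch_row(columns, index):
    """A point query has to touch every column to rebuild one row."""
    return {name: values[index] for name, values in columns.items()}


print(sum_by(columns, "country", "amount"))  # {'DE': 35, 'US': 40, 'FR': 40}
print(fetch_row(columns, 3))  # {'user_id': 3, 'country': 'FR', 'amount': 40}
```

With three columns the difference is invisible, but on a wide table the aggregation still reads two columns while the point query reads all of them.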
It is important to note that ClickHouse does not treat DELETE and UPDATE as routine operations. Where it does support them, as ALTER TABLE mutations, they have to rewrite compressed data in the background and are slow. This database is not made for transactional workloads.
ClickHouse also compresses your data; how much space it saves depends on column granularity and how much query speed you can sacrifice. The key to its quick queries lies in the way ClickHouse partitions and compresses its data.
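Why column layout and ordering matter for compression can be illustrated with a small Python sketch. Run-length encoding is used here purely as a stand-in because it is easy to read; ClickHouse itself relies on codecs such as LZ4 and ZSTD, and the sample column is hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# One column of a hypothetical table, in arrival order.
country = ["US", "DE", "US", "FR", "DE", "US", "DE", "US"]

unsorted_runs = run_length_encode(country)
sorted_runs = run_length_encode(sorted(country))

print(len(unsorted_runs))  # 8
print(sorted_runs)         # [('DE', 3), ('FR', 1), ('US', 4)]
```

Storing similar values next to each other, which is what a column store with a sort key does, turns eight separate entries into three runs. Real codecs work differently in detail, but they benefit from the same kind of regularity.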
What Makes ClickHouse So Amazing?
#1
Even if you need data distribution and replication across a dozen machines, ClickHouse is easy to set up and use.
If you are familiar with ClickHouse, deploying a cluster on 3 machines is almost instant; if you are learning it from scratch, it may take a few hours to do it properly.
You won’t have to worry about endless configuration files, obscure files or user permission issues with data distribution and replication.
#2
A lot of Big Data technologies break down without warning, sometimes without any apparent reason. Worse, their humongous logs give little information about the cause of the problem.
ClickHouse, on the other hand, is not fragile and is in fact designed to recover from failure. It produces a reasonable system log with explanations of the problems it encountered.
#3
ClickHouse doesn’t redesign your method; it only adds speed to it. The best thing is that it doesn’t limit you or force you to change the way you store or query your data.
From matrices to nested structures, its data types cover everything. It has a very versatile query language in addition to a huge library of functions.
While other column-store solutions force you to change your schema, ClickHouse allows you to duplicate a transactional database schema.
#4
The worst thing about many Big Data technologies is that they seem designed to sell support. For example, Druid is free and open source in theory, but its complexity, combined with a lack of documentation and community involvement, makes it frustrating to use.
ClickHouse, on the other hand, has amazing documentation and a brilliant community. Its documentation is easy to search through and is always being expanded and improved.
The ClickHouse community is tiny but gets the job done. It consists of developers and users from Yandex and some early adopters. The community may be small, but that doesn’t stop its members from replying to every issue on GitHub and Stack Overflow within hours.
The expressive syntax and amazing query speed of this database are making developers all over the world excited about it. You can put it to the test over a few billion rows and see for yourself how capable ClickHouse is.