From the internship, I started to contact ClickHouse, which can be regarded as the cognition of database before opening. “There is something different about the database.” For details, see the clickHouse website. Simply put clickhouse is a place for good design.
clickhouse
The column type storage
The feature of row storage is that a single data record is set together, which is closer to the relational model. The writing efficiency is high, and a complete data record can be obtained quickly when reading, showing the locality of data.
But row storage is not friendly to OLAP analysis queries:
- When the query will be the whole of the field query out, and returned to the user is generally a few fields, so actually a large number of
IO
The operation is invalid. - Reading into memory is a block structure. Depending on the computer storage medium, there is actually a lot of invalid data filling the CPU cache line.
Clickhouse stores each field in a single file in storage and uses sparse indexes to greatly speed up queries on wide tables.
Table engine
Rich table engine, give developers more choices and adapt to a variety of scenarios. Here are some of the engines I use a lot:
Mysql
: Direct mappingmysql table
Can be used by youclickhouse
Rich functions to operate onmysql
The data.MergeTree
: You could say this isck
The largest branch of the table engine series,Replicated+
Support for data copy,Replacing+
You can delete duplicated data. (Further details can be discussed in separate articles)Kafka
: can be directly used asKafka
The consumer side, at the same timeMaterialized Views
Data persistence can be achieved.
Of course, in addition to these, the official website there are many engines, see the specific
More
Batch insert
, it is recommended to write in batches, which is also in line with the original design of these software database. To reduceIO
To improve write performance.- Rich functions with rich data types, really pure
SQL
Development.
The above mentioned ck’s many excellent features, how to best use these features to operate CK in development?
Go exposes the top-level database/ SQL interface, and there are already drivers like Clickhouse-Go that can manipulate CK just like the native API. But not enough:
- How to use
batch insert
Improve write performance? - Distributed table or local table, which write should be selected?
- The amount of returned data is too large, how to use
go
Good design returns data?
That’s why we designed CKGroup: to provide a more user-friendly library for ClickHouse.
ckgroup
Ckgroup is a clickhouse-Go wrapper that provides a more user-friendly and convenient API for developers to use.
features
- pure
Golang
The development of - The cluster is automatically done during the insert
hash
Sharding, avoidclickhouse
Insertion performance loss and uneven data distribution due to internal fragmentation - Insert failure retry mechanism
- The query result is automatically converted to
struct
The installation
Go to 1.13
And above, support the three latest versions of GoClichouse (+ 19.16)
$ go get -u github.com/tal-tech/cds/tools/ckgroup
Copy the code
Rapid experience
Make sure you have docker, docker-compose installed. After downloading ckGroup, you can run./demo.sh directly.
As shown in the figure above, demo.sh does the following:
- Created to
ReplicatedMergeTree
A table of engines. The data will be stored as a copy inclickhouse
In the. ExecAuto()
Insert will pre-insert each oneshard
This can be seen in the amount of data printed at each node.QueryRow()
The query results
use
Ckgroup is a wrapper around Clickhouse-Go. In use, developers only need to import ckGroup, do not need to import other driver, clickhouse can be operated.
In this case, we prepare two statements:
- One for inserting tuples (rows)
- The other is for queries.
import "github.com/tal-tech/cds/tools/ckgroup"
// Fill the config
var (
ckgroupConfig = config.Config{
ShardGroups: []config.ShardGroupConfig{
{ShardNode: "tcp://localhost:9000", ReplicaNodes: []string{"tcp://localhost:9001"}},
{ShardNode: "tcp://localhost:9002", ReplicaNodes: []string{"tcp://localhost:9003"}}},})func main(a) {
group := ckgroup.MustCKGroup(c)
// Ready data
var args [][]interface{}
for _, item := range generateUsers() {
args = append(args, []interface{}{item.Id, item.RealName, item.City})
}
// Batch insert, ckgroup will help you to make the shard
err := group.ExecAuto(`insert into user (id,real_name,city) values (? ,? ,?) `.0, args)
iferr ! =nil {
panic(err) // Just for example purpose
}
// Query multi rows of the user in Shanghai
datas := make([]user, 0)
err := group.QueryRows(&datas, `select id,real_name,city from user where city=? `."Shanghai")
iferr ! =nil {
panic(err)
}
for i := range datas {
fmt.Println(datas[i])
}
}
func generateUsers(a) []user {
var users []user
for i := 0; i < 10000; i++ {
item := user{
Id: i,
RealName: fmt.Sprint("real_name_", i),
City: "test_city",
}
users = append(users, item)
}
return users
}
Copy the code
Feel free to contribute your own examples!
TODO
- Change to an interface implementation to facilitate test mocks
- Improve the ease of use of inserts
- streaming
- .
The project address
ckgroup