This article is based on notes I shared publicly a few months ago, with some expansions.

I hope this article provides a simple, step-by-step guide to getting started with ClickHouse. If you have a potential data-analysis need but don't know where to start, this article is for you.

Preface

I don't think I need to repeat the usual praise for ClickHouse's performance. As a rule of thumb, at a fraction of the cost it can handle quasi-real-time data analysis that an RDBMS such as MySQL cannot, consolidate data migrated from multiple remote database instances, and serve as archive storage.

Thanks to a good friend's recommendation two years ago, I got started with ClickHouse, and it has spared us a great deal of unnecessary infrastructure setup and maintenance. We have built a fairly luxurious machine (256 cores, 512GB of memory) for quasi-real-time data analysis (pulling results out of massive data in a few ten-thousandths of a second) while absorbing writes of hundreds of thousands of rows per second without hurting query performance. I have also run edge-computing tasks on a NUC costing about 2,000 RMB and obtained the needed analysis results reasonably quickly (in one-thousandth to one-hundredth of a second), and I have tried complex computations over billion-row datasets (minute-level) on an ordinary 16-core, 64GB notebook.

So ClickHouse might also be perfect for you if you have the following needs:

  • Quick analysis of offline data: computation, aggregation, and filtering.
  • Your workload is read-heavy, with very little need to modify existing data; where modification is needed, you can accept the "append data" plus "version filtering" approach (see the sketch after this list).
  • The data is relatively rich in fields, i.e., it has many "columns".
  • Query concurrency is low: anywhere from a handful up to one or two hundred concurrent queries (consumers).
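For the "append data" plus "version filtering" pattern mentioned above, here is a minimal hedged sketch using a ReplacingMergeTree table (the table and column names are made up for illustration):

CREATE TABLE user_profile
(
    `user_id` UInt64,
    `nickname` String,
    `version` UInt32
)
ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- "Updates" are plain appends that carry a higher version number.
INSERT INTO user_profile VALUES (1, 'old-name', 1), (1, 'new-name', 2);

-- FINAL filters out the superseded versions at query time.
SELECT * FROM user_profile FINAL;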

Let’s start with hardware choices.

Hardware selection strategy

If you are considering ClickHouse, your business has probably reached the point where you need, or are forced, to "think about efficiency."

In general, there are very few cases where ClickHouse results must be returned synchronously to the caller. Frugal programmers tend to use it asynchronously, so with very little concurrency our hardware requirements for ClickHouse keep getting lower: for data below the 100-million-row level, even a minimal 4-core, 16GB virtual machine copes easily; for hundreds of millions to billions of rows, 32 to 64GB of memory is enough, and the computation time then depends almost exclusively on the number of cores in your machine and the size of its CPU cache.

So when considering ClickHouse for fast, or relatively fast, "offline" data analysis, first think about how much data you have and whether you can meet the memory threshold for fast computation, then about how quickly you need results; within your budget, prioritize CPUs with more cores and larger caches. Cluster mode is not required by default, unless you need to serve real-time interfaces, depend heavily on service availability, or face exceptionally high write pressure. And if you do configure a cluster, I do not recommend the default Distributed approach: the data is not a complete replica but is spread evenly across the nodes, so if one node goes down, you lose one-Nth of your "real-time" data, and the final results can no longer be called correct, only "possibly wrong." The project has since launched a "Replicated" database engine, built on the Atomic engine, which uses ZooKeeper for full data replication. It is still experimental, but it beats losing data; a minimal sketch follows.
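As promised, a hedged sketch of enabling that experimental engine (the database name, shard/replica names, and ZooKeeper path here are illustrative, and the experimental setting name may vary between versions):

SET allow_experimental_database_replicated = 1;

CREATE DATABASE my_db
ENGINE = Replicated('/clickhouse/databases/my_db', 'shard_1', 'replica_1');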

Another factor with a significant impact on how long ClickHouse takes to produce results is storage. SSDs are recommended; if you only analyze small samples, even terabytes of inexpensive consumer-grade drives will do, and for extreme cost savings you can refer to my earlier article on cheap home workstation builds. If you run high-frequency computation over massive data, need relatively large storage capacity, and expect the data to keep growing, then, weighing cost against storage reliability, mechanical disks in RAID 50 are the better fit.

Of course, if you have nothing yet and just want to learn, spin up a Docker container and start the journey there; it is also enough for computation and analysis over millions to tens of millions of rows.

Software Environment selection

All of my current machines run Ubuntu plus containers. Why this environment? Because "Ubuntu is a first-class citizen of the container world." This article therefore also uses a container environment for the quick start.

Of course, if you choose to install ClickHouse on bare metal, the more stable Debian is a good option, while CentOS at this point is really not recommended (enterprises paying for RHEL is another topic).

Running ClickHouse in a container loses some performance to "translation" overhead, in both storage and network forwarding, so if you can install ClickHouse on bare metal in production, run it outside a container.

If you have the Docker environment installed, we can move on to the next chapter. If you are not familiar with how to install Docker, please refer to the contents about container installation in the knowledge map of this site.

Preparation: The data set used for the test

To familiarize yourself with the basic syntax and get a feel for ClickHouse's high performance, you can start by experimenting with the official Yandex.Metrica data. (For more performance testing, see the test datasets in the official repository.)

  1. https://datasets.clickhouse.tech/hits/partitions/hits_v1.tar
  2. https://datasets.clickhouse.tech/visits/partitions/visits_v1.tar

In addition, to demonstrate how to quickly import data without fussing over type conversion, we also need a dataset from a traditional database. Here I chose the open-source (MySQL) database of a fellow netizen's movie/TV project: https://yyets.dmesg.app/database.

After the data is downloaded, we need to decompress the data first.

mkdir data
tar xvf hits_v1.tar -C data
tar xvf visits_v1.tar -C data

Using the du command, you can see that the extracted data actually occupies 1.7GB of space. Incidentally, if the same data were stored in MySQL, it would take roughly 3 to 5 times as much space.

du -hs data
1.7G	data

With the data decompressed, you’re ready to run the configuration on ClickHouse’s container.

Preparation: The ClickHouse runtime configuration

version: "2"

services:

  server:
    image: yandex/clickhouse-server:21.9.4.35
    container_name: clickhouse
    expose:
      - 9000
      - 8123
      - 9009
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    environment:
      - TZ=Asia/Shanghai
      # - CLICKHOUSE_USER=root
      # - CLICKHOUSE_PASSWORD=xmnzdwH5
    volumes:
      - ./data:/var/lib/clickhouse
      # Use as needed
      # - ./config.xml:/etc/clickhouse-server/config.xml
      # - ./users.xml:/etc/clickhouse-server/users.xml

Save the above configuration as docker-compose.yml, then start ClickHouse with docker-compose up -d for later use.

One more note: ClickHouse versions iterate quickly, so before switching versions it is recommended to run some small-sample tests to confirm that your common scenarios still work.
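A quick way to confirm exactly which build you are talking to before and after such a change is the built-in version() function:

SELECT version();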

First taste of ClickHouse

The SQL syntax ClickHouse accepts is much looser than that of databases such as MySQL; by analogy, it feels like a longtime Java programmer jumping into the world of Python and JavaScript. A small example follows.
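As one small illustration of that looseness (a hedged example against the hits table we will work with below): ClickHouse happily reuses a SELECT alias inside WHERE, which MySQL rejects.

SELECT UserID % 10 AS bucket
FROM hits
WHERE bucket = 3
LIMIT 5;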

Because ClickHouse is started using a container, we can access the Interactive terminal of ClickHouse through the Docker exec command.

docker exec -it clickhouse clickhouse-client

After entering the terminal, let’s have a look at the “databases” and tables:

# Query the databases
cc1b062138da :) show databases

SHOW DATABASES

Query id: efaa1c51-e112-43d6-b803-1e6dd86ad43b

┌─name─────┐
│ datasets │
│ default  │
│ system   │
└──────────┘

3 rows in set. Elapsed: 0.003 sec.

# Switch database
cc1b062138da :) use datasets

USE datasets

Query id: b10ff8f3-0743-42f4-9ee1-663b9a2c4955

Ok.

0 rows in set. Elapsed: 0.002 sec.

# Check the tables
cc1b062138da :) show tables

SHOW TABLES

Query id: ea2c6eb8-2036-4576-9bb7-74ad4e1c7de9

┌─name──────┐
│ hits_v1   │
│ visits_v1 │
└───────────┘

2 rows in set. Elapsed: 0.005 sec.

The datasets entry in the result above is the database we imported. ClickHouse stores its data as plain files on disk; if you look at the local directory we mounted, you will see the table data under data/data/datasets.

tree -L 3 data/data/datasets
data/data/datasets
├── hits_v1
│   ├── 201403_10_18_2
│   │   ├── AdvEngineID.bin
│   │   ├── AdvEngineID.mrk
│   │   ├── Age.bin
│   │   ├── Age.mrk
│   │   ├── ...
│   │   ├── UserID.bin
│   │   ├── UserID.mrk
│   │   ├── WatchID.bin
│   │   ├── WatchID.mrk
│   │   ├── YCLID.bin
│   │   ├── YCLID.mrk
│   │   ├── checksums.txt
│   │   ├── columns.txt
│   │   ├── count.txt
│   │   ├── minmax_EventDate.idx
│   │   ├── partition.dat
│   │   └── primary.idx
│   ├── detached
│   └── format_version.txt
└── visits_v1
    ├── ...
    ├── detached
    └── format_version.txt

6 directories, 675 files

To make subsequent typing easier, let’s rename the table first.

# Remove the version suffix from both tables

cc1b062138da :) rename table hits_v1 to hits

RENAME TABLE hits_v1 TO hits

Query id: dba1405a-1836-4d5a-af23-3ce0f5b31d41

Ok.

0 rows in set. Elapsed: 0.014 sec.

cc1b062138da :) rename table visits_v1 to visits

RENAME TABLE visits_v1 TO visits

Query id: 9ffc039c-86c3-42a9-91a6-3ed165254e0b

Ok.

0 rows in set. Elapsed: 0.012 sec.

# Check the tables again
cc1b062138da :) show tables

SHOW TABLES

Query id: c7fda91f-b7c5-2244-9a6c-ce4cf37f9fa8

┌─name───┐
│ hits   │
│ visits │
└────────┘

2 rows in set. Elapsed: 0.004 sec.

After renaming the table, let’s see how much data is in the two tables.

SELECT
    hits,
    visits
FROM 
(
    SELECT count() AS hits
    FROM hits
) AS table_hits
, 
(
    SELECT count() AS visits
    FROM visits
) AS table_visits

As you can see, both tables have very small amounts of data, in the millions to tens of millions.

┌────hits─┬──visits─┐
│ 8873898 │ 1676861 │
└─────────┴─────────┘

1 rows in set. Elapsed: 0.005 sec.

Having renamed the tables, let's look at their structure. The two tables have 133 and 181 columns respectively: proper "wide" tables, very well suited to analysis.

cc1b062138da :) desc hits

DESCRIBE TABLE hits

Query id: b8d8b650-2395-4207-b2ce-4200dc9c0fce

┌─name──────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ WatchID       │ UInt64          │              │                    │         │                  │                │
│ JavaEnable    │ UInt8           │              │                    │         │                  │                │
│ Title         │ String          │              │                    │         │                  │                │
│ EventDate     │ Date            │              │                    │         │                  │                │
│ CounterID     │ UInt32          │              │                    │         │                  │                │
│ ClientIP6     │ FixedString(16) │              │                    │         │                  │                │
│ RegionID      │ UInt32          │              │                    │         │                  │                │
│ UserID        │ UInt64          │              │                    │         │                  │                │
│ URL           │ String          │              │                    │         │                  │                │
│ Referer       │ String          │              │                    │         │                  │                │
│ ...           │ ...             │              │                    │         │                  │                │
│ IslandID      │ FixedString(16) │              │                    │         │                  │                │
└───────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

(output truncated to a handful of the 133 columns)

133 rows in set. Elapsed: 0.003 sec.

Here is the other table:

cc1b062138da :) desc visits

DESCRIBE TABLE visits

Query id: 821e1692-b571-42ad-a38f-88bd120e478d

┌─name──────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ CounterID     │ UInt32          │              │                    │         │                  │                │
│ StartDate     │ Date            │              │                    │         │                  │                │
│ Sign          │ Int8            │              │                    │         │                  │                │
│ IsNew         │ UInt8           │              │                    │         │                  │                │
│ VisitID       │ UInt64          │              │                    │         │                  │                │
│ UserID        │ UInt64          │              │                    │         │                  │                │
│ Duration      │ UInt32          │              │                    │         │                  │                │
│ UTCStartTime  │ DateTime        │              │                    │         │                  │                │
│ ...           │ ...             │              │                    │         │                  │                │
│ IslandID      │ FixedString(16) │              │                    │         │                  │                │
└───────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

(output truncated to a handful of the 181 columns)

181 rows in set. Elapsed: 0.004 sec.

Data query operations

Next, let’s do some basic query operations on the data in these two tables.

Grouped queries: getting the Top N records

A frequent need in daily business is to pick the "hot" records out of a large amount of data. Suppose we want to count visits per sub-module of the site by its "counter ID"; we can run the following statement:

cc1b062138da :) SELECT CounterID, count() AS nums FROM hits GROUP BY CounterID ORDER BY nums DESC LIMIT 10;

SELECT
    CounterID,
    count() AS nums
FROM hits
GROUP BY CounterID
ORDER BY nums DESC
LIMIT 10

Query id: 4206896d-e52e-48fc-bad1-3fe60949f90a

┌─CounterID─┬───nums─┐
│   1704509 │ 523264 │
│    732797 │ 475698 │
│    598875 │ 337212 │
│    792887 │ 252197 │
│   3807842 │ 196036 │
│  25703952 │ 147211 │
│    716829 │  90109 │
│     59183 │  85379 │
│  33010362 │  77807 │
│    800784 │  77492 │
└───────────┴────────┘

10 rows in set. Elapsed: 0.032 sec. Processed 8.87 million rows, 35.50 MB (277.95 million rows/s., 1.11 GB/s.)

Or, if we wanted to build something like the "trending searches" lists on Sina Weibo or Zhihu, we could display the top ten hot keywords and their search counts like this:

cc1b062138da :) SELECT SearchPhrase AS keyword, count() AS nums FROM hits GROUP BY keyword ORDER BY nums DESC, keyword LIMIT 10

SELECT
    SearchPhrase AS keyword,
    count() AS nums
FROM hits
GROUP BY keyword
ORDER BY
    nums DESC,
    keyword ASC
LIMIT 10

Query id: 62952a2b-858a-463d-b5d3-7af60b9f7e92

┌─keyword────────────────────┬────nums─┐
│                            │ 8267016 │
│ ст12.168.01                │    3567 │
│ orton                      │    2402 │
│ фры лица игы н дизан       │    2166 │
│ imgsrc                     │    1848 │
│ брызговики                 │    1659 │
│ индийский афтозный         │    1549 │
│ оооотвода и                │    1480 │
│ выступная нин              │    1247 │
│ юность                     │    1112 │
└────────────────────────────┴─────────┘

10 rows in set. Elapsed: 0.096 sec. Processed 8.87 million rows, 112.70 MB (92.77 million rows/s., 1.18 GB/s.)

Exact filtering by numeric value

For example, suppose we want to count only the records whose AdvEngineID is not 0, i.e., filter the data precisely by a numeric value; we can write:

cc1b062138da :) SELECT count() FROM hits WHERE AdvEngineID != 0

SELECT count()
FROM hits
WHERE AdvEngineID != 0

Query id: e3bc420a-ef87-42a4-879e-bbf548d2e3a0

┌─count()─┐
│   30641 │
└─────────┘

1 rows in set. Elapsed: 0.013 sec. Processed 8.87 million rows, 8.87 MB (668.11 million rows/s., 668.11 MB/s.)

Data calculation

Besides filtering out empty data, it is also common to need sums of total visits or averages across all sources. ClickHouse provides a large number of high-performance aggregate functions:

cc1b062138da :) SELECT sum(AdvEngineID), count(), avg(ResolutionWidth) FROM hits

SELECT
    sum(AdvEngineID),
    count(),
    avg(ResolutionWidth)
FROM hits

Query id: 47aaef91-69c9-49ed-8e2d-9801ec243338

┌─sum(AdvEngineID)─┬─count()─┬─avg(ResolutionWidth)─┐
│           329039 │ 8873898 │   1400.8565027454677 │
└──────────────────┴─────────┴──────────────────────┘

1 rows in set. Elapsed: 0.042 sec. Processed 8.87 million rows, 26.62 MB (210.68 million rows/s., 632.05 MB/s.)

Data deduplication

Of course, we sometimes store redundant data on the application side. To compute accurate results, we need to deduplicate on some field before counting.

The following shows how to select the data whose counter ID is 800784, group it by the RegionID field, and display the deduplicated user counts in descending order:

cc1b062138da :) SELECT RegionID, uniq(UserID) AS UID FROM hits WHERE CounterID = 800784 GROUP BY RegionID ORDER BY UID DESC, RegionID LIMIT 10

SELECT
    RegionID,
    uniq(UserID) AS UID
FROM hits
WHERE CounterID = 800784
GROUP BY RegionID
ORDER BY
    UID DESC,
    RegionID ASC
LIMIT 10

Query id: 02049c4f-78e9-402a-93ab-1586c5d7a406

┌─RegionID─┬─UID─┐
│      196 │ 559 │
│        3 │ 161 │
│      241 │ 147 │
│      207 │ 106 │
│      225 │  62 │
│        1 │  60 │
│       46 │  59 │
│       36 │  58 │
│      104 │  58 │
│       47 │  57 │
└──────────┴─────┘

10 rows in set. Elapsed: 0.011 sec. Processed 81.92 thousand rows, 1.31 MB (7.61 million rows/s., 121.83 MB/s.)

For a more complex example, you can use uniqIf to set multiple deduplication conditions at the same time.

cc1b062138da :) SELECT uniq(UserID), uniqIf(UserID, CounterID = 800784), uniqIf(FUniqID, RegionID = 213) FROM hits

SELECT
    uniq(UserID),
    uniqIf(UserID, CounterID = 800784),
    uniqIf(FUniqID, RegionID = 213)
FROM hits

Query id: 76839d79-9551-4863-a423-e0c21b2193fe

┌─uniq(UserID)─┬─uniqIf(UserID, equals(CounterID, 800784))─┬─uniqIf(FUniqID, equals(RegionID, 213))─┐
│       120665 │                                      4057 │                                    106 │
└──────────────┴───────────────────────────────────────────┴────────────────────────────────────────┘

1 rows in set. Elapsed: 0.162 sec. Processed 8.87 million rows, 212.97 MB (54.87 million rows/s., 1.32 GB/s.)

Data sorting

Sorting data is also a common requirement, for example showing the ten most recently written records by event time:

cc1b062138da :) SELECT EventTime FROM hits ORDER BY EventTime DESC LIMIT 10

SELECT EventTime
FROM hits
ORDER BY EventTime DESC
LIMIT 10

Query id: fbdc1df9-0ff1-408f-b681-3a94be1e6a86

┌───────────EventTime─┐
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
│ 2014-03-24 03:59:59 │
└─────────────────────┘

10 rows in set. Elapsed: 0.174 sec. Processed 8.87 million rows, 35.50 MB (51.11 million rows/s., 204.43 MB/s.)

Sorting with compound query conditions

ClickHouse also has excellent query support for structured data: you can query array-type fields and sort the results directly, using statements like the following:

cc1b062138da :) SELECT ParsedParams.Key1 FROM visits FINAL WHERE VisitID != 0 AND notEmpty(ParsedParams.Key1) ORDER BY VisitID LIMIT 10

SELECT ParsedParams.Key1
FROM visits
FINAL
WHERE (VisitID != 0) AND notEmpty(ParsedParams.Key1)
ORDER BY VisitID ASC
LIMIT 10

Query id: 05a3ae18-43a5-414a-9c8a-8b98c6cbce42

┌─ParsedParams.Key1─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ['Кнопка']                                                                                                                     │
│ ['pageParams','pageParams']                                                                                                    │
│ ['pageParams']                                                                                                                 │
│ ['gen_time']                                                                                                                   │
│ ['pageParams']                                                                                                                 │
│ ['pageParams']                                                                                                                 │
│ ['pageParams']                                                                                                                 │
│ ['pageParams']                                                                                                                 │
│ ['Правда','gen_timestamp','Правда','Действия','affili','gen_timestamp','эксперимент по отрыву счетчика папок','Просмотр писем'] │
│ ['Марка','gen_time','Марка']                                                                                                   │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.307 sec. Processed 1.68 million rows, 62.63 MB (5.46 million rows/s., 203.83 MB/s.)

Range queries

Besides ordinary comparison operators, bounded range queries generally come in two kinds. The first kind uses a "whitelist":

cc1b062138da :) select sum(Sign) from visits where CounterID in (942285);

SELECT sum(Sign)
FROM visits
WHERE CounterID IN (942285)

Query id: 44b2f770-8916-46b9-b056-74b24b155e95

┌─sum(Sign)─┐
│    108133 │
└───────────┘

1 rows in set. Elapsed: 0.008 sec. Processed 114.69 thousand rows, 573.44 KB (14.42 million rows/s., 72.12 MB/s.)

The second kind is based on a given time range. In ClickHouse, if we want to know the period over which the data was recorded, we can even do this:

cc1b062138da :) SELECT min(EventDate), max(EventDate) FROM hits

SELECT
    min(EventDate),
    max(EventDate)
FROM hits

Query id: dd1b7383-6846-4cf6-86cb-5217ffa1357f

┌─min(EventDate)─┬─max(EventDate)─┐
│     2014-03-17 │     2014-03-23 │
└────────────────┴────────────────┘

1 rows in set. Elapsed: 0.015 sec. Processed 8.87 million rows, 17.75 MB (599.25 million rows/s., 1.20 GB/s.)
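With the bounds known, an ordinary time-range filter is then just a BETWEEN (or a pair of comparisons):

SELECT count()
FROM hits
WHERE EventDate BETWEEN '2014-03-17' AND '2014-03-23';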

Join queries

Joint queries are also a common requirement, and when it comes to joins, an example is worth a thousand words.

If we want to combine a table of millions of rows with one of nearly a million rows to compute the visit UV and the hit PV for each date, we can do this:

SELECT
    EventDate,
    hits,
    visits
FROM 
(
    SELECT
        EventDate,
        count() AS hits
    FROM hits
    GROUP BY EventDate
)
ANY LEFT JOIN 
(
    SELECT
        StartDate AS EventDate,
        sum(Sign) AS visits
    FROM visits
    GROUP BY EventDate
) USING (EventDate)
ORDER BY hits DESC
LIMIT 10
SETTINGS joined_subquery_requires_alias = 0

Query id: dbfee618-01b1-44bf-af8f-12a73c7ff300

┌──EventDate─┬────hits─┬─visits─┐
│ 2014-03-17 │ 1406958 │ 265108 │
│ 2014-03-19 │ 1405797 │ 261624 │
│ 2014-03-18 │ 1383658 │ 258723 │
│ 2014-03-20 │ 1353623 │ 255328 │
│ 2014-03-21 │ 1245779 │ 236232 │
│ 2014-03-23 │ 1046491 │ 202212 │
│ 2014-03-22 │ 1031592 │ 197354 │
└────────────┴─────────┴────────┘

7 rows in set. Elapsed: 0.051 sec. Processed 10.55 million rows, 22.78 MB (207.12 million rows/s., 447.17 MB/s.)

Subqueries

Subqueries are also very common, and they remove a lot of unnecessary manual steps, for example filtering out a batch of users and then computing their visits:

cc1b062138da :) SELECT count() FROM hits PREWHERE UserID IN (SELECT UserID FROM hits WHERE CounterID = 800784);

SELECT count()
FROM hits
PREWHERE UserID IN
(
    SELECT UserID
    FROM hits
    WHERE CounterID = 800784
)

Query id: 5d254281-8c9e-4593-9d8a-bdce24037d97

┌─count()─┐
│ 1956422 │
└─────────┘

1 rows in set. Elapsed: 0.032 sec. Processed 8.96 million rows, 71.97 MB (283.90 million rows/s., 2.28 GB/s.)

In the example above we used PREWHERE to optimize the query; it can filter out a large amount of unnecessary reading. That said, if your data volume is small or your concurrency pressure is low, using plain WHERE is not a problem either.

# PREWHERE
1 rows in set. Elapsed: 0.106 sec. Processed 8.96 million rows, 71.97 MB (84.79 million rows/s., 681.46 MB/s.)

# WHERE
1 rows in set. Elapsed: 0.026 sec. Processed 8.96 million rows, 71.97 MB (339.11 million rows/s., 2.73 GB/s.)

Fuzzy string search

There are times when we need to do a quick fuzzy search, and ClickHouse also provides amazing performance:

cc1b062138da :) SELECT count() FROM hits WHERE URL LIKE '%avtomobili%';

SELECT count()
FROM hits
WHERE URL LIKE '%avtomobili%'

Query id: 93ea9afc-070e-47ae-885e-6b14c26315a5

┌─count()─┐
│   51354 │
└─────────┘

1 rows in set. Elapsed: 0.152 sec. Processed 8.87 million rows, 767.95 MB (58.44 million rows/s., 5.06 GB/s.)

Easter egg: lazy charting

Beyond all this, we sometimes want a chart to visualize differences in magnitude between values, and this too can be handled with ClickHouse's built-in bar function.

cc1b062138da :) SELECT CounterID, count() AS c, bar(c, 0, 523264) FROM hits GROUP BY CounterID ORDER BY c DESC, CounterID ASC LIMIT 100;

SELECT
    CounterID,
    count() AS c,
    bar(c, 0, 523264)
FROM hits
GROUP BY CounterID
ORDER BY
    c DESC,
    CounterID ASC
LIMIT 100

Query id: ee79be07-c93f-4aba-ad81-8d6938c99f96

┌─CounterID─┬──────c─┬─bar(count(), 0, 523264)──────────────────────────────────────────────────────────┐
│   1704509 │ 523264 │ ████████████████████████████████████████████████████████████████████████████████ │
│    732797 │ 475698 │ ████████████████████████████████████████████████████████████████████████▋        │
│    598875 │ 337212 │ ███████████████████████████████████████████████████▌                             │
│    792887 │ 252197 │ ██████████████████████████████████████▌                                          │
│   3807842 │ 196036 │ █████████████████████████████▊                                                   │
│  25703952 │ 147211 │ ██████████████████████▌                                                          │
│       ... │    ... │ ...                                                                              │
│  21052498 │   9780 │ █▍                                                                               │
└───────────┴────────┴──────────────────────────────────────────────────────────────────────────────────┘

(output truncated; the full result has 100 rows)

100 rows in set. Elapsed: 0.034 sec. Processed 8.87 million rows, 35.50 MB (259.87 million rows/s., 1.04 GB/s.)

MySQL data import

Next, let's learn how to import data, taking MySQL and CSV as examples. (ClickHouse officially supports direct import and export of dozens of data formats: clickhouse.tech/docs/en/int… You can choose whichever fits your actual needs and cut out unnecessary data-conversion steps; a small example of the format flexibility follows.)
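As a hedged illustration of that flexibility, appending a FORMAT clause to any query switches its output format on the spot (here against the hits table from earlier; JSONEachRow is just one of the many supported formats):

SELECT WatchID, EventDate
FROM hits
LIMIT 3
FORMAT JSONEachRow;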

Start MySQL database

To enable database instances to communicate with each other, we need to create a virtual container network:

docker network create dbtest

As before, save the following as docker-compose.yml (you can keep it in a different directory; if you put it in the same directory as the earlier one, you will need to change the name of the data directory):

version: '3'

services:

  mysql:
    container_name: mysql
    image: mysql:5.7.25
    ports:
      - "13306:3306"
    environment:
      MYSQL_USER: test
      MYSQL_PASSWORD: test
      MYSQL_DATABASE: test
      MYSQL_ROOT_PASSWORD: test
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci --default-storage-engine=INNODB --transaction-isolation=READ-COMMITTED --binlog_format=row --max_allowed_packet=33554432 --sql_mode="STRICT_ALL_TABLES" --local-infile=0
    volumes:
    # - /etc/localtime:/etc/localtime:ro
    # - /etc/timezone:/etc/timezone:ro
      - ./data:/var/lib/mysql
    healthcheck: 
      test: ["CMD-SHELL"."mysqladmin ping -h localhost -p$$MYSQL_ROOT_PASSWORD" ]
    networks:
      - dbtest

networks:
  dbtest:
    external: true

Then start MySQL with docker-compose up -d.

Initialize test data

After downloading the test data mentioned above, uncompress it and you can see that the compression ratio is approximately 1:6 (1:10 in the earlier example).

unzip yyets_mysql.zip

du -hs yyets*
262M	yyets.sql
 41M	yyets_mysql.zip

Move the data into a directory the database container can access (the second method below is recommended if you want a more intuitive comparison of data volume before and after migration):

mv yyets-mysql5.7.sql data

Or use the docker cp command to copy the data into the MySQL container's mysql-files directory:

docker cp yyets.sql mysql:/var/lib/mysql-files/  

Then use the docker exec command to log in to the database terminal inside the container and create an empty database.

docker exec -it mysql mysql -uroot -ptest

mysql> create database yyets;
Query OK, 1 row affected (0.01 sec)

Next, initialize the downloaded external data so that it can serve as the import source for ClickHouse.

docker exec -it mysql bash

# Import from mysql-files or from the mysql data directory, depending on where you put the file
mysql -uroot -ptest yyets < /var/lib/mysql-files/yyets.sql

# Ignore the following prompts
mysql: [Warning] Using a password on the command line interface can be insecure.

After a while the data is restored to the MySQL instance.

After the SQL file is imported, use du again to check the disk space. Compared with the SQL file before the import, the disk space has expanded by about 270MB (including the index).

du -hs data
530M	data

Let’s go into the database again and look at the table and the total amount of data.

docker exec -it mysql mysql -uroot -ptest

# Check the database list
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
| yyets              |
+--------------------+
6 rows in set (0.03 sec)

# Switch database
mysql> use yyets;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

# Check the table list
mysql> show tables;
+-----------------+
| Tables_in_yyets |
+-----------------+
| yyets           |
+-----------------+
1 row in set (0.00 sec)

# Check the amount of data
mysql> select count(*) from yyets;
+----------+
| count(*) |
+----------+
|    17579 |
+----------+
1 row in set (0.83 sec)

As you can see, there are less than 20,000 pieces of data in the database, which is relatively small, but enough for us to test.

Adjust the ClickHouse instance

To visualize the difference in data storage, shut down the original ClickHouse instance, swap in a clean data directory, and start a new ClickHouse instance.

However, we also need to make a simple adjustment to the ClickHouse configuration to add it to the same container network:

version: "2"

services:

  server:
    # ... same as the earlier configuration
    networks:
      - dbtest

networks:
  dbtest:
    external: true

I won’t go over the process as much as I did earlier, but let’s take note of how much disk space an empty ClickHouse directory takes up.

# In the new clickhouse directory
du -hs *
2.0M	data
4.0K	docker-compose.yml

Import MySQL data into ClickHouse

To simplify the import and avoid unnecessary "data conversion" operations, we can first create an "online table" that maps the MySQL schema directly.

  • Official documentation on data types: clickhouse.tech/docs/en/sql…

First, log in to the client terminal inside the container:

docker exec -it clickhouse clickhouse-client

Then "mount" MySQL into ClickHouse:

981a7a68eb35 :) CREATE DATABASE yyets_online ENGINE = MySQL('mysql:3306', 'yyets', 'root', 'test')

CREATE DATABASE yyets_online
ENGINE = MySQL('mysql:3306', 'yyets', 'root', 'test')

Query id: 45e4570e-3536-4240-9775-fbc9d91ffdcc

Ok.

0 rows in set. Elapsed: 0.026 sec. 

981a7a68eb35 :) 

Switch to the new database and check that the table is visible:

981a7a68eb35 :) use yyets_online;

USE yyets_online

Query id: 9b373b74-11a5-47be-8358-989522311e3e

Ok.

0 rows in set. Elapsed: 0.002 sec. 

981a7a68eb35 :) show tables

SHOW TABLES

Query id: 04cf20ce-b656-441e-b414-a63225d498d1

┌─name──┐
│ yyets │
└───────┘

1 rows in set. Elapsed: 0.006 sec. 

Moving on to the table structure, you see that the data types have been converted.

981a7a68eb35 :) desc yyets

DESCRIBE TABLE yyets

Query id: 650a4f24-fd8b-4290-b40e-cb59b78a926e

┌─name───────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id         │ Int32            │              │                    │         │                  │                │
│ url        │ Nullable(String) │              │                    │         │                  │                │
│ name       │ Nullable(String) │              │                    │         │                  │                │
│ expire     │ Nullable(Int32)  │              │                    │         │                  │                │
│ expire_cst │ Nullable(String) │              │                    │         │                  │                │
│ data       │ Nullable(String) │              │                    │         │                  │                │
└────────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

6 rows in set. Elapsed: 0.004 sec. 

Go ahead and use the show create statement to get the command to create the current table structure.

981a7a68eb35 :) show create yyets

SHOW CREATE TABLE yyets

Query id: 85caee09-9bc0-4e7c-9ff2-bfab820c2c6b

┌─statement──────────────────────────────────────────────────────┐
│ CREATE TABLE yyets_online.yyets
(
    `id` Nullable(Int32),
    `cnname` Nullable(String),
    `enname` Nullable(String),
    `aliasname` Nullable(String),
    `views` Nullable(Int32),
    `data` Nullable(String)
)
ENGINE = MySQL('mysql:3306', 'yyets', 'yyets', 'root', 'test') │
└─────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.005 sec. 

Then, changing the engine definition and adding a primary key (ORDER BY) gives us what we need to create the offline table used for subsequent computation.

981a7a68eb35 :) CREATE DATABASE yyets_offline;

CREATE DATABASE yyets_offline

Query id: edab3813-f340-4a48-86d7-f12c6ad3297e

Ok.

0 rows in set. Elapsed: 0.019 sec. 

981a7a68eb35 :) CREATE TABLE yyets_offline.yyets
:-] (
:-]     `id` Int32,
:-]     `url` Nullable(String),
:-]     `name` Nullable(String),
:-]     `expire` Nullable(Int32),
:-]     `expire_cst` Nullable(String),
:-]     `data` Nullable(String)
:-] ) ENGINE = MergeTree()
:-] ORDER BY `id`

CREATE TABLE yyets_offline.yyets
(
    `id` Int32,
    `url` Nullable(String),
    `name` Nullable(String),
    `expire` Nullable(Int32),
    `expire_cst` Nullable(String),
    `data` Nullable(String)
)
ENGINE = MergeTree
ORDER BY id

Query id: b8371553-683a-418f-9593-b4da49fd5517

Ok.

0 rows in set. Elapsed: 0.031 sec. 

981a7a68eb35 :) 

Finally, use an INSERT INTO ... SELECT statement to import the data from the online table into the offline table.

981a7a68eb35 :) insert INTO yyets_offline.yyets select * from yyets_online.yyets

INSERT INTO yyets_offline.yyets SELECT *
FROM yyets_online.yyets

Query id: fe5eec24-ae0d-4deb-abba-76df242ae8a9

Ok.

0 rows in set. Elapsed: 5.276 sec. Processed 17.58 thousand rows, 244.68 MB (3.33 thousand rows/s., 46.37 MB/s.)

After the import is complete, we can first do a simple query to confirm whether the data transfer is complete.

SELECT
    online,
    offline
FROM 
(
    SELECT count() AS online
    FROM yyets_online.yyets
) AS table1
, 
(
    SELECT count() AS offline
    FROM yyets_offline.yyets
) AS table2

Query id: c0edda99-8491-4dd7-bcbc-0323b986553e

┌─online─┬─offline─┐
│  17579 │   17579 │
└────────┴─────────┘

1 rows in set. Elapsed: 0.032 sec. Processed 17.52 thousand rows, 74.18 KB (542.94 thousand rows/s., 2.30 MB/s.)

You can see that the two tables have the same amount of data, indicating that the data import is complete. Feel free to try a simple keyword match.

466b62ff5dae :) SELECT id, name FROM yyets_offline.`yyets` WHERE `data` LIKE '%漫威%' limit 10

SELECT
    id,
    name
FROM yyets_offline.yyets
WHERE data LIKE '%漫威%'
LIMIT 10

Query id: 1d1a3884-b73b-4afd-b850-418e8b218e74

┌────id─┬─name──────────────────────────────┐
│ 29620 │ Marvel Phase Two Preview          │
└───────┴───────────────────────────────────┘
┌────id─┬─name───────────────────────────────┐
│ 30574 │ Marvel One-Shot: Agent Carter      │
│ 31794 │ Marvel One-Shot: All Hail The King │
│ 33101 │ Marvel 75 Years: From Pulp to Pop! │
│ 33108 │ Marvel One-Shots                   │
│ 34471 │ Marvel                             │
│ 35159 │ Iron Fist                          │
│ 35506 │ The Defenders                      │
│ 35565 │ Spider-Man                         │
│ 35738 │ The Punisher                       │
└───────┴────────────────────────────────────┘

10 rows in set. Elapsed: 0.277 sec. Processed 17.58 thousand rows, 243.35 MB (63.52 thousand rows/s., 879.26 MB/s.)

After the data import is done, clean up by dropping the database that mounts the remote online source.

981a7a68eb35 :) drop database yyets_online

DROP DATABASE yyets_online

Query id: 31c82654-a15f-48ba-b27e-e7fa67896ecb

Ok.

0 rows in set. Elapsed: 0.007 sec.

Finally, use DU to take a look at the disk space ClickHouse uses.

du -hs data
 77M	data

At 77MB, ClickHouse's storage footprint is very light compared with the 530MB the same data occupied in MySQL. (You can also check table sizes from inside ClickHouse; see the sketch below.)
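As mentioned, a hedged sketch of checking per-table disk usage from inside ClickHouse itself, using the system.parts system table:

SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table;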

CSV Data Import

In general, importing CSV data is very simple, but real-world CSV often contains a lot of "non-standard" content: sloppy or missing escaping, mixed numeric and string types, stray newlines, fields holding "unexpected data," and so on. For such cases we first need to "repair" the data. Since every scenario's data is completely different, there is no universal recipe to offer; speaking from experience, though, a cleaning tool written in Golang or Node executes remarkably fast, and interested readers can give that a try.

Back to "data import." Depending on whether a processing pipeline is needed, there are usually two ways to import CSV data. If your data is already pre-processed, you can import it directly like this:

clickhouse-client --format_csv_delimiter="|" --format_csv_allow_double_quotes=0 --format_csv_allow_single_quotes=0 --query="INSERT INTO yyets_csv.resource FORMAT CSV" < /var/lib/clickhouse/dump.csv

However, if your data needs to be transformed by some tool along the way, you can import it in a style like the following, using awk, sed, or a tool of your own:

cat /var/lib/clickhouse/dump.csv | clickhouse-client --format_csv_delimiter="|" --format_csv_allow_double_quotes=0 --format_csv_allow_single_quotes=0 --query="INSERT INTO yyets_csv.resource FORMAT CSV"
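Both commands above assume the target table yyets_csv.resource already exists; it is not created anywhere earlier in this article, so here is a minimal hypothetical sketch (the schema simply mirrors the offline table from before; adjust the columns to match your actual dump):

CREATE DATABASE IF NOT EXISTS yyets_csv;

-- Hypothetical target table for the CSV import above.
CREATE TABLE yyets_csv.resource
(
    `id` Int32,
    `url` Nullable(String),
    `name` Nullable(String),
    `data` Nullable(String)
)
ENGINE = MergeTree
ORDER BY id;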

Finally

Before I knew it, I had written close to ten thousand words, so let's stop here for now; when there is time, I will write about the pitfalls I have run into in actual use.

–EOF




If you found this content practical, you are welcome to share it with your friends. Thank you.


This article is published under the Attribution 4.0 International (CC BY 4.0) license.

Author: Su Yang

Created: October 16, 2021 · Word count: 33,858 · Reading time: 68 minutes · Link: soulteary.com/2021/10/16/…