适用于高级开发人员的ClickHouse功能。阿列克谢·米洛维多夫(Alexey Milovidov)(2018)



在该报告中,计划考虑ClickHouse鲜为人知或未充分记录的功能:增量聚合和具有聚合功能状态的操作,集群间复制,不使用服务器执行查询等。将给出来自开发Yandex服务的实践示例:如何从系统中挤出最大的收益。







! ! ClickHouse. , , .



, . , , . . . .



ClickHouse, , . - clickstream. , , -. , , ..



, , . . - .





. , , pupkin.narod.ru; , yandex.ru. . ? .



. , . MergeTree, .





, . CREATE TABLE. ORDER BY – . , , . . , . . , , Hash . . . hash .





, . - . , . .



4,5 , . . 3,5 . . 4,5 .





1/10. from . 0,6 . , - , 2,5 . , , , . , – . , 10 . , .





, ?



  • -, . . , .
  • . , unix timestamp.
  • timestamp, ( , ). , , DevOps, . . , .
  • - . , .
  • , .
  • , url, hash . , , , . , url . - hash-.
  • - .
  • . , - .
  • , . , . - hash. , . ? . - , . , - , .
  • . – , . . .




, , :



  • -, , , hash-, . . , , .
  • . , . . . . . 1/10 hashes .
  • , .




:



  • – 1/10 .
  • – — - , , 1 000 000. . , , 1 000 000, . , , , , . . . 1 000 000 10 000 000 1 000 000 000? – _sample_factor. : x _sample_factor, , .
  • – SAMPLE OFFSET. . 1/10 , : « , , 1/10 ». : SAMPLE 1/10 OFFSET 1/10. .
  • . , , . , . , , . ? , , , , , overhead.




– . , , , count distinct, uniq. uniq 4 .



. . , If. , : sumIf. , . – , , – , - UInt8. , , . , , .





? – . ..



Google. .





– Array. , Array. , , . .



. . , .



groupArray, . , … . . . .



groupUniqArray. .



Array - , , .



, groupArrayArray? , , . .



groupUniqArrayArray – , .





, . , . , , , , , .





. . . sum, Array, If , , . . . sumArrayIf SumIfArray.



? . , . – , , – : . sumIfArray. . . Array sumIf .



– sumArrayIf. , – : sumArray .





. : sumForEachStateForEachIfArrayIfState. , , , . . , .





. , , . , . .



, , . . .



, - , .



, count distinct? Hash table. hash‑ .



, , . : « , , ». - .



, ClickHouse . –State, . , , AggregateFunction - .



, . . . AggtregateFunction. , , .



–Merge. , , .



, .







uniq . .





. –State. - , . - . , UTF-8. , , UTF-8.





? AggregateFunction . . – , – .





. , , . ?





– .



, ClickHouse – , – . clickstream, , , . , . ClickHouse , .



. - , , .



. AggregatingMergeTree. , . . , - . . , . . , , count distinct, .



, . ClickHouse, , .





. ?



  • – , . , . , . , ClickHouse-, . .
  • , , , , – . sum sumIf – , .
  • , *. , , , arrayReduce. , , , , . . –State, .


* 2020 , initializeAggregation.





ClickHouse – .



, ClickHouse , . . INSERT, , . . INSERT , . , .



conflict-free , update. INSERT. , .



mulit-master. . , . , , . . . exactly-once .



, . , INSERT, , , , .





. , INSERT.



. INSERT, SELECT, .



– INSERT. , , . insert_quorum = 2. .



, INSERT , , INSERT, .



SELECT , select_sequential_consistency. , . select_linearizability, .



, SELECT, ZooKeeper. SELECT , , . . INSERT, .



, , - , . , , .



, . , , , .





. .



, , GROUP BY, , , - . , .



. . . .





. progress-bar, . ClickHouse- – , . , , . , 9,31 , 10 .



, : « 10 ?». . .





, . , , . - 0,5 . , , .



- , 10 . , , . , - .





: « max_memory_usage »?





, . . , production .





– GROUP BY . ? , - . , . , . , . . . . . - . , .



, :



  • – , .
  • – merge, . . merge , .


. dataset - . . datasets - , buckets. .





. max_memory_usage 10 . 8 (max_bytes_before_external_group_by = 8 000 000 000, distributed_aggregation_memory_efficient = 1). progress-bar - . . ? , . , , .



. , , . , , . . . , .



, . . . , 10 GROUP BY, , 1 .



, , , .





. .



, ClickHouse – - geospatial-, .



– . . , , .



? , . , - , . , , .



? , .



pointInPolygon. – : lat, lon, . *. - .



* 2020 , .



, . , -, . , . . .



, pointInPolygon, . pointInElliplses, ellipses . , , , . , . . 0 1, ellipses.



– greatCircleDistance. *. . , .



* 2020 , geoDistance, WGS-84 .





https://events.yandex.ru/lib/talks/5330/



– ClickHouse . ClickHouse. , , - . ClickHouse. .



. . , . modelEvaluate.



, , .



, CTR . . , : https://events.yandex.ru/lib/talks/5330/. .



. CatBoost. CatBoost? .





, ?



  • - , . , - , . *. , . - , , . .
  • , , ClickHouse . . , – . , , . , . , ClickHouse. . .
  • , , -, . . .


* ClickHouse.





. ClickHouse ClickHouse-.



- . ClickHouse. - . .



? grep, sed, awk perl. , , - , ClickHouse , . ClickHouse - , . . clickhouse-local.



. – , . . . , ClickHouse - .



, . , , JSON, JSONEachRow. stdin . , , . , , - ClickHouse. , , . , awk, perl, sed. , grep, . . .



, , , , – ClickHouse-. . .





ClickHouse-, ClickHouse-local , . , ClickHouse- .





? , ?



Date DateTime. Date , , ISO 8601, DateTime , - , . , , , *.



, . parseDateTimeBestEffort. , . , . . , , , .



* , date_time_input_format.



, , Hadoop . ClickHouse-local MapReduce jobs Hadoop.



, Parquet. pull request. , *.



* , — !



, , trash SQL. , - regexp’ , clickhouse-local. , Awk, , ClickHouse*.



* — Regexp.





, , .



? ClickHouse . . . , , . , .



? . , INSERT SELECT . . , .



, . . , , , , , – , , . - . .



, ClickHouse-copier. , . Zookeeper. – , .



. . . , .



- , . - .





– production.



.. 538 . 240 . , . .



. , . . lz4, zstd . . , . ClickHouse-copier. , , . - .



, - . , , . , , , , .



, !







! . . real time , ?



. , .



. ., , , ? ? ?



. , , , . - . , . . . . , . . . . , . .



. . ClickHouse?



, ClickHouse. . . . , . , , . , , , .



! Avro, Parquet, MapReduce machine learning, - , ClickHouse - jobs YARN, Mesos? . . , Spark, , , date locality , ?



, , , . , Hadoop MapReduce YT. , ClickHouse YT, , . - , .



. . . YT - ? . . , ClickHouse, tool CSV job?



*. , - . ClickHouse-, - , ClickHouse, YT. .



* .



! ClickHouse-local - - - , ?



, standalone-. – ClickHouse. . . ClickHouse, , , local, , .



! . groupUniqArrayArray order , ?



, , , .



! ! ClickHouse. . Vertica. , . ? CSV. CSV ClickHouse. ClickHouse , . . , , , . , , . - ?



ClickHouse JOIN , ? *. ClickHouse hash JOIN, . . , hash- . . . , , - . , merge JOIN, , .



* , join_algorithm. , , .



! , , master-master , , , ? , , ?



– , pull request. . , . Master-master . -, ClickHouse ZooKeeper. . , , , - - .



! ! ClickHouse-copier , , , ?



ClickHouse-copier , operation-. . , , , . . . , . . , -, . . , ZooKeeper. – *.



* , .




All Articles