在实际应用中使用ClickHouse的理论和实践。亚历山大·扎伊采夫(2018)



尽管现在几乎到处都有大量数据,但分析数据库仍然非常陌生。他们知之甚少,甚至更不知道如何有效地使用它们。许多人继续使用针对不同情况而设计的MySQL或PostgreSQL“吃仙人掌”,与NoSQL斗争或为商业解决方案多付钱。ClickHouse改变了游戏规则,大大降低了进入分析型DBMS的门槛。



来自BackEnd Conf 2018的报告,并经演讲者许可发表。







我是谁,我为什么谈论ClickHouse?我是LifeStreet的开发总监,该公司使用ClickHouse。另外,我还是Altinity的创始人。这是Yandex的合作伙伴,他推广ClickHouse,并帮助Yandex使ClickHouse更加成功。我也准备分享我对ClickHouse的了解。





. . , .





« », ClickHouse:



  • ,
  • ,
  • .


, .





, , ClickHouse, .



, ClickHouse , ClickHouse , .



, ClickHouse . , .





: « ClickHouse?». , , .





  • – . ClickHouse . ClickHouse . , - .
  • – . . , Vertica – . , . , . . ClickHouse .
  • – . - . RedShift – . RedShift . , , Amazon, . Google BigQuery . - , , .


ClickHouse .





ClickHouse ? ClickHouse .



  • -, . . use case, .
  • AdTech ClickHouse.
  • , .
  • ClickHouse . ClickHouse, , .
  • , . . ClickHouse.
  • CloudFlare. - ClickHouse , . community. ClickHouse-. , Kafka Engine ClickHouse.
  • . ClickHouse proof on concept, production.
  • ClickHouse . , , 2 000 . – .
  • -. , Bloxy.info. ethereum-. ClickHouse.




. , . . .



, :



  • : 500+ , 25 .
  • LifeStreet: 60 , 75 . , , .
  • CloudFlare: 36 , 200 . .
  • Bloomberg: 102 , . .




– . heatmap, ClickHouse . , , . . 4 .



, . , Altinity, . , , , . . , . . . , . , ClickHouse – , , .





? , ClickHouse . , . , , , .





ClickHouse .



  • – : Vertica ClickHouse. , Vertica .
  • – ClickHouse. . , ClickHouse , . , . , .
  • – ClickHouse. , ClickHouse Hadoop . , ClickHouse - map reduce , . , .




  • LifeStreet – Ad Tech , , .
  • , programmatic bidding.
  • : 10 . .
  • , , – , programmatic bidding.




. HighLoad. LifeStreet MySQL ( Oracle) Vertica. .



, , Vertica – . . . proof of concept performance testing , 13- 16- . HighLoad.





– Vertica , . . , . , , - , , . , , .





? , , , . , Vertica, . , .



, open source , , . . . , . .



, , , open source, — .





, , , ClickHouse. , : «?», .





2016- , ClickHouse. , Vertica. . , . . (join), ClickHouse Vertica .



. : ClickHouse Vertica, .



(join), . ClickHouse Vertica . - , . . .





, , LifeStreet ClickHouse.





16- , . , , . , . .





, , .



– :



  • .
  • . 10 , , LifeStreet 75 3 . , . SQL- , .
  • , ClickHouse , Vertica, , SAS-. ClickHouse SATA. ? Vertica insert . , , , , . . . ClickHouse insert . , , , ClickHouse , . . SATA, RAID , .
  • , . . 3 60 (20 – ) 6 . Vertica .




.



  • – . .
  • – SQL.




OLAP- – select. group by, . where, . group by . .





star-, , .





, , , . , . , . . - .



ClickHouse . :



  • – ClickHouse (join), . . (join) , . .
  • – , . , star-, - . , , . .


ClickHouse . :



  • – . External Dictionaries – , 99 % star-, .
  • – . (join) .




  • (join).
  • . 2018- ( ) , . . , . – .
  • , (join) , , , , , , .




  • (join).
  • 1 .
  • , . - .


. , .





, . :



  • . - .
  • key-value . - .
  • , - .


. - , (join).





ClickHouse , string key-value.



– , , . , .



. has, , . , , .



subid - . , , , . .



, , , , -, . , -, , .





. , ID. . arrayMap. -. -. ID .



. , .





.



, , .



  • ClickHouse . .
  • . ?
  • (join), . , , .
  • – . , , . (join), .


- .





. , -5 . 2,5 , -. , - . , , (int). . . 2,5 1,5 . .





. . 5 . , , - , , .





. , , , , , . .





  • .
  • , .
  • - (join), , , , (join) .
  • .


, , . .





. . ?



:



  • - .
  • .




?



, , 20 , . - , , . , , - .



: « , ?» « , ?». . . , , , ?



, , - , . . - . DWH. , .





– , , - , , , , . , , , - , . , . .





.



– :



  • Last Interaction, interaction – , .
  • First Interaction, . . , .
  • – .
  • .
  • .




? Runtime Cassandra. Cassandra transaction storage, . . . - Runtime, , - - , Cassandra – . , . .



, transaction id, . . . .



, . , , 10 , 300 , . Cassandra , , Runtime , 10-15 .



, . ? , 30 . , , 30 . , - . 500 , , , Runtime , - . , .





ClickHouse. ClickHouse? , .



  • , , . . mutable, ClickHouse mutable-.
  • , , visit id. point query, ClickHouse . ClickHouse …, . .
  • , json, , json , , - . .


. . .





, .



? ClickHouse, , . attributed , ClickHouse . visit id , , . . , . , , . . ClickHouse, . . . , DWH, .



. ClickHouse , visit id, 1 000-2 000 visit id 1 000-2 000 . .





ClickHouse, 3 , .



, , .



. materialized view , attributed , . . . materialized view . . . materialized view , .





SQL. .



– ClickHouse json , . . . ClickHouse json. - .



visitParamExtractInt json , . . . transaction id visit id. .



– materialized . ? , , . . , . ClickHouse . json , .



materialized view – . . ? -, , . . visit id, .



– index_granularity. MergeTree, 8 192 index_granularity. ? . ClickHouse , . 8 192. , , , , overhead. index granularity, overhead. , . .





ClickHouse.



-, AggregatingMergeTree. AggregatingMergeTree argMax, . . , timestamp. . . ClickHouse. argMax .





  • «» Runtime.
  • 3 . , Cassandra, . . .
  • 25 ClickHouse. 5 . , Cassandra , click based , impression based. . . , 30 , .




– Y, .



:



  • 5 000 .
  • 100 .
  • 10 . , , .
  • 100 .


.





. , , , . . , . , , , , . . , .



.





. ? 100 , : , . 100 runningDifference . RunningDifference – ClickHouse, .



, . 5 000 12,5 . , . . 12,5 .



- , ͞x ͞y – . . . . , . - 12,5 , . . 60 . .





-, - , , ClickHouse.





Hadoop , Spark, Greenplum. . . . - , .





ClickHouse .



, , . , , .



? . . . logReturn, .



, . , . . . . . , . . .



. «r» , . . . – .



12,5 , , . . . 2 500 5 000 . ClickHouse-. , .





, . : , , . logReturn, . . , logReturn. , . . groupArray . . ClickHouse, . .





proof of concept – , . . . .



: Log_return .



- 50 . 50 – , . . , 70 .



, , . . . . .





  • – . – ClickHouse.
  • Summing/AggregatingMergeTrees – , state . .
  • Materialized Views . , , , , attribute , . . , , . , . Materialized Views , , ClickHouse.
  • .
  • , . , , .




, , ClickHouse , open source , . . . . , , , ClickHouse, InfiniDB. , , , SQL. !





- ! ! - Apache Phoenix?



-, , - . ClickHouse . - ClickHouse, . .



  • ( ) Apache Phoenix – SQL- Hbase. Hbase key-value. . Hbase, Cassandra. . , , ClickHouse.





    • ! , . ClickHouse, , ClickHouse , mutable. - , ClickHouse, , ? , . , ?



    • . . , , mutable, , . . . ClickHouse Oracle, - . ClickHouse , , ClickHouse . . . , , . , - ClickHouse , .







! . . , -. , ? , , , - , ? . . - , ?



, ClickHouse . Hadoop. Hadoop – . Hadoop . – , ClickHouse , , . . , - , .



. , 50 . , ?



-.



, .



3- .



! ! . , ClickHouse . . . -, ? ClickHouse? , ? , , ClickHouse , - .



, . ClickHouse . , . ? , . - 2012-2013- ClickHouse. . . - , , . . ClickHouse – . . . .



! , . ? -. , , , - .



, , . . , Materialized View, , . . . ClickHouse, . . extract - , primary key, , Materialized View. , . Materialized View, . .



.




All Articles