尽管现在几乎到处都有大量数据,但分析数据库仍然非常陌生。他们知之甚少,甚至更不知道如何有效地使用它们。许多人继续使用针对不同情况而设计的MySQL或PostgreSQL“吃仙人掌”,与NoSQL斗争或为商业解决方案多付钱。ClickHouse改变了游戏规则,大大降低了进入分析型DBMS的门槛。
来自BackEnd Conf 2018的报告,并经演讲者许可发表。
我是谁,我为什么谈论ClickHouse?我是LifeStreet的开发总监,该公司使用ClickHouse。另外,我还是Altinity的创始人。这是Yandex的合作伙伴,他推广ClickHouse,并帮助Yandex使ClickHouse更加成功。我也准备分享我对ClickHouse的了解。
. . , .
« », ClickHouse:
- ,
- ,
- .
, .
, , ClickHouse, .
, ClickHouse , ClickHouse , .
, ClickHouse . , .
: « ClickHouse?». , , .
- – . ClickHouse . ClickHouse . , - .
- – . . , Vertica – . , . , . . ClickHouse .
- – . - . RedShift – . RedShift . , , Amazon, . Google BigQuery . - , , .
ClickHouse .
ClickHouse ? ClickHouse .
- -, . . use case, .
- AdTech ClickHouse.
- , .
- ClickHouse . ClickHouse, , .
- , . . ClickHouse.
- CloudFlare. - ClickHouse , . community. ClickHouse-. , Kafka Engine ClickHouse.
- . ClickHouse proof on concept, production.
- ClickHouse . , , 2 000 . – .
- -. , Bloxy.info. ethereum-. ClickHouse.
. , . . .
, :
- : 500+ , 25 .
- LifeStreet: 60 , 75 . , , .
- CloudFlare: 36 , 200 . .
- Bloomberg: 102 , . .
– . heatmap, ClickHouse . , , . . 4 .
, . , Altinity, . , , , . . , . . . , . , ClickHouse – , , .
? , ClickHouse . , . , , , .
ClickHouse .
- – : Vertica ClickHouse. , Vertica .
- – ClickHouse. . , ClickHouse , . , . , .
- – ClickHouse. , ClickHouse Hadoop . , ClickHouse - map reduce , . , .
- LifeStreet – Ad Tech , , .
- , programmatic bidding.
- : 10 . .
- , , – , programmatic bidding.
. HighLoad. LifeStreet MySQL ( Oracle) Vertica. .
, , Vertica – . . . proof of concept performance testing , 13- 16- . HighLoad.
– Vertica , . . , . , , - , , . , , .
? , , , . , Vertica, . , .
, open source , , . . . , . .
, , , open source, — .
, , , ClickHouse. , : «?», .
2016- , ClickHouse. , Vertica. . , . . (join), ClickHouse Vertica .
. : ClickHouse Vertica, .
(join), . ClickHouse Vertica . - , . . .
, , LifeStreet ClickHouse.
16- , . , , . , . .
, , .
– :
- .
- . 10 , , LifeStreet 75 3 . , . SQL- , .
- , ClickHouse , Vertica, , SAS-. ClickHouse SATA. ? Vertica insert . , , , , . . . ClickHouse insert . , , , ClickHouse , . . SATA, RAID , .
- , . . 3 60 (20 – ) 6 . Vertica .
.
- – . .
- – SQL.
OLAP- – select. group by, . where, . group by . .
star-, , .
, , , . , . , . . - .
ClickHouse . :
- – ClickHouse (join), . . (join) , . .
- – , . , star-, - . , , . .
ClickHouse . :
- – . External Dictionaries – , 99 % star-, .
- – . (join) .
- (join).
- . 2018- ( ) , . . , . – .
- , (join) , , , , , , .
- (join).
- 1 .
- , . - .
. , .
, . :
- . - .
- key-value . - .
- , - .
. - , (join).
ClickHouse , string key-value.
– , , . , .
. has
, , . , , .
subid - . , , , . .
, , , , -, . , -, , .
. , ID. . arrayMap
. -. -. ID .
. , .
.
, , .
- ClickHouse . .
- . ?
- (join), . , , .
- – . , , . (join), .
- .
. , -5 . 2,5 , -. , - . , , (int). . . 2,5 1,5 . .
. . 5 . , , - , , .
. , , , , , . .
- .
- , .
- - (join), , , , (join) .
- .
, , . .
. . ?
:
- - .
- .
?
, , 20 , . - , , . , , - .
: « , ?» « , ?». . . , , , ?
, , - , . . - . DWH. , .
– , , - , , , , . , , , - , . , . .
.
– :
- Last Interaction, interaction – , .
- First Interaction, . . , .
- – .
- .
- .
? Runtime Cassandra. Cassandra transaction storage, . . . - Runtime, , - - , Cassandra – . , . .
, transaction id, . . . .
, . , , 10 , 300 , . Cassandra , , Runtime , 10-15 .
, . ? , 30 . , , 30 . , - . 500 , , , Runtime , - . , .
ClickHouse. ClickHouse? , .
- , , . . mutable, ClickHouse mutable-.
- , , visit id. point query, ClickHouse . ClickHouse …, . .
- , json, , json , , - . .
. . .
, .
? ClickHouse, , . attributed , ClickHouse . visit id , , . . , . , , . . ClickHouse, . . . , DWH, .
. ClickHouse , visit id, 1 000-2 000 visit id 1 000-2 000 . .
ClickHouse, 3 , .
, , .
. materialized view , attributed , . . . materialized view . . . materialized view , .
SQL. .
– ClickHouse json , . . . ClickHouse json. - .
visitParamExtractInt json , . . . transaction id visit id. .
– materialized . ? , , . . , . ClickHouse . json , .
materialized view – . . ? -, , . . visit id, .
– index_granularity. MergeTree, 8 192 index_granularity. ? . ClickHouse , . 8 192. , , , , overhead. index granularity, overhead. , . .
ClickHouse.
-, AggregatingMergeTree. AggregatingMergeTree argMax, . . , timestamp. . . ClickHouse. argMax .
- «» Runtime.
- 3 . , Cassandra, . . .
- 25 ClickHouse. 5 . , Cassandra , click based , impression based. . . , 30 , .
– Y, .
:
- 5 000 .
- 100 .
- 10 . , , .
- 100 .
.
. , , , . . , . , , , , . . , .
.
. ? 100 , : , . 100 runningDifference . RunningDifference – ClickHouse, .
, . 5 000 12,5 . , . . 12,5 .
- , ͞x ͞y – . . . . , . - 12,5 , . . 60 . .
-, - , , ClickHouse.
Hadoop , Spark, Greenplum. . . . - , .
ClickHouse .
, , . , , .
? . . . logReturn, .
, . , . . . . . , . . .
. «r» , . . . – .
12,5 , , . . . 2 500 5 000 . ClickHouse-. , .
, . : , , . logReturn, . . , logReturn. , . . groupArray . . ClickHouse, . .
proof of concept – , . . . .
: Log_return .
- 50 . 50 – , . . , 70 .
, , . . . . .
- – . – ClickHouse.
- Summing/AggregatingMergeTrees – , state . .
- Materialized Views . , , , , attribute , . . , , . , . Materialized Views , , ClickHouse.
- .
- , . , , .
, , ClickHouse , open source , . . . . , , , ClickHouse, InfiniDB. , , , SQL. !
- ! ! - Apache Phoenix?
-, , - . ClickHouse . - ClickHouse, . .
( ) Apache Phoenix – SQL- Hbase. Hbase key-value. . Hbase, Cassandra. . , , ClickHouse.
-
! , . ClickHouse, , ClickHouse , mutable. - , ClickHouse, , ? , . , ?
. . , , mutable, , . . . ClickHouse Oracle, - . ClickHouse , , ClickHouse . . . , , . , - ClickHouse , .
! . . , -. , ? , , , - , ? . . - , ?
, ClickHouse . Hadoop. Hadoop – . Hadoop . – , ClickHouse , , . . , - , .
. , 50 . , ?
-.
, .
3- .
! ! . , ClickHouse . . . -, ? ClickHouse? , ? , , ClickHouse , - .
, . ClickHouse . , . ? , . - 2012-2013- ClickHouse. . . - , , . . ClickHouse – . . . .
! , . ? -. , , , - .
, , . . , Materialized View, , . . . ClickHouse, . . extract - , primary key, , Materialized View. , . Materialized View, . .
.