有效使用ClickHouse。阿列克谢·米洛维多夫(Yandex)



由于ClickHouse是专用系统,因此在使用它时必须考虑其体系结构的特殊性。在本次演讲中,Alexey将讨论使用ClickHouse时可能导致无效工作的常见错误示例。实际示例将显示一个或另一个数据处理方案的选择如何将性能改变几个数量级。



视频:





你好!我叫Alexey,我在做ClickHouse。





首先,我急于取悦您,今天我不会告诉您ClickHouse是什么。老实说,我对此感到厌烦。每次我告诉你这是什么。而且,也许每个人都已经知道。





, , . . ClickHouse. , ClickHouse , , , . , .



- , , .



, ? . , , , , - .





, , , , inserts batches, . . inserts.



, ClickHouse insert, . .



, . , .. . 105 - . 700 . - .



MergeTree, . . – , 400 000 .



, , , 250 000 . – ClickHouse*.



* 2020 , .





, ? MergeTree 59 . 10 000 . ReplicatedMergeTree – 6 . , 2 . -, - . ? , ClickHouse . .





– . , , . , – . . , . . , , . , , . , . - .



, insert ClickHouse, memtable. log structure MergeTree, MergeTree, log’, memTable. , . 100 , 200 . .





: « ?», , - - ClickHouse.



1. . - . , Kafka. Kafka, . , , .



, Kafka – . , Kafka. , . , , . .





2. . - , . . , , , . cron, - daemon ClickHouse. , .



, , - , .





3. , . , - - daemon, . , . , , , , , ClickHouse.



kill -9 . , . , , . , .





4. . - . ClickHouse , . , http- transfer-encoding: chunked insert’. , , overhead .



ClickHouse . ClickHouse .



. , , , ClickHouse , insert. ClickHouse inserts . , . .





5. . - community – . , . , ClickHouse . open source, , , . – , GitHub, . , - .



* 2020 , KittenHouse.





6. – Buffer . , . Buffer .



, . MergeTree , buffer , . 10 000 , . , , . .



buffer . - , .





ClickHouse Kafka. – Kafka. . . Kafka .



, . community . «community », . , , .



* 2020 , RabbitMQ.





? insert values values - . , now() – . ClickHouse , . .



* , , VALUES .



, , . ClickHouse . , , . , , .



* ClickHouse write-ahead log, .





– .



, . – , , string. . .



, , , - , , ClickHouse , . - .





, IP-. . , 192.168.1.1. – UInt32*. 32 IPv4 .



-, , . , , . - .



.



IP-, . 137 . , 37 . , . . 4 .



, . - , IP- . , .



. , , , , , .





.



\1. , . , , , . ClickHouse. . .



, , . . : . , «», , , - . .



Ulnt32 250. 250 , , , -. , ClickHouse . , , . . , , , . .





, ClickHouse. Enum. Enum . , : , , , . 4 .



, . . alter table. alter table ClickHouse . Enum, . alter * , selects. alter , . . - .



* ClickHouse, ALTER .





ClickHouse – . ClickHouse , . , : MySQL, Mongo, Postgres. , http . ClickHouse , .



, join . . , . , , .



. .. . , , . . – , . MySQL.



, , hit rate 100 %. , MySQL. ClickHouse , – , , .



, – ClickHouse . . . , , . , .





, , . . – 64 .



, 64- , . , .



. , - .



. , , - , . , . .. , . , , , . , , - , .



– .





, , , . . , , ru – 2 . , , , 2 . , , .





, , , , . – . , - . , . , . , , , . . .



– , , join. join – , . , .



in place, .





- , , - .



, , . , , , .



, . ClickHouse , . , . , , RFC, , , .



. 166 . , 67 , . . . - , - , - , .



- , , . . . , .



, , 126 , 5 . 25 . 4 . . , 25 - -.



, , , - 4. - 25 . ? - . , . .





, , , , . IPv4, UInt32*. IPv6, FixedString(16), IPv6 – 128 , . . .



, IPv4 , IPv6? , . IPv4, IPv6. , IPv4 IPv6. , IPv4 , .



* ClickHouse IPv4, IPv6, , , , .





, . , - . , , ClickHouse, . - , .



, . , , , . . : 12.3. , , , . , . , . , . .



4 . , ClickHouse. ClickHouse – . , . 5 BrowserVersion, 5 . .





, , , . ClickHouse . ClickHouse - . - .



, . , 512. 512 – .





, ClickHouse, Log, . , , .



* ClickHouse LowCardinality .





. - . . , - , , MySQL 3.23.



, .



, , , .





- . , . , MyISAM . .



– , alter . . MySQL .



, , , .





ClickHouse , , -, , .



: « ClickHouse ?». , . , . . , .



, -, . , , , – . - . . . ClickHouse .





Alter ClickHouse , alter add/drop column.



, 10 10 000 , . ClickHouse – , throughput, latency, 10 .





. , .



, maintenance .



, , , - , . – StripeLog. TinyLog, .



* ClickHouse input.





– . , 5 , 6 . , . 5 , 1 000 . . , , 200 ClickHouse, . instance .





ClickHouse . instance ClickHouse . . . - , , 56 . , , 56 . 200 ClickHouse , , 10 000 . , .



, instances . - , - . instance, ClickHouse .



, TCP. , . .





, . .



, – . , 1 000 , . . . ClickHouse AggregatingMergeTree, .



, , . , , SummingMergeTree , 20 - . , .





. -, . , . – , , , . , .



? , , . . . , . ClickHouse alter . - C++. , C++.



ClickHouse, – , . ClickHouse , . , .





– . - production show processlist. , - .



, . , . url in .





– ? , . , , ru url = - . , url, . . ClickHouse .



- , , ClickHouse . , -. . , . , , .





, ClickHouse IN. , MySQL IN, , 100 - , MySQL 10 , .



– , ClickHouse, , , full scan, . . , . , .



. , , IN . . . *.



– , , . , 100 500 . , , 50 . .



* ; , .





, API. , - . - , API , - . - , .



. API, . , -. . .



ClickHouse – . . , , . .





. .



, , , ClickHouse , ClickHouse .



? pipeline . , , -. ClickHouse . , , - - . , .



. , rsync.



ClickHouse . , ZooKeeper. ZooKeeper , , , , - java-, ClickHouse – , C++, . ZooKeeper java. - , .





ClickHouse – . . , Distributed , failover. , , .





, table engines. ClickHouse – , . , , MergeTree. – , .



MergeTree , - . . , , default – 2000 . .



, . .





, . , , . Log.



– StripeLog TinyLog.



Memory , - .





ClickHouse .



. . . JOIN, , , ClickHouse Hash JOIN. , , JOIN *.



, , , inplace .



* ClickHouse merge join , , . .





, .



ClickHouse . *. . - , , , , .



* update delete batch .



, . , ReplaceMergeTree. merges. optimize table. , .



JOIN ClickHouse – .



, .



ClickHouse , select.*



* ClickHouse . , . ClickHouse – Catboost. , : « . !». .





, ClickHouse, . , . , – . - , , , , .





! ClickHouse?



.



ClickHouse. cli .



.



select’.



.



GitHub, .



.



, , .



.



. .



. . LZ4, ZSTD*. 64 1 .



* , .



?



. . , .



.



, , uniqExact , . . , uniqExact , . ? . . , , . , , ? - ? , , - .



, , . IP- . , , ClickHouse IP- . , . . uniqExact , . -.



? , user id, in, , , ?



. , , - – . , , , , , .



, ! ClickHouse! . ?



. . ?



-. MySQL, . . after, , .



. , , - . , .



, .



, , . . , . . . : -. , , , . ?



, , , C++.



C++ ?



-.



*.



* – pull request.



!



! ! , ClickHouse , . . , ClickHouse, , . , ZooKeeper . - ClickHouse, -, ?



, . , , set max_threads = 1. , . . . , , .



, ! . , ClickHouse . , , . . . ?



-, – , , . , , . , - , java, exception, . , . . , . ? – *.



* ClickHouse, " ", .



– ?



.



! ! , , . - WITH CTE?



. WITH . .



. !



! ! . , , - ?



. . , . *.



* .



- ? , ?



, deletes, updates , selects inserts.



. primary key. , , , ? , , , ?



.



. - primary key, «» , , ? , ?



.



在主键中放置一个字段也许很有意义,如果按此字段对数据进行排序,则可以更好地压缩数据。例如,用户ID。例如,用户访问同一站点。在这种情况下,请输入用户ID和时间。然后,您的数据将得到更好的压缩。至于日期,如果您真的没有而且也从来没有按日期进行范围查询,那么就不必将日期放在主键中。



好的,非常感谢!




All Articles