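Sooner or later, a high-load project needs some kind of specialized database, cache, or other storage. The need is usually driven by the pursuit of performance, low response time, or storage efficiency.
In my talk, I will share our experience developing and operating a specialized time-series database built on Apache Kafka.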
. IT. - , , , .
Kafka, Kafka , , .
- Timeseries , , , .
- , .
- , .
- , Kafka.
- Kafka. , . , .
, Okmeter.
Okmeter – , .
, . . . - . , . . , , .
. . , , . .
Timeseries, . . – , .
:
- , , , , , , .
- metric store. , + timestamp + .
5 Cassandra, .
, Timestamp, .
, . .
, , , Cassandra timestamp . . 5 000 , range- Cassandra. . Cassandra – write only storage.
- , . Cassandra, . . . , . chunked. , 240 .
- , timestamp. 240 8 floats.
- . .
- Cassandra. , chunk.
, , . . . , , . , .
- .
- .
- Cassandra. . . . , , , , , , . , Cassandra.
- , Cassandra , , CommitLog. CommitLog – cassandra’ WAL. checkpoint , 8 CommitLog , , , .
- , , .
- , , . , , . .
- . , . . , .
- , blob , . . . requests , , , , .
- , , , - , .
- , , 5 000 , , , 1 000 .
- , -, . , .
, , .
«»? , , , , - , . .
, . Cassandra , , . Cassandra.
Cassandra.
, . , , .
WAL. REDO log, WAL , log, .
, , . MySQL buffer pool, Postgres shared buffers, , . . , WAL. datafiles , , -.
crash, , datafiles WAL. , , datafile. offset WAL, . . , . checkpoint.
Postgres WAL . . . , . MySQL log .
, ? primary. . , .
, . . - , , commit «Ok» , . .
, primary. primary.
, lag. , primary.
, , primary . . lag. , .
TSDB.
WAL Kafka. , , primary, . . Kafka.
– . Kafka .
, Kafka – , .
- , Kafka – . , , . . – , , .
- . , - .
- – , .
- Consumer . Consumer :
- - , . . offset 1, 2, 3 , ;
- Kafka , . consumer groups offset commit.
Kafka partition. , , .
? primary, , . - .
- , : «, , primary + 2 », . . .
Kafka? . . consumers . , . , , .
, . , . . N , N .
consumer. consumer group, consumer . consumers, .
, Kafka , . .
, . string. – . value – . Kafka , message .
partition. , .
offset. , , .
timestamp, -, 10 -. , , . . . , 3 . , consumer.
? . .
, . . , , - 2. WAL 2. , delta locality, .
Kafka watch write. . .
low level , consumer groups, offset. , - - offset № X.
, . , , . . t – f . , .
timestamp, Kafka offset - timestamp. , now + 4 . , .
consumers, consumer group. , - . consumers , Kafka , , , consumers . .
, , offset. , , , . , .
. low level , . . , , in-memory storage .
. , . Kafka , . partition worker. workers, , . . instance memory storage .
, , . , .
, Kafka: « , , , 4 ». . . , .
, , . . , message Kafka timestamp , , - , , , .
, . . :
- Kafka HighWatermark , . . offset . offset. , , . . , 1 000 .
- timestamp, timestamp. , , - . , . Postgres, : « , ?».
health check. . . health check Kubernetes , : «, , ». , , . .
, , , . . , .
. – . , , 75 000 000 . . 15 .
– Kafka. . . 130 000 . , . . .
, REDO, .
, , . . , .
, , , . , memory storage 4 , 8 . , , , . storages.
, . , , 1 000 . . , . . , , . .
. 99 20 . , 3 500 . , . . 95 – - 600 3 .
- . , .
- . . instance.
- , , . . , , . consumer, Kafka. , . . . 18 . 18 .
- , , 1,5 . . , read buffer , , , .
- , . . .
LTS. chenker.
- , . . blobs, .
- , . . offsets, Kafka. Kafka , . , .
- , 4 , blob, Kafka, .
- , blob , blob, message, .
, . , Cassandra. Kafka , long term storage, . . Cassandra blobs - , , , MySQL.
, Cassandra , , , . , . . , , .
ConsumerGroup.
?
- 200 Cassandra, . . 30 000 writes , 150, blobs .
- Cassandra . Cassandra . 12 SSD, , 3 SATA-. latency SSD .
. ? - . , . , , offset’, . . .
Kafka . 3 , :
- , , , , . .
- . worker’ .
- , . , .
, . .
http, http :
- http-400 – , . . . , JSON. .
- http-503 – . , storage, , - .
- , .
storage, , , Kafka. , storage .
, , . Cassandra , . , , Cassandra , .
, - .
production-. Kafka . , . , , consumers, .
production 6 . 1.0.
. Kubernetes .
? 2 , 2 SSD. 2 SATA-. system d Kafka , 4 10 . Kubernetes , . . . 4 10 Kafka, Kubernetes . -.
- , WAL, 5 . 5 .
- , blobs, 2 . 3 . , . , . 2 Kafka. Cassandra , .
- 20 000 Kafka 6 .
- 6 consuming, producing, 10 . 6 . 45 . . . .
prod .
- Rolling upgrade . .
- . .
- . , rolling , .
- Kafka . memory leak. , memory leak , JVM heap .
- Kafka Kafka. , lz4, consumer , lz4 . Kafka - - . consumer, , .
- heap , , . . . , lz4, . . , , . , , Kafka.
- consumer , prod lz4 . , , payload . , downgrade heap , heap – .
, Kafka , .
, . . Kafka, ZooKeeper, , ID N+1. , , , . , .
Kafka . kafka-reassign-partitions. generation. , , - - , , - +1 2. .
, , . . , , . , , .
- , , , generate , .
- , , . , , . . , , 10 , . . . , , , generate tools, , Kafka .
- . reassign apply , Kafka , , , , , .
, .
- .
- Kafka . .
- , , , . . . tooling, , - ZooKeeper , .
- , , .
. , . , . , , - . , . , Kafka . , , , , - .
. , reassign . . . ID № 5, , , - .
.
Kafka , REDO? LinkedIn : «, Kafka REDO. , . ». , Kafka : « 5 , ».
, , , , . , , Kafka , , , . . reassign , . . . . 2 : 386 5 , 20 100 , . , .
, , . - , , , , . , . , . , . .
, , reassign . , .
Kafka prod.
, , .
, , Cassandra, . , 5 . , . Kafka . . . message . . , , . .
maintenance- . - , , . , confluent , , Kubernetes , , , reassign, - Kafka Kuber.
, , :
- , , . .
- production. . shadow-. . . storage, . , latency. . . . prod , .
- secretion read, secretion write, . SSD. . 3 , .
, . Kafka , Cassandra . , .
Mongo , , . , Postgres write amplification MySQL . . .
, . . , , , . , .
. ?
- , , . . . , .
- , , :
- : « ?».
- - , , , .
- - . , WAL . Kafka WAL, , . , . . . .
- , . .
, . !
! , . ! ! ! Kafka WAL – , , . . , -, . - JPoint , Kafka 1 events, . , , ? , .
, Kafka, , , . . Kafka - , . . .
! ! ? , , .
, Kafka. , , Kafka- , , . , . . – . .
, . , , . , , , .
! ! bunch size bunch size ?
bunch , , , 1 000 . , bunch’ , . . Kafka- , .
Cassandra memory storage, consumer ?
, consumer memory . consumer, . . . consumers .
. Prometheus Graphite ?
? Prometheus, Influx , -, - . . - -, - , .
, , Kafka REDO. , ?
. - . -, . Cassandra , jbot, . . , . Kafka – . , , .
! , Kuber, Kafka? ?
. , Kafka . , Kafka 2 CPU. CPU . .
! ! , , , , , , . Kafka, , ?
. . Kafka . , Kafka .
- ?
- . , - . . , . , . , Kafka CPU. , ZooKeepers . . . , .
, ! Kafka. . . , . , ? , . . , ?
, . - . . , . , , , , . .
. . ?
– - float - timestamp, . . . . timestamp.
. . Kafka , , . Kafka «», . . . , . , . . , .
. , , . , , . . . , Kafka , , , , , , – . . , . .
. .
, ! ! TSDB?*
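Prometheus does not know how to reliably place multiple replicas across nodes. It does not know how to handle long-term storage (LTS); it has problems with that. And the read requests we send, each covering 10,000 metrics, make Prometheus buckle.
In terms of workload, our requirements for a TSDB are somewhat different. We have users who look at graphs, but most of the load comes from triggers that constantly pull and read this data. We have not yet converted them to streaming. That is another reason Kafka is a sensible choice for us: we will evaluate triggers on the stream of changes as it flows by, instead of reading from this storage.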