PostgreSQL on K8s at Zalando: two years in battle. Alexander Kukushkin (Zalando)



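We all know that most DBAs are very conservative: they would rather keep their databases on dedicated servers only. In the modern world of microservices, Kafka, and Kubernetes, the number of databases grows in proportion to the size of the organization and quickly exceeds what can comfortably be managed by hand or with semi-automated tooling.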







I have been working at Zalando for almost 7 years. How many of you have heard of Zalando?





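  • For those who have not, it is a company similar to Lamoda in Russia.
  • We sell clothes and shoes, but we sell them in 17 European countries.
  • We have 7 logistics centers and warehouses of our own.
  • Zalando has more than 15,000 employees.
  • About 2,000 of them work in technology. The engineers are spread across roughly 200 teams that write applications.
  • Recently we have been deploying a lot on Kubernetes and working with Kubernetes a great deal.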







?



  • , Kubernetes, , .
  • , Postgres Kubernetes Spilo Patroni.
  • , Postgres-Operator Kubernetes.
  • – , , .




  • Kubernetes . 140 . 50/50 production/test environment. . . cost unit 2 Kubernetes-. , . .
  • production deployment CI/CD. docker image, , CI/CD.
  • production Kubernetes- , . request, 4- , - , . -.




Postgres Kubernetes? . 10 Postgres- Kubernetes-.



Postgres-Operator Postgres Kubernetes , 140, .





Kubernetes, Postgres? . , , Kubernetes.



, , - .



  • Kubernetes . tools.
  • Kubernetes . .




. -. worker-, , Kubernetes , kubelet, docker, fluentd, kube-proxy . .



. , .





?



- . docker . Kubernetes . , PersistentVolumes PersistentVolumeClaim.



– StatefulSets, , -, , . . . , -, StatefulSets PersistentVolumeClaim PersistentVolumeClaim templates volume, volume , .





Postgres Kubernetes, . , Kubernetes docker. , - .



  • docker image. Spilo. Spilo – . image Postgres, . . , 9.3 12.
  • postgres’ extensions , pg_partman, pg_cron, postgis, etc, timescaledb.
  • tools , pgq, pgbouncer, wal-e/wal-g. , , docker Kubernetes, , image Kubernetes EC2 instance Amazon.
  • HA Patroni,
  • .




Patroni? , , . Postgres, HA.



Patroni Python. Kubernetes. Postgres first class citizen Kubernetes, . . Postgres .



Patroni Postgres Kubernetes supervisor , . . .



Patroni – , , failover . Patroni , . . InitDB Postgres, Patroni point in time recovery, .



, , Patroni .



, Patroni, Postgres. - Postgres, Patroni: « ». .





? StatefulSet. . . PersistentVolume. StatefulSet, demo-0 demo-1.



, – Patroni. Patroni kubernetes’ endpoint. . . , Patroni , . , , , endpoint, IP.



-. , .



demo — repl. , labelSelector: role = replica. , labelSelector.
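As a rough illustration of the replica service described here, the sketch below builds a Kubernetes Service manifest as a Python dictionary and checks which pod labels it would match. Only the `role = replica` selector comes from the talk; the `-repl` naming, the `cluster-name` label, and the port number are assumptions made for the example.

```python
def replica_service(cluster: str, role_label: str = "role") -> dict:
    """Build a Service manifest that selects only the replica pods of a cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        # Hypothetical naming scheme: "<cluster>-repl".
        "metadata": {"name": f"{cluster}-repl"},
        "spec": {
            "type": "ClusterIP",
            "ports": [{"name": "postgresql", "port": 5432, "targetPort": 5432}],
            # "cluster-name" is an assumed label; the role label is the one
            # that the HA agent is expected to keep up to date on each pod.
            "selector": {"cluster-name": cluster, role_label: "replica"},
        },
    }


def selects(service: dict, pod_labels: dict) -> bool:
    """Return True if every selector key/value is present in the pod labels."""
    selector = service["spec"]["selector"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


svc = replica_service("demo")
print(svc["metadata"]["name"])
print(selects(svc, {"cluster-name": "demo", "role": "replica"}))
print(selects(svc, {"cluster-name": "demo", "role": "master"}))
```

Because the selector is evaluated against live pod labels, a pod that changes role drops out of this service without the service itself being edited.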





?



, , YAML manifests. . , YAML. , .





Helm, . . CI/CD deployment. . rolling upgrade. minor Postgres, docker image, ? StatefulSet , StatefulSet, . . .



, , rolling upgrade. rolling upgrade Kubernetes-.





? , : 1, 2, 3. availability , . . -. , volumes .



Kubernetes upgrade, workers, . . . cloud environment AWS, - EC2 instance, . .



? , 3 , 3- . 2 availability .



Kubernetes , . Patroni . enter option , . . connections , . , .





.





Kubernetes rolling upgrade .





. . . .



, .





.





? – .





, 3 failover , . . 3 3 failover. B – 2, C – 2.





- , .





.





, , . . , : « Postgres». , pull request Git. kubectl Amazon. .



, - instance, .



.



, .





?



:



  • Deployments. . .
  • Upgrades clusters. rolling upgrade Postgres. rolling upgrade Kubernetes .
  • : , , .
  • failovers maintenance.




Postgres-Operator. Kubernetes, , . . , , . – , .





Postgres, YAML-. .



-, , ID , . . . Team, , ACID. ? , . . Atomicity, Consistency, Isolation, Durability.



-, volume. – 1 . – 2. Postgres. . : «, , . owner ».
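To make the shape of such a cluster manifest more concrete, here is a minimal sketch that assembles one as a Python dictionary. The field names follow my understanding of the Postgres-Operator `postgresql` custom resource; the user names, database name, default version, and the helper itself are hypothetical and exist only for illustration.

```python
def minimal_cluster(team: str, name: str, volume_size: str = "1Gi",
                    instances: int = 2, pg_version: str = "12") -> dict:
    """Assemble a small cluster manifest: team, volume, instance count, version."""
    # Assumed convention: the cluster name is prefixed with the team ID.
    if not name.startswith(f"{team}-"):
        raise ValueError("cluster name must start with the team ID")
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": team,
            "volume": {"size": volume_size},
            "numberOfInstances": instances,
            "postgresql": {"version": pg_version},
            # Illustrative users and their role flags.
            "users": {"app_owner": ["createdb"], "app_user": []},
            # Database name mapped to its owner.
            "databases": {"app_db": "app_owner"},
        },
    }


manifest = minimal_cluster("acid", "acid-demo")
print(manifest["spec"]["numberOfInstances"], manifest["spec"]["volume"]["size"])
```

In practice the manifest would be written as YAML and submitted through the usual deployment pipeline; the dictionary form is used here only so the structure can be checked programmatically.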



?





DB deployer. , CI/CD. YAML- CRD-, . Postgres-operator event . StatefulSet - . endpoint, . Postgres, . . superuser , .



Kubernetes , .





rolling upgrade Kubernetes?



.





3 , 3 . , 3 , .





, . Kubernetes , .





. , , .



switchover.





, . . switchover = 1.





, .





Switchover . , , , , . . , downtime .



? issues ?







-, Kubernetes- AWS. .



AWS API , API. , - , AWS .



? Kubernetes AWS API , volumes, , , volumes , postgres’ . , . .



, deployment , . , .



EC2 instance Amazon. , , , , . Amazon, EBS volumes instances. ? , . . - , instances. , instance Amazon, volumes . . . 30 , . , .





Kubernetes, , Postgres, , . Postgres . Patroni . Postgres , Patroni . – crash loop. , .



partitions , -. volume . . volume, , throughput IOPS. volume .





auto-extend volumes? Amazon . API. volume 100 , .



, , , , , auto-extend. , , . . .



volumes , .



. , - jobs . .





? HA , Disaster Recovery , wal-e continuous archiving , basebackup.



wal-e – , - . pg_stat_statements 2- . , . , : UPDATE WHERE id IN 150 . . . Postgres – .



Pg_stat_statements 2- . pg_stat_statements , . Kubernetes , , , . .





wal-e , . , , postgres’ - label- . - reinitializing.



– - tools, , , wal-g, pgBackRest. . -, , Postgres 9.6, 9.5 . -, , , .



. wal-e, , basebackup wal-e.





. Out-Of-Memory? docker Kubernetes – . Postgres, , 9. , . production .



. dmesg. , Memory cgroup out of memory Postgres. , ?





? process ID, .



, , . dmesg -T -. OOM system control «oom_score_adj», . Patroni Postgres, . . , .



memory limit 8 , cgroup , 6 + postgres’ shared buffers 2 . 6 . postgres’ , , , .





. . , cgroup shared memory , - .



, shared buffers 25 % 20 %. , , . . .





Postgres 11- . production minor releases, . , , .



. , – , - , shared memory. docker shared memory 64 .



Postgres 11? Postgres 11 parallel hash join. ? worker hash, shared memory. 64 , hash .



? docker dev/shm, .



Kubernetes . . . – tmpfs volume dshm.



, . . volume – enableShmVolume. , , volume. , .
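The shared-memory workaround can be sketched as a small transformation of a pod spec: add a memory-backed `emptyDir` volume named `dshm` and mount it at `/dev/shm` in the database container. The volume name matches the one mentioned in the talk; the container name `postgres` and the helper function are assumptions for this example, and the operator's `enableShmVolume` option is presumed to produce something equivalent rather than exactly this.

```python
import copy


def add_shm_volume(pod_spec: dict, container_name: str = "postgres") -> dict:
    """Return a copy of pod_spec with a tmpfs volume mounted at /dev/shm."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    # emptyDir with medium "Memory" is backed by tmpfs on the node.
    spec.setdefault("volumes", []).append(
        {"name": "dshm", "emptyDir": {"medium": "Memory"}}
    )
    for container in spec["containers"]:
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(
                {"name": "dshm", "mountPath": "/dev/shm"}
            )
    return spec


original = {"containers": [{"name": "postgres", "image": "example/spilo"}]}
patched = add_shm_volume(original)
print(patched["volumes"])
print(patched["containers"][0]["volumeMounts"])
```

Mounting tmpfs over `/dev/shm` replaces the small default shared-memory segment of the container runtime, which is what the parallel hash join workers run out of.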





Postgres . -, failover , . . Patroni, - events. Patroni failover , .



, , FATAL too many connections. . . 12- Postgres . max_wal_senders max_connections. wal_senders Postgres. .



Postgres – Built-in connection pooler.





– :



  • , cluster manifest, , . , : 100 . , , . , . OOM-Killer . , .



    . , : 4 , 32 . , 5 64 , , Kubernetes’ . , - .



  • ? production - ServiceAccount, Spilo. , , Postgres read only. ServiceAccount , , - , . .



  • YAML-.





.





, , , , array . .





tools, , Postgres , , 10.10, . 10. volume . .



tools . , , Git .



environment «». .





1 500 postgres’ . 100 Kubernetes-. . , on-Call , , , , . . - .



, . , , Patroni, Spilo, .





, open source. . Patroni Spilo .





! , .



Questions



availability ?



?



.



, anti-affinity, . . .



! . : production?*



, . 600 1 400 production. . . 600 . , . , , environment . , . , production 2- .



, external volume, . . Host Path , . . - ?



, . . . i3-volume Amazon . ? EBS , . , . . , . , .



, IO-bound , ?



, . Amazon i3-instances. NVMe . instance , . , , . Kubernetes team , , , rolling upgrade , . . 1-2 . 1-2 - .



! ?



wal-e. docker cron, basebackup. archive_command, . . wal, , S3 Amazon. , basebackup + wal . retention – 5 , . . 5 .



! . 1 400? ? 2?



200 . , , , , . . Kafka. , . , . . , . , , . . . 80, . . .



, , Postgres ?



7 . . , . pets world cattle. Pet – , -, . – , . . - , .



?



, .



, ! EBS volumes ?



gp2 , . Io1 – . 3 000 IOPS, io1 , , .



EBS gp2, 250 ?



. Kubernetes. – volumes, RAID. . Kubernetes . Kubernetes , EC2 i3-instance with NVMe, instance, EBS , stripe.



Kubernetes + AWS?



, . . . . CPU, memory limit request 100 millicore, 100, 10 . . . . , 101, – . . .



RPO, RTO Postgres ?



, Kafka. . . , .



, .



In the typical case we lose 1-2 WAL segments of data (if everything is completely destroyed). Usually replication does not lag behind.



1-2 segments can be half a day if the load is light.



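Yes, if there is no load, those segments may not rotate at all; that is, if there are no transactions, they do not rotate even after the timeout.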



Can't that be set there?



It should rotate on the timeout, but if there are no transactions the segments are not rotated. I dealt with this recently.



