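In the distant future, automatically removing unnecessary data will be one of the important tasks of a DBMS [1]. In the meantime, we have to take care of deleting unnecessary data ourselves, or moving it to cheaper storage. Suppose you decide to delete a few million rows. It is a fairly simple task, especially if the condition is known and there is a suitable index. "DELETE FROM table1 WHERE col1 = :value": what could be simpler, right?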
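I have been on the Highload program committee since its very first year, that is, since 2007.
I have been working with Postgres since 2005 and have used it in many projects.
I have also been part of the RuPostgres group since 2007.
On Meetup we have grown to more than 2,100 members. That makes it the second-largest such group in the world, behind New York and ahead of San Francisco.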
. , . Postgres. .
https://postgres.ai/ – . , , .
- , Postgres - . , , , , DBA . , .
https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
VLDB -. . , , . .
– 1 000 000 . , 100 . .
https://vldb2019.github.io/files/VLDB19-keynote-2-slides.pdf
? , . . .
, , . , , .
, , , , . . , , . .
, . – . , - , . - , . . . - , , . .
, , - .
, , . . .
. , – . staging – . – . – .
. DELETE .
, , . .
. - , .
dev, staging – . , , . , .
? . , - . , , . , DELETE .
? , . ?
, review, . . DBA- . , prod, .
, - .
.
- , Postgres MySQL .
, , - .
, - , ?
DBA. DBA , : « ». , GitLab, GitHub code review , DBA prod, DBA : « ».
, disk IO , , , .
– . , prod , , staging . , 1 000 , .
, , . . , , . -.
. , , . , . , YouTube .
, ? , latency . , 100 %. , NVMe , , , . , , .
, . . Switchover. . .
- ? DBA , checkpoint tuning. , checkpoint tuning.
checkpoint? . , . , , , write-ahead log. - , , , , REDO. . , checkpoint. .
Postgres . 10-15- . checkpoint – .
c Postgres check-up, . . . - . , checkpoints 90 % .
? . Checkpoint timeout , , 10 . , .
max_wal_size 1 . , Postgres 300-400 . checkpoint .
, , , , checkpoint , 30 , . .
, . . . max_wal_size. .
, , . . , .
, .
– max_wal_size. . 1 . DELETE .
, . , disk IO . , WAL , . . . , checkpoint . , .
max_wal_size. . , . . , 10 – , 1, 2, 4, 8 . . , prod. , Postgres .
, , DELETE, checkpoint’.
Checkpoint - – .
: DELETE , «» .
. . 1 max_wal_size , . – , . . . , DELETE .
prod, , , DELETE .
, 16 , , . – , . . , . . – . – . , , 16 .
64 , . , , - .
?
, , checkpoint tuning, , , , .
checkpoint , , , , , , , , . checkpoint , , .
checkpoint . .
. Postgres 8 , Linux 4 . full_page_writes. . , , , , .
WAL , , checkpoint , , . . 8 , , 100 . .
, .
, , checkpoint , . checkpoints, , full_page_writes = on , , . . WAL . , , .
, , .
max_wal_size, , checkpoint, wal writer. .
. ? , , checkpoint . REDO . .
, checkpoint , , kill -9 Postgres .
, , , . . REDO .
, . -, checkpoint, , . , -, . checkpoints -, , , WAL checkpoint. . . .
max_wal_size , , max_wal_size 64 , 10 . – . -. , - : « ? 3-5 ?». .
. Patroni. , , . autofailover Postgres. GitLab Data Egret .
autofailover, 30 , 10 ? , . . . , .
, . , , - 10 .
- , autofailover. , , 64, 100 – . . , .
, , , max_wal_size =1, 8, . . , . ?
, , . .
. , «BEGIN, DELETE, ROLLBACK», DELETE . . . , . . bloat . DELETE.
DELETE c ROLLBACK checkpoint tuning, database labs.
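The "BEGIN, DELETE, ROLLBACK" pattern mentioned above can be sketched as follows. This is a minimal illustration, not the speaker's own tooling: `sqlite3` stands in for Postgres purely so the sketch runs without a server, and it does not reproduce Postgres WAL or checkpoint behaviour. The table `table1`, its columns, and the helper `dry_run_delete` are hypothetical names.

```python
import sqlite3
import time


def dry_run_delete(conn, min_id):
    """Run a DELETE inside a transaction, time it, then roll it back.

    Returns (rows_affected, seconds). The data is left untouched,
    so the same experiment can be repeated on the same rows.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    started = time.perf_counter()
    cur.execute("DELETE FROM table1 WHERE id >= ?", (min_id,))
    elapsed = time.perf_counter() - started
    affected = cur.rowcount
    cur.execute("ROLLBACK")  # undo the delete; nothing is committed
    return affected, elapsed


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/ROLLBACK above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, col1) VALUES (?, ?)",
    [(i, "x") for i in range(1000)],
)

affected, elapsed = dry_run_delete(conn, 500)
remaining = conn.execute("SELECT count(*) FROM table1").fetchone()[0]
print(affected, remaining)
```

Because the transaction is rolled back, the row count after the call is the same as before it, which is what makes the measurement repeatable.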
«i». Postgres . , . : ctid, xmin, xmax.
Ctid – . , .
, ROLLBACK . . . , . .
Xmax – . , Postgres , , , 0, – . , DELETE . database labs .
. DBA , : « ?». . , .
, . . DELETE . 20 , . , , , .
?
, , . , , throttling.
. , , , , , , autovacuum, , . , - , , - . .
https://postgres.ai/products/joe/
. , : « ?».
, , transaction overhead, . . . .
: , .
? , . . 50 . - , . , . - 100 , , , 100 , . .
, 10- , , - . , . . , transaction overhead . , .
. - . . . . . DELETE UPDATE.
, , , DELETE. , .
, . index scan, index only scan. . . .
, . , , - . , . database labs.
-, production . , , , . , . , - , .
, , . . . 30 , . - RESET, . . . .
https://docs.gitlab.com/ee/development/background_migrations.html
? 3 , .
. . , . . 100 , 5 , 1 000 . , . ID . .
– . GitLab. . ID , 10 000 . - . . .
, , . .
. . , . . , , ID. ID, index only scan, heap .
, index only scan – , index scan.
, . BATCH_SIZE . , . , , , , . for update skip locked. Postgres , . . CTE – . CTE – returning *
. returning id, *
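The ingredients named above (a batch size, a CTE, `FOR UPDATE SKIP LOCKED`, and `RETURNING`) could be combined roughly as in the sketch below. It is one possible arrangement under stated assumptions, not necessarily the query shown in the talk: `table1`, `col1`, and `id` are assumed names, the `%(name)s` placeholders assume a psycopg-style driver, and `delete_in_batches` and `make_fake_table` are hypothetical helpers. The fake table only exists so the loop can be run without a database.

```python
from typing import Callable, List

# Postgres statement for a single batch (assumed table and column names).
BATCH_DELETE_SQL = """
WITH batch AS (
    SELECT id
    FROM table1
    WHERE col1 = %(value)s
    ORDER BY id
    LIMIT %(batch_size)s
    FOR UPDATE SKIP LOCKED
), deleted AS (
    DELETE FROM table1 AS t
    USING batch AS b
    WHERE t.id = b.id
    RETURNING t.id
)
SELECT count(*) FROM deleted;
"""


def delete_in_batches(run_batch: Callable[[int], int], batch_size: int) -> int:
    """Call run_batch(batch_size) until it deletes nothing; return the total."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    while True:
        deleted = run_batch(batch_size)  # intended as one short transaction per call
        if deleted == 0:
            # Nothing matched, or every remaining row was locked and skipped.
            return total
        total += deleted


def make_fake_table(ids: List[int]) -> Callable[[int], int]:
    """Stand-in for the database: removes up to `limit` ids per call."""
    def run_batch(limit: int) -> int:
        chunk = ids[:limit]
        del ids[:limit]
        return len(chunk)
    return run_batch


rows = list(range(2500))
print(delete_in_batches(make_fake_table(rows), batch_size=1000))
```

In real use, `run_batch` would execute `BATCH_DELETE_SQL` in its own transaction and return the count it reports. The loop stops on a zero count instead of a short batch, because with `SKIP LOCKED` a short batch can simply mean that some rows were locked at that moment.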
, .
? , . . ID created_at -. min, max . - . . .
. , , , heap only tuples updates. . . Postgres . pg_stat_user_tables . – hot updates .
, . updates, , . ( - updates, -), . . . , . . , - , , -, , .
, batches , - . .
— https://gitlab.com/snippets/1890447
Blocked autovacuum — https://gitlab.com/snippets/1889668
Blocking issue — https://gitlab.com/snippets/1890428
№ 5 . Okmeter Postgres. Postgres, , . - , - . Okmeter – , , . .
, dead tuples . , - . , , . .
IO, , .
. OLTP . , .
– ? . . autovacuum . . hot standby , . , .
, alert . . autovacuum. Avito, . , , autovacuum. , - . alert .
issues. . - - . Data Egret CTE, . . . . statement_timeout . lock_timeout .
.
, – . , . 2 – .
, production, . production.
. , . DBA , , . .
. , , REPACK . . , .
, , . -. : . , .
, , open source. GitLab. , DBA. database lab, . . , Joe. production. Joe slack, : «explain - » . DELETE , .
, 10 , database lab 10 . 10- 10 . , . , . . . .
thin provisioning. . , , . . . .
: 5 , , 30 . , . . , .
Postgres.ai . , , . . . .
, , , , . . . , , . , , . ?
. , pg_repack, , , 4- . , , 8 .
. . . . . . . . pg_repack. . , , . , . . , . , , .
pg_repack GitHub , , int 4 int 8, pg_repack . , , . , pg_repack : « », . . , . .
, .
Bloat , . , , . . . . Python, .
MySQL, . .
, 90 %. 5 %, .
! prod, - , ?
. . , , , , - . , . , , , .
! , Postgres, - , . . Postgres, - DELETE deferent - , , - ?
SQL , Postgres ? . . . .
.
, checkpoint tuning . - , , . .
, , , ? .. ?
, . Nancy, checkpoint tuning. - Postgres? , . . , . . . , auto tuning . . checkpoint tuning . . . performance, shared buffers . .
checkpoint tuning : , cloud, Nancy . max_wal_size . , .
! . , autovacuum . ? . ?
Autovacuum – , , . , , . , . , . autovacuum – . OLTP: autovacuum. hot standby feedback , autovacuum , . , , . . , – . -, . . . . , autovacuum, .
! , . , , . , . . . live, live, , 60-70 %. ,
DBA, , , — . , . , , production . . . -. , – . .
garbage select , , deleted flag
, autovacuum Postgres.
, ?
Autovacuum garbage collector.
!
! , - ?
, .
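Can we protect ourselves if we lock tables that should not be used?
Of course. But this is a chicken-and-egg problem. If we all knew what would happen in the future, we would of course do everything well. But the business changes: there are new columns, new requirements. And then, oops, we want to delete that data. The ideal situation does happen in real life, but not always. Overall, though, it is a good idea. Just truncate and that is it.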