今天,我想向Habr的读者展示一个用Python编写的实用程序,用于在PostgreSQL DBMS中处理表依赖关系。
实用程序API很简单,它包含三种方法:
- archive_table-递归归档/删除具有指定主键的行
- get_table_references-查找表的依赖项(将显示指定表引用和引用的表)
- get_rows_references-在其他表中搜索引用所需表中指定行的行
背景
我的名字叫Oleg Borzov,我是Domklik抵押贷款经理的CRM团队的开发人员。
就公司的数量而言,我们的CRM系统的主数据库是最大的数据库之一。它也是最古老的之一:它出现在项目启动时,当树木很大时,Domclick是一家初创公司,而PHP代替了流行的Python异步框架上的微服务,而这是一个庞大的整体。
从PHP到Python的过渡非常漫长,并且需要两个系统的同时支持,这影响了数据库的设计。
结果,我们有了一个数据库,该数据库包含大量高度连接的庞大表,并具有一堆用于不同类型查询的索引。所有这些都会对数据库的性能产生负面影响:由于大型表和它们之间的大量连接,查询的复杂性一直在增长,这对于负载最大的表尤其重要。
为了减少数据库的负载,我们决定编写一个脚本,每天将旧记录从容量最大和负载最大的表转移到存档表(例如,从
task
c task_archive
)。
由于表之间存在大量关系,因此此任务变得很复杂:仅将行从转移
task
到task_archive
并不足够,然后您需要对所有引用task
表进行递归操作。
我会举一个例子来自postgrespro.ru的演示数据库:
假设我们需要从表中删除记录
Flights
。Postgres不允许这样做:我们首先需要从所有引用表中删除记录,然后递归地删除没有人引用的表。
在我们的示例中,它
Flights
指向Ticket_flights
并指向- Boarding_passes
。
因此,您需要按以下顺序删除:
- 获取其中的行的主键(PK)的值,该值
Ticket_flights
引用中的已删除行Flights
。 - 我们得到
Boarding_passes
引用的行的PKTicket_flights
。 - 我们从表中的项目2按PK删除行
Boarding_passes
。 - 我们从中的第1条删除PK行
Ticket_flights
。 - 我们从中删除行
Flights
。
结果是一个名为PgGraph的实用程序,我们决定将其开源。
如何使用
该实用程序支持两种使用方式:
- 从命令行(
pggraph …
)调用。 - 在Python代码(类
PgGraphApi
)中的用法。
安装与配置
首先,您需要从Pypi存储库安装实用程序:
pip3 install pggraph
然后使用数据库配置和归档脚本在本地计算机上创建config.ini文件:
[db]
host = localhost
port = 5432
user = postgres
password = postgres
dbname = postgres
schema = public ; ,
[archive] ; ,
is_debug = false
chunk_size = 1000
max_depth = 20
to_archive = true
archive_suffix = 'archive'
从控制台运行
参量
$ pggraph -h
usage: pggraph action [-h] --table TABLE [--ids IDS] [--config_path CONFIG_PATH]
positional arguments:
action required action: archive_table, get_table_references, get_rows_references
optional arguments:
-h, --help show this help message and exit
--table TABLE table name
--ids IDS primary key ids, separated by comma, e.g. 1,2,3
--config_path CONFIG_PATH path to config.ini
--log_path LOG_PATH path to log dir
--log_level LOG_LEVEL log level (debug, info, error)
位置参数:
action
-期望的效果:archive_table
,get_table_references
或get_rows_references
。
命名参数:
--config_path
-配置文件的路径;--table
-您要用来执行操作的表;--ids
-以逗号分隔的ID列表,例如1,2,3
(可选);--log_path
-日志文件夹的路径(默认情况下为可选参数-主文件夹);--log_level
-日志记录级别(可选参数,默认情况下为INFO)。
命令示例
存档表格
该实用程序的主要功能是数据归档,即 将行从主表转移到存档表(例如,从books表转移到books_archive)。
还支持不存档删除:为此,请将config.ini中的参数设置为to_archive = false。
必需的参数是config_path,table和ids。
启动后,
ids
该表table
以及所有引用该表的表中的记录将被递归删除。
$ pggraph archive_table --config_path config.hw.local.ini --table flights --ids 1,2,3
2020-06-20 19:27:44 INFO: flights - START
2020-06-20 19:27:44 INFO: flights - start archive_recursive 3 rows (depth=0)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: boarding_passes - start archive_recursive 3 rows (depth=2)
2020-06-20 19:27:44 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: boarding_passes - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: ticket_flights - archive_by_ids 3 rows by ticket_no, flight_id
2020-06-20 19:27:44 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 19:27:44 INFO: flights - archive_by_ids 3 rows by id
2020-06-20 19:27:44 INFO: flights - END
查找指定表的依赖项
查找指定表的依赖项的函数
table
。必需的参数是config_path
和table
。
启动后,将在屏幕上显示字典,其中:
in_refs
-引用表到给定表的字典,其中key是表的名称,value是外键对象的列表(pk_main
-主表中的pk_ref
主键,fk_ref
- 引用表中的主键,-源表的外键的列名);out_refs
-给定表引用的表的字典。
$ pggraph get_table_references --config_path config.hw.local.ini --table flights
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='ticket_no, flight_id', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
查找具有指定主键的字符串的链接
用于在其他表中查找通过外键引用
ids
表行的行的功能table
。所需的参数config_path
,table
和ids
。
启动后,屏幕上将显示具有以下结构的词典:
{
pk_id_1: {
reffering_table_name_1: {
foreign_key_1: [
{row_pk_1: value, row_pk_2: value},
...
],
...
},
...
},
pk_id_2: {...},
...
}
通话示例:
$ pggraph get_rows_references --config_path config.hw.local.ini --table flights --ids 1,2,3
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
代码中的用法
除了在控制台中运行外,该库还可以用Python代码使用。以下是iPython交互式环境中的调用示例。
存档表格
>>> from pg_graph.main import setup_logging
>>> setup_logging(log_level='DEBUG')
>>> from pg_graph.api import PgGraphApi
>>> api = PgGraphApi('config.hw.local.ini')
>>> api.archive_table('flights', [4,5])
2020-06-20 23:12:08 INFO: flights - START
2020-06-20 23:12:08 INFO: flights - start archive_recursive 2 rows (depth=0)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: ticket_flights - ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')
2020-06-20 23:12:08 DEBUG: SQL('SELECT flight_id, ticket_no FROM bookings.ticket_flights WHERE (flight_id) IN (%s, %s)')
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 30 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 30 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 30 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 30 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 30 rows
2020-06-20 23:12:08 INFO: ticket_flights - start archive_recursive 3 rows (depth=1)
2020-06-20 23:12:08 INFO: START ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 DEBUG: boarding_passes - ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 INFO: boarding_passes - archive_by_fk 3 rows by ForeignKey(pk_main='flight_id, ticket_no', pk_ref='flight_id, ticket_no', fk_ref='flight_id, ticket_no')
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.boarding_passes_archive (LIKE bookings.boarding_passes)')
2020-06-20 23:12:08 DEBUG: DELETE FROM boarding_passes by FK flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: ticket_flights - archive_by_ids 3 rows by flight_id, ticket_no
2020-06-20 23:12:08 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.ticket_flights_archive (LIKE bookings.ticket_flights)')
2020-06-20 23:12:08 DEBUG: DELETE FROM ticket_flights by flight_id, ticket_no - 3 rows
2020-06-20 23:12:08 DEBUG: INSERT INTO ticket_flights_archive - 3 rows
2020-06-20 23:12:08 INFO: END ARCHIVE REFERRING TABLES
2020-06-20 23:12:08 INFO: flights - archive_by_ids 2 rows by flight_id
2020-06-20 23:12:09 DEBUG: SQL('CREATE TABLE IF NOT EXISTS bookings.flights_archive (LIKE bookings.flights)')
2020-06-20 23:12:09 DEBUG: DELETE FROM flights by flight_id - 2 rows
2020-06-20 23:12:09 DEBUG: INSERT INTO flights_archive - 2 rows
2020-06-20 23:12:09 INFO: flights - END
查找指定表的依赖项
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> res = api.get_table_references('flights')
>>> pprint(res)
{'in_refs': {'ticket_flights': [ForeignKey(pk_main='flight_id', pk_ref='flight_id, ticket_no', fk_ref='flight_id')]},
'out_refs': {'aircrafts': [ForeignKey(pk_main='aircraft_code', pk_ref='flight_id', fk_ref='aircraft_code')],
'airports': [ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='arrival_airport'),
ForeignKey(pk_main='airport_code', pk_ref='flight_id', fk_ref='departure_airport')]}}
查找具有指定主键的字符串的链接
>>> from pg_graph.api import PgGraphApi
>>> from pprint import pprint
>>> api = PgGraphApi('config.hw.local.ini')
>>> rows = api.get_rows_references('flights', [1,2,3])
>>> pprint(rows)
{1: {'ticket_flights': {'flight_id': [{'flight_id': 1,
'ticket_no': '0005432816945'},
{'flight_id': 1,
'ticket_no': '0005432816941'}]}},
2: {'ticket_flights': {'flight_id': [{'flight_id': 2,
'ticket_no': '0005433101832'},
{'flight_id': 2,
'ticket_no': '0005433101864'},
{'flight_id': 2,
'ticket_no': '0005432919715'}]}},
3: {'ticket_flights': {'flight_id': [{'flight_id': 3,
'ticket_no': '0005432817560'},
{'flight_id': 3,
'ticket_no': '0005432817568'},
{'flight_id': 3,
'ticket_no': '0005432817559'}]}}}
该库的源代码在MIT许可下的GitHub上以及在PyPI存储库中都可用。
欢迎提出意见,承诺和建议。
我将尝试在此处和存储库中尽可能回答这些问题。