相信您的眼睛和在仪表板上看到的内容
在Wheely,我们严重依赖数据来制定运营和战略决策。从向合作伙伴支付每周奖金到扩展到其他城市和国家。
每个经理或产品负责人都非常了解自己所在的地区,任何偏离都会引起疑问。因此,对仪表板和指标的可靠性提出了更高的要求。我们的Google Analytics(分析)团队努力在发现问题之前发现并解决问题。
如您所知,更容易预防,因此我决定以系统和积极的方式来解决该问题。而且,当然,我要做的第一件事是创建一个Slack通道,在其中配置了有关管道中任何错误的通知的传递。
对数据集市的相关性充满信心
, :
10
8
DWH
, QA :
,
:
.yml freshness:
freshness:
warn_after: {count: 4, period: hour}
error_after: {count: 8, period: hour}
loaded_at_field: "__etl_loaded_at"
SQL-:
select
max({{ loaded_at_field }}) as max_loaded_at,
{{ current_timestamp() }} as snapshotted_at
from {{ source }}
where {{ filter }}
:
, , :
(edge cases),
(bottleneck)
:
: , Out of Memory, Disk Full
SLA
:
, + ( )
CPU
- IO, network
.
:
,
:
+pre-hook: "{{ logging.log_model_start_event() }}"
+post-hook: "{{ logging.log_model_end_event() }}"
, , . - , , , , PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE.
DWH . . .. , .
:
(NULL) , ?
(UNIQUE ID )?
(PRIMARY - FOREIGN KEYS)?
, (ACCEPTED VALUES)?
QA :
,
:
.yml tests:
- name: dim_cars
description: Wheely partners cars.
columns:
- name: car_id
tests:
- not_null
- unique
- name: status
tests:
- not_null
- accepted_values:
values: ['deleted', 'unknown', 'active', 'end_of_life', 'pending', 'rejected'
, 'blocked', 'expired_docs', 'partner_blocked', 'new_partner']
SQL-
-- NOT NULL test
select count(*) as validation_errors
from "wheely"."dbt_test"."dim_cars"
where car_id is null
-- UNIQUE test
select count(*) as validation_errors
from (
select
car_id
from "wheely"."dbt_test"."dim_cars"
where car_id is not null
group by car_id
having count(*) > 1
) validation_errors
-- ACCEPTED VALUES test
with all_values as (
select distinct
status as value_field
from "wheely"."dbt_test"."dim_cars"
),
validation_errors as (
select
value_field
from all_values
where value_field not in (
'deleted','unknown','active','end_of_life','pending','rejected','blocked','expired_docs','partner_blocked','new_partner'
)
)
select count(*) as validation_errors
from validation_errors
-
- - , . -, .
:
,
%
( ), .
QA :
, -.
:
SQL ,
SQL-
(PASSED) 0 , (FAILED) >= 1
Continuous Integration - DWH
, . DWH . . , , , PROD- PR Merge:
DEV- PROD-
(, Out of Memory)
- Continuous Integration (CI). !
:
master- PROD- DWH .
:
CI (, PROD-, 7 )
feature- master
- DWH
( ) :
DWH ,
(, , ) --
, , (, ).
:
, () .
, :
, : , , (, , ), (, , ).
,
DWH
drill-down :
, . , :
,
Continuous Integration and Testing
( )
, Wheely. , .
, , , «Data Engineer» OTUS, .
4 20:00 «Data Engineer». OTUS , .
:
Data Build Tool - DBT
The farm-to-table testing framework -
Tests - Related reference docs - DBT,
How to get started with data testing - dbt discourse
Data testing: why you need it -
Manual Work is a Bug - DRY