Pavel Trukhanov. Monitoring Postgres with USE and RED. PGConf.Russia talk transcript

Pavel Trukhanov, "Monitoring Postgres with USE and RED"



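There are two approaches to performance monitoring: Brendan Gregg's USE (Utilization, Saturation, Errors) and Tom Wilkie's RED (Requests, Errors, Duration). In this talk I want to describe how these approaches guided us, and still guide us, as we built Postgres monitoring at okmeter.io.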
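To make the two acronyms concrete, here is a minimal sketch of what each method asks you to measure. It is an illustration only, not how okmeter.io computes anything: the class and function names (`UseMetrics`, `RedMetrics`, `use_for_resource`, `red_for_window`) and their inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UseMetrics:
    """USE looks at a resource: how full it is, what is queued, what failed."""
    utilization: float  # share of capacity in use, 0.0 to 1.0
    saturation: int     # units of work waiting because the resource is busy
    errors: int         # error events seen for this resource


@dataclass
class RedMetrics:
    """RED looks at a service: how many requests, how many failed, how long."""
    requests_per_sec: float
    errors_per_sec: float
    avg_duration_ms: float


def use_for_resource(busy: int, capacity: int, queued: int, errors: int) -> UseMetrics:
    # Utilization is busy units over total capacity; saturation is the queue.
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return UseMetrics(busy / capacity, queued, errors)


def red_for_window(durations_ms: Sequence[float], failed: int, window_sec: float) -> RedMetrics:
    # One duration per request observed during the window.
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    count = len(durations_ms)
    avg = sum(durations_ms) / count if count else 0.0
    return RedMetrics(count / window_sec, failed / window_sec, avg)
```

For example, `use_for_resource(busy=95, capacity=100, queued=3, errors=0)` describes a resource that is 95 % utilized with three units of work waiting, while `red_for_window([10.0, 20.0, 30.0], failed=1, window_sec=60.0)` summarizes three requests seen in one minute.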





. Okmeter. , , Postgres , . , , , , , USE RED , Postgres.





, . , .





, , , , . - . performance - , , , , .





, , , Postgres – , , . , - , , , . - , . - , , .





USE. , . , , . , , saturation , .





, ? , pg_stat_activity . ? , . , - , . , , . - , - , , .





? «», CPU Usage, , iostat – . , . , , . , , . .





, , , Postgres. , Postgres . . , , . Data Egret. , .





- ?





. . , Postgres , connection connection .





. . , . – - . , , . , , , .



: « ?». , SpinLock - , , , . CPU usage , .



– . , , , , - , , , , - .



? , capacity. 100 %, , , . .



. , . , . . , . , . . .



. - , capacity. , capacity ? . . saturation, . , . , .



Postgres.





pg_stat_activity. - . , . . : 300 connection . , - . , , - .





, . , , . - , , capacity , . . , Postgres max connections.





, state connection, , , idle, . . connection , . - idle in transaction. , , . active, - .





, , . , . ? . , - , . – pool connections, – , , , , . – , . - : locks - .





, , , .





- , , active 5 % connections. 95 % . . , .





, . , connections .



?





, . ? 100 connections, max connections , setting’, . , . , 100 %. , – . - . , . - , - .





saturation, util ? Saturation , utilization 100 % . , , , utilization 100? , .



, , CPU usage , load avarage . , 100 %, saturation . Load avarage — saturation, - . runnable , . . , , , .



, CPU usage . ? . load avarage. Load avarage , . , - . . response .





. - – idle in transaction.





. . - , . saturation .





idle. max connections, . , . -.





, select’ pg_stat_activity connections, waiting try. . . active state, - , -. waiting.



, . utilization connection pool 100 %.





, .





waiting ? . , - saturation , . . stack Postgres, , - - . .





– locks. , lock. , locks - , , connections. , locks.





. . . - lock , .





lock – space , – . , , lock . , , connections, locks, — saturation lock.





Postgres , connection . TCP-. TCP-. Post master . , , , «reset». time wait .



? , connections .





connections .





, connection pool . , , , , . ? - . ? -, . connections 5 000. Postgres . ? - connections. , , .





TCP . time wait, , - Postgres - , .





, connect? postmaster , connections backlog list . , search, backlog 100. . 100 %. – , - – saturation. – .





, backlog , reset.





, . Postgres , TCP «».





RED, USE? DBA, , , , - . , - . - , . . , Postgres .





RED, , , , :



  • ,
  • ,
  • .




Postgres. , . , - . . - , .





rollbacks, , 6 , , , , , search , . . , - .



, RED . , . ? , . , , . , .





queries . - - . 8 , .





, - . . select , .





. , - , . . - . . - . , . . . : « , », , .





, . pg_stat_statements , . . , , . . , . – . . , , - , , . .





slow log. Slow log – durations . , . . , , - , .





, . , - , .





. , - . . , , . – , .





. - .





, , - . - , .





, . , . , , . .





, . USE, RED, ad-hoc , ad-hoc tools - , , , , .



.





Postgres, USE, RED ? . . .



Okmeter, . , - , . , , , , . , - , USE, RED. , . , , , saturation . , , , saturation . , . , - . , , , . , , .



! ! , 4- .



4 – USE RED. , USE, durations. errors . RED , requests durations. - , USE RED . . . - . , , .



– instance.



, ? – . – , requests . .



, !



! . – , - , , . , , . .? . . ?



, . , . , . . , , , , , USE . , , , , , selects, , , requests . , requests .



, , , , ?



. , . , , . , . . , . . . , . - , , . . , queries . - . , .



, , Postgres . , . , .



! , instance Postgres - . , ? , BD .



. – . , , , , , . , , - . , . .



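The second way we deal with this is optimization: we optimize our own work. In practice, Okmeter queries these views periodically but infrequently, once a minute.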



So it isn't real time?



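What counts as real time is a hard question; let's discuss it separately. But the load is bounded by the number of queries you issue. These queries aren't heavy at all, and there are a few dozen of them. Even if you run them more often than once a minute, to be more real-time in some sense, the load is still very limited. Here is an example of how many queries are sent to the database: several thousand. So even if you polled those few dozen objects once a second, it would still be only a small fraction.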
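The arithmetic behind that answer can be sketched as follows. The figures are assumptions chosen only to match "a few dozen" and "several thousand": 30 monitoring queries per poll and 3,000 application queries per second (the talk does not state the unit here, so per-second is a guess). The function name `monitoring_share` is hypothetical.

```python
def monitoring_share(monitor_queries_per_poll: int,
                     poll_interval_sec: float,
                     app_queries_per_sec: float) -> float:
    """Fraction of all queries hitting the database that come from monitoring."""
    if poll_interval_sec <= 0:
        raise ValueError("poll_interval_sec must be positive")
    monitor_qps = monitor_queries_per_poll / poll_interval_sec
    total_qps = monitor_qps + app_queries_per_sec
    return monitor_qps / total_qps if total_qps else 0.0


# Assumed figures, not measurements from the talk.
once_a_minute = monitoring_share(30, 60.0, 3000.0)
once_a_second = monitoring_share(30, 1.0, 3000.0)

print(f"polling once a minute: {once_a_minute:.4%} of queries")
print(f"polling once a second: {once_a_second:.4%} of queries")
```

Under these assumptions the monitoring share stays below one percent even at one poll per second. This counts queries only; it says nothing about how expensive each query is, which the speaker addresses separately by noting that the queries are not heavy.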



Got it, thank you!



