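Today it is hard to imagine a Kubernetes-based project without the ELK stack, which stores the logs of both the applications and the system components of the cluster. In our practice we use the EFK stack, with Fluentd in place of Logstash.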
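Fluentd is an open-source log collector and a Cloud Native Computing Foundation project, and its development is oriented toward use with Kubernetes.
Using Fluentd instead of Logstash does not change the overall nature of the stack, but Fluentd has its own specific nuances.
For example, after we started using EFK, we found that some messages were displayed in Kibana several times. This article explains why that happens and how to solve the problem.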
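In our projects Fluentd is deployed as a DaemonSet (one instance runs on each node of the Kubernetes cluster) and tracks the containers' stdout logs in /var/log/containers. After collection and processing, the logs are sent as JSON documents to ElasticSearch, which runs either as a cluster or standalone, depending on the project. Kibana is used as the graphical interface.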
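While using Fluentd, we ran into a situation where some documents in ElasticSearch had exactly the same content and differed only in their identifier. An Nginx log makes it easy to confirm that these are repeats of one message. In the log file the message appears exactly once: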
127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -
In ElasticSearch, however, there are several documents containing this message:
{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}
{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}
As shown, the two documents are identical except for their _id.
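One way to confirm that such documents really are repeats is to group search hits by the content of `_source` while ignoring `_id`. The Python sketch below is a minimal illustration, assuming the hits have already been exported as a list of dictionaries shaped like the documents above. The function name `find_duplicates` and the third sample hit are hypothetical and are not part of any Elasticsearch client.

```python
import hashlib
import json
from collections import defaultdict


def find_duplicates(hits):
    """Group hits whose _source is identical; return only groups with repeats."""
    groups = defaultdict(list)
    for hit in hits:
        # Serialize _source with sorted keys so equal content gives equal bytes.
        canonical = json.dumps(hit["_source"], sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(canonical).hexdigest()
        groups[fingerprint].append(hit["_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


source = {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "tag": "custom-log",
}
hits = [
    {"_id": "HgGl_nIBR8C-2_33RlQV", "_source": dict(source)},
    {"_id": "IgGm_nIBR8C-2_33e2ST", "_source": dict(source)},
    # Hypothetical third hit with different content, added for contrast.
    {"_id": "other", "_source": {**source, "tag": "another-log"}},
]
print(find_duplicates(hits))
```

The first two hits end up in one group because only their `_id` differs; the third is left out.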
While the problem is occurring, the Fluentd logs contain many warnings like the following:
2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"
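These warnings appear when ElasticSearch cannot return a response within the time set by the request_timeout parameter, so the buffer chunk being sent cannot be cleared. Fluentd then tries to send the chunk to ElasticSearch again, and after some number of attempts the operation succeeds: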
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747"
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2"
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b"
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"
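However, ElasticSearch treats each resent buffer chunk as new and assigns unique _id values to its documents when indexing them. That is how copies of messages appear.
In Kibana they show up as repeated entries of the same message.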
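There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With this mechanism, ElasticSearch recognizes resent documents and does not duplicate them. However, this approach treats the symptom and does not eliminate the timeout error itself, so we decided not to use it.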
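The idea behind hash-based IDs can be illustrated with a toy model. The Python sketch below is not the plugin's implementation: `ToyIndex`, `content_id`, and the choice of SHA-1 are assumptions made purely for illustration. It only shows why a resend creates a copy when the index assigns a fresh ID, and why it does not when the ID is derived from the record itself.

```python
import hashlib
import json
import uuid


class ToyIndex:
    """In-memory stand-in for an index: a dict keyed by document ID."""

    def __init__(self):
        self.docs = {}

    def index(self, record, doc_id=None):
        # Without an explicit ID, a new random one is generated on every call.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        self.docs[doc_id] = record  # the same ID overwrites instead of duplicating
        return doc_id


def content_id(record):
    """Derive a stable ID from the record content (illustrative choice: SHA-1)."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


record = {"log": "GET / HTTP/1.1 200", "tag": "custom-log"}

auto = ToyIndex()
auto.index(record)
auto.index(record)  # retry after a timeout: stored again under a new ID

hashed = ToyIndex()
hashed.index(record, doc_id=content_id(record))
hashed.index(record, doc_id=content_id(record))  # retry: same ID, overwritten

print(len(auto.docs), len(hashed.docs))  # 2 1
```

Note that an ID derived purely from content would also merge two genuinely identical log lines; this sketch deliberately ignores that case.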
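We use a buffering plugin on the Fluentd output to avoid losing logs during short network problems or bursts of log traffic. If for some reason ElasticSearch cannot write a document to the index right away, the document goes into a queue that is stored on disk. So, in our case, to remove the source of the error described above, we need to set buffering parameters such that the Fluentd output buffer is large enough and still manages to flush within the allotted time.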
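Note that the values of the parameters discussed below are specific to each setup, because they depend on many factors, such as how intensively services write to the log, disk performance, and network load. To arrive at buffer settings that are adequate but not excessive, without lengthy blind trial and error, you can use the debug information that Fluentd writes to its own log while running.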
When we encountered the problem, the buffer configuration looked like this:
<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>
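To get a feel for how `retry_type exponential_backoff` and `retry_max_interval 30` interact, the Python sketch below prints an approximate retry schedule. It assumes Fluentd's documented defaults of `retry_wait 1s` and `retry_exponential_backoff_base 2`, neither of which is set in the configuration above, and it ignores the random jitter Fluentd applies, so real intervals will differ somewhat. `retry_waits` is a hypothetical helper.

```python
def retry_waits(attempts, retry_wait=1.0, base=2.0, max_interval=30.0):
    """Approximate wait in seconds before each retry, capped at max_interval."""
    return [min(retry_wait * base ** n, max_interval) for n in range(attempts)]


print(retry_waits(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With `retry_forever` set, the waits stay at the cap once it is reached, so a chunk that keeps timing out is resent roughly every 30 seconds.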
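While solving the problem, we tuned the following parameters by hand:
- chunk_limit_size: the size of the chunks into which messages in the buffer are split.
- flush_interval: the interval at which the buffer is flushed.
- queue_limit_length: the maximum number of chunks in the queue.
- request_timeout: how long Fluentd waits for a response from ElasticSearch.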
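The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which can be read as "the maximum number of chunks in the queue, each of a given size". If the buffer is too small, the following warning appears in the logs: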
2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block
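It means that the buffer is not being flushed within the allotted time: it has filled up, and with overflow_action set to block, further writes into it are blocked.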
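The buffer can be enlarged in two ways: by increasing either the size of each chunk in the queue or the number of chunks the queue can hold.
If chunk_limit_size is set above 32 MB, ElasticSearch will not accept the chunk, because the incoming request becomes too large. So if the buffer needs to grow further, it is better to increase the maximum queue length, queue_limit_length.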
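The buffer arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes Fluentd-style size suffixes (`k`, `m`, `g`) are binary multiples; `parse_size` and `total_buffer_bytes` are hypothetical helper names, and the 32M ceiling is simply the chunk-size limit mentioned in the text.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a size such as '8M' to bytes (binary multiples assumed)."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def total_buffer_bytes(chunk_limit_size, queue_limit_length):
    """Total buffer capacity: chunk size multiplied by queue length."""
    return parse_size(chunk_limit_size) * queue_limit_length


MAX_CHUNK = parse_size("32M")  # chunk-size ceiling mentioned in the text

total = total_buffer_bytes("8M", 8)
print(total // 1024 ** 2, "MiB")            # 64 MiB
print(parse_size("8M") <= MAX_CHUNK)        # True
```

For the configuration shown earlier this gives 64 MiB. Doubling `queue_limit_length` to 16 would give 128 MiB without touching the chunk size.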
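Once the buffer stops overflowing and only the timeout warnings remain, you can start increasing request_timeout. However, when the value is set above 20 seconds, the following warnings begin to appear in the Fluentd logs: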
2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"
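This message does not affect the operation of the system; it means that flushing the buffer took longer than the time set by slow_flush_log_threshold. It is debug information, and we use it when choosing the value of request_timeout.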
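The general tuning procedure is as follows:
- Set request_timeout to a value that is certainly larger than necessary. During tuning, the main sign that this parameter is set correctly is that the timeout warnings disappear.
- Wait for messages about exceeding slow_flush_log_threshold. The elapsed_time field in the warning shows how long the buffer flush actually took.
- Set request_timeout above the largest elapsed_time observed during the monitoring period. We calculate request_timeout as elapsed_time + 50%.
- To remove the slow-flush warnings from the log, raise slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.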
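The last three steps can be sketched in Python. This is only an illustration of the arithmetic: it assumes the warnings have been collected into a list of strings in the format shown earlier, and `suggest_timeouts` is a hypothetical helper, not part of Fluentd.

```python
import re

ELAPSED = re.compile(r"elapsed_time=([0-9.]+)")


def suggest_timeouts(log_lines):
    """Return (request_timeout, slow_flush_log_threshold) from slow-flush warnings."""
    elapsed = [float(v) for line in log_lines for v in ELAPSED.findall(line)]
    if not elapsed:
        return None  # no slow-flush warnings observed yet
    worst = max(elapsed)
    # elapsed_time + 50% for the timeout, elapsed_time + 25% for the threshold.
    return worst * 1.5, worst * 1.25


lines = [
    "2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time "
    "than slow_flush_log_threshold: elapsed_time=20.85753920301795 "
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
]
timeout, threshold = suggest_timeouts(lines)
print(round(timeout, 1), round(threshold, 1))  # 31.3 26.1
```

In practice the input would cover the whole observation period, so that the maximum reflects the slowest flush seen rather than a single sample.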
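As noted earlier, the final values of these parameters are specific to each case. By following the procedure above, we eliminate the error that causes messages to be repeated.
The table below shows how the number of errors that lead to duplicated messages changed on each node as a result of tuning the parameters described above: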
| | node-1 | node-2 | node-3 | node-4 |
|---|---|---|---|---|
| | before/after | before/after | before/after | before/after |
| failed to flush the buffer | 1749/2 | 694/2 | 47/0 | 1121/2 |
| retry succeeded | 410/2 | 205/1 | 24/0 | 241/2 |
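Counts like these can be gathered by scanning the Fluentd log for the two warning types. The Python sketch below is one possible way to do it, assuming log lines in the format shown earlier. It groups by the label in square brackets rather than by node, so per-node numbers would come from running it against each node's Fluentd log. `count_warnings` is a hypothetical helper, and the first sample line is shortened.

```python
import re
from collections import Counter

PATTERN = re.compile(
    r"\[warn\]: \[([^\]]+)\] (failed to flush the buffer|retry succeeded)"
)


def count_warnings(log_lines):
    """Count flush failures and successful retries per bracketed label."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


lines = [
    "2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4",
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"',
    '2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2"',
]
for (label, kind), n in sorted(count_warnings(lines).items()):
    print(label, kind, n)
```

Running the same count before and after a configuration change gives the two numbers in each table cell.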
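Note also that these settings may become outdated as the project grows and the volume of logs increases. The first sign that the configured timeout is becoming insufficient is the return of slow-flush messages in the Fluentd log, that is, flushes exceeding slow_flush_log_threshold. At that point there is still a small margin before request_timeout is exceeded, so it is important to react to these messages in time and repeat the tuning procedure described above.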
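Fine-tuning the Fluentd output buffer is one of the main steps in configuring the EFK stack; it determines how stably the stack runs and whether documents are placed in the indices correctly. By following the procedure described here, you can be confident that all logs are written to the ElasticSearch index in the right order, without duplicates or losses.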