Fluentd: why tuning the output buffer matters



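Today it is hard to imagine a Kubernetes-based project without the ELK stack, which stores the logs of both the applications and the system components of the cluster. In our practice we use the EFK stack, with Fluentd in place of Logstash.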



Fluentd is a log collector and a Cloud Native Computing Foundation project, which is why its development is oriented toward use with Kubernetes.



Fluentd Logstash , , Fluentd , .



, EFK , , Kibana . , .





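Fluentd is deployed as a DaemonSet (one instance on every node of the Kubernetes cluster) and follows the containers' stdout logs in /var/log/containers. The collected logs are sent as JSON documents to ElasticSearch, which may run as a cluster or standalone, depending on the project. Kibana serves as the graphical interface.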



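While using Fluentd, we found that some documents in ElasticSearch were duplicated. An Nginx log makes a convenient example. The log file contains the following message: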



127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -


Yet ElasticSearch holds several documents containing this message:



{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}


, .



At the same time, the Fluentd logs contain warnings like this one:



2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"


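These warnings appear when ElasticSearch fails to respond within the time set by the request_timeout parameter, so the buffer chunk being sent cannot be flushed. Fluentd then tries to send the chunk to ElasticSearch again, and after some number of attempts the operation succeeds: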



2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"


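However, ElasticSearch treats each resent chunk as new data and assigns its documents unique _id values when indexing. This is how the duplicate messages appear.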
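One way to confirm that two documents are copies of the same message is to compare their _source while ignoring _id. The sketch below is illustrative only: the function name find_duplicates is hypothetical, and the hits are reduced versions of the two documents shown earlier.

```python
import json
from collections import defaultdict


def find_duplicates(hits):
    """Group hits whose _source is identical; return the _id lists of size > 1."""
    groups = defaultdict(list)
    for hit in hits:
        # Canonical JSON of _source: identical content gives an identical key,
        # regardless of the _id assigned at indexing time.
        key = json.dumps(hit["_source"], sort_keys=True)
        groups[key].append(hit["_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


source = {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": '127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] '
           '"GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -',
    "tag": "custom-log",
}
hits = [
    {"_id": "HgGl_nIBR8C-2_33RlQV", "_source": dict(source)},
    {"_id": "IgGm_nIBR8C-2_33e2ST", "_source": dict(source)},
]

duplicates = find_duplicates(hits)
print(duplicates)
```

Both IDs end up in one group because everything except _id matches, including the nanosecond timestamp.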



In Kibana, the duplicates show up as repeated entries with identical content.







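There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With it, ElasticSearch recognizes resent documents and does not store them twice. However, this approach treats the symptom and does not remove the underlying timeout error, so we decided against it.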
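The general idea behind ID-based deduplication can be sketched as follows. This is not the plugin's implementation: content_id and FakeIndex are hypothetical names, and the ID here is derived from the record content purely for illustration. The point is that when a retry carries the same ID, the second write replaces the first instead of creating a new document.

```python
import hashlib
import json


def content_id(record):
    """Derive a stable ID from the record content (illustrative assumption)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FakeIndex:
    """Stand-in for an index: writing an existing ID overwrites the document."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, record):
        self.docs[doc_id] = record


record = {"log": "GET / HTTP/1.1 200", "tag": "custom-log"}
index = FakeIndex()

index.index(content_id(record), record)  # first attempt; the response timed out
index.index(content_id(record), record)  # retry of the same chunk

print(len(index.docs))
```

A content-derived ID would also merge two genuinely distinct records that happen to be byte-for-byte identical, which is one reason to treat this only as a sketch of the principle.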



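We use a buffered output plugin in Fluentd so that logs are not lost during short network problems or bursts of logging. If for some reason ElasticSearch cannot write a document to the index right away, the document goes into a queue that is stored on disk. So, in our case, removing the source of the error described above means setting buffering parameters with which the Fluentd output buffer is large enough and is still flushed in the allotted time.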



, , , , : , , . , , , , , Fluentd .



When the problem was found, the buffer configuration looked like this:



<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>
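To see what retry_type exponential_backoff with retry_max_interval 30 means in practice, here is a rough sketch of the nominal wait before each retry. It assumes Fluentd's documented defaults of retry_wait 1s and a backoff base of 2, neither of which is set in the configuration above, and it ignores the random jitter Fluentd applies by default. The function name retry_waits is hypothetical.

```python
def retry_waits(attempts, retry_wait=1.0, base=2.0, max_interval=30.0):
    """Nominal wait before each retry: retry_wait * base**n, capped at max_interval."""
    return [min(retry_wait * base ** n, max_interval) for n in range(attempts)]


waits = retry_waits(8)
print(waits)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

With retry_forever set, the waits stay at the 30-second cap for as long as the destination keeps failing.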


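While solving the problem, we tuned the following parameters:

  • chunk_limit_size: the size of the chunks into which messages in the buffer are split.
  • flush_interval: the interval at which the buffer is flushed.
  • queue_limit_length: the maximum number of chunks in the queue.
  • request_timeout: the time allowed for a request from Fluentd to ElasticSearch.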


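The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which reads as "the maximum number of chunks in the queue, each of a given size". If the buffer is too small, the following warning appears in the logs: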



2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block
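For the configuration shown earlier, the total buffer size works out as below. This is a minimal sketch: parse_size and total_buffer_bytes are hypothetical helpers, and the parser handles only the k, m and g suffixes, treated as powers of 1024.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a Fluentd-style size such as '8M' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def total_buffer_bytes(chunk_limit_size, queue_limit_length):
    """Total buffer capacity: number of queued chunks times the size of each."""
    return parse_size(chunk_limit_size) * queue_limit_length


total = total_buffer_bytes("8M", 8)
print(total // 1024 ** 2, "MiB")  # 64 MiB
```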


, , , , .



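The buffer can be enlarged in two ways: by increasing the size of each chunk in the queue, or by increasing the number of chunks the queue can hold.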



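If chunk_limit_size is set to more than 32 megabytes, ElasticSearch will not accept the chunk, because the incoming request becomes too large. So, if the buffer has to grow further, it is better to increase the maximum queue length, queue_limit_length.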



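Once the buffer stops overflowing and only the timeout message remains, you can start increasing request_timeout. However, if it is set above 20 seconds, the following warnings begin to appear in the Fluentd logs: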



2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 


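This message does not affect the operation of the system; it means that flushing the buffer took longer than the value set by slow_flush_log_threshold. It is debugging information, and we use it to choose the value of request_timeout.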



The general tuning procedure looks like this:



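  1. Set request_timeout to a value that is certainly larger than necessary. During tuning, the main sign that it is large enough is that the timeout warnings disappear.
  2. Wait for messages about exceeding slow_flush_log_threshold. The elapsed_time field of the warning shows the actual time it took to flush the buffer.
  3. Set request_timeout above the largest elapsed_time seen during the observation period. We calculate request_timeout as elapsed_time + 50%.
  4. To remove the slow-flush warnings from the log, raise slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.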
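Steps 2 to 4 can be sketched as a small script that reads the slow-flush warnings and derives both values. The function names are hypothetical, and rounding up to whole seconds is an assumption of this sketch rather than part of the procedure.

```python
import math
import re

ELAPSED = re.compile(r"elapsed_time=([0-9.]+)")


def max_elapsed(log_lines):
    """Largest elapsed_time found in slow-flush warnings, or None if there are none."""
    values = [float(m.group(1)) for line in log_lines if (m := ELAPSED.search(line))]
    return max(values) if values else None


def suggest(elapsed):
    """Apply the +50% and +25% margins, rounded up to whole seconds."""
    return {
        "request_timeout": math.ceil(elapsed * 1.5),
        "slow_flush_log_threshold": math.ceil(elapsed * 1.25),
    }


lines = [
    "2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time "
    "than slow_flush_log_threshold: elapsed_time=20.85753920301795 "
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
]

worst = max_elapsed(lines)
settings = suggest(worst)
print(settings)  # {'request_timeout': 32, 'slow_flush_log_threshold': 27}
```

In practice the input would be the warnings collected over the whole observation period, so that the margins are applied to the worst flush time rather than to a single sample.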


, , . , , .



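The table below shows how the number of warnings that lead to duplicated messages changed on each node as the parameters described above were tuned: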



Warning                      node-1         node-2         node-3         node-4
                             before/after   before/after   before/after   before/after
failed to flush the buffer   1749/2         694/2          47/0           1121/2
retry succeeded              410/2          205/1          24/0           241/2


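Note that these settings can become outdated as the project grows and, with it, the volume of logs. The first sign that the timeout is no longer sufficient is the return of slow-flush messages in the Fluentd log, that is, slow_flush_log_threshold being exceeded. From that point there is still a small margin before request_timeout is exceeded, so it is worth reacting to these messages in time and repeating the tuning procedure.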





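Fine-tuning the Fluentd output buffer is one of the main steps in configuring the EFK stack, and it determines how stable the stack is. With the procedure described above, you can be confident that all logs are written to the ElasticSearch index without duplicates or losses.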








