Sending logs directly from Fluentd to Elasticsearch
Original article read on: 2023-07-08

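Official documentation: https://docs.fluentd.org/output/elasticsearch

Since v3.0.1, td-agent ships with the out_elasticsearch plugin, so it can be used directly without any extra installation.

If you run plain Fluentd instead, install the plugin first: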

$ fluent-gem install fluent-plugin-elasticsearch

Configuration example

<match my.logs>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
</match>

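Parameters

  • @type: required; must be elasticsearch

  • host: optional; address of the Elasticsearch node to connect to, default localhost

  • port: optional; port Elasticsearch listens on, default 9200

  • hosts: optional; used to connect to multiple Elasticsearch nodes. When it is set, the host and port settings are ignored. Usage: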

    hosts host1:port1,host2:port2,host3:port3

    or

    hosts https://customhost.com:443/path,https://username:password@host-failover.com:443

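  • user: optional; user name used to log in to Elasticsearch, default nil

  • password: optional; password used to log in to Elasticsearch, default nil

  • scheme: optional; connection protocol, default http

  • path: optional; the Elasticsearch REST API endpoint to which write requests are posted, default nil

  • index_name: optional; name of the index to write to, default fluentd. Usage examples: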

    index by tags

    index_name fluentd.${tag}

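    index by tags and timestamps

    This form additionally requires tag and time to be listed as chunk keys of the <buffer> section, as in the following configuration:

    <match my.logs>
      @type elasticsearch
      host localhost
      port 9200
      # expands to e.g. fluentd.my.logs.20201105
      index_name fluentd.${tag}.%Y%m%d
      <buffer tag,time>
        timekey 1m
      </buffer>
    </match>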

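  • logstash_format: optional, default false. When true, the index name takes the form logstash-%Y.%m.%d, and this setting takes precedence over index_name

  • logstash_prefix: optional; the index name prefix used when logstash_format is true, default logstash

  • @log_level: optional; log level, one of fatal, error, warn, info, debug, trace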
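The index naming rules in the list above can be illustrated with a small Python sketch. This is not the plugin's implementation: the helper names expand_index_name and logstash_index_name are hypothetical, and the sketch assumes the timestamp is in UTC and stands in for the time slice of the buffer chunk.

```python
from datetime import datetime, timezone


def expand_index_name(template: str, tag: str, when: datetime) -> str:
    """Expand strftime placeholders such as %Y%m%d, then the ${tag} placeholder."""
    # strftime runs first so that a '%' inside the tag is never treated as a format code.
    return when.strftime(template).replace("${tag}", tag)


def logstash_index_name(when: datetime, prefix: str = "logstash") -> str:
    """Build the <prefix>-%Y.%m.%d name used when logstash_format is true."""
    return f"{prefix}-{when:%Y.%m.%d}"


when = datetime(2020, 11, 5, tzinfo=timezone.utc)
print(expand_index_name("fluentd.${tag}.%Y%m%d", "my.logs", when))  # fluentd.my.logs.20201105
print(logstash_index_name(when))  # logstash-2020.11.05
```

With the tag my.logs and the date 2020-11-05, the first call reproduces the fluentd.my.logs.20201105 index name mentioned in the index_name example.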

Miscellaneous

A %{}-style placeholder can be used to escape characters that would otherwise have to be URL-encoded.

For example:

# Valid configuration
user %{demo+}
password %{@secret}

hosts https://%{j+hn}:%{passw@rd}@host1:443/elastic/,http://host2

# Invalid configuration
user demo+
password @secret
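To see why the placeholders matter, the following Python sketch percent-encodes the contents of each %{} placeholder, which is roughly the effect the examples above rely on. It is only an illustration under that assumption, not the plugin's code, and expand_placeholders is a hypothetical helper name.

```python
import re
from urllib.parse import quote


def expand_placeholders(value: str) -> str:
    """Replace every %{...} with the percent-encoded form of its contents."""
    return re.sub(r"%\{(.*?)\}", lambda m: quote(m.group(1), safe=""), value)


print(expand_placeholders("%{demo+}"))    # demo%2B
print(expand_placeholders("%{@secret}"))  # %40secret
print(expand_placeholders("https://%{j+hn}:%{passw@rd}@host1:443/elastic/"))
# https://j%2Bhn:passw%40rd@host1:443/elastic/
```

Without the encoding, characters such as + and @ would be ambiguous inside a URL: an unescaped @ in a password, for instance, cannot be told apart from the @ that separates the credentials from the host.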

Practical example

Collecting openresty (nginx) logs

# cat /etc/td-agent/td-agent.conf 

<source>
  @type tail
  @id input_tail
  <parse>
    @type nginx
  </parse>
  path /usr/local/openresty/nginx/logs/host.access.log
  tag td.nginx.access
</source>

<match td.nginx.access>
  @type elasticsearch
  host localhost
  port 9200
  index_name fluentd.${tag}.%Y%m%d
  <buffer tag,time>
    timekey 1m
  </buffer>
</match>

About log parsing with @type nginx

Official documentation: https://docs.fluentd.org/parser/nginx

The regular expression it uses:

expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?:\s+(?<http_x_forwarded_for>[^ ]+))?)?$/
time_format %d/%b/%Y:%H:%M:%S %z

remote, host, user, method, path, code, size, referer, agent and http_x_forwarded_for are stored in the record; time is used as the event time.

# Log line
127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -

# Parsed result
time:
1362020400 (28/Feb/2013:12:00:00 +0900)

record:
{
  "remote"              : "127.0.0.1",
  "host"                : "192.168.0.1",
  "user"                : "-",
  "method"              : "GET",
  "path"                : "/",
  "code"                : "200",
  "size"                : "777",
  "referer"             : "-",
  "agent"               : "Opera/12.0",
  "http_x_forwarded_for": "-"
}
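The parsing step above can be reproduced outside Fluentd with a short Python sketch. It assumes the Ruby expression carries over unchanged once the named groups are rewritten from (?<name>...) to Python's (?P<name>...) syntax, and that the default C/English locale is in effect so that %b accepts "Feb". The function name parse_nginx_line is hypothetical.

```python
import re
from datetime import datetime

# The expression from the nginx parser, with Ruby (?<name>...) groups
# rewritten as Python (?P<name>...) groups.
NGINX_PATTERN = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^\"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)"'
    r'(?:\s+(?P<http_x_forwarded_for>[^ ]+))?)?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_nginx_line(line):
    """Return (event_time, record) for a matching line, or None otherwise."""
    match = NGINX_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The time field becomes the event time and is removed from the record.
    event_time = int(datetime.strptime(record.pop("time"), TIME_FORMAT).timestamp())
    return event_time, record


line = ('127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] '
        '"GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -')
event_time, record = parse_nginx_line(line)
print(event_time)  # 1362020400
print(record)
```

For the sample log line, the sketch yields the same event time and the same ten record fields as the parsed result shown above.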

If you do not want this parser and simply delete the following section:

<parse>
  @type nginx
</parse>

Fluentd fails at startup with the error:

<parse> section is required

The only option is to substitute the none parser, which leaves each line unparsed:

<parse>
  @type none
</parse>