
I currently have this nginx log output.

      log_format json_logs escape=json
                            '{'
                            '"time_local":"$time_local",'
                            '"remote_addr":"$remote_addr",'
                            '"remote_user":"$remote_user",'
                            '"request":"$request",'
                            '"status": "$status",'
                            '"body_bytes_sent":"$body_bytes_sent",'
                            '"request_time":"$request_time",'
                            '"http_referrer":"$http_referer",'
                            '"http_user_agent":"$http_user_agent"'
                            '}';
      access_log /var/log/nginx/access.log json_logs;

However, when the container writes this to stdout and Fluentd collects it, each line is prefixed with a timestamp, the stream name, and a flag (e.g. stdout F).

For example:

2022-06-18T19:05:15.014296769Z stdout F {\"time_local\":\"18/Jun/2022:19:05:15 +0000\",\"remote_addr\":\"10.106.0.5\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\": \"304\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.000\",\"http_referrer\":\"\",\"http_user_agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.99 Safari/537.36\"}
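That prefix is the CRI container log format used by containerd/CRI-O: `<timestamp> <stream> <P|F> <message>`, where P marks a partial line and F a full (final) one. A minimal Python sketch (mine, for illustration only) of how such a line breaks down:

```python
import re

# CRI container log format: "<RFC3339 timestamp> <stream> <P|F> <message>"
# P = partial line (message continues), F = full/final line.
CRI_RE = re.compile(
    r'^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) (?P<log>.*)$'
)

line = '2022-06-18T19:05:15.014296769Z stdout F {"status": "304"}'
m = CRI_RE.match(line)
print(m.group('time'))    # 2022-06-18T19:05:15.014296769Z
print(m.group('stream'))  # stdout
print(m.group('log'))     # {"status": "304"}
```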

I can't parse it correctly, so I temporarily set it to

<source>
  @type tail
  path /var/log/containers/*nginx*.log
  pos_file /var/log/nginx.log.pos
  tag nginx.access
  <parse>
    @type regexp
    expression ^(?<somenginxstuff>.*)$
  </parse>
</source>

to dump it all in elastic/kibana so i can check the outputs.

The question is: what is the best/easiest way to do this? I assume it's a very common use case?

Also, I've seen mention of plugins, and I'm using the base fluent/fluentd-kubernetes-daemonset:v1.14.6-debian-elasticsearch7-1.0 image. How do I add these (if they help)?

Many thanks in advance

1 Answer

I ended up doing this by first extracting the JSON payload with a regexp in the source, then using a parser filter to parse that field as JSON, as follows.

For the following log output:

2022-06-18T19:05:15.014296769Z stdout F {\"time_local\":\"18/Jun/2022:19:05:15 +0000\",\"remote_addr\":\"10.106.0.5\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\": \"304\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.000\",\"http_referrer\":\"\",\"http_user_agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.99 Safari/537.36\"}

And this configuration:

<source>
  @type tail
  path /var/log/containers/*nginx*.log
  pos_file /var/log/nginx.log.pos
  tag nginx.access
  <parse>
    @type regexp
    expression ^(?<timestamp>[^ ]*) [^ ]*[ ][^ ] (?<data>.*).*$
  </parse>
</source>

<filter nginx.access>
  @type parser
  key_name data
  <parse>
    @type json
  </parse>
</filter>
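Alternatively — a sketch of my own, assuming the fluent-plugin-parser-cri gem is installed in the image — the dedicated CRI parser can split off the prefix and parse the payload as JSON in a single step:

```
<source>
  @type tail
  path /var/log/containers/*nginx*.log
  pos_file /var/log/nginx.log.pos
  tag nginx.access
  <parse>
    @type cri        # from fluent-plugin-parser-cri (assumed installed)
    <parse>
      @type json     # parse the CRI message field as JSON
    </parse>
  </parse>
</source>
```

This avoids the hand-rolled regexp and keeps the CRI stream/logtag fields available for later stages.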
  • Those stdout F/P prefixes are part of the CRI (containerd/CRI-O) container log format. You should use them to reconstruct multi-line logs (the snippet above will often fail when larger JSON payloads are broken into multiple log entries). See github.com/fluent/fluentd-kubernetes-daemonset/issues/…
    – SYN
    Commented Jun 19, 2022 at 19:35
  • Appreciate this, thank you. Commented Jun 22, 2022 at 8:13
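Following up on that comment — a sketch (my own, not from the thread) of stitching partial (P) CRI lines back together with the fluent-plugin-concat gem, assuming it is installed and an upstream parse step kept the CRI logtag and stream fields:

```
<filter nginx.access>
  @type concat
  key log                        # field holding the message chunks
  use_partial_cri_logtag true    # join chunks using the CRI P/F flag
  partial_cri_logtag_key logtag  # field holding the P/F flag
  partial_cri_stream_key stream  # field holding stdout/stderr
</filter>
```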
