Using Hekad to parse logs for relevant parts

We at Taxspanner have been looking at different options for analytics pipeline and setting up things to capture relevant information. Vivek had noticed Hekad and he was wondering if we could use syslogs which are already being generated by the app. Idea was to look for specific log in a format which could contain information like App name, Model, UUID and operation being done.

We followed basic setup guide to get a feel of it and it was processing nginx logs in tunes of millions very quickly. Apart from official documentation, this post talks about how to setup a quick filter around Hekad processing pipeline. We experimented with a client-server setup where client running on app server can tail the django log file, filter relevant log message and push it to server hekad instance aggregating logs from all app servers.

This was the client's side hekad config toml file:

[hekad]
maxprocs = 1
base_dir = "."
share_dir = "hekad-location/share/heka/"

# Input is django log file
[django_logs]
type = "LogstreamerInput"
splitter = "TokenSplitter"
log_directory = "logs/"
file_match = 'django\.log'

# Decoder to parse logs and extracting relevant log
[DjangoLogDecoder]
type = "SandboxFilter"
message_matcher = "Logger == 'django_logs'"
filename = "lua_decoders/django_logs.lua"

# Encoder for output streams
[PayloadEncoder]

# We channel output generated from DjangoLogDecoder to a certain UDP port
[UdpOutput]
message_matcher = "Logger == 'DjangoLogDecoder'"
address = ":34567"
encoder = "PayloadEncoder"

The Lua script to filter relevant log pretty small:

local string = require "string"
local table = require "table"

-- This structure could be used in better way
local msg = {
Timestamp   = nil,
Type        = msg_type,
Payload     = nil,
Fields      = nil
}

function process_message ()
    local log = read_message("Payload")
    if log == nil then
      return 0
    end
    local log_blocks = {}
    for i in string.gmatch(log, "%S+") do
      table.insert(log_blocks, i)
    end
    if table.getn(log_blocks) >= 4 then
      if log_blocks[3] == "CRITICAL" then
	msg.Payload = log
	inject_message(msg)
      end
    end
    return 0
end

With client instance in place now we get out listener config sorted out.

[hekad]
maxprocs = 4
base_dir = "."
share_dir = "hekad-location/share/heka/"

[PayloadEncoder]

# Input listening to port 
[app_logs]
type = "UdpInput"
address = ":34567"

# Output channels message received and just prints them
[LogOutput]
message_matcher = "Logger == 'app_logs'"
encoder = "PayloadEncoder"

And that's it, this will have a basic hekad based pipeline in place which can simply pick information from django logs.