Using Hekad to parse logs for relevant parts

We at Taxspanner have been looking at different options for analytics pipeline and setting up things to capture relevant information. Vivek had noticed Hekad and he was wondering if we could use syslogs which are already being generated by the app. Idea was to look for specific log in a format which could contain information like App name, Model, UUID and operation being done.

We followed basic setup guide to get a feel of it and it was processing nginx logs in tunes of millions very quickly. Apart from official documentation, this post talks about how to setup a quick filter around Hekad processing pipeline. We experimented with a client-server setup where client running on app server can tail the django log file, filter relevant log message and push it to server hekad instance aggregating logs from all app servers.

This was the client's side hekad config toml file:

maxprocs = 1
base_dir = "."
share_dir = "hekad-location/share/heka/"

# Input is django log file
type = "LogstreamerInput"
splitter = "TokenSplitter"
log_directory = "logs/"
file_match = 'django\.log'

# Decoder to parse logs and extracting relevant log
type = "SandboxFilter"
message_matcher = "Logger == 'django_logs'"
filename = "lua_decoders/django_logs.lua"

# Encoder for output streams

# We channel output generated from DjangoLogDecoder to a certain UDP port
message_matcher = "Logger == 'DjangoLogDecoder'"
address = ":34567"
encoder = "PayloadEncoder"

The Lua script to filter relevant log pretty small:

local string = require "string"
local table = require "table"

-- This structure could be used in better way
local msg = {
Timestamp   = nil,
Type        = msg_type,
Payload     = nil,
Fields      = nil

function process_message ()
    local log = read_message("Payload")
    if log == nil then
      return 0
    local log_blocks = {}
    for i in string.gmatch(log, "%S+") do
      table.insert(log_blocks, i)
    if table.getn(log_blocks) >= 4 then
      if log_blocks[3] == "CRITICAL" then
	msg.Payload = log
    return 0

With client instance in place now we get out listener config sorted out.

maxprocs = 4
base_dir = "."
share_dir = "hekad-location/share/heka/"


# Input listening to port 
type = "UdpInput"
address = ":34567"

# Output channels message received and just prints them
message_matcher = "Logger == 'app_logs'"
encoder = "PayloadEncoder"

And that's it, this will have a basic hekad based pipeline in place which can simply pick information from django logs.