Open Source Log Processing

An Attempt At “Big Data”

Created by Laurence J. MacGuire, a.k.a. Liu Jian Ming

ThoughtWorks Xi’An, 2015/06/02

Creative Commons License

Splunk?

Splunk

:( or $$$

Open Source?

“Not only is it free. It’s better”

– Me

You can do really cool stuff

User interactions w/ our apps

You can do really cool stuff

Capture user input

You can do really cool stuff

Audit Trails - Who deleted my stack?

You can do really cool stuff

Go meta - How long does it take to process log messages?

You can do really cool stuff

Monitor beer brewing (!)

How?

1) “Make it reliable”

  • Tame your log transport
  • How long does it take from your app to Elasticsearch?
  • Under what circumstances can messages be dropped?
  • Are these acceptable? If not, get back to work!

This is how

event = {
  # When the event was generated
  "@timestamp": "2015-08-12T10:06:15.000Z",

  # When the event was indexed/processed
  "@timestamp.indexed": "2015-08-12T10:06:25.001Z",

  # @timestamp.indexed - @timestamp == delay in seconds
  "@timestamp.indexed.delay": 10.001,

  # Delay rounded to the nearest second
  "@timestamp.indexed.delay.rounded": 10,

  # Delay rounded into 5-second bins
  "@timestamp.indexed.delay.rounded.5": 10,
  # ... event continues
}
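And here is a minimal sketch of how those fields could be computed at index time. It's plain Ruby; the helper name is ours, not anything standard, and the keys follow the example above:

require 'time'

# Hypothetical helper: stamp an event (a Hash shaped like the
# example above) with indexing-delay fields at index time.
def add_delay_fields(event)
  generated = Time.parse(event[:"@timestamp"])
  indexed   = Time.now.utc
  delay     = indexed - generated                  # Float, in seconds

  event[:"@timestamp.indexed"]                 = indexed.iso8601(3)
  event[:"@timestamp.indexed.delay"]           = delay.round(3)
  event[:"@timestamp.indexed.delay.rounded"]   = delay.round
  # Bin into 5-second buckets: 0, 5, 10, ...
  event[:"@timestamp.indexed.delay.rounded.5"] = (delay / 5).floor * 5
  event
end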

How?

2) “Have a clear ‘source of truth’ for log events”

  • Standardize on a transport technology
  • If using multiple transports, merge sources into one

This is how
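A toy sketch of the merging idea in plain Ruby (ports and the printing consumer are made up; in practice Logstash or a broker plays this role):

require 'socket'
require 'thread'

# Merge events from two transports (UDP + TCP) into a single
# queue, so everything downstream sees one stream of events.
queue = Queue.new

Thread.new do
  sock = UDPSocket.new
  sock.bind("0.0.0.0", 5140)                  # port is an assumption
  loop { queue << sock.recvfrom(65_535).first }
end

Thread.new do
  server = TCPServer.new("0.0.0.0", 5141)     # port is an assumption
  loop do
    client = server.accept
    Thread.new(client) { |c| c.each_line { |line| queue << line.chomp } }
  end
end

# One consumer = one “source of truth” for the rest of the pipeline.
loop { puts queue.pop }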

How?

3) “Make it easy for developers to submit data”

  • Make it a default on your systems
  • Provide your developers w/ simple libraries

This is how

require 'logging/lumberjack'
Logging.appenders.syslog( "syslog", {
  :layout => Logging.layouts.lumberjack(),
  :ident => "my-awesome-ruby-application"
})

Logging.logger["my-class"].info({ :key => "value", :another_key => 123 })

Done! This is all developers need to care about.

How?

4) “Ensure all apps log in an easily parseable format”

  • Your apps are easy to change (logging, log4j, logback, etc.)
  • Apache/Nginx are easy to change
  • sshd, crond, and syslog can all be easily parsed
  • Ruby on Rails’ default logging? No. Stop using it.

This is how
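A sketch of the idea (the field names are just a convention): emit one self-describing JSON document per line, and parsing downstream becomes trivial.

require 'json'
require 'time'

# One event per line, as JSON: no multi-line guessing,
# no custom grok patterns, just parse and index.
def log_event(severity, message, fields = {})
  entry = {
    "@timestamp" => Time.now.utc.iso8601(3),
    "severity"   => severity,
    "message"    => message,
  }.merge(fields)
  $stdout.puts(entry.to_json)
end

log_event("INFO", "user signed in", "user_id" => 42, "ip" => "10.0.0.7")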

How?

5) “Clearly mark the source of the messages – it’ll be useful later”

  • Which machine?
  • User?
  • Application?
  • Severity?
  • Network?
  • Destination email?

Each of those can be a slice of a pie chart!

How?

6) “Enable event tracing”

Akamai => HA Proxy => Nginx => Unicorn => Active Record

Follow events per transaction through the stack

This is how

Browser:

GET /users/1; DROP TABLE `users`; HTTP/1.1

CDN:

GET /users/1; DROP TABLE `users`; HTTP/1.1
x-request-id: 123-234-345

SQL

/* x-request-id: 123-234-345 */ SELECT * FROM `users` WHERE id = 1; DROP TABLE `users`;

Perhaps your CDN can set an ID? Perhaps you can compute one?
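If you compute one yourself, the earliest layer you control should mint it and every later layer should propagate it. A sketch as Rack middleware (the class name is ours; the header matches the example above):

require 'securerandom'

# Guarantee every request carries an x-request-id, minting one
# when the CDN/proxy upstream didn't set it.
class RequestIdMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    id = env["HTTP_X_REQUEST_ID"] ||= SecureRandom.uuid
    status, headers, body = @app.call(env)
    headers["X-Request-Id"] = id    # echo it back for correlation
    [status, headers, body]
  end
end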

How?

7) “Provide simple alternate input methods for non-traditional uses”

  • HTTP Requests?
  • TCP/UDP sockets?
  • Just tell people to send you their data
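E.g., the beer-brewing rig from earlier needs nothing fancier than this (host, port, and fields are placeholders):

require 'socket'
require 'json'

# Push a one-off event into the pipeline over UDP.
event = { "source" => "brewery", "temperature_c" => 19.4 }

sock = UDPSocket.new
sock.send(event.to_json, 0, "logs.example.com", 5959)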

What we’ve got going

“It scales”

What we’ve got going

“It scales to OVER 9000!”

Logstash is great!

It will ingest, clean, manipulate, and index your data easily.

It provides 90% of functionality with 10% of effort.

But it’s not perfect

Multiple threads?

  • Metrics plugin? Fails with multiple filter threads.
  • Output plugins may be limited on I/O.

Config format and custom plugins?

  • Config format may be limiting.
  • Sometimes it might be easier to write code.

Complex Event Processing

Going to the next level

Enter: Complex Event Processing

It’s much easier if you merge all your cleaned sources into one.

Once you get a clean event pipeline w/ Logstash, you can start thinking about C.E.P.

Complex Event Processing

How can you find if someone is port-scanning you?

Complex Event Processing

100% rejected connections. Every 12 hours.
Across multiple networks? Increasing destination port numbers?

I see what you did there

Complex Event Processing

  • 180 lines of (j)Ruby
  • One afternoon
  • Found that guy 45 minutes later
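The original 180 lines aren't reproduced here, but the heuristic fits in a sketch like this (the event shape and thresholds are invented for illustration):

WINDOW_SECONDS = 12 * 3600

# Per source IP, keep a sliding window of rejected connections and
# flag sources hitting many distinct, mostly increasing dest ports.
@windows = Hash.new { |h, k| h[k] = [] }

def scanner?(hits)
  ports = hits.map { |h| h[:dst_port] }
  ports.uniq.size >= 50 &&
    ports.each_cons(2).count { |a, b| b > a } > ports.size * 0.8
end

def on_event(event)
  return unless event[:action] == "rejected"
  hits = @windows[event[:src_ip]]
  hits << { time: event[:time], dst_port: event[:dst_port] }
  hits.reject! { |h| h[:time] < event[:time] - WINDOW_SECONDS }
  puts "possible port scan from #{event[:src_ip]}" if scanner?(hits)
end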

Complex Event Processing

Other Example Scenarios
  • Live web-analytics (trending pages, keywords, referers)
  • Alerts on (complex) event sequences
  • Correlate multiple live and static sources

Apache Storm

Apache Storm

Storm Topologies

Shuttle your events through a DAG of computations

Green Node = “Spout” = Data Input

Blue Node = “Bolt” = Data Processing

Each “Spout” and “Bolt” maps to a class

Each class runs in one or more threads
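Conceptually, something like this (a plain-Ruby mirror of the contract only; Storm's real API is JVM-based):

# A spout emits tuples; a bolt consumes them and emits new ones.
class LogSpout
  def next_tuple
    line = $stdin.gets
    line && { raw: line.chomp }
  end
end

class ExtractDataBolt
  def execute(tuple)
    { parsed: tuple[:raw].split }    # parse/enrich, pass downstream
  end
end

spout, bolt = LogSpout.new, ExtractDataBolt.new
while (tuple = spout.next_tuple)
  p bolt.execute(tuple)
end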

Apache Storm

Storm Provides

  • Thread Scheduling across many machines
  • Reliable messaging and accounting between machines
  • Simple scaling
    • Run 1x Spout “KinesisSpout”
    • Run 4x Bolt “ExtractData”
    • Run 2x Bolt “BeanCounting”
    • etc

Apache Storm

Storm does NOT provide

  • Built-in aggregations
  • Shared state types

Complex Event Processing

You may not require CEP

But plan as though you do. Your logs are first-class citizens, and they deserve to be treated as such.

You can get tremendous value out of them.