Created by Laurence J MacGuire a.k.a Liu Jian Ming aka 刘建明
ThoughtWorks Xi'An, 2015/05/20
ssh user@system-01.xyz.com tail -f /var/log/my-application.log
ssh user@system-02.xyz.com tail -f /var/log/my-application.log
ssh user@system-XX.xyz.com tail -f /var/log/my-application.log
Nope! Nope! Nope!
“Distribute data and processing. Centralise access.”
– Me
127.0.0.1 - - [01/Feb/2015:12:41:18 +0800] "GET /optout_check?callback=Krux.ns._default.kxjsonp_optOutCheck HTTP/1.1" 404 506 "http://www.theguardian.com/us-news/2015/jan /31/detroit-aiyana-stanley-jones-police-officer-cleared" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/39.0.2171.65 Chrome/39.0. 2171.65 Safari/537.36"
$> cat access.log | awk '{print $1}' | sort -n | uniq -c | sort -nr
29877 93.xxx.xxx.xxx
17538 80.xxx.xxx.xxx
5895 198.xxx.xxx.xxx
3042 37.xxx.xxx.xxx
2956 208.xxx.xxx.xxx
2613 94.xxx.xxx.xxx
...
Cool! But inaccessible.
Started GET "/meta/processed_dates.json" for 127.0.0.1 at 2015-01-25 18:24:49 +0800
Processing by Meta::ProcessedDatesController#all as JSON
Rendered text template (0.0ms)
Completed 404 Not Found in 8.4ms (Views: 4.9ms | ActiveRecord: 0.5ms)
But there’s really cool information in there!
host=8.8.8.8 level=info ip=86.86.86.86 msg=hello world ...
Why not JSON?
{
"host":"8.8.8.8",
"level":"info",
"ip":"86.86.86.86",
"msg":"hello world"
}
About logs. But not really about logs.
Events go in, somehow
Do stuff to them
Put them somewhere
More or less
Nginx
log_format logstash_json '{ "@timestamp": "$time_iso8601", '
'"clientip": "$remote_addr", '
'"time_spent.total": $request_time, '
'"forwarded_for": "$http_x_forwarded_for", '
'"response.cache_status": "$upstream_cache_status", '
'"response.mime_type": "$sent_http_content_type", '
'"response.status": $status, '
'"response.size": $body_bytes_sent, '
'"request.path": "$request_uri", '
'"request.verb": "$request_method", '
'"request.referrer": "$http_referer", '
'"request.host": "$host", '
'"request.agent": "$http_user_agent" }';
Logstash
input {
lumberjack {
port => 443
ssl_certificate => "/opt/logstash/lumberjack.crt"
ssl_key => "/opt/logstash/lumberjack.key"
codec => "json"
}
}
output {
elasticsearch {
host => "my-elasticsearch-cluster.example.com"
cluster => "my-elasticsearch-cluster-name"
protocol => "http"
}
}
All your events are in a distributed, easily queryable datastore
curl -XPOST http://my-elasticsearchcluster.example.com/_search -d '{"query":{"bool":{"must":[{"term":{"http_access_logs.response.status":"404"}}],"must_not":[],"should":[]}},"from":0,"size":10,"sort":[],"facets":{}}'
{
"took": 1357,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"failed": 0
},
"hits": {
"total": 184229,
"max_score": 1,
"hits": [
{
"_index": "logstash-2015.01.25",
"_type": "http_access_logs",
"_id": "AUsfjnQ7mXbLjGbOXHyr",
"_score": 1,
"_source": {
"@timestamp": "2015-01-25T05:20:10.182Z",
"type": "http_access_logs",
"timestamp": "2015-01-25T05:20:10.182641Z",
"request": {
"scheme": "http",
"host": "some-home.domain.com",
"port": 80,
"path": "/request/blah/sdfsdf",
"query": null,
"fragment": null,
"complete": "http://some-home.domain.com/request/blah/sdfsdf",
"verb": "GET",
"http_version":"HTTP/1.1",
"size": 0
},
"response": {
"status": 404,
"size": 21,
},
"time_spent": {
"request_processing": 0.000049,
"backend": 0.012,
"response_processing": 0.000042,
"total": 0.012505
},
"source": {
"lob": "mad",
"project": "my-project",
"application": "my-application",
"stack": "b",
"tier": "elb"
},
}
]
}
}
Well, kind of easy.
http://URL-REDATED/
“
EveryoneEverything!!!”
Adding an asterisk, just because
System logs
Application logs
“If your logs don’t give you a really, really good clue what happened, they’re useless.
– Me
Application Business logs
Try logging all business/domain relevant events
“Logs aren’t files full of stacktraces you should never read. They’re an event stream that should be treated and processed like a first-class citizen.
– Me
Developers are aware of the business context. Let them log appropriately.
{"id":"12414"}
Nope! We’re shoving this in a database. Treat it with respect.
{
"source": {
"environment": "production",
"project": "my-project",
"application": "my-application",
"tier": "nginx",
"host": "ip-10-13.24.52",
...
}
}
Unicorn, NGinx, HAProxy … they all log:
request: {
"verb": "GET",
"path": "/somewhere/file.html",
"http_version": "1.1",
"host": "virtual-host.domain.com",
"referer": "http://www.some-other-domain/page.html",
"user-agent": "Mozilla Compatible (blah blah)"
},
response: {
"status": 200,
"size": 123123,
"etc": "asdasda"
}
These app all log (more or less) the same information. Categorise accordingly.
Which user made this post?
{
"board.id":123,
"board.name":"Self-help",
"sender.id":3462,
"sender.username": "lmac",
"post.subjet": "Lol, wut?"
}
To and from which email address was this email sent?
{
"from.address":"bob@example.com",
"to.address":"alice@example.com",
"sender.username":"bob_awesome_saas",
"message.id": "eeaa4f14-c720-4c9f-b2f8-b5b859e49e46",
"message.subjet": "password reminder",
"message.status": "relayed"
}
Which build number of this app is running?
{
"source": {
"project": "my-project",
"application": "my-application",
"build-number": 1234
}
}
Why was this decision taken?
{
"entity": "email",
"status": "marked-spam",
"reason": "source ip not authorized by SPF record"
}
Cloudfront: X-Request-Id: eeaa4f14-c720-4c9f-b2f8-b5b859e49e46
Load Balancer: X-Request-Id: eeaa4f14-c720-4c9f-b2f8-b5b859e49e46
Local Proxy: X-Request-Id: eeaa4f14-c720-4c9f-b2f8-b5b859e49e46
Unicorn: X-Request-Id: eeaa4f14-c720-4c9f-b2f8-b5b859e49e46
Database: X-Request-Id: eeaa4f14-c720-4c9f-b2f8-b5b859e49e46
CloudFront -> ELB -> NGinx -> Unicorn -> ActiveRecord = 345ms
Even if you’re doing structured logging, always include a humand readable version, in a field such as “message”.
logger.debug(personal_information.to_json)
{
"name": {"first":"JianMing","last":"Liu"},
"birth": {"date":"8888/88/88", "location":"Somewhere"},
"job": {"company":"ThoughtWorks", "salary":"888888888888888888888888888888"},
"email": "ljm@thoughtworks.com",
"address": {"city":"XiAn","state":"ShanXi","road":"JinYeYiLu","number":34},
"credit_card": {"number":"88888888", "exp":"2020/12/12", "cvv":"123"},
"security_questions": { "mothers_name": "Mary", "first_pet": "dog" }
}
This can get us in very serious trouble.
You’ll need to de-normalise your data a lot.
Most event oriented databases are NOT relational.
Storage is cheap.
Pipe you app logs somewhere and get useful data.
As usual, I don’t really care how you do it.
rsyslog, files, directly over the wire, etc
rsyslog, Kinesis, Kafka, Lumberjack, Graylog, etc
Logstash, fluentd, rsyslog, scripts, splunk, etc
Elasticsearch, Influxdb, OpenTSDB, MongoDB, etc
Query: Kibana (3)