Parsing Log Files

Apache Log File

Let's first list the important information that we may need from the Apache logs:
  • IP address
  • Time stamp
  • HTTP method
  • URI path
  • Response code
  • User agent
To read a log file, I prefer to read it as an array of lines:
apache_logs = File.readlines "/var/log/apache2/access.log"
I was looking for a simple regular expression for Apache logs. I found one here and tweaked it slightly.
apache_regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/
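To see what each capture group picks up, here is a small sketch that runs the expression against a single line. The sample line is made up to resemble the combined format (it is not taken from a real log), and the variable names are only for illustration.

```ruby
apache_regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/

# Hypothetical sample line; note the literal "-" referer this regex expects.
sample = '127.0.0.1 - - [12/Dec/2015:20:09:05 +0300] "GET / HTTP/1.1" 200 3525 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

match = sample.match(apache_regex)

if match
  # Nine capture groups, in the order they appear in the regex.
  ip, user, time, verb, uri_path, protocol, code, size, user_agent = match.captures
  puts "#{ip} #{verb} #{uri_path} -> #{code} (#{size} bytes)"
else
  puts "Can't parse: #{sample}"
end
```

Keep in mind that this expression only matches lines with an empty user, an HTTP/1.1 request, and a "-" referer; anything else falls into the else branch.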
So I came up with this small method, which parses an Apache "access.log" file and converts it into an array of hashes containing the information we need.
#!/usr/bin/env ruby
# KING SABRI | @KINGSABRI

apache_logs = File.readlines "/var/log/apache2/access.log"

# Parse an array of access.log lines into an array of hashes.
def parse(logs)

  apache_regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) ([^\s]+?) "(.*)"/

  result_parse = []
  logs.each do |log|
    # scan returns an array of capture arrays; we only need the first match.
    parser = log.scan(apache_regex)[0]

    # If we can't parse the log line for any reason, report it and move on.
    if parser.nil?
      puts "Can't parse: #{log}\n\n"
      next
    end

    entry =
      {
        :ip         => parser[0],
        :user       => parser[1],
        :time       => parser[2],
        :method     => parser[3],
        :uri_path   => parser[4],
        :protocol   => parser[5],
        :code       => parser[6],
        :res_size   => parser[7],
        :referer    => parser[8],
        :user_agent => parser[9]
      }
    result_parse << entry
  end

  return result_parse
end

require 'pp'
pp parse(apache_logs)
Returns
[{:ip=>"127.0.0.1",
  :user=>"",
  :time=>"12/Dec/2015:20:09:05 +0300",
  :method=>"GET",
  :uri_path=>"/",
  :protocol=>"HTTP/1.1",
  :code=>"200",
  :res_size=>"3525",
  :referer=>"\"-\"",
  :user_agent=>
   "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"},
 {:ip=>"127.0.0.1",
  :user=>"",
  :time=>"12/Dec/2015:20:09:05 +0300",
  :method=>"GET",
  :uri_path=>"/icons/ubuntu-logo.png",
  :protocol=>"HTTP/1.1",
  :code=>"200",
  :res_size=>"3689",
  :referer=>"\"http://localhost/\"",
  :user_agent=>
   "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"},
 {:ip=>"127.0.0.1",
  :user=>"",
  :time=>"12/Dec/2015:20:09:05 +0300",
  :method=>"GET",
  :uri_path=>"/favicon.ico",
  :protocol=>"HTTP/1.1",
  :code=>"404",
  :res_size=>"500",
  :referer=>"\"http://localhost/\"",
  :user_agent=>
   "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}]
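Once the log is an array of hashes, summarizing it is plain Ruby. The sketch below assumes a trimmed copy of the entries shown above (only the keys it uses) and counts hits per response code, then lists the paths that returned 404. The names entries, hits_per_code and not_found are just for illustration.

```ruby
# Trimmed copy of the parsed entries shown above (only the keys used here).
entries = [
  { :ip => "127.0.0.1", :uri_path => "/",                      :code => "200" },
  { :ip => "127.0.0.1", :uri_path => "/icons/ubuntu-logo.png", :code => "200" },
  { :ip => "127.0.0.1", :uri_path => "/favicon.ico",           :code => "404" }
]

# Count how many requests ended with each response code.
hits_per_code = entries.each_with_object(Hash.new(0)) do |entry, counts|
  counts[entry[:code]] += 1
end

# Collect the paths that were not found.
not_found = entries.select { |entry| entry[:code] == "404" }
                   .map    { |entry| entry[:uri_path] }

puts "Hits per code: #{hits_per_code}"
puts "404 paths: #{not_found.join(', ')}"
```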
Note: The Apache LogFormat is configured as LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined, which is the default configuration.
  • %h is the remote host (i.e. the client IP address)
  • %l is the identity of the user determined by identd (not usually used since not reliable)
  • %u is the user name determined by HTTP authentication
  • %t is the time the request was received.
  • %r is the request line from the client. ("GET / HTTP/1.0")
  • %>s is the status code sent from the server to the client (200, 404 etc.)
  • %b is the size of the response to the client (in bytes)
  • Referer is the page that linked to this URL.
  • User-agent is the browser identification string.
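The field list above can also be mapped one-to-one onto named capture groups. What follows is a minimal sketch of that idea, not the regex used earlier: the group names, the sample line, and the user "alice" are all assumptions for illustration. It accepts any user name, any HTTP version, and a "-" response size, and it does not handle escaped quotes inside the quoted fields.

```ruby
require 'time'

# One named group per LogFormat directive (names are my own choice).
combined_regex = /\A(?<host>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\d+|-) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)"/

# Hypothetical line with an authenticated user, HTTP/1.0 and no response body.
sample = '192.168.1.10 - alice [12/Dec/2015:20:09:05 +0300] "POST /login HTTP/1.0" 302 - "http://localhost/" "curl/7.47.0"'

m = sample.match(combined_regex)
raise "Can't parse: #{sample}" if m.nil?

# %r is the whole request line; split it into its three parts.
verb, uri_path, protocol = m[:request].split(" ", 3)

# %t uses the day/month/year:hour:minute:second zone layout.
received_at = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z")

puts "#{m[:host]} (#{m[:user]}) #{verb} #{uri_path} #{protocol} -> #{m[:status]}"
puts "Received at: #{received_at}"
```

Named groups make the hash-building step unnecessary for quick scripts, since m[:status] reads better than parser[6].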

IIS Log File

Here is a basic IIS log regular expression:
iis_regex = /(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) ([^\s]+?) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (\d{2}) (GET|POST|PUT|DELETE) ([^\s]+?) - (\d+) (\d+) (\d+) (\d+) ([^\s]+?) (.*)/
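A fixed regex like this only fits one particular field order. IIS logs in the W3C extended format start with directive lines, including a #Fields: line that names the columns, so a different approach (not the regex above) is to read that header and zip it with each space-separated entry. Here is a minimal sketch of that; the log content, field selection, and addresses are made up for illustration.

```ruby
# Hypothetical W3C-format log; a real file would be read with File.readlines.
iis_log = <<~'LOG'
  #Software: Microsoft Internet Information Services 8.5
  #Version: 1.0
  #Date: 2015-12-12 17:09:05
  #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
  2015-12-12 17:09:05 192.168.1.5 GET /index.html - 80 - 192.168.1.20 Mozilla/5.0+(Windows+NT+6.1) 200 0 0 15
LOG

fields  = []
entries = []

iis_log.each_line do |line|
  line = line.chomp
  if line.start_with?("#Fields:")
    # Remember the column names announced by the header.
    fields = line.sub("#Fields:", "").split
  elsif line.start_with?("#") || line.empty?
    next # other directives and blank lines carry no request data
  else
    # Pair each column name with the value in the same position.
    entries << fields.zip(line.split(" ")).to_h
  end
end

entries.each do |entry|
  puts "#{entry['c-ip']} #{entry['cs-method']} #{entry['cs-uri-stem']} -> #{entry['sc-status']}"
end
```

Because the column names come from the file itself, the same loop keeps working when the logged fields are reconfigured.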