If you’re anything like me, you spend far too much time checking your web server stats and not enough time actually creating content and coding. Thankfully, my logs are always close at hand, since I work almost exclusively on the command line. With the help of a few common Unix filters, you can quickly gauge how things are going on your site. These commands work with Apache or Apache-compatible log files in the combined log format shown below, and can probably be tweaked to work with other log formats without much trouble.

Here’s what a line from an Apache log file looks like (I added the line breaks):

71.206.3.109 - - [12/Jul/2007:09:16:31 -0500] "GET / HTTP/1.1"
    200 33545 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en)
    AppleWebKit/419 (KHTML, like Gecko) Safari/419.3"
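awk splits each line on whitespace, so the quotes in a log line don't group anything. That is where the field numbers used in the commands below come from: $1 is the client IP, $4 starts the timestamp, $7 is the requested path, and $11 is the quoted referrer. Here's a quick sketch that pulls those fields out of the example line above; the variable names are just for illustration:

```shell
# The example log line from above, rejoined onto a single line.
line='71.206.3.109 - - [12/Jul/2007:09:16:31 -0500] "GET / HTTP/1.1" 200 33545 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/419 (KHTML, like Gecko) Safari/419.3"'

# Whitespace splitting gives: $1 = IP, $4 = "[date:time", $6 = "\"GET",
# $7 = path, $9 = status, $10 = bytes, $11 = referrer (quotes included).
ip=$(printf '%s\n' "$line" | awk '{print $1}')
path=$(printf '%s\n' "$line" | awk '{print $7}')
referrer=$(printf '%s\n' "$line" | awk '{print $11}')

printf 'ip=%s path=%s referrer=%s\n' "$ip" "$path" "$referrer"
```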

To get a quick feel for how many visitors I’ve had today, I use grep to find all of today’s requests, awk to extract the IP address (the first field), and then sort and uniq to eliminate duplicate IPs (sort is necessary because uniq only collapses duplicates that appear on adjacent lines). Piping the result through wc -l counts the remaining lines, which gives the number of unique IP addresses that have made requests:

# grep "12/Jul" immike.net-access.log | awk '{print $1}' | \
> sort | uniq | wc -l
1188
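The date in that command is hard-coded. Here's a minimal sketch that builds today's date with date instead, assuming your log uses English month abbreviations like "Jul" (as in the example line above; LC_ALL=C keeps date's %b in English). It anchors the match on the opening bracket of the timestamp, so the same day from another year, or a date inside a URL, won't sneak in, and it uses sort -u, which does the same job as sort | uniq. The log entries are made up so the sketch runs anywhere; point LOG at your real access log instead:

```shell
# Today's date in Apache's timestamp style, e.g. 12/Jul/2007.
today=$(LC_ALL=C date +%d/%b/%Y)

# Hypothetical sample log; replace with your real access log.
LOG=$(mktemp)
cat > "$LOG" <<EOF
10.0.0.1 - - [$today:09:16:31 -0500] "GET / HTTP/1.1" 200 100 "-" "UA"
10.0.0.2 - - [$today:09:17:02 -0500] "GET /blog HTTP/1.1" 200 200 "-" "UA"
10.0.0.1 - - [$today:09:18:45 -0500] "GET /blog/feed HTTP/1.1" 200 300 "-" "UA"
10.0.0.3 - - [01/Jan/2000:00:00:00 -0500] "GET / HTTP/1.1" 200 100 "-" "UA"
EOF

# -F treats "[" literally; tr strips the padding some wc versions add.
unique_ips=$(grep -F "[$today:" "$LOG" | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
echo "$unique_ips"
rm -f "$LOG"
```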

To determine how many people visited an individual page, you can add a grep for that page’s URL before sending the logs through awk:

# grep "12/Jul" immike.net-access.log | \
> grep "interview-with-leah-culver" | \
> awk '{print $1}' | sort | uniq | wc -l
261
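One caveat: grep matches the page name anywhere in the line, including the referrer field, so a request for some other page whose referrer was the interview also matches, which can inflate the count slightly. Here's a sketch that only looks at the timestamp (field 4) and the request path (field 7) using awk's index(), a plain substring test. The day, page name, and sample entries are hypothetical; on this sample the whole-line grep counts 3 visitors while the field match counts 2:

```shell
# Hypothetical sample log; replace with your real access log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
10.0.0.1 - - [12/Jul/2007:09:16:31 -0500] "GET /blog/interview-with-leah-culver HTTP/1.1" 200 100 "-" "UA"
10.0.0.1 - - [12/Jul/2007:09:16:32 -0500] "GET /style.css HTTP/1.1" 200 50 "http://example.com/blog/interview-with-leah-culver" "UA"
10.0.0.2 - - [12/Jul/2007:09:17:02 -0500] "GET /blog/interview-with-leah-culver HTTP/1.1" 200 100 "-" "UA"
10.0.0.3 - - [12/Jul/2007:09:18:45 -0500] "GET /blog HTTP/1.1" 200 300 "http://example.com/blog/interview-with-leah-culver" "UA"
EOF

day="12/Jul/2007"                   # day to report on
page="interview-with-leah-culver"   # substring of the page's path

# Match the timestamp prefix and the request path only.
visitors=$(awk -v day="[$day:" -v page="$page" \
    'index($4, day) == 1 && index($7, page) > 0 { print $1 }' "$LOG" |
    sort -u | wc -l | tr -d ' ')

# The whole-line version also counts 10.0.0.3, matched only via its referrer.
whole_line=$(grep -F "[$day:" "$LOG" | grep -F "$page" |
    awk '{print $1}' | sort -u | wc -l | tr -d ' ')

echo "field match: $visitors, whole line: $whole_line"
rm -f "$LOG"
```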

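For a bit more detail, the following command lists the 10 most requested pages (excluding css, js, gif, ico, png, and jpg files) in order of popularity. Field 7 is the requested path, and the sed strips any trailing slash so that /blog and /blog/ are counted as one page; as a side effect, requests for the home page (/) show up as the entry with no path next to its count: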

# awk '{print $7}' immike.net-access.log | \
> grep -ivE '(\.gif|\.jpg|\.png|\.ico|\.css|\.js)' | \
> sed 's/\/$//' | sort | uniq -c | sort -rn | head -10
   2972 /blog/feed
   2590 /blog/2007/07/06/interview-with-leah-culver-the-making...
    712 /blog/2007/04/06/5-regular-expressions-every-web...
    648 /robots.txt
    588 /blog/2007/04/06/the-absolute-bare-minimum-every...
    467 /blog
    321
    280 /blog/2007/06/21/extreme-regex-foo-what-you-need-to...
    279 /blog/2007/07/03/full-text-search-with-apache-lucene
    195 /blog/2007/07/06/interview-with-leah-culver-the-making...
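That pipeline leaves a couple of rough edges: a path with a query string (something like /blog/feed?format=rss) is counted separately from the bare path, and the home page shows up blank. Here's a sketch of an alternative, not the command above, that normalizes paths in a single awk pass: it drops query strings, skips the same static file types by checking only the end of the path, strips trailing slashes except on /, and tallies counts in an awk array. The log lines are hypothetical:

```shell
# Hypothetical sample log; replace with your real access log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
10.0.0.1 - - [12/Jul/2007:09:16:31 -0500] "GET / HTTP/1.1" 200 100 "-" "UA"
10.0.0.1 - - [12/Jul/2007:09:16:32 -0500] "GET /blog/ HTTP/1.1" 200 100 "-" "UA"
10.0.0.2 - - [12/Jul/2007:09:16:33 -0500] "GET /blog HTTP/1.1" 200 100 "-" "UA"
10.0.0.2 - - [12/Jul/2007:09:16:34 -0500] "GET /blog/feed?format=rss HTTP/1.1" 200 100 "-" "UA"
10.0.0.3 - - [12/Jul/2007:09:16:35 -0500] "GET /images/logo.PNG HTTP/1.1" 200 100 "-" "UA"
10.0.0.3 - - [12/Jul/2007:09:16:36 -0500] "GET /blog/feed HTTP/1.1" 200 100 "-" "UA"
10.0.0.4 - - [12/Jul/2007:09:16:37 -0500] "GET /blog/feed/ HTTP/1.1" 200 100 "-" "UA"
EOF

top=$(awk '{
    p = $7
    sub(/[?].*/, "", p)                                    # drop any query string
    if (tolower(p) ~ /\.(gif|jpg|png|ico|css|js)$/) next   # skip static files
    if (length(p) > 1) sub(/\/$/, "", p)                   # "/blog/" -> "/blog", keep "/"
    count[p]++
}
END { for (p in count) print count[p], p }' "$LOG" | sort -rn | head -n 10)

printf '%s\n' "$top"
rm -f "$LOG"
```

On the sample this prints 3 /blog/feed, 2 /blog, and 1 / on separate lines.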

A similar command lists the top 10 referrers, which Apache records in field 11. Two filters do the cleanup: the first grep removes referrals from pages on my own site, since I’m only interested in external sites, and the second drops requests that arrived with no referrer, which Apache logs as "-":

# awk '{print $11}' immike.net-access.log | grep -v 'immike.net' | \
> grep -v '"-"' | sort | uniq -c | sort -rn | head -10
    341 "http://www.google.com/reader/view/"
    328 "http://simonwillison.net/2007/Jul/7/interview/"
    184 "http://feeds.feedburner.com/ImMike"
    114 "http://www.djangoproject.com/weblog/2007/jul/08/django...
    112 "http://www.dzone.com/rsslinks/interview_with_leah...
    112 "http://www.dzone.com/links/interview_with_leah...
     74 "http://www.planetpython.org/"
     57 "http://www.santosj.name/programming/php-related/php...
     57 "http://blog.assembleron.com/2007/07/10/competition...
     51 "http://agiletesting.blogspot.com/2007/07/another-django...
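Some of those entries are different pages on the same site (the two dzone.com lines, for instance). If you'd rather count referring sites, here's a sketch that splits each line on double quotes, so the referrer becomes field 4 without its quotes, and keeps only the host name. It assumes the request line contains no embedded quotes; own_host and the sample entries are hypothetical stand-ins for your own site and log:

```shell
# Hypothetical sample log; replace with your real access log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
10.0.0.1 - - [12/Jul/2007:09:16:31 -0500] "GET / HTTP/1.1" 200 100 "http://www.dzone.com/links/a" "UA"
10.0.0.2 - - [12/Jul/2007:09:16:32 -0500] "GET / HTTP/1.1" 200 100 "http://www.dzone.com/rsslinks/b" "UA"
10.0.0.3 - - [12/Jul/2007:09:16:33 -0500] "GET / HTTP/1.1" 200 100 "http://www.planetpython.org/" "UA"
10.0.0.4 - - [12/Jul/2007:09:16:34 -0500] "GET /blog HTTP/1.1" 200 100 "http://example.com/" "UA"
10.0.0.5 - - [12/Jul/2007:09:16:35 -0500] "GET / HTTP/1.1" 200 100 "-" "UA"
EOF

own_host="example.com"   # hypothetical: your own site's host name

# With " as the separator, $4 is the bare referrer URL.
hosts=$(awk -F'"' -v own="$own_host" '
    $4 != "-" {
        split($4, part, "/")   # "http:", "", host, path...
        if (part[3] != own) print part[3]
    }' "$LOG" | sort | uniq -c | sort -rn | head -n 10)

printf '%s\n' "$hosts"
rm -f "$LOG"
```

Host names are compared exactly, so a referral from www.example.com would still be counted; adjust the test if your site answers on several names.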

So, there’s a wealth of information sitting in Apache log files, and you don’t need fancy log analyzers to get at it. Just don’t spend too much time grokking it, or you’ll never get anything done!