Wow, it's the site's nine-millionth hit:
184.108.40.206 - - [10/Jul/2007:21:58:59 -0400] "GET /departures/monterey/0/3267_diver_tm.jpg HTTP/1.1" 301 376 "-" "Googlebot-Image/1.0"
It looks like some Google web crawler is making sure that my photo of a diver in the Monterey Bay Aquarium from my Monterey travelog is still there.
"Millions of hits" doesn't mean that millions of people look at the site. Plenty of people do look at it. But there are plenty of robots, too. Maybe hits aren't the best thing to count. But it's not really clear what I do want to count. Counting hits is easy. So I count the hits--whether they be from humans, robots, or whatever.
Error hits add to the count. It's easier to count them than to decide which hits are errors and which aren't. I recently decided that, web-wise, my site was going to be lahosken.san-francisco.ca.us, not www.lahosken.san-francisco.ca.us. To do this I set up a "301 redirect". That is, any time someone points their browser at the web address www.lahosken..., the web server returns an error saying "Error 301: You meant lahosken...". When your browser sees one of these "301" errors, it knows to load the corrected address. But that generates two hits: first you try to load www.lahosken..., then you successfully load lahosken.... Eventually, no-one will have the "www" in their bookmarks and so these errors will stop happening. But since I just recently set up the redirect, the old bookmarks and links and whatnot have been boosting the count.
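If I ever did want to know how many of those hits were redirects, a rough sketch might look like the following. It is not the script I actually run; the function name and the regular expression are made up for illustration, and it assumes log lines shaped like the Googlebot one quoted above (quoted request, then a three-digit status code).

```python
import re
from collections import Counter

# Assumed layout: ... "REQUEST" STATUS SIZE ..., as in the sample log line above.
LOG_LINE = re.compile(r'"(?P<request>[^"]*)" (?P<status>\d{3}) ')

def count_by_status(lines):
    """Tally hits per HTTP status code; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts
```

With a tally like this, the "301" bucket would show roughly how much of the count is the redirect's doing, while the sum of all buckets stays the same easy-to-count total.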
Oh, and the count... the count is not so rigorous. In theory, each night my web service provider rotates the log files: each night, some magical script somewhere renames the access log, so that I know it was "yesterday's" access log. A few hours later, my magical script runs over "yesterday's" access log; my script maintains the permanent long-term count, the thing that just ticked past nine million. Except that a few months ago, my web service provider's magic script had a hiccup. The log file didn't get renamed. My script happily read "yesterday's" log file--but that was really yester-yesterday's log file. So my script counted yester-yesterday's hits twice, artificially boosting the count. I noticed it happening. If I were super-rigorous, I would have subtracted out those numbers. I've noticed it happen at least a couple of times since then. I didn't fix those either. It might have happened a few times when I didn't notice. I don't always pay so much attention. This morning, I noticed a different problem: the log file got renamed, but at a different time than usual. The result this time is that my magical script totally overlooked a day's worth of logs. They were named as if they were yester-yesterday's logs, but were really just a few hours old. I'm too lazy to fix that one, too. I don't know how many times that's happened.
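For the double-counting half of the problem, a more careful me might have the script remember which log files it has already counted. Here is a minimal sketch, assuming one hit per log line and a small JSON state file; the function name, the state file layout, and the use of a SHA-256 fingerprint are all hypothetical, not what my script really does.

```python
import hashlib
import json
from pathlib import Path

def update_count(log_path, state_path):
    """Add the log's line count to the running total, unless this exact log was counted before."""
    state_file = Path(state_path)
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"total": 0, "seen": []}

    data = Path(log_path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()  # fingerprint of the log's contents

    if digest in state["seen"]:
        # Same bytes as a log we already counted: the rotation probably hiccuped.
        return state["total"]

    state["total"] += len(data.splitlines())  # assumption: one hit per line
    state["seen"].append(digest)
    state_file.write_text(json.dumps(state))
    return state["total"]
```

This only guards against reading the same file twice. It would do nothing about the opposite problem, where a day's log gets renamed at an odd hour and the script never looks at it.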
A few years ago, there was one of those double-counted days. I carefully fixed up my permanent count to undo the double-counting. I was more rigorous then, more careful.
Labels: million, site, trust the machine