New: Milestone: 10 Million Hits (including 26000 strange ones)

Wow, it's the site's ten-millionth hit. In decimal notation, that is a very round number. Let's take a look at the log record of that hit:

    - - [23/Oct/2007:08:33:47 -0400] "GET /frivolity/LuxiSerif-Bold_pfa_u_tm.png HTTP/1.1" 503 413 "-" "MSNPTC/1.0"

This is probably a 'bot: a crawler, a program that automatically reads web pages without human intervention, following web links to find other web pages to download. I couldn't figure that out just from this record, but looking at the many, many records that precede it, I see that the same, uhm, entity is downloading a lot of files without taking much time to read them. The internet address belongs to Beijing Telecom, so perhaps this is a Chinese user using Beijing Telecom as an ISP? My site returned a status code 503 ("Service Unavailable") which, as my site uses it, roughly means "You're asking me for stuff too quickly. Please slow down."

As I look at previous requests that this bot made, I see that it did not check for the existence of a robots.txt file, which suggests it was either written by an ignoramus or else it is illegitimate, or both. As I keep looking at previous requests, I also notice that the bot tries to read several nonexistent files, so I guess it probably was coded by someone incompetent. Hey, now that I look more closely, I notice that the file this ten-millionth hit asked for, /frivolity/LuxiSerif-Bold_pfa_u_tm.png, doesn't exist. If the 'bot had asked for /frivolity/tav/LuxiSerif-Bold_pfa_u_tm.png then it would have been onto something.
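As a rough illustration of the kind of log-combing described above, here is a minimal Python sketch that groups access-log records by client and notes whether each client ever asked for robots.txt and which status codes it received. It assumes the log is in the common Apache "combined" format; the function name `summarize` and the host 192.0.2.1 are placeholders invented for the example, since the real address does not appear in this post.

```python
import re
from collections import defaultdict

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Per-client summary: hit count, robots.txt check, status-code counts."""
    clients = defaultdict(
        lambda: {"hits": 0, "robots": False, "statuses": defaultdict(int)}
    )
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format records
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else ""
        info = clients[m.group("host")]
        info["hits"] += 1
        info["statuses"][m.group("status")] += 1
        if path == "/robots.txt":
            info["robots"] = True
    return clients

# The host below is a placeholder from the documentation address range.
sample = [
    '192.0.2.1 - - [23/Oct/2007:08:33:47 -0400] '
    '"GET /frivolity/LuxiSerif-Bold_pfa_u_tm.png HTTP/1.1" 503 413 "-" "MSNPTC/1.0"',
]
for host, info in summarize(sample).items():
    print(host, info["hits"], info["robots"], dict(info["statuses"]))
```

A client with many hits, no robots.txt request, and a pile of 404 or 503 responses would match the pattern described above, though none of those signs is proof on its own.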

This crude bot is not the strangest phenomenon to hit the web site recently. I never would have noticed that little bot if it hadn't been responsible for the site's 10000000th hit.

The strangest thing recently has been the 26 thousand visits to the Book Report: Leave Me Alone, I'm Reading page. To compare, that page has had more hits in the few weeks of its existence than, say, my Japanese Ska page has had in the past few years. Hundreds of hits a day. It's not people using browsers. People-controlled browsers report referring pages. People-controlled browsers download the pictures that go with a page. These "visitors" haven't reported a referer, haven't downloaded graphics. They come from a wide variety of IP addresses. If the requests came 1000 times per second instead of 1000 times per day, I'd think they were a distributed denial-of-service attack. If I displayed advertising, I'd think they were trying to corrupt my advertising statistics. Oh, and some of them garble the file name in strange ways, like the middle request here:

    - - [27/Sep/2007:21:31:47 -0400] "GET /new/2007/08/book-report-leave-me-alone-im-reading.html HTTP/1.1" 200 7953 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
    - - [27/Sep/2007:21:32:03 -0400] "GET /new/2007/08/book-report-leave-me-alone%0D%0A1bd5%0D%0A-im-reading.html HTTP/1.1" 404 1123 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
    - - [27/Sep/2007:21:33:03 -0400] "GET /new/2007/08/book-report-leave-me-alone-im-reading.html HTTP/1.1" 200 7953 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
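The two tells mentioned above (no referer reported, no pictures downloaded) can be turned into a crude filter. The following is only a sketch under assumptions: it expects Apache "combined" format lines, treats a handful of file extensions as "pictures", and uses made-up names (`suspicious_hosts`, `IMAGE_EXTS`) and placeholder addresses. Some real browsers suppress referers or cache images, so it will produce false positives.

```python
import re
from collections import defaultdict

# Assumed layout: host ident user [time] "METHOD path proto" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>[^"\s]+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical notion of "pictures that go with a page".
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif")

def suspicious_hosts(lines, page):
    """Hosts that fetched `page` but never sent a referer or fetched an image."""
    seen = defaultdict(lambda: {"page": False, "referer": False, "image": False})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        info = seen[m.group("host")]
        path = m.group("path")
        if path == page:
            info["page"] = True
        if m.group("referer") not in ("", "-"):
            info["referer"] = True
        if path.lower().endswith(IMAGE_EXTS):
            info["image"] = True
    return sorted(
        host for host, info in seen.items()
        if info["page"] and not info["referer"] and not info["image"]
    )
```

Calling `suspicious_hosts(open_log_lines, "/new/2007/08/book-report-leave-me-alone-im-reading.html")` on an iterable of log lines would list the clients that behave like the "visitors" described here; `open_log_lines` stands for however the log happens to be read.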

Could someone be trying to crack a web server by putting gobbledegook into the requested address, hoping to choke the web server program and trick it into... writing that gobbledegook into memory somewhere where it might get executed? It seems like there are other, easier ways to crack into systems, ways more likely to succeed. I have no idea what the story is behind these 26000 hits. If you know, or if you have an amusing theory, please drop me a line.
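For anyone who wants to poke at the gobbledegook, here is a small Python sketch that percent-decodes the garbled path from the log and isolates what was inserted relative to the real file name. The variable names are invented for the example. The inserted text turns out to be a carriage return and line feed, the characters 1bd5, and another carriage return and line feed. A hexadecimal number framed by CR LF pairs resembles the chunk-size lines used by HTTP chunked transfer encoding, but whether that is where it came from is only a guess.

```python
from urllib.parse import unquote

# Both paths are copied from the log records shown earlier in the post.
garbled = "/new/2007/08/book-report-leave-me-alone%0D%0A1bd5%0D%0A-im-reading.html"
clean = "/new/2007/08/book-report-leave-me-alone-im-reading.html"

decoded = unquote(garbled)  # turns %0D%0A into real CR and LF characters

# Length of the prefix shared by the decoded and the clean path.
prefix_len = 0
while (prefix_len < min(len(decoded), len(clean))
       and decoded[prefix_len] == clean[prefix_len]):
    prefix_len += 1

# Whatever sits between the shared prefix and the shared suffix was inserted.
suffix_len = len(clean) - prefix_len
inserted = decoded[prefix_len:len(decoded) - suffix_len]

print(repr(inserted))             # '\r\n1bd5\r\n'
print(int(inserted.strip(), 16))  # 7125, if 1bd5 is read as hexadecimal
```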


Posted 2007-10-23