207.46.12.178 - - [25/Dec/2011:22:21:57 -0400] "GET /new/atom.xml HTTP/1.1" 200 15073 "-" "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)"
The user agent string makes me think this might be a Microsoft bot... and it was coming from a machine in the search.msn.com domain, so... yeah. It was looking at atom.xml, a page that tells computers when I've updated my blog. Microsoft checked this special page... uhm, more than 60 times yesterday. If you're wondering why my millionth-hit log lines are usually from bots, that's because bots tend to visit often compared to humans. Humans have figured out that I update this blog once every few days. But bots are optimistic; they keep checking.
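If you'd like to pick apart a log line like that one yourself, here's a minimal Python sketch. It assumes the log is in Apache's "combined" format, which is what the line above looks like. The regular expression and the parse_line name are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    hit = m.groupdict()
    # Turn "25/Dec/2011:22:21:57 -0400" into a timezone-aware datetime.
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


line = (
    '207.46.12.178 - - [25/Dec/2011:22:21:57 -0400] '
    '"GET /new/atom.xml HTTP/1.1" 200 15073 "-" '
    '"msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)"'
)
hit = parse_line(line)
print(hit["request"], "|", hit["agent"])
```

The "agent" field is the part that hints at who's visiting; the "request" field shows that this visitor wanted atom.xml.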
The humans still visit this site about as much as they used to, but the bots don't come so much. If you remember back to the misty past of 2010, there was one of these "million" posts coming along once every few months. I skipped making a 19-million post because that 19-millionth hit came so soon after the 18-millionth hit that I hadn't thought of anything to say in the meantime. But then something changed. Traffic slowed down. And it took many months to go from 19 million to 20 million. But it happened; here we are.
I don't know what robot[s] stopped coming to this site last year, or what slowed down the rate of hits so much. I don't keep the old logs; that would feel creepy.
Maybe we can get an idea of the robots that visited the site yesterday by looking at the most "popular" user agent strings that show up in yesterday's logs.
- SISTRIX Crawler; http://crawler.sistrix.net/ Sistrix makes SEO (search engine optimization) tools. I'm not exactly sure why they think they need their own web crawler, but they do. Yesterday, it crawled hundreds of pages on my site that haven't changed in years. But I guess there's just one way to know for certain that those pages haven't changed recently: crawl them again. And again.
- YandexBot/3.0; +http://yandex.com/bots Yandex is a popular search engine in Russia. You might not think that Russian speakers would care much for my English-heavy articles. But I do get visits referred by Yandex for some image searches.
- bingbot/2.0; +http://www.bing.com/bingbot.htm It's the Bing search engine, checking hundreds of my old pages to see which of them I updated yesterday (none of them).
- Googlebot/2.1; +http://www.google.com/bot.html It's Googlebot, making similar checks, but fewer of them. And yet it seems to pick up on my site changes pretty quickly when they do happen.
- AhrefsBot/2.0; +http://ahrefs.com/robot/ Ahrefs is another SEO tool. According to their writeup, they look for links between sites. But yesterday, their robot wasn't just looking at my site's pages; it also looked at a bunch of my site's photos. Photos can't link to other things. I don't know whether that means that the bot's written description is misleading or the bot's poorly programmed.
- MSIE 7.0; Windows NT 5.2; .NET CLR 2.0; This almost looks like it could be someone on a Windows machine, except it's a little too generic. (Typical Windows users show up with a wide variety of user agent texts; there are many variations of Windows out there. You expect something longer and weirder than what we see here, something closer to MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB7.2; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.307) And all the hits using this user agent came from some advertising agency network. And if you look at the hits, it doesn't look like a human made them; the timing's weird. You expect a human to look at a page for some amount of time before clicking through to the next one, but this "MSIE 7.0" doesn't act human. I think this is a bot whose creator tried to make it look more human by supplying misleading user agent text.
- Google Web Preview These are almost hits from humans. Some Google users can see "previews" of results on a search results page. I got several dozen hits from Google as it "slurped" the pages so that it could render those previews. So humans triggered these hits... but the hits were still made by robots so that they could render previews.
- AppleWebKit/534.52.7 (KHTML, like Gecko) Version/5.1.2 Safari/534.52.7 I think these were humans!
- Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/) "Entireweb" is a search engine. I'd forgotten that it existed, but it's there, it's a thing. This is their web robot.
- msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm) Checked for blog updates more than 60 times yesterday.
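
A tally like the one above can be sketched in a few lines of Python. This is a rough sketch, assuming combined-format logs where the user agent is the last quoted field on each line. The top_agents name and the sample lines are made up for illustration (the IP addresses are documentation placeholders, not real visitors).

```python
import re
from collections import Counter

# Assumption: the user agent is the final double-quoted field on the line.
AGENT = re.compile(r'"([^"]*)"\s*$')


def top_agents(lines, n=10):
    """Count user agent strings and return the n most common as (agent, count)."""
    counts = Counter()
    for line in lines:
        m = AGENT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)


# Made-up sample lines, standing in for a day's log file.
sample = [
    '192.0.2.1 - - [25/Dec/2011:10:00:00 -0400] "GET /new/atom.xml HTTP/1.1" '
    '200 15073 "-" "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [25/Dec/2011:10:05:00 -0400] "GET /old/page.html HTTP/1.1" '
    '200 4096 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.1 - - [25/Dec/2011:10:20:00 -0400] "GET /new/atom.xml HTTP/1.1" '
    '200 15073 "-" "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)"',
]

for agent, count in top_agents(sample, 5):
    print(count, agent)
```

To run it on a real log, pass an open file instead of the sample list, since a file object yields one line at a time.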
I don't see Yahoo Slurp on this list. Wasn't there once a bot named Yahoo Slurp? I kind of remember that. Did Yahoo Slurp stop crawling? Maybe it stopped crawling in mid-2011, and that would explain the robot-traffic slowdown on my site?
Anyhow, if you're a human and you're reading this, you're more precious to me than all the bots combined, sure. (But sometimes the bots are kind of interesting.) Thanks for reading!