New: Milestone: 27 Million Hits

Wow, it's the site's 27 millionth hit. As usual, these "hits" aren't a measure of humans visiting pages; that count would be much lower. They're just requests to the website: every time a robot visits some page, the count goes up. If a human views a page that contains a dozen graphics, those graphics cause another dozen hits. So it's not as impressive as it sounds. But it's easy to measure, so that's what I measure. We can take a look at the log:

- - [04/Mar/2015:16:16:39 -0400] "GET /new/2010/12/04/charitable-giving-for-people-with-tiny-mailboxes/ HTTP/1.0" 200 4051 "-" "CCBot/2.0 (http://commoncrawl.org/faq/)"
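For the curious, here's a rough sketch of how a log line like that one could be picked apart and counted. It assumes the server writes something like the common "combined" log format; the names `LOG_PATTERN`, `parse_line`, `looks_like_bot`, and `count_hits` are made up for illustration, and the bot test is a crude guess based on the user-agent string, not how I actually tally hits. The host field is treated as optional because it's missing from the line as shown.

```python
import re

# Assumed layout: [host] ident user [time] "request" status size "referer" "agent"
# The host is optional here because the sample line above starts at "- -".
LOG_PATTERN = re.compile(
    r'^(?:(?P<host>\S+) )?(?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical heuristic: substrings that often show up in crawler user agents.
BOT_HINTS = ("bot", "crawl", "spider")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def looks_like_bot(agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def count_hits(lines):
    """Count every parsable request as a hit; also count the likely bots."""
    total = bots = 0
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue  # skip lines that don't fit the assumed format
        total += 1
        if looks_like_bot(fields["agent"]):
            bots += 1
    return total, bots


SAMPLE = (
    '- - [04/Mar/2015:16:16:39 -0400] '
    '"GET /new/2010/12/04/charitable-giving-for-people-with-tiny-mailboxes/ HTTP/1.0" '
    '200 4051 "-" "CCBot/2.0 (http://commoncrawl.org/faq/)"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE)["agent"])
    print(count_hits([SAMPLE]))
```

Note that `count_hits` counts the robot's request as a hit just like any other, which is exactly why the big number is less impressive than it sounds.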

This is not a human; it's a bot. It claims to be working for commoncrawl.org. There are many crawlers on the web: computer programs that try to download a copy of (much of) the web by fetching some web pages, seeing what other pages those pages link to, downloading those pages, and thus "crawling" from page to page. Many organizations do this; it's a little silly, a kind of wasted effort. Common Crawl crawls the web to get a copy and gives copies away to people. So if you're, say, a grad student studying link patterns on the web, you don't need to crawl it yourself; instead, get a copy from Common Crawl and start studying.
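The fetch-a-page, follow-its-links loop described above can be sketched in a few lines. This is a toy under stated assumptions, not how Common Crawl (or anyone serious) does it: `crawl`, `LinkExtractor`, and the `fetch` callback are hypothetical names, and the caller supplies `fetch` so the sketch doesn't touch the network on its own. A real crawler would also need to respect robots.txt, pace its requests, and cope with far messier HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    that returns the page's HTML as a string, or None if it can't be fetched.
    Returns a dict mapping each fetched URL to its HTML.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link or fetch failure; move on
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # A tiny made-up "web" held in memory, so no real site gets hit.
    fake_web = {
        "http://example.test/": '<a href="/a">a</a> <a href="/b">b</a>',
        "http://example.test/a": '<a href="/">home</a>',
        "http://example.test/b": '<a href="/missing">gone</a>',
    }
    for page_url in sorted(crawl("http://example.test/", fake_web.get)):
        print(page_url)
```

The `seen` set is what keeps the crawler from going in circles when pages link back to each other, and `max_pages` is there because the real web doesn't run out.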

I'm glad that Common Crawl visited my site so that plenty of grad students can try to make sense of it.

Tags: million site research
