Don't crawl me so fast, bro

This weekend a developer contacted me to ask whether my crawler for Route285.com respects the Crawl-delay directive in robots.txt. It does.

I thought this was a cool bit of HTTP working well. My crawler sets its User-Agent header, so the site owner knew who to reach out to.

The main purpose of robots.txt is to disallow pages and directories, but it also supports a few other directives that some crawlers honor.
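For example, a hypothetical robots.txt that blocks a directory and asks crawlers to wait between requests might look like this:

```
User-agent: *
Disallow: /admin/
Crawl-delay: 10
```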

Crawl-delay is one I follow because the only other option for site owners is to disallow the bot from crawling at all.
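Here's a minimal sketch of how a crawler might honor that directive using Python's standard-library robotparser. The site URL and the "Route285Bot" user-agent string are placeholders, not the actual crawler's values:

```python
import time
import urllib.request
from urllib import robotparser

USER_AGENT = "Route285Bot"  # placeholder user-agent string

# Fetch and parse the site's robots.txt
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Use the Crawl-delay directive if present, otherwise fall back to a default pause
delay = rp.crawl_delay(USER_AGENT) or 1

for url in ["https://example.com/page1", "https://example.com/page2"]:
    if not rp.can_fetch(USER_AGENT, url):
        continue  # respect Disallow rules
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    time.sleep(delay)  # wait between requests per Crawl-delay
```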

By following the spirit of the web, I get to crawl the site and the site gets to make sure bots don't request too much too quickly. Win-win.
