Don't crawl me so fast, bro
This weekend a developer contacted me to ask whether my crawler for Route285.com respects the Crawl-delay
directive in robots.txt. It does.
I thought this was a cool bit of HTTP working well. My crawler sets its User-Agent
header, so the site owner knew exactly who to reach out to.
The main purpose of robots.txt is to disallow
pages and directories, but there are a few other directives that some crawlers honor.
Crawl-delay
is one I follow, because otherwise a site owner's only recourse is to disallow the bot from crawling at all.
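For illustration, a robots.txt combining the two might look like this (the path and the ten-second delay are made-up values, not Route285.com's actual rules):

```
User-agent: *
Disallow: /admin/
Crawl-delay: 10
```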
By following the spirit of the web, I get to crawl the site and the site gets to make sure bots don't request too much too quickly. Win-win.
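For the curious, here's a minimal sketch of how a crawler can honor the directive using Python's standard urllib.robotparser. The user agent and URLs below are placeholders, not my crawler's actual values:

```python
import time
from urllib.robotparser import RobotFileParser

# Placeholder names for this sketch, not my crawler's real config.
USER_AGENT = "Route285Bot"
ROBOTS_URL = "https://example.com/robots.txt"

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the site's robots.txt

# crawl_delay() returns the Crawl-delay value for our agent,
# or None if the site didn't set one; fall back to a polite default.
delay = rp.crawl_delay(USER_AGENT) or 1.0

for url in ("https://example.com/", "https://example.com/some-page"):
    if rp.can_fetch(USER_AGENT, url):
        # fetch(url) would go here
        time.sleep(delay)  # pause between requests to stay under the site's limit
```

A real crawler would usually track the time of the last request per host instead of sleeping unconditionally, but the idea is the same: read the delay once, then space out requests accordingly.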