I thought this was a cool bit of HTTP working well. My crawler sets its
User-Agent header, so the site owner knew who to reach out to.
The main purpose of robots.txt is to
disallow pages and directories, but there are other directives that some crawlers honor.
Crawl-delay is one I follow, because the only other option site owners have is to disallow the bot from crawling at all.
By following the spirit of the web, I get to crawl the site, and the site gets assurance that bots won't request too much too quickly. Win-win.
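For what it's worth, Python's standard library can handle both of these directives. Here's a minimal sketch using `urllib.robotparser` with a made-up robots.txt and a hypothetical User-Agent string (in practice you'd fetch the real file from the site's `/robots.txt`):

```python
import urllib.robotparser

# Hypothetical robots.txt content; a real crawler would fetch this
# from the target site before making any other requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "MyCrawler/1.0"  # hypothetical crawler identifier

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Respect Disallow rules before fetching a URL.
allowed = parser.can_fetch(USER_AGENT, "https://example.com/public/page")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/page")

# Respect Crawl-delay: sleep this many seconds between requests
# (returns None if the site sets no delay).
delay = parser.crawl_delay(USER_AGENT)
```

A polite crawler would then `time.sleep(delay)` between requests to the same host whenever `delay` is not `None`.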