Don't crawl me so fast, bro
This weekend a developer contacted me to ask whether my crawler for Route285.com respects the Crawl-delay
directive in robots.txt. It does.
I thought this was a cool bit of HTTP working well. My crawler sets its User-Agent
header, so the site owner knew exactly who to reach out to.
The main purpose of robots.txt is to disallow
pages and directories, but there are a few other directives that some crawlers honor.
Crawl-delay
is one I follow, because otherwise a site owner's only recourse is to disallow the bot from crawling at all.
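For illustration, a robots.txt combining the two might look like this (the path and the ten-second delay are made-up values, not Route285.com's actual rules):

```
User-agent: *
Disallow: /admin/
Crawl-delay: 10
```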
By following the spirit of the web, I get to crawl the site and the site gets to make sure bots don't request too much too quickly. Win-win.
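For the curious, here's a minimal sketch of how a crawler can honor the directive using Python's standard urllib.robotparser. The user agent and URLs below are placeholders, not my crawler's actual values:

```python
import time
from urllib.robotparser import RobotFileParser

# Placeholder names for this sketch, not my crawler's real config.
USER_AGENT = "Route285Bot"
ROBOTS_URL = "https://example.com/robots.txt"

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the site's robots.txt

# crawl_delay() returns the Crawl-delay value for our agent,
# or None if the site didn't set one; fall back to a polite default.
delay = rp.crawl_delay(USER_AGENT) or 1.0

for url in ("https://example.com/", "https://example.com/some-page"):
    if rp.can_fetch(USER_AGENT, url):
        # fetch(url) would go here
        time.sleep(delay)  # pause between requests to stay under the site's limit
```

A real crawler would usually track the time of the last request per host instead of sleeping unconditionally, but the idea is the same: read the delay once, then space out requests accordingly.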