What is Crawl Exclusion?
Crawl exclusion is the set of mechanisms, both technical and legal, that publishers use to keep automated agents from accessing, indexing, or extracting specific paths on a domain. robots.txt is the most common mechanism, but it is advisory: it states the publisher's wishes and relies on the crawler to comply. Exclusions also appear as HTTP response headers (for example X-Robots-Tag), robots meta tags in the HTML, and WAF rules that block requests outright. For scraping engineers, honoring exclusions is more than politeness; it is the baseline requirement for maintaining IP reputation and avoiding permanent bans of crawling infrastructure.
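To make the two most common signals concrete, here is a minimal sketch of how a crawler might check a robots.txt rule and an exclusion header before using a page. It relies only on Python's standard `urllib.robotparser`. The robots.txt body, the `example-bot` user agent, the example.com URLs, and the helper names `build_parser` and `header_excludes_indexing` are assumptions made for illustration. The header check is deliberately simplified and ignores agent-scoped values such as `googlebot: noindex`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. A real crawler would fetch this from
# https://<host>/robots.txt before requesting any other path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


def header_excludes_indexing(headers: dict) -> bool:
    """Return True if an X-Robots-Tag header carries 'noindex' or 'none'.

    Simplified: agent-scoped values ("googlebot: noindex") are not handled.
    """
    value = ""
    for name, raw in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            value = raw.lower()
    directives = {part.strip() for part in value.split(",")}
    return bool(directives & {"noindex", "none"})


parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch("example-bot", "https://example.com/private/report")
allowed = parser.can_fetch("example-bot", "https://example.com/articles/1")
```

In this sketch `blocked` is `False` and `allowed` is `True`, so the crawler would skip the first URL and fetch the second. A page that passes the robots.txt check can still opt out through its response headers, which is why the header check runs separately after the fetch.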