What is a Focused Crawler?
Focused crawling is a crawl strategy that restricts traversal to pages topically relevant to a predefined domain of interest, skipping outbound links that score below a relevance threshold. Instead of mapping the entire web graph, a focused crawler stays on-topic: one targeting product listings does not follow the "About us" or "Press" links that would waste quota and dilute your dataset. For data pipelines, it's the difference between 10k relevant records and 10k records of noise.
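To make the threshold idea concrete, here is a minimal sketch of a best-first focused crawl. Everything in it is an illustrative assumption rather than a reference implementation: the in-memory `SITE` link graph stands in for real fetching and parsing, `TOPIC_TERMS` and the keyword-based `relevance` scorer are toy placeholders (production crawlers often use a trained classifier), and the `threshold` of 0.5 is arbitrary.

```python
import heapq

# Hypothetical in-memory site: URL -> list of (anchor_text, target_url).
# A real crawler would fetch each page and extract its links instead.
SITE = {
    "/": [("Products", "/products"), ("About us", "/about"), ("Press", "/press")],
    "/products": [
        ("Product: blue widget", "/products/blue-widget"),
        ("Product: red widget", "/products/red-widget"),
        ("Careers", "/careers"),
    ],
    "/products/blue-widget": [("Related product: red widget", "/products/red-widget")],
    "/products/red-widget": [],
    "/about": [("Press", "/press")],
    "/press": [],
    "/careers": [],
}

# Assumed topic vocabulary for a product-listing crawl.
TOPIC_TERMS = ("product", "widget")


def relevance(anchor_text, url):
    """Toy scorer: fraction of topic terms found in the anchor text or URL."""
    text = f"{anchor_text} {url}".lower()
    hits = sum(1 for term in TOPIC_TERMS if term in text)
    return hits / len(TOPIC_TERMS)


def focused_crawl(seed, threshold=0.5, max_pages=100):
    """Visit pages best-first, never enqueueing links scored below threshold."""
    visited = []
    seen = {seed}
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, seed)]
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in SITE.get(url, []):
            if target in seen:
                continue
            score = relevance(anchor, target)
            if score >= threshold:  # off-topic links are dropped here
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited


if __name__ == "__main__":
    print(focused_crawl("/"))
```

With this toy graph, the crawl visits the home page, the product index, and the two product pages, while "/about", "/press", and "/careers" are never enqueued. The priority queue is a design choice rather than a requirement: a plain FIFO queue with the same threshold check would also stay on-topic, but best-first ordering spends a limited page budget on the highest-scoring links first.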