What is a URL Frontier?
The URL frontier is the data structure at the heart of any crawler: the queue that holds discovered-but-not-yet-fetched URLs and determines which of them are crawled, in what order, and at what rate. A poorly designed frontier causes duplicate fetches, host hammering, priority inversion, and memory exhaustion, so much of a crawl's efficiency is decided in the frontier before a single request fires.
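To make those responsibilities concrete, here is a minimal in-memory sketch of a frontier in Python. It is an illustration under stated assumptions, not a reference design: the class name Frontier, the methods add and pop, the delay parameter, and the convention that a lower priority number means "crawl sooner" are all hypothetical choices made for this example. It shows three of the concerns named above: a seen-set to prevent duplicate fetches, a per-host cooldown to avoid host hammering, and priority ordering among the hosts that are currently allowed to be fetched.

```python
import heapq
import itertools
from urllib.parse import urlsplit


class Frontier:
    """Toy URL frontier. All names here are illustrative assumptions."""

    def __init__(self, delay=1.0):
        self.delay = delay                # assumed minimum seconds between fetches to one host
        self.seen = set()                 # every URL ever enqueued, for de-duplication
        self.pending = {}                 # host -> heap of (priority, seq, url)
        self.next_ok = {}                 # host -> earliest time the host may be hit again
        self.counter = itertools.count()  # tie-breaker that keeps equal priorities FIFO

    def add(self, url, priority=10):
        """Enqueue a URL once; return False if it was already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        host = urlsplit(url).netloc
        heap = self.pending.setdefault(host, [])
        heapq.heappush(heap, (priority, next(self.counter), url))
        return True

    def pop(self, now):
        """Return the best URL whose host is not cooling down, or None."""
        ready = [
            (heap[0], host)
            for host, heap in self.pending.items()
            if self.next_ok.get(host, 0.0) <= now
        ]
        if not ready:
            return None
        _, host = min(ready)              # lowest (priority, seq) among ready hosts
        _, _, url = heapq.heappop(self.pending[host])
        if not self.pending[host]:
            del self.pending[host]        # drop empty per-host queues
        self.next_ok[host] = now + self.delay
        return url


frontier = Frontier(delay=1.0)
frontier.add("https://a.example/1", priority=1)
frontier.add("https://a.example/2", priority=2)
frontier.add("https://b.example/1", priority=5)
frontier.add("https://a.example/1")       # duplicate, ignored

first = frontier.pop(now=0.0)             # best priority overall
second = frontier.pop(now=0.0)            # a.example is cooling down, so b.example is served
third = frontier.pop(now=0.0)             # nothing is ready yet
fourth = frontier.pop(now=1.0)            # a.example's cooldown has elapsed
```

The caller passes the current time into pop explicitly, which keeps the sketch deterministic and easy to test. Two simplifications are worth noting: the seen-set and the per-host queues grow without bound, which is exactly the memory-exhaustion risk mentioned above, and pop scans every host on each call. A production frontier would likely bound or spill these structures to disk and index ready hosts by their next allowed fetch time.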