What is Data Provenance?
Data provenance is the record of a dataset's origin, custody, and transformation history, ideally stored so that it cannot be altered after the fact. In web scraping, it answers four questions about each record: where it was fetched from, when it was extracted, which proxy IP was used, and which schema version parsed it. Without provenance metadata attached to every row, downstream data consumers cannot audit quality, debug pipeline failures, or demonstrate legal compliance when challenged.
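
As a rough illustration of what attaching provenance to every row might look like, the sketch below stores the four fields named above next to each scraped row. The class name `Provenance`, the helper `attach_provenance`, the `_provenance` key, and the example URL and IP are all hypothetical choices made for this example, not part of any particular scraping library.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Provenance:
    """Hypothetical per-row provenance metadata."""
    source_url: str       # where the record was fetched from
    fetched_at: str       # when it was extracted (ISO 8601, UTC)
    proxy_ip: str         # which proxy IP made the request
    schema_version: str   # which schema version parsed it


def attach_provenance(row, source_url, proxy_ip, schema_version, fetched_at=None):
    """Return a copy of `row` with provenance stored under `_provenance`."""
    if fetched_at is None:
        # Assumption: stamp the row at call time when no timestamp is given.
        fetched_at = datetime.now(timezone.utc)
    prov = Provenance(
        source_url=source_url,
        fetched_at=fetched_at.isoformat(),
        proxy_ip=proxy_ip,
        schema_version=schema_version,
    )
    # Build a new dict so the caller's original row is left unmodified.
    return {**row, "_provenance": asdict(prov)}


# Example usage with placeholder values.
record = attach_provenance(
    {"title": "Example product", "price": "19.99"},
    source_url="https://example.com/products/1",
    proxy_ip="203.0.113.7",
    schema_version="v2",
)
print(record["_provenance"]["source_url"])
```

Keeping the metadata under a single key leaves the scraped fields untouched, so consumers who do not need provenance can ignore it, while auditors can filter or group rows by URL, proxy, or schema version.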