What is Data Lineage?
Data lineage is the complete historical record of a dataset's lifecycle — mapping its origin, transformations, and movements from raw extraction to final delivery. In scraping pipelines, it answers the critical question of why a specific field holds a specific value. Without lineage, debugging a downstream anomaly means guessing which extraction rule, type coercion, or deduplication step corrupted the record. With it, you have a deterministic audit trail that makes schema drift and parsing errors immediately traceable.