What is a Data Schema?
A data schema is the formal contract that defines the structure, types, and constraints of extracted records before they enter a data warehouse. In scraping pipelines, it acts as the defensive perimeter against upstream site changes. Without a strict schema, a drifted CSS selector can silently inject nulls or malformed strings into downstream analytics, breaking dashboards and ML models. A versioned schema that is enforced at ingestion makes the pipeline fail loudly when the target site changes, rather than silently passing bad records through.
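
To make the "fail loudly" idea concrete, here is a minimal sketch of schema enforcement using only the Python standard library. The field names (`url`, `title`, `price`, `rating`), the `SCHEMA_VERSION` tag, and the `validate` helper are all hypothetical, chosen for illustration. A real pipeline might well use a dedicated validation library instead.

```python
from dataclasses import dataclass

SCHEMA_VERSION = "1.0.0"  # hypothetical version tag, bumped on every schema change


class SchemaViolation(ValueError):
    """Raised when an extracted record does not match the schema."""


@dataclass(frozen=True)
class FieldSpec:
    type_: type            # expected Python type of the field
    required: bool = True  # whether a missing/null value is an error


# Hypothetical schema for a scraped product record.
PRODUCT_SCHEMA = {
    "url": FieldSpec(str),
    "title": FieldSpec(str),
    "price": FieldSpec(float),
    "rating": FieldSpec(float, required=False),
}


def validate(record: dict, schema: dict = PRODUCT_SCHEMA) -> dict:
    """Return the record unchanged if it conforms; raise SchemaViolation otherwise."""
    errors = []
    for name, spec in schema.items():
        value = record.get(name)
        if value is None:
            # A drifted selector typically shows up here as a null field.
            if spec.required:
                errors.append(f"{name}: missing or null")
            continue
        if not isinstance(value, spec.type_):
            errors.append(
                f"{name}: expected {spec.type_.__name__}, got {type(value).__name__}"
            )
        elif isinstance(value, str) and not value.strip():
            errors.append(f"{name}: empty string")
    # Fields the schema does not know about are also treated as a violation.
    unknown = set(record) - set(schema)
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    if errors:
        raise SchemaViolation(f"schema v{SCHEMA_VERSION}: " + "; ".join(errors))
    return record
```

In this sketch, a record whose `price` comes back as `None` or as a string such as `"N/A"` raises `SchemaViolation` before anything is loaded, so the failure surfaces at the scraper rather than in a dashboard. The type check is deliberately strict (an integer price is rejected, for example); whether to coerce or reject such values is a design choice the schema should state explicitly.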