What is Apache Avro?
Apache Avro is a row-based data serialization system that relies on JSON for defining data types and protocols, but serializes the actual data into a compact binary format. In scraping pipelines, it acts as the strict schema contract between the extraction workers and downstream message queues like Kafka. If a scraper hallucinates a string where an integer belongs, Avro catches it at the edge, preventing poisoned records from silently corrupting your data warehouse.