What is Data Compaction?
Data compaction is the background process of merging thousands of small, incrementally written files into fewer, larger files optimized for read performance. In scraping pipelines that stream data continuously, the "small file problem" destroys query speed and inflates cloud storage API costs. Compaction reorganizes these fragmented writes into contiguous columnar blocks, trading background compute for massive downstream read efficiency.