What is Data Retention Policy?
Data retention policy is the formal protocol dictating how long scraped or ingested data is stored, where it resides during its lifecycle, and exactly when it is permanently destroyed. In scraping pipelines, retention isn't just about managing AWS S3 costs, it is a critical compliance boundary. Holding PII or copyrighted material longer than legally justified transforms a benign data lake into a massive liability.