What is Data Archiving?
Data archiving is the automated process of moving historical, infrequently accessed scraped data from expensive hot storage (like PostgreSQL or Elasticsearch) to cost-effective cold storage (like S3 Glacier). It is not a backup: a backup is a copy kept for recovery, whereas archiving is a lifecycle transition for data that must be retained for compliance, longitudinal analysis, or model training, but no longer requires millisecond query latency. Without a strict archiving policy, a high-volume scraping pipeline will eventually crush its own database under the weight of its historical success.
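The lifecycle transition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the 90-day retention window, the `scraped_at` field, and the function names `split_by_age`, `write_archive`, and `archive` are all hypothetical, and a gzipped JSON Lines file on local disk stands in for the real cold-storage target.

```python
import gzip
import json
from datetime import datetime, timedelta

# Assumed policy for illustration: rows older than 90 days leave hot storage.
RETENTION = timedelta(days=90)


def split_by_age(rows, now: datetime, retention: timedelta = RETENTION):
    """Partition rows into (hot, cold) by their 'scraped_at' timestamp."""
    cutoff = now - retention
    hot = [r for r in rows if r["scraped_at"] >= cutoff]
    cold = [r for r in rows if r["scraped_at"] < cutoff]
    return hot, cold


def write_archive(rows, path):
    """Write rows as gzipped JSON Lines (a local stand-in for cold storage)."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for row in rows:
            # datetime is not JSON-serializable, so store it as ISO 8601 text.
            record = {**row, "scraped_at": row["scraped_at"].isoformat()}
            fh.write(json.dumps(record) + "\n")
    return len(rows)


def archive(rows, now: datetime, path, retention: timedelta = RETENTION):
    """Write cold rows to the archive, then return (hot, cold).

    The caller should delete the cold rows from the hot store only after
    this function returns, i.e. after the archive write has succeeded.
    """
    hot, cold = split_by_age(rows, now, retention)
    if cold:
        write_archive(cold, path)
    return hot, cold
```

The ordering is the design choice worth noting: the cold rows are written to the archive first and removed from the hot store second, so a failed write never loses data. In a real pipeline the write step would target object storage and the delete step would run against the database, but the shape of the policy stays the same.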