What is Apache ORC?
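Apache ORC (Optimized Row Columnar) is a strongly typed, self-describing columnar file format designed for large analytical workloads. For data engineering teams ingesting terabytes of scraped records, ORC can substantially reduce both storage costs and query latency. It groups rows into large stripes and stores each column of a stripe separately, so every column can be encoded and compressed on its own (for example with ZLIB, Snappy, LZ4, or ZSTD). Alongside the data it records lightweight statistics, such as per-column minimum and maximum values, at the file, stripe, and row-group level (a row group is 10,000 rows by default). Query engines read only the columns a query needs and use those statistics to skip stripes and row groups that cannot contain matching rows.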
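To make the stripe-and-statistics idea concrete, here is a minimal sketch using only the Python standard library. It is a hypothetical toy model, not the real ORC file layout or any ORC library's API: `Stripe`, `write_stripes`, and `scan` are illustrative names; each column chunk is JSON-encoded and zlib-compressed as a stand-in for ORC's type-specific encodings; stripes are sized by row count for simplicity (real ORC sizes them by bytes); and only stripe-level min/max statistics are modeled, whereas real ORC also keeps row-group indexes for finer-grained skipping.

```python
"""Toy model of ORC-style stripes with min/max statistics.

Hypothetical illustration only: this is NOT the ORC file format or an ORC API.
"""
from __future__ import annotations

import json
import zlib
from dataclasses import dataclass
from typing import Any


@dataclass
class Stripe:
    columns: dict[str, bytes]          # one compressed chunk per column
    stats: dict[str, tuple[Any, Any]]  # lightweight index: (min, max) per column
    num_rows: int


def write_stripes(rows: list[dict[str, Any]], stripe_rows: int) -> list[Stripe]:
    """Split rows into stripes, store each column separately, and record min/max."""
    stripes = []
    for start in range(0, len(rows), stripe_rows):
        chunk = rows[start:start + stripe_rows]
        columns, stats = {}, {}
        for name in chunk[0]:
            values = [row[name] for row in chunk]
            # Stand-in for ORC's type-specific encodings plus a compression codec.
            columns[name] = zlib.compress(json.dumps(values).encode())
            stats[name] = (min(values), max(values))
        stripes.append(Stripe(columns, stats, len(chunk)))
    return stripes


def scan(stripes: list[Stripe], column: str, lo: Any, hi: Any,
         project: str) -> tuple[list[Any], int]:
    """Return `project` values where lo <= column <= hi, plus the count of skipped stripes."""
    result, skipped = [], 0
    for stripe in stripes:
        s_min, s_max = stripe.stats[column]
        if s_max < lo or s_min > hi:
            skipped += 1  # statistics prove no row matches, so nothing is decompressed
            continue
        # Column pruning: only the filter column and the projected column are decoded.
        keys = json.loads(zlib.decompress(stripe.columns[column]))
        values = json.loads(zlib.decompress(stripe.columns[project]))
        result.extend(v for k, v in zip(keys, values) if lo <= k <= hi)
    return result, skipped


if __name__ == "__main__":
    rows = [{"id": i, "price": i * 2, "url": f"https://example.com/item/{i}"}
            for i in range(10_000)]
    stripes = write_stripes(rows, stripe_rows=1_000)
    prices, skipped = scan(stripes, "id", 2_500, 2_504, project="price")
    print(prices, skipped)  # [5000, 5002, 5004, 5006, 5008] 9
```

With 10 stripes of 1,000 rows each, the filter `2500 <= id <= 2504` overlaps only the third stripe's `(min, max)` range, so the other nine stripes are skipped without being decompressed, and the `url` column is never decoded at all. In practice you would write and query ORC through an established implementation (the Apache ORC Java or C++ libraries, or engines such as Hive, Spark, or Trino) rather than hand-rolling this logic.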