← Glossary / Fact Table

What is Fact Table?

Fact table is the central table in a dimensional model, storing the quantitative metrics and foreign keys of a business event. In a scraping context, it holds the actual numerical data you extracted—prices, inventory levels, follower counts—while descriptive attributes live in surrounding dimension tables. Get the granularity wrong here, and every downstream aggregation in your pipeline will silently report incorrect numbers.

Data EngineeringDimensional ModelingStar SchemaMetricsOLAP

// 02 — definitions

The center of
the star.

The massive, append-only ledger of quantitative events that drives analytical queries and downstream aggregations.

Ask a DataFlirt engineer →

TL;DR

A fact table stores the quantitative measurements of a business process, surrounded by descriptive dimension tables. In scraping pipelines, it's where the actual extracted metrics—prices, stock counts, review scores—live, while the product names and categories sit in dimensions.

01Definition & structure

A fact table is the primary table in a dimensional model. It consists of two types of columns: foreign keys that link to dimension tables, and numeric measures (facts) that quantify the event. Because they record events or snapshots over time, fact tables are typically append-only and contain the vast majority of rows in a data warehouse.

02Types of fact tables

There are three primary types:

Transactional: One row per event (e.g., a single purchase).
Periodic Snapshot: One row per entity per time period (e.g., daily scraped price of a SKU). Most common in scraping pipelines.
Accumulating Snapshot: One row per entity that updates as it moves through a workflow (e.g., order fulfillment steps).

03The importance of granularity

The "grain" of a fact table defines exactly what one row represents. For a scraped pricing table, the grain must be explicitly defined—for example, "one row per SKU, per retailer, per day." If you mix grains (e.g., storing daily prices and weekly average prices in the same table), downstream `SUM()` or `AVG()` queries will double-count and return catastrophic errors.

04How DataFlirt structures scraped facts

We design our delivery pipelines to output clean, periodic snapshot fact tables. When we scrape 5 million products daily, we don't dump a massive JSON blob into your warehouse. We separate the product metadata into a dimension table and append 5 million narrow, highly-compressed rows into the fact table, partitioned by the scrape date. This keeps your query costs low and your BI dashboards fast.

05Did you know?

You can have a fact table with absolutely no facts. A "factless fact table" contains only foreign keys. It is used to record that an event occurred or a relationship existed at a point in time. For example, tracking which products were featured on a homepage on a given day requires no numeric metrics, just the date ID, product ID, and placement ID.

// 03 — fact table sizing

How big will
the table get?

Fact tables grow linearly with event volume and are typically the largest tables in a data warehouse. DataFlirt models row growth to provision appropriate partitioning and storage tiers in Snowflake or BigQuery.

Row growth = R = events_per_day × retention_days

A daily snapshot of 1M SKUs generates 365M rows annually. Kimball Dimensional Modeling

Storage size = S = (R × avg_row_bytes) / compression_ratio

Columnar databases typically achieve 3x to 10x compression on numeric facts. Data Warehouse Sizing

Query scan cost = C = partitions_scanned × bytes_per_partition × rate

Why partitioning fact tables by date is mandatory for cost control. BigQuery Pricing Model

// 04 — fact insertion

Loading scraped metrics
into the warehouse.

A micro-batch insert of scraped e-commerce pricing data into a daily snapshot fact table, resolving surrogate keys from dimensions.

dbtSnowflakemicro-batch

edge.dataflirt.io — live

CAPTURED

-- inserting into fact_product_price_daily
INSERT INTO warehouse.marts.fact_product_price_daily
SELECT
d_date.date_id,
d_product.product_id,
d_retailer.retailer_id,
raw.extracted_price AS price_amount,
raw.extracted_stock AS stock_quantity,
raw.is_promoted AS promotion_flag
FROM warehouse.staging.stg_scraped_prices raw
LEFT JOIN warehouse.marts.dim_date d_date
ON raw.scrape_date = d_date.full_date
LEFT JOIN warehouse.marts.dim_product d_product
ON raw.sku = d_product.sku

-- execution result
status: SUCCESS
rows_inserted: 1,240,500
null_foreign_keys: 0 // referential integrity maintained

// 05 — design failures

Where fact tables
break down.

Ranked by frequency of occurrence in downstream data warehouses. Poor granularity choices and mixed grains are the most common causes of analytical failure.

PIPELINES AUDITED · · 150+ models

IMPACT · · · · · · Query accuracy

UPDATED · · · · · · 2026-05-19

01

Mixed granularity

fatal error · Mixing daily and weekly metrics in one table

02

Null foreign keys

data loss · Facts that cannot join to dimensions are orphaned

03

Updating facts

anti-pattern · Mutating historical facts instead of appending

04

Too many dimensions

centipede · Table with 50+ foreign keys degrades performance

05

Text in fact table

bloat · Storing long strings instead of dimension keys

// 06 — our architecture

Append only,

partitioned by scrape time, clustered by target.

DataFlirt delivers scraped metrics directly into fact tables optimized for OLAP workloads. We enforce strict grain definitions—one row per SKU per scrape event—ensuring that downstream aggregations never double-count and historical price changes are perfectly preserved. We handle the surrogate key lookups during the transformation phase, so the data lands in your warehouse ready for immediate BI consumption.

fact_product_price_daily

Table metadata for a standard retail pricing fact table delivered by DataFlirt.

table.type transactional fact

grain 1 row per SKU per day

partitioning scrape_date_iddaily

clustering retailer_id, category_id

row_count 1.4B rows

null_foreign_keys 0

late_arriving_sla < 15m

Stay ahead of the pipeline

Data engineering
intel, weekly.

Anti-bot shifts, scraping infrastructure updates, dataset delivery patterns, and business outcomes from our pipelines. Short, technical, no fluff.

// 07 — FAQ

Common
questions.

About fact table design, granularity, handling scraped metrics, and how DataFlirt models data for analytical workloads.

Ask us directly →

What is the difference between a fact table and a dimension table? +

A fact table contains the quantitative measurements (prices, quantities, counts) and foreign keys. It is deep and narrow, growing constantly. A dimension table contains the descriptive attributes (product names, categories, locations). It is shallow and wide, changing slowly. Facts are the "verbs" (a price was recorded); dimensions are the "nouns" (the product, the store).

Why not just put everything in one big table? +

One Big Table (OBT) is fine for small datasets or specific BI extracts, but it creates massive redundancy. If you store the product name, brand, and category on every single daily price record for a million SKUs, your storage costs explode and updating a category name requires mutating millions of rows. Fact tables normalize the descriptive data out to dimensions.

How do you handle null metrics in a fact table? +

If a metric is genuinely missing (e.g., a product was out of stock so no price was displayed), store a null. Do not store zero, as zero is a valid measurement that will skew averages. If the entire event didn't happen, do not insert a row at all. Fact tables should only record events that actually occurred.

Can DataFlirt deliver directly into our fact tables? +

Yes. We can deliver raw JSON/CSV to a data lake, or we can run the transformation layer (via dbt or SQL) to resolve your existing dimension surrogate keys and insert the clean metrics directly into your Snowflake, BigQuery, or Redshift fact tables.

What is a factless fact table? +

A fact table that contains only foreign keys and no numeric metrics. It is used to record events or coverage. In scraping, a factless fact table might record that a specific SKU was present on a specific category page on a specific day, even if no price or stock data was extracted. It tracks the relationship itself.

How do you handle late-arriving facts? +

When a scrape job is delayed or retried, the extracted data might arrive after the dimension processing for that day has finished. We use robust surrogate key generation (often hashing the natural key) and idempotent inserts so that late-arriving facts can be safely appended without creating duplicate records or orphaned foreign keys.

$ dataflirt scope --new-project --target=fact-table READY

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

Start a pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h