What is Cost Per 1000 Requests (CPM)?

Cost Per 1000 Requests (CPM) is the standard unit economics metric for web scraping infrastructure. It aggregates the blended cost of compute, proxy bandwidth, anti-bot bypass, and egress into a single measurable figure per thousand HTTP requests. While simple HTTP GETs cost fractions of a cent, headless browser renders on residential IPs can push CPM into the dollars. Tracking CPM prevents silent margin erosion as target sites escalate their anti-bot defenses.

Unit Economics · FinOps · Proxy Bandwidth · Compute Cost · Anti-Bot Overhead
// 02 — definitions

Unit economics
of extraction.

Why measuring the cost of a single HTTP request is meaningless, and how aggregating at the thousand-request level exposes the true cost of your pipeline.

TL;DR

CPM (Cost Per Mille) normalizes the variable costs of scraping - proxies, compute, CAPTCHA solving, and storage - into a predictable metric. A pipeline fetching static HTML via datacenter IPs might run at a $0.02 CPM, while a Playwright script routing through premium residential proxies to bypass DataDome can easily exceed $3.50 CPM.

01 Definition & structure
Cost Per 1000 Requests (CPM) is the standard metric for evaluating the unit economics of a scraping pipeline. It is calculated by taking the total cost of a scraping job - including compute instances, proxy bandwidth, CAPTCHA solving services, and data egress - and dividing it by the number of thousands of requests made. It provides a normalized baseline to compare the efficiency of different scraping architectures.
02 The proxy bandwidth trap
Unlike datacenter proxies, which are often billed per IP, residential proxies are almost universally billed by bandwidth (per GB). This fundamentally changes scraping economics. If a target page is 2 MB and residential bandwidth costs $10/GB, 1,000 requests transfer about 2 GB, so the proxy cost alone is $20 per 1,000 requests. Optimizing payload size by blocking media and intercepting unnecessary network requests is the most effective way to lower CPM.
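The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not DataFlirt's billing code: the function name `proxy_cost_per_1000` is made up for this example, it assumes 1 GB = 1,000 MB, and the 0.2 MB "trimmed" payload is a hypothetical figure standing in for a page with media blocked.

```python
def proxy_cost_per_1000(avg_payload_mb: float, price_per_gb: float,
                        mb_per_gb: float = 1000.0) -> float:
    """Proxy bandwidth cost of 1,000 requests at a given average payload size."""
    gb_per_1000_requests = avg_payload_mb * 1000 / mb_per_gb
    return gb_per_1000_requests * price_per_gb


# The 2 MB page at $10/GB from the paragraph above.
full_page = proxy_cost_per_1000(avg_payload_mb=2.0, price_per_gb=10.0)

# Hypothetical payload after blocking images, fonts, and trackers.
trimmed_page = proxy_cost_per_1000(avg_payload_mb=0.2, price_per_gb=10.0)

print(f"full page:    ${full_page:.2f} per 1,000 requests")
print(f"trimmed page: ${trimmed_page:.2f} per 1,000 requests")
```

Because the cost is linear in payload size, any percentage cut in bytes transferred is the same percentage cut in the proxy portion of CPM.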
03 Headless vs. HTTP costs
The choice of fetch layer drastically impacts compute CPM. A lightweight HTTP client like aiohttp requires minimal RAM and CPU, allowing thousands of concurrent requests on a small server. Running a headless browser like Playwright requires significant memory and CPU per tab. Shifting a pipeline from an HTTP client to a headless browser to bypass a new JavaScript challenge will typically increase the compute portion of your CPM by 10x to 50x.
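To see where a multiplier in that range can come from, here is a rough Python sketch of compute CPM for a single instance. Every number in it is hypothetical (instance price, concurrency, latency), and `compute_cpm` is an illustrative helper, not a measured benchmark. It only shows that lower concurrency per instance and slower page loads multiply together.

```python
def compute_cpm(hourly_instance_cost: float, concurrency: int,
                seconds_per_request: float) -> float:
    """Compute cost per 1,000 requests for one instance at full concurrency."""
    requests_per_hour = concurrency / seconds_per_request * 3600
    return hourly_instance_cost / (requests_per_hour / 1000)


# Hypothetical: same instance price, very different workloads.
http_cpm = compute_cpm(hourly_instance_cost=0.10, concurrency=200,
                       seconds_per_request=1.0)
headless_cpm = compute_cpm(hourly_instance_cost=0.10, concurrency=10,
                           seconds_per_request=2.0)

print(f"HTTP client compute CPM: ${http_cpm:.5f}")
print(f"Headless compute CPM:    ${headless_cpm:.5f}")
print(f"Ratio: {headless_cpm / http_cpm:.0f}x")
```

With these assumed inputs, 20x fewer concurrent sessions and 2x longer requests give a 40x compute CPM ratio. Your own ratio depends on measured concurrency and latency.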
04 How DataFlirt handles it
We treat CPM optimization as a core engineering discipline. Our infrastructure automatically downgrades requests to the cheapest viable proxy tier - attempting datacenter IPs first, escalating to ISP proxies, and only using premium residential IPs when strictly necessary. We also deploy aggressive network interception rules to block tracking scripts, fonts, and images before they consume residential bandwidth, keeping our blended CPM well below industry averages.
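The "cheapest viable tier first" idea can be sketched as a simple escalation loop. This is an assumption-laden illustration, not DataFlirt's actual router: `fetch_with_escalation`, `PROXY_TIERS`, and `fake_fetch` are hypothetical names, and the fake fetcher simply pretends that the target blocks datacenter IPs.

```python
from typing import Callable, Optional, Tuple

# Tiers ordered from cheapest to most expensive (names are illustrative).
PROXY_TIERS = ["datacenter", "isp", "residential"]


def fetch_with_escalation(
    url: str, fetch: Callable[[str, str], Optional[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each tier in cost order; return (tier, body) for the first success."""
    for tier in PROXY_TIERS:
        body = fetch(url, tier)
        if body is not None:
            return tier, body
    return None, None


def fake_fetch(url: str, tier: str) -> Optional[str]:
    # Stand-in for a real HTTP call: pretend the target blocks datacenter IPs.
    return None if tier == "datacenter" else f"<html>fetched via {tier}</html>"


tier_used, body = fetch_with_escalation(
    "https://shop.example.com/category/1", fake_fetch
)
print(tier_used)  # isp
```

A production router would likely also remember which tier last worked per target, so it does not pay for a failed datacenter attempt on every request.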
05 Did you know?
A 403 Forbidden response often costs more than a 200 OK. When a request is blocked, the scraper typically executes retry logic, rotating to a new proxy IP and attempting the request again. If a target site increases its anti-bot sensitivity, your success rate drops, but your total requests (and therefore your proxy bandwidth and compute time) spike as the system struggles to hit its extraction targets.
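A short Python sketch makes the retry effect concrete. It assumes each attempt succeeds independently with the same probability and that the scraper stops after a fixed retry cap. Both are simplifications, and `expected_attempts` is an illustrative helper rather than anything from DataFlirt's tooling.

```python
def expected_attempts(success_rate: float, max_attempts: int) -> float:
    """Expected requests sent per target URL when retrying until success or the cap.

    Sum of a truncated geometric series: 1 + q + q^2 + ... + q^(n-1), q = 1 - p.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    failure_rate = 1 - success_rate
    return (1 - failure_rate ** max_attempts) / success_rate


for rate in (0.95, 0.70, 0.40):
    attempts = expected_attempts(rate, max_attempts=5)
    print(f"success rate {rate:.0%}: {attempts:.2f} requests per URL")
```

Each extra attempt consumes proxy bandwidth and compute time, so the requests-per-URL figure scales the bill directly even though the number of extracted records stays the same.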
// 03 — the math

How do you
calculate CPM?

True CPM requires fully burdened costs. DataFlirt's billing engine calculates this dynamically, factoring in retry overhead and bandwidth consumption per target.

Blended CPM = (Compute + Proxy + Bypass) / (Total_Requests / 1000)
The baseline cost of executing 1,000 requests, regardless of success. (Standard FinOps model)
Proxy Cost Contribution = Avg_Payload_MB × Cost_per_MB × 1000
For residential proxies billed by bandwidth, payload size dictates the CPM floor. (Infrastructure planning)
Effective CPM = Raw_CPM / Success_Rate
A $1.00 CPM with a 50% success rate is actually a $2.00 Effective CPM. (DataFlirt pipeline metrics)
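The blended and effective formulas above translate directly into Python. This is a minimal sketch: the function names mirror the formulas, and the individual cost inputs are hypothetical, chosen so the result reproduces the $1.00 raw / $2.00 effective example.

```python
def blended_cpm(compute: float, proxy: float, bypass: float,
                total_requests: int) -> float:
    """Blended CPM = (Compute + Proxy + Bypass) / (Total_Requests / 1000)."""
    return (compute + proxy + bypass) / (total_requests / 1000)


def effective_cpm(raw_cpm: float, success_rate: float) -> float:
    """Effective CPM = Raw_CPM / Success_Rate."""
    return raw_cpm / success_rate


# Hypothetical cost split totalling $10 across 10,000 requests.
raw = blended_cpm(compute=2.0, proxy=6.0, bypass=2.0, total_requests=10_000)
effective = effective_cpm(raw, success_rate=0.5)

print(f"raw CPM:       ${raw:.2f}")
print(f"effective CPM: ${effective:.2f}")
```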
// 04 — cost trace

A $2.40 CPM pipeline,
itemized.

Cost breakdown for a 10,000 request batch against a Cloudflare-protected e-commerce target using residential proxies and headless rendering.

Playwright · Residential Proxy · 10k batch
// Batch initialization
target: "https://shop.example.com/category/*"
requests_attempted: 10,000
requests_successful: 9,842

// Compute costs (AWS Fargate)
compute.duration: 14,200s
compute.cost: $4.12

// Proxy bandwidth (Residential pool)
proxy.egress: 1.42 GB
proxy.rate: $12.00 / GB
proxy.cost: $17.04

// Anti-bot & bypass
bypass.captcha_solved: 412
bypass.cost: $2.88

// Final Unit Economics
total_cost: $24.04
cpm_raw: $2.40
cpm_effective: $2.44 // burdened by 1.58% failure rate
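The figures in this trace can be re-derived from its inputs. The following Python snippet is only a sanity check of the arithmetic, with the values copied from the trace above. The dictionary keys are illustrative and do not represent a DataFlirt API.

```python
trace = {
    "requests_attempted": 10_000,
    "requests_successful": 9_842,
    "compute_cost": 4.12,
    "proxy_egress_gb": 1.42,
    "proxy_rate_per_gb": 12.00,
    "bypass_cost": 2.88,
}

proxy_cost = trace["proxy_egress_gb"] * trace["proxy_rate_per_gb"]
total_cost = trace["compute_cost"] + proxy_cost + trace["bypass_cost"]

cpm_raw = total_cost / (trace["requests_attempted"] / 1000)
success_rate = trace["requests_successful"] / trace["requests_attempted"]
cpm_effective = cpm_raw / success_rate

print(f"proxy.cost:    ${proxy_cost:.2f}")
print(f"total_cost:    ${total_cost:.2f}")
print(f"cpm_raw:       ${cpm_raw:.2f}")
print(f"cpm_effective: ${cpm_effective:.2f}")
print(f"failure rate:  {1 - success_rate:.2%}")
```

Proxy bandwidth accounts for roughly 71% of the total here ($17.04 of $24.04), which is why payload size is the first lever to pull.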
// 05 — cost drivers

Where the budget
actually goes.

Ranked by their contribution to total CPM across DataFlirt's managed pipelines. Proxy bandwidth dominates modern scraping costs due to the shift toward residential IPs.

DATASET: 1.2B requests
WINDOW: 30d trailing
UPDATED: 2026-05-19
01 Residential proxy bandwidth
Payload size × per-GB cost · The largest variable expense in modern scraping

02 Headless compute
Memory & CPU · Running Chromium is 10x-50x more expensive than HTTP clients

03 Anti-bot bypass
CAPTCHA & premium routing · Spikes unpredictably when targets update WAF rules

04 Retries and failures
Wasted bandwidth · A 403 Forbidden still costs proxy bandwidth

05 Egress and storage
S3 / Snowflake costs · Moving extracted data to the final destination
// 06 — FinOps

Stop paying for HTML,

start paying for data.

When you manage your own infrastructure, your CPM is highly volatile. A target deploys a new WAF rule, your retry rate spikes, and your proxy bill doubles overnight. DataFlirt abstracts this volatility. We optimize the payload, block media assets at the network layer, and route requests through the cheapest viable proxy tier. You pay a predictable rate for successful records, and we absorb the infrastructure variance.

Pipeline FinOps Profile

Live cost metrics for a high-volume retail catalog pipeline.

pipeline.id retail-catalog-04
target.waf DataDome
proxy.tier hybrid-residential
media.blocked true
payload.avg 142 KB
cpm.current $1.85
cpm.trend -12% (30d)

// 07 — FAQ

Common
questions.

Common questions about scraping unit economics, cost optimization, and how DataFlirt manages infrastructure spend.

Why use CPM instead of Cost Per Record?
CPM measures infrastructure efficiency; Cost Per Record measures business value. You need both. A highly optimized pipeline with a $0.50 CPM is useless if it's extracting empty records, resulting in an infinite Cost Per Record. Conversely, a high CPM is acceptable if the data extracted is highly valuable. We track both metrics simultaneously.
How much does a typical HTTP request cost?
It varies wildly by architecture. Datacenter IPs with simple HTTP clients (like aiohttp or Go's net/http) cost $0.01 to $0.05 per 1000 requests. Residential proxies combined with headless browsers (Playwright/Puppeteer) cost $1.00 to $5.00+ CPM. The goal is always to use the cheapest architecture that successfully bypasses the target's anti-bot stack.
How can I lower my proxy bandwidth costs?
Block images, fonts, and CSS at the proxy or browser level. Send an Accept-Encoding header that advertises gzip or brotli so responses arrive compressed. Abort requests for third-party analytics scripts. Because residential proxies charge by the gigabyte, every kilobyte you avoid downloading lowers your CPM directly.
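For headless pipelines, this kind of blocking can be sketched as a Playwright route handler. Treat the following as an illustration under stated assumptions: `should_block`, `handle_route`, and the two blocklists are hypothetical names for this example, and the analytics hosts listed are only samples. The handler relies on Playwright's `route.request`, `route.abort()`, and `route.continue_()`. Playwright itself is not imported, so the filtering logic can be tested on its own.

```python
from urllib.parse import urlparse

# Resource types as reported by Playwright's request.resource_type.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}

# Hypothetical sample of third-party analytics hosts; extend per target.
BLOCKED_HOSTS = {"www.google-analytics.com", "www.googletagmanager.com"}


def should_block(resource_type: str, url: str) -> bool:
    """Return True when a request is not needed to extract the page's data."""
    host = urlparse(url).hostname or ""
    return resource_type in BLOCKED_RESOURCE_TYPES or host in BLOCKED_HOSTS


def handle_route(route) -> None:
    """Route handler in the shape Playwright's page.route() accepts."""
    request = route.request
    if should_block(request.resource_type, request.url):
        route.abort()      # never downloaded, so no proxy bandwidth is billed
    else:
        route.continue_()  # let documents, XHR, and first-party scripts through


# Wiring with Playwright's sync API (not executed here):
#   page.route("**/*", handle_route)
```

Blocking stylesheets can change how some pages render, so it is worth confirming that the fields you extract still appear before enabling it pipeline-wide.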
Does DataFlirt charge by CPM or by record?
We offer both models. For raw infrastructure APIs where you control the extraction, we charge by CPM. For managed data feeds where we handle the end-to-end pipeline, we charge per successful record. In the latter model, we absorb the CPM variance ourselves, giving you predictable data costs.
Why did my CPM spike suddenly?
Usually, the target site has increased its anti-bot sensitivity. This triggers more retries and more CAPTCHA challenges, and forces your scraper to consume more residential bandwidth to get the same amount of data. A sudden CPM spike is often the earliest indicator of a silent pipeline failure.
Is it legal to scrape if it costs the target money?
Scraping consumes target bandwidth and compute. While scraping public data is generally lawful, aggressive scraping that causes material financial harm to the target's infrastructure can trigger Computer Fraud and Abuse Act (CFAA) claims or trespass-to-chattels claims. Rate limiting and respecting robots.txt are legal and financial necessities, not just politeness.