
What is Network Bandwidth Per Job?

Network Bandwidth Per Job is the total volume of ingress and egress data transferred during a single execution of a scraping pipeline. It encompasses HTML payloads, JSON API responses, downloaded assets, proxy overhead, and TLS handshake overhead. Because residential proxy networks charge per gigabyte, unoptimized bandwidth consumption directly erodes the unit economics of a data pipeline.

Scraping Performance · Proxy Economics · Payload Optimization · Egress Costs · Data Engineering
// 02 — definitions

Measure every
byte.

The hidden cost driver of web scraping, where uncompressed payloads and redundant asset downloads turn profitable pipelines into financial liabilities.


TL;DR

Network bandwidth per job dictates the variable cost of a scraping run. While compute is cheap, residential proxy bandwidth is expensive. A pipeline that downloads 2MB of images to extract 400 bytes of JSON will burn through proxy budgets 5,000x faster than an optimized equivalent, making payload minimization a critical engineering discipline.

01 Definition & structure
Network Bandwidth Per Job measures the total data transferred across the network during a scraping pipeline's execution. It includes the ingress (data downloaded from the target, such as HTML, JSON, images, and scripts) and egress (HTTP request headers, POST bodies, and data written to external storage). Because premium proxy networks bill by the gigabyte, bandwidth is the primary variable cost in modern web scraping.
02 How it works in practice
When a scraper requests a page through a headless browser, it doesn't just download the text. A default headless browser downloads the HTML, parses it, and immediately requests every linked image, stylesheet, font, and JavaScript bundle. A single product page might weigh 15 MB. Across 100,000 pages, that is 1.5 TB of data. At $5/GB for residential proxies, the unoptimized job costs $7,500, whereas an optimized job that blocks media and averages about 300 KB per page transfers roughly 30 GB and costs about $150.
03 The proxy pricing trap
Many engineers build scrapers on local machines or cloud VMs where bandwidth is effectively unmetered, which encourages sloppy network habits. When the scraper is deployed to production and routed through a residential proxy pool to avoid IP bans, the per-gigabyte billing model suddenly applies. Failing to track network bandwidth per job is a common reason scraping projects exceed their infrastructure budgets in the first month.
04 How DataFlirt handles it
We treat bandwidth as a strict engineering constraint. Our extraction workers use aggressive request interception to abort non-essential assets at the network layer. We send Accept-Encoding: br, gzip on all requests, and we use hybrid proxy routing: heavy, low-risk assets are fetched via cheap datacenter IPs, and premium residential bandwidth is reserved strictly for high-risk API endpoints and HTML documents.
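The hybrid routing idea can be sketched as a small decision function. This is only an illustration under stated assumptions, not DataFlirt's actual routing layer: the tier labels, the `NON_ESSENTIAL` and `API_TYPES` sets, and the `high_risk` flag are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical tier labels; real proxy pools are configured per provider.
DATACENTER = "datacenter"
RESIDENTIAL = "residential"
BLOCK = "block"

# Assumption: resource types treated as non-essential and aborted outright.
NON_ESSENTIAL = {"image", "media", "font"}
# Assumption: resource types that represent API calls made by the page.
API_TYPES = {"xhr", "fetch"}


@dataclass(frozen=True)
class Request:
    url: str
    resource_type: str  # e.g. "document", "xhr", "script", "image"
    high_risk: bool     # hypothetical flag: endpoint is known to ban datacenter IPs


def choose_route(req: Request) -> str:
    """Return the proxy tier for a request, or BLOCK to abort it."""
    if req.resource_type in NON_ESSENTIAL:
        return BLOCK
    if req.resource_type == "document":
        return RESIDENTIAL  # HTML documents go through the premium pool
    if req.resource_type in API_TYPES and req.high_risk:
        return RESIDENTIAL  # high-risk API endpoints also need residential IPs
    return DATACENTER       # everything else is fetched over cheap IPs


print(choose_route(Request("https://example.com/p/1", "document", False)))
print(choose_route(Request("https://example.com/app.css", "stylesheet", False)))
```

Keeping the decision in a pure function makes the policy easy to unit test before it is wired into a real proxy client.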
05 Did you know?
Simply forgetting to include the Accept-Encoding header in your HTTP requests can increase your HTML payload size by 300% to 800%. Without this header, servers typically send the uncompressed body instead of a Gzip- or Brotli-compressed response, multiplying your proxy costs for zero benefit.
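To get a feel for the effect of compression, the sketch below gzips a synthetic HTML body with Python's standard library and reports the size ratio. The sample markup is invented for illustration and is far more repetitive than a real page, so the ratio it prints overstates what a production target would show; measure real responses before drawing conclusions.

```python
import gzip


def compression_savings(body: bytes) -> tuple[int, int, float]:
    """Return (raw size, gzip size, raw/gzip ratio) for a response body."""
    raw = len(body)
    compressed = len(gzip.compress(body))
    return raw, compressed, raw / compressed


# Synthetic, repetitive markup standing in for a product listing page.
sample_html = (
    "<html><body>"
    + "".join(
        f'<div class="product-card"><span class="price">{i}.99</span></div>'
        for i in range(2000)
    )
    + "</body></html>"
).encode("utf-8")

raw, compressed, ratio = compression_savings(sample_html)
print(f"raw={raw} B, gzip={compressed} B, ratio={ratio:.1f}x")

# Header a client would send to request compressed responses.
HEADERS = {"Accept-Encoding": "br, gzip"}
```

Only advertise `br` if the HTTP client can actually decode Brotli; otherwise send `gzip` alone.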
// 03 — the math

How expensive
is a scrape?

Bandwidth isn't just the HTML body. It includes headers, TLS negotiation, and proxy protocol overhead. DataFlirt models total byte cost per target to optimize routing.

Total Job Bandwidth: B_total = Σ (req_bytes + res_bytes + tls_overhead)
Includes all HTTP headers, request and response bodies as transferred (compressed or not), and proxy tunneling bytes. [Network Layer Analysis]
Proxy Cost Per Job: C_job = (B_total / 1024³) × cost_per_gb
Residential proxy bandwidth typically costs $3 to $15 per GB. [Proxy Economics]
Payload Efficiency Ratio: E = extracted_bytes / B_total
DataFlirt targets E > 0.05 for HTML targets and E > 0.4 for JSON APIs. [DataFlirt SLO]
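The three formulas translate directly into a few lines of Python. The sketch below is a minimal illustration: `RequestRecord` and the sample byte counts are hypothetical, and `bytes_per_gb` is exposed as a parameter because the formula divides by 1024³ while some providers bill in decimal gigabytes (10⁹ bytes).

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    req_bytes: int         # request line, headers, and body sent
    res_bytes: int         # response headers and body received
    tls_overhead: int = 0  # handshake and tunneling bytes attributed to this request


def total_bandwidth(records: list[RequestRecord]) -> int:
    """B_total: sum of request, response, and TLS bytes over all requests."""
    return sum(r.req_bytes + r.res_bytes + r.tls_overhead for r in records)


def proxy_cost(b_total: int, cost_per_gb: float, bytes_per_gb: int = 1024**3) -> float:
    """C_job: bandwidth converted to gigabytes and multiplied by the per-GB price."""
    return b_total / bytes_per_gb * cost_per_gb


def payload_efficiency(extracted_bytes: int, b_total: int) -> float:
    """E: fraction of transferred bytes that ends up in the extracted record."""
    if b_total <= 0:
        raise ValueError("b_total must be positive")
    return extracted_bytes / b_total


# Hypothetical single-page job: one HTML document plus one hydration script.
records = [
    RequestRecord(req_bytes=800, res_bytes=142_000, tls_overhead=5_000),
    RequestRecord(req_bytes=600, res_bytes=210_000),
]
b_total = total_bandwidth(records)
print(b_total, proxy_cost(b_total, cost_per_gb=5.0), payload_efficiency(400, b_total))
```

Logging one record per request, rather than one total per job, makes it possible to see which resource types dominate the bill.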
// 04 — network trace

Where the bytes
actually go.

A bandwidth profile of a single product page scrape using a headless browser through a residential proxy, before and after resource blocking.

Playwright · Residential Proxy · Resource Blocking
edge.dataflirt.io — live
CAPTURED
// unoptimized run (default browser)
html_document: 142 KB
javascript_bundles: 3.4 MB // 14 files
images_media: 8.2 MB // 42 files
fonts_css: 850 KB
total_ingress: 12.59 MB
proxy_cost: $0.06 per page

// optimized run (dataflirt resource blocking)
blocked_types: image, media, font, stylesheet
blocked_scripts: *.js // except target hydration script
html_document: 142 KB
javascript_bundles: 210 KB // 1 file
images_media: 0 KB
fonts_css: 0 KB
total_ingress: 352 KB
proxy_cost: $0.0017 per page
bandwidth_reduction: 97.2%
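A blocking policy like the one in the optimized run above can be sketched with Playwright's Python sync API. This is one possible implementation under stated assumptions, not DataFlirt's own: `BLOCKED_TYPES`, `ALLOWED_SCRIPT_PATHS`, and the `/static/hydrate.js` path are hypothetical, and proxy configuration is omitted.

```python
from urllib.parse import urlparse

# Assumption: resource types that carry no extractable data for this target.
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}
# Hypothetical allowlist: script paths the page needs to hydrate its data.
ALLOWED_SCRIPT_PATHS = {"/static/hydrate.js"}


def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request should be aborted before it uses proxy bandwidth."""
    if resource_type in BLOCKED_TYPES:
        return True
    if resource_type == "script":
        return urlparse(url).path not in ALLOWED_SCRIPT_PATHS
    return False


def fetch_html(url: str) -> str:
    """Load a page with resource blocking enabled and return its HTML."""
    # Imported lazily so should_block() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        request = route.request
        if should_block(request.resource_type, request.url):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle)  # intercept every request the page makes
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html
```

Calling `fetch_html("https://example.com/")` requires Playwright and its browser binaries to be installed. If a target's anti-bot scripts must load, their paths would need to be added to the allowlist.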
// 05 — bandwidth hogs

What consumes
the proxy budget.

Ranked by average byte contribution in unoptimized scraping jobs across e-commerce and social media targets.

SAMPLE SIZE: 1.2B requests
WINDOW: 30d trailing
UPDATED: 2026-05-19
01 Images and Media: ~88% of bytes · Usually irrelevant for text extraction
02 JavaScript Bundles: ~75% of bytes · React/Vue hydration payloads
03 Uncompressed HTML: ~60% of bytes · Missing Accept-Encoding: gzip
04 Fonts and CSS: ~45% of bytes · Pure visual overhead
05 TLS/Proxy Overhead: ~15% of bytes · Handshakes and tunneling protocols
// 06 — optimization stack

Don't pay for bytes

you aren't going to extract.

DataFlirt's routing layer intercepts requests before they hit the expensive residential proxy network. We strip headers, enforce gzip/brotli compression, and aggressively block media, fonts, and non-essential JavaScript at the network level. Reducing the network bandwidth per job lowers the marginal cost of extraction, which lets us offer fixed-price data feeds rather than passing variable proxy costs on to the client.

Bandwidth Optimization Profile

Live metrics for a daily 500k-page e-commerce catalog scrape.

target.domain: example-retail.com
accept_encoding: br, gzip
resource.images: blocked
resource.scripts: whitelisted (hydration only)
avg_page_size: 114 KB
job.total_bandwidth: 57 GB
cost.savings: $684 / run


// 07 — FAQ

Common
questions.

About bandwidth optimization, proxy economics, resource blocking, and how DataFlirt minimizes network overhead at scale.

Why does bandwidth matter if I use AWS/GCP where ingress is free?
Cloud provider ingress is free, but scraping requires proxy networks to avoid IP bans. Residential and mobile proxy providers charge strictly by the gigabyte (often $5 to $15 per GB). If your scraper downloads 5 MB of images per page, a 1-million-page job transfers 5 TB and, even at $5 per GB, costs $25,000 in proxy bandwidth alone.
How do I reduce bandwidth when using a headless browser?
Implement aggressive request interception. Use Playwright or Puppeteer's routing capabilities to abort requests for images, media, fonts, and CSS. If the target data is in the initial HTML, block JavaScript entirely.
Does blocking resources affect anti-bot detection?
Yes, it can. Some advanced anti-bot systems (like DataDome or Akamai) monitor whether essential assets (like their own tracking scripts or specific fonts) are loaded. You must selectively whitelist anti-bot scripts while blocking heavy media assets to balance bandwidth savings with detection evasion.
What is the difference between ingress and egress bandwidth in scraping?
Ingress is the data downloaded from the target site (HTML, JSON, images). Egress is the data your scraper sends (HTTP requests, POST payloads) and the extracted data delivered to your storage (S3, databases). Ingress dominates proxy costs; egress dominates cloud provider costs.
How does DataFlirt handle bandwidth-heavy targets?
We use a hybrid routing approach. We fetch the initial HTML through a cheap datacenter proxy to analyze the payload. If the data requires a residential IP, we route only the specific API calls or minimal hydration scripts through the expensive network, cutting proxy bandwidth by up to 95%.
Is it legal to block ads and trackers during scraping?
Yes. As a client making HTTP requests, you have no legal obligation to download or render third-party scripts, ads, or analytics trackers. Blocking them is standard practice for performance, security, and bandwidth optimization.