
What is Resource Blocking?

Resource blocking is the practice of intercepting and aborting non-essential network requests—like images, fonts, media, and third-party tracking scripts—at the browser level before they consume bandwidth or CPU cycles. For headless scraping pipelines, it's the primary lever for reducing memory footprint, accelerating DOM-ready times, and preventing third-party analytics from flagging your automated sessions.

Playwright · Puppeteer · Bandwidth Optimization · Request Interception · Headless Performance
// 02 — definitions

Shed the
dead weight.

Why downloading 4 MB of hero images and tracking scripts to extract a 12-byte price string is an infrastructure anti-pattern.


TL;DR

Resource blocking stops the browser from fetching assets that don't contribute to the data extraction goal. By aborting requests for .png, .woff2, and third-party analytics domains, pipelines can cut bandwidth consumption by roughly 80% and bring page load times from about 4 seconds to under 800 milliseconds, depending on how media-heavy the target is.

01 Definition & structure
Resource blocking is a network-layer optimization technique used in browser-based scraping. By intercepting outbound requests before they leave the browser process, engineers can selectively abort requests for assets that are irrelevant to data extraction. The most commonly blocked resource types are:
  • image and media — massive bandwidth hogs.
  • font — delays text rendering and DOM stability.
  • stylesheet — safe to block on static sites, risky on SPAs.
  • Third-party script — analytics, ads, and telemetry that consume CPU.
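Separating third-party scripts from first-party ones comes down to comparing the request host with the page host. A minimal sketch, assuming a naive "last two labels" comparison; the helper names are hypothetical, and a production pipeline would more likely consult the Public Suffix List so that hosts under suffixes like co.uk are handled correctly:

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Return the last two host labels, e.g. 'cdn.target.com' -> 'target.com'.

    Assumption: this is a rough stand-in for the registrable domain and is
    wrong for multi-label public suffixes such as 'co.uk'.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_third_party(request_url: str, page_url: str) -> bool:
    """True when the request is served from a different site than the page."""
    return site_key(request_url) != site_key(page_url)


page = "https://target.com/p/123"
print(is_third_party("https://cdn.target.com/hero.jpg", page))
print(is_third_party("https://www.google-analytics.com/analytics.js", page))
```

With this split, a script request can be aborted only when it is third-party, which leaves the site's own bundle untouched.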
02 How it works in practice
In Playwright and Puppeteer, resource blocking is implemented through request interception, which on Chromium is driven over the Chrome DevTools Protocol (CDP). When interception is enabled, the browser pauses every matched outbound request and hands control to your Node/Python script. You evaluate the request's URL or resource type and call abort() or continue() (continue_() in Playwright's Python API). Because an aborted request is cancelled before the DNS lookup or TCP handshake, its bandwidth and network latency costs are avoided, although every intercepted request still pays a small round trip to your handler.
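The interception flow described above can be sketched with Playwright's Python sync API. This is a minimal illustration rather than a production handler: the BLOCKED_TYPES set and the fetch_html helper are assumptions chosen for the example, and Playwright is imported lazily so the handler can be exercised without a browser installed.

```python
# Resource types to drop; an illustrative choice, tune per target.
BLOCKED_TYPES = frozenset({"image", "media", "font"})


def handle_route(route) -> None:
    """Abort blocked resource types and let every other request through."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()  # trailing underscore: 'continue' is a Python keyword


def fetch_html(url: str) -> str:
    """Load a page with blocking enabled and return its HTML (hypothetical helper)."""
    # Lazy import keeps handle_route usable where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle_route)  # intercept every request
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html
```

Keeping the decision in a plain function makes it easy to unit-test against a stub route before pointing it at a live target.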
03 The risk of over-blocking
Blocking too much breaks the pipeline. If you block a site's core JavaScript bundle, the React/Vue application will never hydrate, leaving you with an empty <div id="root">. If you block CSS on a site that uses virtualized lists (like infinite scrolling grids), the JavaScript may fail to calculate element heights, causing the scroll event to never trigger the next API fetch. Blocking must be tuned to the specific architecture of the target.
04 How DataFlirt handles it
We don't rely on static, hardcoded blocklists. Our orchestration layer profiles every new target by running a matrix of headless sessions with varying interception strictness. We measure DOM completeness against the required extraction schema. Once the optimal profile is found, it is locked in. This ensures we minimise proxy egress costs and maximise worker density without ever compromising data completeness.
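The idea of trying profiles of varying strictness and keeping the strictest one that still yields complete data can be sketched as follows. This is not DataFlirt's implementation: Profile, pick_profile, and the extract callable (which would run a headless session under the given profile and return the scraped record) are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Profile:
    name: str
    blocked_types: frozenset


def pick_profile(
    profiles: Iterable[Profile],
    extract: Callable[[Profile], dict],
    required_fields: Sequence[str],
) -> Optional[Profile]:
    """Return the first profile whose extraction contains every required field.

    Assumption: `profiles` is ordered from most to least aggressive, so the
    first passing profile is also the most aggressive one that works.
    """
    for profile in profiles:
        record = extract(profile)
        if all(record.get(field) not in (None, "") for field in required_fields):
            return profile
    return None  # nothing passed; fall back to a full render
```

In use, extract would wrap a real page load; returning None signals that even the most lenient candidate lost data and the target needs a closer look.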
05 Did you know?
Many anti-bot vendors use "canary" resources to detect headless browsers. They might inject a 1x1 pixel tracking image into the DOM. If your script aggressively blocks all images, the server notes that the canary was never fetched, instantly flagging your session as an automated script. Sophisticated blocking requires allowing specific tracking pixels while dropping the heavy hero images.
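One way to let a known canary through while still dropping heavy images is an allowlist that takes precedence over type-based blocking. A minimal sketch: the URL pattern below is entirely hypothetical, and finding a real canary requires profiling the specific target.

```python
from fnmatch import fnmatchcase

# Hypothetical canary location; real patterns come from profiling the target.
ALLOW_PATTERNS = ("https://target.com/px/*",)
BLOCKED_TYPES = frozenset({"image", "media", "font"})


def decide(resource_type: str, url: str, allow_patterns=ALLOW_PATTERNS) -> str:
    """Return 'continue' or 'abort'; allowlisted URLs always continue."""
    if any(fnmatchcase(url, pattern) for pattern in allow_patterns):
        return "continue"
    return "abort" if resource_type in BLOCKED_TYPES else "continue"


print(decide("image", "https://target.com/px/1x1.gif"))
print(decide("image", "https://cdn.target.com/hero.jpg"))
```

Checking the allowlist first keeps the rule easy to reason about: a blocked type never overrides an explicit exception.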
// 03 — the efficiency math

How much does
blocking save?

Resource blocking directly impacts unit economics. DataFlirt's fleet telemetry tracks bandwidth and memory savings per blocked resource type to optimize our headless compute density and proxy egress costs.

Bandwidth Savings: S = 1 − ( bytes_with_blocking / bytes_full_render )
Typically 70–90% for e-commerce targets when media is dropped. (Source: DataFlirt pipeline telemetry)
Memory Footprint: M = base_heap + ( DOM_nodes × 1.2 KB )
The reduction is the difference in M between a full render and a blocked render. Blocking media also prevents large GPU buffer allocations in Chromium. (Source: V8 Engine Heap Profiling)
Fleet Density Multiplier: D = concurrent_contexts × ( 1 / avg_load_time )
Aggressive blocking allows up to 3x more browser contexts per vCPU. (Source: DataFlirt infrastructure SLO)
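A short worked example of the S and D formulas may help. The inputs are illustrative: a 3,000 KB page of which 2,800 KB is imagery, and a hypothetical pool of 10 browser contexts with a 0.8 s average load time. D is read here as page loads per second, which assumes each context loads pages back to back.

```python
def bandwidth_savings(bytes_with_blocking: float, bytes_full_render: float) -> float:
    """S = 1 - (bytes_with_blocking / bytes_full_render)."""
    if bytes_full_render <= 0:
        raise ValueError("bytes_full_render must be positive")
    return 1 - bytes_with_blocking / bytes_full_render


def fleet_density(concurrent_contexts: int, avg_load_time_s: float) -> float:
    """D = concurrent_contexts * (1 / avg_load_time)."""
    if avg_load_time_s <= 0:
        raise ValueError("avg_load_time_s must be positive")
    return concurrent_contexts / avg_load_time_s


# Illustrative inputs, in KB: the full page versus the page with images blocked.
full_kb = 3000
blocked_kb = full_kb - 2800

savings = bandwidth_savings(blocked_kb, full_kb)
print(f"S = {savings:.1%}")  # S = 93.3%

# Hypothetical fleet: 10 contexts averaging 0.8 s per page load.
print(f"D = {fleet_density(10, 0.8):.1f}")
```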
// 04 — playwright route interception

Dropping 3.2 MB
before it hits the wire.

A live trace of Playwright's network interception aborting media and tracking requests on a product listing page, while allowing the core GraphQL API call to resolve.

Playwright · route.abort() · CDP
edge.dataflirt.io — live
CAPTURED
// init interception profile: ecom-strict
page.route: "**/*" // matching all requests

// request stream
req.document: "https://target.com/p/123" ALLOWED
req.script: "https://target.com/app.js" ALLOWED
req.image: "https://cdn.target.com/hero.jpg" ABORTED (image)
req.font: "https://fonts.gstatic.com/s/...woff2" ABORTED (font)
req.script: "https://www.google-analytics.com/analytics.js" ABORTED (tracker)
req.fetch: "https://target.com/api/graphql" ALLOWED

// performance outcome
domcontentloaded: 412ms
networkidle: 680ms
bandwidth_consumed: 142 KB // 92% reduction
// 05 — the blocklist

What consumes
the most resources.

Ranked by average bandwidth and CPU consumption across DataFlirt's headless fleet. Images dominate proxy egress costs, but third-party scripts dominate CPU time and event loop blocking.

Sample size: 18M page loads
Metric: Resource weight
Updated: 2026-05-19
01 High-res images & video
Bandwidth heavy · Massive proxy egress costs, GPU memory bloat
02 Third-party analytics
CPU heavy · Blocks the main thread, high bot-detection risk
03 Web fonts (.woff2)
Render blocking · Delays text rendering and DOM stability
04 CSS frameworks
Parse heavy · Often safe to block, but risks breaking layout JS
05 Ad network iframes
Memory heavy · Spawns child processes and isolated contexts
// 06 — our architecture

Block aggressively,

but never break the render tree.

Naive resource blocking breaks modern web apps. If you block CSS on a site that uses JavaScript to check element dimensions before rendering data (like virtualized lists), your scraper hangs indefinitely. DataFlirt uses heuristic-based interception profiles. We fingerprint the target's rendering strategy during the discovery phase and deploy a custom blocklist that strips maximum weight without triggering anti-bot traps or breaking the hydration cycle.

Interception Profile

Active resource blocking configuration for a heavy single-page application.

profile.id spa-virtualized-list
block.images true
block.fonts true
block.css false // layout-dependent JS detected
block.trackers true
bandwidth.saved 84.2%
load.time 640ms

// 07 — FAQ

Common
questions.

Common questions about request interception, performance tuning, and the risks of over-blocking in headless browsers.

What is the easiest way to block resources in Playwright?
Use page.route('**/*', route => ...). Check route.request().resourceType() and call route.abort() for types like image, media, or font. For everything else, call route.continue(). It's roughly a five-line implementation, and on image-heavy pages it can cut bandwidth by around 80%.
Can resource blocking trigger bot detection?
Yes. If you block the anti-bot script itself (like Akamai or DataDome), the server will notice the missing sensor payload and block your subsequent API requests. Additionally, some sites use "canary" images; if the image isn't fetched, they know you're headless. You must profile the target before deploying a blanket blocklist.
Should I block CSS files?
It depends on the target. For static HTML sites, blocking CSS is perfectly safe and speeds up parsing. For React/Vue apps, blocking CSS often breaks layout-dependent JavaScript (e.g., infinite scroll libraries that check clientHeight). When in doubt, allow CSS and block images.
Does blocking images save proxy bandwidth costs?
Massively. Residential proxies are billed per gigabyte. A typical e-commerce page is 3 MB, of which 2.8 MB is imagery. Blocking images reduces your proxy bill by over 90% while extracting the exact same JSON or DOM data.
What's the difference between request interception and using an ad-blocker extension?
Ad-blockers (like uBlock Origin) match large filter lists (such as EasyList) against every request, which costs CPU time and adds extension overhead to each browser instance. Request interception by resource type (e.g., "abort all images") is a constant-time lookup in your route handler. For scraping, it is usually the more efficient and more predictable option.
How does DataFlirt determine what to block?
We run an automated discovery phase for new targets. The system loads the page with full resources, then iteratively blocks CSS, fonts, and specific script domains, measuring if the target data still materialises in the DOM. The most aggressive successful profile is then locked in for production.