
What is Image Blocking (Performance)?

Image blocking in a scraping context is the deliberate interception and abortion of network requests for image assets (JPEG, PNG, WebP, AVIF) during browser-based extraction. Because images often account for 60–80% of a page's total payload but rarely contain the target data, blocking them sharply reduces bandwidth egress, cuts page load times, and lowers memory pressure on the worker node. It is often the highest-ROI optimization for headless browser pipelines.

Bandwidth Optimization · Headless Browsers · Playwright · Egress Costs · Resource Interception
// 02 — definitions

Cut the fat.

Why downloading megabytes of product photos to extract a 10-byte price string is an architectural failure.


TL;DR

Image blocking intercepts browser network requests before they hit the wire, aborting anything matching an image resource type. In Playwright or Puppeteer, this simple route interception can reduce bandwidth consumption by roughly 70% and cut DOM ready times by as much as half, which directly raises the number of concurrent tabs each worker can run.

01 Definition & structure
Image blocking is a performance optimization technique used in browser-based web scraping. By intercepting network requests at the browser level and aborting any request where the resourceType is an image, the scraper avoids downloading heavy media files. This reduces the total payload of a page by up to 80%, speeding up page load times and drastically reducing cloud egress costs.
02 How it works in practice
In frameworks like Playwright or Puppeteer, you enable request interception on the page object. Before a request is sent to the network, the framework pauses it and passes it to your handler. If the request is for an image, font, or media file, you abort it (route.abort() in Playwright, request.abort() in Puppeteer). The browser immediately stops trying to fetch the asset, leaving a broken image placeholder on the page while the HTML and JavaScript continue to parse unhindered.
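The interception flow described here can be sketched with Playwright's Python sync API. This is a minimal illustration, not a production handler: the names `BLOCKED_TYPES`, `block_heavy_assets`, and `fetch_html` are hypothetical, and the catch-all `**/*` route pattern is an assumption about how broadly you want to intercept.

```python
# Resource types to abort, using Playwright's resource_type values.
BLOCKED_TYPES = {"image", "font", "media"}


def block_heavy_assets(route):
    """Abort image, font and media requests; let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()


def fetch_html(url: str) -> str:
    """Load a page with heavy assets blocked and return its HTML."""
    # Imported lazily so the handler above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Every request on this page passes through the handler first.
        page.route("**/*", block_heavy_assets)
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```

Calling `fetch_html("https://target-store.com/category/shoes")` would return the page HTML without any image, font, or media bytes having been requested.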
03 The lazy loading trap
The most common failure mode of naive image blocking is breaking lazy-loaded content. Many modern sites use JavaScript that waits for an image's onload event to fire before fetching the next batch of data or initializing a UI component. If you hard-abort the image request, the event never fires, and the scraper stalls waiting for content that will never appear.
04 How DataFlirt handles it
We don't use hard aborts on production pipelines. Instead, our interception middleware mocks the response. When an image request is intercepted, we immediately fulfill it locally with a base64-encoded 1x1 transparent pixel. This consumes zero network bandwidth, but it satisfies the browser's layout engine, triggers all necessary onload JavaScript events, and passes anti-bot checks that look for rendered image dimensions.
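A rough sketch of the mocking approach, assuming Playwright's Python API. This is an illustration of the idea rather than DataFlirt's actual middleware: `TRANSPARENT_GIF` and `mock_images` are hypothetical names, and serving a GIF for every image request regardless of the requested format is a simplifying assumption.

```python
import base64

# A 42-byte, 1x1 transparent GIF, decoded once at import time.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def mock_images(route):
    """Answer image requests locally with a 1x1 pixel instead of aborting."""
    if route.request.resource_type == "image":
        # fulfill() responds from memory, so nothing is fetched from the
        # network and the page still sees a successful image response.
        route.fulfill(status=200, content_type="image/gif", body=TRANSPARENT_GIF)
    else:
        route.continue_()
```

Registering it with `page.route("**/*", mock_images)` swaps the hard abort for a local response.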
05 Did you know?
Images aren't the only assets you should be blocking. Web fonts (WOFF2) often account for hundreds of kilobytes per page and are entirely useless for data extraction. Similarly, third-party tracking scripts (Google Analytics, Meta Pixel) can be blocked to speed up network idle times and to keep your scraper traffic from polluting the target site's analytics.
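Extending the block rule to fonts and trackers might look like the sketch below. The host list is illustrative and deliberately short (it covers the usual script hosts for Google Analytics and Meta Pixel); `BLOCKED_HOSTS`, `should_block`, and `block_assets_and_trackers` are hypothetical names.

```python
from urllib.parse import urlparse

BLOCKED_TYPES = {"image", "font", "media"}

# Illustrative tracker hosts only; a real deployment would maintain its own list.
BLOCKED_HOSTS = {
    "www.google-analytics.com",
    "www.googletagmanager.com",
    "connect.facebook.net",
}


def should_block(resource_type: str, url: str) -> bool:
    """Block by resource type first, then by third-party tracker host."""
    host = urlparse(url).hostname or ""
    return resource_type in BLOCKED_TYPES or host in BLOCKED_HOSTS


def block_assets_and_trackers(route):
    """Playwright route handler built on should_block()."""
    request = route.request
    if should_block(request.resource_type, request.url):
        route.abort()
    else:
        route.continue_()
```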
// 03 — the math

How much bandwidth do you save?

Image blocking directly impacts your cloud egress bill and worker density. Here is how DataFlirt calculates the ROI of resource interception across our headless fleet.

Bandwidth Savings: $S = \dfrac{\sum \left(\mathrm{req}_{\mathrm{img}} \times \mathrm{size}_{\mathrm{avg}}\right)}{\mathrm{total\_payload}}$
Typically 60–85% on e-commerce and media targets. (Network payload analysis)

Worker Density Increase: $D = \dfrac{\mathrm{RAM}_{\mathrm{total}}}{\mathrm{RAM}_{\mathrm{base}} + \mathrm{RAM}_{\mathrm{dom}}}$
Dropping images reduces decoded bitmap memory, allowing more tabs per CPU. (Infrastructure scaling model)

DataFlirt Egress Efficiency: $E = \dfrac{\mathrm{bytes\_extracted}}{\mathrm{bytes\_transferred}}$
Target > 0.05. Unoptimized headless runs at < 0.001. (Internal SLO)
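To make the three ratios concrete, here is a small worked sketch. All input numbers are hypothetical and chosen only to exercise the formulas; rounding worker density down to whole tabs is an assumption added here, not part of the formula above.

```python
def bandwidth_savings(image_requests: int, avg_image_bytes: float,
                      total_payload_bytes: float) -> float:
    """S = (req_img * size_avg) / total_payload."""
    return (image_requests * avg_image_bytes) / total_payload_bytes


def worker_density(ram_total_mb: int, ram_base_mb: int, ram_dom_mb: int) -> int:
    """D = RAM_total / (RAM_base + RAM_dom), floored to whole tabs."""
    return ram_total_mb // (ram_base_mb + ram_dom_mb)


def egress_efficiency(bytes_extracted: int, bytes_transferred: int) -> float:
    """E = bytes_extracted / bytes_transferred."""
    return bytes_extracted / bytes_transferred


# Hypothetical page: 42 images of 80 kB each in a 4.8 MB payload.
s = bandwidth_savings(42, 80_000, 4_800_000)
# Hypothetical node: 16 GB RAM, 200 MB per tab baseline, 56 MB of DOM state.
d = worker_density(16_384, 200, 56)
# Hypothetical run: 2.4 kB of extracted data from a 4.8 MB transfer.
e = egress_efficiency(2_400, 4_800_000)

print(f"S = {s:.2f}, D = {d} tabs, E = {e:.4f}")
```

With these inputs the unoptimized efficiency lands below the 0.001 mark quoted above, which is the gap interception is meant to close.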
// 04 — resource interception

Playwright routing in action.

A live trace of a headless worker intercepting and aborting image requests on a heavy e-commerce product listing page.

Playwright · page.route · e-commerce
edge.dataflirt.io — live
CAPTURED
// navigation started
page.goto: "https://target-store.com/category/shoes"

// route interception active
req: document -> "index.html" [200 OK]
req: script -> "app.js" [200 OK]
req: image -> "hero-banner.jpg" [ABORTED]
req: image -> "product-1.webp" [ABORTED]
req: image -> "product-2.webp" [ABORTED]
req: font -> "inter.woff2" [ABORTED]

// metrics
images_blocked: 42
bandwidth_saved: 14.2 MB
dom_content_loaded: 850ms [FAST]
network_idle: 1200ms

// extraction
status: SUCCESS -> 40 records extracted
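Counters like `images_blocked` in the trace could be collected by a handler that tallies what it aborts. The sketch below is one hypothetical way to do that with a Playwright route handler; `InterceptionStats` is an invented name. It counts requests only, since an aborted request never returns a body whose size could be measured directly.

```python
from collections import Counter


class InterceptionStats:
    """Route handler that aborts blocked types and counts every decision."""

    def __init__(self, blocked_types=("image", "font", "media")):
        self.blocked_types = set(blocked_types)
        self.aborted = Counter()   # resource_type -> aborted request count
        self.allowed = Counter()   # resource_type -> allowed request count

    def handle(self, route):
        rtype = route.request.resource_type
        if rtype in self.blocked_types:
            self.aborted[rtype] += 1
            route.abort()
        else:
            self.allowed[rtype] += 1
            route.continue_()
```

Usage would be `stats = InterceptionStats()` followed by `page.route("**/*", stats.handle)`; after navigation, `stats.aborted["image"]` holds the number of blocked image requests.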
// 05 — the bottlenecks

What eats your headless budget.

When running full browsers, rendering assets is the primary driver of infrastructure costs. Here is the breakdown of bandwidth consumption on a typical unoptimized retail page.

Sample size: 100k retail pages
Avg payload: 4.8 MB
Updated: 2026-05-19

01 Images (JPEG, WebP, PNG): ~72% of payload · Primary bandwidth sink, zero data value
02 Video / Media: ~15% of payload · Autoplay backgrounds, heavy egress
03 JavaScript: ~8% of payload · Required for SPA rendering, hard to block
04 Fonts: ~3% of payload · Blocks text rendering, safe to abort
05 CSS / Styles: ~2% of payload · Required for layout and visibility checks
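As a quick worked example using the figures above: if images, video/media, and fonts are all intercepted while JavaScript and CSS are left alone, the arithmetic looks like this. Treating exactly those three categories as blockable is an assumption of this sketch.

```python
# Payload share per category, in whole percent, from the breakdown above.
PAYLOAD_SHARE_PCT = {
    "images": 72,
    "video_media": 15,
    "javascript": 8,
    "fonts": 3,
    "css": 2,
}
BLOCKABLE = ("images", "video_media", "fonts")  # assumed safe to intercept
AVG_PAYLOAD_MB = 4.8

blockable_pct = sum(PAYLOAD_SHARE_PCT[name] for name in BLOCKABLE)
remaining_mb = AVG_PAYLOAD_MB * (100 - blockable_pct) / 100

print(f"blockable: {blockable_pct}% of payload")
print(f"remaining per page: {remaining_mb:.2f} MB")
```

That leaves about 0.48 MB of the 4.8 MB average page on the wire.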
// 06 — our stack

Block the payload, but satisfy the layout engine.

Naive image blocking—simply aborting the request—often breaks modern web applications. Lazy-loading scripts wait for image onload events to trigger the next batch of content, and anti-bot systems check if elements have rendered with non-zero dimensions. DataFlirt's interception layer doesn't just abort; it intercepts the request and instantly returns a locally cached 1x1 transparent pixel. The browser layout engine is satisfied, the JavaScript events fire correctly, and the anti-bot checks see a rendered element—all while consuming zero network egress.

Interception Middleware

Live metrics from a DataFlirt worker node running 1x1 pixel mocking.

worker.id node-eu-west-42
interception.mode mock_1x1_pixel
requests.total 1,420
requests.mocked 984
egress.prevented 3.4 GB / hr
layout.shifts 0
script.errors 2


// 07 — FAQ

Common questions.

Common questions about resource interception, layout breakage, and optimizing headless browser performance.

Does blocking images speed up the actual scraping?
Yes, drastically. Fewer bytes on the wire means faster network idle times, and fewer decoded bitmaps means significantly less CPU and RAM overhead for the browser process. This allows you to run more concurrent tabs per worker node.
Can image blocking get me flagged by anti-bot systems?
Sometimes. Advanced systems like DataDome or Akamai check if images actually loaded to verify a real browser environment. That's why mocking the response with a 1x1 transparent pixel is safer than a hard network abort — it satisfies the DOM checks without the bandwidth penalty.
What happens to lazy-loaded content if I block images?
If the site relies on image onload events to trigger the next page of an infinite scroll, a hard abort will stall the scraper. You must mock the response to trigger the event, tricking the site into thinking the image loaded successfully.
How does DataFlirt handle sites that require image analysis?
If the pipeline requires OCR, logo detection, or visual diffing, we selectively allow specific image URLs based on regex patterns (e.g., allowing /product-main/) while blocking the rest of the page's decorative assets and tracking pixels.
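A selective allowlist of this kind could be sketched as follows. The `/product-main/` pattern comes from the example in the answer; `ALLOWED_IMAGE_PATTERNS`, `is_allowed_image`, and `selective_image_handler` are hypothetical names, and the hard abort for non-matching images could just as well be a mocked response.

```python
import re

# Image URLs matching any of these patterns are fetched for real.
ALLOWED_IMAGE_PATTERNS = [re.compile(r"/product-main/")]


def is_allowed_image(url: str) -> bool:
    """True if the image URL matches an allowlisted pattern."""
    return any(pattern.search(url) for pattern in ALLOWED_IMAGE_PATTERNS)


def selective_image_handler(route):
    """Playwright route handler: block images unless they are allowlisted."""
    request = route.request
    if request.resource_type == "image" and not is_allowed_image(request.url):
        route.abort()
    else:
        route.continue_()
```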
Is it worth blocking CSS and Fonts too?
Fonts, yes — they consume bandwidth and aren't needed for data extraction. CSS is risky; many scrapers rely on innerText or Playwright's visibility checks, which will fail if the CSS isn't loaded to unhide the target elements.
How much money does this actually save at scale?
At 10 million pages a day, dropping a 3 MB average image payload saves 30 TB of egress daily. At AWS or GCP egress rates, that translates to thousands of dollars a month saved purely through one line of interception code.
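The volume figure can be checked with a few lines of arithmetic. The sketch uses decimal units (1 TB = 1,000,000 MB) and leaves the per-GB price as a parameter, because rates vary by provider and tier; `monthly_cost_usd` and its 30-day month are assumptions of this example.

```python
PAGES_PER_DAY = 10_000_000
IMAGE_PAYLOAD_MB = 3  # average image payload dropped per page

daily_mb = PAGES_PER_DAY * IMAGE_PAYLOAD_MB
daily_tb = daily_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB


def monthly_cost_usd(price_per_gb: float, days: int = 30) -> float:
    """Cost of the avoided traffic at a caller-supplied per-GB price."""
    daily_gb = daily_mb / 1_000
    return daily_gb * days * price_per_gb


print(f"avoided traffic: {daily_tb:.0f} TB per day")
```

Plugging your own billed rate into `monthly_cost_usd` gives the monthly figure for your setup.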