
What is Conditional GET (ETag)?

A Conditional GET (ETag) is an HTTP mechanism that lets a client ask a server for a resource only if it has changed since the last fetch. The client sends back a validator from its previous response, either the entity tag (ETag) in an If-None-Match header or the Last-Modified timestamp in an If-Modified-Since header, and the server can return a lightweight 304 Not Modified response instead of the full payload. For scraping pipelines, conditional requests are the difference between saturating your proxy bandwidth with redundant HTML and running an efficient, incremental data sync.

HTTP Headers · Bandwidth Optimization · Cache Validation · Network Efficiency · 304 Not Modified
// 02 — definitions

Fetch only
what changed.

The mechanics of HTTP cache validation and how scrapers use it to drastically reduce bandwidth and server load on target sites.


TL;DR

A Conditional GET uses the If-None-Match (ETag) or If-Modified-Since headers to validate a cached response. If the content is unchanged, the server replies with a 304 Not Modified and an empty body. It's one of the most effective ways to run high-frequency incremental crawls without burning proxy bandwidth or triggering rate limits.

01 Definition & structure
A Conditional GET is a standard HTTP GET request that carries validation headers, most commonly If-None-Match (containing an ETag) or If-Modified-Since (containing a timestamp). The server evaluates these headers against the current state of the requested resource. If the resource has not changed, the server skips the body and returns a 304 Not Modified status code, instructing the client to use its cached copy.
02 How it works in practice
The workflow has two phases. First, the scraper makes a standard GET request. The server returns a 200 OK along with the full payload and an ETag header (e.g., "v1-abc"). The scraper caches the payload and the ETag. On the next crawl cycle, the scraper sends a GET request with the header If-None-Match: "v1-abc". The server compares that value with the resource's current ETag (often a hash of the content or a version identifier); if it is still "v1-abc", it sends back a 304. The scraper then passes its cached payload to the extraction layer.
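The two-phase flow can be sketched in a few lines of Python. This is a minimal illustration, not DataFlirt's implementation: `conditional_get`, the in-memory `cache` dict, and the injectable `send` transport (a callable taking a URL and request headers and returning status, headers, and body) are all hypothetical names chosen so the logic can be read and exercised without a network.

```python
from typing import Callable, Dict, Mapping, Tuple

# (status, response_headers, body) as returned by the transport.
Response = Tuple[int, Mapping[str, str], bytes]
# Hypothetical transport signature: send(url, request_headers) -> Response.
Send = Callable[[str, Dict[str, str]], Response]


def conditional_get(
    url: str,
    cache: Dict[str, Tuple[str, bytes]],
    send: Send,
) -> Tuple[bytes, bool]:
    """Return (payload, served_from_cache) for url, revalidating via If-None-Match."""
    request_headers: Dict[str, str] = {}
    cached = cache.get(url)
    if cached is not None:
        # Replay the stored ETag exactly as the server issued it.
        request_headers["If-None-Match"] = cached[0]

    status, response_headers, body = send(url, request_headers)

    if status == 304 and cached is not None:
        # Unchanged: hand the stored payload to the extraction layer.
        return cached[1], True

    # Header names are case-insensitive, so normalise before looking up the ETag.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    etag = lowered.get("etag")
    if status == 200 and etag:
        # Store the new validator together with the payload it describes.
        cache[url] = (etag, body)
    return body, False
```

Calling the function twice with the same `cache` reproduces the cycle described above: the first call sends no validator and stores the ETag, and the second sends If-None-Match and, on a 304, returns the stored payload.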
03 Strong vs. Weak ETags
ETags come in two variants. A strong ETag (e.g., "33a64df5") guarantees that the resource is byte-for-byte identical to the cached version. A weak ETag (prefixed with W/, e.g., W/"33a64df5") indicates that the resource is semantically equivalent, even if the exact bytes differ (for example, because the server applied a different compression algorithm). If-None-Match is evaluated with weak comparison, so for data extraction purposes weak ETags are perfectly acceptable; store them and send them back exactly as received, W/ prefix included.
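The difference between the two comparison modes is easy to see in code. The sketch below follows the weak and strong comparison rules from the HTTP semantics specification; the function names are illustrative, and the regular expression is a simplified entity-tag matcher rather than a full header parser.

```python
import re

# Simplified entity-tag pattern: optional W/ prefix, then a quoted opaque tag.
_ETAG_RE = re.compile(r'(?:W/)?"[^"]*"')


def opaque_tag(etag: str) -> str:
    """Strip the weak prefix, leaving only the quoted opaque tag."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are equal; the W/ prefix is ignored."""
    return opaque_tag(a) == opaque_tag(b)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and both are identical."""
    a, b = a.strip(), b.strip()
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """True when a server using weak comparison would answer 304."""
    if header_value.strip() == "*":
        return True
    return any(weak_match(tag, current_etag) for tag in _ETAG_RE.findall(header_value))
```

Under weak comparison, `W/"33a64df5"` and `"33a64df5"` match, which is why a scraper that stores and replays weak ETags still collects 304s.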
04 How DataFlirt handles it
We maintain a distributed, Redis-backed ETag registry across our worker fleet. When polling product availability or news feeds every few minutes, our workers automatically attach the latest known ETag for that URL. Across our high-frequency pipelines, over 85% of requests result in 304s. This saves terabytes of proxy egress bandwidth and ensures our crawlers are viewed as polite, low-impact traffic by target CDNs.
05 The anti-bot tracking risk
Because ETags are unique strings stored by the client and sent back to the server, they can be weaponized as "supercookies." If an anti-bot system issues a unique ETag to your scraper on IP Address A, and you later send that same ETag while routing through IP Address B, the anti-bot system instantly links the two IPs to the same scraping session. Managing ETag state boundaries is critical when rotating proxy identities.
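One way to enforce such a state boundary is to key stored ETags by the identity that received them, so a validator is never replayed from a different identity. The sketch below is an assumption-laden illustration: `IdentityScopedETags` and the `identity` label (standing for one proxy session or fingerprint) are hypothetical, and a plain dict stands in for whatever store a real pipeline would use.

```python
from typing import Dict, Optional, Tuple


class IdentityScopedETags:
    """ETag store keyed by (identity, url).

    `identity` is a hypothetical label for one proxy session or fingerprint.
    """

    def __init__(self) -> None:
        self._tags: Dict[Tuple[str, str], str] = {}

    def remember(self, identity: str, url: str, etag: str) -> None:
        """Record the ETag that this identity received for this URL."""
        self._tags[(identity, url)] = etag

    def validator_for(self, identity: str, url: str) -> Optional[str]:
        """Return an ETag only if this same identity received it earlier."""
        return self._tags.get((identity, url))

    def forget_identity(self, identity: str) -> int:
        """Drop every ETag tied to a retired identity; return the number dropped."""
        stale = [key for key in self._tags if key[0] == identity]
        for key in stale:
            del self._tags[key]
        return len(stale)
```

The trade-off is deliberate: a new identity starts with no validators and pays for one full 200 response per URL, in exchange for not carrying a linkable token across the rotation.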
// 03 — bandwidth math

The economics of
conditional requests.

Bandwidth savings scale linearly with the cache hit rate, but the real value is in proxy cost reduction and target server goodwill. Here is how we model incremental crawl efficiency.

Bandwidth Saved
$$B_{\text{saved}} = \text{Requests} \cdot \text{HitRate} \cdot \left(\text{Size}_{200} - \text{Size}_{304}\right)$$
A 304 response is typically under 300 bytes, saving 99% of the payload size. (Network Optimization Model)

Effective Scrape Latency
$$L_{\text{eff}} = \left(\text{HitRate} \cdot \text{RTT}_{304}\right) + \left((1 - \text{HitRate}) \cdot \text{RTT}_{200}\right)$$
304s bypass database queries and template rendering on the target server, dropping RTT significantly. (DataFlirt Pipeline Metrics)

DataFlirt 304 Ratio
$$R_{304} = \frac{\text{Responses}_{304}}{\text{Requests}_{\text{total}}}$$
Target > 0.85 for high-frequency pricing feeds to minimize egress costs. (Internal SLO)
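To make the three formulas concrete, here is a small Python sketch that evaluates them directly. The function names are illustrative, and the numbers in the usage note are assumptions chosen for the arithmetic, not measured DataFlirt figures.

```python
def bandwidth_saved(requests: int, hit_rate: float, size_200: int, size_304: int) -> float:
    """Bytes saved: Requests * HitRate * (Size200 - Size304)."""
    return requests * hit_rate * (size_200 - size_304)


def effective_latency(hit_rate: float, rtt_304: float, rtt_200: float) -> float:
    """Expected round-trip time: HitRate * RTT304 + (1 - HitRate) * RTT200."""
    return hit_rate * rtt_304 + (1 - hit_rate) * rtt_200


def ratio_304(responses_304: int, requests_total: int) -> float:
    """Share of all requests that were answered with 304 Not Modified."""
    return responses_304 / requests_total
```

For example, 1,000 requests at a 0.85 hit rate against a 142,050-byte payload and a 300-byte 304 save 1,000 · 0.85 · 141,750 = 120,487,500 bytes, roughly 120 MB. With assumed round trips of 40 ms for a 304 and 400 ms for a 200, the effective latency is 0.85 · 40 + 0.15 · 400 = 94 ms.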
// 04 — the wire trace

A 304 Not Modified
handshake.

Trace of an incremental pricing scraper hitting an e-commerce endpoint. The first request pulls the full payload; the second, 15 minutes later, validates the cache.

HTTP/2 · ETag · 304 Not Modified
// Request 1: Initial Fetch
> GET /api/v1/products/sku-992 HTTP/2
< HTTP/2 200 OK
< ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
< Content-Length: 142050
[Body downloaded: 142 KB]

// Request 2: 15 minutes later
> GET /api/v1/products/sku-992 HTTP/2
> If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
< HTTP/2 304 Not Modified
< ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
< Content-Length: 0
[Body downloaded: 0 KB] // bandwidth saved
pipeline.status: cache validated, no schema extraction needed
// 05 — implementation failures

Why conditional
GETs fail.

Ranked by frequency across DataFlirt's pipeline audits. Implementing ETags incorrectly doesn't just waste bandwidth — it can leak your scraper's identity to anti-bot systems.

PIPELINES AUDITED: 140+
CACHE MISS RATE: avg 62% unoptimized
UPDATED: 2026-05-19
01 ETag tracking across IPs
Identity leak · Sending a known ETag from a fresh residential IP links the sessions

02 Dynamic ad/token injection
Cache busting · CSRF tokens or timestamps in HTML change the ETag every request

03 Ignoring Weak ETags (W/)
Missed cache hits · Failing to parse or send weak ETags drops the 304 hit rate

04 Missing If-Modified-Since
Protocol downgrade · Not all servers support ETags; fallback timestamps are required

05 Gzip vs Brotli mismatch
Encoding errors · Accept-Encoding changes alter the ETag hash on strict servers
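Failures 03 to 05 are all about which headers the revalidation request carries, so they can be addressed in one place. The sketch below is a hypothetical helper, not a DataFlirt API: it replays the ETag verbatim (weak prefix included), falls back to If-Modified-Since when only Last-Modified was stored, and keeps Accept-Encoding fixed to whatever the original fetch used. The `accept_encoding` default of `gzip` is an assumption.

```python
from typing import Dict, Mapping


def build_validation_headers(
    cached_headers: Mapping[str, str],
    accept_encoding: str = "gzip",
) -> Dict[str, str]:
    """Build request headers for revalidating a previously fetched response.

    cached_headers: response headers stored from the earlier 200 OK.
    accept_encoding: must match the value sent on the original fetch, since
    some servers derive the ETag from the encoded representation.
    """
    # Header names are case-insensitive; normalise before lookup.
    cached = {name.lower(): value for name, value in cached_headers.items()}
    headers: Dict[str, str] = {"Accept-Encoding": accept_encoding}

    etag = cached.get("etag")
    if etag:
        # Send the validator exactly as received, including any W/ prefix.
        headers["If-None-Match"] = etag

    last_modified = cached.get("last-modified")
    if last_modified:
        # Timestamp fallback for servers that ignore or omit ETags.
        headers["If-Modified-Since"] = last_modified

    return headers
```

Sending both validators is safe: a server that understands If-None-Match evaluates it in preference to If-Modified-Since, while a server that only supports timestamps can still answer with a 304.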
// 06 — our architecture

Stateful caching,

in a stateless proxy network.

To use Conditional GETs effectively, your scraper needs memory. But tying an ETag to a specific proxy IP creates a tracking vector for anti-bot systems. DataFlirt decouples cache state from network state. We maintain a centralized ETag registry per target domain. When a worker requests a URL, it pulls the latest known ETag from Redis, attaches it to the If-None-Match header, and routes the request through a fresh residential IP. If the target returns a 304, we serve the cached 200 OK payload to the extraction layer. We get the bandwidth savings of a cache with the anonymity of a stateless crawler.

ETag Registry Lookup

Worker state during an incremental catalog sync.

target.url /category/industrial-valves
redis.etag_hit W/"5f4a-3b9c8d"
proxy.ip 103.45.x.x (residential)
request.header If-None-Match: W/"5f4a-3b9c8d"
response.status 304 Not Modified
bandwidth.saved 2.4 MB
extraction.source redis_cache_payload


// 07 — FAQ

Common
questions.

Common questions about cache validation, ETag tracking risks, and how to optimize bandwidth for high-frequency data pipelines.

What is the difference between an ETag and Last-Modified?
An ETag is a unique identifier (usually a hash) representing a specific version of a resource. Last-Modified is simply a timestamp. ETags are more precise because they catch content changes that happen within the same second, and they don't rely on clock synchronization between the client and server.
Can anti-bot systems use ETags to track scrapers?
Yes. This is known as "ETag tracking" or "supercookies." If you receive an ETag on IP A, and then send that exact ETag in an If-None-Match header from IP B, the server instantly knows IP A and IP B are the same client. We mitigate this by grouping ETag usage by proxy subnet or clearing the cache registry when rotating identities.
Why does my target always return 200 OK even with the right ETag?
Dynamic content injection. If the server embeds a unique CSRF token, a timestamp, or rotating ad scripts directly into the HTML, the payload's hash changes on every single request. The server generates a new ETag, compares it to yours, sees a mismatch, and returns a 200 OK. In these cases, ETags are useless for HTML and should only be used for API JSON responses.
How does DataFlirt handle ETags for high-frequency pricing feeds?
We use them aggressively. For API targets that support strict cache validation, we poll every 60 seconds using ETags. This drops our effective bandwidth consumption by over 90% and keeps our request footprint incredibly light, allowing us to stay well under target rate limits while delivering near real-time price updates.
Is it legal to cache scraped data using ETags?
Caching for the purpose of minimizing server load is standard HTTP behavior and is generally viewed favorably as "polite" crawling. It demonstrates that your scraper is actively trying to reduce the burden on the target's infrastructure. However, standard data retention and copyright policies still apply to the cached payload itself.
Should my scraper send Weak ETags (W/)?
Yes. Weak ETags (prefixed with W/) indicate that the semantic content is the same, even if the byte-for-byte representation differs slightly (e.g., due to dynamic compression or minor layout shifts). They are highly effective for data scraping where you only care if the core data has changed, not the exact byte stream.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h