
What is HTTP 504 Gateway Timeout?

HTTP 504 Gateway Timeout occurs when a server acting as a gateway or proxy fails to receive a timely response from an upstream server it needs to access in order to complete the request. In scraping pipelines, this usually means the target's backend database is buckling under load, or an anti-bot tarpit is intentionally holding the connection open until the load balancer drops it. Misinterpreting a 504 as a simple network blip leads to aggressive retries that trigger permanent IP bans.

Scraping Errors · Proxy Timeout · Upstream Failure · Tarpitting · Load Balancer
// 02 — definitions

When the upstream
goes dark.

The difference between a client-side timeout and a 504 is who gave up first. A 504 means the edge proxy survived, but the backend didn't.

TL;DR

An HTTP 504 means the target's reverse proxy (like Cloudflare, AWS ALB, or Nginx) timed out waiting for the origin server to generate the page. For scrapers, it's a critical signal: the target infrastructure is struggling. Blindly retrying 504s at the same concurrency will either take the target offline or get your proxy pool blacklisted.

01 Definition & structure
An HTTP 504 Gateway Timeout is an error response indicating that a server acting as a gateway or proxy did not receive a timely response from an upstream server. In modern web architecture, requests hit an edge proxy (like a CDN or load balancer) which then forwards the request to the origin server. If the origin server hangs—due to a heavy database query, resource exhaustion, or a crashed worker process—the edge proxy will eventually sever the connection and return a 504 to the client.
02 How it works in practice
When your scraper sends a request, it connects to the target's edge (e.g., Cloudflare). Cloudflare opens a connection to the target's backend. If the backend takes too long to generate the HTML or JSON, Cloudflare hits its internal timeout limit (often 60 or 100 seconds). Cloudflare then closes the connection to the backend and sends a 504 HTML page back to your scraper. The error is generated by the edge, not the origin.
03 Tarpitting vs. Overload
Not all 504s are accidental. Advanced anti-bot systems use tarpitting: when they detect a scraper, instead of blocking it with a 403, they intentionally route the request to a black hole that holds the TCP connection open. This ties up your scraper's worker threads. Eventually, the load balancer times out and returns a 504. If you see 504s only on specific proxy IPs or specific user-agents, you are likely being tarpitted, not overloading the server.
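One way to check whether 504s cluster on specific proxy IPs or user-agents is to compare 504 rates per group. The sketch below is illustrative only: the record layout (`proxy_ip`, `status`), the function names, and the `spread` threshold are assumptions, not part of any DataFlirt tooling.

```python
from collections import defaultdict

def timeout_rate_by_key(records, key):
    """Return {group: fraction of that group's requests that ended in a 504}."""
    totals = defaultdict(int)
    timeouts = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        if rec["status"] == 504:
            timeouts[rec[key]] += 1
    return {k: timeouts[k] / totals[k] for k in totals}

def looks_like_tarpit(records, key="proxy_ip", spread=0.5):
    """Heuristic: a wide gap between the best and worst group suggests targeting.

    `spread` is an arbitrary illustrative threshold, not a tuned value.
    """
    rates = timeout_rate_by_key(records, key)
    if len(rates) < 2:
        return False  # nothing to compare against
    return max(rates.values()) - min(rates.values()) >= spread

# Hypothetical log: one proxy IP times out 9 times in 10, the other never does.
records = (
    [{"proxy_ip": "198.51.100.7", "status": 504}] * 9
    + [{"proxy_ip": "198.51.100.7", "status": 200}] * 1
    + [{"proxy_ip": "203.0.113.5", "status": 200}] * 10
)
print(timeout_rate_by_key(records, "proxy_ip"))
print(looks_like_tarpit(records))
```

Passing `key="user_agent"` would run the same comparison across user-agents, provided the records carry that field. A uniform 504 rate across all groups points toward genuine overload rather than tarpitting.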
04 How DataFlirt handles it
We treat 504s as a critical backpressure signal. Our fleet uses distributed circuit breakers. If a target domain starts throwing 504s, the circuit opens: we immediately halve the concurrency budget for that target globally across all clients. We then send a single probe request every 60 seconds. Once a probe returns a 200 OK, the circuit moves to half-open, and we slowly ramp concurrency back up. This prevents our fleet from accidentally DDoS-ing a struggling target.
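The open, probe, and ramp-up cycle can be sketched as a small state machine. This is a single-process illustration, not DataFlirt's distributed implementation: the class name, the `half_open` state label (the usual circuit breaker term for the recovery phase), the one-worker-per-step ramp, and the injectable `clock` are all assumptions made for the example.

```python
import time

class DomainCircuit:
    """Per-domain breaker: closed -> open on a 504 cluster -> half_open after a good probe."""

    def __init__(self, baseline, probe_interval=60.0, clock=time.monotonic):
        self.baseline = baseline          # normal concurrency budget
        self.limit = baseline             # current concurrency budget
        self.state = "closed"
        self.probe_interval = probe_interval
        self.clock = clock                # injectable so the logic is testable
        self.next_probe_at = None

    def on_504_cluster(self):
        """Open the circuit and halve the budget (never below one worker)."""
        self.state = "open"
        self.limit = max(1, self.limit // 2)
        self.next_probe_at = self.clock() + self.probe_interval

    def probe_due(self):
        """True when the circuit is open and the probe interval has elapsed."""
        return self.state == "open" and self.clock() >= self.next_probe_at

    def on_probe_result(self, status):
        """A 200 moves to half_open; anything else schedules the next probe."""
        if status == 200:
            self.state = "half_open"
        else:
            self.next_probe_at = self.clock() + self.probe_interval

    def on_success_window(self):
        """While half_open, add one worker per healthy window; close at baseline."""
        if self.state != "half_open":
            return
        self.limit = min(self.baseline, self.limit + 1)
        if self.limit == self.baseline:
            self.state = "closed"

# Usage with a fake clock so no real waiting is needed.
now = [0.0]
circuit = DomainCircuit(baseline=8, clock=lambda: now[0])
circuit.on_504_cluster()
now[0] = 60.0
if circuit.probe_due():
    circuit.on_probe_result(200)
circuit.on_success_window()
print(circuit.state, circuit.limit)
```

In a real fleet the state would live in shared storage so that every worker sees the same budget; that part is deliberately left out here.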
05 The retry trap
The most common mistake junior engineers make is treating a 504 like a 502 or a network blip and retrying immediately. If the origin server is hanging because a database query is locked, sending the exact same request again just queues up another locked query. At scale, this can create a cascading failure that takes the target offline. Always use exponential backoff, and always reduce concurrency when 504s spike.
// 03 — timeout math

Calculating the
retry threshold.

A 504 requires a backoff strategy that accounts for backend recovery time. DataFlirt's circuit breaker models upstream health before allowing retries.

Exponential Backoff: $T_{\text{retry}} = \text{base} \cdot 2^{\text{attempt}} + \text{jitter}$
Standard backoff. Jitter prevents thundering herd problems when multiple workers retry. (Network Reliability 101)
Concurrency Step-Down: $C_{\text{new}} = C_{\text{current}} \cdot 0.5$
Halve concurrency immediately on 504 clusters to allow the origin to recover. (DataFlirt circuit breaker logic)
Gateway Timeout Window: $T_{504} = \min(\text{proxy\_timeout}, \text{origin\_timeout})$
The binding constraint. Cloudflare's proxy read timeout defaults to 100s; Nginx's proxy_read_timeout and the AWS ALB idle timeout default to 60s. (CDN Specifications)
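The three formulas above translate directly into code. The following is a minimal sketch: the function names, the default `base` of one second, and the uniform jitter range are assumptions chosen for illustration.

```python
import random

def backoff_delay(attempt, base=1.0, max_jitter=1.0, rng=random.random):
    """T_retry = base * 2**attempt + jitter, with jitter uniform in [0, max_jitter)."""
    return base * (2 ** attempt) + rng() * max_jitter

def step_down(current):
    """C_new = C_current * 0.5, rounded down, with a floor of one worker."""
    return max(1, int(current * 0.5))

def gateway_window(proxy_timeout, origin_timeout):
    """T_504 = min(proxy_timeout, origin_timeout), in seconds."""
    return min(proxy_timeout, origin_timeout)

# Jitter is pinned to zero here only to make the schedule reproducible.
for attempt in range(4):
    print(attempt, backoff_delay(attempt, rng=lambda: 0.0))

print(step_down(16), gateway_window(100, 60))
```

With jitter disabled the delays double on each attempt (1, 2, 4, 8 seconds for `base=1.0`). In production the random jitter term is what keeps workers from retrying in lockstep.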
// 04 — the edge trace

Watching the origin
bleed out.

A trace of a scraper hitting a target's Cloudflare edge. The edge accepts the connection, but the origin server hangs on a heavy database query.

Cloudflare Edge · Origin Timeout · Circuit Breaker
edge.dataflirt.io — live
CAPTURED
// connection established to edge
edge.handshake: ok // TLS 1.3
edge.route: "sfo-02"

// edge forwarding to origin
origin.ip: "192.0.2.44"
origin.status: pending...
timer.elapsed: 15.0s
timer.elapsed: 30.0s
timer.elapsed: 60.0s
edge.timeout_reached: true // 60s limit hit

// response generated by edge
response.status: 504 Gateway Timeout
response.server: "cloudflare"

// scraper circuit breaker
worker.action: drop_connection
scheduler.concurrency: reduced by 50%
// 05 — upstream failure modes

Why the origin
stops responding.

Ranked by frequency across DataFlirt's monitoring of 504 errors. Most 504s in scraping are self-inflicted by aggressive concurrency hitting unoptimized backend queries.

504 incidents: 1.2M / month
Avg recovery: 45 seconds
Updated: 2026-05-19
01 Heavy DB queries: 89% of 504s · Unindexed search or complex filters hanging the origin
02 Anti-bot tarpitting: 72% of 504s · Intentional delay to exhaust scraper concurrency
03 Origin resource exhaustion: 64% of 504s · CPU/RAM spikes causing worker thread starvation
04 Third-party API failure: 41% of 504s · Target's own downstream dependencies timing out
05 WAF silent drops: 28% of 504s · Firewall drops packets to origin, edge waits forever
// 06 — circuit breaking

Respect the origin,
or lose the pipeline entirely.

When a target throws a 504, it is begging for mercy. DataFlirt implements distributed circuit breakers across our fleet. If a target domain returns a cluster of 504s, we don't just back off the failing worker — we globally throttle the concurrency budget for that target across all active jobs. Pushing through a 504 storm doesn't get you data faster; it gets the target's sysadmin paged, which inevitably leads to a hard IP ban.

Circuit Breaker State

Live telemetry from a worker node handling a 504 cascade.

target.domain ecom-target.com
error.rate_1m 14% (elevated)
circuit.state OPEN
concurrency.limit 2 workers
backoff.strategy exponential_jitter
next_probe_in 45s
pipeline.status degraded_but_safe

// 07 — FAQ

Common
questions.

About gateway timeouts, retry strategies, tarpitting, and how DataFlirt manages upstream load.

What is the difference between a 504 and a 503 error?
A 503 Service Unavailable means the origin server knows it is overloaded and immediately rejects the request. A 504 Gateway Timeout means the origin accepted the request but hung, forcing the edge proxy (like Cloudflare) to kill the connection after a set time (usually 60-100 seconds).
What is the difference between a 504 and a client read timeout?
A client read timeout means your scraper gave up waiting. A 504 means the target's load balancer gave up waiting for its own backend. If you set your scraper's timeout to 30s but the edge proxy's timeout is 60s, you will see a read timeout locally and never receive the 504 the edge would have returned 30 seconds later.
Should I retry a 504 immediately?
No. Immediate retries compound the origin load. If a database query is hanging, sending the same query again just adds another locked thread to the backend. Use exponential backoff with jitter, and reduce your overall concurrency until the 504s stop.
Can anti-bot systems fake a 504?
Yes. This is called tarpitting. Instead of blocking you with a 403, the anti-bot system intentionally holds the connection open without sending data. This wastes your scraper's concurrency budget. Eventually, the edge proxy hits its timeout limit and returns a 504.
How does DataFlirt handle 504s at scale?
We use global circuit breakers. If one worker sees a spike in 504s, the entire fleet steps down concurrency for that specific domain. We probe the target with a single request every 60 seconds; once the origin recovers, we gradually ramp concurrency back to its baseline.
Why do I only get 504s on specific search pages?
This usually points to unindexed database queries on the target's backend. The edge proxy has a fixed timeout (e.g., 60 seconds). If the database takes 65 seconds to run a complex filter combination, the proxy will return a 504 every single time you hit that URL, regardless of your scrape rate.
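The distinctions drawn in the answers above (503 vs. 504 vs. a client read timeout) can be made explicit in a scraper by classifying each outcome before deciding how to react. The sketch below uses only the Python standard library and assumes Python 3.10 or later, where `socket.timeout` is an alias of `TimeoutError`. The label strings and function names are hypothetical.

```python
import urllib.error
import urllib.request

def classify(outcome):
    """Map an HTTP status code or a raised exception to a coarse failure label."""
    if isinstance(outcome, urllib.error.HTTPError):
        outcome = outcome.code
    if isinstance(outcome, TimeoutError):
        return "client_timeout"    # the scraper gave up first
    if outcome == 504:
        return "gateway_timeout"   # the edge gave up on its upstream
    if outcome == 503:
        return "origin_rejected"   # fast refusal from an overloaded service
    if isinstance(outcome, int) and 200 <= outcome < 300:
        return "ok"
    return "other"

def fetch_status(url, timeout=30.0):
    """Fetch a URL and return the classification label for what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:   # 4xx/5xx responses, including 504
        return classify(err)
    except urllib.error.URLError as err:    # connection-level failures
        return classify(err.reason)
    except TimeoutError as err:             # read timeout while awaiting the response
        return classify(err)

print(classify(504), classify(503), classify(TimeoutError()))
```

If the client timeout is shorter than the edge's timeout, every slow request lands in `client_timeout` and the 504 signal is never observed, so it may be worth setting the client timeout slightly above the suspected gateway window when diagnosing a target.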