
What is HTTP 502 Bad Gateway?

HTTP 502 Bad Gateway is a server-side error indicating that a proxy, edge server, or load balancer received an invalid response from the upstream origin server. In scraping pipelines, this rarely means the target site is completely down. Instead, it usually signals that your proxy node was blocked by the target's firewall, the target's rate-limiting dropped the connection abruptly, or the upstream server crashed while processing a heavy query. Misinterpreting a 502 as a generic retryable error leads to infinite loops and burned proxy IPs.

Scraping Errors · Network Layer · Proxies · WAF Blocks · Status Codes
// 02 — definitions

The edge
failed upstream.

Why the server sitting between you and the target couldn't complete the request, and how to tell if it's your fault.


TL;DR

A 502 means the edge server (like Cloudflare, AWS ALB, or your proxy gateway) reached out to the origin server but got garbage, a reset connection, or a prematurely closed socket back (a plain timeout is normally reported as a 504 instead). For scrapers, 502s are often stealth blocks: the WAF killed the connection, causing the edge to return a 502 to your client.

01 Definition & structure

An HTTP 502 Bad Gateway error occurs in a distributed network architecture where a server acting as a gateway or proxy receives an invalid response from the upstream server it forwarded the request to. The request flow looks like this:

  • Your scraper sends a request to a Proxy/Edge Server.
  • The Proxy/Edge Server forwards the request to the Origin Server.
  • The Origin Server returns garbage data, resets the TCP connection, or closes the socket prematurely.
  • The Proxy/Edge Server cannot fulfill your request, so it returns a 502 to your scraper.

02 Proxy pool exhaustion vs target overload

When you see a 502, you must determine which gateway failed. If you are using a commercial proxy provider, the 502 might mean their proxy node failed to reach the target (often because the proxy IP is banned). Alternatively, if the target sits behind Cloudflare, the 502 might come from Cloudflare itself, indicating that the target's actual origin server is overloaded or crashing. Inspecting the HTTP response headers (e.g., Server: cloudflare vs a proxy-specific header such as X-Proxy-Error) is critical for debugging.
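The header check described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive detector: the function name and the returned labels are hypothetical, and the X-Proxy-Error header is assumed to be set by your proxy provider (header names differ between providers).

```python
from typing import Mapping


def classify_502_source(headers: Mapping[str, str]) -> str:
    """Guess which hop produced a 502 from its response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.lower() for name, value in headers.items()}

    # Assumption: the proxy gateway tags its own failures with X-Proxy-Error.
    if "x-proxy-error" in normalised:
        return "proxy_gateway"

    # A Cloudflare-served 502 suggests the target's edge could not reach its origin.
    if "cloudflare" in normalised.get("server", ""):
        return "target_edge"

    return "unknown"
```

Treat an "unknown" result as a prompt to log the full header set rather than as a verdict; many gateways identify themselves through other headers.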

03 The stealth 502 block

Modern anti-bot systems (like Akamai BMP or DataDome) often avoid sending 403 Forbidden responses to known bots, as this confirms to the bot operator that their fingerprint is burned. Instead, they simply drop the TCP connection at the firewall level. If you are routing through a proxy, the proxy interprets this dropped connection as an upstream failure and returns a 502. If you consistently get 502s through your scraper but 200s in your local browser, you are most likely being stealth-blocked.

04 How DataFlirt handles it

We do not treat 502s as generic retryable errors. Our proxy gateway intercepts 502s and evaluates the context. If the error originates from our exit node failing to establish a TLS session with the target, we flag the IP as burned for that specific domain, quarantine it, and seamlessly retry the request on a new IP with a rotated fingerprint. The client pipeline never sees the 502; it only sees a slightly delayed 200 OK.

05 Did you know?

A 502 Bad Gateway can sometimes be caused by a simple HTTP header mismatch. If your scraper sends a request with a payload but omits the Content-Length header, or sends a malformed Transfer-Encoding header, strict upstream servers will instantly terminate the connection, resulting in a 502 at the edge. Always ensure your HTTP client library is constructing RFC-compliant requests.
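As a rough illustration of the framing issues mentioned above, the following Python sketch checks a request's headers and body before it is sent. It covers only a simplified subset of the HTTP/1.1 message-framing rules, and the function name and messages are hypothetical; most client libraries set these headers correctly on their own, so a check like this mainly matters when headers are built by hand.

```python
from typing import List, Mapping, Optional


def framing_problems(headers: Mapping[str, str], body: Optional[bytes]) -> List[str]:
    """Return a list of framing issues found in an outgoing HTTP/1.1 request."""
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    problems: List[str] = []
    content_length = h.get("content-length")
    transfer_encoding = h.get("transfer-encoding")

    # A sender must not combine Content-Length with Transfer-Encoding.
    if content_length is not None and transfer_encoding is not None:
        problems.append("both Content-Length and Transfer-Encoding are set")

    # In a request, the final transfer coding has to be "chunked".
    if transfer_encoding is not None:
        codings = [coding.strip() for coding in transfer_encoding.split(",")]
        if codings[-1] != "chunked":
            problems.append("Transfer-Encoding does not end with chunked")

    # A body sent without chunking needs an accurate Content-Length.
    if body and transfer_encoding is None:
        if content_length is None:
            problems.append("body present but no Content-Length")
        elif not content_length.isdecimal() or int(content_length) != len(body):
            problems.append("Content-Length does not match body size")

    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that a strict upstream server will accept the request.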

// 03 — retry logic

When to retry
a 502 error?

Not all 502s are created equal. DataFlirt's proxy gateway analyzes the TCP state and response headers to determine if a 502 is a transient origin failure or a hard WAF block before triggering a retry.

Upstream Failure Rate: $U_{\text{fail}} = \dfrac{\text{502\_count}}{\text{total\_requests}}$
If $U_{\text{fail}} > 0.05$ across all IPs, the origin is likely down. If isolated to specific IPs, it's a block. (DataFlirt gateway heuristics)

Exponential Backoff: $T_{\text{wait}} = 2^{\text{attempt}} + \text{jitter}$
Standard retry delay for genuine 502s to avoid thundering herd problems. (Network resilience best practices)

Proxy Burn Rate: $B = \dfrac{\text{502\_blocks}}{\text{active\_ips}}$
High burn rate indicates your fingerprint or request rate is triggering upstream connection resets. (DataFlirt fleet monitoring)

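The three metrics above translate directly into code. The Python sketch below is illustrative only: the function names, the per-IP input shape, and the one-second jitter range are assumptions, and reading "across all IPs" as "every IP exceeds the 0.05 threshold" is one possible interpretation of the heuristic.

```python
import random
from typing import Dict, Tuple


def upstream_failure_rate(count_502: int, total_requests: int) -> float:
    """U_fail = 502_count / total_requests (0.0 when nothing was sent)."""
    return count_502 / total_requests if total_requests else 0.0


def backoff_seconds(attempt: int, max_jitter: float = 1.0) -> float:
    """T_wait = 2^attempt + jitter, with jitter drawn uniformly from [0, max_jitter]."""
    return 2 ** attempt + random.uniform(0.0, max_jitter)


def proxy_burn_rate(blocked_502: int, active_ips: int) -> float:
    """B = 502_blocks / active_ips (0.0 when the pool is empty)."""
    return blocked_502 / active_ips if active_ips else 0.0


def diagnose(per_ip: Dict[str, Tuple[int, int]], threshold: float = 0.05) -> str:
    """per_ip maps an exit IP to (count_502, total_requests)."""
    failing = [
        ip for ip, (bad, total) in per_ip.items()
        if upstream_failure_rate(bad, total) > threshold
    ]
    if not failing:
        return "healthy"
    # Every IP is failing: the origin itself is the likely culprit.
    if len(failing) == len(per_ip):
        return "origin_likely_down"
    # Only some IPs are failing: those IPs were probably blocked.
    return "ip_block"
```

In production you would likely cap the backoff and require a minimum sample size per IP before drawing conclusions; both are left out here to keep the sketch close to the formulas.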
// 04 — proxy gateway trace

A stealth block
masquerading as a 502.

A trace from a scraping worker hitting a target via a residential proxy. The target's WAF drops the connection due to a bad TLS fingerprint, causing the proxy gateway to return a 502 to the worker.

Proxy Gateway · WAF Drop · TCP RST
edge.dataflirt.io — live
CAPTURED
// 1. Worker initiates request to proxy gateway
worker.req: GET https://target.com/api/pricing
proxy.assign: res_ip_104.28.x.x

// 2. Proxy attempts upstream connection
upstream.dns: resolved 192.0.2.44
upstream.tcp: connected
upstream.tls: ClientHello sent (JA3: 771,4865...)

// 3. Target WAF detects bot fingerprint
upstream.recv: TCP RST (Connection reset by peer)
proxy.state: upstream_connection_failed

// 4. Proxy gateway responds to worker
response.status: 502 Bad Gateway
response.header: X-Proxy-Error: upstream_reset
worker.action: quarantine IP, rotate fingerprint, retry

// 05 — failure modes

Why the origin
dropped the ball.

Ranked by frequency of occurrence across DataFlirt's scraping pipelines. While genuine server overloads happen, connection resets triggered by anti-bot systems are the dominant cause of 502s in scraping.

PIPELINES MONITORED · 300+ active
502 INCIDENTS · 30d trailing
UPDATED · 2026-05-19

  01 WAF connection reset (stealth block): target firewall drops the TCP connection abruptly
  02 Origin server overload: target database locked or workers exhausted
  03 Proxy pool IP banned: target edge refuses to route the specific IP
  04 Malformed upstream response: origin sent invalid HTTP headers to the edge
  05 TLS negotiation failure: cipher mismatch between proxy and origin

// 06 — our architecture

Don't just retry,

rotate the exit node and back off.

Blindly retrying a 502 Bad Gateway on the same proxy IP is a great way to get that IP permanently banned. DataFlirt's infrastructure treats 502s as context-dependent signals. If the 502 comes from a public edge like Cloudflare, we assume the origin is genuinely struggling and apply exponential backoff. If the 502 comes from our own proxy gateway with an upstream_reset flag, we assume the target's WAF dropped the connection. We immediately quarantine the exit node and retry the request on a fresh residential IP with a different TLS fingerprint.
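The routing decision described in this paragraph can be approximated with a small decision function. This Python sketch is a simplified illustration, not DataFlirt's actual gateway logic: the RetryPlan type, the function name, and the conservative fallback for unrecognised 502s are all assumptions, while the X-Proxy-Error: upstream_reset header comes from the trace shown earlier.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetryPlan:
    quarantine_ip: bool        # pull the exit node out of rotation for this domain
    rotate_fingerprint: bool   # retry with a different TLS fingerprint
    backoff: bool              # wait with exponential backoff before retrying


def plan_for_502(headers: Mapping[str, str]) -> RetryPlan:
    """Choose a retry strategy for a 502 based on where it came from."""
    h = {name.lower(): value.lower() for name, value in headers.items()}

    # Our own gateway saw the upstream reset: treat it as a WAF stealth block.
    if h.get("x-proxy-error") == "upstream_reset":
        return RetryPlan(quarantine_ip=True, rotate_fingerprint=True, backoff=False)

    # A public edge such as Cloudflare returned the 502: assume a struggling origin.
    if "cloudflare" in h.get("server", ""):
        return RetryPlan(quarantine_ip=False, rotate_fingerprint=False, backoff=True)

    # Assumption: when the source is unclear, rotate the IP and back off.
    return RetryPlan(quarantine_ip=True, rotate_fingerprint=False, backoff=True)
```

Keeping the plan as plain data makes it easy to log alongside the 502 and to audit later why a given exit node was quarantined.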

proxy-gateway.log

Real-time routing decision after a 502 upstream failure.

event.type upstream_502
target.host api.target-ecommerce.com
exit_node.ip 104.28.x.x (quarantined)
failure.reason tcp_connection_reset
heuristic.eval waf_stealth_block
retry.strategy rotate_ip_and_fingerprint
retry.status 200 OK

// 07 — FAQ

Common
questions.

Common questions about diagnosing, handling, and mitigating 502 Bad Gateway errors in production scraping pipelines.

What is the difference between a 502 and a 504 error?
A 502 Bad Gateway means the edge server received an invalid response (or a connection reset) from the origin. A 504 Gateway Timeout means the edge server received no response from the origin within its allowed time window. 502s are often active rejections; 504s are usually passive timeouts.
Should my scraper automatically retry a 502 error?
Yes, but conditionally. If you are using a proxy network, a 502 often means that specific proxy IP failed to connect. You should retry, but only after rotating to a new IP. Retrying a 502 on the exact same IP and fingerprint usually results in another 502 and increases your risk of a subnet ban.
Why do I get 502s when scraping, but the site works fine in my browser?
This is the classic signature of a stealth block. The target's Web Application Firewall (WAF) detects your scraper's TLS fingerprint, missing headers, or datacenter IP. Instead of serving a 403 Forbidden or a CAPTCHA, it simply drops the TCP connection. Your proxy server sees the dropped connection and returns a 502 to your script.
Can a 502 be caused by scraping too fast?
Absolutely. If you exceed the target origin's database or worker capacity, the origin will crash or hang, causing the load balancer to return a 502. This is why DataFlirt enforces strict concurrency limits based on target capacity. Crashing the target is not just bad etiquette; it halts your data extraction.
How does DataFlirt distinguish between a dead site and a blocked proxy?
We monitor the error distribution across our fleet. If a target returns 502s across 100 different residential IPs simultaneously, the origin is down. If it returns 502s on 5 IPs but 200 OKs on the other 95, those 5 IPs were blocked. Our scheduler automatically quarantines the blocked IPs and routes traffic to the healthy ones.
How do 502s impact pipeline SLAs?
Transient 502s are absorbed by our retry queues and rarely impact delivery SLAs. However, if a target deploys a new WAF rule that causes a fleet-wide spike in 502s, our anomaly detection pauses the pipeline within minutes. We patch the fingerprinting logic and resume the crawl, typically resolving the issue well within the 24-hour SLA window.