
What is a Dead Letter Queue?

A Dead Letter Queue (DLQ) is a secondary message queue where a scraping pipeline routes failed jobs, unparseable payloads, and jobs that have exhausted their retries. Instead of silently dropping records when a target site changes its DOM or an API returns malformed JSON, the pipeline isolates the failure. It's the safety net that ensures data loss is explicit, measurable, and recoverable once the underlying extraction logic is patched.

Message Queues · Kafka / RabbitMQ · Error Handling · Data Engineering · Fault Tolerance

// 02 — definitions

Where failed jobs go.

A dedicated holding pen for scraping tasks that cannot be processed successfully, preventing poison pills from blocking the main pipeline.


TL;DR

A Dead Letter Queue captures messages that fail processing after a defined number of retries. In scraping, this usually means a URL that consistently times out, an HTML payload that fails schema validation, or a blocked request. It allows the main crawler to keep moving while engineers inspect and replay the failures.

01 Definition & structure

A Dead Letter Queue (DLQ) is a specialized message queue used in distributed systems to store messages that cannot be processed successfully. In a scraping architecture, the main queue distributes URLs to workers. If a worker fails to fetch or parse the URL, it retries. If it fails repeatedly, or encounters a fatal error (like a missing CSS selector), the job is routed to the DLQ. This keeps the main queue clear of unprocessable tasks and provides a centralized location for debugging.

02 How it works in practice

When a scraping job is dead-lettered, the worker attaches metadata explaining why it failed: the exception stack trace, the number of attempts, the HTTP status code, and the proxy IP used. Crucially, robust pipelines also save the raw HTML or JSON payload to object storage (like S3) and include the URI in the DLQ message. This allows engineers to inspect the exact payload that caused the failure without having to send another request to the target site.
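As a rough illustration of the metadata described above, here is a minimal sketch of a DLQ message envelope. The `DeadLetter` class, its field names, and `build_dead_letter` are hypothetical and chosen for this example; they are not a standard format or DataFlirt's actual schema.

```python
import json
import traceback
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DeadLetter:
    # Hypothetical envelope; field names are illustrative, not a standard.
    job_id: str
    url: str
    reason: str                 # short machine-readable cause
    stack_trace: str            # exception text captured by the worker
    attempts: int               # how many times the job was tried
    http_status: Optional[int]  # None if no response was received
    proxy_ip: Optional[str]
    payload_uri: Optional[str]  # where the raw HTML/JSON was saved
    failed_at: str              # ISO 8601 timestamp, UTC


def build_dead_letter(job_id, url, reason, exc, attempts,
                      http_status=None, proxy_ip=None, payload_uri=None):
    """Serialize one failed job as a JSON string ready to publish to the DLQ."""
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    msg = DeadLetter(
        job_id=job_id,
        url=url,
        reason=reason,
        stack_trace=trace,
        attempts=attempts,
        http_status=http_status,
        proxy_ip=proxy_ip,
        payload_uri=payload_uri,
        failed_at=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(msg))
```

Keeping only the payload URI in the message, rather than the payload itself, keeps DLQ messages small; the raw response stays in object storage.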

03 Poison pill messages

Without a DLQ, pipelines are vulnerable to "poison pills." If a target site returns a malformed 5GB JSON response that causes the parsing library to run out of memory, the worker crashes. The message broker assumes the worker died, so it requeues the job. Another worker picks it up and crashes too. Left unchecked, the cycle can take down worker after worker until the cluster stalls. A DLQ paired with a strict maximum retry limit prevents this by quarantining the poison pill after a few attempts.
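One way to break the crash loop is to count a delivery before processing starts, so a worker that dies mid-job still leaves a record behind. The sketch below assumes a `deliveries` mapping backed by a durable store (a plain dict stands in for it here); `handle_delivery`, `MAX_DELIVERIES`, and the callback names are hypothetical. Many brokers track a delivery count themselves, in which case that counter would replace the dict.

```python
MAX_DELIVERIES = 3  # assumed budget; tune per pipeline


def handle_delivery(job_id, payload, deliveries, process, dead_letter):
    """Process one delivery, quarantining jobs that keep crashing workers.

    `deliveries` must outlive the worker process (a dict is used here only
    for illustration), otherwise a crash would reset the count.
    """
    # Count the delivery *before* processing so a crash is still recorded.
    deliveries[job_id] = deliveries.get(job_id, 0) + 1

    if deliveries[job_id] > MAX_DELIVERIES:
        dead_letter(job_id, payload, reason="max_deliveries_exceeded")
        deliveries.pop(job_id, None)
        return "dead_lettered"

    process(payload)  # may raise or crash the worker
    deliveries.pop(job_id, None)  # success: forget the counter
    return "processed"
```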

04 How DataFlirt handles it

We treat the DLQ as an operational backlog, not a graveyard. Every message routed to a DLQ triggers an aggregation script that groups failures by root cause (for example, 400 messages that failed because the .price-tag selector was missing). If a group exceeds our alerting threshold, an engineer is paged. We patch the extraction schema, deploy the update, and run a replay script that processes the DLQ messages against the cached S3 payloads. The data is recovered without re-fetching the target site.
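The grouping step can be approximated in a few lines. This is a minimal sketch, not the actual aggregation script: it assumes each DLQ message is a dict with a `reason` key and an optional `detail` key (for instance the missing selector), and both function names are made up for the example.

```python
from collections import Counter


def group_failures(messages):
    """Count DLQ messages per (reason, detail) pair."""
    return Counter((m["reason"], m.get("detail", "")) for m in messages)


def groups_over_threshold(messages, threshold):
    """Return the failure groups whose size exceeds the alerting threshold,
    largest first."""
    counts = group_failures(messages)
    return [(key, n) for key, n in counts.most_common() if n > threshold]
```

Grouping before alerting means one layout change produces one page for the on-call engineer rather than one per failed message.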

05 The infinite retry trap

A common anti-pattern in amateur scraping scripts is infinite retries on 403 Forbidden or 503 Service Unavailable errors. If the target has deployed a new anti-bot challenge, retrying infinitely will just burn through your proxy pool and rack up compute costs. A DLQ forces you to acknowledge that the environment has changed and requires human intervention, rather than blindly hammering a closed door.
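A bounded retry loop is the simplest guard against this trap. The sketch below assumes a caller-supplied `fetch(url)` that returns an HTTP status code; `fetch_with_budget`, `RetriesExhausted`, and the default of three attempts are illustrative choices, and the backoff delay between attempts is omitted for brevity.

```python
class RetriesExhausted(Exception):
    """Raised when a job has used up its retry budget."""


RETRYABLE_STATUSES = {403, 503}  # the two statuses discussed above


def fetch_with_budget(fetch, url, max_attempts=3):
    """Call fetch(url) at most max_attempts times.

    Returns (status, attempts_used) on the first non-retryable status.
    Raises RetriesExhausted so the caller can route the job to the DLQ.
    """
    last_status = None
    for attempt in range(1, max_attempts + 1):
        last_status = fetch(url)
        if last_status not in RETRYABLE_STATUSES:
            return last_status, attempt
    raise RetriesExhausted(
        f"{url}: status {last_status} after {max_attempts} attempts"
    )
```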

// 03 — routing logic

When does a job become dead?

Routing to a DLQ is deterministic. A job is dead-lettered either because it exhausted its retry budget, or because it encountered a fatal error that retries cannot fix.

DLQ Routing Condition: route to DLQ if (retries > max_retries) OR (error_type == FATAL)

Fatal errors (e.g., schema mismatch) bypass retries and go straight to the DLQ. (Standard queue topology)
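The routing condition translates directly into a small predicate. This sketch counts attempts made rather than retries, in the style of the `attempt: 3/3` trace format, so it assumes `max_attempts = max_retries + 1` and treats the budget as exhausted once `attempts` reaches `max_attempts`. The `ErrorType` enum and the function name are hypothetical.

```python
from enum import Enum


class ErrorType(Enum):
    TRANSIENT = "transient"  # e.g. timeouts, 5xx responses; worth retrying
    FATAL = "fatal"          # e.g. schema mismatch; retries cannot help


def should_dead_letter(attempts, max_attempts, error_type):
    """Return True when a failed job should go to the DLQ.

    attempts:     how many times the job has been tried, including this one
    max_attempts: the retry budget expressed as total attempts allowed
    """
    return attempts >= max_attempts or error_type is ErrorType.FATAL
```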

Retry Backoff: delay = base · 2^attempt + jitter

Exponential backoff applied before a message is finally dead-lettered. (Network resilience patterns)
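The backoff formula can be sketched as a single function. The uniform jitter range, the optional `cap`, and the injectable `rng` parameter are assumptions added for the example; the formula itself only specifies `base · 2^attempt + jitter`.

```python
import random


def backoff_delay(attempt, base=1.0, max_jitter=0.5, cap=None, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    delay = base * 2**attempt + jitter, where jitter is drawn uniformly
    from [0, max_jitter). `cap`, if given, bounds the result.
    """
    delay = base * (2 ** attempt) + rng() * max_jitter
    if cap is not None:
        delay = min(delay, cap)
    return delay
```

With `base=1.0` the delays grow roughly as 1s, 2s, 4s, 8s; the jitter spreads retries out so that workers which failed together do not all retry at the same instant.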

DLQ Volume Rate: V_dlq = dead_lettered_jobs / total_jobs

DataFlirt triggers an on-call alert if V_dlq exceeds 0.5% on any active pipeline. (DataFlirt pipeline SLO)
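As a worked example of the rate check, the sketch below computes V_dlq and compares it with the 0.5% threshold. The function names are hypothetical, and the choice to report a rate of zero for an idle pipeline is an assumption.

```python
ALERT_THRESHOLD = 0.005  # 0.5%, the threshold stated above


def dlq_rate(dead_lettered_jobs, total_jobs):
    """V_dlq: share of processed jobs that ended up in the DLQ."""
    if total_jobs == 0:
        return 0.0  # assumed: an idle pipeline has nothing to alert on
    return dead_lettered_jobs / total_jobs


def should_page(dead_lettered_jobs, total_jobs, threshold=ALERT_THRESHOLD):
    """True when the DLQ rate exceeds the alerting threshold."""
    return dlq_rate(dead_lettered_jobs, total_jobs) > threshold
```

For instance, 40 dead-lettered jobs out of 10,000 is a rate of 0.4% and stays quiet, while 1,204 out of 100,000 is about 1.2% and would page.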

// 04 — pipeline trace

Isolating a schema validation failure.

A worker fetches a product page, but the target site has changed its layout. The price selector fails. Instead of dropping the record, the worker saves the raw HTML and routes the job to the DLQ.

RabbitMQ · Schema Validation · S3 Payload Cache


// worker-04 processing job
job.id: "ext-7721-abc"
url: "https://target.com/product/9921"
fetch.status: 200 OK

// extraction phase
schema.validate: failed
error: "SelectorNotFound: .price-tag-main"
attempt: 3/3 // retries exhausted

// routing logic
action: "route_to_dlq"
dlq.topic: "scrape-failures-dlq"
dlq.reason: "schema_validation_failed"
dlq.payload_s3: "s3://df-raw-cache/ext-7721-abc.html.gz"

worker.status: ready for next job

// 05 — failure modes

Why jobs end up in the DLQ.

Ranked by frequency across DataFlirt's infrastructure. Schema drift is the primary driver of DLQ volume, as it represents a permanent failure that retries cannot resolve.

PIPELINES: 300+ active
DLQ RATE: < 0.5% avg
UPDATED: 2026-05-19

01 Schema drift / validation failure
fatal error · Target DOM changed, selectors broke

02 Persistent anti-bot blocks
retry exhausted · 403s or CAPTCHAs across all proxy rotations

03 Target server timeouts
retry exhausted · 504 Gateway Timeout after max backoff

04 Malformed API responses
fatal error · Truncated JSON or invalid XML payloads

05 Unresolvable DNS or connection errors
retry exhausted · NXDOMAIN or connection refused

// 06 — architecture

Never drop a payload, even if you can't parse it yet.

When a site deploys a new layout mid-crawl, the extraction layer will fail. If you drop the HTML, you lose the data forever. DataFlirt's architecture writes the raw response payload to S3 and routes the job metadata to a Dead Letter Queue. Once our engineers patch the selector, we replay the DLQ against the saved payloads. The client gets their data without us having to re-fetch the target, saving bandwidth and avoiding rate limits.
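The replay step can be sketched as a loop over DLQ messages that reads each saved payload and reruns the patched extractor. Everything here is illustrative: `replay`, the `payload_uri` key, and the two injected callables are assumptions, and `load_payload` stands in for whatever object-storage client the pipeline uses.

```python
def replay(dlq_messages, load_payload, extract):
    """Re-run extraction against saved payloads, without re-fetching.

    dlq_messages: iterable of dicts, each carrying a "payload_uri"
    load_payload: callable(uri) -> raw HTML/JSON text from object storage
    extract:      the patched extraction function; raises on failure
    Returns (recovered_records, still_failing_messages).
    """
    recovered, still_failing = [], []
    for msg in dlq_messages:
        raw = load_payload(msg["payload_uri"])
        try:
            recovered.append(extract(raw))
        except Exception as exc:
            # Keep the message for another round of debugging.
            still_failing.append({**msg, "replay_error": str(exc)})
    return recovered, still_failing
```

Messages that still fail after the patch are returned rather than dropped, so they can go back to the DLQ with the new error attached.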

DLQ Monitor

Live status of a dead letter queue for a manufacturing catalog pipeline.

dlq.topic mfg-catalog-dlq

messages.queued 1,204

primary.cause schema_drift

payload.storage s3://df-raw-cache/ · 100% hit rate

replay.status paused

engineer.assigned true


// 07 — FAQ

Common questions.

Common questions about queue topologies, error handling, and how DataFlirt recovers failed scraping jobs.


What is the difference between a retry queue and a DLQ?

A retry queue holds messages that failed temporarily (like a 502 Bad Gateway) and will be processed again after a delay. A DLQ holds messages that have either exhausted their retry budget or encountered a fatal error (like a schema validation failure). The DLQ requires human intervention or a code deployment to resolve.

Should I automatically replay messages from the DLQ?

No. If a message is in the DLQ, it means the automated systems already failed to process it. Replaying it without changing the code or the environment will just result in another failure, wasting compute and potentially triggering rate limits. Fix the underlying issue first, then replay.

How long should messages stay in the DLQ?

Typically 7 to 14 days. This provides enough time for an engineer to investigate the failure, patch the scraper, and replay the queue. If messages sit longer than that, the data they represent is usually too stale to be valuable, and they should be purged to save storage costs.
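A retention sweep along these lines might look like the sketch below. It assumes each message carries a timezone-aware ISO 8601 `failed_at` timestamp; the 14-day default is the upper end of the range above, and `split_expired` is a hypothetical helper. Brokers with built-in message TTLs can enforce the same policy without custom code.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # upper end of the 7 to 14 day range


def split_expired(messages, now=None, retention=RETENTION):
    """Partition DLQ messages into (keep, purge) by age.

    Each message needs a "failed_at" field holding an ISO 8601 timestamp
    with a UTC offset, e.g. "2026-05-01T00:00:00+00:00".
    """
    now = now or datetime.now(timezone.utc)
    keep, purge = [], []
    for msg in messages:
        failed_at = datetime.fromisoformat(msg["failed_at"])
        if now - failed_at > retention:
            purge.append(msg)
        else:
            keep.append(msg)
    return keep, purge
```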

Does DataFlirt charge for jobs that end up in the DLQ?

No. Our pricing is based on successfully extracted and delivered records. If a job hits the DLQ due to schema drift or an anti-bot block, we absorb the compute and proxy costs. We only bill for the data once the issue is patched and the record is successfully recovered.

What is a 'poison pill' in a scraping queue?

A poison pill is a message that consistently crashes the worker processing it (e.g., an out-of-memory error caused by an extremely large payload). If you don't have a DLQ, the message goes back to the main queue, gets picked up by another worker, and crashes that one too, which can eventually take down the whole cluster. The DLQ isolates poison pills.

How do you monitor the DLQ without getting alert fatigue?

We alert on the rate of DLQ routing, not absolute numbers. A few network timeouts hitting the DLQ are normal background noise. But if the DLQ routing rate suddenly spikes from 0.1% to 15%, it indicates a systemic issue like a site layout change or a new WAF rule. That triggers an immediate page to the on-call engineer.
