
What are Prometheus metrics?

Prometheus metrics are the time-series telemetry data exposed by scraping workers, proxy gateways, and extraction pipelines to track system health in real time. In a scraping context, they move observability from a binary "did it run?" to a dimensional "what is the 95th percentile latency of our residential proxy pool targeting this specific ASN?" Without them, pipeline failures are silent and debugging is guesswork.

Observability · Time-Series · Alerting · Infrastructure · Grafana
// 02 — definitions

Measure the machine.

How scraping fleets expose their internal state to a central time-series database, enabling real-time alerting before data quality degrades.


TL;DR

Prometheus is a pull-based monitoring system that scrapes HTTP endpoints exposed by your scraping workers. It stores metrics as time-series data with key-value labels, allowing engineers to slice block rates by target domain, proxy provider, or scraper version. It is the de facto standard for observability in scraping infrastructure.

01 Definition & structure
Prometheus metrics are numerical measurements recorded over time. In a scraping architecture, every worker, proxy router, and queue manager exposes an HTTP endpoint (usually /metrics) containing its current state. A central Prometheus server periodically pulls this data, storing it as a time series. Each metric has a name and a set of key-value labels (e.g., status="200", domain="example.com") that allow engineers to filter and aggregate the data.
02 How it works in practice
When a Python or Node.js scraper runs, it uses a Prometheus client library to increment counters (e.g., requests made) or update gauges (e.g., memory used). On a fixed scrape interval (15 seconds in this setup), the Prometheus server requests the worker's /metrics endpoint and ingests the current values. Separately, on its rule evaluation interval, it evaluates alerting rules against the stored series. If a rule's condition holds for its configured duration (for example, an error rate above 5% for three consecutive minutes), Prometheus sends the alert to Alertmanager, which routes it to Slack or PagerDuty.
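Client libraries hide most of this, but the model is small enough to sketch. The stdlib-only Python below is a hypothetical stand-in for a real client such as prometheus_client (the `Counter`, `Gauge` and `render` names here are illustrative, not that library's API): counters only increase, gauges are set to a current value, and the registry is serialized into the text format the Prometheus server pulls on each scrape.

```python
from collections import defaultdict


class _Metric:
    kind = "untyped"

    def __init__(self, name, help_text, label_names=()):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self.values = defaultdict(float)  # label-value tuple -> current value


class Counter(_Metric):
    kind = "counter"

    def inc(self, *label_values, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.values[tuple(label_values)] += amount


class Gauge(_Metric):
    kind = "gauge"

    def set(self, value, *label_values):
        self.values[tuple(label_values)] = value


def _fmt(value):
    # Print whole numbers without a trailing ".0" for readability.
    value = float(value)
    return str(int(value)) if value.is_integer() else repr(value)


def render(metrics):
    """Serialize metrics in the Prometheus text exposition format.

    Simplification: label values are not escaped here; real clients escape
    backslashes, double quotes and newlines.
    """
    lines = []
    for metric in metrics:
        lines.append(f"# HELP {metric.name} {metric.help_text}")
        lines.append(f"# TYPE {metric.name} {metric.kind}")
        for label_values, value in sorted(metric.values.items()):
            pairs = ",".join(
                f'{k}="{v}"' for k, v in zip(metric.label_names, label_values)
            )
            labels = "{" + pairs + "}" if pairs else ""
            lines.append(f"{metric.name}{labels} {_fmt(value)}")
    return "\n".join(lines) + "\n"


# Usage: the worker updates metrics as it runs; /metrics would return render(...).
requests_total = Counter(
    "scrape_requests_total", "Total HTTP requests made by the worker", ["target", "status"]
)
memory_bytes = Gauge("worker_memory_bytes", "Current RAM usage in bytes")
requests_total.inc("example.com", "200")
requests_total.inc("example.com", "200")
requests_total.inc("example.com", "403")
memory_bytes.set(482344960)
print(render([requests_total, memory_bytes]))
```

In a real worker you would serve the rendered text from an HTTP handler at /metrics (for example with `http.server`), or let the official client library handle both the bookkeeping and the endpoint.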
03 The four metric types
Prometheus defines four core metric types. Counters track cumulative events (total pages scraped). Gauges track current state (active headless browsers). Histograms count observations into configurable buckets (proxy latency), from which the server can estimate quantiles such as the 95th percentile. Summaries also track distributions but calculate quantiles on the client side: this makes them cheap to query but more expensive for the worker to maintain, and their precomputed quantiles cannot be meaningfully aggregated across multiple workers.
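Because histogram buckets are plain cumulative counters, they can be summed across workers before a quantile is estimated. The sketch below is a simplified Python take on PromQL's `histogram_quantile()` (linear interpolation inside the bucket that holds the target rank); the bucket counts are invented, not taken from a real worker.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified PromQL histogram_quantile(): find the bucket holding rank
    q * total and interpolate linearly inside it. Buckets must be sorted by
    upper bound and end with the +Inf bucket (math.inf).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must be +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                return prev_bound  # rank is beyond the last finite bucket
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented latency buckets from two workers; buckets add up across the fleet.
worker_a = [(0.25, 30), (0.5, 45), (1.0, 55), (math.inf, 57)]
worker_b = [(0.25, 20), (0.5, 35), (1.0, 41), (math.inf, 43)]
fleet = [(bound, a + b) for (bound, a), (_, b) in zip(worker_a, worker_b)]
print(histogram_quantile(0.95, fleet))  # 0.96875 seconds
```

Adding two workers' buckets and then estimating gives a fleet-wide p95; adding two workers' precomputed summary p95 values does not, which is the aggregation limit of summaries.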
04 How DataFlirt handles it
We instrument every layer of our stack. Our proxy gateways expose metrics on TLS handshake times and ASN success rates. Our extraction workers expose metrics on schema validation failures and DOM parsing times. This telemetry flows into a centralized VictoriaMetrics cluster (a horizontally scalable, Prometheus-compatible time-series database), powering our internal Grafana dashboards and driving the auto-scaling logic that spins up new workers when queue depth crosses a threshold.
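This page does not describe DataFlirt's actual scaling logic. As a sketch under stated assumptions, a controller that reads the latest `scraper_queue_depth` value could apply a two-threshold rule like the one below; the function name and every threshold, step and limit are hypothetical placeholders.

```python
def next_worker_count(current, queue_depth, scale_up_at=10_000, scale_down_at=1_000,
                      step=2, min_workers=1, max_workers=200):
    """Hypothetical threshold rule: grow above one depth, shrink below another.

    The gap between the two thresholds is a dead band, so the fleet does not
    flap up and down when the queue hovers around a single value.
    """
    if scale_down_at >= scale_up_at:
        raise ValueError("scale_down_at must be below scale_up_at")
    if queue_depth > scale_up_at:
        target = current + step
    elif queue_depth < scale_down_at:
        target = current - step
    else:
        target = current
    return max(min_workers, min(max_workers, target))


# Feed it the latest scraper_queue_depth value on each control-loop tick.
print(next_worker_count(current=10, queue_depth=25_000))  # 12
```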
05 The silent failure trap
A common mistake is monitoring HTTP status codes but ignoring extraction success. If a target site changes its layout, your scraper might still receive a 200 OK but extract only null values. To Prometheus, the pipeline looks perfectly healthy. You must instrument business-logic metrics, such as fields_extracted_total or null_value_ratio, to catch schema drift before you deliver empty datasets to your clients.
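A minimal sketch of those business-logic metrics, assuming each extracted record arrives as a dict and that `None` or an empty string means the field failed to extract. The helper name and the required-field list are hypothetical; a worker would add the first returned value to fields_extracted_total and set null_value_ratio to the second.

```python
def extraction_quality(records, required_fields):
    """Return (fields_extracted, null_value_ratio) for one batch of records.

    A field counts as extracted when it is present and neither None nor "".
    An empty batch returns a ratio of 0.0, so track record counts separately.
    """
    expected = len(records) * len(required_fields)
    extracted = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    null_ratio = 1 - extracted / expected if expected else 0.0
    return extracted, null_ratio


# A layout change: HTTP 200 on every page, but prices stopped parsing.
batch = [
    {"title": "Desk lamp", "price": 24.99},
    {"title": "Office chair", "price": None},
    {"title": "", "price": None},
]
print(extraction_quality(batch, ["title", "price"]))  # (3, 0.5)
```

Every response in that batch could be a 200 OK, yet half the expected fields are missing, which is exactly the signal a status-code dashboard never shows.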
// 03 — the queries

How we query pipeline health.

PromQL (Prometheus Query Language) allows us to aggregate millions of data points across the fleet. These are the baseline queries DataFlirt uses to monitor pipeline stability.

Global Error Rate = sum(rate(scrape_requests_total{status=~"5.."}[5m])) / sum(rate(scrape_requests_total[5m]))
Fraction of requests returning 5xx over a 5-minute rolling window (multiply by 100 for a percentage). Source: standard PromQL.
Proxy Block Rate = sum by (provider) (rate(proxy_responses_total{status="403"}[1m]))
Per-second rate of 403 Forbidden responses for each proxy provider, used to detect IP bans. Source: DataFlirt proxy router.
Queue Backlog Growth = deriv(scraper_queue_depth[10m]) > 0
Alerts if the URL queue is growing, i.e. new URLs arrive faster than workers can process them. Source: DataFlirt scheduler alerts.
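These queries lean on two PromQL functions whose behaviour is easy to misread. The stdlib Python sketch below approximates them on a single series: `simple_rate` is a simplified `rate()` (it handles counter resets but does not extrapolate to the window edges, so its numbers will differ slightly from Prometheus), and `deriv` fits a least-squares line as PromQL's `deriv()` does. The sample values are made up, and the error-rate result is a fraction between 0 and 1.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    Simplified version of PromQL rate(): a drop in value is treated as a counter
    reset (the process restarted from zero), but unlike Prometheus it does not
    extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        return 0.0  # Prometheus would return no result here
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def deriv(samples):
    """Per-second slope of a gauge via simple linear regression, as PromQL deriv() does."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var if var else 0.0


# Hypothetical samples scraped once a minute over a 5-minute window.
all_requests = [(0, 1000), (60, 1600), (120, 2200), (180, 2800), (240, 3400), (300, 4000)]
errors_5xx = [(0, 10), (60, 13), (120, 16), (180, 19), (240, 22), (300, 25)]
error_ratio = simple_rate(errors_5xx) / simple_rate(all_requests)  # fraction, not percent

queue_depth = [(0, 1000), (60, 1120), (120, 1240), (180, 1360)]
backlog_growing = deriv(queue_depth) > 0  # queue grows by 2 URLs per second
```

The `sum()` in the PromQL version adds these per-series rates across every worker before dividing, which is why the ratio stays meaningful for the whole fleet.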
// 04 — the /metrics endpoint

What a worker exposes.

The response to a raw HTTP GET request against a DataFlirt scraping worker's Prometheus endpoint. The Prometheus server pulls and parses this text format every 15 seconds.

Content-Type: text/plain · Prometheus exposition format · port 9090 · edge.dataflirt.io (captured live)
# HELP scrape_requests_total Total HTTP requests made by the worker
# TYPE scrape_requests_total counter
scrape_requests_total{target="example.com",status="200"} 14582
scrape_requests_total{target="example.com",status="403"} 12

# HELP proxy_latency_seconds Histogram of proxy response times
# TYPE proxy_latency_seconds histogram
proxy_latency_seconds_bucket{provider="res_pool_a",le="0.5"} 8430
proxy_latency_seconds_bucket{provider="res_pool_a",le="1.0"} 12045
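proxy_latency_seconds_sum{provider="res_pool_a"} 6421.5
# note: a complete histogram also exposes a le="+Inf" bucket and proxy_latency_seconds_count

# HELP worker_memory_bytes Current RAM usage in bytes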
# TYPE worker_memory_bytes gauge
worker_memory_bytes 482344960
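To see how little structure the format has, here is a hypothetical stdlib parser for sample lines like the ones above. It is a sketch, not a conforming implementation: it skips HELP/TYPE metadata and other comments, ignores optional timestamps, and does not unescape label values.

```python
import re

# One sample line: metric name, optional {label="value",...} block, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse Prometheus text-format sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, # HELP, # TYPE, or a plain comment
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))  # float() accepts +Inf and NaN
    return samples


sample = """# HELP scrape_requests_total Total HTTP requests made by the worker
# TYPE scrape_requests_total counter
scrape_requests_total{target="example.com",status="200"} 14582
scrape_requests_total{target="example.com",status="403"} 12
worker_memory_bytes 482344960
"""
for name, labels, value in parse_exposition(sample):
    print(name, labels, value)
```

For real tooling, the official Python client ships a maintained parser, `prometheus_client.parser.text_string_to_metric_families`.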
// 05 — key telemetry

The metrics that actually matter.

Not all metrics are created equal. In a distributed scraping environment, these are the five signals that most reliably predict pipeline failure before data delivery is impacted.

Metrics retained: 15 days
Scrape interval: 15s
Updated: 2026-05-19
01 Block Rate (403/429) · critical · Immediate indicator of anti-bot detection or IP pool exhaustion.
02 Queue Depth · capacity · Measures throughput vs capacity. Growing queues mean missed SLAs.
03 Extraction Success Rate · quality · Tracks schema drift. 200 OKs mean nothing if fields are null.
04 Proxy Latency (p95) · network · Early warning for proxy network congestion or routing issues.
05 Worker Memory Usage · system · Detects memory leaks in headless browser instances.
// 06 — observability stack

High cardinality, without the explosion.

Scraping generates massive cardinality. If you label every metric with the specific URL scraped, every new URL creates new time series, and your Prometheus instance can run out of memory and crash within hours. DataFlirt aggregates metrics at the domain, pipeline, and proxy-provider level. We use Prometheus for real-time alerting and short-term operational dashboards, while pushing raw, high-cardinality execution logs to ClickHouse for deep forensic analysis. Metrics tell us something is broken; logs tell us exactly what.
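A quick way to see why URL labels are dangerous is to count distinct label sets, since each distinct set is its own time series. The sketch below uses made-up traffic (3 domains, 1,000 URLs each) and a hypothetical `bounded_labels` helper that keeps only the hostname.

```python
from urllib.parse import urlsplit


def bounded_labels(url, status):
    """Keep bounded dimensions only: the hostname, never the full URL."""
    return {"domain": urlsplit(url).hostname or "unknown", "status": str(status)}


def active_series(label_sets):
    """Each distinct label combination is one time series for the metric."""
    return len({tuple(sorted(labels.items())) for labels in label_sets})


# Made-up traffic: 3 domains x 1,000 pages, every tenth request blocked.
requests_seen = [
    (f"https://site{d}.com/p/{i}", 403 if i % 10 == 0 else 200)
    for d in range(3)
    for i in range(1000)
]
per_url = [{"url": url, "status": str(status)} for url, status in requests_seen]
per_domain = [bounded_labels(url, status) for url, status in requests_seen]

print(active_series(per_url))     # 3000 series, and it grows with every new URL
print(active_series(per_domain))  # 6 series: 3 domains x 2 status codes
```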

Prometheus Alerting Rule

A critical alert configuration for proxy pool health.

alert.name HighProxyBlockRate
expr sum by (provider) (rate(proxy_responses_total{status="403"}[5m])) / sum by (provider) (rate(proxy_responses_total[5m])) > 0.05
for 3m
labels.severity critical
annotations.summary Proxy pool {{ $labels.provider }} is burning.
action.webhook trigger_pool_rotation (delivered through an Alertmanager webhook receiver)
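The `for 3m` line is what keeps a single bad evaluation from triggering remediation. A minimal Python sketch of the inactive, pending and firing progression, assuming one series and a one-minute rule evaluation interval (both assumptions), looks like this; the ratio values are invented.

```python
def alert_states(evaluations, for_seconds):
    """Replay rule evaluations and return the alert state after each one.

    evaluations: (timestamp_seconds, condition_is_true) pairs in time order.
    Simplified model of a Prometheus alerting rule with a `for:` clause:
    a true condition makes the alert pending, and it fires once the condition
    has held for at least for_seconds. Ignores keep_firing_for and
    multiple label sets.
    """
    states = []
    active_since = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Invented 403 ratios for one provider, one rule evaluation per minute.
block_ratios = [0.01, 0.07, 0.08, 0.09, 0.11, 0.02]
evaluations = [(i * 60, ratio > 0.05) for i, ratio in enumerate(block_ratios)]
print(alert_states(evaluations, for_seconds=180))
# ['inactive', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

Only the firing state is sent to Alertmanager, which then decides which receiver (such as a webhook) gets it.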


// 07 — FAQ

Common questions.

About Prometheus architecture, cardinality limits, alerting strategies, and how DataFlirt monitors scraping fleets at scale.

What is cardinality explosion in Prometheus?
Cardinality is the number of unique time series stored. Every unique combination of labels creates a new series. If you add a url label to your metrics and scrape 10 million unique URLs, you create 10 million time series. This will consume all available RAM and crash Prometheus. Always label by bounded dimensions like domain or worker_id, never by unbounded dimensions like url or session_id.
Why use Prometheus instead of just logging errors?
Logs are expensive to aggregate in real time. If you want to know the 5-minute rolling error rate across 1,000 workers, querying a log database requires scanning millions of text lines. Prometheus metrics are pre-aggregated in memory, making that exact query return in milliseconds. Metrics are for alerting; logs are for debugging.
How do short-lived scrapers report metrics?
Prometheus is a pull-based system: it expects to scrape a long-running HTTP server. If you run ephemeral scrapers (like AWS Lambda functions or short cron jobs), they may exit before Prometheus ever scrapes them. The usual workaround is to push their metrics to a Prometheus Pushgateway, which keeps the last pushed values and exposes them for the main Prometheus server to scrape. Pushed values persist until they are replaced or deleted.
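Under the hood, a push is a single HTTP request. The sketch below builds it with the standard library; the gateway host, job name and grouping label are hypothetical, and 9091 is the Pushgateway's default port. The official Python client wraps the same call as `prometheus_client.push_to_gateway`.

```python
import urllib.request


def build_push_request(gateway, job, body, grouping=None, method="PUT"):
    """Build (but do not send) a push to a Prometheus Pushgateway.

    Metrics go to /metrics/job/<job>/<label>/<value>...; PUT replaces every
    metric in that group, POST replaces only metrics with the same names.
    Values containing "/" would need the Pushgateway's base64 encoding,
    which this sketch does not implement.
    """
    parts = [job]
    for name, value in (grouping or {}).items():
        parts += [name, value]
    if any("/" in part or not part for part in parts):
        raise ValueError("job and grouping labels must be non-empty and contain no '/'")
    path = "/metrics/job/" + "/".join(parts)
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# A cron-based scraper pushes its final counters just before exiting.
payload = "# TYPE scrape_requests_total counter\nscrape_requests_total 42\n"
req = build_push_request(
    "http://pushgateway.internal:9091", "nightly_crawl", payload, {"instance": "cron-7"}
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```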
How does DataFlirt handle alerting?
We route Prometheus alerts through Alertmanager to PagerDuty. We rely heavily on automated remediation: when a proxy pool's 403 rate exceeds 5%, a webhook automatically rotates the pool or shifts traffic to a different ASN. If the automated rotation fails to resolve the spike within 5 minutes, it pages an on-call engineer.
What is the difference between a Counter and a Gauge?
A Counter is a metric that only goes up (it resets to zero only when the process restarts), like scrape_requests_total. You use the rate() function in PromQL to see how fast it's increasing; rate() compensates for those resets. A Gauge is a metric that can go up and down, like worker_memory_bytes or active_browser_tabs. You query Gauges directly to see the current state.
How long should I retain scraping metrics?
15 to 30 days is standard for operational metrics in Prometheus (its default retention is 15 days). Short retention keeps queries fast and storage small. For long-term trend analysis, like tracking block rates over a year to negotiate proxy contracts, you should downsample the data and ship it to a long-term storage system like Thanos, Cortex, or VictoriaMetrics.
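Downsampling trades resolution for history. A minimal sketch, with made-up block-ratio samples, averages values into fixed windows; long-term stores such as Thanos do something richer (keeping several aggregates per window), so treat this as the idea rather than any particular system's algorithm.

```python
from collections import defaultdict


def downsample(samples, step_seconds):
    """Average (timestamp_seconds, value) samples into fixed windows.

    Each output point is stamped with the start of its window. Averaging fits
    gauges and precomputed ratios; for a raw counter you would keep the last
    value in each window instead, so rate() still works on the result.
    """
    windows = defaultdict(list)
    for ts, value in samples:
        windows[ts - ts % step_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(windows.items())]


# Made-up 403 ratios sampled every 150 s, rolled up into 5-minute points.
raw = [(0, 0.02), (150, 0.04), (300, 0.06), (450, 0.10)]
print(downsample(raw, step_seconds=300))
```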