
What is IP Whitelisting?

IP whitelisting is a default-deny network security model where access to an API, database, or server is restricted to a predefined list of trusted IP addresses. In data extraction pipelines, it is the standard mechanism for securing egress sinks — ensuring that when a scraper writes records directly to your internal data warehouse or webhook endpoint, the traffic is provably originating from your scraping infrastructure and not the public internet.

Network Security · Egress · CIDR · Data Delivery · Zero Trust
// 02 — definitions

Trust by
address.

Why static IP assignments remain the bedrock of secure data delivery, even in modern zero-trust architectures.


TL;DR

IP whitelisting flips the standard web model from "allow all, block bad" to "block all, allow known." For data engineering teams, it's the non-negotiable first layer of defense when exposing internal endpoints to external scraping vendors like DataFlirt.

01 Definition & structure
IP whitelisting (or allowlisting) is a network-layer security control that operates on a default-deny principle. Instead of trying to identify and block malicious traffic, the firewall drops all incoming connections by default, only permitting traffic from a specific list of IP addresses or CIDR blocks. It operates at Layer 3 (Network) or Layer 4 (Transport) of the OSI model, making it extremely fast and computationally cheap to enforce.
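A minimal sketch of the default-deny rule, using Python's standard `ipaddress` module. The allowlist entries are documentation-range addresses chosen for illustration, and `is_allowed` is a hypothetical helper rather than part of any firewall product.

```python
import ipaddress

# Hypothetical allowlist: one single host and one /29 block (documentation-range addresses).
ALLOWLIST = [
    ipaddress.ip_network("203.0.113.42/32"),
    ipaddress.ip_network("198.51.100.8/29"),
]

def is_allowed(source_ip: str) -> bool:
    """Default deny: permit only if the source IP falls inside a listed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWLIST)

print(is_allowed("203.0.113.42"))   # listed host
print(is_allowed("198.51.100.13"))  # inside the /29 block
print(is_allowed("192.0.2.7"))      # not listed, so denied
```

Nothing in the function tries to recognise bad traffic; anything that does not match a listed network is simply refused.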
02 How it works in practice
When a scraping pipeline finishes extracting data, it often needs to push that data to a client's internal system (like a REST API or a database). When the request reaches the client's firewall, load balancer, or WAF, the source IP in the packet header is checked against the whitelist. If the IP is listed, the traffic is passed on to the application. If it is not, a network-layer firewall drops the connection outright, while a WAF typically rejects the request with a 403 Forbidden; either way, the application never sees the payload.
03 The ingress vs egress divide
A common point of confusion in data engineering is the difference between ingress and egress IPs. Ingress is the scraper fetching data from a target website, which relies on a large pool of rotating residential IPs to avoid anti-bot blocks. Egress is the scraper delivering the extracted data to you, which requires a static IP or a small fixed block so you can whitelist it. You can never whitelist a scraper's ingress proxy pool.
04 How DataFlirt handles it
We maintain strict physical and logical separation between our scraping network and our delivery network. All data delivery traffic from DataFlirt pipelines is routed through dedicated NAT gateways with static Elastic IPs. For enterprise clients, we provision isolated /29 CIDR blocks that are never shared with other tenants, ensuring your firewall rules are tightly scoped to your specific pipeline traffic.
05 Did you know?
While IP spoofing (forging the source IP address in a packet) is possible, it is practically useless for establishing a TCP connection — like an HTTP webhook delivery. Because TCP requires a three-way handshake (SYN, SYN-ACK, ACK), the server's response goes to the spoofed IP, not the attacker, meaning the attacker can never complete the handshake to send the actual data payload.
// 03 — the math

Calculating
allowlist capacity.

CIDR notation defines the size of the IP block you need to whitelist. DataFlirt provisions dedicated /29 or /28 egress blocks for enterprise pipelines to ensure stable delivery.

Usable IPs in CIDR = $2^{32 - \text{prefix}} - 2$
Subtract the network and broadcast addresses. (RFC 4632)
DataFlirt /29 Egress Block = $2^{32 - 29} - 2 = 6$ IPs
Standard dedicated delivery pool. (DataFlirt Infra)
Delivery Success Rate = whitelisted_writes / total_delivery_attempts
Drops below 1.0 if IPs rotate without updating the firewall. (Pipeline SLO)
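The same arithmetic can be checked in Python. This is a sketch under assumptions: `usable_ips` and `delivery_success_rate` are illustrative helper names, the block `203.0.113.40/29` is a documentation-range example rather than a real DataFlirt range, and the counts passed to the success-rate function are made up.

```python
import ipaddress

def usable_ips(prefix: int) -> int:
    """Usable IPv4 addresses in a block: 2^(32 - prefix) - 2 (meaningful for prefixes up to /30)."""
    return 2 ** (32 - prefix) - 2

def delivery_success_rate(whitelisted_writes: int, total_delivery_attempts: int) -> float:
    """Fraction of delivery attempts that passed the firewall and were written."""
    if total_delivery_attempts == 0:
        return 0.0
    return whitelisted_writes / total_delivery_attempts

block = ipaddress.ip_network("203.0.113.40/29")  # example block only

print(usable_ips(29))              # formula result for a /29
print(len(list(block.hosts())))    # cross-check against the standard library
print(usable_ips(28))              # formula result for a /28
print(delivery_success_rate(97, 100))
```

`hosts()` excludes the network and broadcast addresses, so for a /29 it should agree with the formula.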
// 04 — firewall logs

A webhook delivery,
blocked then allowed.

What happens when a scraping pipeline attempts to push data to a client's API before and after the egress IP is added to the AWS WAF allowlist.

AWS WAF · Webhook · CIDR /32
edge.dataflirt.io — live
CAPTURED
// Attempt 1: IP not whitelisted
src_ip: "203.0.113.42"
dest: "api.client.com/v1/ingest"
waf.rule: "Default_Deny_All"
action: BLOCK // HTTP 403 Forbidden

// Infra update: Client adds DataFlirt egress IP
aws.waf.update_ipset: "DataFlirt_Egress"
added_cidr: "203.0.113.42/32"

// Attempt 2: Retry queue fires
src_ip: "203.0.113.42"
dest: "api.client.com/v1/ingest"
waf.rule: "Allow_DataFlirt_Egress"
action: ALLOW
response: 201 Created // 1,400 records written
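The blocked-then-allowed sequence in the log can be reproduced with a toy in-memory model. This is only an illustration: `Firewall` and `deliver_with_retry` are hypothetical names, the status codes stand in for real HTTP responses, and nothing here calls AWS WAF.

```python
import ipaddress

class Firewall:
    """Toy default-deny firewall holding a mutable list of allowed networks."""

    def __init__(self):
        self.allowed = []

    def add_cidr(self, cidr: str) -> None:
        self.allowed.append(ipaddress.ip_network(cidr))

    def handle(self, src_ip: str) -> int:
        addr = ipaddress.ip_address(src_ip)
        if any(addr in net for net in self.allowed):
            return 201  # request reaches the API and records are written
        return 403      # blocked by the default-deny rule

def deliver_with_retry(firewall, src_ip, max_attempts, between_attempts=None):
    """Retry delivery until accepted or attempts run out; return the status codes seen."""
    statuses = []
    for attempt in range(max_attempts):
        status = firewall.handle(src_ip)
        statuses.append(status)
        if status == 201:
            break
        if between_attempts is not None:
            between_attempts(attempt)  # stand-in for the wait before the retry queue fires
    return statuses

fw = Firewall()
# The client adds the egress IP after the first blocked attempt.
statuses = deliver_with_retry(
    fw,
    "203.0.113.42",
    max_attempts=3,
    between_attempts=lambda attempt: fw.add_cidr("203.0.113.42/32") if attempt == 0 else None,
)
print(statuses)  # [403, 201]
```

A real retry queue would add backoff and a cap on total wait time; the point here is only that the same source IP flips from blocked to allowed once the allowlist changes.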
// 05 — implementation risks

Where whitelisting
breaks pipelines.

Static IPs are secure but brittle. When data delivery fails due to network layer blocks, it is almost always an operational breakdown in IP lifecycle management.

DELIVERY FAILURES · 12% network-related
ROOT CAUSE · IP rotation
UPDATED · 2026-05-19
01 Unannounced IP rotation
Silent failure · Vendor changes egress IPs without notifying client.

02 Dynamic IP environments
Architecture mismatch · Serverless scrapers lack static IPs by default.

03 Over-permissive CIDRs
Security risk · Whitelisting a broad cloud-provider range (for example a /16) instead of a specific NAT gateway.

04 Stale IP retention
Security risk · Failing to remove IPs after a vendor contract ends.

05 Proxy pool leakage
Data contamination · Accidentally delivering data through the scraping proxy pool instead of the egress NAT.
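Two of these risks, over-permissive CIDRs and stale IP retention, lend themselves to a periodic audit. The sketch below assumes a hypothetical record format (CIDR, owner, contract end date) and an arbitrary threshold of /28 as the widest acceptable block; both are illustrative choices, not DataFlirt policy.

```python
import ipaddress
from datetime import date

# Hypothetical allowlist records: CIDR, owner, and contract end date (None = active).
ENTRIES = [
    {"cidr": "203.0.113.40/29", "owner": "vendor-a", "contract_end": None},
    {"cidr": "198.18.0.0/15", "owner": "vendor-b", "contract_end": None},
    {"cidr": "192.0.2.10/32", "owner": "vendor-c", "contract_end": date(2025, 1, 31)},
]

def audit(entries, today, min_prefix=28):
    """Return (cidr, problem) pairs for over-broad or stale entries."""
    findings = []
    for entry in entries:
        network = ipaddress.ip_network(entry["cidr"])
        # A shorter prefix means a larger block; flag anything wider than the threshold.
        if network.prefixlen < min_prefix:
            findings.append((entry["cidr"], "over-permissive"))
        end = entry["contract_end"]
        if end is not None and end < today:
            findings.append((entry["cidr"], "stale"))
    return findings

for cidr, problem in audit(ENTRIES, today=date(2026, 5, 19)):
    print(cidr, problem)
```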
// 06 — DataFlirt's egress architecture

Scrape dynamic,

deliver static.

Scraping requires millions of rotating residential IPs to avoid detection. Delivering that data requires the exact opposite: a stable, predictable, static IP that your firewall can trust. DataFlirt strictly separates these two network paths. Our workers fetch target data through dynamic proxy pools, but when it's time to push the structured records to your Snowflake instance or REST API, the traffic is routed through a dedicated, static NAT gateway assigned exclusively to your pipeline.

Egress network routing

Traffic flow for a pipeline pushing data to a client's internal API.

worker.fetch_ip: dynamic residential pool
target.response: 200 OK
internal.routing: VPC peering to egress NAT
egress.static_ip: 203.0.113.42
client.firewall: IP match
delivery.status: records written


// 07 — FAQ

Common
questions.

About static IPs, serverless delivery, securing webhooks, and how DataFlirt manages egress networking.

Why can't I just whitelist your scraping proxies?
You shouldn't, and you can't. Scraping proxies rotate constantly, often using millions of residential IPs. Whitelisting them means whitelisting a significant chunk of the public internet. Data delivery must always happen through a separate, static egress gateway.
How does DataFlirt provide static IPs if you run on serverless infrastructure?
We route all outbound delivery traffic from our serverless workers through a managed NAT Gateway with an attached Elastic IP (EIP). The worker scales dynamically, but the client-facing IP remains absolutely static.
Is IP whitelisting enough to secure a webhook?
No. IP whitelisting prevents unauthorized network access, but IPs can theoretically be spoofed (though difficult over TCP) or reassigned. You should always combine IP whitelisting with application-layer authentication, such as HMAC signatures or bearer tokens.
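A minimal sketch of the HMAC layer, using Python's standard `hmac` and `hashlib` modules. The shared secret, the payload, and the idea of sending the signature alongside the request are assumptions for illustration; the source does not specify DataFlirt's actual signing scheme.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Sender side: compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature)

secret = b"example-shared-secret"      # hypothetical; exchanged out of band
body = b'{"records": 1400}'            # hypothetical webhook payload
signature = sign(secret, body)

print(verify(secret, body, signature))                   # untouched payload
print(verify(secret, b'{"records": 9999}', signature))   # tampered payload
```

The receiver runs this check after the firewall has already admitted the source IP, so a request has to pass both layers before it is trusted.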
What happens if DataFlirt needs to change its egress IP?
We provide a 14-day advance notice for any planned egress IP changes. For enterprise clients, we provision dedicated IPs that are never shared with other tenants and persist for the lifetime of the contract.
Can we whitelist a domain name instead of an IP?
Some modern firewalls support FQDN (Fully Qualified Domain Name) whitelisting, but it relies on DNS resolution which can introduce latency and caching issues. IP-based rules at the network layer (Layer 3/4) are faster, more reliable, and universally supported.
Do I need to whitelist IPs if I'm pulling data from DataFlirt's S3 buckets?
If you are pulling data, you don't need to whitelist our IPs on your end. However, you can configure the S3 bucket policy to only allow reads from your own corporate IPs, adding an extra layer of security to the data at rest.
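One way such a bucket policy might look, built as a Python dictionary and serialised to JSON. The bucket name and CIDR are placeholders, and `read_only_from_ips_policy` is a hypothetical helper; the policy relies on the `aws:SourceIp` condition key, which applies to requests arriving over public IP addresses and not to requests made through a VPC endpoint.

```python
import json

def read_only_from_ips_policy(bucket, allowed_cidrs):
    """Build a bucket policy that denies object reads from outside the given CIDRs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadsOutsideAllowedIPs",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Deny whenever the caller's public IP is NOT in the allowed list.
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }

# Placeholder values for illustration only.
policy = read_only_from_ips_policy("example-delivery-bucket", ["198.51.100.0/24"])
print(json.dumps(policy, indent=2))
```

A deny statement like this only narrows access; the readers still need a separate allow grant through IAM or the bucket policy.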