
What is SFTP Data Delivery?

SFTP data delivery is the secure, file-based transfer of extracted datasets from a scraping pipeline directly into a client's internal storage infrastructure. While APIs and cloud buckets dominate modern data engineering, SFTP remains the enterprise standard for legacy system integration, strict firewall compliance, and batch-oriented ingestion. If your downstream consumer is an on-premise data warehouse or a heavily regulated financial system, SFTP is often the only approved bridge between the public web and your private network.

Data Delivery · Batch Processing · Enterprise Integration · SSH · File Transfer
// 02 — definitions

The enterprise
bridge.

Why the oldest secure file transfer protocol is still the mandatory delivery mechanism for heavily regulated data pipelines.


TL;DR

SFTP (SSH File Transfer Protocol) pushes flat files—usually CSV, JSON Lines, or Parquet—over an encrypted SSH tunnel. It lacks the real-time streaming capabilities of Kafka or the native cloud integration of S3, but its simplicity, firewall-friendly nature, and ubiquitous enterprise support make it the default for batch data delivery to banks, insurers, and legacy ERPs.

01 Definition & structure
SFTP data delivery is the process of pushing extracted data files to a client's server using the SSH File Transfer Protocol. Unlike APIs that require the client to pull data, or cloud buckets that require IAM role configuration, SFTP is a push mechanism. The scraping provider authenticates via an SSH key, opens a secure tunnel, and writes flat files (CSV, JSONL, Parquet) directly to a specified directory on the client's infrastructure.
02 How it works in practice
A typical SFTP delivery pipeline runs on a cron schedule. Once the data extraction and validation phases are complete, the pipeline compresses the output into a single file. A delivery worker connects to the client's SFTP server using a pre-shared Ed25519 or RSA key, uploads the file to an /inbound directory, and verifies the transfer. The client's internal systems then monitor that directory, pick up the new file, and ingest it into their data warehouse.
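The preparation step that precedes the upload can be sketched with the Python standard library alone. This is a minimal illustration, not DataFlirt's actual worker: the function name `prepare_payload` and the default gzip level are assumptions made for the example.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def prepare_payload(source: Path, level: int = 6) -> tuple[Path, str]:
    """Gzip `source` next to itself and return (archive path, SHA-256 hex digest)."""
    archive = source.with_name(source.name + ".gz")
    # Stream the copy so large batches never have to fit in memory.
    with source.open("rb") as raw, gzip.open(archive, "wb", compresslevel=level) as packed:
        shutil.copyfileobj(raw, packed)

    # Hash the compressed bytes, since that is what actually travels over SFTP.
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()
```

The digest can be logged alongside the delivery, or shipped as a sidecar file, so the receiving side can confirm it ingested exactly the bytes that were sent.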
03 The security and firewall advantage
SFTP's primary advantage is its simplicity from a network security perspective. It operates entirely over a single port (typically TCP 22). This makes it trivial for enterprise security teams to configure strict firewall rules: they simply allow inbound traffic on port 22 from the scraping provider's static IP addresses, and block everything else. There are no complex IAM policies, cross-account trusts, or dynamic port ranges to manage.
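The rule described above amounts to a default-deny check with a single exception. The sketch below expresses that logic in Python purely for illustration. Real enforcement lives in the firewall itself, and the address range shown is a hypothetical placeholder taken from the RFC 5737 documentation block.

```python
import ipaddress

# Hypothetical provider egress range (RFC 5737 documentation addresses).
ALLOWED_SOURCES = [ipaddress.ip_network("203.0.113.16/29")]
SFTP_PORT = 22


def is_allowed(source_ip: str, dest_port: int) -> bool:
    """Default-deny: accept only SFTP traffic from allowlisted source addresses."""
    if dest_port != SFTP_PORT:
        return False
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_SOURCES)
```

Everything that is not port 22 from a listed address is rejected, which is what makes the policy easy to audit.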
04 How DataFlirt handles it
We treat SFTP delivery as a first-class citizen alongside S3 and Snowflake. Our delivery workers use static egress IPs for easy whitelisting, enforce atomic renames to prevent partial reads, and automatically compress payloads to minimize transfer time. If your server is down during a scheduled delivery, our infrastructure automatically queues the payload and applies exponential backoff retries, alerting your team if the outage exceeds the SLA window.
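A retry loop with exponential backoff might look like the following sketch. The base delay, cap, and attempt count are illustrative assumptions rather than DataFlirt's production values, and `push` stands in for whatever callable performs the upload.

```python
import random
import time


def backoff_delays(base: float, cap: float, attempts: int):
    """Yield capped exponential delays: base, 2*base, 4*base, ... never above cap."""
    for n in range(attempts):
        yield min(cap, base * 2 ** n)


def deliver_with_retry(push, *, base=30.0, cap=3600.0, attempts=8, sleep=time.sleep):
    """Call push() until it succeeds or attempts run out, then raise RuntimeError."""
    last_error = None
    for attempt, delay in enumerate(backoff_delays(base, cap, attempts), start=1):
        try:
            return push()
        except OSError as exc:  # refused, timed out, or unreachable server
            last_error = exc
            if attempt < attempts:
                # Full jitter: sleep a random amount up to the current delay.
                sleep(random.uniform(0, delay))
    raise RuntimeError(f"delivery failed after {attempts} attempts") from last_error
```

The final `RuntimeError` is the point where a real worker would queue the payload and raise an alert. A production version would also catch the exception types of its SSH library, which are not necessarily subclasses of `OSError`.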
05 SFTP vs FTPS vs FTP
Despite the similar names, these are three different protocols. FTP is unencrypted and obsolete. FTPS is FTP over SSL/TLS; it is secure but notoriously difficult to route through firewalls because it uses dynamic ports for data transfer. SFTP is a subsystem of SSH; it uses a single port, provides robust encryption, and is the modern standard for secure file transfer. Never accept FTPS when SFTP is an option.
// 03 — delivery metrics

How fast is
SFTP delivery?

SFTP throughput is bounded by network latency, SSH encryption overhead, and disk I/O on the receiving server. DataFlirt monitors these metrics to ensure batch SLAs are met.

Effective Throughput: T_eff = WindowSize / (RTT + CryptoOverhead)
SSH encryption limits single-thread speed. High latency kills throughput. Source: Network Engineering
Delivery Latency: L = ExtractTime + WriteTime + (FileSize / T_eff)
Total time from pipeline completion to client availability. Source: DataFlirt SLA model
Compression Ratio: C = RawSize / GzipSize
Compressing JSONL before SFTP transfer typically yields 6x–8x speedups. Source: DataFlirt pipeline defaults
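To make the three formulas concrete, the sketch below evaluates them for one set of inputs. All numbers are illustrative assumptions (a 1 MB window, 80 ms round trip, 20 ms crypto overhead, a 7 GB file that gzips to 1 GB), not measurements from any real pipeline.

```python
def effective_throughput(window_bytes: float, rtt_s: float, crypto_overhead_s: float) -> float:
    """T_eff = WindowSize / (RTT + CryptoOverhead), in bytes per second."""
    return window_bytes / (rtt_s + crypto_overhead_s)


def delivery_latency(extract_s: float, write_s: float, file_bytes: float, t_eff: float) -> float:
    """L = ExtractTime + WriteTime + FileSize / T_eff, in seconds."""
    return extract_s + write_s + file_bytes / t_eff


def compression_ratio(raw_bytes: float, gzip_bytes: float) -> float:
    """C = RawSize / GzipSize."""
    return raw_bytes / gzip_bytes


# Illustrative inputs only.
t_eff = effective_throughput(1_000_000, 0.080, 0.020)  # roughly 10 MB/s
raw, packed = 7_000_000_000, 1_000_000_000             # hypothetical raw and gzipped sizes
ratio = compression_ratio(raw, packed)                 # 7x smaller
saved_s = (raw - packed) / t_eff                       # transfer seconds saved by compressing
```

With these inputs the compressed file transfers in about 100 seconds instead of about 700. Halving the round-trip time would raise `t_eff` by far more than trimming the crypto overhead, which is why latency dominates on long-haul links.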
// 04 — the transfer log

Pushing 12GB of
pricing data.

A standard automated SFTP push from DataFlirt's delivery workers to a client's on-premise ingestion server.

SSH-2.0 · Ed25519 · AES-256-GCM
edge.dataflirt.io — live
CAPTURED
// connection init
ssh.connect: "sftp.client-corp.internal:22"
ssh.auth: "publickey (ed25519)"
ssh.status: Authenticated

// file preparation
file.source: "s3://df-out/batch_992.jsonl.gz"
file.size: 12,405,112 KB
file.checksum: "sha256:8f4e...2b1a"

// transfer execution
sftp.put: "/inbound/pricing/batch_992.jsonl.gz.tmp"
sftp.progress: 100% (12.4 MB/s)
sftp.rename: "/inbound/pricing/batch_992.jsonl.gz"
sftp.verify: Transfer complete

// cleanup
delivery.status: SLA met (T-minus 14m)
ssh.disconnect: "Session closed"
// 05 — failure modes

Where SFTP
deliveries fail.

SFTP is robust, but the infrastructure surrounding it is brittle. The failure modes below are ranked by frequency of delivery alerts across DataFlirt's enterprise pipelines.

PIPELINES MONITORED · 140+ enterprise
DELIVERY WINDOW · Daily batch
UPDATED · 2026-05-19

01 Firewall / IP Whitelist drops
Network layer · Client rotated IPs without notice

02 Disk full on target
Storage layer · Client ingestion stalled, disk filled up

03 Public key rotation mismatch
Auth layer · Expired credentials or revoked keys

04 Partial file ingestion
Race condition · Client reading before rename completes

05 SSH connection timeouts
Network layer · Aggressive load balancers killing idle connections

// 06 — DataFlirt's delivery architecture

Atomic writes,

zero partial reads.

One of the most damaging SFTP failures is not network-related. It is a race condition in which the client's cron job starts reading a file before the scraper finishes writing it. DataFlirt prevents this using atomic delivery: we upload the dataset with a .tmp extension, verify the checksum, and issue an SFTP rename only when the transfer is 100% complete. As long as your ingestion scripts skip .tmp files, they never see a partial file.
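The upload-then-rename pattern can be sketched against any client object that exposes `put`, `stat`, and `posix_rename`, which are methods that `paramiko.SFTPClient` provides. This is an assumption-laden illustration rather than DataFlirt's implementation. The helper name `atomic_put` is hypothetical, a size comparison stands in for the checksum verification because SFTP has no standard remote-checksum operation, and `posix_rename` only works when the server supports the OpenSSH `posix-rename` extension.

```python
import os
from typing import Protocol


class SftpLike(Protocol):
    """The subset of paramiko.SFTPClient methods this sketch relies on."""

    def put(self, localpath: str, remotepath: str): ...
    def stat(self, path: str): ...
    def posix_rename(self, oldpath: str, newpath: str): ...


def atomic_put(sftp: SftpLike, local_path: str, remote_path: str) -> None:
    """Upload to `<remote_path>.tmp`, verify the size, then rename into place."""
    tmp_path = remote_path + ".tmp"
    sftp.put(local_path, tmp_path)

    # Stand-in verification: compare byte counts before exposing the file.
    remote_size = sftp.stat(tmp_path).st_size
    local_size = os.path.getsize(local_path)
    if remote_size != local_size:
        raise IOError(f"size mismatch: sent {local_size}, server has {remote_size}")

    # The final name only appears once the upload is complete and verified.
    sftp.posix_rename(tmp_path, remote_path)
```

Because the function depends only on the three methods named in `SftpLike`, it can be exercised with an in-memory fake in tests and with a real SFTP session in production.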

sftp-delivery.yml

Delivery configuration for a daily enterprise batch.

target.host: sftp.enterprise.com:2222
auth.method: ed25519_key
write.strategy: atomic_rename
compression: gzip_level_6
retry.policy: exponential_backoff
alert.timeout: 15m
delivery.status: active


// 07 — FAQ

Common
questions.

About SFTP delivery, enterprise integration, security, and how DataFlirt ensures reliable batch transfers.

Why use SFTP instead of S3 or GCS?
Compliance and legacy infrastructure. Many enterprise environments, especially in banking, healthcare, and insurance, have strict data perimeter rules that prohibit pulling from external cloud buckets. With SFTP, they open a single, highly monitored port (22) to a whitelisted IP address, and the provider pushes the data directly into their secure zone.
How do you handle partial file reads?
We use atomic renames. The file is uploaded as data.csv.tmp. Once the transfer is complete and the checksum is verified, we issue an SFTP rename request to change it to data.csv. Because a rename within a single directory is atomic on POSIX filesystems, ingestion scripts that ignore .tmp files never pick up a half-written file.
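On the receiving side, the matching rule is to select files by their final suffix only. The following is a hypothetical client-side sketch. The function name `ready_files` and the `.csv` default are assumptions chosen to match the example file names in this answer.

```python
from pathlib import Path


def ready_files(inbound: Path, suffix: str = ".csv") -> list[Path]:
    """Return completed deliveries, ignoring in-flight uploads such as data.csv.tmp."""
    done = [p for p in inbound.iterdir() if p.is_file() and p.name.endswith(suffix)]
    return sorted(done)
```

A file named `data.csv.tmp` ends in `.tmp`, not `.csv`, so it is skipped until the rename gives it its final name.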
Is SFTP secure enough for sensitive data?
Yes, for data in transit: the transfer is encrypted via SSH, which protects against eavesdropping and, provided the server's host key is verified, against man-in-the-middle attacks. However, SFTP does not encrypt the data at rest on the target server. If you are handling PII, the files themselves should be encrypted (e.g., with OpenPGP via GPG) before they are pushed over the SFTP connection.
How does DataFlirt handle IP whitelisting for SFTP?
We route all SFTP delivery traffic through a set of dedicated, static egress IPs. We provide these IPs during onboarding so your network security team can whitelist them on your corporate firewall. We never rotate these egress IPs without a 30-day notice.
What happens if our SFTP server goes down?
DataFlirt's delivery workers use exponential backoff to retry the connection. If the server remains unreachable after the configured retry window (typically 4–12 hours), the dataset is held in a secure dead-letter queue, and an automated alert is sent to your engineering team. Once your server is back online, the queue automatically flushes.
Can SFTP handle real-time data streaming?
No. SFTP is fundamentally a batch-oriented protocol. While you can simulate near-real-time delivery by pushing micro-batches (e.g., every 5 minutes), the overhead of establishing SSH connections and managing thousands of tiny files makes it highly inefficient. For real-time data, webhooks or Kafka are the correct architectural choices.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h