
Flight deal data,
at warehouse scale.

We extract error fares, route matrices, pricing signals, and booking links from Secretflying. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Deals extracted: 412/day
Error fares: 28/week
Route updates: 3,104/24h
Active pipelines: 18
Uptime: 99.98%

Data Dictionary

Every field we extract from secretflying.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Flight Deals objects from secretflying.com. All fields typed and schema-versioned.

Fields: deal_id, title, origin, destination, price, currency, airline, cabin_class, travel_dates, post_date, is_error_fare, url

Sample flight_deals record:
{
  "deal_id": "sf-84921",
  "title": "New York to Paris for $210 roundtrip",
  "origin": "JFK",
  "destination": "CDG",
  "price": 210.0,
  "currency": "USD",
  "airline": "Air France",
  "is_error_fare": false
}

Complete list of extractable fields for Booking Links objects from secretflying.com. All fields typed and schema-versioned.

Fields: deal_id, ota_name, booking_url, price_at_ota, referral_params, platform, affiliate_network, is_active, scraped_at

Sample booking_links record:
{
  "deal_id": "sf-84921",
  "ota_name": "Skyscanner",
  "booking_url": "https://skyscanner.com/transport/flights/...",
  "price_at_ota": 210.0,
  "platform": "web",
  "is_active": true,
  "scraped_at": "2023-10-24T14:30:00Z"
}

Complete list of extractable fields for Route Matrix objects from secretflying.com. All fields typed and schema-versioned.

Fields: origin_airport, dest_airport, origin_city, dest_city, region, country, flight_type, stopovers, duration, deal_id

Sample route_matrix record:
{
  "origin_airport": "JFK",
  "dest_airport": "CDG",
  "origin_city": "New York",
  "dest_city": "Paris",
  "region": "Europe",
  "flight_type": "roundtrip",
  "stopovers": 0
}

Complete list of extractable fields for Deal Metadata objects from secretflying.com. All fields typed and schema-versioned.

Fields: deal_id, post_author, publish_timestamp, tags, categories, image_url, deal_description, expiry_status, view_count

Sample deal_metadata record:
{
  "deal_id": "sf-84921",
  "post_author": "Secret Flying Team",
  "publish_timestamp": "2023-10-24T12:00:00Z",
  "tags": "['Europe', 'Non-stop', 'SkyTeam']",
  "categories": "['USA Deals', 'Economy']",
  "expiry_status": "active"
}

Complete list of extractable fields for Error Fares objects from secretflying.com. All fields typed and schema-versioned.

Fields: deal_id, normal_price, error_price, discount_pct, risk_level, honoring_probability, airline_involved, detection_time, status

Sample error_fares record:
{
  "deal_id": "sf-84922",
  "normal_price": 1200.0,
  "error_price": 150.0,
  "discount_pct": 87.5,
  "risk_level": "high",
  "airline_involved": "British Airways",
  "status": "expired"
}
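
The discount_pct value in the sample error_fares record above matches the usual percentage-off formula. A minimal sketch, assuming discount_pct is the drop from normal_price to error_price relative to normal_price; the function name is illustrative and not part of the delivered schema:

```python
def discount_pct(normal_price: float, error_price: float) -> float:
    """Percentage drop from the normal fare to the error fare."""
    if normal_price <= 0:
        raise ValueError("normal_price must be positive")
    return round((normal_price - error_price) / normal_price * 100, 2)


# Values taken from the sample error_fares record.
print(discount_pct(1200.0, 150.0))  # 87.5
```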

Capabilities

Extract aviation pricing anomalies instantly

Our Secretflying scraper parses unstructured travel deals into normalised route matrices, extracting exact dates, origins, destinations, and pricing before the deals expire.

Error Fare Detection

Identify pricing anomalies and mistake fares immediately upon publication, delivered via low-latency webhooks.

Origin/Destination Mapping

Parse unstructured text to map origins and destinations to standard IATA airport codes.

Travel Date Parsing

Extract complex date ranges (e.g., 'Jan - Mar 2024') into structured ISO-8601 start and end dates for database ingestion.

OTA Link Extraction

Capture outbound booking links to Skyscanner, Kayak, and direct airlines, including referral parameters.

Airline & Alliance Intel

Identify operating carriers, codeshares, and alliance networks associated with each published deal.

Cabin Class Identification

Categorise deals into Economy, Premium Economy, Business, and First Class based on post metadata.

Expired Deal Tracking

Monitor active deals and update status flags when prices jump or error fares are corrected.

Multi-Region Support

Filter and route deals based on departure regions: Euro, US, Asia, or global feeds.

High-Frequency Polling

Run continuous extraction at sub-minute intervals to ensure no flash sale is missed.
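
To make the Travel Date Parsing capability above concrete, here is a minimal sketch of turning a month range such as 'Jan - Mar 2024' into ISO-8601 start and end dates. It uses a plain regular expression, not the NLP models described on this page, and it assumes an English month range followed by a four-digit year. The function name and the choice to cover whole months are assumptions made for illustration.

```python
import calendar
import re
from datetime import date

# English month abbreviations, hard-coded to avoid locale-dependent lookups.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Matches "Jan - Mar 2024", "Jan-Mar 2024", or "January - March 2024".
RANGE_RE = re.compile(r"([A-Za-z]{3})[A-Za-z]*\s*-\s*([A-Za-z]{3})[A-Za-z]*\s+(\d{4})")


def parse_month_range(text: str) -> tuple[str, str]:
    """Return (start, end) ISO-8601 dates covering the whole months named."""
    match = RANGE_RE.search(text)
    if match is None:
        raise ValueError(f"no month range found in {text!r}")
    start_month = MONTHS.get(match.group(1).lower())
    end_month = MONTHS.get(match.group(2).lower())
    if start_month is None or end_month is None:
        raise ValueError(f"unknown month name in {text!r}")
    if end_month < start_month:
        # Assumption: ranges that wrap past December are out of scope here.
        raise ValueError(f"range wraps across a year boundary: {text!r}")
    year = int(match.group(3))
    last_day = calendar.monthrange(year, end_month)[1]
    return (
        date(year, start_month, 1).isoformat(),
        date(year, end_month, last_day).isoformat(),
    )


print(parse_month_range("Jan - Mar 2024"))  # ('2024-01-01', '2024-03-31')
```

Vaguer phrases such as 'Late November' need more than a regex, which is where a trained model or a larger rule set would come in.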

// engagement pipeline

From deal feed to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide target regions, deal types, or specific alert criteria. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for secretflying.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, price-outlier detection, and sample deal parsing before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
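
The null-rate and price-outlier checks named in the Validation & QA step can be sketched as follows. This is an illustration under stated assumptions, not the production checks: records are plain dicts, and outliers are flagged with a modified z-score based on the median absolute deviation (MAD), a common robust choice that this page does not itself specify.

```python
from statistics import median


def null_rate(records: list[dict], field: str) -> float:
    """Fraction of records in which `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)


def price_outliers(records: list[dict], threshold: float = 3.5) -> list[float]:
    """Return prices whose MAD-based modified z-score exceeds `threshold`."""
    prices = [r["price"] for r in records if r.get("price") is not None]
    if len(prices) < 3:
        return []  # too few points to judge
    centre = median(prices)
    mad = median([abs(p - centre) for p in prices])
    if mad == 0:
        return []  # no spread around the median, so nothing to compare against
    return [p for p in prices if 0.6745 * abs(p - centre) / mad > threshold]


# Hypothetical batch: five plausible fares, one suspicious fare, one null.
batch = [{"price": p} for p in (210.0, 230.0, 250.0, 220.0, 240.0, 9.0)]
batch.append({"price": None})
print(round(null_rate(batch, "price"), 3))  # 0.143
print(price_outliers(batch))                # [9.0]
```

For this feed a flagged price is not necessarily bad data, since a genuine error fare looks like an outlier too. A check like this is better suited to routing records for review than to dropping them.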

Under the hood

How our Secretflying pipeline handles the hard parts

Extracting unstructured travel deals requires more than simple HTTP requests. Here is how we normalise the data.

pipeline-monitor · secretflying.com · live ● active

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Bot Protection
Cloudflare bypass and residential IPs

Secretflying relies on Cloudflare to block automated traffic. We route requests through residential proxies and use Playwright to solve JS challenges, ensuring uninterrupted deal flow.

Unstructured Data
NLP for date and route parsing

Deals are often posted as unstructured text ('Fly from London to Tokyo for £300'). We use custom NER models to extract accurate IATA codes, prices, and travel date windows.

Link Unrolling
Resolving affiliate redirects

Booking links often pass through a chain of several redirects. We follow each HTTP redirect in turn to capture the final destination URL and pricing parameters on the OTA site.
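
The redirect-following step can be sketched independently of any HTTP client. In the example below, `next_location` is a hypothetical callable that returns the `Location` header for a URL, or `None` when the response is not a redirect. In practice it would wrap an HTTP request made with automatic redirects disabled. The hosts and parameters in the usage example are made up.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urljoin, urlsplit


def resolve_chain(
    url: str,
    next_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> list[str]:
    """Follow redirects from `url` and return every URL visited, in order."""
    chain = [url]
    for _ in range(max_hops + 1):
        location = next_location(chain[-1])
        if location is None:
            return chain  # reached a non-redirect response
        target = urljoin(chain[-1], location)  # Location may be relative
        if target in chain:
            raise ValueError(f"redirect loop at {target}")
        chain.append(target)
    raise ValueError(f"more than {max_hops} redirects starting from {url}")


# Hypothetical redirect table standing in for real HTTP responses.
hops = {
    "https://aff.example/go?id=1": "https://tracker.example/r",
    "https://tracker.example/r": "/flights?from=JFK&to=CDG&ref=abc",
}
chain = resolve_chain("https://aff.example/go?id=1", hops.get)
final_url = chain[-1]
print(final_url)  # https://tracker.example/flights?from=JFK&to=CDG&ref=abc
print(parse_qs(urlsplit(final_url).query))
# {'from': ['JFK'], 'to': ['CDG'], 'ref': ['abc']}
```

Keeping the whole chain, not just the last URL, preserves the intermediate affiliate hops for later inspection.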

Low Latency
Sub-minute polling for error fares

Error fares exist for hours, sometimes minutes. Our pipelines poll the feed at sub-minute intervals, using conditional requests (ETags) to minimise overhead, and push alerts within seconds of publication.
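
A conditional request sends the last seen `ETag` back in an `If-None-Match` header, so an unchanged feed costs a bodiless 304 response instead of a full download. The sketch below shows the idea with Python's standard library, assuming the server emits `ETag` headers. The function name and the injectable `opener` parameter are illustrative choices, not part of any DataFlirt interface.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_if_changed(url: str, etag: Optional[str], opener=urllib.request.urlopen):
    """Return (body, etag). body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if etag:
        # Ask the server to skip the body if the resource still has this tag.
        request.add_header("If-None-Match", etag)
    try:
        with opener(request, timeout=10) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the tag we already have
            return None, etag
        raise


# Hypothetical polling loop (not executed here):
#   etag = None
#   while True:
#       body, etag = fetch_if_changed(FEED_URL, etag)
#       if body is not None:
#           handle_new_feed(body)
#       time.sleep(30)
```

`urllib` reports a 304 as an `HTTPError`, which is why the unchanged case is handled in the `except` branch.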

Schema Stability
Resilient selectors for CMS changes

WordPress DOM structures change frequently with theme updates. We use multi-layer fallback chains targeting structured data (JSON-LD) and CSS to maintain pipeline integrity.
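
A fallback chain of this kind tries the most stable source first and drops to more brittle ones only when it yields nothing. The sketch below shows the pattern for a single field, the deal title. It assumes the page embeds a JSON-LD object with a `headline` property, and it uses a regex on the `<h1>` tag in place of CSS selectors to stay within the standard library. The function names are illustrative.

```python
import json
import re
from typing import Optional

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
H1_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL | re.IGNORECASE)


def title_from_json_ld(html: str) -> Optional[str]:
    """First choice: structured data, which tends to survive theme changes."""
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block; try the next one
        if isinstance(data, dict) and data.get("headline"):
            return data["headline"]
    return None


def title_from_h1(html: str) -> Optional[str]:
    """Fallback: visible markup, with any nested tags stripped."""
    match = H1_RE.search(html)
    if match is None:
        return None
    text = re.sub(r"<[^>]+>", "", match.group(1)).strip()
    return text or None


# Ordered from most to least stable.
EXTRACTORS = (title_from_json_ld, title_from_h1)


def extract_title(html: str) -> Optional[str]:
    for extractor in EXTRACTORS:
        value = extractor(html)
        if value:
            return value
    return None


page = '<h1 class="entry-title"><a href="#">London to Tokyo</a></h1>'
print(extract_title(page))  # London to Tokyo
```

Logging which extractor produced each value gives an early signal that the primary source has stopped matching.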

Applications

Who uses Secretflying data — and how

Teams across industries use secretflying.com data to build competitive products and smarter operations.

01
Travel Aggregators

Integrate error fares and flash sales directly into consumer-facing meta-search platforms to drive conversion.

02
Price Arbitrage

Travel agencies monitor error fares to build high-margin package deals before airlines correct the pricing.

03
Airline Competitive Intel

Revenue management teams track competitor flash sales and unfiled discount fares to adjust their own pricing models.

04
Consumer Deal Apps

Mobile applications ingest our webhook feed to send push notifications to users for specific route combinations.

05
Market Research

Analysts track historical discount trends to predict seasonal sales and route-specific price drops.

06
AI Training Data

ML teams use historical deal text and parsed outcomes to train travel-specific NLP extraction models.

Why DataFlirt

"Secretflying surfaces the most volatile pricing anomalies in aviation — but error fares vanish in hours unless you capture them programmatically."

Most teams underestimate the required infrastructure: capturing transient flight deals requires sub-minute polling, residential proxies to bypass Cloudflare, and custom NLP to parse unstructured travel dates. DataFlirt absorbs that complexity so your engineers can focus on the analysis — not the infrastructure.

Technical Spec

Secretflying scraper — technical capabilities

Everything supported by our secretflying.com scraper, from Cloudflare challenge handling and unstructured date parsing to affiliate link resolution and webhook alerts.

Cloudflare bypass [Supported]: Automated JS challenge resolution via Playwright and residential IPs
Unstructured date parsing [Supported]: Converts text like 'Jan-Mar' into structured date ranges
Affiliate link resolution [Supported]: Follows redirect chains to capture final OTA destination URLs
Error fare flagging [Supported]: Identifies deals explicitly marked or structured as mistake fares
Deal expiry tracking [Supported]: Monitors active deals and updates status when marked expired
Region filtering [Supported]: Filter feeds by US, Euro, or Rest of World origin points
Webhook alerts [Supported]: HTTP POST delivery within seconds of a new deal publication
Historical deal archive [Supported]: Extract years of past deal data for trend analysis
User comments extraction [Partial]: Dynamic Disqus/Facebook comment threads on deal pages
Newsletter exclusive deals [Partial]: Deals sent only via email and not published on the web feed

Infrastructure

Infrastructure powering the Secretflying pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and interaction flows. The two are combined via the scrapy-playwright download handler.
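
For readers unfamiliar with scrapy-playwright, the wiring is done through Scrapy settings. The fragment below shows the settings that the scrapy-playwright documentation describes for enabling it. It is a generic sketch, not DataFlirt's actual configuration.

```python
# settings.py fragment for a Scrapy project using scrapy-playwright.

# Route HTTP and HTTPS downloads through the Playwright-backed handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these in place, only requests that carry `meta={"playwright": True}` are rendered in a browser. Everything else goes through Scrapy's normal downloader, which keeps browser usage limited to pages that need it.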

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across IN/US/UK/DE regions. Rotation happens per request, with sticky sessions where required. IP reputation monitoring keeps blacklisted addresses out of the pool.
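
Per-request rotation with optional sticky sessions can be sketched as a small pool class. This is an illustrative model only: the class name, its methods, and the proxy addresses are hypothetical, and the real pool's reputation scoring is reduced here to an explicit `ban` call.

```python
import itertools
from typing import Optional


class ProxyPool:
    """Round-robin proxy rotation with sticky sessions and a ban list."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._size = len(proxies)
        self._cycle = itertools.cycle(proxies)
        self._sticky: dict[str, str] = {}
        self._banned: set[str] = set()

    def ban(self, proxy: str) -> None:
        """Take a proxy out of rotation and release sessions pinned to it."""
        self._banned.add(proxy)
        self._sticky = {s: p for s, p in self._sticky.items() if p != proxy}

    def get(self, session_id: Optional[str] = None) -> str:
        """Return the pinned proxy for a session, else the next healthy one."""
        if session_id is not None and session_id in self._sticky:
            return self._sticky[session_id]
        for _ in range(self._size):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                if session_id is not None:
                    self._sticky[session_id] = proxy
                return proxy
        raise RuntimeError("no healthy proxies left")


# Hypothetical proxy addresses.
pool = ProxyPool([
    "http://us-1.proxy.example:8000",
    "http://uk-1.proxy.example:8000",
    "http://de-1.proxy.example:8000",
])
print(pool.get())                  # http://us-1.proxy.example:8000
print(pool.get())                  # http://uk-1.proxy.example:8000
print(pool.get(session_id="job"))  # http://de-1.proxy.example:8000
print(pool.get(session_id="job"))  # http://de-1.proxy.example:8000
```

Sticky sessions matter when a challenge cookie is bound to the IP that solved it, because rotating mid-session would discard that clearance.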

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Legacy Excel format for offline business analyst workflows
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoints to query historical and active deals
BigQuery
Streamed directly into your dataset with schema auto-detect
// faq

Common questions.

About secretflying.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Secretflying legal?

Scraping publicly available flight deals is generally permissible. DataFlirt targets only public, non-authenticated deal data. We do not extract personal data or circumvent authentication walls.

How do you handle Cloudflare protection?

We use residential ISP proxies and full Playwright browser sessions with realistic TLS fingerprints to pass JS challenges and maintain high success rates.

Can you parse unstructured date ranges?

Yes. We use custom NLP models to convert text descriptions like 'Jan-Mar' or 'Late November' into structured date fields suitable for database querying.

How fast can you deliver error fares?

Our high-frequency pipelines poll the feed continuously. Webhook delivery ensures you receive the data within seconds of the deal being published on the site.

Do you unroll affiliate links?

Yes. We follow the HTTP redirect chains to extract the final OTA or airline URL, allowing you to bypass affiliate networks if required.

Can I filter by departure region?

Absolutely. Pipelines can be configured to only extract and deliver deals originating from specific regions, such as the US or Europe.

What is the minimum viable engagement?

Our smallest packages start with continuous monitoring of the global feed with webhook delivery. Contact us with your latency requirements for a scoped quote.

$ dataflirt scope --new-project --source=secretflying.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical archive of flight deals or a real-time webhook for error fares — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h