
CNTraveler data, at warehouse scale.

We extract hotel reviews, destination guides, restaurant recommendations, and Readers' Choice rankings from Condé Nast Traveler. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Hotels extracted: 32,410 per run
Destination guides: 8,941 per run
Editorial articles: 142K total
Active pipelines: 42
Uptime: 99.98%
Data Dictionary

Every field we extract from cntraveler.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Hotel Reviews objects from cntraveler.com. All fields typed and schema-versioned.

Fields: hotel_name, location, rating, editor_review, price_range, amenities, readers_choice_winner, gold_list_status, booking_url, image_urls

Sample hotel_reviews record:
{
  "hotel_name": "The Ritz-Carlton, Kyoto",
  "location": "Kyoto, Japan",
  "rating": 98.4,
  "price_range": "$$$$",
  "readers_choice_winner": true,
  "gold_list_status": true,
  "editor_review": "A riverside sanctuary blending traditional ryokan aesthetics with modern luxury."
}

Complete list of extractable fields for Destination Guides objects from cntraveler.com. All fields typed and schema-versioned.

Fields: destination_name, region, country, best_time_to_visit, currency, language, top_hotels, top_restaurants, things_to_do, author

Sample destination_guides record:
{
  "destination_name": "Amalfi Coast",
  "country": "Italy",
  "best_time_to_visit": "May to September",
  "top_hotels": ["Le Sirenuse", "Hotel Santa Caterina"],
  "top_restaurants": ["La Sponda", "Lo Scoglio"],
  "language": "Italian"
}

Complete list of extractable fields for Readers' Choice Awards objects from cntraveler.com. All fields typed and schema-versioned.

Fields: award_year, category, region, rank, entity_name, score, previous_rank, description, url

Sample readers_choice_awards record:
{
  "award_year": 2023,
  "category": "Top 50 Hotels in the World",
  "rank": 1,
  "entity_name": "Ballyfin",
  "score": 99.2,
  "previous_rank": 4
}

Complete list of extractable fields for Restaurant Reviews objects from cntraveler.com. All fields typed and schema-versioned.

Fields: restaurant_name, city, cuisine, price_tier, chef, must_order, atmosphere, editor_rating, address

Sample restaurant_reviews record:
{
  "restaurant_name": "Pujol",
  "city": "Mexico City",
  "cuisine": "Mexican",
  "price_tier": "$$$",
  "chef": "Enrique Olvera",
  "must_order": "Mole Madre"
}

Complete list of extractable fields for Editorial Articles objects from cntraveler.com. All fields typed and schema-versioned.

Fields: article_title, author, publish_date, update_date, category, tags, content_body, hero_image_url, related_articles

Sample editorial_articles record:
{
  "article_title": "The 21 Best Places to Go in 2024",
  "author": "CN Traveler Editors",
  "publish_date": "2023-11-15",
  "category": "Inspiration",
  "tags": ["Travel Guide", "2024"],
  "update_date": "2024-01-05"
}
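To make "typed and schema-versioned" concrete, here is a minimal Python sketch for the editorial_articles object: a dataclass with real date and list types, plus a parser that fails loudly when a field arrives in the wrong shape (for example, tags serialised as a string instead of a list). The class and function names, the schema_version label, and which fields are optional are assumptions for illustration, not DataFlirt's production schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional

# Hypothetical version label; a real pipeline would bump this on schema changes.
SCHEMA_VERSION = "editorial_articles/v1"


@dataclass(frozen=True)
class EditorialArticle:
    article_title: str
    author: str
    publish_date: date
    category: str
    tags: list[str] = field(default_factory=list)
    update_date: Optional[date] = None
    schema_version: str = SCHEMA_VERSION


def parse_article(raw: dict[str, Any]) -> EditorialArticle:
    """Coerce a raw scraped dict into a typed record, raising on malformed fields."""
    tags = raw.get("tags", [])
    # Reject stringified lists such as "['Travel Guide', '2024']".
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise TypeError(f"tags must be a list of strings, got {tags!r}")
    update = raw.get("update_date")
    return EditorialArticle(
        article_title=str(raw["article_title"]),
        author=str(raw["author"]),
        # date.fromisoformat raises ValueError on strings that are not ISO 8601 dates.
        publish_date=date.fromisoformat(raw["publish_date"]),
        category=str(raw["category"]),
        tags=list(tags),
        update_date=date.fromisoformat(update) if update else None,
    )
```

Rejecting a malformed record at parse time, rather than coercing it silently, is what lets downstream checks catch template changes early.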

Capabilities

Everything you need from Condé Nast Traveler

Our CNTraveler scraper handles every layer of the platform, extracting structured data from complex editorial layouts, infinite scroll feeds, and interactive maps.

Full Hotel Directory Extraction

Extract property details, editor reviews, amenities, and pricing tiers across all global regions.

Readers' Choice Data Mining

Capture historical and current rankings for hotels, resorts, cities, islands, and airlines.

Destination Guide Aggregation

Compile curated itineraries, best-time-to-visit recommendations, and local laws for thousands of cities.

Restaurant & Bar Curation

Extract editor-approved dining spots, signature dishes, and atmosphere descriptors.

Gold List & Hot List Tracking

Monitor the properties that make Condé Nast Traveler's highly coveted annual editor lists.

Cruise & Airline Ratings

Scrape detailed reviews of cruise itineraries, cabin classes, and airline lounge experiences.

Editorial Article Parsing

Extract full text, author metadata, publication dates, and embedded media from travel features.

High-Resolution Image Capture

Extract CDN links for professional photography galleries associated with properties and destinations.

Scheduled Content Syncing

Run one-off bulk exports, or configure continuous pipelines that pick up new content on a weekly cadence.
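For the High-Resolution Image Capture item above, one common step is picking the largest rendition from an img element's srcset. The sketch below assumes width descriptors (640w, 1280w) and deliberately avoids splitting on every comma, because some image CDNs put commas inside URL paths (Cloudinary-style w_640,c_limit transforms, for instance). The function name is hypothetical, and density descriptors (2x) are ignored.

```python
import re
from typing import Optional

# A candidate is a whitespace-free URL, whitespace, then "<digits>w" followed by
# a comma or the end of the string. Commas inside the URL itself are tolerated.
_CANDIDATE = re.compile(r"(\S+)\s+(\d+)w(?=\s*(?:,|$))")


def largest_srcset_candidate(srcset: str) -> Optional[str]:
    """Return the URL with the largest width descriptor, or None if there is none."""
    best_url, best_width = None, -1
    for url, width in _CANDIDATE.findall(srcset):
        # "a.jpg 320w,b.jpg 960w" (no space after the comma) leaves a leading comma.
        url = url.lstrip(",")
        if int(width) > best_width:
            best_url, best_width = url, int(width)
    return best_url
```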

// engagement pipeline

From URL list to warehouse record

Brief in. Clean data out.

Define Scope · Day 0

Provide destination URLs, hotel categories, or award years. We design the extraction schema together.

Pipeline Build · Days 2–4

We configure Scrapy crawlers, proxy rotation, and JavaScript rendering to handle media-heavy page loads.

Validation & QA · Days 4–6

Schema validation, null-rate checks, and sample data review before full launch.

Delivery · Ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on the agreed cadence.
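The null-rate check in the Validation & QA step can be sketched in plain Python. It treats a missing key, None, an empty string, or an empty list as null; the 5% default threshold and the function names are assumptions, not a documented DataFlirt limit.

```python
from typing import Any, Iterable, Mapping


def null_rates(records: Iterable[Mapping[str, Any]], fields: list[str]) -> dict[str, float]:
    """Share of records where each field is missing, None, "" or []."""
    counts = {f: 0 for f in fields}
    total = 0
    for rec in records:
        total += 1
        for f in fields:
            value = rec.get(f)
            if value is None or value == "" or value == []:
                counts[f] += 1
    if total == 0:
        return {f: 1.0 for f in fields}  # an empty batch counts as fully null
    return {f: counts[f] / total for f in fields}


def failing_fields(rates: Mapping[str, float], max_null_rate: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the threshold, sorted for stable reports."""
    return sorted(f for f, r in rates.items() if r > max_null_rate)
```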

Under the hood

How our CNTraveler pipeline handles the hard parts

Modern media sites rely on heavy JavaScript frameworks and aggressive caching. Here is how we extract structured data reliably.

Pipeline monitor (cntraveler.com)
Identity rotation: TLS fingerprint randomised · user-agent rotated · residential IP pool · 0 challenges blocked
Page coverage: 48,291 pages queued, running
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts
JavaScript rendering
Full Playwright execution for Next.js hydration

CNTraveler uses modern frontend frameworks. We run full browser sessions to execute JavaScript, hydrate React components, and trigger lazy-loaded image galleries that plain HTTP clients miss entirely.
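Full browser sessions are expensive, so a pipeline may first fetch a page over plain HTTP and only send it to Playwright when the markup looks unhydrated. The heuristic below is a sketch under stated assumptions: it guesses that lazy-loaded images carry data-src or data-srcset placeholders without a real src, and the 0.5 threshold and all names are hypothetical.

```python
from html.parser import HTMLParser


class _LazyImageCounter(HTMLParser):
    """Counts <img> tags and how many still lack a real src attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.unresolved = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        self.total += 1
        # Assumption: placeholders live in data-src / data-srcset until JS runs.
        if not attr.get("src") and (attr.get("data-src") or attr.get("data-srcset")):
            self.unresolved += 1


def needs_browser(markup: str, threshold: float = 0.5) -> bool:
    """True if the plain-HTTP markup looks like it needs JavaScript rendering."""
    parser = _LazyImageCounter()
    parser.feed(markup)
    parser.close()
    if parser.total == 0:
        return True  # no images at all: possibly an unhydrated shell, render to be safe
    return parser.unresolved / parser.total >= threshold
```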

Anti-bot layer
Residential proxy rotation

Media publishers deploy Web Application Firewalls to block automated scraping. Our crawlers use residential ISP proxies with realistic browser fingerprints to bypass rate limits.

Infinite scroll handling
Automated pagination extraction

Destination guides and article feeds rely on infinite scrolling. Our scripts simulate user scrolling behaviour to capture the complete document tree before extraction.
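The scroll loop can be sketched as a small function that keeps scrolling until the document height stops growing for a few consecutive checks. Passing the browser actions in as callbacks keeps the logic testable without a browser; the step size, round limits, and names are assumptions.

```python
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_by: Callable[[int], None],
    wait: Callable[[], None],
    step_px: int = 2000,
    stable_rounds: int = 3,
    max_rounds: int = 200,
) -> int:
    """Scroll until the document height is unchanged for `stable_rounds` checks.

    Returns the final height. `max_rounds` caps feeds that never stop growing.
    """
    last_height = get_height()
    unchanged = 0
    for _ in range(max_rounds):
        scroll_by(step_px)
        wait()  # give the feed time to fetch and append the next batch
        height = get_height()
        if height > last_height:
            last_height = height
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
    return last_height
```

With Playwright's sync API, the callbacks could be wired as get_height=lambda: page.evaluate("document.body.scrollHeight"), scroll_by=lambda px: page.mouse.wheel(0, px), and wait=lambda: page.wait_for_timeout(1500), with the 1500 ms delay again an illustrative value.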

Schema stability
Resilient selectors for editorial layouts

Editorial content formats vary wildly between standard articles, galleries, and interactive maps. We use multi-layer fallback chains to ensure consistent data extraction across all template types.
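A multi-layer fallback chain can be sketched as an ordered list of extractors where the first non-empty result wins. This example pulls an article title from JSON-LD, then the og:title meta tag, then the first h1. It uses regular expressions only to stay dependency-free; a production spider would more likely use Scrapy or parsel CSS/XPath selectors, and the og:title regex assumes the property attribute comes before content, which real markup may not guarantee. All names are hypothetical.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]

_LD_JSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', re.S | re.I
)
_OG_TITLE = re.compile(r'<meta[^>]*property="og:title"[^>]*content="([^"]*)"', re.I)
_H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I)
_TAGS = re.compile(r"<[^>]+>")


def title_from_ld_json(markup: str) -> Optional[str]:
    """Layer 1: structured data, the most stable source when present."""
    for block in _LD_JSON.findall(markup):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: try the next one
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("headline"):
                return str(item["headline"]).strip()
    return None


def title_from_og(markup: str) -> Optional[str]:
    """Layer 2: Open Graph meta tag."""
    m = _OG_TITLE.search(markup)
    return m.group(1).strip() or None if m else None


def title_from_h1(markup: str) -> Optional[str]:
    """Layer 3: visible headline, with inline tags stripped."""
    m = _H1.search(markup)
    return (_TAGS.sub("", m.group(1)).strip() or None) if m else None


def first_match(markup: str, chain: list[Extractor]) -> Optional[str]:
    """Run extractors in order and return the first non-empty value."""
    for extract in chain:
        value = extract(markup)
        if value:
            return value
    return None


TITLE_CHAIN: list[Extractor] = [title_from_ld_json, title_from_og, title_from_h1]
```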

Change detection
Only re-scrape updated articles

We maintain a hash index of last-seen publication and modification dates. Subsequent runs only pull newly published or updated guides, reducing compute overhead.
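The date-based change detection described above might look like this sketch, assuming each candidate arrives from a listing page or sitemap with url, publish_date, and update_date before the full article is fetched. Only URLs whose fingerprint differs from the stored index are passed on. The field names, and the in-memory dict standing in for a persistent index, are assumptions.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator, Mapping, MutableMapping

# Assumption: these two dates are what signals that a page needs re-scraping.
WATCHED_FIELDS = ("publish_date", "update_date")


def fingerprint(record: Mapping[str, Any], fields: Iterable[str] = WATCHED_FIELDS) -> str:
    """Stable SHA-256 over the watched fields, independent of key order."""
    payload = json.dumps({f: record.get(f) for f in fields}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_since_last_run(
    records: Iterable[Mapping[str, Any]],
    index: MutableMapping[str, str],
    key: str = "url",
) -> Iterator[Mapping[str, Any]]:
    """Yield only new or modified records, updating the hash index as it goes."""
    for rec in records:
        digest = fingerprint(rec)
        if index.get(rec[key]) != digest:
            index[rec[key]] = digest
            yield rec
```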

Applications

Who uses CNTraveler data

Teams across industries use cntraveler.com data to build competitive products and smarter operations.

01
Travel Aggregator Enrichment

OTAs and booking platforms enrich their property listings with third-party editor reviews and award badges to drive conversion.

02
Luxury Brand Intelligence

Hospitality brands monitor their inclusion in the Gold List, Hot List, and Readers' Choice Awards against competitor sets.

03
Market Research

Tourism boards analyse destination sentiment, recommended itineraries, and trending regions to inform marketing spend.

04
AI Training Data

LLM developers use high-quality, editorially curated travel guides and hotel descriptions to fine-tune travel recommendation models.

05
Sentiment Analysis

Hospitality holding companies track qualitative descriptors used by professional travel writers to assess property positioning.

06
Content Curation

Travel agents and concierge services ingest curated restaurant and activity data to build bespoke client itineraries.

Why DataFlirt

"Condé Nast Traveler holds decades of the most authoritative hospitality curation on the internet. Extracting it from unstructured editorial layouts requires purpose-built infrastructure."

Most teams underestimate the complexity of scraping modern media publications. Extracting clean, structured data from disparate editorial templates, interactive maps, and infinite-scroll galleries requires full JavaScript rendering and resilient selector strategies. DataFlirt manages the infrastructure so your data science team can focus on analysis.

Technical Spec

CNTraveler scraper technical capabilities

Everything supported by our cntraveler.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering · Full Playwright sessions for React hydration and lazy-loaded galleries · Supported
Residential proxy rotation · ISP-grade residential IPs to bypass WAF rate limits · Supported
Readers' Choice historical data · Extract award data across all historical years available on the site · Supported
Interactive map extraction · Capture coordinates and POI metadata from embedded Mapbox widgets · Supported
Gallery image CDN links · Extract high-resolution image URLs without watermarks where available · Supported
Author & metadata parsing · Extract publication dates, update timestamps, and author bios · Supported
Change detection · Hash-based diff to only emit records with changed fields since last run · Supported
Subscriber-only premium content · Articles locked behind the Condé Nast digital subscription paywall · Partial
User saved itineraries · Private lists and saved places tied to individual user accounts · Partial

Infrastructure

Infrastructure powering the CNTraveler pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering, infinite scrolling, and React hydration. The two are combined via the scrapy-playwright download handler.
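For reference, wiring scrapy-playwright into a Scrapy project mainly takes two settings described in the scrapy-playwright documentation: the download handlers and the asyncio Twisted reactor. The sketch below is a minimal settings.py; the browser choice, concurrency, and AutoThrottle values are illustrative assumptions, not DataFlirt's configuration.

```python
# settings.py: minimal scrapy-playwright wiring (sketch; values are illustrative)

# Route http/https downloads through the Playwright handler. Only requests that
# set meta={"playwright": True} are rendered; others use Scrapy's normal download.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Scrapy's own scheduling and politeness settings still apply.
AUTOTHROTTLE_ENABLED = True
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```

In a spider, a request then opts in to rendering with scrapy.Request(url, meta={"playwright": True}).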

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies. Rotation happens per-request to bypass media publisher WAFs and rate limits without triggering blocks.
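Per-request rotation with a cooldown for blocked proxies can be sketched as a small round-robin pool. The class name, the 300-second default cooldown, and the injectable clock are assumptions for illustration; the proxy URLs would come from whichever provider the pipeline uses.

```python
import itertools
import time
from typing import Callable, Optional


class ProxyPool:
    """Round-robin pool: each request gets the next proxy, and proxies reported
    as blocked (e.g. after an HTTP 403 or 429) sit out a cooldown period."""

    def __init__(
        self,
        proxies: list[str],
        cooldown_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not proxies:
            raise ValueError("proxy list is empty")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._benched_until: dict[str, float] = {}
        self._cooldown_s = cooldown_s
        self._clock = clock

    def acquire(self) -> Optional[str]:
        """Return the next proxy not in cooldown, or None if every proxy is benched."""
        now = self._clock()
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if self._benched_until.get(proxy, 0.0) <= now:
                return proxy
        return None

    def report_block(self, proxy: str) -> None:
        """Bench a proxy until `cooldown_s` seconds from now."""
        self._benched_until[proxy] = self._clock() + self._cooldown_s
```

In Scrapy, a non-rendered request can carry the chosen proxy in request.meta["proxy"], which the built-in HttpProxyMiddleware honours; Playwright-rendered requests would need the proxy configured on the browser context instead.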

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested, schema-versioned per run
CSV: flat file with typed columns, Excel/Sheets compatible
Parquet: columnar format for BigQuery, Snowflake, Athena
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time downstream processing
API: RESTful endpoint to query extracted destination data on demand
XLS: legacy spreadsheet format for non-technical business teams
BigQuery: streamed directly into your dataset with schema auto-detect

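The newline-delimited JSON option with a per-run schema version could be produced along these lines. The schema_version and run_id envelope fields are assumptions about the output layout, shown only to illustrate the format.

```python
import json
from typing import Any, Iterable, Mapping


def to_ndjson(records: Iterable[Mapping[str, Any]], schema_version: str, run_id: str) -> str:
    """Serialise records as newline-delimited JSON, one object per line.

    Every line carries the run-level envelope fields; they are written last so
    they override any same-named key in the scraped record.
    """
    lines = [
        json.dumps(
            {**rec, "schema_version": schema_version, "run_id": run_id},
            ensure_ascii=False,  # keep names like "Condé Nast" readable
            sort_keys=True,
            default=str,  # dates and other non-JSON types become strings
        )
        for rec in records
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```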
// faq

Common questions.

About cntraveler.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Condé Nast Traveler legal?

Scraping publicly available editorial content is generally permissible under applicable law, provided the extracted material is not republished in a way that infringes copyright. DataFlirt extracts factual data like hotel names, locations, ratings, award status, and snippets for analytical use. We do not bypass subscription paywalls. Clients should consult legal counsel regarding fair use and copyright.

How do you handle different article layouts?

CNTraveler uses various templates for standard articles, galleries, and listicles. Our selector strategy uses multi-layer fallback chains incorporating CSS, XPath, and LD+JSON to normalise unstructured editorial text into clean, structured schemas.

Can you extract data from the Readers' Choice Awards?

Yes. We extract the complete hierarchy of Readers' Choice data, including award year, category, regional rank, property name, and numerical score across all available historical data.

Do you capture high-resolution images?

We extract the CDN URLs for all property and destination images embedded in articles and galleries. We do not download the binary files directly, but provide the links for your systems to ingest.

How fresh is the data?

For editorial sites like CNTraveler, we typically configure weekly or monthly pipeline runs to capture newly published guides, updated hotel reviews, and annual award releases. One-off historical extractions are also available.

What is the minimum viable engagement?

Our smallest packages cover a single defined category, such as all European hotel reviews, with monthly delivery. For full-site extraction, we price based on volume and delivery frequency.

$ dataflirt scope --new-project --source=cntraveler.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off dump of historical Readers' Choice winners or a continuous feed of new hotel reviews, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h