
GetYourGuide data,
at warehouse scale.

We extract tour listings, dynamic pricing, availability calendars, operator intelligence, and verified reviews from GetYourGuide. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Tours extracted: 142K /day
Availability updates: 1.8M /24h
Review records: 312K /run
Active pipelines: 114
Uptime: 99.98%

Data Dictionary

Every field we extract from getyourguide.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Tour Listings objects from getyourguide.com. All fields typed and schema-versioned.

tour_id, title, location, category, duration, rating, review_count, base_price, currency, operator_name, cancellation_policy, highlights, includes, excludes
tour_listings
● 200 OK
{
  "tour_id": "39281",
  "title": "Louvre Museum Skip-the-Line Access Tour",
  "location": "Paris, France",
  "duration": "3 hours",
  "rating": 4.8,
  "review_count": 14290,
  "base_price": 65.0,
  "currency": "EUR"
}

Complete list of extractable fields for Pricing & Availability objects from getyourguide.com. All fields typed and schema-versioned.

tour_id, date, time_slot, ticket_type, price, currency, availability_status, remaining_spots, discount_pct
pricing_availability
● 200 OK
{
  "tour_id": "39281",
  "date": "2026-08-15",
  "time_slot": "09:30:00",
  "ticket_type": "Adult",
  "price": 65.0,
  "currency": "EUR",
  "availability_status": "AVAILABLE",
  "remaining_spots": 12
}
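As an illustration of what "typed" could mean on the consuming side, here is a minimal Python sketch that parses one pricing and availability record into a dataclass. The class name, the helper name, and the choice of `Decimal` for prices are assumptions made for this example, not part of the delivered schema.

```python
import datetime as dt
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class AvailabilityRecord:
    """One pricing/availability row; field names follow the list above."""
    tour_id: str
    date: dt.date
    time_slot: dt.time
    ticket_type: str
    price: Decimal
    currency: str
    availability_status: str
    remaining_spots: Optional[int] = None
    discount_pct: Optional[float] = None


def parse_availability(raw: dict) -> AvailabilityRecord:
    """Convert a decoded JSON object into a typed record (hypothetical helper)."""
    return AvailabilityRecord(
        tour_id=str(raw["tour_id"]),
        date=dt.date.fromisoformat(raw["date"]),
        time_slot=dt.time.fromisoformat(raw["time_slot"]),
        ticket_type=raw["ticket_type"],
        # Go through str() so the float 65.0 becomes Decimal("65.0") exactly.
        price=Decimal(str(raw["price"])),
        currency=raw["currency"],
        availability_status=raw["availability_status"],
        remaining_spots=raw.get("remaining_spots"),
        discount_pct=raw.get("discount_pct"),
    )
```

Optional fields such as `discount_pct` default to `None` when the key is absent, so a partial record still parses.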

Complete list of extractable fields for Reviews & Ratings objects from getyourguide.com. All fields typed and schema-versioned.

review_id, tour_id, reviewer_name, rating, review_date, review_text, traveler_type, country, helpful_votes
reviews_ratings
● 200 OK
{
  "review_id": "RV-9928174",
  "tour_id": "39281",
  "rating": 5,
  "review_date": "2026-05-10",
  "traveler_type": "Couples",
  "country": "United Kingdom",
  "helpful_votes": 14
}

Complete list of extractable fields for Operator Data objects from getyourguide.com. All fields typed and schema-versioned.

operator_id, operator_name, total_tours, average_rating, review_count, response_rate, operator_description, languages_spoken
operator_data
● 200 OK
{
  "operator_id": "OP-4412",
  "operator_name": "Paris City Vision",
  "total_tours": 48,
  "average_rating": 4.6,
  "review_count": 85400,
  "response_rate": 98.5,
  "languages_spoken": ["English", "French", "Spanish"]
}

Complete list of extractable fields for Search Results objects from getyourguide.com. All fields typed and schema-versioned.

keyword, location, position, tour_id, title, rating, review_count, base_price, badge_type, thumbnail_url
search_results
● 200 OK
{
  "keyword": "museum tours",
  "location": "Paris",
  "position": 1,
  "tour_id": "39281",
  "rating": 4.8,
  "base_price": 65.0,
  "badge_type": "Originals by GetYourGuide"
}
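Search result records like the one above lend themselves to rank tracking. The following Python sketch compares two snapshots taken for the same keyword and location and reports how many positions each tour moved. The function name and the snapshot layout (a list of dicts with `tour_id` and `position`) are assumptions for illustration.

```python
from typing import Dict, List, Optional


def rank_changes(previous: List[dict], current: List[dict]) -> Dict[str, Optional[int]]:
    """Map tour_id -> positions gained since the previous snapshot.

    Positive means the tour moved up, negative means it dropped,
    and None means the tour was not present in the previous snapshot.
    """
    before = {row["tour_id"]: row["position"] for row in previous}
    changes: Dict[str, Optional[int]] = {}
    for row in current:
        old_position = before.get(row["tour_id"])
        if old_position is None:
            changes[row["tour_id"]] = None
        else:
            changes[row["tour_id"]] = old_position - row["position"]
    return changes
```

Tours that disappeared from the current snapshot are not reported here; a production version would probably want to flag those as well.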

Capabilities

Everything you need from GetYourGuide

Our GetYourGuide scraper handles dynamic calendars, complex pricing tiers, and deep pagination with anti-bot circumvention built directly into the pipeline.

Full Tour Data Extraction

Title, description, itinerary, highlights, inclusions, exclusions, and meeting points scraped at the individual tour level.

Dynamic Pricing & Availability

Extract ticket tiers, date-specific pricing, and real-time availability calendars across a rolling 365-day window.

Verified Review Mining

Scrape text, rating, traveler type, and date across paginated review sections to analyse customer sentiment.

Operator Intelligence

Track operator portfolios, aggregate ratings, and response metrics to evaluate supplier performance.

Geo-Location & Meeting Points

Extract exact latitude and longitude coordinates for starting locations and points of interest.

Multi-Currency & Localization

Capture pricing in EUR, USD, GBP and other supported currencies alongside localized descriptions.

Categorisation & Taxonomy

Map activities to specific tags like Culture, Adventure, or Skip-the-line to build precise catalogues.

SERP & Destination Scraping

Track ranking positions for specific destination pages and keyword searches to monitor visibility.

Scheduled + Streaming Modes

Configure continuous pipelines at daily or real-time cadences with change-detection diffing.

// engagement pipeline

From URL list to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Provide destination URLs, category pages, or operator IDs. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Scrapy / Playwright crawlers, proxy rotation, and session management for getyourguide.com.

Validation & QA (days 4–6)

Schema validation, null-rate checks, and sample data review before full launch.
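A schema validation pass of this kind could look roughly like the Python sketch below. The field types are inferred from the tour listing sample in the data dictionary and are assumptions, as are the names `TOUR_SCHEMA` and `validate_record`.

```python
# Expected Python types per field; numeric fields accept int or float.
# These types are inferred from the sample record, not from a published spec.
TOUR_SCHEMA = {
    "tour_id": str,
    "title": str,
    "location": str,
    "duration": str,
    "rating": (int, float),
    "review_count": int,
    "base_price": (int, float),
    "currency": str,
}


def validate_record(record: dict, schema: dict = TOUR_SCHEMA) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing or null")
        # bool is a subclass of int, so reject it explicitly for this schema.
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
    return errors
```

Returning a list of problems rather than raising on the first one makes it easier to summarise failure patterns across a sample batch before launch.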

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our GetYourGuide pipeline handles the hard parts

Travel aggregators rely heavily on dynamic availability and bot protection. Here is how we build resilient extraction pipelines.

pipeline-monitor · getyourguide.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Anti-bot layer
Residential proxy rotation + fingerprint spoofing

GetYourGuide employs strict rate limiting and bot detection. Our crawlers use residential ISP proxies with realistic browser fingerprints, randomised request timing, and full cookie session management.
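To make "randomised request timing" concrete, here is a minimal Python sketch of exponential backoff with full jitter, a common way to space out retries after a rate-limit response. It is a generic technique shown for illustration; the function name and default values are assumptions, and the actual timing policy used in production is not described here.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Full-jitter exponential backoff.

    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)], so retries spread out instead of
    arriving in synchronised bursts.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A caller would typically sleep for `backoff_delay(n)` seconds before the n-th retry of a throttled request.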

JavaScript rendering
Full Playwright execution for dynamic calendars

Availability calendars and pricing tiers load dynamically. We run full Playwright browser sessions with JavaScript execution to trigger API calls and hydrate pricing widgets.
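The approach described above can be sketched in Python with Playwright's sync API: load the page, listen for network responses, and keep the JSON payloads that look like calendar data. The path hint `/availabilities` is purely hypothetical (the real endpoint names are not given in this document and would have to be found by inspecting traffic), and both function names are illustrative.

```python
from urllib.parse import urlparse

# Hypothetical path fragment used only for this example.
API_PATH_HINT = "/availabilities"


def is_calendar_response(url: str, content_type: str, hint: str = API_PATH_HINT) -> bool:
    """True for JSON responses whose URL path contains the hint."""
    return hint in urlparse(url).path and "json" in content_type.lower()


def capture_calendar_payloads(page_url: str) -> list:
    """Render a page in headless Chromium and collect matching JSON bodies."""
    # Third-party dependency, imported lazily so the helper above stays usable
    # without Playwright installed.
    from playwright.sync_api import sync_playwright

    matched = []

    def on_response(response):
        content_type = response.headers.get("content-type", "")
        if is_calendar_response(response.url, content_type):
            matched.append(response)

    payloads = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(page_url, wait_until="networkidle")
        # Read bodies after navigation settles, before the browser closes.
        for response in matched:
            try:
                payloads.append(response.json())
            except Exception:
                continue  # body unavailable or not valid JSON
        browser.close()
    return payloads
```

Keeping the URL filter as a separate pure function makes it easy to unit test without launching a browser.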

Schema stability
Resilient selectors with fallback chains

DOM structures shift frequently. Our strategy uses a fallback chain per field, combining CSS selectors, XPath, and JSON-LD (application/ld+json) extraction.
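A fallback chain can be sketched as an ordered list of extractor functions tried in turn. To stay within the Python standard library, the example below uses a JSON-LD extractor first and a simple regular expression on the `<h1>` tag as the fallback; the regex stands in for the CSS and XPath selectors mentioned above, and all names are illustrative.

```python
import json
import re
from html.parser import HTMLParser
from typing import Callable, List, Optional


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def title_from_json_ld(html: str) -> Optional[str]:
    collector = _JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("name"):
            return data["name"]
    return None


def title_from_h1(html: str) -> Optional[str]:
    match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S | re.I)
    return match.group(1).strip() if match else None


def first_match(html: str, extractors: List[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty value."""
    for extract in extractors:
        value = extract(html)
        if value:
            return value
    return None
```

Ordering the chain from most structured (JSON-LD) to least structured (markup patterns) means a layout change usually degrades to the next extractor instead of producing a null.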

Change detection
Only re-scrape what changes

For large activity catalogues, we maintain a hash index of last-seen values. Subsequent runs only push diffs, reducing compute cost and storage bloat.
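The hash index idea can be illustrated with a short Python sketch: hash a canonical serialisation of each record and emit only records whose hash differs from the one last seen. The in-memory dict standing in for the index, the SHA-256 choice, and the function names are assumptions for this example.

```python
import hashlib
import json
from typing import Dict, List


def record_hash(record: dict) -> str:
    """Stable digest of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_records(records: List[dict], seen: Dict[str, str], key: str = "tour_id") -> List[dict]:
    """Return new or changed records and update the `seen` index in place."""
    changed = []
    for record in records:
        digest = record_hash(record)
        if seen.get(record[key]) != digest:
            seen[record[key]] = digest
            changed.append(record)
    return changed
```

In a real deployment the index would live in a persistent store rather than a Python dict, so that state survives between runs.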

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, schema drift, and coverage drops.
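As one example of such a check, the Python sketch below computes per-field null rates for a batch and flags fields that exceed a baseline by more than a tolerance. The threshold logic, the default tolerance, and the function names are assumptions; the actual alerting rules are not specified in this document.

```python
from typing import Dict, List


def null_rate(records: List[dict], field: str) -> float:
    """Fraction of records where the field is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)


def null_rate_alerts(
    records: List[dict], baseline: Dict[str, float], tolerance: float = 0.05
) -> Dict[str, float]:
    """Return fields whose observed null rate exceeds baseline + tolerance."""
    alerts = {}
    for field, expected in baseline.items():
        observed = null_rate(records, field)
        if observed - expected > tolerance:
            alerts[field] = observed
    return alerts
```

A sudden jump in the null rate of a single field is often the first visible symptom of a selector that no longer matches the page.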

Applications

Who uses GetYourGuide data and how

Teams across industries use getyourguide.com data to build competitive products and smarter operations.

01
OTA Price Parity

Online travel agencies monitor pricing and availability to ensure competitiveness and detect parity violations.

02
Competitive Intelligence

Tour operators track competitor pricing, review velocity, and itinerary changes to optimise their own offerings.

03
Market & Destination Research

Tourism boards and analysts evaluate destination popularity, average pricing, and seasonal demand fluctuations.

04
Yield Management

Revenue managers analyse availability calendars to forecast demand and adjust dynamic pricing models.

05
AI Travel Planner Training

Machine learning teams ingest structured itineraries and reviews to train conversational travel assistants.

06
Operator Benchmarking

Aggregators evaluate supplier performance by tracking review scores, cancellation policies, and response rates.

Why DataFlirt

"GetYourGuide holds the definitive graph of global experiences and availability, but extracting it requires navigating aggressive rate limits and dynamic calendars."

Most travel data teams underestimate the investment required: reliable GetYourGuide scraping requires residential proxies, full JavaScript rendering for availability calendars, and daily selector maintenance. DataFlirt absorbs that complexity so your engineers can focus on yield analysis instead of infrastructure.

Technical Spec

GetYourGuide scraper technical capabilities

Everything supported by our getyourguide.com scraper, from rendered SPA elements to rate-limit evasion, plus authenticated areas where you supply the credentials.

Capability · Details · Status
JavaScript rendering · Full Playwright sessions required for dynamic availability calendars · Supported
CAPTCHA bypass · Automated 2Captcha + CapSolver integration · Supported
Residential proxy rotation · ISP-grade residential IPs rotated per request · Supported
Availability calendar extraction · Scrapes date-specific pricing and remaining spots · Supported
Multi-currency pricing · Captures pricing in local and specified currencies · Supported
Review pagination · Extracts the full historical review corpus · Supported
Destination SERP tracking · Monitors ranking positions for specific keywords · Supported
Change detection · Hash-based diffs emit only changed records · Supported
User booking history · Requires authenticated user credentials · Partial
Operator dashboard analytics · Requires authenticated supplier credentials · Partial

Infrastructure

Infrastructure powering the GetYourGuide pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright handles JavaScript rendering, cookie sessions, and calendar interaction flows.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across global regions. Rotation happens per-request with sticky sessions where required.
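Per-request rotation with optional sticky sessions can be sketched in a few lines of Python. The class below is a simplified round-robin model; the class name, the session identifiers, and the placeholder proxy URLs in the usage note are all hypothetical.

```python
import itertools
from typing import Dict, List, Optional


class ProxyPool:
    """Round-robin proxy rotation with optional sticky sessions."""

    def __init__(self, proxies: List[str]):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky: Dict[str, str] = {}

    def next_proxy(self, session_id: Optional[str] = None) -> str:
        """Rotate on every call, unless a session_id pins the proxy."""
        if session_id is None:
            return next(self._cycle)
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def release(self, session_id: str) -> None:
        """Forget a sticky assignment once the multi-step flow is finished."""
        self._sticky.pop(session_id, None)
```

For instance, `ProxyPool(["http://p1.example:8000", "http://p2.example:8000"])` hands out alternating proxies for stateless requests, while a multi-step calendar flow can pass the same `session_id` on every call to keep one exit IP for its duration.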

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested
CSV: Flat file with typed columns
XLS: Excel-compatible format
Parquet: Columnar format for data warehouses
AWS S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record
API: REST endpoint access
BigQuery: Streamed directly into your dataset
Snowflake: Stage and COPY INTO workflow
Postgres: Upsert into your existing schema

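For readers unfamiliar with the difference between the newline-delimited JSON and flat CSV formats listed above, the following Python sketch serialises the same records both ways using only the standard library. The function names are illustrative and say nothing about how the delivery layer is actually implemented.

```python
import csv
import io
import json
from typing import List


def to_ndjson(records: List[dict]) -> str:
    """One JSON object per line, suitable for streaming loaders."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def to_csv(records: List[dict], columns: List[str]) -> str:
    """Flat CSV with a fixed column order; unknown keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Newline-delimited JSON keeps nested values intact, whereas CSV needs an explicit column list and flattens everything to text, which is why the column order is passed in rather than inferred.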
// faq

Common questions.

About getyourguide.com scraping, legality, and pipeline operations.

Is scraping GetYourGuide legal?

Scraping publicly available information is generally permissible under applicable law. DataFlirt targets only public, non-authenticated tour, pricing, and review data. We do not extract personal data or circumvent authentication walls.

How do you handle dynamic availability calendars?

We use full Playwright browser sessions to execute JavaScript, triggering the API calls necessary to hydrate the calendar widgets and extract date-specific pricing.

Can you track pricing in multiple currencies?

Yes. We can configure the crawler session to request pricing in EUR, USD, GBP, or other supported currencies as required.

How fresh is the data?

Real-time streaming pipelines achieve sub-60-minute latency for availability signals. Full destination refreshes at daily cadence complete within an 8-hour window.

Do you extract exact meeting point coordinates?

Yes. We parse the embedded map data to extract precise latitude and longitude coordinates for tour starting locations.

What is the minimum viable engagement?

Our smallest packages start at a defined URL list or specific destination categories with weekly delivery. We price based on volume and delivery frequency.

Do you support review scraping?

Yes. We handle deep pagination across the review corpus, extracting ratings, text, traveler types, and dates.

$ dataflirt scope --new-project --source=getyourguide.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off destination catalogue dump or a continuous availability monitoring feed, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h