GetYourGuide data,
at warehouse scale.

We extract tour listings, dynamic pricing, availability calendars, operator intelligence, and verified reviews from GetYourGuide. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from getyourguide.com → See how it works

Tours extracted

142K /day

Availability updates

1.8M /24h

Review records

312K /run

Active pipelines

114

Uptime

99.98%

◆ Tour & Activity Data ◆ Dynamic Pricing ◆ Availability Calendars ◆ Operator Intelligence ◆ Verified Reviews ◆ Itinerary Details ◆ Meeting Point Coordinates ◆ Cancellation Policies ◆ Multi-Language Support ◆ Multi-Currency Pricing ◆ Guided Tour Metadata ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from getyourguide.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Tour Listings objects from getyourguide.com. All fields typed and schema-versioned.

tour_id, title, location, category, duration, rating, review_count, base_price, currency, operator_name, cancellation_policy, highlights, includes, excludes

{
  "tour_id": "39281",
  "title": "Louvre Museum Skip-the-Line Access Tour",
  "location": "Paris, France",
  "duration": "3 hours",
  "rating": 4.8,
  "review_count": 14290,
  "base_price": 65.0,
  "currency": "EUR"
}


Complete list of extractable fields for Pricing & Availability objects from getyourguide.com. All fields typed and schema-versioned.

tour_id, date, time_slot, ticket_type, price, currency, availability_status, remaining_spots, discount_pct

{
  "tour_id": "39281",
  "date": "2026-08-15",
  "time_slot": "09:30:00",
  "ticket_type": "Adult",
  "price": 65.0,
  "currency": "EUR",
  "availability_status": "AVAILABLE",
  "remaining_spots": 12
}
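One way a consumer might load such a record into typed form is sketched below. The `PricingRecord` class and `parse_pricing` helper are hypothetical names for illustration, not part of the delivered schema, and the optional fields are an assumption based on the sample above omitting `discount_pct`.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PricingRecord:
    """Hypothetical typed view of one Pricing & Availability record."""
    tour_id: str
    date: dt.date
    time_slot: dt.time
    ticket_type: str
    price: float
    currency: str
    availability_status: str
    remaining_spots: Optional[int] = None   # assumed optional
    discount_pct: Optional[float] = None    # assumed optional


def parse_pricing(raw: dict) -> PricingRecord:
    """Convert a raw JSON object into a typed record; raises on missing or malformed fields."""
    return PricingRecord(
        tour_id=str(raw["tour_id"]),
        date=dt.date.fromisoformat(raw["date"]),
        time_slot=dt.time.fromisoformat(raw["time_slot"]),
        ticket_type=raw["ticket_type"],
        price=float(raw["price"]),
        currency=raw["currency"],
        availability_status=raw["availability_status"],
        remaining_spots=raw.get("remaining_spots"),
        discount_pct=raw.get("discount_pct"),
    )


sample = {
    "tour_id": "39281",
    "date": "2026-08-15",
    "time_slot": "09:30:00",
    "ticket_type": "Adult",
    "price": 65.0,
    "currency": "EUR",
    "availability_status": "AVAILABLE",
    "remaining_spots": 12,
}
record = parse_pricing(sample)
```

Parsing dates and times at load time means a malformed value fails immediately rather than surfacing later in a warehouse query.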


Complete list of extractable fields for Reviews & Ratings objects from getyourguide.com. All fields typed and schema-versioned.

review_id, tour_id, reviewer_name, rating, review_date, review_text, traveler_type, country, helpful_votes

{
  "review_id": "RV-9928174",
  "tour_id": "39281",
  "rating": 5,
  "review_date": "2026-05-10",
  "traveler_type": "Couples",
  "country": "United Kingdom",
  "helpful_votes": 14
}


Complete list of extractable fields for Operator Data objects from getyourguide.com. All fields typed and schema-versioned.

operator_id, operator_name, total_tours, average_rating, review_count, response_rate, operator_description, languages_spoken

{
  "operator_id": "OP-4412",
  "operator_name": "Paris City Vision",
  "total_tours": 48,
  "average_rating": 4.6,
  "review_count": 85400,
  "response_rate": 98.5,
  "languages_spoken": ["English", "French", "Spanish"]
}


Complete list of extractable fields for Search Results objects from getyourguide.com. All fields typed and schema-versioned.

keyword, location, position, tour_id, title, rating, review_count, base_price, badge_type, thumbnail_url

{
  "keyword": "museum tours",
  "location": "Paris",
  "position": 1,
  "tour_id": "39281",
  "rating": 4.8,
  "base_price": 65.0,
  "badge_type": "Originals by GetYourGuide"
}


Capabilities

Everything you need from GetYourGuide

Our GetYourGuide scraper handles dynamic calendars, complex pricing tiers, and deep pagination with anti-bot circumvention built directly into the pipeline.

Full Tour Data Extraction

Title, description, itinerary, highlights, inclusions, exclusions, and meeting points scraped at the individual tour level.

Dynamic Pricing & Availability

Extract ticket tiers, date-specific pricing, and real-time availability calendars across a rolling 365-day window.

Verified Review Mining

Scrape text, rating, traveler type, and date across paginated review sections to analyse customer sentiment.

Operator Intelligence

Track operator portfolios, aggregate ratings, and response metrics to evaluate supplier performance.

Geo-Location & Meeting Points

Extract exact latitude and longitude coordinates for starting locations and points of interest.

Multi-Currency & Localization

Capture pricing in EUR, USD, GBP and other supported currencies alongside localized descriptions.

Categorisation & Taxonomy

Map activities to specific tags like Culture, Adventure, or Skip-the-line to build precise catalogues.

SERP & Destination Scraping

Track ranking positions for specific destination pages and keyword searches to monitor visibility.

Scheduled + Streaming Modes

Configure continuous pipelines at daily or real-time cadences with change-detection diffing.

// engagement pipeline

From URL list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide destination URLs, category pages, or operator IDs. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, and session management for getyourguide.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and sample data review before full launch.
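As a rough illustration of what schema validation and a null-rate check can look like, here is a minimal sketch. The `EXPECTED_TYPES` mapping covers only a few Tour Listings fields, the helper names are hypothetical, and every sample value other than tour 39281 is made up.

```python
from typing import Any

# Assumed types for a subset of Tour Listings fields (illustrative only).
EXPECTED_TYPES: dict[str, type] = {
    "tour_id": str,
    "title": str,
    "rating": float,
    "review_count": int,
    "base_price": float,
}


def type_errors(record: dict[str, Any]) -> list[str]:
    """Return one message per non-null field whose value has the wrong type."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        value = record.get(field)
        if value is not None and not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def null_rates(records: list[dict[str, Any]]) -> dict[str, float]:
    """Fraction of records in which each expected field is missing or null."""
    total = len(records)
    if total == 0:
        return {}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in EXPECTED_TYPES
    }


batch = [
    {"tour_id": "39281", "title": "Louvre Museum Skip-the-Line Access Tour",
     "rating": 4.8, "review_count": 14290, "base_price": 65.0},
    {"tour_id": "39282", "title": None,
     "rating": 4.5, "review_count": 120, "base_price": 40.0},
    {"tour_id": "39283", "title": "Example tour",
     "rating": "4.6", "review_count": 75, "base_price": 55.0},
    {"tour_id": "39284", "title": "Another example tour",
     "rating": 4.9, "review_count": 310, "base_price": 80.0},
]

rates = null_rates(batch)
bad = type_errors(batch[2])
```

In this batch one of four titles is null, so `rates["title"]` is 0.25, and the third record is flagged because its rating arrived as a string.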

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our GetYourGuide pipeline handles the hard parts

Travel aggregators rely heavily on dynamic availability and bot protection. Here is how we build resilient extraction pipelines.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Anti-bot layer

Residential proxy rotation + fingerprint spoofing

GetYourGuide employs strict rate limiting and bot detection. Our crawlers use residential ISP proxies with realistic browser fingerprints, randomised request timing, and full cookie session management.

JavaScript rendering

Full Playwright execution for dynamic calendars

Availability calendars and pricing tiers load dynamically. We run full Playwright browser sessions with JavaScript execution to trigger API calls and hydrate pricing widgets.

Schema stability

Resilient selectors with fallback chains

DOM structures shift frequently. Our strategy uses multiple fallback chains per field, including CSS selectors, XPath, and LD+JSON extraction.
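A fallback chain of this kind can be sketched as an ordered list of extractor functions, tried until one returns a value. The example below is an assumption-laden illustration: it uses plain regular expressions as dependency-free stand-ins for CSS and XPath selectors, the function names are hypothetical, and the HTML snippets are invented rather than taken from getyourguide.com.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]

LD_JSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
H1_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL | re.IGNORECASE)


def title_from_ld_json(html: str) -> Optional[str]:
    """Preferred source: the `name` key of an embedded JSON-LD object."""
    for match in LD_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed block: let the next extractor try
        if isinstance(data, dict) and isinstance(data.get("name"), str):
            return data["name"].strip() or None
    return None


def title_from_h1(html: str) -> Optional[str]:
    """Fallback: text of the first <h1>, with nested tags stripped."""
    match = H1_RE.search(html)
    if not match:
        return None
    return re.sub(r"<[^>]+>", "", match.group(1)).strip() or None


TITLE_CHAIN: list[Extractor] = [title_from_ld_json, title_from_h1]


def first_match(html: str, chain: list[Extractor]) -> Optional[str]:
    """Run extractors in order and return the first non-empty result."""
    for extractor in chain:
        value = extractor(html)
        if value:
            return value
    return None


page_ok = ('<script type="application/ld+json">'
           '{"@type": "Product", "name": "Louvre Museum Skip-the-Line Access Tour"}'
           '</script><h1>Other heading</h1>')
page_broken = ('<script type="application/ld+json">{broken</script>'
               '<h1 class="t"><span>Louvre Tour</span></h1>')

title_ok = first_match(page_ok, TITLE_CHAIN)
title_fallback = first_match(page_broken, TITLE_CHAIN)
```

Keeping each strategy as its own function makes it easy to log which link in the chain produced a value, which is a useful early signal that a primary selector has stopped matching.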

Change detection

Only re-scrape what changes

For large activity catalogues, we maintain a hash index of last-seen values. Subsequent runs only push diffs, reducing compute cost and storage bloat.
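The hash-index idea can be sketched in a few lines. This is an illustration under assumptions: the in-memory dict stands in for whatever persistent store holds the index, SHA-256 over canonical JSON is one reasonable choice of hash rather than a confirmed detail, and the function names are hypothetical.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Hash a record's canonical JSON form so key order does not affect the digest."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_batch(records: list[dict], index: dict[str, str], key: str = "tour_id") -> list[dict]:
    """Return only new or changed records, updating the last-seen index in place."""
    changed = []
    for record in records:
        digest = record_hash(record)
        if index.get(record[key]) != digest:
            changed.append(record)
            index[record[key]] = digest
    return changed


index: dict[str, str] = {}  # stand-in for a persistent hash index

run_1 = [
    {"tour_id": "39281", "base_price": 65.0},
    {"tour_id": "39282", "base_price": 40.0},
]
run_2 = [
    {"base_price": 65.0, "tour_id": "39281"},  # same values, different key order
    {"tour_id": "39282", "base_price": 42.0},  # price changed
]

first_push = diff_batch(run_1, index)
second_push = diff_batch(run_2, index)
```

The first run pushes both records because the index is empty; the second pushes only tour 39282, since the reordered but otherwise identical record hashes to the same digest.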

Monitoring & alerting

24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, schema drift, and coverage drops.
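The three alert conditions named above could be checked against a baseline run roughly as follows. The `RunStats` shape, the `check_run` helper, and the threshold values are all assumptions chosen for illustration, not documented settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Hypothetical per-run summary derived from structured logs."""
    records: int          # number of records emitted
    null_rate: float      # fraction of null field values, 0.0 to 1.0
    fields: frozenset     # field names observed in the output


def check_run(
    current: RunStats,
    baseline: RunStats,
    null_spike: float = 0.02,     # assumed threshold: absolute rise in null rate
    coverage_drop: float = 0.10,  # assumed threshold: relative fall in record count
) -> list[str]:
    """Compare a run with its baseline and return the names of triggered alerts."""
    alerts = []
    if current.null_rate - baseline.null_rate > null_spike:
        alerts.append("null-rate spike")
    if current.fields != baseline.fields:
        alerts.append("schema drift")
    if baseline.records and current.records < baseline.records * (1 - coverage_drop):
        alerts.append("coverage drop")
    return alerts


baseline = RunStats(1000, 0.003, frozenset({"tour_id", "title", "base_price"}))
healthy = RunStats(990, 0.004, frozenset({"tour_id", "title", "base_price"}))
degraded = RunStats(850, 0.05, frozenset({"tour_id", "title"}))

healthy_alerts = check_run(healthy, baseline)
degraded_alerts = check_run(degraded, baseline)
```

The degraded run trips all three checks: its null rate rose by more than two points, a field disappeared, and record count fell more than 10% below the baseline.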

Applications

Who uses GetYourGuide data and how

Teams across industries use getyourguide.com data to build competitive products and smarter operations.

OTA Price Parity

Online travel agencies monitor pricing and availability to ensure competitiveness and detect parity violations.

Competitive Intelligence

Tour operators track competitor pricing, review velocity, and itinerary changes to optimise their own offerings.

Market & Destination Research

Tourism boards and analysts evaluate destination popularity, average pricing, and seasonal demand fluctuations.

Yield Management

Revenue managers analyse availability calendars to forecast demand and adjust dynamic pricing models.

AI Travel Planner Training

Machine learning teams ingest structured itineraries and reviews to train conversational travel assistants.

Operator Benchmarking

Aggregators evaluate supplier performance by tracking review scores, cancellation policies, and response rates.

Why DataFlirt

"GetYourGuide holds the definitive graph of global experiences and availability, but extracting it requires navigating aggressive rate limits and dynamic calendars."

Most travel data teams underestimate the investment required: reliable GetYourGuide scraping requires residential proxies, full JavaScript rendering for availability calendars, and daily selector maintenance. DataFlirt absorbs that complexity so your engineers can focus on yield analysis instead of infrastructure.

Technical Spec

GetYourGuide scraper technical capabilities

Everything supported by our getyourguide.com scraper: rendered SPA elements, rate-limit handling, and more. Data behind auth walls is only partially supported and requires authenticated credentials.

JavaScript rendering

Full Playwright sessions required for dynamic availability calendars

Supported

CAPTCHA bypass

Automated 2Captcha + CapSolver integration

Supported

Residential proxy rotation

ISP-grade residential IPs rotated per request

Supported

Availability calendar extraction

Scrapes date-specific pricing and remaining spots

Supported

Multi-currency pricing

Captures pricing in local and specified currencies

Supported

Review pagination

Extracts the full historical review corpus

Supported

Destination SERP tracking

Monitors ranking positions for specific keywords

Supported

Change detection

Hash-based diffs emit only changed records

Supported

User booking history

Requires authenticated user credentials

Partial

Operator dashboard analytics

Requires authenticated supplier credentials

Partial

Infrastructure

Infrastructure powering the GetYourGuide pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright handles JavaScript rendering, cookie sessions, and calendar interaction flows.
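For readers unfamiliar with Scrapy, retry and pacing behaviour is typically configured through project settings. The fragment below uses real Scrapy setting names, but every value is an illustrative assumption rather than the configuration actually used for getyourguide.com, and the Playwright side is not shown.

```python
# settings.py (illustrative values only, not a production configuration)

# Retry failed requests a bounded number of times on transient status codes.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Base delay between requests to the same domain, in seconds.
DOWNLOAD_DELAY = 2.0
# Scrapy then waits a random interval between 0.5x and 1.5x DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# AutoThrottle adjusts the delay based on observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Cap parallel requests per domain to keep load predictable.
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```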

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across global regions. Rotation happens per-request with sticky sessions where required.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested

CSV

Flat file with typed columns

XLS

Excel compatible format

Parquet

Columnar format for data warehouses

AWS S3

Direct bucket delivery

Webhook

HTTP POST per record

API

REST endpoint access

BigQuery

Streamed directly into your dataset

Snowflake

Stage and COPY INTO workflow

Postgres

Upsert into your existing schema

AWS S3: direct bucket delivery, compatible with any data lake.
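To make the upsert delivery mode concrete, here is a minimal sketch keyed on tour, date, and time slot. It runs against an in-memory SQLite database purely as a stand-in so the example is self-contained; PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form. The `pricing` table and its columns are assumptions modelled on the Pricing & Availability fields, not a prescribed schema.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO pricing (tour_id, date, time_slot, price, remaining_spots)
VALUES (:tour_id, :date, :time_slot, :price, :remaining_spots)
ON CONFLICT (tour_id, date, time_slot) DO UPDATE SET
    price = excluded.price,
    remaining_spots = excluded.remaining_spots
"""

conn = sqlite3.connect(":memory:")  # stand-in for the customer's Postgres database
conn.execute("""
CREATE TABLE pricing (
    tour_id TEXT NOT NULL,
    date TEXT NOT NULL,
    time_slot TEXT NOT NULL,
    price REAL,
    remaining_spots INTEGER,
    PRIMARY KEY (tour_id, date, time_slot)
)
""")

first_seen = {"tour_id": "39281", "date": "2026-08-15", "time_slot": "09:30:00",
              "price": 65.0, "remaining_spots": 12}
later_seen = {"tour_id": "39281", "date": "2026-08-15", "time_slot": "09:30:00",
              "price": 65.0, "remaining_spots": 7}

# The second write hits the same primary key, so it updates instead of inserting.
conn.execute(UPSERT_SQL, first_seen)
conn.execute(UPSERT_SQL, later_seen)
conn.commit()

rows = conn.execute(
    "SELECT tour_id, price, remaining_spots FROM pricing"
).fetchall()
```

After both writes the table holds a single row for the slot, carrying the latest `remaining_spots` value, which is what keeps repeated availability snapshots from duplicating rows.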

// faq

Common questions.

About getyourguide.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping GetYourGuide legal?

Scraping publicly available information is generally permissible under applicable law. DataFlirt targets only public, non-authenticated tour, pricing, and review data. We do not extract personal data or circumvent authentication walls.

How do you handle dynamic availability calendars?

We use full Playwright browser sessions to execute JavaScript, triggering the API calls necessary to hydrate the calendar widgets and extract date-specific pricing.

Can you track pricing in multiple currencies?

Yes. We can configure the crawler session to request pricing in EUR, USD, GBP, or other supported currencies as required.

How fresh is the data?

Real-time streaming pipelines achieve sub-60-minute latency for availability signals. Full destination refreshes at daily cadence complete within an 8-hour window.

Do you extract exact meeting point coordinates?

Yes. We parse the embedded map data to extract precise latitude and longitude coordinates for tour starting locations.

What is the minimum viable engagement?

Our smallest packages cover a defined URL list or specific destination categories, delivered weekly. We price based on volume and delivery frequency.

Do you support review scraping?

Yes. We handle deep pagination across the review corpus, extracting ratings, text, traveler types, and dates.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off destination catalogue dump or a continuous availability monitoring feed, we scope, build, and operate the pipeline.

Start a getyourguide.com pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h
