
RUN · 42 active pipelines · gelbeseiten.de live

German business data,
at warehouse scale.

We extract verified business profiles, contact coordinates, categories, and operating hours from Gelbe Seiten. Delivered as clean JSON, CSV, or Parquet to S3, Postgres, or Snowflake on your cadence.

Get data from gelbeseiten.de → See how it works

Listings extracted

1.2M /run

Phone numbers

845K /24h

Category updates

42K /day

Active pipelines

Uptime

99.98%

◆ Gelbe Seiten Listings ◆ Business Contact Data ◆ Industry Categories ◆ Operating Hours ◆ Geolocation Coordinates ◆ Rating & Review Scores ◆ Premium Listing Flags ◆ Website URLs ◆ Click-to-Reveal Phones ◆ Regional Search Results ◆ Managed Pipeline ◆ S3 / Postgres Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from gelbeseiten.de

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Business Profiles objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, name, legal_name, description, category, sub_categories, year_established, premium_status, profile_url, logo_url

{
  "business_id": "gs_98412941",
  "name": "Müller Sanitärtechnik GmbH",
  "category": "Klempner",
  "premium_status": true,
  "profile_url": "https://www.gelbeseiten.de/gs/mueller-sanitaertechnik",
  "logo_url": "https://images.gelbeseiten.de/logo_98412941.jpg"
}

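As a rough illustration of what "typed" could mean for a Business Profiles record, here is a minimal Python sketch. The field names come from the list above; the split into required and optional fields, the Python types, and the helper name `parse_profile` are assumptions made for this example, not the delivered schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusinessProfile:
    """One Business Profiles record; which fields are optional is an assumption."""
    business_id: str
    name: str
    category: str
    premium_status: bool
    profile_url: str
    legal_name: Optional[str] = None
    description: Optional[str] = None
    sub_categories: tuple[str, ...] = ()
    year_established: Optional[int] = None
    logo_url: Optional[str] = None


def parse_profile(record: dict) -> BusinessProfile:
    """Build a typed profile from a raw dict (hypothetical helper).

    Rejects a non-boolean premium_status, ignores unknown keys, and lets the
    dataclass constructor raise TypeError if a required field is missing.
    """
    if not isinstance(record.get("premium_status"), bool):
        raise ValueError("premium_status must be a boolean")
    known = BusinessProfile.__dataclass_fields__.keys()
    cleaned = {k: v for k, v in record.items() if k in known}
    if "sub_categories" in cleaned:
        cleaned["sub_categories"] = tuple(cleaned["sub_categories"])
    return BusinessProfile(**cleaned)
```

Passing the sample record above through `parse_profile` yields a profile whose missing optional fields fall back to `None` or an empty tuple.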

Complete list of extractable fields for Contact & Location objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, phone_primary, phone_secondary, email, website, street_address, postal_code, city, state, latitude, longitude, directions_url

{
  "phone_primary": "+49 30 1234567",
  "email": "info@mueller-sanitaer.de",
  "website": "www.mueller-sanitaer.de",
  "street_address": "Hauptstraße 42",
  "postal_code": "10115",
  "city": "Berlin",
  "latitude": 52.531677
}


Complete list of extractable fields for Operating Hours objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, monday_open, monday_close, tuesday_open, tuesday_close, wednesday_open, wednesday_close, thursday_open, thursday_close, friday_open, friday_close, weekend_hours, holiday_exceptions

{
  "business_id": "gs_98412941",
  "monday_open": "08:00",
  "monday_close": "17:00",
  "friday_open": "08:00",
  "friday_close": "15:00",
  "weekend_hours": "Closed"
}


Complete list of extractable fields for Ratings & Reviews objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, rating_score, review_count, source_platform, recent_review_date, top_review_text, response_rate, rating_breakdown

{
  "business_id": "gs_98412941",
  "rating_score": 4.8,
  "review_count": 124,
  "source_platform": "Golocal",
  "recent_review_date": "2026-05-10",
  "response_rate": 0.95
}


Complete list of extractable fields for Search Results objects from gelbeseiten.de. All fields typed and schema-versioned.

keyword, location, radius, position, business_id, name, is_sponsored, ad_type, snippet_text, scraped_at

{
  "keyword": "Klempner",
  "location": "Berlin",
  "position": 3,
  "business_id": "gs_98412941",
  "name": "Müller Sanitärtechnik GmbH",
  "is_sponsored": false,
  "scraped_at": "2026-05-12T09:14:33Z"
}


Capabilities

Complete German directory data at scale

Our Gelbe Seiten scraper navigates complex regional search grids, resolves dynamic contact fields, and structures business profiles into clean warehouse records.

Full Business Profiles

Extract company names, legal entities, detailed descriptions, and primary/secondary industry categories from every listing.

Contact Information Extraction

Capture primary phone numbers, secondary lines, public email addresses, and external website URLs reliably.

Dynamic Phone Resolution

Render JavaScript and simulate click events to reveal masked phone numbers on heavily protected profiles.

Geolocation Normalisation

Extract precise latitude and longitude coordinates alongside structured postal addresses for geospatial mapping.

Operating Hours Parsing

Standardise opening and closing times across weekdays, weekends, and holiday exception schedules.

Rating Aggregation

Capture composite rating scores, total review counts, and source platform attributions displayed on listings.

Regional Search Traversal

Navigate complex search grids using German postal codes (PLZ) to bypass strict pagination limits.

Premium Listing Detection

Identify sponsored placements, premium profile flags, and ad types to map competitor marketing spend.

Scheduled Change Detection

Maintain a hash index of business records and only deliver diffs when contact details or statuses change.
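
To make the operating hours parsing capability above more concrete, here is a small Python sketch that turns free-text time ranges into zero-padded HH:MM pairs. The input spellings ("8 - 17 Uhr", "08.00–15.00", "8:30 bis 12:00 Uhr") are illustrative guesses at what listings might contain, and the function names are hypothetical.

```python
import re

# An hour, optionally followed by ":" or "." and two minute digits.
_TIME = re.compile(r"(\d{1,2})(?:[:.](\d{2}))?")


def normalise_time(raw: str) -> str:
    """Convert a loose time such as '8', '8.30' or '17:00 Uhr' to 'HH:MM'."""
    match = _TIME.search(raw)
    if match is None:
        raise ValueError(f"no time found in {raw!r}")
    hour, minute = int(match.group(1)), int(match.group(2) or 0)
    if hour > 23 or minute > 59:
        raise ValueError(f"time out of range in {raw!r}")
    return f"{hour:02d}:{minute:02d}"


def normalise_range(raw: str) -> tuple[str, str]:
    """Split 'open - close' on a hyphen, en dash or 'bis' and normalise both ends."""
    parts = re.split(r"\s*(?:-|–|bis)\s*", raw.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"not a time range: {raw!r}")
    return normalise_time(parts[0]), normalise_time(parts[1])
```

A value that is not a range, such as "Closed", raises `ValueError` here, so the caller can decide how to store it (the sample record keeps it as a plain string in `weekend_hours`).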

// engagement pipeline

From PLZ list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target categories, cities, or specific German postal codes (PLZ). We design the extraction schema.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, Playwright renderers for contact reveals, and German residential proxy pools.

Validation & QA

Days 4–6

Schema validation, phone number format checks, and coordinate verification before full launch.

Delivery

ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, Postgres database, or Snowflake stage.
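
The Validation & QA step above mentions phone number format checks and coordinate verification. A minimal sketch of such checks might look like the following; the regular expression, the approximate bounding box for Germany, and the function name `validate_record` are assumptions for illustration, not the production rules.

```python
import re

# "+49" followed by 5 to 13 digits, each optionally preceded by one space,
# matching the style of the sample value "+49 30 1234567" (assumed format).
PHONE_RE = re.compile(r"^\+49(?: ?\d){5,13}$")

# Approximate, deliberately loose bounding box for Germany (assumption).
LAT_RANGE = (47.2, 55.1)
LON_RANGE = (5.8, 15.1)


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    errors = []
    phone = record.get("phone_primary")
    if phone is not None and not PHONE_RE.match(phone):
        errors.append("phone_primary: not in +49 format")
    postal = record.get("postal_code")
    # German postal codes (PLZ) are five digits.
    if postal is not None and not (isinstance(postal, str) and re.fullmatch(r"\d{5}", postal)):
        errors.append("postal_code: not a five-digit PLZ")
    for key, (low, high) in (("latitude", LAT_RANGE), ("longitude", LON_RANGE)):
        value = record.get(key)
        if value is not None and not low <= value <= high:
            errors.append(f"{key}: outside Germany bounding box")
    return errors
```

Missing fields are skipped rather than flagged, since the sample records show that not every field is present on every listing.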

Under the hood

How our Gelbe Seiten pipeline handles the hard parts

Extracting national directory data requires bypassing rate limits and dynamic field masking. Here is how we maintain pipeline stability.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued, running

// observability

Pipeline health

99.9% uptime · 142ms p99 latency · 0.3% null rate

Anti-bot layer

German residential proxies

Directory sites aggressively block datacentre IPs. We route all requests through verified residential ISP proxies located within Germany, maintaining realistic request signatures and preventing region blocks.

JavaScript rendering

Playwright for contact reveals

Many listings mask phone numbers and email addresses behind JavaScript click events to deter basic HTTP scrapers. We deploy full Playwright browser sessions to trigger these events and capture the underlying data.

Pagination limits

Micro-radius spatial grids

Gelbe Seiten limits search results to a fixed number of pages per query. We work around this by dividing major cities into micro-radius coordinate grids, so each query returns few enough results to fit under the cap and dense commercial zones are covered in full.
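
One way to picture the micro-radius grid idea is the Python sketch below. It tiles a bounding box with square cells whose side is the search radius times √2, so a circle of that radius around each cell centre covers the whole cell. The function name `grid_centres`, the flat-earth approximation, and the constant of about 111.32 km per degree of latitude are assumptions chosen for this example, and the sketch only covers northern-hemisphere boxes such as German cities.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_centres(south: float, west: float, north: float, east: float,
                 radius_km: float) -> list[tuple[float, float]]:
    """Return (lat, lon) search centres whose radius_km circles cover the box."""
    side_km = radius_km * math.sqrt(2)  # square inscribed in the search circle
    lat_step = side_km / KM_PER_DEG_LAT
    n_rows = math.ceil((north - south) / lat_step)
    centres = []
    for i in range(n_rows):
        row_south = south + i * lat_step
        # Use the row's southern edge, where a degree of longitude is widest,
        # so cells never exceed side_km anywhere in the row.
        lon_step = side_km / (KM_PER_DEG_LAT * math.cos(math.radians(row_south)))
        n_cols = math.ceil((east - west) / lon_step)
        for j in range(n_cols):
            lat = row_south + lat_step / 2
            lon = west + (j + 0.5) * lon_step
            centres.append((round(lat, 6), round(lon, 6)))
    return centres
```

Each centre would then be issued as a separate radius query; shrinking `radius_km` in dense districts keeps every result set under the page cap at the cost of more queries.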

Schema stability

Fallback selectors for varied templates

Premium and free listings use entirely different DOM structures. Our extraction logic applies multi-layer fallback chains to normalise data regardless of the underlying profile template.
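
A fallback chain of this kind can be sketched in a few lines of Python. The selectors and class names below are hypothetical and not taken from the live site, and the standard-library XML parser stands in for a real HTML parser (such as Scrapy's selectors) purely to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical selectors, tried in order: premium template first, then free.
NAME_SELECTORS = [
    ".//h1[@class='premium-title']",
    ".//h2[@class='listing-name']",
    ".//span[@class='name']",
]


def first_match(root: ET.Element, selectors: list[str]) -> Optional[str]:
    """Return the stripped text of the first selector that yields non-empty text."""
    for path in selectors:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None  # no template matched; the caller can record a null


premium = ET.fromstring("<div><h1 class='premium-title'> Müller Sanitärtechnik GmbH </h1></div>")
free = ET.fromstring("<div><h2 class='listing-name'>Müller Sanitärtechnik GmbH</h2></div>")
```

Both `first_match(premium, NAME_SELECTORS)` and `first_match(free, NAME_SELECTORS)` return the same normalised name, which is the point of the chain: downstream code never needs to know which template a profile used.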

Change detection

Hash-indexed updates

For ongoing directory monitoring, we hash every listing. Subsequent runs deliver only the records that have changed, reducing your storage costs and processing overhead.
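
As a minimal sketch of hash-based change detection, assuming SHA-256 over a canonical JSON serialisation and an in-memory index keyed by `business_id` (a production index would presumably live in a database), the logic could look like this. The function names and the choice to ignore `scraped_at` are assumptions for illustration.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Hash a record via canonical JSON so key order does not affect the digest."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(index: dict[str, str], records: list[dict],
             ignore: tuple[str, ...] = ("scraped_at",)) -> list[dict]:
    """Return new or changed records and update the index in place.

    Volatile fields listed in `ignore` are excluded from the hash, so a fresh
    timestamp alone does not count as a change.
    """
    changed = []
    for record in records:
        stable = {k: v for k, v in record.items() if k not in ignore}
        digest = record_hash(stable)
        if index.get(record["business_id"]) != digest:
            index[record["business_id"]] = digest
            changed.append(record)
    return changed
```

On the first run every record is new and is returned; on later runs only records whose stable fields differ from the stored digest come back.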

Applications

Who uses Gelbe Seiten data

Teams across industries use gelbeseiten.de data to build competitive products and smarter operations.

B2B Lead Generation

Sales teams extract targeted lists of local businesses by industry and region to feed CRM pipelines.

Local SEO Monitoring

Agencies track client visibility and search rankings across specific German postal codes and categories.

Market Mapping

Analysts map business density and competitor distribution across federal states to guide expansion strategy.

Data Enrichment

Platforms append missing phone numbers, emails, and operating hours to existing incomplete business records.

Review Management

Brands monitor franchise locations for rating fluctuations and customer feedback across regional directories.

Geospatial Analysis

Logistics and retail planners use extracted coordinate data to optimise delivery routes and physical store placements.

Why DataFlirt

"Gelbe Seiten holds the definitive map of German commerce, but extracting structured postal and contact data at scale requires bypassing aggressive rate limits."

Most teams underestimate the investment required: reliable Gelbe Seiten scraping requires German residential proxies, full JavaScript rendering for contact reveals, and spatial search grids to bypass pagination limits. DataFlirt absorbs that complexity so your engineers can focus on the analysis.

Technical Spec

Gelbe Seiten scraper — technical capabilities

What our gelbeseiten.de scraper supports, from rendered JavaScript elements to rate-limit handling, and what it supports only partially.

JavaScript rendering

Playwright sessions required for click-to-reveal phone numbers

Supported

CAPTCHA bypass

Automated solver integration for rate-limit challenges

Supported

German residential proxies

DE-based ISP proxy pools to prevent geographic blocking

Supported

Spatial search grids

Radius-based pagination bypass for dense city centres

Supported

Operating hours normalisation

Standardised ISO 8601 formatting for time blocks

Supported

Change detection (diffs)

Hash-based diff logic to emit only changed business records

Supported

Webhook delivery

HTTP POST per record for real-time lead ingestion

Supported

User account credentials

Private saved lists and user-specific dashboard data

Partial

Direct messaging

Automated sending of messages to businesses via the platform

Partial

Infrastructure

Infrastructure powering the directory pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Stack

Scrapy handles search grid orchestration and deduplication. Playwright executes JavaScript to reveal masked contact details on individual business profiles.

DE-Targeted Proxy Infrastructure

We route all directory traffic through German residential ISP proxies. This maintains high trust scores and bypasses aggressive datacentre IP blocking.

Cloud-Native Orchestration

Pipelines run on containerised AWS infrastructure. Airflow manages spatial grid dependencies and schedule frequencies, with all state stored in PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited files containing nested business profiles

CSV

Flat tabular data ideal for CRM imports

XLS

Excel format for manual sales team review

Parquet

Columnar format for data warehouse ingestion

AWS S3

Direct bucket delivery on pipeline completion

Webhook

HTTP POST per record for immediate downstream processing

API

REST endpoints to query your extracted dataset

PostgreSQL

Direct database upserts with schema conflict resolution

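
The PostgreSQL option above describes upserts keyed on the record. The sketch below shows the general shape of an `INSERT ... ON CONFLICT ... DO UPDATE` statement, using Python's built-in `sqlite3` as a stand-in so the example runs without a database server. The table name, column subset, and function name are assumptions; a PostgreSQL driver would use the same SQL clause but a different placeholder style.

```python
import sqlite3

# Hypothetical table holding a subset of the Business Profiles fields.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS business_profiles (
    business_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    premium_status INTEGER
)
"""

# Insert a new row, or overwrite the existing row with the same business_id.
UPSERT_SQL = """
INSERT INTO business_profiles (business_id, name, category, premium_status)
VALUES (:business_id, :name, :category, :premium_status)
ON CONFLICT (business_id) DO UPDATE SET
    name = excluded.name,
    category = excluded.category,
    premium_status = excluded.premium_status
"""


def upsert_profiles(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Apply all upserts in one transaction; roll back if any row fails."""
    with conn:
        conn.executemany(UPSERT_SQL, records)
```

Because the statement is idempotent, re-delivering the same batch leaves the table unchanged, which pairs naturally with diff-based runs.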

// faq

Common questions.

About gelbeseiten.de scraping, legality, and pipeline operations.

Ask us directly →

Is scraping gelbeseiten.de legal?

Scraping public business directory data is generally permissible for B2B use cases. We extract only publicly visible company information, contact details, and operating hours. We do not extract personal consumer data or bypass authentication walls. Clients must ensure their subsequent use of contact data complies with GDPR and local telemarketing regulations.

How do you handle hidden phone numbers?

Gelbe Seiten frequently masks phone numbers requiring a user click to reveal them. We use headless Playwright browsers to load the page, execute the necessary JavaScript, simulate the interaction, and capture the fully rendered contact string.

How do you bypass the 50-page search limit?

Directory searches cap out at a fixed number of results per query. We map large cities into micro-radius coordinate grids or granular postal code (PLZ) lists. This forces the platform to return smaller, complete result sets that fit within the pagination limits.
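
The PLZ-based splitting described above can be pictured as a recursive subdivision: if a postal-code prefix returns more results than the cap allows, split it into its ten longer prefixes and try again. In this Python sketch, `count_results` is a hypothetical callable that reports how many listings a prefix would return, and the idea that queries can be issued per PLZ prefix is itself an assumption made for illustration.

```python
from typing import Callable


def split_queries(prefix: str, count_results: Callable[[str], int],
                  max_results: int, plz_length: int = 5) -> list[str]:
    """Return PLZ prefixes that each fit under max_results.

    Empty prefixes are dropped. A full-length PLZ is returned even if it still
    exceeds the cap, since it cannot be split further by postal code alone.
    """
    total = count_results(prefix)
    if total == 0:
        return []
    if total <= max_results or len(prefix) >= plz_length:
        return [prefix]
    queries = []
    for digit in "0123456789":
        queries.extend(split_queries(prefix + digit, count_results,
                                     max_results, plz_length))
    return queries
```

A full PLZ that still exceeds the cap would need a finer method, such as the radius grids mentioned in the same answer.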

Can I target specific German postal codes (PLZ)?

Yes. You can provide a specific list of PLZs, cities, or federal states. We inject these directly into the search logic to extract highly targeted regional datasets.

How fresh is the data?

We can configure pipelines to run daily, weekly, or monthly depending on your requirements. Most clients opt for a full initial extraction followed by weekly diff-based updates to capture new listings and modified operating hours.

Do you extract business emails?

Yes, we extract email addresses whenever they are publicly listed on the business profile. Note that some businesses rely solely on contact forms rather than publishing raw email addresses.

What is the minimum viable engagement?

Our minimum engagement typically starts with a defined category set across major German cities. For full national directory coverage, we price based on the total volume of records and the required update frequency. Contact us to scope your specific data needs.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a national directory dump or targeted regional extracts — we scope, build, and operate the pipeline. Tell us what you need.

Start a gelbeseiten.de pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

