
German business data,
at warehouse scale.

We extract verified business profiles, contact coordinates, categories, and operating hours from Gelbe Seiten. Delivered as clean JSON, CSV, or Parquet to S3, Postgres, or Snowflake on your cadence.

Listings extracted: 1.2M /run
Phone numbers: 845K /24h
Category updates: 42K /day
Active pipelines: 42
Uptime: 99.98%
Data Dictionary

Every field we extract from gelbeseiten.de

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Business Profiles objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, name, legal_name, description, category, sub_categories, year_established, premium_status, profile_url, logo_url
business_profiles
● 200 OK
"business_id": "gs_98412941",
"name": "Müller Sanitärtechnik GmbH",
"category": "Klempner",
"premium_status": true,
"profile_url": "https://www.gelbeseiten.de/gs/mueller-sanitaertechnik",
"logo_url": "https://images.gelbeseiten.de/logo_98412941.jpg"

Complete list of extractable fields for Contact & Location objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, phone_primary, phone_secondary, email, website, street_address, postal_code, city, state, latitude, longitude, directions_url
contact_location
● 200 OK
"phone_primary": "+49 30 1234567",
"email": "info@mueller-sanitaer.de",
"website": "www.mueller-sanitaer.de",
"street_address": "Hauptstraße 42",
"postal_code": "10115",
"city": "Berlin",
"latitude": 52.531677

Complete list of extractable fields for Operating Hours objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, monday_open, monday_close, tuesday_open, tuesday_close, wednesday_open, wednesday_close, thursday_open, thursday_close, friday_open, friday_close, weekend_hours, holiday_exceptions
operating_hours
● 200 OK
"business_id": "gs_98412941",
"monday_open": "08:00",
"monday_close": "17:00",
"friday_open": "08:00",
"friday_close": "15:00",
"weekend_hours": "Closed"

Complete list of extractable fields for Ratings & Reviews objects from gelbeseiten.de. All fields typed and schema-versioned.

business_id, rating_score, review_count, source_platform, recent_review_date, top_review_text, response_rate, rating_breakdown
ratings_reviews
● 200 OK
"business_id": "gs_98412941",
"rating_score": 4.8,
"review_count": 124,
"source_platform": "Golocal",
"recent_review_date": "2026-05-10",
"response_rate": 0.95

Complete list of extractable fields for Search Results objects from gelbeseiten.de. All fields typed and schema-versioned.

keyword, location, radius, position, business_id, name, is_sponsored, ad_type, snippet_text, scraped_at
search_results
● 200 OK
"keyword": "Klempner",
"location": "Berlin",
"position": 3,
"business_id": "gs_98412941",
"name": "Müller Sanitärtechnik GmbH",
"is_sponsored": false,
"scraped_at": "2026-05-12T09:14:33Z"

Capabilities

Complete German directory data at scale

Our Gelbe Seiten scraper navigates complex regional search grids, resolves dynamic contact fields, and structures business profiles into clean warehouse records.

Full Business Profiles

Extract company names, legal entities, detailed descriptions, and primary/secondary industry categories from every listing.

Contact Information Extraction

Capture primary phone numbers, secondary lines, public email addresses, and external website URLs reliably.

Dynamic Phone Resolution

Render JavaScript and simulate click events to reveal masked phone numbers on heavily protected profiles.

Geolocation Normalisation

Extract precise latitude and longitude coordinates alongside structured postal addresses for geospatial mapping.

Operating Hours Parsing

Standardise opening and closing times across weekdays, weekends, and holiday exception schedules.
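
As a rough illustration of the parsing described above, the sketch below turns a raw hours string into zero-padded HH:MM values like those in the operating_hours sample. The raw input formats ("8:00 - 17:00 Uhr", "08.00–15.00", "Geschlossen") and the function name `normalise_hours` are assumptions made for this example, not a description of the production parser.

```python
import re
from typing import Optional

# Words treated as "closed"; this list is an assumption for the example.
CLOSED_WORDS = {"", "closed", "geschlossen"}

# Matches "8", "8:00" or "08.30": an hour with optional minutes.
TIME_RE = re.compile(r"(\d{1,2})(?:[:.](\d{2}))?")


def normalise_hours(raw: str) -> Optional[dict]:
    """Return {"open": "HH:MM", "close": "HH:MM"}, or None when closed."""
    text = raw.strip().lower()
    if text in CLOSED_WORDS:
        return None
    times = TIME_RE.findall(text)
    if len(times) != 2:
        # Split shifts such as "8-12, 14-18" are out of scope for this sketch.
        raise ValueError(f"unrecognised hours format: {raw!r}")
    formatted = []
    for hour, minute in times:
        h, m = int(hour), int(minute or 0)
        if h > 24 or m > 59:
            raise ValueError(f"time out of range in: {raw!r}")
        formatted.append(f"{h:02d}:{m:02d}")
    return {"open": formatted[0], "close": formatted[1]}
```

Raising on anything unexpected, rather than guessing, keeps malformed hours out of the delivered records until a rule is written for them.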

Rating Aggregation

Capture composite rating scores, total review counts, and source platform attributions displayed on listings.

Regional Search Traversal

Navigate complex search grids using German postal codes (PLZ) to bypass strict pagination limits.

Premium Listing Detection

Identify sponsored placements, premium profile flags, and ad types to map competitor marketing spend.

Scheduled Change Detection

Maintain a hash index of business records and only deliver diffs when contact details or statuses change.

// engagement pipeline

From PLZ list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide target categories, cities, or specific German postal codes (PLZ). We design the extraction schema.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, Playwright renderers for contact reveals, and German residential proxy pools.

Validation & QA
Days 4–6

Schema validation, phone number format checks, and coordinate verification before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, Postgres database, or Snowflake stage.
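
To make the Validation & QA step more concrete, here is a minimal sketch of phone format checks and coordinate verification. The regular expressions, the approximate bounding box for Germany, and the function name `validate_record` are assumptions chosen for illustration; the checks used in a real engagement may differ.

```python
import re

# +49 followed by 5 to 12 digits with no leading zero (checked after stripping separators).
PHONE_RE = re.compile(r"^\+49[1-9]\d{4,11}$")
# German postal codes (PLZ) are five digits.
PLZ_RE = re.compile(r"^\d{5}$")
# Approximate bounding box for Germany, used only as a sanity check.
LAT_RANGE = (47.2, 55.1)
LON_RANGE = (5.8, 15.1)


def validate_record(record: dict) -> list[str]:
    """Return the names of fields that fail validation (empty list means OK)."""
    errors = []
    phone = re.sub(r"[\s\-/()]", "", record.get("phone_primary") or "")
    if not PHONE_RE.match(phone):
        errors.append("phone_primary")
    if not PLZ_RE.match(record.get("postal_code") or ""):
        errors.append("postal_code")
    # Coordinates are only checked when present.
    for field, (low, high) in (("latitude", LAT_RANGE), ("longitude", LON_RANGE)):
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(field)
    return errors
```

Applied to the contact sample shown earlier (phone "+49 30 1234567", postal code "10115", latitude 52.531677), this returns an empty list.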

Under the hood

How our Gelbe Seiten pipeline handles the hard parts

Extracting national directory data requires bypassing rate limits and dynamic field masking. Here is how we maintain pipeline stability.

pipeline-monitor · gelbeseiten.de · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Anti-bot layer
German residential proxies

Directory sites aggressively block datacentre IPs. We route all requests through verified residential ISP proxies located within Germany, maintaining realistic request signatures and preventing region blocks.

JavaScript rendering
Playwright for contact reveals

Many listings mask phone numbers and email addresses behind JavaScript click events to deter basic HTTP scrapers. We deploy full Playwright browser sessions to trigger these events and capture the underlying data.

Pagination limits
Micro-radius spatial grids

Gelbe Seiten limits search results to a fixed number of pages per query. We work around this by dividing major cities into micro-radius coordinate grids, so each query returns a result set small enough to fit under the cap, even in dense commercial zones.
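
One way to picture a micro-radius grid is as a set of search centres whose circles approximately tile a city's bounding box. The sketch below is an illustration under stated assumptions: the function name `radius_grid`, the flat-earth conversion of roughly 111.32 km per degree of latitude, and the Berlin bounding box in the usage note are all hypothetical choices, not the production grid logic.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def radius_grid(south, west, north, east, radius_km):
    """Return (lat, lon) centres whose circles of radius_km roughly cover the box."""
    # A circle of radius r fully contains a square of side r * sqrt(2),
    # so spacing centres by that side leaves no gaps between neighbours.
    step_km = radius_km * math.sqrt(2)
    lat_step = step_km / KM_PER_DEG_LAT
    centres = []
    lat = south + lat_step / 2
    while lat - lat_step / 2 < north:
        # Degrees of longitude shrink with latitude, so recompute per row.
        lon_step = step_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = west + lon_step / 2
        while lon - lon_step / 2 < east:
            centres.append((round(lat, 6), round(lon, 6)))
            lon += lon_step
        lat += lat_step
    return centres
```

For example, `radius_grid(52.34, 13.09, 52.68, 13.76, 2.0)` yields centres for an approximate Berlin bounding box at a 2 km radius. Halving the radius roughly quadruples the number of queries, which is the trade-off between coverage in dense zones and request volume.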

Schema stability
Fallback selectors for varied templates

Premium and free listings use entirely different DOM structures. Our extraction logic applies multi-layer fallback chains to normalise data regardless of the underlying profile template.
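
A fallback chain can be sketched as an ordered list of selectors tried until one yields text. The class names below are hypothetical (the real gelbeseiten.de markup is not reproduced here), and the example uses the standard library's ElementTree on well-formed snippets for brevity; real pages would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors, ordered from premium template to free template.
NAME_SELECTORS = [
    ".//h1[@class='premium-title']",
    ".//h1[@class='listing-name']",
    ".//span[@class='name']",
]


def first_match(root, selectors):
    """Return stripped text from the first selector that matches non-empty text."""
    for path in selectors:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None  # no template matched; the caller should treat the field as null


premium = ET.fromstring(
    "<div><h1 class='premium-title'>Müller Sanitärtechnik GmbH</h1></div>"
)
free = ET.fromstring(
    "<div><span class='name'> Müller Sanitärtechnik GmbH </span></div>"
)
```

Both `first_match(premium, NAME_SELECTORS)` and `first_match(free, NAME_SELECTORS)` return the same normalised name, which is the point of the chain: the output schema stays stable even when the template changes.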

Change detection
Hash-indexed updates

For ongoing directory monitoring, we hash every listing. Subsequent runs only extract and deliver records that have changed, drastically reducing your storage costs and processing overhead.
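
A minimal sketch of hash-indexed change detection follows. It assumes records are flat dictionaries keyed by `business_id`, that `scraped_at` is the only volatile field to ignore, and that an in-memory dict stands in for the hash index; the function names are illustrative, not the production implementation.

```python
import hashlib
import json

# Fields that change on every run and should not trigger a diff (assumed).
VOLATILE_FIELDS = {"scraped_at"}


def record_hash(record: dict) -> str:
    """Stable SHA-256 of a record, independent of key order and volatile fields."""
    stable = {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}
    payload = json.dumps(stable, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_run(index: dict, records: list) -> list:
    """Return records that are new or changed, updating the index in place."""
    changed = []
    for record in records:
        digest = record_hash(record)
        if index.get(record["business_id"]) != digest:
            index[record["business_id"]] = digest
            changed.append(record)
    return changed
```

On a first run every record is emitted; on later runs only records whose hashed content differs from the stored digest are returned, so a re-scrape with nothing but a new timestamp produces an empty diff.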

Applications

Who uses Gelbe Seiten data

Teams across industries use gelbeseiten.de data to build competitive products and smarter operations.

01
B2B Lead Generation

Sales teams extract targeted lists of local businesses by industry and region to feed CRM pipelines.

02
Local SEO Monitoring

Agencies track client visibility and search rankings across specific German postal codes and categories.

03
Market Mapping

Analysts map business density and competitor distribution across federal states to guide expansion strategy.

04
Data Enrichment

Platforms append missing phone numbers, emails, and operating hours to existing incomplete business records.

05
Review Management

Brands monitor franchise locations for rating fluctuations and customer feedback across regional directories.

06
Geospatial Analysis

Logistics and retail planners use extracted coordinate data to optimise delivery routes and physical store placements.

Why DataFlirt

"Gelbe Seiten holds the definitive map of German commerce, but extracting structured postal and contact data at scale requires bypassing aggressive rate limits."

Most teams underestimate the investment required: reliable Gelbe Seiten scraping requires German residential proxies, full JavaScript rendering for contact reveals, and spatial search grids to bypass pagination limits. DataFlirt absorbs that complexity so your engineers can focus on the analysis.

Technical Spec

Gelbe Seiten scraper — technical capabilities

Everything supported by our gelbeseiten.de scraper, from JavaScript-rendered elements and rate-limit handling to change detection. Account-gated features are only partially supported.

JavaScript rendering
Playwright sessions required for click-to-reveal phone numbers
Supported
CAPTCHA bypass
Automated solver integration for rate-limit challenges
Supported
German residential proxies
DE-based ISP proxy pools to prevent geographic blocking
Supported
Spatial search grids
Radius-based pagination bypass for dense city centres
Supported
Operating hours normalisation
Standardised ISO 8601 formatting for time blocks
Supported
Change detection (diffs)
Hash-based diff logic to emit only changed business records
Supported
Webhook delivery
HTTP POST per record for real-time lead ingestion
Supported
User account credentials
Private saved lists and user-specific dashboard data
Partial
Direct messaging
Automated sending of messages to businesses via the platform
Partial
Infrastructure

Infrastructure powering the directory pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles search grid orchestration and deduplication. Playwright executes JavaScript to reveal masked contact details on individual business profiles.

DE-Targeted Proxy Infrastructure

We route all directory traffic through German residential ISP proxies. This maintains high trust scores and bypasses aggressive datacentre IP blocking.

Cloud-Native Orchestration

Pipelines run on containerised AWS infrastructure. Airflow manages spatial grid dependencies and schedule frequencies, with all state stored in PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited files containing nested business profiles
CSV
Flat tabular data ideal for CRM imports
XLS
Excel format for manual sales team review
Parquet
Columnar format for data warehouse ingestion
AWS S3
Direct bucket delivery on pipeline completion
Webhook
HTTP POST per record for immediate downstream processing
API
REST endpoints to query your extracted dataset
PostgreSQL
Direct database upserts with schema conflict resolution
S3
Direct bucket delivery — compatible with any data lake
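
For the newline-delimited JSON format listed above, a file is simply one JSON object per line. The sketch below shows the idea with an in-memory stream; the function name `write_ndjson` is an assumption for this example, and a real delivery would write to a file or an S3 upload instead.

```python
import io
import json


def write_ndjson(records, stream) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for record in records:
        # ensure_ascii=False keeps characters such as "ü" readable in the output.
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


buffer = io.StringIO()
written = write_ndjson(
    [
        {"business_id": "gs_98412941", "name": "Müller Sanitärtechnik GmbH"},
        {"business_id": "gs_98412941", "weekend_hours": "Closed"},
    ],
    buffer,
)
```

Because each line parses on its own, consumers can stream the file record by record without loading the whole dataset into memory.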
// faq

Common questions.

About gelbeseiten.de scraping, legality, and pipeline operations.

Is scraping gelbeseiten.de legal?

Scraping public business directory data is generally permissible for B2B use cases. We extract only publicly visible company information, contact details, and operating hours. We do not extract personal consumer data or bypass authentication walls. Clients must ensure their subsequent use of contact data complies with GDPR and local telemarketing regulations.

How do you handle hidden phone numbers?

Gelbe Seiten frequently masks phone numbers, requiring a user click to reveal them. We use headless Playwright browsers to load the page, execute the necessary JavaScript, simulate the interaction, and capture the fully rendered contact string.

How do you bypass the 50-page search limit?

Directory searches cap out at a fixed number of results per query. We map large cities into micro-radius coordinate grids or granular postal code (PLZ) lists. This forces the platform to return smaller, complete result sets that fit within the pagination limits.

Can I target specific German postal codes (PLZ)?

Yes. You can provide a specific list of PLZs, cities, or federal states. We inject these directly into the search logic to extract highly targeted regional datasets.

How fresh is the data?

We can configure pipelines to run daily, weekly, or monthly depending on your requirements. Most clients opt for a full initial extraction followed by weekly diff-based updates to capture new listings and modified operating hours.

Do you extract business emails?

Yes, we extract email addresses whenever they are publicly listed on the business profile. Note that some businesses rely solely on contact forms rather than publishing raw email addresses.

What is the minimum viable engagement?

Our minimum engagement typically starts with a defined category set across major German cities. For full national directory coverage, we price based on the total volume of records and the required update frequency. Contact us to scope your specific data needs.

$ dataflirt scope --new-project --source=gelbeseiten.de ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a national directory dump or targeted regional extracts — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h