
Cylex business data,
at warehouse scale.

We extract company profiles, contact details, category classifications, and customer reviews from Cylex Germany. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Businesses extracted: 1.2M /month
Reviews scraped: 450K /run
Contact updates: 89K /24h
Active pipelines: 34
Uptime: 99.98%

Data Dictionary

Every field we extract from cylex.de

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Company Profiles objects from cylex.de. All fields typed and schema-versioned.

cylex_id, business_name, primary_category, sub_categories, description, founded_year, registration_number, vat_number, profile_url, verified_status

company_profiles
● 200 OK
{
  "cylex_id": "12345678",
  "business_name": "Muller Haustechnik GmbH",
  "primary_category": "Sanitärinstallationen",
  "description": "Ihr zuverlässiger Partner für Sanitär und Heizung in Berlin.",
  "founded_year": 1998,
  "verified_status": true,
  "profile_url": "https://www.cylex.de/firma/muller-haustechnik-gmbh-12345678.html"
}

Complete list of extractable fields for Contact & Location objects from cylex.de. All fields typed and schema-versioned.

cylex_id, street_address, postal_code, city, state, phone_primary, phone_mobile, fax, email, website, latitude, longitude

contact_location
● 200 OK
{
  "cylex_id": "12345678",
  "street_address": "Kantstrasse 124",
  "postal_code": "10625",
  "city": "Berlin",
  "phone_primary": "+49 30 1234567",
  "email": "info@muller-haustechnik.de",
  "website": "www.muller-haustechnik.de",
  "latitude": 52.5065,
  "longitude": 13.3032
}

Complete list of extractable fields for Reviews & Ratings objects from cylex.de. All fields typed and schema-versioned.

review_id, cylex_id, reviewer_name, rating_score, review_text, review_date, helpful_votes, owner_response, response_date, source_platform

reviews_ratings
● 200 OK
{
  "review_id": "rev_98765",
  "cylex_id": "12345678",
  "reviewer_name": "Klaus W.",
  "rating_score": 4.5,
  "review_text": "Schneller Service und faire Preise.",
  "review_date": "2025-08-14",
  "helpful_votes": 3,
  "owner_response": "Vielen Dank für Ihr Feedback!"
}

Complete list of extractable fields for Operating Hours objects from cylex.de. All fields typed and schema-versioned.

cylex_id, monday, tuesday, wednesday, thursday, friday, saturday, sunday, special_hours, timezone

operating_hours
● 200 OK
{
  "cylex_id": "12345678",
  "monday": "08:00 - 17:00",
  "tuesday": "08:00 - 17:00",
  "wednesday": "08:00 - 17:00",
  "saturday": "Geschlossen",
  "sunday": "Geschlossen",
  "timezone": "Europe/Berlin"
}
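To illustrate how raw hour strings like the ones in this sample could be turned into typed values, here is a minimal Python sketch. The function name `parse_hours` and the choice to represent a closed day as `None` are assumptions made for this example, not a description of the delivered schema.

```python
import re
from datetime import time

# Matches values such as "08:00 - 17:00" from the sample record above.
HOURS_PATTERN = re.compile(r"^(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})$")


def parse_hours(value):
    """Return (open, close) as datetime.time values, or None when closed."""
    text = value.strip()
    if text.lower() == "geschlossen":  # German for "closed"
        return None
    match = HOURS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised hours value: {value!r}")
    open_h, open_m, close_h, close_m = map(int, match.groups())
    return time(open_h, open_m), time(close_h, close_m)
```

Raising on unrecognised input, rather than guessing, keeps malformed values visible to downstream null-rate and QA checks.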

Complete list of extractable fields for Products & Services objects from cylex.de. All fields typed and schema-versioned.

cylex_id, service_list, brand_affiliations, payment_methods, languages_spoken, certifications, social_media_links, facilities, image_urls

products_services
● 200 OK
{
  "cylex_id": "12345678",
  "service_list": ["Rohrreinigung", "Heizungswartung", "Badrenovierung"],
  "payment_methods": ["Rechnung", "Barzahlung", "EC-Karte"],
  "languages_spoken": ["Deutsch", "Englisch"],
  "certifications": ["Meisterbetrieb"],
  "social_media_links": ["https://facebook.com/mullerhaustechnik"]
}

Capabilities

Everything you need from Cylex Germany

Our Cylex scraper handles category pagination, geolocated search results, and hidden contact details with JavaScript rendering and anti-bot circumvention built in.

Business Profile Extraction

Company name, description, registration numbers, and core metadata extracted cleanly from every listing.

Contact Detail Resolution

Extract phone numbers, fax, physical addresses, and resolve obfuscated email addresses via JavaScript rendering.

Review & Rating Mining

Full review text, star ratings, reviewer names, and owner responses, collected across every page of a profile's reviews.

Operating Hours & Status

Standard weekly hours, holiday exceptions, and temporary closure statuses mapped to standard formats.

Category & Taxonomy Mapping

Extract primary classifications and sub-categories to build accurate industry segmentations.

Geolocation & Maps Data

Capture exact latitude and longitude coordinates for spatial analysis and map integrations.

B2B Lead Enrichment

Cross-reference Cylex data with website links and social profiles to build comprehensive sales lists.

Local SEO Audit Data

Track business visibility, citation accuracy, and review velocity across German municipalities.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at monthly or weekly cadences.

// engagement pipeline

From city list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide categories, city names, or postal codes. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, and session management for cylex.de.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and location accuracy tests before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage.
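
The null-rate checks mentioned in the Validation & QA step could look roughly like the following Python sketch. The required fields and the 5% threshold are hypothetical values picked for illustration. In a real engagement they would come out of the scoping step.

```python
from collections import Counter

# Hypothetical required fields and threshold, chosen for illustration only.
REQUIRED_FIELDS = ("cylex_id", "business_name", "postal_code")
MAX_NULL_RATE = 0.05


def null_rates(records, fields=REQUIRED_FIELDS):
    """Return the share of records in which each field is missing or empty."""
    if not records:
        return {field: 0.0 for field in fields}
    missing = Counter()
    for record in records:
        for field in fields:
            if record.get(field) in (None, "", []):
                missing[field] += 1
    return {field: missing[field] / len(records) for field in fields}


def failing_fields(records, threshold=MAX_NULL_RATE):
    """List the fields whose null rate exceeds the threshold."""
    rates = null_rates(records)
    return sorted(f for f, rate in rates.items() if rate > threshold)
```

A run would only be promoted to full launch when `failing_fields` returns an empty list for the pilot batch.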

Under the hood

How our Cylex pipeline handles the hard parts

Directory sites aggressively block automated scraping to protect their data assets. Here is how we stay resilient.

pipeline-monitor · cylex.de · live ● active

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued, running

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Anti-bot layer
German residential proxies

Cylex employs strict rate limits and IP reputation checks. We route requests through German ISP proxies to mimic local user traffic and prevent subnet bans.

JavaScript rendering
Resolving hidden contact data

Phone numbers and email addresses are often obfuscated or require interaction to view. We use Playwright to execute JavaScript and trigger these elements natively.

Pagination handling
Deep category traversal

Extracting an entire city category requires navigating complex pagination structures. Our spiders handle infinite scroll and URL parameter manipulation reliably.
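
As a small illustration of URL parameter manipulation, the Python sketch below rewrites a page number in a listing URL. The `page` query parameter and the example.com URL are assumptions for the example. The actual pagination scheme on cylex.de is not documented here and may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with the (hypothetical) page parameter set to `page`."""
    parts = urlsplit(base_url)
    # Drop any existing page parameter, keep everything else in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(base_url, last_page, param="page"):
    """Build the URLs for pages 1..last_page of one listing."""
    return [page_url(base_url, n, param) for n in range(1, last_page + 1)]
```

Generating the full URL list up front makes it easy to queue pages and to spot gaps in coverage after a run.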

Schema stability
Resilient DOM selectors

Directory layouts change frequently, which breaks scrapers that rely on a single selector. We use fallback chains involving XPath, CSS, and JSON-LD structured data to maintain extraction accuracy.
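
A fallback chain can be sketched in a few lines of Python. This example uses only the standard library, so a regular expression stands in for the CSS and XPath selectors, and the function names are hypothetical. It tries JSON-LD first and falls back to the page heading.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def name_from_jsonld(html):
    """Return the "name" of the first JSON-LD object that has one."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("name"):
            return data["name"]
    return None


def name_from_heading(html):
    # Regex stand-in for a CSS or XPath selector on the first <h1>.
    match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S | re.I)
    return match.group(1).strip() if match else None


def extract_name(html, extractors=(name_from_jsonld, name_from_heading)):
    """Try each extractor in order and return the first non-empty result."""
    for extractor in extractors:
        value = extractor(html)
        if value:
            return value
    return None
```

Ordering the extractors from most to least structured means a layout change typically degrades to the next strategy instead of producing nulls.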

Change detection
Only re-scrape what changed

For ongoing monitoring, we maintain a hash index of business records. Subsequent runs only push diffs, reducing downstream processing load.
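
The hash-index idea can be sketched as follows. This is a minimal Python illustration, assuming an in-memory dict as the index and `cylex_id` as the record key. A production index would live in a persistent store.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record's canonical JSON form so key order does not matter."""
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(records, hash_index, key="cylex_id"):
    """Return new or modified records and update hash_index in place."""
    changed = []
    for record in records:
        digest = record_hash(record)
        if hash_index.get(record[key]) != digest:
            hash_index[record[key]] = digest
            changed.append(record)
    return changed
```

Calling `changed_records` twice with identical input returns the records the first time and an empty list the second, which is the diff-only behaviour described above.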

Applications

Who uses Cylex data and how

Teams across industries use cylex.de data to build competitive products and smarter operations.

01
B2B Lead Generation

Sales teams extract hyper-local business lists by category and postal code to build targeted outreach campaigns.

02
Local SEO Monitoring

Agencies track citation consistency, review sentiment, and category rankings for client businesses across Germany.

03
Competitor Analysis

Retailers map competitor locations, service offerings, and operating hours to identify underserved regional markets.

04
Market Research

Consultancies analyse business density, opening/closure rates, and industry distribution across different federal states.

05
Map & Navigation Enrichment

GIS platforms ingest verified addresses, coordinates, and business names to improve local search accuracy.

06
Franchise Monitoring

Corporate brands audit franchise locations for brand compliance, correct contact details, and review management.

Why DataFlirt

"Cylex Germany holds millions of verified local business records, but extracting them at scale requires navigating strict rate limits and dynamic DOM structures."

Most teams underestimate the investment required. Reliable Cylex scraping requires residential proxies, full JavaScript rendering for obfuscated emails, CAPTCHA handling, daily selector maintenance, and anomaly monitoring. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.

Technical Spec

Cylex scraper technical capabilities

Everything supported by our cylex.de scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability | Description | Status
JavaScript rendering | Required to reveal obfuscated phone numbers and email addresses | Supported
CAPTCHA bypass | Automated solver integration for bot-protection walls | Supported
Residential proxy rotation | German ISP proxies to bypass geographic and rate-limit blocks | Supported
Multi-city targeting | Crawl specific postal codes, cities, or federal states | Supported
Review pagination | Extract all historical reviews, not just the recent highlights | Supported
Change detection (diffs) | Only emit records with changed fields since the last pipeline run | Supported
Webhook delivery | HTTP POST per record for real-time CRM ingestion | Supported
User account settings | Access to private business owner dashboards and analytics | Partial
Private direct messages | Extraction of private messaging between users and businesses | Partial
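
For the webhook delivery row, a per-record HTTP POST could be built along these lines. This is a sketch using Python's standard library. The endpoint URL, header choice, and function names are assumptions, and the real payload envelope may carry additional metadata.

```python
import json
import urllib.request


def build_webhook_request(record, endpoint):
    """Build (but do not send) an HTTP POST carrying one record as JSON."""
    body = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def deliver(record, endpoint, timeout=10):
    """Send the record and return the HTTP status code."""
    request = build_webhook_request(record, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes the payload easy to inspect or log before anything reaches the receiving CRM.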
Infrastructure

Infrastructure powering the Cylex pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering for hidden contact details.

Residential Proxy Infrastructure

We maintain pools of German residential ISP proxies. Rotation happens per-request to prevent rate-limiting.
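
Per-request rotation can be as simple as cycling through a pool. The Python sketch below is a simplified illustration with a hypothetical `ProxyRotator` class and placeholder proxy URLs. It returns metadata in the `{"proxy": ...}` shape that Scrapy's built-in HttpProxyMiddleware reads from `request.meta`.

```python
from itertools import cycle


class ProxyRotator:
    """Round-robin over a fixed proxy pool, one proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)

    def request_meta(self):
        """Return per-request metadata with the next proxy in the pool."""
        return {"proxy": self.next_proxy()}
```

A production rotator would also track failures and retire unhealthy proxies, which plain round-robin does not do.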

Cloud-Native Orchestration

Pipelines run on AWS ECS. Airflow handles scheduling and dependency management. All state is stored in PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested array structures
CSV: Flat file with typed columns for CRM imports
XLS: Excel-compatible format for business analysts
Parquet: Columnar format for BigQuery and Snowflake
AWS S3: Direct bucket delivery on defined schedules, compatible with any data lake
Webhook: HTTP POST per record for immediate processing
API: REST endpoints to query your extracted datasets
PostgreSQL: Direct database upserts with conflict resolution

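
To show what an upsert with conflict resolution means in practice, here is a minimal, runnable Python sketch. It uses the standard library's `sqlite3` as a stand-in for PostgreSQL, since both accept the `INSERT ... ON CONFLICT ... DO UPDATE` form. The three-column table layout is an assumption for illustration.

```python
import sqlite3

# On a key collision, overwrite the stored row with the incoming values.
UPSERT_SQL = """
INSERT INTO company_profiles (cylex_id, business_name, verified_status)
VALUES (:cylex_id, :business_name, :verified_status)
ON CONFLICT (cylex_id) DO UPDATE SET
    business_name = excluded.business_name,
    verified_status = excluded.verified_status
"""


def upsert_profiles(conn, records):
    """Insert new profiles and update existing ones in a single transaction."""
    with conn:
        conn.executemany(UPSERT_SQL, records)


# In-memory database so the sketch runs without any external service.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_profiles ("
    "cylex_id TEXT PRIMARY KEY, business_name TEXT, verified_status INTEGER)"
)
```

Keying on `cylex_id` means a repeat delivery of the same business updates the existing row instead of creating a duplicate.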
// faq

Common questions.

About cylex.de scraping, legality, and pipeline operations.

Is scraping Cylex legal?

Scraping publicly available business directory information is generally permissible for non-personal data. DataFlirt extracts only public company profiles, public contact details, and public reviews. We do not bypass authenticated user areas. Clients should consult legal counsel regarding GDPR compliance for B2B contact data usage in Germany.

How do you handle hidden phone numbers and emails?

Cylex often uses JavaScript obfuscation or interaction requirements to display full contact details. We utilise headless Playwright browsers to execute the necessary scripts and render the DOM exactly as a human user would see it.

Can you target specific cities or categories?

Yes. We can configure the pipeline to target specific postal codes, municipalities, federal states, or industry categories based on your exact requirements.

How fresh is the data?

We can run one-off historical dumps or set up weekly/monthly recurring pipelines to capture new business listings, updated contact details, and fresh reviews.

Do you provide geocoding data?

Yes. We extract the latitude and longitude coordinates embedded in Cylex map widgets for every business profile.

Can I request a sample dataset?

Yes. We provide a sample run of up to 500 business records during the scoping phase so you can validate schema fit and data quality before signing a contract.

$ dataflirt scope --new-project --source=cylex.de ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off directory dump or a continuous monitoring feed across 1M businesses, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h