We extract company profiles, contact details, category classifications, and customer reviews from Cylex Germany. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Company Profiles objects from cylex.de. All fields typed and schema-versioned.
"cylex_id": "12345678", "business_name": "Muller Haustechnik GmbH", "primary_category": "Sanitärinstallationen", "description": "Ihr zuverlässiger Partner für Sanitär und Heizung in Berlin.", "founded_year": 1998, "verified_status": true, "profile_url": "https://www.cylex.de/firma/muller-haustechnik-gmbh-12345678.html"
| # | cylex_id | business_name | primary_category | sub_categories | description | founded_year |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Contact & Location objects from cylex.de. All fields typed and schema-versioned.
"cylex_id": "12345678", "street_address": "Kantstrasse 124", "postal_code": "10625", "city": "Berlin", "phone_primary": "+49 30 1234567", "email": "info@muller-haustechnik.de", "website": "www.muller-haustechnik.de", "latitude": 52.5065, "longitude": 13.3032
| # | cylex_id | street_address | postal_code | city | state | phone_primary |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Reviews & Ratings objects from cylex.de. All fields typed and schema-versioned.
"review_id": "rev_98765", "cylex_id": "12345678", "reviewer_name": "Klaus W.", "rating_score": 4.5, "review_text": "Schneller Service und faire Preise.", "review_date": "2025-08-14", "helpful_votes": 3, "owner_response": "Vielen Dank für Ihr Feedback!"
| # | review_id | cylex_id | reviewer_name | rating_score | review_text | review_date |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Operating Hours objects from cylex.de. All fields typed and schema-versioned.
"cylex_id": "12345678", "monday": "08:00 - 17:00", "tuesday": "08:00 - 17:00", "wednesday": "08:00 - 17:00", "saturday": "Geschlossen", "sunday": "Geschlossen", "timezone": "Europe/Berlin"
| # | cylex_id | monday | tuesday | wednesday | thursday | friday |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Products & Services objects from cylex.de. All fields typed and schema-versioned.
"cylex_id": "12345678", "service_list": "['Rohrreinigung', 'Heizungswartung', 'Badrenovierung']", "payment_methods": "['Rechnung', 'Barzahlung', 'EC-Karte']", "languages_spoken": "['Deutsch', 'Englisch']", "certifications": "['Meisterbetrieb']", "social_media_links": "['https://facebook.com/mullerhaustechnik']"
| # | cylex_id | service_list | brand_affiliations | payment_methods | languages_spoken | certifications |
|---|---|---|---|---|---|---|
Our Cylex scraper handles category pagination, geolocated search results, and hidden contact details with JavaScript rendering and anti-bot circumvention built in.
Company name, description, registration numbers, and core metadata extracted cleanly from every listing.
Extract phone numbers, fax, physical addresses, and resolve obfuscated email addresses via JavaScript rendering.
Full review text, star ratings, reviewer names, and owner responses paginated across all profile views.
Standard weekly hours, holiday exceptions, and temporary closure statuses mapped to standard formats.
Extract primary classifications and sub-categories to build accurate industry segmentations.
Capture exact latitude and longitude coordinates for spatial analysis and map integrations.
Cross-reference Cylex data with website links and social profiles to build comprehensive sales lists.
Track business visibility, citation accuracy, and review velocity across German municipalities.
Run one-off bulk exports or configure continuous pipelines at monthly or weekly cadences.
Brief in. Clean data out.
Provide categories, city names, or postal codes. We design the extraction schema together.
We configure Scrapy crawlers, proxy rotation, and session management for cylex.de.
Schema validation, null-rate checks, and location accuracy tests before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage.
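The schema validation and null-rate checks mentioned in the validation step could look something like the minimal Python sketch below. The required-field map, the function names, and the choice of what counts as "null" are illustrative assumptions based on the sample Company Profiles record, not a description of the production checks.

```python
from typing import Any

# Hypothetical required fields and types, taken from the sample record above.
REQUIRED_FIELDS = {"cylex_id": str, "business_name": str, "founded_year": int}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a single record."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors


def null_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is absent, None, or an empty string."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }
```

A pilot run could then be gated on thresholds, for example refusing to launch if the null rate of `email` exceeds a limit agreed during scoping.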
Directory sites aggressively block automated scraping to protect their data assets. Here is how we stay resilient.
Cylex employs strict rate limits and IP reputation checks. We route requests through German ISP proxies to mimic local user traffic and prevent subnet bans.
Phone numbers and email addresses are often obfuscated or require interaction to view. We use Playwright to execute JavaScript and trigger these elements natively.
Extracting an entire city category requires navigating complex pagination structures. Our spiders handle infinite scroll and URL parameter manipulation reliably.
Directory layouts change to disrupt scrapers. We use fallback chains involving XPath, CSS, and JSON-LD structured data to maintain extraction accuracy.
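As a rough illustration of a fallback chain, the Python sketch below tries a JSON-LD lookup first and falls back to a markup-based lookup if that fails. It uses only the standard library, so a simple regular expression on `<h1>` stands in for the XPath and CSS selectors a real spider would use. The function names, the extraction order, and the assumption that the business name lives under a JSON-LD `name` key are all hypothetical.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]

JSON_LD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
H1_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL | re.IGNORECASE)


def name_from_json_ld(html: str) -> Optional[str]:
    """Read the "name" key from the first parseable JSON-LD block (assumed layout)."""
    for raw in JSON_LD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: let the next block or extractor try
        if isinstance(data, dict) and data.get("name"):
            return str(data["name"]).strip()
    return None


def name_from_h1(html: str) -> Optional[str]:
    """Stand-in for a selector-based extractor: first <h1>, nested tags stripped."""
    match = H1_RE.search(html)
    if not match:
        return None
    text = re.sub(r"<[^>]+>", "", match.group(1)).strip()
    return text or None


def first_match(html: str, extractors: list[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extract in extractors:
        value = extract(html)
        if value:
            return value
    return None
```

Calling `first_match(page_html, [name_from_json_ld, name_from_h1])` returns the first value any extractor finds, so a layout change that breaks one source does not immediately break the field.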
For ongoing monitoring, we maintain a hash index of business records. Subsequent runs only push diffs, reducing downstream processing load.
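One way to picture the hash index is the minimal Python sketch below: each record is serialised to a canonical JSON string, hashed, and compared with the hash stored from the previous run. The in-memory dictionary, the function names, and the use of SHA-256 keyed by `cylex_id` are assumptions for illustration; the production index may be stored and keyed differently.

```python
import hashlib
import json
from typing import Any


def record_hash(record: dict[str, Any]) -> str:
    """Stable SHA-256 digest of the record's canonical JSON form."""
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(
    index: dict[str, str], records: list[dict[str, Any]]
) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Return (new_or_changed_records, updated_index) for one crawl run."""
    updated = dict(index)  # copy, so the previous index is left untouched
    changed = []
    for record in records:
        key = record["cylex_id"]
        digest = record_hash(record)
        if index.get(key) != digest:
            changed.append(record)
            updated[key] = digest
    return changed, updated
```

Sorting keys before hashing means field order does not trigger false diffs. This sketch only detects new and modified records; detecting removed listings would need a separate comparison of the key sets.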
Sales teams extract hyper-local business lists by category and postal code to build targeted outreach campaigns.
Agencies track citation consistency, review sentiment, and category rankings for client businesses across Germany.
Retailers map competitor locations, service offerings, and operating hours to identify underserved regional markets.
Consultancies analyse business density, opening/closure rates, and industry distribution across different federal states.
GIS platforms ingest verified addresses, coordinates, and business names to improve local search accuracy.
Corporate brands audit franchise locations for brand compliance, correct contact details, and review management.
"Cylex Germany holds millions of verified local business records, but extracting them at scale requires navigating strict rate limits and dynamic DOM structures."
Most teams underestimate the investment required. Reliable Cylex scraping requires residential proxies, full JavaScript rendering for obfuscated emails, CAPTCHA handling, daily selector maintenance, and anomaly monitoring. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.
Everything supported by our cylex.de scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering for hidden contact details.
We maintain pools of German residential ISP proxies. Rotation happens per-request to prevent rate-limiting.
Pipelines run on AWS ECS. Airflow handles scheduling and dependency management. All state stored in Postgres.
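The per-request rotation described above can be sketched in a few lines of Python. This is a simplified round-robin illustration with hypothetical proxy endpoints and function names; a production pool would typically also track proxy health and ban signals.

```python
import itertools
from typing import Iterator


def proxy_cycle(proxies: list[str]) -> Iterator[str]:
    """Endless round-robin iterator over the configured proxy endpoints."""
    if not proxies:
        raise ValueError("at least one proxy endpoint is required")
    return itertools.cycle(proxies)


def assign_proxies(urls: list[str], proxies: list[str]) -> list[tuple[str, str]]:
    """Pair every URL with the next proxy so no two consecutive requests share one."""
    rotation = proxy_cycle(proxies)
    return [(url, next(rotation)) for url in urls]


# Hypothetical endpoints, for illustration only.
EXAMPLE_PROXIES = [
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
]
```

In a Scrapy spider, the chosen endpoint would usually be attached to each request through the `proxy` key of `Request.meta`.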
Data delivered to where your team already works — no new tooling required.
About cylex.de scraping, legality, and pipeline operations.
Scraping publicly available business directory information is generally permissible for non-personal data. DataFlirt extracts only public company profiles, public contact details, and public reviews. We do not bypass authenticated user areas. Clients should consult legal counsel regarding GDPR compliance for B2B contact data usage in Germany.
Cylex often uses JavaScript obfuscation or interaction requirements to display full contact details. We utilise headless Playwright browsers to execute the necessary scripts and render the DOM exactly as a human user would see it.
Yes. We can configure the pipeline to target specific postal codes, municipalities, federal states, or industry categories based on your exact requirements.
We can run one-off historical dumps or set up weekly/monthly recurring pipelines to capture new business listings, updated contact details, and fresh reviews.
Yes. We extract the latitude and longitude coordinates embedded in Cylex map widgets for every business profile.
Yes. We provide a sample run of up to 500 business records during the scoping phase so you can validate schema fit and data quality before signing a contract.
20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a one-off directory dump or a continuous monitoring feed across 1M businesses, we scope, build, and operate the pipeline.