
Stepstone data,
at warehouse scale.

We extract job postings, employer profiles, salary estimates, and geographical distribution metrics from stepstone.de. Delivered as clean JSON, CSV, or Parquet to S3 or BigQuery on your schedule.

Jobs extracted: 142K/day
Salary points: 85K/24h
Company profiles: 12K/run
Active pipelines: 84
Uptime: 99.98%

Data Dictionary

Every field we extract from stepstone.de

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Job Postings objects from stepstone.de. All fields typed and schema-versioned.

Fields: job_id, title, company_name, location, contract_type, work_model, posted_date, salary_min, salary_max, description, required_skills, benefits

Sample record (job_postings):
{
  "job_id": "8947210",
  "title": "Senior Data Engineer (m/w/d)",
  "company_name": "TechLogix GmbH",
  "location": "Berlin",
  "contract_type": "Vollzeit",
  "work_model": "Hybrid",
  "salary_min": 75000,
  "salary_max": 95000,
  "posted_date": "2026-05-10"
}
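
A consumer might load a record like the job_postings sample into a typed structure before querying it. The sketch below is only an illustration, assuming the field names and value formats shown in that sample; the `JobPosting` class and `parse_job_posting` helper are hypothetical names, not part of any delivered package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class JobPosting:
    """Typed view of a job_postings record (subset of fields, assumed types)."""
    job_id: str
    title: str
    company_name: str
    location: str
    contract_type: str
    work_model: str
    posted_date: date
    salary_min: Optional[int] = None
    salary_max: Optional[int] = None


def parse_job_posting(raw: dict) -> JobPosting:
    """Convert a raw JSON object into a typed record and sanity-check the salary range."""
    salary_min = raw.get("salary_min")
    salary_max = raw.get("salary_max")
    if salary_min is not None and salary_max is not None and salary_min > salary_max:
        raise ValueError(f"salary_min exceeds salary_max for job {raw['job_id']}")
    return JobPosting(
        job_id=str(raw["job_id"]),
        title=raw["title"],
        company_name=raw["company_name"],
        location=raw["location"],
        contract_type=raw["contract_type"],
        work_model=raw["work_model"],
        posted_date=date.fromisoformat(raw["posted_date"]),  # expects YYYY-MM-DD
        salary_min=salary_min,
        salary_max=salary_max,
    )


sample = {
    "job_id": "8947210",
    "title": "Senior Data Engineer (m/w/d)",
    "company_name": "TechLogix GmbH",
    "location": "Berlin",
    "contract_type": "Vollzeit",
    "work_model": "Hybrid",
    "salary_min": 75000,
    "salary_max": 95000,
    "posted_date": "2026-05-10",
}
posting = parse_job_posting(sample)
```

Parsing `posted_date` into a `date` up front means later date arithmetic (for example, vacancy age) does not have to re-parse strings.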

Complete list of extractable fields for Company Profiles objects from stepstone.de. All fields typed and schema-versioned.

Fields: company_id, name, industry, size, website, headquarters, rating, review_count, active_jobs_count, benefits_list

Sample record (company_profiles):
{
  "company_id": "C-4921",
  "name": "TechLogix GmbH",
  "industry": "IT-Dienstleistungen",
  "size": "501-1000",
  "headquarters": "Berlin",
  "rating": 4.2,
  "active_jobs_count": 47,
  "website": "https://techlogix.de"
}

Complete list of extractable fields for Salary Data objects from stepstone.de. All fields typed and schema-versioned.

Fields: job_title, location, experience_level, base_salary, bonus, total_compensation, data_points_count, confidence_score, currency

Sample record (salary_data):
{
  "job_title": "Data Engineer",
  "location": "München",
  "experience_level": "Senior",
  "base_salary": 85000,
  "total_compensation": 92000,
  "data_points_count": 124,
  "confidence_score": "High",
  "currency": "EUR"
}

Complete list of extractable fields for Search Results objects from stepstone.de. All fields typed and schema-versioned.

Fields: keyword, location, rank, job_id, title, company, posted_ago, promoted_flag, easy_apply_flag

Sample record (search_results):
{
  "keyword": "Python Developer",
  "location": "Hamburg",
  "rank": 1,
  "job_id": "8931024",
  "title": "Python Backend Developer",
  "company": "HanseTech",
  "promoted_flag": true,
  "easy_apply_flag": false
}

Complete list of extractable fields for Skill Requirements objects from stepstone.de. All fields typed and schema-versioned.

Fields: job_id, title, hard_skills, soft_skills, languages, education_level, certifications, years_experience, tools_software

Sample record (skill_requirements):
{
  "job_id": "8947210",
  "hard_skills": "['Python', 'SQL', 'AWS']",
  "soft_skills": "['Kommunikation', 'Teamfähigkeit']",
  "languages": "['Deutsch', 'Englisch']",
  "education_level": "Bachelor",
  "years_experience": "3-5 Jahre",
  "tools_software": "['Docker', 'Kubernetes', 'Airflow']"
}
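
In the skill_requirements sample, the list-valued fields appear as strings holding a Python-style list literal rather than as JSON arrays. If a delivery arrives in that shape, a consumer could convert them as sketched below. This assumes the string format shown in the sample; `LIST_FIELDS` and `parse_list_fields` are hypothetical names.

```python
import ast

# Fields that the sample shows as stringified lists (assumption based on the sample).
LIST_FIELDS = ("hard_skills", "soft_skills", "languages", "tools_software")


def parse_list_fields(record: dict) -> dict:
    """Return a copy of the record with stringified list fields turned into real lists."""
    parsed = dict(record)
    for field in LIST_FIELDS:
        value = parsed.get(field)
        if isinstance(value, str):
            # literal_eval only evaluates literals, so it is safe on untrusted text;
            # it raises ValueError or SyntaxError on malformed input.
            parsed[field] = list(ast.literal_eval(value))
    return parsed


raw = {
    "job_id": "8947210",
    "hard_skills": "['Python', 'SQL', 'AWS']",
    "soft_skills": "['Kommunikation', 'Teamfähigkeit']",
    "languages": "['Deutsch', 'Englisch']",
    "education_level": "Bachelor",
    "years_experience": "3-5 Jahre",
    "tools_software": "['Docker', 'Kubernetes', 'Airflow']",
}
parsed = parse_list_fields(raw)
```

`ast.literal_eval` is used instead of `json.loads` because the sample strings use single quotes, which are not valid JSON.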

Capabilities

Deep labour market intelligence from Stepstone

Our Stepstone scraper navigates search pagination, dynamic job descriptions, and salary estimates while bypassing strict anti-bot measures to deliver structured DACH labour data.

Full Job Description Extraction

Extract complete job texts, including requirements, responsibilities, benefits, and company descriptions, parsed into structured fields.

Salary Estimate Capture

Scrape Stepstone Gehaltsplaner data and job-specific salary ranges, including minimum, maximum, and median figures.

Company Hub Data

Extract employer profiles, including industry, employee count, headquarters location, and aggregated employee ratings.

Location & Remote Work Flags

Capture precise job locations, hybrid work models, and fully remote flags to map geographical hiring trends.

Posting Chronology

Track exact posting dates and active duration to calculate time-to-fill metrics and vacancy ageing.

Keyword & SERP Tracking

Monitor job search rankings for specific titles or skills, differentiating between organic listings and promoted placements.

Skill & Tech Stack Parsing

Isolate programming languages, certifications, and soft skills from unstructured job descriptions using regex and NLP.
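
The regex half of this can be sketched as a vocabulary match against the description text. This is a simplified illustration, not the production parser: the `SKILL_VOCABULARY` list and `extract_skills` function are assumptions, and the NLP step is not shown.

```python
import re

# Hypothetical vocabulary; a real one would be much larger and maintained separately.
SKILL_VOCABULARY = ["Python", "SQL", "AWS", "Docker", "Kubernetes", "Airflow"]

# Match whole tokens only, so "SQL" does not match inside a longer word.
_SKILL_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(s) for s in SKILL_VOCABULARY) + r")(?!\w)",
    re.IGNORECASE,
)


def extract_skills(description: str) -> list[str]:
    """Return vocabulary skills found in the text, in order of first appearance."""
    canonical = {s.lower(): s for s in SKILL_VOCABULARY}
    found: list[str] = []
    for match in _SKILL_PATTERN.finditer(description):
        skill = canonical[match.group(1).lower()]
        if skill not in found:
            found.append(skill)
    return found


text = "Sie bringen Erfahrung mit Python, SQL und AWS mit; Docker-Kenntnisse sind von Vorteil."
skills = extract_skills(text)
```

The lookarounds treat a hyphen as a boundary, which matters for German compounds such as "Docker-Kenntnisse".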

Incremental Updates

Run daily diffs to capture only new job postings, modified listings, or removed vacancies without re-downloading the entire catalogue.
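
One common way to classify postings as new, modified, or removed is to compare content fingerprints between runs. The sketch below assumes each record carries a stable `job_id`; the `fingerprint` and `diff_runs` helpers are hypothetical, and the actual diffing logic may differ.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a record's content so that any field change produces a different value."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_runs(previous: dict[str, str], current_records: list[dict]) -> dict[str, list[str]]:
    """Compare stored fingerprints (job_id -> hash) with the records from the current run."""
    current = {r["job_id"]: fingerprint(r) for r in current_records}
    shared = current.keys() & previous.keys()
    return {
        "new": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(j for j in shared if current[j] != previous[j]),
    }


yesterday = [
    {"job_id": "1", "title": "Data Analyst"},
    {"job_id": "2", "title": "Data Engineer"},
]
today = [
    {"job_id": "2", "title": "Data Engineer (m/w/d)"},
    {"job_id": "3", "title": "ML Engineer"},
]
previous = {r["job_id"]: fingerprint(r) for r in yesterday}
changes = diff_runs(previous, today)
```

Only the fingerprints need to be stored between runs, not the full records, which keeps the comparison state small.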

Anti-Bot Circumvention

Bypass Stepstone's Datadome and Cloudflare protections using residential German IP proxies and headless browser fingerprinting.

// engagement pipeline

From search parameters to structured data

Brief in. Clean data out.

Define Scope
Day 0

Specify target keywords, locations, industries, or specific company IDs. We map the extraction schema to your requirements.

Pipeline Build
Days 2–4

We configure Playwright crawlers, German residential proxy rotation, and session management to navigate stepstone.de.

Validation & QA
Days 4–6

We test data completeness, verify salary extraction accuracy, and ensure location fields are correctly normalised.
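
A completeness check of this kind can be as simple as measuring the share of missing values per field and flagging fields above a threshold. The following is a minimal sketch; the function names and the 10% threshold are illustrative assumptions, not the QA criteria actually applied.

```python
def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Return the fraction of records in which each field is missing, None, or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total
    return rates


def failing_fields(rates: dict[str, float], threshold: float) -> list[str]:
    """List fields whose null rate exceeds the threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


batch = [
    {"job_id": "1", "title": "A", "salary_min": 70000},
    {"job_id": "2", "title": "B", "salary_min": None},
    {"job_id": "3", "title": "C", "salary_min": 65000},
    {"job_id": "4", "title": "D", "salary_min": 80000},
]
rates = null_rates(batch, ["job_id", "title", "salary_min"])
problems = failing_fields(rates, threshold=0.10)  # hypothetical threshold
```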

Delivery
ongoing

Clean JSON, CSV, or Parquet files delivered to your AWS S3 bucket or Snowflake instance on a daily or weekly schedule.

Under the hood

Overcoming Stepstone's extraction barriers

Job boards protect their listings aggressively. Here is how we maintain stable extraction pipelines against strict mitigation systems.

pipeline-monitor · stepstone.de · live (active)

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Bot Mitigation
Bypassing Datadome and Cloudflare

Stepstone uses advanced bot protection that flags data centre IPs and headless browsers. We route requests through German residential proxies and use Playwright with stealth plugins to mimic legitimate user behaviour.

Dynamic Content
Rendering React-based job pages

Job descriptions and salary widgets on Stepstone are rendered client-side. We execute full JavaScript sessions to ensure all dynamic elements, including hidden contact details and expandable text blocks, are captured.

Pagination Limits
Navigating deep search results

Search results are capped at a specific number of pages. We use granular geographic and keyword filtering to break down large queries, ensuring we extract the entire catalogue without hitting pagination walls.
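
The splitting idea can be sketched as a query planner: if a keyword and location combination returns more results than the cap allows, it is broken down by a further filter. Everything named here is an assumption for illustration: `plan_queries`, the `count_results` callback, the contract-type split, and the `max_results` value (the text does not state the actual cap).

```python
from typing import Callable


def plan_queries(
    keyword: str,
    locations: list[str],
    contract_types: list[str],
    count_results: Callable[[dict], int],
    max_results: int,
) -> list[dict]:
    """Return filter combinations, splitting any location whose result count exceeds the cap."""
    plan = []
    for location in locations:
        query = {"keyword": keyword, "location": location}
        if count_results(query) <= max_results:
            plan.append(query)
            continue
        # Too many hits for one location: split further by contract type.
        # A fuller version would re-check each sub-query and split again if needed.
        for contract_type in contract_types:
            plan.append({**query, "contract_type": contract_type})
    return plan


# Stand-in for a real result-count lookup.
fake_counts = {"Berlin": 5200, "Hamburg": 800}
plan = plan_queries(
    keyword="Python Developer",
    locations=["Berlin", "Hamburg"],
    contract_types=["Vollzeit", "Teilzeit"],
    count_results=lambda q: fake_counts[q["location"]],
    max_results=1000,  # hypothetical cap
)
```

Because overlapping filters can return the same posting more than once, results from the planned queries would still need de-duplication on `job_id`.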

Data Normalisation
Structuring varied job descriptions

Employers format job postings differently. We apply post-processing to normalise contract types, standardise location names, and extract specific data points like salary ranges from free-text descriptions.
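
As a rough illustration of this post-processing, the sketch below maps German contract labels to canonical values and pulls a salary range out of free text. The mapping table, the canonical labels, and the regular expression are assumptions chosen for the example and cover only simple cases.

```python
import re
from typing import Optional

# Hypothetical mapping from German labels to canonical values.
CONTRACT_TYPE_MAP = {
    "vollzeit": "full_time",
    "teilzeit": "part_time",
    "befristet": "fixed_term",
}

# Matches ranges such as "75.000 bis 95.000 EUR" or "75.000 - 95.000 €"
# (German thousands separator; hyphen, en dash, or "bis" between the values).
_SALARY_RANGE = re.compile(
    r"(\d{1,3}(?:\.\d{3})+)\s*(?:-|\u2013|bis)\s*(\d{1,3}(?:\.\d{3})+)\s*(?:€|EUR)"
)


def normalise_contract_type(raw: str) -> str:
    """Map a raw contract label to a canonical value, falling back to 'other'."""
    return CONTRACT_TYPE_MAP.get(raw.strip().lower(), "other")


def extract_salary_range(text: str) -> Optional[tuple[int, int]]:
    """Return (min, max) in whole euros if a range is found, otherwise None."""
    match = _SALARY_RANGE.search(text)
    if match is None:
        return None
    low, high = (int(group.replace(".", "")) for group in match.groups())
    return (low, high) if low <= high else None


contract = normalise_contract_type(" Vollzeit ")
salary = extract_salary_range("Gehalt: 75.000 bis 95.000 EUR brutto pro Jahr")
```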

State Management
Tracking vacancy lifecycles

We maintain a database of active job IDs. By comparing current runs against historical state, we accurately report when a job is closed or modified, providing precise time-to-fill metrics.
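
The lifecycle tracking described here can be sketched with an in-memory dictionary standing in for the database table. The `update_state` function and its state layout are hypothetical. Note that a listing disappearing is treated as closure, which only approximates the date a role was actually filled.

```python
from datetime import date


def update_state(state: dict[str, dict], seen_job_ids: set[str], run_date: date) -> list[dict]:
    """Record the jobs seen in this run and return the jobs that closed since the last run."""
    for job_id in seen_job_ids:
        entry = state.setdefault(job_id, {"first_seen": run_date, "closed_on": None})
        entry["last_seen"] = run_date

    closed = []
    for job_id, entry in state.items():
        if job_id not in seen_job_ids and entry["closed_on"] is None:
            entry["closed_on"] = run_date
            closed.append(
                {"job_id": job_id, "days_open": (run_date - entry["first_seen"]).days}
            )
    return closed


state: dict[str, dict] = {}
first_run = update_state(state, {"8947210", "8931024"}, date(2026, 5, 10))
second_run = update_state(state, {"8931024"}, date(2026, 5, 20))
```

In this sketch `days_open` is measured up to the first run in which the job is missing, so its precision depends on how often the pipeline runs.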

Applications

Who uses Stepstone data — and how

Teams across industries use stepstone.de data to build competitive products and smarter operations.

01
Labour Market Analytics

Economic researchers and analysts track hiring volume, skill demand, and salary trends across the DACH region.

02
Competitor Intelligence

Enterprises monitor rival hiring activity to deduce strategic shifts, expansion plans, and technology stack adoption.

03
Salary Benchmarking

HR departments use aggregated salary estimates to ensure their compensation packages remain competitive in specific regions.

04
Lead Generation for B2B

Sales teams identify companies actively hiring for specific roles (e.g., IT Directors) as a signal for software or service procurement.

05
Job Board Aggregation

Niche job portals supplement their own inventory by aggregating relevant listings from Stepstone.

06
Real Estate & Urban Planning

Analysts correlate job location data and remote work trends with commercial real estate demand and urban migration patterns.

Why DataFlirt

"Stepstone.de holds the most accurate pulse on the DACH region's labour market, but extracting that intelligence requires bypassing aggressive bot mitigation."

Job boards deploy heavy anti-scraping measures to protect their primary asset. Reliable Stepstone extraction demands residential proxies, JavaScript rendering, and constant DOM monitoring. DataFlirt manages this infrastructure entirely, delivering structured labour market intelligence straight to your warehouse.

Technical Spec

Stepstone scraper — technical capabilities

Everything supported by our stepstone.de scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering
Full Playwright sessions to render React components and dynamic salary widgets
Supported
CAPTCHA & Bot bypass
Automated handling of Datadome challenges and Cloudflare turnstiles
Supported
Geo-targeted proxies
German residential IPs to ensure localised search results and prevent blocking
Supported
Salary extraction
Capture of Stepstone Gehaltsplaner estimates and explicit job salary ranges
Supported
Company rating capture
Extraction of aggregated employer scores and review counts from company hubs
Supported
Promoted job detection
Flags indicating paid placement versus organic search ranking
Supported
Historical vacancy tracking
Diffing logic to track when a job is posted, modified, and removed
Supported
Candidate CV download
Accessing user profiles or candidate resumes requires authenticated employer access
Partial
Application tracking data
Number of applicants per job is gated behind employer login
Partial
Infrastructure

Infrastructure powering the Stepstone pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, Datadome Bypass

Scrapy + Playwright Stack

Scrapy manages orchestration and retry logic, while Playwright handles JavaScript execution to render Stepstone's dynamic job descriptions and salary widgets.

Localised Proxy Infrastructure

We maintain pools of German residential ISP proxies. Rotation happens per-request to mimic local user traffic and bypass geographic rate limits.

Cloud-Native Orchestration

Pipelines run on Kubernetes clusters. Airflow handles scheduling for daily job diffs, ensuring data freshness. All state is stored in managed PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested arrays for complex job descriptions
CSV
Flat file with typed columns for quick analyst review
XLS
Excel format for non-technical HR and recruitment teams
Parquet
Columnar format optimised for Athena, BigQuery, and Snowflake
AWS S3
Direct delivery to your bucket on a defined schedule
Webhook
HTTP POST per new job posting for real-time alerts
API
REST endpoints to query your extracted Stepstone dataset
BigQuery
Streamed directly into your dataset with schema auto-detect
PostgreSQL
Direct database inserts for integration with your internal tools
S3
Direct bucket delivery — compatible with any data lake
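
To make the difference between the newline-delimited JSON and flat CSV formats concrete, here is a minimal sketch that serialises the same records both ways using only the standard library. The helper names `to_ndjson` and `to_csv` are hypothetical, and the delivered files may differ in column order and encoding details.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialise records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Serialise selected columns as CSV; fields not listed in columns are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"job_id": "8931024", "title": "Python Backend Developer", "company": "HanseTech", "rank": 1},
]
ndjson_text = to_ndjson(records)
csv_text = to_csv(records, ["job_id", "title"])
```

NDJSON keeps every field and can be appended to line by line, while the CSV view requires choosing a fixed set of columns up front.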
// faq

Common questions.

About stepstone.de scraping, legality, and pipeline operations.

Is scraping Stepstone legal?

Scraping publicly available job postings is generally permissible. DataFlirt extracts only public, non-authenticated job and company data. We do not access candidate profiles or CVs, and we do not bypass authentication walls. Clients should consult legal counsel regarding their specific data usage.

How do you handle Stepstone's bot protection?

We utilise German residential proxies, headless browsers with realistic fingerprints, and randomised request intervals. This approach effectively navigates Datadome and Cloudflare protections without triggering blocks.

Can you extract salary data even if it is not explicitly stated?

Yes. When employers do not list a salary, Stepstone often provides an estimated range via their Gehaltsplaner feature. We extract this estimate alongside the job posting.

How frequently can I receive data updates?

Most clients opt for daily updates to track new postings and removed vacancies. We can configure pipelines for hourly runs if you require near real-time alerts for specific keywords.

Do you support other DACH job boards?

Yes. We also build pipelines for Xing, LinkedIn, Indeed.de, and regional portals to provide comprehensive coverage of the German-speaking labour market.

Can I get a sample of the extracted job data?

Yes. We provide a sample dataset of up to 1,000 job postings based on your target criteria during the scoping phase, allowing you to validate the schema and data quality.

$ dataflirt scope --new-project --source=stepstone.de ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of IT roles in Berlin or a comprehensive dump of DACH salary data — we build and operate the pipeline. Tell us your requirements.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h