
Startup talent data,
at warehouse scale.

We extract job listings, equity ranges, founder profiles, tech stacks, and company funding signals from Wellfound. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Jobs extracted
145K /day
Salary updates
42K /24h
Company profiles
89K /run
Active pipelines
142
Uptime
99.98%
Data Dictionary

Every field we extract from wellfound.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Job Postings objects from wellfound.com. All fields typed and schema-versioned.

job_id, title, company_id, location, remote_policy, salary_min, salary_max, equity_min, equity_max, job_type, experience_required, skills, visa_sponsorship, posted_at
job_postings
● 200 OK
"job_id": "1492834",
"title": "Senior Backend Engineer",
"company_id": "83921",
"location": "San Francisco, CA",
"remote_policy": "Remote within US",
"salary_min": 150000,
"salary_max": 180000,
"equity_min": 0.1,
"equity_max": 0.5,
"visa_sponsorship": false
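
As an illustration of what "typed" can mean for the sample above, here is a minimal Python sketch that loads one job_postings record into a dataclass. The class name `JobPosting` and the helper `parse_job_posting` are hypothetical, and the types of the four fields not shown in the sample (job_type, experience_required, skills, posted_at) are assumptions rather than the delivered schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobPosting:
    # Fields and types as they appear in the sample record.
    job_id: str
    title: str
    company_id: str
    location: str
    remote_policy: str
    salary_min: Optional[int]
    salary_max: Optional[int]
    equity_min: Optional[float]
    equity_max: Optional[float]
    visa_sponsorship: bool
    # Not present in the sample: these types are assumptions.
    job_type: Optional[str] = None
    experience_required: Optional[str] = None
    skills: tuple[str, ...] = ()
    posted_at: Optional[str] = None


def parse_job_posting(raw: dict) -> JobPosting:
    """Build a JobPosting from a raw dict, failing loudly on missing required keys."""
    return JobPosting(
        job_id=str(raw["job_id"]),
        title=raw["title"],
        company_id=str(raw["company_id"]),
        location=raw["location"],
        remote_policy=raw["remote_policy"],
        salary_min=raw.get("salary_min"),
        salary_max=raw.get("salary_max"),
        equity_min=raw.get("equity_min"),
        equity_max=raw.get("equity_max"),
        visa_sponsorship=bool(raw["visa_sponsorship"]),
        job_type=raw.get("job_type"),
        experience_required=raw.get("experience_required"),
        skills=tuple(raw.get("skills") or ()),
        posted_at=raw.get("posted_at"),
    )
```

Optional compensation fields are kept as `None` rather than zero, so a missing salary is not mistaken for an unpaid role.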

Complete list of extractable fields for Company Profiles objects from wellfound.com. All fields typed and schema-versioned.

company_id, name, website, industry, size, location, pitch, description, funding_total, funding_stage, tech_stack, founders, employee_count
company_profiles
● 200 OK
"company_id": "83921",
"name": "FinScale",
"industry": "Fintech",
"size": "51-200",
"funding_total": 24000000,
"funding_stage": "Series A",
"tech_stack": ["Python", "React", "PostgreSQL", "AWS"],
"employee_count": 84

Complete list of extractable fields for Salary & Equity objects from wellfound.com. All fields typed and schema-versioned.

job_id, title, currency, base_salary_min, base_salary_max, equity_min, equity_max, role_type, market_rate_comparison, updated_at
salary_and_equity
● 200 OK
"job_id": "1492834",
"title": "Senior Backend Engineer",
"currency": "USD",
"base_salary_min": 150000,
"base_salary_max": 180000,
"equity_min": 0.1,
"equity_max": 0.5,
"updated_at": "2026-03-14T10:00:00Z"

Complete list of extractable fields for Founders & Team objects from wellfound.com. All fields typed and schema-versioned.

person_id, name, current_role, company_id, linkedin_url, twitter_url, bio, past_experience, education, joined_date
founders_and_team
● 200 OK
"person_id": "92831",
"name": "Sarah Jenkins",
"current_role": "Co-Founder & CEO",
"company_id": "83921",
"linkedin_url": "linkedin.com/in/sarahjenkins",
"twitter_url": "twitter.com/sarahj",
"bio": "Former VP Product at Stripe.",
"joined_date": "2022-01-15"

Complete list of extractable fields for Search Results objects from wellfound.com. All fields typed and schema-versioned.

keyword, location, position, company_name, job_title, salary_range, equity_range, remote_badge, act_fast_badge, scraped_at
search_results
● 200 OK
"keyword": "machine learning",
"location": "Remote",
"position": 3,
"company_name": "AI Dynamics",
"job_title": "ML Engineer",
"salary_range": "$140k - $190k",
"remote_badge": true,
"act_fast_badge": false,
"scraped_at": "2026-03-14T10:15:00Z"

Capabilities

Extract hiring signals and startup intelligence

Our Wellfound scraper navigates Cloudflare protections and dynamic React hydration to extract accurate compensation data, funding signals, and tech stacks at scale.

Startup Profile Extraction

Company names, pitches, descriptions, funding stages, total capital raised, and employee count brackets mapped to unique company IDs.

Job Listing Parsing

Extract job titles, locations, remote policies, required experience levels, and visa sponsorship availability for every active role.

Compensation & Equity Data

Capture base salary ranges, equity percentages, and currency types. Wellfound is one of the richest public sources of early-stage compensation data, because most listings publish salary and equity ranges up front.

Tech Stack Mapping

Extract programming languages, frameworks, and infrastructure tools listed on company profiles and job descriptions.

Founder Intelligence

Scrape founder bios, past experience, education, and social links to build comprehensive talent intelligence graphs.

Remote Work Signals

Identify timezone overlap requirements, remote-first policies, and geographical hiring constraints.

Recruiter Activity Tracking

Monitor 'Actively Hiring' badges, recent activity timestamps, and response rate indicators to gauge hiring urgency.

Market Categorisation

Extract industry tags like Fintech, SaaS, Web3, and AI to classify companies into specific market segments.

Scheduled Diffs

Run continuous pipelines to detect newly posted jobs, closed roles, and updated salary bands without downloading the entire catalogue.

// engagement pipeline

From company list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide company URLs, search keywords, or industry tags. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, intercept GraphQL queries, and manage residential proxy rotation for wellfound.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and salary outlier detection before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
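
To make the Validation & QA step concrete, here is a minimal Python sketch of a null-rate check and a salary outlier check. It assumes records are plain dicts, and it uses the common 1.5 × IQR rule as the outlier test. That rule, and the function names `null_rates` and `salary_outliers`, are illustrative choices rather than a description of the production checks.

```python
from statistics import quantiles


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in fields}
    return {
        name: sum(1 for r in records if r.get(name) is None) / total
        for name in fields
    }


def salary_outliers(records: list[dict], field: str = "salary_max", k: float = 1.5) -> list[dict]:
    """Records whose salary lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[field] for r in records if r.get(field) is not None]
    if len(values) < 4:
        return []  # too few points to estimate quartiles meaningfully
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        r for r in records
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]
```

A run could be held back when any null rate exceeds an agreed threshold, with outliers sent for manual review instead of being dropped silently.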

Under the hood

How our Wellfound pipeline handles the hard parts

Wellfound protects its data with strict rate limits and dynamic front-end architectures. Here is how we maintain stable extraction.

pipeline-monitor · wellfound.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued, running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Anti-bot layer
Cloudflare bypass and proxy rotation

Wellfound relies heavily on Cloudflare for bot mitigation. Our infrastructure uses residential proxies combined with TLS fingerprint spoofing and automated challenge solvers to maintain access without triggering blocks.

API Interception
Undocumented GraphQL queries

Instead of parsing complex React DOM structures, our Playwright sessions intercept the underlying GraphQL network requests. This provides cleaner, more structured data directly from Wellfound's backend.
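
The interception idea can be sketched in Python with Playwright's sync API. This is a simplified illustration, not the production crawler: the `/graphql` path, the helper names, and the assumption that responses carry a top-level `data` object are all hypothetical. The filtering logic is kept in a pure function so it can be tested without launching a browser.

```python
import json
from typing import Any, Optional

GRAPHQL_PATH = "/graphql"  # assumed endpoint path, not confirmed for any site


def extract_graphql_data(url: str, content_type: str, body: str) -> Optional[dict[str, Any]]:
    """Return the `data` object of a GraphQL JSON response, or None if not applicable."""
    if GRAPHQL_PATH not in url or "application/json" not in content_type:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    data = payload.get("data")
    return data if isinstance(data, dict) else None


def capture_graphql(page_url: str) -> list[dict[str, Any]]:
    """Load a page in headless Chromium and collect GraphQL payloads seen on the network."""
    from playwright.sync_api import sync_playwright  # imported lazily; requires playwright

    captured: list[dict[str, Any]] = []

    def on_response(response) -> None:
        try:
            content_type = response.headers.get("content-type", "")
            data = extract_graphql_data(response.url, content_type, response.text())
        except Exception:
            return  # bodies of redirects or aborted requests may be unavailable
        if data is not None:
            captured.append(data)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(page_url, wait_until="networkidle")
        browser.close()
    return captured
```

Reading the JSON payload avoids depending on CSS selectors, which tend to break whenever the front-end markup changes.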

Pagination
Handling infinite scroll and limits

Wellfound limits search results to a specific number of pages. We bypass this by programmatically slicing search queries by granular locations, salary brackets, and tech stacks to extract the full dataset.
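
A rough Python sketch of the slicing idea: one broad search becomes the Cartesian product of narrower filters, and the results are de-duplicated afterwards because the same job can match several slices. The filter names and the `job_id` key follow the data dictionary above; the function names are hypothetical.

```python
from itertools import product


def slice_queries(
    keyword: str,
    locations: list[str],
    salary_bands: list[tuple[int, int]],
    stacks: list[str],
) -> list[dict]:
    """Expand one broad search into one narrow query per filter combination."""
    return [
        {
            "keyword": keyword,
            "location": location,
            "salary_min": low,
            "salary_max": high,
            "stack": stack,
        }
        for location, (low, high), stack in product(locations, salary_bands, stacks)
    ]


def dedupe(records: list[dict], key: str = "job_id") -> list[dict]:
    """Keep the first occurrence of each record, preserving order."""
    seen: dict = {}
    for record in records:
        seen.setdefault(record[key], record)
    return list(seen.values())
```

If a slice still hits the page cap, it can be split again along another dimension, for example a narrower salary band.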

Change detection
Tracking job lifecycle

We maintain a state index of all active jobs. Subsequent runs only push diffs, allowing you to accurately track exactly when a role is opened, updated, or closed.
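
Here is a minimal Python sketch of hash-based change detection under simple assumptions: the state index is an in-memory dict mapping `job_id` to a content hash, and records are JSON-serialisable dicts. In practice the index would be stored in a database. The function names are illustrative.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable SHA-256 of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_runs(
    previous: dict[str, str],
    current_records: list[dict],
    key: str = "job_id",
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Compare the stored index with the latest run.

    Returns (changes, new_index), where changes lists opened, updated and closed IDs.
    """
    current = {str(r[key]): record_hash(r) for r in current_records}
    changes = {
        "opened": sorted(k for k in current if k not in previous),
        "updated": sorted(k for k in current if k in previous and current[k] != previous[k]),
        "closed": sorted(k for k in previous if k not in current),
    }
    return changes, current
```

Volatile fields such as scrape timestamps should be excluded before hashing, otherwise every record would register as updated on every run.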

Monitoring
Schema drift detection

Wellfound updates its GraphQL schema frequently. Our observability stack detects missing fields or type changes immediately, automatically pausing delivery and alerting our engineers to patch the selectors.
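
A drift check of this kind can be sketched in a few lines of Python. The expected-schema mapping below is a hypothetical subset built from the job_postings sample, and the function name is illustrative. The check flags missing fields and type changes, and treats `None` as an allowed value.

```python
# Hypothetical expected types for a subset of job_postings fields.
EXPECTED_SCHEMA: dict[str, type] = {
    "job_id": str,
    "title": str,
    "salary_min": int,
    "salary_max": int,
    "visa_sponsorship": bool,
}


def detect_drift(record: dict, expected: dict[str, type] = EXPECTED_SCHEMA) -> list[str]:
    """Return human-readable drift issues for one record (empty list means no drift)."""
    issues = []
    for name, expected_type in expected.items():
        if name not in record:
            issues.append(f"missing field: {name}")
            continue
        value = record[name]
        # Exact type comparison, so that a bool is not silently accepted as an int.
        if value is not None and type(value) is not expected_type:
            issues.append(
                f"type change: {name} expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return issues
```

A pipeline could pause delivery once the share of records with issues rises above a small tolerance, so that a single malformed listing does not halt a whole run.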

Applications

Who uses Wellfound data - and how

Teams across industries use wellfound.com data to build competitive products and smarter operations.

01
Talent Intelligence

Recruiting agencies and internal talent teams map tech stacks and salary ranges to optimise their sourcing strategies.

02
Venture Capital Deal Flow

VC firms monitor hiring velocity, key executive appointments, and tech stack choices as leading indicators of startup growth.

03
Compensation Benchmarking

HR platforms aggregate Wellfound salary and equity data to build accurate compensation models for early-stage companies.

04
B2B Lead Generation

SaaS companies target startups based on their funding stage, employee count, and specific technologies listed in job descriptions.

05
Market Research

Analysts track the rise of new programming languages and frameworks by analysing occurrence rates in startup job postings.

06
Job Board Aggregation

Niche job boards syndicate remote and startup-specific roles to expand their catalogue and drive candidate traffic.

Why DataFlirt

"Wellfound holds the most accurate equity and compensation signals for early-stage startups on the internet, but it is locked behind heavy rate limits and dynamic endpoints."

Extracting startup data requires navigating strict Cloudflare protections, complex React hydration states, and undocumented GraphQL queries. DataFlirt handles the proxy rotation, session management, and schema maintenance so your data science team can focus on identifying hiring signals and market trends.

Technical Spec

Wellfound scraper - technical capabilities

Everything supported by our wellfound.com scraper: rendered SPA elements, GraphQL interception, rate-limit handling, and beyond.

JavaScript rendering
Full Playwright sessions to execute React hydration and trigger API calls
Supported
GraphQL interception
Direct extraction of structured JSON payloads from network traffic
Supported
Residential proxy rotation
ISP-grade IPs to bypass Cloudflare rate limits and IP bans
Supported
Change detection (diffs)
Hash-based diffing to track job openings and closures accurately
Supported
Webhook delivery
HTTP POST per record for real-time downstream processing
Supported
Historical funding data
Extraction of past funding rounds and investor lists where public
Supported
Candidate profiles & resumes
Private applicant data and resumes are strictly protected by Wellfound
Not supported
Direct messaging to founders
Requires authenticated recruiter accounts and violates platform terms
Not supported
Infrastructure

Infrastructure powering the Wellfound pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, GraphQL, Snowflake
GraphQL Extraction Stack

Playwright intercepts Wellfound's internal GraphQL queries, bypassing the need to parse complex React DOM structures and ensuring cleaner data extraction.

Cloudflare Bypass Infrastructure

We maintain custom TLS fingerprints and residential proxy pools specifically tuned to navigate Wellfound's strict bot mitigation layers without detection.

Cloud-Native Orchestration

Pipelines run on AWS ECS with Airflow managing dependency graphs and SLA alerting. State is maintained in Postgres for accurate change detection.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested arrays containing full job details
CSV
Flat file with typed columns for easy spreadsheet import
XLS
Excel format for immediate business user analysis
Parquet
Columnar format optimised for BigQuery and Snowflake
AWS S3
Direct delivery to your AWS environment on completion
Webhook
HTTP POST payloads sent immediately upon job discovery
API
Queryable REST endpoints to access your extracted datasets
PostgreSQL
Direct database inserts with conflict resolution for existing roles
S3
Direct bucket delivery — compatible with any data lake
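
To show how two of the file formats above differ, here is a small Python sketch that serialises the same records as newline-delimited JSON and as flat CSV, using only the standard library. The function names and the column selection are illustrative, not a description of the delivery code.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    """One JSON object per line, so consumers can stream the file record by record."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Flat CSV with a fixed column order; keys outside `columns` are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Nested values such as skill lists fit naturally in the JSON output, while the CSV output needs them flattened or joined into a single column first.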
// faq

Common questions.

About wellfound.com scraping, legality, and pipeline operations.

Can you extract salary and equity data from all jobs?

We extract all compensation data that is publicly visible on the platform. Wellfound is unique because it requires startups to post salary and equity ranges for most roles, making this data highly available and accurate.

How do you handle Wellfound's search pagination limits?

Wellfound caps the number of visible results for broad searches. Our orchestration engine automatically slices broad queries into hundreds of granular micro-searches based on specific locations, salary bands, and tech stacks to ensure 100% coverage.

Do you scrape candidate profiles?

No. DataFlirt focuses exclusively on public company profiles, job listings, and founder information. We do not extract private candidate data, resumes, or bypass authentication walls intended to protect user privacy.

How fresh is the job data?

Pipelines can be configured to run daily or hourly. Our change detection system ensures that closed jobs are flagged and new postings are delivered within minutes of the pipeline completing its run.

Can you track changes in startup funding?

Yes. We extract the funding stage and total capital raised from the company profile. By running continuous pipelines, we can log when a company updates its profile to reflect a new funding round.

What is the delivery format for tech stacks?

Tech stacks and required skills are extracted as structured JSON arrays, making it simple to query for specific languages or frameworks in your data warehouse.
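
As a small illustration of why array-valued tech stacks are convenient, the Python sketch below filters company profiles by technology and counts how often each technology appears. It assumes each profile carries a `tech_stack` list of strings, as in the company_profiles sample. The function names are hypothetical.

```python
from collections import Counter


def companies_using(profiles: list[dict], technology: str) -> list[str]:
    """Names of companies whose tech_stack contains the technology (case-insensitive)."""
    wanted = technology.lower()
    return [
        profile["name"]
        for profile in profiles
        if wanted in (item.lower() for item in profile.get("tech_stack") or [])
    ]


def stack_counts(profiles: list[dict]) -> Counter:
    """How many profiles list each technology."""
    counts: Counter = Counter()
    for profile in profiles:
        # A set avoids double counting a technology listed twice on one profile.
        counts.update(set(profile.get("tech_stack") or []))
    return counts
```

The same filter in a warehouse would typically be an array-contains predicate on the tech_stack column.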

$ dataflirt scope --new-project --source=wellfound.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a complete export of startup profiles or a continuous feed of new engineering roles - we build and operate the infrastructure. Tell us your requirements.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h