
Startup data,
at warehouse scale.

We extract company profiles, job listings, equity bands, founder histories, and funding rounds from Angel.co (Wellfound). Delivered as clean JSON, CSV, or Parquet to S3 or Snowflake on your cadence.

Startups extracted
142K /month
Job postings
84K /24h
Founder profiles
312K /run
Active pipelines
84
Uptime
99.94%
Data Dictionary

Every field we extract from angel.co

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Company Profiles objects from angel.co. All fields typed and schema-versioned.

company_id, name, website, angel_url, location, employee_count, funding_stage, total_raised, markets, tech_stack, pitch, description, founders
company_profiles
● 200 OK
"company_id": "84921",
"name": "Stripe",
"website": "stripe.com",
"employee_count": "1000-5000",
"funding_stage": "Series I",
"total_raised": 8700000000.0,
"markets": "['Fintech', 'Payments', 'SaaS']"
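The sample record above shows list-valued fields such as markets serialised as a Python-style string. A minimal sketch of loading one record into a typed structure, assuming that string form is what a flat delivery contains. The CompanyProfile class and parse_list helper are illustrative names, not part of any DataFlirt SDK.

```python
import ast
from dataclasses import dataclass


def parse_list(value):
    """Turn "['a', 'b']" into ['a', 'b']; pass real lists through unchanged."""
    if isinstance(value, list):
        return value
    parsed = ast.literal_eval(value)  # evaluates Python literals only, never code
    if not isinstance(parsed, list):
        raise ValueError(f"expected a list, got {type(parsed).__name__}")
    return parsed


@dataclass
class CompanyProfile:  # hypothetical container mirroring the sample fields
    company_id: str
    name: str
    website: str
    employee_count: str
    funding_stage: str
    total_raised: float
    markets: list

    @classmethod
    def from_record(cls, record):
        return cls(
            company_id=str(record["company_id"]),
            name=record["name"],
            website=record["website"],
            employee_count=record["employee_count"],
            funding_stage=record["funding_stage"],
            total_raised=float(record["total_raised"]),
            markets=parse_list(record["markets"]),
        )


sample = {
    "company_id": "84921",
    "name": "Stripe",
    "website": "stripe.com",
    "employee_count": "1000-5000",
    "funding_stage": "Series I",
    "total_raised": 8700000000.0,
    "markets": "['Fintech', 'Payments', 'SaaS']",
}
profile = CompanyProfile.from_record(sample)
print(profile.markets)  # ['Fintech', 'Payments', 'SaaS']
```

Using ast.literal_eval rather than eval keeps the parse limited to literals, which matters when the strings come from scraped content.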

Complete list of extractable fields for Job Listings objects from angel.co. All fields typed and schema-versioned.

job_id, company_name, title, role_type, location, remote_policy, salary_min, salary_max, currency, equity_min, equity_max, posted_date, description
job_listings
● 200 OK
"job_id": "j-928174",
"title": "Senior Backend Engineer",
"role_type": "Full-time",
"remote_policy": "Remote",
"salary_min": 140000,
"salary_max": 180000,
"currency": "USD",
"equity_min": 0.1,
"equity_max": 0.25

Complete list of extractable fields for Founder & Team Data objects from angel.co. All fields typed and schema-versioned.

person_id, name, role, company, linkedin_url, twitter_url, bio, past_companies, education, joined_date
founder_team_data
● 200 OK
"person_id": "p-10293",
"name": "Patrick Collison",
"role": "Co-Founder & CEO",
"company": "Stripe",
"twitter_url": "twitter.com/patrickc",
"past_companies": "['Auctomatic']",
"education": "['MIT']"

Complete list of extractable fields for Funding & Investors objects from angel.co. All fields typed and schema-versioned.

funding_id, company_name, round_type, amount_raised, currency, valuation, date, lead_investor, participating_investors
funding_investors
● 200 OK
"round_type": "Series C",
"amount_raised": 50000000,
"currency": "USD",
"date": "2024-02-15",
"lead_investor": "Sequoia Capital",
"participating_investors": "['Andreessen Horowitz', 'Founders Fund']"

Complete list of extractable fields for Search & Discovery objects from angel.co. All fields typed and schema-versioned.

keyword, market_tag, position, company_name, signal_score, hiring_status, scraped_at, page_url
search_discovery
● 200 OK
"keyword": "Artificial Intelligence",
"position": 1,
"company_name": "OpenAI",
"signal_score": 9.8,
"hiring_status": "Actively Hiring",
"scraped_at": "2026-05-12T09:14:33Z"

Capabilities

Extract startup intelligence precisely

Our Angel.co scraper targets the underlying GraphQL APIs and React state to extract company profiles, job listings, and equity bands, bypassing Cloudflare and session limits.

Startup Profile Extraction

Capture company name, pitch, employee count, funding stage, markets, and total capital raised from thousands of startup profiles.

Job & Equity Data

Extract highly accurate salary bands and equity percentages for engineering, product, and sales roles across global markets.

Founder Histories

Map founder backgrounds, past exits, education, and social links to build detailed talent and investment graphs.

Funding Rounds

Track historical funding events, round types, capital raised, and participating investors for every company on the platform.

Tech Stack Mapping

Extract the specific programming languages, frameworks, and infrastructure tools listed by engineering teams.

Signal Score Tracking

Monitor Wellfound Signal scores to identify trending startups and high-growth companies before they hit mainstream news.

Pagination Handling

Navigate infinite scroll and complex React pagination to ensure complete data extraction without missing records.

Anti-Bot Circumvention

Bypass strict Cloudflare protection and rate limits using residential proxies and human-like request patterns.

Continuous Diffing

Receive only new jobs, updated funding rounds, or changed salary bands to optimise your downstream storage.

// engagement pipeline

From target markets to warehouse records

Brief in. Clean data out.

Define Scope
Day 0

Provide specific market tags, company sizes, or job roles. We configure the extraction schema to match your requirements.

Pipeline Build
Days 2–4

We deploy Playwright spiders, residential proxies, and GraphQL interceptors to bypass Cloudflare on angel.co.

Validation & QA
Days 4–6

We test the pipeline for null-rate anomalies, salary outliers, and incomplete profiles before full production launch.

Delivery
ongoing

JSON, CSV, or Parquet files pushed to your S3 bucket or Snowflake environment on a daily or weekly schedule.
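The Validation & QA step above mentions null-rate anomalies and salary outliers. A minimal sketch of what such checks could look like, assuming records are plain dicts using the job_listings field names. The function names, the Tukey-fence rule, and the sample values are illustrative choices, not a description of DataFlirt's actual QA suite.

```python
from statistics import quantiles


def null_rate(records, field):
    """Share of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def salary_outliers(records, field="salary_max", k=1.5):
    """Return records whose `field` lies outside the Tukey fences (k * IQR)."""
    values = [r[field] for r in records if r.get(field) is not None]
    if len(values) < 4:
        return []  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records
            if r.get(field) is not None and not low <= r[field] <= high]


# Made-up sample: one missing salary and one value with an extra zero.
salaries = [140000, 150000, 155000, 160000, 165000, 170000, 180000, 1800000, None]
jobs = [{"job_id": f"j-{i}", "salary_max": s} for i, s in enumerate(salaries, start=1)]

print(round(null_rate(jobs, "salary_max"), 3))
print([j["job_id"] for j in salary_outliers(jobs)])
```

A quartile-based fence is used here because a single extreme value shifts it far less than it would shift a mean and standard deviation.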

Under the hood

Overcoming Wellfound extraction barriers

Angel.co employs aggressive bot mitigation and complex frontend rendering. We handle the infrastructure so you receive clean data.

pipeline-monitor · angel.co · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Anti-bot layer
Cloudflare Turnstile bypass

Wellfound uses Cloudflare to block automated traffic. We utilise residential proxies and tailored Playwright contexts with authentic TLS fingerprints to solve challenges and maintain persistent sessions.

API Interception
Direct GraphQL extraction

Instead of parsing complex React DOMs, our pipeline intercepts Wellfound's internal GraphQL responses at the network layer, so nested data such as equity bands and tech stacks is captured exactly as the API returns it.
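To illustrate the parsing half of this approach, here is a small standard-library sketch. It assumes a Relay-style response shape (data.<field>.edges[].node); the jobListings field name and the sample body are invented for the example and are not Wellfound's real schema. In Playwright for Python, functions like these would typically be called from a handler registered with page.on("response", ...), using the response URL, headers, and JSON body.

```python
import json


def is_graphql_response(url, content_type):
    """Heuristic filter: JSON responses coming from a /graphql endpoint."""
    return "/graphql" in url and "application/json" in content_type.lower()


def extract_nodes(payload, root_field):
    """Pull `node` objects from a Relay-style connection under data.<root_field>."""
    connection = (payload.get("data") or {}).get(root_field) or {}
    edges = connection.get("edges") or []
    return [edge["node"] for edge in edges if edge.get("node")]


# Hypothetical response body, shaped like a typical connection query result.
body = json.dumps({
    "data": {
        "jobListings": {
            "edges": [
                {"node": {"title": "Senior Backend Engineer", "equityMin": 0.1}},
                {"node": None},  # empty edges are skipped
            ]
        }
    }
})

if is_graphql_response("https://example.com/graphql", "application/json; charset=utf-8"):
    nodes = extract_nodes(json.loads(body), "jobListings")
    print(nodes)
```

Reading the API payload avoids brittle CSS selectors, at the cost of depending on the query shape staying stable.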

Session Management
Authenticated crawl states

Certain data points on Angel.co require an active user session. We manage a pool of aged accounts with automated cookie rotation to access restricted job details and salary metrics safely.

Schema stability
Query version tracking

When Wellfound updates their GraphQL schema, our monitors detect query payload changes immediately. We map new aliases to our normalised schema to prevent pipeline failure.
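One way to picture this kind of monitoring is as a structural fingerprint: record the set of key paths seen in a known-good payload, compare each new payload against it, and map renamed fields back to the normalised names. The sketch below assumes JSON-like dicts; the ALIASES table and the field names in it are hypothetical.

```python
def key_paths(value, prefix=""):
    """Collect dotted key paths in a JSON-like value; list items share one path."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= key_paths(item, prefix + "[]")
    return paths


def schema_drift(baseline, payload):
    """Compare a payload's key paths against a stored baseline set."""
    current = key_paths(payload)
    return {"added": sorted(current - baseline), "removed": sorted(baseline - current)}


# Hypothetical alias table: upstream field name -> normalised column name.
ALIASES = {"salaryMin": "salary_min", "compensationMin": "salary_min"}


def normalise(record, aliases=ALIASES):
    """Rename known upstream aliases; unknown keys pass through unchanged."""
    return {aliases.get(key, key): value for key, value in record.items()}


baseline = key_paths({"job": {"salaryMin": 140000, "title": "Engineer"}})
renamed = {"job": {"compensationMin": 140000, "title": "Engineer"}}

print(schema_drift(baseline, renamed))
print(normalise(renamed["job"]))
```

A non-empty "removed" list is the signal worth alerting on, since it means a field the normalised schema depends on has disappeared or been renamed.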

Change detection
Incremental job updates

We hash job descriptions and salary bands to detect changes. Your warehouse receives a clean diff of new listings, closed roles, and modified compensation packages without redundant data.
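A minimal sketch of hash-based change detection, assuming each run yields a dict of job_id to record using the job_listings field names. The choice of SHA-256, the hashed field list, and the function names are assumptions made for the example.

```python
import hashlib
import json

HASHED_FIELDS = ("description", "salary_min", "salary_max", "equity_min", "equity_max")


def fingerprint(job, fields=HASHED_FIELDS):
    """Stable SHA-256 over the fields that define a meaningful change."""
    subset = {field: job.get(field) for field in fields}
    blob = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def diff_runs(previous, current):
    """Both arguments map job_id -> job record; returns ids grouped by change type."""
    prev_fp = {job_id: fingerprint(job) for job_id, job in previous.items()}
    curr_fp = {job_id: fingerprint(job) for job_id, job in current.items()}
    shared = curr_fp.keys() & prev_fp.keys()
    return {
        "new": sorted(curr_fp.keys() - prev_fp.keys()),
        "closed": sorted(prev_fp.keys() - curr_fp.keys()),
        "modified": sorted(j for j in shared if curr_fp[j] != prev_fp[j]),
    }


yesterday = {
    "j-1": {"description": "Backend role", "salary_min": 140000, "salary_max": 180000},
    "j-2": {"description": "Design role", "salary_min": 90000, "salary_max": 120000},
}
today = {
    "j-1": {"description": "Backend role", "salary_min": 150000, "salary_max": 180000},
    "j-3": {"description": "Sales role", "salary_min": 70000, "salary_max": 95000},
}

print(diff_runs(yesterday, today))
```

Serialising with sort_keys=True keeps the hash independent of key order, so a record only counts as modified when one of the tracked values actually changes.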

Applications

Who uses Angel.co data

Teams across industries use angel.co data to build competitive products and smarter operations.

01
VC Deal Sourcing

Venture capital firms track hiring velocity and engineering headcount to identify breakout startups before their next funding round.

02
Competitor Intelligence

Startups monitor competitor job postings to understand product roadmaps and benchmark their own salary and equity offers.

03
Talent Acquisition

Recruiters map founder networks and track employee movement between early-stage companies to source high-tier talent.

04
Market Research

Analysts aggregate salary and equity data across thousands of listings to publish compensation reports for specific tech hubs.

05
B2B Lead Generation

Sales teams target newly funded companies that are actively expanding their engineering or marketing departments.

06
Investment Thesis Validation

Funds correlate specific tech stack choices (e.g., Rust, AI frameworks) with funding success rates to validate market trends.

Why DataFlirt

"Angel.co holds the most accurate equity and salary data for early-stage startups globally, but extracting it requires bypassing aggressive anti-scraping layers."

Most teams fail at scraping Wellfound because they rely on basic HTTP clients. We deploy full browser automation with residential proxies to bypass Cloudflare, execute React hydration, and capture clean GraphQL responses directly from network traffic. DataFlirt handles the infrastructure so your team can focus on analysis.

Technical Spec

Angel.co scraper - technical specifications

Everything supported by our angel.co scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

GraphQL interception
Extracts raw JSON payloads directly from Wellfound internal API calls
Supported
Cloudflare bypass
Automated handling of Turnstile challenges via Playwright stealth
Supported
Salary & Equity extraction
Captures precise min/max bands and currency for every job listing
Supported
Tech stack mapping
Extracts listed technologies and frameworks per company profile
Supported
Investor portfolio scraping
Maps connections between venture funds and their portfolio companies
Supported
Historical funding rounds
Captures all disclosed funding events, dates, and valuations
Supported
Change detection (diffs)
Only outputs new, updated, or closed job listings per run
Supported
Private applicant messages
Gated recruiter inbox and direct applicant messaging
Partial
Candidate contact details
Private candidate emails and phone numbers hidden behind apply walls
Partial
Infrastructure

Infrastructure powering the Angel.co pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Playwright + GraphQL Interception

We bypass the complex React DOM entirely by using Playwright to intercept and parse the raw GraphQL responses that populate the Wellfound frontend.

Residential Proxy Infrastructure

Our proxy rotation logic uses high-quality US and EU residential IPs to bypass Cloudflare rate limits and IP reputation blocks seamlessly.

Cloud-Native Orchestration

Airflow manages pipeline scheduling and dependency resolution, while Kubernetes scales Playwright browser instances horizontally to meet data volume demands.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Nested objects mapping the exact GraphQL schema relationships
CSV
Flat file with normalised columns for immediate analysis
XLS
Excel format for non-technical talent acquisition teams
Parquet
Highly compressed columnar format for data warehouse ingestion
AWS S3
Direct delivery to your cloud storage buckets
Webhook
Real-time HTTP POST alerts for new jobs matching specific criteria
API
Queryable REST endpoints to access your extracted startup data
BigQuery
Direct table insertion with automated schema updates
S3
Direct bucket delivery — compatible with any data lake
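The difference between the nested JSON and flat CSV formats listed above can be sketched with the standard library. This assumes dotted column names for nested objects and a pipe-joined string for lists; both conventions, and the sample record, are illustrative rather than DataFlirt's documented output layout.

```python
import csv
import io


def flatten(record, parent="", sep="."):
    """Flatten nested dicts into dotted column names; join lists with '|'."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            flat[name] = "|".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Render nested records as CSV text with a stable, sorted column order."""
    rows = [flatten(record) for record in records]
    columns = sorted({column for row in rows for column in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are written as empty
    return buffer.getvalue()


nested = [{
    "job_id": "j-928174",
    "salary": {"min": 140000, "max": 180000},
    "tags": ["python", "go"],
}]
print(to_csv(nested))
```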
// faq

Common questions.

About angel.co scraping, legality, and pipeline operations.

Is scraping Angel.co legal?

US court rulings such as hiQ v. LinkedIn have held that scraping publicly accessible data does not in itself violate the Computer Fraud and Abuse Act, although site terms of service and privacy law can still apply. DataFlirt extracts only public company profiles, job listings, and founder histories. We do not extract private candidate data or internal recruiter messages, and we do not process data in ways that violate GDPR. Clients must ensure their specific use cases comply with local regulations.

How do you handle Cloudflare on Wellfound?

We use tailored Playwright browser contexts with residential proxies and specific TLS fingerprints to solve Cloudflare Turnstile challenges. Our request timing mimics human behaviour to prevent session invalidation.

Can you extract equity and salary bands?

Yes. We extract the exact minimum and maximum salary bands, currency, and equity percentages listed on every job posting.

How fresh is the job data?

We can run pipelines daily to capture new job postings and detect closed roles within 24 hours of the change occurring on Wellfound.

Do you support extracting full tech stacks?

Yes. We map the complete list of programming languages, frameworks, and infrastructure tools associated with a company profile.

What is the minimum viable engagement?

Our pipelines start at a defined set of market tags or specific company lists. We price based on data volume and extraction frequency. Contact us to scope your exact requirements.

$ dataflirt scope --new-project --source=angel.co ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of new engineering jobs or a complete export of Series A startups - we build and operate the infrastructure. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h