
RUN · 84 active pipelines · greenhouse.io live

Greenhouse ATS data, at warehouse scale.

We extract job listings, department structures, office locations, and custom application fields from Greenhouse boards. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.


Jobs extracted

842K /day

Board updates

14.2K /24h

Departments mapped

94K /run

Active pipelines

84

Uptime

99.98%

◆ Greenhouse Job Boards ◆ Requisition IDs ◆ Department Hierarchies ◆ Office Locations ◆ Custom Application Fields ◆ Job Descriptions ◆ Employment Types ◆ Remote Status Flags ◆ Salary Ranges ◆ Board Discovery ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from greenhouse.io

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Job Postings objects from greenhouse.io. All fields typed and schema-versioned.

req_id, internal_job_id, title, department, location, remote_status, employment_type, posted_date, url, description_html, description_text, salary_min, salary_max, currency

{
  "req_id": "req_84921",
  "title": "Senior Backend Engineer",
  "department": "Engineering",
  "location": "London, UK",
  "remote_status": "Hybrid",
  "employment_type": "Full-time",
  "salary_min": 85000,
  "salary_max": 110000,
  "currency": "GBP"
}


Complete list of extractable fields for Department Data objects from greenhouse.io. All fields typed and schema-versioned.

department_id, department_name, parent_department_id, parent_department_name, board_token, active_job_count, manager_name, cost_center, internal_code

{
  "department_id": "dept_402",
  "department_name": "Data Infrastructure",
  "parent_department_id": "dept_101",
  "parent_department_name": "Engineering",
  "board_token": "dataflirt",
  "active_job_count": 14,
  "cost_center": "CC-ENG-04"
}


Complete list of extractable fields for Location Data objects from greenhouse.io. All fields typed and schema-versioned.

office_id, office_name, city, state, country, region, is_remote, timezone, address

{
  "office_id": "loc_883",
  "office_name": "Bengaluru HQ",
  "city": "Bengaluru",
  "state": "Karnataka",
  "country": "India",
  "region": "APAC",
  "is_remote": false,
  "timezone": "Asia/Kolkata"
}


Complete list of extractable fields for Application Fields objects from greenhouse.io. All fields typed and schema-versioned.

field_id, field_name, field_type, is_required, options_list, job_id, board_token, validation_rules

{
  "field_id": "custom_49201",
  "field_name": "LinkedIn Profile URL",
  "field_type": "url",
  "is_required": true,
  "options_list": "[]",
  "job_id": "req_84921",
  "board_token": "dataflirt"
}


Complete list of extractable fields for Board Metadata objects from greenhouse.io. All fields typed and schema-versioned.

board_token, company_name, logo_url, total_active_jobs, departments_count, locations_count, last_updated, scrape_timestamp

{
  "board_token": "dataflirt",
  "company_name": "DataFlirt",
  "logo_url": "https://boards.greenhouse.io/dataflirt/logo.png",
  "total_active_jobs": 42,
  "departments_count": 8,
  "locations_count": 3,
  "scrape_timestamp": "2026-08-14T10:22:15Z"
}


Capabilities

Everything you need from Greenhouse ATS

Our Greenhouse scraper targets the underlying API structures and custom domain implementations to extract clean, normalised job and department data across thousands of companies.

Full Job Listing Extraction

Extract title, rich text description, requirements, employment type, and custom metadata fields per requisition.

Board Discovery

Identify hidden Greenhouse boards via footprinting and standardise custom domain implementations back to a unified schema.

Department Mapping

Reconstruct nested department hierarchies to understand organisational structure and hiring focus areas.

Salary Band Extraction

Parse pay transparency data, extracting minimum and maximum salary ranges along with currency codes.

Location & Remote Flags

Standardise location strings and extract explicit remote, hybrid, or on-site designations.

Custom Field Parsing

Extract custom application questions and requirements configured by the employer on a per-job basis.

Historical Tracking

Track requisition lifecycle events. Identify exact dates when roles are opened, updated, or closed.

Multi-Region Support

Handle international Greenhouse instances, local language postings, and region-specific compliance fields.

Scheduled Diffs

Run continuous pipelines that only output new, modified, or deleted roles to reduce downstream processing load.
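
To make the department mapping capability above more concrete, here is a minimal sketch of rebuilding a nested hierarchy from flat rows. It assumes each row carries the `department_id` and `parent_department_id` fields from the data dictionary; the function name `build_department_tree` and the `children` key are hypothetical and used only for illustration.

```python
def build_department_tree(rows):
    """Nest flat department rows under their parents.

    Each row is a dict with "department_id" and "parent_department_id"
    (None for top-level departments). Returns the list of root nodes.
    The "children" key is a hypothetical addition for this sketch.
    """
    nodes = {row["department_id"]: {**row, "children": []} for row in rows}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_department_id"])
        if parent is None:
            # No parent, or the parent is absent from this board: treat as a root.
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


rows = [
    {"department_id": "dept_101", "department_name": "Engineering",
     "parent_department_id": None},
    {"department_id": "dept_402", "department_name": "Data Infrastructure",
     "parent_department_id": "dept_101"},
]
tree = build_department_tree(rows)
```

Treating a department whose parent is missing as a root keeps partial boards usable instead of dropping orphaned rows.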

// engagement pipeline

From board list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide company names, domains, or Greenhouse board tokens. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, and session management for greenhouse.io endpoints.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and department mapping verification before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
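
The null-rate check mentioned in the Validation & QA step can be sketched roughly as follows. This is an illustration under stated assumptions, not our production validator: the helper names `null_rates` and `failing_fields` and the 5% default threshold are hypothetical.

```python
def null_rates(records, fields):
    """Return the fraction of records where each field is missing, None, or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        rates[field] = missing / total
    return rates


def failing_fields(records, fields, max_null_rate=0.05):
    """List fields whose null rate exceeds the (hypothetical) threshold."""
    rates = null_rates(records, fields)
    return sorted(field for field, rate in rates.items() if rate > max_null_rate)


sample = [
    {"title": "Senior Backend Engineer", "salary_min": 85000},
    {"title": "Data Analyst", "salary_min": None},
    {"title": "", "salary_min": None},
    {"title": "Recruiter"},
]
rates = null_rates(sample, ["title", "salary_min"])
```

A check like this would typically run per field and per board before a pipeline is promoted to its regular cadence.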

Under the hood

How our Greenhouse pipeline handles the hard parts

Greenhouse implementations vary wildly across companies. Here is how we normalise the data and maintain pipeline stability.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued · running

// observability

Pipeline health

99.9% uptime

142ms p99 latency

0.3% null rate

API vs DOM extraction

Targeting the Greenhouse embedded API

Whenever possible, we target the Greenhouse embedded API (boards-api.greenhouse.io) for structured JSON. For companies using custom hosted boards or iframe implementations, we fall back to DOM parsing, normalising the output to match the API schema.
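
As an illustration of the API-first path, the sketch below builds a job-list URL for a board token and maps one raw job object onto the field names from the data dictionary. The URL path and the raw payload keys (`requisition_id`, `absolute_url`, `content`, nested `location` and `departments`) reflect our understanding of the public job board API and should be treated as assumptions; `jobs_url` and `normalise_job` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://boards-api.greenhouse.io/v1/boards"


def jobs_url(board_token, include_content=True):
    """Build the job-list URL for one board (path layout is an assumption)."""
    url = f"{API_ROOT}/{quote(board_token, safe='')}/jobs"
    if include_content:
        url += "?" + urlencode({"content": "true"})
    return url


def normalise_job(raw):
    """Map one raw job object onto the flat schema; missing keys become None."""
    location = raw.get("location") or {}
    departments = raw.get("departments") or []
    return {
        "req_id": raw.get("requisition_id"),
        "internal_job_id": raw.get("internal_job_id"),
        "title": raw.get("title"),
        "department": departments[0].get("name") if departments else None,
        "location": location.get("name"),
        "url": raw.get("absolute_url"),
        "description_html": raw.get("content"),
    }


raw_job = {
    "requisition_id": "req_84921",
    "title": "Senior Backend Engineer",
    "location": {"name": "London, UK"},
    "departments": [{"name": "Engineering"}],
}
record = normalise_job(raw_job)
```

The DOM fallback would emit the same flat record, so downstream consumers never need to know which path produced a row.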

Rate limiting

Distributed request timing

Greenhouse rate-limits individual IP addresses aggressively. We distribute requests across a large pool of datacenter and residential proxies and apply exponential backoff with jitter to stay under rate limit thresholds.
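
Exponential backoff with jitter can be sketched as below. This is a simplified illustration, not our scheduler: the base delay, cap, and retry count are hypothetical defaults, and `fetch` stands in for any callable returning a `(status, body)` pair, with HTTP 429 assumed to signal a rate limit.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random.random):
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point between 0 and the current ceiling.
        yield rng() * ceiling


def fetch_with_retry(fetch, url, sleep, max_retries=5):
    """Call fetch(url), retrying while it reports HTTP 429.

    `fetch` and `sleep` are injected so the sketch stays testable;
    in practice `sleep` would be time.sleep.
    """
    status, body = fetch(url)
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = fetch(url)
    return status, body
```

Randomising each delay spreads retries from many workers over time, so they do not all hit the same rate limit window at once.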

Schema stability

Handling custom board structures

Employers frequently customise their Greenhouse boards with unique JavaScript frameworks and CSS layouts. Our selector strategy uses structural heuristics to identify job blocks, departments, and locations regardless of the visual presentation.

Change detection

Requisition lifecycle tracking

We maintain a hash index of active job IDs per board. Subsequent runs compare the current state against the index, emitting structured diffs for newly opened roles, updated descriptions, and closed requisitions.
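
A minimal sketch of the comparison step is shown below. It assumes the index maps each `req_id` to a hash of the job's content, which is one plausible reading of the hash index described above; the helper names and the diff keys (`opened`, `updated`, `closed`) are hypothetical.

```python
import hashlib
import json


def content_hash(job):
    """Stable SHA-256 over the job's fields; key order does not affect the result."""
    payload = json.dumps(job, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_index(jobs):
    """Map req_id -> content hash for one board snapshot."""
    return {job["req_id"]: content_hash(job) for job in jobs}


def diff_indexes(previous, current):
    """Return sorted req_ids that were opened, updated, or closed between two runs."""
    shared = current.keys() & previous.keys()
    return {
        "opened": sorted(current.keys() - previous.keys()),
        "updated": sorted(r for r in shared if current[r] != previous[r]),
        "closed": sorted(previous.keys() - current.keys()),
    }


yesterday = build_index([
    {"req_id": "req_1", "title": "Data Analyst"},
    {"req_id": "req_2", "title": "Backend Engineer"},
])
today = build_index([
    {"req_id": "req_2", "title": "Senior Backend Engineer"},
    {"req_id": "req_3", "title": "Recruiter"},
])
changes = diff_indexes(yesterday, today)
```

Storing only hashes keeps the index small, while the full records for changed roles can be emitted from the current snapshot.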

Monitoring & alerting

Anomaly detection on board availability

Companies frequently migrate ATS platforms or change board tokens. We alert on 404 errors, sudden drops in job counts, and domain redirects, allowing us to update the target configuration before data gaps occur.
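
The three signals above (404 errors, job count drops, and domain redirects) can be combined in a simple per-board check. The sketch below is illustrative only: the function name `board_alerts`, the alert codes, and the 50% drop threshold are hypothetical.

```python
from urllib.parse import urlparse


def board_alerts(status_code, requested_url, final_url,
                 previous_count, current_count, max_drop=0.5):
    """Return a list of alert codes for one board check.

    max_drop is the (hypothetical) fraction of jobs that may disappear
    between runs before the drop is treated as an anomaly.
    """
    alerts = []
    if status_code == 404:
        alerts.append("board_not_found")
    if urlparse(requested_url).netloc != urlparse(final_url).netloc:
        # The request ended on a different host than the one we asked for.
        alerts.append("domain_redirect")
    if previous_count > 0 and current_count < previous_count * (1 - max_drop):
        alerts.append("job_count_drop")
    return alerts
```

For example, a board that went from 42 jobs to 0 and now returns a 404 would raise both `board_not_found` and `job_count_drop`.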

Applications

Who uses Greenhouse data

Teams across industries use greenhouse.io data to build competitive products and smarter operations.

Competitor Intelligence

Track hiring velocity, department expansion, and strategic focus areas of competitor companies based on open requisitions.

Lead Generation

Identify companies using specific technology stacks or expanding certain departments based on job requirements and titles.

Labour Market Analytics

Aggregate salary bands, remote work trends, and skill demand across thousands of high-growth companies.

Job Aggregation

Feed job boards, recruitment marketplaces, and talent networks with fresh, structured job postings.

Investment Research

Private equity and venture capital firms monitor startup growth signals via headcount expansion and executive hiring.

HR Tech Integrations

Sync ATS data for external tooling, compensation benchmarking platforms, and diversity analytics software.

Why DataFlirt

"Greenhouse powers the hiring for the world's fastest-growing companies. Tracking their open requisitions is the clearest signal of corporate strategy and financial health."

Most teams underestimate the investment required to track thousands of disparate Greenhouse boards: reliable scraping requires discovering hidden board tokens, normalising custom domain implementations, handling rate limits, and standardising nested department structures. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.

Technical Spec

Greenhouse scraper technical capabilities

Everything supported by our greenhouse.io scraper: embedded API extraction, custom domain parsing, salary normalisation, change detection, and more. Only public, non-authenticated data is in scope.

Embedded API extraction

Direct extraction from boards-api.greenhouse.io endpoints

Supported

Custom domain board parsing

Extraction from custom hosted careers pages and iframes

Supported

Department hierarchy mapping

Nested parent-child department relationships

Supported

Salary band normalisation

Extraction of min/max numerical values and currency codes

Supported

Remote status classification

Standardised remote, hybrid, or on-site designations

Supported

Historical job tracking

Time-series tracking of job open and close dates

Supported

Daily change detection (diffs)

Emit records only for new, modified, or closed roles

Supported

Internal candidate notes

Interview feedback and recruiter notes gated behind ATS login

Not supported

Applicant PII and resumes

Candidate profiles and submitted applications gated behind ATS login

Not supported

Infrastructure

Infrastructure powering the Greenhouse pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

ScrapyPlaywrightPython 3.12RedisPostgreSQLApache AirflowAWS LambdaS3CloudWatch2CaptchaCapSolverResidential ProxiesDockerKubernetesGrafanaPrometheus

API & DOM Extraction Engine

A hybrid approach targeting the Greenhouse embedded API for structured JSON where possible, falling back to Playwright DOM parsing for custom hosted boards.

Proxy Infrastructure

Datacenter and residential IP rotation to bypass Cloudflare protections and distribute requests across rate limit windows.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting for thousands of concurrent board scrapes.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested arrays

CSV

Flat file with typed columns

XLS

Excel compatible format for manual review

Parquet

Columnar format for data warehouses

AWS S3

Direct bucket delivery

Webhook

HTTP POST per record or batch

API

Queryable REST endpoints for extracted data

BigQuery

Streamed directly into your dataset

Snowflake

Stage and COPY INTO workflow

Postgres

Upsert into your existing schema

Direct bucket delivery — compatible with any data lake

// faq

Common questions.

About greenhouse.io scraping, legality, and pipeline operations.


Is scraping Greenhouse public job boards legal?

Scraping publicly available job postings from Greenhouse is generally permissible. DataFlirt targets only public, non-authenticated job and department data. We do not extract personal candidate data, circumvent authentication walls, or access internal ATS systems.

How do you find Greenhouse boards for target companies?

We use domain footprinting, DNS records, and search engine dorking to identify Greenhouse board tokens and custom careers page implementations for your target company list.

Do you support custom domain boards?

Yes. Many companies host Greenhouse boards on custom domains or via iframes. Our pipeline identifies the underlying API calls or parses the custom DOM structure to extract the data into our normalised schema.

How fresh is the data?

Pipelines can be configured for daily, hourly, or near real-time cadences depending on the number of boards tracked and your freshness requirements.

Can you extract salary transparency data?

Yes. We extract minimum and maximum salary bands, currency codes, and equity indicators where provided in the job description or structured metadata fields.
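
As a rough sketch of how a salary band might be pulled out of free-text descriptions, the example below matches a simple "min to max" range with a leading currency symbol. The pattern, the symbol-to-currency table, and the function name `parse_salary_range` are hypothetical; real postings vary far more (for instance, `$` is not always USD), so this shows the idea rather than our full parser.

```python
import re

# Simplified mapping for illustration; "$" is assumed to mean USD here.
CURRENCY_SYMBOLS = {"£": "GBP", "$": "USD", "€": "EUR"}

SALARY_RANGE = re.compile(
    r"(?P<symbol>[£$€])\s?(?P<min>\d[\d,]*)(?P<min_k>[kK])?"
    r"\s*(?:[-\u2013]|to)\s*"
    r"[£$€]?\s?(?P<max>\d[\d,]*)(?P<max_k>[kK])?"
)


def _to_int(digits, k_suffix):
    """Convert '85,000' or '85' + 'k' into an integer amount."""
    value = int(digits.replace(",", ""))
    return value * 1000 if k_suffix else value


def parse_salary_range(text):
    """Return salary_min, salary_max, and currency, or None if no range is found."""
    match = SALARY_RANGE.search(text)
    if match is None:
        return None
    return {
        "salary_min": _to_int(match["min"], match["min_k"]),
        "salary_max": _to_int(match["max"], match["max_k"]),
        "currency": CURRENCY_SYMBOLS[match["symbol"]],
    }
```

Returning `None` when no range is found keeps `salary_min` and `salary_max` null rather than guessing a value.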

What is the minimum viable engagement?

Our minimum engagement typically starts at 500 target boards tracked with daily delivery. For larger aggregations or custom schema requirements, we price based on volume and frequency.

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off dump of target companies or a continuous feed of job market data across 10,000 boards, we scope, build, and operate the pipeline. Tell us what you need.


hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

