SYSTEM all green · source healthcare.gov · queue 11,402 zip codes · p99 latency 318ms · dataflirt.com · scraper/healthcare-gov

RUN · 14 active pipelines · healthcare.gov live

ACA plan data,
at warehouse scale.

We extract health insurance plans, premium rates, out-of-pocket limits, metal tiers, and drug formularies from healthcare.gov. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from healthcare.gov → See how it works

Plans extracted

41,294 /run

Premium variations

1.2M /month

Formulary drugs

384K /run

Active pipelines

14

Uptime

99.94%

◆ ACA Plan Data ◆ Premium Rates ◆ Deductibles & Copays ◆ Metal Tiers ◆ Provider Networks ◆ Drug Formularies ◆ Out-of-Pocket Limits ◆ Subsidy Calculations ◆ Rating Areas ◆ Zip Code Mapping ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from healthcare.gov

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Plan Overview objects from healthcare.gov. All fields typed and schema-versioned.

plan_id, issuer_name, plan_name, metal_tier, plan_type, rating_area, state, network_url, formulary_url, summary_url

{
  "plan_id": "12345TX0010001",
  "issuer_name": "Blue Cross Blue Shield",
  "plan_name": "Blue Advantage Bronze HMO 205",
  "metal_tier": "Bronze",
  "plan_type": "HMO",
  "state": "TX"
}


Complete list of extractable fields for Pricing & Premiums objects from healthcare.gov. All fields typed and schema-versioned.

plan_id, base_premium, age_21_premium, age_40_premium, age_60_premium, child_premium, ehb_percent, tobacco_surcharge, rating_area, zip_codes

{
  "plan_id": "12345TX0010001",
  "base_premium": 345.5,
  "age_40_premium": 412.75,
  "ehb_percent": 98.5,
  "tobacco_surcharge": 50.0,
  "rating_area": "Rating Area 3"
}


Complete list of extractable fields for Cost Sharing & Deductibles objects from healthcare.gov. All fields typed and schema-versioned.

plan_id, medical_deductible_individual, medical_deductible_family, drug_deductible_individual, drug_deductible_family, oop_max_individual, oop_max_family, primary_care_copay, specialist_copay, er_copay

{
  "plan_id": "12345TX0010001",
  "medical_deductible_individual": 7500.0,
  "oop_max_individual": 9100.0,
  "primary_care_copay": 40.0,
  "specialist_copay": 80.0,
  "er_copay": 500.0
}


Complete list of extractable fields for Drug Formularies objects from healthcare.gov. All fields typed and schema-versioned.

rx_cui, ndc_code, drug_name, tier_level, prior_authorisation, step_therapy, quantity_limit, plan_id, issuer_id, coverage_status

{
  "rx_cui": "855332",
  "drug_name": "Atorvastatin 20mg",
  "tier_level": "Tier 1",
  "prior_authorisation": false,
  "step_therapy": false,
  "coverage_status": "Covered"
}


Complete list of extractable fields for Provider Networks objects from healthcare.gov. All fields typed and schema-versioned.

network_id, provider_type, facility_name, npi, specialty, accepting_new_patients, telehealth_offered, address, city, state, zip_code

{
  "network_id": "NW-88392",
  "provider_type": "Facility",
  "facility_name": "Methodist Hospital",
  "npi": "1932485721",
  "specialty": "Acute Care Hospital",
  "accepting_new_patients": true
}


Capabilities

Everything you need from the federal exchange

Our healthcare.gov scraper targets the underlying API endpoints powering the plan comparison tool, extracting clean data across thousands of rating areas without fragile DOM parsing.

Full ACA Plan Extraction

Extract metal tiers, plan types, issuer details, and plan IDs across all rating areas and states on the federal exchange.

Premium Rate Aggregation

Capture base premiums, age-curve pricing, tobacco surcharges, and child rates per geographic rating area.

Cost-Sharing & Copay Data

Extract individual and family deductibles, out-of-pocket maximums, and specific copays for primary care, ER, and specialists.

Drug Formulary Mapping

Map NDC codes and RxNorm identifiers to plan tiers, capturing step therapy and prior authorisation requirements.

Provider Network Parsing

Extract in-network hospitals, specialists, and primary care physicians linked to specific plan network IDs.

Rating Area Resolution

Resolve county-level and zip-code-level plan availability across the 30+ states that use the federal exchange.

Subsidy & Tax Credit Logic

Extract advanced premium tax credit (APTC) baseline data and cost-sharing reduction (CSR) plan variations.

Quality Rating Capture

Extract CMS star ratings, member experience scores, and clinical quality metrics for each health plan.

Document URL Extraction

Capture direct links to Summary of Benefits and Coverage (SBC), plan brochures, and network directories.

Change Detection & Diffing

Track premium adjustments, network exits, and formulary tier changes across open enrollment periods.
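
As a rough illustration of the change-detection idea, the sketch below diffs two extraction runs keyed by plan_id. The function name diff_snapshots, the snapshot shape (a list of dicts using field names from the data dictionary), and the sample values are assumptions made for this example, not a description of the production pipeline.

```python
from typing import Any

Record = dict[str, Any]


def diff_snapshots(old: list[Record], new: list[Record], key: str = "plan_id") -> dict[str, Any]:
    """Compare two runs and report added, removed, and changed records."""
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}

    # Set arithmetic on the keys gives plans that appeared or disappeared.
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())

    # For plans present in both runs, record (before, after) per changed field.
    changed: dict[str, dict[str, tuple[Any, Any]]] = {}
    for k in old_by_key.keys() & new_by_key.keys():
        before, after = old_by_key[k], new_by_key[k]
        fields = {
            f: (before.get(f), after.get(f))
            for f in sorted(before.keys() | after.keys())
            if before.get(f) != after.get(f)
        }
        if fields:
            changed[k] = fields
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data for two consecutive runs.
last_run = [
    {"plan_id": "12345TX0010001", "base_premium": 345.5},
    {"plan_id": "12345TX0010002", "base_premium": 401.0},
]
this_run = [
    {"plan_id": "12345TX0010001", "base_premium": 352.25},
    {"plan_id": "12345TX0010003", "base_premium": 298.0},
]
report = diff_snapshots(last_run, this_run)
```

Keying on plan_id keeps the comparison independent of record order, so a re-sorted export does not show up as a change.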

// engagement pipeline

From zip code list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target states, rating areas, or specific issuers. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, handle zip code session states, and bypass rate limits for healthcare.gov.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and premium-outlier detection before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
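
The Validation & QA step above names three checks: schema validation, null-rate checks, and premium-outlier detection. A minimal sketch of what such checks could look like follows. The expected-type table, the function names, and the use of a median-absolute-deviation score (with the commonly used 3.5 cutoff) are assumptions chosen for illustration; the page does not say which methods are actually used.

```python
from statistics import median

# Assumed subset of the Pricing & Premiums schema, for illustration only.
EXPECTED_TYPES = {
    "plan_id": str,
    "base_premium": (int, float),
    "rating_area": str,
}


def schema_errors(record: dict) -> list[str]:
    """Return the names of fields whose value has an unexpected type."""
    return [
        name
        for name, expected in EXPECTED_TYPES.items()
        if record.get(name) is not None and not isinstance(record[name], expected)
    ]


def null_rate(records: list[dict], field: str) -> float:
    """Fraction of records where the field is missing or null."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def premium_outliers(records: list[dict], field: str = "base_premium",
                     threshold: float = 3.5) -> list[dict]:
    """Flag records whose modified z-score (median/MAD based) exceeds threshold."""
    priced = [r for r in records if isinstance(r.get(field), (int, float))]
    if len(priced) < 3:
        return []
    med = median(r[field] for r in priced)
    mad = median(abs(r[field] - med) for r in priced)
    if mad == 0:
        return []
    return [r for r in priced if 0.6745 * abs(r[field] - med) / mad > threshold]


# Hypothetical batch: plan E looks like a misplaced decimal point, F is null.
batch = [
    {"plan_id": "A", "base_premium": 340.0, "rating_area": "Rating Area 3"},
    {"plan_id": "B", "base_premium": 355.5, "rating_area": "Rating Area 3"},
    {"plan_id": "C", "base_premium": 362.0, "rating_area": "Rating Area 3"},
    {"plan_id": "D", "base_premium": 348.25, "rating_area": "Rating Area 3"},
    {"plan_id": "E", "base_premium": 3480.0, "rating_area": "Rating Area 3"},
    {"plan_id": "F", "base_premium": None, "rating_area": "Rating Area 3"},
]
flagged = premium_outliers(batch)
```

A median-based score is used here rather than mean and standard deviation because a single extreme premium would otherwise inflate the spread and hide itself.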

Under the hood

How our pipeline handles federal exchange complexities

Healthcare.gov enforces strict rate limits and depends on complex session state. Here is how we extract data reliably at scale.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued running

// observability

Pipeline health

99.9% uptime

142ms p99 latency

0.3% null rate

alerts

Session state management

Zip code and county tokenisation

Healthcare.gov requires setting a geographic context before rendering plan data. Our crawlers maintain isolated cookie sessions per geographic area, preventing cross-contamination of premium rates.
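
One way to picture the isolation described above is a pool that hands out a separate cookie jar for each geographic context. The sketch below uses only the Python standard library. The class names GeoSession and GeoSessionPool, and keying sessions by a (zip code, county FIPS) pair, are assumptions for illustration; no request is sent and no healthcare.gov endpoint is assumed.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


class GeoSession:
    """One cookie jar and URL opener bound to a single geographic context."""

    def __init__(self, zip_code: str, county_fips: str) -> None:
        self.zip_code = zip_code
        self.county_fips = county_fips
        self.jar = CookieJar()
        # Cookies set on responses through this opener stay in this jar only.
        self.opener = build_opener(HTTPCookieProcessor(self.jar))


class GeoSessionPool:
    """Creates sessions lazily and reuses them per (zip, county) key."""

    def __init__(self) -> None:
        self._sessions: dict[tuple[str, str], GeoSession] = {}

    def session_for(self, zip_code: str, county_fips: str) -> GeoSession:
        key = (zip_code, county_fips)
        if key not in self._sessions:
            self._sessions[key] = GeoSession(zip_code, county_fips)
        return self._sessions[key]

    def __len__(self) -> int:
        return len(self._sessions)


# Sample values only: two different areas get two independent jars.
pool = GeoSessionPool()
austin = pool.session_for("78701", "48453")
dallas = pool.session_for("75201", "48113")
```

Because each session owns its jar, a geographic cookie set while browsing one area cannot leak into requests made for another.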

API payload interception

Direct JSON extraction from backend endpoints

Rather than scraping the DOM, we intercept the undocumented XHR requests powering the plan comparison tool. This yields cleaner, well-structured JSON payloads with precise age-curve pricing data.

Formulary PDF parsing

Converting unstructured drug lists to tabular data

Many issuers still publish drug formularies as complex PDFs. We pipeline these documents through OCR and NLP layers to extract NDC codes, tier levels, and restriction flags into structured database records.
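
To make the last stage of that flow concrete, here is a small sketch of turning one line of already-OCR'd text into a structured row. It relies on the standard hyphenated NDC layouts (4-4-2, 5-3-2, 5-4-1, plus 11-digit 5-4-2) and pads them to 11 digits. The line layout, the PA / ST / QL abbreviations, the function names, and the sample NDC (which is made up) are assumptions; real formulary PDFs vary by issuer and would need more than a regular expression.

```python
import re
from typing import Optional

# Hyphenated NDC layouts: 4-4-2, 5-3-2, 5-4-1, and 11-digit 5-4-2.
NDC_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{2}|\d{5}-\d{3}-\d{2}|\d{5}-\d{4}-\d{1,2})\b")
TIER_PATTERN = re.compile(r"\bTier\s*(\d)\b", re.IGNORECASE)


def to_ndc11(ndc: str) -> str:
    """Zero-pad each segment to the 5-4-2 layout and drop the hyphens."""
    labeler, product, package = ndc.split("-")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


def parse_formulary_line(line: str) -> Optional[dict]:
    """Extract drug name, NDC, tier, and restriction flags from one text line."""
    ndc = NDC_PATTERN.search(line)
    if ndc is None:
        return None  # not a drug row (header, footer, page number, ...)
    tier = TIER_PATTERN.search(line)
    tokens = set(line.upper().split())
    return {
        "drug_name": line[: ndc.start()].strip(),
        "ndc_code": to_ndc11(ndc.group(1)),
        "tier_level": f"Tier {tier.group(1)}" if tier else None,
        "prior_authorisation": "PA" in tokens,
        "step_therapy": "ST" in tokens,
        "quantity_limit": "QL" in tokens,
    }


# Made-up sample line; the NDC is not a real product code.
row = parse_formulary_line("Atorvastatin 20mg  1234-5678-90  Tier 1  QL")
```

Lines without an NDC return None, which gives the caller a simple way to skip headers and page furniture left over from the PDF.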

Rate limiting & WAF

Bypassing Akamai and federal firewall rules

The federal exchange employs strict rate limiting and Akamai bot protection. We distribute requests across a vast pool of US-based residential proxies, pacing requests to mimic standard user navigation patterns.

Data normalisation

Standardising issuer-specific terminology

Different insurers use varied terminology for copays and tier levels. Our pipeline applies regular expressions and mapping dictionaries to normalise these fields into a unified, queryable schema.
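
A minimal sketch of the regular-expression-plus-dictionary approach described above. The alias table and the rules for phrases such as "No charge" are hypothetical examples rather than the mappings used in production, and the function names are invented for this illustration.

```python
import re
from typing import Optional

# Hypothetical issuer wording mapped to a unified tier label.
TIER_ALIASES = {
    "preferred generic": "Tier 1",
    "generic": "Tier 1",
    "preferred brand": "Tier 2",
    "non-preferred brand": "Tier 3",
    "specialty": "Tier 4",
}

_AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")
_TIER_NUMBER = re.compile(r"(?:tier\s*)?(\d)")


def normalise_copay(raw: str) -> Optional[float]:
    """Return a flat copay in dollars, or None when the text is not a flat copay."""
    text = raw.strip().lower()
    if "no charge" in text:
        return 0.0
    if "%" in text:
        return None  # coinsurance, not a dollar copay
    match = _AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def normalise_tier(raw: str) -> Optional[str]:
    """Map 'TIER 2', 'tier1', or an aliased phrase to the form 'Tier N'."""
    text = raw.strip().lower()
    numbered = _TIER_NUMBER.fullmatch(text)
    if numbered:
        return f"Tier {numbered.group(1)}"
    return TIER_ALIASES.get(text)


cleaned = [normalise_copay(s) for s in ("$40", "$1,500 copay", "No charge", "30% coinsurance")]
```

Returning None for anything that cannot be mapped keeps unknown wording visible in null-rate checks instead of silently guessing a value.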

Applications

Who uses healthcare.gov data

Teams across industries use healthcare.gov data to build competitive products and smarter operations.

Market Intelligence & Competitive Analysis

Health insurers monitor competitor premiums, network sizing, and metal tier positioning across overlapping rating areas.

Broker & Agency Tooling

Health insurance brokerages power their proprietary quoting and comparison engines using our normalised plan datasets.

Pharma Market Access

Pharmaceutical companies track formulary tier placement and utilisation management restrictions for their drug portfolios.

Actuarial Modelling

Actuarial teams ingest historical premium and deductible data to model risk and price future plan offerings.

Provider Network Optimisation

Healthcare systems analyse network adequacy and competitor overlap to negotiate better reimbursement rates with payers.

Policy & Academic Research

Health policy researchers track ACA market stability, subsidy impacts, and out-of-pocket cost trends over time.

Why DataFlirt

"Healthcare.gov contains the definitive dataset of US individual health insurance markets, but extracting it across 30,000 zip codes requires significant infrastructure."

Most teams underestimate the complexity of federal exchange data. Reliable healthcare.gov scraping requires managing thousands of geographic sessions, parsing undocumented APIs, and normalising disparate issuer formats. DataFlirt absorbs that complexity so your engineers can focus on actuarial analysis, not pipeline maintenance.

Technical Spec

Healthcare.gov scraper technical capabilities

Everything supported by our healthcare.gov scraper, from rendered SPA elements to rate-limit handling, along with what falls outside its scope.

Plan pricing & age curves

Extract base rates and age-adjusted premiums per rating area

Supported

Deductible & OOP max

Capture individual and family cost-sharing limits

Supported

Drug formulary tiers

Map RxNorm/NDC codes to coverage tiers and restrictions

Supported

CMS Star Ratings

Extract clinical quality and member experience scores

Supported

Geographic rating areas

Resolve plan availability down to the county and zip code level

Supported

SBC document links

Capture URLs for Summary of Benefits and Coverage documents

Supported

State-based exchanges

Data from Covered California, NY State of Health, etc.

Partial

Member enrollment status

PII/PHI regarding actual user enrollments and eligibility

Not supported

Medicaid eligibility API

Access to federal data services hub for income verification

Not supported

Infrastructure

Infrastructure powering the pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

XHR Interception & API Replay

Instead of fragile DOM scraping, we monitor network traffic and replay undocumented API calls to the healthcare.gov backend, extracting clean JSON payloads directly.

Distributed Session Management

We maintain thousands of concurrent geographic sessions using Redis, ensuring premium data is accurately tied to the correct rating area without cross-contamination.

US-Residential Proxy Pool

To navigate federal firewalls and Akamai bot protection, we route traffic exclusively through US-based ISP proxies, rotating IPs dynamically based on response latency and block rates.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested - schema versioned per run

CSV

Flat file with typed columns - Excel/Sheets compatible

Parquet

Columnar format for BigQuery, Snowflake, Athena

S3

Direct bucket delivery - compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

XLS

Excel format for business and actuarial teams

API

REST endpoint for querying the extracted plan database

PostgreSQL

Direct upsert into your relational database schema
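
For a sense of what the two simplest formats above look like on the wire, the sketch below serialises the same records as newline-delimited JSON and as a flat CSV using only the standard library. The helper names and the column selection are assumptions for illustration; the sample record reuses the Plan Overview values shown in the data dictionary.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    """One JSON object per line, keys sorted for stable diffs between runs."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Flat CSV limited to the requested columns; other keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {
        "plan_id": "12345TX0010001",
        "issuer_name": "Blue Cross Blue Shield",
        "plan_name": "Blue Advantage Bronze HMO 205",
        "metal_tier": "Bronze",
        "plan_type": "HMO",
        "state": "TX",
    }
]
ndjson_text = to_ndjson(records)
csv_text = to_csv(records, ["plan_id", "metal_tier", "plan_type"])
```

Parquet output would need a third-party library such as pyarrow, so it is left out of this standard-library sketch.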

// faq

Common questions.

About healthcare.gov scraping, legality, and pipeline operations.

Ask us directly →

Is scraping healthcare.gov legal?

Scraping publicly available plan and pricing data from healthcare.gov is generally permissible. DataFlirt targets only public, non-authenticated market data. We do not extract PII/PHI, circumvent authentication walls, or attempt to access the federal data services hub.

How do you handle the geographic variations in plan data?

Healthcare.gov relies on rating areas determined by zip code and county. Our crawlers systematically iterate through a master list of US zip codes, setting the appropriate session state to extract localised premium and network data.

Can you extract data from state-based exchanges?

This specific pipeline targets the federal exchange serving over 30 states. State-based exchanges require separate custom pipelines due to entirely different underlying architectures and schemas.

How do you extract drug formulary data?

Where available, we extract structured JSON from the formulary search endpoints. If issuers only provide PDF formularies, we utilise an OCR and NLP pipeline to convert the documents into structured tabular data mapping NDC codes to tiers.

How fresh is the data?

During the Open Enrollment Period, we can configure daily or weekly runs to capture plan updates and corrections. Outside of OEP, monthly runs are typical for capturing mid-year network changes.

Do you capture cost-sharing reduction plan variations?

Yes. We extract the standard plan designs as well as the 73%, 87%, and 94% actuarial value CSR variations, detailing the reduced deductibles and copays for eligible individuals.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off export of all ACA plans or a continuous feed of formulary changes, we scope, build, and operate the pipeline. Tell us what you need.

Start a healthcare.gov pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

