
Capsule pharmacy data,
at warehouse scale.

We extract prescription drug catalogues, OTC inventory, retail pricing, insurance copay estimates, and delivery coverage zones from Capsule. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Drugs catalogued: 14.2K /run
Price updates: 38.1K /24h
OTC records: 4.8K /run
Active pipelines: 41
Uptime: 99.98%

Data Dictionary

Every field we extract from capsule.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Prescription Drugs objects from capsule.com. All fields typed and schema-versioned.

Fields: ndc_code, generic_name, brand_name, active_ingredients, dosage_form, strength, manufacturer, retail_price, rx_required, side_effects

Sample record (prescription_drugs):
{
  "ndc_code": "00069-3150-83",
  "generic_name": "Atorvastatin Calcium",
  "brand_name": "Lipitor",
  "dosage_form": "Tablet",
  "strength": "40mg",
  "retail_price": 145.5,
  "rx_required": true
}

Complete list of extractable fields for OTC Products objects from capsule.com. All fields typed and schema-versioned.

Fields: sku, product_name, category, brand, price, stock_status, description, ingredients, size, barcode

Sample record (otc_products):
{
  "sku": "OTC-88492",
  "product_name": "Zyrtec Allergy Relief",
  "category": "Allergy & Asthma",
  "brand": "Zyrtec",
  "price": 22.99,
  "stock_status": "In Stock",
  "size": "30 Tablets"
}

Complete list of extractable fields for Pricing & Insurance objects from capsule.com. All fields typed and schema-versioned.

Fields: drug_id, cash_price, average_copay, insurance_networks_accepted, tier_status, prior_auth_required, quantity_limits, step_therapy_flag

Sample record (Pricing & Insurance):
{
  "drug_id": "DRG-4921",
  "cash_price": 145.5,
  "average_copay": 15.0,
  "tier_status": "Tier 2",
  "prior_auth_required": false,
  "step_therapy_flag": false,
  "quantity_limits": "30 per 30 days"
}

Complete list of extractable fields for Delivery Coverage objects from capsule.com. All fields typed and schema-versioned.

Fields: zip_code, city, state, delivery_window_available, courier_type, pharmacy_hub_location, cutoff_time, service_status

Sample record (delivery_coverage):
{
  "zip_code": "10001",
  "city": "New York",
  "state": "NY",
  "delivery_window_available": "Same Day",
  "pharmacy_hub_location": "Manhattan Hub",
  "service_status": "Active",
  "cutoff_time": "14:00"
}

Complete list of extractable fields for Drug Information objects from capsule.com. All fields typed and schema-versioned.

Fields: drug_id, warnings, contraindications, common_side_effects, severe_side_effects, storage_instructions, administration_route, fda_approval_date

Sample record (drug_information):
{
  "drug_id": "DRG-4921",
  "administration_route": "Oral",
  "storage_instructions": "Store at room temperature",
  "common_side_effects": ["Nausea", "Headache", "Fatigue"],
  "severe_side_effects": ["Muscle pain", "Liver problems"],
  "warnings": "Do not take if pregnant"
}

Capabilities

Complete pharmacy data extraction

Our Capsule scraper navigates location-based routing, dynamic React interfaces, and complex drug taxonomies to deliver structured catalogue and pricing data.

Full Drug Catalogue

Extract generic names, brand names, active ingredients, dosage forms, strengths, and NDC codes across the entire formulary.

Dynamic Retail Pricing

Capture cash prices and estimated copays for specific dosage and quantity combinations, timestamped per crawl.

OTC Inventory Tracking

Monitor over-the-counter product availability, pricing, and category placement across different delivery zones.

Insurance Network Mapping

Extract accepted insurance networks and standard tier statuses for prescription medications.

Geographic Coverage Areas

Map active service zones by iterating through ZIP codes to determine delivery availability and hub assignments.

Dosage Combinations

Extract all available strength and quantity permutations for a given medication to build a complete pricing matrix.

Manufacturer Data

Identify drug manufacturers and distributors listed for specific generic and brand-name medications.

Clinical Information

Scrape warnings, contraindications, side effects, and storage instructions surfaced on the medication detail pages.

Scheduled Updates

Run continuous pipelines at daily or weekly cadences to track formulary changes and price fluctuations.

// engagement pipeline

From target list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide drug names, NDC codes, or ZIP codes. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, location spoofing, and session management for capsule.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, price-outlier detection, and sample payloads before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
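
The schema validation and null-rate checks named in the Validation & QA step can be sketched roughly as follows. This is a minimal illustration, not our production code: the `SCHEMA` map covers only a few prescription_drugs fields from the data dictionary, and the function names are hypothetical.

```python
from typing import Any

# Expected types for a subset of prescription_drugs fields (illustrative only).
SCHEMA: dict[str, tuple[type, ...]] = {
    "ndc_code": (str,),
    "generic_name": (str,),
    "retail_price": (int, float),
    "rx_required": (bool,),
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return one message per field that is present but has the wrong type."""
    errors = []
    for field, expected in SCHEMA.items():
        value = record.get(field)
        if value is not None and not isinstance(value, expected):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
    return errors


def null_rates(records: list[dict[str, Any]]) -> dict[str, float]:
    """Fraction of records in which each schema field is missing or None."""
    total = len(records) or 1  # avoid division by zero on an empty batch
    return {
        field: sum(r.get(field) is None for r in records) / total
        for field in SCHEMA
    }
```

A batch whose null rate for any field exceeds an agreed threshold would be held back for review rather than delivered.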

Under the hood

How our Capsule pipeline handles the hard parts

Modern digital pharmacies use complex location routing and single-page application architectures. Here is how we build resilient pipelines.

pipeline-monitor · capsule.com · live (active)

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Location spoofing
Precise ZIP code iteration

Capsule's inventory and delivery options vary strictly by geographic location. We inject specific coordinates and ZIP codes into the browser session to map coverage and pricing across multiple metropolitan areas accurately.
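
As a rough sketch of the idea, Playwright lets a browser context be created with a fixed geolocation. The `ZIP_CENTROIDS` table, its approximate coordinates, and both function names below are assumptions made for illustration; a real pipeline would load coordinates from a geocoding dataset.

```python
from typing import Any

# Hypothetical ZIP -> (latitude, longitude) lookup with approximate values.
ZIP_CENTROIDS: dict[str, tuple[float, float]] = {
    "10001": (40.7506, -73.9972),  # Manhattan, NY (approximate)
}


def context_options(zip_code: str) -> dict[str, Any]:
    """Build keyword arguments for Playwright's browser.new_context()."""
    latitude, longitude = ZIP_CENTROIDS[zip_code]
    return {
        "geolocation": {"latitude": latitude, "longitude": longitude},
        "permissions": ["geolocation"],  # let the page read the spoofed position
        "locale": "en-US",
    }


def rendered_html_for_zip(zip_code: str, url: str) -> str:
    """Load `url` in a context pinned to the ZIP's coordinates."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**context_options(zip_code))
        page = context.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html
```

Keeping the option builder separate from the browser call makes the ZIP iteration easy to check without launching a browser.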

JavaScript rendering
Full Playwright execution for SPA content

Capsule relies heavily on React and dynamic API calls to render drug information and pricing. We run full Playwright browser sessions to hydrate the application state and capture data that plain HTTP clients miss entirely.

Schema stability
Resilient selectors with fallback chains

Healthcare platform interfaces evolve frequently. Our selector strategy uses multiple fallback chains per field — CSS selectors, XPath, and intercepted XHR responses — so a layout change does not break your data pipeline.
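
A fallback chain of this kind can be sketched as an ordered list of extractors tried one after another. In the illustration below, the `source` dict is a stand-in for one crawled page (values already pulled via CSS, XPath, and an intercepted XHR payload); its keys and the `PRICE_CHAIN` name are hypothetical.

```python
from typing import Any, Callable, Optional

Extractor = Callable[[dict], Any]


def first_match(source: dict, chain: list[Extractor]) -> Optional[Any]:
    """Try each extractor in order and return the first usable value."""
    for extract in chain:
        try:
            value = extract(source)
        except (KeyError, IndexError, TypeError):
            continue  # this strategy no longer matches the page; try the next
        if value is not None and value != "":
            return value
    return None


# Hypothetical chain for one field: CSS first, then XPath, then the XHR payload.
PRICE_CHAIN: list[Extractor] = [
    lambda s: s["css"]["price"],
    lambda s: s["xpath"]["price"],
    lambda s: s["xhr"]["pricing"]["cash_price"],
]
```

If the CSS selector stops matching after a redesign, the XPath or XHR extractor still returns a value, and the miss can be logged for selector maintenance.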

Change detection
Only re-scrape what has changed

For large drug catalogues, we maintain a hash index of last-seen values per field. Subsequent runs only push diffs — reducing compute cost and downstream processing load. You get a clean changelog rather than full re-dumps.
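
One way to picture the hash index is shown below. It is a minimal in-memory sketch under our own assumptions (SHA-256 over the JSON encoding of each value, a plain dict as the index); the function names are illustrative, and a real index would live in a persistent store.

```python
import hashlib
import json


def field_hashes(record: dict) -> dict[str, str]:
    """Hash each field value so the index stores fingerprints, not raw data."""
    return {
        field: hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()
        for field, value in record.items()
    }


def changed_fields(index: dict[str, dict[str, str]], key: str, record: dict) -> dict:
    """Return only the fields whose hash differs from the last-seen value,
    then update the index in place."""
    new_hashes = field_hashes(record)
    old_hashes = index.get(key, {})
    diff = {f: record[f] for f, h in new_hashes.items() if old_hashes.get(f) != h}
    index[key] = new_hashes
    return diff
```

On the first run every field counts as changed; on later runs an unchanged record yields an empty diff and nothing is pushed downstream.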

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, price outliers, schema drift, and coverage drops — and respond before you notice. SLA uptime is contractual.
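
Price-outlier alerts can be built on a robust statistic such as the median absolute deviation. The sketch below is one common approach, not a description of our exact rules; the function name and the 3.5 threshold are assumptions.

```python
from statistics import median


def price_outliers(prices: list[float], threshold: float = 3.5) -> list[float]:
    """Flag prices whose modified z-score, based on the median absolute
    deviation (MAD), exceeds `threshold`."""
    if len(prices) < 3:
        return []  # too few observations to judge
    mid = median(prices)
    mad = median(abs(p - mid) for p in prices)
    if mad == 0:
        # More than half the prices are identical: flag anything that differs.
        return [p for p in prices if p != mid]
    return [p for p in prices if 0.6745 * abs(p - mid) / mad > threshold]
```

A misplaced decimal point (1455.0 among prices near 145) is flagged, while ordinary day-to-day variation is not.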

Applications

Who uses Capsule data — and how

Teams across industries use capsule.com data to build competitive products and smarter operations.

01
Price Benchmarking

Retail pharmacies and telehealth providers monitor cash prices and copay estimates to maintain competitive pricing.

02
Market Research

Healthcare analysts track formulary additions, generic substitutions, and OTC category expansion trends.

03
Telehealth Integration

Digital health platforms map local pharmacy delivery coverage zones to route prescriptions efficiently.

04
Insurance Analysis

Payers and PBMs analyze accepted networks and tier placements across digital pharmacy platforms.

05
Supply Chain Monitoring

Manufacturers track stock availability and geographic distribution of their medications.

06
AI Training Data

ML teams use structured medication catalogues and clinical information to train healthcare NLP models.

Why DataFlirt

"Capsule maps the modern pharmacy experience, but extracting their pricing and coverage data requires navigating complex location-based routing and dynamic React interfaces."

Most teams underestimate the investment required: reliable Capsule scraping requires residential proxies, location spoofing, full JavaScript rendering, and daily selector maintenance. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.

Technical Spec

Capsule scraper — technical capabilities

Everything supported by our capsule.com scraper: rendered SPA elements, location-based routing, XHR interception, and more.

Location spoofing
Iterate through specific ZIP codes to capture localized inventory and pricing
Supported
JavaScript rendering
Full Playwright sessions to handle React application hydration
Supported
OTC product extraction
Capture non-prescription inventory, pricing, and stock status
Supported
Insurance copay mapping
Extract estimated copays and accepted insurance networks
Supported
Dosage permutation iteration
Extract pricing for all strength and quantity combinations per drug
Supported
XHR interception
Capture structured JSON payloads directly from backend API calls
Supported
Patient prescription history
Individual user prescription records and refill statuses
Not supported
User account profiles
Personal patient data, payment methods, and HIPAA-protected information
Not supported
Infrastructure

Infrastructure powering the Capsule pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, location spoofing, and interaction flows. The two are combined via the scrapy-playwright download handler.
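
For readers unfamiliar with scrapy-playwright, the wiring is mostly configuration. The fragment below follows the settings documented by that project; the browser choice is an assumption for illustration, not a statement of our production configuration.

```python
# settings.py fragment for a Scrapy project that renders pages with Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed browser choice for this sketch.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# In a spider, a request opts in to browser rendering through its meta:
#     yield scrapy.Request(url, meta={"playwright": True})
```

Requests without the `playwright` meta key go through Scrapy's regular downloader, so only pages that need rendering pay the browser cost.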

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US regions. Rotation happens per-request with sticky sessions where required. IP reputation monitoring keeps blacklisted addresses out of the pool.
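
Per-request rotation with sticky sessions can be sketched as below. The `ProxyRotator` class and the session naming are hypothetical; the sketch uses simple round-robin and leaves out health scoring.

```python
import itertools
from typing import Optional


class ProxyRotator:
    """Round-robin proxy selection with optional sticky sessions."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._cycle = itertools.cycle(proxies)
        self._sticky: dict[str, str] = {}

    def pick(self, session_id: Optional[str] = None) -> str:
        """Return the next proxy, or the pinned proxy for a known session."""
        if session_id is None:
            return next(self._cycle)
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def release(self, session_id: str) -> None:
        """Unpin a session so its next request rotates again."""
        self._sticky.pop(session_id, None)
```

A multi-step flow for one ZIP code would pass the same `session_id` on every request so the site sees a single consistent IP.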

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Excel spreadsheet format for immediate business analysis
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoints to query your extracted datasets on demand
PostgreSQL
Upsert into your existing schema with conflict resolution
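
To make the newline-delimited JSON option concrete, here is a small sketch of a writer that tags each record with a schema version. The `schema_version` key and the `write_ndjson` name are assumptions for illustration; the actual envelope is agreed during scoping.

```python
import io
import json
from typing import Iterable, TextIO


def write_ndjson(records: Iterable[dict], out: TextIO, schema_version: str) -> int:
    """Write one JSON object per line, tagging each with a schema version.
    Returns the number of records written."""
    count = 0
    for record in records:
        line = json.dumps({"schema_version": schema_version, **record}, sort_keys=True)
        out.write(line + "\n")
        count += 1
    return count


# Usage: write to an in-memory buffer; a file or upload stream works the same way.
buffer = io.StringIO()
write_ndjson([{"sku": "OTC-88492", "price": 22.99}], buffer, "1.0")
```

Because each line is a complete JSON object, warehouse loaders can ingest the file in parallel without parsing it as a whole.
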
// faq

Common questions.

About capsule.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Capsule legal?

Scraping publicly available information from Capsule is generally permissible under applicable law. DataFlirt targets only public, non-authenticated drug catalogues, pricing, and coverage data. We do not extract personal patient data, circumvent authentication walls, or violate HIPAA regulations.

How do you handle location-based data?

We utilize Playwright to inject specific geolocation coordinates and ZIP codes into the browser session, allowing us to map coverage areas and extract localized pricing accurately across different metropolitan regions.

Can you extract all dosage combinations for a drug?

Yes. Our pipeline iterates through all available strength and quantity selectors on a medication page to build a complete pricing matrix for every possible permutation.
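
In outline, the matrix is the Cartesian product of the strength and quantity options. In the sketch below, `lookup_price` is a hypothetical callback standing in for "select these options on the page and read the displayed price".

```python
from itertools import product
from typing import Callable


def pricing_matrix(
    strengths: list[str],
    quantities: list[int],
    lookup_price: Callable[[str, int], float],
) -> dict[tuple[str, int], float]:
    """Price every strength x quantity combination for one medication."""
    return {
        (strength, qty): lookup_price(strength, qty)
        for strength, qty in product(strengths, quantities)
    }
```

Two strengths and three quantities yield six priced combinations, each keyed by its `(strength, quantity)` pair.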

How fresh is the data?

Full catalogue refreshes run on a daily or weekly cadence and complete within an agreed window. Specific pricing monitors can be configured to run more frequently based on your requirements.

Do you extract OTC products as well?

Yes. We extract the entire over-the-counter catalogue, including pricing, stock status, descriptions, and category hierarchies.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 100 medications or specific ZIP codes as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality.

$ dataflirt scope --new-project --source=capsule.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off formulary dump or a continuous price-monitoring feed across multiple ZIP codes — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h