
Hoovers data, at warehouse scale.

We extract company profiles, D-U-N-S numbers, corporate family trees, financials, and industry classifications, delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Companies extracted: 840K per day
Executives mapped: 3.2M per day
Financial records: 142K per run
Active pipelines: 114
Uptime: 99.98%
Data Dictionary

Every field we extract from hoovers.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Company Profiles objects from hoovers.com. All fields typed and schema-versioned.

Fields: duns_number, company_name, legal_name, address, city, state, country, zip_code, phone, website, year_founded, description, employee_count, revenue_usd
Sample company_profiles record:
{
  "duns_number": "00-123-4567",
  "company_name": "Acme Corp",
  "city": "San Francisco",
  "state": "CA",
  "country": "USA",
  "employee_count": 1450,
  "revenue_usd": 250000000,
  "year_founded": 1998
}

Complete list of extractable fields for Corporate Hierarchy objects from hoovers.com. All fields typed and schema-versioned.

Fields: duns_number, parent_duns, global_ultimate_duns, domestic_ultimate_duns, subsidiary_count, hierarchy_level, relationship_type, parent_name, global_ultimate_name
Sample corporate_hierarchy record:
{
  "duns_number": "00-123-4567",
  "parent_duns": "00-987-6543",
  "global_ultimate_duns": "00-111-2222",
  "hierarchy_level": 3,
  "relationship_type": "Subsidiary",
  "parent_name": "Acme Holdings LLC",
  "subsidiary_count": 12
}

Complete list of extractable fields for Financials objects from hoovers.com. All fields typed and schema-versioned.

Fields: duns_number, fiscal_year, annual_revenue, gross_profit, net_income, total_assets, total_liabilities, fiscal_year_end, currency, growth_pct
Sample financials record:
{
  "duns_number": "00-123-4567",
  "fiscal_year": 2025,
  "annual_revenue": 250000000,
  "gross_profit": 85000000,
  "net_income": 12000000,
  "currency": "USD",
  "growth_pct": 14.2
}

Complete list of extractable fields for Executives objects from hoovers.com. All fields typed and schema-versioned.

Fields: duns_number, contact_id, first_name, last_name, job_title, department, management_level, email_format, linkedin_url, direct_phone
Sample executives record:
{
  "duns_number": "00-123-4567",
  "first_name": "Jane",
  "last_name": "Doe",
  "job_title": "Chief Technology Officer",
  "department": "Engineering",
  "management_level": "C-Level",
  "email_format": "{first}.{last}@acmecorp.com"
}

Complete list of extractable fields for Industry & Competitors objects from hoovers.com. All fields typed and schema-versioned.

Fields: duns_number, primary_naics, primary_sic, secondary_naics, industry_description, competitor_duns_list, market_share_pct, industry_rank
Sample Industry & Competitors record:
{
  "duns_number": "00-123-4567",
  "primary_naics": "511210",
  "primary_sic": "7372",
  "industry_description": "Software Publishers",
  "competitor_duns_list": ["00-222-3333", "00-444-5555"],
  "industry_rank": 4
}

Capabilities

Complete corporate intelligence extraction

Our Hoovers scraper navigates complex session states and rate limits to extract firmographics, hierarchies, and financial data with high fidelity.

Firmographic Extraction

Capture company names, addresses, employee counts, revenue figures, and foundational firmographics for any target list.

D-U-N-S Number Mapping

Extract and map the proprietary Dun & Bradstreet D-U-N-S numbers to your internal CRM records for master data management.
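Mapping D-U-N-S numbers onto CRM records usually starts with normalising the identifier on both sides, because spreadsheets and CRMs often store it as a number and silently drop leading zeros. A minimal sketch, assuming the nine-digit XX-XXX-XXXX form shown in the data dictionary samples and assuming CRM rows carry hypothetical `id` and `duns` keys; the function names are illustrative, not part of any DataFlirt or Dun & Bradstreet API.

```python
import re
from typing import Optional


def normalise_duns(raw) -> Optional[str]:
    """Return a D-U-N-S number as nine bare digits, or None if it cannot be one.

    Assumption: numeric CRM/spreadsheet exports strip leading zeros,
    so shorter values are left-padded back to nine digits.
    """
    digits = re.sub(r"\D", "", str(raw))  # drop dashes, spaces and other separators
    if not digits or len(digits) > 9:
        return None
    return digits.zfill(9)


def format_duns(duns: str) -> str:
    """Render nine digits in the XX-XXX-XXXX display form."""
    return f"{duns[:2]}-{duns[2:5]}-{duns[5:]}"


def match_by_duns(crm_rows, profiles):
    """Pair CRM row ids with extracted company names on the normalised D-U-N-S."""
    index = {}
    for profile in profiles:
        key = normalise_duns(profile["duns_number"])
        if key is not None:
            index[key] = profile
    matches = []
    for row in crm_rows:
        key = normalise_duns(row.get("duns", ""))
        if key in index:  # None never matches because it is never indexed
            matches.append((row["id"], index[key]["company_name"]))
    return matches


print(format_duns(normalise_duns(1234567)))  # 00-123-4567
```

Comparing on the bare nine-digit form keeps the join independent of whichever display format each system uses.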

Corporate Family Trees

Crawl paginated hierarchy views to reconstruct parent, subsidiary, and global ultimate relationships.

Executive Contacts

Extract leadership teams, job titles, departments, and management levels associated with specific corporate entities.

Financial Data

Capture annual revenue, gross profit, net income, and asset metrics across multiple fiscal years.
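With several fiscal years per company, you can recompute or sanity-check growth figures downstream. The data dictionary lists `growth_pct` without defining it, so this sketch assumes it means year-over-year growth in `annual_revenue`; the helper name is hypothetical.

```python
def yoy_revenue_growth(financials):
    """Return {fiscal_year: growth_pct} from one company's financials records.

    Assumed definition: (revenue_t - revenue_{t-1}) / revenue_{t-1} * 100,
    rounded to one decimal place like the sample growth_pct value.
    Years without a prior-year record, or with zero prior revenue, are skipped.
    """
    revenue = {r["fiscal_year"]: r["annual_revenue"] for r in financials}
    growth = {}
    for year in sorted(revenue):
        previous = revenue.get(year - 1)
        if previous:  # skips both a missing prior year (None) and zero revenue
            growth[year] = round((revenue[year] - previous) / previous * 100, 1)
    return growth


rows = [
    {"fiscal_year": 2024, "annual_revenue": 200_000_000},
    {"fiscal_year": 2025, "annual_revenue": 250_000_000},
]
print(yoy_revenue_growth(rows))  # {2025: 25.0}
```

Gaps in the fiscal-year sequence produce no value rather than a multi-year figure mislabelled as annual growth.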

Industry Classifications

Extract primary and secondary NAICS and SIC codes to categorise your target accounts accurately.

Competitor Mapping

Extract Hoovers competitor lists to map market landscapes and identify overlapping accounts.

Scheduled Updates

Run continuous pipelines to detect changes in employee counts, revenue, or executive leadership.

Automated Schema Validation

Detect DOM changes immediately. Our pipelines use fallback selectors to ensure data continuity.
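The fallback-selector idea can be illustrated as an ordered list of selectors tried until one returns non-empty text, with a total miss treated as a drift signal. A minimal sketch using the standard library's ElementTree XPath subset on well-formed markup; the class names, ids and attributes are hypothetical, not real Hoovers markup, and real pages would normally go through an HTML-tolerant parser such as Scrapy's selectors.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical selectors, ordered from the current layout to older fallbacks.
REVENUE_SELECTORS = [
    ".//span[@class='revenue-value']",
    ".//td[@data-field='revenue']",
    ".//div[@id='financial-summary']/span",
]


def extract_first(root: ET.Element, selectors) -> Optional[str]:
    """Return text from the first selector that yields a non-empty node, else None."""
    for path in selectors:
        for node in root.iterfind(path):
            text = (node.text or "").strip()
            if text:
                return text
    return None  # every selector missed: a likely layout change worth alerting on


page = ET.fromstring(
    "<div><div id='financial-summary'><span>$250M</span></div></div>"
)
print(extract_first(page, REVENUE_SELECTORS))  # $250M
```

Keeping selectors as ordered data rather than hard-coded logic makes it cheap to add a new primary selector while retaining the old ones as fallbacks.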

// engagement pipeline

From target list to warehouse record

Brief in. Clean data out.

Define Scope (Day 0)

Provide target company names, domains, or D-U-N-S numbers. We define the extraction schema together.

Pipeline Build (Days 2–4)

We configure crawlers, proxy rotation, session management, and authentication handling for hoovers.com.

Validation & QA (Days 4–6)

Schema validation, null-rate checks, and data normalisation testing before full launch.

Delivery (ongoing)

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on an agreed cadence.
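The Validation & QA step combines schema checks with null-rate checks. A minimal sketch of both, assuming records arrive as Python dicts keyed by the data-dictionary field names; the `SCHEMA` mapping and the thresholds are hypothetical, not DataFlirt's actual configuration.

```python
from typing import Dict, Iterable, List

# Hypothetical type expectations for a few company_profiles fields.
SCHEMA = {"duns_number": str, "company_name": str, "employee_count": int, "revenue_usd": int}


def null_rates(records: List[dict], fields: Iterable[str]) -> Dict[str, float]:
    """Share of records where each field is missing, None, or an empty string."""
    total = len(records)
    rates = {}
    for field in fields:
        nulls = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = nulls / total if total else 0.0
    return rates


def fields_over_threshold(records: List[dict], thresholds: Dict[str, float]) -> List[str]:
    """Fields whose null rate exceeds their per-field limit (limits are illustrative)."""
    rates = null_rates(records, thresholds)
    return sorted(f for f, limit in thresholds.items() if rates[f] > limit)


def type_errors(record: dict, schema: Dict[str, type] = SCHEMA) -> List[str]:
    """Fields that are present but have the wrong type, e.g. '1,450' where an int is expected."""
    return [
        field
        for field, expected in schema.items()
        if record.get(field) is not None and not isinstance(record[field], expected)
    ]


print(type_errors({"duns_number": "001234567", "employee_count": "1,450"}))  # ['employee_count']
```

Missing values are counted by the null-rate check and skipped by the type check, so each problem is reported once.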

Under the hood

How our Hoovers pipeline handles the hard parts

Hoovers gates data behind complex session states and strict rate limits. We manage the infrastructure so you get clean records.

Session management
Handling auth tokens and cookies

Hoovers monitors session states aggressively. Our pipeline maintains isolated cookie jars and rotates TLS fingerprints to prevent session invalidation.

Rate limit circumvention
IP rotation and delay jitter

We use US-based residential proxies with randomised request delays to stay under Hoovers' strict requests-per-minute thresholds.

JavaScript rendering
Playwright for dynamic tables

Financial tables and extended executive lists require JavaScript execution. We run headless Playwright instances to render the full DOM.

Hierarchy parsing
Reconstructing family trees

Corporate hierarchies are often deeply nested and paginated. Our recursive crawlers traverse every branch to build complete parent-child maps.
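Because each corporate_hierarchy record carries its own `parent_duns`, the family tree can be rebuilt offline from extracted rows, independent of the order in which pages were crawled. A minimal sketch, assuming the field names from the data dictionary; it uses an explicit stack instead of recursion so deep trees cannot hit Python's recursion limit, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_children(rows: List[dict]) -> Dict[str, List[str]]:
    """Map each parent D-U-N-S number to its direct children."""
    children = defaultdict(list)
    for row in rows:
        parent = row.get("parent_duns")
        if parent:
            children[parent].append(row["duns_number"])
    return dict(children)


def find_global_ultimate(duns: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow parent links to the top entity; stops instead of looping on a cycle."""
    seen = set()
    current = duns
    while parent_of.get(current) and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


def all_descendants(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Every entity below root, found depth-first with an explicit stack."""
    found, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


rows = [
    {"duns_number": "00-111-2222", "parent_duns": None},
    {"duns_number": "00-987-6543", "parent_duns": "00-111-2222"},
    {"duns_number": "00-123-4567", "parent_duns": "00-987-6543"},
]
parent_of = {r["duns_number"]: r["parent_duns"] for r in rows}
print(find_global_ultimate("00-123-4567", parent_of))  # 00-111-2222
print(all_descendants("00-111-2222", build_children(rows)))  # ['00-987-6543', '00-123-4567']
```

The computed top entity can be cross-checked against the extracted `global_ultimate_duns` field to catch missing branches.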

Data normalisation
Standardising addresses and revenue

We normalise raw text strings into structured data types, converting values such as '2.5M' into integers (2,500,000) and standardising global address formats.
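Converting abbreviated amounts such as '2.5M' into integers is a small parsing step. A minimal sketch, assuming K/M/B/T suffixes and an optional leading dollar sign; currency conversion (the financials object has a separate `currency` field) is out of scope, and the names are illustrative.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed suffix convention: K = thousand, M = million, B = billion, T = trillion.
MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
AMOUNT = re.compile(r"^\$?\s*(\d[\d,]*(?:\.\d+)?)\s*([KMBT]?)$", re.IGNORECASE)


def parse_amount(raw: str) -> Optional[int]:
    """Convert '2.5M', '$1,450' or '3B' to an integer; None if the text does not parse."""
    match = AMOUNT.match(raw.strip())
    if not match:
        return None
    # Decimal avoids binary floating-point rounding when scaling fractional values.
    number = Decimal(match.group(1).replace(",", ""))
    return int(number * MULTIPLIERS[match.group(2).upper()])


print(parse_amount("2.5M"))  # 2500000
```

Returning None for unparseable input lets the null-rate checks surface formats the parser does not yet handle.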

Applications

Who uses Hoovers data and how

Teams across industries use hoovers.com data to build competitive products and smarter operations.

01
CRM Enrichment

Sales operations teams append accurate firmographics and D-U-N-S numbers to incomplete Salesforce or HubSpot records.

02
Territory Planning

Revenue leaders use employee counts, revenue bands, and industry codes to carve equitable sales territories.

03
Master Data Management

Data engineering teams use corporate family trees to link subsidiary accounts to global ultimate parents.

04
Competitor Intelligence

Strategy teams map industry landscapes by extracting competitor lists and financial growth metrics.

05
Supply Chain Risk

Procurement teams monitor the financial health and corporate structure of critical vendors.

06
Private Equity Sourcing

Investment analysts screen for acquisition targets using specific revenue, growth, and industry criteria.
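Screening by revenue, growth, and industry, as in the territory-planning and private equity use cases, reduces to a filter over joined records. A minimal sketch, assuming each company dict has already been joined across the Financials and Industry & Competitors objects on `duns_number`; the thresholds are hypothetical. It relies on NAICS codes being hierarchical, so a prefix selects every code beneath it.

```python
from typing import Iterable, List


def screen(companies: Iterable[dict], min_revenue: int, min_growth_pct: float,
           naics_prefixes: Iterable[str]) -> List[str]:
    """D-U-N-S numbers of companies passing revenue, growth and industry filters.

    A NAICS prefix such as "5112" matches every six-digit code beneath it,
    including "511210". Missing revenue or growth excludes a company rather
    than being treated as zero.
    """
    prefixes = tuple(naics_prefixes)
    hits = []
    for company in companies:
        revenue = company.get("annual_revenue")
        growth = company.get("growth_pct")
        naics = company.get("primary_naics") or ""
        if revenue is None or growth is None:
            continue
        if revenue >= min_revenue and growth >= min_growth_pct and naics.startswith(prefixes):
            hits.append(company["duns_number"])
    return hits
```

Excluding incomplete records keeps the shortlist conservative; those companies can be reviewed separately.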

Why DataFlirt

"Hoovers holds the definitive map of global B2B corporate structures, but extracting it requires navigating strict session controls and complex paginated hierarchies."

Most data teams waste months building custom scrapers for Hoovers, only to find their IP blocked or their schema broken by a minor DOM update. DataFlirt manages the proxies, session rotation, and parsing logic. You just query the normalised data in your warehouse.

Technical Spec

Hoovers scraper technical capabilities

Everything supported by our hoovers.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering · Supported · Full Playwright sessions are required for dynamic financial tables
CAPTCHA bypass · Supported · Automated solver integration for rate-limit friction
Residential proxy rotation · Supported · ISP-grade US residential IPs, rotated to maintain session validity
D-U-N-S mapping · Supported · Extraction of primary identifiers for CRM matching
Corporate tree recursion · Supported · Traversal of multi-page parent/subsidiary structures
Change detection (diffs) · Supported · Hash-based diff that emits only records with changed firmographics
Webhook delivery · Supported · HTTP POST per record for immediate CRM ingestion
Premium industry reports · Partial · PDF downloads are gated behind premium-tier credits
Direct executive emails · Partial · Hoovers masks direct emails; revealing them requires manual credit expenditure
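Hash-based change detection fingerprints the fields you care about and emits a record only when its fingerprint differs from the previous run. A minimal sketch, assuming SHA-256 over canonical JSON of a hypothetical set of tracked fields, with an in-memory dict standing in for whatever persistent store a real pipeline would use.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator

# Hypothetical tracked firmographic fields; changes to other fields are ignored.
TRACKED_FIELDS = ("company_name", "employee_count", "revenue_usd", "city", "state", "country")


def fingerprint(record: dict) -> str:
    """SHA-256 of the tracked fields, serialised with sorted keys so key order never matters."""
    subset = {field: record.get(field) for field in TRACKED_FIELDS}
    payload = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records: Iterable[dict], previous: Dict[str, str]) -> Iterator[dict]:
    """Yield records whose fingerprint differs from the last run, updating `previous`.

    `previous` maps duns_number -> fingerprint and would live in a persistent
    store in production rather than an in-memory dict.
    """
    for record in records:
        digest = fingerprint(record)
        key = record["duns_number"]
        if previous.get(key) != digest:
            previous[key] = digest
            yield record


state: Dict[str, str] = {}
first = [{"duns_number": "001234567", "company_name": "Acme Corp", "employee_count": 1450}]
print(len(list(changed_records(first, state))))  # 1 (new record)
print(len(list(changed_records(first, state))))  # 0 (unchanged)
```

Storing only a digest per D-U-N-S number keeps the comparison state small, whatever the width of the full record.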
Infrastructure

Infrastructure powering the Hoovers pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering and session token maintenance required by Hoovers.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies. Rotation happens with sticky sessions to prevent forced logouts and IP bans.

Cloud-Native Orchestration

Pipelines run on AWS ECS. Airflow handles scheduling and dependency management. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested objects
CSV: flat file with typed columns
XLS: Excel format for business users
Parquet: columnar format for BigQuery and Snowflake
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record
API: REST endpoints to query extracted records
PostgreSQL: direct database inserts
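Newline-delimited JSON places one complete JSON object on each line, which is the layout BigQuery's JSON load jobs expect and which streams well into warehouse stages. A minimal standard-library writer and reader; the function names are illustrative.

```python
import io
import json
from typing import IO, Iterable, List


def write_ndjson(records: Iterable[dict], stream: IO[str]) -> int:
    """Write one compact JSON object per line and return how many were written."""
    count = 0
    for record in records:
        # ensure_ascii=False keeps non-ASCII company names readable; write files as UTF-8.
        stream.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n")
        count += 1
    return count


def read_ndjson(stream: IO[str]) -> List[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


buffer = io.StringIO()
write_ndjson([{"duns_number": "001234567", "city": "San Francisco"}], buffer)
print(buffer.getvalue(), end="")  # {"duns_number":"001234567","city":"San Francisco"}
```

Because every line is a self-contained record, files can be split, appended to, or loaded in parallel without parsing a single enclosing array.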
// faq

Common questions.

About hoovers.com scraping, legality, and pipeline operations.

How do you handle Hoovers rate limits?

We use residential ISP proxies and enforce strict concurrency limits per IP. We also add randomised jitter between requests to mimic human browsing patterns and avoid triggering automated blocks.

Can you extract complete corporate family trees?

Yes. Our crawlers recursively follow pagination links within the corporate hierarchy views to map every subsidiary back to the global ultimate parent.

Do you provide direct executive emails?

No. Hoovers masks direct contact information behind a credit-based system. We extract the available metadata (names, titles, LinkedIn URLs) but cannot bypass credit-gated contact reveals.

How fresh is the firmographic data?

Data freshness depends on your pipeline cadence. We can run daily, weekly, or monthly diffs against your target list to capture changes in employee count, revenue, or executive leadership.

Can you match my existing CRM records to Hoovers data?

Yes. If you provide a list of company names and domains, we use search parameters to locate the correct profile and append the corresponding D-U-N-S number to your dataset.

What is the minimum viable engagement?

Our minimum engagement starts with a defined list of 10,000 target companies. We price based on total volume and the frequency of updates required.
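The CRM-matching answer describes locating profiles from company names and domains. Once profiles are extracted, a first-pass offline join on website domain can be sketched as follows. The `website` field comes from the data dictionary; the CRM `domain` and `id` keys and the function names are hypothetical. Subsidiaries often share a parent's domain, so ambiguous matches are left unset rather than guessed.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def normalise_domain(value: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without a leading 'www.'."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # makes urlsplit treat the input as a network location
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def attach_duns(crm_accounts: List[dict], profiles: List[dict]) -> List[dict]:
    """Copy each CRM account, adding duns_number when exactly one profile shares its domain."""
    by_domain: Dict[str, List[str]] = {}
    for profile in profiles:
        domain = normalise_domain(profile.get("website") or "")
        if domain:
            by_domain.setdefault(domain, []).append(profile["duns_number"])
    enriched = []
    for account in crm_accounts:
        candidates = by_domain.get(normalise_domain(account.get("domain") or ""), [])
        # Ambiguous (several profiles) or missing matches stay None for manual review.
        duns: Optional[str] = candidates[0] if len(candidates) == 1 else None
        enriched.append({**account, "duns_number": duns})
    return enriched


print(normalise_domain("https://www.AcmeCorp.com/about"))  # acmecorp.com
```

Unmatched or ambiguous accounts are the natural candidates for the name-based search step.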


Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need to enrich 50,000 CRM records or map an entire industry sector, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h