We extract company profiles, D-U-N-S numbers, corporate family trees, financials, and industry classifications. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Company Profiles objects from hoovers.com. All fields typed and schema-versioned.
"duns_number": "00-123-4567", "company_name": "Acme Corp", "city": "San Francisco", "state": "CA", "country": "USA", "employee_count": 1450, "revenue_usd": 250000000, "year_founded": 1998
| # | duns_number | company_name | legal_name | address | city | state |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Corporate Hierarchy objects from hoovers.com. All fields typed and schema-versioned.
"duns_number": "00-123-4567", "parent_duns": "00-987-6543", "global_ultimate_duns": "00-111-2222", "hierarchy_level": 3, "relationship_type": "Subsidiary", "parent_name": "Acme Holdings LLC", "subsidiary_count": 12
| # | duns_number | parent_duns | global_ultimate_duns | domestic_ultimate_duns | subsidiary_count | hierarchy_level |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Financials objects from hoovers.com. All fields typed and schema-versioned.
"duns_number": "00-123-4567", "fiscal_year": 2025, "annual_revenue": 250000000, "gross_profit": 85000000, "net_income": 12000000, "currency": "USD", "growth_pct": 14.2
| # | duns_number | fiscal_year | annual_revenue | gross_profit | net_income | total_assets |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Executives objects from hoovers.com. All fields typed and schema-versioned.
"duns_number": "00-123-4567", "first_name": "Jane", "last_name": "Doe", "job_title": "Chief Technology Officer", "department": "Engineering", "management_level": "C-Level", "email_format": "{first}.{last}@acmecorp.com"
| # | duns_number | contact_id | first_name | last_name | job_title | department |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Industry & Competitors objects from hoovers.com. All fields typed and schema-versioned.
"duns_number": "00-123-4567", "primary_naics": "511210", "primary_sic": "7372", "industry_description": "Software Publishers", "competitor_duns_list": "['00-222-3333', '00-444-5555']", "industry_rank": 4
| # | duns_number | primary_naics | primary_sic | secondary_naics | industry_description | competitor_duns_list |
|---|---|---|---|---|---|---|
Our Hoovers scraper navigates complex session states and rate limits to extract firmographics, hierarchies, and financial data with high fidelity.
Capture company names, addresses, employee counts, revenue figures, and foundational firmographics for any target list.
Extract and map the proprietary Dun & Bradstreet D-U-N-S numbers to your internal CRM records for master data management.
Crawl paginated hierarchy views to reconstruct parent, subsidiary, and global ultimate relationships.
Extract leadership teams, job titles, departments, and management levels associated with specific corporate entities.
Capture annual revenue, gross profit, net income, and asset metrics across multiple fiscal years.
Extract primary and secondary NAICS and SIC codes to categorise your target accounts accurately.
Extract Hoovers competitor lists to map market landscapes and identify overlapping accounts.
Run continuous pipelines to detect changes in employee counts, revenue, or executive leadership.
Detect DOM changes immediately. Our pipelines use fallback selectors to ensure data continuity.
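As a rough illustration of the fallback-selector idea, the sketch below tries an ordered list of selectors and returns the first non-empty match. The selector paths, the markup snippets, and the `extract_first` helper are hypothetical; they are not taken from hoovers.com or from our production parsers.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def extract_first(root: ET.Element, selectors: Sequence[str]) -> Optional[str]:
    """Return the stripped text of the first selector that yields non-empty text."""
    for path in selectors:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None  # every selector missed: a signal that the layout changed


# Hypothetical selectors, ordered from preferred to last resort.
EMPLOYEE_COUNT_SELECTORS = [
    ".//span[@class='employee-count']",
    ".//td[@id='employees']",
]

# Hypothetical markup before and after a layout change.
OLD_LAYOUT = "<div><span class='employee-count'>1450</span></div>"
NEW_LAYOUT = "<div><table><tr><td id='employees'>1450</td></tr></table></div>"

old_value = extract_first(ET.fromstring(OLD_LAYOUT), EMPLOYEE_COUNT_SELECTORS)
new_value = extract_first(ET.fromstring(NEW_LAYOUT), EMPLOYEE_COUNT_SELECTORS)
print(old_value, new_value)
```

A `None` result is worth logging separately from an ordinary missing value, since it suggests that every known selector has gone stale.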
Brief in. Clean data out.
Provide target company names, domains, or D-U-N-S numbers. We define the extraction schema together.
We configure crawlers, proxy rotation, session management, and authentication handling for hoovers.com.
Schema validation, null-rate checks, and data normalisation testing before full launch.
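A null-rate check can be sketched in a few lines. The field names below come from the Company Profiles sample on this page; the sample records, the 25% threshold, and the helper names are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable, List


def null_rates(records: List[Dict[str, Any]], fields: Iterable[str]) -> Dict[str, float]:
    """Return the share of records in which each field is missing, None, or empty."""
    total = len(records)
    rates: Dict[str, float] = {}
    for field in fields:
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


def failing_fields(rates: Dict[str, float], max_null_rate: float) -> List[str]:
    """Return the fields whose null rate exceeds the allowed threshold."""
    return sorted(field for field, rate in rates.items() if rate > max_null_rate)


# Hypothetical pilot batch.
batch = [
    {"duns_number": "00-123-4567", "company_name": "Acme Corp", "revenue_usd": 250000000},
    {"duns_number": "00-987-6543", "company_name": "Acme Holdings LLC", "revenue_usd": None},
    {"duns_number": "00-111-2222", "company_name": "", "revenue_usd": 1000000},
    {"duns_number": "00-555-6666", "company_name": "Example Ltd"},
]

rates = null_rates(batch, ["duns_number", "company_name", "revenue_usd"])
blocked = failing_fields(rates, max_null_rate=0.25)  # hypothetical threshold
print(rates, blocked)
```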
JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
Hoovers gates data behind complex session states and strict rate limits. We manage the infrastructure so you get clean records.
Hoovers monitors session states aggressively. Our pipeline maintains isolated cookie jars and rotates TLS fingerprints to prevent session invalidation.
We use US-based residential proxies with randomised request delays to stay under Hoovers' strict request-per-minute thresholds.
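One simple way to randomise request spacing is a fixed base delay plus a uniform random extra. The sketch below only computes the delays. The 4-second base and 2-second jitter are placeholder values, not the thresholds we actually run with.

```python
import random
from typing import List


def jittered_delays(count: int, base_seconds: float, jitter_seconds: float,
                    rng: random.Random) -> List[float]:
    """Return `count` delays, each base_seconds plus a uniform random extra."""
    return [base_seconds + rng.uniform(0.0, jitter_seconds) for _ in range(count)]


# Hypothetical values; a seeded generator keeps the example reproducible.
delays = jittered_delays(5, base_seconds=4.0, jitter_seconds=2.0, rng=random.Random(42))
print([round(delay, 2) for delay in delays])
```

In a crawler, each value would be passed to `time.sleep` before the next request is sent.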
Financial tables and extended executive lists require JavaScript execution. We run headless Playwright instances to render the full DOM.
Corporate hierarchies are often deeply nested and paginated. Our recursive crawlers traverse every branch to build complete parent-child maps.
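The traversal can be sketched as a recursive walk that exhausts every page of children before returning. The `PAGES` dictionary is a hypothetical in-memory stand-in for paginated hierarchy views, and `fetch_children_page` is an assumed helper; a real crawler would issue requests at that point.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for paginated hierarchy views:
# (parent_duns, page_number) -> (child_duns_list, next_page_or_None)
PAGES: Dict[Tuple[str, int], Tuple[List[str], Optional[int]]] = {
    ("00-111-2222", 1): (["00-987-6543"], 2),
    ("00-111-2222", 2): (["00-555-6666"], None),
    ("00-987-6543", 1): (["00-123-4567"], None),
}


def fetch_children_page(parent: str, page: int) -> Tuple[List[str], Optional[int]]:
    """Return one page of children plus the next page number, if any."""
    return PAGES.get((parent, page), ([], None))


def crawl_hierarchy(duns: str, parent_of: Optional[Dict[str, str]] = None,
                    seen: Optional[Set[str]] = None) -> Dict[str, str]:
    """Recursively visit every page under `duns` and return a child -> parent map."""
    parent_of = {} if parent_of is None else parent_of
    seen = set() if seen is None else seen
    seen.add(duns)
    page: Optional[int] = 1
    while page is not None:
        children, page = fetch_children_page(duns, page)
        for child in children:
            if child in seen:  # skip entities already visited, which also prevents cycles
                continue
            parent_of[child] = duns
            crawl_hierarchy(child, parent_of, seen)
    return parent_of


def global_ultimate(duns: str, parent_of: Dict[str, str]) -> str:
    """Walk up the parent links until an entity with no recorded parent is reached."""
    while duns in parent_of:
        duns = parent_of[duns]
    return duns


parent_of = crawl_hierarchy("00-111-2222")
print(parent_of, global_ultimate("00-123-4567", parent_of))
```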
We normalise raw text strings into structured data types, converting abbreviated figures such as '2.5M' into integers (2500000) and standardising global address formats.
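A minimal sketch of the numeric part of that normalisation follows. It assumes the only suffixes are K, M, and B and that commas are thousands separators. The function name and the accepted formats are illustrative, not a description of our full parser.

```python
import re
from decimal import Decimal

# Assumed suffix set; other abbreviations would need to be added explicitly.
_MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9}
_PATTERN = re.compile(r"^\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([KMB]?)$", re.IGNORECASE)


def parse_abbreviated_number(text: str) -> int:
    """Convert strings such as '2.5M' or '$1,450' into an integer."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised numeric format: {text!r}")
    digits, suffix = match.groups()
    # Decimal avoids binary floating-point rounding on values like 2.5 * 10**6.
    value = Decimal(digits.replace(",", "")) * _MULTIPLIERS[suffix.upper()]
    return int(value)


print(parse_abbreviated_number("2.5M"))
print(parse_abbreviated_number("$250M"))
print(parse_abbreviated_number("1,450"))
```

Raising on unrecognised input, rather than returning zero, keeps bad values visible to the null-rate and schema checks.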
Sales operations teams append accurate firmographics and D-U-N-S numbers to incomplete Salesforce or HubSpot records.
Revenue leaders use employee counts, revenue bands, and industry codes to carve equitable sales territories.
Data engineering teams use corporate family trees to link subsidiary accounts to global ultimate parents.
Strategy teams map industry landscapes by extracting competitor lists and financial growth metrics.
Procurement teams monitor the financial health and corporate structure of critical vendors.
Investment analysts screen for acquisition targets using specific revenue, growth, and industry criteria.
"Hoovers holds the definitive map of global B2B corporate structures, but extracting it requires navigating strict session controls and complex paginated hierarchies."
Most data teams waste months building custom scrapers for Hoovers, only to find their IP blocked or their schema broken by a minor DOM update. DataFlirt manages the proxies, session rotation, and parsing logic. You just query the normalised data in your warehouse.
Everything supported by our hoovers.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering and session token maintenance required by Hoovers.
We maintain pools of residential ISP proxies. Rotation happens with sticky sessions to prevent forced logouts and IP bans.
Pipelines run on AWS ECS. Airflow handles scheduling and dependency management. All state is stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About hoovers.com scraping, legality, and pipeline operations.
We use residential ISP proxies and enforce strict concurrency limits per IP. We also add randomised jitter between requests to mimic human browsing patterns and avoid triggering automated blocks.
Yes. Our crawlers recursively follow pagination links within the corporate hierarchy views to map every subsidiary back to the global ultimate parent.
No. Hoovers masks direct contact information behind a credit-based system. We extract the available metadata (names, titles, LinkedIn URLs) but cannot bypass credit-gated contact reveals.
Data freshness depends on your pipeline cadence. We can run daily, weekly, or monthly diffs against your target list to capture changes in employee count, revenue, or executive leadership.
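A diff between two runs can be sketched as a field-by-field comparison keyed on `duns_number`. The snapshots and the `diff_snapshots` helper below are hypothetical; `employee_count` and `revenue_usd` are the field names used in the Company Profiles sample.

```python
from typing import Any, Dict, Iterable, List


def diff_snapshots(previous: Dict[str, Dict[str, Any]],
                   current: Dict[str, Dict[str, Any]],
                   fields: Iterable[str]) -> List[Dict[str, Any]]:
    """Return one change row per tracked field that differs between two runs."""
    tracked = list(fields)
    changes: List[Dict[str, Any]] = []
    for duns, new in current.items():
        old = previous.get(duns)
        if old is None:  # newly added company: nothing to compare against
            continue
        for field in tracked:
            if old.get(field) != new.get(field):
                changes.append({"duns_number": duns, "field": field,
                                "old": old.get(field), "new": new.get(field)})
    return changes


# Hypothetical snapshots from two consecutive runs, keyed by D-U-N-S number.
last_week = {"00-123-4567": {"employee_count": 1450, "revenue_usd": 250000000}}
this_week = {
    "00-123-4567": {"employee_count": 1520, "revenue_usd": 250000000},
    "00-987-6543": {"employee_count": 80, "revenue_usd": 9000000},
}

changes = diff_snapshots(last_week, this_week, ["employee_count", "revenue_usd"])
print(changes)
```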
Yes. If you provide a list of company names and domains, we use search parameters to locate the correct profile and append the corresponding D-U-N-S number to your dataset.
Our minimum engagement starts with a defined list of 10,000 target companies. We price based on total volume and the frequency of updates required.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need to enrich 50,000 CRM records or map an entire industry sector, we scope, build, and operate the pipeline. Tell us what you need.