We extract 501(c)(3) tax filings, executive compensation, financial metrics, and governance data from Guidestar. Delivered as clean JSON, CSV, or Parquet to your infrastructure.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Organization Profile objects from guidestar.org. All fields typed and schema-versioned.
{"ein": "12-3456789", "organization_name": "Global Health Initiative", "ntee_code": "E19", "ruling_year": 1998, "city": "Seattle", "state": "WA", "website": "https://example.org"}
| # | ein | organization_name | ntee_code | ruling_year | mission_statement | address_line_1 |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Financial Metrics objects from guidestar.org. All fields typed and schema-versioned.
{"ein": "12-3456789", "fiscal_year": 2023, "total_revenue": 4500000.0, "total_expenses": 4100000.0, "net_assets": 1200000.0, "contributions_gifts": 3800000.0}
| # | ein | fiscal_year | total_revenue | total_expenses | net_assets | total_assets |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Executive Compensation objects from guidestar.org. All fields typed and schema-versioned.
{"executive_name": "Jane Doe", "title": "Chief Executive Officer", "base_salary": 185000.0, "bonus": 15000.0, "total_compensation": 200000.0, "hours_per_week": 40.0}
| # | ein | executive_name | title | base_salary | bonus | other_compensation |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Governance & Board objects from guidestar.org. All fields typed and schema-versioned.
{"board_size": 12, "independent_voting_members": 11, "conflict_of_interest_policy": true, "whistleblower_policy": true, "document_retention_policy": true, "audit_committee": true}
| # | ein | board_chair_name | board_size | independent_voting_members | conflict_of_interest_policy | whistleblower_policy |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Programs & Impact objects from guidestar.org. All fields typed and schema-versioned.
{"program_1_expense": 2100000.0, "populations_served": "Low-income youth", "geographic_area": "Pacific Northwest", "evaluation_method": "Annual external audit", "impact_metrics": "15,000 students reached", "program_1_description": "After-school tutoring"}
| # | ein | program_1_description | program_1_expense | program_2_description | program_2_expense | populations_served |
|---|---|---|---|---|---|---|
Our Guidestar scraper handles profile lookups, financial parsing, and Form 990 extraction with session management and anti-bot circumvention built in.
Extract core organizational data, mission statements, and contact details using EIN or name matching.
Automated extraction of financial metrics from historical and current IRS Form 990 filings.
Capture base salary, bonuses, and total compensation for key officers and board members.
Standardise revenue, expenses, and asset metrics across different filing years and formats.
Identify board composition, independent voting members, and governance policy adherence.
Calculate programmatic efficiency by extracting detailed functional expense breakdowns.
Track grants paid out by foundations, including recipient details and grant amounts.
Access filing history to build time-series models of nonprofit growth and financial health.
Run scheduled pipelines to capture new tax filings and profile updates as they appear.
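As an illustration of the programmatic-efficiency capability listed above, here is a minimal Python sketch of one common measure: program service expenses as a share of total functional expenses. The class and field names (`FunctionalExpenses`, `program_services`, `management_general`, `fundraising`) are hypothetical, chosen for this example rather than taken from our delivered schema, and the split of the sample figures is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionalExpenses:
    """Functional expense totals for one filing (hypothetical field names)."""
    program_services: float
    management_general: float
    fundraising: float

    @property
    def total(self) -> float:
        return self.program_services + self.management_general + self.fundraising


def program_expense_ratio(expenses: FunctionalExpenses) -> Optional[float]:
    """Share of total functional expenses spent on program services.

    Returns None when the total is not positive, so a blank filing
    is not mistaken for a 0% ratio.
    """
    if expenses.total <= 0:
        return None
    return expenses.program_services / expenses.total


# Made-up breakdown that sums to 4,100,000 in total expenses.
example = FunctionalExpenses(
    program_services=3_280_000.0,
    management_general=574_000.0,
    fundraising=246_000.0,
)
print(f"{program_expense_ratio(example):.1%}")  # prints 80.0%
```

Returning `None` instead of `0.0` for empty filings keeps missing data distinguishable from a genuinely low ratio in downstream aggregates.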
Brief in. Clean data out.
Provide EIN lists, NTEE codes, or geographic filters. We design the extraction schema together.
We configure Scrapy crawlers, proxy rotation, and PDF parsing logic for guidestar.org.
Schema validation, null-rate checks, and financial outlier detection before full launch.
JSON, CSV, or Parquet pushed to your S3 bucket or data warehouse on agreed cadence.
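The validation step in the process above (null-rate checks and financial outlier detection) could be sketched roughly as follows. This is an assumption about one reasonable approach, not a description of our production checks: outliers are flagged with a modified z-score based on the median absolute deviation, and the function names, the 3.5 threshold, and the sample EINs and figures are all hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence


def null_rates(records: Sequence[dict], fields: Sequence[str]) -> Dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices of values whose modified z-score exceeds the threshold."""
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, nothing to flag
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up records; the fifth revenue looks like a units error.
records: List[Dict[str, Optional[object]]] = [
    {"ein": "12-3456789", "total_revenue": 4_500_000.0},
    {"ein": "98-7654321", "total_revenue": 4_100_000.0},
    {"ein": "11-1111111", "total_revenue": 3_900_000.0},
    {"ein": "22-2222222", "total_revenue": 4_300_000.0},
    {"ein": "33-3333333", "total_revenue": 4_500_000_000.0},
    {"ein": "44-4444444", "total_revenue": None},
]

rates = null_rates(records, ["ein", "total_revenue"])
revenues = [r["total_revenue"] for r in records if r["total_revenue"] is not None]
flagged = mad_outliers(revenues)
print(rates)
print(flagged)  # [4]
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation enough to hide itself in small samples.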
Extracting nonprofit data requires bypassing rate limits and parsing complex tax documents. Here is how we build resilient pipelines.
Guidestar relies heavily on PDF tax filings. We use OCR and structured document parsing to extract line-item financial data from historical 990s, converting unstructured PDFs into queryable JSON.
Guidestar strictly limits search volume and profile views per IP. Our crawlers use residential ISP proxies with realistic request timing to maintain continuous extraction without triggering blocks.
IRS forms change over time. Our pipeline normalises financial fields across different versions of Form 990, 990-EZ, and 990-PF, ensuring consistent downstream data.
Guidestar truncates broad search results. We programmatically segment searches by NTEE code, state, and revenue brackets to extract the complete catalogue of 501(c)(3) organizations.
Accessing detailed financial data requires authenticated sessions. We manage cookie jars and session tokens securely, rotating credentials to maintain uninterrupted access to public filings.
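To make the form-version normalisation described above more concrete, here is a minimal Python sketch that maps variant-specific source fields onto one canonical record. The source keys (`f990_total_revenue`, `ez_total_revenue`, and so on) are placeholders invented for this example; they are not real IRS line identifiers or the keys our parsers emit.

```python
from typing import Any, Dict, Optional

# Canonical field -> source key, per form variant.
# All source keys below are hypothetical placeholders.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "990": {
        "total_revenue": "f990_total_revenue",
        "total_expenses": "f990_total_expenses",
        "net_assets": "f990_net_assets",
    },
    "990-EZ": {
        "total_revenue": "ez_total_revenue",
        "total_expenses": "ez_total_expenses",
        "net_assets": "ez_net_assets",
    },
    "990-PF": {
        "total_revenue": "pf_total_revenue",
        "total_expenses": "pf_total_expenses",
        "net_assets": "pf_net_assets",
    },
}


def to_amount(value: Any) -> Optional[float]:
    """Convert a parsed value such as '150,000' or 140000 to a float."""
    if value is None:
        return None
    return float(str(value).replace(",", ""))


def normalise_filing(form_type: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a canonical record; fields absent from the filing become None."""
    if form_type not in FIELD_MAPS:
        raise ValueError(f"Unsupported form type: {form_type!r}")
    record: Dict[str, Any] = {"form_type": form_type}
    for canonical, source_key in FIELD_MAPS[form_type].items():
        record[canonical] = to_amount(raw.get(source_key))
    return record


parsed = {"ez_total_revenue": "150,000", "ez_total_expenses": 140000}
normalised = normalise_filing("990-EZ", parsed)
print(normalised)
```

Keeping the mapping in data rather than in branching code means a new form revision only needs a new dictionary entry.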
Foundations use NTEE codes and financial metrics to identify eligible nonprofits for grantmaking initiatives.
Advisors track executive compensation to identify high-net-worth individuals in the nonprofit sector.
Software vendors target nonprofits based on revenue size, employee count, and technology budgets.
Researchers analyze long-term trends in nonprofit financial health, executive pay gaps, and sector growth.
Major gift officers review governance policies and board composition to evaluate organizational stability.
Think tanks aggregate program expense ratios and geographic data to measure the impact of social programs.
"Guidestar holds the definitive record of US nonprofit financials, but extracting structured data from millions of Form 990 filings requires serious infrastructure."
Most teams underestimate the complexity of parsing historical tax filings and bypassing aggressive rate limits. DataFlirt manages the residential proxies, document extraction, and schema normalisation so your data engineers can focus on downstream analysis rather than pipeline maintenance.
Everything supported by our guidestar.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright manages JavaScript rendering and authenticated sessions required for detailed profile views.
We maintain pools of residential ISP proxies across US regions. Rotation happens per-request with sticky sessions where required to bypass rate limits.
Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About guidestar.org scraping, legality, and pipeline operations.
Scraping publicly available information, such as public IRS Form 990 filings, is generally permissible. DataFlirt targets only public profile data and tax documents. We do not extract proprietary Guidestar Pro analytics or bypass premium paywalls. Clients should review Guidestar's ToS and consult legal counsel for specific use cases.
We use OCR and structured document parsing libraries to extract specific line items from IRS PDFs. Our pipeline normalises these fields across different tax years and form variants (990, 990-EZ, 990-PF).
Yes. We can extract and aggregate financial data from multiple historical filings to build a time-series view of an organization's revenue and expenses.
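As a rough illustration of the time-series view mentioned in the answer above, the following Python sketch orders filings by fiscal year and attaches year-over-year revenue growth. The function name `revenue_time_series`, the `yoy_growth` field, and the sample figures are assumptions made for this example; only `ein`, `fiscal_year`, and `total_revenue` come from the Financial Metrics fields listed earlier.

```python
from typing import Any, Dict, List, Optional


def revenue_time_series(filings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order filings by fiscal year and attach year-over-year revenue growth.

    Growth is None for the first year, after a gap in filing years,
    or when either revenue figure is missing or the prior one is zero.
    """
    # If a year appears twice (for example an amended return), the later entry wins.
    by_year = {f["fiscal_year"]: f for f in filings}
    series: List[Dict[str, Any]] = []
    prev_year: Optional[int] = None
    prev_revenue: Optional[float] = None
    for year in sorted(by_year):
        revenue = by_year[year].get("total_revenue")
        growth = None
        if (
            prev_year == year - 1
            and prev_revenue  # excludes None and zero
            and revenue is not None
        ):
            growth = (revenue - prev_revenue) / prev_revenue
        series.append(
            {"fiscal_year": year, "total_revenue": revenue, "yoy_growth": growth}
        )
        prev_year, prev_revenue = year, revenue
    return series


# Made-up filings for a single organization, deliberately out of order.
filings = [
    {"ein": "12-3456789", "fiscal_year": 2023, "total_revenue": 4_500_000.0},
    {"ein": "12-3456789", "fiscal_year": 2021, "total_revenue": 4_000_000.0},
    {"ein": "12-3456789", "fiscal_year": 2022, "total_revenue": 4_200_000.0},
]
series = revenue_time_series(filings)
for row in series:
    print(row)
```

Leaving growth empty across a gap in filing years avoids reporting a multi-year change as if it were annual.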
We use US-based residential ISP proxies and realistic request timing. Our crawlers distribute requests across large IP pools to maintain extraction velocity without triggering blocks.
Yes. We extract the compensation tables from Form 990s, capturing base salary, bonuses, and total compensation for listed officers and key employees.
Our smallest packages start with a defined EIN list (typically 5,000 to 50,000 profiles) and monthly delivery. For full database extractions, we price based on volume and delivery frequency.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a targeted EIN lookup or a continuous feed of nonprofit financials across 1.8M profiles, we scope, build, and operate the pipeline. Tell us what you need.