
Nonprofit intelligence,
at warehouse scale.

We extract 501(c)(3) tax filings, executive compensation, financial metrics, and governance data from Guidestar. Delivered as clean JSON, CSV, or Parquet to your infrastructure.

Nonprofits extracted
1.8M /month
Form 990s parsed
4.2M /run
Exec profiles
840K /run
Active pipelines
37
Uptime
99.98%
Data Dictionary

Every field we extract from guidestar.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Organization Profile objects from guidestar.org. All fields typed and schema-versioned.

ein, organization_name, ntee_code, ruling_year, mission_statement, address_line_1, city, state, zip_code, website, phone_number
organization_profile
● 200 OK
{
  "ein": "12-3456789",
  "organization_name": "Global Health Initiative",
  "ntee_code": "E19",
  "ruling_year": 1998,
  "city": "Seattle",
  "state": "WA",
  "website": "https://example.org"
}

Complete list of extractable fields for Financial Metrics objects from guidestar.org. All fields typed and schema-versioned.

ein, fiscal_year, total_revenue, total_expenses, net_assets, total_assets, total_liabilities, contributions_gifts, grants_paid, program_service_revenue
financial_metrics
● 200 OK
{
  "ein": "12-3456789",
  "fiscal_year": 2023,
  "total_revenue": 4500000.0,
  "total_expenses": 4100000.0,
  "net_assets": 1200000.0,
  "contributions_gifts": 3800000.0
}

Complete list of extractable fields for Executive Compensation objects from guidestar.org. All fields typed and schema-versioned.

ein, executive_name, title, base_salary, bonus, other_compensation, total_compensation, hours_per_week, report_year
executive_compensation
● 200 OK
{
  "executive_name": "Jane Doe",
  "title": "Chief Executive Officer",
  "base_salary": 185000.0,
  "bonus": 15000.0,
  "total_compensation": 200000.0,
  "hours_per_week": 40.0
}

Complete list of extractable fields for Governance & Board objects from guidestar.org. All fields typed and schema-versioned.

ein, board_chair_name, board_size, independent_voting_members, conflict_of_interest_policy, whistleblower_policy, document_retention_policy, ceo_compensation_process, audit_committee
governance_board
● 200 OK
{
  "board_size": 12,
  "independent_voting_members": 11,
  "conflict_of_interest_policy": true,
  "whistleblower_policy": true,
  "document_retention_policy": true,
  "audit_committee": true
}

Complete list of extractable fields for Programs & Impact objects from guidestar.org. All fields typed and schema-versioned.

ein, program_1_description, program_1_expense, program_2_description, program_2_expense, populations_served, geographic_area, impact_metrics, evaluation_method
programs_impact
● 200 OK
{
  "program_1_expense": 2100000.0,
  "populations_served": "Low-income youth",
  "geographic_area": "Pacific Northwest",
  "evaluation_method": "Annual external audit",
  "impact_metrics": "15,000 students reached",
  "program_1_description": "After-school tutoring"
}

Capabilities

Everything you need from Guidestar

Our Guidestar scraper handles profile lookups, financial parsing, and Form 990 extraction with session management and anti-bot circumvention built in.

EIN & Profile Lookups

Extract core organizational data, mission statements, and contact details using EIN or name matching.

Form 990 Parsing

Automated extraction of financial metrics from historical and current IRS Form 990 filings.

Executive Compensation Tracking

Capture base salary, bonuses, and total compensation for key officers and board members.

Financial Normalisation

Standardise revenue, expenses, and asset metrics across different filing years and formats.

Board Member Extraction

Identify board composition, independent voting members, and governance policy adherence.

Program Expense Ratios

Calculate programmatic efficiency by extracting detailed functional expense breakdowns.

Grantmaking Intelligence

Track grants paid out by foundations, including recipient details and grant amounts.

Historical Data Snapshots

Access filing history to build time-series models of nonprofit growth and financial health.

Continuous Pipeline Updates

Run scheduled pipelines to capture new tax filings and profile updates as they appear.
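
As a rough illustration of the program expense ratio mentioned above, the sketch below divides summed program expenses by total expenses. The field names follow the data dictionary; the function name `program_expense_ratio`, the `program_N_expense` naming pattern, and the choice to skip missing values are assumptions for illustration, not a description of the production pipeline.

```python
from typing import Optional


def program_expense_ratio(record: dict) -> Optional[float]:
    """Share of total expenses spent on programs, or None if undefined."""
    total = record.get("total_expenses")
    if not total:  # missing or zero total: the ratio is undefined
        return None
    # Sum every program_N_expense field present on the record (assumed naming).
    program_spend = sum(
        value
        for key, value in record.items()
        if key.startswith("program_") and key.endswith("_expense") and value is not None
    )
    return program_spend / total


# Values taken from the sample records in the data dictionary.
sample = {
    "ein": "12-3456789",
    "total_expenses": 4100000.0,
    "program_1_expense": 2100000.0,
    "program_2_expense": None,
}
ratio = program_expense_ratio(sample)
print(f"{ratio:.2%}")
```

With the sample figures this comes to roughly 51%. A fuller calculation would draw on the functional expense breakdown of the filing rather than individual program fields.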

// engagement pipeline

From EIN list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide EIN lists, NTEE codes, or geographic filters. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, and PDF parsing logic for guidestar.org.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and financial outlier detection before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket or data warehouse on agreed cadence.
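
The null-rate check from the Validation & QA step can be sketched in a few lines. This is a minimal example under stated assumptions: the helper names `null_rates` and `failing_fields`, the 5% threshold, and all EINs other than the sample one are hypothetical.

```python
from collections import Counter


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    if not records:
        return {field: 0.0 for field in fields}
    missing = Counter()
    for record in records:
        for field in fields:
            if record.get(field) is None:
                missing[field] += 1
    return {field: missing[field] / len(records) for field in fields}


def failing_fields(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the threshold (5% is an assumed default)."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


# Illustrative batch; only the first EIN appears in the sample records.
batch = [
    {"ein": "12-3456789", "total_revenue": 4500000.0, "net_assets": 1200000.0},
    {"ein": "98-7654321", "total_revenue": None, "net_assets": 300000.0},
    {"ein": "11-1111111", "total_revenue": 250000.0},
    {"ein": "22-2222222", "total_revenue": 90000.0, "net_assets": 15000.0},
]
rates = null_rates(batch, ["ein", "total_revenue", "net_assets"])
print(failing_fields(rates))
```

A batch that fails the threshold would be held back for review rather than delivered.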

Under the hood

How our Guidestar pipeline handles the hard parts

Extracting nonprofit data requires bypassing rate limits and parsing complex tax documents. Here is how we build resilient pipelines.

pipeline-monitor · guidestar.org · live (active)

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued, running

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

PDF extraction
Automated Form 990 parsing

Guidestar relies heavily on PDF tax filings. We use OCR and structured document parsing to extract line-item financial data from historical 990s, converting unstructured PDFs into queryable JSON.
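
To make the last step concrete, here is a minimal sketch of turning already-OCR'd page text into a JSON record. It assumes the OCR stage has produced plain text with one line item per line; the label-to-field mapping, the sample page layout, and the helper names are hypothetical, and real filings need far more tolerant matching.

```python
import json
import re

# Hypothetical labels as they might appear in OCR output, mapped to the
# field names used in the data dictionary.
LINE_ITEMS = {
    "Total revenue": "total_revenue",
    "Total expenses": "total_expenses",
    "Net assets or fund balances": "net_assets",
}

# The amount is the last token on the line, e.g. 4,500,000 or (12,000).
AMOUNT_AT_END = re.compile(r"(-?\(?\d[\d,]*\)?)\s*$")


def parse_amount(text: str) -> float:
    """Convert '4,500,000' or '(12,000)' (accounting negative) to a float."""
    negative = text.startswith("(") and text.endswith(")")
    value = float(text.strip("()").replace(",", ""))
    return -value if negative else value


def extract_line_items(ocr_text: str) -> dict:
    """Pick out known line items from OCR text, keyed by canonical field name."""
    record = {}
    for line in ocr_text.splitlines():
        for label, field in LINE_ITEMS.items():
            if line.strip().startswith(label):
                match = AMOUNT_AT_END.search(line)
                if match:
                    record[field] = parse_amount(match.group(1))
    return record


# Simplified, made-up page layout: label, dot leaders, line number, amount.
sample_page = """
Total revenue . . . . . . . . . . 12    4,500,000
Total expenses . . . . . . . . . 18    4,100,000
Net assets or fund balances . . . 22    1,200,000
"""
record = extract_line_items(sample_page)
print(json.dumps(record, indent=2))
```

Anchoring the amount to the end of the line keeps the form's own line numbers from being mistaken for values.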

Rate limits
Residential proxy rotation

Guidestar strictly limits search volume and profile views per IP. Our crawlers use residential ISP proxies with realistic request timing to maintain continuous extraction without triggering blocks.

Schema stability
Normalisation across tax years

IRS forms change over time. Our pipeline normalises financial fields across different versions of Form 990, 990-EZ, and 990-PF, ensuring consistent downstream data.
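
One common way to do this kind of normalisation is an alias table per form variant that maps source keys onto a single canonical schema. The sketch below assumes that approach; every source key in `FIELD_ALIASES` is invented for illustration, as are the function name and the sample values.

```python
# Hypothetical source keys per form variant. Real filings use labels and
# line numbers that differ by form and by tax year.
FIELD_ALIASES = {
    "990": {
        "total_revenue_line": "total_revenue",
        "total_expenses_line": "total_expenses",
        "net_assets_eoy": "net_assets",
    },
    "990-EZ": {
        "ez_total_revenue": "total_revenue",
        "ez_total_expenses": "total_expenses",
        "ez_net_assets": "net_assets",
    },
    "990-PF": {
        "pf_total_revenue": "total_revenue",
        "pf_total_expenses": "total_expenses",
        "pf_net_assets": "net_assets",
    },
}

CANONICAL_FIELDS = ("total_revenue", "total_expenses", "net_assets")


def normalise(form_type: str, raw: dict, ein: str, fiscal_year: int) -> dict:
    """Map a raw extraction onto the canonical schema; absent fields become None."""
    aliases = FIELD_ALIASES[form_type]  # unknown form types raise KeyError on purpose
    record = {"ein": ein, "fiscal_year": fiscal_year, "form_type": form_type}
    record.update({field: None for field in CANONICAL_FIELDS})
    for source_key, value in raw.items():
        canonical = aliases.get(source_key)
        if canonical is not None:
            record[canonical] = float(value)
    return record


full = normalise("990", {"total_revenue_line": "4500000", "net_assets_eoy": 1200000},
                 ein="12-3456789", fiscal_year=2023)
short = normalise("990-EZ", {"ez_total_revenue": "250000", "ignored_key": 1},
                  ein="12-3456789", fiscal_year=2015)
print(full)
print(short)
```

Both records come out with the same keys, so downstream queries do not need to know which form variant a given year was filed on.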

Search pagination
Deep category crawling

Guidestar truncates broad search results. We programmatically segment searches by NTEE code, state, and revenue brackets to extract the complete catalogue of 501(c)(3) organizations.
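
The segmentation idea can be sketched as a cross product of filters, so that each combination stays small enough to page through in full. The bracket boundaries, the subset of codes and states, and the output key names below are assumptions chosen for illustration; they are not Guidestar query parameters.

```python
from itertools import product

NTEE_MAJOR_GROUPS = ["A", "B", "E"]  # small subset for illustration
STATES = ["WA", "OR"]
# (min, max) in USD; None means no upper bound. Boundaries are assumed.
REVENUE_BRACKETS = [(0, 100_000), (100_000, 1_000_000), (1_000_000, None)]


def search_segments(ntee_codes, states, brackets):
    """Yield one narrow search definition per (NTEE, state, revenue) combination."""
    for ntee, state, (low, high) in product(ntee_codes, states, brackets):
        yield {
            "ntee": ntee,
            "state": state,
            "revenue_min": low,
            "revenue_max": high,
        }


segments = list(search_segments(NTEE_MAJOR_GROUPS, STATES, REVENUE_BRACKETS))
print(len(segments), segments[0])
```

If a segment still hits the result cap, it can be split again on a finer dimension, for example a narrower revenue bracket.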

Authentication
Session management

Accessing detailed financial data requires authenticated sessions. We manage cookie jars and session tokens securely, rotating credentials to maintain uninterrupted access to public filings.

Applications

Who uses Guidestar data

Teams across industries use guidestar.org data to build competitive products and smarter operations.

01
Philanthropic Sourcing

Foundations use NTEE codes and financial metrics to identify eligible nonprofits for grantmaking initiatives.

02
Wealth Management

Advisors track executive compensation to identify high-net-worth individuals in the nonprofit sector.

03
B2B Sales

Software vendors target nonprofits based on revenue size, employee count, and technology budgets.

04
Academic Research

Researchers analyze long-term trends in nonprofit financial health, executive pay gaps, and sector growth.

05
Donor Intelligence

Major gift officers review governance policies and board composition to evaluate organizational stability.

06
Policy Analysis

Think tanks aggregate program expense ratios and geographic data to measure the impact of social programs.

Why DataFlirt

"Guidestar holds the definitive record of US nonprofit financials, but extracting structured data from millions of Form 990 filings requires serious infrastructure."

Most teams underestimate the complexity of parsing historical tax filings and bypassing aggressive rate limits. DataFlirt manages the residential proxies, document extraction, and schema normalisation so your data engineers can focus on downstream analysis rather than pipeline maintenance.

Technical Spec

Guidestar scraper — technical capabilities

Everything supported by our guidestar.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Public 990 PDF parsing (Supported): Extract line-item financials from standard 990, 990-EZ, and 990-PF filings
Executive compensation extraction (Supported): Capture base, bonus, and total compensation for listed officers
EIN batch lookups (Supported): Match internal lists against Guidestar database via EIN
NTEE category filtering (Supported): Segment extraction by specific nonprofit taxonomy codes
Financial time-series (Supported): Aggregate multiple years of financial data into a single record
Change detection diffs (Supported): Hash-based diff: only emit records with changed fields since last run
Guidestar Pro premium financials (Partial): Proprietary financial analysis and premium benchmark reports
Seals of Transparency gated documents (Partial): Internal diversity metrics and gated strategic plans
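
The hash-based change detection listed above can be sketched as follows. This is a minimal version assuming records are keyed by `ein` and that the hashes from the previous run are available as a plain dictionary; the function names and the second EIN are hypothetical.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable SHA-256 digest of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(records: list[dict], previous_hashes: dict[str, str], key: str = "ein"):
    """Return (records that are new or changed, hashes to store for the next run)."""
    changed = []
    current_hashes = {}
    for record in records:
        digest = record_hash(record)
        current_hashes[record[key]] = digest
        if previous_hashes.get(record[key]) != digest:
            changed.append(record)
    return changed, current_hashes


run_1 = [
    {"ein": "12-3456789", "board_size": 12},
    {"ein": "98-7654321", "board_size": 7},
]
first_emit, stored = changed_records(run_1, previous_hashes={})

run_2 = [
    {"board_size": 12, "ein": "12-3456789"},  # same data, different key order
    {"ein": "98-7654321", "board_size": 9},   # board_size changed
]
second_emit, stored = changed_records(run_2, previous_hashes=stored)
print(second_emit)
```

Serialising with sorted keys before hashing means a record only counts as changed when a value differs, not when fields arrive in a different order.
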
Infrastructure

Infrastructure powering the Guidestar pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright manages JavaScript rendering and authenticated sessions required for detailed profile views.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US regions. Rotation happens per-request with sticky sessions where required to bypass rate limits.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested records
CSV
Flat file with typed columns
XLS
Excel format for business teams
Parquet
Columnar format for data warehouses
AWS S3
Direct bucket delivery
Webhook
HTTP POST per record
API
REST endpoint for on-demand queries
PostgreSQL
Direct database upsert
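
For readers unfamiliar with the newline-delimited JSON option listed above, the sketch below writes one JSON object per line, which lets consumers stream a large delivery record by record. The `write_ndjson` helper and the second sample record are illustrative assumptions.

```python
import io
import json


def write_ndjson(records, stream) -> int:
    """Write one JSON object per line to a text stream; return the record count."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


# An in-memory buffer stands in for a file or upload stream.
buffer = io.StringIO()
written = write_ndjson(
    [
        {"ein": "12-3456789", "fiscal_year": 2023, "total_revenue": 4500000.0},
        {"ein": "12-3456789", "fiscal_year": 2022, "total_revenue": 4200000.0},
    ],
    buffer,
)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Each line parses on its own with `json.loads`, so a partial file is still usable up to the last complete line.
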
// faq

Common questions.

About guidestar.org scraping, legality, and pipeline operations.

Is scraping Guidestar legal?

Scraping publicly available information, such as public IRS Form 990 filings, is generally permissible. DataFlirt targets only public profile data and tax documents. We do not extract proprietary Guidestar Pro analytics or bypass premium paywalls. Clients should review Guidestar's ToS and consult legal counsel for specific use cases.

How do you handle PDF Form 990 extraction?

We use OCR and structured document parsing libraries to extract specific line items from IRS PDFs. Our pipeline normalises these fields across different tax years and form variants (990, 990-EZ, 990-PF).

Can you extract historical financial data?

Yes. We can extract and aggregate financial data from multiple historical filings to build a time-series view of an organization's revenue and expenses.

How do you manage Guidestar rate limits?

We use US-based residential ISP proxies and realistic request timing. Our crawlers distribute requests across large IP pools to maintain extraction velocity without triggering blocks.

Do you extract executive compensation?

Yes. We extract the compensation tables from Form 990s, capturing base salary, bonuses, and total compensation for listed officers and key employees.

What is the minimum viable engagement?

Our smallest packages start at a defined EIN list (typically 5,000-50,000 profiles) with monthly delivery. For full database extractions, we price based on volume and delivery frequency.

$ dataflirt scope --new-project --source=guidestar.org ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a targeted EIN lookup or a continuous feed of nonprofit financials across 1.8M profiles, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h