We extract 30-year financial histories, GF Value calculations, insider transaction logs, and guru portfolio updates from Gurufocus. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Financial Statements objects from gurufocus.com. All fields typed and schema-versioned.
{"ticker": "AAPL", "company_name": "Apple Inc", "exchange": "NAS", "sector": "Technology", "market_cap": 2984500000000, "revenue_ttm": 383285000000, "pe_ratio": 28.5, "free_cash_flow": 99584000000}
| ticker | company_name | exchange | sector | industry | market_cap |
|---|---|---|---|---|---|
Complete list of extractable fields for Valuation & GF Metrics objects from gurufocus.com. All fields typed and schema-versioned.
{"ticker": "AAPL", "gf_score": 94, "financial_strength": 7, "profitability_rank": 10, "gf_value_rank": 5, "piotroski_f_score": 7, "altman_z_score": 6.84, "roic": 28.4}
| ticker | gf_score | financial_strength | profitability_rank | growth_rank | gf_value_rank |
|---|---|---|---|---|---|
Complete list of extractable fields for Insider Trades objects from gurufocus.com. All fields typed and schema-versioned.
{"ticker": "AAPL", "insider_name": "Cook Timothy D", "position": "Chief Executive Officer", "transaction_date": "2026-04-01", "transaction_type": "Sell", "price": 175.42, "shares_traded": 196410, "value_usd": 34454242}
| ticker | insider_name | position | transaction_date | transaction_type | price |
|---|---|---|---|---|---|
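To illustrate what "typed" can mean for the insider-trade record, the sketch below parses the sample payload into a Python dataclass and cross-checks that `value_usd` equals `price` times `shares_traded`. The `InsiderTrade` class, the `value_is_consistent` helper, and the $1 rounding tolerance are hypothetical illustrations, not the delivered schema.

```python
import json
from dataclasses import dataclass
from datetime import date

# Sample payload copied from the insider-trade example above.
SAMPLE = (
    '{"ticker": "AAPL", "insider_name": "Cook Timothy D", '
    '"position": "Chief Executive Officer", "transaction_date": "2026-04-01", '
    '"transaction_type": "Sell", "price": 175.42, "shares_traded": 196410, '
    '"value_usd": 34454242}'
)


@dataclass(frozen=True)
class InsiderTrade:
    """Hypothetical typed view of one insider-trade record."""
    ticker: str
    insider_name: str
    position: str
    transaction_date: date
    transaction_type: str
    price: float
    shares_traded: int
    value_usd: int

    @classmethod
    def from_json(cls, raw):
        d = json.loads(raw)
        return cls(
            ticker=d["ticker"],
            insider_name=d["insider_name"],
            position=d["position"],
            # ISO date strings become real date objects.
            transaction_date=date.fromisoformat(d["transaction_date"]),
            transaction_type=d["transaction_type"],
            price=float(d["price"]),
            shares_traded=int(d["shares_traded"]),
            value_usd=int(d["value_usd"]),
        )

    def value_is_consistent(self, tolerance_usd=1.0):
        """True if value_usd matches price x shares_traded within the tolerance."""
        return abs(self.price * self.shares_traded - self.value_usd) <= tolerance_usd


trade = InsiderTrade.from_json(SAMPLE)
print(trade.ticker, trade.transaction_date, trade.value_is_consistent())  # AAPL 2026-04-01 True
```

A cross-field check like this catches records where one field was scraped from the wrong column, which a per-field type check alone would miss.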
Complete list of extractable fields for Guru Portfolios objects from gurufocus.com. All fields typed and schema-versioned.
{"ticker": "AAPL", "guru_name": "Warren Buffett", "quarter": "Q4 2025", "action": "Reduce", "impact_on_portfolio": -1.2, "shares_held": 905560000, "current_value": 158473000000, "percent_of_shares_outstanding": 5.8}
| ticker | guru_name | quarter | action | impact_on_portfolio | shares_held |
|---|---|---|---|---|---|
Complete list of extractable fields for Dividend Data objects from gurufocus.com. All fields typed and schema-versioned.
{"ticker": "AAPL", "dividend_yield": 0.54, "forward_dividend": 0.96, "payout_ratio": 0.15, "dividend_growth_3y": 5.2, "ex_dividend_date": "2026-05-10", "consecutive_years_growth": 11, "buyback_yield": 3.1}
| ticker | dividend_yield | forward_dividend | payout_ratio | dividend_growth_3y | dividend_growth_5y |
|---|---|---|---|---|---|
Our Gurufocus scraper navigates complex JavaScript grids, heavy DOM structures, and bot protection to deliver clean tabular data for your financial models.
Extract income statements, balance sheets, and cash flow data across three decades. Normalised into flat time-series records.
Capture Gurufocus's proprietary GF Score alongside the Peter Lynch fair value, Piotroski F-Score, Altman Z-Score, and Beneish M-Score for any ticker.
Track CEO, CFO, and director buying and selling behaviour. Extract transaction dates, share volumes, and average prices.
Monitor portfolio changes from superinvestors. Extract quarterly additions, reductions, and portfolio impact percentages.
Scrape Gurufocus's proprietary warning signs, such as inventory build-up, margin contraction, or asset growth outpacing revenue growth.
Extract default inputs for Discounted Cash Flow models including WACC, terminal growth rates, and projected EPS.
Track shareholder yield metrics including historical dividend growth rates, payout ratios, and net share repurchases.
Extract percentile rankings for profitability, growth, and financial strength against industry peers.
Configure daily or weekly runs to capture new 13F filings, insider trades, and earnings report updates automatically.
Brief in. Clean data out.
Provide ticker lists, exchanges, or specific guru profiles. We design the extraction schema together.
We configure Playwright crawlers, proxy rotation, session management, and Cloudflare bypass for gurufocus.com.
Schema validation, null-rate checks, and financial data normalisation before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
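The schema-validation and null-rate checks in the pilot step can be pictured as a batch check run before delivery. This is a minimal sketch, assuming records arrive as Python dicts; `EXPECTED_SCHEMA`, `validate_batch`, and the 5% null-rate threshold are hypothetical choices for illustration.

```python
# Hypothetical expected schema: field name -> accepted Python types.
EXPECTED_SCHEMA = {
    "ticker": (str,),
    "dividend_yield": (int, float),
    "payout_ratio": (int, float),
    "ex_dividend_date": (str,),
}


def validate_batch(records, schema, max_null_rate=0.05):
    """Return a list of problems found in a batch; an empty list means it passes."""
    problems = []
    null_counts = {field: 0 for field in schema}
    for i, record in enumerate(records):
        for field, accepted in schema.items():
            value = record.get(field)
            if value is None:
                null_counts[field] += 1
            # Exact type match, so booleans are not accepted as numbers.
            elif type(value) not in accepted:
                problems.append(f"record {i}: {field}={value!r} is {type(value).__name__}")
    total = len(records)
    for field, count in null_counts.items():
        rate = count / total if total else 0.0
        if rate > max_null_rate:
            problems.append(f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return problems


batch = [
    {"ticker": "AAPL", "dividend_yield": 0.54, "payout_ratio": 0.15, "ex_dividend_date": "2026-05-10"},
    # Hypothetical malformed record: yield left as a string, two fields missing.
    {"ticker": "XYZ", "dividend_yield": "0.7"},
]
for problem in validate_batch(batch, EXPECTED_SCHEMA):
    print(problem)
# record 1: dividend_yield='0.7' is str
# payout_ratio: null rate 50% exceeds 5%
# ex_dividend_date: null rate 50% exceeds 5%
```

Returning a list of problems rather than raising on the first one lets a single pilot run surface every schema issue at once.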
Financial data sites use heavy JavaScript grids and aggressive rate limiting. Here is how we ensure reliable data delivery.
Gurufocus uses Cloudflare to block automated traffic. Our infrastructure uses residential ISP proxies and Playwright sessions with realistic TLS fingerprints to bypass bot detection without triggering blocks.
The 30-year financial tables are rendered dynamically via JavaScript. We execute full browser sessions to wait for data hydration, ensuring we capture the complete time series rather than empty DOM nodes.
Financial statements on the web are messy. We normalise nested rows, handle missing quarters, standardise currency formats, and convert string representations into strict numerical types for your warehouse.
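A minimal sketch of the string-to-number step, assuming display strings use thousands separators, a leading `$`, accounting-style parentheses for negatives, `%` for percentages, and K/M/B/T magnitude suffixes. These conventions are assumptions to confirm against the actual pages, and `parse_financial_number` is a hypothetical helper.

```python
# Placeholder strings treated as "no value" (assumed; confirm against real pages).
NULL_TOKENS = {"", "-", "--", "N/A", "n/a"}
# Magnitude suffixes assumed to appear on abbreviated figures such as "3.2B".
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_financial_number(text):
    """Convert a display string like '$1,234.5M', '(45.2)' or '12.3%' to a float.

    Returns None for placeholder tokens, treats accounting-style parentheses
    as negatives, and returns percentages as fractions ('12.3%' -> ~0.123).
    Raises ValueError for anything it cannot parse.
    """
    s = text.strip()
    if s in NULL_TOKENS:
        return None
    negative = False
    if s.startswith("(") and s.endswith(")"):
        negative = True
        s = s[1:-1].strip()
    if s.startswith("-"):
        negative = not negative
        s = s[1:]
    s = s.replace("$", "").replace(",", "").strip()
    if s.endswith("%"):
        return (-1 if negative else 1) * float(s[:-1]) / 100
    scale = 1.0
    if s and s[-1].upper() in SUFFIXES:
        scale = SUFFIXES[s[-1].upper()]
        s = s[:-1]
    value = float(s) * scale
    return -value if negative else value


for raw in ["$1,234.5M", "(45.2)", "12.3%", "-$3.2B", "N/A"]:
    print(raw, "->", parse_financial_number(raw))
```

Letting unparseable input raise `ValueError`, rather than silently coercing it to zero, keeps bad cells visible to the validation step instead of hiding them in the warehouse.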
Extracting decades of data across thousands of tickers triggers rate limits. We distribute requests across thousands of IPs and manage concurrency strictly to ensure complete extraction without timeouts.
We hash records per ticker and only emit data when new filings, trades, or price updates occur. This reduces ingestion costs and keeps your quantitative models fed with only fresh signals.
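The per-record hashing idea can be sketched with the standard library: serialise each record to canonical JSON (sorted keys, fixed separators), hash it with SHA-256, and emit the record only when the hash differs from the last one seen for that key. `ChangeDetector`, its in-memory store, and the excluded `scraped_at` field are illustrative assumptions.

```python
import hashlib
import json


def record_fingerprint(record, exclude=("scraped_at",)):
    """SHA-256 of a record's canonical JSON form, ignoring volatile fields.

    Sorting keys means field order does not affect the hash; 'scraped_at' is a
    hypothetical per-run timestamp that would otherwise make every record look new.
    """
    stable = {k: v for k, v in record.items() if k not in exclude}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remembers the last fingerprint per key and reports only genuine changes.

    State is kept in memory for illustration; a real pipeline would persist it
    between runs. For record types with many rows per ticker, key_fields would
    need extra fields (for example a filing date).
    """

    def __init__(self, key_fields=("ticker",)):
        self.key_fields = key_fields
        self._last = {}

    def is_new(self, record):
        key = tuple(record[f] for f in self.key_fields)
        fingerprint = record_fingerprint(record)
        if self._last.get(key) == fingerprint:
            return False
        self._last[key] = fingerprint
        return True


detector = ChangeDetector()
first = {"ticker": "AAPL", "gf_score": 94, "piotroski_f_score": 7, "scraped_at": "2026-04-01T06:00Z"}
rerun = {"piotroski_f_score": 7, "gf_score": 94, "ticker": "AAPL", "scraped_at": "2026-04-02T06:00Z"}
updated = {**rerun, "piotroski_f_score": 8}
print(detector.is_new(first), detector.is_new(rerun), detector.is_new(updated))  # True False True
```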
Quant funds use 30-year financial histories and Piotroski F-Scores to backtest value investing factors and build alpha-generating models.
Asset managers screen thousands of global equities for low P/B ratios, high ROIC, and strong GF Scores to identify undervalued assets.
Hedge funds aggregate cluster buying from C-suite executives to predict positive earnings surprises or upcoming acquisitions.
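One simple way to define "cluster buying" is a sliding window over each ticker's purchases: flag the ticker once at least `min_insiders` distinct insiders have bought within `window_days` of each other. The thresholds (3 insiders, 30 days), the `"Buy"` transaction label, and the function names are assumptions for illustration, not a Gurufocus definition.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_buy_clusters(trades, min_insiders=3, window_days=30):
    """Return {ticker: date the cluster was first detected} for tickers where at
    least `min_insiders` distinct insiders bought within `window_days` of each other.

    Assumes each trade is a dict shaped like the insider-trade sample, with
    purchases labelled "Buy" (an assumption; the sample only shows "Sell").
    """
    buys = defaultdict(list)
    for t in trades:
        if t["transaction_type"] == "Buy":
            buys[t["ticker"]].append((date.fromisoformat(t["transaction_date"]), t["insider_name"]))

    window = timedelta(days=window_days)
    clusters = {}
    for ticker, events in buys.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans at most window_days.
            while events[end][0] - events[start][0] > window:
                start += 1
            insiders = {name for _, name in events[start:end + 1]}
            if len(insiders) >= min_insiders:
                clusters[ticker] = events[end][0]
                break
    return clusters


def make_trade(ticker, name, kind, day):
    return {"ticker": ticker, "insider_name": name, "transaction_type": kind, "transaction_date": day}


# Hypothetical trades for illustration only.
trades = [
    make_trade("XYZ", "Insider A", "Buy", "2026-01-05"),
    make_trade("XYZ", "Insider B", "Buy", "2026-01-20"),
    make_trade("XYZ", "Insider D", "Sell", "2026-01-25"),
    make_trade("XYZ", "Insider C", "Buy", "2026-02-01"),
    make_trade("ABC", "Insider A", "Buy", "2026-01-01"),
    make_trade("ABC", "Insider B", "Buy", "2026-03-01"),
    make_trade("ABC", "Insider C", "Buy", "2026-05-01"),
]
print(find_buy_clusters(trades))  # {'XYZ': datetime.date(2026, 2, 1)}
```

Counting distinct names rather than transactions keeps one insider's repeated purchases from looking like a cluster.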
Corporate finance teams track industry peers to benchmark capital allocation, margin expansion, and debt-to-equity ratios over time.
Research analysts automate the collection of DCF inputs, WACC estimates, and historical growth rates to accelerate report generation.
Risk departments monitor Altman Z-Scores and Beneish M-Scores across portfolios to detect bankruptcy risk or earnings manipulation.
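For reference, the original Altman (1968) Z-Score for publicly traded manufacturers weights five ratios: working capital, retained earnings, and EBIT over total assets, market value of equity over total liabilities, and sales over total assets. The sketch below computes it from raw inputs and maps the result to the conventional zones (above 2.99 safe, 1.81 to 2.99 grey, below 1.81 distress). Gurufocus may apply a different variant, so treat this as a cross-check rather than a reproduction of the site's figure; `AltmanInputs` and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AltmanInputs:
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_of_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_z(x):
    """Original (1968) Altman Z-Score for public manufacturers."""
    ta = x.total_assets
    return (
        1.2 * x.working_capital / ta
        + 1.4 * x.retained_earnings / ta
        + 3.3 * x.ebit / ta
        + 0.6 * x.market_value_of_equity / x.total_liabilities
        + 1.0 * x.sales / ta
    )


def altman_zone(z):
    """Conventional cut-offs: above 2.99 safe, 1.81 to 2.99 grey, below 1.81 distress."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


# Hypothetical figures in any consistent currency unit.
example = AltmanInputs(working_capital=10, retained_earnings=20, ebit=10,
                       market_value_of_equity=150, total_liabilities=50,
                       sales=100, total_assets=100)
z = altman_z(example)
print(round(z, 2), altman_zone(z))  # 3.53 safe
```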
"Gurufocus aggregates decades of SEC filings into clean valuation metrics, but running quantitative models requires raw access to the underlying 30-year time series."
Extracting deep financial history requires navigating complex JavaScript grids, aggressive rate limits, and dynamic chart hydration. DataFlirt handles the Cloudflare bypass, session rotation, and tabular normalisation so your quants can focus on alpha generation instead of DOM parsing.
Everything supported by our gurufocus.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Financial tables on Gurufocus are heavily reliant on client-side rendering. We use Playwright to execute JavaScript, hydrate the DOM, and extract complete time series data accurately.
We route requests through high-reputation US residential proxies to avoid rate limits and Cloudflare blocks when extracting decades of data across thousands of tickers.
Pipelines run on Kubernetes and AWS Lambda. Apache Airflow manages execution schedules, retries, and data delivery to ensure your warehouse is updated before market open.
Data delivered to where your team already works — no new tooling required.
About gurufocus.com scraping, legality, and pipeline operations.
Scraping publicly available financial data is generally permissible. DataFlirt extracts only public, non-authenticated metrics, historical financials, and SEC-derived data. We do not bypass paywalls or extract premium-gated content without authorised credentials. Clients must ensure their use case complies with applicable terms of service.
We use residential ISP proxies combined with Playwright browser sessions that mimic realistic TLS fingerprints and user behaviour. This allows us to navigate bot protection reliably without triggering CAPTCHAs or IP blocks.
Yes. We navigate the paginated and JavaScript-rendered financial grids to extract the complete available history for income statements, balance sheets, and cash flow statements.
No. We only scrape data that is publicly accessible on the free tier. If you require premium data, you must provide valid paid account credentials, and we can configure a dedicated authenticated pipeline for you.
Pipelines can be configured to run daily or weekly. For time-sensitive data like insider trades or 13F filings, we can configure high-frequency monitoring on specific ticker lists.
Financial APIs are often expensive, strictly rate-limited, and lack proprietary metrics like the GF Score or Peter Lynch charts. Scraping provides access to the exact data structures and proprietary valuation models visible on the site.
Yes. We offer a sample extraction of up to 50 tickers during the scoping phase so your quantitative team can validate the schema, accuracy, and completeness of the historical time series.
20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a daily feed of insider trades or a complete historical dump of 30-year financials across 10,000 tickers, we build and manage the infrastructure. Tell us your requirements.