SYSTEM all green · source gurufocus.com · queue 12,841 tickers · p99 latency 312ms · dataflirt.com · scraper/gurufocus-com
RUN - 31 active pipelines - gurufocus.com live

Value investing data,
at quantitative scale.

We extract 30-year financial histories, GF Value calculations, insider transaction logs, and guru portfolio updates from Gurufocus. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Tickers tracked: 68,412 /run
Insider trades: 14,921 /24h
13F updates: 4,192 /week
Active pipelines: 31
Uptime: 99.98%

Data Dictionary

Every field we extract from gurufocus.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Financial Statements objects from gurufocus.com. All fields typed and schema-versioned.

ticker, company_name, exchange, sector, industry, market_cap, revenue_ttm, net_income_ttm, eps_diluted, pe_ratio, pb_ratio, ps_ratio, free_cash_flow
financial_statements
● 200 OK
{
  "ticker": "AAPL",
  "company_name": "Apple Inc",
  "exchange": "NAS",
  "sector": "Technology",
  "market_cap": 2984500000000,
  "revenue_ttm": 383285000000,
  "pe_ratio": 28.5,
  "free_cash_flow": 99584000000
}
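One way to read "typed" for the Financial Statements sample, as a minimal sketch: a Python dataclass that mirrors the sample fields and rejects payloads with missing or mistyped values. The class name `FinancialStatement` and the `from_payload` helper are hypothetical names chosen for illustration, not DataFlirt's actual schema; only the field names and sample values come from the record above.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class FinancialStatement:
    """Hypothetical typed view of the sample record (field names from the sample)."""
    ticker: str
    company_name: str
    exchange: str
    sector: str
    market_cap: int
    revenue_ttm: int
    pe_ratio: float
    free_cash_flow: int

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "FinancialStatement":
        # Fail loudly on missing or mistyped fields so bad rows never reach the warehouse.
        kwargs: dict[str, Any] = {}
        for f in fields(cls):
            if f.name not in payload:
                raise ValueError(f"missing field: {f.name}")
            value = payload[f.name]
            if isinstance(value, bool):
                raise TypeError(f"{f.name}: bool is not a valid value")
            if f.type is float and isinstance(value, int):
                value = float(value)  # whole-number ratios may arrive as ints
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )
            kwargs[f.name] = value
        return cls(**kwargs)


sample = {
    "ticker": "AAPL",
    "company_name": "Apple Inc",
    "exchange": "NAS",
    "sector": "Technology",
    "market_cap": 2984500000000,
    "revenue_ttm": 383285000000,
    "pe_ratio": 28.5,
    "free_cash_flow": 99584000000,
}
record = FinancialStatement.from_payload(sample)
```

A string such as "2.98T" in `market_cap` would raise `TypeError` here instead of being loaded silently.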

Complete list of extractable fields for Valuation & GF Metrics objects from gurufocus.com. All fields typed and schema-versioned.

ticker, gf_score, financial_strength, profitability_rank, growth_rank, gf_value_rank, momentum_rank, piotroski_f_score, altman_z_score, beneish_m_score, wacc, roic
valuation_gf_metrics
● 200 OK
{
  "ticker": "AAPL",
  "gf_score": 94,
  "financial_strength": 7,
  "profitability_rank": 10,
  "gf_value_rank": 5,
  "piotroski_f_score": 7,
  "altman_z_score": 6.84,
  "roic": 28.4
}

Complete list of extractable fields for Insider Trades objects from gurufocus.com. All fields typed and schema-versioned.

ticker, insider_name, position, transaction_date, transaction_type, price, shares_traded, shares_held, value_usd, filing_date
insider_trades
● 200 OK
{
  "ticker": "AAPL",
  "insider_name": "Cook Timothy D",
  "position": "Chief Executive Officer",
  "transaction_date": "2026-04-01",
  "transaction_type": "Sell",
  "price": 175.42,
  "shares_traded": 196410,
  "value_usd": 34454242
}

Complete list of extractable fields for Guru Portfolios objects from gurufocus.com. All fields typed and schema-versioned.

ticker, guru_name, quarter, action, impact_on_portfolio, shares_held, current_value, percent_of_shares_outstanding, average_price
guru_portfolios
● 200 OK
{
  "ticker": "AAPL",
  "guru_name": "Warren Buffett",
  "quarter": "Q4 2025",
  "action": "Reduce",
  "impact_on_portfolio": -1.2,
  "shares_held": 905560000,
  "current_value": 158473000000,
  "percent_of_shares_outstanding": 5.8
}

Complete list of extractable fields for Dividend Data objects from gurufocus.com. All fields typed and schema-versioned.

ticker, dividend_yield, forward_dividend, payout_ratio, dividend_growth_3y, dividend_growth_5y, ex_dividend_date, consecutive_years_growth, buyback_yield
dividend_data
● 200 OK
{
  "ticker": "AAPL",
  "dividend_yield": 0.54,
  "forward_dividend": 0.96,
  "payout_ratio": 0.15,
  "dividend_growth_3y": 5.2,
  "ex_dividend_date": "2026-05-10",
  "consecutive_years_growth": 11,
  "buyback_yield": 3.1
}

Capabilities

Extract the full quantitative dataset

Our Gurufocus scraper navigates complex JavaScript grids, heavy DOM structures, and bot protection to deliver clean tabular data for your financial models.

30-Year Financial Histories

Extract income statements, balance sheets, and cash flow data across three decades. Normalised into flat time-series records.

GF Value & Ratios

Capture the proprietary GF Score, Peter Lynch fair value, Piotroski F-Score, Altman Z-Score, and Beneish M-Score for any ticker.

Insider Transaction Logs

Track CEO, CFO, and director buying and selling behaviour. Extract transaction dates, share volumes, and average prices.

Guru & 13F Tracking

Monitor portfolio changes from superinvestors. Extract quarterly additions, reductions, and portfolio impact percentages.

Warning Signs & Red Flags

Scrape Gurufocus's proprietary warning signs, such as inventory build-up, margin contraction, or asset growth outpacing revenue growth.

DCF Model Parameters

Extract default inputs for Discounted Cash Flow models including WACC, terminal growth rates, and projected EPS.

Dividend & Buyback Yields

Track shareholder yield metrics including historical dividend growth rates, payout ratios, and net share repurchases.

Industry Benchmarking

Extract percentile rankings for profitability, growth, and financial strength against industry peers.

Continuous Pipeline Updates

Configure daily or weekly runs to capture new 13F filings, insider trades, and earnings report updates automatically.

// engagement pipeline

From ticker list to quantitative warehouse

Brief in. Clean data out.

Define Scope
Day 0

Provide ticker lists, exchanges, or specific guru profiles. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Playwright crawlers, proxy rotation, session management, and Cloudflare bypass for gurufocus.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and financial data normalisation before full launch.
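A null-rate check of the kind mentioned above could be sketched as follows. The function names, the sample batch, and the 20% threshold are hypothetical; a real gate would use whatever tolerance is agreed during scoping.

```python
def null_rates(records: list[dict], field_names: list[str]) -> dict[str, float]:
    """Share of records in which each field is missing or None."""
    if not records:
        return {name: 0.0 for name in field_names}
    return {
        name: sum(1 for r in records if r.get(name) is None) / len(records)
        for name in field_names
    }


def failing_fields(rates: dict[str, float], max_rate: float) -> list[str]:
    """Fields whose null rate exceeds the allowed maximum."""
    return sorted(name for name, rate in rates.items() if rate > max_rate)


# Placeholder batch with one explicit null and one missing key.
batch = [
    {"ticker": "AAA", "pe_ratio": 12.0, "roic": 9.5},
    {"ticker": "BBB", "pe_ratio": None, "roic": 4.1},
    {"ticker": "CCC", "pe_ratio": 30.2},
    {"ticker": "DDD", "pe_ratio": 8.7, "roic": 15.0},
]
rates = null_rates(batch, ["ticker", "pe_ratio", "roic"])
blocked = failing_fields(rates, max_rate=0.2)  # assumed threshold
```

Here `pe_ratio` and `roic` are each null in one of four records, so both exceed the assumed threshold and would hold the batch back for review.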

Delivery
Ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our pipeline handles financial data complexity

Financial data sites use heavy JavaScript grids and aggressive rate limiting. Here is how we ensure reliable data delivery.

pipeline-monitor · gurufocus.com · live ● active

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Anti-bot layer
Cloudflare bypass and residential proxies

Gurufocus uses Cloudflare to block automated traffic. Our infrastructure uses residential ISP proxies and Playwright sessions with realistic TLS fingerprints to bypass bot detection without triggering blocks.

JavaScript rendering
Hydrating complex financial grids

The 30-year financial tables are rendered dynamically via JavaScript. We execute full browser sessions to wait for data hydration, ensuring we capture the complete time series rather than empty DOM nodes.
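The "wait for hydration" step can be sketched as a polling loop that only returns once the grid holds enough non-empty cells. This is an illustrative helper, not DataFlirt's production code: `wait_until_hydrated`, its parameters, and the fake snapshots are assumptions, and the cell reader is injected so the logic runs without a browser.

```python
import time
from typing import Callable


def wait_until_hydrated(
    read_cells: Callable[[], list[str]],
    min_cells: int,
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
) -> list[str]:
    """Poll until the grid has at least min_cells non-empty cells, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        cells = [c.strip() for c in read_cells()]
        filled = [c for c in cells if c]
        if len(filled) >= min_cells:
            return filled
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"grid not hydrated: {len(filled)}/{min_cells} cells filled"
            )
        time.sleep(poll_s)


# Fake reader simulating three successive DOM snapshots:
# no table yet, empty placeholder cells, then real values.
snapshots = iter([[], ["", ""], ["2023", "383,285"]])
cells = wait_until_hydrated(lambda: next(snapshots), min_cells=2, poll_s=0.0)
```

With Playwright's sync API, `read_cells` could be something like `lambda: page.locator("table td").all_inner_texts()`, where the selector is a placeholder rather than the real Gurufocus markup.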

Data normalisation
Structuring nested financial data

Financial statements on the web are messy. We normalise nested rows, handle missing quarters, standardise currency formats, and convert string representations into strict numerical types for your warehouse.

Rate limiting
Distributed crawling architecture

Extracting decades of data across thousands of tickers triggers rate limits. We distribute requests across thousands of IPs and manage concurrency strictly to ensure complete extraction without timeouts.
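The concurrency side of this can be sketched with an `asyncio.Semaphore` that caps the number of requests in flight. This covers only the concurrency limit, not proxy distribution; `crawl_all`, `FakeSource`, and the limit of 3 are hypothetical, and the fake fetcher stands in for a real network call.

```python
import asyncio
from typing import Awaitable, Callable


async def crawl_all(
    tickers: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 5,
) -> dict[str, dict]:
    """Fetch every ticker while keeping at most max_concurrency requests in flight."""
    gate = asyncio.Semaphore(max_concurrency)

    async def bounded(ticker: str) -> tuple[str, dict]:
        async with gate:  # waits here once the cap is reached
            return ticker, await fetch(ticker)

    pairs = await asyncio.gather(*(bounded(t) for t in tickers))
    return dict(pairs)


class FakeSource:
    """Stand-in for a real fetcher; records the peak number of concurrent calls."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def fetch(self, ticker: str) -> dict:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        self.in_flight -= 1
        return {"ticker": ticker}


source = FakeSource()
results = asyncio.run(
    crawl_all([f"T{i}" for i in range(20)], source.fetch, max_concurrency=3)
)
```

All 20 placeholder tickers complete, while `source.peak` never exceeds the cap of 3.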

Change detection
Optimise downstream ingestion

We hash records per ticker and only emit data when new filings, trades, or price updates occur. This reduces ingestion costs and keeps your quantitative models fed with only fresh signals.
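A minimal sketch of hash-based change detection, assuming records are keyed by `ticker` and that a plain dictionary stands in for whatever store holds the previous hashes. The function names and sample records are illustrative.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(records: list[dict], seen: dict[str, str]) -> list[dict]:
    """Return records whose hash differs from the stored one, updating seen in place."""
    emitted = []
    for record in records:
        key = record["ticker"]
        digest = record_hash(record)
        if seen.get(key) != digest:
            seen[key] = digest
            emitted.append(record)
    return emitted


seen: dict[str, str] = {}
first_run = changed_records(
    [{"ticker": "AAA", "gf_score": 80}, {"ticker": "BBB", "gf_score": 60}], seen
)
# Second run: AAA is unchanged (only key order differs), BBB has a new score.
second_run = changed_records(
    [{"gf_score": 80, "ticker": "AAA"}, {"ticker": "BBB", "gf_score": 65}], seen
)
```

Sorting keys before hashing means a reordered but otherwise identical payload is not treated as a change, so the second run emits only the BBB record.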

Applications

Who uses Gurufocus data - and how

Teams across industries use gurufocus.com data to build competitive products and smarter operations.

01
Quantitative Backtesting

Quant funds use 30-year financial histories and Piotroski F-Scores to backtest value investing factors and build alpha-generating models.

02
Value Investment Screening

Asset managers screen thousands of global equities for low P/B ratios, high ROIC, and strong GF Scores to identify undervalued assets.
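A screen like this reduces to a filter over delivered records. The sketch below assumes `pb_ratio`, `roic`, and `gf_score` have already been joined per ticker; the thresholds and the placeholder universe are illustrative, not recommendations.

```python
def screen(
    records: list[dict],
    max_pb: float,
    min_roic: float,
    min_gf_score: int,
) -> list[str]:
    """Tickers that pass all three thresholds; records with missing inputs are skipped."""
    picks = []
    for r in records:
        pb, roic, gf = r.get("pb_ratio"), r.get("roic"), r.get("gf_score")
        if pb is None or roic is None or gf is None:
            continue  # never pass a ticker on incomplete data
        if pb <= max_pb and roic >= min_roic and gf >= min_gf_score:
            picks.append(r["ticker"])
    return picks


# Placeholder universe for illustration.
universe = [
    {"ticker": "AAA", "pb_ratio": 0.9, "roic": 18.0, "gf_score": 88},
    {"ticker": "BBB", "pb_ratio": 4.2, "roic": 25.0, "gf_score": 91},  # fails P/B
    {"ticker": "CCC", "pb_ratio": 1.1, "roic": None, "gf_score": 85},  # missing ROIC
    {"ticker": "DDD", "pb_ratio": 1.4, "roic": 16.5, "gf_score": 80},
]
picks = screen(universe, max_pb=1.5, min_roic=15.0, min_gf_score=80)
```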

03
Insider Sentiment Analysis

Hedge funds aggregate cluster buying by C-suite executives as a signal of potential positive earnings surprises or upcoming acquisitions.

04
Competitor Benchmarking

Corporate finance teams track industry peers to benchmark capital allocation, margin expansion, and debt-to-equity ratios over time.

05
Equity Research Automation

Research analysts automate the collection of DCF inputs, WACC estimates, and historical growth rates to accelerate report generation.

06
Risk Modeling

Risk departments monitor Altman Z-Scores and Beneish M-Scores across portfolios to detect bankruptcy risk or earnings manipulation.
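As an illustration of such monitoring, the sketch below flags records using the classic Altman Z-Score zones (above 2.99 safe, below 1.81 distress) and a Beneish M-Score cutoff. The default cutoff of -1.78 is a commonly cited value and is left as a parameter because other cutoffs are also in use; the function names and flag labels are hypothetical.

```python
def altman_zone(z: float) -> str:
    """Classic Altman zones: above 2.99 safe, 1.81 to 2.99 grey, below 1.81 distress."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


def risk_flags(record: dict, m_cutoff: float = -1.78) -> list[str]:
    """Flags for one record; a missing score produces no flag rather than a guess."""
    flags = []
    z = record.get("altman_z_score")
    m = record.get("beneish_m_score")
    if z is not None and altman_zone(z) == "distress":
        flags.append("bankruptcy_risk")
    if m is not None and m > m_cutoff:  # assumed cutoff, adjustable
        flags.append("possible_earnings_manipulation")
    return flags


# The Z-Score of 6.84 is taken from the sample valuation record earlier on the page.
sample_zone = altman_zone(6.84)
# Placeholder record that trips both flags.
flags = risk_flags({"ticker": "XYZ", "altman_z_score": 1.2, "beneish_m_score": -1.5})
```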

Why DataFlirt

"Gurufocus aggregates decades of SEC filings into clean valuation metrics, but running quantitative models requires raw access to the underlying 30-year time series."

Extracting deep financial history requires navigating complex JavaScript grids, aggressive rate limits, and dynamic chart hydration. DataFlirt handles the Cloudflare bypass, session rotation, and tabular normalisation so your quants can focus on alpha generation instead of DOM parsing.

Technical Spec

Gurufocus scraper - technical capabilities

Everything supported by our gurufocus.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability | Details | Status
JavaScript rendering | Full Playwright sessions required for 30-year financial grids and charts | Supported
Cloudflare bypass | Automated solver and TLS fingerprinting to navigate bot protection | Supported
Residential proxy rotation | ISP-grade IPs rotated to avoid IP bans during heavy historical data extraction | Supported
30-year financial tables | Extraction of full historical time series for income, balance, and cash flow | Supported
GF Value charts | Extraction of historical fair value estimates and current GF scores | Supported
Insider trade pagination | Full historical logs of executive buying and selling activity | Supported
Change detection | Hash-based diffing to emit only new 13F filings or insider trades | Supported
Premium / Gated Guru data | Access to real-time premium guru trades requires paid account credentials | Partial
Excel Export downloads | We parse the DOM directly rather than triggering native file downloads | Partial

Infrastructure

Infrastructure powering the financial pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Playwright for Financial Grids

Financial tables on Gurufocus are heavily reliant on client-side rendering. We use Playwright to execute JavaScript, hydrate the DOM, and extract complete time series data accurately.

Residential Proxy Pools

We route requests through high-reputation US residential proxies to avoid rate limits and Cloudflare blocks when extracting decades of data across thousands of tickers.

Cloud-Native Orchestration

Pipelines run on Kubernetes and AWS Lambda. Apache Airflow manages execution schedules, retries, and data delivery to ensure your warehouse is updated before market open.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Nested structures ideal for complex financial time series
CSV: Flat tabular files ready for pandas or quantitative models
XLS: Excel-compatible output for analyst consumption
Parquet: Columnar format optimised for BigQuery and Snowflake
AWS S3: Direct delivery to your cloud storage buckets, compatible with any data lake
Webhook: HTTP POST for real-time insider trade alerts
API: Queryable REST endpoints for on-demand ticker data
PostgreSQL: Direct upsert into your relational database schemas

// faq

Common questions.

About gurufocus.com scraping, legality, and pipeline operations.

Is scraping Gurufocus legal?

Scraping publicly available financial data is generally permissible. DataFlirt extracts only public, non-authenticated metrics, historical financials, and SEC-derived data. We do not bypass paywalls or extract premium-gated content without authorised credentials. Clients must ensure their use case complies with applicable terms of service.

How do you handle Cloudflare protection?

We use residential ISP proxies combined with Playwright browser sessions that mimic realistic TLS fingerprints and user behaviour. This allows us to navigate bot protection reliably without triggering CAPTCHAs or IP blocks.

Can you extract the full 30-year financial history?

Yes. We navigate the paginated and JavaScript-rendered financial grids to extract the complete available history for income statements, balance sheets, and cash flow statements.

Do you provide premium data?

No. We only scrape data that is publicly accessible on the free tier. If you require premium data, you must provide valid paid account credentials, and we can configure a dedicated authenticated pipeline for you.

How fresh is the data?

Pipelines can be configured to run daily or weekly. For time-sensitive data like insider trades or 13F filings, we can configure high-frequency monitoring on specific ticker lists.

Why scrape when APIs exist?

Financial APIs are often expensive, strictly rate-limited, and lack proprietary metrics like the GF Score or Peter Lynch charts. Scraping provides access to the exact data structures and proprietary valuation models visible on the site.

Can I get a sample of the financial data?

Yes. We offer a sample extraction of up to 50 tickers during the scoping phase so your quantitative team can validate the schema, accuracy, and completeness of the historical time series.

$ dataflirt scope --new-project --source=gurufocus.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of insider trades or a complete historical dump of 30-year financials across 10,000 tickers - we build and manage the infrastructure. Tell us your requirements.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h