SYSTEM: all green · source: gurufocus.com · queue: 12,841 tickers · p99 latency: 312ms

RUN - 31 active pipelines - gurufocus.com live

Value investing data,
at quantitative scale.

We extract 30-year financial histories, GF Value calculations, insider transaction logs, and guru portfolio updates from Gurufocus. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from gurufocus.com → See how it works

Tickers tracked

68,412 /run

Insider trades

14,921 /24h

13F updates

4,192 /week

Active pipelines

31

Uptime

99.98%

◆ 30-Year Financial Data ◆ GF Value & Ratios ◆ Peter Lynch Charts ◆ Guru Portfolio Tracking ◆ 13F Filing Extraction ◆ Insider Trade Logs ◆ DCF Model Inputs ◆ Dividend History ◆ Piotroski F-Score ◆ Altman Z-Score ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ

Data Dictionary

Every field we extract from gurufocus.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Financial Statements objects from gurufocus.com. All fields typed and schema-versioned.

ticker, company_name, exchange, sector, industry, market_cap, revenue_ttm, net_income_ttm, eps_diluted, pe_ratio, pb_ratio, ps_ratio, free_cash_flow

{
  "ticker": "AAPL",
  "company_name": "Apple Inc",
  "exchange": "NAS",
  "sector": "Technology",
  "market_cap": 2984500000000,
  "revenue_ttm": 383285000000,
  "pe_ratio": 28.5,
  "free_cash_flow": 99584000000
}

Complete list of extractable fields for Valuation & GF Metrics objects from gurufocus.com. All fields typed and schema-versioned.

ticker, gf_score, financial_strength, profitability_rank, growth_rank, gf_value_rank, momentum_rank, piotroski_f_score, altman_z_score, beneish_m_score, wacc, roic

{
  "ticker": "AAPL",
  "gf_score": 94,
  "financial_strength": 7,
  "profitability_rank": 10,
  "gf_value_rank": 5,
  "piotroski_f_score": 7,
  "altman_z_score": 6.84,
  "roic": 28.4
}

Complete list of extractable fields for Insider Trades objects from gurufocus.com. All fields typed and schema-versioned.

ticker, insider_name, position, transaction_date, transaction_type, price, shares_traded, shares_held, value_usd, filing_date

{
  "ticker": "AAPL",
  "insider_name": "Cook Timothy D",
  "position": "Chief Executive Officer",
  "transaction_date": "2026-04-01",
  "transaction_type": "Sell",
  "price": 175.42,
  "shares_traded": 196410,
  "value_usd": 34454242
}
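As a rough illustration of what "typed" could mean for an Insider Trades record, the sketch below parses the sample values above into a Python dataclass and cross-checks `value_usd` against `price * shares_traded`. The class name, the function name, and the 1 USD tolerance are assumptions made for this example, not part of the delivered schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InsiderTrade:
    """One insider transaction with typed fields (hypothetical class name)."""
    ticker: str
    insider_name: str
    position: str
    transaction_date: date
    transaction_type: str
    price: float
    shares_traded: int
    value_usd: int


def parse_insider_trade(raw: dict) -> InsiderTrade:
    """Convert a raw record into a typed one and sanity-check value_usd."""
    trade = InsiderTrade(
        ticker=raw["ticker"],
        insider_name=raw["insider_name"],
        position=raw["position"],
        transaction_date=date.fromisoformat(raw["transaction_date"]),
        transaction_type=raw["transaction_type"],
        price=float(raw["price"]),
        shares_traded=int(raw["shares_traded"]),
        value_usd=int(raw["value_usd"]),
    )
    # Assumed rule: value_usd should be within 1 USD of price * shares_traded.
    expected = trade.price * trade.shares_traded
    if abs(expected - trade.value_usd) > 1.0:
        raise ValueError(
            f"value_usd {trade.value_usd} does not match price * shares ({expected:.2f})"
        )
    return trade


sample = {
    "ticker": "AAPL",
    "insider_name": "Cook Timothy D",
    "position": "Chief Executive Officer",
    "transaction_date": "2026-04-01",
    "transaction_type": "Sell",
    "price": 175.42,
    "shares_traded": 196410,
    "value_usd": 34454242,
}
trade = parse_insider_trade(sample)
print(trade.ticker, trade.transaction_date.isoformat(), trade.value_usd)
```

A check like this catches rows where a column shifted during extraction, since the three numeric fields stop agreeing with each other.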

Complete list of extractable fields for Guru Portfolios objects from gurufocus.com. All fields typed and schema-versioned.

ticker, guru_name, quarter, action, impact_on_portfolio, shares_held, current_value, percent_of_shares_outstanding, average_price

{
  "ticker": "AAPL",
  "guru_name": "Warren Buffett",
  "quarter": "Q4 2025",
  "action": "Reduce",
  "impact_on_portfolio": -1.2,
  "shares_held": 905560000,
  "current_value": 158473000000,
  "percent_of_shares_outstanding": 5.8
}

Complete list of extractable fields for Dividend Data objects from gurufocus.com. All fields typed and schema-versioned.

ticker, dividend_yield, forward_dividend, payout_ratio, dividend_growth_3y, dividend_growth_5y, ex_dividend_date, consecutive_years_growth, buyback_yield

{
  "ticker": "AAPL",
  "dividend_yield": 0.54,
  "forward_dividend": 0.96,
  "payout_ratio": 0.15,
  "dividend_growth_3y": 5.2,
  "ex_dividend_date": "2026-05-10",
  "consecutive_years_growth": 11,
  "buyback_yield": 3.1
}

Capabilities

Extract the full quantitative dataset

Our Gurufocus scraper navigates complex JavaScript grids, heavy DOM structures, and bot protection to deliver clean tabular data for your financial models.

30-Year Financial Histories

Extract income statements, balance sheets, and cash flow data across three decades. Normalised into flat time-series records.
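One plausible reading of "flat time-series records" is a long format with one row per ticker, line item, and fiscal year. The sketch below shows that reshaping in Python; the function name, the record keys, the ticker `XYZ`, and the numbers are all illustrative assumptions rather than the actual delivered schema.

```python
def flatten_statement(ticker, statement, table):
    """Turn {line_item: {fiscal_year: value}} into a list of flat records."""
    records = []
    for line_item, by_year in table.items():
        for fiscal_year in sorted(by_year):
            records.append(
                {
                    "ticker": ticker,
                    "statement": statement,
                    "line_item": line_item,
                    "fiscal_year": fiscal_year,
                    "value": by_year[fiscal_year],  # None marks a missing period
                }
            )
    return records


# Made-up values for a hypothetical ticker, in arbitrary units.
nested = {
    "revenue": {2023: 120.0, 2024: 135.5},
    "net_income": {2023: 18.0, 2024: None},
}
flat = flatten_statement("XYZ", "income_statement", nested)
for row in flat:
    print(row)
```

The long format keeps the schema stable as more years are added, which tends to suit warehouse tables better than one column per year.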

GF Value & Ratios

Capture the proprietary GF Score, Peter Lynch fair value, Piotroski F-Score, Altman Z-Score, and Beneish M-Score for any ticker.

Insider Transaction Logs

Track CEO, CFO, and director buying and selling behaviour. Extract transaction dates, share volumes, and average prices.

Guru & 13F Tracking

Monitor portfolio changes from superinvestors. Extract quarterly additions, reductions, and portfolio impact percentages.

Warning Signs & Red Flags

Scrape Gurufocus proprietary warning signs like inventory build-up, margin contraction, or asset growth vs revenue growth.

DCF Model Parameters

Extract default inputs for Discounted Cash Flow models including WACC, terminal growth rates, and projected EPS.

Dividend & Buyback Yields

Track shareholder yield metrics including historical dividend growth rates, payout ratios, and net share repurchases.

Industry Benchmarking

Extract percentile rankings for profitability, growth, and financial strength against industry peers.

Continuous Pipeline Updates

Configure daily or weekly runs to capture new 13F filings, insider trades, and earnings report updates automatically.

// engagement pipeline

From ticker list to quantitative warehouse

Brief in. Clean data out.

Define Scope

Day 0

Provide ticker lists, exchanges, or specific guru profiles. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Playwright crawlers, proxy rotation, session management, and Cloudflare bypass for gurufocus.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and financial data normalisation before full launch.
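A null-rate check of the kind mentioned above could look roughly like the following Python sketch. The function names, the sample records, and the 25% threshold are assumptions chosen for illustration, not the thresholds used in production.

```python
def null_rates(records, fields):
    """Return the share of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


def failing_fields(records, fields, max_null_rate):
    """List the fields whose null rate exceeds the allowed maximum."""
    rates = null_rates(records, fields)
    return sorted(field for field, rate in rates.items() if rate > max_null_rate)


# Hypothetical batch: pe_ratio is missing in half of the rows.
batch = [
    {"ticker": "AAA", "pe_ratio": 12.0, "market_cap": 1_000},
    {"ticker": "BBB", "pe_ratio": None, "market_cap": 2_000},
    {"ticker": "CCC", "market_cap": 3_000},
    {"ticker": "DDD", "pe_ratio": 30.1, "market_cap": 4_000},
]
rates = null_rates(batch, ["ticker", "pe_ratio", "market_cap"])
blocked = failing_fields(batch, ["ticker", "pe_ratio", "market_cap"], max_null_rate=0.25)
print(rates, blocked)
```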

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our pipeline handles financial data complexity

Financial data sites use heavy JavaScript grids and aggressive rate limiting. Here is how we ensure reliable data delivery.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Anti-bot layer

Cloudflare bypass and residential proxies

Gurufocus uses Cloudflare to block automated traffic. Our infrastructure uses residential ISP proxies and Playwright sessions with realistic TLS fingerprints to bypass bot detection without triggering blocks.

JavaScript rendering

Hydrating complex financial grids

The 30-year financial tables are rendered dynamically via JavaScript. We execute full browser sessions to wait for data hydration, ensuring we capture the complete time series rather than empty DOM nodes.
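The "wait for hydration" step can be sketched with Playwright's synchronous Python API as below. The URL and the `table.financials` selector are placeholders, and both function names are hypothetical; the real page structure is not described here. The idea is only to block until at least one body row exists before reading the table.

```python
def rows_to_records(header, rows):
    """Pair each row's cells with the header labels; skip rows of the wrong width."""
    return [dict(zip(header, row)) for row in rows if len(row) == len(header)]


def fetch_hydrated_table(url, table_selector, timeout_ms=30_000):
    """Open the page in a real browser and read the table only after it has rows."""
    # Imported lazily so rows_to_records works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            # Until client-side rendering finishes, the tbody has no rows.
            page.wait_for_selector(f"{table_selector} tbody tr", timeout=timeout_ms)
            header = page.eval_on_selector_all(
                f"{table_selector} thead th",
                "cells => cells.map(c => c.innerText.trim())",
            )
            rows = page.eval_on_selector_all(
                f"{table_selector} tbody tr",
                "rows => rows.map(r => Array.from(r.cells).map(c => c.innerText.trim()))",
            )
        finally:
            browser.close()
    return rows_to_records(header, rows)


# Example call (placeholder URL and selector, not run here):
# records = fetch_hydrated_table("https://example.com/financials", "table.financials")
```

Reading the DOM before the selector resolves is what produces the "empty DOM nodes" case described above, so the explicit wait is the important part of the sketch.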

Data normalisation

Structuring nested financial data

Financial statements on the web are messy. We normalise nested rows, handle missing quarters, standardise currency formats, and convert string representations into strict numerical types for your warehouse.
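Converting string representations into numeric types might look like the sketch below. The set of placeholder strings, the K/M/B/T suffix handling, and the choice to keep percentages in percent units are assumptions for illustration; the formats actually encountered may differ.

```python
_SUFFIX = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
_PLACEHOLDERS = {"", "-", "--", "N/A", "n/a"}


def parse_financial_number(text):
    """Parse strings such as '(1,234.5)', '$2.5B' or '28.5%' into a float, or None."""
    s = text.strip()
    if s in _PLACEHOLDERS:
        return None
    # Accounting notation: a value in parentheses is negative.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    s = s.replace(",", "").replace("$", "").strip()
    if s.endswith("%"):
        s = s[:-1]  # assumed convention: keep percent units, so '28.5%' -> 28.5
    multiplier = 1.0
    if s and s[-1].upper() in _SUFFIX:
        multiplier = _SUFFIX[s[-1].upper()]
        s = s[:-1]
    value = float(s) * multiplier
    return -value if negative else value


for raw in ["(1,234.5)", "$2.5B", "28.5%", "-", "383,285M"]:
    print(repr(raw), "->", parse_financial_number(raw))
```

Strings that match none of these patterns raise `ValueError` from `float`, which is usually preferable to silently writing a wrong number into the warehouse.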

Rate limiting

Distributed crawling architecture

Extracting decades of data across thousands of tickers triggers rate limits. We distribute requests across thousands of IPs and manage concurrency strictly to ensure complete extraction without timeouts.
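Strict concurrency management can be illustrated with a bounded `asyncio` worker pattern. In the sketch below, `fetch_all` and its parameters are hypothetical names, and `fetch` stands in for whatever coroutine performs the real request; it caps the number of in-flight requests and retries failures with exponential backoff.

```python
import asyncio


async def fetch_all(tickers, fetch, max_concurrency=4, retries=3, base_delay=0.5):
    """Fetch every ticker with at most max_concurrency requests in flight."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(ticker):
        async with semaphore:  # blocks while max_concurrency tasks are running
            for attempt in range(retries):
                try:
                    return ticker, await fetch(ticker)
                except Exception:
                    if attempt == retries - 1:
                        raise
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    await asyncio.sleep(base_delay * 2 ** attempt)

    pairs = await asyncio.gather(*(fetch_one(t) for t in tickers))
    return dict(pairs)


async def _demo_fetch(ticker):
    """Stand-in for a real network call."""
    await asyncio.sleep(0)
    return {"ticker": ticker}


if __name__ == "__main__":
    results = asyncio.run(fetch_all(["AAPL", "MSFT", "KO"], _demo_fetch, max_concurrency=2))
    print(results)
```

The semaphore bounds load on the target site regardless of how long the ticker list is, which is the property that matters when a run covers thousands of symbols.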

Change detection

Optimise downstream ingestion

We hash records per ticker and only emit data when new filings, trades, or price updates occur. This reduces ingestion costs and keeps your quantitative models fed with only fresh signals.
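Hash-based change detection of this kind can be sketched as follows. The function names and the in-memory `seen` dictionary are assumptions for the example; a real pipeline would presumably persist the digests in a store such as Redis or PostgreSQL.

```python
import hashlib
import json


def record_hash(record):
    """Stable SHA-256 digest of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(records, seen, key="ticker"):
    """Return records whose digest differs from seen[key]; update seen in place."""
    changed = []
    for record in records:
        digest = record_hash(record)
        if seen.get(record[key]) != digest:
            seen[record[key]] = digest
            changed.append(record)
    return changed


seen = {}
run_1 = [{"ticker": "AAA", "gf_score": 80}, {"ticker": "BBB", "gf_score": 65}]
run_2 = [{"gf_score": 80, "ticker": "AAA"}, {"ticker": "BBB", "gf_score": 66}]

first = changed_records(run_1, seen)   # nothing seen yet, so both are emitted
second = changed_records(run_2, seen)  # only BBB changed
print(len(first), [r["ticker"] for r in second])
```

Serialising with `sort_keys=True` matters here: without it, the same record with keys in a different order would hash differently and be emitted as a false change.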

Applications

Who uses Gurufocus data - and how

Teams across industries use gurufocus.com data to build competitive products and smarter operations.

Quantitative Backtesting

Quant funds use 30-year financial histories and Piotroski F-Scores to backtest value investing factors and build alpha-generating models.

Value Investment Screening

Asset managers screen thousands of global equities for low P/B ratios, high ROIC, and strong GF Scores to identify undervalued assets.

Insider Sentiment Analysis

Hedge funds aggregate cluster buying from C-suite executives to predict positive earnings surprises or upcoming acquisitions.
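As a sketch of how cluster buying might be derived from the Insider Trades fields, the Python below flags tickers where several distinct insiders bought within a rolling window. The `"Buy"` label, the 30-day window, the three-insider threshold, and the sample names are all assumptions for illustration, not a definition used by any particular fund.

```python
from collections import defaultdict
from datetime import date, timedelta


def cluster_buys(trades, window_days=30, min_insiders=3):
    """Return tickers where min_insiders distinct insiders bought within window_days."""
    buys = defaultdict(list)
    for trade in trades:
        if trade["transaction_type"] == "Buy":  # assumed label for purchases
            when = date.fromisoformat(trade["transaction_date"])
            buys[trade["ticker"]].append((when, trade["insider_name"]))

    window = timedelta(days=window_days)
    flagged = []
    for ticker, events in buys.items():
        events.sort()
        for i, (start, _) in enumerate(events):
            # Distinct buyers in the window that opens at this trade.
            names = {name for when, name in events[i:] if when - start <= window}
            if len(names) >= min_insiders:
                flagged.append(ticker)
                break
    return sorted(flagged)


# Hypothetical trades for two made-up tickers.
trades = [
    {"ticker": "XYZ", "insider_name": "A", "transaction_date": "2026-01-05", "transaction_type": "Buy"},
    {"ticker": "XYZ", "insider_name": "B", "transaction_date": "2026-01-12", "transaction_type": "Buy"},
    {"ticker": "XYZ", "insider_name": "C", "transaction_date": "2026-01-20", "transaction_type": "Buy"},
    {"ticker": "ABC", "insider_name": "D", "transaction_date": "2026-01-05", "transaction_type": "Buy"},
    {"ticker": "ABC", "insider_name": "E", "transaction_date": "2026-01-06", "transaction_type": "Sell"},
    {"ticker": "ABC", "insider_name": "D", "transaction_date": "2026-01-09", "transaction_type": "Buy"},
]
print(cluster_buys(trades))
```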

Competitor Benchmarking

Corporate finance teams track industry peers to benchmark capital allocation, margin expansion, and debt-to-equity ratios over time.

Equity Research Automation

Research analysts automate the collection of DCF inputs, WACC estimates, and historical growth rates to accelerate report generation.

Risk Modeling

Risk departments monitor Altman Z-Scores and Beneish M-Scores across portfolios to detect bankruptcy risk or earnings manipulation.
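For context on what an Altman Z-Score threshold means, the sketch below computes the score from balance-sheet inputs using Altman's original 1968 formulation for public manufacturing firms, with its conventional cut-offs of 1.81 and 2.99. Whether Gurufocus applies exactly this variant is an assumption here; the function names and input numbers are illustrative.

```python
def altman_z(working_capital, retained_earnings, ebit, market_cap,
             sales, total_assets, total_liabilities):
    """Original Altman Z-Score: 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_cap / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    """Classify a score using the conventional cut-offs."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Made-up inputs in arbitrary units.
z = altman_z(
    working_capital=20, retained_earnings=30, ebit=15, market_cap=120,
    sales=100, total_assets=100, total_liabilities=60,
)
print(round(z, 3), z_zone(z))
```

Under these cut-offs, the `altman_z_score` of 6.84 in the sample valuation record above would fall in the "safe" zone.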

Why DataFlirt

"Gurufocus aggregates decades of SEC filings into clean valuation metrics, but running quantitative models requires raw access to the underlying 30-year time series."

Extracting deep financial history requires navigating complex JavaScript grids, aggressive rate limits, and dynamic chart hydration. DataFlirt handles the Cloudflare bypass, session rotation, and tabular normalisation so your quants can focus on alpha generation instead of DOM parsing.

Technical Spec

Gurufocus scraper - technical capabilities

Everything supported by our gurufocus.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering

Full Playwright sessions required for 30-year financial grids and charts

Supported

Cloudflare bypass

Automated solver and TLS fingerprinting to navigate bot protection

Supported

Residential proxy rotation

ISP-grade IPs rotated to avoid IP bans during heavy historical data extraction

Supported

30-year financial tables

Extraction of full historical time series for income, balance, and cash flow

Supported

GF Value charts

Extraction of historical fair value estimates and current GF scores

Supported

Insider trade pagination

Full historical logs of executive buying and selling activity

Supported

Change detection

Hash-based diffing to emit only new 13F filings or insider trades

Supported

Premium / Gated Guru data

Access to real-time premium guru trades requires paid account credentials

Partial

Excel Export downloads

We parse the DOM directly rather than triggering native file downloads

Partial

Infrastructure

Infrastructure powering the financial pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Playwright for Financial Grids

Financial tables on Gurufocus are heavily reliant on client-side rendering. We use Playwright to execute JavaScript, hydrate the DOM, and extract complete time series data accurately.

Residential Proxy Pools

We route requests through high-reputation US residential proxies to avoid rate limits and Cloudflare blocks when extracting decades of data across thousands of tickers.

Cloud-Native Orchestration

Pipelines run on Kubernetes and AWS Lambda. Apache Airflow manages execution schedules, retries, and data delivery to ensure your warehouse is updated before market open.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Nested structures ideal for complex financial time series

CSV

Flat tabular files ready for pandas or quantitative models

XLS

Excel compatible output for analyst consumption

Parquet

Columnar format optimised for BigQuery and Snowflake

AWS S3

Direct delivery to your cloud storage buckets

Webhook

HTTP POST for real-time insider trade alerts

API

Queryable REST endpoints for on-demand ticker data

PostgreSQL

Direct upsert into your relational database schemas

Direct bucket delivery — compatible with any data lake

// faq

Common questions.

About gurufocus.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping Gurufocus legal?

Scraping publicly available financial data is generally permissible. DataFlirt extracts only public, non-authenticated metrics, historical financials, and SEC-derived data. We do not bypass paywalls or extract premium-gated content without authorised credentials. Clients must ensure their use case complies with applicable terms of service.

How do you handle Cloudflare protection?

We use residential ISP proxies combined with Playwright browser sessions that mimic realistic TLS fingerprints and user behaviour. This allows us to navigate bot protection reliably without triggering CAPTCHAs or IP blocks.

Can you extract the full 30-year financial history?

Yes. We navigate the paginated and JavaScript-rendered financial grids to extract the complete available history for income statements, balance sheets, and cash flow statements.

Do you provide premium data?

No. We only scrape data that is publicly accessible on the free tier. If you require premium data, you must provide valid paid account credentials, and we can configure a dedicated authenticated pipeline for you.

How fresh is the data?

Pipelines can be configured to run daily or weekly. For time-sensitive data like insider trades or 13F filings, we can configure high-frequency monitoring on specific ticker lists.

Why scrape when APIs exist?

Financial APIs are often expensive, strictly rate-limited, and lack proprietary metrics like the GF Score or Peter Lynch charts. Scraping provides access to the exact data structures and proprietary valuation models visible on the site.

Can I get a sample of the financial data?

Yes. We offer a sample extraction of up to 50 tickers during the scoping phase so your quantitative team can validate the schema, accuracy, and completeness of the historical time series.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of insider trades or a complete historical dump of 30-year financials across 10,000 tickers - we build and manage the infrastructure. Tell us your requirements.

Start a gurufocus.com pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

