
Motley Fool data,
at warehouse scale.

We extract earnings transcripts, market analysis, ticker sentiment, and author publication histories from fool.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Articles extracted: 14.2K/day
Transcripts parsed: 1.8K/week
Ticker mentions: 89.4K/run
Active pipelines: 42
Uptime: 99.98%

Data Dictionary

Every field we extract from fool.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Articles & News objects from fool.com. All fields typed and schema-versioned.

article_id, url, headline, subheadline, author_name, author_url, publish_date, ticker_mentions, body_text, category, reading_time_minutes, disclosure_text

Sample Articles & News record:
"article_id": "mf-art-89210",
"url": "https://www.fool.com/investing/2026/04/12/why-apple-stock-jumped/",
"headline": "Why Apple Stock Jumped Today",
"author_name": "Jane Doe",
"publish_date": "2026-04-12T14:30:00Z",
"ticker_mentions": "['AAPL', 'MSFT']"

Complete list of extractable fields for Earnings Transcripts objects from fool.com. All fields typed and schema-versioned.

transcript_id, url, ticker, company_name, quarter, year, call_date, executives, analysts, presentation_text, qa_text, duration_minutes

Sample Earnings Transcripts record:
"transcript_id": "mf-trans-10492",
"ticker": "TSLA",
"quarter": "Q1",
"year": 2026,
"call_date": "2026-04-18T21:00:00Z",
"executives": "['Elon Musk', 'Zachary Kirkhorn']",
"analysts": "['Adam Jonas', 'Dan Ives']"

Complete list of extractable fields for Authors & Contributors objects from fool.com. All fields typed and schema-versioned.

author_id, name, profile_url, bio, twitter_handle, total_articles, first_publish_date, latest_publish_date, primary_sectors, disclosure_policy

Sample Authors & Contributors record:
"author_id": "auth-492",
"name": "John Smith",
"profile_url": "https://www.fool.com/author/johnsmith/",
"total_articles": 412,
"latest_publish_date": "2026-05-10T09:15:00Z",
"primary_sectors": "['Technology', 'Consumer Goods']",
"twitter_handle": "@jsmith_invests"

Complete list of extractable fields for Ticker Mentions objects from fool.com. All fields typed and schema-versioned.

mention_id, article_id, ticker, company_name, exchange, context_text, sentiment_score, bullish_keywords, bearish_keywords, mention_date

Sample Ticker Mentions record:
"mention_id": "ment-9912",
"article_id": "mf-art-89210",
"ticker": "AAPL",
"exchange": "NASDAQ",
"context_text": "Apple continues to show strong free cash flow generation...",
"mention_date": "2026-04-12T14:30:00Z",
"sentiment_score": 0.82

Complete list of extractable fields for Podcast Episodes objects from fool.com. All fields typed and schema-versioned.

episode_id, show_name, title, publish_date, duration, audio_url, summary, guest_names, ticker_tags, transcript_available

Sample Podcast Episodes record:
"episode_id": "pod-1029",
"show_name": "Motley Fool Money",
"title": "Tech Earnings Breakdown",
"publish_date": "2026-05-01T16:00:00Z",
"duration": "42:15",
"audio_url": "https://media.fool.com/podcasts/mfm-20260501.mp3",
"ticker_tags": "['AMZN', 'GOOGL']"

Capabilities

Extracting financial media at scale

Our fool.com scraper parses complex article DOMs, maps inline ticker mentions, and structures earnings transcripts into clean, speaker-attributed JSON.

Full Article Extraction

Capture headlines, subheadlines, author metadata, publication timestamps, and complete body text from free market analysis articles.

Earnings Transcript Parsing

Extract Q1-Q4 earnings calls with structured separation between prepared remarks and Q&A sessions, including speaker attribution.

Ticker Entity Resolution

Map inline company mentions to standard exchange tickers (e.g., NASDAQ:AAPL) for downstream quantitative analysis.

Author History Tracking

Monitor analyst publication frequency, sector focus, and historical accuracy across their entire fool.com portfolio.

Disclosure Extraction

Capture mandatory financial interest statements and position disclosures appended to analyst articles.

Historical Archives

Execute deep crawls of historical market coverage dating back years to build comprehensive NLP training corpora.

Podcast Metadata

Extract show notes, audio URLs, guest lists, and ticker tags from Motley Fool Money and Rule Breaker Investing podcasts.

Category Taxonomy

Map articles to internal site categories like growth stocks, dividend investing, retirement, and personal finance.

Scheduled Updates

Configure hourly polling for breaking market coverage or weekly batches for earnings transcript aggregation.

// engagement pipeline

From target tickers to warehouse records

Brief in. Clean data out.

Define Scope
Day 0

Provide target tickers, author URLs, or category sections. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, and text parsing logic for fool.com's specific DOM structures.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and transcript speaker attribution verification before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
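
The null-rate check named in the Validation & QA step can be pictured roughly as follows. This is a minimal sketch, not our production code: the function names, the sample batch, and the 0.3 threshold are assumptions chosen for illustration.

```python
def null_rates(records, fields):
    """Return, per field, the fraction of records where it is missing or empty."""
    total = len(records)
    rates = {}
    for field in fields:
        # Treat absent keys, None, empty strings, and empty lists as nulls.
        missing = sum(1 for r in records if r.get(field) in (None, "", []))
        rates[field] = missing / total if total else 0.0
    return rates


def failing_fields(rates, threshold):
    """List the fields whose null rate exceeds the allowed threshold."""
    return sorted(name for name, rate in rates.items() if rate > threshold)


# Hypothetical batch; IDs and values are illustrative only.
batch = [
    {"article_id": "a-1", "headline": "Example headline", "author_name": "Jane Doe"},
    {"article_id": "a-2", "headline": "", "author_name": "Jane Doe"},
    {"article_id": "a-3", "headline": "Another headline", "author_name": None},
    {"article_id": "a-4", "headline": "Third headline"},
]
rates = null_rates(batch, ["article_id", "headline", "author_name"])
blocked = failing_fields(rates, threshold=0.3)
```

A batch whose `blocked` list is non-empty would be held back for review rather than delivered.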

Under the hood

Handling financial media extraction

Extracting structured text from fool.com requires parsing complex DOM structures, handling pagination, and standardising entity references.

pipeline-monitor · fool.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Transcript parsing
Structured speaker attribution

Earnings transcripts are often published as flat HTML. We apply regex and DOM-traversal logic to separate prepared remarks from Q&A, mapping paragraphs to specific executives and analysts.
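
To make the idea concrete, here is a minimal sketch of regex-based speaker attribution over already-extracted text lines. The section headings ("Prepared Remarks:", "Questions and Answers:") and the "Name -- Role" speaker-line format are assumptions about the page layout, and the names and quotes are placeholders.

```python
import re

# Assumed section headings; the real page may label these differently.
SECTION_HEADINGS = {
    "prepared remarks:": "presentation",
    "questions and answers:": "qa",
    "questions & answers:": "qa",
}
# Assumed speaker line, e.g. "Jane Doe -- Chief Executive Officer".
SPEAKER_LINE = re.compile(
    r"^(?P<name>[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)+) -- (?P<role>.+)$"
)


def parse_transcript(lines):
    """Split lines into presentation and Q&A paragraphs tagged with a speaker."""
    section, speaker = "presentation", None
    result = {"presentation": [], "qa": []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.lower() in SECTION_HEADINGS:
            section, speaker = SECTION_HEADINGS[line.lower()], None
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            speaker = match.group("name")  # following paragraphs belong to them
            continue
        result[section].append({"speaker": speaker, "text": line})
    return result


sample = [
    "Prepared Remarks:",
    "Jane Doe -- Chief Executive Officer",
    "Thanks, everyone, for joining.",
    "Questions and Answers:",
    "John Smith -- Example Bank -- Analyst",
    "Can you talk about margins?",
    "Jane Doe -- Chief Executive Officer",
    "Sure.",
]
parsed = parse_transcript(sample)
```

Paragraphs that appear before any speaker line keep `speaker` as `None`, which makes unattributed text easy to flag during QA.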

Anti-bot layer
IP rotation and rate limiting

While less aggressive than e-commerce platforms, financial media sites still implement rate limiting. We utilise rotating IP pools and randomised request delays to maintain stable extraction without triggering WAF blocks.
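
As an illustration of rotating IPs with randomised delays, the sketch below builds a request plan that cycles through a proxy pool and draws a jittered wait for each request. The proxy addresses, the 2-second base delay, and the function name are hypothetical.

```python
import itertools
import random


def request_plan(urls, proxies, base_delay=2.0, spread=0.5, seed=None):
    """Pair each URL with the next proxy in the pool and a randomised delay."""
    rng = random.Random(seed)
    proxy_cycle = itertools.cycle(proxies)  # round-robin rotation
    low, high = base_delay * (1 - spread), base_delay * (1 + spread)
    plan = []
    for url in urls:
        plan.append({
            "url": url,
            "proxy": next(proxy_cycle),
            "delay": rng.uniform(low, high),  # seconds to wait before the request
        })
    return plan


# Hypothetical proxy endpoints, for illustration only.
proxies = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
urls = [
    "https://www.fool.com/investing/2026/04/12/why-apple-stock-jumped/",
    "https://www.fool.com/author/johnsmith/",
    "https://www.fool.com/investing/",
]
plan = request_plan(urls, proxies, seed=7)
```

In a Scrapy project, the delay half of this is usually covered by the built-in `DOWNLOAD_DELAY` and `RANDOMIZE_DOWNLOAD_DELAY` settings rather than hand-written code.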

Ticker normalisation
Standardising financial entities

Articles reference companies via various formats. We parse internal links and metadata tags to ensure every mention is mapped to a standardised exchange ticker for easy database joining.
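
A minimal sketch of link-based ticker resolution follows. It assumes internal quote links take the form `/quote/<exchange>/<symbol>/`; that URL pattern is an assumption and should be checked against the live site before use.

```python
import re

# Assumed internal link shape: /quote/<exchange>/<symbol>/
QUOTE_LINK = re.compile(
    r"/quote/(?P<exchange>[a-z]+)/(?P<symbol>[a-z0-9.\-]+)/?", re.IGNORECASE
)


def tickers_from_links(hrefs):
    """Map quote links to EXCHANGE:SYMBOL strings, keeping first-seen order."""
    resolved = []
    for href in hrefs:
        match = QUOTE_LINK.search(href)
        if not match:
            continue  # not a quote link, e.g. an article URL
        ticker = f"{match.group('exchange').upper()}:{match.group('symbol').upper()}"
        if ticker not in resolved:
            resolved.append(ticker)
    return resolved


hrefs = [
    "https://www.fool.com/quote/nasdaq/aapl/",
    "/quote/nasdaq/msft/",
    "/investing/2026/04/12/why-apple-stock-jumped/",
    "/quote/nasdaq/aapl/",
]
tickers = tickers_from_links(hrefs)
```

Emitting `EXCHANGE:SYMBOL` strings keeps the join key unambiguous when the same symbol is listed on more than one exchange.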

Change detection
Tracking post-publish edits

Financial articles are frequently updated post-publication. We maintain hash indexes of article body text to detect edits and emit updated records to your pipeline.
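
Hash-based edit detection can be sketched in a few lines. The in-memory dictionary stands in for whatever index store is actually used, and the whitespace normalisation step is an assumption that avoids flagging purely cosmetic markup changes.

```python
import hashlib


def body_hash(body_text):
    """Hash the article body after collapsing runs of whitespace."""
    normalised = " ".join(body_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_change(index, article_id, body_text):
    """Return 'new', 'updated', or 'unchanged', and record the latest hash."""
    digest = body_hash(body_text)
    previous = index.get(article_id)
    index[article_id] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"


index = {}  # stand-in for a persistent hash index
first = detect_change(index, "mf-art-89210", "Apple shares rose today.")
second = detect_change(index, "mf-art-89210", "Apple  shares rose\ntoday.")
third = detect_change(index, "mf-art-89210", "Apple shares rose sharply today.")
```

Only records that come back as `updated` would need to be re-emitted downstream.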

Pagination handling
Deep crawling author histories

Extracting an author's complete history requires traversing hundreds of paginated index pages. Our crawlers handle infinite scroll and standard pagination to ensure complete historical capture.
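
The traversal logic for standard pagination can be sketched independently of any HTTP client. Here `fetch_page` is a hypothetical callable that returns the article URLs on a page plus the next-page URL (or `None`); the `?page=` URLs and the page contents are made up for the example.

```python
def crawl_index(start_url, fetch_page, max_pages=1000):
    """Follow next-page links from start_url and collect unique article URLs."""
    seen_pages, seen_articles, articles = set(), set(), []
    url = start_url
    # Stop on a missing next link, a revisited page, or the page budget.
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        page_articles, url = fetch_page(url)
        for article in page_articles:
            if article not in seen_articles:
                seen_articles.add(article)
                articles.append(article)
    return articles


# Fake index pages standing in for real responses.
base = "https://www.fool.com/author/johnsmith/"
pages = {
    base: (["/a1/", "/a2/"], base + "?page=2"),
    base + "?page=2": (["/a2/", "/a3/"], base + "?page=3"),
    base + "?page=3": (["/a4/"], base),  # loops back to page 1
}
collected = crawl_index(base, lambda u: pages[u])
```

The visited-page set guards against index pages that link back to themselves, and `max_pages` caps the cost of a runaway crawl.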

Applications

Who uses Motley Fool data — and how

Teams across industries use fool.com data to build competitive products and smarter operations.

01
NLP Training

Machine learning teams use historical article archives as a corpus for training financial language models.

02
Sentiment Analysis

Quantitative funds track ticker sentiment over time by parsing bullish and bearish keywords in daily market coverage.

03
Earnings Call Intelligence

Analysts extract Q&A signals from structured earnings transcripts to evaluate management tone and analyst skepticism.

04
Quantitative Trading

Systematic traders use publication volume and ticker mention frequency as alternative data inputs for momentum models.

05
Author Tracking

Media monitoring teams track specific analysts to measure their accuracy and influence on retail trading volume.

06
Competitor Intelligence

Public companies monitor media share of voice and sentiment compared to their direct industry peers.

Why DataFlirt

"Financial media is unstructured by nature. Extracting clean, speaker-attributed earnings transcripts and ticker-mapped sentiment requires precise DOM parsing."

DataFlirt handles the heavy lifting of parsing fool.com's article structures, standardising ticker mentions, and formatting earnings transcripts into clean JSON. Your quantitative analysts receive warehouse-ready text corpora without writing a single line of web scraping code or managing proxy infrastructure.

Technical Spec

Fool scraper — technical capabilities

What our fool.com scraper supports, including dynamically rendered content and rate-limit handling, and where authentication walls limit coverage.

Public article extraction (Supported): Full text, author, and date metadata for all free market news
Earnings call transcripts (Supported): Structured parsing of Q1-Q4 calls with speaker attribution
Ticker mention mapping (Supported): Resolution of inline company mentions to standard exchange tickers
Author profile scraping (Supported): Extraction of analyst bios and complete publication histories
Podcast metadata (Supported): Show notes and audio URLs for investing podcasts
Historical archive crawling (Supported): Deep pagination traversal for building historical NLP corpora
Post-publish edit tracking (Supported): Hash-based diffing to detect updates to existing articles
Webhook delivery (Supported): HTTP POST for real-time article ingestion
Motley Fool Stock Advisor picks (Partial): Premium subscription content requires authenticated access
Rule Breakers portfolio allocations (Partial): Gated premium investment models and specific stock recommendations

Infrastructure

Infrastructure powering the Fool pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles high-throughput text extraction and pagination, while Playwright manages dynamic content loading where necessary.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies to bypass rate limits during deep historical archive crawls.

Cloud-Native Orchestration

Pipelines run on AWS ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — ideal for unstructured text
CSV
Flat file with typed columns for basic metadata
XLS
Excel compatible format for manual analyst review
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoint for querying historical extractions
BigQuery
Streamed directly into your dataset with schema auto-detect
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
// faq

Common questions.

About fool.com scraping, legality, and pipeline operations.

Is scraping fool.com legal?

Scraping publicly available information from fool.com is generally permissible under applicable law. DataFlirt targets only public, non-authenticated articles and transcripts. We do not circumvent paywalls to access premium Stock Advisor or Rule Breakers content. Clients should review Terms of Service and consult legal counsel for specific use cases.

How do you format earnings transcripts?

We parse the raw HTML text to identify speaker transitions. The output JSON separates the presentation segment from the Q&A segment, and attributes every paragraph to the specific executive or analyst speaking.

Can you extract historical articles?

Yes. We can execute deep crawls of author indexes or category archives to extract articles dating back several years, providing a robust corpus for backtesting or NLP model training.

Do you scrape premium Stock Advisor picks?

No. DataFlirt does not extract data that is placed behind authentication walls or paid subscription paywalls.

How fast is new market analysis captured?

For active monitoring pipelines, we configure hourly polling on target author feeds or category pages. Newly published articles are extracted and delivered via Webhook or S3 within minutes of being detected by a polling run.

Can I filter data by specific tickers?

Yes. You can provide a defined list of target tickers. We will configure the pipeline to only extract articles, transcripts, or podcast metadata where those specific tickers are mentioned or tagged.
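
A ticker filter of this kind could look roughly like the sketch below. It assumes list-valued fields such as `ticker_mentions` and `ticker_tags` have already been decoded into real lists, and the record IDs are illustrative.

```python
TICKER_FIELDS = ("ticker", "ticker_mentions", "ticker_tags")


def record_tickers(record):
    """Collect every ticker a record carries, across the known ticker fields."""
    found = set()
    for field in TICKER_FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            found.add(value.upper())
        elif isinstance(value, (list, tuple, set)):
            found.update(str(item).upper() for item in value)
    return found


def filter_by_tickers(records, targets):
    """Keep only records that mention at least one target ticker."""
    wanted = {ticker.upper() for ticker in targets}
    return [r for r in records if record_tickers(r) & wanted]


records = [
    {"id": "mf-art-89210", "ticker_mentions": ["AAPL", "MSFT"]},
    {"id": "mf-trans-10492", "ticker": "TSLA"},
    {"id": "pod-1029", "ticker_tags": ["AMZN", "GOOGL"]},
]
kept = filter_by_tickers(records, ["aapl", "amzn"])
```

Matching is case-insensitive, so a target list can be supplied in whatever casing your team already uses.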

Do you extract author disclosure statements?

Yes. We capture the standard disclosure text typically appended to the bottom of fool.com articles, which outlines the author's personal positions in the mentioned securities.

$ dataflirt scope --new-project --source=fool.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical corpus of earnings transcripts or a continuous feed of ticker sentiment — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h