RUN · 42 active pipelines · fool.com live

Motley Fool data, at warehouse scale.

We extract earnings transcripts, market analysis, ticker sentiment, and author publication histories from fool.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Articles extracted: 14.2K /day

Transcripts parsed: 1.8K /week

Ticker mentions: 89.4K /run

Active pipelines: 42

Uptime: 99.98%

◆ Earnings Call Transcripts ◆ Market News Articles ◆ Ticker Sentiment Mapping ◆ Author Publication History ◆ Sector Analysis Reports ◆ Historical Article Archives ◆ Investing Podcast Metadata ◆ Dividend Stock Coverage ◆ Growth Stock Analysis ◆ Managed Pipeline Delivery ◆ S3 / BigQuery Integration ◆ Bengaluru HQ SLA

Data Dictionary

Every field we extract from fool.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Articles & News objects from fool.com. All fields typed and schema-versioned.

article_id, url, headline, subheadline, author_name, author_url, publish_date, ticker_mentions, body_text, category, reading_time_minutes, disclosure_text

{
  "article_id": "mf-art-89210",
  "url": "https://www.fool.com/investing/2026/04/12/why-apple-stock-jumped/",
  "headline": "Why Apple Stock Jumped Today",
  "author_name": "Jane Doe",
  "publish_date": "2026-04-12T14:30:00Z",
  "ticker_mentions": ["AAPL", "MSFT"]
}

Complete list of extractable fields for Earnings Transcripts objects from fool.com. All fields typed and schema-versioned.

transcript_id, url, ticker, company_name, quarter, year, call_date, executives, analysts, presentation_text, qa_text, duration_minutes

{
  "transcript_id": "mf-trans-10492",
  "ticker": "TSLA",
  "quarter": "Q1",
  "year": 2026,
  "call_date": "2026-04-18T21:00:00Z",
  "executives": ["Elon Musk", "Zachary Kirkhorn"],
  "analysts": ["Adam Jonas", "Dan Ives"]
}

Complete list of extractable fields for Authors & Contributors objects from fool.com. All fields typed and schema-versioned.

author_id, name, profile_url, bio, twitter_handle, total_articles, first_publish_date, latest_publish_date, primary_sectors, disclosure_policy

{
  "author_id": "auth-492",
  "name": "John Smith",
  "profile_url": "https://www.fool.com/author/johnsmith/",
  "total_articles": 412,
  "latest_publish_date": "2026-05-10T09:15:00Z",
  "primary_sectors": ["Technology", "Consumer Goods"],
  "twitter_handle": "@jsmith_invests"
}

Complete list of extractable fields for Ticker Mentions objects from fool.com. All fields typed and schema-versioned.

mention_id, article_id, ticker, company_name, exchange, context_text, sentiment_score, bullish_keywords, bearish_keywords, mention_date

{
  "mention_id": "ment-9912",
  "article_id": "mf-art-89210",
  "ticker": "AAPL",
  "exchange": "NASDAQ",
  "context_text": "Apple continues to show strong free cash flow generation...",
  "mention_date": "2026-04-12T14:30:00Z",
  "sentiment_score": 0.82
}

Complete list of extractable fields for Podcast Episodes objects from fool.com. All fields typed and schema-versioned.

episode_id, show_name, title, publish_date, duration, audio_url, summary, guest_names, ticker_tags, transcript_available

{
  "episode_id": "pod-1029",
  "show_name": "Motley Fool Money",
  "title": "Tech Earnings Breakdown",
  "publish_date": "2026-05-01T16:00:00Z",
  "duration": "42:15",
  "audio_url": "https://media.fool.com/podcasts/mfm-20260501.mp3",
  "ticker_tags": ["AMZN", "GOOGL"]
}

Capabilities

Extracting financial media at scale

Our fool.com scraper parses complex article DOMs, maps inline ticker mentions, and structures earnings transcripts into clean, speaker-attributed JSON.

Full Article Extraction

Capture headlines, subheadlines, author metadata, publication timestamps, and complete body text from free (non-paywalled) market analysis articles.

Earnings Transcript Parsing

Extract Q1-Q4 earnings calls with structured separation between prepared remarks and Q&A sessions, including speaker attribution.

Ticker Entity Resolution

Map inline company mentions to standard exchange tickers (e.g., NASDAQ:AAPL) for downstream quantitative analysis.

Author History Tracking

Monitor analyst publication frequency, sector focus, and historical accuracy across their entire fool.com portfolio.

Disclosure Extraction

Capture mandatory financial interest statements and position disclosures appended to analyst articles.

Historical Archives

Execute deep crawls of historical market coverage dating back years to build comprehensive NLP training corpora.

Podcast Metadata

Extract show notes, audio URLs, guest lists, and ticker tags from Motley Fool Money and Rule Breaker Investing podcasts.

Category Taxonomy

Map articles to internal site categories like growth stocks, dividend investing, retirement, and personal finance.

Scheduled Updates

Configure hourly polling for breaking market coverage or weekly batches for earnings transcript aggregation.

// engagement pipeline

From target tickers to warehouse records

Brief in. Clean data out.

Define Scope

Day 0

Provide target tickers, author URLs, or category sections. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, proxy rotation, and text parsing logic for fool.com's specific DOM structures.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and transcript speaker attribution verification before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
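The null-rate check in the Validation & QA step can be pictured with a short Python sketch. This is illustrative only and not DataFlirt's production code: the helper names `null_rates` and `failing_fields`, the sample records, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import Counter


def null_rates(records, fields):
    """Return the fraction of records in which each field is missing or empty."""
    if not records:
        return {field: 0.0 for field in fields}
    missing = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            # Treat absent keys, None, empty strings and empty lists as nulls.
            if value is None or value == "" or value == []:
                missing[field] += 1
    return {field: missing[field] / len(records) for field in fields}


def failing_fields(rates, threshold):
    """List the fields whose null rate exceeds the agreed threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


# Hypothetical sample batch, shaped like the Articles & News records.
sample = [
    {"article_id": "mf-art-1", "headline": "A", "subheadline": None},
    {"article_id": "mf-art-2", "headline": "", "subheadline": None},
    {"article_id": "mf-art-3", "headline": "C", "subheadline": "c"},
    {"article_id": "mf-art-4", "headline": "D"},
]
rates = null_rates(sample, ["article_id", "headline", "subheadline"])
flagged = failing_fields(rates, threshold=0.5)
```

A batch would typically be held back from launch while `flagged` is non-empty for any field the client has marked as required.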

Under the hood

Handling financial media extraction

Extracting structured text from fool.com requires parsing complex DOM structures, handling pagination, and standardising entity references.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued · running

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Transcript parsing

Structured speaker attribution

Earnings transcripts are often published as flat HTML. We apply regex and DOM-traversal logic to separate prepared remarks from Q&A, mapping paragraphs to specific executives and analysts.
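As a rough illustration of the regex side of this step, the Python sketch below splits an already text-extracted transcript into prepared remarks and Q&A and attributes each paragraph to a speaker. The input layout is an assumption made for the example (a "Name -- Role" line opens each speaker turn, and a "Questions & Answers:" line opens the Q&A), as are the function name `parse_transcript` and the fictional speakers. Real pages would need their own patterns.

```python
import re

# Assumed layout: a speaker turn opens with "Name -- Role" on its own line.
SPEAKER_LINE = re.compile(
    r"^(?P<name>[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)*) -- (?P<role>.+)$"
)
# Assumed marker line that separates prepared remarks from the Q&A.
QA_MARKER = re.compile(r"^questions\s*(?:&|and)\s*answers:?$", re.IGNORECASE)


def parse_transcript(text):
    """Split transcript text into sections of speaker-attributed paragraphs."""
    sections = {"presentation": [], "qa": []}
    current_section = "presentation"
    speaker = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if QA_MARKER.match(line):
            current_section = "qa"
            speaker = None
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            speaker = {"name": match["name"], "role": match["role"]}
            continue
        if speaker is None:
            continue  # Headings or text before the first speaker are skipped.
        sections[current_section].append({**speaker, "text": line})
    return sections


sample_text = """Prepared Remarks:
Jane Example -- Chief Executive Officer
Revenue grew this quarter.
We expect margins to hold.
Questions & Answers:
Sam Analyst -- Example Bank -- Analyst
What drove the growth?
Jane Example -- Chief Executive Officer
Mostly new customers.
"""
result = parse_transcript(sample_text)
```

Here `result["presentation"]` holds the two prepared-remarks paragraphs and `result["qa"]` holds the analyst question followed by the executive's answer, which would map onto the `presentation_text` and `qa_text` fields.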

Anti-bot layer

IP rotation and rate limiting

While less aggressive than e-commerce platforms, financial media sites still implement rate limiting. We utilise rotating IP pools and randomised request delays to maintain stable extraction without triggering WAF blocks.
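The request-pacing part of this can be sketched in Python. The setting names below are real Scrapy settings, but every value is an illustrative assumption rather than DataFlirt's configuration, and `jittered_delay` and `backoff_delay` are hypothetical helpers that show the arithmetic behind randomised delays and retry backoff. Proxy rotation is not shown.

```python
import random

# Real Scrapy setting names; the values are assumptions for illustration.
POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 2.0,              # base wait between requests, seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,   # wait 0.5x to 1.5x of DOWNLOAD_DELAY
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
    "AUTOTHROTTLE_ENABLED": True,       # adapt delay to observed latency
    "AUTOTHROTTLE_START_DELAY": 2.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "RETRY_HTTP_CODES": [429, 503],     # retry when the site signals overload
}


def jittered_delay(base_seconds, rng=random):
    """Pick a wait between 0.5x and 1.5x of the base delay."""
    return base_seconds * rng.uniform(0.5, 1.5)


def backoff_delay(attempt, base_seconds=2.0, cap_seconds=60.0):
    """Exponential backoff for retries after a 429 or 503, with a cap."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


example_wait = jittered_delay(POLITE_SETTINGS["DOWNLOAD_DELAY"], random.Random(0))
```

Jitter keeps request timing from forming a fixed rhythm, and the capped backoff slows the crawl down when the site starts returning rate-limit responses.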

Ticker normalisation

Standardising financial entities

Articles reference companies via various formats. We parse internal links and metadata tags to ensure every mention is mapped to a standardised exchange ticker for easy database joining.
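One way to picture the link-parsing half of this is the Python sketch below, which reads anchor tags and turns quote-page links into `EXCHANGE:SYMBOL` strings. The `/quote/<exchange>/<symbol>/` URL shape is an assumption made for the example, and `TickerLinkParser` and `extract_tickers` are hypothetical names. Mentions without a link would need a separate lookup.

```python
import re
from html.parser import HTMLParser

# Assumed URL shape for quote pages: /quote/<exchange>/<symbol>/
QUOTE_PATH = re.compile(
    r"/quote/(?P<exchange>[a-z]+)/(?P<symbol>[a-z0-9.\-]+)/?$", re.IGNORECASE
)


class TickerLinkParser(HTMLParser):
    """Collect unique EXCHANGE:SYMBOL strings from quote links, in page order."""

    def __init__(self):
        super().__init__()
        self.tickers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Drop any query string before matching the path.
        match = QUOTE_PATH.search(href.split("?")[0])
        if match:
            ticker = f"{match['exchange'].upper()}:{match['symbol'].upper()}"
            if ticker not in self.tickers:
                self.tickers.append(ticker)


def extract_tickers(html):
    """Return the standardised tickers linked from an article body."""
    parser = TickerLinkParser()
    parser.feed(html)
    return parser.tickers


sample_html = (
    '<p><a href="https://www.fool.com/quote/nasdaq/aapl/">Apple</a> and '
    '<a href="/quote/nasdaq/msft/?x=1">Microsoft</a>, then '
    '<a href="/quote/nasdaq/aapl/">Apple</a> again. '
    '<a href="/investing/">More</a></p>'
)
tickers = extract_tickers(sample_html)
```

Deduplicating while preserving order gives a stable `ticker_mentions` list that joins cleanly against a reference table keyed on exchange and symbol.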

Change detection

Tracking post-publish edits

Financial articles are frequently updated post-publication. We maintain hash indexes of article body text to detect edits and emit updated records to your pipeline.
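A minimal Python sketch of hash-based edit detection follows. It assumes an in-memory dict stands in for the hash index and that whitespace differences should not count as edits. The function names `body_hash` and `detect_change` are hypothetical.

```python
import hashlib
import re


def body_hash(body_text):
    """Hash the article body after collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", body_text).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_change(index, article_id, body_text):
    """Return 'new', 'updated' or 'unchanged', and store the latest hash."""
    digest = body_hash(body_text)
    previous = index.get(article_id)
    index[article_id] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"


# The dict is a stand-in for a persistent index such as a database table.
hash_index = {}
first = detect_change(hash_index, "mf-art-89210", "Apple rose 3% today.")
second = detect_change(hash_index, "mf-art-89210", "Apple  rose 3% today.\n")
third = detect_change(hash_index, "mf-art-89210", "Apple rose 4% today.")
```

Only a result of `"updated"` would trigger a fresh record downstream, so re-crawling an unchanged article costs one hash comparison and emits nothing.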

Pagination handling

Deep crawling author histories

Extracting an author's complete history requires traversing hundreds of paginated index pages. Our crawlers handle infinite scroll and standard pagination to ensure complete historical capture.
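The traversal logic can be sketched in Python as a generator that follows next-page links until none remain. The `fetch_page` callable and its return shape (`article_urls`, `next_url`) are assumptions made for the example. An infinite-scroll listing is treated here as a paginated one, on the assumption that it is backed by a page-by-page endpoint.

```python
def crawl_paginated(start_url, fetch_page, max_pages=1000):
    """Yield article URLs, following next-page links until they run out.

    fetch_page is a hypothetical callable that takes a URL and returns
    {"article_urls": [...], "next_url": str or None}.
    """
    seen_pages = set()
    url = start_url
    # Stop on a missing next link, a revisited page, or the page budget.
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        page = fetch_page(url)
        yield from page["article_urls"]
        url = page.get("next_url")


# Fake index pages for the example; page 3 links back to page 2 on purpose.
fake_pages = {
    "/author/x/": {"article_urls": ["a1", "a2"], "next_url": "/author/x/?page=2"},
    "/author/x/?page=2": {"article_urls": ["a3"], "next_url": "/author/x/?page=3"},
    "/author/x/?page=3": {"article_urls": ["a4"], "next_url": "/author/x/?page=2"},
}
collected = list(crawl_paginated("/author/x/", fake_pages.__getitem__))
```

The visited-page set guards against listing pages that loop back on themselves, and `max_pages` bounds the crawl if a site keeps generating new page URLs.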

Applications

Who uses Motley Fool data — and how

Teams across industries use fool.com data to build competitive products and smarter operations.

NLP Training

Machine learning teams use historical article archives as a corpus for training financial language models.

Sentiment Analysis

Quantitative funds track ticker sentiment over time by parsing bullish and bearish keywords in daily market coverage.
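A toy Python version of keyword-based scoring is shown below to make the idea concrete. It is not DataFlirt's scoring method: the keyword sets, the score formula, and the -1 to 1 range are all assumptions, and only the output field names (`sentiment_score`, `bullish_keywords`, `bearish_keywords`) come from the data dictionary.

```python
import re

# Illustrative keyword lists; a real deployment would curate these.
BULLISH = {"beat", "growth", "strong", "upgrade", "record"}
BEARISH = {"miss", "decline", "weak", "downgrade", "lawsuit"}


def keyword_sentiment(context_text):
    """Score text from -1 (all bearish hits) to 1 (all bullish hits)."""
    words = re.findall(r"[a-z']+", context_text.lower())
    positive = sum(word in BULLISH for word in words)
    negative = sum(word in BEARISH for word in words)
    total = positive + negative
    score = 0.0 if total == 0 else (positive - negative) / total
    return {
        "sentiment_score": score,
        "bullish_keywords": sorted({w for w in words if w in BULLISH}),
        "bearish_keywords": sorted({w for w in words if w in BEARISH}),
    }


scored = keyword_sentiment(
    "Apple continues to show strong free cash flow generation "
    "despite weak iPhone growth."
)
```

Averaging such scores per ticker per day gives the kind of time series a fund could track, though keyword counts miss negation and context that a trained model would catch.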

Earnings Call Intelligence

Analysts extract Q&A signals from structured earnings transcripts to evaluate management tone and analyst skepticism.

Quantitative Trading

Systematic traders use publication volume and ticker mention frequency as alternative data inputs for momentum models.

Author Tracking

Media monitoring teams track specific analysts to measure their accuracy and influence on retail trading volume.

Competitor Intelligence

Public companies monitor media share of voice and sentiment compared to their direct industry peers.

Why DataFlirt

"Financial media is unstructured by nature. Extracting clean, speaker-attributed earnings transcripts and ticker-mapped sentiment requires precise DOM parsing."

DataFlirt handles the heavy lifting of parsing fool.com's article structures, standardising ticker mentions, and formatting earnings transcripts into clean JSON. Your quantitative analysts receive warehouse-ready text corpora without writing a single line of web scraping code or managing proxy infrastructure.

Technical Spec

Fool scraper — technical capabilities

What our fool.com scraper supports for public content, and where gated premium content limits coverage.

Public article extraction · Full text, author, and date metadata for all free market news · Supported

Earnings call transcripts · Structured parsing of Q1-Q4 calls with speaker attribution · Supported

Ticker mention mapping · Resolution of inline company mentions to standard exchange tickers · Supported

Author profile scraping · Extraction of analyst bios and complete publication histories · Supported

Podcast metadata · Show notes and audio URLs for investing podcasts · Supported

Historical archive crawling · Deep pagination traversal for building historical NLP corpora · Supported

Post-publish edit tracking · Hash-based diffing to detect updates to existing articles · Supported

Webhook delivery · HTTP POST for real-time article ingestion · Supported

Motley Fool Stock Advisor picks · Premium subscription content requires authenticated access · Partial

Rule Breakers portfolio allocations · Gated premium investment models and specific stock recommendations · Partial

Infrastructure

Infrastructure powering the Fool pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Stack

Scrapy handles high-throughput text extraction and pagination, while Playwright manages dynamic content loading where necessary.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies to bypass rate limits during deep historical archive crawls.

Cloud-Native Orchestration

Pipelines run on AWS ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — ideal for unstructured text

CSV

Flat file with typed columns for basic metadata

XLS

Excel compatible format for manual analyst review

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

API

REST endpoint for querying historical extractions

BigQuery

Streamed directly into your dataset with schema auto-detect

Snowflake

Stage + COPY INTO workflow — incremental or full-replace

// faq

Common questions.

About fool.com scraping, legality, and pipeline operations.

Is scraping fool.com legal?

Scraping publicly available information from fool.com is generally permissible under applicable law. DataFlirt targets only public, non-authenticated articles and transcripts. We do not circumvent paywalls to access premium Stock Advisor or Rule Breakers content. Clients should review Terms of Service and consult legal counsel for specific use cases.

How do you format earnings transcripts?

We parse the raw HTML text to identify speaker transitions. The output JSON separates the presentation segment from the Q&A segment, and attributes every paragraph to the specific executive or analyst speaking.

Can you extract historical articles?

Yes. We can execute deep crawls of author indexes or category archives to extract articles dating back several years, providing a robust corpus for backtesting or NLP model training.

Do you scrape premium Stock Advisor picks?

No. DataFlirt does not extract data that is placed behind authentication walls or paid subscription paywalls.

How fast is new market analysis captured?

For active monitoring pipelines, we configure hourly polling on target author feeds or category pages. Newly published articles are extracted at the next poll and delivered via Webhook or S3 within minutes of detection.

Can I filter data by specific tickers?

Yes. You can provide a defined list of target tickers. We will configure the pipeline to only extract articles, transcripts, or podcast metadata where those specific tickers are mentioned or tagged.
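As a sketch of what that filter amounts to, the Python below keeps only records whose ticker fields intersect a target list. The field names `ticker_mentions`, `ticker_tags`, and `ticker` come from the data dictionary. The helper names and sample records are assumptions for the example.

```python
def mentions_target(record, targets):
    """True if any ticker on the record is in the upper-cased target set."""
    tickers = record.get("ticker_mentions") or record.get("ticker_tags") or []
    if record.get("ticker"):
        # Transcripts carry a single ticker field rather than a list.
        tickers = [*tickers, record["ticker"]]
    return any(ticker.upper() in targets for ticker in tickers)


def filter_by_tickers(records, target_tickers):
    """Keep articles, transcripts, or episodes that mention a target ticker."""
    targets = {ticker.upper() for ticker in target_tickers}
    return [record for record in records if mentions_target(record, targets)]


# Hypothetical mixed batch of the three record types.
batch = [
    {"article_id": "mf-art-89210", "ticker_mentions": ["AAPL", "MSFT"]},
    {"transcript_id": "mf-trans-10492", "ticker": "TSLA"},
    {"episode_id": "pod-1029", "ticker_tags": ["AMZN", "GOOGL"]},
]
kept = filter_by_tickers(batch, ["aapl", "TSLA"])
```

With the target list above, the article and the transcript are kept and the podcast episode is dropped.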

Do you extract author disclosure statements?

Yes. We capture the standard disclosure text typically appended to the bottom of fool.com articles, which outlines the author's personal positions in the mentioned securities.

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical corpus of earnings transcripts or a continuous feed of ticker sentiment — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h
