We extract earnings transcripts, market analysis, ticker sentiment, and author publication histories from fool.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Articles & News objects from fool.com. All fields typed and schema-versioned.
{ "article_id": "mf-art-89210", "url": "https://www.fool.com/investing/2026/04/12/why-apple-stock-jumped/", "headline": "Why Apple Stock Jumped Today", "author_name": "Jane Doe", "publish_date": "2026-04-12T14:30:00Z", "ticker_mentions": "['AAPL', 'MSFT']" }
| # | article_id | url | headline | subheadline | author_name | author_url |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Earnings Transcripts objects from fool.com. All fields typed and schema-versioned.
{ "transcript_id": "mf-trans-10492", "ticker": "TSLA", "quarter": "Q1", "year": 2026, "call_date": "2026-04-18T21:00:00Z", "executives": "['Elon Musk', 'Zachary Kirkhorn']", "analysts": "['Adam Jonas', 'Dan Ives']" }
| # | transcript_id | url | ticker | company_name | quarter | year |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Authors & Contributors objects from fool.com. All fields typed and schema-versioned.
{ "author_id": "auth-492", "name": "John Smith", "profile_url": "https://www.fool.com/author/johnsmith/", "total_articles": 412, "latest_publish_date": "2026-05-10T09:15:00Z", "primary_sectors": "['Technology', 'Consumer Goods']", "twitter_handle": "@jsmith_invests" }
| # | author_id | name | profile_url | bio | twitter_handle | total_articles |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Ticker Mentions objects from fool.com. All fields typed and schema-versioned.
{ "mention_id": "ment-9912", "article_id": "mf-art-89210", "ticker": "AAPL", "exchange": "NASDAQ", "context_text": "Apple continues to show strong free cash flow generation...", "mention_date": "2026-04-12T14:30:00Z", "sentiment_score": 0.82 }
| # | mention_id | article_id | ticker | company_name | exchange | context_text |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Podcast Episodes objects from fool.com. All fields typed and schema-versioned.
{ "episode_id": "pod-1029", "show_name": "Motley Fool Money", "title": "Tech Earnings Breakdown", "publish_date": "2026-05-01T16:00:00Z", "duration": "42:15", "audio_url": "https://media.fool.com/podcasts/mfm-20260501.mp3", "ticker_tags": "['AMZN', 'GOOGL']" }
| # | episode_id | show_name | title | publish_date | duration | audio_url |
|---|---|---|---|---|---|---|
Our fool.com scraper parses complex article DOMs, maps inline ticker mentions, and structures earnings transcripts into clean, speaker-attributed JSON.
Capture headlines, subheadlines, author metadata, publication timestamps, and complete body text from free market analysis articles.
Extract Q1-Q4 earnings calls with structured separation between prepared remarks and Q&A sessions, including speaker attribution.
Map inline company mentions to standard exchange tickers (e.g., NASDAQ:AAPL) for downstream quantitative analysis.
Monitor analyst publication frequency, sector focus, and historical accuracy across their entire fool.com portfolio.
Capture mandatory financial interest statements and position disclosures appended to analyst articles.
Execute deep crawls of historical market coverage dating back years to build comprehensive NLP training corpora.
Extract show notes, audio URLs, guest lists, and ticker tags from Motley Fool Money and Rule Breaker Investing podcasts.
Map articles to internal site categories like growth stocks, dividend investing, retirement, and personal finance.
Configure hourly polling for breaking market coverage or weekly batches for earnings transcript aggregation.
Brief in. Clean data out.
Provide target tickers, author URLs, or category sections. We design the extraction schema together.
We configure Scrapy crawlers, proxy rotation, and text parsing logic for fool.com's specific DOM structures.
Schema validation, null-rate checks, and transcript speaker attribution verification before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
Extracting structured text from fool.com requires parsing complex DOM structures, handling pagination, and standardising entity references.
Earnings transcripts are often published as flat HTML. We apply regex and DOM-traversal logic to separate prepared remarks from Q&A, mapping paragraphs to specific executives and analysts.
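The separation step above can be sketched in a few lines. This is a minimal illustration, not the production parser: it assumes each speaker turn opens with a `Name -- Title` line and that the Q&A section starts with a "Questions and Answers" heading. Both markers, and the function name `split_transcript`, are hypothetical.

```python
import re

# Assumed heading that opens the Q&A section (hypothetical transcript format).
QA_HEADING = re.compile(r"^\s*questions?\s*(?:and|&)\s*answers?\s*:?\s*$", re.IGNORECASE)
# Assumed speaker line: "Name -- Title" on a line of its own.
SPEAKER_LINE = re.compile(r"^(?P<name>[A-Z][\w.'\- ]+?)\s+--\s+(?P<role>.+)$")


def split_transcript(text):
    """Return a list of speaker turns tagged with their section."""
    section = "prepared_remarks"
    turns = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if QA_HEADING.match(line):
            # Everything after this heading belongs to the Q&A section.
            section = "qa"
            current = None
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            current = {
                "section": section,
                "speaker": match.group("name"),
                "role": match.group("role"),
                "text": "",
            }
            turns.append(current)
            continue
        if current is not None:
            # Attribute the paragraph to the most recent speaker.
            current["text"] = (current["text"] + " " + line).strip()
    return turns
```

Lines that appear before the first speaker line are dropped, and a body paragraph that happens to contain ` -- ` would be misread as a speaker line, so a real parser would also need DOM cues rather than text patterns alone.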
While less aggressive than e-commerce platforms, financial media sites still implement rate limiting. We utilise rotating IP pools and randomised request delays to maintain stable extraction without triggering WAF blocks.
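As a rough illustration of randomised request delays, the snippet below uses standard Scrapy setting names. The values are placeholders rather than tuned figures for fool.com, and the `delay_bounds` helper is a hypothetical addition that only makes the resulting delay window explicit. Proxy rotation is not built into Scrapy and would need a separate middleware, which is omitted here.

```python
# Hypothetical per-spider settings; the keys are standard Scrapy settings,
# the values are illustrative only.
CUSTOM_SETTINGS = {
    "DOWNLOAD_DELAY": 2.0,                # base delay in seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # wait 0.5x to 1.5x the base delay
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 2.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "RETRY_TIMES": 3,
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],
}


def delay_bounds(settings):
    """Return the (min, max) delay in seconds implied by the settings."""
    base = settings["DOWNLOAD_DELAY"]
    if settings.get("RANDOMIZE_DOWNLOAD_DELAY", True):
        return (0.5 * base, 1.5 * base)
    return (base, base)
```

In a Scrapy project this dictionary could be assigned to a spider's `custom_settings` attribute; with the values above, requests to one domain would be spaced between 1 and 3 seconds apart before AutoThrottle adjusts them.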
Articles reference companies via various formats. We parse internal links and metadata tags to ensure every mention is mapped to a standardised exchange ticker for easy database joining.
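One way to picture the link-based mapping: the sketch below assumes that company links follow a `/quote/<exchange>/<symbol>/` path pattern. That pattern and the helper name `normalise_ticker` are assumptions made for illustration; metadata-tag parsing is not shown.

```python
import re

# Assumed quote-link pattern: /quote/<exchange>/<symbol>/ (hypothetical).
QUOTE_PATH = re.compile(
    r"/quote/(?P<exchange>[a-z]+)/(?P<symbol>[a-z0-9.\-]+)/?",
    re.IGNORECASE,
)


def normalise_ticker(href):
    """Map a company link to 'EXCHANGE:SYMBOL', or None if it is not a quote link."""
    match = QUOTE_PATH.search(href)
    if match is None:
        return None
    return f"{match.group('exchange').upper()}:{match.group('symbol').upper()}"
```

For example, `normalise_ticker("https://www.fool.com/quote/nasdaq/aapl/")` returns `"NASDAQ:AAPL"`, which can then serve as a join key against a securities master table.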
Financial articles are frequently updated post-publication. We maintain hash indexes of article body text to detect edits and emit updated records to your pipeline.
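A minimal sketch of hash-based edit detection, assuming an in-memory dictionary stands in for the hash index and that whitespace-only changes should not count as edits. The function names are hypothetical.

```python
import hashlib


def body_fingerprint(body):
    """SHA-256 of the body text with runs of whitespace collapsed."""
    normalised = " ".join(body.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_change(index, article_id, body):
    """Return 'new', 'updated' or 'unchanged', and record the latest hash."""
    digest = body_fingerprint(body)
    previous = index.get(article_id)
    index[article_id] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"
```

A pipeline built this way would emit a record downstream only when the result is `"new"` or `"updated"`; in production the index would live in a persistent store rather than a dictionary.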
Extracting an author's complete history requires traversing hundreds of paginated index pages. Our crawlers handle infinite scroll and standard pagination to ensure complete historical capture.
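The traversal loop can be sketched independently of any HTTP client. The example assumes a caller-supplied `fetch_page` function that returns a dictionary with `article_urls` and `next_url` keys; that interface is hypothetical. Infinite scroll could be modelled the same way if `next_url` is derived from the cursor of the underlying data request.

```python
def crawl_author_index(fetch_page, start_url, max_pages=1000):
    """Yield article URLs from a paginated index, following next-page links."""
    seen = set()
    url = start_url
    pages = 0
    # Stop on a missing next link, a repeated page (cycle), or the page cap.
    while url and url not in seen and pages < max_pages:
        seen.add(url)
        pages += 1
        page = fetch_page(url)  # {"article_urls": [...], "next_url": str or None}
        for article_url in page["article_urls"]:
            yield article_url
        url = page.get("next_url")
```

The `seen` set and `max_pages` cap guard against pagination loops, which matters when an index spans hundreds of pages.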
Machine learning teams use historical article archives as a corpus for training financial language models.
Quantitative funds track ticker sentiment over time by parsing bullish and bearish keywords in daily market coverage.
Analysts extract Q&A signals from structured earnings transcripts to evaluate management tone and analyst skepticism.
Systematic traders use publication volume and ticker mention frequency as alternative data inputs for momentum models.
Media monitoring teams track specific analysts to measure their accuracy and influence on retail trading volume.
Public companies monitor media share of voice and sentiment compared to their direct industry peers.
"Financial media is unstructured by nature. Extracting clean, speaker-attributed earnings transcripts and ticker-mapped sentiment requires precise DOM parsing."
DataFlirt handles the heavy lifting of parsing fool.com's article structures, standardising ticker mentions, and formatting earnings transcripts into clean JSON. Your quantitative analysts receive warehouse-ready text corpora without writing a single line of web scraping code or managing proxy infrastructure.
Everything supported by our fool.com scraper: rendered SPA elements, pagination, rate-limit handling and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles high-throughput text extraction and pagination, while Playwright manages dynamic content loading where necessary.
We maintain pools of residential ISP proxies to bypass rate limits during deep historical archive crawls.
Pipelines run on AWS ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About fool.com scraping, legality, and pipeline operations.
Scraping publicly available information from fool.com is generally permissible under applicable law. DataFlirt targets only public, non-authenticated articles and transcripts. We do not circumvent paywalls to access premium Stock Advisor or Rule Breakers content. Clients should review Terms of Service and consult legal counsel for specific use cases.
We parse the raw HTML text to identify speaker transitions. The output JSON separates the presentation segment from the Q&A segment, and attributes every paragraph to the specific executive or analyst speaking.
Yes. We can execute deep crawls of author indexes or category archives to extract articles dating back several years, providing a robust corpus for backtesting or NLP model training.
No. DataFlirt does not extract data that is placed behind authentication walls or paid subscription paywalls.
For active monitoring pipelines, we configure hourly polling on target author feeds or category pages. Newly published articles are extracted and delivered via Webhook or S3 within minutes of publication.
Yes. You can provide a defined list of target tickers. We will configure the pipeline to only extract articles, transcripts, or podcast metadata where those specific tickers are mentioned or tagged.
Yes. We capture the standard disclosure text typically appended to the bottom of fool.com articles, which outlines the author's personal positions in the mentioned securities.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical corpus of earnings transcripts or a continuous feed of ticker sentiment — we scope, build, and operate the pipeline. Tell us what you need.