
Financial Times data,
at warehouse scale.

We extract global market news, corporate tearsheets, economic indicators, and Lex column analysis from ft.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Articles extracted: 4,192/day
Market tickers: 84,201/run
Company tearsheets: 18,400/week
Active pipelines: 64
Uptime: 99.98%

Data Dictionary

Every field we extract from ft.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for News Articles objects from ft.com. All fields typed and schema-versioned.

article_id, headline, subheadline, author, published_date, updated_date, topic_tags, body_text_summary, word_count, paywall_status
news_articles
● 200 OK
{
  "article_id": "0b1a2c3d-4e5f-6g7h-8i9j",
  "headline": "Global markets rally on inflation data",
  "author": "Katie Martin",
  "published_date": "2026-05-12T08:30:00Z",
  "topic_tags": "['Equities', 'Inflation', 'Global Economy']",
  "paywall_status": "hard"
}
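In the sample record above, topic_tags appears as a string-encoded list rather than a native JSON array. Assuming delivered records use the same encoding (an assumption based only on that sample), a consumer could decode such fields along these lines. The helper name parse_list_field is hypothetical.

```python
import ast


def parse_list_field(value):
    """Decode a string-encoded list such as "['MSFT', 'GOOGL']" into a Python list."""
    if isinstance(value, list):
        return value  # already a native list, nothing to decode
    # literal_eval only evaluates Python literals, so it never runs arbitrary code.
    parsed = ast.literal_eval(value)
    if not isinstance(parsed, list):
        raise ValueError(f"expected a list literal, got {type(parsed).__name__}")
    return parsed


# Value copied from the sample record.
record = {"topic_tags": "['Equities', 'Inflation', 'Global Economy']"}
tags = parse_list_field(record["topic_tags"])
```

The same helper would apply to other string-encoded list fields, such as companies_mentioned and tickers_mentioned in the Lex column sample.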

Complete list of extractable fields for Market Data objects from ft.com. All fields typed and schema-versioned.

ticker, exchange, company_name, current_price, currency, price_change_abs, price_change_pct, volume, market_cap, pe_ratio, dividend_yield, 52_week_high, 52_week_low
market_data
● 200 OK
{
  "ticker": "AAPL",
  "exchange": "NSQ",
  "current_price": 185.42,
  "price_change_pct": 1.24,
  "volume": 45210000,
  "market_cap": "2.8T",
  "pe_ratio": 28.4
}
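The sample shows market_cap as an abbreviated string ("2.8T") while the other numeric fields are plain numbers. If you need it numeric for queries, one possible conversion is sketched below. The suffix table and the function name parse_abbreviated_amount are assumptions for illustration; the sample records only show the "T" and "B" suffixes.

```python
from decimal import Decimal

# Hypothetical suffix table; only "T" and "B" appear in the sample records.
_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_abbreviated_amount(text):
    """Convert strings like "2.8T" or "60.1B" into whole currency units."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        # Decimal avoids binary floating-point rounding on values like 2.8.
        return int(Decimal(text[:-1]) * _SUFFIXES[text[-1]])
    return int(Decimal(text))  # no suffix: truncates any fractional part


market_cap = parse_abbreviated_amount("2.8T")
```

The same helper would handle other abbreviated amounts, for example a value such as "60.1B".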

Complete list of extractable fields for Company Tearsheets objects from ft.com. All fields typed and schema-versioned.

company_id, name, sector, industry, description, hq_location, employees, revenue_ttm, net_income, total_assets, key_executives, website
company_tearsheets
● 200 OK
{
  "company_id": "847291",
  "name": "Unilever PLC",
  "sector": "Consumer Defensive",
  "revenue_ttm": "60.1B",
  "net_income": "7.6B",
  "hq_location": "London, UK"
}

Complete list of extractable fields for Lex Column objects from ft.com. All fields typed and schema-versioned.

lex_id, title, teaser, published_date, companies_mentioned, tickers_mentioned, primary_sector, sentiment_score, word_count
lex_column
● 200 OK
{
  "lex_id": "lex-998877",
  "title": "Tech valuations: back to reality",
  "published_date": "2026-05-11T14:00:00Z",
  "companies_mentioned": "['Microsoft', 'Alphabet']",
  "tickers_mentioned": "['MSFT', 'GOOGL']",
  "sentiment_score": -0.45
}

Complete list of extractable fields for Economic Indicators objects from ft.com. All fields typed and schema-versioned.

country, indicator_name, current_value, previous_value, unit, frequency, release_date, next_release_date, source_agency
economic_indicators
● 200 OK
{
  "country": "United Kingdom",
  "indicator_name": "CPI YoY",
  "current_value": 2.1,
  "previous_value": 2.3,
  "unit": "Percentage",
  "release_date": "2026-05-10T07:00:00Z"
}

Capabilities

Financial intelligence: structured and delivered

Our FT scraper processes high-velocity news cycles, complex market data tables, and corporate tearsheets. We handle session management, dynamic charts, and anti-bot circumvention.

Global News Extraction

Headlines, metadata, summaries, and topic tags extracted across all geographic and sector-specific news feeds.

Lex Column Parsing

Opinion and analysis targeting specific tickers, captured with author metadata and publication timestamps.

Market Data Tracking

Equities, commodities, and FX prices captured from FT's market data portal with full historical snapshots.

Corporate Tearsheets

Fundamentals, key executives, and corporate descriptions parsed from nested HTML financial tables.

Author & Topic Tagging

Track specific journalists or macro themes across the entire ft.com domain.

Economic Calendar

Central bank rates, inflation data, and GDP prints structured into queryable time-series data.

ESG Metrics

Sustainability reporting data and corporate governance news mentions, extracted and structured.

M&A Intelligence

Deal announcements, valuations, and advisor metadata parsed from the deals section.

Scheduled + Streaming Modes

Run intraday updates for breaking news or daily historical dumps for quantitative modelling.

// engagement pipeline

From FT URL to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide topics, tickers, authors, or market indices. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, and session handling for ft.com.

Validation & QA
Days 4–6

Schema validation, null-rate monitoring, ticker mapping, and sample records before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
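The null-rate monitoring named in the Validation & QA step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production check: the function names, the threshold value, and the placeholder records (TICKER_B and so on) are all hypothetical.

```python
def null_rates(records, fields):
    """Return, per field, the fraction of records where it is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def fields_over_threshold(rates, threshold):
    """List the fields whose null rate exceeds the agreed threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


# Placeholder records for illustration only; not real extracted data.
sample = [
    {"ticker": "AAPL", "pe_ratio": 28.4},
    {"ticker": "TICKER_B", "pe_ratio": None},
    {"ticker": "TICKER_C"},
    {"ticker": "TICKER_D", "pe_ratio": 1.0},
]
rates = null_rates(sample, ["ticker", "pe_ratio"])
flagged = fields_over_threshold(rates, threshold=0.25)
```

Here pe_ratio is null or absent in two of four records, so it is the only flagged field. In practice the threshold would be agreed per field during scoping.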

Under the hood

Handling FT's technical complexity

Financial Times employs strict paywalls, complex dynamic data visualisations, and aggressive bot mitigation. Here is how we maintain pipeline stability.

pipeline-monitor · ft.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Paywall state management
Detecting hard vs soft paywalls

We identify paywall states dynamically and extract all public metadata, tags, and summaries without violating access controls.

JavaScript rendering
Playwright execution for dynamic charts

Market data and interactive charts require full DOM hydration. We run headless Playwright sessions to capture data that standard HTTP requests miss.

Residential proxy rotation
UK and US residential IPs

Datacenter IPs are blocked instantly. We route requests through residential ISP proxies to maintain high success rates and avoid rate limits.

Complex table parsing
Normalising nested HTML tables

Corporate tearsheets use complex, frequently changing table structures. Our selectors normalise these into flat, predictable JSON schemas.

High-frequency polling
Change detection for breaking news

We maintain hash indexes of article states to detect updates and corrections in real time, pushing only the diffs to your warehouse.
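The hash-index approach to change detection described above can be sketched as follows. This is an illustration under assumptions rather than the production implementation: the set of tracked fields, the in-memory dict standing in for the hash index, and the function names are all hypothetical. Note that this sketch emits whole changed records, not field-level diffs.

```python
import hashlib
import json

# Hypothetical choice of fields whose changes count as an article update.
TRACKED_FIELDS = ("headline", "subheadline", "updated_date", "body_text_summary")


def content_hash(record):
    """Hash a canonical JSON form of the tracked fields."""
    payload = {field: record.get(field) for field in TRACKED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(records, hash_index):
    """Return new or changed records, updating hash_index in place."""
    changed = []
    for record in records:
        digest = content_hash(record)
        if hash_index.get(record["article_id"]) != digest:
            hash_index[record["article_id"]] = digest
            changed.append(record)
    return changed


index = {}  # stands in for a persistent store such as Redis
first = {"article_id": "a1", "headline": "Global markets rally on inflation data"}
first_pass = detect_changes([first], index)    # new article: emitted
second_pass = detect_changes([first], index)   # unchanged: nothing emitted
corrected = dict(first, headline="Global markets rally on inflation data (corrected)")
third_pass = detect_changes([corrected], index)  # headline changed: emitted again
```

Hashing a canonical serialisation (sorted keys, fixed separators) keeps the digest stable when field order varies between crawls.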

Applications

Who uses FT data and how

Teams across industries use ft.com data to build competitive products and smarter operations.

01
Algorithmic Trading

Quantitative funds run sentiment analysis on breaking news and Lex columns to inform high-frequency trading models.

02
Competitor Intelligence

Corporate strategy teams monitor sector-specific news, executive moves, and M&A activity.

03
Macroeconomic Forecasting

Economists track global economic indicators and central bank commentary to adjust macro models.

04
ESG Monitoring

Asset managers aggregate sustainability reports and corporate governance news for portfolio screening.

05
Credit Risk Assessment

Risk teams monitor negative news flow and market data for corporate debt issuers.

06
Investment Research

Analysts feed quantitative models with corporate fundamental data extracted from FT tearsheets.

Why DataFlirt

"Financial Times dictates the narrative for global markets. Without structured extraction, quantitative teams miss the critical sentiment signals embedded in the Lex column and breaking news."

Extracting data from ft.com requires navigating strict access controls, dynamic market widgets, and aggressive rate limiting. DataFlirt manages the infrastructure layer (proxy rotation, session handling, and schema maintenance) so your quantitative analysts can focus on signal generation rather than DOM parsing.

Technical Spec

FT scraper: technical capabilities

Everything supported by our ft.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability | Details | Status
JavaScript rendering | Playwright sessions for dynamic market data and interactive charts | Supported
Residential proxy rotation | UK and US ISP proxies rotated per request | Supported
Metadata & Tag extraction | Authors, topics, and sectors mapped per article | Supported
Corporate tearsheet parsing | Fundamentals, executives, and historical pricing | Supported
Change detection | Hash-based diffs to emit only new or updated articles | Supported
Webhook delivery | HTTP POST for real-time news alerts | Supported
Historical archive | Extraction spanning 10+ years where public metadata exists | Supported
Full-text article extraction | Premium subscriber-only deep archives without client credentials | Partial
MyFT personalised feeds | Custom user feeds requiring authenticated sessions | Partial

Infrastructure

Infrastructure powering the FT pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, BeautifulSoup, Celery

Scrapy + Playwright Stack

Scrapy handles orchestration and deduplication. Playwright handles JavaScript rendering and interaction flows for market data.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across UK and US regions. Rotation happens per request with sticky sessions where required.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.
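Per-request rotation with sticky sessions, as described under Residential Proxy Infrastructure, might look like the sketch below. It is a simplified illustration: the ProxyPool class, its method names, and the proxy URLs are hypothetical, and a real pool would also handle health checks and session expiry.

```python
import itertools


class ProxyPool:
    """Round-robin proxy selection with optional sticky sessions."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = itertools.cycle(proxies)
        self._sticky = {}  # session key -> pinned proxy

    def next_proxy(self, session_key=None):
        if session_key is None:
            return next(self._cycle)  # rotate on every request
        if session_key not in self._sticky:
            self._sticky[session_key] = next(self._cycle)
        return self._sticky[session_key]  # same proxy for the whole session

    def release(self, session_key):
        """Unpin a session so its next request rotates again."""
        self._sticky.pop(session_key, None)


# Placeholder proxy URLs for illustration only.
pool = ProxyPool(["http://proxy-uk-1.example:8000", "http://proxy-us-1.example:8000"])
a = pool.next_proxy()
b = pool.next_proxy()
s1 = pool.next_proxy(session_key="tearsheet-session")
s2 = pool.next_proxy(session_key="tearsheet-session")
```

Stateless requests get a different proxy each time, while requests that share a session key keep one exit IP until the session is released.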

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested
CSV: Flat file with typed columns
XLS: Excel format for analyst review
Parquet: Columnar format for BigQuery and Snowflake
AWS S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time alerts
API: REST endpoints for on-demand querying
BigQuery: Streamed directly into your dataset
Snowflake: Stage and COPY INTO workflow

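For readers unfamiliar with the newline-delimited JSON option listed above, a minimal sketch of the format follows. The function name to_ndjson is hypothetical, and the second record is a placeholder rather than real data.

```python
import json


def to_ndjson(records):
    """Serialise records as newline-delimited JSON: one object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


records = [
    # First record reuses values from the economic_indicators sample.
    {"country": "United Kingdom", "indicator_name": "CPI YoY", "current_value": 2.1},
    # Placeholder record for illustration only.
    {"country": "Example Country", "indicator_name": "Example Indicator", "current_value": None},
]
ndjson = to_ndjson(records)

# Each line parses on its own, so consumers can stream the file line by line.
parsed = [json.loads(line) for line in ndjson.splitlines()]
```

Because every line is a complete JSON object, files in this layout can be appended to and read incrementally, which tends to suit warehouse bulk loads better than one large nested document.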
// faq

Common questions.

About ft.com scraping, legality, and pipeline operations.

Is scraping ft.com legal?

Scraping publicly available metadata, headlines, and market data is generally permissible. We do not bypass paywalls to extract gated full-text content without client-provided credentials. Clients must review FT terms of service and consult legal counsel for their specific use case.

Do you bypass the FT paywall?

No. For unauthenticated pipelines, we only extract publicly visible metadata, summaries, tags, and market data. Full-text extraction requires the client to supply valid FT enterprise credentials for an isolated, authenticated pipeline.

How fast can you deliver breaking news?

For targeted sections or specific tickers, we can configure sub-minute polling intervals with webhook delivery, so your trading models receive each signal as soon as the next poll detects a change.

Can you extract data from the Lex column?

Yes. We extract Lex column metadata, publication timestamps, author details, and the specific companies or tickers mentioned, which is highly valuable for sentiment analysis.

Do you scrape market data and corporate tearsheets?

Yes. We parse the FT market data portal to extract equity pricing, corporate fundamentals, key executives, and historical performance metrics.

What formats do you deliver?

We deliver structured JSON, CSV, XLS, and Parquet files directly to AWS S3, Google BigQuery, or Snowflake. We also support Webhooks and API endpoints.

Can I provide my own FT credentials?

Yes. If your organisation has an enterprise FT subscription that permits automated access, we can configure an authenticated pipeline using your credentials in an isolated environment.

$ dataflirt scope --new-project --source=ft.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off historical news dump or a continuous market data feed, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h