
Superinvestor data, at warehouse scale.

We extract 13F filings, portfolio updates, insider trading signals, and holding histories from Dataroma. Data is delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Portfolios tracked: 84
13F filings extracted: 4,192 per quarter
Insider trades: 12,405 per month
Active pipelines: 41
Uptime: 99.98%
Data Dictionary

Every field we extract from dataroma.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Portfolio Holdings objects from dataroma.com. All fields typed and schema-versioned.

Fields: investor_name, ticker, company_name, portfolio_pct, shares_held, reported_value, recent_activity, quarter, filing_date

Sample record (portfolio_holdings):
{
  "investor_name": "Warren Buffett - Berkshire Hathaway",
  "ticker": "AAPL",
  "company_name": "Apple Inc.",
  "portfolio_pct": 42.9,
  "shares_held": 905560000,
  "reported_value": 156234000000,
  "recent_activity": "Reduce 1.2%",
  "quarter": "Q4 2025"
}
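The portfolio_holdings fields are described as typed and schema-versioned. As a minimal sketch, assuming Python dataclasses, a typed record with explicit casting might look like the following. The `PortfolioHolding` class and its `schema_version` tag are hypothetical names for illustration, not DataFlirt's published schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, ClassVar, Optional


@dataclass(frozen=True)
class PortfolioHolding:
    """One portfolio_holdings row (hypothetical class, illustrative only)."""

    # Hypothetical schema tag; ClassVar keeps it out of the per-record fields
    schema_version: ClassVar[str] = "1.0"

    investor_name: str
    ticker: str
    company_name: str
    portfolio_pct: float
    shares_held: int
    reported_value: int
    recent_activity: Optional[str]
    quarter: str
    filing_date: Optional[date] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "PortfolioHolding":
        """Cast each field explicitly so a mis-typed value fails at load time."""
        filing = raw.get("filing_date")
        return cls(
            investor_name=str(raw["investor_name"]),
            ticker=str(raw["ticker"]).upper(),
            company_name=str(raw["company_name"]),
            portfolio_pct=float(raw["portfolio_pct"]),
            shares_held=int(raw["shares_held"]),
            reported_value=int(raw["reported_value"]),
            recent_activity=raw.get("recent_activity"),
            quarter=str(raw["quarter"]),
            # ISO dates such as "2026-02-14"; missing dates stay None
            filing_date=date.fromisoformat(filing) if filing else None,
        )
```

Casting every field on load means a value such as "905,560,000" arriving as text raises an error immediately instead of silently reaching a downstream model.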

Complete list of extractable fields for Insider Trades objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, insider_name, relationship, transaction_date, transaction_type, shares_traded, price_per_share, total_value, filing_date

Sample record (insider_trades):
{
  "ticker": "MSFT",
  "company_name": "Microsoft Corp.",
  "insider_name": "Nadella Satya",
  "relationship": "CEO",
  "transaction_date": "2026-01-14",
  "transaction_type": "Sell",
  "shares_traded": 120000,
  "price_per_share": 385.42,
  "total_value": 46250400
}
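In the sample insider trade, total_value equals shares_traded × price_per_share (120,000 × 385.42 = 46,250,400). A QA step could enforce that relationship on every record. The sketch below does so with `Decimal` arithmetic to avoid float rounding; the function name and the one-dollar tolerance are assumptions, not documented rules.

```python
from decimal import Decimal, ROUND_HALF_UP


def total_value_matches(record: dict, tolerance: Decimal = Decimal("1.00")) -> bool:
    """Check that total_value agrees with shares_traded * price_per_share.

    The tolerance (in dollars) is an assumed allowance for source-side rounding.
    """
    # str() first so 385.42 becomes exactly Decimal("385.42"), not a binary float
    shares = Decimal(str(record["shares_traded"]))
    price = Decimal(str(record["price_per_share"]))
    expected = (shares * price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    reported = Decimal(str(record["total_value"]))
    return abs(expected - reported) <= tolerance


sample = {"shares_traded": 120000, "price_per_share": 385.42, "total_value": 46250400}
print(total_value_matches(sample))  # True: 120,000 x 385.42 = 46,250,400
```

A record that fails this check usually points to a misaligned column or a dropped digit during extraction, which is exactly the kind of error worth catching before delivery.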

Complete list of extractable fields for Superinvestor Activity objects from dataroma.com. All fields typed and schema-versioned.

Fields: investor_name, quarter, ticker, action_type, shares_traded, price_avg, portfolio_impact, filing_date, total_value

Sample record (superinvestor_activity):
{
  "investor_name": "Bill Ackman - Pershing Square",
  "quarter": "Q4 2025",
  "ticker": "GOOGL",
  "action_type": "Buy",
  "shares_traded": 1500000,
  "price_avg": 142.5,
  "portfolio_impact": 2.1,
  "filing_date": "2026-02-14"
}

Complete list of extractable fields for Grand Portfolio objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, superinvestor_count, total_shares_held, ownership_pct, current_price, pe_ratio, forward_pe, market_cap

Sample record (grand_portfolio):
{
  "ticker": "META",
  "company_name": "Meta Platforms",
  "superinvestor_count": 34,
  "total_shares_held": 45200000,
  "ownership_pct": 1.7,
  "current_price": 485.2,
  "pe_ratio": 24.5,
  "market_cap": 1240000000000
}
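One plausible way to derive superinvestor_count and total_shares_held from portfolio_holdings rows is to count distinct investors and sum shares per ticker. This is a sketch of that aggregation, not a description of how Dataroma computes its Grand Portfolio; the function name `consensus_counts` is hypothetical, and ownership_pct is left out because it would need shares outstanding from another source.

```python
from collections import defaultdict


def consensus_counts(holdings: list[dict]) -> dict[str, dict]:
    """Aggregate portfolio_holdings rows into per-ticker consensus metrics.

    superinvestor_count: number of distinct investors holding the ticker.
    total_shares_held: sum of shares_held across those rows.
    """
    investors: dict[str, set[str]] = defaultdict(set)
    shares: dict[str, int] = defaultdict(int)
    for row in holdings:
        ticker = row["ticker"]
        investors[ticker].add(row["investor_name"])
        shares[ticker] += int(row["shares_held"])
    return {
        ticker: {
            "superinvestor_count": len(investors[ticker]),
            "total_shares_held": shares[ticker],
        }
        for ticker in investors
    }
```

Counting distinct names rather than rows keeps an investor who appears twice for one ticker (for example, across share classes) from inflating the consensus count.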

Complete list of extractable fields for S&P 500 Grid objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, sector, current_price, high_52w, low_52w, superinvestor_buys, superinvestor_sells, net_activity, volume

Sample record (s&p_500 grid):
{
  "ticker": "AMZN",
  "company_name": "Amazon.com Inc.",
  "sector": "Consumer Discretionary",
  "current_price": 178.4,
  "superinvestor_buys": 12,
  "superinvestor_sells": 4,
  "net_activity": "Strong Buy",
  "volume": 42000000
}

Capabilities

Everything you need from Dataroma, nothing you do not

Our Dataroma scraper parses nested HTML tables, normalises financial metrics, and tracks historical 13F filings with precision. Built for quant funds and financial analysts.

Superinvestor Portfolios

Extract complete holdings, portfolio percentages, and recent activity for every tracked investor on Dataroma.

Insider Trading Feeds

Capture CEO, CFO, and director transactions, including share counts, average prices, and total transaction values.

Grand Portfolio Consensus

Aggregate data across all superinvestors to identify highly concentrated consensus trades and ownership metrics.

Historical 13F Parsing

Traverse historical pagination to extract quarter-over-quarter portfolio adjustments and long-term holding strategies.

Ticker-Level Aggregation

Pull every superinvestor transaction and insider trade associated with a specific stock ticker in a single unified view.

S&P 500 Grid Extraction

Extract the Dataroma S&P 500 grid matrix, capturing sector activity and aggregate buy/sell signals.

Real-Time Filing Detection

Monitor Dataroma for new 13F updates during SEC filing season and push updates directly to your warehouse.

Change Detection (Diffs)

Run continuous pipelines that only emit new trades or portfolio adjustments, reducing downstream processing costs.

Automated Pagination Handling

Navigate deep historical transaction pages automatically, ensuring no trade is missed in your dataset.
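Historical 13F parsing compares positions quarter over quarter. Assuming the recent_activity strings in the sample record (such as "Reduce 1.2%") express the percentage change in shares held, a hypothetical helper could derive them from two consecutive quarters' share counts. Dataroma's exact wording and rounding are assumptions here.

```python
from typing import Optional


def activity_label(prev_shares: Optional[int], curr_shares: Optional[int]) -> str:
    """Label a quarter-over-quarter position change (hypothetical format).

    None or 0 means the investor did not hold the stock in that quarter.
    """
    prev = prev_shares or 0
    curr = curr_shares or 0
    if prev == 0 and curr > 0:
        return "Buy"  # new position
    if prev > 0 and curr == 0:
        return "Sell"  # position fully exited
    if prev == curr:
        return "Unchanged"
    # Both quarters non-zero and different: report the relative change in shares
    change_pct = abs(curr - prev) / prev * 100
    verb = "Add" if curr > prev else "Reduce"
    return f"{verb} {change_pct:.1f}%"
```

For example, a position that falls from 1,000 to 988 shares yields "Reduce 1.2%", matching the shape of the sample value.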

// engagement pipeline

From ticker list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide target investors, tickers, or specific data grids. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, and HTML table parsers specifically for Dataroma's DOM structure.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and financial metric normalisation before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
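The Validation & QA step includes financial metric normalisation. Values scraped from HTML usually arrive as display strings, so the sketch below converts them to `Decimal`. It assumes formats like "$156.23B", "42.90%", and "(1,234)"; the accepted formats are assumptions, not a catalogue of what Dataroma actually renders.

```python
import re
from decimal import Decimal
from typing import Optional

# Multipliers for abbreviated amounts such as "$156.23B" (assumed display format)
_SUFFIXES = {"K": Decimal(10) ** 3, "M": Decimal(10) ** 6,
             "B": Decimal(10) ** 9, "T": Decimal(10) ** 12}
# Optional "(", "-", "$"; digits with thousands separators; optional suffix, ")", "%"
_NUMBER = re.compile(r"^\(?-?\$?([\d,]*\.?\d+)([KMBT]?)\)?%?$")


def normalise_metric(text: str) -> Optional[Decimal]:
    """Convert a displayed financial figure into a Decimal.

    Percentages are returned as the percentage number ("42.90%" -> 42.90),
    accounting-style "(1,234)" becomes negative, and "", "-", "N/A" become None.
    """
    cleaned = text.strip().upper()
    if cleaned in {"", "-", "N/A"}:
        return None
    match = _NUMBER.match(cleaned)
    if not match:
        raise ValueError(f"unrecognised metric: {text!r}")
    value = Decimal(match.group(1).replace(",", ""))
    value *= _SUFFIXES.get(match.group(2), Decimal(1))
    negative = cleaned.startswith("(") or cleaned.startswith("-")
    return -value if negative else value
```

Raising on anything unrecognised, rather than returning None, keeps a new display format from quietly turning into nulls.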

Under the hood

How our Dataroma pipeline handles the hard parts

Financial data requires absolute precision. Here is how we maintain data integrity across Dataroma's nested HTML structures.

Pipeline monitor (dataroma.com, live, active)
Identity rotation (fingerprinting): TLS fingerprint randomised, user-agent rotated, residential IP pool, challenges blocked: 0
Page coverage (pagination): 48,291 pages queued, running
Pipeline health (observability): 99.9% uptime, 142 ms p99 latency, 0.3% null rate, 2 alerts
Anti-bot layer
IP rotation and rate limiting

Dataroma implements rate limiting to prevent bulk scraping. We use distributed US-based proxies with strict concurrency controls to extract data reliably without triggering blocks.
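Strict concurrency controls can take several forms. As one illustration, a token bucket caps the average request rate while allowing small bursts. The class below is a generic sketch of that technique; the rate and capacity values are placeholders, not the pipeline's real settings.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow on average `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, up to capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; False means the caller must wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

In a Scrapy deployment, the built-in `DOWNLOAD_DELAY`, `CONCURRENT_REQUESTS_PER_DOMAIN`, and AutoThrottle settings cover similar ground. A standalone bucket like this is mainly useful outside Scrapy, for example one bucket per proxy inside a single worker process.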

HTML Table Parsing
Resilient selectors for financial data

Dataroma relies heavily on nested HTML tables. Our parsers use strict row-column mapping and data-type casting to ensure financial figures are never misaligned during extraction.
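Row-column mapping means each cell is keyed by its header rather than by position alone, and a row with the wrong number of cells is rejected instead of shifted. A stdlib-only sketch using `html.parser` follows. It handles one flat table, so Dataroma's nested tables would need a stack of row states, and the column names in the test data are illustrative.

```python
from html.parser import HTMLParser
from typing import Optional


class TableRowParser(HTMLParser):
    """Collect the whitespace-normalised text of each <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag: str) -> None:
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            # Collapse internal whitespace so " 3.1 " and "3.1" compare equal
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data: str) -> None:
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html: str) -> list[dict[str, str]]:
    """Key every body row by the header row; reject rows whose cell count differs."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    records = []
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} cells but header has {len(header)}: {row}")
        records.append(dict(zip(header, row)))
    return records
```

Type casting then runs on the keyed values, so a figure can only ever be cast under the column name it was found in.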

Change detection
Only re-scrape what has changed

For daily insider trading monitoring, we maintain a hash index of last-seen transactions. Subsequent runs only push new trades, providing a clean changelog.
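A hash index of last-seen transactions can be as simple as a set of SHA-256 digests over each trade's identifying fields. The choice of `KEY_FIELDS` below is an assumption, and in production the set would need to persist between runs (Redis is part of the listed stack, but how DataFlirt stores the index is not stated).

```python
import hashlib
import json
from typing import Iterable, Iterator

# Fields assumed to identify one insider transaction; adjust to the real schema
KEY_FIELDS = ("ticker", "insider_name", "transaction_date",
              "transaction_type", "shares_traded", "price_per_share")


def trade_hash(trade: dict) -> str:
    """Stable SHA-256 fingerprint of a trade, independent of dict key order."""
    key = {field: trade.get(field) for field in KEY_FIELDS}
    payload = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def new_trades(trades: Iterable[dict], seen: set[str]) -> Iterator[dict]:
    """Yield only trades whose hash is not yet in `seen`, recording them as we go."""
    for trade in trades:
        digest = trade_hash(trade)
        if digest not in seen:
            seen.add(digest)
            yield trade
```

One trade-off: two genuinely separate trades with identical key fields collapse into one hash, so a real key may need a filing identifier or row position if the source exposes one.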

Historical pagination
Deep scraping for historical context

Extracting years of insider trades requires traversing hundreds of paginated views. Our pipeline handles stateful pagination automatically, ensuring complete historical datasets.
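Stateful pagination mostly comes down to two rules: stop on the first empty page, and record the last fully processed page so an interrupted run can resume. The sketch below takes `fetch_page` and `checkpoint` as hypothetical callables; they stand in for real requests and storage and are not Dataroma endpoints.

```python
from typing import Callable, Iterator, Optional

# fetch_page(page_number) -> rows on that page; an empty list means past the last page
FetchPage = Callable[[int], list]


def walk_pages(fetch_page: FetchPage, start_page: int = 1,
               checkpoint: Optional[Callable[[int], None]] = None,
               max_pages: int = 10_000) -> Iterator[dict]:
    """Yield rows page by page until an empty page, checkpointing each finished page."""
    for page in range(start_page, start_page + max_pages):
        rows = fetch_page(page)
        if not rows:
            return
        yield from rows
        # Only reached once every row of this page was consumed downstream
        if checkpoint is not None:
            checkpoint(page)
    # Guard against pagination that never ends (e.g. a page that repeats forever)
    raise RuntimeError(f"stopped after {max_pages} pages without reaching the end")
```

To resume after a crash, a run would start at the last checkpointed page plus one.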

Monitoring & alerting
24/7 pipeline health

Every run emits structured logs. We alert on null-rate spikes or schema drift immediately, ensuring your quantitative models are never fed corrupted data.
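Null-rate spikes and schema drift can both be computed from a run's output before delivery. In this sketch, `baseline_null_rate` is a hypothetical per-field history from earlier runs, and the 2x spike factor and 5% floor are illustrative thresholds rather than the pipeline's real alert rules.

```python
from typing import Iterable


def run_health(records: list[dict], expected_fields: Iterable[str],
               baseline_null_rate: dict[str, float], spike_factor: float = 2.0,
               min_rate: float = 0.05) -> list[str]:
    """Return alert messages for schema drift and null-rate spikes in one run."""
    expected = set(expected_fields)
    alerts = []
    seen_fields = set().union(*(r.keys() for r in records)) if records else set()
    # Schema drift: fields that appeared unexpectedly or vanished entirely
    for extra in sorted(seen_fields - expected):
        alerts.append(f"schema drift: unexpected field {extra!r}")
    for missing in sorted(expected - seen_fields):
        alerts.append(f"schema drift: field {missing!r} absent from every record")
    # Null-rate spikes: share of None/empty values versus the historical baseline
    total = len(records)
    for field in sorted(expected & seen_fields):
        nulls = sum(1 for r in records if r.get(field) in (None, ""))
        rate = nulls / total
        baseline = baseline_null_rate.get(field, 0.0)
        if rate >= min_rate and rate > spike_factor * baseline:
            alerts.append(f"null-rate spike: {field} at {rate:.1%} (baseline {baseline:.1%})")
    return alerts
```

The `min_rate` floor stops a field with a near-zero baseline from alerting on a single stray null.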

Applications

Who uses Dataroma data and how

Teams across industries use dataroma.com data to build competitive products and smarter operations.

01
Quant & Hedge Fund Alpha

Quantitative funds ingest 13F and insider trading signals to backtest strategies and identify institutional money flow.

02
Retail Trading Apps

Fintech platforms display superinvestor consensus and insider buying activity directly within their user interfaces.

03
Alternative Data Feeds

Data vendors aggregate Dataroma metrics with social sentiment and news feeds to create composite trading signals.

04
Financial Research

Equity analysts track historical portfolio adjustments of successful investors to validate their own investment theses.

05
Insider Anomaly Detection

Compliance and research teams monitor cluster buying by corporate executives as a leading indicator of company performance.

06
Sector Sentiment Analysis

Macro analysts aggregate S&P 500 grid data to identify which sectors superinvestors are rotating into or out of.
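Cluster buying, as described under Insider Anomaly Detection, can be detected with a sliding window over each ticker's purchases. The 30-day window and three-insider threshold below are illustrative defaults, not a standard definition, and the input is assumed to follow the insider_trades fields with "Buy" as the purchase label.

```python
from collections import defaultdict
from datetime import date, timedelta


def cluster_buys(trades: list[dict], window_days: int = 30,
                 min_insiders: int = 3) -> dict[str, list[str]]:
    """Flag tickers where at least `min_insiders` distinct insiders bought within `window_days`.

    Returns ticker -> sorted insider names from the first qualifying window.
    """
    buys = defaultdict(list)
    for t in trades:
        if t["transaction_type"] == "Buy":
            buys[t["ticker"]].append((date.fromisoformat(t["transaction_date"]), t["insider_name"]))
    flagged = {}
    window = timedelta(days=window_days)
    for ticker, rows in buys.items():
        rows.sort()  # chronological order
        start = 0
        for end in range(len(rows)):
            # Shrink the window from the left until it spans at most window_days
            while rows[end][0] - rows[start][0] > window:
                start += 1
            insiders = {name for _, name in rows[start:end + 1]}
            if len(insiders) >= min_insiders:
                flagged[ticker] = sorted(insiders)
                break
    return flagged
```

Counting distinct insiders, not transactions, keeps one executive's repeated purchases from looking like a cluster.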

Why DataFlirt

"Dataroma aggregates the highest-signal 13F filings and insider trades in the market, but extracting it programmatically requires dedicated infrastructure."

Financial data pipelines require absolute precision. A single misaligned table cell corrupts downstream models. DataFlirt handles the extraction, validation, and normalisation of Dataroma's nested HTML structures so your quants can focus on alpha generation.

Technical Spec

Dataroma scraper technical capabilities

Everything our dataroma.com scraper supports, from dynamic content rendering and authenticated pages to rate-limit handling and beyond.

13F Portfolio Extraction
Complete extraction of all superinvestor holdings and quarterly adjustments
Supported
Insider Trading History
Full historical extraction of CEO, CFO, and director transactions
Supported
Grand Portfolio Aggregation
Consensus ownership metrics across all tracked investors
Supported
Pagination traversal
Automated navigation of deep historical transaction pages
Supported
Change detection (diffs)
Hash-based diff to only emit new trades since last run
Supported
Webhook delivery
HTTP POST per new trade for real-time alerts
Supported
S&P 500 Grid scraping
Extraction of sector-level superinvestor activity matrices
Supported
Real-time pre-filing trades
13F positions are disclosed only when filings are published (up to 45 days after quarter end), so trades made before publication are not available on Dataroma
Partial
Personal user watchlists
Extraction of private user portfolios requires account authentication
Partial
Infrastructure

Infrastructure powering the Dataroma pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles any dynamic content rendering required.

Proxy Infrastructure

We maintain pools of proxies to distribute request load, ensuring consistent access without triggering rate limits.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested array structures
CSV
Flat file with typed columns for financial modelling
Parquet
Columnar format for BigQuery, Snowflake, Athena
S3
Direct bucket delivery compatible with any data lake
Webhook
HTTP POST per new trade for real-time downstream processing
API
REST endpoint to query your extracted dataset
BigQuery
Streamed directly into your dataset
Snowflake
Stage and COPY INTO workflow
XLS
Excel format for manual analyst review
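For the webhook option (an HTTP POST per new trade), a minimal stdlib sketch builds one JSON request per trade. The endpoint URL is a placeholder, and real deployments typically add authentication and retries, which are omitted here.

```python
import json
import urllib.request

WEBHOOK_URL = "https://example.com/hooks/dataroma-trades"  # placeholder client endpoint


def build_trade_webhook(trade: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build one HTTP POST request carrying a single new trade as a JSON body."""
    body = json.dumps(trade, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Deliver the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to test without network access and lets a retry policy wrap `send` alone.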
// faq

Common questions.

About dataroma.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Dataroma legal?

Scraping publicly available information from Dataroma is generally permissible. DataFlirt extracts only public 13F filing aggregations and insider trading data. Clients should review Dataroma's terms of service and consult legal counsel for specific use cases.

How do you handle rate limits on Dataroma?

We use distributed proxy pools and strict concurrency controls. Our request timing is modelled to respect server load, ensuring reliable extraction without triggering blocks.

How fresh is the data?

Pipelines can be configured to run daily or hourly, so new insider trades or 13F updates are captured within one run interval of appearing on the platform.

Can you extract historical data?

Yes. We can traverse historical pagination to extract years of insider trading data and quarterly portfolio adjustments for backtesting.

What delivery formats do you support?

We deliver in JSON, CSV, Parquet, and XLS. We can push directly to S3, BigQuery, Snowflake, or trigger webhooks for real-time alerts.

Can I request a sample dataset?

Absolutely. We provide a sample run of up to 5 superinvestor profiles or 500 insider trades as part of the pre-engagement scoping process.

$ dataflirt scope --new-project --source=dataroma.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical 13F backtest dataset or a continuous daily feed of insider trades, we build and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h