We extract 13F filings, portfolio updates, insider trading signals, and holding histories from Dataroma. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Portfolio Holdings objects from dataroma.com. All fields typed and schema-versioned.
"investor_name": "Warren Buffett - Berkshire Hathaway", "ticker": "AAPL", "company_name": "Apple Inc.", "portfolio_pct": 42.9, "shares_held": 905560000, "reported_value": 156234000000, "recent_activity": "Reduce 1.2%", "quarter": "Q4 2025"
| # | investor_name | ticker | company_name | portfolio_pct | shares_held | reported_value |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Insider Trades objects from dataroma.com. All fields typed and schema-versioned.
"ticker": "MSFT", "company_name": "Microsoft Corp.", "insider_name": "Nadella Satya", "relationship": "CEO", "transaction_date": "2026-01-14", "transaction_type": "Sell", "shares_traded": 120000, "price_per_share": 385.42, "total_value": 46250400
| # | ticker | company_name | insider_name | relationship | transaction_date | transaction_type |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Superinvestor Activity objects from dataroma.com. All fields typed and schema-versioned.
"investor_name": "Bill Ackman - Pershing Square", "quarter": "Q4 2025", "ticker": "GOOGL", "action_type": "Buy", "shares_traded": 1500000, "price_avg": 142.5, "portfolio_impact": 2.1, "filing_date": "2026-02-14"
| # | investor_name | quarter | ticker | action_type | shares_traded | price_avg |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Grand Portfolio objects from dataroma.com. All fields typed and schema-versioned.
"ticker": "META", "company_name": "Meta Platforms", "superinvestor_count": 34, "total_shares_held": 45200000, "ownership_pct": 1.7, "current_price": 485.2, "pe_ratio": 24.5, "market_cap": 1240000000000
| # | ticker | company_name | superinvestor_count | total_shares_held | ownership_pct | current_price |
|---|---|---|---|---|---|---|
Complete list of extractable fields for S&P 500 Grid objects from dataroma.com. All fields typed and schema-versioned.
"ticker": "AMZN", "company_name": "Amazon.com Inc.", "sector": "Consumer Discretionary", "current_price": 178.4, "superinvestor_buys": 12, "superinvestor_sells": 4, "net_activity": "Strong Buy", "volume": 42000000
| # | ticker | company_name | sector | current_price | high_52w | low_52w |
|---|---|---|---|---|---|---|
Our Dataroma scraper parses nested HTML tables, normalises financial metrics, and tracks historical 13F filings with precision. Built for quant funds and financial analysts.
Extract complete holdings, portfolio percentages, and recent activity for every tracked investor on Dataroma.
Capture CEO, CFO, and director transactions, including share counts, average prices, and total transaction values.
Aggregate data across all superinvestors to identify highly concentrated consensus trades and ownership metrics.
Traverse historical pagination to extract quarter-over-quarter portfolio adjustments and long-term holding strategies.
Pull every superinvestor transaction and insider trade associated with a specific stock ticker in a single unified view.
Extract the Dataroma S&P 500 grid matrix, capturing sector activity and aggregate buy/sell signals.
Monitor Dataroma for new 13F updates during SEC filing season and push updates directly to your warehouse.
Run continuous pipelines that emit only new trades or portfolio adjustments, reducing downstream processing costs.
Navigate deep historical transaction pages automatically, ensuring no trade is missed in your dataset.
Brief in. Clean data out.
Provide target investors, tickers, or specific data grids. We design the extraction schema together.
We configure Scrapy crawlers, proxy rotation, and HTML table parsers specifically for Dataroma's DOM structure.
Schema validation, null-rate checks, and financial metric normalisation before full launch.
JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
Financial data requires absolute precision. Here is how we maintain data integrity across Dataroma's nested HTML structures.
Dataroma implements rate limiting to prevent bulk scraping. We use distributed US-based proxies with strict concurrency controls to extract data reliably without triggering blocks.
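As a rough illustration of what "strict concurrency controls" can look like, here is a minimal sketch using `asyncio.Semaphore`. The function name `fetch_all`, the injected `fetch` coroutine, and the default limits are assumptions made for this example, not a description of the production crawler; proxy selection is left out entirely.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],  # hypothetical downloader supplied by the caller
    max_concurrency: int = 2,                # assumed limit, tune per target
    delay_seconds: float = 1.0,              # assumed pause after each request
) -> Dict[str, str]:
    """Fetch every URL with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: Dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            results[url] = await fetch(url)
            # Keep the slot occupied for a moment so each slot issues
            # at most one request per delay window.
            await asyncio.sleep(delay_seconds)

    await asyncio.gather(*(worker(url) for url in urls))
    return results
```

Passing the downloader in as a parameter keeps the throttling logic independent of the HTTP client, so the same wrapper could sit in front of any fetch function.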
Dataroma relies heavily on nested HTML tables. Our parsers use strict row-column mapping and data-type casting to ensure financial figures are never misaligned during extraction.
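The row-column mapping and type casting described above can be sketched with the standard library's `html.parser`. The column order in `COLUMNS`, the helper names, and the sample markup are assumptions for illustration; the real page layout may differ, and this simplified version handles only a flat table rather than nested ones.

```python
from html.parser import HTMLParser
from typing import Dict, List


def to_int(text: str) -> int:
    """Cast '905,560,000' or '$156,234,000,000' to an integer."""
    return int(text.replace(",", "").replace("$", ""))


def to_float(text: str) -> float:
    """Cast '42.9%' or '1,234.5' to a float."""
    return float(text.replace(",", "").replace("%", ""))


# Assumed column order for a holdings table (hypothetical).
COLUMNS = [
    ("ticker", str),
    ("company_name", str),
    ("portfolio_pct", to_float),
    ("shares_held", to_int),
    ("reported_value", to_int),
]


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> have no <td> cells
                self.rows.append(self._row)
            self._row = None


def parse_holdings(html: str) -> List[Dict[str, object]]:
    """Map each row onto COLUMNS; fail loudly if the cell count is off."""
    collector = RowCollector()
    collector.feed(html)
    records = []
    for cells in collector.rows:
        if len(cells) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}: {cells}")
        records.append({name: cast(cell) for (name, cast), cell in zip(COLUMNS, cells)})
    return records
```

Raising on a cell-count mismatch, rather than padding or truncating the row, is the point of the design: a shifted column surfaces as an error instead of a silently misaligned figure.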
For daily insider trading monitoring, we maintain a hash index of last-seen transactions. Subsequent runs push only new trades, providing a clean changelog.
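A hash index of this kind can be sketched in a few lines. The function names and the choice of SHA-256 over a canonical JSON encoding are assumptions for the example; an in-memory `set` stands in for whatever persistent store a real pipeline would use.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Set


def trade_fingerprint(trade: Dict[str, object]) -> str:
    """Hash a trade record using a key-order-independent JSON encoding."""
    canonical = json.dumps(trade, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_new_trades(trades: Iterable[Dict[str, object]], seen: Set[str]) -> List[Dict[str, object]]:
    """Return only trades whose fingerprint is not in `seen`, and record them."""
    fresh = []
    for trade in trades:
        fingerprint = trade_fingerprint(trade)
        if fingerprint not in seen:
            seen.add(fingerprint)
            fresh.append(trade)
    return fresh
```

Calling `filter_new_trades` with the same `seen` set on consecutive runs yields just the records that appeared since the previous run.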
Extracting years of insider trades requires traversing hundreds of paginated views. Our pipeline handles stateful pagination automatically, ensuring complete historical datasets.
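One way to make pagination stateful is to persist the next page number after each page has been processed, so an interrupted crawl resumes where it stopped. The sketch below assumes numbered pages and a caller-supplied `fetch_page` function that returns an empty list past the last page; both are illustrative assumptions, as is the JSON checkpoint file.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List


def crawl_pages(
    fetch_page: Callable[[int], List[dict]],  # hypothetical: returns [] past the last page
    checkpoint: Path,
    start_page: int = 1,
) -> Iterator[List[dict]]:
    """Yield one page of rows at a time, resuming from the checkpoint if present."""
    page = start_page
    if checkpoint.exists():
        page = json.loads(checkpoint.read_text())["next_page"]
    while True:
        rows = fetch_page(page)
        if not rows:
            break
        yield rows
        # Runs only after the consumer has finished with the page above, so a
        # crash mid-page causes that page to be fetched again on the next run.
        page += 1
        checkpoint.write_text(json.dumps({"next_page": page}))
```

Because a page may be re-fetched after a failure, this approach gives at-least-once delivery and works best when paired with deduplication downstream.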
Every run emits structured logs. We alert on null-rate spikes or schema drift immediately, ensuring your quantitative models are never fed corrupted data.
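The null-rate and schema-drift checks mentioned above might look something like the following. The function names, the 5% default threshold, and the alert message format are assumptions for this sketch; a production pipeline would route the alerts to its own logging and paging systems.

```python
from typing import Dict, List, Sequence


def null_rates(records: Sequence[dict], fields: Sequence[str]) -> Dict[str, float]:
    """Fraction of records in which each field is missing, None, or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


def check_batch(
    records: Sequence[dict],
    expected_fields: Sequence[str],
    max_null_rate: float = 0.05,  # assumed threshold
) -> List[str]:
    """Return human-readable alerts for schema drift and null-rate spikes."""
    alerts = []
    expected = set(expected_fields)
    for index, record in enumerate(records):
        keys = set(record)
        if keys != expected:
            alerts.append(
                f"schema drift in record {index}: "
                f"missing={sorted(expected - keys)} unexpected={sorted(keys - expected)}"
            )
    for field, rate in null_rates(records, expected_fields).items():
        if rate > max_null_rate:
            alerts.append(f"null rate {rate:.0%} for '{field}' exceeds {max_null_rate:.0%}")
    return alerts
```

An empty list from `check_batch` means the batch passed both checks; anything else can be used to block delivery until someone has looked at it.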
Quantitative funds ingest 13F and insider trading signals to backtest strategies and identify institutional money flow.
Fintech platforms display superinvestor consensus and insider buying activity directly within their user interfaces.
Data vendors aggregate Dataroma metrics with social sentiment and news feeds to create composite trading signals.
Equity analysts track historical portfolio adjustments of successful investors to validate their own investment theses.
Compliance and research teams monitor cluster buying by corporate executives as a leading indicator of company performance.
Macro analysts aggregate S&P 500 grid data to identify which sectors superinvestors are rotating into or out of.
"Dataroma aggregates the highest-signal 13F filings and insider trades in the market, but extracting it programmatically requires dedicated infrastructure."
Financial data pipelines require absolute precision. A single misaligned table cell corrupts downstream models. DataFlirt handles the extraction, validation, and normalisation of Dataroma's nested HTML structures so your quants can focus on alpha generation.
Everything supported by our dataroma.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles any dynamic content rendering required.
We maintain pools of proxies to distribute request load, ensuring consistent access without triggering rate limits.
Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.
Data delivered to where your team already works — no new tooling required.
About dataroma.com scraping, legality, and pipeline operations.
Scraping publicly available information from Dataroma is generally permissible. DataFlirt extracts only public 13F filing aggregations and insider trading data. Clients should review Dataroma's terms of service and consult legal counsel for specific use cases.
We use distributed proxy pools and strict concurrency controls. Our request timing is modelled to respect server load, ensuring reliable extraction without triggering blocks.
Pipelines can be configured to run daily or hourly, capturing new insider trades or 13F updates as soon as they are published to the platform.
Yes. We can traverse historical pagination to extract years of insider trading data and quarterly portfolio adjustments for backtesting.
We deliver in JSON, CSV, Parquet, and XLS. We can push directly to S3, BigQuery, Snowflake, or trigger webhooks for real-time alerts.
Absolutely. We provide a sample run of up to 5 superinvestor profiles or 500 insider trades as part of the pre-engagement scoping process.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical 13F backtest dataset or a continuous daily feed of insider trades, we build and operate the pipeline. Tell us what you need.