

Superinvestor data,
at warehouse scale.

We extract 13F filings, portfolio updates, insider trading signals, and holding histories from Dataroma. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from dataroma.com → See how it works

Portfolios tracked

13F filings extracted: 4,192 /quarter

Insider trades: 12,405 /month

Active pipelines

Uptime: 99.98%

◆ Superinvestor Portfolios ◆ 13F Filing Extraction ◆ Insider Trading Data ◆ Portfolio Holdings ◆ Buy/Sell Activity ◆ Grand Portfolio Consensus ◆ S&P 500 Grid Data ◆ Historical Holding Activity ◆ Ticker Level Analysis ◆ CEO/CFO Transactions ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ

Data Dictionary

Every field we extract from dataroma.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Portfolio Holdings objects from dataroma.com. All fields typed and schema-versioned.

Fields: investor_name, ticker, company_name, portfolio_pct, shares_held, reported_value, recent_activity, quarter, filing_date

Sample record:

{
  "investor_name": "Warren Buffett - Berkshire Hathaway",
  "ticker": "AAPL",
  "company_name": "Apple Inc.",
  "portfolio_pct": 42.9,
  "shares_held": 905560000,
  "reported_value": 156234000000,
  "recent_activity": "Reduce 1.2%",
  "quarter": "Q4 2025"
}


Complete list of extractable fields for Insider Trades objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, insider_name, relationship, transaction_date, transaction_type, shares_traded, price_per_share, total_value, filing_date

Sample record:

{
  "ticker": "MSFT",
  "company_name": "Microsoft Corp.",
  "insider_name": "Nadella Satya",
  "relationship": "CEO",
  "transaction_date": "2026-01-14",
  "transaction_type": "Sell",
  "shares_traded": 120000,
  "price_per_share": 385.42,
  "total_value": 46250400
}

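To show what "typed" can mean for a record like the Insider Trades sample, here is a minimal Python sketch. The `InsiderTrade` class and its `is_consistent` check are hypothetical names for illustration, not DataFlirt's actual schema code. The sketch also assumes that `total_value` should equal `shares_traded` multiplied by `price_per_share`, which holds for the sample record.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class InsiderTrade:
    """Hypothetical typed record built from the sample's field names."""
    ticker: str
    company_name: str
    insider_name: str
    relationship: str
    transaction_date: date
    transaction_type: str
    shares_traded: int
    price_per_share: Decimal
    total_value: Decimal

    @classmethod
    def from_raw(cls, raw: dict) -> "InsiderTrade":
        # Cast each field explicitly. Decimal(str(...)) avoids binary float drift.
        return cls(
            ticker=str(raw["ticker"]),
            company_name=str(raw["company_name"]),
            insider_name=str(raw["insider_name"]),
            relationship=str(raw["relationship"]),
            transaction_date=date.fromisoformat(raw["transaction_date"]),
            transaction_type=str(raw["transaction_type"]),
            shares_traded=int(raw["shares_traded"]),
            price_per_share=Decimal(str(raw["price_per_share"])),
            total_value=Decimal(str(raw["total_value"])),
        )

    def is_consistent(self) -> bool:
        # Assumption: total_value is expected to equal shares x price.
        return self.shares_traded * self.price_per_share == self.total_value


sample = {
    "ticker": "MSFT",
    "company_name": "Microsoft Corp.",
    "insider_name": "Nadella Satya",
    "relationship": "CEO",
    "transaction_date": "2026-01-14",
    "transaction_type": "Sell",
    "shares_traded": 120000,
    "price_per_share": 385.42,
    "total_value": 46250400,
}
trade = InsiderTrade.from_raw(sample)
print(trade.is_consistent())
```

A check like this is one way to catch a misaligned cell before it reaches a model: a shifted column would usually break either a cast or the arithmetic.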

Complete list of extractable fields for Superinvestor Activity objects from dataroma.com. All fields typed and schema-versioned.

Fields: investor_name, quarter, ticker, action_type, shares_traded, price_avg, portfolio_impact, filing_date, total_value

Sample record:

{
  "investor_name": "Bill Ackman - Pershing Square",
  "quarter": "Q4 2025",
  "ticker": "GOOGL",
  "action_type": "Buy",
  "shares_traded": 1500000,
  "price_avg": 142.5,
  "portfolio_impact": 2.1,
  "filing_date": "2026-02-14"
}


Complete list of extractable fields for Grand Portfolio objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, superinvestor_count, total_shares_held, ownership_pct, current_price, pe_ratio, forward_pe, market_cap

Sample record:

{
  "ticker": "META",
  "company_name": "Meta Platforms",
  "superinvestor_count": 34,
  "total_shares_held": 45200000,
  "ownership_pct": 1.7,
  "current_price": 485.2,
  "pe_ratio": 24.5,
  "market_cap": 1240000000000
}


Complete list of extractable fields for S&P 500 Grid objects from dataroma.com. All fields typed and schema-versioned.

Fields: ticker, company_name, sector, current_price, high_52w, low_52w, superinvestor_buys, superinvestor_sells, net_activity, volume

Sample record:

{
  "ticker": "AMZN",
  "company_name": "Amazon.com Inc.",
  "sector": "Consumer Discretionary",
  "current_price": 178.4,
  "superinvestor_buys": 12,
  "superinvestor_sells": 4,
  "net_activity": "Strong Buy",
  "volume": 42000000
}


Capabilities

Everything you need from Dataroma, nothing you do not

Our Dataroma scraper parses nested HTML tables, normalises financial metrics, and tracks historical 13F filings with precision. Built for quant funds and financial analysts.

Superinvestor Portfolios

Extract complete holdings, portfolio percentages, and recent activity for every tracked investor on Dataroma.

Insider Trading Feeds

Capture CEO, CFO, and director transactions, including share counts, average prices, and total transaction values.

Grand Portfolio Consensus

Aggregate data across all superinvestors to identify highly concentrated consensus trades and ownership metrics.

Historical 13F Parsing

Traverse historical pagination to extract quarter-over-quarter portfolio adjustments and long-term holding strategies.

Ticker-Level Aggregation

Pull every superinvestor transaction and insider trade associated with a specific stock ticker in a single unified view.

S&P 500 Grid Extraction

Extract the Dataroma S&P 500 grid matrix, capturing sector activity and aggregate buy/sell signals.

Real-Time Filing Detection

Monitor Dataroma for new 13F updates during SEC filing season and push updates directly to your warehouse.

Change Detection (Diffs)

Run continuous pipelines that only emit new trades or portfolio adjustments, reducing downstream processing costs.

Automated Pagination Handling

Navigate deep historical transaction pages automatically, ensuring no trade is missed in your dataset.
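For readers who want a feel for the Grand Portfolio Consensus capability listed above, the aggregation could look roughly like this in Python. It is a sketch under assumptions: the `consensus` function is a hypothetical name, the input rows reuse the Portfolio Holdings field names, and the share counts are illustrative numbers, not real holdings.

```python
from collections import defaultdict


def consensus(holdings):
    """Count distinct investors and sum shares per ticker."""
    investors = defaultdict(set)
    shares = defaultdict(int)
    for h in holdings:
        investors[h["ticker"]].add(h["investor_name"])
        shares[h["ticker"]] += h["shares_held"]
    rows = [
        {
            "ticker": ticker,
            "superinvestor_count": len(investors[ticker]),
            "total_shares_held": shares[ticker],
        }
        for ticker in investors
    ]
    # Most widely held first, ticker as a stable tie-breaker.
    return sorted(rows, key=lambda r: (-r["superinvestor_count"], r["ticker"]))


# Illustrative rows only.
holdings = [
    {"investor_name": "Investor A", "ticker": "AAPL", "shares_held": 100},
    {"investor_name": "Investor B", "ticker": "AAPL", "shares_held": 50},
    {"investor_name": "Investor A", "ticker": "GOOGL", "shares_held": 10},
]
for row in consensus(holdings):
    print(row)
```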

// engagement pipeline

From ticker list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target investors, tickers, or specific data grids. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, proxy rotation, and HTML table parsers specifically for Dataroma's DOM structure.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and financial metric normalisation before full launch.
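A null-rate check of the kind mentioned above can be very small. The Python sketch below is illustrative only: the function names and the 1% threshold are assumptions, not the values used in DataFlirt's QA stage.

```python
def null_rates(records, fields):
    """Share of records where each field is missing, None, or an empty string."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


def failing_fields(records, fields, max_null_rate=0.01):
    # Assumption: a field fails QA when its null rate exceeds the threshold.
    rates = null_rates(records, fields)
    return sorted(f for f, rate in rates.items() if rate > max_null_rate)


batch = [
    {"ticker": "MSFT", "price_per_share": 385.42},
    {"ticker": "AAPL", "price_per_share": None},
    {"ticker": "GOOGL", "price_per_share": 142.5},
    {"ticker": "META"},
]
print(null_rates(batch, ["ticker", "price_per_share"]))
print(failing_fields(batch, ["ticker", "price_per_share"]))
```

Treating a missing key the same as an explicit null keeps the check honest when a parser silently drops a column.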

Delivery

ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our Dataroma pipeline handles the hard parts

Financial data requires absolute precision. Here is how we maintain data integrity across Dataroma's nested HTML structures.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

alerts

Anti-bot layer

IP rotation and rate limiting

Dataroma implements rate limiting to prevent bulk scraping. We use distributed US-based proxies with strict concurrency controls to extract data reliably without triggering blocks.
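"Strict concurrency controls" can be pictured as a cap on in-flight requests plus a minimum gap between request starts. The Python sketch below shows that idea with `asyncio`. The `PoliteFetcher` class, its parameters, and the fake fetch function are all hypothetical, and proxy rotation is deliberately left out.

```python
import asyncio


class PoliteFetcher:
    """Caps concurrent requests and spaces out request start times."""

    def __init__(self, fetch, max_concurrency=2, min_interval=0.5):
        self._fetch = fetch  # any coroutine function taking a URL
        self._sem = asyncio.Semaphore(max_concurrency)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_slot = 0.0

    async def get(self, url):
        async with self._sem:
            async with self._lock:
                # Reserve the next start slot so starts are at least
                # min_interval apart across all tasks.
                now = asyncio.get_running_loop().time()
                wait = max(0.0, self._next_slot - now)
                self._next_slot = max(now, self._next_slot) + self._min_interval
            if wait:
                await asyncio.sleep(wait)
            return await self._fetch(url)


async def demo():
    active = peak = 0

    async def fake_fetch(url):
        # Stand-in for a real HTTP call; records peak concurrency.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"body of {url}"

    fetcher = PoliteFetcher(fake_fetch, max_concurrency=2, min_interval=0.005)
    urls = [f"https://example.com/page/{i}" for i in range(6)]
    pages = await asyncio.gather(*(fetcher.get(u) for u in urls))
    return pages, peak


pages, peak = asyncio.run(demo())
print(len(pages), peak)
```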

HTML Table Parsing

Resilient selectors for financial data

Dataroma relies heavily on nested HTML tables. Our parsers use strict row-column mapping and data-type casting to ensure financial figures are never misaligned during extraction.
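Strict row-column mapping with type casting can be sketched with Python's standard `html.parser`. This is an illustration under assumptions: the markup below is invented, its header cells happen to match our field names, and only a flat table is handled. Dataroma's real pages, including nested tables, need more careful handling than this.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <th>/<td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical casts for three of the Portfolio Holdings fields.
CASTS = {
    "ticker": str,
    "portfolio_pct": float,
    "shares_held": lambda s: int(s.replace(",", "")),
}


def parse_holdings(html):
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    records = []
    for row in body:
        # Refuse to guess: a short or long row means the columns shifted.
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row}")
        records.append({name: CASTS[name](cell) for name, cell in zip(header, row)})
    return records


html = """
<table>
  <tr><th>ticker</th><th>portfolio_pct</th><th>shares_held</th></tr>
  <tr><td>AAPL</td><td>42.9</td><td>905,560,000</td></tr>
</table>
"""
print(parse_holdings(html))
```

Raising on a row whose cell count differs from the header is the design choice that matters here: a loud failure is preferable to a silently misaligned figure.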

Change detection

Only re-scrape what has changed

For daily insider trading monitoring, we maintain a hash index of last-seen transactions. Subsequent runs only push new trades, providing a clean changelog.
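The hash-index idea can be shown in a few lines of Python. This is a sketch, not the production implementation: `trade_key` and `new_trades` are hypothetical names, the key is assumed to be a SHA-256 over the whole record, and an in-memory set stands in for whatever persistent store holds the index.

```python
import hashlib
import json


def trade_key(trade):
    # Canonical JSON (sorted keys, fixed separators) so the same trade
    # always produces the same hash regardless of key order.
    canonical = json.dumps(trade, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_trades(batch, seen):
    """Return only trades whose hash is not yet in the seen index."""
    fresh = []
    for trade in batch:
        key = trade_key(trade)
        if key not in seen:
            seen.add(key)
            fresh.append(trade)
    return fresh


seen = set()
monday = [{"ticker": "MSFT", "transaction_date": "2026-01-14", "shares_traded": 120000}]
tuesday = monday + [
    {"ticker": "GOOGL", "transaction_date": "2026-01-15", "shares_traded": 5000}
]
print(len(new_trades(monday, seen)))
print(new_trades(tuesday, seen))
```

Hashing the full record means a corrected trade shows up as a new entry. Hashing only a subset of identifying fields would instead treat it as already seen, and which behaviour is right depends on the use case.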

Historical pagination

Deep scraping for historical context

Extracting years of insider trades requires traversing hundreds of paginated views. Our pipeline handles stateful pagination automatically, ensuring complete historical datasets.
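Stateful pagination boils down to remembering the next page to fetch, so an interrupted run can resume instead of starting over. The Python sketch below assumes integer page numbers and that an empty page marks the end of the history. Both are simplifying assumptions, and `crawl` and `fetch_page` are hypothetical names.

```python
def crawl(fetch_page, state, max_pages=1000):
    """Fetch pages until one comes back empty, checkpointing in `state`.

    `state` is a dict such as {"next_page": 3}; it is updated after every
    page, so persisting it lets a later run pick up where this one stopped.
    """
    records = []
    for _ in range(max_pages):  # hard cap guards against endless pagination
        page = state.get("next_page", 1)
        rows = fetch_page(page)
        if not rows:
            break
        records.extend(rows)
        state["next_page"] = page + 1
    return records


# Fake source standing in for real paginated views.
pages = {1: [{"id": "a"}, {"id": "b"}], 2: [{"id": "c"}]}
state = {}
records = crawl(lambda p: pages.get(p, []), state)
print([r["id"] for r in records], state)
```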

Monitoring & alerting

24/7 pipeline health

Every run emits structured logs. We alert on null-rate spikes or schema drift immediately, ensuring your quantitative models are never fed corrupted data.
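Schema drift detection can be as simple as comparing each record against an expected field-to-type map. The Python sketch below is illustrative: the `EXPECTED` map covers only three Insider Trades fields, and the function name and message format are assumptions.

```python
EXPECTED = {"ticker": str, "shares_traded": int, "price_per_share": float}


def schema_drift(record, expected=EXPECTED):
    """Return a sorted list of human-readable drift problems (empty if none)."""
    problems = []
    for name in expected.keys() - record.keys():
        problems.append(f"missing field: {name}")
    for name in record.keys() - expected.keys():
        problems.append(f"unexpected field: {name}")
    for name, typ in expected.items():
        value = record.get(name)
        # None is left to the null-rate check; only wrong types are flagged here.
        if value is not None and not isinstance(value, typ):
            problems.append(
                f"{name}: expected {typ.__name__}, got {type(value).__name__}"
            )
    return sorted(problems)


good = {"ticker": "MSFT", "shares_traded": 120000, "price_per_share": 385.42}
drifted = {"ticker": "MSFT", "shares_traded": "120,000", "price": 385.42}
print(schema_drift(good))
print(schema_drift(drifted))
```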

Applications

Who uses Dataroma data and how

Teams across industries use dataroma.com data to build competitive products and smarter operations.

Quant & Hedge Fund Alpha

Quantitative funds ingest 13F and insider trading signals to backtest strategies and identify institutional money flow.

Retail Trading Apps

Fintech platforms display superinvestor consensus and insider buying activity directly within their user interfaces.

Alternative Data Feeds

Data vendors aggregate Dataroma metrics with social sentiment and news feeds to create composite trading signals.

Financial Research

Equity analysts track historical portfolio adjustments of successful investors to validate their own investment theses.

Insider Anomaly Detection

Compliance and research teams monitor cluster buying by corporate executives as a leading indicator of company performance.

Sector Sentiment Analysis

Macro analysts aggregate S&P 500 grid data to identify which sectors superinvestors are rotating into or out of.

Why DataFlirt

"Dataroma aggregates the highest-signal 13F filings and insider trades in the market, but extracting them programmatically requires dedicated infrastructure."

Financial data pipelines require absolute precision. A single misaligned table cell corrupts downstream models. DataFlirt handles the extraction, validation, and normalisation of Dataroma's nested HTML structures so your quants can focus on alpha generation.

Technical Spec

Dataroma scraper technical capabilities

What our dataroma.com scraper supports, feature by feature, including the two items that are only partially available.

13F Portfolio Extraction: Complete extraction of all superinvestor holdings and quarterly adjustments. Status: Supported

Insider Trading History: Full historical extraction of CEO, CFO, and director transactions. Status: Supported

Grand Portfolio Aggregation: Consensus ownership metrics across all tracked investors. Status: Supported

Pagination traversal: Automated navigation of deep historical transaction pages. Status: Supported

Change detection (diffs): Hash-based diff to only emit new trades since last run. Status: Supported

Webhook delivery: HTTP POST per new trade for real-time alerts. Status: Supported

S&P 500 Grid scraping: Extraction of sector-level superinvestor activity matrices. Status: Supported

Real-time pre-filing trades: Trades before SEC 13F publication are gated by SEC rules and not available on Dataroma. Status: Partial

Personal user watchlists: Extraction of private user portfolios requires account authentication. Status: Partial

Infrastructure

Infrastructure powering the Dataroma pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles any dynamic content rendering required.

Proxy Infrastructure

We maintain pools of proxies to distribute request load, ensuring consistent access without triggering rate limits.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested array structures

CSV

Flat file with typed columns for financial modelling

Parquet

Columnar format for BigQuery, Snowflake, Athena

S3

Direct bucket delivery compatible with any data lake

Webhook

HTTP POST per new trade for real-time downstream processing

API

REST endpoint to query your extracted dataset

BigQuery

Streamed directly into your dataset

Snowflake

Stage and COPY INTO workflow

XLS

Excel format for manual analyst review
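Two of the formats above, newline-delimited JSON and CSV, can be produced with nothing but the Python standard library. The sketch below is an illustration with hypothetical function names, using a subset of the Superinvestor Activity fields. Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json


def to_ndjson(records):
    """One JSON object per line, the shape most warehouse loaders accept."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_csv(records, columns):
    """Flat CSV with a header row and a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = [
    {"ticker": "GOOGL", "action_type": "Buy", "shares_traded": 1500000},
]
print(to_ndjson(records), end="")
print(to_csv(records, ["ticker", "action_type", "shares_traded"]), end="")
```

Passing the column list explicitly keeps the CSV layout stable from run to run, which matters when a downstream loader maps columns by position.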

// faq

Common questions.

About dataroma.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping Dataroma legal?

Scraping publicly available information from Dataroma is generally permissible. DataFlirt extracts only public 13F filing aggregations and insider trading data. Clients should review Dataroma's terms of service and consult legal counsel for specific use cases.

How do you handle rate limits on Dataroma?

We use distributed proxy pools and strict concurrency controls. Our request timing is modelled to respect server load, ensuring reliable extraction without triggering blocks.

How fresh is the data?

Pipelines can be configured to run daily or hourly, capturing new insider trades or 13F updates as soon as they are published to the platform.

Can you extract historical data?

Yes. We can traverse historical pagination to extract years of insider trading data and quarterly portfolio adjustments for backtesting.

What delivery formats do you support?

We deliver in JSON, CSV, Parquet, and XLS. We can push directly to S3, BigQuery, Snowflake, or trigger webhooks for real-time alerts.

Can I request a sample dataset?

Absolutely. We provide a sample run of up to 5 superinvestor profiles or 500 insider trades as part of the pre-engagement scoping process.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical 13F backtest dataset or a continuous daily feed of insider trades, we build and operate the pipeline. Tell us what you need.

Start a dataroma.com pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

