
Lego catalogue data, at warehouse scale.

We extract set metadata, dynamic inventory states, regional pricing, and the complete Pick a Brick catalogue from Lego.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Sets extracted: 14.2K /run
Inventory checks: 85.4K /24h
Pick a Brick elements: 42.1K /run
Active pipelines: 31
Uptime: 99.98%

Data Dictionary

Every field we extract from lego.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Set Metadata objects from lego.com. All fields typed and schema-versioned.

item_number, title, theme, sub_theme, piece_count, minifigure_count, age_range, dimensions, release_date, description, image_urls, instruction_url, page_url

Sample Set Metadata record:
{
  "item_number": "75313",
  "title": "AT-AT™",
  "theme": "Star Wars™",
  "sub_theme": "Ultimate Collector Series",
  "piece_count": 6785,
  "minifigure_count": 9,
  "age_range": "18+"
}

Complete list of extractable fields for Inventory & Pricing objects from lego.com. All fields typed and schema-versioned.

item_number, price, list_price, currency, discount_pct, stock_status, backorder_date, retiring_soon, hard_to_find, limit_per_customer, region, scraped_at

Sample Inventory & Pricing record:
{
  "item_number": "75313",
  "price": 849.99,
  "currency": "USD",
  "stock_status": "BACKORDER",
  "backorder_date": "2026-11-15",
  "retiring_soon": true,
  "hard_to_find": true,
  "limit_per_customer": 2
}
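
As an illustration of what "typed" could mean for an Inventory & Pricing record, here is a minimal sketch of a type check. The field types are assumptions inferred from the sample record on this page, not a published schema, and `INVENTORY_SCHEMA` and `validate_record` are hypothetical names.

```python
# Assumed field types, inferred from the sample record (not a published schema).
INVENTORY_SCHEMA = {
    "item_number": str,
    "price": float,
    "currency": str,
    "stock_status": str,
    "backorder_date": str,  # ISO date string such as "2026-11-15"
    "retiring_soon": bool,
    "hard_to_find": bool,
    "limit_per_customer": int,
}


def validate_record(record, schema=INVENTORY_SCHEMA):
    """Return a list of type errors; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if expected is int and isinstance(value, bool):
            errors.append(f"{field}: expected int, got bool")
        elif not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

This sketch treats every field as required. A production schema would likely mark some fields as nullable, for example backorder_date on in-stock items.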

Complete list of extractable fields for Pick a Brick objects from lego.com. All fields typed and schema-versioned.

element_id, design_id, name, colour, category, price, currency, stock_status, weight_g, dimensions, image_url

Sample Pick a Brick record:
{
  "element_id": "6335146",
  "design_id": "3001",
  "name": "Brick 2x4",
  "colour": "Bright Red",
  "category": "Bricks",
  "price": 0.24,
  "stock_status": "IN_STOCK",
  "weight_g": 2.32
}

Complete list of extractable fields for Reviews & Ratings objects from lego.com. All fields typed and schema-versioned.

review_id, item_number, reviewer_nickname, overall_rating, build_experience, playability, value_for_money, review_title, review_text, recommended, date_posted

Sample Reviews & Ratings record:
{
  "review_id": "REV-98241",
  "item_number": "75313",
  "overall_rating": 4.8,
  "build_experience": 5.0,
  "playability": 4.0,
  "value_for_money": 4.5,
  "recommended": true,
  "date_posted": "2026-02-14"
}

Capabilities

Every brick, set, and stock state — structured

Our Lego scraper handles the entire digital catalogue: set specifications, dynamic inventory states, regional pricing disparities, and the granular Pick a Brick database — bypassing rate limits and SPA rendering issues.

Complete Set Metadata

Extract item numbers, piece counts, minifigure counts, age ranges, dimensions, and high-resolution image URLs across all themes.

Dynamic Inventory Tracking

Monitor exact stock states: In Stock, Backorder (with estimated ship dates), Out of Stock, and Retiring Soon flags.

Regional Pricing Intelligence

Capture pricing, currency, and availability disparities across US, UK, EU, and APAC regional storefronts.

Pick a Brick Extraction

Scrape individual element IDs, design IDs, exact colour taxonomies, and per-piece pricing for the entire loose parts catalogue.

Review & Rating Mining

Extract granular review metrics including build experience, playability, and value for money ratings alongside full text.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at hourly or daily cadences with change-detection diffing.

// engagement pipeline

From set numbers to warehouse records

Brief in. Clean data out.

Define Scope · day 0

Provide theme URLs, regional requirements, or specific data targets like Pick a Brick. We design the extraction schema together.

Pipeline Build · days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for lego.com.

Validation & QA · days 4–6

Schema validation, null-rate checks, price-outlier detection, and sample outputs before full launch.

Delivery · ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
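
To make the Validation & QA step more concrete, the sketch below shows one plausible way to compute a null rate and flag price outliers. The median-based modified z-score and its 3.5 threshold are assumptions chosen for illustration, and `null_rate` and `price_outliers` are hypothetical names, not a description of the production checks.

```python
from statistics import median


def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def price_outliers(records, threshold=3.5):
    """Return records whose price has a modified z-score above `threshold`.

    Uses the median and the median absolute deviation (MAD), which are
    less sensitive to the outliers themselves than mean and stdev.
    """
    prices = [r["price"] for r in records if r.get("price") is not None]
    if len(prices) < 3:
        return []  # too few points to judge
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        return []  # all prices (nearly) identical; nothing to flag
    return [
        r for r in records
        if r.get("price") is not None
        and 0.6745 * abs(r["price"] - med) / mad > threshold
    ]
```

In practice a check like this would probably be run per theme or per category, since a large set and a single brick sit in very different price ranges.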

Under the hood

How our Lego pipeline handles the hard parts

Lego.com relies heavily on GraphQL and client-side rendering. Here is how we maintain resilient extraction pipelines.

pipeline-monitor · lego.com · live
Identity rotation: TLS fingerprint randomised · User-agent rotated · IP pool residential · Challenges blocked 0
Page coverage: 48,291 pages queued
Pipeline health: 99.9% uptime · 142ms p99 latency · 0.3% null rate · 2 alerts

Anti-bot layer
Residential proxy rotation + fingerprint spoofing

Retail sites employ aggressive rate limiting to prevent automated stock checking. Our crawlers use residential ISP proxies with realistic browser fingerprints and randomised request timing to bypass these restrictions.

JavaScript rendering
Full Playwright execution for SPA content

Lego.com is a React-based single-page application. We run full Playwright browser sessions with JavaScript execution to ensure dynamic inventory states and pricing widgets hydrate correctly before extraction.

GraphQL interception
Direct API extraction

Where possible, our pipeline intercepts Lego's backend GraphQL requests, extracting structured JSON directly from the API layer rather than parsing the DOM. This ensures higher reliability and schema stability.

Change detection
Only re-scrape what has changed

For inventory tracking, we maintain a hash index of last-seen stock states. Subsequent runs only push diffs — reducing compute cost and ensuring you only process actual inventory events.
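
The hash index described above can be sketched as follows. This is a minimal illustration, assuming the index is a plain dictionary keyed by item number and region and that only a handful of inventory fields are tracked; `TRACKED_FIELDS`, `state_hash`, and `diff_run` are hypothetical names.

```python
import hashlib
import json

# Assumed set of fields that define an "inventory state" for diffing.
TRACKED_FIELDS = ("stock_status", "backorder_date", "retiring_soon", "hard_to_find")


def state_hash(record):
    """Stable SHA-256 digest of the tracked fields of one record."""
    state = {field: record.get(field) for field in TRACKED_FIELDS}
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_run(records, last_seen):
    """Return only records whose state changed; update `last_seen` in place."""
    changed = []
    for record in records:
        key = (record["item_number"], record.get("region"))
        digest = state_hash(record)
        if last_seen.get(key) != digest:
            changed.append(record)
            last_seen[key] = digest
    return changed
```

On the first run every record counts as changed because the index is empty. Later runs emit a record only when one of the tracked fields differs from the last-seen state.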

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, missing item numbers, and coverage drops — responding before you notice data gaps.
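
As a rough illustration of the three alert conditions named above, the sketch below evaluates one run against an expected item list and a baseline null rate. The thresholds (`spike_factor`, `min_coverage`) and the function name `run_alerts` are assumptions for the example, not the production alerting rules.

```python
def run_alerts(records, expected_items, baseline_null_rate,
               null_field="price", spike_factor=2.0, min_coverage=0.95):
    """Return a list of alert dicts for one pipeline run."""
    alerts = []
    expected = set(expected_items)
    seen = {r["item_number"] for r in records if r.get("item_number")}

    # Missing item numbers: expected items that never appeared in this run.
    missing = sorted(expected - seen)
    if missing:
        alerts.append({"kind": "missing_items", "items": missing})

    # Coverage drop: share of expected items actually seen.
    coverage = len(expected & seen) / len(expected) if expected else 1.0
    if coverage < min_coverage:
        alerts.append({"kind": "coverage_drop", "coverage": coverage})

    # Null-rate spike: compare this run's null rate with a known baseline.
    # Note: with a baseline of 0, any null value triggers the alert.
    nulls = sum(1 for r in records if r.get(null_field) is None)
    rate = nulls / len(records) if records else 0.0
    if rate > baseline_null_rate * spike_factor:
        alerts.append({"kind": "null_rate_spike", "null_rate": rate})

    return alerts
```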

Applications

Who uses Lego data — and how

Teams across industries use lego.com data to build competitive products and smarter operations.

01
Alternative Investment Tracking

Investors track 'Retiring Soon' flags and backorder velocity to predict secondary market price appreciation for highly sought-after sets.

02
Retail Arbitrage

Resellers monitor regional pricing disparities and stock availability to identify cross-border arbitrage opportunities.

03
Competitor Price Monitoring

Toy retailers and department stores track Lego's direct-to-consumer pricing and discount strategies to optimise their own margins.

04
Supply Chain Analysis

Analysts track backorder dates and out-of-stock durations across themes to model manufacturing constraints and demand curves.

05
AFOL Community Platforms

Adult Fans of Lego (AFOL) database maintainers synchronise their platforms with official set metadata, piece counts, and instruction links.

06
Market Research

Industry analysts evaluate price-per-piece metrics, theme longevity, and licensed IP performance based on catalogue composition.
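
For example, a price-per-piece metric can be derived directly from the Set Metadata and Inventory & Pricing fields. The helper below is a hypothetical sketch; the figures come from the sample records on this page.

```python
def price_per_piece(price, piece_count):
    """Price divided by piece count, rounded to 4 decimals; None if undefined."""
    if not piece_count:
        return None  # avoid dividing by zero or a missing count
    return round(price / piece_count, 4)


# Sample set 75313: price 849.99 USD, 6785 pieces.
print(price_per_piece(849.99, 6785))  # 0.1253
```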

Why DataFlirt

"Lego's digital catalogue contains the most predictable retail arbitrage signals in the toy industry — provided you can track inventory state changes in real time."

Most teams underestimate the investment: reliable Lego.com scraping needs residential proxies, full JavaScript rendering for the SPA, GraphQL rate-limit handling, and monitoring of dynamic stock states. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.

Technical Spec

Lego scraper — technical capabilities

What our lego.com scraper supports, from rendered SPA elements and rate-limit handling to the account-gated data we cover only partially.

JavaScript rendering · Full Playwright sessions, required for dynamic stock states and pricing · Supported
CAPTCHA bypass · Automated 2Captcha + CapSolver integration · Supported
Residential proxy rotation · ISP-grade residential IPs rotated per request to bypass rate limits · Supported
Multi-region support · Extract data across US, UK, EU, and APAC regional storefronts · Supported
GraphQL query extraction · Direct interception of backend API responses for stable schema mapping · Supported
Pick a Brick element mapping · Full extraction of design IDs, element IDs, and exact colour names · Supported
Change detection (diffs) · Hash-based diff: only emit records with changed inventory states · Supported
Webhook delivery · HTTP POST per record for real-time stock alerts · Supported
Lego Insiders point balances · Gated data tied to individual authenticated user accounts · Partial
User order history · Private transactional data behind authentication walls · Partial

Infrastructure

Infrastructure powering the Lego pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright executes JavaScript and intercepts GraphQL responses for reliable data capture.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies to bypass aggressive retail rate-limiting, ensuring continuous inventory monitoring.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling for high-frequency stock checks. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested, schema versioned per run
CSV: Flat file with typed columns, Excel/Sheets compatible
Parquet: Columnar format for BigQuery, Snowflake, Athena
S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time stock alerts
BigQuery: Streamed directly into your dataset
Snowflake: Stage + COPY INTO workflow
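
To show how the newline-delimited JSON and flat CSV formats listed above differ, here is a minimal serialisation sketch using only the Python standard library. `to_ndjson` and `to_csv` are hypothetical helper names, and the column order is an assumption chosen for the example.

```python
import csv
import io
import json


def to_ndjson(records):
    """Serialise records as newline-delimited JSON, one object per line."""
    return "".join(
        json.dumps(r, ensure_ascii=False, sort_keys=True) + "\n" for r in records
    )


def to_csv(records, columns):
    """Serialise records as a flat CSV with a header row.

    Fields not listed in `columns` are ignored; missing fields become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buf.getvalue()
```

NDJSON keeps nested values intact and can be appended line by line, while the CSV form is restricted to the flat columns you name.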
// faq

Common questions.

About lego.com scraping, legality, and pipeline operations.

Is scraping Lego.com legal?

Scraping publicly available catalogue and pricing information is generally permissible under applicable law. DataFlirt targets only public, non-authenticated set metadata, inventory states, and reviews. We do not extract personal data or circumvent authentication walls. Clients should review Lego's ToS and consult legal counsel.

How do you handle rate limits on inventory checks?

We use residential ISP proxies and precise request timing modelled on human behaviour. By intercepting GraphQL queries rather than brute-forcing HTML loads, we minimise the footprint of our extraction while maintaining high-frequency stock monitoring.

Can you track the 'Retiring Soon' status?

Yes. We specifically monitor and extract lifecycle flags including 'Retiring Soon', 'Hard to Find', and 'New', alongside exact backorder fulfilment dates.

Do you extract the Pick a Brick catalogue?

Yes. We scrape the entire Pick a Brick database, including design IDs, element IDs, exact colour taxonomies, weight, and per-piece pricing.

Can I track pricing across different countries?

Yes. We can configure pipelines to extract data from specific regional storefronts (e.g., the en-gb, en-us, and de-de locales), capturing local currency pricing and regional stock availability.

What is the minimum viable engagement?

Packages start with either a defined list of themes or the complete active set catalogue, delivered daily. High-frequency stock monitoring (hourly or minute-level) is priced based on compute and proxy volume.

$ dataflirt scope --new-project --source=lego.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily catalogue dump or real-time inventory alerts across regions, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h