
CB2 catalogue data,
at warehouse scale.

We extract furniture listings, designer collaborations, material specifications, and real-time inventory from CB2. Delivered as clean JSON, CSV, or Parquet to S3 or BigQuery on your cadence.

Products extracted: 14.2K /day
Price updates: 28.5K /24h
Lookbooks parsed: 412 /run
Active pipelines: 14
Uptime: 99.98%

Data Dictionary

Every field we extract from cb2.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from cb2.com. All fields typed and schema-versioned.

sku, name, category, sub_category, base_price, designer, colour, dimensions, image_urls, overview, url
product_listings
● 200 OK
"sku": "439182",
"name": "Boucle Sofa",
"category": "Furniture",
"sub_category": "Sofas",
"base_price": 1999.0,
"designer": "Gwyneth Paltrow",
"colour": "Ivory",
"dimensions": "84 W x 36 D x 30 H"

Complete list of extractable fields for Pricing & Inventory objects from cb2.com. All fields typed and schema-versioned.

sku, current_price, original_price, clearance_flag, in_stock, stock_status_message, delivery_estimate, zip_code, currency, timestamp
pricing_& inventory
● 200 OK
"sku": "439182",
"current_price": 1799.0,
"original_price": 1999.0,
"clearance_flag": true,
"in_stock": true,
"stock_status_message": "In Stock and Ready to Ship",
"delivery_estimate": "3-5 Business Days",
"currency": "USD"

Complete list of extractable fields for Designer Collections objects from cb2.com. All fields typed and schema-versioned.

collection_name, designer_name, exclusive_collaboration, collection_url, item_count, description, active_dates, materials, featured_skus
designer_collections
● 200 OK
"collection_name": "Goop x CB2",
"designer_name": "Gwyneth Paltrow",
"exclusive_collaboration": true,
"item_count": 42,
"description": "A curated collection of modern elegance.",
"materials": "['Boucle', 'Brass', 'Marble']",
"featured_skus": "['439182', '439185']"

Complete list of extractable fields for Lookbooks & Rooms objects from cb2.com. All fields typed and schema-versioned.

lookbook_id, title, room_type, aesthetic, image_url, tagged_skus, total_room_cost, designer_notes, season
lookbooks_& rooms
● 200 OK
"lookbook_id": "LB-2023-Fall-04",
"title": "Modern Parisian Living Room",
"room_type": "Living Room",
"aesthetic": "Modern Parisian",
"tagged_skus": "['439182', '882104', '119283']",
"total_room_cost": 4550.0,
"season": "Fall 2023"

Complete list of extractable fields for Specifications objects from cb2.com. All fields typed and schema-versioned.

sku, material, finish, care_instructions, assembly_required, weight, origin, warning_text, certifications
specifications
● 200 OK
"sku": "439182",
"material": "Polyester Boucle",
"finish": "Matte Black Legs",
"care_instructions": "Spot clean with mild detergent",
"assembly_required": false,
"weight": "125 lbs",
"origin": "Imported",
"certifications": "['FSC Certified Wood']"
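
The samples above show list-valued fields (materials, featured_skus, tagged_skus, certifications) as stringified lists. If a delivery arrives in that shape, a consumer could decode them along these lines. This is a minimal sketch, not a description of the delivered schema; the helper name `decode_list_fields` and the `LIST_FIELDS` tuple are illustrative assumptions.

```python
import ast

# Fields that appear as stringified lists in the samples above.
LIST_FIELDS = ("materials", "featured_skus", "tagged_skus", "certifications")


def decode_list_fields(record: dict) -> dict:
    """Return a copy of record with stringified list fields parsed into real lists."""
    decoded = dict(record)
    for field in LIST_FIELDS:
        value = decoded.get(field)
        if isinstance(value, str):
            # literal_eval only accepts Python literals, so it never executes code.
            parsed = ast.literal_eval(value)
            if not isinstance(parsed, list):
                raise ValueError(f"{field} did not decode to a list: {value!r}")
            decoded[field] = parsed
    return decoded


collection = {
    "collection_name": "Goop x CB2",
    "materials": "['Boucle', 'Brass', 'Marble']",
    "featured_skus": "['439182', '439185']",
}
print(decode_list_fields(collection)["materials"])
```

Fields that are already lists, or absent, pass through unchanged.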

Capabilities

Extracting CB2 data with architectural precision

Our CB2 scraper navigates complex product variants, designer collections, and dynamic inventory systems. We handle the rendering and session management required to extract complete specification data.

Variant & Fabric Mapping

Extract every colour, fabric, and configuration option for made-to-order furniture, linking parent SKUs to specific variant pricing and lead times.

Dimensional Data Parsing

Capture width, depth, height, and seat height as structured numeric fields rather than raw text blocks.

Designer Collaborations

Track exclusive collections from Kravitz Design, Goop, Paul McCobb, and others, mapping items back to their respective campaigns.

Inventory & Lead Times

Monitor stock availability, backorder dates, and delivery estimates based on specific zip codes and fulfilment centres.

Lookbook Deconstruction

Parse 'Shop the Room' and Lookbook pages to extract tagged SKUs, room aesthetics, and aggregate pricing for curated spaces.

Clearance & Sale Tracking

Identify clearance items, promotional pricing, and limited-time discounts across the entire catalogue.

Material Specifications

Extract detailed material compositions, finish types, care instructions, and origin data for every product.

Regional Pricing

Capture pricing variations across different geographic regions and shipping zones.

Automated Diffing

Receive only updated records when prices change, new items are added, or stock statuses shift, reducing processing overhead.
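
One way to picture the diffing step is a per-record fingerprint compared against the previous run. The sketch below is an illustration under stated assumptions, not the production implementation: `fingerprint` and `diff_records` are hypothetical names, records are keyed on `sku`, and the `timestamp` field is ignored so that a fresh crawl time alone does not count as a change.

```python
import hashlib
import json


def fingerprint(record: dict, ignore: tuple = ("timestamp",)) -> str:
    """Hash a record's content, skipping volatile fields listed in ignore."""
    stable = {k: v for k, v in record.items() if k not in ignore}
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_records(previous: dict, current: list, key: str = "sku"):
    """Compare this run against the last run's fingerprints.

    previous maps key -> fingerprint from the last run.
    Returns (changed_or_new_records, fingerprints_for_this_run).
    """
    changed = []
    fingerprints = {}
    for record in current:
        digest = fingerprint(record)
        fingerprints[record[key]] = digest
        if previous.get(record[key]) != digest:
            changed.append(record)
    return changed, fingerprints
```

On the first run `previous` is empty, so every record is emitted; later runs emit only records whose non-volatile fields differ.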

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide CB2 categories, specific designer collections, or search terms. We define the schema together.

Pipeline Build
Days 2–4

We configure crawlers to handle CB2's dynamic loading, variant selectors, and image galleries.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and dimension parsing tests before full launch.

Delivery
ongoing

Structured data pushed to your S3 bucket, BigQuery dataset, or via Webhook on your defined schedule.
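
The null-rate check in the validation step can be sketched as follows. This is a minimal illustration: the function names and the 5% default threshold are assumptions chosen for the example, not figures from the pipeline.

```python
def null_rates(records: list, fields: list) -> dict:
    """Fraction of records in which each field is missing, None, or an empty string."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


def failing_fields(records: list, fields: list, max_null_rate: float = 0.05) -> list:
    """Names of fields whose null rate exceeds the (assumed) threshold, sorted."""
    rates = null_rates(records, fields)
    return sorted(field for field, rate in rates.items() if rate > max_null_rate)


batch = [
    {"sku": "439182", "designer": "Gwyneth Paltrow"},
    {"sku": "882104", "designer": None},
]
print(failing_fields(batch, ["sku", "designer"]))
```

A gate like this would typically run per object type before a batch is released for delivery.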

Under the hood

Navigating CB2's digital storefront

Extracting home decor data requires handling complex product configurations and visual-heavy pages. Here is how we maintain data integrity.

pipeline-monitor · cb2.com · live · active

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage
48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Dynamic variants
Handling made-to-order configurations

CB2 sofas and beds often have dozens of fabric and leg combinations. We execute JavaScript to trigger variant selections, capturing the specific price, SKU, and lead time for every possible configuration.
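
Before a browser session clicks through the selectors, the set of configurations to visit has to be enumerated. The sketch below covers only that enumeration step, leaving the browser automation out; the option names and values are hypothetical, and `enumerate_configurations` is an illustrative helper.

```python
from itertools import product


def enumerate_configurations(options: dict) -> list:
    """Expand {option_name: [values]} into one dict per possible combination."""
    names = list(options)
    return [
        dict(zip(names, combo))
        for combo in product(*(options[name] for name in names))
    ]


# Hypothetical option groups for a single made-to-order sofa.
sofa_options = {
    "fabric": ["Ivory Boucle", "Grey Velvet"],
    "legs": ["Matte Black", "Brass"],
}
for config in enumerate_configurations(sofa_options):
    print(config)
```

Each resulting dict would then drive one variant selection, after which the price, SKU, and lead time shown for that configuration are recorded.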

Data structuring
Normalising dimensional text

Furniture dimensions are often presented as unstructured text. Our pipeline uses regex and NLP to parse '84"Wx36"Dx30"H' into distinct, numeric width, depth, and height columns.
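
A regex-only version of this parsing might look like the sketch below. It assumes the values are inches and that each number is followed by a W, D, or H label; the output column names are illustrative, and it omits the NLP handling the production pipeline would need for messier text.

```python
import re

# A number, an optional inch mark, then a W/D/H label, e.g. 84"W or 84 W.
_DIMENSION = re.compile(r'(\d+(?:\.\d+)?)\s*"?\s*([WDH])', re.IGNORECASE)

_COLUMNS = {"W": "width_in", "D": "depth_in", "H": "height_in"}


def parse_dimensions(text: str) -> dict:
    """Split a dimension string into numeric width, depth, and height (inches assumed)."""
    parsed = {}
    for value, label in _DIMENSION.findall(text):
        # Keep the first occurrence of each label if one repeats.
        parsed.setdefault(_COLUMNS[label.upper()], float(value))
    missing = set(_COLUMNS.values()) - parsed.keys()
    if missing:
        raise ValueError(f"could not find {sorted(missing)} in {text!r}")
    return parsed


print(parse_dimensions('84"Wx36"Dx30"H'))
print(parse_dimensions("84 W x 36 D x 30 H"))
```

Raising on a missing dimension, rather than returning a partial row, keeps unparseable strings visible to the null-rate checks instead of silently producing gaps.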

Visual data
High-resolution image extraction

We extract the highest resolution image URLs for primary photos, lifestyle shots, and detailed fabric swatches, bypassing lazy-loaded thumbnails.
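
Where a page exposes image candidates through a `srcset` attribute, picking the widest one is a simple way to skip thumbnails. Whether a given CB2 page uses `srcset` is an assumption here, and the URLs are placeholders; this is a sketch of the selection logic only.

```python
from typing import Optional


def largest_from_srcset(srcset: str) -> Optional[str]:
    """Return the URL with the largest width descriptor (e.g. '1600w') in a srcset string."""
    best_url, best_width = None, -1
    # Note: splitting on commas assumes the URLs themselves contain no commas.
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url, width = parts[0], 0
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return best_url


srcset = (
    "https://example.com/sofa-400.jpg 400w, "
    "https://example.com/sofa-1600.jpg 1600w, "
    "https://example.com/sofa-800.jpg 800w"
)
print(largest_from_srcset(srcset))
```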

Inventory tracking
Location-based availability

Stock status on CB2 varies by delivery location. We inject specific zip codes into the session to extract accurate delivery estimates and backorder dates for your target regions.

Schema stability
Resilient DOM parsing

Retail sites update their front-end frequently. We rely on underlying JSON APIs and structured data objects where possible, using DOM parsing only as a secondary fallback.
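
The structured-data-first approach can be illustrated with schema.org JSON-LD, a common form of embedded structured data on retail pages. The sketch assumes the page carries a `Product` block in a `script type="application/ld+json"` tag, which is not confirmed for cb2.com; `extract_product` and the optional `dom_fallback` callable are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Optional


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.blocks.append("".join(self._buffer))


def extract_product(html: str, dom_fallback: Optional[Callable] = None) -> Optional[dict]:
    """Prefer a JSON-LD Product block; call dom_fallback(html) only if none is found."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: try the next one
        if isinstance(data, dict) and data.get("@type") == "Product":
            return {"name": data.get("name"), "sku": data.get("sku"), "source": "json-ld"}
    return dom_fallback(html) if dom_fallback else None
```

Because the structured block is independent of page layout, a front-end redesign that leaves it intact would not touch this path; only the fallback depends on markup.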

Applications

Who uses CB2 data - and why

Teams across industries use cb2.com data to build competitive products and smarter operations.

01
Competitor Price Tracking

Home decor retailers monitor CB2 pricing, clearance cycles, and promotional events to adjust their own merchandising strategies.

02
Assortment Planning

Merchandisers analyse category depth, material trends, and colour palettes across CB2 collections to inform product development.

03
Interior Design Platforms

Aggregators and design apps ingest CB2 product catalogues to offer accurate 3D modelling, pricing, and purchasing options to their users.

04
Trend Forecasting

Analysts track the introduction of new designer collaborations and material shifts to identify emerging interior design trends.

05
Supply Chain Analysis

Logistics teams monitor backorder dates and out-of-stock rates to gauge macroeconomic supply chain health in the furniture sector.

06
Marketplace Aggregation

Affiliate sites and home goods aggregators maintain synchronised listings with accurate pricing and availability.

Why DataFlirt

"In the furniture sector, dimensions, materials, and lead times are just as critical as price. Extracting this data accurately requires a pipeline built for complex retail structures."

Scraping a modern furniture retailer involves navigating endless variant combinations, dynamic inventory checks, and unstructured specification text. DataFlirt manages the JavaScript rendering, session state, and data normalisation required to deliver clean, structured interior design catalogues.

Technical Spec

CB2 scraper - technical specifications

Everything our cb2.com scraper supports: rendered SPA elements, variant mapping, zip code injection, and change detection. Content behind an account login is partially supported.

JavaScript rendering (Supported): Required for variant selection, pricing updates, and image galleries.
Variant mapping (Supported): Extracts all fabric, colour, and size combinations per product.
Dimension parsing (Supported): Converts text dimensions into structured numeric fields.
Zip code injection (Supported): Session modification to check stock for specific regions.
Lookbook extraction (Supported): Maps tagged products to curated room scenes.
Review extraction (Supported): Captures customer ratings, review text, and helpful votes.
Change detection (Supported): Emits only records with changed fields since the last run.
CB2 Trade Program pricing (Partial): Exclusive discounts requiring an approved interior designer account.
User Wishlists & Carts (Partial): Private user data requiring authentication.

Infrastructure

Infrastructure powering the CB2 pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Playwright Integration

Executes full browser sessions to interact with fabric selectors and dynamic pricing modules, capturing data hidden from standard HTTP requests.

Data Normalisation

Custom Python parsing logic converts inconsistent retail text into strict numeric types for dimensions, weights, and pricing.
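
For weights and prices, this kind of normalisation could be as small as the sketch below. The accepted input formats (a pound-denominated weight such as "125 lbs", a US-style price such as "$1,999.00") and the function names are assumptions made for the example.

```python
import re
from decimal import Decimal

_WEIGHT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(?:lbs?|pounds?)\.?\s*", re.IGNORECASE)
_PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_weight_lbs(text: str) -> float:
    """Convert strings like '125 lbs' to a float number of pounds."""
    match = _WEIGHT.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised weight: {text!r}")
    return float(match.group(1))


def parse_price(text: str) -> Decimal:
    """Convert strings like '$1,999.00' to a Decimal, dropping symbols and separators."""
    cleaned = re.sub(r"[^\d.]", "", text)
    if _PLAIN_NUMBER.fullmatch(cleaned) is None:
        raise ValueError(f"unrecognised price: {text!r}")
    return Decimal(cleaned)


print(parse_weight_lbs("125 lbs"))
print(parse_price("$1,999.00"))
```

Decimal is used for prices so that later arithmetic, such as discount calculations, avoids binary floating-point rounding.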

Orchestrated Delivery

Airflow manages the dependency chain, ensuring categories are scraped, variants mapped, and diffs calculated before warehouse delivery.
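
The ordering guarantee described here is a topological sort over task dependencies. The sketch below shows that idea with Python's standard library rather than Airflow itself; the task names are hypothetical and stand in for whatever the real DAG defines.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are illustrative, not taken from the production DAG.
DEPENDENCIES = {
    "map_variants": {"scrape_categories"},
    "compute_diffs": {"map_variants"},
    "deliver_to_warehouse": {"compute_diffs"},
}


def run_order(dependencies: dict) -> list:
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


print(run_order(DEPENDENCIES))
```

An orchestrator adds scheduling, retries, and alerting on top of this ordering, but the dependency chain itself reduces to the graph above.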

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Nested structures ideal for complex variant mapping
CSV: Flat files for immediate analyst use
XLS: Spreadsheet format for merchandising teams
Parquet: Columnar storage for efficient warehouse querying
AWS S3: Direct bucket upload on completion, compatible with any data lake
Webhook: HTTP POST for real-time inventory alerts
API: REST endpoints to query your extracted dataset
BigQuery: Direct streaming into your GCP environment

// faq

Common questions.

About cb2.com scraping, legality, and pipeline operations.

Can you extract all fabric options for a single sofa?

Yes. Our pipeline iterates through all available fabric and colour combinations on a product page, capturing the specific price, SKU, and lead time for each variant.

How do you handle dimensions?

We use custom parsing logic to extract width, depth, height, and seat height from the unstructured text blocks on CB2, delivering them as clean numeric fields in your database.

Can you check inventory for specific locations?

Yes. We can inject target zip codes into the scraping session to extract accurate delivery estimates and stock availability for specific regions.

Do you extract data from Lookbooks and 'Shop the Room' pages?

Yes. We map the curated lifestyle images to their tagged product SKUs, allowing you to reconstruct the room aesthetic and calculate aggregate room costs.

How frequently can you update pricing and stock?

We can configure pipelines to run daily for the entire catalogue, or at higher frequencies for specific high-priority SKUs.

Do you scrape customer reviews?

Yes. We extract star ratings, review text, submission dates, and helpful votes across all paginated review sections on a product page.

Can I get historical pricing data?

We begin tracking pricing history from the moment your pipeline is activated. We do not have access to historical pricing prior to pipeline initiation.

$ dataflirt scope --new-project --source=cb2.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. From dimensional specs to real-time inventory, we build and manage the pipeline. Tell us your data requirements and delivery cadence.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h