
Event vendor data, at warehouse scale.

We extract A-List vendor directories, venue profiles, real wedding galleries, and event metadata from 100Layercake. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Vendor profiles: 14.2K per run
Real weddings: 8.4K per run
Image URLs: 412K per run
Active pipelines: 14
Uptime: 99.98%
Data Dictionary

Every field we extract from 100layercake.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for vendor profile objects from 100layercake.com. All fields are typed and schema-versioned.

Fields: vendor_id, name, category, location, website_url, instagram_handle, description, featured_weddings_count, profile_image_url, contact_email
Sample vendor_profiles record:
{
  "vendor_id": "v-84920",
  "name": "Wandering Floral Design",
  "category": "Florist",
  "location": "Los Angeles, CA",
  "instagram_handle": "@wanderingflorals",
  "featured_weddings_count": 12,
  "website_url": "https://wanderingfloral.example.com"
}

Complete list of extractable fields for real wedding objects from 100layercake.com. All fields are typed and schema-versioned.

Fields: post_id, title, publish_date, location, style_tags, colour_palette, description, image_count, photographer_name, vendor_credits
Sample real_weddings record:
{
  "post_id": "rw-5921",
  "title": "Modern Desert Wedding in Joshua Tree",
  "publish_date": "2023-10-14",
  "location": "Joshua Tree, California",
  "style_tags": ["desert", "modern", "boho"],
  "colour_palette": ["terracotta", "sage", "cream"],
  "image_count": 45
}

Complete list of extractable fields for venue objects from 100layercake.com. All fields are typed and schema-versioned.

Fields: venue_id, name, city, state, capacity_max, venue_type, setting, description, website_url, image_urls
Sample venues record:
{
  "venue_id": "vn-1044",
  "name": "The Fig House",
  "city": "Los Angeles",
  "state": "CA",
  "capacity_max": 250,
  "venue_type": "Event Space",
  "setting": "Indoor/Outdoor",
  "website_url": "https://fighousela.example.com"
}

Complete list of extractable fields for image gallery objects from 100layercake.com. All fields are typed and schema-versioned.

Fields: image_id, post_id, image_url_highres, alt_text, category, dominant_colour, width, height, pin_count, credit_name
Sample image_galleries record:
{
  "image_id": "img-993821",
  "post_id": "rw-5921",
  "image_url_highres": "https://100layercake.com/wp-content/uploads/2023/10/desert-arch.jpg",
  "category": "Ceremony Backdrop",
  "width": 1200,
  "height": 1800,
  "credit_name": "Sarah Smith Photography"
}

Complete list of extractable fields for blog post and DIY objects from 100layercake.com. All fields are typed and schema-versioned.

Fields: post_id, title, author, publish_date, category, tags, content_html, materials_list, step_count, comment_count
Sample blog_posts & diy record:
{
  "post_id": "diy-412",
  "title": "How to make a dried floral installation",
  "author": "Jillian Clark",
  "category": "DIY",
  "tags": ["floral", "backdrop", "tutorial"],
  "step_count": 6,
  "comment_count": 14
}

Capabilities

Everything you need from 100Layercake, nothing you don't

Our 100Layercake scraper extracts structured vendor directories, nested event metadata, and high-resolution image galleries with complete credit mapping.

A-List Vendor Extraction

Extract full vendor profiles including names, categories, locations, website URLs, and Instagram handles from the A-List directory.

Real Wedding Parsing

Capture event titles, dates, locations, and descriptive text from real wedding features, structured into clean database rows.

Venue Specifications

Extract venue capacities, settings, locations, and contact information from the venue directory.

Image Gallery Scraping

Extract high-resolution image URLs, alt text, and dimensions from lazy-loaded blog galleries.

Vendor Credit Mapping

Parse unstructured blog text to map specific vendors and photographers to the events they serviced.

Event Style Classification

Extract style tags, categorisation labels, and colour palettes associated with featured events.

DIY Project Structuring

Parse tutorial posts into structured step-by-step arrays, including materials lists and instructional text.

Social Media Mapping

Extract embedded Instagram, Pinterest, and Facebook links for cross-platform vendor tracking.

Scheduled Updates

Run continuous pipelines to capture new blog posts, vendor additions, and venue updates as they are published.
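Capturing only new or changed records between scheduled runs can be done with a hash-based diff. The sketch below is illustrative rather than our production code: it assumes every record carries a stable ID field and keeps the previous run's hashes in a plain dict (a real pipeline would persist them, for example in Postgres or Redis). The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def record_hash(record: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding; sorted keys make field order irrelevant."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_since_last_run(
    records: Iterable[Dict[str, Any]],
    previous_hashes: Dict[str, str],
    id_field: str,
) -> List[Dict[str, Any]]:
    """Return new or modified records and update previous_hashes (id -> hash) in place."""
    changed = []
    for record in records:
        record_id = record[id_field]
        digest = record_hash(record)
        if previous_hashes.get(record_id) != digest:
            changed.append(record)
            previous_hashes[record_id] = digest
    return changed


# Usage: three consecutive runs over the same vendor.
state: Dict[str, str] = {}
run_1 = [{"vendor_id": "v-84920", "featured_weddings_count": 12}]
run_2 = [{"vendor_id": "v-84920", "featured_weddings_count": 12}]
run_3 = [{"vendor_id": "v-84920", "featured_weddings_count": 13}]
print(len(changed_since_last_run(run_1, state, "vendor_id")))  # 1 (new record)
print(len(changed_since_last_run(run_2, state, "vendor_id")))  # 0 (unchanged)
print(len(changed_since_last_run(run_3, state, "vendor_id")))  # 1 (count changed)
```

Hashing a canonical encoding means a record only counts as changed when a value changes, not when the crawler happens to emit keys in a different order.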

// engagement pipeline

From URL list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide category URLs, vendor lists, or specific post types. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for 100layercake.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and credit mapping accuracy verification before full launch.

Delivery
Ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
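The null-rate checks in the Validation & QA step can be as simple as counting empty values per field and comparing each rate against a threshold. A minimal sketch, assuming records arrive as Python dicts; the 5% default threshold, the optional per-field overrides and the function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def _is_null(value: Any) -> bool:
    """Missing, None, or a blank string all count as null."""
    return value is None or (isinstance(value, str) and not value.strip())


def null_rates(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Fraction of records in which each field is null."""
    if not records:
        raise ValueError("cannot compute null rates on an empty batch")
    total = len(records)
    return {f: sum(_is_null(r.get(f)) for r in records) / total for f in fields}


def fields_over_threshold(
    rates: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
    default_threshold: float = 0.05,  # assumed default; real limits would be set per field
) -> List[str]:
    """Fields whose null rate exceeds their threshold."""
    thresholds = thresholds or {}
    return sorted(f for f, rate in rates.items() if rate > thresholds.get(f, default_threshold))


batch = [
    {"vendor_id": "v-84920", "instagram_handle": "@wanderingflorals"},
    {"vendor_id": "v-84921", "instagram_handle": None},
    {"vendor_id": "v-84922"},
    {"vendor_id": "v-84923", "instagram_handle": "  "},
]
rates = null_rates(batch, ["vendor_id", "instagram_handle"])
print(rates)                         # {'vendor_id': 0.0, 'instagram_handle': 0.75}
print(fields_over_threshold(rates))  # ['instagram_handle']
```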

Under the hood

How our 100Layercake pipeline handles the hard parts

Extracting visual-heavy blogs requires handling complex DOM structures, lazy-loaded image galleries, and unstructured vendor credits.

Pipeline monitor · 100layercake.com · live
Identity rotation: TLS fingerprint randomised · user-agent rotated · residential IP pool · 0 challenges blocked
Page coverage: 48,291 pages queued (running)
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts
Unstructured credit mapping
NLP for vendor lists

Vendor credits in blog posts are often unstructured text blocks. We use regex patterns and NLP to parse these blocks into structured key-value pairs, linking specific roles (e.g., Florist) to the correct vendor entity.
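As an illustration of the regex half of that approach, the sketch below parses one credit per line in either "Role: Vendor" or "Role - Vendor" form and normalises role labels through an alias table. The alias table, the assumed line format and the function name are made up for this example; the NLP stage that would handle lines the pattern misses is not shown.

```python
import re
from typing import Dict, List

# Hypothetical alias table: raw role labels seen in credit blocks -> canonical role.
ROLE_ALIASES = {
    "photography": "Photographer",
    "photographer": "Photographer",
    "florals": "Florist",
    "florist": "Florist",
    "flowers": "Florist",
    "venue": "Venue",
    "videography": "Videographer",
}

# One credit per line: "<role>: <vendor>" or "<role> - <vendor>" (hyphen, en dash or em dash).
CREDIT_LINE = re.compile(
    r"^\s*(?P<role>[A-Za-z][A-Za-z &/]*?)\s*(?::|\s[-\u2013\u2014]\s)\s*(?P<vendor>\S.*?)\s*$"
)


def parse_credits(block: str) -> List[Dict[str, str]]:
    """Turn a free-text credit block into [{'role': ..., 'vendor': ...}] rows."""
    credits = []
    for line in block.splitlines():
        match = CREDIT_LINE.match(line)
        if match is None:
            continue  # unmatched lines would be routed to a fallback stage (not shown)
        raw_role = match.group("role").strip()
        role = ROLE_ALIASES.get(raw_role.lower(), raw_role.title())
        credits.append({"role": role, "vendor": match.group("vendor")})
    return credits


block = """Photography: Sarah Smith Photography
Florals - Wandering Floral Design
Venue: The Fig House
Thank you to everyone who made this day happen!"""
for row in parse_credits(block):
    print(row)
# {'role': 'Photographer', 'vendor': 'Sarah Smith Photography'}
# {'role': 'Florist', 'vendor': 'Wandering Floral Design'}
# {'role': 'Venue', 'vendor': 'The Fig House'}
```

Linking each parsed vendor name to an existing vendor entity (for example by matching against vendor_profiles) would be a separate step after this one.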

Lazy-loaded galleries
Playwright scrolling execution

100Layercake uses infinite scroll and lazy-loading for large image galleries. Our Playwright instances execute the necessary JavaScript and scroll events to ensure all images are hydrated in the DOM before extraction.

CDN image resolution
Extracting maximum size

WordPress themes often serve compressed thumbnails by default. Our pipeline parses the srcset attributes to extract the highest resolution CDN URL available for every image.
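Choosing the largest candidate from a srcset attribute takes only a few lines. This sketch assumes the attribute value has already been read from the DOM; the resized filenames in the usage example only mimic WordPress's size-suffix convention and are hypothetical.

```python
from typing import Optional


def largest_srcset_candidate(srcset: str) -> Optional[str]:
    """Pick the URL with the largest descriptor from an img srcset attribute.

    Handles width ("1200w") and pixel-density ("2x") descriptors; a candidate
    with no descriptor counts as 1x, as in the HTML spec. Splitting on commas
    is a simplification: URLs that themselves contain commas are not handled.
    """
    best_url: Optional[str] = None
    best_value = float("-inf")
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        value = 1.0
        if len(parts) > 1:
            descriptor = parts[1]
            if descriptor[-1] not in "wx":
                continue  # unknown descriptor; skip this candidate
            try:
                value = float(descriptor[:-1])
            except ValueError:
                continue
        if value > best_value:
            best_url, best_value = url, value
    return best_url


srcset = (
    "https://100layercake.com/wp-content/uploads/2023/10/desert-arch-300x450.jpg 300w, "
    "https://100layercake.com/wp-content/uploads/2023/10/desert-arch-768x1152.jpg 768w, "
    "https://100layercake.com/wp-content/uploads/2023/10/desert-arch.jpg 1200w"
)
print(largest_srcset_candidate(srcset))
# https://100layercake.com/wp-content/uploads/2023/10/desert-arch.jpg
```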

Categorisation normalisation
Mapping tags and taxonomy

Blog tags can be messy. We normalise category strings and style tags into standard arrays, ensuring your downstream database remains clean and queryable.
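A minimal normaliser for style tags and colour palettes might look like the following. It accepts either a comma-separated string or a list (the two input shapes assumed here), and the synonym table is a hypothetical placeholder for a real taxonomy map.

```python
import re
from typing import Iterable, List, Union

# Hypothetical synonym table; a real taxonomy map would be maintained per project.
SYNONYMS = {
    "bohemian": "boho",
    "boho chic": "boho",
    "off white": "cream",
}


def normalise_tags(raw: Union[str, Iterable[str]]) -> List[str]:
    """Lower-case, trim, collapse whitespace, map synonyms, de-duplicate (order kept)."""
    items = raw.split(",") if isinstance(raw, str) else raw
    seen = set()
    tags: List[str] = []
    for item in items:
        tag = re.sub(r"\s+", " ", item).strip().lower()
        tag = SYNONYMS.get(tag, tag)
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


print(normalise_tags("Desert, MODERN ,  Bohemian, boho"))          # ['desert', 'modern', 'boho']
print(normalise_tags(["Terracotta", "Sage ", "", "Off  White"]))   # ['terracotta', 'sage', 'cream']
```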

Monitoring & alerting
Schema drift detection

Content-heavy sites change layouts frequently. We monitor selector success rates and trigger alerts if WordPress theme updates alter the DOM structure, deploying fixes before data drops occur.
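Selector success-rate monitoring comes down to counting, per selector, how often it matched and raising an alert when the ratio falls. This in-memory sketch is a simplified stand-in for that idea: the 95% threshold, the minimum sample size and the CSS selectors are assumptions, and in production the counts would feed a metrics system rather than live in a Python object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SelectorMonitor:
    """Tracks how often each CSS selector matches and flags the ones that degrade."""

    threshold: float = 0.95  # assumed minimum acceptable success rate
    min_samples: int = 50    # ignore selectors with too few attempts to judge
    attempts: Dict[str, int] = field(default_factory=dict)
    hits: Dict[str, int] = field(default_factory=dict)

    def record(self, selector: str, matched: bool) -> None:
        self.attempts[selector] = self.attempts.get(selector, 0) + 1
        if matched:
            self.hits[selector] = self.hits.get(selector, 0) + 1

    def success_rate(self, selector: str) -> float:
        n = self.attempts.get(selector, 0)
        return self.hits.get(selector, 0) / n if n else 1.0

    def alerts(self) -> List[str]:
        """Selectors with enough samples whose success rate fell below the threshold."""
        return sorted(
            s for s, n in self.attempts.items()
            if n >= self.min_samples and self.success_rate(s) < self.threshold
        )


monitor = SelectorMonitor(threshold=0.9, min_samples=10)
for _ in range(20):
    monitor.record("h1.entry-title", matched=True)
for i in range(20):
    monitor.record("div.vendor-credits", matched=i < 10)  # half the pages lost the credits block
print(monitor.alerts())  # ['div.vendor-credits']
```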

Applications

Who uses 100Layercake data, and how

Teams across industries use 100layercake.com data to build competitive products and smarter operations.

01
Vendor Lead Generation

B2B SaaS companies targeting wedding professionals use extracted A-List directories to build targeted outreach lists.

02
Venue Competitive Analysis

Hospitality groups track venue capacities, settings, and featured events to benchmark against local competitors.

03
Trend Forecasting

Retailers and designers analyse colour palettes and style tags across real weddings to forecast upcoming seasonal trends.

04
Event Planning Aggregators

Marketplaces populate their local vendor and venue directories with structured data extracted from 100Layercake profiles.

05
AI Image Training

Machine learning teams use high-resolution wedding galleries categorised by style to train aesthetic classification models.

06
Social Media Benchmarking

Marketing agencies correlate featured vendors with their Instagram handles to analyse cross-platform engagement metrics.

Why DataFlirt

"100Layercake holds the definitive graph of event vendors, venues, and visual inspiration - but extracting structured relational data from blog posts requires deep DOM parsing."

Most teams underestimate the investment involved: reliable blog scraping means handling infinite-scroll galleries, unstructured vendor credits, and constant WordPress theme updates. DataFlirt absorbs that complexity so your engineers can focus on analysis rather than infrastructure.

Technical Spec

100Layercake scraper - technical capabilities

Capabilities of our 100layercake.com scraper, from lazy-loaded galleries and high-resolution images to change detection and webhook delivery, along with the limits on what we extract.

Gallery lazy-loading
Full Playwright sessions to trigger scroll events and hydrate image carousels
Supported
High-res image extraction
Parsing srcset attributes to capture the maximum resolution CDN links
Supported
Vendor credit parsing
Regex and NLP mapping of unstructured text blocks to vendor entities
Supported
Social link extraction
Capture of Instagram, Facebook, and Pinterest URLs from vendor profiles
Supported
WordPress REST API fallback
Querying exposed WP-JSON endpoints for cleaner metadata when available
Supported
Change detection (diffs)
Hash-based diff: only emit records with changed fields since last run
Supported
Webhook delivery
HTTP POST per record or batch for real-time downstream processing
Supported
Private saved boards
Private saved inspiration boards or any other content behind a user login
Not supported
Vendor direct messages
Private inquiries or direct communications sent through the platform
Not supported
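For the social link extraction row, outbound links can be grouped by platform with the standard library's URL parser. This is a sketch under the assumption that profile links have already been collected from the page; the domain table and the list of non-profile Instagram paths are illustrative, not exhaustive.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Domain -> platform. Subdomains (www., m., and so on) are matched by suffix below.
PLATFORM_DOMAINS = {
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "pinterest.com": "pinterest",
    "pin.it": "pinterest",
}

# Instagram path prefixes that point at posts or features, not profile handles.
NON_PROFILE_PATHS = {"p", "reel", "reels", "explore", "stories"}


def classify_social_links(urls: List[str]) -> Dict[str, List[str]]:
    """Group outbound links by platform, ignoring non-social URLs and duplicates."""
    grouped: Dict[str, List[str]] = {}
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        for domain, platform in PLATFORM_DOMAINS.items():
            if host == domain or host.endswith("." + domain):
                bucket = grouped.setdefault(platform, [])
                if url not in bucket:
                    bucket.append(url)
                break
    return grouped


def instagram_handle(url: str) -> Optional[str]:
    """'https://www.instagram.com/wanderingflorals/' -> '@wanderingflorals'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments or segments[0].lower() in NON_PROFILE_PATHS:
        return None
    return "@" + segments[0]


links = [
    "https://www.instagram.com/wanderingflorals/",
    "https://www.pinterest.com/wanderingflorals/",
    "https://wanderingfloral.example.com",
    "https://www.instagram.com/wanderingflorals/",
]
print(classify_social_links(links))
print(instagram_handle(links[0]))  # @wanderingflorals
```

Deriving instagram_handle from the profile URL, rather than from visible page text, keeps the field consistent with the link the vendor actually published.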
Infrastructure

Infrastructure powering the 100Layercake pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, infinite scroll, and interaction flows. The two are combined via the scrapy-playwright download handler.
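A minimal Scrapy settings fragment showing how scrapy-playwright is typically wired in. The download handler and reactor settings are the ones the scrapy-playwright documentation asks for; the browser choice, retry count and concurrency values are placeholders, not our production configuration.

```python
# settings.py: minimal scrapy-playwright wiring (tuning values are illustrative)

# Route HTTP(S) downloads through the scrapy-playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Scrapy keeps ownership of retries and politeness; the numbers are assumptions.
RETRY_ENABLED = True
RETRY_TIMES = 3
AUTOTHROTTLE_ENABLED = True
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```

With this in place, only requests that set `meta={"playwright": True}` are rendered in a browser; other requests go through Scrapy's regular download handler, so pages that need no JavaScript avoid the cost of a browser session.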

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US regions. Proxies rotate per request, with sticky sessions where a flow needs a consistent IP. IP reputation monitoring keeps blacklisted addresses out of the pool.
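Per-request rotation with sticky sessions can be modelled as a round-robin pool plus a session-to-proxy map. This in-memory sketch is hypothetical: the proxy URLs are placeholders, and `ban()` stands in for whatever IP reputation signal actually removes a proxy from rotation.

```python
import itertools
from typing import Dict, Iterable, Optional, Set


class ProxyRotator:
    """Per-request round-robin over a proxy pool, with optional sticky sessions."""

    def __init__(self, proxies: Iterable[str]) -> None:
        self._pool = list(proxies)
        if not self._pool:
            raise ValueError("proxy pool is empty")
        self._cycle = itertools.cycle(self._pool)
        self._sticky: Dict[str, str] = {}
        self._banned: Set[str] = set()

    def ban(self, proxy: str) -> None:
        """Take a proxy out of rotation and release any sessions pinned to it."""
        self._banned.add(proxy)
        self._sticky = {s: p for s, p in self._sticky.items() if p != proxy}

    def get(self, session: Optional[str] = None) -> str:
        """Next healthy proxy; a session key pins the same proxy across requests."""
        if session is not None and session in self._sticky:
            return self._sticky[session]
        for _ in range(len(self._pool)):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                break
        else:
            raise RuntimeError("every proxy in the pool is banned")
        if session is not None:
            self._sticky[session] = proxy
        return proxy


rotator = ProxyRotator([
    "http://proxy-1.example:8000",
    "http://proxy-2.example:8000",
    "http://proxy-3.example:8000",
])
print(rotator.get())              # http://proxy-1.example:8000
print(rotator.get())              # http://proxy-2.example:8000
print(rotator.get("gallery-42"))  # http://proxy-3.example:8000 (now pinned)
print(rotator.get("gallery-42"))  # http://proxy-3.example:8000 (still pinned)
```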

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested, schema-versioned per run
CSV
Flat file with typed columns, Excel/Sheets compatible
XLS
Standard Excel workbook format for business teams
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery, compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoint to query your extracted datasets
BigQuery
Streamed directly into your dataset with schema auto-detect
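For the newline-delimited JSON option, stamping each line with a schema version and run ID lets downstream loaders tell schema changes apart from data changes. A minimal sketch; the field names `_schema_version` and `_run_id`, the version string and the run ID are hypothetical.

```python
import json
from typing import Any, Dict, Iterable

SCHEMA_VERSION = "1.2.0"  # hypothetical; bumped whenever a field is added, renamed or retyped


def to_ndjson(records: Iterable[Dict[str, Any]], run_id: str, schema_version: str = SCHEMA_VERSION) -> str:
    """One JSON object per line, each stamped with the schema version and run ID."""
    lines = []
    for record in records:
        stamped = {"_schema_version": schema_version, "_run_id": run_id, **record}
        lines.append(json.dumps(stamped, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + ("\n" if lines else "")


payload = to_ndjson(
    [{"venue_id": "vn-1044", "name": "The Fig House", "capacity_max": 250}],
    run_id="run-0001",
)
print(payload, end="")
# {"_schema_version":"1.2.0","_run_id":"run-0001","venue_id":"vn-1044","name":"The Fig House","capacity_max":250}
```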
// faq

Common questions.

About 100layercake.com scraping, legality, and pipeline operations.

Is scraping 100Layercake legal?

Scraping publicly available information is generally permissible in many jurisdictions, but the law varies by country and by use case. DataFlirt targets only public, non-authenticated vendor profiles, venue specifications, and blog posts. We do not extract personal user data or circumvent authentication walls. Clients should review 100Layercake's terms of service and consult legal counsel for their specific use cases.

How do you handle lazy-loaded image galleries?

We use Playwright to execute full browser sessions, triggering the necessary scroll events and JavaScript execution to ensure all image nodes are hydrated in the DOM before we extract the URLs.

Can you extract high-resolution images?

Yes. We parse the srcset attributes within the image tags to identify and extract the highest resolution CDN URL available, rather than capturing compressed thumbnails.

How accurate is the vendor credit mapping?

We use custom regex patterns and NLP to parse the unstructured credit blocks at the end of blog posts. Accuracy depends on how consistently authors format these blocks, so we continuously monitor and refine the patterns to account for variations.

Can I get historical blog post data?

Yes. We can configure a backfill pipeline to traverse the archive and extract historical real weddings, DIY posts, and venue features dating back to the site's inception.

What is the minimum viable engagement?

Our minimum engagement starts at a full extraction of the A-List vendor directory or a defined historical backfill of blog posts. Contact us with your specific data requirements for a scoped quote.

$ dataflirt scope --new-project --source=100layercake.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off vendor directory dump or a continuous feed of new wedding inspiration galleries, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h