
Nykaa data,
beauty intelligence at scale.

We extract product listings, ingredient lists, shade matrices, pricing signals, influencer-attributed reviews, and brand catalogue data from Nykaa. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Products extracted
680K /day
Price updates
2.9M /24h
Review records
310K /run
Active pipelines
81
Uptime
99.95%
Data Dictionary

Every field we extract from nykaa.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from nykaa.com. All fields typed and schema-versioned.

product_id, title, brand, category, sub_category, price, mrp, currency, discount_pct, in_stock, free_shipping_eligible, rating, review_count, wishlist_count, shade_count, shade_names, finish_type, skin_type_tags, concern_tags, key_ingredients, ingredient_list, how_to_use, description, image_urls, video_url, page_url, country_of_origin, net_quantity, expiry_period
product_listings ● 200 OK
{
  "product_id": "NYK-91824",
  "title": "Lakme 9to5 Weightless Matte Mousse Lip & Cheek Color",
  "brand": "Lakme",
  "price": 395,
  "mrp": 499,
  "currency": "INR",
  "discount_pct": 21,
  "shade_count": 14,
  "finish_type": "Matte",
  "skin_type_tags": "All Skin Types",
  "rating": 4.3,
  "review_count": 11482
}
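As a quick consistency check on the sample record above, `discount_pct` can be recomputed from `price` and `mrp`. The sketch below assumes the field is the percentage off MRP rounded half-up to a whole number. That rounding rule is an assumption, not something this page states, and `discount_pct()` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def discount_pct(price: float, mrp: float) -> int:
    """Percentage off MRP, rounded half-up to a whole number (assumed rule)."""
    if mrp <= 0:
        raise ValueError("mrp must be positive")
    # Decimal avoids binary floating-point surprises right at the .5 boundary.
    pct = (Decimal(str(mrp)) - Decimal(str(price))) / Decimal(str(mrp)) * 100
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

For the sample record, (499 - 395) / 499 is roughly 20.84%, which rounds to the 21 shown in `discount_pct`.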

Complete list of extractable fields for Ingredients & Formulation objects from nykaa.com. All fields typed and schema-versioned.

product_id, ingredient_list, key_ingredients, free_from_claims, certifications, skin_type_tags, concern_tags, spf_value, finish_type, coverage_level, formulation_type, ph_level, cruelty_free, vegan, dermatologist_tested, country_of_origin, expiry_period
ingredients_and_formulation ● 200 OK
{
  "product_id": "NYK-91824",
  "key_ingredients": "Vitamin E, Shea Butter, Hyaluronic Acid",
  "free_from_claims": "Paraben-Free, Sulphate-Free",
  "cruelty_free": true,
  "vegan": false,
  "dermatologist_tested": true,
  "skin_type_tags": "Dry, Normal",
  "concern_tags": "Pigmentation, Dullness",
  "expiry_period": "24 months"
}

Complete list of extractable fields for Reviews & Ratings objects from nykaa.com. All fields typed and schema-versioned.

review_id, product_id, reviewer_name, verified_purchase, star_rating, review_title, review_body, review_date, helpful_votes, shade_reviewed, skin_type_self_reported, skin_tone_self_reported, image_urls, influencer_flag
reviews_and_ratings ● 200 OK
{
  "review_id": "nyk_rv_3301872",
  "product_id": "NYK-91824",
  "star_rating": 5,
  "verified_purchase": true,
  "shade_reviewed": "Rose Rush",
  "skin_type_self_reported": "Combination",
  "skin_tone_self_reported": "Medium",
  "influencer_flag": false,
  "review_date": "2026-04-14"
}
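Because each review record carries `skin_type_self_reported` alongside `star_rating`, ratings can be segmented by skin type. The following is a minimal sketch, assuming records shaped like the sample above; `mean_rating_by_skin_type()` is a hypothetical helper, not part of any delivered tooling.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_skin_type(reviews: list[dict]) -> dict[str, float]:
    """Average star_rating per self-reported skin type, skipping incomplete rows."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for review in reviews:
        skin_type = review.get("skin_type_self_reported")
        rating = review.get("star_rating")
        if skin_type is None or rating is None:
            continue  # reviews without skin context cannot be segmented
        buckets[skin_type].append(rating)
    return {skin: round(mean(values), 2) for skin, values in buckets.items()}
```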

Complete list of extractable fields for Brand Catalogue objects from nykaa.com. All fields typed and schema-versioned.

brand_id, brand_name, brand_url, nykaa_exclusive, brand_origin_country, total_products, avg_rating, total_reviews, category_presence, price_tier, brand_description, brand_story, is_luxury, is_indie, is_ayurvedic, featured_on_nykaa_homepage
brand_catalogue ● 200 OK
{
  "brand_id": "dot-key",
  "brand_name": "Dot & Key",
  "nykaa_exclusive": true,
  "brand_origin_country": "India",
  "total_products": 148,
  "avg_rating": 4.4,
  "price_tier": "mid-premium",
  "is_indie": true,
  "is_ayurvedic": false
}

Complete list of extractable fields for Search & Rankings objects from nykaa.com. All fields typed and schema-versioned.

keyword, category_path, position, product_id, title, brand, price, rating, review_count, bestseller_badge, nykaa_choice_badge, nykaa_exclusive_badge, new_launch_badge, sponsored, thumbnail_url, scraped_at
search_and_rankings ● 200 OK
{
  "keyword": "vitamin c serum",
  "position": 1,
  "product_id": "NYK-91824",
  "bestseller_badge": true,
  "nykaa_choice_badge": true,
  "nykaa_exclusive_badge": false,
  "sponsored": false,
  "price": 395,
  "scraped_at": "2026-05-12T07:30:11Z"
}
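Since each search row carries both `position` and `sponsored`, an organic-only rank can be derived downstream by skipping sponsored placements. This is a sketch under the assumption that rows look like the sample above. `organic_rank` is an illustrative derived field and does not appear in the field list.

```python
def add_organic_rank(rows: list[dict]) -> list[dict]:
    """Return copies of the rows, ordered by position, with an organic_rank added.

    Sponsored rows get organic_rank=None; organic rows are numbered from 1.
    """
    ranked = []
    rank = 0
    for row in sorted(rows, key=lambda r: r["position"]):
        row = dict(row)  # copy so the caller's records are not mutated
        if row.get("sponsored"):
            row["organic_rank"] = None
        else:
            rank += 1
            row["organic_rank"] = rank
        ranked.append(row)
    return ranked
```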

Capabilities

Everything you need from Nykaa — nothing you don't

Nykaa is India's most data-rich beauty platform. Our scraper goes beyond price and title — capturing ingredient lists, shade matrices, skin-type compatibility tags, and the influencer-review layer that drives purchasing decisions.

Full Formulation Data

Ingredient lists, key actives, free-from claims, cruelty-free and vegan flags, SPF values, finish type, and dermatologist-tested badges — the data beauty R&D and compliance teams actually need.

Shade & Variant Matrix

Every shade name, shade hex code, finish, coverage level, and stock status — mapped from parent product to individual SKU. Essential for shade gap analysis and trend tracking.

Price & Discount Tracking

Capture price, MRP, discount percentage, Nykaa sale pricing, free-shipping eligibility, and Pink Friday / End of Season Sale events — timestamped per crawl.

Review Mining with Skin Context

Reviews include self-reported skin type, skin tone, and shade reviewed — making Nykaa reviews uniquely valuable for formulation validation and personalisation models.

Brand Intelligence

Brand origin, Nykaa-exclusive status, price tier, luxury / indie / ayurvedic classification, homepage featuring, and full product catalogue — per brand.

SERP & Category Rank Tracking

Track organic vs sponsored position for any keyword or category — with Bestseller, Nykaa's Choice, and New Launch badge capture.

Concern & Skin Type Taxonomy

Products are tagged with skin concern (acne, pigmentation, dullness, ageing) and skin type (oily, dry, combination, sensitive) — critical for building personalisation recommendation layers.

Sale Event Monitoring

Monitor Pink Friday, End of Season Sale, Nykaa Birthday, and flash sale price movements — with pre/during/post event snapshots per SKU.

Multi-Property Coverage

Nykaa Beauty, NykaaMan, and Nykaa Fashion covered from a single pipeline — normalised into a consistent schema with property-level tagging.
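The pre/during/post snapshots mentioned under Sale Event Monitoring amount to labelling each crawl timestamp against an event window. Here is a minimal sketch, assuming ISO 8601 timestamps with an explicit UTC offset (as in `scraped_at`) and an event window supplied by the client. `event_phase()` is a hypothetical helper, and no real sale dates are implied.

```python
from datetime import datetime


def _parse(ts: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def event_phase(scraped_at: str, event_start: str, event_end: str) -> str:
    """Label a snapshot as 'pre', 'during', or 'post' relative to an event window.

    All three timestamps are assumed to carry a timezone offset.
    """
    snapshot, start, end = _parse(scraped_at), _parse(event_start), _parse(event_end)
    if start > end:
        raise ValueError("event_start must not be after event_end")
    if snapshot < start:
        return "pre"
    if snapshot <= end:
        return "during"
    return "post"
```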

// engagement pipeline

From brand catalogue to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide brand lists, category URLs, keyword sets, or specific product IDs. We design the extraction schema — including which formulation fields matter most.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers with Indian residential proxies, shade-variant traversal logic, and ingredient text parsing.

Validation & QA
Days 4–6

Schema validation, ingredient null-rate checks, shade-count verification, and a quality check on sampled reviews before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
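The schema validation step in the pipeline above can be pictured as a per-record type check. The sketch below is illustrative only: the `EXPECTED_TYPES` mapping is inferred from the product listing sample on this page, and `validate_record()` is a hypothetical helper rather than the production validator.

```python
# Field types inferred from the product_listings sample (an assumption).
EXPECTED_TYPES = {
    "product_id": str,
    "title": str,
    "brand": str,
    "price": (int, float),
    "mrp": (int, float),
    "currency": str,
    "rating": (int, float),
    "review_count": int,
}


def validate_record(record: dict, schema: dict = EXPECTED_TYPES) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        elif (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
    return errors
```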

Under the hood

How our Nykaa pipeline handles the hard parts

Nykaa's beauty catalogue has unique complexity — shade matrices, ingredient text, and influencer review layers that most scrapers flatten or miss entirely.

pipeline-monitor · nykaa.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Shade matrix traversal
Every shade scraped — not just the default

Nykaa product pages load shade-specific price, stock status, and images via JavaScript interactions. Our Playwright sessions click through every shade option and record the resulting state — so you get a full SKU-level dataset, not just the parent product's default view.
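Once a browser session has recorded the page state after each shade click, the parent product and its per-shade states can be flattened into SKU-level rows. The sketch below covers only that flattening step, not the Playwright interaction itself. The input shape (a `shades` list with `shade_name`, `price`, `in_stock`, `image_url`) and the `sku_id` format are assumptions made for illustration.

```python
def flatten_shades(parent: dict, shades: list[dict]) -> list[dict]:
    """Produce one row per shade, carrying the parent identifiers on every row."""
    rows = []
    for index, shade in enumerate(shades, start=1):
        rows.append({
            "product_id": parent["product_id"],
            # Hypothetical SKU key: parent id plus a two-digit shade index.
            "sku_id": f"{parent['product_id']}-{index:02d}",
            "title": parent["title"],
            "shade_name": shade["shade_name"],
            # Missing values stay None instead of inheriting the default view.
            "price": shade.get("price"),
            "in_stock": shade.get("in_stock"),
            "image_url": shade.get("image_url"),
        })
    return rows
```

Out-of-stock shades are kept as rows with `in_stock` set to false, so the matrix stays complete.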

Ingredient parsing
Structured formulation data from raw ingredient text

Ingredient lists on Nykaa are unstructured strings. Our post-processing pipeline parses INCI names, identifies key actives, and normalises free-from claims into structured fields — ready for formulation analysis or regulatory compliance checks.
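To make the parsing step concrete, here is a deliberately small sketch of turning a raw ingredient string into an ordered list, flagging key actives, and normalising free-from claims. It is not the production INCI parser: the `KEY_ACTIVES` watch-list, the function names, and the splitting rule are all assumptions, and real ingredient text has many more edge cases.

```python
import re

# Illustrative watch-list only; a real list would be far larger.
KEY_ACTIVES = {"niacinamide", "hyaluronic acid", "tocopherol", "salicylic acid"}


def parse_ingredients(raw: str) -> dict:
    """Split a raw ingredient string into an ordered list and flag key actives."""
    text = re.sub(r"^\s*ingredients\s*:\s*", "", raw, flags=re.IGNORECASE)
    # Split on commas, but not on commas inside names such as "1,2-Hexanediol".
    parts = re.split(r",(?!\d)", text)
    ordered = [re.sub(r"\s+", " ", p).strip(" .") for p in parts]
    ordered = [p for p in ordered if p]
    return {
        "ingredient_list": ordered,
        "key_ingredients": [p for p in ordered if p.lower() in KEY_ACTIVES],
    }


def normalise_free_from(claims: str) -> list[str]:
    """Map 'Paraben-Free, Sulphate-Free' to ['paraben', 'sulphate']."""
    result = []
    for claim in claims.split(","):
        match = re.fullmatch(r"(.+?)[\s-]*free", claim.strip().lower())
        if match:
            result.append(match.group(1))
    return result
```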

JavaScript rendering
Full Playwright execution for dynamic content

Nykaa's product pages, shade swatches, and review sections are React-rendered. We run full Playwright sessions to capture lazy-loaded review content, dynamically injected pricing, and concern/skin-type filter tags that HTTP clients miss entirely.

Schema stability
Resilient selectors across Nykaa properties

Nykaa Beauty, NykaaMan, and Nykaa Fashion have different page structures. Our selector strategy uses multi-layer fallback chains per field and per property — so a layout change on one property doesn't break the others.
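A fallback chain of this kind can be sketched as an ordered list of selectors per property and per field, tried until one yields a value. Everything concrete below is hypothetical: the selectors are invented for illustration, and `query` stands in for whatever lookup the crawler uses (for example a wrapper around a Scrapy or Playwright selector call).

```python
from typing import Callable, Optional

# Invented selectors, keyed by property and then field (assumptions, not real markup).
FALLBACKS: dict[str, dict[str, list[str]]] = {
    "beauty": {"price": ["span.price-final", "[data-testid='price']", "meta[itemprop='price']"]},
    "man": {"price": ["span.price-final", "div.pdp-price"]},
    "fashion": {"price": ["div.pdp-price"]},
}


def extract_field(
    query: Callable[[str], Optional[str]],
    property_name: str,
    field: str,
    chains: dict[str, dict[str, list[str]]] = FALLBACKS,
) -> Optional[str]:
    """Try each selector for this property and field in order; return the first hit."""
    for selector in chains.get(property_name, {}).get(field, []):
        value = query(selector)
        if value:
            return value.strip()
    return None  # every selector missed; the caller can log this as a coverage gap
```

Because each property has its own chain, editing the `fashion` entry leaves `beauty` and `man` lookups untouched.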

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on ingredient null-rate spikes, shade-count drops, price outliers, and coverage gaps — and respond before you notice.
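Two of the alert conditions named above, ingredient null-rate spikes and shade-count drops, can be expressed as simple checks over a run's records. This is a sketch under stated assumptions: the 5% threshold, the function names, and the `previous_shade_counts` lookup are illustrative and not taken from the actual alerting rules.

```python
def null_rate(records: list[dict], field: str) -> float:
    """Share of records where the field is missing or empty."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, "", []))
    return missing / len(records)


def run_checks(
    records: list[dict],
    previous_shade_counts: dict[str, int],
    max_null_rate: float = 0.05,  # assumed threshold
) -> list[str]:
    """Return alert messages for this run; an empty list means nothing fired."""
    alerts = []
    rate = null_rate(records, "ingredient_list")
    if rate > max_null_rate:
        alerts.append(f"ingredient_list null rate {rate:.1%} exceeds {max_null_rate:.1%}")
    for record in records:
        before = previous_shade_counts.get(record.get("product_id"))
        now = record.get("shade_count")
        if before is not None and now is not None and now < before:
            alerts.append(f"{record['product_id']}: shade_count dropped {before} -> {now}")
    return alerts
```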

Applications

Who uses Nykaa data — and how

Teams across industries use nykaa.com data to build competitive products and smarter operations.

01
Beauty Brand Competitive Intelligence

Brands track competitor pricing, new launch velocity, shade range gaps, and review sentiment across categories — to inform product development and pricing strategy.

02
Ingredient & Formulation Research

R&D and regulatory teams extract ingredient lists and free-from claims at scale — to benchmark formulations, track trending actives, and support compliance audits.

03
Retail & Distribution Analytics

Distributors and retail buyers track which brands and SKUs are gaining shelf velocity on Nykaa — using review count growth and discount patterns as demand proxies.

04
AI Personalisation Models

ML teams train skin-type and concern-based recommendation engines using Nykaa's uniquely rich review metadata — skin tone, skin type, and shade reviewed per review.

05
Market Entry Research

International beauty brands use Nykaa data to identify whitespace in India's beauty market — by category, price tier, and ingredient positioning — before committing to distribution.

06
Investor & Analyst Due Diligence

Analysts track brand catalogue growth, review velocity, and indie-brand penetration on Nykaa as leading indicators for India's beauty market trajectory.

Why DataFlirt

"Nykaa's beauty catalogue is the richest source of formulation, shade, and consumer sentiment data in Indian eCommerce — but almost none of it is structured out of the box."

Extracting real value from Nykaa requires shade-level traversal, ingredient text parsing, skin-type taxonomy normalisation, and per-review metadata extraction. Most scraping tools stop at price and title. DataFlirt delivers a complete, formulation-aware Nykaa dataset — structured and ready for analysis.

Technical Spec

Nykaa scraper — technical capabilities

Everything supported by our nykaa.com scraper: rendered SPA elements, CAPTCHA handling, proxy rotation, and partial support for authenticated sessions.

JavaScript rendering
Full Playwright sessions — required for shade swatches, pricing, and React-rendered content
Supported
Shade / SKU traversal
Playwright clicks through every shade option to capture per-SKU price, stock, and image
Supported
Ingredient text parsing
Post-processing pipeline normalises raw INCI strings into structured ingredient fields
Supported
CAPTCHA bypass
Automated 2Captcha + CapSolver integration with fallback to manual queue
Supported
Residential proxy rotation
Indian residential ISP IPs — rotated per request with sticky sessions where required
Supported
Multi-property coverage
Nykaa Beauty, NykaaMan, and Nykaa Fashion covered via unified pipeline
Supported
Sale event monitoring
Elevated cadence during Pink Friday, End of Season Sale, and flash events
Supported
Review pagination
Full review corpus with skin type, skin tone, and shade metadata per review
Supported
Brand catalogue scraping
All active products per brand, with brand-level metadata extraction
Supported
Sponsored ad detection
Distinguishes organic vs sponsored placements in category and search pages
Supported
Change detection (diffs)
Hash-based diff: only emit records with changed fields since last run
Supported
Authenticated account data
Wishlist, purchase history, and loyalty rewards require authenticated sessions
Partial
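The hash-based diff listed above can be sketched as follows: hash a canonical serialisation of each record, compare it with the hash stored from the previous run, and emit only records whose hash differs. The choice of SHA-256, the decision to ignore `scraped_at`, and the function names are assumptions for illustration.

```python
import hashlib
import json


def record_hash(record: dict, ignore: tuple[str, ...] = ("scraped_at",)) -> str:
    """Stable hash of a record, excluding fields that change on every crawl."""
    payload = {k: v for k, v in record.items() if k not in ignore}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(
    records: list[dict],
    previous_hashes: dict[str, str],
    key: str = "product_id",
) -> tuple[list[dict], dict[str, str]]:
    """Return (records that are new or changed, hashes to store for the next run)."""
    changed = []
    current_hashes = {}
    for record in records:
        digest = record_hash(record)
        current_hashes[record[key]] = digest
        if previous_hashes.get(record[key]) != digest:
            changed.append(record)
    return changed, current_hashes
```

Sorting keys before hashing means field order in the source record does not trigger a false diff.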
Infrastructure

Infrastructure powering the Nykaa pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies (IN), INCI Parser, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright drives shade traversal, JavaScript rendering, and cookie sessions. A custom INCI parser normalises ingredient strings into structured fields post-extraction.

Indian Residential Proxy Infrastructure

We maintain pools of Indian residential ISP proxies. Rotation happens per-request with sticky sessions for shade traversal flows. IP score monitoring prevents blacklisted pool contamination.
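Per-request rotation with sticky sessions can be pictured as a round-robin pool that pins a proxy to a session for as long as a multi-step flow needs it. The class below is a simplified sketch: `ProxyPool`, its method names, and the session identifiers are hypothetical, and it leaves out the IP score monitoring mentioned above.

```python
import itertools
from typing import Optional


class ProxyPool:
    """Round-robin proxy rotation with optional sticky sessions."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky: dict[str, str] = {}

    def next_proxy(self, session_id: Optional[str] = None) -> str:
        """Rotate per request, unless a session_id pins the caller to one proxy."""
        if session_id is None:
            return next(self._cycle)
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def release(self, session_id: str) -> None:
        """Drop the pin once a multi-step flow (e.g. shade traversal) has finished."""
        self._sticky.pop(session_id, None)
```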

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
Parquet
Columnar format for BigQuery, Snowflake, Athena
S3
Direct bucket delivery — compatible with any data lake
BigQuery
Streamed directly into your dataset with schema auto-detect
Webhook
HTTP POST per record for real-time downstream processing
Postgres
Upsert into your existing schema with conflict resolution
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
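For the two simplest formats above, newline-delimited JSON and flat CSV, serialisation needs nothing beyond the Python standard library. The sketch below is illustrative; `to_ndjson()` and `to_csv()` are hypothetical helper names, and Parquet or warehouse loading would need additional libraries not shown here.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    """One JSON object per line, which most warehouses can load directly."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Flat CSV with a fixed column order; fields not listed in columns are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing columns are written as empty strings
    return buffer.getvalue()
```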
// faq

Common questions.

About nykaa.com scraping, legality, and pipeline operations.

Is scraping Nykaa legal?

Scraping publicly available information from Nykaa is generally permissible under applicable law in India. US precedents such as hiQ v. LinkedIn, while not binding on Indian courts, point in the same direction. DataFlirt targets only public, non-authenticated product, pricing, ingredient, and review data. We do not extract personal data or circumvent authentication walls. We recommend clients review Nykaa's ToS independently and consult legal counsel for specific use cases.

Can you extract ingredient lists in a structured format?

Yes. Our pipeline includes a post-processing INCI parser that normalises raw ingredient text into structured fields: ordered ingredient list, identified key actives, detected preservatives, and free-from claim extraction. Output is a clean array per product — not a raw string.

Do you capture every shade variant — including out-of-stock ones?

Yes. Our Playwright sessions traverse every shade option on a product page and record price, stock status, and shade-specific image URL for each — including shades that are currently out of stock. This gives you a complete shade matrix rather than just the in-stock default.

Can you track Nykaa sale events like Pink Friday?

Yes. We run elevated-frequency crawls during Nykaa Pink Friday, End of Season Sale, Nykaa Birthday, and flash sale events — capturing price, discount depth, and stock signals at the SKU level with pre/during/post event snapshots.

Do you cover NykaaMan and Nykaa Fashion as well?

Yes. Our pipeline covers Nykaa Beauty, NykaaMan, and Nykaa Fashion from a unified architecture — delivered via a single normalised schema with a property-level tag per record so you can filter by vertical downstream.

Can I request a sample dataset before committing?

Yes. We provide a sample run of up to 500 products — including full ingredient, shade, and review data — as part of pre-engagement scoping, so you can validate schema fit and data quality before signing any contract.

$ dataflirt scope --new-project --source=nykaa.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a full brand catalogue with formulation data, a continuous price-monitoring feed, or a shade-level SKU matrix — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h