
Unacademy data,
at warehouse scale.

We extract educator metrics, batch schedules, course syllabi, pricing, and learner reviews from Unacademy and deliver them as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your schedule.

Courses extracted: 84.2K/month
Educator profiles: 14.5K/run
Batch schedules: 218K/24h
Active pipelines: 42
Uptime: 99.98%
Data Dictionary

Every field we extract from unacademy.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Educator Profile objects on unacademy.com. All fields are typed and schema-versioned.

educator_id, name, bio, followers_count, watch_minutes_30d, watch_minutes_lifetime, badges_earned, courses_count, rating, profile_url
Sample record (educator_profiles):
{
  "educator_id": "EDU-98421",
  "name": "Mrunal Patel",
  "followers_count": 892401,
  "watch_minutes_30d": 4500000,
  "watch_minutes_lifetime": 128000000,
  "courses_count": 42,
  "rating": 4.9,
  "badges_earned": ["Legend", "Top Educator"]
}
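Raw values captured from a page or API response do not always arrive in their declared type. A list field such as badges_earned, for example, can surface as the string form of a Python list. The sketch below shows one way to coerce a single educator record into typed fields. It assumes the field types implied by the sample above, and `parse_list` and `coerce_educator` are hypothetical helper names.

```python
import ast
import json
from typing import Any


def parse_list(value: Any) -> list[str]:
    """Return a list of strings from a real list, a JSON array string, or a Python-literal list string."""
    if isinstance(value, list):
        return [str(v) for v in value]
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        try:
            parsed = json.loads(text)  # handles '["Legend", "Top Educator"]'
        except json.JSONDecodeError:
            parsed = ast.literal_eval(text)  # safely handles "['Legend', 'Top Educator']"
        if isinstance(parsed, list):
            return [str(v) for v in parsed]
    raise ValueError(f"cannot interpret {value!r} as a list")


def coerce_educator(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: cast one raw educator record to the typed schema."""
    return {
        "educator_id": str(raw["educator_id"]),
        "name": str(raw["name"]),
        "followers_count": int(raw["followers_count"]),
        "watch_minutes_30d": int(raw["watch_minutes_30d"]),
        "watch_minutes_lifetime": int(raw["watch_minutes_lifetime"]),
        "courses_count": int(raw["courses_count"]),
        "rating": float(raw["rating"]),
        "badges_earned": parse_list(raw.get("badges_earned", [])),
    }
```

The design tries strict JSON first and falls back to `ast.literal_eval`. That function only evaluates literals, so it never executes arbitrary code.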

Complete list of extractable fields for Course & Batch objects on unacademy.com. All fields are typed and schema-versioned.

batch_id, title, target_exam, language, start_date, end_date, educator_ids, syllabus_modules, price_tier, status
Sample record (courses_and_batches):
{
  "batch_id": "BCH-44192",
  "title": "Comprehensive Batch for UPSC CSE 2025",
  "target_exam": "UPSC CSE",
  "language": "Hinglish",
  "start_date": "2024-06-15",
  "end_date": "2025-05-20",
  "price_tier": "Plus",
  "status": "Active"
}

Complete list of extractable fields for Live Class objects on unacademy.com. All fields are typed and schema-versioned.

class_id, title, educator_id, start_time, duration_minutes, topic, exam_category, is_free, status
Sample record (live_classes):
{
  "class_id": "LC-77310",
  "title": "Indian Economy: Monetary Policy Review",
  "educator_id": "EDU-98421",
  "start_time": "2024-10-12T18:00:00Z",
  "duration_minutes": 120,
  "topic": "Economy",
  "is_free": true,
  "status": "Scheduled"
}
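The sample start_time ends in Z, which means it is expressed in UTC. Consumers who schedule around India Standard Time need only a fixed +05:30 offset, because IST observes no daylight saving. The sketch below assumes start_time is always an ISO 8601 UTC timestamp like the sample's. `to_ist` and `end_time_ist` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed UTC+05:30 offset with no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def to_ist(start_time: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2024-10-12T18:00:00Z' and convert it to IST."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11, so normalise it first.
    utc = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    return utc.astimezone(IST)


def end_time_ist(start_time: str, duration_minutes: int) -> datetime:
    """Scheduled end of a class in IST, derived from start_time and duration_minutes."""
    return to_ist(start_time) + timedelta(minutes=duration_minutes)
```

For the sample record, `to_ist("2024-10-12T18:00:00Z")` gives 23:30 IST on 12 October 2024. The 120-minute class therefore ends at 01:30 IST the next day.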

Complete list of extractable fields for Test Series objects on unacademy.com. All fields are typed and schema-versioned.

test_id, title, exam_category, total_tests, enrolled_count, price_tier, rating, start_date, schedule_url
Sample record (test_series):
{
  "test_id": "TS-1104",
  "title": "NEET UG 2025 All India Mock Test Series",
  "exam_category": "NEET UG",
  "total_tests": 15,
  "enrolled_count": 45192,
  "price_tier": "Lite",
  "rating": 4.7,
  "start_date": "2024-08-01"
}

Complete list of extractable fields for Pricing & Subscription objects on unacademy.com. All fields are typed and schema-versioned.

plan_id, exam_category, duration_months, tier_name, original_price, discounted_price, discount_pct, features_included, currency
Sample record (pricing_and_subscriptions):
{
  "plan_id": "SUB-UPSC-12M-ICONIC",
  "exam_category": "UPSC CSE",
  "duration_months": 12,
  "tier_name": "Iconic",
  "original_price": 119999.0,
  "discounted_price": 89999.0,
  "discount_pct": 25,
  "currency": "INR"
}
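The discount_pct in the pricing sample agrees with the two price fields: (119999 - 89999) / 119999 is about 25.0%. The sketch below derives the percentage and cross-checks a stored record against it. It assumes rounding half-up to a whole number, which is an illustrative choice rather than a documented rule. `discount_pct` and `check_plan` are hypothetical helpers.

```python
from decimal import ROUND_HALF_UP, Decimal


def discount_pct(original_price: float, discounted_price: float) -> int:
    """Percentage saved versus original_price, rounded half-up to a whole number (assumed rule)."""
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    if not 0 <= discounted_price <= original_price:
        raise ValueError("discounted_price must lie between 0 and original_price")
    # Decimal avoids binary floating-point surprises when a value lands exactly on .5.
    original = Decimal(str(original_price))
    discounted = Decimal(str(discounted_price))
    pct = (original - discounted) / original * 100
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def check_plan(plan: dict) -> bool:
    """True when a record's stored discount_pct matches its price fields."""
    return discount_pct(plan["original_price"], plan["discounted_price"]) == plan["discount_pct"]
```

A check like this can run during QA to flag records where a promotion changed one price field but not the other.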

Capabilities

Extract the entire EdTech taxonomy

Our Unacademy scraper navigates complex React state, GraphQL endpoints, and infinite scrolling to extract structured educator profiles, batch schedules, and pricing data.

Educator Metrics Extraction

Track follower counts, 30-day and lifetime watch minutes, and educator badges to identify rising talent and gauge platform engagement.

Course & Batch Details

Extract complete syllabi, module breakdowns, start/end dates, target exams, and language mediums for all active and upcoming batches.

Live Class Timings

Monitor schedules for free Special Classes and paid Plus/Iconic sessions, including educator mapping and topic categorisation.

Test Series Metadata

Capture mock test schedules, enrollment counts, and difficulty levels across UPSC, JEE, NEET, and state board categories.

Pricing & Subscription Tiers

Track dynamic pricing, discount percentages, and feature differences across Plus, Iconic, and Lite subscription tiers.

Category Taxonomy Mapping

Navigate Unacademy's deep category tree to map courses and educators to specific exams and sub-topics.

Learner Reviews & Ratings

Extract aggregated star ratings, written feedback, and upvotes on educator profiles and completed courses.

Educator Activity Tracking

Monitor class frequency, new course launches, and schedule adherence for specific educators over time.

Scheduled Change Detection

Run daily or weekly pipelines to capture diffs in pricing, new batch announcements, and watch minute growth.
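The Category Taxonomy Mapping capability above comes down to walking a category tree and recording the path to each leaf, so every course or educator can be tagged with exam and sub-topic. The sketch below assumes a hypothetical nested shape in which each node has a `name` and an optional `children` list. This is not Unacademy's actual payload format, and the example category names are illustrative.

```python
from collections.abc import Iterator
from typing import Any


def category_paths(node: dict[str, Any], prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the full root-to-leaf path for every leaf in a nested category tree."""
    path = prefix + (node["name"],)
    children = node.get("children") or []
    if not children:
        yield path
        return
    for child in children:
        yield from category_paths(child, path)


def leaf_index(root: dict[str, Any]) -> dict[str, tuple[str, ...]]:
    """Map each leaf name to its path; if two leaves share a name, the last one wins."""
    return {path[-1]: path for path in category_paths(root)}


# Illustrative tree, not real Unacademy data.
tree = {
    "name": "UPSC CSE",
    "children": [
        {"name": "GS Paper III", "children": [{"name": "Economy"}, {"name": "Environment"}]},
        {"name": "CSAT"},
    ],
}
```

With this tree, a live class tagged with topic "Economy" resolves to the path UPSC CSE > GS Paper III > Economy.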

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Provide target exams (e.g., UPSC, NEET), educator lists, or specific batches. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Playwright crawlers, GraphQL API interception, and proxy rotation to cope with Unacademy's client-side rendering and rate limits.

Validation & QA (days 4–6)

We run schema validation, null-rate checks, and data type enforcement before full launch.

Delivery (ongoing)

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on the agreed cadence.
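The Validation & QA step above combines schema validation, null-rate checks, and type enforcement. It can be sketched as a per-field report over a batch of records. The expected-schema mapping and the 5% null-rate threshold below are illustrative assumptions, not documented DataFlirt settings. `validate` is a hypothetical name.

```python
from typing import Any

# Hypothetical expected types for a few live_classes fields.
SCHEMA: dict[str, type] = {
    "class_id": str,
    "duration_minutes": int,
    "is_free": bool,
    "status": str,
}


def validate(records: list[dict[str, Any]], schema: dict[str, type], max_null_rate: float = 0.05) -> list[str]:
    """Return human-readable problems: wrong types per record and null rates above the threshold."""
    if not records:
        return ["no records"]
    problems: list[str] = []
    for field, expected in schema.items():
        nulls = 0
        for i, rec in enumerate(records):
            value = rec.get(field)  # a missing key counts as null
            if value is None:
                nulls += 1
                continue
            # bool is a subclass of int, so reject booleans where an int is expected.
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                problems.append(f"record {i}: {field} expected {expected.__name__}, got {type(value).__name__}")
        rate = nulls / len(records)
        if rate > max_null_rate:
            problems.append(f"{field}: null rate {rate:.1%} exceeds {max_null_rate:.1%}")
    return problems
```

An empty list means the batch passed. Anything else can block delivery or raise an alert.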

Under the hood

How our Unacademy pipeline handles the hard parts

Unacademy relies on heavy client-side rendering and dynamic API endpoints. Here is how we maintain stable extraction.

SPA rendering
Playwright for Next.js hydration

Unacademy is a modern single-page application (SPA). We use Playwright to execute JavaScript, wait for React to hydrate the page, and trigger lazy-loaded components before parsing the DOM.

API interception
Capturing raw JSON from GraphQL endpoints

Rather than scraping fragile HTML, our pipeline intercepts Unacademy's internal GraphQL and REST API responses during page load, extracting clean, structured JSON payloads directly.
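The GraphQL specification defines a response as a JSON object with a `data` key, plus an `errors` list when something failed. Once a response body has been captured, for example from a Playwright `page.on("response", ...)` listener, turning it into flat records is plain JSON handling. The sketch below assumes a hypothetical payload in which items sit under `data.<connection>.edges[].node`. That is the common Relay-style connection shape, not a confirmed Unacademy schema. `extract_nodes` and `GraphQLError` are hypothetical names.

```python
import json
from typing import Any


class GraphQLError(RuntimeError):
    """Raised when a captured response has an errors list and no usable data."""


def extract_nodes(body: str, *path: str) -> list[dict[str, Any]]:
    """Parse a captured GraphQL response body and return the nodes of the connection at `path`."""
    payload = json.loads(body)
    errors = payload.get("errors") or []
    data = payload.get("data")
    if data is None:
        messages = "; ".join(str(e.get("message", e)) for e in errors if isinstance(e, dict))
        raise GraphQLError(messages or "empty response")
    # Walk down to the connection object, e.g. ("educators",) -> data["educators"].
    node: Any = data
    for key in path:
        if not isinstance(node, dict) or node.get(key) is None:
            return []  # field absent or null: nothing to extract
        node = node[key]
    edges = node.get("edges", []) if isinstance(node, dict) else []
    return [edge["node"] for edge in edges if isinstance(edge, dict) and edge.get("node") is not None]
```

Raising on error-only responses, rather than returning an empty list, keeps rate-limit or schema failures from silently masquerading as "no data".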

Infinite scroll
Handling paginated batch and educator lists

Educator lists and course catalogues load dynamically via infinite scroll. We simulate user scrolling and capture the XHR responses it triggers to ensure complete coverage of each category.
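Infinite scroll is typically backed by cursor-based pagination: each XHR returns a page of items plus a cursor for the next page. The sketch below drains such a feed with deduplication. `fetch_page` is a hypothetical stand-in for whatever yields one captured page; in a live pipeline that would be the intercepted response, not a direct call. The `batch_id` key and the page cap are illustrative assumptions.

```python
from collections.abc import Callable, Iterator
from typing import Any, Optional

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict[str, Any]], Optional[str]]


def drain(
    fetch_page: Callable[[Optional[str]], Page],
    key: str = "batch_id",
    max_pages: int = 1000,
) -> Iterator[dict[str, Any]]:
    """Follow cursors until exhausted, yielding each item once (deduplicated by `key`)."""
    seen: set[Any] = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop in case the feed loops forever
        items, next_cursor = fetch_page(cursor)
        for item in items:
            # Pages can overlap when new items are inserted while scrolling.
            if item[key] not in seen:
                seen.add(item[key])
                yield item
        if next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor
```

Deduplicating on a stable ID matters here because a batch announced mid-crawl shifts every later item down, so the next page repeats the last item of the previous one.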

Change detection
Diffing watch minutes and follower counts

For tracking educator performance over time, we maintain state across pipeline runs. You receive a clean changelog of watch minute growth and follower acquisition rather than full daily re-dumps.
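Delivering a changelog instead of full re-dumps means comparing each run against stored state. The sketch below works over two snapshots keyed by educator_id. The tracked fields and the shape of each change row are assumptions for illustration. `diff_snapshots` is a hypothetical name.

```python
from typing import Any

Snapshot = dict[str, dict[str, Any]]
TRACKED = ("followers_count", "watch_minutes_30d", "watch_minutes_lifetime")


def diff_snapshots(previous: Snapshot, current: Snapshot, fields: tuple[str, ...] = TRACKED) -> list[dict[str, Any]]:
    """Compare two {educator_id: record} snapshots and emit one change row per changed field."""
    changes: list[dict[str, Any]] = []
    for educator_id, record in current.items():
        before = previous.get(educator_id)
        if before is None:
            changes.append({"educator_id": educator_id, "change": "added"})
            continue
        for field in fields:
            old, new = before.get(field), record.get(field)
            if old != new:
                row: dict[str, Any] = {"educator_id": educator_id, "change": "updated",
                                       "field": field, "old": old, "new": new}
                if isinstance(old, (int, float)) and isinstance(new, (int, float)):
                    row["delta"] = new - old  # numeric growth, e.g. new followers since last run
                changes.append(row)
    # Sorted so the output order is deterministic across runs.
    for educator_id in sorted(previous.keys() - current.keys()):
        changes.append({"educator_id": educator_id, "change": "removed"})
    return changes
```

Only the previous snapshot needs to be persisted between runs. The changelog itself can be appended to a time-series table.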

Anti-bot layer
Residential proxies to bypass rate limits

Unacademy rate-limits aggressive IP addresses. We distribute requests across a pool of Indian residential proxies, matching the geographic origin expected by the platform's load balancers.

Applications

Who uses Unacademy data — and how

Teams across industries use unacademy.com data to build competitive products and smarter operations.

01
EdTech Competitor Intelligence

Rival platforms track Unacademy's pricing, discount strategies, and new batch launches to inform their own product roadmaps.

02
Educator Talent Acquisition

EdTech recruiters monitor watch minutes, follower growth, and engagement metrics to identify and poach top-performing educators.

03
Market Expansion Analysis

Strategy teams analyse course volumes and educator density across categories (e.g., UPSC vs State PSC) to identify underserved exam markets.

04
Pricing Strategy

Analysts monitor subscription tier pricing, promotional periods, and duration discounts to understand EdTech monetisation trends.

05
Content Gap Analysis

Curriculum designers parse syllabi and batch schedules to find missing topics or emerging subjects in the test prep space.

06
Investment Due Diligence

Private equity firms and analysts track active batch volumes, educator retention, and pricing stability to evaluate platform health.

Why DataFlirt

"Unacademy hosts the most comprehensive dataset of Indian test prep activity, educator performance, and learner engagement — accessible only if you build the extraction infrastructure."

Extracting data from modern EdTech platforms requires handling complex state management, dynamic API payloads, and aggressive rate limiting. DataFlirt manages the proxies, browser sessions, and schema maintenance so your engineering team can focus on deriving insights from educator metrics and course catalogues.

Technical Spec

Unacademy scraper — technical capabilities

Everything supported by our unacademy.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering
Full Playwright sessions to handle Next.js hydration and dynamic routing
Supported
GraphQL API interception
Direct extraction of structured JSON from internal API responses
Supported
Residential proxy rotation
ISP-grade residential IPs from India to prevent rate limiting
Supported
Change detection (diffs)
Track daily changes in watch minutes, followers, and pricing
Supported
Educator metrics tracking
Capture lifetime and 30-day watch minutes, badges, and follower counts
Supported
Course syllabus extraction
Deep extraction of module breakdowns and class schedules
Supported
Subscription pricing history
Track Plus, Iconic, and Lite tier pricing across all exam categories
Supported
Paid video content download
Extraction of DRM-protected video files or live streams
Not supported
User-specific mock test results
Requires authenticated learner credentials and contains PII
Not supported
Internal discussion forums
Gated behind active paid subscription walls
Not supported
Infrastructure

Infrastructure powering the Unacademy pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, infinite scrolling, and API interception.
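The retry logic mentioned above can be illustrated outside Scrapy as capped exponential backoff with jitter. This is a standalone stdlib sketch, not Scrapy's own RetryMiddleware. `TransientError`, the attempt count, and the delays are illustrative assumptions. `sleep` is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/5xx)."""


def with_retries(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying TransientError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter spreads retries out over time
    raise AssertionError("unreachable")
```

Full jitter, meaning a random wait between zero and the backoff ceiling, keeps many workers that failed together from retrying in lockstep.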

API Interception & GraphQL

We bypass fragile DOM parsing by intercepting Unacademy's internal GraphQL and REST responses during page load, yielding cleaner, more reliable data.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Excel format for business analyst workflows
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoint to query your extracted datasets
BigQuery
Streamed directly into your dataset with schema auto-detect
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
PostgreSQL
Upsert into your existing schema with conflict resolution
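The PostgreSQL option upserts with conflict resolution, which in SQL is `INSERT ... ON CONFLICT ... DO UPDATE`. To keep this sketch self-contained, it uses Python's built-in sqlite3. SQLite (3.24 and later) accepts the same clause and the same `excluded` pseudo-table. The table and column names are hypothetical. With a PostgreSQL driver such as psycopg, only the placeholder style would differ.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS educator_profiles (
    educator_id     TEXT PRIMARY KEY,
    name            TEXT NOT NULL,
    followers_count INTEGER,
    rating          REAL
)
"""

# On a duplicate educator_id, overwrite the tracked columns with the incoming values.
UPSERT = """
INSERT INTO educator_profiles (educator_id, name, followers_count, rating)
VALUES (:educator_id, :name, :followers_count, :rating)
ON CONFLICT (educator_id) DO UPDATE SET
    name = excluded.name,
    followers_count = excluded.followers_count,
    rating = excluded.rating
"""


def upsert(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Insert new educators and update existing ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, records)
```

Keying the conflict on the source's stable ID makes repeated deliveries idempotent: re-running a batch updates rows instead of duplicating them.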
// faq

Common questions.

About unacademy.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Unacademy legal?

Scraping publicly available information from Unacademy is generally permissible. DataFlirt targets only public, non-authenticated educator profiles, course syllabi, and pricing data. We do not extract personal learner data, circumvent DRM on video content, or violate copyright law.

How do you handle Unacademy's dynamic React interface?

We use Playwright to execute full browser sessions, allowing React to hydrate the DOM. More importantly, we intercept the underlying GraphQL and REST API calls made by the frontend, extracting the raw JSON payloads directly for higher reliability.

Can you track educator watch minutes over time?

Yes. We can configure daily or weekly pipeline runs to capture watch_minutes_30d, watch_minutes_lifetime, and followers_count. We persist state between runs and deliver a time-series dataset showing growth in each metric.

Do you extract actual video content or class recordings?

No. We extract metadata about the classes (titles, educators, durations, schedules, topics) but we do not download, store, or distribute DRM-protected video content or live streams.

What is the minimum viable engagement?

Our minimum engagement typically starts with a defined set of exam categories (e.g., UPSC, NEET, JEE) or a specific list of educators, with weekly delivery. Contact us with your specific scope for pricing.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 100 educator profiles or 50 course batches as part of the pre-engagement scoping process, allowing you to validate the schema and data quality.

How fresh is the schedule data for live classes?

For active tracking pipelines, we can configure hourly or daily runs to capture new class announcements, schedule changes, and live status updates with minimal latency.


Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a one-off dump of educator profiles or a continuous feed of batch schedules and pricing, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h