Unacademy Scraper — Educator, Course & Batch Data Extraction

Data Dictionary

Every field we extract from unacademy.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Educator Profiles objects from unacademy.com. All fields typed and schema-versioned.

Fields: educator_id, name, bio, followers_count, watch_minutes_30d, watch_minutes_lifetime, badges_earned, courses_count, rating, profile_url

{
  "educator_id": "EDU-98421",
  "name": "Mrunal Patel",
  "followers_count": 892401,
  "watch_minutes_30d": 4500000,
  "watch_minutes_lifetime": 128000000,
  "courses_count": 42,
  "rating": 4.9,
  "badges_earned": ["Legend", "Top Educator"]
}

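The data dictionary says every field is typed. The Python dataclass below is a minimal sketch of what that can mean for a consumer. It loads the educator sample into typed fields and rejects a badge list that arrives as a string. The class name `EducatorProfile` and its coercion rules are illustrative assumptions, not DataFlirt's published schema code. `bio` and `profile_url` are optional here only because the sample omits them.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EducatorProfile:
    """Typed Educator Profile record (hypothetical class; field names from the data dictionary)."""

    educator_id: str
    name: str
    followers_count: int
    watch_minutes_30d: int
    watch_minutes_lifetime: int
    courses_count: int
    rating: float
    badges_earned: list[str] = field(default_factory=list)
    bio: Optional[str] = None
    profile_url: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "EducatorProfile":
        badges = raw.get("badges_earned") or []
        if isinstance(badges, str):
            # Reject stringified lists such as "['Legend']" instead of
            # silently splitting them into single characters.
            raise TypeError("badges_earned must be a JSON array, not a string")
        return cls(
            educator_id=str(raw["educator_id"]),
            name=str(raw["name"]),
            # Explicit coercion: "892401" and 892401 both load as int.
            followers_count=int(raw["followers_count"]),
            watch_minutes_30d=int(raw["watch_minutes_30d"]),
            watch_minutes_lifetime=int(raw["watch_minutes_lifetime"]),
            courses_count=int(raw["courses_count"]),
            rating=float(raw["rating"]),
            badges_earned=[str(b) for b in badges],
            bio=raw.get("bio"),
            profile_url=raw.get("profile_url"),
        )


sample = json.loads("""
{
  "educator_id": "EDU-98421",
  "name": "Mrunal Patel",
  "followers_count": 892401,
  "watch_minutes_30d": 4500000,
  "watch_minutes_lifetime": 128000000,
  "courses_count": 42,
  "rating": 4.9,
  "badges_earned": ["Legend", "Top Educator"]
}
""")
profile = EducatorProfile.from_dict(sample)
print(profile.followers_count, profile.badges_earned)  # 892401 ['Legend', 'Top Educator']
```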

Complete list of extractable fields for Courses & Batches objects from unacademy.com. All fields typed and schema-versioned.

Fields: batch_id, title, target_exam, language, start_date, end_date, educator_ids, syllabus_modules, price_tier, status

{
  "batch_id": "BCH-44192",
  "title": "Comprehensive Batch for UPSC CSE 2025",
  "target_exam": "UPSC CSE",
  "language": "Hinglish",
  "start_date": "2024-06-15",
  "end_date": "2025-05-20",
  "price_tier": "Plus",
  "status": "Active"
}


Complete list of extractable fields for Live Classes objects from unacademy.com. All fields typed and schema-versioned.

Fields: class_id, title, educator_id, start_time, duration_minutes, topic, exam_category, is_free, status

{
  "class_id": "LC-77310",
  "title": "Indian Economy: Monetary Policy Review",
  "educator_id": "EDU-98421",
  "start_time": "2024-10-12T18:00:00Z",
  "duration_minutes": 120,
  "topic": "Economy",
  "is_free": true,
  "status": "Scheduled"
}

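Live class records carry `start_time` in UTC together with `duration_minutes`, so a session's end time is a derived value. The sketch below assumes `start_time` is always ISO 8601 with a trailing `Z`. The `class_window` helper and the conversion to India Standard Time (UTC+05:30) are illustrative additions and are not fields in the schema.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time (UTC+05:30), used here only for display.
IST = timezone(timedelta(hours=5, minutes=30))


def class_window(start_time: str, duration_minutes: int) -> tuple[datetime, datetime]:
    """Return timezone-aware (start, end) datetimes for a live class record."""
    # Normalise the trailing "Z" so parsing also works before Python 3.11.
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    return start, start + timedelta(minutes=duration_minutes)


start, end = class_window("2024-10-12T18:00:00Z", 120)
print(end.isoformat())                    # 2024-10-12T20:00:00+00:00
print(start.astimezone(IST).isoformat())  # 2024-10-12T23:30:00+05:30
```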

Complete list of extractable fields for Test Series objects from unacademy.com. All fields typed and schema-versioned.

Fields: test_id, title, exam_category, total_tests, enrolled_count, price_tier, rating, start_date, schedule_url

{
  "test_id": "TS-1104",
  "title": "NEET UG 2025 All India Mock Test Series",
  "exam_category": "NEET UG",
  "total_tests": 15,
  "enrolled_count": 45192,
  "price_tier": "Lite",
  "rating": 4.7,
  "start_date": "2024-08-01"
}


Complete list of extractable fields for Pricing & Subscriptions objects from unacademy.com. All fields typed and schema-versioned.

Fields: plan_id, exam_category, duration_months, tier_name, original_price, discounted_price, discount_pct, features_included, currency

{
  "plan_id": "SUB-UPSC-12M-ICONIC",
  "exam_category": "UPSC CSE",
  "duration_months": 12,
  "tier_name": "Iconic",
  "original_price": 119999.0,
  "discounted_price": 89999.0,
  "discount_pct": 25,
  "currency": "INR"
}

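In the pricing sample, `discount_pct` agrees with the two prices: (119999 - 89999) / 119999 ≈ 25.0%. The sketch below recomputes the value as a QA check. The rounding rule (nearest whole percent) and the one-point tolerance are assumptions inferred from this single sample, not a documented rule.

```python
def discount_pct(original_price: float, discounted_price: float) -> int:
    """Percentage off the list price, rounded to the nearest whole percent (assumed rule)."""
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    return round(100 * (original_price - discounted_price) / original_price)


def discount_is_consistent(plan: dict, tolerance: int = 1) -> bool:
    """Flag pricing records whose stored discount_pct disagrees with the two prices."""
    expected = discount_pct(plan["original_price"], plan["discounted_price"])
    return abs(expected - plan["discount_pct"]) <= tolerance


plan = {"plan_id": "SUB-UPSC-12M-ICONIC", "original_price": 119999.0,
        "discounted_price": 89999.0, "discount_pct": 25}
print(discount_pct(plan["original_price"], plan["discounted_price"]))  # 25
print(discount_is_consistent(plan))  # True
```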

Capabilities

Extract the entire EdTech taxonomy

Our Unacademy scraper navigates complex React state, GraphQL endpoints, and infinite scrolling to extract structured educator profiles, batch schedules, and pricing data.

Educator Metrics Extraction

Track follower counts, 30-day watch minutes, lifetime watch minutes, and educator badges to identify rising talent and platform engagement.

Course & Batch Details

Extract complete syllabi, module breakdowns, start/end dates, target exams, and language mediums for all active and upcoming batches.

Live Class Timings

Monitor schedules for free Special Classes and paid Plus/Iconic sessions, including educator mapping and topic categorisation.

Test Series Metadata

Capture mock test schedules, enrollment counts, and difficulty levels across UPSC, JEE, NEET, and state board categories.

Pricing & Subscription Tiers

Track dynamic pricing, discount percentages, and feature differences across Plus, Iconic, and Lite subscription tiers.

Category Taxonomy Mapping

Navigate Unacademy's deep category tree to map courses and educators to specific exams and sub-topics.

Learner Reviews & Ratings

Extract aggregated star ratings, written feedback, and upvotes on educator profiles and completed courses.

Educator Activity Tracking

Monitor class frequency, new course launches, and schedule adherence for specific educators over time.

Scheduled Change Detection

Run daily or weekly pipelines to capture diffs in pricing, new batch announcements, and watch minute growth.

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target exams (e.g., UPSC, NEET), educator lists, or specific batches. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Playwright crawlers, GraphQL API interception, and proxy rotation to handle Unacademy's client-side rendering and rate limits.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and data type enforcement before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
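The Validation & QA step lists schema validation, null-rate checks, and data type enforcement. The sketch below runs those three checks over Test Series records. The `TEST_SERIES_SCHEMA` mapping, the 5% null-rate threshold, and the second record are hypothetical.

```python
from typing import Any

# Hypothetical expected types for Test Series records (field names from the data dictionary).
TEST_SERIES_SCHEMA: dict[str, tuple[type, ...]] = {
    "test_id": (str,), "title": (str,), "exam_category": (str,),
    "total_tests": (int,), "enrolled_count": (int,), "price_tier": (str,),
    "rating": (int, float), "start_date": (str,),
}


def validate(records: list[dict[str, Any]],
             schema: dict[str, tuple[type, ...]]) -> tuple[list[str], dict[str, float]]:
    """Return (type errors, per-field null rate) for a batch of records."""
    errors: list[str] = []
    null_counts = {name: 0 for name in schema}
    for i, record in enumerate(records):
        for name, types in schema.items():
            value = record.get(name)
            if value is None:
                null_counts[name] += 1  # missing keys and explicit nulls both count
                continue
            # bool is a subclass of int in Python, so only accept it when bool is listed.
            bad_bool = isinstance(value, bool) and bool not in types
            if bad_bool or not isinstance(value, types):
                expected = "/".join(t.__name__ for t in types)
                errors.append(f"record {i}: {name}={value!r} is not {expected}")
    total = max(len(records), 1)
    return errors, {name: count / total for name, count in null_counts.items()}


def fields_over_threshold(null_rates: dict[str, float], max_null_rate: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the (hypothetical) launch threshold."""
    return sorted(name for name, rate in null_rates.items() if rate > max_null_rate)


records = [
    {"test_id": "TS-1104", "title": "NEET UG 2025 All India Mock Test Series",
     "exam_category": "NEET UG", "total_tests": 15, "enrolled_count": 45192,
     "price_tier": "Lite", "rating": 4.7, "start_date": "2024-08-01"},
    {"test_id": "TS-1105", "title": "Hypothetical second series",
     "exam_category": "NEET UG", "total_tests": "15", "enrolled_count": None,
     "price_tier": "Lite", "rating": 4.5, "start_date": "2024-09-01"},
]
errors, null_rates = validate(records, TEST_SERIES_SCHEMA)
print(errors)                            # ["record 1: total_tests='15' is not int"]
print(fields_over_threshold(null_rates))  # ['enrolled_count']
```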

Under the hood

How our Unacademy pipeline handles the hard parts

Unacademy relies on heavy client-side rendering and dynamic API endpoints. Here is how we maintain stable extraction.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

99.9% uptime

142 ms p99 latency

0.3% null rate

SPA rendering

Playwright for Next.js hydration

Unacademy is a modern Single Page Application. We use Playwright to execute JavaScript, wait for React state hydration, and trigger lazy-loaded components before parsing the DOM.

API interception

Capturing raw JSON from GraphQL endpoints

Rather than scraping fragile HTML, our pipeline intercepts Unacademy's internal GraphQL and REST API responses during page load, extracting clean, structured JSON payloads directly.
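Below is a minimal sketch of response interception using Playwright's Python sync API. It registers a `page.on("response", ...)` listener, loads the page, and decodes the JSON bodies of responses whose URLs look like API calls. The `"graphql"` and `"/api/"` URL markers are assumptions, not Unacademy's documented endpoints. `wait_until="networkidle"` is only a heuristic for "hydration finished". The sketch also needs Playwright installed (`pip install playwright`, then `playwright install chromium`).

```python
from typing import Any


def is_api_payload(url: str, content_type: str,
                   markers: tuple[str, ...] = ("graphql", "/api/")) -> bool:
    """Heuristic: keep JSON responses whose URL looks like an internal API call.

    The URL markers are assumptions; confirm the real endpoints in the
    browser's network tab before relying on them.
    """
    return "application/json" in content_type.lower() and any(m in url for m in markers)


def capture_api_json(page_url: str) -> list[dict[str, Any]]:
    """Render page_url in headless Chromium and return decoded API JSON bodies."""
    # Lazy import so is_api_payload() can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    matched = []

    def on_response(response) -> None:
        # Only remember candidate responses here; bodies are read after the load settles.
        if is_api_payload(response.url, response.headers.get("content-type", "")):
            matched.append(response)

    payloads: list[dict[str, Any]] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", on_response)
        # "networkidle" waits until there has been no network traffic for ~500 ms,
        # which usually covers hydration-time API calls (a heuristic, not a guarantee).
        page.goto(page_url, wait_until="networkidle")
        for response in matched:
            try:
                payloads.append({"url": response.url, "body": response.json()})
            except Exception:
                continue  # e.g. empty or non-JSON bodies despite the header
        browser.close()
    return payloads
```

The listener only collects candidate responses. Their bodies are read after `goto` returns and before the browser closes, which keeps the slower body reads out of the event callback.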

Infinite scroll

Handling paginated batch and educator lists

Educator lists and course catalogues load dynamically via infinite scroll. We simulate user scroll behaviour and capture the subsequent XHR requests to ensure complete coverage of the category.
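The scroll loop can be sketched as follows. It assumes a Playwright `page` that already has a response listener attached, so the XHRs triggered by each scroll are captured. The stop rule (page height stops growing for `patience` consecutive scrolls), the 4000-pixel wheel step, and the 1.5 s pause are illustrative choices, not tuned values.

```python
def reached_end(heights: list[int], patience: int = 3) -> bool:
    """True once the page height has not grown for `patience` consecutive scrolls."""
    if len(heights) <= patience:
        return False
    return max(heights[-patience:]) <= heights[-patience - 1]


def scroll_to_end(page, max_scrolls: int = 200, pause_ms: int = 1500, patience: int = 3) -> int:
    """Scroll a Playwright `page` until the list stops growing; returns scrolls performed."""
    heights = [page.evaluate("document.body.scrollHeight")]
    for i in range(1, max_scrolls + 1):
        page.mouse.wheel(0, 4000)        # user-like wheel scroll, not a jump to the bottom
        page.wait_for_timeout(pause_ms)  # let the lazy-load XHRs complete
        heights.append(page.evaluate("document.body.scrollHeight"))
        if reached_end(heights, patience):
            return i
    return max_scrolls  # safety cap for very long or endlessly growing lists
```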

Change detection

Diffing watch minutes and follower counts

For tracking educator performance over time, we maintain state across pipeline runs. You receive a clean changelog of watch minute growth and follower acquisition rather than full daily re-dumps.
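Change detection reduces to comparing two snapshots keyed by `educator_id`. This sketch holds both snapshots in memory. In the pipeline described here, the previous state would live in a persistent store such as PostgreSQL, but that wiring is an assumption. The follower count of 893000 in the usage example is hypothetical.

```python
from typing import Any

# Metrics tracked between runs (field names from the Educator Profile schema).
TRACKED_FIELDS = ("followers_count", "watch_minutes_30d", "watch_minutes_lifetime")

Snapshot = dict[str, dict[str, Any]]  # educator_id -> record


def diff_snapshots(previous: Snapshot, current: Snapshot,
                   fields: tuple[str, ...] = TRACKED_FIELDS) -> list[dict[str, Any]]:
    """Return changelog entries for educators that were added, removed, or changed."""
    changes: list[dict[str, Any]] = []
    for educator_id in sorted(current):
        before = previous.get(educator_id)
        if before is None:
            changes.append({"educator_id": educator_id, "change": "added"})
            continue
        for name in fields:
            old, new = before.get(name), current[educator_id].get(name)
            if old == new:
                continue
            numeric = isinstance(old, (int, float)) and isinstance(new, (int, float))
            changes.append({"educator_id": educator_id, "change": "updated", "field": name,
                            "old": old, "new": new, "delta": new - old if numeric else None})
    for educator_id in sorted(previous.keys() - current.keys()):
        changes.append({"educator_id": educator_id, "change": "removed"})
    return changes


yesterday = {"EDU-98421": {"followers_count": 892401, "watch_minutes_30d": 4500000,
                           "watch_minutes_lifetime": 128000000}}
today = {"EDU-98421": {"followers_count": 893000, "watch_minutes_30d": 4500000,
                       "watch_minutes_lifetime": 128000000}}
print(diff_snapshots(yesterday, today))
```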

Anti-bot layer

Residential proxies to bypass rate limits

Unacademy rate-limits aggressive IP addresses. We distribute requests across a pool of Indian residential proxies, matching the geographic origin expected by the platform's load balancers.
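A round-robin rotation over a proxy pool can be sketched as a generator of Playwright proxy settings. `server`, `username` and `password` are keys that Playwright's `proxy` launch option accepts. The endpoint hostnames and credentials are placeholders. A production rotator would also track per-IP cooldowns and ban signals, which this sketch omits.

```python
import itertools
from typing import Iterator


def proxy_rotation(servers: list[str], username: str, password: str) -> Iterator[dict[str, str]]:
    """Return an endless round-robin iterator of Playwright-style proxy settings."""
    if not servers:
        raise ValueError("the proxy pool is empty")  # checked eagerly, before iteration
    return ({"server": s, "username": username, "password": password}
            for s in itertools.cycle(servers))


# Placeholder endpoints and credentials; substitute your provider's values.
rotation = proxy_rotation(
    ["http://in-res-1.proxy.example:8000", "http://in-res-2.proxy.example:8000"],
    username="USER", password="PASS",
)
first, second, third = next(rotation), next(rotation), next(rotation)
print(first["server"], third["server"])  # http://in-res-1.proxy.example:8000 http://in-res-1.proxy.example:8000
```

Each browser launch can then take the next entry, for example `p.chromium.launch(proxy=next(rotation))`.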

Applications

Who uses Unacademy data — and how

Teams across industries use unacademy.com data to build competitive products and smarter operations.

EdTech Competitor Intelligence

Rival platforms track Unacademy's pricing, discount strategies, and new batch launches to inform their own product roadmaps.

Educator Talent Acquisition

EdTech recruiters monitor watch minutes, follower growth, and engagement metrics to identify and poach top-performing educators.

Market Expansion Analysis

Strategy teams analyse course volumes and educator density across categories (e.g., UPSC vs State PSC) to identify underserved exam markets.

Pricing Strategy

Analysts monitor subscription tier pricing, promotional periods, and duration discounts to understand EdTech monetisation trends.

Content Gap Analysis

Curriculum designers parse syllabi and batch schedules to find missing topics or emerging subjects in the test prep space.

Investment Due Diligence

Private equity firms and analysts track active batch volumes, educator retention, and pricing stability to evaluate platform health.

Technical Spec

Unacademy scraper — technical capabilities

What our unacademy.com scraper supports: rendered SPA elements, internal API interception, rate-limit evasion, and the limits around authenticated or paywalled content.

JavaScript rendering

Full Playwright sessions to handle Next.js hydration and dynamic routing

Supported

GraphQL API interception

Direct extraction of structured JSON from internal API responses

Supported

Residential proxy rotation

ISP-grade residential IPs from India to prevent rate limiting

Supported

Change detection (diffs)

Track daily changes in watch minutes, followers, and pricing

Supported

Educator metrics tracking

Capture lifetime and 30-day watch minutes, badges, and follower counts

Supported

Course syllabus extraction

Deep extraction of module breakdowns and class schedules

Supported

Subscription pricing history

Track Plus, Iconic, and Lite tier pricing across all exam categories

Supported

Paid video content download

Extraction of DRM-protected video files or live streams

Not supported

User-specific mock test results

Requires authenticated learner credentials and contains PII

Not supported

Internal discussion forums

Gated behind active paid subscription walls

Not supported

Infrastructure

Infrastructure powering the Unacademy pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, infinite scrolling, and API interception.
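Here is a sketch of how that split can look in a Scrapy `settings.py`, using the `scrapy-playwright` download handler. The retry counts, concurrency and throttling values are illustrative assumptions, not DataFlirt's production configuration.

```python
# settings.py (sketch): Scrapy owns retries and de-duplication,
# scrapy-playwright renders the pages that opt in to a browser.

# Route HTTP(S) downloads through Playwright (requires the scrapy-playwright package).
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Retry transient failures, including HTTP 429 rate-limit responses.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504, 522, 524, 408]

# Request-fingerprint de-duplication (Scrapy's default filter, set explicitly for clarity).
DUPEFILTER_CLASS = "scrapy.dupefilters.RFPDupeFilter"

# Back off automatically when response latency rises.
AUTOTHROTTLE_ENABLED = True
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```

With these settings, only requests that opt in via `scrapy.Request(url, meta={"playwright": True})` are rendered in a browser. All other requests go through Scrapy's regular downloader.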

API Interception & GraphQL

We bypass fragile DOM parsing by intercepting Unacademy's internal GraphQL and REST responses during page load, yielding cleaner, more reliable data.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

XLS

Excel format for business analyst workflows

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

API

REST endpoint to query your extracted datasets

BigQuery

Streamed directly into your dataset with schema auto-detect

Snowflake

Stage + COPY INTO workflow — incremental or full-replace

PostgreSQL

Upsert into your existing schema with conflict resolution
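For the PostgreSQL destination, "upsert with conflict resolution" maps onto `INSERT ... ON CONFLICT ... DO UPDATE`. The helper below builds that statement with `%s` placeholders, which is psycopg's parameter style. Several parts are assumptions:

- Last write wins on non-key columns; this is an assumed conflict policy.
- The table and column names are examples.
- `ON CONFLICT` needs a unique index or constraint on the key columns.

Identifiers are interpolated directly, so they must come from trusted configuration. psycopg's `sql.Identifier` is the safer route for untrusted names.

```python
def build_upsert(table: str, columns: list[str], key: list[str]) -> str:
    """Build a parameterised INSERT ... ON CONFLICT statement (psycopg-style %s placeholders).

    Identifiers are interpolated as-is, so table/column names must be trusted.
    """
    missing = [k for k in key if k not in columns]
    if missing:
        raise ValueError(f"conflict key not in columns: {missing}")
    placeholders = ", ".join(["%s"] * len(columns))
    non_key = [c for c in columns if c not in key]
    # Last write wins: non-key columns take the incoming (EXCLUDED) values.
    action = ("DO UPDATE SET " + ", ".join(f"{c} = EXCLUDED.{c}" for c in non_key)
              if non_key else "DO NOTHING")
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
            f"ON CONFLICT ({', '.join(key)}) {action}")


print(build_upsert("educators", ["educator_id", "followers_count", "rating"], ["educator_id"]))
```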


// faq

Common questions.

About unacademy.com scraping, legality, and pipeline operations.


Is scraping Unacademy legal?

Scraping publicly available information from Unacademy is generally permissible. DataFlirt targets only public, non-authenticated educator profiles, course syllabi, and pricing data. We do not extract personal learner data, circumvent DRM on video content, or violate copyright law.

How do you handle Unacademy's dynamic React interface?

We use Playwright to execute full browser sessions, allowing React to hydrate the DOM. More importantly, we intercept the underlying GraphQL and REST API calls made by the frontend, extracting the raw JSON payloads directly for higher reliability.

Can you track educator watch minutes over time?

Yes. We can configure daily or weekly pipeline runs to capture '30-day watch minutes', 'lifetime watch minutes', and 'follower count'. We store the state and deliver a time-series dataset showing growth metrics.

Do you extract actual video content or class recordings?

No. We extract metadata about the classes (titles, educators, durations, schedules, topics) but we do not download, store, or distribute DRM-protected video content or live streams.

What is the minimum viable engagement?

Our minimum engagement typically starts with a defined set of exam categories (e.g., UPSC, NEET, JEE) or a specific list of educators, with weekly delivery. Contact us with your specific scope for pricing.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 100 educator profiles or 50 course batches as part of the pre-engagement scoping process, allowing you to validate the schema and data quality.

How fresh is the schedule data for live classes?

For active tracking pipelines, we can configure hourly or daily runs to capture new class announcements, schedule changes, and live status updates with minimal latency.

Unacademy data,
at warehouse scale.

Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
