
RUN · 31 active pipelines · brilliant.org live

Brilliant.org data,
at warehouse scale.

We extract course hierarchies, concept dependencies, lesson metadata, and syllabus structures from Brilliant.org. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.


Courses mapped: 1,294 /run
Lessons extracted: 14,821 /run
Concept nodes: 42,105 /week
Active pipelines: 31
Uptime: 99.98%

◆ STEM Course Data ◆ Learning Paths ◆ Concept Dependencies ◆ Lesson Metadata ◆ Difficulty Rankings ◆ Prerequisite Graphs ◆ Educator Profiles ◆ Subscription Pricing ◆ Syllabus Structures ◆ Interactive Module Types ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from brilliant.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Course Catalogue objects from brilliant.org. All fields typed and schema-versioned.

course_id, title, category, sub_category, difficulty_level, duration_minutes, lesson_count, concept_count, description, prerequisites, target_audience, author_name

{
  "course_id": "cs-algorithms-101",
  "title": "Computer Science Algorithms",
  "category": "Computer Science",
  "difficulty_level": "Intermediate",
  "lesson_count": 24,
  "duration_minutes": 480,
  "prerequisites": ["python-basics", "discrete-math"]
}


Complete list of extractable fields for Learning Paths objects from brilliant.org. All fields typed and schema-versioned.

path_id, path_title, description, total_courses, estimated_hours, target_role, difficulty_progression, milestones, skills_acquired, path_url

{
  "path_id": "data-analyst-track",
  "path_title": "Foundations of Data Science",
  "total_courses": 6,
  "estimated_hours": 35,
  "target_role": "Data Analyst",
  "skills_acquired": ["Probability", "Statistics", "SQL Logic"]
}


Complete list of extractable fields for Concept Dependencies objects from brilliant.org. All fields typed and schema-versioned.

concept_id, concept_name, domain, parent_concept, child_concepts, related_courses, definition_summary, difficulty_weight, visual_assets

{
  "concept_id": "bayes-theorem",
  "concept_name": "Bayes' Theorem",
  "domain": "Probability",
  "parent_concept": "conditional-probability",
  "child_concepts": ["bayesian-updating", "naive-bayes"],
  "difficulty_weight": 4.2
}


Complete list of extractable fields for Lesson Metadata objects from brilliant.org. All fields typed and schema-versioned.

lesson_id, course_id, title, sequence_number, interaction_type, estimated_minutes, key_takeaways, module_count, requires_premium, tags

{
  "lesson_id": "neural-nets-01",
  "course_id": "intro-to-ai",
  "title": "The Perceptron",
  "sequence_number": 1,
  "estimated_minutes": 15,
  "requires_premium": true,
  "module_count": 8
}


Complete list of extractable fields for Pricing & Plans objects from brilliant.org. All fields typed and schema-versioned.

plan_id, tier_name, billing_cycle, price_usd, currency, discount_pct, features_included, trial_days, regional_pricing, active_status

{
  "tier_name": "Premium Annual",
  "billing_cycle": "yearly",
  "price_usd": 149.85,
  "currency": "USD",
  "discount_pct": 20,
  "trial_days": 7,
  "active_status": true
}


Capabilities

Extract the complete STEM curriculum graph

Brilliant.org relies heavily on client-side rendering and interactive components. Our infrastructure executes the JavaScript, extracts the internal state, and normalises the syllabus data into queryable formats.

Course Catalogue Extraction

Extract titles, descriptions, categories, difficulty levels, and duration estimates for every course on the platform.

Learning Path Mapping

Map multi-course tracks, including milestone requirements, skill progression, and target audience definitions.

Prerequisite Graphs

Capture the exact dependency chain between concepts and courses to understand structural curriculum design.

Lesson Metadata Parsing

Extract sequence numbers, estimated completion times, and premium-lock status for individual lessons.

Concept Node Indexing

Index the underlying concepts taught, including parent-child relationships and domain categorisation.

Pricing & Subscription Tracking

Monitor tier structures, promotional discounts, trial periods, and regional pricing variations.

Multi-Region Localisation

Extract translated course titles and localised pricing structures across supported geographic regions.

Curriculum Change Detection

Track when courses are added, deprecated, or restructured with hash-based diffing on daily runs.

SPA State Hydration

Bypass complex DOM structures by extracting the raw JSON state directly from the Next.js application layer.

// engagement pipeline

From target list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide categories, specific learning paths, or request a full site crawl. We map the required schema.

Pipeline Build

Days 2–4

We configure Playwright spiders, session management, and React state parsers specific to Brilliant.org.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and structural integrity testing of the prerequisite graphs.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
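The Validation & QA step above mentions null-rate checks and structural integrity testing of the prerequisite graphs. A minimal sketch of what such checks could look like follows. It is an illustration under assumptions, not DataFlirt's actual QA code: the function names `null_rates` and `find_cycle` are hypothetical, and the input shapes (a list of record dicts, and a mapping from course ID to its list of prerequisite IDs) are assumed for the example.

```python
from collections import defaultdict


def null_rates(records, fields):
    """Return the fraction of records where each field is missing or None."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) is None)
        rates[field] = missing / total if total else 0.0
    return rates


def find_cycle(prerequisites):
    """Return one cycle in a course -> prerequisites mapping, or None if acyclic.

    Uses a depth-first search with three node states: unvisited (WHITE),
    on the current path (GREY), and fully explored (BLACK).
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts as WHITE
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for dep in prerequisites.get(node, []):
            if colour[dep] == GREY:
                # dep is already on the current path, so we found a cycle.
                return stack[stack.index(dep):] + [dep]
            if colour[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(prerequisites):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

A prerequisite graph should be acyclic, so `find_cycle({"cs-algorithms-101": ["python-basics", "discrete-math"]})` returning `None` is the passing case. Any returned list names the courses that form a loop and would need review before delivery.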

Under the hood

Handling modern SPA architectures

Brilliant.org is a highly interactive React application. Standard HTTP requests return empty shells. Here is how we extract the actual data.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued running

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

alerts

JavaScript rendering

Full Playwright execution for React components

Brilliant's content loads dynamically via client-side JavaScript. We run full Playwright browser sessions to trigger API calls, hydrate the DOM, and capture the rendered syllabus structures.

State extraction

Parsing internal application state

Rather than scraping fragile CSS classes in interactive SVG modules, we intercept the underlying JSON state objects passed to the React components, ensuring high data accuracy and pipeline stability.
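As an illustration of reading application state instead of rendered markup, the sketch below pulls the JSON out of a `<script id="__NEXT_DATA__">` tag, which is where Next.js pages embed their initial state. It uses only the Python standard library and runs on a hard-coded sample. The `NextDataParser` class, the `extract_next_data` helper, and the `course` keys inside `pageProps` are assumptions made for the example, not the actual shape of Brilliant.org's payload.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page; real payloads are far larger and shaped differently.
SAMPLE_HTML = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"course": '
    '{"course_id": "cs-algorithms-101", "lesson_count": 24}}}}'
    "</script></body></html>"
)


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    @property
    def payload(self):
        return "".join(self._chunks)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ payload, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads(parser.payload) if parser.payload else None
```

Calling `extract_next_data(SAMPLE_HTML)` yields a nested dict, so a field such as `lesson_count` is read by key rather than located through CSS selectors.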

Anti-bot layer

Residential proxies and rate limiting

We route requests through ISP-grade residential proxies with conservative concurrency limits to respect the platform's infrastructure while avoiding IP bans and CAPTCHA walls.
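A conservative concurrency limit of the kind described above can be sketched with an `asyncio.Semaphore`. This is a generic illustration, not the production scheduler: `fetch_all`, its `fetch` callback, and the default values for `max_concurrency` and `delay_seconds` are hypothetical, and proxy routing is left out entirely.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=2, delay_seconds=0.0):
    """Run fetch(url) for every URL with at most max_concurrency in flight.

    Results are returned in the same order as the input URLs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            result = await fetch(url)
            # Optional pause before releasing the slot, to space requests out.
            await asyncio.sleep(delay_seconds)
            return result

    return await asyncio.gather(*(guarded(u) for u in urls))
```

Here `fetch` is any coroutine function that takes a URL. Passing the downloader in as a parameter keeps the limiter independent of the HTTP or browser client in use.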

Graph normalisation

Flattening nested curriculum trees

Courses, lessons, and concepts form a complex directional graph. Our pipeline flattens this nested data into relational tables, making it immediately queryable in SQL environments.
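To make the flattening concrete, here is a small sketch that splits a nested path, course, and lesson tree into three flat row lists linked by foreign keys. The nested keys `courses` and `lessons` and the function name `flatten_curriculum` are assumptions for illustration. The ID and title fields reuse names from the data dictionary above.

```python
def flatten_curriculum(paths):
    """Split nested path -> course -> lesson dicts into three flat row lists."""
    path_rows, course_rows, lesson_rows = [], [], []
    for path in paths:
        path_rows.append(
            {"path_id": path["path_id"], "path_title": path["path_title"]}
        )
        for course in path.get("courses", []):
            course_rows.append({
                "course_id": course["course_id"],
                "path_id": path["path_id"],  # foreign key to the parent path
                "title": course["title"],
            })
            for lesson in course.get("lessons", []):
                lesson_rows.append({
                    "lesson_id": lesson["lesson_id"],
                    "course_id": course["course_id"],  # foreign key to the parent course
                    "sequence_number": lesson["sequence_number"],
                    "title": lesson["title"],
                })
    return {"paths": path_rows, "courses": course_rows, "lessons": lesson_rows}
```

Each returned list maps directly onto one SQL table. A course that appears in more than one path would need a separate path-to-course link table instead of the single `path_id` column used in this simplified version.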

Change detection

Only re-scrape what changes

We maintain a hash index of last-seen values for course structures. Subsequent runs only push diffs, providing a clean changelog of curriculum updates without redundant data.
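One way to implement a last-seen hash index is to hash a canonical JSON form of each record and compare it with the hash stored from the previous run. The sketch below assumes SHA-256 and a `course_id` key. The helpers `record_hash` and `diff_run` are hypothetical names, and the production pipeline may hash different fields or persist the index elsewhere.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record's canonical JSON form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(previous_index, records, key="course_id"):
    """Compare this run against the last-seen hash index.

    Returns (changes, new_index), where changes maps "added", "changed",
    and "removed" to sorted lists of record keys.
    """
    new_index = {r[key]: record_hash(r) for r in records}
    added = sorted(k for k in new_index if k not in previous_index)
    removed = sorted(k for k in previous_index if k not in new_index)
    changed = sorted(
        k for k in new_index
        if k in previous_index and previous_index[k] != new_index[k]
    )
    return {"added": added, "changed": changed, "removed": removed}, new_index
```

The first run is called with an empty index, so every record is reported as added. Later runs pass in the `new_index` returned by the run before, and only the keys listed under `added`, `changed`, or `removed` need to be pushed downstream.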

Applications

Who uses Brilliant.org data

Teams across industries use brilliant.org data to build competitive products and smarter operations.

EdTech Competitor Analysis

Online learning platforms monitor Brilliant's catalogue expansion, course structures, and difficulty curves to benchmark their own curricula.

AI Tutor Training

Machine learning teams use structured concept dependency graphs to train educational LLMs on logical progression and prerequisite mapping.

Pricing Intelligence

Strategy teams track subscription tiers, promotional cadences, and regional pricing strategies to optimise EdTech revenue models.

Academic Research

Education researchers analyse the taxonomy of STEM concepts and how modern interactive platforms sequence complex topics.

Market Gap Analysis

Content creators and publishers identify underserved STEM domains by analysing course density and topic coverage.

Taxonomy Development

Data science teams extract category and sub-category metadata to build standardised skill ontologies for HR and recruitment platforms.

Why DataFlirt

"Brilliant.org maps the dependency graph of modern STEM education. Extracting it requires parsing complex state from a highly interactive single-page application."

Standard HTTP clients fail against Brilliant's React architecture. We deploy Playwright clusters to execute JavaScript, hydrate course states, and extract the underlying syllabus structures and concept graphs without triggering rate limits. DataFlirt handles the infrastructure so you can focus on curriculum analysis.

Technical Spec

Brilliant.org scraper — technical capabilities

Everything supported by our brilliant.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability	Description	Status
JavaScript rendering	Full Playwright sessions to load dynamic course content	Supported
Next.js state extraction	Direct interception of __NEXT_DATA__ JSON payloads	Supported
Residential proxy rotation	ISP-grade residential IPs to bypass rate limits	Supported
Curriculum graph mapping	Relational linking of paths, courses, and lessons	Supported
Pricing localisation	Extraction of regional pricing via geo-targeted proxies	Supported
Change detection (diffs)	Hash-based diffing to track newly added courses	Supported
Webhook delivery	HTTP POST per record for real-time processing	Supported
Premium lesson content	Actual text and interactive modules inside paid courses	Partial
User progress telemetry	Individual learner completion rates and quiz scores	Partial

Infrastructure

Infrastructure powering the pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering and React state extraction. Combined via scrapy-playwright middleware.
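For readers unfamiliar with how the two tools are wired together, the fragment below shows the settings that the scrapy-playwright project documents for routing Scrapy requests through a Playwright browser. The concurrency and delay values are placeholders chosen for illustration, and `playwright_meta` is a hypothetical helper. None of this reflects DataFlirt's actual configuration.

```python
# Fragment of a Scrapy settings.py, following the scrapy-playwright README.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Standard Scrapy throttling settings; the values here are placeholders.
CONCURRENT_REQUESTS = 4
DOWNLOAD_DELAY = 1.0


def playwright_meta():
    """Request meta that asks scrapy-playwright to render the page in a browser."""
    return {"playwright": True}
```

In a spider, a request opts into browser rendering with `scrapy.Request(url, meta=playwright_meta())`. Requests without that flag go through Scrapy's regular downloader.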

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies. Rotation happens per-request with sticky sessions where required to prevent IP bans.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling and dependency management. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

XLS

Excel format for non-technical analyst teams

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for downstream processing

API

REST endpoint to query extracted catalogue data

BigQuery

Streamed directly into your dataset

Snowflake

Stage + COPY INTO workflow

Postgres

Upsert into your existing schema
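The newline-delimited JSON option listed above is simple to produce and to consume: one JSON object per line. The sketch below writes records that way and stamps each with a version marker. The `schema_version` field name and the `write_ndjson` helper are assumptions for illustration, since the page does not say how the version is actually carried.

```python
import io
import json


def write_ndjson(records, stream, schema_version):
    """Write one JSON object per line, stamping each with a schema version.

    Returns the number of records written.
    """
    count = 0
    for record in records:
        row = {"schema_version": schema_version, **record}
        stream.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    write_ndjson(
        [{"course_id": "cs-algorithms-101", "lesson_count": 24}],
        buffer,
        schema_version="1.0",
    )
    print(buffer.getvalue(), end="")
```

Because every line is an independent JSON document, a consumer can stream the file line by line without loading the full export into memory.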


// faq

Common questions.

About brilliant.org scraping, legality, and pipeline operations.


Can you extract the actual interactive quiz questions?

No. DataFlirt only extracts publicly available metadata such as course titles, syllabus structures, and concept dependencies. We do not bypass authentication walls to scrape premium gated lesson content or interactive quiz answers.

How do you handle the complex React application state?

Instead of writing fragile CSS selectors for interactive SVG elements, our Playwright implementation intercepts the initial JSON state payloads used to hydrate the Next.js frontend. Because the data is read in the structured form the application itself consumes, extraction is more accurate and more stable than selector-based scraping.

How often can the catalogue be updated?

We typically configure Brilliant.org pipelines to run weekly or monthly, as educational curricula do not change with high frequency. However, daily runs can be configured if required for pricing intelligence.

Do you map the prerequisite relationships between courses?

Yes. We extract the dependency graphs that link concepts and courses, delivering them as relational data or nested JSON arrays suitable for graph database ingestion.

What formats do you deliver the data in?

We deliver data in JSON, CSV, XLS, and Parquet formats. We can push directly to AWS S3, Google BigQuery, Snowflake, or trigger Webhooks and API endpoints.

Can I request a sample dataset?

Yes. We offer a sample extraction of a specific learning path or category during the scoping phase, allowing your engineering team to validate the schema before signing a contract.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue map or a continuous feed of EdTech pricing changes — we scope, build, and operate the pipeline. Tell us what you need.


hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

