We extract course hierarchies, concept dependencies, lesson metadata, and syllabus structures from Brilliant.org, and deliver them as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Course Catalogue objects from brilliant.org. All fields typed and schema-versioned.
{"course_id": "cs-algorithms-101", "title": "Computer Science Algorithms", "category": "Computer Science", "difficulty_level": "Intermediate", "lesson_count": 24, "duration_minutes": 480, "prerequisites": ["python-basics", "discrete-math"]}
| # | course_id | title | category | sub_category | difficulty_level | duration_minutes |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Learning Paths objects from brilliant.org. All fields typed and schema-versioned.
{"path_id": "data-analyst-track", "path_title": "Foundations of Data Science", "total_courses": 6, "estimated_hours": 35, "target_role": "Data Analyst", "skills_acquired": ["Probability", "Statistics", "SQL Logic"]}
| # | path_id | path_title | description | total_courses | estimated_hours | target_role |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Concept Dependencies objects from brilliant.org. All fields typed and schema-versioned.
{"concept_id": "bayes-theorem", "concept_name": "Bayes' Theorem", "domain": "Probability", "parent_concept": "conditional-probability", "child_concepts": ["bayesian-updating", "naive-bayes"], "difficulty_weight": 4.2}
| # | concept_id | concept_name | domain | parent_concept | child_concepts | related_courses |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Lesson Metadata objects from brilliant.org. All fields typed and schema-versioned.
{"lesson_id": "neural-nets-01", "course_id": "intro-to-ai", "title": "The Perceptron", "sequence_number": 1, "estimated_minutes": 15, "requires_premium": true, "module_count": 8}
| # | lesson_id | course_id | title | sequence_number | interaction_type | estimated_minutes |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Pricing & Plans objects from brilliant.org. All fields typed and schema-versioned.
{"tier_name": "Premium Annual", "billing_cycle": "yearly", "price_usd": 149.85, "currency": "USD", "discount_pct": 20, "trial_days": 7, "active_status": true}
| # | plan_id | tier_name | billing_cycle | price_usd | currency | discount_pct |
|---|---|---|---|---|---|---|
Brilliant.org relies heavily on client-side rendering and interactive components. Our infrastructure executes the JavaScript, extracts the internal state, and normalises the syllabus data into queryable formats.
Extract titles, descriptions, categories, difficulty levels, and duration estimates for every course on the platform.
Map multi-course tracks, including milestone requirements, skill progression, and target audience definitions.
Capture the exact dependency chain between concepts and courses to understand structural curriculum design.
Extract sequence numbers, estimated completion times, and premium-lock status for individual lessons.
Index the underlying concepts taught, including parent-child relationships and domain categorisation.
Monitor tier structures, promotional discounts, trial periods, and regional pricing variations.
Extract translated course titles and localised pricing structures across supported geographic regions.
Track when courses are added, deprecated, or restructured with hash-based diffing on daily runs.
Bypass complex DOM structures by extracting the raw JSON state directly from the Next.js application layer.
Brief in. Clean data out.
Provide categories, specific learning paths, or request a full site crawl. We map the required schema.
We configure Playwright spiders, session management, and React state parsers specific to Brilliant.org.
Schema validation, null-rate checks, and structural integrity testing of the prerequisite graphs.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
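The validation step above (null-rate checks and structural testing of the prerequisite graph) can be illustrated with a minimal sketch. The function names and the shape of the `prereqs` mapping are assumptions made for this example, not a description of the production pipeline.

```python
def null_rates(rows, fields):
    """Return, for each field, the fraction of rows where it is missing or None."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in fields
    }


def graph_problems(prereqs):
    """Check a prerequisite graph given as {course_id: [prerequisite_ids]}.

    Returns (dangling, has_cycle): prerequisite ids that are not themselves
    keys of the mapping, and whether any dependency cycle exists.
    """
    dangling = sorted(
        {dep for deps in prereqs.values() for dep in deps if dep not in prereqs}
    )

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in prereqs}

    def visit(node):
        # Depth-first search; meeting an in-progress node means a cycle.
        state[node] = in_progress
        for dep in prereqs[node]:
            if dep not in prereqs:
                continue  # already reported as dangling
            if state[dep] == in_progress:
                return True
            if state[dep] == unvisited and visit(dep):
                return True
        state[node] = done
        return False

    has_cycle = any(state[node] == unvisited and visit(node) for node in prereqs)
    return dangling, has_cycle
```

A run would typically fail QA if a null rate exceeds an agreed threshold, or if the graph contains dangling references or cycles.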
Brilliant.org is a highly interactive React application. Standard HTTP requests return empty shells. Here is how we extract the actual data.
Brilliant's content loads dynamically via client-side JavaScript. We run full Playwright browser sessions to trigger API calls, hydrate the DOM, and capture the rendered syllabus structures.
Rather than scraping fragile CSS classes in interactive SVG modules, we intercept the underlying JSON state objects passed to the React components, ensuring high data accuracy and pipeline stability.
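As a rough illustration of state extraction, the sketch below pulls a JSON payload out of rendered HTML using only the Python standard library. It assumes the page embeds its state in a `<script id="__NEXT_DATA__">` tag, which is the convention of the Next.js Pages Router; whether a given Brilliant.org page exposes its state this way, and under which keys, is an assumption here.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text content of <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._inside:
            self._chunks.append(data)

    def payload(self):
        raw = "".join(self._chunks)
        return json.loads(raw) if raw else None


def extract_next_data(html):
    """Return the parsed state object, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return parser.payload()
```

The HTML passed in would come from a rendered browser session. Any key path used afterwards (for example `props.pageProps`) should be confirmed against the actual payload.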
We route requests through ISP-grade residential proxies with conservative concurrency limits to respect the platform's infrastructure while avoiding IP bans and CAPTCHA walls.
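A conservative concurrency limit can be sketched with an `asyncio.Semaphore`. The `fetch` callable and the default limit of 2 are placeholders chosen for this example; real limits would be agreed per project.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=2):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # The semaphore blocks here once max_concurrency fetches are active.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Keeping the limit in one place makes it easy to lower when a target starts responding slowly.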
Courses, lessons, and concepts form a complex directional graph. Our pipeline flattens this nested data into relational tables, making it immediately queryable in SQL environments.
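To make the flattening concrete, here is a minimal sketch that turns nested concept records (shaped like the Concept Dependencies sample) into a node table and an edge table. The function name and the two-table layout are illustrative assumptions.

```python
def flatten_concepts(concepts):
    """Split nested concept records into node rows and (parent, child) edges."""
    nodes = []
    edges = []
    for concept in concepts:
        nodes.append(
            {
                "concept_id": concept["concept_id"],
                "concept_name": concept["concept_name"],
                "domain": concept["domain"],
            }
        )
        parent = concept.get("parent_concept")
        if parent:
            edges.append((parent, concept["concept_id"]))
        for child in concept.get("child_concepts", []):
            edges.append((concept["concept_id"], child))
    # The same edge can be declared from both ends; drop repeats, keep order.
    edges = list(dict.fromkeys(edges))
    return nodes, edges
```

With the edges in their own table, a query such as "all direct children of a concept" becomes a simple filter or join instead of a traversal of nested JSON.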
We maintain a hash index of last-seen values for course structures. Subsequent runs only push diffs, providing a clean changelog of curriculum updates without redundant data.
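Hash-based diffing can be sketched as follows: each record is serialised canonically, hashed, and compared with the hash stored from the previous run. The choice of SHA-256, the `course_id` key, and the function names are assumptions for this example.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record via canonical JSON so that key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(previous_index, records, key="course_id"):
    """Compare this run's records with the last-seen {id: hash} index.

    Returns the new index and a changelog of added, changed, and removed ids.
    """
    current_index = {record[key]: record_hash(record) for record in records}
    added = sorted(k for k in current_index if k not in previous_index)
    removed = sorted(k for k in previous_index if k not in current_index)
    changed = sorted(
        k
        for k in current_index
        if k in previous_index and current_index[k] != previous_index[k]
    )
    changelog = {"added": added, "changed": changed, "removed": removed}
    return current_index, changelog
```

Only the ids listed in the changelog need to be pushed downstream. The returned index is stored and becomes `previous_index` on the next run.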
Online learning platforms monitor Brilliant's catalogue expansion, course structures, and difficulty curves to benchmark their own curricula.
Machine learning teams use structured concept dependency graphs to train educational LLMs on logical progression and prerequisite mapping.
Strategy teams track subscription tiers, promotional cadences, and regional pricing strategies to optimise EdTech revenue models.
Education researchers analyse the taxonomy of STEM concepts and how modern interactive platforms sequence complex topics.
Content creators and publishers identify underserved STEM domains by analysing course density and topic coverage.
Data science teams extract category and sub-category metadata to build standardised skill ontologies for HR and recruitment platforms.
"Brilliant.org maps the dependency graph of modern STEM education. Extracting it requires parsing complex state from a highly interactive single-page application."
Standard HTTP clients fail against Brilliant's React architecture. We deploy Playwright clusters to execute JavaScript, hydrate course states, and extract the underlying syllabus structures and concept graphs without triggering rate limits. DataFlirt handles the infrastructure so you can focus on curriculum analysis.
Everything supported by our brilliant.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript rendering and React state extraction. Combined via scrapy-playwright middleware.
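For readers unfamiliar with the combination, a Scrapy project typically enables scrapy-playwright through a settings fragment like the one below. The handler paths and reactor setting follow the scrapy-playwright documentation; the concurrency and delay values are illustrative assumptions, not our production configuration.

```python
# settings.py (sketch): route HTTP(S) downloads through Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed values for a conservative crawl; tune per project.
CONCURRENT_REQUESTS = 4
DOWNLOAD_DELAY = 2.0

# In a spider, a request opts in to browser rendering via its meta dict:
#     scrapy.Request(url, meta={"playwright": True})
```

Requests without the `playwright` meta key still go through Scrapy's normal downloader, so browser rendering is only paid for on pages that need it.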
We maintain pools of residential ISP proxies. Rotation happens per-request with sticky sessions where required to prevent IP bans.
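Per-request rotation with optional sticky sessions can be sketched in a few lines. The `ProxyRotator` class, its round-robin policy, and the proxy names are hypothetical, chosen only to show the pattern.

```python
import itertools


class ProxyRotator:
    """Round-robin proxy selection with optional sticky sessions."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky = {}

    def proxy_for(self, session_id=None):
        """Return the next proxy, or the pinned proxy for a sticky session."""
        if session_id is None:
            return next(self._cycle)
        if session_id not in self._sticky:
            # First request of the session: pin it to the next proxy in line.
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def release(self, session_id):
        """Forget a sticky session so its proxy returns to normal rotation."""
        self._sticky.pop(session_id, None)
```

Stateless page fetches call `proxy_for()` with no argument, while multi-step flows that must keep one IP pass a session id and call `release()` when finished.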
Pipelines run on AWS Lambda and ECS. Airflow handles scheduling and dependency management. All state stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About brilliant.org scraping, legality, and pipeline operations.
No. DataFlirt only extracts publicly available metadata such as course titles, syllabus structures, and concept dependencies. We do not bypass authentication walls to scrape premium gated lesson content or interactive quiz answers.
Instead of writing fragile CSS selectors for interactive SVG elements, our Playwright implementation intercepts the initial JSON state payloads used to hydrate the Next.js frontend. Because the payload is already structured, extraction is more accurate and less sensitive to layout changes than DOM scraping.
We typically configure Brilliant.org pipelines to run weekly or monthly, because educational curricula change infrequently. Daily runs can be configured where pricing intelligence requires them.
Yes. We extract the dependency graphs that link concepts and courses, delivering them as relational data or nested JSON arrays suitable for graph database ingestion.
We deliver data in JSON, CSV, XLS, and Parquet formats. We can push directly to AWS S3, Google BigQuery, or Snowflake, or trigger webhooks and API endpoints.
Yes. We offer a sample extraction of a specific learning path or category during the scoping phase, allowing your engineering team to validate the schema before signing a contract.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue map or a continuous feed of EdTech pricing changes — we scope, build, and operate the pipeline. Tell us what you need.