
LearnWorlds data, extracted at scale.

We extract course listings, membership pricing, curriculum structures, and instructor profiles across LearnWorlds schools. Delivered as clean JSON, CSV, or Parquet to S3 or BigQuery on your cadence.

Courses extracted: 412K/month
Price updates: 85K/24h
Instructor profiles: 34K/run
Active pipelines: 42
Uptime: 99.94%

Data Dictionary

Every field we extract from learnworlds.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Course Catalogue objects from learnworlds.com. All fields typed and schema-versioned.

Fields: course_id, title, slug, category, description, price, currency, enrollment_count, rating, review_count, instructor_id, language, level, duration, thumbnail_url, updated_at

Sample course_catalogue record:
{
  "course_id": "lw-crs-8921",
  "title": "Advanced Python for Data Science",
  "price": 199.0,
  "currency": "USD",
  "instructor_id": "inst-402",
  "language": "English",
  "rating": 4.8,
  "review_count": 342
}
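"Typed" can be made concrete with a small schema class. The sketch below is an assumption, not DataFlirt's actual schema: it models only the course_catalogue fields shown in the sample and coerces string numbers (for example "199") into the declared types, dropping keys the class does not declare.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class CourseRecord:
    """Typed view of a course_catalogue record (subset of fields from the sample)."""
    course_id: str
    title: str
    price: float
    currency: str
    instructor_id: str
    language: str
    rating: Optional[float] = None
    review_count: Optional[int] = None


def parse_course(raw: dict[str, Any]) -> CourseRecord:
    """Coerce a raw scraped dict into a CourseRecord.

    Keys not declared on CourseRecord (e.g. slug) are ignored; a missing
    required key raises TypeError from the dataclass constructor.
    """
    known = {f.name for f in fields(CourseRecord)}
    data = {k: v for k, v in raw.items() if k in known}
    # Scraped values often arrive as strings, so normalise numeric fields.
    if "price" in data:
        data["price"] = float(data["price"])
    if data.get("rating") is not None:
        data["rating"] = float(data["rating"])
    if data.get("review_count") is not None:
        data["review_count"] = int(data["review_count"])
    return CourseRecord(**data)
```

Freezing the dataclass keeps validated records immutable once they leave the parsing step.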

Complete list of extractable fields for Pricing & Memberships objects from learnworlds.com. All fields typed and schema-versioned.

Fields: plan_id, course_id, plan_name, plan_type, price, interval, currency, trial_days, features, discount_pct, original_price, is_active, checkout_url

Sample Pricing & Memberships record:
{
  "plan_id": "plan-9012",
  "plan_name": "Pro Access",
  "plan_type": "subscription",
  "price": 49.0,
  "interval": "monthly",
  "currency": "USD",
  "trial_days": 14,
  "is_active": true
}
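The dictionary lists discount_pct alongside price and original_price but does not define how they relate. Assuming discount_pct is the percentage reduction from original_price to the current price, it can be derived or cross-checked like this:

```python
def discount_pct(original_price: float, price: float) -> float:
    """Percentage reduction from original_price to price, rounded to 2 dp.

    ASSUMPTION: this is how discount_pct relates to the two price fields;
    the data dictionary lists the fields but not the formula.
    """
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    return round((original_price - price) / original_price * 100, 2)
```

For example, a plan listed at 50.0 and currently sold at 40.0 yields 20.0.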

Complete list of extractable fields for Curriculum Outlines objects from learnworlds.com. All fields typed and schema-versioned.

Fields: course_id, section_id, section_title, section_order, item_id, item_title, item_type, is_free, duration_minutes, drip_feed_days, content_url

Sample curriculum_outlines record:
{
  "course_id": "lw-crs-8921",
  "section_title": "Module 1: Data Structures",
  "item_title": "Lists and Dictionaries",
  "item_type": "video",
  "is_free": true,
  "duration_minutes": 18,
  "drip_feed_days": 0
}

Complete list of extractable fields for Instructor Profiles objects from learnworlds.com. All fields typed and schema-versioned.

Fields: instructor_id, name, bio, avatar_url, social_links, total_courses, total_students, average_rating, joined_date, contact_email, website

Sample instructor_profiles record:
{
  "instructor_id": "inst-402",
  "name": "Sarah Jenkins",
  "bio": "Data Scientist and Python educator.",
  "total_courses": 4,
  "total_students": 12500,
  "average_rating": 4.9,
  "website": "https://sarahjenkins.dev"
}

Complete list of extractable fields for School Metadata objects from learnworlds.com. All fields typed and schema-versioned.

Fields: school_id, domain, school_name, theme_settings, logo_url, active_courses, total_instructors, supported_languages, currency_default, social_profiles, created_at

Sample school_metadata record:
{
  "school_id": "sch-104",
  "domain": "academy.example.com",
  "school_name": "Tech Academy",
  "active_courses": 45,
  "total_instructors": 12,
  "currency_default": "USD",
  "supported_languages": ["English", "Spanish"]
}

Capabilities

Extract course intelligence across the LearnWorlds ecosystem

Our scraper handles LearnWorlds school domains, parsing custom themes, dynamic pricing widgets, and nested curriculum structures with full JavaScript rendering.

Course Metadata Extraction

Title, description, categories, language, and difficulty level across diverse school themes.

Dynamic Pricing & Tiers

Capture one-off payments, subscriptions, payment plans, and bundle pricing.

Curriculum Mapping

Extract nested course structures, section titles, module types, and free preview flags.

Instructor Intelligence

Scrape instructor bios, social links, course portfolios, and student counts.

Review & Rating Mining

Extract student testimonials, star ratings, and review text from course landing pages.

Cross-School Aggregation

Monitor multiple LearnWorlds subdomains or custom domains simultaneously.

Discount Tracking

Identify active coupon codes, discounted pricing, and limited-time offers.

Bundle & Membership Logic

Map relationships between individual courses and overarching subscription bundles.

Custom Domain Resolution

Identify and track LearnWorlds instances operating on white-labelled custom domains.
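Bundle and membership mapping boils down to grouping pricing records by course. A minimal sketch, assuming one Pricing & Memberships row per plan-course pair (the page does not state this); the IDs other than those in the sample records are hypothetical:

```python
from typing import Any, Iterable


def plans_by_course(pricing_rows: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Group plan_id values by the course_id they grant access to."""
    mapping: dict[str, list[str]] = {}
    for row in pricing_rows:
        course_id, plan_id = row.get("course_id"), row.get("plan_id")
        if course_id is None or plan_id is None:
            continue  # rows with no course attached cannot be mapped
        plans = mapping.setdefault(course_id, [])
        if plan_id not in plans:  # de-duplicate while keeping first-seen order
            plans.append(plan_id)
    return mapping
```

The result answers "which plans unlock this course?", the question competitors' bundle strategies hinge on.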

// engagement pipeline

From school list to warehouse record

Brief in. Clean data out.

Define Scope · day 0

Provide target LearnWorlds domains, specific course URLs, or instructor profiles. We design the extraction schema.

Pipeline Build · days 2–4

We configure Scrapy and Playwright crawlers, handle theme variations, and bypass rate limits.

Validation & QA · days 4–6

Schema validation, null-rate checks, and nested curriculum structure verification before launch.

Delivery · ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
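The null-rate check in the Validation & QA step can be sketched as follows. This is an illustrative version, not DataFlirt's QA code: it treats a field as null when it is missing, None, or an empty string, and the 5% alert threshold is a hypothetical default.

```python
from typing import Any, Iterable


def null_rates(records: Iterable[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing, None, or ""."""
    counts = {f: 0 for f in fields}
    total = 0
    for rec in records:
        total += 1
        for f in fields:
            if rec.get(f) in (None, ""):
                counts[f] += 1
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: counts[f] / total for f in fields}


def failing_fields(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the (hypothetical) threshold, sorted by name."""
    return sorted(f for f, r in rates.items() if r > threshold)
```

A pipeline would run this on each batch and hold delivery if failing_fields returns anything.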

Under the hood

Handling LearnWorlds extraction complexity

LearnWorlds schools use highly customisable themes, dynamic SPAs, and varying DOM structures. We standardise the output.

pipeline-monitor · learnworlds.com · live (active)
// fingerprinting
Identity rotation: TLS fingerprint randomised · user-agent rotated · residential IP pool · challenges blocked: 0
// pagination
Page coverage: 48,291 pages queued (running)
// observability
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts

Theme variability
Standardising custom CSS structures

LearnWorlds allows extensive CSS and HTML customisation. Our selectors target underlying data attributes and JSON state rather than fragile visual classes.

JavaScript rendering
Hydrating dynamic pricing widgets

Course catalogues and pricing widgets rely on client-side rendering. We use Playwright to hydrate the DOM before extraction.
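A minimal hydration sketch with Playwright's synchronous Python API: load the page, wait until a client-rendered element appears, then take the rendered HTML. The ready_selector argument (for example a pricing widget's selector) is hypothetical and must be chosen per school; Playwright and a Chromium build must be installed separately.

```python
def render_page(url: str, ready_selector: str, timeout_ms: int = 30_000) -> str:
    """Return the hydrated HTML of url once ready_selector is present.

    Requires: pip install playwright && playwright install chromium
    """
    # Imported lazily so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # Block until the client-side widget has rendered into the DOM.
            page.wait_for_selector(ready_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Waiting on a specific selector rather than a fixed sleep keeps slow schools from yielding half-rendered pricing blocks.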

White-label detection
Identifying custom domain schools

Many schools run on custom domains. Our pipeline identifies LearnWorlds infrastructure from HTTP response headers and applies the standard extractors.
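Header-based detection can be expressed as a small predicate. The marker strings below are hypothetical placeholders, not confirmed LearnWorlds fingerprints; real signals would have to be collected from known schools before use. The body check is deliberately naive.

```python
from typing import Mapping

# HYPOTHETICAL fingerprints: confirm against live LearnWorlds schools before use.
HEADER_HINTS: dict[str, str] = {"server": "learnworlds", "x-powered-by": "learnworlds"}
BODY_HINTS: tuple[str, ...] = ("learnworlds",)


def looks_like_learnworlds(
    headers: Mapping[str, str],
    html: str = "",
    header_hints: Mapping[str, str] = HEADER_HINTS,
    body_hints: tuple[str, ...] = BODY_HINTS,
) -> bool:
    """True if any header hint or body hint matches (case-insensitive)."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    for name, needle in header_hints.items():
        if needle in lowered.get(name, ""):
            return True
    body = html.lower()
    return any(hint in body for hint in body_hints)
```

Keeping the hints as parameters lets the fingerprint list evolve without touching the detection logic.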

Nested data structures
Flattening complex curriculums

Curriculums are deeply nested. We flatten sections and modules into relational tables for easy warehouse ingestion.
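Flattening can be sketched as one pass that emits a sections table and an items table linked by section_id, matching the curriculum_outlines field names. The nested input shape (a sections list, each with an items list) is an assumption for illustration.

```python
from typing import Any


def flatten_curriculum(course: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Split a nested course outline into (sections, items) row lists.

    ASSUMED input shape:
    {"course_id": ..., "sections": [{"section_id", "title", "items": [{...}]}]}
    """
    sections: list[dict] = []
    items: list[dict] = []
    for order, sec in enumerate(course.get("sections", []), start=1):
        sections.append({
            "course_id": course["course_id"],
            "section_id": sec["section_id"],
            "section_title": sec["title"],
            "section_order": order,  # position in the outline, 1-based
        })
        for item in sec.get("items", []):
            items.append({
                "course_id": course["course_id"],
                "section_id": sec["section_id"],  # foreign key back to sections
                "item_id": item["item_id"],
                "item_title": item["title"],
                "item_type": item.get("type"),
                "is_free": bool(item.get("is_free", False)),
                "duration_minutes": item.get("duration_minutes"),
            })
    return sections, items
```

Two narrow tables with a shared key load cleanly into any warehouse and can be re-joined with a single JOIN on section_id.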

Rate limit management
Distributing requests safely

Aggressive scraping triggers WAF blocks. We distribute requests across residential proxies with human-like timing.

Applications

Who uses LearnWorlds data

Teams across industries use learnworlds.com data to build competitive products and smarter operations.

01
Market Research

EdTech analysts track course pricing trends, popular categories, and curriculum depth across specific niches.

02
Competitor Intelligence

Course creators monitor competing schools for new curriculum additions, pricing changes, and bundle strategies.

03
Lead Generation

B2B service providers extract instructor profiles and school metadata to build targeted outreach lists.

04
Content Aggregation

eLearning directories consolidate course listings, ratings, and pricing from multiple LearnWorlds instances.

05
Pricing Optimisation

Schools analyse subscription versus one-off payment models across top-performing competitors.

06
AI Training Data

ML teams extract curriculum structures and course descriptions to train educational content generation models.

Why DataFlirt

"The creator economy runs on platforms like LearnWorlds, generating a massive, fragmented dataset of educational content and pricing models."

Extracting data from LearnWorlds requires navigating thousands of distinct, highly customised school themes. Relying on basic HTTP requests fails due to client-side rendering and aggressive caching. DataFlirt manages the JavaScript execution and normalises the output across every school, delivering clean, structured curriculum and pricing data directly to your warehouse.

Technical Spec

LearnWorlds scraper technical capabilities

Everything supported by our learnworlds.com scraper: rendered SPA elements, rate-limit evasion, partial support for login-gated data, and more.

JavaScript rendering · Full Playwright sessions to hydrate pricing and course catalogues · Supported
Custom domain support · Extract data from white-labelled LearnWorlds instances · Supported
Theme standardisation · Normalise data across Site Builder variations · Supported
Curriculum mapping · Extract nested sections and module titles · Supported
Bundle extraction · Map individual courses to subscription tiers · Supported
Residential proxy rotation · Bypass WAF blocks and rate limiting · Supported
Change detection · Emit diffs for pricing or curriculum updates · Supported
Student progress data · Extract completion rates or quiz scores (requires student login) · Partial
Private community posts · Scrape internal social network discussions within gated courses · Partial
Video content extraction · Download hosted SCORM packages or DRM-protected video files · Partial

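Change detection amounts to comparing two snapshots keyed by a stable ID. A minimal sketch, assuming course_id as the key and price and currency as the watched fields (both choices are illustrative defaults):

```python
from typing import Any


def diff_snapshots(old: list[dict[str, Any]], new: list[dict[str, Any]],
                   key: str = "course_id",
                   watch: tuple[str, ...] = ("price", "currency")) -> list[dict[str, Any]]:
    """Emit one event per added, removed, or modified record (by key)."""
    before = {r[key]: r for r in old}
    after = {r[key]: r for r in new}
    events: list[dict[str, Any]] = []
    for k in sorted(after.keys() - before.keys()):
        events.append({"op": "added", key: k})
    for k in sorted(before.keys() - after.keys()):
        events.append({"op": "removed", key: k})
    for k in sorted(before.keys() & after.keys()):
        # Record (old, new) pairs only for watched fields that actually changed.
        changes = {f: (before[k].get(f), after[k].get(f))
                   for f in watch if before[k].get(f) != after[k].get(f)}
        if changes:
            events.append({"op": "changed", key: k, "fields": changes})
    return events
```

Emitting only the changed fields keeps daily diff payloads small even when full catalogues are re-crawled.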
Infrastructure

Infrastructure powering the LearnWorlds pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Handles client-side rendering for LearnWorlds Site Builder themes while maintaining high concurrency.

Residential Proxy Infrastructure

Distributes requests across diverse IP pools to avoid WAF blocks on custom domains.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS, scheduled via Airflow for reliable daily or weekly extraction.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON · Newline-delimited or nested, ideal for complex curriculum structures
CSV · Flat file with typed columns for pricing and catalogues
Parquet · Columnar format for BigQuery, Snowflake, Athena
AWS S3 · Direct bucket delivery on pipeline completion, compatible with any data lake
Webhook · HTTP POST per record for real-time updates
API · RESTful access to extracted course data
XLS · Excel-compatible files for analyst teams
PostgreSQL · Direct database insertion with schema matching

// faq

Common questions.

About learnworlds.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping LearnWorlds legal?

Scraping publicly available course catalogues, pricing, and instructor profiles is generally permissible. DataFlirt extracts only public metadata and does not bypass authentication walls or extract private student information.

Can you scrape custom domains?

Yes. We can target any domain running LearnWorlds infrastructure by identifying the underlying platform fingerprints and applying our standard extraction schema.

How do you handle different school themes?

Our selectors target underlying data structures and JSON payloads rather than visual CSS classes, ensuring consistent output regardless of the school's active theme.

Can you extract gated video content?

No. We focus exclusively on public metadata. We do not bypass DRM or extract proprietary video files.

How fresh is the pricing data?

Pipelines can be configured for daily runs to capture limited-time discounts and subscription changes.

Do you support nested curriculums?

Yes. We extract the full hierarchy of sections, modules, and lessons, delivering it as a relational dataset.

What is the minimum engagement?

We typically start with a defined list of target schools or a specific category of courses. Contact us for scoping.

$ dataflirt scope --new-project --source=learnworlds.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need course catalogues or pricing structures, we build and operate the extraction infrastructure. Tell us your target domains.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h