Teachable Scraper — Course, Pricing & Instructor Data Extraction

Data Dictionary

Every field we extract from teachable.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Course Metadata objects from teachable.com. All fields typed and schema-versioned.

course_id, title, subtitle, instructor_name, category, price_min, price_max, currency, is_published, enrollment_status, thumbnail_url, storefront_url
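As a rough sketch of how a consumer might type this record: the field names come from the list above, while the Python types are assumptions inferred from the sample values in this section. `CourseMetadata` and `missing_fields` are hypothetical names, not part of any published schema.

```python
from typing import Optional, TypedDict


class CourseMetadata(TypedDict, total=False):
    # Field names follow the data dictionary; every type here is an assumption.
    course_id: str
    title: str
    subtitle: Optional[str]
    instructor_name: str
    category: Optional[str]
    price_min: float
    price_max: Optional[float]
    currency: str            # e.g. "USD"
    is_published: bool
    enrollment_status: str   # e.g. "open"
    thumbnail_url: Optional[str]
    storefront_url: str


def missing_fields(record: dict) -> list[str]:
    """Return data-dictionary fields that are absent from a delivered record."""
    return [name for name in CourseMetadata.__annotations__ if name not in record]


# Values copied from the sample record in this section.
sample: CourseMetadata = {
    "course_id": "crs_8921x",
    "title": "Advanced Python Data Engineering",
    "subtitle": "Build scalable data pipelines from scratch",
    "instructor_name": "Jane Doe",
    "price_min": 199.0,
    "currency": "USD",
    "enrollment_status": "open",
    "storefront_url": "https://courses.janedoe.com/p/data-engineering",
}

absent = missing_fields(sample)
```

A check like this is one way to confirm schema fit on a sample delivery before wiring the feed into a warehouse table.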

"course_id": "crs_8921x",
"title": "Advanced Python Data Engineering",
"subtitle": "Build scalable data pipelines from scratch",
"instructor_name": "Jane Doe",
"price_min": 199.0,
"currency": "USD",
"enrollment_status": "open",
"storefront_url": "https://courses.janedoe.com/p/data-engineering"

Complete list of extractable fields for Pricing Plans objects from teachable.com. All fields typed and schema-versioned.

course_id, plan_id, plan_name, plan_type, price, currency, billing_interval, trial_days, installments, is_active

"course_id": "crs_8921x",
"plan_id": "pln_441a",
"plan_name": "Lifetime Access",
"plan_type": "one_time",
"price": 199.0,
"currency": "USD",
"installments": 1,
"is_active": true

Complete list of extractable fields for Curriculum Structure objects from teachable.com. All fields typed and schema-versioned.

course_id, module_id, module_name, lesson_id, lesson_title, is_preview, content_type, duration_seconds, order_index

"course_id": "crs_8921x",
"module_name": "Module 1: Infrastructure",
"lesson_title": "Setting up AWS IAM",
"is_preview": true,
"content_type": "video",
"duration_seconds": 845,
"order_index": 3

Complete list of extractable fields for Instructor Profiles objects from teachable.com. All fields typed and schema-versioned.

instructor_id, name, bio, avatar_url, social_links, total_courses, school_name, joined_date

"instructor_id": "inst_77b2",
"name": "Jane Doe",
"bio": "Ex-FAANG Data Engineer teaching modern data stacks.",
"avatar_url": "https://cdn.teachable.com/avatars/77b2.jpg",
"social_links": "['https://twitter.com/janedoe']",
"total_courses": 4,
"school_name": "Data Engineering Academy"

Complete list of extractable fields for Sales Page Copy objects from teachable.com. All fields typed and schema-versioned.

course_id, headline, description_html, target_audience, requirements, faq_json, testimonials, scraped_at

"course_id": "crs_8921x",
"headline": "Master the Modern Data Stack",
"target_audience": "Software engineers transitioning to data roles",
"requirements": "['Basic Python', 'SQL fundamentals']",
"testimonials": 12,
"scraped_at": "2026-05-12T09:14:33Z"

Capabilities

Everything you need from Teachable — nothing you don't

Our Teachable scraper handles the platform's custom domain mapping, heavily customised storefront themes, dynamic pricing widgets, and curriculum structures — delivering normalised data regardless of how the creator configured their school.

Full Curriculum Extraction

Extract module names, lesson titles, content types, duration metadata, and free preview flags across the entire course syllabus.

Pricing Tier Parsing

Capture one-time payments, subscriptions, payment plans, and bundle pricing accurately, normalising currencies and billing intervals.

Instructor Intelligence

Scrape instructor names, biographies, social links, and cross-reference multiple courses taught by the same creator.

Custom Domain Resolution

Automatically identify and map creators using custom domains back to the underlying Teachable infrastructure for consistent extraction.

Sales Page Copy Mining

Extract headlines, HTML descriptions, FAQs, and testimonials from highly customised sales pages using NLP heuristic matching.

Course Bundle Detection

Identify when courses are sold as bundles and map the parent-child relationships between individual courses and the bundle package.

Multi-Currency Support

Extract pricing data across all supported local currencies, maintaining exact price points and currency codes.

Storefront Discovery

Crawl entire Teachable schools to discover unlisted or newly published courses automatically.

Scheduled Change Detection

Run continuous pipelines to monitor for pricing changes, new course launches, or syllabus updates with hash-based diffing.

// engagement pipeline

From target list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide Teachable school URLs, custom domains, or instructor names. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, and custom domain resolution logic.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and pricing accuracy verification before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
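The null-rate check in the Validation & QA step could look roughly like the sketch below. The function names, the example batch, and the 0.3 threshold are illustrative assumptions rather than the pipeline's actual QA code.

```python
from collections import Counter


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing, None, or empty."""
    if not records:
        return {name: 0.0 for name in fields}
    nulls: Counter = Counter()
    for record in records:
        for name in fields:
            if record.get(name) in (None, ""):
                nulls[name] += 1
    return {name: nulls[name] / len(records) for name in fields}


def fields_over_threshold(rates: dict[str, float], max_null_rate: float) -> list[str]:
    """Names of fields whose null rate exceeds the agreed ceiling."""
    return sorted(name for name, rate in rates.items() if rate > max_null_rate)


# Hypothetical batch: one empty title, three absent or null prices.
batch = [
    {"course_id": "crs_1", "title": "A", "price_min": 199.0},
    {"course_id": "crs_2", "title": "", "price_min": None},
    {"course_id": "crs_3", "title": "C", "price_min": None},
    {"course_id": "crs_4", "title": "D"},
]
rates = null_rates(batch, ["course_id", "title", "price_min"])
failing = fields_over_threshold(rates, max_null_rate=0.3)
```

Running a gate like this before the first delivery makes it possible to hold back a batch whose key fields are mostly empty.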

Under the hood

How our Teachable pipeline handles the hard parts

Extracting data from a platform designed for extreme customisation requires adaptive parsing. Here is how we normalise fragmented storefronts.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

99.9% uptime

142ms p99 latency

0.3% null rate

Custom domains

Resolving vanity URLs to underlying infrastructure

Many top creators use custom domains (e.g., courses.creator.com) rather than teachable.com subdomains. Our pipeline identifies underlying Teachable footprints via HTTP headers and specific DOM structures, allowing us to aggregate data across thousands of independent domains into a single normalised dataset.
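A minimal sketch of footprint detection along these lines follows. The signals are placeholders: the `x-powered-by` header value is hypothetical, and the `cdn.teachable.com` pattern is taken only from the sample avatar URL in the data dictionary. Real markers would have to be confirmed against live storefronts.

```python
import re

# Placeholder signals, not verified Teachable markers.
HEADER_HINTS = {"x-powered-by": "teachable"}               # hypothetical header
DOM_HINTS = [re.compile(r"cdn\.teachable\.com", re.I)]     # from the sample avatar URL


def footprint_score(headers: dict[str, str], html: str) -> int:
    """Count how many header and DOM signals point at a Teachable backend."""
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    score = sum(
        1 for name, needle in HEADER_HINTS.items() if needle in lowered.get(name, "")
    )
    score += sum(1 for pattern in DOM_HINTS if pattern.search(html))
    return score


def is_teachable(headers: dict[str, str], html: str, min_score: int = 1) -> bool:
    """Treat a domain as Teachable-backed once enough signals agree."""
    return footprint_score(headers, html) >= min_score


by_header = is_teachable({"X-Powered-By": "Teachable"}, "<html></html>")
by_dom = is_teachable({}, '<img src="https://cdn.teachable.com/avatars/77b2.jpg">')
unrelated = is_teachable({"Server": "nginx"}, "<html></html>")
```

Scoring several weak signals rather than relying on a single one keeps the check usable when a creator strips or overrides one of them.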

Theme variations

Heuristic parsing for customised layouts

Teachable allows creators to heavily modify their sales pages using custom HTML/CSS blocks. We use heuristic parsing and XPath fallback chains to reliably identify pricing widgets, curriculum lists, and instructor bios regardless of the visual theme applied.
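An XPath fallback chain can be sketched as an ordered list of expressions tried until one yields text. The expressions below are hypothetical examples, not selectors taken from real Teachable themes. The standard-library `xml.etree.ElementTree` is used so the sketch is self-contained; it only accepts well-formed markup, so real pages would need an HTML-tolerant parser such as lxml or parsel.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical expressions, ordered from most to least specific.
PRICE_XPATHS = [
    ".//span[@class='product-price']",   # assumed default theme
    ".//div[@data-role='price']",        # assumed custom block
    ".//*[@itemprop='price']",           # schema.org microdata
]


def first_match(root: ET.Element, xpaths: list[str]) -> Optional[str]:
    """Return the text of the first expression that matches a non-empty node."""
    for xpath in xpaths:
        node = root.find(xpath)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


default_theme = ET.fromstring("<div><span class='product-price'>$199</span></div>")
custom_theme = ET.fromstring("<section><p itemprop='price'> $49 </p></section>")
no_price = ET.fromstring("<div><p>Coming soon</p></div>")

default_price = first_match(default_theme, PRICE_XPATHS)
custom_price = first_match(custom_theme, PRICE_XPATHS)
absent_price = first_match(no_price, PRICE_XPATHS)
```

Returning `None` when every expression misses lets the caller record a null instead of guessing, which keeps null-rate checks meaningful.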

Dynamic pricing

Hydrating JavaScript pricing widgets

Pricing tiers and checkout links are often loaded dynamically via JavaScript based on geo-location or active promotions. We use Playwright to execute these scripts, capturing the true rendered price rather than stale server-side HTML.
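The sketch below shows one way the rendered price might be captured with Playwright's synchronous API and then turned into typed fields. `parse_price`, `rendered_price`, the selector argument, and the symbol-to-currency map are all assumptions for illustration; in particular, mapping "$" to USD is a simplification.

```python
import re
from decimal import Decimal
from typing import Optional

PRICE_RE = re.compile(r"(?P<symbol>[$€£])\s*(?P<amount>\d[\d,]*(?:\.\d+)?)")
SYMBOL_TO_CODE = {"$": "USD", "€": "EUR", "£": "GBP"}  # assumes "$" means USD


def parse_price(text: str) -> Optional[dict]:
    """Pull the first currency amount out of rendered widget text."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    amount = Decimal(match.group("amount").replace(",", ""))
    return {"price": amount, "currency": SYMBOL_TO_CODE[match.group("symbol")]}


def rendered_price(url: str, selector: str) -> Optional[dict]:
    """Render the page in Chromium, wait for the widget, and parse its text."""
    # Imported lazily so parse_price works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector(selector)  # block until the widget exists
            text = page.inner_text(selector)
        finally:
            browser.close()
    return parse_price(text)


parsed = parse_price("Enroll now for $1,299.50")
```

Keeping the text parser separate from the browser session means the parsing rules can be unit-tested without launching a browser.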

Rate limiting

Distributed crawling across residential IPs

Scraping an entire school's catalogue too quickly triggers rate limits. We distribute requests across residential IP pools, managing concurrency and request delays to ensure complete extraction without triggering defensive blocks.
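In Scrapy, concurrency and delay management of this kind is usually expressed through settings. The fragment below uses real Scrapy setting names, but the values are illustrative assumptions, not the pipeline's production configuration, and proxy rotation is left out.

```python
# Illustrative values only; real limits would be tuned per target.
THROTTLE_SETTINGS = {
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,      # cap parallel requests per school
    "DOWNLOAD_DELAY": 1.5,                    # base delay in seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,         # jitter around the base delay
    "AUTOTHROTTLE_ENABLED": True,             # adapt delay to observed latency
    "AUTOTHROTTLE_START_DELAY": 1.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],  # retry on rate-limit responses
    "RETRY_TIMES": 3,
}
```

A dictionary like this can be assigned to a spider's `custom_settings` attribute so that each target school gets its own limits.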

Change detection

Only re-scrape what's changed

For large course catalogues, we maintain a hash index of last-seen values per field. Subsequent runs only push diffs — reducing compute cost and downstream processing load. You get a clean changelog rather than full re-dumps.
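A per-field hash index can be sketched as follows. SHA-256 over a JSON encoding of each value is an assumption chosen for the example; the function names and the sample records are hypothetical.

```python
import hashlib
import json


def field_hashes(record: dict) -> dict[str, str]:
    """Hash each field value so later runs can compare without storing values."""
    return {
        name: hashlib.sha256(
            json.dumps(value, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        for name, value in record.items()
    }


def changed_fields(previous_hashes: dict[str, str], record: dict) -> list[str]:
    """Fields whose hash differs from, or is absent in, the last-seen index."""
    current = field_hashes(record)
    return sorted(
        name for name, digest in current.items() if previous_hashes.get(name) != digest
    )


last_seen = field_hashes({"course_id": "crs_8921x", "price": 199.0, "currency": "USD"})
latest = {"course_id": "crs_8921x", "price": 149.0, "currency": "USD"}
diff = changed_fields(last_seen, latest)
```

This sketch only reports fields present in the latest record; detecting fields that disappeared would need a second pass over the previous index.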

Applications

Who uses Teachable data — and how

Teams across industries use teachable.com data to build competitive products and smarter operations.

Creator Discovery & Sponsorships

MCNs, agencies, and brands identify successful course creators for partnership opportunities based on catalogue size and pricing tiers.

Pricing Strategy Analysis

EdTech platforms and independent creators monitor competitor pricing models, subscription vs one-time ratios, and bundle strategies.

Market Research

Analysts track trending course topics, curriculum density, and category saturation to identify whitespace in the eLearning market.

Course Aggregators

Review sites and course aggregators build search indexes by normalising metadata across thousands of independent Teachable schools.

AI Curriculum Training

LLM developers use structured syllabus data (modules, lesson titles, sequencing) to train educational planning and curriculum generation models.

Lead Generation

B2B SaaS companies targeting the creator economy build highly qualified prospect lists based on course volume and pricing tiers.

Technical Spec

Teachable scraper — technical capabilities

What our teachable.com scraper supports, and where it stops: rendered SPA elements, custom domains, rate-limit handling, and the limits around login-walled content.

JavaScript rendering

Full Playwright sessions — required for dynamic pricing widgets and lazy-loaded curricula

Supported

Custom domain resolution

Identifies and extracts from white-labelled Teachable instances

Supported

Curriculum mapping

Nested extraction of modules, lessons, and preview status

Supported

Pricing tier extraction

Captures one-time, subscription, and multi-payment plans

Supported

Instructor bio parsing

Extracts text, avatars, and social links from custom layouts

Supported

Change detection (diffs)

Hash-based diff: only emit records with changed fields since last run

Supported

Webhook delivery

HTTP POST per record or batch — useful for real-time workflows

Supported

Gated course content

Video files, PDFs, and text inside paid lessons sit behind a login wall and are not extracted; only the public curriculum structure is captured

Partial

Student enrollment numbers

Exact student counts and progress metrics are private to the creator

Partial

Infrastructure

Infrastructure powering the Teachable pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, Terraform, Celery

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and interaction flows. The two are combined via the scrapy-playwright download handler.
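For orientation, wiring scrapy-playwright into a Scrapy project typically comes down to two settings plus a per-request flag. The setting names and handler path follow the scrapy-playwright documentation; `playwright_meta` is a hypothetical helper added for this sketch.

```python
# Project settings that route requests through scrapy-playwright.
SCRAPY_PLAYWRIGHT_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}


def playwright_meta(render: bool) -> dict:
    """Request meta that opts a single request into browser rendering."""
    return {"playwright": True} if render else {}


pricing_page_meta = playwright_meta(render=True)
static_page_meta = playwright_meta(render=False)
```

Passing `meta=playwright_meta(True)` only on requests that need JavaScript keeps static pages on Scrapy's cheaper default downloader.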

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US/UK/EU regions. Rotation happens per-request with sticky sessions where required. IP reputation monitoring keeps blacklisted addresses out of the pool.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

XLS

Excel format for business analyst teams

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

API

REST endpoint to query your extracted datasets

PostgreSQL

Upsert into your existing schema with conflict resolution

BigQuery

Streamed directly into your dataset with schema auto-detect

Snowflake

Stage + COPY INTO workflow — incremental or full-replace
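The upsert-with-conflict-resolution delivery can be illustrated with an `INSERT ... ON CONFLICT ... DO UPDATE` statement. The sketch uses the standard-library `sqlite3` module as a stand-in so it runs anywhere. PostgreSQL accepts the same `ON CONFLICT` clause, though a driver such as psycopg uses a different placeholder style. The table layout and function name are assumptions.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO pricing_plans (plan_id, course_id, plan_name, price, currency)
VALUES (:plan_id, :course_id, :plan_name, :price, :currency)
ON CONFLICT (plan_id) DO UPDATE SET
    course_id = excluded.course_id,
    plan_name = excluded.plan_name,
    price = excluded.price,
    currency = excluded.currency
"""


def upsert_plans(conn: sqlite3.Connection, plans: list[dict]) -> None:
    """Insert new plans and overwrite existing rows that share a plan_id."""
    conn.executemany(UPSERT_SQL, plans)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pricing_plans ("
    "plan_id TEXT PRIMARY KEY, course_id TEXT, plan_name TEXT, price REAL, currency TEXT)"
)
plan = {
    "plan_id": "pln_441a",
    "course_id": "crs_8921x",
    "plan_name": "Lifetime Access",
    "price": 199.0,
    "currency": "USD",
}
upsert_plans(conn, [plan])
upsert_plans(conn, [{**plan, "price": 149.0}])  # second run carries a price change
rows = conn.execute("SELECT plan_id, price FROM pricing_plans").fetchall()
```

Keying the conflict on `plan_id` means a repeat delivery updates the existing row instead of creating a duplicate.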

// faq

Common questions.

About teachable.com scraping, legality, and pipeline operations.

Is scraping Teachable legal?

Scraping publicly available information from Teachable storefronts is generally permissible. DataFlirt targets only public, non-authenticated course metadata, pricing, and curriculum structures. We do not extract gated paid content, student PII, or circumvent authentication walls. Clients should consult legal counsel for specific use cases.

Can you extract data from Teachable schools using custom domains?

Yes. Our pipeline identifies the underlying Teachable infrastructure via network fingerprints, allowing us to scrape and normalise data from custom domains (e.g., courses.creator.com) exactly as we would from a teachable.com subdomain.

Do you extract the actual course videos or PDFs?

No. We only extract the public-facing curriculum structure (module names, lesson titles, duration, and free preview status). The actual paid content remains behind a login wall and is not supported.

How do you handle highly customised sales pages?

Teachable allows heavy theme customisation. We use heuristic parsing, NLP matching, and multi-layer XPath fallback chains to identify pricing widgets, instructor bios, and FAQs regardless of the specific visual theme applied by the creator.

How fresh is the data?

For continuous monitoring, pipelines can run daily or weekly to detect new course launches and pricing updates. Full catalogue refreshes complete within a few hours depending on the target list size.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 50 Teachable schools as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality before signing any contract.

Teachable data,
at warehouse scale.

Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
