
CV-Library data,
at warehouse scale.

We extract job listings, salary signals, recruiter profiles, and employer data from CV-Library. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Jobs extracted: 145K/day
Salary updates: 89K/24h
Company profiles: 12K/run
Active pipelines: 42
Uptime: 99.98%

Data Dictionary

Every field we extract from cv-library.co.uk

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Job Postings objects from cv-library.co.uk. All fields typed and schema-versioned.

Fields: job_id, title, company_name, location, salary_min, salary_max, salary_type, job_type, posted_date, description, remote_flag, apply_url

Sample job_postings record (200 OK):
{
  "job_id": "21849201",
  "title": "Senior Python Developer",
  "company_name": "TechCorp UK",
  "location": "London",
  "salary_min": 75000.0,
  "salary_max": 90000.0,
  "salary_type": "per annum",
  "remote_flag": true,
  "job_type": "Permanent"
}

Complete list of extractable fields for Salary Data objects from cv-library.co.uk. All fields typed and schema-versioned.

Fields: job_id, title, sector, salary_raw, salary_min, salary_max, currency, period, benefits, tax_estimated

Sample salary_data record (200 OK):
{
  "job_id": "21849201",
  "title": "Senior Python Developer",
  "salary_raw": "£75,000 - £90,000/annum + Bonus",
  "salary_min": 75000.0,
  "salary_max": 90000.0,
  "currency": "GBP",
  "period": "annual",
  "benefits": "Bonus, Pension, Healthcare"
}

Complete list of extractable fields for Company Profiles objects from cv-library.co.uk. All fields typed and schema-versioned.

Fields: company_id, name, industry, description, logo_url, website, active_jobs_count, headquarters, size

Sample company_profiles record (200 OK):
{
  "company_id": "C9842",
  "name": "TechCorp UK",
  "industry": "Information Technology",
  "active_jobs_count": 14,
  "headquarters": "London, UK",
  "website": "https://techcorp.co.uk",
  "size": "501-1000"
}

Complete list of extractable fields for Recruiter Intelligence objects from cv-library.co.uk. All fields typed and schema-versioned.

Fields: recruiter_id, name, agency_name, agency_url, total_postings, contact_phone, contact_email, sector_focus

Sample recruiter_intelligence record (200 OK):
{
  "recruiter_id": "R4921",
  "name": "Sarah Jenkins",
  "agency_name": "TechTalent Partners",
  "total_postings": 42,
  "contact_phone": "+44 20 7946 0958",
  "sector_focus": "Software Engineering",
  "agency_url": "https://cv-library.co.uk/agency/techtalent"
}

Complete list of extractable fields for Search Results objects from cv-library.co.uk. All fields typed and schema-versioned.

Fields: keyword, location, radius, position, job_id, title, company, salary_preview, sponsored_flag, scraped_at

Sample search_results record (200 OK):
{
  "keyword": "data engineer",
  "location": "Manchester",
  "position": 3,
  "job_id": "21849555",
  "sponsored_flag": false,
  "salary_preview": "£60,000/annum",
  "scraped_at": "2026-05-12T09:14:33Z"
}

Capabilities

Complete CV-Library data extraction

Our pipeline handles the complexities of job board scraping, including pagination limits, location normalisation, salary text parsing, and bot mitigation.

Full Job Listing Extraction

Capture title, raw HTML description, job type, location, and application URLs for every listing on the platform.

Salary Normalisation

Parse unstructured salary strings into min, max, currency, and period fields for direct quantitative analysis.

Agency vs Direct Employer

Identify whether a listing is posted by a recruitment agency or a direct employer, including agency contact details.

Location Mapping

Extract and normalise location data, including remote work flags and regional categorisation.

Expired Listing Detection

Monitor active URLs to detect exactly when a job is removed, providing precise time-to-fill metrics.

Keyword Search Tracking

Track organic and sponsored search positions for specific job titles across target UK postcodes.

Company Profile Scraping

Extract employer descriptions, active job counts, and metadata from dedicated company pages.

Incremental Change Detection

Maintain a hash index of active jobs. Subsequent runs only push new listings or status changes.

High-Frequency Updates

Configure continuous pipelines at hourly or daily cadences to capture fast-moving contract roles.

// engagement pipeline

From search criteria to structured records

Brief in. Clean data out.

Define Scope
Day 0

Provide target keywords, sectors, locations, or specific employer IDs. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, and parsing logic specific to CV-Library DOM structures.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and salary parsing accuracy verification before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
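
The null-rate check mentioned under Validation & QA could look roughly like the sketch below. It is an illustration only, not DataFlirt's actual QA code: the function names and the 10% threshold in the usage note are assumptions, and a value counts as null when it is missing, None, or an empty string.

```python
from typing import Iterable


def null_rates(records: list[dict], fields: Iterable[str]) -> dict[str, float]:
    """Return the share of records in which each field is missing or empty."""
    fields = list(fields)
    if not records:
        return {field: 0.0 for field in fields}
    total = len(records)
    return {
        field: sum(1 for rec in records if rec.get(field) in (None, "")) / total
        for field in fields
    }


def failing_fields(rates: dict[str, float], threshold: float) -> list[str]:
    """List the fields whose null rate exceeds the (hypothetical) threshold."""
    return [field for field, rate in rates.items() if rate > threshold]
```

For example, calling failing_fields(null_rates(batch, ["job_id", "salary_min"]), 0.10) on a sample batch would list the fields that need attention before a full launch.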

Under the hood

Handling job board scraping challenges

Job boards employ strict rate limits and pagination caps. Here is how our infrastructure guarantees complete data capture.

pipeline-monitor · cv-library.co.uk · live · active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Pagination limits
Bypassing hard search caps

CV-Library limits search results to a fixed number of pages. We bypass this by programmatically bisecting searches using granular location radii and salary bands to ensure 100% capture of large categories.
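
As a rough illustration of the bisection idea, the sketch below splits a salary band in half whenever a search would return more results than the cap allows. Everything here is an assumption made for the example: count_results stands in for whatever call reports the number of hits for a band, and cap and min_width are placeholder parameters. Location radius could be bisected in the same way.

```python
from typing import Callable, Iterator


def bisect_bands(
    low: int,
    high: int,
    count_results: Callable[[int, int], int],
    cap: int,
    min_width: int = 1000,
) -> Iterator[tuple[int, int]]:
    """Yield half-open [low, high) salary bands whose result count fits under cap."""
    total = count_results(low, high)
    if total == 0:
        return  # nothing to crawl in this band
    if total <= cap or high - low <= min_width:
        # Either the band fits, or it cannot be split further on salary alone;
        # in the latter case another dimension (e.g. location) would be needed.
        yield (low, high)
        return
    mid = (low + high) // 2
    yield from bisect_bands(low, mid, count_results, cap, min_width)
    yield from bisect_bands(mid, high, count_results, cap, min_width)
```

Each yielded band can then be crawled as its own search, so no single query runs into the page limit.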

Salary parsing
Regex-driven normalisation

Salary data is often entered as free text (e.g., '£45k - £50k + OTE'). Our parsing layer uses compiled regex patterns to extract absolute numeric values, standardise currencies, and normalise time periods.
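
A minimal sketch of this kind of regex-driven normalisation follows. The patterns are illustrative assumptions, not the production rule set: they handle pound amounts with an optional "k" suffix and a few common period words, and would misread a string that mixes base pay with a separate bonus figure.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration; real salary text needs many more cases.
_AMOUNT = re.compile(r"£\s*(\d[\d,]*(?:\.\d+)?)\s*(k\b)?", re.IGNORECASE)
_PERIODS = {
    "annual": re.compile(r"\b(?:annum|year|yearly|pa)\b|p\.a\.", re.IGNORECASE),
    "daily": re.compile(r"\b(?:day|daily)\b", re.IGNORECASE),
    "hourly": re.compile(r"\b(?:hour|hourly|hr)\b", re.IGNORECASE),
}


@dataclass(frozen=True)
class ParsedSalary:
    salary_raw: str
    salary_min: Optional[float]
    salary_max: Optional[float]
    currency: Optional[str]
    period: Optional[str]


def parse_salary(raw: str) -> ParsedSalary:
    """Extract min/max amounts, currency, and period from a free-text salary."""
    amounts = []
    for digits, k_suffix in _AMOUNT.findall(raw):
        value = float(digits.replace(",", ""))
        if k_suffix:
            value *= 1000  # "45k" means 45,000
        amounts.append(value)
    period = next(
        (name for name, pattern in _PERIODS.items() if pattern.search(raw)), None
    )
    return ParsedSalary(
        salary_raw=raw,
        salary_min=min(amounts) if amounts else None,
        salary_max=max(amounts) if amounts else None,
        currency="GBP" if "£" in raw else None,
        period=period,
    )
```

Keeping salary_raw alongside the parsed values means any misparse can be audited against the original string.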

Anti-bot layer
UK residential proxy rotation

We route requests through UK-based residential ISP proxies to avoid geographic blocking and rate-limiting heuristics common to major UK job boards.

Change detection
Only re-scrape what changed

For large daily runs, we maintain a state store of all active job IDs. The pipeline only emits net-new jobs, modified listings, and expired flags, reducing downstream processing costs.
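
The state-store comparison can be sketched as below, assuming the store is a simple mapping from job_id to a content hash. The function names and event shape are hypothetical. Volatile fields such as scraped_at would need to be dropped before hashing, otherwise every record would look modified.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable SHA-256 digest of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_run(
    previous: dict[str, str], current: list[dict]
) -> tuple[list[dict], dict[str, str]]:
    """Compare this run against stored hashes; return (events, new_state)."""
    events = []
    new_state = {}
    for record in current:
        job_id = record["job_id"]
        digest = fingerprint(record)
        new_state[job_id] = digest
        if job_id not in previous:
            events.append({"job_id": job_id, "status": "new"})
        elif previous[job_id] != digest:
            events.append({"job_id": job_id, "status": "modified"})
    # Anything known before but absent now is treated as expired.
    for job_id in sorted(previous.keys() - new_state.keys()):
        events.append({"job_id": job_id, "status": "expired"})
    return events, new_state
```

Unchanged listings produce no event at all, which is where the downstream saving comes from.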

Schema stability
Resilient selectors

We use multiple fallback chains for critical fields like salary and job type, ensuring that minor A/B tests on the CV-Library frontend do not break your data feed.
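
A fallback chain can be pictured as an ordered list of extractors tried until one returns a value. The sketch below uses plain regular expressions to stay dependency-free; a Scrapy pipeline would more likely use CSS or XPath selectors. The markup in SALARY_CHAIN is entirely hypothetical and does not describe the real CV-Library page structure.

```python
import re
from typing import Callable, Optional, Sequence

Extractor = Callable[[str], Optional[str]]


def regex_extractor(pattern: str) -> Extractor:
    """Build an extractor that returns the first capture group, stripped."""
    compiled = re.compile(pattern, re.DOTALL)

    def extract(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return extract


def first_match(html: str, chain: Sequence[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extractor in chain:
        value = extractor(html)
        if value:
            return value
    return None


# Hypothetical layouts: primary markup first, then progressively looser fallbacks.
SALARY_CHAIN = [
    regex_extractor(r'<dd class="job__salary">(.*?)</dd>'),
    regex_extractor(r'data-salary="([^"]*)"'),
    regex_extractor(r'"baseSalary"\s*:\s*"([^"]*)"'),
]
```

If a frontend experiment changes the primary markup, the later entries in the chain still have a chance to recover the field.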

Applications

Who uses CV-Library data

Teams across industries use cv-library.co.uk data to build competitive products and smarter operations.

01
Labour Market Analysis

Economic researchers and hedge funds track hiring volume across UK regions and sectors as a leading indicator of economic health.

02
Salary Benchmarking

HR platforms aggregate real-time salary bands by job title and location to provide accurate compensation guidance to employers.

03
B2B Lead Generation

SaaS companies identify businesses actively hiring for specific roles (e.g., hiring a CRM manager signals intent to buy CRM software).

04
Competitor Tracking

Enterprise talent teams monitor competitor hiring velocity and strategic role openings to anticipate product roadmaps.

05
Job Board Aggregation

Niche job boards backfill their inventory by programmatically importing relevant listings from major generalist boards.

06
Recruitment Agency Intelligence

Agencies track which direct employers are struggling to fill roles, identifying high-probability targets for their services.

Why DataFlirt

"CV-Library holds one of the most comprehensive datasets of UK hiring intent and salary trends, but extracting it reliably requires bypassing strict pagination caps and parsing highly unstructured text."

Building a scraper for a major job board is trivial. Maintaining it at scale is not. Rate limits, layout changes, and complex text parsing require constant engineering attention. DataFlirt manages the entire extraction lifecycle, delivering clean, query-ready data so your team can focus on analytics rather than maintenance.

Technical Spec

CV-Library scraper technical capabilities

Everything supported by our cv-library.co.uk scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

UK Proxy Routing (Supported)
Requests routed exclusively through UK residential IPs to avoid geo-blocks

Pagination Bypass (Supported)
Automated search bisection to capture listings beyond the 100-page limit

Salary Normalisation (Supported)
Extraction of min/max values from unstructured text strings

Agency Detection (Supported)
Boolean flags separating direct employers from recruitment agencies

Remote Work Classification (Supported)
Identification of fully remote or hybrid roles based on metadata and text

Change Detection (Diffs) (Supported)
Hash-based diffing to emit only new, updated, or expired listings

Webhook Delivery (Supported)
HTTP POST per record for real-time alerting on specific keywords

Candidate CV Downloads (Partial)
Access to the CV database requires a paid employer account and login

1-Click Apply Execution (Partial)
Automating job applications requires authenticated user sessions

Infrastructure

Infrastructure powering the pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy Extraction

Scrapy handles high-throughput crawl orchestration, deduplication, and retry logic, optimising for speed on static HTML pages.

Proxy Infrastructure

We maintain pools of UK residential ISP proxies. Rotation happens per-request to distribute load and avoid IP reputation degradation.

Cloud-Native Orchestration

Pipelines run on Kubernetes. Airflow handles scheduling and dependency management. All state is stored in managed PostgreSQL.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested arrays
CSV
Flat file with typed columns
XLS
Excel compatible format for analyst teams
Parquet
Columnar format for data warehouse ingestion
AWS S3
Direct bucket delivery, compatible with any data lake
Webhook
HTTP POST per record for real-time workflows
API
REST endpoints to query your extracted dataset
Postgres
Direct database upserts
// faq

Common questions.

About cv-library.co.uk scraping, legality, and pipeline operations.

Is scraping CV-Library legal?

Scraping publicly available job listings is generally permissible. DataFlirt targets only public, non-authenticated job and company data. We do not extract personal candidate data or CVs. Clients should review terms of service and consult legal counsel for specific use cases.

Can you download candidate CVs?

No. Candidate CVs are gated behind employer logins and subject to strict data protection regulations (GDPR). We only extract public job postings and employer/agency metadata.

How do you handle unstructured salary text?

Our parsing engine uses regular expressions to identify currency symbols, numeric ranges, and time periods (e.g., hourly, annual). We output both the raw string and the normalised min/max numerical values.

How fresh is the data?

Pipelines can be configured to run hourly for specific high-priority searches, or daily for full category sweeps. Incremental runs complete quickly by only checking known active URLs and new search pages.

How do you track expired jobs?

We maintain a state database of all active job URLs. During a run, we verify the HTTP status and page content of known URLs. If a listing redirects or displays an expired notice, we flag it as closed in the output.
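
The check described above might be sketched as follows. The marker phrases are placeholders rather than the actual wording CV-Library shows, and treating 404 and 410 responses as expired is an added assumption. The function only classifies a response that has already been fetched.

```python
# Hypothetical notice texts; the real wording on the site may differ.
EXPIRED_MARKERS = ("this job has expired", "no longer available")


def is_expired(status: int, original_url: str, final_url: str, body: str) -> bool:
    """Decide whether a previously active listing should be flagged as closed."""
    if status in (404, 410):
        return True  # assumption: a missing or gone page means the job was removed
    if final_url.rstrip("/") != original_url.rstrip("/"):
        return True  # the listing URL redirected somewhere else
    text = body.lower()
    return any(marker in text for marker in EXPIRED_MARKERS)
```

Recording the time of the first run in which is_expired returns True gives the removal timestamp for that listing.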

What is the minimum viable engagement?

Our smallest packages start at a defined set of search parameters or categories with daily delivery. We price based on data volume, extraction complexity, and delivery frequency.

Can I request a sample dataset?

Yes. We provide a sample run of up to 1,000 job records as part of the scoping process. This allows you to validate our salary parsing accuracy and schema fit before committing.

$ dataflirt scope --new-project --source=cv-library.co.uk ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of specific tech roles or a full sweep of the UK job market, we build and operate the infrastructure. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h