We extract yield curve rates, OFAC SDN lists, foreign exchange rates, and public debt figures from treasury.gov, and deliver them as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Yield Curve Rates objects from treasury.gov. All fields typed and schema-versioned.
"date": "2023-10-24", "1_mo": 5.54, "3_mo": 5.59, "1_yr": 5.42, "10_yr": 4.84, "30_yr": 4.96, "scraped_at": "2023-10-24T18:00:00Z"
| # | date | 1_mo | 2_mo | 3_mo | 6_mo | 1_yr |
|---|---|---|---|---|---|---|
Complete list of extractable fields for OFAC SDN Sanctions objects from treasury.gov. All fields typed and schema-versioned.
"entity_id": "2674", "name": "KIM, Jong Un", "type": "individual", "programs": "DPRK2", "nationalities": "Democratic People's Republic of Korea", "remarks": "(Linked To: WORKERS' PARTY OF KOREA)", "scraped_at": "2023-10-24T18:05:00Z"
| # | entity_id | name | type | programs | nationalities | citizenships |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Public Debt objects from treasury.gov. All fields typed and schema-versioned.
"record_date": "2023-10-23", "debt_held_by_public": 26543128491.42, "intragovernmental_holdings": 7084918231.11, "total_public_debt_outstanding": 33628046722.53, "reporting_frequency": "Daily", "scraped_at": "2023-10-24T18:10:00Z"
| # | record_date | debt_held_by_public | intragovernmental_holdings | total_public_debt_outstanding | source_system | reporting_frequency |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Exchange Rates objects from treasury.gov. All fields typed and schema-versioned.
"record_date": "2023-09-30", "country": "India", "currency": "Rupee", "exchange_rate": 83.12, "effective_date": "2023-09-30", "quarter": 3, "year": 2023
| # | record_date | country | currency | exchange_rate | effective_date | source_document |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Treasury Bulletins objects from treasury.gov. All fields typed and schema-versioned.
"publication_date": "2023-09-01", "title": "Treasury Bulletin September 2023", "document_url": "https://fiscal.treasury.gov/files/bulletin/b2023-3.pdf", "document_type": "PDF", "file_size_bytes": 4194304, "parsed_tables_count": 42
| # | publication_date | title | document_url | document_type | file_size_bytes | authoring_bureau |
|---|---|---|---|---|---|---|
Our treasury.gov scraper handles every layer of the platform: legacy HTML tables, complex PDF extraction, inconsistent XML feeds, and daily financial updates.
Daily interest rates and historical backfills for 1-month to 30-year maturity periods, parsed directly from Treasury data feeds.
Extract and normalise the Specially Designated Nationals (SDN) list, capturing aliases, addresses, and linked entities.
Daily public debt figures, separating debt held by the public from intragovernmental holdings, timestamped per release.
Quarterly and daily foreign currency exchange rates published by the Treasury for reporting purposes.
Navigate 1990s table structures and nested DOM elements that lack modern CSS classes or IDs.
OCR and table parsing for official Treasury Bulletins and Monthly Treasury Statements.
Standardising disparate Treasury formats into a unified schema for downstream ingestion.
Run pipelines immediately after daily market close or official Treasury publication times.
Extract yield curve and debt data going back to 1990 for comprehensive macroeconomic modelling.
Ensure financial figures match expected types, preventing strings from breaking your numerical models.
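As a rough illustration of the type check described above, here is a minimal Python sketch that validates a yield curve record shaped like the sample on this page. The function name `validate_yield_record` is hypothetical and this is not our production validator; it only shows the idea of rejecting string-typed rates before they reach a numerical model.

```python
from datetime import date


def validate_yield_record(record: dict) -> dict:
    """Return a typed copy of a yield curve record, or raise on bad types.

    Assumes every key other than "date" and "scraped_at" holds a rate.
    """
    typed = {"date": date.fromisoformat(record["date"])}
    for key, value in record.items():
        if key in ("date", "scraped_at"):
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{key} must be numeric, got {value!r}")
        typed[key] = float(value)
    return typed


sample = {"date": "2023-10-24", "1_mo": 5.54, "10_yr": 4.84}
typed_sample = validate_yield_record(sample)
```

A record such as `{"date": "2023-10-24", "1_mo": "5.54"}` would raise `TypeError` instead of passing a string downstream.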
Brief in. Clean data out.
Provide specific datasets, historical ranges, or document types. We design the extraction schema together.
We configure Scrapy crawlers, PDF parsing modules, and normalisation logic for treasury.gov.
Schema validation, null-rate checks, format normalisation, and sample exports before full launch.
JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
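To make the null-rate check in the validation step concrete, here is a small Python sketch. The helper names and the 5% threshold are assumptions chosen for illustration, not our actual QA configuration.

```python
def null_rates(records: list, fields: list) -> dict:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def failing_fields(records: list, fields: list, max_null_rate: float = 0.05) -> list:
    """Fields whose null rate exceeds the (illustrative) threshold."""
    rates = null_rates(records, fields)
    return sorted(f for f, rate in rates.items() if rate > max_null_rate)


batch = [
    {"date": "2023-10-24", "1_mo": 5.54},
    {"date": "2023-10-25", "1_mo": None},
]
rates = null_rates(batch, ["date", "1_mo"])
```

In a sample export, any field returned by `failing_fields` would be reviewed before the full launch.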
Government websites present unique challenges. Here is how we stay resilient and why teams choose managed infrastructure over DIY.
Many treasury.gov pages rely on table-based layouts built decades ago. Our selector strategy uses structural mapping and text-pattern matching to extract data reliably, even when standard CSS selectors fail.
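One way to picture text-pattern matching on legacy tables: rather than relying on CSS classes or IDs, collect every table on the page and pick the one whose header row matches a known label. The sketch below uses only the Python standard library. `TableCollector` and `find_table` are hypothetical names, and the HTML in the example is invented for illustration rather than copied from treasury.gov.

```python
import re
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows; each row is a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []  # completed tables, in the order they close
        self._open = []   # stack of tables still being parsed (handles nesting)

    def _close_cell(self):
        frame = self._open[-1]
        if frame["cell"] is not None:
            frame["row"].append(" ".join("".join(frame["cell"]).split()))
            frame["cell"] = None

    def _close_row(self):
        self._close_cell()
        frame = self._open[-1]
        if frame["row"] is not None:
            frame["rows"].append(frame["row"])
            frame["row"] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append({"rows": [], "row": None, "cell": None})
        elif not self._open:
            return
        elif tag == "tr":
            self._close_row()  # legacy markup often omits </tr>
            self._open[-1]["row"] = []
        elif tag in ("td", "th") and self._open[-1]["row"] is not None:
            self._close_cell()  # and often omits </td> too
            self._open[-1]["cell"] = []

    def handle_data(self, data):
        if self._open and self._open[-1]["cell"] is not None:
            self._open[-1]["cell"].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table":
            self._close_row()
            self.tables.append(self._open.pop()["rows"])


def find_table(html: str, header_pattern: str):
    """Return the first table whose first row has a cell matching the pattern."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    pattern = re.compile(header_pattern, re.IGNORECASE)
    for rows in parser.tables:
        if rows and any(pattern.search(cell) for cell in rows[0]):
            return rows
    return None


# Invented example page: a layout table followed by a data table.
page = (
    "<table><tr><td>Menu</td></tr></table>"
    "<table><tr><th>Date</th><th>1 Mo</th></tr>"
    "<tr><td>10/24/2023</td><td>5.54</td></tr></table>"
)
rates_table = find_table(page, r"^1\s*mo$")
```

Matching on header text means the lookup keeps working when a page is restyled, as long as the column labels stay recognisable.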
Treasury Bulletins are often published solely as PDFs. We use advanced OCR and spatial table parsing to convert these documents into structured, queryable data.
Government data feeds frequently change their XML structures without warning. Our pipelines include schema drift detection and fallback logic to maintain data continuity.
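Schema drift detection can be as simple as comparing the element names a feed actually contains against the names the pipeline expects. The following Python sketch assumes a hypothetical feed shape (`<feed><entry>...</entry></feed>`) and hypothetical field names; real Treasury feeds use their own element names and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical expected field names, for illustration only.
EXPECTED_FIELDS = frozenset({"date", "rate_1mo", "rate_10yr"})


def observed_fields(xml_text: str) -> set:
    """Collect the tag names of every child of every entry in the feed.

    Namespaced tags would appear as "{namespace}tag" and need stripping.
    """
    root = ET.fromstring(xml_text)
    return {child.tag for entry in root for child in entry}


def detect_drift(xml_text: str, expected=EXPECTED_FIELDS) -> dict:
    """Report fields that disappeared and fields that newly appeared."""
    observed = observed_fields(xml_text)
    return {
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


feed = (
    "<feed><entry><date>2023-10-24</date>"
    "<rate_1month>5.54</rate_1month><rate_10yr>4.84</rate_10yr></entry></feed>"
)
drift = detect_drift(feed)
```

Here a renamed element shows up as one missing and one unexpected field, which is the kind of signal that could trigger an alert or a fallback parser.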
We implement intelligent request pacing and concurrency limits to respect treasury.gov infrastructure while ensuring timely data extraction.
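In Scrapy, pacing and concurrency limits are usually expressed as settings. The dictionary below is a sketch of what such a configuration could look like. The setting names are standard Scrapy settings, but every value is an illustrative assumption, not the configuration we run in production.

```python
# Illustrative values only; tune per target and per agreed cadence.
PACING_SETTINGS = {
    "ROBOTSTXT_OBEY": True,                  # honour robots.txt rules
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,     # cap parallel requests per domain
    "DOWNLOAD_DELAY": 1.0,                   # base delay between requests (seconds)
    "AUTOTHROTTLE_ENABLED": True,            # adapt delay to observed latency
    "AUTOTHROTTLE_START_DELAY": 1.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,  # average parallel requests to aim for
    "RETRY_TIMES": 3,                        # bounded retries on transient errors
}
```

A spider could apply these by assigning the dictionary to its `custom_settings` class attribute.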
Financial figures are often presented with commas, currency symbols, or text descriptors like 'Trillion'. Our normalisation layer cleanses these inputs into strict numerical types.
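The normalisation described above can be sketched in a few lines of Python. The function name `normalise_amount`, the scale-word table, and the treatment of parentheses as negatives are assumptions made for this example. `Decimal` is used instead of `float` so that cent-level figures are not rounded.

```python
import re
from decimal import Decimal

# Hypothetical scale words; extend as the source formats require.
_SCALE = {
    "thousand": Decimal("1e3"),
    "million": Decimal("1e6"),
    "billion": Decimal("1e9"),
    "trillion": Decimal("1e12"),
}
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def normalise_amount(raw: str) -> Decimal:
    """Turn strings like '$33.6 Trillion' or '26,543,128,491.42' into Decimals."""
    text = raw.strip().lower()
    # Assumption: accounting-style parentheses denote a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    cleaned = text.replace(",", "").replace("$", "")
    match = _NUMBER.search(cleaned)
    if match is None:
        raise ValueError(f"no numeric value found in {raw!r}")
    value = Decimal(match.group())
    for word, factor in _SCALE.items():
        if word in cleaned:
            value *= factor
            break
    return -value if negative else value


total = normalise_amount("$33.6 Trillion")
exact = normalise_amount("26,543,128,491.42")
```

Inputs with no recognisable number raise `ValueError` rather than silently becoming zero.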
Quantitative analysts use daily yield curves to build interest rate models and forecast economic cycles.
Financial institutions automate checks against the OFAC SDN list to ensure regulatory compliance.
Corporate treasuries track official exchange rates for accounting and risk mitigation purposes.
Think tanks and researchers monitor public debt trends and intragovernmental holdings.
Traders benchmark corporate bonds and mortgage rates against Treasury yield curves.
Academics analyse 30-year interest rate cycles using our complete historical backfills.
"The US Treasury holds the foundational metrics for global finance, but accessing decades of yield curves and debt figures requires navigating a maze of legacy HTML and PDFs."
Most quantitative teams waste weeks building parsers for treasury.gov. The site mixes modern APIs with 1990s table layouts, inconsistent XML feeds, and complex PDF bulletins. DataFlirt abstracts this chaos. We handle the extraction, normalisation, and schema validation, delivering clean financial time-series directly to your data warehouse.
Everything supported by our treasury.gov scraper: legacy HTML tables, PDF bulletins, XML feeds, paced requests, and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Custom Scrapy pipelines designed specifically to navigate and parse 1990s HTML table structures that lack modern identifiers.
Advanced OCR and spatial parsing modules to extract structured tables from official Treasury reports and bulletins.
Pipelines run on AWS Lambda and ECS. Airflow handles daily scheduling immediately following official Treasury publication times.
Data delivered to where your team already works — no new tooling required.
About treasury.gov scraping, legality, and pipeline operations.
Yes. Treasury.gov publishes public government data intended for transparency and public use. DataFlirt extracts only publicly available information and adheres to reasonable request limits to avoid disrupting government infrastructure.
Our pipelines are scheduled to run immediately after the Treasury publishes daily updates, typically delivering data to your warehouse within minutes of official publication.
Yes. We extract and normalise the entire Specially Designated Nationals list, resolving complex nested data such as multiple aliases, addresses, and linked entities into a clean relational schema.
Yes. We utilise OCR and spatial table parsing to extract structured numerical data from Treasury Bulletins and Monthly Treasury Statements published in PDF format.
We can backfill yield curve and public debt data as far back as the Treasury provides it on their domain, which for many datasets extends to 1990.
We monitor pipelines continuously. If a legacy table structure or XML schema changes, our alerting system flags the anomaly, and our engineers update the selectors to restore data flow.
Yes. While we specialise in pushing data to your warehouse via S3, BigQuery, or Snowflake, we can also expose the extracted datasets via a managed REST API.
20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a daily feed of yield curves or a historical backfill of public debt figures, we build and operate the pipeline. Tell us what you need.