
Treasury data,
at warehouse scale.

We extract yield curve rates, OFAC SDN lists, foreign exchange rates, and public debt figures from treasury.gov. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Yield rates extracted: 4,192 /day
Sanction entities: 38,419 /run
Historical records: 2.8M total
Active pipelines: 34
Uptime: 99.99%

Data Dictionary

Every field we extract from treasury.gov

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Yield Curve Rates objects from treasury.gov. All fields typed and schema-versioned.

date, 1_mo, 2_mo, 3_mo, 6_mo, 1_yr, 2_yr, 3_yr, 5_yr, 7_yr, 10_yr, 20_yr, 30_yr, scraped_at
yield_curve rates
● 200 OK
{
  "date": "2023-10-24",
  "1_mo": 5.54,
  "3_mo": 5.59,
  "1_yr": 5.42,
  "10_yr": 4.84,
  "30_yr": 4.96,
  "scraped_at": "2023-10-24T18:00:00Z"
}

Complete list of extractable fields for OFAC SDN Sanctions objects from treasury.gov. All fields typed and schema-versioned.

entity_id, name, type, programs, nationalities, citizenships, dates_of_birth, places_of_birth, addresses, remarks, aliases, scraped_at
ofac_sdn sanctions
● 200 OK
{
  "entity_id": "2674",
  "name": "KIM, Jong Un",
  "type": "individual",
  "programs": "DPRK2",
  "nationalities": "Democratic People's Republic of Korea",
  "remarks": "(Linked To: WORKERS' PARTY OF KOREA)",
  "scraped_at": "2023-10-24T18:05:00Z"
}

Complete list of extractable fields for Public Debt objects from treasury.gov. All fields typed and schema-versioned.

record_date, debt_held_by_public, intragovernmental_holdings, total_public_debt_outstanding, source_system, reporting_frequency, fiscal_year, fiscal_quarter, scraped_at
public_debt
● 200 OK
{
  "record_date": "2023-10-23",
  "debt_held_by_public": 26543128491.42,
  "intragovernmental_holdings": 7084918231.11,
  "total_public_debt_outstanding": 33628046722.53,
  "reporting_frequency": "Daily",
  "scraped_at": "2023-10-24T18:10:00Z"
}

Complete list of extractable fields for Exchange Rates objects from treasury.gov. All fields typed and schema-versioned.

record_date, country, currency, exchange_rate, effective_date, source_document, quarter, year, scraped_at
exchange_rates
● 200 OK
{
  "record_date": "2023-09-30",
  "country": "India",
  "currency": "Rupee",
  "exchange_rate": 83.12,
  "effective_date": "2023-09-30",
  "quarter": 3,
  "year": 2023
}

Complete list of extractable fields for Treasury Bulletins objects from treasury.gov. All fields typed and schema-versioned.

publication_date, title, document_url, document_type, file_size_bytes, authoring_bureau, summary_text, parsed_tables_count, scraped_at
treasury_bulletins
● 200 OK
{
  "publication_date": "2023-09-01",
  "title": "Treasury Bulletin September 2023",
  "document_url": "https://fiscal.treasury.gov/files/bulletin/b2023-3.pdf",
  "document_type": "PDF",
  "file_size_bytes": 4194304,
  "parsed_tables_count": 42
}

Capabilities

Everything you need from treasury.gov, nothing you don't

Our treasury.gov scraper handles every layer of the platform: legacy HTML tables, complex PDF extraction, inconsistent XML feeds, and daily financial updates.

Yield Curve Extraction

Daily interest rates and historical backfills for 1-month to 30-year maturity periods, parsed directly from Treasury data feeds.

OFAC Sanctions Monitoring

Extract and normalise the Specially Designated Nationals (SDN) list, capturing aliases, addresses, and linked entities.

Debt to the Penny

Daily public debt figures, separating debt held by the public from intragovernmental holdings, timestamped per release.

Exchange Rate Tracking

Quarterly and daily foreign currency exchange rates published by the Treasury for reporting purposes.

Legacy HTML Parsing

Navigate 1990s table structures and nested DOM elements that lack modern CSS classes or IDs.

PDF Data Extraction

OCR and table parsing for official Treasury Bulletins and Monthly Treasury Statements.

XML / CSV Normalisation

Standardising disparate Treasury formats into a unified schema for downstream ingestion.

Scheduled Updates

Run pipelines immediately after daily market close or official Treasury publication times.

Historical Backfills

Extract yield curve and debt data going back to 1990 for comprehensive macroeconomic modelling.

Schema Validation

Ensure financial figures match expected types, preventing strings from breaking your numerical models.
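
The schema validation described above can be pictured as a per-record type check. The following is a minimal sketch, not DataFlirt's actual validator: the `YIELD_SCHEMA` subset and the `validate` function are hypothetical names chosen for illustration, and the field names are taken from the yield curve sample record.

```python
from datetime import date
from numbers import Real

# Hypothetical subset of the yield curve schema, for illustration only.
YIELD_SCHEMA = {"date": str, "1_mo": Real, "10_yr": Real}


def validate(record: dict, schema: dict = YIELD_SCHEMA) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Dates must also parse as ISO 8601 calendar dates.
    if isinstance(record.get("date"), str):
        try:
            date.fromisoformat(record["date"])
        except ValueError:
            errors.append("date: not an ISO 8601 date")
    return errors
```

A record whose rate arrives as the string "5.54" would be reported rather than passed downstream, which is the failure mode the capability is meant to prevent.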

// engagement pipeline

From Treasury endpoint to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide specific datasets, historical ranges, or document types. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, PDF parsing modules, and normalisation logic for treasury.gov.

Validation & QA
Days 4–6

Schema validation, null-rate checks, format normalisation, and sample exports before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
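
As an illustration of the null-rate checks mentioned in the Validation & QA step, here is a minimal sketch. The function names and the 1% default threshold are assumptions made for this example, not documented DataFlirt settings.

```python
def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


def fields_over_threshold(
    records: list[dict], fields: list[str], threshold: float = 0.01
) -> list[str]:
    """Fields whose null rate exceeds the (hypothetical) alert threshold."""
    rates = null_rates(records, fields)
    return sorted(field for field, rate in rates.items() if rate > threshold)
```

A batch would typically be held back from delivery when `fields_over_threshold` returns a non-empty list.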

Under the hood

How our Treasury pipeline handles the hard parts

Government websites present unique challenges. Here is how we stay resilient and why teams choose managed infrastructure over DIY.

pipeline-monitor · treasury.gov · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Legacy HTML structures
Dealing with nested tables without CSS classes

Many treasury.gov pages rely on table-based layouts built decades ago. Our selector strategy uses structural mapping and text-pattern matching to extract data reliably, even when standard CSS selectors fail.
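
One way to picture structural mapping combined with text-pattern matching is to ignore CSS entirely, collect every table (including nested ones), and pick the table whose header text matches a pattern. The sketch below uses only the Python standard library; `TableCollector` and `find_table` are hypothetical names, and this is an illustration under those assumptions rather than the production selector strategy.

```python
import re
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, tolerating nested tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, innermost first
        self._stack = []   # tables currently open (innermost last)
        self._cell = None  # text fragments of the cell being read

    def _flush_cell(self):
        # Store the open cell, if any, in the current row of the current table.
        if self._cell is not None and self._stack and self._stack[-1]:
            self._stack[-1][-1].append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._cell = None  # a cell that wraps a table is layout, drop it
            self._stack.append([])
        elif tag == "tr" and self._stack:
            self._flush_cell()
            self._stack[-1].append([])
        elif tag in ("td", "th"):
            self._flush_cell()  # legacy pages often omit closing tags
            if self._stack and self._stack[-1]:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._flush_cell()
        elif tag == "table" and self._stack:
            self._flush_cell()
            self.tables.append(self._stack.pop())


def find_table(html: str, header_pattern: str) -> list[dict]:
    """Return rows (as dicts) of the first table with a matching header cell."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    pattern = re.compile(header_pattern, re.IGNORECASE)
    for table in collector.tables:
        for index, row in enumerate(table):
            if any(pattern.search(cell) for cell in row):
                body = [r for r in table[index + 1:] if len(r) == len(row)]
                return [dict(zip(row, r)) for r in body]
    return []
```

Because the match is on header text such as "1 Mo" rather than on class names or IDs, the lookup keeps working when a data table is wrapped in extra layout tables.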

PDF Table Extraction
Parsing complex financial tables from Treasury PDFs

Treasury Bulletins are often published solely as PDFs. We use advanced OCR and spatial table parsing to convert these documents into structured, queryable data.

Inconsistent XML schemas
Handling undocumented schema changes

Government data feeds frequently change their XML structures without warning. Our pipelines include schema drift detection and fallback logic to maintain data continuity.
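
Schema drift detection can be as simple as comparing the fields a feed actually delivered against the fields the pipeline expects. The following is a minimal sketch under that assumption; `EXPECTED_FIELDS`, `DriftReport`, and `detect_drift` are hypothetical names, and the field set is an illustrative subset of the yield curve schema.

```python
from dataclasses import dataclass

# Illustrative subset of expected fields; a real schema would be versioned.
EXPECTED_FIELDS = {"date", "1_mo", "3_mo", "1_yr", "10_yr", "30_yr"}


@dataclass(frozen=True)
class DriftReport:
    missing: frozenset      # expected fields the feed no longer provides
    unexpected: frozenset   # fields the feed provides that we do not know

    @property
    def drifted(self) -> bool:
        return bool(self.missing or self.unexpected)


def detect_drift(record: dict, expected: set = EXPECTED_FIELDS) -> DriftReport:
    """Compare the keys of one parsed record against the expected schema."""
    observed = set(record)
    return DriftReport(
        missing=frozenset(expected - observed),
        unexpected=frozenset(observed - expected),
    )
```

A renamed field shows up as one entry in `missing` and one in `unexpected`, which gives fallback logic or an on-call engineer a concrete mapping to review.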

Rate limiting
Managing requests on public endpoints

We implement intelligent request pacing and concurrency limits to respect treasury.gov infrastructure while ensuring timely data extraction.
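
Since the pipeline is described as Scrapy-based, request pacing and concurrency limits could plausibly be expressed through Scrapy's built-in settings. The setting names below are standard Scrapy settings; the values and the `POLITE_SETTINGS` name are illustrative assumptions, not DataFlirt's production configuration.

```python
# Illustrative values only; tune against the target site's actual behaviour.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,                  # honour robots.txt rules
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,     # cap parallel requests per domain
    "DOWNLOAD_DELAY": 1.0,                   # base delay in seconds
    "AUTOTHROTTLE_ENABLED": True,            # adapt delay to observed latency
    "AUTOTHROTTLE_START_DELAY": 1.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,  # average requests in flight
    "RETRY_TIMES": 3,
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],
}
```

A dictionary like this can be assigned to a spider's `custom_settings` attribute so that the limits apply to that spider only.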

Data normalisation
Converting strings to strict numeric types

Financial figures are often presented with commas, currency symbols, or text descriptors like 'Trillion'. Our normalisation layer cleanses these inputs into strict numerical types.
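
A minimal sketch of this kind of cleansing follows. It assumes `Decimal` as the target type (chosen here to avoid binary floating-point rounding on monetary values), and the `SCALE` table and `parse_amount` function are hypothetical names for illustration. Accounting-style negatives in parentheses are not handled.

```python
import re
from decimal import Decimal

# Hypothetical multiplier table for text descriptors; extend as needed.
SCALE = {
    "thousand": Decimal("1e3"),
    "million": Decimal("1e6"),
    "billion": Decimal("1e9"),
    "trillion": Decimal("1e12"),
}


def parse_amount(raw: str) -> Decimal:
    """Convert strings such as '$33.6 Trillion' or '1,234.50' to Decimal."""
    text = raw.strip().lower()
    multiplier = Decimal(1)
    for word, factor in SCALE.items():
        if word in text:
            multiplier = factor
            text = text.replace(word, "")
            break
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    if not cleaned:
        raise ValueError(f"no numeric content in {raw!r}")
    return Decimal(cleaned) * multiplier
```

For example, `parse_amount("$1,234.50")` yields `Decimal("1234.50")`, and a value with no digits raises `ValueError` instead of silently becoming zero.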

Applications

Who uses Treasury data and how

Teams across industries use treasury.gov data to build competitive products and smarter operations.

01
Macroeconomic Modelling

Quantitative analysts use daily yield curves to build interest rate models and forecast economic cycles.

02
KYC & AML Compliance

Financial institutions automate checks against the OFAC SDN list to ensure regulatory compliance.

03
Currency Risk Management

Corporate treasuries track official exchange rates for accounting and risk mitigation purposes.

04
Fiscal Policy Analysis

Think tanks and researchers monitor public debt trends and intragovernmental holdings.

05
Fixed Income Pricing

Traders benchmark corporate bonds and mortgage rates against Treasury yield curves.

06
Historical Financial Research

Academic researchers analyse 30-year interest rate cycles using our complete historical backfills.

Why DataFlirt

"The US Treasury holds the foundational metrics for global finance, but accessing decades of yield curves and debt figures requires navigating a maze of legacy HTML and PDFs."

Most quantitative teams waste weeks building parsers for treasury.gov. The site mixes modern APIs with 1990s table layouts, inconsistent XML feeds, and complex PDF bulletins. DataFlirt abstracts this chaos. We handle the extraction, normalisation, and schema validation, delivering clean financial time-series directly to your data warehouse.

Technical Spec

Treasury scraper: technical capabilities

Everything supported by our treasury.gov scraper: daily data feeds, legacy HTML tables, PDF bulletins, historical backfills, change detection, and webhook delivery.

Daily Yield Curve Rates
Extraction of all maturity periods, updated daily
Supported
OFAC SDN List Parsing
Full entity resolution including aliases and addresses
Supported
Public Debt to the Penny
Daily extraction of total public debt outstanding
Supported
Legacy HTML Table Extraction
Parsing deeply nested table structures
Supported
Treasury Bulletin PDF Parsing
Spatial extraction of tables within official PDFs
Supported
Historical Data Backfills
Extraction of datasets dating back to 1990
Supported
Change Detection (Diffs)
Only emit records with changed fields since last run
Supported
Webhook Delivery
HTTP POST per record for real-time processing
Supported
Non-Public IRS Tax Records
Individual or corporate tax filings requiring authentication
Partial
Classified Financial Intelligence
Internal Treasury communications and classified reports
Partial
Infrastructure

Infrastructure powering the Treasury pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Legacy DOM Parsing

Custom Scrapy pipelines designed specifically to navigate and parse 1990s HTML table structures that lack modern identifiers.

PDF Extraction Pipeline

Advanced OCR and spatial parsing modules to extract structured tables from official Treasury reports and bulletins.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles daily scheduling immediately following official Treasury publication times.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested
CSV
Flat file with typed columns
XLS
Excel compatible format for analysts
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery
Webhook
HTTP POST per record for real-time workflows
API
Queryable REST endpoints for extracted data
PostgreSQL
Upsert into your existing schema
BigQuery
Streamed directly into your dataset
// faq

Common questions.

About treasury.gov scraping, legality, and pipeline operations.

Is scraping treasury.gov legal?

Yes. Treasury.gov publishes public government data intended for transparency and public use. DataFlirt extracts only publicly available information and adheres to reasonable request limits to avoid disrupting government infrastructure.

How quickly are daily yield curves updated?

Our pipelines are scheduled to run immediately after the Treasury publishes daily updates, typically delivering data to your warehouse within minutes of official publication.

Can you parse the OFAC SDN list?

Yes. We extract and normalise the entire Specially Designated Nationals list, resolving complex nested data such as multiple aliases, addresses, and linked entities into a clean relational schema.
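
To make "a clean relational schema" concrete, the sketch below splits one nested entity into a parent row plus child rows keyed by `entity_id`. The `flatten_entity` function and the assumption that `aliases` and `addresses` arrive as lists are illustrative; the actual delivered schema is agreed during scoping.

```python
def flatten_entity(entity: dict) -> tuple[dict, list[dict], list[dict]]:
    """Split a nested SDN-style entity into entity, alias, and address rows."""
    nested_keys = ("aliases", "addresses")
    # Parent row: every scalar field, without the nested collections.
    entity_row = {k: v for k, v in entity.items() if k not in nested_keys}
    entity_id = entity["entity_id"]
    # Child rows carry the parent key so they can be joined back later.
    alias_rows = [
        {"entity_id": entity_id, "alias": alias}
        for alias in entity.get("aliases", [])
    ]
    address_rows = [
        {"entity_id": entity_id, "address": address}
        for address in entity.get("addresses", [])
    ]
    return entity_row, alias_rows, address_rows
```

Each returned list maps to its own table, so an entity with many aliases produces many alias rows instead of one delimited string.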

Do you extract data from Treasury PDFs?

Yes. We utilise OCR and spatial table parsing to extract structured numerical data from Treasury Bulletins and Monthly Treasury Statements published in PDF format.

How far back does historical data go?

We can backfill yield curve and public debt data as far back as the Treasury publishes it on its domain, which for many datasets extends to 1990.

How do you handle changes to the website structure?

We monitor pipelines continuously. If a legacy table structure or XML schema changes, our alerting system flags the anomaly, and our engineers update the selectors to restore data flow.

Can I get this data via API?

Yes. While we specialise in pushing data to your warehouse via S3, BigQuery, or Snowflake, we can also expose the extracted datasets via a managed REST API.

$ dataflirt scope --new-project --source=treasury.gov ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of yield curves or a historical backfill of public debt figures, we build and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h