F-Droid repository data,
at warehouse scale.

We extract package metadata, version histories, anti-feature flags, and source repository metrics from F-Droid. Delivered as clean JSON, CSV, or Parquet to S3 or Postgres on your cadence.

Apps extracted
4.8K /run
APK versions
21.4K /run
Repo updates
142 /24h
Active pipelines
14
Uptime
99.98%
Data Dictionary

Every field we extract from f-droid.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for App Metadata objects from f-droid.org. All fields typed and schema-versioned.

package_id, name, summary, description, license, categories, source_code_url, issue_tracker_url, changelog_url, donate_url, website, added_date, last_updated
app_metadata
"package_id": "org.mozilla.fennec_fdroid",
"name": "Fennec F-Droid",
"summary": "Browse the web, block trackers",
"license": "MPL-2.0",
"categories": ["Connectivity", "Navigation"],
"source_code_url": "https://gitlab.com/relan/fennecbuild",
"added_date": "2015-02-14"

Complete list of extractable fields for Version History objects from f-droid.org. All fields typed and schema-versioned.

package_id, version_name, version_code, added_date, size_bytes, min_sdk, target_sdk, apk_url, pgp_signature, sha256_hash, native_code
version_history
"package_id": "org.mozilla.fennec_fdroid",
"version_name": "124.0.0",
"version_code": 1240000,
"added_date": "2024-03-25",
"size_bytes": 83451904,
"min_sdk": 24,
"sha256_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0"

Complete list of extractable fields for Anti-Features objects from f-droid.org. All fields typed and schema-versioned.

package_id, has_anti_features, tracking, ads, non_free_network, non_free_addons, non_free_assets, known_vuln, disabled_algorithm, upstream_non_free
anti-features
"package_id": "com.example.app",
"has_anti_features": true,
"tracking": true,
"ads": false,
"non_free_network": true,
"known_vuln": false,
"upstream_non_free": false

Complete list of extractable fields for Permissions objects from f-droid.org. All fields typed and schema-versioned.

package_id, version_code, permission_name, permission_desc, protection_level, is_dangerous, added_in_sdk, max_sdk
permissions
"package_id": "org.mozilla.fennec_fdroid",
"version_code": 1240000,
"permission_name": "android.permission.INTERNET",
"protection_level": "normal",
"is_dangerous": false,
"added_in_sdk": 1

Complete list of extractable fields for Maintainer Data objects from f-droid.org. All fields typed and schema-versioned.

package_id, author_name, author_email, bitcoin_address, liberapay_id, open_collective_id, github_repo, gitlab_repo, translation_url
maintainer_data
"package_id": "org.mozilla.fennec_fdroid",
"author_name": "Fennec Maintainers",
"liberapay_id": "fennec_fdroid",
"github_repo": null,
"gitlab_repo": "https://gitlab.com/relan/fennecbuild",
"translation_url": "https://weblate.bubu1.eu/projects/fennec/"

Capabilities

Extract the complete FOSS Android ecosystem

Our F-Droid scraper parses repository indices, normalises metadata, tracks version diffs, and extracts anti-feature flags — providing a structured view of the open-source Android landscape.

Full Package Metadata

Extract package IDs, descriptions, categories, licenses, and external repository links for every app in the main F-Droid repository.

Version & APK Tracking

Track every APK release, including version codes, file sizes, SHA-256 hashes, and minimum SDK requirements.

Anti-Feature Extraction

Map F-Droid's anti-feature tags (tracking, non-free network services, ads) into structured booleans per package.

License Auditing

Capture exact license strings (GPL-3.0, MIT, Apache-2.0) to monitor FOSS compliance across the catalogue.

Reproducible Build Status

Identify packages verified by F-Droid's reproducible build server, extracting signature verification data.

Third-Party Repository Support

Extend extraction beyond the main repo to track IzzyOnDroid, Guardian Project, and other custom F-Droid indices.

Source & Issue Links

Extract and validate upstream GitLab, GitHub, and Codeberg repository URLs and issue tracker endpoints.

Donation & Maintainer Data

Parse Liberapay, Open Collective, and Bitcoin addresses associated with package maintainers.

Scheduled Index Syncs

Run daily or weekly pipelines to parse the latest index-v1.jar or index-v2.json and emit clean diffs.

// engagement pipeline

From F-Droid index to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Specify target repositories (main F-Droid, IzzyOnDroid) and required fields. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure index parsers, metadata normalisation, and version diffing logic for f-droid.org.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and license string normalisation before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our F-Droid pipeline handles the hard parts

Parsing F-Droid is less about bot detection and more about handling complex repository indices and unstructured metadata. Here is how we ensure clean data.

pipeline-monitor · f-droid.org · live · active
Identity rotation: TLS fingerprint randomised, user-agent rotated, IP pool residential, challenges blocked 0
Page coverage: 48,291 pages queued (running)
Pipeline health: 99.9% uptime, 142ms p99 latency, 0.3% null rate, 2 alerts

Index parsing
Native handling of index-v1 and index-v2

F-Droid distributes metadata via compressed index files. Our pipeline natively fetches and parses both the legacy index-v1.jar and the modern index-v2.json, normalising the output into a unified schema regardless of the repository format.
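
As a rough illustration of that normalisation step, the sketch below maps both index formats onto a handful of the app_metadata fields listed in the data dictionary. The upstream key names (packageName, sourceCode, the per-locale name object in index-v2) reflect our reading of the formats and should be treated as assumptions; signature verification is deliberately left out, and the function names are hypothetical.

```python
import io
import json
import zipfile


def load_index_v1(jar_bytes: bytes) -> dict:
    """Read index-v1.json out of an index-v1.jar (a JAR is a ZIP archive).

    Note: this does NOT verify the JAR signature.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return json.loads(jar.read("index-v1.json"))


def _localised(value, locale="en-US"):
    """Assumption: index-v2 stores some text fields as {locale: text}."""
    if isinstance(value, dict):
        return value.get(locale) or next(iter(value.values()), None)
    return value


def normalise_v1(index: dict) -> list[dict]:
    """Map index-v1 'apps' entries onto the unified schema."""
    return [
        {
            "package_id": app.get("packageName"),
            "name": app.get("name"),
            "license": app.get("license"),
            "categories": app.get("categories", []),
            "source_code_url": app.get("sourceCode"),
        }
        for app in index.get("apps", [])
    ]


def normalise_v2(index: dict) -> list[dict]:
    """Map index-v2 'packages' entries onto the same unified schema."""
    rows = []
    for package_id, package in index.get("packages", {}).items():
        meta = package.get("metadata", {})
        rows.append({
            "package_id": package_id,
            "name": _localised(meta.get("name")),
            "license": meta.get("license"),
            "categories": meta.get("categories", []),
            "source_code_url": meta.get("sourceCode"),
        })
    return rows
```

Whichever format a repository serves, downstream code then sees the same row shape.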

Diff generation
Only process new versions and changes

Parsing the entire F-Droid catalogue daily is inefficient. We maintain state across runs and emit structured diffs — alerting you only when a new APK is signed, an anti-feature is added, or metadata changes.
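
One way to picture the diffing step is a hash-based comparison against the previous run's state. The sketch below is a minimal version under our own assumptions: records are plain dicts keyed by package_id, state is a mapping of key to fingerprint, and the function names are illustrative rather than the production interface.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(previous: dict, records: list, key: str = "package_id"):
    """Compare this run's records with the previous run's fingerprints.

    Returns (diff, new_state); new_state is what the next run should
    receive as `previous`.
    """
    by_key = {r[key]: r for r in records}
    current = {k: fingerprint(r) for k, r in by_key.items()}
    diff = {
        "added": [by_key[k] for k in current if k not in previous],
        "changed": [
            by_key[k] for k in current
            if k in previous and previous[k] != current[k]
        ],
        "removed": [k for k in previous if k not in current],
    }
    return diff, current
```

Storing only fingerprints keeps the state small; the trade-off is that a changed record is reported whole rather than field by field.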

Data normalisation
Standardising licenses and categories

Upstream maintainers often input inconsistent license strings or category tags. We apply a normalisation layer to map raw strings to standard SPDX license identifiers and canonical F-Droid categories.
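
A normalisation layer of this kind can be as simple as a lookup table applied after whitespace and case folding. The alias table below is a small illustrative sample, not our full mapping, and the function name is hypothetical; unmapped strings return None so they can be reviewed rather than guessed.

```python
import re

# Illustrative aliases only; a real table would be much larger.
SPDX_ALIASES = {
    "gplv3": "GPL-3.0-only",
    "gplv3+": "GPL-3.0-or-later",
    "gpl-3.0": "GPL-3.0-only",
    "apache2": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "mit": "MIT",
    "newbsd": "BSD-3-Clause",
}


def normalise_license(raw):
    """Return an SPDX identifier for a raw license string, or None if unmapped."""
    if raw is None:
        return None
    # Trim, collapse internal whitespace, and lowercase before lookup.
    key = re.sub(r"\s+", " ", raw.strip()).lower()
    return SPDX_ALIASES.get(key)
```

Keeping the raw string alongside the normalised value in the output makes it easy to audit any mapping later.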

Mirror rotation
Reliable fetching across mirror networks

The primary f-droid.org domain can experience rate limits or downtime. Our fetchers automatically rotate through official and community mirrors to ensure uninterrupted data collection.
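
The fallback behaviour can be sketched as ordered failover: try each mirror in turn and return the first successful response. This is a simplified stand-in for the rotation described above; the mirror list is supplied by the caller, the function names are hypothetical, and the fetch function is injectable so the logic can be exercised without network access.

```python
import urllib.request


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(path: str, mirrors: list, fetch=http_get):
    """Try each mirror base URL in order; return (url, body) from the first success.

    urllib's URLError/HTTPError and socket timeouts are all OSError
    subclasses, so a single except clause covers them.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

A production fetcher would typically also add backoff and shuffle or score mirrors so that load is not concentrated on the first entry.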

Monitoring & alerting
Pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on index parsing failures, missing signature fields, and schema drift — and respond before you notice.
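
To make the checks concrete, here is a minimal sketch of two of them: per-field null rates and schema drift against an expected field set. The 5% threshold, the alert wording, and the function names are placeholders chosen for the example, not our production settings.

```python
def null_rates(records: list, fields: list) -> dict:
    """Fraction of records where each field is missing or None."""
    if not records:
        return {field: 0.0 for field in fields}
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def schema_drift(records: list, expected: set) -> dict:
    """Fields that never appeared, and fields that appeared unexpectedly."""
    seen = set()
    for r in records:
        seen.update(r.keys())
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
    }


def run_alerts(records: list, expected: set, max_null_rate: float = 0.05) -> list:
    """Collect human-readable alerts for one pipeline run."""
    drift = schema_drift(records, expected)
    alerts = [f"schema drift: expected field '{n}' never appeared" for n in drift["missing"]]
    alerts += [f"schema drift: unexpected field '{n}'" for n in drift["unexpected"]]
    # Fields that are entirely missing are already reported above.
    present = sorted(expected - set(drift["missing"]))
    for name, rate in null_rates(records, present).items():
        if rate > max_null_rate:
            alerts.append(f"null rate {rate:.1%} on '{name}' exceeds {max_null_rate:.1%}")
    return alerts
```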

Applications

Who uses F-Droid data — and how

Teams across industries use f-droid.org data to build competitive products and smarter operations.

01
Security & Malware Research

Security teams analyse APK hashes, permissions, and minimum SDKs to track FOSS ecosystem vulnerabilities.

02
Alternative App Store Aggregation

Third-party Android clients use structured F-Droid data to populate their own catalogues and search indices.

03
License Compliance Auditing

Organisations monitor the F-Droid repository to track GPL, MIT, and Apache license usage and ensure upstream compliance.

04
Privacy Analysis

Researchers map anti-feature flags across thousands of apps to quantify tracking and non-free network usage in the FOSS space.

05
Developer Market Intelligence

Maintainers track category saturation, update frequency, and competitor features to guide open-source project development.

06
Reproducible Build Tracking

Infrastructure teams monitor the success rate of F-Droid's reproducible build servers across different package architectures.

Why DataFlirt

"F-Droid provides the cleanest signal on Android privacy and open-source compliance, but parsing its repository indices at scale requires dedicated infrastructure."

Most teams underestimate the investment required to track F-Droid data: reliable extraction requires parsing compressed jar indices, handling mirror latency, standardising anti-feature flags, and monitoring version diffs. DataFlirt absorbs that complexity so your engineers can focus on the analysis — not the infrastructure.

Technical Spec

F-Droid scraper — technical capabilities

Everything supported by our f-droid.org scraper: index parsing, anti-feature extraction, change detection, delivery options, and the limits of what we can access.

Index-v1.jar parsing
Native extraction of legacy JSON metadata (index-v1.json) from signed jar archives
Supported
Index-v2.json parsing
Processing of modern signed JSON repository indices
Supported
Anti-feature extraction
Mapping of tracking, ads, and non-free network flags
Supported
APK download & hash verification
Optional fetching of actual APK binaries and SHA-256 validation
Supported
Third-party repositories
Support for IzzyOnDroid, Guardian Project, and custom repos
Supported
Reproducible build status
Extraction of signature verification data and build logs
Supported
Change detection (diffs)
Hash-based diff: only emit records with changed fields since last run
Supported
Webhook delivery
HTTP POST per record or batch — useful for real-time tracking
Supported
Maintainer internal build logs
Gated access to private CI/CD runners used by F-Droid maintainers
Partial
GitLab/GitHub private issue trackers
Requires authentication to upstream private repositories
Partial
Infrastructure

Infrastructure powering the F-Droid pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Index Parsing Engine

Custom parsers for F-Droid's index-v1 and index-v2 formats that perform decompression, signature verification, and schema mapping in memory.

Mirror Rotation Infrastructure

We maintain a dynamic list of active F-Droid mirrors. If the primary repository limits connections, requests automatically fail over to community mirrors.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Excel format for non-technical analyst teams
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoints to query your extracted dataset
PostgreSQL
Upsert into your existing schema with conflict resolution
BigQuery
Streamed directly into your dataset with schema auto-detect
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
// faq

Common questions.

About f-droid.org scraping, legality, and pipeline operations.

Is scraping F-Droid legal?

Yes. F-Droid is a public catalogue of open-source software. All metadata is publicly accessible and distributed under open licenses. DataFlirt extracts this public data without circumventing authentication or violating terms of service.

Do you parse index-v1 or index-v2?

Both. Our pipeline natively handles the legacy index-v1.jar (a signed JAR wrapping index-v1.json) and the modern index-v2.json format, normalising the data into a single, unified schema for your warehouse.

Can you track third-party F-Droid repositories?

Yes. We can configure pipelines for IzzyOnDroid, Guardian Project, Bromite, or any custom repository that adheres to the F-Droid metadata standard.

How do you handle anti-features?

We extract F-Droid's specific anti-feature tags (e.g., Tracking, Ads, NonFreeNet) and map them as structured boolean fields for every package, allowing you to easily filter the dataset.
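
As an illustration, the mapping from tags to the boolean fields in the anti-features data dictionary could look like the sketch below. Only Tracking, Ads, and NonFreeNet are named above; the other tag spellings are our assumption about F-Droid's naming and should be checked against its documentation. The function name is hypothetical.

```python
# Tag name -> output field. Spellings beyond Tracking, Ads and NonFreeNet
# are assumptions and should be verified against F-Droid's docs.
ANTI_FEATURE_FIELDS = {
    "Tracking": "tracking",
    "Ads": "ads",
    "NonFreeNet": "non_free_network",
    "NonFreeAdd": "non_free_addons",
    "NonFreeAssets": "non_free_assets",
    "KnownVuln": "known_vuln",
    "DisabledAlgorithm": "disabled_algorithm",
    "UpstreamNonFree": "upstream_non_free",
}


def anti_feature_flags(package_id: str, tags) -> dict:
    """Turn a collection of anti-feature tags into one row of booleans.

    `tags` may be a list of tag names or a dict keyed by tag name.
    """
    present = set(tags)
    row = {"package_id": package_id, "has_anti_features": bool(present)}
    for tag, field in ANTI_FEATURE_FIELDS.items():
        row[field] = tag in present
    return row
```

Tags outside the table still set has_anti_features to true, so an unfamiliar tag is not silently reported as a clean package.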

How fresh is the data?

F-Droid typically updates its main repository index daily. Our pipelines are scheduled to detect these index updates and process diffs immediately, ensuring your data is never more than a few hours behind the official repository.

Can you download and provide the actual APKs?

Yes. While standard pipelines extract metadata and download URLs, we can configure storage integrations to fetch, hash, and push the actual APK binaries to your S3 bucket.
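
The hashing step is straightforward to sketch: stream the downloaded file through SHA-256 and compare the digest with the one published in the index. The function names below are hypothetical, and the expected hash is assumed to arrive as a hex string.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large APKs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_apk(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 matches the hex digest from the index."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A binary that fails this check would normally be discarded and re-fetched rather than pushed to storage.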

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of the main F-Droid repository as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality before signing any contract.

$ dataflirt scope --new-project --source=f-droid.org ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off metadata dump or continuous tracking across multiple F-Droid repositories — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h