We extract package metadata, version histories, anti-feature flags, and source repository metrics from F-Droid. Delivered as clean JSON, CSV, or Parquet to S3 or Postgres on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for App Metadata objects from f-droid.org. All fields typed and schema-versioned.
"package_id": "org.mozilla.fennec_fdroid", "name": "Fennec F-Droid", "summary": "Browse the web, block trackers", "license": "MPL-2.0", "categories": "['Connectivity', 'Navigation']", "source_code_url": "https://gitlab.com/relan/fennecbuild", "added_date": "2015-02-14"
| # | package_id | name | summary | description | license | categories |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Version History objects from f-droid.org. All fields typed and schema-versioned.
"package_id": "org.mozilla.fennec_fdroid", "version_name": "124.0.0", "version_code": 1240000, "added_date": "2024-03-25", "size_bytes": 83451904, "min_sdk": 24, "sha256_hash": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0"
| # | package_id | version_name | version_code | added_date | size_bytes | min_sdk |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Anti-Features objects from f-droid.org. All fields typed and schema-versioned.
"package_id": "com.example.app", "has_anti_features": true, "tracking": true, "ads": false, "non_free_network": true, "known_vuln": false, "upstream_non_free": false
| # | package_id | has_anti_features | tracking | ads | non_free_network | non_free_addons |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Permissions objects from f-droid.org. All fields typed and schema-versioned.
"package_id": "org.mozilla.fennec_fdroid", "version_code": 1240000, "permission_name": "android.permission.INTERNET", "protection_level": "normal", "is_dangerous": false, "added_in_sdk": 1
| # | package_id | version_code | permission_name | permission_desc | protection_level | is_dangerous |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Maintainer Data objects from f-droid.org. All fields typed and schema-versioned.
"package_id": "org.mozilla.fennec_fdroid", "author_name": "Fennec Maintainers", "liberapay_id": "fennec_fdroid", "github_repo": "None", "gitlab_repo": "https://gitlab.com/relan/fennecbuild", "translation_url": "https://weblate.bubu1.eu/projects/fennec/"
| # | package_id | author_name | author_email | bitcoin_address | liberapay_id | open_collective_id |
|---|---|---|---|---|---|---|
Our F-Droid scraper parses repository indices, normalises metadata, tracks version diffs, and extracts anti-feature flags — providing a structured view of the open-source Android landscape.
Extract package IDs, descriptions, categories, licenses, and external repository links for every app in the main F-Droid repository.
Track every APK release, including version codes, file sizes, SHA-256 hashes, and minimum SDK requirements.
Map F-Droid's anti-feature tags (tracking, non-free network services, ads) into structured booleans per package.
Capture exact license strings (GPL-3.0, MIT, Apache-2.0) to monitor FOSS compliance across the catalogue.
Identify packages verified by F-Droid's reproducible build server, extracting signature verification data.
Extend extraction beyond the main repo to track IzzyOnDroid, Guardian Project, and other custom F-Droid indices.
Extract and validate upstream GitLab, GitHub, and Codeberg repository URLs and issue tracker endpoints.
Parse Liberapay, Open Collective, and Bitcoin addresses associated with package maintainers.
Run daily or weekly pipelines to parse the latest index-v1.jar or index-v2.json and emit clean diffs.
Brief in. Clean data out.
Specify target repositories (main F-Droid, IzzyOnDroid) and required fields. We design the extraction schema together.
We configure index parsers, metadata normalisation, and version diffing logic for f-droid.org.
Schema validation, null-rate checks, and license string normalisation before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
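The null-rate check in the validation step could look roughly like the sketch below. It is an illustration under assumptions, not our production code: the function names `null_rates` and `fields_over_threshold`, and the idea of a single per-run threshold, are hypothetical.

```python
from typing import Iterable


def null_rates(rows: list[dict], fields: Iterable[str]) -> dict[str, float]:
    """Return, per field, the fraction of rows where it is missing or None."""
    fields = list(fields)
    if not rows:
        # No rows means nothing to measure; report 0.0 rather than divide by zero.
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / len(rows)
        for field in fields
    }


def fields_over_threshold(rates: dict[str, float], max_null_rate: float) -> list[str]:
    """List the fields whose null rate exceeds the agreed limit (hypothetical rule)."""
    return sorted(field for field, rate in rates.items() if rate > max_null_rate)
```

A pilot run might call `null_rates(rows, ["license", "summary"])` and hold the launch if `fields_over_threshold` returns anything.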
Parsing F-Droid is less about bot detection and more about handling complex repository indices and unstructured metadata. Here is how we ensure clean data.
F-Droid distributes metadata via compressed index files. Our pipeline natively fetches and parses both the legacy index-v1.jar and the modern index-v2.json, normalising the output into a unified schema regardless of the repository format.
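As a rough illustration of what "unified schema" means here, the sketch below reads the JSON payload out of an index-v1.jar (a JAR is a ZIP archive) and flattens both index generations into the same record shape. The key names (`apps`, `packageName`, `packages`, `metadata`, localised `name`) reflect our understanding of the published index layouts and should be checked against a real index; the function names are hypothetical, and signature verification is deliberately left out.

```python
import io
import json
import zipfile


def load_index_v1(jar_bytes: bytes) -> dict:
    """Extract and parse index-v1.json from the bytes of an index-v1.jar."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        with jar.open("index-v1.json") as handle:
            return json.load(handle)


def normalise_v1(index: dict) -> list[dict]:
    """Flatten a v1 index (assumed: top-level 'apps' list) into unified records."""
    return [
        {
            "package_id": app.get("packageName"),
            "name": app.get("name"),
            "license": app.get("license"),
            "categories": app.get("categories", []),
        }
        for app in index.get("apps", [])
    ]


def normalise_v2(index: dict, locale: str = "en-US") -> list[dict]:
    """Flatten a v2 index (assumed: 'packages' keyed by package ID) the same way."""
    records = []
    for package_id, package in index.get("packages", {}).items():
        metadata = package.get("metadata", {})
        localised_name = metadata.get("name") or {}  # assumed locale -> text map
        records.append(
            {
                "package_id": package_id,
                "name": localised_name.get(locale),
                "license": metadata.get("license"),
                "categories": metadata.get("categories", []),
            }
        )
    return records
```

Both functions return the same keys, so downstream code does not need to know which index format a repository serves.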
Parsing the entire F-Droid catalogue daily is inefficient. We maintain state across runs and emit structured diffs — alerting you only when a new APK is signed, an anti-feature is added, or metadata changes.
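A minimal sketch of the diffing idea, assuming each run's state is reduced to a mapping from package ID to the set of version codes seen. That snapshot shape and the name `diff_snapshots` are hypothetical; a real pipeline would also compare metadata and anti-feature fields.

```python
def diff_snapshots(
    previous: dict[str, set[int]], current: dict[str, set[int]]
) -> dict[str, object]:
    """Compare two runs and report only what changed between them."""
    added_packages = sorted(current.keys() - previous.keys())
    removed_packages = sorted(previous.keys() - current.keys())

    # For packages present in both runs, keep only version codes not seen before.
    new_versions: dict[str, list[int]] = {}
    for package_id in sorted(current.keys() & previous.keys()):
        fresh = current[package_id] - previous[package_id]
        if fresh:
            new_versions[package_id] = sorted(fresh)

    return {
        "added_packages": added_packages,
        "removed_packages": removed_packages,
        "new_versions": new_versions,
    }
```

An empty result in all three keys means nothing needs to be emitted for that run.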
Upstream maintainers often input inconsistent license strings or category tags. We apply a normalisation layer to map raw strings to standard SPDX license identifiers and canonical F-Droid categories.
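The license part of that normalisation layer can be pictured as an alias lookup. The sketch below is illustrative only: the alias table is a small hypothetical sample, not a complete mapping, and unknown strings return `None` so they can be flagged for review rather than guessed.

```python
from typing import Optional

# Hypothetical sample of raw strings mapped to SPDX identifiers.
LICENSE_ALIASES = {
    "apache2": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "gplv3": "GPL-3.0-only",
    "gplv3+": "GPL-3.0-or-later",
    "mit": "MIT",
    "mpl2": "MPL-2.0",
    "mpl-2.0": "MPL-2.0",
}


def normalise_license(raw: Optional[str]) -> Optional[str]:
    """Map a raw license string to an SPDX identifier, or None if unrecognised."""
    if raw is None:
        return None
    # Lower-case and collapse runs of whitespace so spacing variants match.
    key = " ".join(raw.strip().lower().split())
    return LICENSE_ALIASES.get(key)
```

Returning `None` instead of passing the raw string through keeps unmapped values visible in null-rate checks.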
The primary f-droid.org domain can experience rate limits or downtime. Our fetchers automatically rotate through official and community mirrors to ensure uninterrupted data collection.
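Mirror failover can be sketched as trying each base URL in order until one responds. The example below is a simplified illustration: the mirror list is supplied by the caller, `fetch_with_failover` and `http_get` are hypothetical names, and the fetch function is injectable so the logic can be exercised without network access.

```python
import urllib.request
from typing import Callable, Iterable


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Default fetcher: plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_failover(
    path: str,
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
) -> tuple[str, bytes]:
    """Try each mirror in order; return (url, body) from the first that succeeds."""
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:  # urllib's URLError and socket timeouts are OSErrors
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

Returning the URL alongside the body lets the run log record which mirror actually served the index.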
Every run emits structured logs to our observability stack. We alert on index parsing failures, missing signature fields, and schema drift — and respond before you notice.
Security teams analyse APK hashes, permissions, and minimum SDKs to track FOSS ecosystem vulnerabilities.
Third-party Android clients use structured F-Droid data to populate their own catalogues and search indices.
Organisations monitor the F-Droid repository to track GPL, MIT, and Apache license usage and ensure upstream compliance.
Researchers map anti-feature flags across thousands of apps to quantify tracking and non-free network usage in the FOSS space.
Maintainers track category saturation, update frequency, and competitor features to guide open-source project development.
Infrastructure teams monitor the success rate of F-Droid's reproducible build servers across different package architectures.
"F-Droid provides the cleanest signal on Android privacy and open-source compliance, but parsing its repository indices at scale requires dedicated infrastructure."
Most teams underestimate the investment required to track F-Droid data: reliable extraction requires parsing compressed jar indices, handling mirror latency, standardising anti-feature flags, and monitoring version diffs. DataFlirt absorbs that complexity so your engineers can focus on the analysis — not the infrastructure.
Everything supported by our f-droid.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Custom parsers for F-Droid's index-v1 and index-v2 formats that perform decompression, signature verification, and schema mapping in memory.
We maintain a dynamic list of active F-Droid mirrors. If the primary repository limits connections, requests automatically fail over to community mirrors.
Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About f-droid.org scraping, legality, and pipeline operations.
Yes. F-Droid is a public catalogue of open-source software. All metadata is publicly accessible and distributed under open licenses. DataFlirt extracts this public data without circumventing authentication or violating terms of service.
Both. Our pipeline natively handles the legacy index-v1.jar (a signed JAR that wraps index-v1.json) and the modern index-v2.json formats, normalising the data into a single, unified schema for your warehouse.
Yes. We can configure pipelines for IzzyOnDroid, Guardian Project, Bromite, or any custom repository that adheres to the F-Droid metadata standard.
We extract F-Droid's specific anti-feature tags (e.g., Tracking, Ads, NonFreeNet) and map them as structured boolean fields for every package, allowing you to easily filter the dataset.
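To make the tag-to-boolean mapping concrete, here is a small sketch. The output column names follow the Anti-Features sample record and field list on this page; the tag spellings beyond Tracking, Ads, and NonFreeNet are F-Droid tag names as we understand them, and the function name is hypothetical.

```python
from typing import Iterable

# Anti-feature tag -> output column. Only a subset of tags is shown.
TAG_TO_FIELD = {
    "Tracking": "tracking",
    "Ads": "ads",
    "NonFreeNet": "non_free_network",
    "NonFreeAdd": "non_free_addons",
    "KnownVuln": "known_vuln",
    "UpstreamNonFree": "upstream_non_free",
}


def anti_feature_flags(package_id: str, tags: Iterable[str]) -> dict:
    """Turn a package's anti-feature tag list into one row of boolean columns."""
    present = set(tags)
    row = {"package_id": package_id, "has_anti_features": bool(present)}
    for tag, field in TAG_TO_FIELD.items():
        row[field] = tag in present
    return row
```

Calling `anti_feature_flags("com.example.app", ["Tracking", "NonFreeNet"])` yields a row with `tracking` and `non_free_network` set to true and the remaining flags false.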
F-Droid typically updates its main repository index daily. Our pipelines are scheduled to detect these index updates and process diffs immediately, ensuring your data is never more than a few hours behind the official repository.
Yes. While standard pipelines extract metadata and download URLs, we can configure storage integrations to fetch, hash, and push the actual APK binaries to your S3 bucket.
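The hashing step for downloaded binaries can be sketched as a streamed SHA-256 compared against the digest published in the index. This is an illustration, not the delivered integration: the function names are hypothetical and the upload to S3 is omitted.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large APKs never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_index_hash(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the hex digest recorded in the index."""
    return sha256_of_stream(stream) == expected_hex.strip().lower()


# Usage with an in-memory stand-in for a downloaded file:
apk = io.BytesIO(b"abc")
print(sha256_of_stream(apk))
```

A mismatch would typically mean a truncated download or a stale mirror, so the file should be refetched rather than delivered.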
Absolutely. We provide a sample run of the main F-Droid repository as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality before signing any contract.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off metadata dump or continuous tracking across multiple F-Droid repositories — we scope, build, and operate the pipeline. Tell us what you need.