Manjaro Linux is developing a new tool called Manjaro Data Donor (MDD) to gather usage statistics about its user base. The primary motivation behind this initiative is to improve user counting and gain a better understanding of the hardware and software environments in which Manjaro is being used.
In this blog post, we will discuss the reasons behind this shift, the data being collected, and how users can participate and provide feedback.
The Need for a New Approach
Previously, Manjaro relied on pings sent from users' systems via NetworkManager to ping.manjaro.org to estimate user numbers. However, this method presented several limitations:
- Inaccurate Counting: Systems behind the same NAT were counted as one, and individual systems could only be distinguished based on IP addresses, hindering accurate tracking over time.
- Privacy Concerns: The analysis software, Matomo, is meant to mask IP addresses, but Manjaro had to take that masking on trust, which left a privacy concern open.
- Unsuitable Tooling: Matomo, primarily designed for website analysis, proved bulky and ill-suited for system telemetry, resulting in a "hacky" setup with limited results.
MDD aims to address these issues by providing a more transparent and efficient system for collecting user data.
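To make the counting problem concrete, here is a minimal Python sketch using invented ping records (the IP addresses and device labels are illustrative, not real data). It compares counting distinct source IPs, which is effectively what an IP-based approach does, with counting distinct per-device identifiers such as the `device_id` field that MDD submits. It is not how Manjaro's servers are implemented.

```python
# Illustrative records: (source IP as seen by the server, device identifier).
# All values are made up for this example.
pings = [
    ("203.0.113.7", "dev-a"),    # three machines behind one NAT router
    ("203.0.113.7", "dev-b"),
    ("203.0.113.7", "dev-c"),
    ("198.51.100.20", "dev-d"),  # one laptop seen from two networks
    ("192.0.2.55", "dev-d"),
]

def count_by_ip(records):
    """Count distinct source IPs (the old, ping-based estimate)."""
    return len({ip for ip, _ in records})

def count_by_device(records):
    """Count distinct device identifiers."""
    return len({device for _, device in records})

print("by IP:", count_by_ip(pings))
print("by device:", count_by_device(pings))
```

The IP-based count merges the three machines behind the router into one and splits the roaming laptop into two, so it is wrong in both directions at once.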
What Data Does MDD Collect?
MDD utilises the system information tool inxi to gather hardware and environment statistics.
One user shared on the Manjaro forum the data their system sent to the MDD servers. Here is the output:
[nls@lap ~]$ mdd
Welcome to MDD - The Manjaro Data Donor
Preparing data submission...
------------------------------------------
Sending the following data
------------------------------------------
{
  "meta": {
    "version": 1,
    "timestamp": "2024-11-02T14:02:29.754011+00:00",
    "device_id": "939bf6e1-8e22-5927-9c01-a8cff7f4d01d",
    "distro_id": "manjaro",
    "release": "24.1.1",
    "inxi": true
  },
  "system": {
    "kernel": "6.6.54-2-MANJARO",
    "form_factor": "laptop",
    "install_date": "2023-04-07T07:35:41+00:00",
    "product_name": "NJ50_70CU",
    "product_family": "Not Applicable",
    "sys_vendor": "Notebook",
    "board_name": "NJ50_70CU"
  },
  "boot": {
    "uefi": true,
    "uptime_seconds": 24921
  },
  "cpu": {
    "arch": "x86_64",
    "model": "Intel Core i7-10510U",
    "cores": 4,
    "threads": 8
  },
  "memory": {
    "ram_gb": 15.319877624511719,
    "swap_gb": 7.812496185302734
  },
  "graphics": {
    "comp": "kwin_wayland",
    "dri": "iris",
    "gpus": [
      {
        "vendor": "CLEVO/KAPOK",
        "model": "Intel CometLake-U GT2 [UHD Graphics]",
        "driver": "i915"
      }
    ],
    "outputs": [
      {
        "model": null,
        "res": "1920x1080",
        "refresh": 0,
        "dpi": 0,
        "size": "N/A"
      }
    ]
  },
  "audio": {
    "servers": [
      {
        "name": "PipeWire",
        "active": true
      }
    ]
  },
  "disk": {
    "disks": [
      {
        "size_gb": 931.5133895874023,
        "root": {
          "size_gb": 292.96875,
          "fstype": "ext4",
          "crypt": false
        },
        "home": null
      }
    ],
    "windows": true
  },
  "locale": {
    "region": "en_GB.UTF-8",
    "language": "en",
    "timezone": "Europe/Paris"
  },
  "package": {
    "last_update": "2024-11-01T23:02:47+01:00",
    "branch": "stable",
    "pkgs": 1938,
    "foreign_pkgs": 43,
    "pkgs_update_pending": 0,
    "flatpaks": 0,
    "pacman_mirrors": {
      "total": 2,
      "ok": 1,
      "country_config": "France"
    }
  },
  "desktop": {
    "cli": "/bin/bash",
    "gui": "KDE Plasma",
    "dm": "SDDM",
    "wm": "kwin_wayland",
    "display": "wayland",
    "display_with": "Xwayland"
  }
}
------------------------------------------
Succesful sent at 2024-11-02 15:02:32
[nls@lap ~]$
As the output shows, the submitted data includes:
- Meta Information: MDD version, timestamp, a unique device ID, distribution ID, release version, and whether inxi is being used for data collection.
- System Information: Kernel version, system form factor, installation date, product name and family, system vendor, and motherboard name.
- Boot Information: Whether the system uses UEFI and the system uptime in seconds.
- Hardware Details: CPU architecture, model, number of cores and threads; RAM and swap space in GB; graphics compositor and DRI driver; GPU vendor, model, and driver; display output resolution, refresh rate, DPI, and size; audio servers; disk sizes, filesystems, and encryption status; and whether Windows is installed alongside Manjaro.
- Locale and Package Information: User's region, language, timezone, date of last package update, Manjaro branch (stable, testing, unstable), number of packages installed, number of foreign packages, pending updates, number of Flatpaks, and details about pacman mirrors.
- Desktop Environment Details: Command-line shell, graphical desktop environment, display manager, window manager, display protocol (Wayland or X11), and whether Xwayland is being used.
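For readers who want to inspect such a payload themselves, the following Python sketch loads a short excerpt of the JSON shown above (a few fields copied verbatim) and prints the sections along with a compact summary. The `summarize` helper is a hypothetical name introduced for this example and is not part of MDD.

```python
import json

# Excerpt of the payload from the forum post; only some fields are included.
payload_text = """
{
  "meta": {"version": 1, "distro_id": "manjaro", "release": "24.1.1", "inxi": true},
  "cpu": {"arch": "x86_64", "model": "Intel Core i7-10510U", "cores": 4, "threads": 8},
  "memory": {"ram_gb": 15.319877624511719, "swap_gb": 7.812496185302734},
  "desktop": {"gui": "KDE Plasma", "display": "wayland"}
}
"""

def summarize(payload):
    """Return a short, human-readable summary of selected fields."""
    cpu = payload["cpu"]
    desktop = payload["desktop"]
    return {
        "release": payload["meta"]["release"],
        "cpu": f'{cpu["model"]} ({cpu["cores"]}C/{cpu["threads"]}T)',
        "ram_gb": round(payload["memory"]["ram_gb"], 1),
        "session": f'{desktop["gui"]} on {desktop["display"]}',
    }

payload = json.loads(payload_text)

# List each top-level section with the field names it contains.
for section, fields in payload.items():
    print(f"{section}: {', '.join(fields)}")

print(summarize(payload))
```

Listing the field names per section is a quick way to check exactly which categories a submission contains.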
Participating in the MDD Programme
Users can install MDD as a package from the Manjaro repositories using the following command:
sudo pacman -S mdd
To preview the data that will be sent, users can run:
mdd --dry-run
Once satisfied, users can submit their data by running:
mdd
For debugging purposes, the following command provides additional logs:
mdd --log DEBUG
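If you prefer to script the preview step, the sketch below wraps the commands above in Python. It uses only the flags mentioned in this post (`--dry-run` and `--log`). The helper names `build_command` and `run_mdd` are hypothetical, and combining both flags in one call is an assumption that has not been checked against the tool.

```python
import shutil
import subprocess

def build_command(dry_run=True, log_level=None):
    """Build an mdd command line from the flags mentioned in this post."""
    cmd = ["mdd"]
    if dry_run:
        cmd.append("--dry-run")
    if log_level:
        cmd += ["--log", log_level]
    return cmd

def run_mdd(dry_run=True, log_level=None):
    """Run mdd if installed; return its exit code, or None if it is missing."""
    if shutil.which("mdd") is None:
        print("mdd not found; install it with: sudo pacman -S mdd")
        return None
    return subprocess.run(build_command(dry_run, log_level)).returncode

if __name__ == "__main__":
    # Defaults to a preview so that nothing is sent by accident.
    run_mdd(dry_run=True)
```

Defaulting to `dry_run=True` means a real submission has to be requested explicitly with `run_mdd(dry_run=False)`.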
Transparency and User Control
Manjaro emphasises transparency and user control in its data collection efforts. The source code for MDD is publicly available on GitHub, and a public website displays visualisations of the collected data.
Currently, MDD installation and data submission are manual. However, plans are in place to include MDD on all Manjaro systems and implement a systemd service for automatic data submission. This automatic data collection will be opt-out, meaning users will have to manually disable the service if they do not wish to participate.
Addressing Concerns and Feedback
The introduction of MDD sparked discussions about privacy, data accuracy, and the opt-out mechanism. Some users expressed concerns about the refresh rate data not being reliably collected, particularly on Wayland systems.
Roman Gilg, the developer of MDD, acknowledged these issues and is investigating potential solutions, including utilising xrandr and wayland-info to obtain more accurate refresh rate data.
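As a rough illustration of the xrandr route, the Python sketch below extracts the active refresh rate per connected output from xrandr's text output, where the current mode is marked with an asterisk. The sample text is an illustrative stand-in for real output, and `current_refresh_rates` is a hypothetical helper; how MDD will actually implement this is not described in the discussion.

```python
import re

# Illustrative sample in the usual xrandr layout (not taken from a real system).
SAMPLE = """\
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 16384 x 16384
eDP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 344mm x 194mm
   1920x1080     60.00*+  59.97    59.96    48.00
   1680x1050     59.95    59.88
HDMI-1 disconnected (normal left inverted right x axis y axis)
"""

def current_refresh_rates(xrandr_text):
    """Map each connected output name to the refresh rate marked with '*'."""
    rates = {}
    output = None
    for line in xrandr_text.splitlines():
        if not line.startswith(" "):
            # Header line: remember the output name only if it is connected.
            parts = line.split()
            connected = len(parts) > 1 and parts[1] == "connected"
            output = parts[0] if connected else None
            continue
        # Mode line: the active rate is the number directly followed by '*'.
        match = re.search(r"(\d+(?:\.\d+)?)\*", line)
        if output and match:
            rates[output] = float(match.group(1))
    return rates

print(current_refresh_rates(SAMPLE))
```

On a live system the input would come from running `xrandr` and capturing its output. In a Wayland session xrandr talks to Xwayland, so its values may not match the real display, which is presumably why wayland-info is being considered as well.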
Conclusion
Manjaro Data Donor is Manjaro's new approach to data collection. By moving away from its previous system, Manjaro aims to achieve greater accuracy in user counting and gain valuable insights into its user base. The project emphasises transparency by providing users with the ability to preview their data, access the source code, and explore visualisations of the collected data.
While the opt-out approach and potential privacy concerns are subject to ongoing discussion, Manjaro's commitment to open communication and community feedback suggests that these issues will be carefully considered as the project evolves.