Environmental data crowdsourced from the general public have become an increasingly valuable resource for researchers, governments, and companies in recent years. However, collecting and using these data also raises important privacy concerns. This article examines the benefits and risks of crowdsourced environmental data, focusing on the potential privacy implications.
Introduction
Crowdsourcing environmental data involves gathering observations, measurements, and other information about the environment from a distributed network of volunteers. With the proliferation of smartphones and consumer-grade sensors, citizens can easily collect data on air and water quality, biodiversity, noise levels, and more. They can then contribute these data to online platforms and databases accessible to scientists, policymakers, and other interested parties.
Some of the key benefits of crowdsourced environmental data include:
- Expanded spatial and temporal coverage: By leveraging the efforts of many contributors, crowdsourcing can generate far more extensive and fine-grained data than traditional monitoring networks. This improves researchers’ understanding of environmental phenomena.
- Reduced costs: Crowdsourcing provides a low-cost alternative to establishing extensive sensor networks or conducting large-scale sampling efforts. Individuals contribute data voluntarily at their own expense.
- Increased public engagement: Volunteering to collect data fosters greater public interest in and understanding of environmental science and policy issues.
- Near real-time data: Data can be contributed continuously, providing more immediate insights compared to sparse manual sampling.
However, while crowdsourcing offers these advantages, it also introduces risks to the privacy of the individuals who contribute data. As environmental data become more detailed and are tied to precise locations and times, they may reveal sensitive details about people’s lives and behaviors. Researchers and regulators must carefully consider these risks when handling crowdsourced data.
Privacy Concerns with Crowdsourced Data
Specific privacy concerns surrounding crowdsourced environmental data include:
Identifiability
Even when citizen scientists contribute data anonymously, the time and location specifics may make it possible to identify individuals. For example, noise level data tagged to a particular house could indicate when its occupants are home. Or biodiversity observations in someone’s backyard could pinpoint where they live. While not explicitly personal information, these data can expose people’s behaviors and whereabouts over time.
Automated Collection
Many crowdsourcing platforms rely on automated sampling via distributed sensors rather than manual observations. But sensors attached to homes, vehicles, and personal devices intrinsically collect information about the behaviors of their owners [3]. Analyzing the time and location patterns in automated sensor data may reveal details about people’s habits, such as their driving routes and schedules.
Sensor Capabilities
Not only can automated sensors reveal location information, but many also have expanded technical capabilities that capture additional details about individuals and their surroundings. For example, air quality sensors can now use laser-based systems to analyze particulate matter, revealing specific chemicals and pollution sources [4]. More advanced sensors measuring environmental data may have the sensitivity to detect things like conversations and activity levels, straying into surveillance.
Secondary Data Uses
One of the biggest privacy concerns around crowdsourced data is that they may be used for purposes beyond the original environmental monitoring goals. For example, air pollution data could be used to infer health conditions or transit patterns of specific individuals. Biodiversity data could help marketers profile customers’ yards and gardens. While regulations like Europe’s GDPR restrict secondary uses of personal data, such reuse remains a risk.
Data Security
Crowdsourced environmental data are often managed by third parties such as research institutions and technology companies that may have varying data protection standards. Server breaches or unauthorized sharing remain risks, potentially exposing sensitive details about individual contributors against their wishes. More open public databases, while allowing wide access to the aggregate data, also pose security issues.
Lack of Informed Consent
Many crowdsourcing participants may not fully understand or consent to the potential privacy risks. Platforms often have participants tacitly agree to terms of service that grant broad usage rights. But the long-term privacy implications of sharing environmental data may be obscured or downplayed. This violates principles of informed consent.
Strategies to Mitigate Privacy Risks
Crowdsourcing environmental data provides invaluable scientific and social benefits that should not be lost due to privacy concerns. Platforms can employ several strategies to gather the data they need while protecting individuals’ privacy:
- Aggregating data to broader spatial scales (e.g. neighborhood rather than household) or reducing precision of time stamps can help anonymize contributors.
- Allowing users to limit the purposes for which their data are collected, and to opt in or out of specific uses, gives them more control over privacy impacts.
- Clear terms of service and consent processes explain to users exactly what data are collected and how they will be used.
- Automated sensor data can be processed locally on devices so that only the necessary environmental metrics are extracted and transmitted, limiting the extraneous details collected.
- Enabling anonymous contribution without requiring account sign-ups avoids linking data to specific identities altogether.
- Secure data management protocols, encryption, access controls, and oversight within hosting organizations limit risks of exposure from security breaches.
- Following the Fair Information Practice Principles (FIPPs) and GDPR guidelines provides a model for respecting privacy when collecting and handling crowdsourced information.
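The first strategy in the list above, coarsening location and time before data are published, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the record fields (`contributor`, `lat`, `lon`, `time`, `pm25`), the two-decimal grid, the hourly time bucket, and the minimum of three contributors per published cell are hypothetical choices, not the practice of any particular platform. Coarsening of this kind reduces identifiability but does not by itself guarantee anonymity.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def coarsen(lat, lon, timestamp, decimals=2):
    """Round coordinates to a grid cell and truncate the time to the hour."""
    cell = (round(lat, decimals), round(lon, decimals))
    hour = timestamp.replace(minute=0, second=0, microsecond=0)
    return cell, hour


def aggregate(readings, min_contributors=3, decimals=2):
    """Average readings per (cell, hour) and drop sparsely populated groups.

    Groups with fewer than `min_contributors` distinct contributors are
    suppressed, because a value from one household is easy to trace back.
    """
    groups = defaultdict(lambda: {"values": [], "contributors": set()})
    for r in readings:
        key = coarsen(r["lat"], r["lon"], r["time"], decimals)
        groups[key]["values"].append(r["pm25"])
        groups[key]["contributors"].add(r["contributor"])
    return {
        key: mean(g["values"])
        for key, g in groups.items()
        if len(g["contributors"]) >= min_contributors
    }


# Hypothetical sample data: three nearby contributors and one isolated one.
readings = [
    {"contributor": "a", "lat": 52.5201, "lon": 13.4012,
     "time": datetime(2024, 1, 1, 8, 5), "pm25": 10.0},
    {"contributor": "b", "lat": 52.5212, "lon": 13.4021,
     "time": datetime(2024, 1, 1, 8, 20), "pm25": 12.0},
    {"contributor": "c", "lat": 52.5193, "lon": 13.4008,
     "time": datetime(2024, 1, 1, 8, 40), "pm25": 14.0},
    {"contributor": "d", "lat": 48.1374, "lon": 11.5755,
     "time": datetime(2024, 1, 1, 8, 10), "pm25": 30.0},
]

published = aggregate(readings)
for (cell, hour), value in published.items():
    print(cell, hour.isoformat(), value)
```

In this sample, the three nearby readings are merged into one hourly average for a shared grid cell, while the isolated reading from contributor `d` is withheld because no other contributor shares its cell. The threshold and cell size trade data resolution against privacy, so a real platform would need to choose them to suit its own contributor density.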
Case Examples of Privacy Protection
Some crowdsourced environmental monitoring programs offer models for how to gather needed data while protecting privacy:
- Luftdaten – This crowdsourced air quality network based in Germany allows anonymous data contribution from home sensors. It aggregates data to a low spatial resolution and makes key metadata, such as exact coordinates, optional. It also has an open data policy.
- Forest Watch – Indonesia’s platform for monitoring deforestation gives users control over positional accuracy of contributed data points to obscure identifying property details. The platform also uses firewalls and access controls to protect data security.
- Citizen Science Taxonomy – Researchers developed a ranked taxonomy for categorizing the potential identifiability and sensitivity of different citizen science projects, helping identify privacy risks.
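Giving contributors control over positional accuracy, as described in the examples above, could be sketched along the following lines. This is a hypothetical illustration and not the implementation of any of the named programs: the level names and the decimal places assigned to them are assumptions. The size estimate relies only on the fact that one degree of latitude spans roughly 111 km.

```python
# Hypothetical precision levels a contributor could choose from,
# mapped to the number of decimal places kept in each coordinate.
PRECISION_DECIMALS = {"exact": 5, "street": 3, "neighborhood": 2, "city": 1}


def obscure_position(lat, lon, level="neighborhood"):
    """Round a coordinate pair to the precision the contributor selected."""
    decimals = PRECISION_DECIMALS[level]
    return round(lat, decimals), round(lon, decimals)


def approx_cell_size_m(level):
    """Rough north-south cell size in meters (1 degree of latitude ~ 111 km)."""
    return 111_000 / 10 ** PRECISION_DECIMALS[level]


# Example: the same observation stored at two contributor-selected levels.
print(obscure_position(-6.20876, 106.84560, "neighborhood"))
print(obscure_position(-6.20876, 106.84560, "city"))
print(approx_cell_size_m("neighborhood"))
```

Showing the approximate cell size next to each option would help contributors understand what a setting actually hides. East-west distances shrink toward the poles, so the figure is only a rough guide.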
Conclusion
Crowdsourcing promises to greatly expand environmental data collection needed for research and policymaking. But the inherent privacy risks must be handled responsibly by data platforms, scientists, regulators, and companies making use of crowdsourced data. With careful design and management, citizen science initiatives can support environmental progress without unduly exposing details about people’s personal lives against their wishes. Achieving this balance will ensure crowdsourcing remains a viable tool for the future.
Next Steps
Round Table Environmental Informatics (RTEI) is a consulting firm that helps clients leverage digital technologies for environmental analytics. We offer free consultations to discuss how we can help you.