Caring for and sharing data created by volunteers

In this post, Quentin Groom discusses his recent Commentary paper ‘Is citizen science an open science in the case of biodiversity observations?’

Volunteers are the single largest source of biodiversity observations. Without their work we could not monitor the declines of native species or the spread of invasive species. Their work provides data on the diversity of life, its geography, the migration of animals and numerous other aspects of life. In recent years, online citizen science projects have blossomed, with the general public getting involved in cataloguing collections, transcribing notebooks and collecting specimens. Yet, apart from a few notable exceptions, it is hard to find out where these data are stored, who has access to them and, if they are shared, under which licence. If you contribute to such a project, try it for yourself. You might find a general statement about data sharing, but in the majority of cases you will not find specific details.

*Part of a citizen science team on Sauk Mountain. Photo by Park Ranger (Cascades Butterfly Project Team) (CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)), via Wikimedia Commons.*

Coordinators of citizen science projects are well intentioned and will probably not waste the effort of volunteers. So does their data sharing policy matter? Well, I think it does, for a number of reasons. Volunteers donate their free time and skills to these projects in the expectation that the data will be used and valued. These data are in high demand as evidence to show policy makers that we need to act on biodiversity loss and the global extinction crisis. Furthermore, not all data holders are aware of data standards and the importance of interoperability. I recently collaborated on a paper investigating the licensing of biodiversity observation data. It showed that citizen science data are often shared under more restrictive licences than data from other providers. Licensing is just one aspect of data sharing, but an important one. We need standard licensing so that researchers know which data they can use and under which conditions.

*Recording mountain goat survey results for the high country citizen science project, Siyeh Pass. Photo by GlacierNPS (CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)), via Wikimedia Commons.*

As a biodiversity informatician, I look forward to the day when we have automated workflows that can harvest data from all over the world and output information such as maps and biodiversity indicators. Such systems could alert us to developing problems, such as catastrophic population declines, the emergence of invasive species and outbreaks of wildlife disease. Too often we only become aware of these issues after the window of opportunity has passed. Spotting these trends early will only become possible if barriers to interoperability are removed, and heterogeneous licensing is one of these barriers.

If, like me, you contribute to volunteer recording and citizen science projects, ask yourself how these data are shared. If you don’t like what you find, do not stop volunteering, but make the organization aware of your opinion. I suspect many organizations have not considered these issues and just need your encouragement to change. If you are running a citizen science project, write a data management plan that covers licensing, data sharing, data embargoes and archiving. Then make this plan public to show how seriously you take good data management. Nothing honours a volunteer’s contribution more than making it useful for evermore and for everyone.