Is it Time to Open Up to Open Data?
Open data has the potential to radically change the face of research. The ability to access and share freely available data, without high costs or copyright restrictions, could be invaluable for generating new ideas, increasing the transparency of research, and developing research in fields ranging from city planning to public health to the physical sciences. Recently, Google introduced Dataset Search, a tool specifically for finding open data sources [1], which will make it much easier for academic researchers and curious citizens alike to identify and explore these sources. With this, could science be on the brink of a revolution?
The idea of sharing data freely might seem, on the surface, to be a cause for concern. With data breaches regularly making headlines, it can seem that data is something we should want researchers, governments, and commercial businesses to keep to themselves. But open data is not the same thing as personal data. In fact, some open data sources have nothing to do with people at all: they can be anything from weather records to government spending or even box office figures. For open data research to work, individual-level information need not be searchable by anyone. What open data research entails is the sharing of aggregate data once it has been collected and cleaned (checked for errors and missing values, with variables converted into a format that can be analysed). Take pedometer data as an example: a researcher might want to investigate whether the population of a city has a higher average step count following changes to infrastructure, such as the opening of a new pedestrianised zone. To answer this, the researcher does not need to know that you personally took only 204 steps one Sunday last month because you were too hungover to leave the flat; they only need aggregate-level data and trends. The results could then help to inform future city planning decisions based on apparent changes in use. This shows how open data can open up countless opportunities for observational research.
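To make the distinction between individual-level and aggregate data concrete, here is a minimal Python sketch of the pedometer example. The records, participant IDs, and the `clean` and `aggregate` helpers are all hypothetical. Cleaning is reduced to dropping missing values, whereas a real release would involve far more (for instance, checking that no group is small enough to identify anyone). Only the per-period summary would be shared as open data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (participant_id, period, daily_steps).
# These stay with the original researcher and are never published.
raw_records = [
    ("p01", "before", 6200),
    ("p02", "before", 204),   # the hungover Sunday
    ("p03", "before", 8100),
    ("p01", "after", 7400),
    ("p02", "after", 5900),
    ("p03", "after", 9000),
    ("p04", "after", None),   # missing value
]


def clean(records):
    """Drop records with a missing step count (a minimal stand-in for real data cleaning)."""
    return [record for record in records if record[2] is not None]


def aggregate(records):
    """Collapse individual records into per-period summaries; participant IDs are discarded."""
    steps_by_period = defaultdict(list)
    for _participant_id, period, steps in records:
        steps_by_period[period].append(steps)
    return {
        period: {"n": len(steps), "mean_steps": round(mean(steps), 1)}
        for period, steps in steps_by_period.items()
    }


# Only this aggregate summary would be released as open data.
open_dataset = aggregate(clean(raw_records))
print(open_dataset)
```

Run as written, this reports a mean of 4834.7 steps before and 7433.3 after, each based on three participants. The 204-step day is folded into the average and never appears on its own, and the record with a missing value is dropped before aggregation.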
More robust experimental designs also have much to gain. Randomised controlled trials (RCTs) are considered the gold standard in medicine for determining whether a new drug or vaccine is effective. They are expensive to run, so researchers are not always willing to share the data they have gathered after investing so much time and money. However, hiding clinical trial data can undermine the ability of policymakers to make informed decisions when approving new drugs [2]. Beyond this, open data could be invaluable in our current political climate: with distrust of experts rife, transparency in how we conduct our research is more vital than ever. The public may be less inclined to believe that a new drug or vaccine is safe if the data that prove this are locked up behind Big Pharma’s doors.
The AllTrials campaign, spearheaded by Ben Goldacre, calls for clinical trial data to be disseminated more widely. Specifically, the campaign asks that every trial that has been conducted be recorded on a trial register, regardless of its findings; that a summary of its results be published in a journal; and that a more detailed report of its methods and results be made accessible, so that the methods used by the researchers can be critiqued or replicated. AllTrials does not expect individual-level patient data to be published, as this can risk breaching patient confidentiality even after it has been anonymised. It seems obvious that the first two points should already be standard practice, but worryingly, this has not been the case so far: only 50% of EU studies have reported their results [3]. Journals have often been less willing to publish papers with negative results. They’re “less sexy”: you’re not likely to win a Nobel prize just for finding out that a drug you spent years working on was no more effective than the current treatment. There’s also a concern that drug companies may suppress negative results about a new drug and publish only results that are favourable to them. Whatever the root cause, this has led to a huge reporting bias, as trials with positive results are twice as likely to be published as those with negative results [4]. If a drug is trialled five times and found to have an effect on only one occasion, and that one occasion is the only result that makes it into the public consciousness, then we haven’t really learned anything at all. It is impossible to say with any certainty that a treatment is effective if we do not have access to all of the available evidence.
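The arithmetic behind reporting bias can be sketched in a few lines of Python. The function below is a hypothetical illustration, not a model taken from the cited studies: it computes what share of the published trials report a positive result, given the true share of positive trials and the probability that each kind of trial gets published. The publication probabilities in the milder case (3 in 5 for positive trials, 3 in 10 for negative ones) are assumptions chosen only to reproduce the 2:1 ratio mentioned above.

```python
from fractions import Fraction


def published_share_positive(true_share, p_publish_positive, p_publish_negative):
    """Share of published trials that report a positive result.

    true_share: fraction of all trials that actually found an effect.
    p_publish_positive / p_publish_negative: assumed probability that a
    positive / negative trial gets published.
    """
    published_positive = true_share * p_publish_positive
    published_negative = (1 - true_share) * p_publish_negative
    # Assumes at least some trials get published (otherwise the denominator is zero).
    return published_positive / (published_positive + published_negative)


# The five-trial example: one trial in five finds an effect.
true_share = Fraction(1, 5)

# Extreme case: only the positive trial is published.
only_positive_published = published_share_positive(true_share, 1, 0)

# Milder case: positive trials are twice as likely to be published (assumed rates).
twice_as_likely = published_share_positive(true_share, Fraction(3, 5), Fraction(3, 10))

print(f"Share of all trials that found an effect: {true_share}")
print(f"Share of published trials, only positives published: {only_positive_published}")
print(f"Share of published trials, positives twice as likely: {twice_as_likely}")
```

With these inputs, only 1 in 5 trials actually found an effect, yet the published record shows an effect in every published trial in the extreme case, and in 1 in 3 when positive results are merely twice as likely to be published. Exact fractions are used so the comparison is not obscured by floating-point rounding.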
Of course, there are sticking points too. The time and costs involved in taking a trial through to the data management and cleaning stage are significant. Some detractors argue that a move toward open data will “stifle discovery”, because no one will be willing to take on this work if the resulting data are freely available, limiting the financial gains that intellectual property would otherwise bring [5]. It’s a fair point given the huge costs and risks involved in investing in trials, but shouldn’t we, as scientists, want to further the knowledge within our subject areas as a whole? And wouldn’t a culture of more open research, rather than an individualistic approach, lend itself to greater innovation? Shifting to an academic climate that puts more emphasis on data citations, so that researchers are credited when others reuse their datasets, would be a great upheaval, but it could help to quell these fears.
Open data has already had an impact in fields beyond medicine. The GovLab initiative collected information on open data projects across the globe and identified four key areas where its influence could already be seen: improving government through increased transparency; empowering citizens by helping them stay informed about developments in their country; creating new opportunities; and helping to solve public problems through novel, data-driven approaches [6]. One such approach was used during the emergency response to the 2011 earthquake in Christchurch, New Zealand. The quake was extremely violent, destroying large swathes of the city and killing almost 200 people. In the aftermath, data sharing and crowdsourcing (gathering information from many individuals and organisations towards a common goal) were used to build the Canterbury Recovery Map, an up-to-date mapping service that directed response teams, reported damaged or blocked roads, and guided residents to emergency supplies. Later, open data was used to create a construction tool to help rebuild urban areas, which is estimated to have saved NZ$4 million in construction costs in the following year [7]. The innovations that stemmed from this tragedy have facilitated further open data research in the country and could help with response efforts to similar disasters globally.
Another open data innovation closer to home is TheyWorkForYou.com [8], a resource that lets UK residents read up on MPs and find out more about their voting records, interests, and speeches in the House of Commons [9]. This is invaluable at election time, as you do not need to spend much time investigating your local MP to find out whether they really do represent your interests. Even those without the time or inclination to take a great interest in politics can be empowered with this information. Greater transparency also means that MPs are more likely to be held accountable, and may give more thought to the lasting consequences of the decisions they make.
From dealing with natural disasters to (hopefully) averting democratic ones, open data has boundless potential. Research thrives when we can critique one another, learn, and grow. Open data will undoubtedly advance the democratic process, social research, and scientific discovery, not with a bang, but through a gradual shift toward greater collaboration, enhanced opportunities, and higher-quality research all round.
This article was specialist edited by Madeline Pritchard and copy-edited by Maisie Keogh.
References
1. Google Dataset Search: https://toolbox.google.com/datasetsearch
2. BMJ: https://www.bmj.com/content/347/bmj.f1880
3. BMJ: https://www.bmj.com/content/362/bmj.k3218
4. BMJ: https://www.bmj.com/content/347/bmj.f1880
5. BMJ: https://www.bmj.com/content/347/bmj.f1881
6. Open Government Partnership, key findings from open data impact case studies: https://www.opengovpartnership.org/stories/how-open-data-changing-world-key-findings-open-data-impact-case-studies
7. Case study on the open data innovations used following the Christchurch earthquake: http://odimpact.org/case-new-zealands-christchurch-earthquake-clusters.html
8. TheyWorkForYou: https://www.theyworkforyou.com/
9. Case study on TheyWorkForYou: http://odimpact.org/case-united-kingdoms-theyworkforyou.html