BY NICOLE NOGOY
What do New Zealand’s beautiful indigenous birds, the Rifleman and the Kea, have in common with Open Data? Their newly sequenced genomes are among the first in the world to be released under a Creative Commons CC0 waiver.
Science as a whole is progressing rapidly thanks to new technologies and data-driven approaches. As a result, researchers worldwide are experiencing both the benefits and the challenges of “big data”, particularly the technical and cultural issues of handling what some call the “data deluge”. Partly because of the larger volumes of data behind published work, and partly because of a growing reproducibility gap, the number of retracted papers has been rising and continues to grow. Perhaps counter-intuitively, retractions correlate strongly with journal impact factor. One way to address this problem is Open Access to research and data, using Creative Commons Attribution (CC BY) licences for text and CC0 waivers for data, together with data integration and online tools.
I am fortunate enough to work remotely (based in Wellington, New Zealand) as the Commissioning Editor of GigaScience, a journal co-published by BGI, the world’s largest genomics organisation, and the Open Access pioneer BioMed Central. GigaScience publishes open access ‘big-data’ studies from across the entire spectrum of the life and biomedical sciences, with the goal of promoting open science, transparency and reproducibility. Its scope covers the issues involved in producing and handling large-scale biological and biomedical data, and it provides resources and a forum for data producers and the open science community.
As a true Open Access journal, GigaScience publishes all of its textual content (including blogs and open peer-review reports) under a CC BY 4.0 Attribution licence, and all of its data under CC0. This maximises reuse and sets our content free in the commons. It has also allowed us to do great things. For example, our first dataset was the genome sequence of the deadly E. coli strain that spread throughout Germany in 2011 and led to 53 deaths. Releasing this dataset into the public domain allowed researchers around the world to begin investigating the pathogen’s features rapidly. Within days, people were swapping results on Twitter, and all results were archived on GitHub. This pioneering example of crowdsourcing results on Twitter (bloggers asked, ‘is this the first “Tweenome”?’) has since been used as an example for science policy in Europe (and, hopefully, New Zealand). The Royal Society in the UK cited the E. coli CC0 data release and crowdsourcing effort as an example of “the power of intelligently open data”, and highlighted it on the cover of its influential “Science as an Open Enterprise” report.
On World Hunger Day 2014, GigaScience released its largest dataset to date, 3,000 Rice Genomes (13.4 TB of CC0 data), which quadrupled the amount of rice genomic data in the public domain. This exceptionally large dataset was produced by a collaboration between the International Rice Research Institute (IRRI), the Chinese Academy of Agricultural Sciences (CAAS) and BGI, with funding from the Bill and Melinda Gates Foundation. A major goal of the project is to develop resources that will help improve global food security, especially in the poorest areas of the world.
If you thought we were only about genomic data, well, you’re wrong. GigaScience also publishes a variety of other data types, from neuroscience (fMRI and EEG) to large-scale imaging. Our high-resolution 3D micro-CT imaging dataset of an earthworm, the “cyber worm”, was recently featured in Scientific American and represents the future of “big data” cyber-taxonomy and comparative morphology. Because the dataset is Open Access, its stunning high-resolution 3D images, videos and interactive models can also be used as teaching aids and are fantastic resources for understanding worm anatomy, an approach that can help enliven zoology.
Being based in Wellington, I have been able to help GigaScience reach out to Kiwi researchers and promote greater transparency and reproducibility in science through open peer review and open data. We do this to help break through a frustrating cultural hurdle that is particularly prominent in New Zealand, known as the ‘big “I” in science’: the fear of releasing one’s data into the public domain.
Despite the cultural hurdle of many researchers being protective of their data, it was exciting to see two New Zealand researchers from the University of Canterbury, Dr Paul Gardner and Dr Tammy Steeves, involved in the Avian Phylogenomics project. This massive international effort, involving more than 200 scientists, looked at how modern birds evolved after the extinction of the dinosaurs caused by a meteorite impact 66 million years ago. GigaScience and its database, GigaDB, host several large bird genome assembly datasets from this project (including those of the Kea and Rifleman) in the public domain under the CC0 waiver.
Dr Gardner stated in a news piece on Voxy that “ultimately, we want to preserve the genetic diversity of threatened species so they have the ability to adapt to environmental change”, and Dr Steeves said she was “confident the publication of 45 new bird genomes will lead to a surge of conservation genomics research in New Zealand.”
The involvement of Drs Gardner and Steeves in such a high-profile international project is paving the way for a new era of bird conservation and biodiversity research in New Zealand. GigaScience hopes that researchers involved in other New Zealand-based genome projects, such as those for the Tuatara and Kakapo, will be inspired to follow suit.
There will also be vast amounts of (hopefully) open biodiversity data to come, as projects like the New Zealand Genomic Observatory aim to digitise and capture molecular information on all of the terrestrial species in well-defined New Zealand model ecosystems.
Another major goal of GigaScience is transparency, and one of the ways the journal achieves this is through open peer review. We have taken this one step further: in June 2014 we partnered with Publons, a Wellington-based start-up that also happens to be the world’s largest open peer-review platform, to promote the transparency of research and give peer reviewers due credit for their hard work. Because all our peer reviews are available under a CC BY licence, we can promote these efforts, and we are already seeing positive examples of content reuse.
Releasing data, practising more open science and improving transparency can only mean great things for researchers, and they benefit everyone. If you want to know more about Open Access, Open Science, Open Data and Reproducibility, come and join me and the BioMed Central team at the BioMed Central Roadshow in Auckland on February 26, 2015. Registration is free.
Zhang, G; Li, B; Li, C; Gilbert, T; Jarvis, E; The Avian Genome Consortium; Wang, J (2014): Genomic data of the Rifleman (Acanthisitta chloris). GigaScience Database. http://dx.doi.org/10.5524/101015
Zhang, G; Li, B; Li, C; Gilbert, T; Jarvis, E; The Avian Genome Consortium; Wang, J (2014): Genomic data of the Kea (Nestor notabilis). GigaScience Database. http://dx.doi.org/10.5524/101031
Dr Nicole Nogoy is the Commissioning Editor of GigaScience and an Open Access, Open Data advocate.