Open Data
Open Data is a term for data made openly available without permission or payment barriers; in other words, open access applied to data. Many organizations are working to support this concept, including:
- Science Commons
- PubChem
- ChemSpider
- Microsoft Project for Chemical Data
- Neuroscience Data Sharing Program
Science Commons
Science Commons is an organization that works within current copyright and patent law to promote legal and technical mechanisms that remove barriers to sharing scientific information.
Science Commons was “built on the promise of Open Access to scholarly literature and data” and “identifies and eases key barriers to the movement of information, tools and data through the scientific research cycle.”
Science Commons has also released a protocol for implementing open access data, “lay[ing] out principles for open access data and a protocol for implementing those principles…”
PubChem
PubChem is a freely accessible database that provides information about small molecules. It brings together chemical information with biomedical research and clinical information from numerous public sources. It is a key component of NIH’s “Roadmap”, which is intended to accelerate medical discovery to improve health.
PubChem prevailed in 2006 against an attempt by the American Chemical Society (ACS) to persuade Congress to scale back or defund PubChem. ACS argued that PubChem competed with its own Chemical Abstracts Service (CAS). An NIH staff analysis had shown that PubChem and CAS overlap relatively little in content and differ widely in scope and resources.
ChemSpider
ChemSpider is a chemistry search engine that was “built with the intention of aggregating and indexing chemical structures and their associated information into a single searchable repository … available to everybody, at no charge.” Nature has begun depositing chemical data in ChemSpider.
Microsoft Project for Open Access Repository for Chemical Data
Chemist Peter Murray-Rust reported in December 2007 that Microsoft is funding a repository interoperability project with partners including PubChem (see above), Cornell, Los Alamos National Laboratory, and several other universities. The project involves creating “well-populated molecular repositories with heterogeneous content we (everything from crystallography to Wikipedia chemicals for example)” that will be openly available.
NIH Neuroscience Data-Sharing Program
The National Institute of Mental Health (NIMH), part of the National Institutes of Health (NIH), has launched the NIMH Human Brain Project, which aims to develop neuroscience informatics through “the creation and federation of web-based databases, analytical tools, and computational models to facilitate the open sharing and utilization of primary research data for all of neuroscience.”
The Human Brain Project has proposed several basic considerations for data sharing, including:
• Data produced with public funds should be shared for the public good.
• Data published in journals, which appear as summaries in a non-machine-readable and largely non-reanalyzable form, should be supplemented by open availability of the primary datasets themselves.
• Research efficiency is greatly increased by making research data available for reanalysis and meta-analysis.
• Current NIH policy mandates a statement on data sharing for high-direct-cost grant applications.
