Thursday, November 02, 2006

Open Access

During my research for an article on the value of "open access" I stumbled upon an awkward problem: how can I demonstrate the value of open access, that is, which products or research results exist only because of open access? It seems to be a non-trivial problem, since my colleagues in the open access world, e.g. Peter Suber, don't have a list of examples at hand.

The list below is my dumping ground for such examples, which I hope to expand.

The wisdom of the crowds, or why more brains are more than their sum

Complex problems are more likely to be solved by many independent minds.
An initiative from a rather unexpected circle, the US Republicans, led to the publication of all 48,000 boxes of documents seized since the March 2003 invasion of Iraq, with the argument that "the nation's spy agencies had failed adequately to analyze the documents" and "that a wide analysis and translation of the documents (...) would reinvigorate the search for clues that Mr. Hussein had resumed his unconventional arms program...".
For the moment, the website "Operation Iraqi Freedom Document Portal" has been taken down because some highly sensitive documents were found on it. New York Times, Nov 3, 2006 (U.S. Web Archive is Said to Reveal a Nuclear Guide)

Mashups mix data into global service (Nature 439: 6-7 (2006))

Open Text Mining Initiatives

This editorial from Nature gives an insight plus a couple of links.


Nature 440, 1090 (27 April 2006) | doi:10.1038/4401090a
Machine readability

A publishing initiative seems ready to make text mining simpler.

For many years Tim Berners-Lee, the inventor of the World Wide Web, has dreamed of machines being able to help humans use his creation. This would enable not only sophisticated search tools to hunt for words or phrases, but also for other engines to hunt for meanings and patterns. This 'semantic web' is being pieced together gradually. The latest step forward brings users of the scientific literature closer to that dream by enhancing computer access to the full text of the scientific literature.

Many scientists are used to the idea of data mining: the ability to plunder all the available databases to search not only for relevant nuggets, but also for unexpected combinations of data that reveal — or at least hint at — relationships and mechanisms. They are not so used to the analogous function of mining texts.

But some researchers have made a start. Biologists, for example, have developed software that explores open 'text bases', especially the PubMed database. They scan many publications in order to discover relationships based on phrases or sentences that, when analysed in combination, cumulatively link one object (such as a disease) to another (such as a molecule). At the University of California, Berkeley, the BioText project is being used to explore apoptosis, for example. At the University of Illinois in Chicago, the Arrowsmith software explores the causes of disease. And at the European Bioinformatics Institute near Cambridge, UK, the EBIMed retrieval engine explores protein–protein interactions.
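To make the idea of linking one object to another through co-occurring phrases a bit more concrete, here is a minimal sketch in Python. It is not how BioText, Arrowsmith or EBIMed actually work; the sentences, the term list and the function names (`cooccurrence_counts`, `indirect_links`) are made up for illustration, and the matching is plain substring search.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(sentences, terms):
    """Count how often each pair of terms appears in the same sentence."""
    counts = defaultdict(int)
    lowered_terms = [t.lower() for t in terms]
    for sentence in sentences:
        text = sentence.lower()
        # Naive substring matching; real tools use proper term recognition.
        present = sorted(t for t in lowered_terms if t in text)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return dict(counts)


def indirect_links(counts, start, end):
    """Return terms that co-occur with both `start` and `end`."""
    neighbours = defaultdict(set)
    for a, b in counts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted((neighbours[start] & neighbours[end]) - {start, end})


# Toy data, invented for this example (not taken from any real paper).
SENTENCES = [
    "Disease D is associated with elevated protein P.",
    "Molecule M lowers protein P in cell culture.",
    "Molecule M has no known effect on gene G.",
]
TERMS = ["disease d", "protein p", "molecule m", "gene g"]

counts = cooccurrence_counts(SENTENCES, TERMS)
bridges = indirect_links(counts, "disease d", "molecule m")
print(bridges)
```

No single sentence mentions both the disease and the molecule, yet the two are connected through the protein they each co-occur with. That is the kind of cumulative link that only shows up when many texts can be read by a machine.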

But publishers have yet to develop a standard annotation of their content that allows computers access to the full text. Earlier this month, the Nature Publishing Group launched a preliminary proposal for such a standard. The proposal is not a commercial product but rather a potential service for the community. It is open for comment and is not intended to provide a competitive advantage to us: on the contrary, it will only succeed if adopted by other publishers.

The proposal is the Open Text Mining Interface (OTMI), which was first presented at the Life Sciences Conference and Expo in Boston earlier this month. A description and examples can be found at The proposal would make coded text freely available to all. If all publishers were to adopt this or some similar standard, the entire literature would become accessible for mining.


How does this proposal relate to publishers' various business models? 'Author pays' publishers would be able to use this approach to machine readability and help users find their content more easily. 'Subscriber pays' publishers would follow the Nature Publishing Group in making this version of the full text freely explorable by machines but unreadable by humans. (Charging for machine access across diverse publishers' firewalls would effectively make machine text-mining impossible.) The OTMI approach to encryption is to jumble up sentences, retaining semantic relationships as far as possible.
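As a rough illustration of what "jumbling up sentences" could mean, here is a small Python sketch. It is only my guess at the general idea and not the actual OTMI format: the sentence splitter is naive, and the sample text and the function names (`split_sentences`, `jumble`, `word_counts`) are invented for this example.

```python
import random
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def jumble(text, seed=None):
    """Return the sentences of `text` in shuffled order."""
    sentences = split_sentences(text)
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    rng.shuffle(sentences)
    return sentences


def word_counts(sentences):
    """Bag-of-words counts over a list of sentences."""
    words = re.findall(r"[a-z0-9']+", " ".join(sentences).lower())
    return Counter(words)


# Invented sample text, standing in for an article body.
ARTICLE = (
    "Protein P binds receptor R. "
    "The binding triggers apoptosis. "
    "Blocking receptor R prevents cell death."
)

original = split_sentences(ARTICLE)
jumbled = jumble(ARTICLE, seed=1)

for sentence in jumbled:
    print(sentence)

# The vocabulary statistics a text miner relies on are unchanged.
print(word_counts(original) == word_counts(jumbled))
```

Each sentence stays intact, so relationships expressed within a sentence can still be mined, and the word counts are identical to those of the original. What is lost is the order of the sentences, so anything that depends on which sentence follows which no longer works.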

Critics will point out that this limits the machine readability too; for example, some proximity searching becomes impossible. But the subscriber-pays model is strongly supported in the marketplace. OTMI represents a potential compromise between business needs and open access. Nature and its publishers welcome feedback about this initiative, which should be sent either to or to the above-mentioned blog.

