Friday, March 16, 2007

Human authorities vs machine generated wisdom

In my last newspaper article in the NZZ I end with the question

"whether websites that are open to all and have sophisticated quality controls for their facts do not 'know' better than individual specialists what is important and current."

That is, I question whether a human expert ought to be replaced by some kind of artificial system that delivers a synthesis and a metric of how trustworthy a piece of information is. This can cover the synthesis, but could go down to the single facts upon which such a synthesis is based.

Rod, in his iPhylo blog, had an interesting comment about this issue, well worth reading, including Matt Cockerill's post.

I would shy away from saying that no humans are needed, but I do ponder the idea that, for knowledge based on data from Internet-accessible sources and on predefined criteria, a machine might fare better.

This is certainly a hot issue when it comes to measuring a scientist's contribution. With new models of communication such as PLoS ONE, blogs and databases, it needs the attention of those who have to judge scientists for promotion.

Friday, March 09, 2007

EO Wilson recipient of the TED award 2007 (2)

Here are Wilson's wish and an excerpt from Wired's blog on Wilson's lecture at the TED meeting.
Biologist E.O. Wilson followed Nachtwey by saying that he came on behalf of “insects and other small creatures,” to “make a plea for them.” Wilson’s wish: “I wish that we will work together to help to create the key tool we need to inspire preservation of earth’s biodiversity: The Encyclopedia of Life.” As I understand it, this would be a biological Super-Wikipedia, a collaborative project among scientists and amateurs that would contain information about all life on the planet.

“We live on a mostly unexplored planet,” Wilson emphasized. Recent years have seen the discovery of two new kinds of whales, a new kind of elephant, a distinct new kind of gorilla, and more. And on the microscopic (and smaller) scale, the earth is filled with the “dark matter of the biological world,” the bacteria, which are only beginning to be discovered.

“Our lives depend upon these creatures,” Wilson said. He estimated that 500 species of friendly bacteria live symbiotically with us in our mouths and throats, and that they probably fend off pathogenic bacteria. When it comes to species discovery, “Scientists are like explorers in a rowboat launched onto the Pacific Ocean.” (Wilson also allowed that he believes “true aliens,” creatures from outer space, might live among us on earth in the form of a bacterial species, which would have had billions of years to arrive.)

The “human juggernaut” is destroying the earth’s biodiversity, Wilson said, through habitat destruction (“including climate change”); the spread of invasive species, such as pathogenic bacteria and viruses, into every country; pollution; population expansion; and overharvesting, driving species into extinction through over-hunting and –fishing. (Wilson used the opening letter of each of these elements to create the acronym “HIPPO.”) Previous cataclysms of this sort, Wilson said, such as “the last one that ended the age of dinosaurs, took 5 to 10 million years to repair.”

In order to prevent catastrophe, Wilson said, “we need to have the biosphere properly explored.” He called for “a biological moon shot,” a project on the scale of the mapping of the human genome to map and discover the biological code of all of the life on the planet. The project, he said, could transform the science of biology and inspire a new generation of biologists to continue the quest that started for him 60 years ago: “to search for life, to understand it, and finally, above all, to preserve it.”

Comment: I sincerely hope that this latest initiative, together with the Encyclopedia of Life project where he is honorary chair, will fly, and not end up in the same debacle as the last effort, the ALL species project. I hope a governance model will be chosen that supports the many data providers rather than using them as a mere source, and one that strengthens existing global initiatives like the Global Biodiversity Information Facility rather than competing with them; one that involves, due to its global nature, the entire community and not just Anglo-Saxon specialists, which seems to be the case right now in the EOL informatics part.

Wednesday, March 07, 2007

A thought about the copyright of legacy publications

At last week's (March 1, 2007) „Open Access – From the principles to the implementation“ meeting, organized by the Swiss Academy of Humanities and Social Sciences (summary), Bas Savenije from Utrecht University made a very interesting comment regarding copyright of pre-digital articles.

1. Before the advent of the Internet, nobody signed any contracts regarding digital copies and their distribution.
2. The large publishers, e.g. Elsevier or Blackwell (now Wiley), scan all their back issues without asking the individual owners, that is, the authors.
3. Since they do not ask the authors, and we authors did not sign a corresponding contract, we do not need to ask the publishers for permission to use digital versions of our publications and disseminate them over the Internet.

This essentially limits publishers' copyright claims on articles from the period before roughly 1996, or before the first digital publication appeared, a date that differs for each journal.

According to Peter Suber, “When this question came up in the US (for journalists, not scholars), the Supreme Court decided in favor of the authors. That is, if their old contracts didn't mention electronic rights, then the authors didn't transfer the electronic rights to the publisher”.


Tuesday, March 06, 2007

If humans can't, at least machines talk to each other (2):

Future plans ...

The comment by Piotr Naskrecki (Director of Invertebrate Conservation at Conservation International) about the future development of the proprietary data in their recent ant catalogue CD-Rom reminded me of a blog post I wrote last June about the TEAM initiative at Conservation International.

When I took a closer look at their site, which is based on quite some work on invertebrates, there was still no data accessible, just an announcement that there will be.

TEAM is part of Conservation International, one of the lead institutions in the Conservation Commons, which signed up to make their data openly accessible.

A statement on making data accessible in another Harvard University Press based project with Piotr Naskrecki's involvement, Wilson's Pheidole, has been up in the air since August 2003 with the same promise of open access. Actions and access count, not promises.

Monday, March 05, 2007

If humans can't, at least machines talk to each other

The Taxonomic Impediment, one of the main reasons why biodiversity has almost vanished from the palette of environmental issues, really has two main ingredients: we do not know most of the species, nor can we find and identify most of those known to science; and measuring their abundance is extremely complicated and thus expensive. The latter will not be discussed here; the former, the charting and identification of species, will.

For the better-known groups, such as the feathery, furry, scaly and flowering species, tools are out there to find out which species are known, increasingly how to identify them, and where they live. I am aware that this is an optimistic view, though.

But these groups are far from representing the bulk of the ca. 1.5 to 1.8 million species, which include groups such as the ants, a single family of insects with 12,000 species alone.

With the recent publication of “Bolton’s Catalogue of Ants of the World: 1758-2005” (CD-Rom, Harvard University Press, USD45), the authors claim that “There is no longer an excuse for nomenclatural mistakes, since all past decisions are recorded here.” The authors must indeed be convinced of their infallibility to publish a CD-Rom based on a Filemaker extension that does not allow entering, correcting or even exporting any data from their CD-Rom.

Although Shattuck points out some sources of errors in his review of this database, he simply ignores that there is a vibrant Web-based ant systematics community out there, and that ant names have for more than four years now been part of the body of names feeding into global efforts to finally build a list of the world’s species (e.g. Species2000, ITIS, etc.), and that these names are widely used. There are not only names out there: unlike the citations on Bolton’s CD-Rom, all the citations are linked to a digital library including over 4,100 publications (excluding such copyrighted works as Wilson’s Pheidole of the World, printed like the CD at Harvard University Press), a feature used by the authors of the new CD-Rom to extract information from legacy publications.

But there is no acknowledgment on the CD-Rom, which might not have been created in such a short time had the publications had to be looked up in a library, nor has there been any feedback on errors found or on missing publications. This is even though a Creative Commons licence in antbase states that this work can be used under the following conditions: attribution, non-commercial, and share alike. Antbase itself is based on Bolton’s first catalogue, published in 1995. But in antbase, every taxon name has an acknowledgment of the original source. Bolton and Harvard University Press explicitly did not want to make the catalogue of the ants of the world open access, a policy still pursued with the publication of a stand-alone application costing USD45.

Does Shattuck’s view of the closed (CD-Rom containable) world really hold? Even if there are errors and omissions in antbase, we can now easily correct them because of this CD-Rom. At the same time, we are continually adding new names (see e.g. 2006, 2007), or combinations of names we discover whilst making legacy publications machine readable, and thus anybody can get all the data from the Web.

What about all those other ant communities on the web, nicely summarized by Verhaagh and Klingenberg? What about institutions like GenBank using the antbase/Hymenoptera Name Server names as one of their references to link gene sequences to names? What about the others using antbase as their taxonomic reference? What about ants helping to shape the discussion of the future of taxonomy on the web? The value of a catalogue effectively extends to a much wider audience (see the red dots on the map) than the specialist taxonomists themselves. Most of the latter are from the developing world, and are not able to pay for it, even though most of the data originates from their countries (see copyright = biopiracy?).

The good emerging property of the Web is that we no longer have to depend on these secretive and authoritative individuals and groups who want to control and sell their knowledge. Luckily for most of us, machines do not care about self-declared authorities; they just ignore them, because they are not found.