Wednesday, February 23, 2011

Iranian scientists at work: an observation

Yesterday, I was invited to present a lecture at the Faculty of Biological Sciences, Shahid Beheshti University, Tehran, Iran. The goal was to give the students some ideas about what I think are the relevant issues regarding biodiversity. Since I have mentored two students through their master's theses over the last year and a half, I decided to complement what I learned from that experience: instead of talking about fieldwork, monitoring design and analyses, I wanted to show them what the big, global issues in this domain are, and why biodiversity monitoring is important. The lecture "Monitoring and Measuring Biodiversity: Some Thoughts" was attended by a large crowd in a full lecture hall, including a delegation from Tehran University.

There were interesting questions afterward and some time to catch up in various settings.

One issue that came up, and that always comes up in discussions, is the increasingly difficult situation active scientists face here in Tehran. One typical issue is the deteriorating relationship between Iranian scientists and their former, often close, counterparts abroad. What they report is that for the last four to six years their colleagues hardly reply to their emails, even in cases where the Iranians supplied tissue or other biological materials for analyses.

Similarly, local scientists complain that publishers in the West do not even reply to submissions of their manuscripts, something these colleagues had not experienced until a few years ago.

The feeling is that foreign scientists are, in effect, adding to the imposed sanctions and to the very negative reports coming out of this country. This attitude is astonishing since, as in the Bush years, there were many objections to an "evil" US government, which was always seen as something distinct from the individual US citizen or scientist. In the case of Iran, this distinction does not seem to apply.

This matches an observation made by almost everyone visiting Iran for the first time: they are astonished by how different their experience is with the people they meet and the institutions they collaborate with, and in fact they leave with a very positive impression.

In my humble view it would be wise to continue these relationships rather than punish colleagues for something they have, in most cases, nothing to do with; all the more so in a place where those colleagues hold the West in very high esteem, very often having spent part of their career there and thus being very familiar with that part of the world.

Tuesday, February 15, 2011

Open Data in Ecology

The current Science Magazine special issue on data includes an article on Open Data in Ecology.

Abstract
Ecology is a synthetic discipline benefiting from open access to data from the earth, life, and social sciences. Technological challenges exist, however, due to the dispersed and heterogeneous nature of these data. Standardization of methods and development of robust metadata can increase data access but are not sufficient. Reproducibility of analyses is also important, and executable workflows are addressing this issue by capturing data provenance. Sociological challenges, including inadequate rewards for sharing data, must also be resolved. The establishment of well-curated, federated data repositories will provide a means to preserve data while promoting attribution and acknowledgement of its use.


This opens again the question of the illusion of simply adding up heterogeneous data sets. What can and cannot be done with legacy data - data that we will keep producing for the next decennium if we have no incentives to overcome existing research practices: being very parsimonious with metadata, and hardly studying what it would take to collect data so that they can grow into a larger data set usable well beyond the individual scientist's own particular interest.
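
To make this concrete, here is a minimal, purely hypothetical sketch in Python (the survey tables, column names and values are all invented) of why parsimonious metadata makes "adding up" legacy data an illusion: counts can only be pooled if the recorded method and effort unit actually agree.

```python
# A hypothetical sketch: two invented survey tables that look combinable
# but are not, because their methods and effort units differ.
import pandas as pd

survey_a = pd.DataFrame({
    "species": ["Parus major", "Turdus merula"],
    "count": [12, 4],
    "method": "point-count",
    "effort_unit": "5-min visit",
})
survey_b = pd.DataFrame({
    "species": ["Parus major", "Erithacus rubecula"],
    "count": [30, 7],
    "method": "mist-netting",
    "effort_unit": "net-hour",
})

def can_pool(*tables):
    """Pooling counts is only defensible if methods and effort units match."""
    methods = {t["method"].iloc[0] for t in tables}
    units = {t["effort_unit"].iloc[0] for t in tables}
    return len(methods) == 1 and len(units) == 1

if can_pool(survey_a, survey_b):
    pooled = pd.concat([survey_a, survey_b], ignore_index=True)
    print(pooled)
else:
    # Without richer metadata, "adding up" these counts would be artefactual.
    print("Incompatible metadata: counts cannot simply be added up.")
```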

Saturday, February 12, 2011

The Origin of Hackers

I stumbled upon Steven Levy's book Hackers: Heroes of the Computer Revolution and read it from cover to cover. It is a really interesting read, and keeping in mind that the excerpt below deals with the hacker scene in 1982, I am stunned to realize how "old" the copyright issue is.

It also makes very clear why the problem surfaced: a switch in the business model.

The Third Generation lived with compromises in the Hacker Ethic that would have caused the likes of Greenblatt and Gosper to recoil in horror. It all stemmed from money. The bottom line of programming was ineluctably tied to the bottom line on a publisher's ledger sheet. Elegance, innovation, and coding pyrotechnics were much admired, but a new criterion for hacker stardom had crept into the equation: awesome sales figures. Early hackers might have regarded this as heresy: all software—all information—should be free, they'd argue, and pride should be invested in how many people use your program and how much they are impressed with it. But the Third-Generation hackers never had the sense of community of their predecessors, and early on they came to see healthy sales figures as essential to becoming winners.
One of the more onerous of the compromises in the Ethic grew out of publishers' desire to protect their sales figures. It involved intentional tampering with computer programs to prevent a program from being easily copied by users, perhaps for distribution without further payment to the publisher or author. The software publishers called this process "copy protection," but a substantial percentage of true hackers called it war.
Crucial to the Hacker Ethic was the fact that computers, by nature, do not consider information proprietary. The architecture of a computer benefited from the easiest, most logical flow of information possible. Someone had to substantially alter a computer process to make data inaccessible to certain users. Using one short command, a user could duplicate an "unprotected" floppy disk down to the last byte in approximately thirty seconds. This ease was appalling to software publishers, who dealt with it by "copy-protecting" disks: altering the programs by special routines which prevented the computer from acting naturally when someone tried to copy a disk. A digital roadblock that did not enhance the program's value to the user, but benefited the seller of the program.
The publishers had legitimate reason to resort to such unaesthetic measures. Their livelihood was invested in software. This was not MIT where software was subsidized by some institution. There was no ARPA footing the bill. Nor was this the Homebrew Computer Club, where everyone was trying to get his hardware built and where software was written by hobbyists, then freely swapped. This was an industry, and companies would go broke if no one bought software. If hackers wanted to write games free and hand them out to friends, that was their business. But the games published by On-Line and Brøderbund and Sirius were not merely paper airplanes of truth released into the wind to spread computer gospel. They were products. And if a person coveted a product of any sort in the United States of America, he or she had to reach into a pocket for folding green bills or a plastic credit card in order to own it.
It drove publishers crazy, but some people refused to recognize this simple fact. They found ways to copy the disks, and did. These people were most commonly hackers.
Users also benefited from breaking disks. Some of them could rattle off a list of rationalizations, and you would hear them recited like a litany in meetings of users' groups, in computer stores, even in the letters column of Softalk. Software is too expensive. We only copy software we wouldn't buy anyway. We only do it to try out programs. Some of the rationalizations were compelling—if a disk was copy-protected, a legitimate owner would be unable to make a backup copy in case the disk became damaged. Most software publishers offered a replacement disk if you sent them a mangled original, but that usually cost extra, and besides, who wanted to wait four weeks for something you already paid for?
But to hackers, breaking copy protection was as natural as breathing. Hackers hated the fact that copy-protected disks could not be altered. You couldn't even look at the code, admire tricks and learn from them, modify a subroutine that offended you, insert your own subroutine . . . You couldn't keep working on a program until it was perfect. This was unconscionable. To hackers, a program was an organic entity that had a life independent from that of its author. Anyone who could contribute to the betterment of that machine-language organism should be welcome to try. If you felt that the missiles in Threshold were too slow, you should be welcome to peruse the code and go deep into the system to improve on it. Copy protection was like some authority figure telling you not to go into a safe which contains machine-language goodies . . . things you absolutely need to improve your programs, your life, and the world at large. Copy-protect was a fascist goon saying, "Hands off." As a matter of principle, if nothing else, copy-protected disks must therefore be "broken." Just as the MIT hackers felt compelled to compromise "security" on the CTSS machine, or engaged in lock hacking to liberate tools. Obviously, defeating the fascist goon copy-protect was a sacred calling and would be lots of fun.
Early varieties of copy-protect involved "bit-shifting" routines that slightly changed the way the computer read information from the disk drive. Those were fairly simple to beat. The companies tried more complicated schemes, each one broken by hackers. One renegade software publisher began selling a program called Locksmith, specifically designed to allow users to duplicate copy-protected disks. You didn't have to be a hacker, or even a programmer, to break copy protection anymore! The publisher of Locksmith assured the Apple world that his intent, of course, was only to allow users to make backup copies of programs they'd legally purchased. He insisted that users were not necessarily abusing his program in such a way that publishers were losing sales. With most publishers guessing that they lost more than half their business to software pirates (Ken Williams, with characteristic hyperbole, estimated that for every disk he sold, five or six were pirated from it), the copy-protection stakes were high.

Thursday, February 10, 2011

Where are you, biodiversity data?

When Dave Thau presented the Google Earth Engine at the TDWG meeting in Woods Hole, I was very sceptical, and still am, about the goal of land-use change detection they showed. I did some work in this field during my time as an NRC fellow at the JPL and was involved in quite a few debates about the use of RADAR versus optical remote sensing data, especially when it comes to creating large mosaics or detecting land-use change. The problem is that the images are taken at different times and seasons, so the optical signal can be very different (not to speak of, say, a dry forest that has lost all its leaves from one shot to the next).

But never mind these thoughts: Google has announced the launch of their system, now called Google Earth Engine; it presents some data and allows you to run your own analyses.

Once more, the question is where our biodiversity observation data are that could take advantage of these new opportunities. We could figure out where biodiversity disappears - but can we? I have not seen a project that makes use of these remote sensing data sets in a way that is more than anecdotal.

Global initiatives like GBIF or the Red List do not champion new field campaigns to gather data that would live up to the analysis tools we now have.

The question then remains: how could we actually make use of it?
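
One way to at least start is sketched below, purely as an illustration: it assumes the public GBIF occurrence web service (api.gbif.org/v1) and the Python requests library, and the taxon name and record limit are arbitrary examples. The idea is simply to pull georeferenced occurrence records that could then be intersected with remote sensing derived change layers.

```python
# A minimal sketch, not a worked-out project: assumes the public GBIF
# occurrence web service and the third-party "requests" library.
import requests

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def fetch_occurrences(scientific_name, limit=300):
    """Return georeferenced occurrence records for one taxon from GBIF."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
    }
    response = requests.get(GBIF_SEARCH, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["results"]

# Occurrence points like these could then be overlaid on a remote sensing
# derived land-use change layer (e.g. from Google Earth Engine) to ask
# where observed biodiversity and detected change coincide.
records = fetch_occurrences("Quercus robur")
points = [(r["decimalLatitude"], r["decimalLongitude"]) for r in records]
print(f"{len(points)} georeferenced records retrieved")
```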

Wednesday, February 09, 2011

Will Knowledge Organisation Systems kill diversity?

The release of GBIF's "Recommendations for the use of Knowledge Organisation Systems by GBIF" made me wonder whether the implicit drive to create the ultimate information system is not a wrong and ultimately damaging approach. Such a system assumes that it models our world properly and, ultimately, that there is only one way to do so, hence the KOS for biodiversity. Essentially this means that anything that cannot be packaged within the ontologies building the KOS cannot be integrated - or, from another perspective, that we begin to look at the world through a very restrictive lens; in German this would be called Scheuklappen, blinkers, the little pieces of leather mounted beside a horse's eyes to avoid distractions.
All this would not really matter if the organisation behind this recommendation (TDWG, the Taxonomic Databases Working Group, more recently renamed Biodiversity Information Standards) did not ultimately strive to create the bioinformatics technology standards that we are all using - or rather will have to use - to become part of what we all hope for, a seamless knowledge space.
I find this even more questionable at a moment when this community cannot even deal with something as trivial as a bibliographic citation, let alone build up a database of all the citations.
Or take the realm of digital taxonomic literature, where on the one hand we encounter something very simple like treatments, but on the other are unable to define what a treatment is - something that is now crucial for creating the schemas, DTDs and similar artefacts needed to model this domain for the purpose of producing semantically enhanced documents. The legacy data show clearly that there is huge variety in how treatments are communicated - yet our goal now is to create a standard, eventually defined in one of the TDWG vocabularies, that ought to be restrictive enough to allow reasoning and machine-enabled tools to help work through the huge amount of data we hope to open up with that move. Maybe one should consider a definition less as something all-inclusive and rather as a concept that allows a lot of flexibility in its application?
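
As a hypothetical illustration of what such a flexible, concept-style definition could look like, here is a minimal sketch using the Python rdflib library to express a "treatment" as a SKOS concept with a definition and a scope note, rather than as a rigid schema element; the namespace and all wording are invented for this example and are not part of any TDWG vocabulary.

```python
# A hypothetical sketch: a "treatment" as a SKOS concept rather than a
# rigid schema element. Requires the rdflib package; the namespace and
# wording are invented, not an existing TDWG vocabulary.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/biodiv-kos/")  # illustrative namespace

g = Graph()
g.bind("skos", SKOS)

treatment = EX.Treatment
g.add((treatment, RDF.type, SKOS.Concept))
g.add((treatment, SKOS.prefLabel, Literal("Treatment", lang="en")))
g.add((treatment, SKOS.definition, Literal(
    "A section of a taxonomic publication that documents a single taxon.",
    lang="en")))
# A scope note keeps the concept open to the variety found in the legacy
# literature instead of forcing every historical usage into one structure.
g.add((treatment, SKOS.scopeNote, Literal(
    "Form and content vary widely across authors, journals and centuries; "
    "the concept is applied flexibly rather than enforced by a strict schema.",
    lang="en")))

print(g.serialize(format="turtle"))
```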

Another point is: knowledge of what, and what do we want to do with it? Bowker and Star put it this way:
Each standard and each category valorizes some point of view and silences another. This is not inherently a bad thing - indeed it is inescapable. But it is an ethical choice, and as such it is dangerous - not bad, but dangerous.

Though the authors have racial classification in mind, I would argue that we need to keep this issue of exclusion in mind.

In other words: what is the limit of what we can do with the system we create? What is the limit of the underlying material when it comes to creating meaningful (as opposed to artefactual) knowledge?

This of course touches upon something very different and very neglected: quality control of our input data.

I wonder whether the authors provide an answer to this. The why and what questions seem to me the main stumbling block for biodiversity informatics in general.

With this in mind, I will read through these new recommendations.

Monday, February 07, 2011

CBD, IPBES and taxonomy

Very quietly and unnoticed by most, at least judging from my channels into the taxonomy community, the Nagoya Protocol on Access and Benefit Sharing (and see comment), adopted on 29 October 2010 as part of the CBD follow-up, now includes an article that reflects the taxonomists' concern to be able to collect with fewer restrictions.

ARTICLE 8
SPECIAL CONSIDERATIONS
In the development and implementation of its access and benefit-sharing legislation or regulatory requirements, each Party shall:
(a) Create conditions to promote and encourage research which contributes to the conservation and sustainable use of biological diversity, particularly in developing countries, including through simplified measures on access for non-commercial research purposes, taking into account the need to address a change of intent for such research;


The protocol must be ratified by 50 parties and would enter into force 90 days after the fiftieth ratification, according to the CBD.