Open Source Statement
• Subject: Open Source Statement
• From: Bernard Moret
• Date: Wed, 12 Jun 2002 14:16:32 -0600
• User-Agent: Mutt/1.2.5i
I must voice my disappointment at the proposed ISCB statement.
I am a computer scientist who works in computational biology
(specifically, computational phylogeny) and whose specialty is
very high-performance implementations.
I sympathize with the problem caused by the ambiguous definition of
"open source" and the variety of licenses proposed under that label.
However, that is really a very minor problem: ALL open-source software
obeys the critical baseline requirement that the source code for
the software be available to researchers for non-commercial uses
(ISCB level 2 availability).
I strongly urge the ISCB to recommend that level 2 be the MINIMUM
availability level.
Levels 0 and 1 are basically useless for the purpose of research:
they are more suited to static, commercial software than to dynamic
research software. (It should also be noted that releasing the source
for a software product by no means inhibits the commercialization of
said product; examples abound in the Unix community of products available
both as open source -- sometimes without any restriction or under a license
such as GPL that enables anyone to use and modify the software -- and as
successful commercial products, the latter deriving their value from
careful packaging, installation tools, and direct technical support.)
Real progress in developing effective software tools depends critically
on the availability of source code, for at least four good reasons:
A. Performance evaluation.
Binaries are useless for effective testing: one cannot instrument
a binary, one cannot recompile it to run on a new platform, and
one cannot even compare running times with another binary on the same
platform, since so many variables are involved in coding and
compiling. In the rapidly evolving field of experimental
algorithmics, which deals with the assessment of different
software solutions for the same problem, there is ample evidence
that software available only in binary form cannot be evaluated.
Lack of evaluation hurts the software authors (who cannot document
claimed improvements that their package brings to the field), hurts
potential software users (who cannot do "comparative shopping"),
and most of all hurts research (since weaknesses and strengths
cannot be identified and corrected or enhanced).
B. Maintenance and availability.
Much research software is developed in one laboratory and maintained
only as long as its authors continue working on the same project.
The result is that the lifespan of much research software is limited
to a few years. The authors of that software are rarely interested
in porting it to multiple platforms -- it is obviously not a goal of
their research. In contrast, open-source software communities such
as the Linux and GNU communities have demonstrated that, as long as
the source code is available, there are a lot of developers out there
willing to take a hand in maintaining and porting useful code.
The lifetime of community-maintained software has no ceiling;
its robustness exceeds that of any commercial product; and its
availability is unequaled anywhere. (Witness the Linux OS itself,
which runs on just about every platform known to man, from PDAs
through desktops to supercomputers, of any brand and configuration;
it is also infinitely more reliable than any Microsoft product, does not
suffer from a myriad of security issues, is updated almost weekly, etc.)
C. Further development.
Much of what I mentioned under the previous heading immediately
extends to this one. Once source code is available to other
researchers, the number of testers and developers increases
enormously. Researchers at other labs will identify new desirable
functionalities, track down existing bottlenecks or other weaknesses,
and generally contribute to a huge increase in the pace of software
development.
D. Integration.
Binaries cannot be integrated into any new software product, unlike
source code. (Even when the authors of a software package make
source available, but forbid redistribution or packaging, it is
at least possible to add "hooks" into the new product to enable it
to use the package after the user has acquired it independently from
the authors.) We are already seeing major problems of integration
at a time when the field of computational biology is barely out
of its neonatal stage. Such problems will dominate software
development in computational biology within a few years, as they
already do in more mature areas. Writing more Perl scripts is not a
solution, but a temporary patch.
Ultimately, we need either open-source code everywhere, with accepted
standards for data representation and such, or the next Bill Gates-style
monopoly. Both result in a type of standardization, but open-source
does so in a dynamic, energetic, forward-looking way, with constant
stimulation by the developer community, whereas a monopoly resists
change, because it can afford to deliver mediocre products.
The professional association for computer science, the ACM, has called
for open-source products -- and most large software houses and computer
manufacturers (with, of course, the exception of the current Microsoft
monopoly) have joined in that call, as has the National Science Foundation.
The ACM and the IEEE Computer Society have also taken the lead in developing
standards to enable easy interoperation of software packages.
The ISCB should lead the computational biology community in the direction
taken by all modern software research efforts and strongly advocate open
source software by a clear statement that Levels 0 and 1 are not viewed as
contributing to research and that Level 2 is the lowest acceptable level
for purposes of research.
Bernard Moret
Prof. of Computer Science
and of Electr. & Computer Eng.