Open Source Statement
• Subject: Open Source Statement
• From: Bernard Moret
• Date: Wed, 12 Jun 2002 14:16:32 -0600
• User-Agent: Mutt/1.2.5i
I must voice my disappointment at the proposed ISCB statement.
I am a computer scientist who works in computational biology
(specifically, computational phylogeny) and whose specialty is
very high-performance implementations.
I sympathize with the problem caused by the ambiguous definition of
"open source" and the variety of licenses proposed under that label.
However, that is really a very minor problem: ALL open-source software
obeys the critical baseline requirement that the source code for
the software be available to researchers for non-commercial uses
(ISCB level 2 availability).
I strongly urge the ISCB to recommend that level 2 be the MINIMUM
availability level.
Levels 0 and 1 are basically useless for the purpose of research:
they are more suited to static, commercial software than to dynamic
research software. (It should also be noted that releasing the source
for a software product by no means inhibits the commercialization of
said product; examples abound in the Unix community of products available
both as open source -- sometimes without any restriction or under a license
such as GPL that enables anyone to use and modify the software -- and as
successful commercial products, the latter deriving their value from
careful packaging, installation tools, and direct technical support.)
Real progress in developing effective software tools depends critically
on the availability of source code, for at least four good reasons:
A. Performance evaluation.
Binaries are useless for effective testing: one cannot instrument
a binary, one cannot recompile it to run on a new platform, and
one cannot even compare running times with another binary on the same
platform, since so many variables are involved in coding and
compiling. In the rapidly evolving field of experimental
algorithmics, which deals with the assessment of different
software solutions for the same problem, there is ample evidence
that software available only in binary form cannot be evaluated.
Lack of evaluation hurts the software authors (who cannot document
claimed improvements that their package brings to the field), hurts
potential software users (who cannot do "comparative shopping"),
and most of all hurts research (since weaknesses and strengths
cannot be identified and corrected or enhanced).
B. Maintenance and availability.
Much research software is developed in one laboratory and maintained
only as long as its authors continue working on the same project.
The result is that the lifespan of much research software is limited
to a few years. The authors of that software are rarely interested
in porting it to multiple platforms -- it is obviously not a goal of
their research. In contrast, open-source software communities such
as the Linux and GNU communities have demonstrated that, as long as
the source code is available, there are a lot of developers out there
willing to take a hand in maintaining and porting useful code.
The lifetime of community-maintained software has no ceiling;
its robustness exceeds that of any commercial product; and its
availability is unequaled anywhere. (Witness the Linux OS itself,
which runs on just about every platform known to man, from PDAs
through desktops to supercomputers, of any brand and configuration;
it is also infinitely more reliable than any Microsoft product, does not
suffer from a myriad of security issues, is updated almost weekly, etc.)
C. Further development.
Much of what I mentioned under the previous heading immediately
extends to this one. Once source code is available to other
researchers, the number of testers and developers increases
enormously. Researchers at other labs will identify new desirable
functionalities, track down existing bottlenecks or other weaknesses,
and generally contribute to a huge increase in the pace of software
development.
D. Integration.
Binaries cannot be integrated into any new software product, unlike
source code. (Even when the authors of a software package make
source available, but forbid redistribution or packaging, it is
at least possible to add "hooks" into the new product to enable it
to use the package after the user has acquired it independently from
the authors.) We are already seeing major problems of integration
at a time when the field of computational biology is barely out
of its neonatal stage. Such problems will dominate software
development in computational biology within a few years, as they
already do in more mature areas. More Perl scripts are not a
solution, but a temporary patch.
Ultimately, we need either open-source code everywhere, with accepted
standards for data representation and such, or the next Bill Gates-style
monopoly. Both result in a type of standardization, but open-source
does so in a dynamic, energetic, forward-looking way, with constant
stimulation by the developer community, whereas a monopoly resists
change, because it can afford to deliver mediocre products.
The professional association for computer science, the ACM, has called
for open-source products -- and most large software houses and computer
manufacturers (with, of course, the exception of the current Microsoft
monopoly) have joined in that call, as has the National Science Foundation.
The ACM and the IEEE Computer Society have also taken the lead in developing
standards to enable easy interoperation of software packages.
The ISCB should lead the computational biology community in the direction
taken by all modern software research efforts and strongly advocate open
source software by a clear statement that Levels 0 and 1 are not viewed as
contributing to research and that Level 2 is the lowest acceptable level
for purposes of research.
Bernard Moret
Prof. of Computer Science
and of Electr. & Computer Eng.
Whitepaper on Open Source Software in Bioinformatics
• Subject: Whitepaper on Open Source Software in Bioinformatics
• From: Peter Karp
• Date: Thu, 30 May 2002 09:40:36 -0700
Whitepaper on Open Source Software in Bioinformatics
Russ Altman, Stanford University
Phil Bourne, University of California, San Diego
Peter D. Karp, SRI International
Teri Klein, Stanford University
Tandy Warnow, The University of Texas at Austin
This whitepaper discusses issues raised by the recent ISCB statement on
Open Source software in bioinformatics (see URL
http://www.iscb.org/pr.shtml), which we strongly support. Our intent
is to examine the complicated issues behind open-source software in
more depth.
We strongly endorse the notion that bioinformatics software
produced by academic researchers should be available to academic,
government, and commercial users in the bioinformatics and genomics
communities. The community must establish certain minimal
conditions of availability to ensure that the results of publicly
funded research projects are available to both the academic research
community and the commercial sector, to ensure that bioinformatics
software can be scientifically validated, and to reduce confusion in
the availability conditions of different software tools.
However, it is important to recognize that there may be legitimate
reasons for using any of a variety of licenses that satisfy the above
requirements. There is no definitive evidence that any of the
open-source models is either superior (or inferior) to the
alternatives. Therefore, we oppose a REQUIREMENT for open-source
software. Let us be clear that we are not against the use of
open-source software models, which have proved useful in many cases:
we oppose a blanket requirement for open-source models.
The essence of this statement is freedom of choice --- we support the
rights of individual researchers and their institutions to decide the
most appropriate means by which they wish to distribute software they
have generated using grant funds.
1. Ambiguity of the Term "Open Source"
The majority of open-source software is distributed under some form of
open-source license agreement. Many different license agreements are
used by different entities that distribute what they call open-source
software.
As of January 2002, 30 different open source licenses were endorsed by
the Open Source Initiative
(http://www.opensource.org/licenses/index.html), each of which has
different terms and implications for protection of intellectual
property and commercialization. This organization is likely to
endorse additional licenses in the future, and other organizations use
still other license agreements that they label as "open source."
The exact terms of these licenses vary considerably. Some involve
fees for use. Some prohibit redistribution of the software by anyone
except the software author. These variations in license terms have
great implications for the user community.
Therefore, the phrase "open source" has come to be virtually
meaningless, and we henceforth use the term in quotation marks to
highlight its ambiguity. The important question is: what are the
terms of the license agreement that will be used to implement the
open-source concept?
2. Open Source is Not a Silver Bullet
The intent of this statement is not to promote the philosophy that
"open source" is a bad idea, or that "open source" has no merit.
Rather, its intent is to counter the "open source" dogma that "open
source" is a silver bullet that will miraculously cure most of the
difficulties of software development. Although "open source" does
have advantages in some cases, those advantages have in many cases
been over-stated by the proponents of "open source," in a manner that
ignores many of the complexities of software development.
3. Funding Agencies Should not Require "Open-Source" Software
Funding agencies should not literally require "open-source software"
availability on the part of their grantees because of the ambiguity of
the term. Furthermore, funding agencies should not require any
particular "open-source" model. None of its variations are right for
every software project, every scientist, or every institution.
In the majority of cases, the author of a software package and the
author's institution should be free to determine the terms under which
it will be distributed. Small software packages, or packages that
lack particularly sophisticated algorithms, will probably have little
commercial value. In these cases, some "open-source" model may well
be the best choice. An "open-source" model may also be appropriate
for very large software-development projects that span many
institutions, where defining clear intellectual property rights can
become unreasonably complex.
Generally, requirements on distribution licenses for software
developed with support from government funds should be determined
through negotiations between the government and the institution
receiving the support, in concert with existing laws and regulations.
Only in rare circumstances, when the licensing terms are clearly
relevant to the scientific merit of a proposal, should software
licensing terms be considered as part of the peer review process.
The remainder of this section explores different properties of the
"open-source" model.
4. No Fee, Unlimited Redistribution Variation of Open Source
In this variation of "open source," a software package is supplied to
all users for no fee, and no restrictions are placed on the ability of
end users to redistribute the software.
Funding agencies that require bioinformatics software to be
distributed for no fee to all organizations, or that require that
software users must be allowed to redistribute the software source
code, will encourage bioinformatics software tools to become "wards of
the state." Government funding agencies have limited research budgets
that cannot fund the full costs of every worthwhile project. It is
advantageous to the government for some bioinformatics software
packages to be commercialized so that the costs of their further
support and development are no longer borne by the government, thus
freeing government funds for other new projects. (We note that
commercialization is also not a panacea, and that not all
commercialization efforts succeed.)
Companies commercialize software largely because they see a
potential financial reward. Financial rewards usually depend on the
existence of a significant competitive advantage. A company is
unlikely to commercialize a software package if potential customers
can obtain the software for free elsewhere, or if a competitor has
free access to the complete source code of that package, giving the
original company little advantage over its competitors. Other
licensing mechanisms exclude competitors, and protect the competitive
advantage that allows companies to make significant additional
investments in the development of a software package.
If commercial entities are unwilling to take over support and
development of key bioinformatics packages, those packages will become
forever dependent on support by the government, thus decreasing the
funds available for other research projects.
Furthermore, government funding is well known to lack long-term
stability, particularly in young interdisciplinary fields such as
bioinformatics, where reviewing quality is highly variable.
Commercial licensing programs can aid a research group by providing an
alternate revenue stream that can supplement government funding and
buffer gaps in it. Consider the case of the highly regarded
Swiss-Prot database, which lost its funding from the Swiss government
in 1996. The Swiss-Prot project adopted a commercial licensing model
that saved this valuable scientific resource from collapse.
Requiring no-fee or unlimited redistribution conflicts with the US
Bayh-Dole Act, which specifies that recipients of US government
research funding are the owners of copyrightable works (such as
software) that they author. The author of a copyrightable work may
choose the license under which that work is distributed. The Bayh-Dole
Act was enacted precisely because its authors recognized that commercial
development of research ideas requires exactly the sort of competitive
advantage described above. A blanket "open-source" requirement will in
some cases conflict with the individual licensing rules of academic
institutions, putting investigators in the position of mediating
internal legal disputes in order to apply for funding.
Requiring grantees to distribute their software using a no-fee and
unlimited redistribution license would deprive those grantees of the
majority of the potential for being rewarded for their hard work and
ingenuity. This approach could open the door for commercial
organizations to obtain a software package, improve upon it, and
then redistribute the package commercially, thus cashing in on the
hard work of the original author with no benefit to that author.
5. What are the Benefits of Source-Code Availability?
We know of no empirical evidence (such as a scientific study of
software engineering practices) that software projects using an
"open-source" model have a significantly higher probability of success
than projects that distribute software under more restrictive terms
(all other things being equal). Many successful bioinformatics
software tools are made available under licenses that do not include
source code, such as WU-Blast, Phred, Phrap, and EcoCyc.
Although source-code availability can allow users of a software
package to fix bugs in the software or tailor the software to their
own needs, it is worth noting that for a software package of any
significant complexity, many users will be unable to understand the
source code sufficiently well to fix it or customize it. Attempted
fixes often introduce new bugs, and it takes time for the original
authors to test those fixes and re-integrate them into the original
package.
The ability of users to redistribute modified versions of a software
package can lead to the proliferation of multiple incompatible
versions of the software, and to versions that have more bugs than
the original package, thus damaging the reputation of the original
package.
Does source-code availability allow better scientific validation of a
software package? We know of no examples where inspection of the
source code of a software package revealed fundamental scientific
errors (as opposed to implementation bugs in an otherwise valid
method) that were not detected by the reviewers of associated
publications. For the purpose of promoting rigorous scientific
evaluation, source code availability is no substitute for a clear
written presentation of an algorithm. It is typically easier to
validate a program by running it on test cases (which does not require
source code) than by inspecting the source code.
6. Summary
In closing, we do not oppose software licensing in which source code
is made openly available -- we oppose a blanket requirement of
"open-source software." Many successful existing bioinformatics
software packages have not been distributed using some variation of
"open-source" license agreements. Licensing arrangements should be
determined by the software authors on a case-by-case basis. We
oppose government-funded bioinformatics software efforts that do not
make their software available, under some form of license agreement,
to academic, government, and commercial organizations.