ISMB 2012 features Special Session presentations throughout the conference July 15 - July 17. Special Sessions have the purpose of introducing the scientific community to relevant scientific issues and topics that are typically not within the focus of the conference. Preliminary program information on Special Sessions is noted below.
(Schedules subject to change)
Special Session 1: Biological Complexity through Model Organisms - CANCELLED
Special Session 2: Modeling Infectious Disease Processes
Special Session 3: The NIH National Centers for Biomedical Computing (NCBC)
Special Session 4: Bioinformatic Integration of Diverse Experimental Data Sources
Special Session 5: Celebrating Science Together: ISMB 20 Years and NIGMS 50th anniversary special session
Special Session 6: Computational methods for elucidating nuclear structure and dynamics
Special Session 7: Harnessing community intelligence for bioinformatics
Special Session 1:
Biological Complexity through Model Organisms --- CANCELLED
Special Session 2:
Modeling Infectious Disease Processes
Jason McDermott, Pacific Northwest National Laboratory, Richland, United States
Katrina Waters, Pacific Northwest National Laboratory, Richland, United States
Date: Sunday, July 15
Start Time: 2:30 p.m. – End Time: 4:25 p.m.
Infectious disease is one of the most formidable problems being faced in the world today. Recent emergence of new pathogens, antibiotic resistant strains of existing pathogens, and re-emergence of diseases thought to be eliminated have pointed out the importance of research to understand infectious diseases. The threat of pandemic from a novel strain of influenza or other viruses is ever present. Understanding the mechanisms of pathogenesis and the complex interplay between host and pathogen is extremely important but in large part has eluded scientific investigation. Pathogens have smaller genomes than their hosts, are present in far larger numbers than their hosts and evolve much more rapidly. Host defense systems are sophisticated and they have the ability to respond to a wide variety of pathogens. The interaction between host and pathogen is complex and involves many iterative responses on both sides in a struggle for survival. New insights into host-pathogen interactions will allow development of novel therapeutic strategies to combat disease.
The development of high-throughput data acquisition methods has prompted an explosion of systems-level data on host-pathogen systems that requires sophisticated and multi-faceted computational modeling approaches to interpret. There is a vast amount of knowledge about pathogens, hosts and their interactions in the form of scientific literature and a myriad of databases. Organization and interpretation of this information can allow a broader understanding of infectious disease processes. The interaction between host and pathogen is nuanced and complicated. Understanding these interactions requires systems biology approaches that aim to elucidate the relationships between components in the system and how these relationships give rise to higher order phenotypes such as pathogenesis.
Addressing these complicated issues requires sophisticated computational modeling approaches that involve statistical and mathematical models, high-throughput data analysis, integration of diverse forms of data, and working closely with the bench biologists, clinicians and epidemiologists who are investigating infectious disease. The community represented at ISMB is perfectly positioned to confront these problems. Our proposed session would highlight key approaches taken by leading investigators to answer crucial questions related to infectious diseases. It would serve to spur discussion in the computational biology community about this important problem.
Part A: Host Response Networks Against Influenza Infections: Virus Attack and Host Defense
(2:30 p.m. - 2:55 p.m.)
Speaker: Christian Forst, UT Southwestern Medical
Pathogens and hosts engage in an endless battle, each trying to out-compete the other. Pathogens (viruses in our case) attempt to hijack host processes for their advantage and for replication, whereas the host tries to defend against such attacks. We have employed two classes of whole-genome siRNA screens to identify (i) host and resistance factors for viral replication, as well as (ii) processes for host cell survival and death. Together with biochemical network information we were able to identify distinct cellular factors and processes with respect to four distinct phenotypes: (i) resistance factors required for cell survival, (ii) restriction factors inducing cell death, (iii) host factors required for infection, and (iv) host factors that specify death in response to infection. Thus, we are not only able to pinpoint processes of the viral life cycle, but also to identify host-defense pathways with diagnostic and therapeutic potential.
Part B: Reconstructing the Regulatory Network of TB: Deconstruction of the Hypoxic Response
(3:00 p.m. - 3:25 p.m.)
Speaker: Elham Azizi, Boston University, United States
We have generated the first genome-scale model of the M. tuberculosis regulatory network and combined this network with the first comprehensive profiling of mRNA, proteins, metabolites and lipids in MTB during hypoxia and re-aeration. We have developed a high-throughput system based on ChIP-Seq for comprehensively mapping regulatory binding, and integrated this with expression data from the induction of the same factors. Our method allows us to map DNA binding of all MTB regulators in a consistent and comparable manner independent of regulatory function. Using this method we have reconstructed a regulatory network model based on over 50 transcription factors. The network doubles the number of regulators whose interactions have been studied in MTB, discovers thousands of interactions and assigns functions to a substantial number, suggests many more potentially functional interactions for even well-studied regulators, and displays predictive power for gene expression.
The network model also reveals a direct interconnection between the hypoxic response, lipid catabolism, lipid anabolism and the production of known immunomodulatory lipids, and protein degradation. Consistent with this, we observe substantial alterations in lipid, amino acid, and protein content in response to oxygen availability. The regulatory network provides insight into the transcription factors underlying these changes. Using our regulatory network data, generated under independent normoxic conditions, we are able to generate models of steady-state gene expression that allow us to predict MTB gene expression during hypoxia and re-aeration.
Part C: Informatics-Enabled Microbe-Host-Environment Interactions (3:30 p.m. - 3:55 p.m.)
Speaker: Bruno Sobral, Virginia Bioinformatics Institute, United States
Part D: Systems Biology of Infectious Disease
(4:00 p.m. - 4:25 p.m.)
Speakers: Jason McDermott, Pacific Northwest National Laboratory, United States
Katrina Waters, Pacific Northwest National Laboratory, United States
The study of infectious disease and the complex interplay between pathogens and their hosts has benefitted greatly from the ability to generate many different high-throughput measurements of systems, including transcriptomics, proteomics, and metabolomics. Ways to represent, interpret, and model such multimodal datasets allow improved understanding of the host-pathogen relationship at a systems biology level. We present recent results from systems biology studies of bacterial enteropathogens, Salmonella Typhimurium and Yersinia pestis, as well as respiratory viruses, influenza H5N1 and SARS coronavirus, interacting with their hosts. We will describe the use of network-based approaches to interpretation of high-throughput data and prediction of important components of the system, including experimental validation of some of these predictions. We will also describe how predictive modeling approaches can be used to model important aspects of the interaction and provide predictions of control points for pathogenesis and host response. Finally, we will discuss critical gaps that exist in the systems biology study of infectious diseases and future directions to address those gaps.
Special Session 3:
The NIH National Centers for Biomedical Computing (NCBC)
Peter M. Lyster, National Institutes of Health, Bethesda, United States
Jen Villani, National Institutes of Health, Bethesda, United States
Date: Monday, July 16
Start Time: 10:45 a.m. – End Time: 12:40 p.m.
The National Centers for Biomedical Computing (NCBC) are cooperative agreement awards that are funded under the NIH Common Fund. The Special Session will highlight the computational scientific achievements of the NCBC program as well as provide a full palette of software and resources that can be obtained from the individual Centers as accessed through the main portal http://www.ncbcs.org/. There are eight funded Centers that cover biophysical modeling, biomedical ontologies, information integration, tools for gene-phenotype and disease analysis, systems biology, image analysis, and health information modeling and analysis. The Centers will create innovative software programs and other tools that enable the biomedical community to integrate, analyze, model, simulate, and share data on human health and disease. Each Center has Cores that are focused on (i) biomedical computational science and (ii) driving biological projects (DBPs) whose intent is to drive the interaction between computational and biomedical computational science. There are numerous efforts in education and training that emanate from the Centers and there is an annual all-hands meeting. In addition to the Centers, the NIH and other government agencies have a number of funding announcements that are summarized in the Biomedical Information Science and Technology Initiative (BISTI) Funding Page http://www.bisti.nih.gov/funding/index.asp. The funding page summarizes all initiatives across Government that have a significant component of biomedical computing. This also includes a program for Collaborations with National Centers for Biomedical Computing. The Special Session at ISMB will include three talks from our PIs and a panel that combines representatives from all eight NCBCs and NIH representatives from the Project Team.
Part A: Biomedical Imaging under the NCBC Program
(10:45 a.m. - 11:10 a.m.)
Speaker: Ron Kikinis, Brigham and Women's Hospital
A number of the NCBC centers focus on biomedical imaging. In particular, the National Alliance for Medical Image Computing (NA-MIC) researches computational tools for the analysis and visualization of medical image data, while the Center for Computational Biology (CCB) is focused on the development of computational biological atlases of different populations, subjects, modalities, and spatio-temporal scales. Several of the other centers have imaging-related aims, including the iDASH and Simbios Centers.
Part B: Computational Biology and Systems Science under the NCBC Program
(11:15 a.m. - 11:40 a.m.)
Speaker: Russ Altman, Stanford University, United States
Part C: Biomedical Informatics under the NCBC Program (11:45 a.m. - 12:10 p.m.)
Speaker: Mark Musen, Stanford University, United States
Three of the National Centers for Biomedical Computing concentrate on infrastructure to support the management of biomedical data sets. Informatics for Integrating Biology and the Bedside (i2b2) offers support for warehousing clinical and biomedical data to assist clinical and translational research. Integrating Data for Analysis, Anonymization and Sharing (iDASH) concentrates on methods to enable the secure distribution and analysis of private health information. The National Center for Biomedical Ontology (NCBO) maintains a repository of more than 300 biomedical ontologies and offers services that enable the use of those ontologies for data analytics and biomedical research. We will discuss the Centers' technology and case studies of how that technology has been used to support work in biomedical informatics.
Part D: Panel for NCBCs: (12:15 p.m. - 12:40 p.m.)
Isaac Kohane, Brigham and Women's Hospital; Ron Kikinis, Brigham and Women's Hospital; Andrea Califano, Columbia University; Brian Athey, University of Michigan; Mark Musen, Stanford University; Russ Altman, Stanford University; Arthur Toga, University of California at Los Angeles; Lucila Ohno-Machado, University of California at San Diego; and members of the NIH Project Team
Special Session 4:
Bioinformatic Integration of Diverse Experimental Data Sources
Kyle Ellrott, University of California Santa Cruz, United States
David Haussler, University of California Santa Cruz, United States
Artem Sokolov, University of California Santa Cruz, United States
Josh Stuart, University of California Santa Cruz, United States
Date: Monday, July 16
Start Time: 2:30 p.m. – End Time: 4:25 p.m.
Modern genomics has seen the advent of different technologies for high-throughput biological analysis. Technologies for RNA sequencing, methylation analysis, expression arrays, whole genome sequencing, pathway analysis, proteomics and protein structure analysis can all produce a variety of data that can help understand biological systems. Frequently, data from different types of high throughput experiments are treated as separate silos, with researchers spending most of their time inspecting a single type of data and only occasionally tapping another data source for validation. However, by integrating diverse sources of data, relevant information not detectable from a single source may become more pronounced and visible.
We propose to organize a special session that will address the current successes and challenges in the development and application of integrative approaches to understanding biological systems and translating results to clinical insights. Our panel of experts will present some of the current cutting-edge ideas in this area.
Part A: New Opportunities and Challenges in Network Biology (2:30 p.m. - 2:55 p.m.)
Speaker: Trey Ideker, UCSD School of Medicine, La Jolla, United States
Seminal contributions to the integration of molecular interactions and phenotypes, investigating host-pathogen interactions, inferring systems evolution through network-network alignment, and uncovering key mechanisms through dynamic network analysis.
The past decade has seen an explosion in “genome-era” technologies which profile genes, proteins, metabolites and the intricate web of interconnections among them. Much of bioinformatics and functional genomics is now focused on methods to assemble these diverse measurements into models of functional networks and pathways within the cell. As important, these pathway maps will provide essential information as doctors struggle to interpret the flood of genetic and clinical data that can now be collected for a patient.
We are working in several areas that we believe will be critical for assembling network models and for using them in a clinical setting:
1. Mapping the genetic network underlying the response to DNA damage. Failure of cells to respond to DNA damage is a primary step in the onset of cancer and is a key mechanism of environmental toxicity. Consequently, cells have evolved complex repair and stress responses that are highly conserved across the eukaryotic kingdom, from yeast to humans. We will describe our ongoing efforts to apply ChIP-sequencing and synthetic-lethal screens to map how the cell's transcriptional network is remodeled by DNA-damaging conditions.
2. Network-based biomarkers for disease diagnosis and personalized medicine. Genetic biomarkers are typically thought of as individual genes and proteins, for example using prostate-specific antigen (PSA) as a marker for prostate cancer. Recently, we have shown that networks can also serve as powerful biomarkers and in many cases are more predictive than any individual gene. Our approach is to project gene and/or protein expression profiles of each patient onto the known human genetic network map to identify pathways that are predictive of disease. This “network-based” biomarker approach has shown improved accuracy in diagnosis of breast and lung cancer as well as NF-kB activation state.
3. Protein network comparative genomics. We are developing a library of standard approaches for comparing protein interaction networks across species, conditions, and network types. We will describe network comparisons to study the protein interaction network of Plasmodium, the pathogenic protozoan that causes malaria, which surprisingly is quite divergent from other known networks. We are also working with Dr. Sumit Chanda at the Burnham Institute to identify protein networks essential for HIV infection and how these differ from RNA and DNA viruses.
For professional distribution of our network-based technologies, we are developers of the Cytoscape platform, an Open-Source software environment for visualization and analysis of biological networks and models (http://www.cytoscape.org/).
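As a rough illustration of the "network-based biomarker" idea in item 2 above, the sketch below scores each pathway for one patient by averaging the expression of its member genes. This is only a minimal sketch under stated assumptions, not the speaker's published method: the function name `pathway_activity`, the gene and pathway names, and the choice of a plain mean as the aggregation are all hypothetical.

```python
from statistics import mean


def pathway_activity(expression, pathways):
    """Score each pathway by the mean expression of its measured genes.

    expression: dict mapping gene name -> expression value for one patient.
    pathways:   dict mapping pathway name -> list of member gene names.
    Pathways with no measured genes are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        measured = [expression[g] for g in genes if g in expression]
        if measured:
            scores[name] = mean(measured)
    return scores


# Hypothetical toy data: gene and pathway names are placeholders.
patient = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": -1.0}
pathways = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_C", "GENE_D"],  # GENE_D is not measured
    "pathway_3": ["GENE_E"],            # no measured genes, so skipped
}
scores = pathway_activity(patient, pathways)
print(scores)
```

The resulting per-pathway scores, rather than individual gene values, could then be passed to a classifier as features.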
Part B: Unsupervised Clustering Methods in Systems Biology and Systems Pharmacology (3:00 p.m. - 3:25 p.m.)
Speaker: Avi Ma'ayan, Mount Sinai School of Medicine, New York, United States
Research has integrated expression arrays, protein-protein interactions, and ChIP-Seq to delineate functional pathways in biological systems including stem cells and cancer.
Genome-wide experiments applied to mammalian cells collect data across regulatory layers including genomics, transcriptomics, epigenomics, proteomics and phosphoproteomics. In addition, data about drug effects on cells, including drug induced gene expression signatures, as well as side effect information from millions of patient records for single and combinations of approved drugs are collected at a rapid pace. Integrating the data from all these sources presents an opportunity to better understand how human cells work and how human cells can be controlled by approved drugs and their combinations, and how these relate to the human personalized phenotype. However, integrating these different types of experimental and clinical data is a fundamental challenge in computational systems biology and systems pharmacology. This presentation will describe two new unsupervised clustering methods to approach the problem with successful applications in stem cell systems biology and systems pharmacology. In addition, a method that can be used to link gene expression signatures to upstream regulatory cell signaling pathways and drugs will be presented. Using a program and algorithm we developed called Expression2Kinases we first identify the transcription factors and histone modifiers upstream of differentially expressed genes and then connect these regulators to protein networks and protein kinases. Such an approach allows for supervised learning of data from across regulatory layers, and this enables filtering of datasets and tuning of parameters for accurate computational predictions.
Part C: MVOCA - Multivariate Organization of Combinatorial Alterations in Large-scale Genomic Data Sets (3:30 p.m. - 3:55 p.m.)
Speaker: Rachel Karchin, Johns Hopkins University, Baltimore, United States
Expert in characterizing the impact of mutations in cancer, such as distinguishing drivers from passengers, through protein- and pathway-level modeling and machine learning.
Cooperative dysregulation of gene sequence and expression may contribute to cancer formation, progression and the response of patients to cancer drugs. We have developed MVOCA, a fast nonparametric method to rapidly and exhaustively examine the correlation between mutation, expression and other genomic alterations in a collection of tumors. The Cancer Genome Atlas (TCGA) Network recently catalogued genomic data for a collection of glioblastoma multiforme (GBM) tumors. MVOCA identified 41 genes whose mutation status was highly correlated with drastic changes in the expression, across tumor samples, of other genes. Some of the 41 genes have been previously implicated in GBM pathogenesis (e.g., NF1, TP53, RB1, and IDH1) and others, while implicated in cancer, were not previously highlighted in analyses of the TCGA GBM tumors (e.g., SYNE1, KLF6, FGFR4, and EPHB4). We found that known oncogenes and tumor suppressors participate in GBM via drastic over- and under-expression, respectively; identified a known synthetic lethal interaction between TP53 and PLK1, and other potential synthetic lethal interactions with TP53. We also identified correlations between IDH1 mutation status and the over-expression of known GBM survival genes. In addition, I will discuss recent work extending MVOCA to identify predictors of cancer drug response, using genomic and pharmacological data from a collection of cancer cell lines.
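To make the idea of a nonparametric association between mutation status and expression concrete, the sketch below compares the expression of one gene between tumors that do and do not carry a mutation in another gene, using a rank-based Mann-Whitney U statistic. This is a swapped-in, generic technique chosen for illustration; the abstract does not describe MVOCA's actual statistic, and the function names and expression values here are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic: count of (a, b) pairs with a > b; ties count as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def separation(mutated, wild_type):
    """Rescale U to [0, 1].

    0.5 means the two groups are indistinguishable by rank; values near
    1 (or 0) mean expression is consistently higher (or lower) in the
    mutated tumors.
    """
    return mann_whitney_u(mutated, wild_type) / (len(mutated) * len(wild_type))


# Hypothetical expression values of one gene, split by the mutation
# status of another gene across tumor samples.
mutated_expr = [5.1, 6.3, 7.0]
wild_type_expr = [1.2, 2.4, 2.9, 3.3]
score = separation(mutated_expr, wild_type_expr)
print(score)
```

Because the statistic depends only on ranks, it makes no assumption about the distribution of expression values, which is the property the word "nonparametric" refers to.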
Part D: From integrated data analysis to cell-lineage specific view of human disease (4:00 p.m. - 4:25 p.m.)
Speaker: Olga Troyanskaya, Princeton University, United States
Special Session 5:
Celebrating Science Together: ISMB 20 Years and NIGMS 50th anniversary special session
Steven Brenner, University of California, Berkeley, United States
Date: Monday, July 16
Start Time: 2:30 p.m. – End Time: 4:25 p.m.
Room: Grand Ballroom
Steven E. Brenner, Ph.D. – University of California, Berkeley, CA
Relevant to this session, Dr. Brenner was the 2010 ISCB Overton Prize recipient, and NIGMS has supported his research on alternative splicing, function prediction, and structure analysis.
Under a series of visionary leaders and program officers, the National Institute of General Medical Sciences (NIGMS) has supported groundbreaking science of profound impact for 50 years. NIGMS is distinctive within NIH for the breadth of its mandate, with a mission to support research that increases the understanding of life processes and lays the foundation for advances in disease diagnosis, treatment and prevention. In particular, NIGMS has long taken a leading role in funding computational biology, in part because research in this field has an intrinsic breadth and diverse medical impacts. This session celebrates the 50th anniversary of NIGMS and the 20th anniversary of ISMB with a selection of outstanding research from researchers supported by the NIGMS.
Part A: Pharmacogenomics: using informatics to integrate molecular biology with clinical genomics
Speaker: Russ Altman, Stanford University, United States
Dr. Altman was named an ISCB Fellow in 2010, is a past President of ISCB (2000-2001), and served as a Co-Chair of the second ISMB in 1994. He runs the Helix Group at Stanford University, which creates computational tools that can be applied to solve problems in biology and medicine by using knowledge representation, database design, machine learning, natural language processing, physics-based simulation and graph-based modeling/analysis. As Principal Investigator of the NIGMS-funded PharmGKB project, Dr. Altman focuses on the emerging field of personalized medicine by using individual genetics to optimize the selection and dosing of medications in order to advance the understanding of the genetics of drug response. In short, the Helix Group has constructed a pharmacogenomics database that integrates mutation information and knowledge about the impact of variation on drug metabolism. The group aims to build a tool that accepts a person’s genome as input, determines which drugs may or may not work, and recommends appropriate drugs and dosage levels for that person for their particular disease.
Part B: When do mutations in mammal conserved noncoding sequences alter enhancer activity in vivo?
Speaker: Katie Pollard, University of California, San Francisco, United States
The Pollard Group develops statistical and computational methods for the analysis of massive genomic datasets. Under Dr. Pollard’s leadership the group investigates genome evolution, in particular identifying genome sequences that differ significantly between or within species and their relationship to biomedical traits of interest. Many of these sequences are non-coding, such as regulatory signals, structural sites, and RNA genes. The aim is to identify specific DNA alterations that are responsible for novel functionality, such as variation in gene expression. Dr. Pollard has been the PI on a multi-year NIGMS grant entitled, “What made us human?”
Part C: Linking genes to traits (efficiently)
Speaker: Ed Marcotte, University of Texas at Austin, United States
Dr. Marcotte’s research focuses on mining biological data for the large-scale organization of proteins to diagram the “wiring” of cells by learning how all of the proteins encoded by a genome are associated into functional pathways, systems, and networks. The emphasis of one of his NIGMS-funded projects, “Network-Directed Discovery of Disease Genes,” is to better define the functions of genes, thereby linking genes to traits and diseases, and in particular those genes underlying neural tube defects (NTDs), which are the second most common cause of human birth defects world-wide and the most common permanently disabling birth defect in the United States. At the other end of the scale, Dr. Marcotte is the PI on the NIGMS grant entitled, “Proteomics of Widespread, Reversible Protein Assemblies in Aging and Aggregation,” that investigates protein assemblies and their impact on aggregation-based diseases such as Alzheimer’s, Parkinson’s, and Huntington’s diseases, as well as aging itself, to determine if the assemblies are functional or catastrophic. From birth to old age, Dr. Marcotte’s research is focused on investigating genetic defects that might shed light on the path to new therapies that could improve the quality of life at all stages and ages.
Part D: NIH Update on Biomedical Informatics and Computational Biology (BICB)
Speaker: Peter Lyster, NIH National Centers for Biomedical Computing, United States
As Program Director of the Division of Biomedical Technology, Bioinformatics, and Computational Biology at the NIH/NIGMS, Dr. Lyster has been directly involved in the role NIGMS has played in the past ten years in the cross-NIH Biomedical Information Science and Technology Initiative (BISTI). Among other activities, BISTI has maintained the Biomedical Informatics and Computational Biology (BICB) initiative, managed the Common Fund National Centers for Biomedical Computing, and offered a number of smaller announcements such as the Continued Development and Maintenance of Software, which are all itemized in the BISTI Funding Page. Dr. Lyster's talk will focus on BISTI activities, including summarizing the types and amounts of key awards noted in the recent editorial in the Biocomputational Review, and presenting updates on the inventory of BICB awards at NIH.
Special Session 6:
Computational methods for elucidating nuclear structure and dynamics
Marc A Marti-Renom, National Center for Genomic Analysis, Barcelona, Spain
William S. Noble, University of Washington, Seattle, United States
W. Jim Zheng, Medical University of South Carolina, Charleston, United States
Date: Tuesday, July 17
Start Time: 10:45 a.m. – End Time: 12:40 p.m.
Over the last decade, and especially after the advent of fluorescent in situ hybridization imaging and chromosome conformation capture methods, the availability of experimental data on genome 3D organization has dramatically increased. We now have access to unprecedented details about how genomes organize within the interphase nucleus.
Development of new computational approaches that leverage such data has recently resulted in the first three-dimensional structures of genomic domains and genomes. Such approaches expand our knowledge of chromatin folding principles, which have been classically studied using polymer physics and molecular simulations. This Special Session will address new computational approaches for integrating experimental data with polymer physics. We believe that the development of biophysical models of higher-order chromatin architecture based on these new data helps elucidate the organizing principles of genomes and constitutes, by itself, an emerging field of computational biology.
10:45 - 11:10 Mathieu Blanchette
11:15 - 11:40 Frank Alber
11:45 - 12:10 Jijun Tang
12:15 - 12:40 Round table with all speakers + Special Session organizers
Part A: Hox in motion (10:45 - 11:10)
Speaker: Mathieu Blanchette - McGill University
Mathieu Blanchette obtained his Masters at U. of Montréal under the supervision of David Sankoff and his Ph.D. at U. of Washington under the supervision of Martin Tompa. After a too-short postdoc with David Haussler (UCSC), he joined the School of Computer Science of McGill University, where he is now associate professor. His work focuses on the development of algorithmic and machine learning approaches for the analysis of molecular biology data, with a focus on the control of gene expression and comparative genomics. He was a Sloan Fellow and recently received an award for Outstanding Young Canadian Computer Researcher.
Three-dimensional genome organization is an important higher-order transcription regulation mechanism that can be studied with the chromosome conformation capture (3C) techniques. Both gain and loss of contacts were reported with gene activation and repression, indicating that distinct spatial mechanisms can control transcription. Interestingly, the Hox clusters are regions where either gain or loss of contacts has been reported with gene activation. Whether this reflects distinct cell type-specific control mechanisms or differences in experimental design is unknown. Here, we examined the chromatin dynamics of the HoxA cluster in a human myeloid leukemia cell line at various stages of differentiation. We combined chromatin organization analysis with 3C-carbon copy, computational modeling, epigenetics, and protein binding to achieve the very first integrated view, through time, of a cross talk between epigenetics and chromatin architecture. We show that unfolding and remodeling of the HoxA cluster accompanies cellular differentiation. We found that 5’ HoxA activation coincides with a loss of contacts throughout the cluster, and with specific silencing at the 3’ end with H3K27 methylation. Our results support a model whereby HoxA activation involves loss-of-contacts between CTCF-bound insulator sequences. This is joint work with Josée Dostie and Mathieu Rousseau.
Part B: Exploring parallel genome universes (11:15 - 11:40)
Speaker: Frank Alber - University of Southern California
PhD: ETH Zurich, Switzerland
Postdoc: SISSA Italy and Rockefeller University/UCSF
Since 2008 Assistant Professor at the University of Southern California.
Knowledge about the 3-D organization of the genome will offer great insights into how cells retrieve and process the genetic information. We will discuss a computational method to determine 3D genome structures and structure-function maps of genomes. To address the challenge of modeling highly variable genome structures, we propose a population-based modeling approach, where we construct a large population of 3D genome structures that together are entirely consistent with all available experimental data, including data from genome-wide conformation capture and imaging experiments. We interpret the result in terms of probabilities of a sample drawn from a population of heterogeneous structures. We will discuss results on the 3D spatial organization of the genome in human lymphoblastoid cells.
Part C: Using Game Engines to Build Cloud-Based 3D Genome Browsers (11:45 - 12:10)
Speaker: Jijun Tang - University of South Carolina
Jijun Tang, Ph.D., is an associate professor at the Department of Computer Science and Engineering, University of South Carolina. He received his Ph.D. in Computer Science from the University of New Mexico in 2004. His research interests spans two major areas: the first area is high-performance algorithmic design and development, with focus on phylogenetic reconstruction and comparative genomics; the other area is software design and implementation in the fields of scientific simulation and serious computer gaming.
The influx of new details about higher-level structure and dynamics of genomes requires new techniques to model, visualize and analyze the full extent of genomic information in three dimensions. In this talk, we will present our exploratory results and demonstrations of using cutting-edge game engines (such as Unity3D) to develop a cloud-based genome browser to visualize and manipulate three-dimensional genomic and epigenomic data. Compared to the previous stand-alone prototype, the new genome browser demands less computing power and makes 3D structural genome information available to a broader research community.
Part D: Computational methods for elucidating nuclear structure and dynamics - Panel discussion (12:15 p.m. - 12:40 p.m.)
The special session organizers and presenters will present as part of a panel discussion.
Special Session 7:
Harnessing community intelligence for bioinformatics
Benjamin M. Good, The Scripps Research Institute, La Jolla, United States
Date: Tuesday, July 17
Start Time: 2:30 p.m. – End Time: 4:25 p.m.
In this special session we will examine the social and technical underpinnings of successful efforts to harness community intelligence in bioinformatics.
‘Community intelligence’ efforts use the Web to enable volunteers to contribute to projects that would traditionally be accomplished solely by paid professionals. Successful products of such initiatives include Wikipedia, the Linux operating system, the Apache web server and the video collection on YouTube. The astounding scale and low cost of these and other successes have inspired many efforts to emulate these models in the life sciences. However, the majority of these efforts have failed because they have not managed to attract enough contributions from the community.
In this session, we will invite presentations from the leaders of community intelligence initiatives in bioinformatics that have succeeded in establishing a critical mass of contributors. Each of the four presentations will focus on one resource and will attempt to expose critical factors, such as incentives, interface design, outreach efforts, and community context, which have contributed to their success.
With increasing volumes of data and decreasing volumes of funding, many research groups in bioinformatics are now being asked to find ways to involve the larger (unpaid) research community in tasks related to the organization and dissemination of scientific information. This session will provide such groups with useful lessons that can be applied directly to their own initiatives.
Part A: Assessing the contribution of scientists to Wikipedia for Pfam and Rfam annotation (2:30 p.m. - 2:55 p.m.)
Speaker: Alex Bateman, Wellcome Trust Sanger Institute, United Kingdom
In this presentation I will show the latest survey of the scientific community's engagement with Wikipedia and its relevance to the annotation in Pfam and Rfam. Major challenges remain in: (1) educating experts in the field that Wikipedia contributions are a valuable communication tool; (2) giving non-technical scientists the confidence and knowledge of how to edit Wikipedia content.
Part B: WikiPathways and How to Change the World (or at least your small corner of the world) (3:00 p.m. - 3:25 p.m.)
Speaker: Alexander Pico, Gladstone Institutes, United States
WikiPathways is a collaborative platform for collecting, curating and distributing biological pathway knowledge in the research community. We started WikiPathways with almost a decade of experience archiving pathway models as a conventional resource maintained by a small internal team of experts. Switching to a community curation approach has dramatically increased the size, quality and relevance of our content. Increased relevance is a distinctive advantage of ‘community intelligence’ efforts that directly engage researchers in real time. More and more, we are finding research communities eager to participate in data and knowledge repositories that utilize their contributions directly and transparently. Over the past 4 years, WikiPathways has grown from 100 registered users to over 2000, with a steadily increasing percentage making edits and contributing new content. The number of visits has doubled in the last year to over 10,000 per month. In this special session, we will present the lessons we have gleaned from launching and developing WikiPathways as a ‘community intelligence’ effort: how to set milestones for early success, how to utilize open source code and culture, how to tap into already established communities and resources, how to build data mining and analytical tools and services around your content, and how to make use of new models of data sharing and publishing.
Part C: The Gene Wiki: Crowdsourcing the annotation of human gene function (3:30 p.m. - 3:55 p.m.)
Speaker: Andrew Su, The Scripps Research Institute, La Jolla, United States
Comprehensively annotating the function of human genes is a formidable challenge for the biomedical research community. The goal of the Gene Wiki project is to create a continuously updated, community-reviewed and collaboratively written review article for every human gene. The Gene Wiki currently takes the form of 10,000 articles in the online encyclopedia Wikipedia. This collection of articles is viewed over 50 million times and edited over 15,000 times per year. In this talk, we will describe our efforts to create a critical mass of users, to mine structured gene annotations from Gene Wiki text, and to integrate these data in bioinformatics analyses.
Part D: Distributed Community Intelligence through the Scientific Discovery Game Foldit (4:00 p.m. - 4:25 p.m.)
Speaker: Firas Khatib, University of Washington, United States
Foldit is a graphical user interface representation of the Rosetta algorithm where players manipulate protein structures with the corresponding Rosetta energy shown in real time as their score. By leveraging human puzzle solving, pattern recognition, and 3D spatial reasoning, Foldit players are able to outperform many state-of-the-art prediction methods. Foldit players have generated models accurate enough for successful molecular replacement and subsequent structure determination of a monomeric retroviral protease, despite not being given any experimental data. Foldit players have also been provided tools to encode their folding strategies, and within seven months one of these player-developed folding algorithms outperformed a previously published algorithm. Most recently, players were challenged to remodel the backbone of a computationally designed bimolecular Diels-Alderase to enable additional interactions with substrates. Several iterations of design and characterization generated a 24-residue helix-turn-helix motif, including a 13-residue insertion, that increased enzyme activity over 18-fold. X-ray crystallography showed that the large insertion adopts a helix-turn-helix structure positioned as in the Foldit model. The ability of an online gaming community to successfully guide large-scale protein structure prediction and design problems suggests that human creativity can extend down to the molecular scale when given the appropriate tools.