ISMB 2018 - Special Sessions
- 3D Genomics: Computational approaches for analyzing the role of three-dimensional chromatin organization in gene regulation.
- Single-particle Cryo-electron Microscopy, Cryo-electron Tomography, and Integrative/Hybrid Methods Studies of Macromolecular Machines: Opportunities and Challenges for the Bioinformatics Community
- Omics Data Compression and Storage: Present and Future
- Advancing computational biology through critical assessments, community experiments, and crowdsourcing
- SCANGEN: Single-cell cancer genomics
Room: Grand Ballroom C-F
Saturday July 7, 10:15 am - 6:00 pm
Ferhat Ay, La Jolla Institute for Allergy and Immunology, United States
Sushmita Roy, Biostatistics & Medical Informatics, Wisconsin Institute for Discovery, United States
Long-range gene regulatory interactions are defined as interactions between a region of regulatory DNA sequence and a target gene that can be hundreds of kilobases away. Such interactions are emerging as important determinants of cell-type-specific expression and of the effect of regulatory sequence variants on complex phenotypes, including those associated with diseases. The field of regulatory genomics has recently witnessed significantly increased interest in the three-dimensional structure of DNA in the nucleus, catalyzed by the availability of chromosome conformation capture (3C) data sets that characterize the 3D organization of chromatin at a genome-wide scale. This organization, also referred to as the 3D nucleome, is not only important for packing the genome into the nucleus, but also has a significant impact on how the genome functions. With the emergence of these new data types, there is a growing demand for computational tools that can systematically analyze these data. These tools address problems ranging from data processing (e.g. mapping and normalization) to data analysis, such as predicting chromosomal organizational units (e.g. TADs), identifying significant interactions between regulatory elements (e.g. enhancer-promoter), examining the interplay of transcription factors, architectural proteins and chromatin states in establishing these interactions, and examining how these interactions are impacted by sequence variants.
Modeling and predicting the 3D genome
William Noble, Genome Sciences & Computer Science, University of Washington, United States
High-resolution analysis of chromatin conformation capture data
Mathieu Blanchette, Computer Science, McGill University, Canada
Continuous-trait probabilistic model for comparing nuclear genome organization of multiple species
Jian Ma, Computational Biology and Machine Learning, Carnegie Mellon University, United States
Impact of structural variants on 3D genome structure in cancer cells
Feng Yue, Biochemistry and Molecular Biology, Pennsylvania State University, United States
Connections between the structure and function of 3D genome folding
Geoffrey Fudenberg, University of California, United States
Room: Columbus EF
Sunday July 8, 10:15 am - 12:40 pm
Stephen K. Burley, RCSB Protein Data Bank, United States
Jose Duarte, RCSB Protein Data Bank, United States
Among the most exciting of these newly deposited PDB structures are those coming from single-particle cryo-electron microscopy (EM) and cryo-electron tomography (ET). Recent technical advances in sample preparation, electron optics, direct electron detection, and data processing software have created a perfect storm for the PDB. With these new methods, cryo-EM and -ET are producing atomic-level structures of macromolecular machines, such as multi-subunit RNA and DNA polymerases, ribosomes, and nuclear pore complexes. The next wave of exciting new structures will come from so-called integrative/hybrid methods, which typically combine cryo-EM or -ET data with data from chemical cross-linking, fluorescence resonance energy transfer, and homology models to produce multi-scale structures of even larger biomolecular machines.
The Special Session will highlight examples of the exciting work going on in these two frontier areas of structural biology from four distinguished speakers, with reference to the manifold challenges and opportunities for the bioinformatics community.
Speaker: Jose Duarte, RCSB Protein Data Bank, UC San Diego
Cryo-EM visualization of eukaryotic transcription initiation machineries
Speaker: Yuan He, Department of Molecular Biosciences, Northwestern University, United States
Visualizing molecular assemblies inside cells by cryo-electron tomography
Speaker: Wei Dai, Department of Cell Biology and Neuroscience, Rutgers University, United States
Integrative structural biology
Speaker: Barak Raveh, Department of Bioengineering and Therapeutic Sciences, UC San Francisco, United States
Web-based 3D visualization and exploration of cryo-electron microscopy and integrative/hybrid methods structures
Speaker: Alexander Rose, RCSB Protein Data Bank, UC San Diego, United States
Speaker: Stephen K. Burley, RCSB Protein Data Bank, Rutgers University and UC San Diego, United States
Room: Columbus EF
Sunday July 8, 2:00 pm - 6:00 pm
Mikel Hernaez, University of Illinois at Urbana-Champaign, Institute for Genomic Biology, United States
Idoia Ochoa, University of Illinois at Urbana-Champaign, Electrical and Computer Engineering, United States
In 2003 the first human genome assembly was completed. It was the end of a project that took almost 13 years and cost 3 billion dollars (around $1 per base pair). This milestone ushered in the genomics era, giving rise to personalized or precision medicine. Fortunately, sequencing cost has decreased drastically in recent years. While in 2004 the cost of sequencing a whole human genome was around $20 million, in 2008 it dropped to around $1 million, and in 2017 to a mere $1000. As a result of this decrease in sequencing cost, as well as advancements in sequencing technology, massive amounts of genomic data are being generated. At the current rate of growth (sequencing data is doubling approximately every seven months), more than an exabyte of sequencing data per year will be produced, approaching zettabytes by 2025. As an example, the sequencing data generated by the 1000 Genomes Project (www.1000genoms.org) in its first 6 months exceeded the sequence data accumulated during 21 years in the NCBI GenBank database.
In addition, the generation of other types of omics data is also experiencing rapid growth. For example, DNA methylation data has been found to be important in early detection of tumors and in determining the prognosis of the disease, and as a result it has been the subject of many large-scale projects including MethylomeDB and DiseaseMeth, among others. Proteomics and metabolomics studies are also gaining momentum, as they contribute towards a better understanding of the dynamic processes involved in disease, with direct applications in prediction, diagnosis and prognosis, and several repositories have been created, such as PeptideAtlas/PASSEL and PRIDE.
This situation calls for state-of-the-art, efficient compressed representations of massive biological datasets that can not only alleviate the storage requirements, but also facilitate the exchange and dissemination of these data. This undertaking is of paramount importance, as the storage and acquisition of the data are becoming the major bottleneck, as evidenced by the recent flourishing of cloud-based solutions that enable processing the data directly in the cloud. For example, companies such as DNAnexus, GenoSpace, Genome Cloud, and Google Genomics, to name a few, offer solutions to perform genome analysis in the cloud.
This sentiment is also reflected by the NIH Big Data to Knowledge (BD2K) initiative launched in 2013, which acknowledged the need to develop innovative and transformative compression schemes to accelerate the integration of big data and data science into biomedical research. In addition, the International Organization for Standardization (ISO) is developing, under MPEG (Moving Picture Experts Group), a standard for genomic information representation.
This special session will cover current efforts in this area, as well as future challenges. This is of importance to biologists and researchers alike who work with omics data, as the developed tools will soon become part of their standard pipelines.
Room: Columbus GH
Monday July 9, 10:15 am - 4:00 pm
Gaia Andreoletti, University of California, Berkeley, United States
Steven E Brenner, University of California, Berkeley, United States
John Moult, The University of Maryland, United States
This session presents current results from a wide range of critical assessment community experiments in computational biology. It represents a unique and unprecedented gathering of a diverse range of critical assessment organizations.
Community assessment has emerged as an effective framework to evaluate and develop methodologies, especially through experiments in which participants are challenged to solve biological problems such as determining the phenotypic consequences of genomic variation, protein structure, and system perturbations. Some such challenges engage a large community to see how well a given method can achieve a given goal. Successful challenge frameworks of this type are able not only to evaluate the effectiveness of methods but also to highlight innovation, progress, and bottlenecks in the field, to guide future research efforts, and to foster strong collaborative communities.
Room: Columbus IJ
Tuesday July 10, 8:35 am - 4:40 pm
Kieran R Campbell, University of British Columbia & BC Cancer Agency, Canada
Sohrab P Shah, University of British Columbia & BC Cancer Agency, Canada
In the past five years technological advances have given us the unprecedented ability to measure RNA and DNA at the single-cell level. This now enables us to routinely measure gene expression and genomic alterations across tens of thousands of cells, discovering new cell types, developmental lineages, and cell-specific mutational patterns. This new data has prompted an explosion in statistical and computational methods development (http://www.scrna-tools.org/) with over 150 tools being produced in the past few years.
However, to date the majority of methods developed have focused on either technical aspects (such as normalization and differential expression) or on applications in developmental biology such as lineage inference, with relatively little attention paid to the huge potential of single-cell data to unveil the complex biology behind cancer inception and progression. As one of the first workshops of its kind, this special session will bring together researchers developing computational and statistical methods for single-cell cancer biology. It will focus on (though not be limited to) four core topics:
1. Modelling cancer evolution
As tumors evolve they accumulate both point mutations and large structural rearrangements. The “life-histories” of these tumors are informative of the mutational processes that allow the cancer cells to evade the body’s checkpoints and can be predictive of future evolution and response to therapy. Methods covered under this topic could address: phylogenetic inference from single-cell data; inference of evolutionary processes from single-cell data; identifying single-cell cancer signatures; inference of fitness from single-cell analysis of population dynamics.
2. Integrative analyses of multi-modal data
A vast array of measurements can be made at single-cell resolution, including RNA and DNA-sequencing and epigenetic status such as methylation and chromatin accessibility. Methods covered in this topic will include: modelling of joint measurement assays (such as G&T-seq); relating and interpreting measurements from different technologies.
3. Scalable inference at the single-cell level
A typical single-cell RNA or DNA-seq dataset now contains around 100x more cells than it did just 5 years ago. As a result, there is a pressing need for computational and statistical methods that scale to “big data” sizes, particularly since fast computation allows iterative analyses by investigators, aiding biological interpretation. Methods covered in this topic will include: scalable statistical inference for single-cell data using methods such as stochastic optimization; computational tools for dealing with large single-cell datasets.
4. Interactions and perturbations at the single-cell level
This broad topic concerns methods to understand how cancer cells react to both their environment and external perturbation. Methods could address: how cells interact with their microenvironment; how cells respond to and resist chemotherapeutic interventions; how transcriptional programming and clonal selection are affected by genomic perturbations such as CRISPR.