
General Computational Biology

COSI Track Presentations

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
Wednesday, July 24th
10:20 AM-10:40 AM
Proceedings Presentation: Controlling Large Boolean Networks with Single-Step Perturbations
Room: Sydney (2nd Floor)
  • Alexis Baudin, École Normale Supérieure Paris-Saclay, France
  • Soumya Paul, University of Luxembourg, Luxembourg
  • Cui Su, University of Luxembourg, Luxembourg
  • Jun Pang, University of Luxembourg, Luxembourg

Presentation Overview:

Motivation:
The control of Boolean networks has traditionally focused on strategies in which perturbations are applied to the nodes of the network for an extended period of time. In this work, we study if and how a Boolean network can be controlled by perturbing a minimal set of nodes for a single step and then letting the system evolve according to its original dynamics. More precisely, given a Boolean network BN, we compute a minimal subset Cmin of the nodes such that BN can be driven from any initial state in an attractor to another `desired' attractor by perturbing some or all of the nodes of Cmin for a single step. This kind of control is attractive for biological systems because it is less time-consuming than traditional control strategies while also being financially more viable. However, due to the phenomenon of state-space explosion, computing such a minimal subset is computationally expensive, and an approach that deals with the entire network in one go does not scale well for large networks.
Results:
We develop a `divide-and-conquer' approach by decomposing the network into smaller partitions, computing the minimal control on the projection of the attractors to these partitions and then composing the results to obtain Cmin for the whole network. We implement our method and test it on various real-life biological networks to demonstrate its applicability and efficiency.
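As a rough illustration of the single-step control problem (not the divide-and-conquer algorithm described above), the sketch below brute-forces a minimal perturbation set on a hypothetical three-node network with synchronous updates; the network, update rules and node names are invented for the example.

```python
from itertools import combinations

# Hypothetical 3-node Boolean network with synchronous updates (invented for
# illustration; not a network from the paper). A state is a tuple (x0, x1, x2).
def step(s):
    x0, x1, x2 = s
    return (1 - x1,   # x0' = NOT x1
            1 - x0,   # x1' = NOT x0
            x0)       # x2' = x0

def attractor(s):
    """Iterate synchronous updates from s and return its attractor as a frozenset of states."""
    seen = {}
    while s not in seen:
        seen[s] = len(seen)
        s = step(s)
    cycle_start = seen[s]                      # states from here onward form the attractor cycle
    return frozenset(t for t, i in seen.items() if i >= cycle_start)

def minimal_single_step_control(source_state, target_attractor, n_nodes=3):
    """Brute-force search for a smallest node set whose one-step flip drives the
    network from source_state into target_attractor under the original dynamics."""
    for k in range(n_nodes + 1):
        for nodes in combinations(range(n_nodes), k):
            perturbed = tuple(1 - v if i in nodes else v
                              for i, v in enumerate(source_state))
            if attractor(perturbed) == target_attractor:
                return nodes
    return None

source = (0, 1, 0)                                   # a fixed-point attractor state
target = attractor((1, 0, 1))                        # the other fixed point
print(minimal_single_step_control(source, target))   # -> (0, 1): flip x0 and x1 once
```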

10:40 AM-11:00 AM
Proceedings Presentation: MCS^2: Minimal coordinated supports for fast enumeration of minimal cut sets in metabolic networks
Room: Sydney (2nd Floor)
  • Seyed Reza Miraskarshahi, Simon Fraser University, Canada
  • Hooman Zabeti, Simon Fraser University, Canada
  • Tamon Stephen, Simon Fraser University, Canada
  • Leonid Chindelevitch, Simon Fraser University, Canada

Presentation Overview:

Motivation: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal Cut Sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constraint-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network. Recently, Ballerstein et al. proposed a method for computing the MCSs of a network without first computing its EFMs, by creating a dual network whose EFMs are a superset of the MCSs of the original network. However, their dual network is always larger than the original network and depends on the target reaction. Here we propose the construction of a different dual network, which is typically smaller than the original network and is independent of the target reaction, for the same purpose. We prove the correctness of our approach, MCS2, and describe how it can be modified to compute the few smallest MCSs for a given target reaction.
Results: We compare MCS2 to the method of Ballerstein et al. and two other existing methods. We show that MCS2 succeeds in calculating the full set of MCSs in many models where other approaches cannot finish within a reasonable amount of time. Thus, in addition to its theoretical novelty, our approach provides a practical advantage over existing methods.
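For readers unfamiliar with minimal cut sets, the following sketch enumerates MCSs on a hypothetical five-reaction toy network by brute force, using a linear program to test whether each candidate knockout set blocks the target reaction. It illustrates the definition only and does not implement the MCS2 dual-network construction.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

# Toy metabolic network (hypothetical, not from the paper):
# reactions R0..R4 over metabolites A, B; the target reaction is R4.
#   R0: -> A,  R1: -> B,  R2: A -> B,  R3: B -> A,  R4: B ->
S = np.array([
    [1.,  0., -1.,  1.,  0.],   # metabolite A
    [0.,  1.,  1., -1., -1.],   # metabolite B
])
n_rxn = S.shape[1]
TARGET = 4

def target_blocked(knockouts):
    """True if the target reaction can carry no steady-state flux
    once the knocked-out reactions are forced to zero."""
    bounds = [(0.0, 0.0) if i in knockouts else (0.0, 10.0) for i in range(n_rxn)]
    c = np.zeros(n_rxn)
    c[TARGET] = -1.0                       # maximise v_target
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return res.status != 0 or -res.fun < 1e-9

def enumerate_mcs():
    """Enumerate minimal cut sets by brute force, smallest first."""
    found = []
    candidates = [i for i in range(n_rxn) if i != TARGET]
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if any(set(m) <= set(combo) for m in found):
                continue                   # not minimal: contains a known MCS
            if target_blocked(set(combo)):
                found.append(combo)
    return found

print(enumerate_mcs())                     # -> [(0, 1)]: blocking both source reactions cuts R4
```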

11:00 AM-11:20 AM
Proceedings Presentation: Bayesian Metabolic Flux Analysis reveals intracellular flux couplings
Room: Sydney (2nd Floor)
  • Markus Heinonen, Aalto University, Finland
  • Maria Osmala, Aalto University, Finland
  • Henrik Mannerström, Aalto University, Finland
  • Janne Wallenius, Institute for Molecular Medicine Finland, Finland
  • Juho Rousu, Aalto University, Finland
  • Harri Lähdesmäki, Aalto University, Finland
  • Samuel Kaski, Aalto University, Finland

Presentation Overview:

Motivation: Metabolic flux balance analysis is a standard tool for analysing metabolic reaction rates compatible with measurements, steady-state assumptions and the stoichiometry of the metabolic reaction network. Flux analysis methods commonly place unrealistic assumptions on fluxes due to the convenience of formulating the problem as a linear programming model, and most methods ignore the notable uncertainty in flux estimates.
Results: We introduce a novel paradigm of Bayesian metabolic flux analysis that models the reactions of the whole genome-scale cellular system in probabilistic terms, and can infer the full flux vector distribution of genome-scale metabolic systems based on exchange and intracellular (e.g. 13C) flux measurements, steady-state assumptions, and objective function assumptions. The Bayesian model couples all fluxes jointly in a simple truncated multivariate posterior distribution, which reveals informative flux couplings. Our model is a plug-in replacement for conventional metabolic balance methods, such as flux balance analysis (FBA). Our experiments indicate that we can characterise the genome-scale flux covariances, reveal flux couplings, and determine more unobserved intracellular fluxes in C. acetobutylicum from 13C data than flux variability analysis.
Availability: The COBRA-compatible software is available at github.com/markusheinonen/bamfa
Contact: markus.o.heinonen@aalto.fi
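The general idea of treating steady-state fluxes as a joint distribution rather than a single point estimate can be illustrated with a simple hit-and-run sampler over the flux polytope {v : Sv = 0, lb <= v <= ub}. The toy stoichiometry below is hypothetical, and this sketch does not reproduce the Bayesian model or the bamfa software.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical stoichiometry (metabolites x reactions); steady state S v = 0 with
# flux bounds 0 <= v <= 10. A hit-and-run random walk samples the flux polytope,
# and the sampled flux-flux correlations hint at couplings.
S = np.array([
    [1., -1.,  0., -1.],    # M1: produced by R0, consumed by R1 and R3
    [0.,  1., -1.,  1.],    # M2: produced by R1 and R3, consumed by R2
])
lb, ub = np.zeros(4), np.full(4, 10.0)

N = null_space(S)                      # orthonormal basis of the steady-state (null) space
rng = np.random.default_rng(0)
v = np.array([2.0, 1.0, 2.0, 1.0])     # a strictly feasible starting flux
assert np.allclose(S @ v, 0.0)

samples = []
for _ in range(20000):
    d = N @ rng.normal(size=N.shape[1])            # random direction inside null(S)
    d /= np.linalg.norm(d)
    with np.errstate(divide="ignore", invalid="ignore"):
        t_lo = np.where(d != 0, (lb - v) / d, -np.inf)
        t_hi = np.where(d != 0, (ub - v) / d,  np.inf)
    t_min = np.max(np.minimum(t_lo, t_hi))         # step interval keeping lb <= v + t*d <= ub
    t_max = np.min(np.maximum(t_lo, t_hi))
    v = v + rng.uniform(t_min, t_max) * d
    samples.append(v.copy())

samples = np.array(samples)
print(np.corrcoef(samples.T).round(2))             # strongly correlated fluxes = candidate couplings
```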

11:20 AM-11:40 AM
Novel Software ‘TRIBES’ Enables Distant Relationship and Disease Variant Discovery in Amyotrophic Lateral Sclerosis
Room: Sydney (2nd Floor)
  • Lyndal Henden, Macquarie University, Australia
  • Piotr Szul, CSIRO, Australia
  • Emily McCann, Macquarie University, Australia
  • Ian Blair, Macquarie University, Australia
  • Kelly Williams, Macquarie University, Australia
  • Denis Bauer, CSIRO, Australia
  • Natalie Twine, CSIRO, Australia

Presentation Overview:

The power to discover disease variants can be increased by focusing on genomic regions shared within an affected family, particularly from distant relatives (greater than 3rd degree). However, available relatedness tools are lacking in accuracy or ease of use for distant relatives. To improve on the shortcomings of existing tools, we have developed ‘TRIBES’, a novel platform for distant relatedness discovery. We demonstrate the capabilities of TRIBES on a large (n=810) amyotrophic lateral sclerosis (ALS) whole genome sequence (WGS) cohort.

ALS is a neurodegenerative disorder, which leads to death within 5 years and has no treatment. 10% of ALS cases are familial (FALS), while 90% are sporadic (SALS). Due to the late onset of disease and the occasional presence of FALS gene mutations, it is likely that some SALS cases are distantly related.

Using TRIBES on FALS pairs with known relationships (n=84), we accurately identified 100%, 98%, 100%, 83%, 71% and 59% of 1st, 2nd, 3rd, 4th, 5th and 6th degree relatives, respectively. Additionally, TRIBES uncovered 54 novel relationship pairs and was able to refine the disease-critical regions in some pairs, which include the candidate ALS genes DYNC1H1, FIG4 and APOE. Furthermore, we discovered a novel haplotype joining 19 distinct FALS families over the known ALS gene SOD1.

11:40 AM-12:00 PM
Rapid genotype imputation from large-scale sequence data
Room: Sydney (2nd Floor)
  • Brian Browning, University of Washington, United States
  • Ying Zhou, University of Washington, United States
  • Sharon Browning, University of Washington, United States

Presentation Overview:

The explosive increase in genome sequencing will soon make it possible to assemble reference panels containing hundreds of thousands, and eventually millions, of sequenced individuals.

We have developed a genotype imputation method that greatly reduces the computational cost of imputation from large reference panels. Our method models each imputed haplotype using a small number of composite reference haplotypes. Each composite reference haplotype is a mosaic of reference haplotype segments. Each segment of the mosaic contains a long identity by descent segment shared by the target sample and a reference haplotype.
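A minimal sketch of the mosaic idea, with simulated haplotypes and a greedy longest-match rule standing in for Beagle 5.0's actual model:

```python
import numpy as np

# Build one "composite reference haplotype" as a mosaic of reference segments that
# match a target haplotype at its markers. The toy data and the greedy segment
# choice below are illustrative assumptions, not the Beagle 5.0 method itself.
rng = np.random.default_rng(1)
n_markers, n_ref = 200, 50
ref = rng.integers(0, 2, size=(n_ref, n_markers))        # reference haplotypes
target = ref[7].copy()
target[120:] = ref[31][120:]                              # target is a mosaic of two references

def longest_match(ref, target, start):
    """For each reference haplotype, how far does it match the target from `start`?"""
    lengths = np.zeros(len(ref), dtype=int)
    for h in range(len(ref)):
        j = start
        while j < len(target) and ref[h, j] == target[j]:
            j += 1
        lengths[h] = j - start
    return lengths

def composite_haplotype(ref, target):
    """Greedy mosaic: repeatedly extend with the reference haplotype matching longest."""
    segments, pos = [], 0
    while pos < len(target):
        lengths = longest_match(ref, target, pos)
        best = int(np.argmax(lengths))
        end = pos + max(int(lengths[best]), 1)            # always advance by at least one marker
        segments.append((best, pos, end))
        pos = end
    return segments

print(composite_haplotype(ref, target))   # e.g. [(7, 0, 120), (31, 120, 200)]
```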

We compared genotype imputation using composite reference haplotypes (implemented in Beagle 5.0) with Beagle 4.1, Impute4, Minimac4, and PBWT. All methods except PBWT had nearly identical accuracy, but Beagle 5.0 had much faster computation times for large reference panels. For 100k, 1M, and 10M simulated UK European phased reference samples and 1,000 phased target samples, Beagle 5.0’s computation time was 3.4x (100k), 49.6x (1M), and 804x (10M) faster than the fastest alternative method.

Analyses performed on the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.

12:00 PM-12:20 PM
Large blocklength LDPC codes for Illumina sequencing-based DNA storage
Room: Sydney (2nd Floor)
  • Shubham Chandak, Stanford University, United States
  • Kedar Tatwawadi, Stanford University, United States
  • Billy Lau, Stanford University, United States
  • Matthew Kubit, Stanford University, United States
  • Jay Mardia, Stanford University, United States
  • Joachim Neu, Stanford University, United States
  • Hanlee Ji, Stanford University, United States
  • Tsachy Weissman, Stanford University, United States
  • Peter Griffin, Stanford University, United States
  • Mary Wootters, Stanford University, United States

Presentation Overview:

With the amount of data being stored increasing rapidly, current storage technologies are unable to keep up due to the slowing down of Moore’s law. In this context, DNA based storage systems can offer significantly higher storage densities (petabytes/gram) and durability (thousands of years) than current technologies. Recent advances in DNA sequencing and synthesis have made DNA storage a promising candidate for the storage technology of the future.

Recently, there have been multiple efforts in this direction focusing on aspects such as error correction for synthesis/sequencing errors and erasure correction to handle missing sequences. The typical approach is to separate the codes for handling errors and erasures, but there is limited understanding of the efficiency of this framework.

In this work, we study the trade-off between the writing and reading costs involved in DNA storage and propose practical and efficient schemes to achieve a smooth trade-off between these quantities. Our scheme breaks from the traditional framework and instead uses large block-length LDPC codes for both erasure and error correction, coupled with novel techniques to handle insertion and deletion errors. For a range of writing costs, the proposed scheme achieves 30-40% lower reading costs than state-of-the-art techniques using Illumina sequencing.

12:20 PM-12:40 PM
A New D2 Statistic and Algorithm for Efficient Detection of Repetitive Sequences in Whole Genomes and in Short Sequencing Reads
Room: Sydney (2nd Floor)
  • Xuegong Zhang, Tsinghua University, China
  • Sijie Chen, Tsinghua University, China
  • Yixin Chen, Tsinghua University, China
  • Michael Waterman, University of Southern California, United States
  • Fengzhu Sun, Department of Biological Sciences, University of Southern California, United States

Presentation Overview:

Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic D2R that can efficiently discriminate sequences with or without repetitive regions.

Using the proposed D2R statistic, we developed an algorithm with linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate CRISPR regions from bacterial genomic or metagenomic sequences. Simulation and real-data experiments show that the method works well on both assembled sequences and unassembled short reads.
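The flavour of such a statistic can be conveyed with a simplified D2-style self-comparison score that counts matching k-mer pairs within a sequence and compares the count to its expectation under a random i.i.d. sequence; this stand-in is not the exact D2R formula from the talk.

```python
from collections import Counter
import random

# Illustrative D2-style self-comparison: count matching k-mer pairs within a sequence
# and compare to the (approximate) expectation for an i.i.d. random sequence.
def d2_self(seq, k=12):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(c * (c - 1) // 2 for c in counts.values())

def expected_d2_self(n, k=12):
    pairs = (n - k + 1) * (n - k) / 2
    return pairs * (0.25 ** k)          # P(two independent k-mers are identical), uniform bases

random.seed(0)
bases = "ACGT"
plain = "".join(random.choice(bases) for _ in range(5000))
unit = "".join(random.choice(bases) for _ in range(300))
repetitive = plain[:2000] + unit * 10 + plain[2000:]       # insert a tandem repeat

for name, s in [("plain", plain), ("repetitive", repetitive)]:
    print(name, d2_self(s, 12), "expected ~", round(expected_d2_self(len(s), 12), 3))
```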

2:00 PM-2:20 PM
Leveraging single-cell RNA-seq to infer cell type-specific somatic mutations and mosaicism in Alzheimer's disease
Room: Sydney (2nd Floor)
  • Manolis Kellis, Massachusetts Institute of Technology, United States
  • Carles Boix, Massachusetts Institute of Technology, United States
  • Maria Kousi, Massachusetts Institute of Technology, United States
  • Hansruedi Mathys, Massachusetts Institute of Technology, United States
  • Li-Huei Tsai, Massachusetts Institute of Technology, United States

Presentation Overview:

Mosaic mutations accumulate in post-mitotic neurons and mitotic glia of the aging brain, and may play pivotal roles in driving focal neurological disorders. Previous mosaicism studies used single-cell DNA sequencing, which is cost-prohibitive and does not distinguish the cell types harboring these mutations.

Here, we present a new method for cell-specific somatic mutation inference from single-cell RNA-seq data, and apply it to Smart-seq2 samples from a cohort of 24 Alzheimer’s Disease (AD) and 24 non-AD individuals. We develop permutation tests to estimate pathway- and gene-level mutational burden in each cell type, revealing multiple significant differentially-mutated genes and pathways for both glial and neuronal cells, including several previously implicated in AD.
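A gene-set permutation test of the kind described can be sketched as follows, with simulated per-gene mutation counts and a hypothetical 50-gene pathway; this mirrors the general procedure rather than the authors' code.

```python
import numpy as np

# Sketch of a permutation test for pathway mutational burden in one cell type:
# compare the observed number of mutations hitting a pathway's genes against counts
# from randomly drawn gene sets of the same size. All data below are simulated.
rng = np.random.default_rng(0)
n_genes = 5000
mutations_per_gene = rng.poisson(0.2, size=n_genes)       # simulated per-gene counts in one cell type
pathway = rng.choice(n_genes, size=50, replace=False)     # a hypothetical pathway of 50 genes
mutations_per_gene[pathway[:10]] += 3                      # spike in extra burden for illustration

observed = mutations_per_gene[pathway].sum()
null = np.array([
    mutations_per_gene[rng.choice(n_genes, size=len(pathway), replace=False)].sum()
    for _ in range(10000)
])
p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"observed burden = {observed}, permutation p = {p_value:.4f}")
```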

We predict dozens of clonal events per patient in six major cell types, including excitatory and inhibitory neurons, astrocytes, microglia, oligodendrocytes, and OPCs. We also trace lineages using clonal ancestry across >200,000 cells, >150 individuals, and multiple brain regions.

Overall, we show that single cell RNA-seq enables a systematic cell-type-specific survey of mosaicism in the aging brain, shedding light on the role of clonality and relative pathway burden in different cell types and in disease, and providing important insights on the differentiation processes of the developing and adult human brain.

2:20 PM-2:40 PM
Read Mapping on Genome Variation Graphs
Room: Sydney (2nd Floor)
  • Kavya Vaddadi, TCS Research, India
  • Rajgopal Srinivasan, TCS Research, India
  • Naveen Sivadasan, TCS Research, India

Presentation Overview:

Genome variation graphs are natural candidates to represent a pangenome collection. In such graphs, common subsequences are encoded as vertices and the genomic variations are captured by introducing additional labeled vertices and directed edges. Unlike a linear reference, a reference graph allows rich representation of the genomic diversities and avoids reference bias. We address the fundamental problem of mapping reads to genome variation graphs.

We present a novel mapping algorithm, V-MAP, for efficient identification of a small subgraph of the genome graph for optimal gapped alignment of reads. For fast and accurate mapping, V-MAP creates a space-efficient index using locality-sensitive minimizer signatures computed using a novel graph winnowing and graph embedding onto a metric space. Experiments involving a graph constructed from the 1000 Genomes data, using both real and simulated reads, show that V-MAP is fast, memory efficient and can map short reads as well as PacBio/Nanopore long reads with high accuracy. V-MAP's performance is significantly better than the state-of-the-art, especially for long reads.
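The linear-sequence building block, standard minimizer winnowing, can be sketched as below; extending winnowing to a variation graph is V-MAP's contribution and is not reproduced here.

```python
# Standard sequence minimizers (winnowing) as used for sketching/indexing reads:
# in every window of w consecutive k-mers, keep the k-mer with the smallest hash.
def kmer_hash(kmer):
    return hash(kmer) & 0xFFFFFFFF           # simple stand-in hash function

def minimizers(seq, k=15, w=10):
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmer_hash(kmers[i]), i))
        picked.add((best, kmers[best]))       # (position, k-mer) minimizer
    return sorted(picked)

read = "ACGTTAGCAGTCGATCGATCGGGTACGATCAGCTAGCTAGGATCCGAT"
for pos, kmer in minimizers(read, k=15, w=5):
    print(pos, kmer)
```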

2:40 PM-3:00 PM
Batch correction evaluation framework using a-priori gene-gene associations: applied to the GTEx dataset
Room: Sydney (2nd Floor)
  • Judith Somekh, University of Haifa, Israel
  • Shai Shen-Orr, Technion, Israel
  • Isaac Kohane, Harvard University, United States

Presentation Overview:

Correcting a heterogeneous dataset that presents artefacts from several confounders is often an essential bioinformatics task. However, attempting to remove these batch effects will result in some biologically meaningful signal being lost. Thus, a central challenge is assessing whether the removal of unwanted technical variation harms the biological signal that is of interest to the researcher.
We describe a novel framework, B-CeF, to evaluate the effectiveness of batch correction methods and their tendency toward over- or under-correction. The approach is based on comparing the co-expression of adjusted gene-gene pairs to a priori knowledge of high-confidence gene-gene associations derived from thousands of unrelated experiments in an external reference. Our framework includes three steps: (1) data adjustment with the desired methods; (2) calculation of gene-gene co-expression measurements for the adjusted datasets; and (3) evaluation of the co-expression measurements against a gold standard. Using the framework, we evaluated five batch correction methods applied to RNA-seq data of six representative tissues derived from the GTEx project. Our framework enables the evaluation of how well batch correction methods preserve the original biological signal. We show that correcting for known confounders outperforms factor analysis-based methods that estimate hidden confounders. The code is publicly available as an R package.
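Steps (2) and (3) can be sketched as follows, with a simulated adjusted expression matrix, a hypothetical gold standard of gene pairs, and a rank-based AUC as the evaluation score; these stand-ins illustrate the mechanics of the framework only.

```python
import numpy as np

# Score how well co-expression in an adjusted expression matrix recovers a gold
# standard of known gene-gene pairs. The matrix, gold-standard pairs and AUC-style
# score are illustrative placeholders for the framework described above.
rng = np.random.default_rng(0)
n_genes, n_samples = 300, 120
expr_adjusted = rng.normal(size=(n_genes, n_samples))      # stand-in for a batch-corrected matrix
gold_pairs = {(i, i + 1) for i in range(0, 60, 2)}         # hypothetical known associations

corr = np.corrcoef(expr_adjusted)                          # gene-gene co-expression
pairs = [(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)]
scores = np.array([abs(corr[i, j]) for i, j in pairs])
labels = np.array([1 if (i, j) in gold_pairs else 0 for i, j in pairs])

# Rank-based AUC: probability that a gold-standard pair outranks a non-gold pair.
order = scores.argsort()
ranks = np.empty(len(scores))
ranks[order] = np.arange(1, len(scores) + 1)
n_pos, n_neg = labels.sum(), (1 - labels).sum()
auc = (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(f"co-expression AUC against gold standard: {auc:.3f}")   # ~0.5 here: pure noise preserves no signal
```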

3:00 PM-3:20 PM
Inferring pathway activation/suppression to rank tumors by sensitivity to immune checkpoint therapy
Room: Sydney (2nd Floor)
  • Boris Reva, Icahn School of Medicine at Mount Sinai, United States
  • Anna Calinawan, Icahn School of Medicine at Mount Sinai, United States
  • Dmitry Rykunov, Icahn School of Medicine at Mount Sinai, United States
  • Azra Krek, Icahn School of Medicine at Mount Sinai, United States
  • Sujit Nair, Icahn School of Medicine at Mount Sinai, United States
  • Ash Tewari, Icahn School of Medicine at Mount Sinai, United States
  • Eric Schadt, Icahn School of Medicine at Mount Sinai, United States

Presentation Overview:

We introduce a new method to infer pathway activation and suppression by examining under- and over-representation of pathway genes in tumor genes ranked by gene expression levels. The novelty of our approach rests in the independent assessment of over- and under-representations of genes in a given pathway in the rank-ordered list of genes for a given sample. By finding the point of maximal pathway enrichment in the rank-ordered list, the tumors are stratified into two groups, one in which the pathway is inferred as activated (or suppressed) and the other inferred as not activated (or suppressed). We applied this method to differentiate prostate cancers by sensitivity to immune checkpoint inhibitors. We hypothesized that non-responder tumors had either the IFN-γ axis suppressed, which makes tumors invisible to immune cells, or the IFN-γ axis activated along with highly activated processes of immune evasion. Our findings show that ~1/3 of prostate tumors are likely non-responders to checkpoint therapy due to downregulation of key genes along the IFN-γ axis. Using nominated tumor immune subtypes, we determined characteristically expressed genes involved in immune evasion, proposed combination therapy and specific targets for both immune subtypes, and proposed biomarkers for clinical diagnostics of prostate cancer immune subtypes.
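A simplified version of scanning a rank-ordered gene list for the point of maximal pathway enrichment is sketched below, using a running excess of pathway genes over expectation; the data are simulated and the actual statistic used by the authors may differ.

```python
import numpy as np

# Scan a rank-ordered gene list for the point of maximal pathway enrichment, using
# the running excess of pathway genes among the top-k genes over chance expectation.
rng = np.random.default_rng(0)
n_genes = 1000
pathway = set(rng.choice(n_genes, size=40, replace=False))

expression = rng.normal(size=n_genes)
expression[list(pathway)] += 1.5                # make the pathway "activated" in this sample
ranked = np.argsort(-expression)                # genes from highest to lowest expression

in_pathway = np.array([g in pathway for g in ranked], dtype=float)
observed_top = np.cumsum(in_pathway)            # pathway genes among the top-k
expected_top = len(pathway) * np.arange(1, n_genes + 1) / n_genes
enrichment = observed_top - expected_top

k_star = int(np.argmax(enrichment)) + 1
print(f"maximal enrichment at rank {k_star}: {enrichment[k_star - 1]:.1f} excess pathway genes")
```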

3:20 PM-3:40 PM
Contribution of synthetic lethality to cancer risk and onset time across human tissues
Room: Sydney (2nd Floor)
  • Nishanth Ulhas Nair, National Institutes of Health (NIH), United States
  • Kuoyuan Cheng, National Institutes of Health (NIH), United States
  • Joo Sang Lee, Cancer Data Science Lab, NCI/NIH, United States
  • Eytan Ruppin, Cancer Data Science Lab, NCI/NIH, United States

Presentation Overview:

Considerable variation exists in lifetime cancer risk across human tissues, which has been reported to be strongly correlated with the number of stem cell divisions and with abnormal DNA-methylation levels occurring in a tissue. Here, we investigate the hypothesis that the number of down-regulated synthetic lethal (SL) gene pairs in a tissue (termed its SL load) is another strong determinant of its cancer risk. We show that the SL load of normal tissues is higher than that of the cancers that originate from them, and that the SL load of early-stage tumors is higher than that of late-stage ones. These findings indicate that many SLs are lost during these transitions, and lead to the hypothesis that a high SL load in normal tissues may impede cancer development. Accordingly, we find that normal tissues with high SL load have a lower risk of developing cancer than tissues with low SL load. Tissues with high SL load also develop cancer at later ages than tissues with low SL load. The SLs lost in the transition from healthy to cancer tissues tend to be the functionally stronger ones. Our findings highlight the significant role of synthetic lethality in determining cancer risk and onset time across tissues.
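A minimal sketch of an SL-load computation, counting SL pairs whose partners are both down-regulated in a sample; the SL pair list and expression threshold are hypothetical placeholders.

```python
import numpy as np

# "SL load": number of synthetic lethal gene pairs in which both partners are
# down-regulated in a given tissue sample. All inputs below are simulated stand-ins.
rng = np.random.default_rng(0)
n_genes, n_pairs = 2000, 500
sl_pairs = rng.choice(n_genes, size=(n_pairs, 2))                    # hypothetical SL pair list
expression_z = rng.normal(size=n_genes)                              # per-gene z-scores in one tissue

down = expression_z < -1.0                                           # "down-regulated" call
sl_load = int(np.sum(down[sl_pairs[:, 0]] & down[sl_pairs[:, 1]]))
print(f"SL load of this tissue sample: {sl_load} of {n_pairs} SL pairs co-down-regulated")
```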

3:40 PM-4:00 PM
Deciphering the landscape of phosphorylated HLA-I ligands
Room: Sydney (2nd Floor)
  • Marthe Solleder, University of Lausanne, Swiss Institute of Bioinformatics, Switzerland
  • David Gfeller, University of Lausanne, Swiss Institute of Bioinformatics, Switzerland

Presentation Overview:

The identification and prediction of HLA-I–peptide interactions play an important role in our understanding of antigen recognition in infected or malignant cells. In cancer, non-self HLA-I ligands can arise from many different alterations, including non-synonymous mutations, gene fusion, cancer-specific alternative mRNA splicing or aberrant post-translational modifications. In this study, we collected in-depth phosphorylated HLA-I peptidomics data (1,920 unique phosphorylated peptides) from several studies covering 67 HLA-I alleles and expanded our motif deconvolution tool to identify precise binding motifs of phosphorylated HLA-I ligands for several alleles. In addition to the previously observed preferences for phosphorylation at P4, for proline next to the phosphosite and for arginine at P1, we could detect a clear enrichment of phosphorylated peptides among HLA-C ligands and among longer peptides. Binding assays were used to validate and interpret these observations. Using these data, we then developed the first predictor of HLA-I–phosphorylated peptide interactions and demonstrated that combining phosphorylated and unmodified HLA-I ligands in the training of the predictor led to the highest accuracy.

4:40 PM-5:00 PM
PANTHER Classification System – An integrated platform for genome-wide gene function analysis
Room: Sydney (2nd Floor)
  • Huaiyu Mi, University of Southern California, United States
  • Anushya Muruganujan, University of Southern California, United States
  • Dustin Ebert, University of Southern California, United States
  • Paul Thomas, University of Southern California, United States

Presentation Overview:

The PANTHER Classification System (www.pantherdb.org) is a comprehensive system that combines genomes, gene function classifications, pathways and statistical analysis tools to enable biologists to annotate and analyze large-scale genome data. 132 complete genomes are organized into gene families and subfamilies; evolutionary relationships between genes are represented in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models, or HMMs). The families and subfamilies are annotated with Gene Ontology terms, and sequences are associated with PANTHER and Reactome pathways. In addition, a suite of tools has been built to allow users to browse and query gene functions and analyze large-scale experimental data with statistical tests. Recent updates have expanded the support for functional annotation and statistical analysis of genetic variant data, and extended genome coverage to an additional 780 organisms. PANTHER supports a number of community projects, including the GO Phylogenetic Annotation effort and the GO Enrichment Analysis function. Usage statistics and user feedback indicate that PANTHER is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists in genome-wide data analysis.

5:00 PM-5:20 PM
CAFE: Compositional Anomaly and Feature Enrichment Assessment for Delineation of Genomic Islands
Room: Sydney (2nd Floor)
  • Rajeev Azad, University of North Texas, United States

Presentation Overview:

One of the evolutionary forces driving bacterial genome evolution is the acquisition of clusters of genes through horizontal gene transfer (HGT). These genomic islands may confer adaptive advantages to the recipient bacteria, such as the ability to thwart antibiotics, become virulent or hypervirulent, or acquire novel metabolic traits. Methods for detecting genomic islands either search for markers or features typical of islands or examine anomaly in oligonucleotide composition against the genome background. The former tend to underestimate, missing islands whose markers have been lost or degraded, while the latter tend to overestimate, due to their inability to discriminate compositional atypicality arising from HGT from that arising from other biological factors. We propose here a framework that exploits the strengths of both these approaches while bypassing the pitfalls of either. Genomic islands lacking markers are identified by their association with genomic islands with markers. This was made possible by performing marker enrichment and phyletic pattern analyses within an integrated framework of recursive segmentation and clustering. The proposed method, CAFE, compared favorably with frequently used methods for genomic island detection on synthetic test datasets and on a test set of known islands from 15 well-characterized bacterial species.
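The compositional-anomaly side of the problem can be sketched with a sliding-window scan that scores each window by the Jensen-Shannon divergence between its tetranucleotide frequencies and the genome background; CAFE's recursive segmentation, clustering and marker-enrichment analyses are not reproduced here.

```python
from collections import Counter
import math, random

# Score sliding windows by the Jensen-Shannon divergence between their tetranucleotide
# frequencies and the whole-genome background; high-scoring windows are compositionally
# atypical. The simulated "genome" with a GC-rich insert is purely illustrative.
def kmer_freqs(seq, k=4):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def js_divergence(p, q):
    keys = set(p) | set(q)
    m = {x: (p.get(x, 0) + q.get(x, 0)) / 2 for x in keys}
    def kl(a, b):
        return sum(a.get(x, 0) * math.log2(a.get(x, 0) / b[x]) for x in keys if a.get(x, 0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

random.seed(0)
host = "".join(random.choices("ACGT", weights=[0.3, 0.2, 0.2, 0.3], k=50000))
island = "".join(random.choices("ACGT", weights=[0.15, 0.35, 0.35, 0.15], k=5000))   # GC-rich insert
genome = host[:25000] + island + host[25000:]

background = kmer_freqs(genome)
for start in range(0, len(genome) - 5000 + 1, 5000):
    window = genome[start:start + 5000]
    print(start, round(js_divergence(kmer_freqs(window), background), 4))
```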

5:20 PM-5:40 PM
Human Aging DNA Methylation Signatures are Conserved but Accelerated in Cultured Fibroblasts
Room: Sydney (2nd Floor)
  • Gabriel Sturm, Columbia University, United States
  • Andres Cardenas, University of California, Berkeley, United States
  • Marie-Abèle Bind, Harvard University, United States
  • Steve Horvath, University of California, Los Angeles, United States
  • Shuang Wang, Columbia University, United States
  • Yunzhang Wang, Karolinska Institutet, Sweden
  • Sara Hägg, Karolinska Institutet, Sweden
  • Michio Hirano, Columbia University, United States
  • Martin Picard, Columbia University, United States

Presentation Overview:

Aging is associated with progressive and site-specific changes in DNA methylation (DNAm). These global DNAm changes have been used to train elastic net regression algorithms, i.e. DNAm clocks, to accurately predict chronological age in humans. However, relatively little is known about how these clocks perform on cells in culture. Here we culture primary human fibroblasts across the cellular lifespan (~6 months) and use four different DNAm clocks to show that age-related DNAm signatures are conserved and accelerated in vitro. The Skin & Blood clock shows the best linear correlation with chronological time (r=0.90), including during replicative senescence. Although similar in nature, the rate of epigenetic aging is approximately 62 times faster in cultured cells than in the human body. Leveraging the high temporal resolution of these data, we subsequently applied generalized additive modeling and show how single CpGs exhibit loci-specific, linear and nonlinear trajectories across the lifespan, with rates ranging from -47% (hypomethylation) to +23% (hypermethylation) per month, remarkably higher than changes in the human body. Our computational approach demonstrates how global and single-CpG DNAm dynamics are conserved and accelerated in cultured fibroblasts, which may represent a system to evaluate age-modifying interventions across the lifespan.
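How a DNAm clock is trained can be sketched with an elastic net regression of chronological age on CpG beta values; the data below are simulated, whereas published clocks such as the Skin & Blood clock are trained on large real cohorts.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Train a toy "DNAm clock": elastic net regression of chronological age on CpG
# methylation (beta) values. All samples and CpGs below are simulated.
rng = np.random.default_rng(0)
n_samples, n_cpgs = 300, 2000
age = rng.uniform(20, 80, size=n_samples)

betas = rng.uniform(0.1, 0.9, size=(n_samples, n_cpgs))
betas[:, :50] += 0.004 * (age[:, None] - 50)            # 50 CpGs drift linearly with age
betas = np.clip(betas, 0, 1)

model = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(betas, age)
pred = model.predict(betas)                             # in-sample prediction, for illustration only
print("CpGs with non-zero weight:", int(np.sum(model.coef_ != 0)))
print("correlation with chronological age:", round(np.corrcoef(pred, age)[0, 1], 2))
```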

5:40 PM-6:00 PM
The origin of the central dogma through conflicting multilevel selection
Room: Sydney (2nd Floor)
  • Nobuto Takeuchi, School of Biological Sciences, University of Auckland, New Zealand
  • Kunihiko Kaneko, Research Center for Complex Systems Biology, Graduate School of Arts and Sciences, University of Tokyo, Japan

Presentation Overview:

The central dogma of molecular biology rests on two kinds of asymmetry between genomes and enzymes: informatic asymmetry, where information flows from genomes to enzymes but not from enzymes to genomes; and catalytic asymmetry, where enzymes provide chemical catalysis but genomes do not. How did these asymmetries originate? Here we show that these asymmetries can spontaneously arise from conflict between selection at the molecular level and selection at the cellular level. We developed a computational model consisting of a population of protocells, each containing a population of replicating catalytic molecules. The molecules are assumed to face a trade-off between serving as catalysts and serving as templates. This trade-off causes conflicting multilevel selection: serving as catalysts is favored by selection between protocells, whereas serving as templates is favored by selection between molecules within protocells. This conflict induces informatic and catalytic symmetry breaking, whereby the molecules differentiate into genomes and enzymes, establishing the central dogma. We show mathematically that the symmetry breaking is caused by a positive feedback between Fisher's reproductive values and the relative impact of selection at different levels. Our results suggest that the central dogma is a logical consequence of conflicting multilevel selection.