JPI Abstracts

Abeel, Thomas

Broad Institute of MIT and Harvard

Computational genotype-phenotype associations in Mycobacterium tuberculosis

The current overall biological theme of my research is “Antibiotic resistance in human pathogens”, with a strong focus on Mycobacterium tuberculosis. The goal of our work is to develop computational methods for associating microbial phenotypes with genotypic information gained through whole genome sequencing. We want to identify mutations in the Mycobacterium tuberculosis (Mtb) genome (genotype) that are associated with, or are predictive of, drug-resistant forms of tuberculosis (TB) disease (phenotype). Being able to predict drug resistance from genome data will enable faster diagnosis of TB and lead to more timely and effective treatment of patients, thus curbing transmission of TB and improving patient outcomes.

Alexandrov, Theodore

University of Bremen

We announce a new series of publications, “About My Lab”, for PLoS Computational Biology, devoted to the questions of how to run a research lab successfully and how to avoid the common pitfalls along the way.

A research lab is the workhorse of modern science. Running a research lab, whether it is a wet lab, dry lab, or both, is a complex and costly endeavor, which requires a talented team, solid funding, lab space, and a principal investigator (PI). The success of a lab is normally associated with its PI and indeed very much depends on their scientific achievements (hence vision), productivity, and, importantly, their management skills. Management skills are essential, and it’s especially surprising that with the size and turnover of a lab being comparable to that of a small company, scientific PIs do not normally have formal management training and instead are typically self-made managers.

For a young PI at the beginning of their independent career, it is especially important to learn more about running a successful lab. Many questions arise along the way: how to motivate team members; how to organize communication; how to handle conflicts inside the team; how to deal with increasing flows of information, the growing number of emails, university commitments, and publications.

Our motivation in creating the series of publications “About My Lab” at PLoS Computational Biology is to share knowledge on these questions with respect to lab management. We will raise awareness of the role of management in science, and provide an open platform for disseminating experience and opinions on this topic.

Basler, Georg

Exploring the catalytic space for the discovery of novel biotechnological processes

Current Research: Development and extension of a computational method for targeted metabolic engineering by integrating biochemical constraints and experimental data. Identification of chemical reactions inducing targeted changes in metabolism for biotechnological applications of Arabidopsis thaliana and Pseudomonas putida. Identification of enzyme-coding genes suitable for transformation, and estimation of their impact on metabolism.

Future research plans: Computational and experimental metabolic engineering of Pseudomonas putida for the efficient production of biodegradable materials by transformation of enzyme-coding genes. Development and integration of various computational methods for improved prediction of metabolic states, flux distributions, and growth. Validation and refinement of computational methods for metabolic engineering using targeted metabolomics and genetic engineering experiments.

Bonzanni, Nicola

Vrije Universiteit

Formal methods for (computational) biology

Formal methods are mathematically based techniques for the specification, development and verification of software and hardware systems. It has turned out, however, that formal modeling and analysis techniques developed for distributed computer systems are applicable to biological systems as well, because both kinds of systems have a lot in common: biological systems, too, are built from separate components that communicate with each other and thus influence each other’s behavior. But why would a biologist want to use formal models? Formal models can be an excellent way to store and share knowledge on biological systems, and to reason about such systems. A formal representation of biological processes is crucial for fostering interdisciplinary collaborations and for keeping biological information coherent when data and knowledge reach a size and dimensionality that cannot be captured efficiently and unambiguously by natural language or hand-drawn cartoons.

Bromberg, Yana

Rutgers University

The full extent of biological diversity is created by minor alterations to the DNA blueprint of life. How do these mutations change function? How do they affect the organism itself, its progeny, its community, and, ultimately, the world? The answers to these questions depend on the context of the changes – a complex system of many interactions, both within and outside of the organism. My long-term goal is to understand how biological function is encoded in genetic data, whether by a single gene, a genome, or a metagenome. My current research aims to make sense of the deluge of (meta)genomic, exomic, and transcriptomic data by (1) correlating genome variation to phenotype, (2) identifying the specifics of sequence-encoded molecular functions, and (3) elucidating complex system/community interactions.

My lab develops computational tools that combine information from resources such as high-throughput sequence interrogation and screens, literature-mining, and computational inference. We work to attain the above goals by:

1. Identifying variome-disease relationships, with a specific interest in autoimmune diseases. We are particularly interested in evaluating the functional burden imposed by specific variation onto the molecular pathways important in disease.

2. Analyzing disease pathways drives us to develop computational techniques for predicting protein sites of functional significance (e.g. protein, DNA, and metal binding sites). The tools we develop are useful in the analysis of human proteins, as well as those from other organisms. In the non-human direction, we specifically study the evolution of the oxidoreductase enzymes, which are responsible for the critical electron transfer reactions that turn basic elements (H, O, C, S, N) into biologically active molecules and thus regulate the global flux of these elements. We currently use sequence similarity combined with active site validation to identify oxidoreductases in sequenced but not yet functionally annotated genomes. We are also developing a pathway evolution model and a structure-based approach for oxidoreductase comparison. Reconciling all these approaches should substantially improve our understanding of electron transfer reactions.

3. Finally, my lab has developed a method to taxonomically place microorganisms using their proteome-encoded functionality. We are currently evaluating metrics that would allow us to estimate the diversity of metagenomic data directly from reads. We have also built methods to gauge microbial community composition and interactions by identifying existing and predicted “effector” proteins (bacterial warfare). Together, these efforts will enable us to map sequenced metagenome data to the encoded molecular functionality of the community as a whole.

Clement-Ziza, Mathieu

Biotec TU-Dresden

A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis

I am currently working on exploiting the advantages of RNA sequencing in eQTL studies. This is done in the framework of an EU project that focuses on oxidative stress in fission yeast and covers RNA-seq based genotyping, gene expression quantification that takes genomic variation into account, and antisense expression.

Corpas, Manuel

The Genome Analysis Centre

I am particularly interested in the efficient, reliable, fast annotation of high-throughput genomic data in non-model organisms. This includes the analysis and visualization of a variety of -omics data sources as well as the computational optimization of processes to allow automation of such analyses. I very much enjoy playing with different data sources, integrating them, creating new hypotheses, testing them quickly and then generalizing the results into conclusions for the field of genomic regulation. As a hobby, I have sequenced my whole family and my own stool sample and, with family consent, we have made all our genomic data available for download, taking on an advocacy role to promote open source values in the field of genomics. The Ethical, Legal and Social Implications (ELSI) of genomic data for the future are of great concern to me.

de Ridder, Jeroen

Delft University of Technology

In my lab we design and apply innovative data analysis algorithms that are required and inspired by biological questions, in order to further the understanding of disease biology. We particularly focus on the analysis of mutation data in mouse (retroviral insertional mutagenesis) and human (cancer genome sequencing).

The work in my lab revolves around three research themes.

1) Interpretable predictive models. Instead of focusing on improving performance of predictive models such as classifiers and regression models, the main aim in this theme is to extract biologically relevant observations from a trained predictive model.

2) Scale-space analysis. In biology, the concept of scale is omnipresent. In cancer, for instance, mutations may cause disruption of nearby genes (small spatial scale) or disruption of distal genes through chromatin looping and/or distal enhancer activity (large spatial scale). In addition, mutations not only deregulate single genes (small functional scale) but also disrupt cellular pathways (large functional scale). To deal with the fact that the data that we measure contains structure at multiple scales we employ and investigate scale-space analysis methodologies.

3) Graph-based analysis. In this theme we investigate the use of graphs and graph-mining to represent and analyze high-throughput data. Graph-based representations often form the basis for scale-space analyses and knowledge extraction from trained predictive models.

Escudero, Luis M

Instituto Biomedicina Seville

Image analysis, Network Science, Neuromuscular diseases, Drosophila

During my postdoctoral stage I focussed on the design of new imaging techniques to analyze complex morphogenetic events. One of the outcomes of my work was the development of a new method that permits objective comparison of epithelial organization. This work brings together concepts from different subject areas such as complex networks, developmental biology, cell biology and computer science. It served as the starting point of my studies as an independent investigator: to understand how cells are arranged to form organs and how disease or mutation can alter this organization. Our research is based on treating biological images as complex systems. We combine computerized image analysis and network science to extract the defining signature of complex images and interpret biological processes in development and disease. Consequently, my ongoing research has diversified into different projects, from the topological analysis of fly morphogenesis to the investigation of human neuromuscular disorders.

Fang, Gang

Mount Sinai School of Medicine

Epigenomics and Infectious Diseases

We leverage a new generation of technology, single molecule real time (SMRT) sequencing, and design novel statistical models to fully assess the extent, dynamics and functions of tens of different types of DNA chemical modifications in pathogenic bacteria, model organisms (e.g. yeast and Drosophila) and human mitochondrial genomes. In SMRT sequencing, the time required for incorporation of each nucleotide is monitored in addition to the specific base selected, and can be statistically modeled to 1) detect the presence of known DNA modifications, 2) differentiate among different types of modifications and 3) predict novel types of modifications. In recent work, we discovered methylation patterns at single-base resolution in the Escherichia coli strain that caused the 2011 German outbreak and demonstrated their regulatory roles in transcription and higher-order phenotypes including growth rate and virulence (Nature Biotechnology 2012). We also revealed for the first time a complete map of DNA modifications for mitochondrial DNA samples isolated from human brain tissue (Genome Research 2012). Meanwhile, we also innovated in the design of computational and statistical models for detecting DNA modifications (PLoS Computational Biology 2013) and characterizing differential modifications (PLoS Genetics 2013). Currently, we are integrating SMRT sequencing, second-generation sequencing (RNA-seq, ChIP-seq, Tn-seq) and diverse molecular networks to study the dynamics of DNA modifications in diverse pathogens (viruses and fungi) and human mitochondrial disorders. We believe our approach holds promise to transform previous knowledge about the regulation of DNA modifications in infectious and genetic diseases.

Fufezan, Christian

Universitaet Muenster

High throughput informatics on mass spectrometry data.

Cellular adaptation to reactive oxygen stress.

Developing MS informatics tools to extract as much information as possible from the experiments performed.

Combining transcriptomics and proteomics to further elucidate the cellular response to stress with focus on protein turnover.

Hehir-Kwa, Jayne

Radboud University Nijmegen Medical Centre

My current research aim is to work in a multidisciplinary team to implement new approaches for investigating and finding the causes of genomic disorders, in particular neurodevelopmental ones. I am interested in resolving current issues in structural variant identification, whether via NGS or genomic microarray, and in the interpretation of the variants. Examples include how to identify mobile DNA elements in whole genome sequencing data and which key features distinguish pathogenic copy number variants from benign ones.

Jeong, Jieun

University of Pennsylvania

Systems Biology and NGS

No abstract recorded.

Macintyre, Geoff

NICTA and The University of Melbourne

Integrated genomics for lethal prostate cancer

Prostate cancer is the most frequently diagnosed internal malignancy in the western world. While the majority of prostate cancers are non-lethal, there is currently no reliable approach to distinguish lethal from non-lethal prostate cancer at an early, curable stage. Over the last 7 years, researchers at Epworth Medical Centre and Royal Melbourne Hospital have compiled a biobank of over 1500 tumour specimens to better understand the underlying molecular mechanisms governing lethality in prostate cancer. From this bank we were able to collect matched primary tumours and hormone-naive and castrate-resistant metastases from 7 individuals. We have performed whole-genome sequencing, RNA-seq, and methylation profiling on these tumours. I am currently leading the bioinformatics analysis of these samples, along with a number of clinically driven biomarker discovery projects. Our key findings so far include a putative mechanism for hormone-driven structural rearrangements. Using the 55,000 DNA breakpoints across our samples, we show that a large proportion of these breakpoints are in close proximity to androgen receptor binding sites. Furthermore, when we looked at breakpoints in 11 other cancers from the TCGA and ICGC projects, only breakpoints in hormone-dependent cancers showed an association with androgen receptor binding sites. We observed the same results when we looked at estrogen receptor binding sites. Our results suggest that steroid hormone receptors may play a role in the formation of cancer-driving structural rearrangements.
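One simple way to quantify "close proximity" between breakpoints and binding sites is to count the fraction of breakpoints that lie within a fixed window of the nearest site. The Python sketch below illustrates that idea only; the function name, the single-chromosome coordinate model, the window size, and the toy coordinates are all assumptions, and this is not necessarily the analysis used in the study.

```python
from bisect import bisect_left

def fraction_near_sites(breakpoints, sites, window):
    """Fraction of breakpoints within `window` bp of any binding site.

    Assumes all coordinates lie on one chromosome (hypothetical simplification).
    """
    if not breakpoints:
        return 0.0
    sorted_sites = sorted(sites)

    def is_near(pos):
        # Only the sites immediately left and right of pos can be the nearest.
        i = bisect_left(sorted_sites, pos)
        neighbours = sorted_sites[max(i - 1, 0): i + 1]
        return any(abs(pos - s) <= window for s in neighbours)

    return sum(is_near(b) for b in breakpoints) / len(breakpoints)

# Toy coordinates, invented purely for illustration.
toy_sites = [1000, 5000, 9000]
toy_breakpoints = [950, 1200, 5050, 7000]
print(fraction_near_sites(toy_breakpoints, toy_sites, window=100))
```

In practice such a fraction would need to be compared against a background expectation (for example, randomly placed breakpoints) before any association could be claimed.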

MacLean, Dan

The Sainsbury Laboratory

I lead the Bioinformatics group at The Sainsbury Laboratory, and my research interest lies in developing methods and tools that allow scientists to take advantage of data generated by large-scale genomics projects to answer pressing biological questions. My group is involved in many strands of bioinformatics, including assembly of bacterial, fungal and oomycete genomes from high-throughput reads, annotation of assemblies, and identification of polymorphisms using re-sequencing methods in plants and plant pathogens. I have recently begun developing crowdsourcing approaches to deal with emergent pathogens. My group contributes continuously to the open-source software community and has developed or co-developed a range of tools to address the needs of researchers, including tools for the sharing and visualisation of next-generation sequencing data and for the detection of SNPs from next-generation sequence data without needing a reference sequence. My group is currently producing software infrastructure specifically for crowdsourcing, which involves adapting existing open-source software and developing new interfaces that allow members of the public to contribute to analyses.

Michaut, Magali

Computational Cancer Genomics, Netherlands Cancer Institute

Cancer is a leading cause of death worldwide, and its incidence is continuously increasing. After the most important oncogenes and tumor suppressors were discovered, it was recognized that cancer is not only a disease of genes but a disease of pathways: alterations in many different genes can lead to the same deregulated pathway. Recently, The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have been producing deluges of large-scale, heterogeneous data: mutations, copy number alterations (CNA), mRNA expression, DNA methylation, and protein expression. In addition, large-scale panels of drug sensitivity in cell lines have been released by the Sanger Institute and the Broad Institute. Methods are needed to integrate these different types of alterations and to link the molecular alterations to treatments. I am interested in data mining and integration of large-scale cancer genomics data in order to answer relevant clinical questions, including subtyping, identification of altered pathways, development of biomarkers, and prediction of treatment response.

Meuleman, Wouter

MIT

Epigenetics

No abstract recorded.

Gehlenborg, Nils

Center for Biomedical Informatics, Harvard Medical School

My goal is to develop computational tools that enable data-driven biomedical research. For the last few years, my focus has been on data visualization and exploration of large-scale genomics and epigenomics data. There is a great opportunity to enhance our understanding of biology if we can overcome the evident disconnect between the analysis of data by computers and the interpretation of results by humans.

Currently, my work is focussed on three areas:

1. Visualization and exploration tools for large-scale cancer genomics studies. In collaboration with visualization experts and cancer researchers, I have developed StratomeX (http://stratomex.caleydo.org), a new approach for the identification and characterization of cancer subtypes in heterogeneous cancer genomics data sets.

2. Integration of visual and computational approaches to support sense-making in biology. I am leading a team that is building the Refinery Platform (http://www.refinery-platform.org) to provide a framework for the integration of visualization tools and complex analysis pipelines that operate on data sets with hundreds of samples.

3. Development of software to support reproducible research in epigenomics and genomics. A second goal of the Refinery Platform is to efficiently store and retrieve the metadata that describe the steps and parameters used in the analyses. I am developing a system that allows effective tracking, visualization and sharing of the metadata along with automated workflows.

Patterson, Murray

Centrum Wiskunde & Informatica

Combinatorial models for haplotype assembly

The haplotype assembly of next-generation sequence (NGS) data is a computationally hard but key step in (re-)assembling the genome of an individual. The current state-of-the-art methods are heuristics for the purely combinatorial optimization problem of correcting the minimum number of errors needed to produce such an assembly. These methods (1) do not take into account the several types of error probabilities associated with the NGS reads, and (2) do not take advantage of the structural properties of this (paired-end) data. We see the following ways to address these two deficiencies immediately. (1) Use a maximum-likelihood method to compute the haplotype assembly that is most likely given these error probabilities. Such an approach is feasible because the authors of the current methods have themselves mentioned the possible improvement from a maximum-likelihood method, and because we collaborate with research labs and consortia that have high-quality data sets that include these error probabilities, which we can use as a benchmark. (2) Use Linear Programming (LP) not only to take advantage of such structural properties, but also to compute an exact (non-heuristic) solution to the problem. We believe this approach is a step in the right direction because there is already a body of work on LP approaches to similar problems, and yet LP has never been used in the context of haplotype assembly.
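As a rough illustration of the two objectives contrasted above, the following Python sketch brute-forces a toy instance. The purely combinatorial objective is commonly known as minimum error correction (MEC); the weighted variant, which charges log((1 - p)/p) for overriding an allele whose error probability is p, is one simple way to make the choice likelihood-based. The read encoding, the function names, and the example reads are all assumptions made for illustration, and the exhaustive search is only feasible for a handful of sites; it is not the method proposed in the abstract.

```python
import math
from itertools import product

def correction_cost(reads, haplotype, weighted=False):
    """Cost of making every read agree with `haplotype` or its complement.

    Each read is a dict {site_index: (allele, error_probability)} with
    alleles coded 0/1 (a hypothetical encoding chosen for this sketch).
    """
    complement = tuple(1 - a for a in haplotype)
    total = 0.0
    for read in reads:
        costs = []
        for hap in (haplotype, complement):
            cost = 0.0
            for pos, (allele, p_err) in read.items():
                if allele != hap[pos]:
                    # Unweighted: count one correction. Weighted: unlikely
                    # errors (small p_err) are expensive to assume.
                    cost += math.log((1 - p_err) / p_err) if weighted else 1.0
            costs.append(cost)
        total += min(costs)  # assign the read to the cheaper haplotype
    return total

def best_haplotype(reads, n_sites, weighted=False):
    """Exhaustive search; the first allele is fixed to 0 to break symmetry."""
    candidates = (c for c in product((0, 1), repeat=n_sites) if c[0] == 0)
    return min(candidates, key=lambda c: correction_cost(reads, c, weighted))

# Toy reads over 4 heterozygous sites; the last read carries one error.
toy_reads = [
    {0: (0, 0.01), 1: (0, 0.01), 2: (1, 0.01)},
    {0: (1, 0.01), 1: (1, 0.01), 2: (0, 0.01)},
    {1: (0, 0.01), 2: (1, 0.01), 3: (1, 0.01)},
    {1: (1, 0.01), 2: (0, 0.01), 3: (0, 0.01)},
    {0: (0, 0.01), 1: (1, 0.20), 2: (1, 0.01)},
]
print(best_haplotype(toy_reads, 4))                 # unweighted (MEC)
print(best_haplotype(toy_reads, 4, weighted=True))  # likelihood-weighted
```

The search space doubles with every additional site, which is why practical methods rely on heuristics or on exact formulations such as the LP approach mentioned above.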

Pelizzola, Mattia

IIT

Computational epigenomics

Current research focus:

I am leading the Computational Epigenomics group at the Center for Genomic Sciences of IIT in Milan. Our main interests are developing tools for the analysis of NGS epigenomics data, unravelling the cross-talk between epigenetic layers, and studying the alterations of these layers involved in cancer onset and progression.

Future research plans:

I am particularly interested in developing/adopting experimental strategies for imposing controlled perturbation of epigenetic modifications and using these lab tools and computational methods to reverse engineer the cross-talk between epigenetic layers. In addition, I am interested in taking advantage of the vast amount of publicly available transcriptional and epigenomics data to dissect regulatory programs and their alterations in cancer.

Pfeifer, Nico

Max Planck Institute for Informatics

Statistical learning in HIV research

Recent advances in high-throughput technologies have led to an exponential increase in biological data (such as genomic, epigenomic and proteomic data). To find meaningful insights in such large data collections, efficient statistical learning methods are needed. Since January 2013 I have coordinated the theme "Statistical Learning in Computational Biology" at the MPI for Informatics. We are interested in developing and applying new machine learning / statistical learning methods to solve computational biology problems and answer new biological questions. Previously, we focused on proteomic data, but now the focus is more on epigenomic and genomic data of medical relevance. Application areas include the study of viruses like HIV, Hepatitis C or Influenza as well as the field of epigenetics. Method-wise, we are interested in the integration of heterogeneous data sets, improving the interpretability of non-linear estimators, and efficient learning methods for large data sets.

Radonjic, Marijana

TNO

Network Biology of Systems Flexibility

Currently, we use and (co-)develop bioinformatics and statistics workflows for preprocessing, analysis, visualisation, modeling and integration of large datasets produced within systems biology studies (such as DNA/RNA sequencing, gene expression levels, metabolite and protein abundance, phenotypic parameters and biomarkers). Specifically, we focus on network biology approaches for integrating such systems study data at different levels of complexity and in the context of existing knowledge. Analyzing complex systems biology datasets allows us to predict and understand human health as well as to improve strategies to prevent and combat disease (e.g. the effect of lifestyle, diet or stress on health, or the effects of drugs on disease treatment). A major current application is the identification of functional biomarker signatures and their interpretation in terms of health benefit. Relevant methods include integration of data-driven networks, variable (biomarker) selection methods, embedding of findings into the context of prior knowledge, association of marker signatures with a phenotypic trait of interest, prioritization and extraction of the network parts that are most relevant in the context of disease development or intervention (subgraph selection methods), quantification of dynamic network properties, causality inference through genetic profiles, etc. In addition, compounds (e.g. drugs or nutrients) are being associated with their physiological effects using these technologies, which is a relevant application in our commercial projects.

Raman, Karthik

Indian Institute of Technology Madras

Computational Systems Biology

My current research focus is on predicting gene manipulations in metabolic networks for metabolic engineering. We are particularly focussing on high-performance computing techniques to evaluate a large number of metabolic network configurations in a massively parallel fashion, to identify multiple gene deletions/manipulations that can potentially improve the yield of a metabolite of interest. We reconstruct metabolic networks of organisms relevant in metabolic engineering and understand their metabolic capabilities through techniques such as flux balance analysis. We are focussing primarily on plant systems, where there are many challenges both in modelling and in metabolic engineering.

An important direction of research I wish to take is the modelling and analysis of consortia of organisms for metabolic engineering. Consortia of organisms present unique advantages as well as unique challenges for metabolic engineering. We would like to extend the experience from modelling and simulation of individual metabolic networks to more complex consortia of organisms, to understand the various possible interactions that can exist in such ecosystems, and how they can be leveraged for metabolic engineering. This project will also involve a good deal of high-performance computing algorithms.

Renard, Bernhard

Robert Koch Institute

High-Throughput Bioinformatics Driven Diagnostics for Infectious Diseases

Driving ideas from statistical and algorithmic bioinformatics into application in infectious disease research and diagnostics with a special focus on high-throughput experiments.

Satagopam, Venkata

Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg

My research interest is to develop bioinformatics methods and tools to analyze high-throughput experimental data (omics, meta-omics, genome-wide siRNA and miRNA knockdown experiments, NGS, etc.) coming from various disease areas, such as metabolic disorders, inflammation and neurodegeneration, in order to discover and prioritize biomarkers for further experimental validation. In this process we also utilize vast amounts of publicly available data by applying state-of-the-art data integration and knowledge management techniques. Another very important source of information is the literature, as a lot of knowledge is buried in publications; we are applying advanced text-mining techniques to extract the various bio-entities mentioned in these papers and use them in biomarker validation. Recently we have been focusing on personal genomics and on clinical information management and analysis.

Scott, Michelle

University of Sherbrooke

Small regulatory RNAs

I started my own group at the end of 2011. We have two main research interests: the investigation of the role played by nuclear small RNAs in cancer biology (specifically in ovarian cancer) and the characterization of the combinational regulation of protein subcellular localisation.

Suravajhala, Prashanth

Bioclues.org

Systems Biology

I am intrigued by the fact that the hypothetical protein (HP) annotation problem remains, despite a plethora of genome sequencing and annotation efforts. Our hypothesis is that we could use aptamers, which are single-stranded DNA or RNA molecules designed to bind target molecules with high affinity and selectivity. We believe this could be a cost-effective way of characterizing HPs on a large scale, in which antibodies could perhaps be replaced with aptamers in pulldown assays. If this were the case, finding aptamers that suit given proteins would be a challenge. Can a mere protein sequence be used to find suitable aptamers? We would like to begin addressing this problem with HPs specific to mitochondria.

Vandin, Fabio

Brown University

Algorithms for Identifying Significant Mutations in Cancer Genomes

Cancer is a disease driven by somatic mutations that accumulate in the genome during an individual’s lifetime. Recent advances in DNA sequencing technology are enabling genome-wide measurements of these mutations in large cohorts of cancer patients. A major challenge in analyzing this data is to distinguish functional "driver" mutations that contribute to cancer progression from “passenger” mutations not related to the disease. A major difficulty in identifying the driver mutations is related to the mutational heterogeneity of cancer, since somatic mutations target pathways, or sets of genes, and mutations in different genes can alter a pathway function. I will describe two algorithms for the problem of identifying significantly mutated pathways. The first algorithm identifies subnetworks of a gene-gene interaction network that are recurrently mutated in a significant number of patients. The second algorithm requires no prior information about the interactions between genes, and optimizes a measure derived from two statistical properties of mutations in driver pathways. I will also present an efficient algorithm to accurately assess the association between mutations and the survival time. I will illustrate applications of these algorithms to data from The Cancer Genome Atlas, a project that is characterizing the genomes of thousands of cancer samples.

Waltemath, Dagmar

University of Rostock

Model and simulation management in computational biology and beyond

Standardization efforts and the exchange of protocols to generate data in the life sciences have become a necessity for large scale projects and multinational collaborations. If the processes that generate the data are not sufficiently well documented, it becomes difficult to reproduce the scientific results, and to share and integrate them across projects. The same arguments apply to the generation of scientific results that are based on mathematical models and computer simulations.

Dagmar Waltemath and her group develop novel tools and methods that improve the management of simulation models in computational biology, physiology and neuroscience. The overall goal is to foster model reuse and result reproducibility in these fields. Current developments include methods for (1) ranked retrieval of simulation models, (2) model version control, and (3) integration of models with model-related data such as simulation setups, or result data.

Relevant to all projects is the aspect of standardisation. Standard formats such as SBML or SED-ML enable the exchange of knowledge about simulation models. Dagmar Waltemath has many years of experience in standard development, in particular for simulation descriptions (MIASE, SED-ML, KiSAO). She guided the specification of SED-ML Level 1 Version 1, which has become the main standard for the exchange of simulation setups in Systems Biology and related fields.

Dagmar Waltemath received her Ph.D. in Database and Information Systems from the University of Rostock, Germany, in 2011. Since March 2012 she leads the junior research group “SEMS” at the University of Rostock (BMBF e:Bio). http://sems.uni-rostock.de

Zarowiecki, Magdalena

Sanger Institute

Dr. Magdalena Zarowiecki’s research interests are neglected tropical diseases, in particular the evolution of parasitism, drug resistance and host-parasite interactions. Her current research focuses on the genomics of parasitic flatworms, including important platyhelminth parasites of humans in the genera Taenia, Hymenolepis, Echinococcus and Schistosoma, for which she does genome assembly, annotation and bioinformatics analysis. She uses high-throughput approaches including RNA-seq, gene prediction, methylome studies, re-sequencing and microRNA studies to increase the accuracy and biological depth of platyhelminth genome annotations. She is also interested in genome and transcriptome assembly, comparative RNA-seq, prediction of alternative splicing and real-time population genomic studies of the evolution of drug resistance.