Oral Poster Presentations

ISMB/ECCB 2013 introduces oral poster presentations for a select group of outstanding posters. Posters deemed exceptional in the original review were each passed on to three additional reviewers for further consideration. Twenty-four posters were selected for oral presentation (8-minute talks) on the first day of the conference. These Oral Poster Presentations will also be considered for "best poster" awards.

Note: The presenting authors should be available to further discuss their poster content during regular poster-presentation times.


OPT01 Sunday, July 21: 10:30 a.m. - 10:55 a.m.
3-Dimensional Modeling of Macromolecular Assemblies by Efficient Combination of Pairwise Dockings
Room: ICC Lounge 81
Presenting Author: Matthias Dietzen, Max Planck Institute for Informatics, Germany

Additional Authors
Olga Kalinina (Max Planck Institute for Informatics, Computational Biology and Applied Algorithmics Germany); Katerina Taškova (Johannes Gutenberg University Mainz, Software Engineering and Bioinformatics Germany); Benny Kneissl (Johannes Gutenberg University Mainz, Software Engineering and Bioinformatics Germany); Elmar Jaenicke (Johannes Gutenberg University Mainz, Institute of Molecular Biophysics Germany); Heinz Decker (Johannes Gutenberg University Mainz, Institute of Molecular Biophysics Germany); Thomas Lengauer (Max Planck Institute for Informatics, Computational Biology and Applied Algorithmics Germany); Andreas Hildebrandt (Johannes Gutenberg University Mainz, Software Engineering and Bioinformatics Germany);
Abstract:
Macromolecular complexes play a key role in many biological processes. In metabolic pathways, for example, protein assemblies offer several advantages: reactions are performed more efficiently, oversupply of intermediate products is reduced or avoided by regulating the activity of the involved enzymes via feedback loops, and toxic or highly reactive compounds are kept from being released into the cytoplasm. However, atom-level structure determination of such complexes, for example by X-ray crystallography, often fails because of the size of the complex, the differing binding affinities of the involved proteins, or the complex falling apart during crystallization.
We present a novel combinatorial greedy algorithm that iteratively assembles such complexes based solely on knowledge of the approximate interface locations of any two interacting proteins in the complex and the stoichiometry of each monomer. No prior assumptions about symmetries in the complex are required; rather, symmetry is detected during complex assembly. Complexes are assembled stepwise from pairwise docking poses obtained with RosettaDock and scored using a geometric compatibility constraint deduced from these docking poses. Clash detection and clustering ensure a reasonable and diverse solution space in each iteration.
On a diverse and representative benchmark set of 304 complexes from the Protein Data Bank with more than five subunits, 199 (65%) could be reconstructed to within 3.0 Å of the reference complex, measured as the average RMSD over 14 reference points for any two contacting subunits in the reference complex. For 91% of these, the best prediction was ranked among the top ten.
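The success criterion above is a root-mean-square deviation over corresponding reference points. A minimal Python sketch, assuming the predicted and reference points are already paired, listed in the same order and expressed in a common coordinate frame (no superposition is performed here); the function names and the per-subunit-pair averaging are illustrative assumptions, not the authors' code:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation (in Å) between paired 3D points.

    Assumes both sequences hold the same reference points in the same order
    and frame; any optimal superposition must be done beforehand.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        total += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(total / len(predicted))


def mean_pairwise_rmsd(pairs: List[Tuple[Sequence[Point], Sequence[Point]]]) -> float:
    """Average RMSD over several (predicted, reference) point sets,
    e.g. one set per pair of contacting subunits (an assumption here)."""
    values = [rmsd(p, r) for p, r in pairs]
    return sum(values) / len(values)


pred = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(pred, ref))  # 1.0
```

Under this reading, a model would count as reconstructed when `mean_pairwise_rmsd(...)` is at most 3.0.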

OPT02 Sunday, July 21: 10:30 a.m. - 10:55 a.m.
Mining and automatic classification of repeat protein structures with RAPHAEL
Room: ICC Lounge 81
Presenting Author: Ian Walsh, University of Padua, Italy

Additional Authors
Tomás Di Domenico (University of Padua, Department of Biology Italy); Silvio C. E. Tosatto (University of Padua, Department of Biology Italy);
Abstract:
Repeat proteins form a distinct class of structures in which folding is greatly simplified. The structures are elongated or circular, with tandem arrays of structural motifs attached periodically. Mounting evidence points to vital functional roles in, among others, cell regulation, transcriptional control and the maintenance of a healthy nervous system. However, little attention has been paid to organizing these highly influential structures comprehensively on a large scale. From a structural point of view, finding repeats may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. Our recently published method RAPHAEL uses a geometric approach that mimics manual classification, producing several numeric parameters that are optimized for maximum performance. It can mine large databases such as the Protein Data Bank (PDB) and CATH with detection rates of 89.5% for repeats and 97.2% for non-repeats. Moreover, for each repeat structure RAPHAEL attempts to (1) determine its periodicity and (2) assign non-periodic regions (insertions). RAPHAEL finds 1931 highly confident repeat structures not previously annotated as repeats in the PDB records (e.g. by keywords). Likewise, CATH was mined for domains with periodic properties.
Finally, all CATH domains and PDB chains detected as periodic are classified automatically using a self-organizing clustering algorithm. This resulted in a clear separation of repeat families and is perhaps the first attempt at organizing currently available periodic proteins.
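RAPHAEL's own geometric procedure is not detailed here. To illustrate what "determining periodicity" involves, the sketch below swaps in a simpler, standard technique: it estimates the repeat length of a per-residue signal (for example, a hypothetical per-residue geometric descriptor) as the lag with the highest autocorrelation within an assumed range of repeat lengths:

```python
import math
from typing import Sequence


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Mean-centred autocorrelation at a given lag, normalised by the variance."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    var = sum(x * x for x in centred) / n
    if var == 0:
        return 0.0
    pairs = n - lag
    cov = sum(centred[i] * centred[i + lag] for i in range(pairs)) / pairs
    return cov / var


def estimate_period(signal: Sequence[float], min_lag: int = 10, max_lag: int = 60) -> int:
    """Return the lag in [min_lag, max_lag] with maximal autocorrelation."""
    upper = min(max_lag, len(signal) - 1)
    return max(range(min_lag, upper + 1), key=lambda k: autocorrelation(signal, k))


# Synthetic signal with a 34-residue period (a TPR-like repeat length).
signal = [math.sin(2 * math.pi * i / 34) for i in range(340)]
print(estimate_period(signal))  # 34
```

Restricting the lag range matters: multiples of the true period (68, 102, ...) also correlate strongly, so `max_lag` is kept below twice the shortest expected repeat.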

OPT03 Withdrawn Sunday, July 21: 10:30 a.m. - 10:55 a.m.
Simultaneous fitting of macromolecular subunits into low-resolution cryoEM maps using network alignment and genetic algorithm
Room: ICC Lounge 81
Presenting Author: Arun Prasad Pandurangan, Birkbeck, University of London, United Kingdom

Additional Authors
Daven Vasishtan (Henry Wellcome Building for Genomic Medicine, Oxford Particle Imaging Centre, Division of Structural Biology United Kingdom); Shihua Zhang (Chinese Academy of Sciences, Academy of Mathematics and Systems Science China); Frank Alber (University of Southern California, Program in Molecular and Computational Biology United States); Maya Topf (Birkbeck, University of London, Institute of Structural and Molecular Biology, Crystallography, Department of Biological Sciences United Kingdom);
Abstract:
Cryo-electron microscopy (cryoEM) techniques are widely used to study the structure and function of macromolecular assemblies. Although EM image processing methods produce 3D density maps of low resolution, they are becoming hugely important as they allow the visualisation of large assemblies in multiple conformational states. EM maps of assemblies are often interpreted by fitting atomic models into them. However, model fitting can become a very challenging task depending on the size and shape of the complex, the number of subunits, the complexity of their interactions, and the map resolution.

We present a method for simultaneously fitting subunits into cryoEM maps of assemblies. Based on the clustering of electron density information, the assembly map and its multiple subunits are simplified into sets of vectors (feature points) [1] that can be used to place the subunits quickly and simultaneously with an integer quadratic programming procedure [1]. This is combined with a genetic algorithm that optimises the positions of the feature points. The optimisation is guided by a combination of scores that capture the shape of the subunits and the goodness-of-fit between the map and the subunits. Our current simulated test cases suggest that the method may be applicable to larger assemblies than those previously published [1].

[1] Zhang S et al. (2010) A fast mathematical programming procedure for simultaneous fitting of assembly components into cryoEM density maps. Bioinformatics (ISMB) 26:i261–i268.

OPT04 Sunday, July 21: 11:00 a.m. - 11:25 a.m.
Conformational diversity and lung cancer associated mutations in human EGFR kinase domain
Room: ICC Lounge 81
Presenting Author: Gustavo Parisi, Universidad Nacional de Quilmes, Argentina

Additional Authors
Marcia Hasenahuer (Universidad Nacional de Quilmes, Argentina); Yanina Powazniak (Fundación Investigar, Argentina); Guillermo Bramuglia (Fundación Investigar, Argentina); María Silvina Fornasari (Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Unidad de Físico Química Argentina);
Abstract:
Single amino acid substitutions (SASs) can cause disease through different mechanisms. The most widespread is destabilization of the protein fold, which alters protein function. Since the native state of a protein is not unique but is better represented by an ensemble of conformers, SASs may affect the stability of different conformers differently. Consequently, to analyze the impact of a given SAS, the conformational diversity of the protein should be taken into account.
The Epidermal Growth Factor Receptor (EGFR) is an important marker used in detecting Non-Small Cell Lung Cancer (NSCLC). Here we explored 49 different NSCLC-associated SASs in the EGFR kinase domain as a function of the protein's conformational diversity.

We scanned ∆∆G values for all possible amino acid substitutions at all positions in the active and inactive conformers of EGFR. Comparing per-position ∆∆G patterns between the conformers, we found that cancer-associated SASs are located at positions showing the maximum variation in ∆∆G. Moreover, mutations associated with sensitivity to drug treatment (tyrosine kinase inhibitors) were found to be mostly neutral, whereas those associated with treatment resistance show high destabilization values, particularly in the active conformer of the protein. Our results suggest that analyzing conformer-specific tolerance to SASs, in terms of structural stability, could improve our understanding of disease origin and treatment response.
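The per-position comparison between conformers can be sketched as follows. This is a minimal illustration, assuming ∆∆G values (kcal/mol) have already been computed for every substitution at every position in each conformer; the data layout, the hypothetical position numbers, and the use of the maximum absolute difference as the "variation" measure are assumptions rather than the authors' definition:

```python
from typing import Dict, List

# ddg[conformer][position] -> list of ddG values, one per substitution,
# listed in the same substitution order for every conformer.
DDG = Dict[str, Dict[int, List[float]]]


def per_position_variation(ddg: DDG, conf_a: str, conf_b: str) -> Dict[int, float]:
    """Largest absolute ddG difference between two conformers at each position,
    taken over all substitutions at that position (hypothetical measure)."""
    variation = {}
    for pos, values_a in ddg[conf_a].items():
        values_b = ddg[conf_b][pos]
        variation[pos] = max(abs(a - b) for a, b in zip(values_a, values_b))
    return variation


def top_positions(variation: Dict[int, float], k: int) -> List[int]:
    """Positions with the k largest variation scores."""
    return sorted(variation, key=variation.get, reverse=True)[:k]


ddg = {
    "active": {1: [0.5, 1.0], 2: [3.0, 0.2]},
    "inactive": {1: [0.4, 1.1], 2: [0.5, 0.1]},
}
print(top_positions(per_position_variation(ddg, "active", "inactive"), 1))  # [2]
```

Positions ranked highest by such a score would be the candidates to cross-check against the known cancer-associated substitutions.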

OPT05 Sunday, July 21: 11:00 a.m. - 11:25 a.m.
The relationship between stability and amino acid conservation in enzyme structures
Room: ICC Lounge 81
Presenting Author: Romain Studer, University College London, United Kingdom

Additional Authors
Christine Orengo (University College London, Institute of Structural and Molecular Biology United Kingdom);
Abstract:
Enzymes are central to many biochemical reactions. They fold into highly ordered atomic structures called protein domains. Because these arrangements of atoms must remain thermodynamically stable, their evolutionary trajectories are constrained to a narrow range of stability. While some residues are critical for maintaining the stability of the protein fold, others are important for the catalytic activity itself. Most amino acid changes in native proteins are destabilising; consequently, mutations that lead to more favourable enzyme activity are likely to decrease the stability of the protein, and vice versa. Compensatory mutations are then needed to restore global stability. This process is referred to as the "stability-activity trade-off", suggesting a close relationship between function and structure.

In this study, we extracted and analysed the 4,920 protein domains identified in FunTree, a database providing evolutionary information on enzymatic proteins. We explored several structural properties of each residue, such as its solvent accessibility, its distance to the closest ligand (with ligands grouped as in the periodic table of elements), its distance to the geometric centre of the protein, and the stability effect of mutating it to each of the 20 amino acids. At the sequence level, we extracted evolutionary conservation scores based on entropy and Scorecons methods applied to alignments from CATH/Gene3D. In conclusion, we identified several relationships among these properties, including a link between the intrinsically destabilising effect of binding sites and the type of metal associated with them.
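Entropy-based conservation scores of the kind mentioned above are commonly computed per alignment column from the Shannon entropy of residue frequencies. A minimal sketch, assuming gaps are simply ignored and scores are normalised by the maximum entropy of 20 amino acids; Scorecons uses a different scoring scheme and is not reproduced here:

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of residue frequencies in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_scores(alignment: List[str]) -> List[float]:
    """Per-column score in [0, 1]: 1 - H / log2(20); 1 means fully conserved."""
    max_entropy = math.log2(20)
    length = len(alignment[0])
    return [
        1.0 - column_entropy("".join(seq[i] for seq in alignment)) / max_entropy
        for i in range(length)
    ]


alignment = ["ACD", "ACE", "AKF", "ACG"]
print([round(s, 3) for s in conservation_scores(alignment)])  # [1.0, 0.812, 0.537]
```

Per-residue scores like these can then be set against the structural properties listed above (solvent accessibility, ligand distance, predicted stability effects).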

OPT06 Sunday, July 21: 11:00 a.m. - 11:25 a.m.
Privacy-preserving search for a chemical compound database
Room: ICC Lounge 81
Presenting Author: Kana Shimizu, Computational Biology Research Center (CBRC), Japan

Additional Authors
Hiromi Arai (RIKEN, Bioinformatics And Systems Engineering division Japan); Koji Nuida (National Institute of Advanced Industrial Science and Technology, Research Institute for Secure Systems Japan); Michiaki Hamada (University of Tokyo, Graduate School of Frontier Science Japan); Koji Tsuda (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center (CBRC) Japan); Jun Sakuma (University of Tsukuba, Computer science Dept. Japan); Takatsugu Hirokawa (National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center (CBRC) Japan); Goichiro Hanaoka (National Institute of Advanced Industrial Science and Technology, Research Institute for Secure Systems Japan); Kiyoshi Asai (University of Tokyo, Graduate School of Frontier Science Japan);
Abstract:
Searching a database for similar compounds is among the most important approaches in drug discovery. Because a query compound is an important starting point for a new drug, it is usually treated as secret information. The most popular way for a client to avoid information leakage is to download the whole database and use it in a closed network; however, this naive approach cannot be used if the database owner also wants to keep its data private. A new method is therefore needed that allows a database to be searched while both sides preserve their privacy. In this study, we address the problem of searching for similar compounds in a privacy-preserving manner and propose a novel cryptographic protocol for solving it. The proposed protocol is based on semi-homomorphic encryption and is quite efficient in both computational cost and communication size. We implemented our protocol and compared it to general-purpose multi-party computation (MPC) on a simulated data set, and confirmed that the proposed protocol was around 1000 times faster than MPC in CPU time. Because the protocol can be used for any database whose data are represented as bit vectors, we expect it to be applicable to various kinds of problems in bioinformatics.
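The abstract does not spell out the protocol. As a hedged illustration of the general idea behind semi-homomorphic similarity search on bit vectors, the sketch below uses the textbook Paillier cryptosystem, an additively homomorphic scheme chosen here as an assumption: the client encrypts its query fingerprint bit by bit, the server multiplies together the ciphertexts at positions where its own fingerprint has a 1 (which adds the underlying plaintexts), and only the client can decrypt the resulting count of shared on-bits. The primes are tiny and insecure, and this is not the authors' protocol; it also omits everything needed to hide database-side information such as per-compound bit counts:

```python
import math
import random
from typing import List, Tuple


def keygen(p: int = 1009, q: int = 1013) -> Tuple[int, Tuple[int, int, int]]:
    """Paillier key pair with g = n + 1. Toy primes: insecure, illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (n, lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Encrypt plaintext m (0 <= m < n) with a fresh random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv: Tuple[int, int, int], c: int) -> int:
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def server_shared_bits(n: int, enc_query: List[int], db_bits: List[int],
                       rng: random.Random) -> int:
    """Server side: homomorphically sums the query bits where db_bits == 1.

    Multiplying Paillier ciphertexts adds their plaintexts, so the result
    encrypts |query AND db|. Starting from a fresh encryption of 0
    re-randomises it, so the client cannot tell which ciphertexts were used.
    """
    n2 = n * n
    acc = encrypt(n, 0, rng)
    for c, bit in zip(enc_query, db_bits):
        if bit:
            acc = (acc * c) % n2
    return acc


rng = random.Random(0)
pub, priv = keygen()
query = [1, 0, 1, 1, 0, 1]           # client's secret fingerprint
db_fingerprint = [1, 1, 1, 0, 0, 1]  # server's fingerprint
enc_query = [encrypt(pub, b, rng) for b in query]
shared = decrypt(priv, server_shared_bits(pub, enc_query, db_fingerprint, rng))
print(shared)  # 3
```

From such a count the client could compute a Tanimoto coefficient c / (a + b - c) only if it also learned the server's bit count b, which is exactly the kind of leakage a complete two-sided protocol has to address.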

OPT07 Sunday, July 21: 11:30 a.m. - 11:55 a.m.
Characterization and identification of cis-regulatory elements in Arabidopsis thaliana based on SNP information
Room: ICC Lounge 81
Presenting Author: Paula Korkuc, Max Planck Institute for Molecular Plant Physiology, Germany

Additional Authors
Dirk Walther (Max Planck Institute for Molecular Plant Physiology, Bioinformatics Germany);
Abstract:
The identification of regulatory elements encoded in an organism’s genome remains a central goal of modern molecular biology. We exploited the genomic sequencing information of a large number of different Arabidopsis thaliana accessions, available from the 1001 Genomes project, to characterize known and identify novel cis-regulatory elements in its gene promoter regions. Assuming that promoters and regulatory elements such as transcription factor binding sites (TFBSs) are more conserved than non-functional intergenic regions, we aimed to estimate the bounds of promoter regions by determining the density of single nucleotide polymorphisms (SNPs) along the intergenic regions, to verify known TFBSs by analyzing their localization versus their level of conservation, and to find new candidate motifs among all possible nucleotide hexamers. Based on the obtained SNP density profile and the genomic layout, the average length of promoter regions could be established at 500 nt. We confirmed that known TFBS motifs are indeed more conserved than the promoter background. For sixteen known motifs, their positional preferences could be clearly substantiated by their position-specific decreased SNP density. Lastly, twelve candidate hexamers were identified whose relative positional occurrence correlates significantly with their conservation level and that may represent newly discovered motifs awaiting experimental validation. For eight hexamers, significant associations with particular processes and functions of the associated downstream genes suggest a functional relevance of these newly found motifs. Our study demonstrates that the currently available resolution of SNP data offers novel ways to identify functional genomic elements and characterize gene promoter sequences.
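A SNP density profile of the kind described above can be computed by binning SNP positions by their distance upstream of each gene's transcription start site (TSS). A minimal sketch, assuming SNP and TSS positions are integer coordinates on one chromosome and all genes lie on the forward strand (strand handling, accession counts and neighbouring genes are ignored; the function name and defaults are hypothetical):

```python
from typing import List


def snp_density_profile(
    tss_positions: List[int],
    snp_positions: List[int],
    upstream: int = 1000,
    bin_size: int = 100,
) -> List[float]:
    """Mean SNPs per nucleotide in bins upstream of the TSS.

    Bin 0 covers [TSS - bin_size, TSS), bin 1 the next bin_size nt further
    upstream, and so on. Forward-strand genes only (an assumption).
    """
    n_bins = upstream // bin_size
    counts = [0] * n_bins
    snps = set(snp_positions)
    for tss in tss_positions:
        for offset in range(1, upstream + 1):
            if tss - offset in snps:
                counts[(offset - 1) // bin_size] += 1
    denom = len(tss_positions) * bin_size
    return [c / denom for c in counts]


print(snp_density_profile([1000], [999, 950, 800, 100]))
# [0.02, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.0]
```

Averaged over many genes and accessions, lower density close to the TSS than further upstream would reflect the conservation assumption stated above; the 500 nt estimate comes from the authors' analysis, not from this sketch.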

OPT08 Sunday, July 21: 11:30 a.m. - 11:55 a.m.
Machine learning methods for genotype-phenotype association in bacterial genomes
Room: ICC Lounge 81
Presenting Author: Thomas Abeel, Broad Institute, United States

Additional Authors
Bruce Birren (Broad Institute, GSAP United States); Ashlee Earl (Broad Institute, GSAP United States); Yves Van de Peer (UGent, VIB Belgium);
Abstract:
Genotype-phenotype association methods for bacterial genomes are not yet well established. Bacteria do not reproduce sexually, which invalidates some of the assumptions made by many current methods for association studies in other organisms. Bacteria have a huge influence on human health, and we need better methods to learn how genetic changes give rise to clinically relevant phenotypes.
Our bug of interest is Mycobacterium tuberculosis (Mtb), a bacterial pathogen that causes pulmonary tuberculosis (TB), which kills over a million people each year. Unfortunately, Mtb is difficult to diagnose and resistance to antibiotics is becoming rampant. Current diagnostics for drug resistance take six to eight weeks. The technology for rapid molecular diagnostics exists, but it requires knowledge of resistance marker mutations, which is currently lacking.
To address this lack of knowledge about marker mutations, we present a machine learning strategy that uses support vector machines to predict genotype-phenotype associations by integrating genome sequence and clinical metadata. An ensemble feature selection method enables the discovery of antibiotic resistance markers in Mtb. The impact is two-fold: (i) the feature selection procedure gives us a ranking of mutations that are associated with drug resistance and (ii) the classification model can be used together with a molecular diagnostic to predict treatment options for patients.
In this poster we discuss the methods and illustrate their capabilities on a panel of bacterial drug-resistance projects with a particular focus on Mycobacterium tuberculosis.
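The ensemble feature selection idea can be sketched as: train a linear SVM on many bootstrap resamples of the data and rank mutations by their mean rank of absolute weight. The sketch below is a dependency-free illustration, assuming binary mutation-presence features and a binary resistant/susceptible label; it trains each linear SVM with the Pegasos stochastic sub-gradient method and aggregates by mean rank, both swapped-in choices rather than the authors' solver or aggregation rule:

```python
import random
from typing import List


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.1,
                     iters: int = 5000, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos updates.

    A constant 1.0 is appended to every sample to act as a bias term, so the
    last entry of the returned weight vector is the (regularised) bias.
    """
    rng = random.Random(seed)
    data = [list(row) + [1.0] for row in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularisation step
        if margin < 1.0:  # hinge loss active: sub-gradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def ensemble_feature_ranking(X: List[List[float]], y: List[int],
                             n_boot: int = 10, seed: int = 0) -> List[int]:
    """Rank features by mean rank of |weight| across bootstrap-trained SVMs."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    rank_sums = [0] * d
    for b in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=b)
        order = sorted(range(d), key=lambda j: abs(w[j]), reverse=True)  # bias excluded
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    return sorted(range(d), key=lambda j: rank_sums[j])


# Toy data: 5 binary "mutation" features; resistance is driven by feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(5)] for _ in range(60)]
y = [1 if row[0] == 1 else -1 for row in X]
print(ensemble_feature_ranking(X, y))
```

Aggregating ranks over resamples, rather than trusting one model's weights, is what makes the ranking more stable when many mutations are weakly or spuriously associated with the phenotype.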

OPT09 Sunday, July 21: 11:30 a.m. - 11:55 a.m.
Expression divergence between Escherichia coli and Salmonella enterica serovar Typhimurium and the relation to pathogenicity
Room: ICC Lounge 81
Presenting Author: Kristof Engelen, Research and Innovation Center Fondazione Edmund Mach, Italy

Additional Authors
Pieter Meysman (University of Antwerp, ADREM Belgium); Aminael Sanchez-Rodriguez (KULeuven, CMPG Belgium); Qiang Fu (KULeuven, CMPG Belgium); Kathleen Marchal (UGent, PSB Belgium);
Abstract:
Escherichia coli K12 is a commensal bacterium and one of the best-studied model organisms. Salmonella enterica serovar Typhimurium, on the other hand, is a facultative intracellular pathogen. These two prokaryotic species are phylogenetically related and share a large amount of their genetic material, commonly termed the 'core genome'. Despite this shared core genome, the two species display very different lifestyles, and it is unclear to what extent the core genome, apart from the species-specific genes, plays a role in this lifestyle divergence. In this study, we focus on the differences in expression domains of orthologous genes in E. coli and S. Typhimurium. The iterative comparison of coexpression methodology was applied to large expression compendia of both species to uncover the conservation and divergence of gene expression. We found that conservation of gene expression occurs largely independently of amino acid similarity. According to our estimates, at least one quarter of the orthologous genes have a different expression domain in E. coli than in S. Typhimurium. Genes involved in key cellular processes are the most likely to have conserved their expression domains, whereas genes showing diverged expression are associated with metabolic processes that, although present in both species, are regulated differently. The expression domains of the shared 'core' genome of E. coli and S. Typhimurium, consisting of highly conserved orthologs, have been tuned to help accommodate the differences in lifestyle and the pathogenic potential of Salmonella.

OPT10 Sunday, July 21: 12:00 p.m. - 12:25 p.m.
Metatranscriptomics of colonic bacteria in inflammatory bowel diseases
Room: ICC Lounge 81
Presenting Author: Feargal Ryan, University College Cork, Ireland

Additional Authors
Calum Walsh (Teagasc Moorepark, Food Biosciences Ireland); John O'Callaghan (Teagasc Moorepark, Food Biosciences Ireland); Aldert Zomer (Radboud University Nijmegen, Medical Centre Netherlands); Aine Fanning (University College Cork, Alimentary Pharmabiotic Centre Ireland); Fergus Shanahan (University College Cork, Alimentary Pharmabiotic Centre Ireland); Marcus Claesson (University College Cork, Microbiology Ireland);
Abstract:
Crohn’s disease (CD) and ulcerative colitis (UC) are inflammatory bowel diseases (IBD) characterized by chronic and relapsing inflammation of the gastrointestinal tract. They cause lifelong suffering, as well as a considerable drain on health care resources. Although their etiology is still unclear, there is a growing body of evidence for a significant microbial factor. In this study we focus on the global gene expression of the colonic microbial communities through mRNA sequencing. We collected biopsies of inflamed and non-inflamed colonic mucosa from 19 IBD patients and, using RNA-Seq at unprecedented depth, compared the microbial metatranscriptomes of inflamed and non-inflamed mucosa. This involved 600 Gb of Illumina HiSeq RNA-Seq data (15 Gb/sample). Human reads were filtered out using TopHat2, and the remaining reads were mapped against a reference database constructed from all sequenced bacterial and viral genomes in the gastrointestinal subset of the Human Microbiome Project. We subsequently used DESeq to analyze the count data. Preliminary analysis using real-time PCR revealed transcriptional differences for two pathogenic Bacteroides species, in which homologs of genes involved in tissue destruction were up-regulated in inflamed mucosa of UC patients. We also observed higher expression rates of E. coli in inflamed mucosa, as previously observed in CD patients. Metatranscriptome analysis confirmed these results and added a multitude of other gene candidates that were significantly up- or down-regulated in inflamed tissue. Thus, our analysis revealed transcriptional differences for known microbial pathogens. High-throughput RNA-Seq analysis added extra value to these findings and remains a source of ongoing analysis with great potential for further findings.
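DESeq normalises count data with median-of-ratios size factors: each sample's factor is the median, over genes with no zero counts, of the ratio between that sample's count and the gene's geometric mean across samples. A minimal pure-Python sketch of that normalisation step only (dispersion estimation and the differential expression test are not shown; the function name is hypothetical):

```python
import math
import statistics
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors.

    counts[g][s] is the read count of gene g in sample s. Genes with a zero
    count in any sample are skipped, since their geometric mean is zero.
    At least one gene must have non-zero counts in every sample.
    """
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


counts = [[10, 20], [30, 60], [5, 10], [0, 4]]
print(size_factors(counts))  # ≈ [0.707, 1.414]: sample 2 was sequenced twice as deeply
```

Dividing each sample's counts by its factor puts samples on a common scale before comparing, for example, inflamed and non-inflamed biopsies.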

OPT11 Sunday, July 21: 12:00 p.m. - 12:25 p.m.
Relating the metatranscriptome and metagenome of the human gut
Room: ICC Lounge 81
Presenting Author: Eric Franzosa, Harvard School of Public Health, United States

Additional Authors
Xochitl Morgan (Harvard School of Public Health, Biostatistics Department United States); Nicola Segata (Harvard School of Public Health, Biostatistics Department United States); Levi Waldron (Harvard School of Public Health, Biostatistics Department United States); Joshua Reyes (Harvard School of Public Health, Biostatistics Department United States); Curtis Huttenhower (Harvard School of Public Health, Biostatistics Department United States); Ashlee Earl (The Broad Institute, Genome Sequencing & Analysis Program United States); Georgia Giannoukos (The Broad Institute, Genome Sequencing & Analysis Program United States); Dawn Ciulla (The Broad Institute, Genome Sequencing & Analysis Program United States); Wendy Garrett (Harvard School of Public Health, Department of Immunology and Infectious Diseases United States); Andrew Chan (Massachusetts General Hospital, Gastrointestinal Unit United States); Jacques Izard (The Forsyth Institute, Department of Microbiology United States); Matthew Boylan (Massachusetts General Hospital, Gastrointestinal Unit United States);
Abstract:
Typical microbial residents and ecologies of the human microbiome have now been well studied. However, the microbiota's >8 million genes and their transcriptional regulation remain largely uncharacterized. We conducted one of the first human microbiome studies in a well-phenotyped prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites. The results establish the feasibility of metatranscriptomic investigations in subject-collected samples from the Health Professionals Follow-up Study. Replicate stool and saliva samples were collected from 8 subjects, and three different RNA preservation methods were assessed (frozen, ethanol, and RNAlater). Within-subject microbial species, gene, and transcript abundances were highly concordant across sampling methods, with only a small fraction (<5%) of transcripts displaying significant between-method variation. The functions of these transcripts were consistent with reprogramming in response to the storage media environment (carbon source and osmolarity). Next, we investigated relationships between the oral and gut microbial communities, identifying a subset of abundant oral microbes that routinely survive transit to the gut. Comparison of the gut metagenome and metatranscriptome revealed three distinct functional clusters: (i) the ~50% of microbial genes whose RNA and DNA levels are strongly correlated; (ii) genes detected only at the DNA level, including inactive biosynthesis and stress-response factors; and (iii) genes detected only at the RNA level, including functions specific to the gut’s archaeal inhabitants, e.g. methanogenesis. Globally, we observe that RNA-level functional profiles are significantly more individualized than DNA-level profiles across subjects but less variable than microbial composition, indicative of subject-specific whole-community regulation occurring at the transcriptional level.
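The three-way split of gut genes described above (seen at both DNA and RNA level, DNA only, RNA only) can be sketched as a simple classification over paired abundance tables. A minimal illustration, assuming per-gene DNA and RNA abundances are already normalised and that "detected" means abundance above a hypothetical non-negative threshold; the log2 RNA/DNA ratio reported for genes seen at both levels is a common expression-per-copy proxy, not necessarily the measure used in the study:

```python
import math
from typing import Dict, Tuple


def classify_genes(
    dna: Dict[str, float], rna: Dict[str, float], threshold: float = 0.0
) -> Dict[str, Tuple[str, float]]:
    """Label each gene 'both', 'dna_only' or 'rna_only' (threshold must be >= 0).

    For 'both', also returns log2(RNA / DNA); otherwise the ratio is NaN.
    Genes detected at neither level are omitted.
    """
    result = {}
    for gene in set(dna) | set(rna):
        d, r = dna.get(gene, 0.0), rna.get(gene, 0.0)
        d_hit, r_hit = d > threshold, r > threshold
        if d_hit and r_hit:
            result[gene] = ("both", math.log2(r / d))
        elif d_hit:
            result[gene] = ("dna_only", math.nan)
        elif r_hit:
            result[gene] = ("rna_only", math.nan)
    return result


dna = {"geneA": 4.0, "geneB": 2.0, "geneC": 0.0}
rna = {"geneA": 8.0, "geneC": 3.0, "geneD": 0.0}
for gene, (label, ratio) in sorted(classify_genes(dna, rna).items()):
    print(gene, label, ratio)
```

Correlating DNA and RNA levels across subjects for the "both" class would then identify the strongly co-varying subset.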

OPT12 Sunday, July 21: 12:00 p.m. - 12:25 p.m.
Clinical Pathoscope: An alignment and filtering pipeline for rapid pathogen identification in unassembled, next-generation sequencing data
Room: ICC Lounge 81
Presenting Author: Allyson Byrd, Boston University, United States

Additional Authors
Joseph Perez-Rogers (Boston University, Bioinformatics United States); Kylee Bergin (Boston University, Bioinformatics United States); W. Evan Johnson (Boston University, Medicine United States);
Abstract:
The rapid and accurate identification of pathogens in human tissue samples is essential, as disease-causing pathogens increasingly develop resistance to broad-spectrum antibiotics and remain one of the greatest public health burdens worldwide. With the increased affordability of high-throughput sequencing, it is now possible to investigate the microbiome of a given sample with high sensitivity. However, clinical samples contain a mixture of genomic sequences from various sources, which complicates the identification of pathogens. Here we present Clinical Pathoscope, a pipeline to rapidly and accurately remove host contamination, isolate viral reads, and deliver a diagnosis. To optimize the pipeline, data were simulated from human, bacterial, and viral genomes to create biologically realistic clinical samples representing a diverse variety of host-pathogen landscapes. These data were then used to evaluate the accuracy, usability, and speed of multiple alignment algorithms and filtration methods. The optimal alignment algorithm and filtration method were implemented in Clinical Pathoscope to isolate viral reads. These reads were then mapped against a robust viral database and assigned to their genomes of origin. We demonstrate our approach using sequenced nasopharyngeal aspirate samples from children with upper respiratory tract infections. Unlike other methods, Clinical Pathoscope can rapidly identify multiple pathogens from mixed samples and distinguish between very closely related species with very little coverage of the genome and without the need for genome assembly.
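Clinical Pathoscope itself relies on read alignment; the sketch below swaps in a much simpler exact k-mer matching scheme purely to illustrate the two stages described above: subtract reads that look like host, then assign each remaining read to the genome it shares the most k-mers with. The genome names, toy sequences and the choice of k are hypothetical:

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_and_assign(
    reads: List[str], host: str, genomes: Dict[str, str], k: int = 8
) -> Dict[str, Optional[str]]:
    """Drop reads sharing any k-mer with the host, then assign each remaining
    read to the genome sharing the most k-mers with it (None if no hits)."""
    host_kmers = kmers(host, k)
    genome_kmers = {name: kmers(seq, k) for name, seq in genomes.items()}
    assignments: Dict[str, Optional[str]] = {}
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers & host_kmers:
            continue  # treated as host contamination
        best, best_hits = None, 0
        for name, gk in genome_kmers.items():
            hits = len(read_kmers & gk)
            if hits > best_hits:
                best, best_hits = name, hits
        assignments[read] = best
    return assignments


host = "AAAAAAAAAACCCCCCCCCC"
genomes = {"virus1": "GGGGTTTTGGGGTTTT", "virus2": "ACGTACGTACGTACGT"}
reads = ["AAAAAAAACC", "GGGGTTTTGG", "ACGTACGTAC", "TTTTTTTTTT"]
print(filter_and_assign(reads, host, genomes))
# {'GGGGTTTTGG': 'virus1', 'ACGTACGTAC': 'virus2', 'TTTTTTTTTT': None}
```

Real pipelines also have to resolve reads that match several closely related genomes equally well, which simple best-hit counting like this does not handle.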

OPT13 Sunday, July 21: 2:10 p.m. - 2:35 p.m.
Newtonian dynamics in the space of phylogenetic trees
Room: ICC Lounge 81
Presenting Author: Björn Hansen, Universität Hamburg, Germany

Additional Authors
Andrew Torda (Universität Hamburg, Center for Bioinformatics Germany);
Abstract:
A classic phylogenetic tree is a simple directed graph. Edges are either present or absent, and searching for a phylogenetic tree is a discrete optimisation problem. We have been developing an alternative view, in which allowed trees are just points within a continuous space. Connections are continuous properties that behave like coordinates. If we know the similarities between objects such as sequences, we can measure how well the set of coordinates (connections) fits the experimental data. The greater the disagreement, the greater the force acting on the connections. This leads to a method for generating phylogenetic trees: one can perform classic, conservative Newtonian dynamics in the space that includes all possible trees.

At the moment, we are limited to distance-based phylogenies, but the method has an advantage over Monte Carlo methods in that it uses gradient information, so sampling can be quite efficient. As with Monte Carlo methods, extensions such as simulated annealing or replica exchange are easy to implement. We see the long-term benefit as a means of providing efficient sampling for seeding more sophisticated methods such as Bayesian inference.
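A minimal sketch of the force idea, under strong simplifying assumptions: the tree topology is fixed (the four-taxon tree ((A,B),(C,D))), its five branch lengths are the continuous coordinates, the potential energy is the squared disagreement between tree path lengths and observed pairwise distances, and the force is the negative gradient of that energy. Velocity Verlet, a standard energy-conserving integrator, propagates the coordinates with unit masses. The method described above moves through the space of all trees, which this sketch does not attempt, and nothing here keeps branch lengths positive:

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Branch indices for the fixed unrooted tree ((A,B),(C,D)):
# 0-3 are the pendant branches to A, B, C, D; 4 is the internal branch.
PATHS: Dict[Pair, Tuple[int, ...]] = {
    ("A", "B"): (0, 1),
    ("A", "C"): (0, 4, 2),
    ("A", "D"): (0, 4, 3),
    ("B", "C"): (1, 4, 2),
    ("B", "D"): (1, 4, 3),
    ("C", "D"): (2, 3),
}


def energy_and_force(x: List[float], dist: Dict[Pair, float]) -> Tuple[float, List[float]]:
    """E = sum over pairs (path length - observed distance)^2, F = -dE/dx."""
    energy = 0.0
    force = [0.0] * len(x)
    for pair, branches in PATHS.items():
        residual = sum(x[b] for b in branches) - dist[pair]
        energy += residual ** 2
        for b in branches:
            force[b] -= 2.0 * residual
    return energy, force


def velocity_verlet(x: List[float], v: List[float], dist: Dict[Pair, float],
                    dt: float = 0.01, steps: int = 1000):
    """Unit-mass velocity Verlet; returns final x, v and the total-energy trace."""
    x, v = list(x), list(v)
    pot, f = energy_and_force(x, dist)
    totals = [pot + 0.5 * sum(vi * vi for vi in v)]
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]        # drift
        pot, f = energy_and_force(x, dist)
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]  # half kick
        totals.append(pot + 0.5 * sum(vi * vi for vi in v))
    return x, v, totals


# "Observed" distances generated from branch lengths 0.1, 0.2, 0.3, 0.4, 0.5.
true_x = [0.1, 0.2, 0.3, 0.4, 0.5]
dist = {pair: sum(true_x[b] for b in path) for pair, path in PATHS.items()}
x, v, totals = velocity_verlet([0.5] * 5, [0.0] * 5, dist)
# Total (kinetic + potential) energy should stay nearly constant for a small dt.
print(max(totals) - min(totals))
```

Because the dynamics conserve energy, the coordinates keep oscillating rather than settling; annealing or replica exchange, as mentioned above, would be layered on top to turn this sampling into a search.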

OPT14 Sunday, July 21: 2:10 p.m. - 2:35 p.m.
Sequence assembly and variation calling using multiple-dimension de Bruijn graphs
Room: ICC Lounge 81
Presenting Author: Sergey Lamzin, The Genome Analysis Centre, United Kingdom

Additional Authors

Abstract:
Recent developments in sequencing technologies have brought renewed impetus to the development of bioinformatics tools for sequence processing and analysis. Most current algorithms for de novo genome assembly are based on de Bruijn graphs, which provide an effective framework for aggregating next-generation sequencing (NGS) data into a convenient structure. De Bruijn graphs, however, introduce an artificial parameter that can greatly affect the results: the dimension k, which gives rise to k-mer building blocks. We report on the development of a novel assembly algorithm with a new data structure designed to overcome some of the limitations of the single, fixed k-mer size de Bruijn graph approach and to enable higher-quality NGS data processing. Our approach structurally combines de Bruijn graphs for all possible dimensions k into one supergraph, leading to a flexible graph dimension. The algorithm, called StarK, allows the assembler to dynamically adjust the de Bruijn graph dimension at any given nucleotide position. In addition to flexible k-mer lengths, the structure allows simultaneous assembly of a consensus sequence and mutations/haplotypes directly from reads. The StarK graph uses localised coverage differences to guide the generation of connected subgraphs. This allows higher resolution of genomic differences and helps differentiate errors from potential variants within the sequencing sample.
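StarK's supergraph is not described here in enough detail to reproduce, but the core idea of choosing the k-mer size per position can be sketched. The example below is an illustration under stated assumptions rather than the StarK algorithm: it counts k-mers for several k values (the node sets of the corresponding de Bruijn graphs) and, at each position of a read, picks the largest k whose k-mer starting there is seen at least `min_cov` times across all reads; the coverage threshold and function names are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def count_kmers(reads: List[str], ks: List[int]) -> Dict[int, Counter]:
    """k-mer counts for every k in ks (one de Bruijn node set per k)."""
    tables = {}
    for k in ks:
        counts: Counter = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        tables[k] = counts
    return tables


def best_k_per_position(read: str, tables: Dict[int, Counter], min_cov: int) -> List[int]:
    """Largest k whose k-mer starting at each position reaches min_cov (0 if none)."""
    choice = []
    for i in range(len(read)):
        best = 0
        for k in sorted(tables):
            if i + k <= len(read) and tables[k][read[i:i + k]] >= min_cov:
                best = k
        choice.append(best)
    return choice


reads = ["ACGTACGGT", "ACGTACGGT", "ACGTTCGGT"]
tables = count_kmers(reads, [3, 5])
print(best_k_per_position(reads[0], tables, min_cov=2))  # [5, 5, 5, 5, 5, 3, 3, 0, 0]
print(best_k_per_position(reads[2], tables, min_cov=2))  # [3, 3, 0, 0, 0, 3, 3, 0, 0]
```

The drop in supported k around the A/T difference in the third read is the kind of local coverage signal that, in a full assembler, would have to be weighed to tell a sequencing error from a genuine variant.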

OPT15 Sunday, July 21: 2:10 p.m. - 2:35 p.m.
Probabilistic inference of subclonal copy number and LOH in whole genome sequencing of tumours
Room: ICC Lounge 81
Presenting Author: Gavin Ha, British Columbia Cancer Agency, Canada

Additional Authors
Andrew Roth (British Columbia Cancer Agency, Molecular Oncology Canada); Andrew McPherson (British Columbia Cancer Agency, Molecular Oncology Canada); Samuel Aparicio (British Columbia Cancer Agency, Molecular Oncology Canada); Sohrab Shah (British Columbia Cancer Agency, Molecular Oncology Canada);
Abstract:
Background: Tumours are often composed of heterogeneous mixtures of cell populations that have undergone clonal evolution and expansion. This intrinsic clonal diversity is often implicated in treatment resistance and metastasis. Copy number alterations (CNA) and loss of heterozygosity (LOH), which make up the structural landscape of somatic aberrations, can be measured by quantifying read abundance in whole genome sequencing (WGS). However, inference of CNA/LOH remains a challenge due to statistical under-sampling of alleles and sources of noise such as GC-content bias and normal cell contamination. In this contribution, we address these challenges in order to estimate the clonal abundance of CNA/LOH events.

Methods: We present solutions to identify subclonal CNA/LOH events by deconvolving signals from mixed cell populations in WGS of individual tumour biopsies. Using hierarchical probabilistic modeling, our approach jointly analyzes allelic fractions at all germline SNP loci and read counts from tumour and normal libraries. We propose an HMM that simultaneously segments and predicts subclonal CNA/LOH, and reports estimates of cellular frequency, normal contamination and ploidy, thus providing a more complete characterization of the tumour.
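The abstract does not specify the HMM's states, emissions or parameters. A minimal sketch of the core segmentation step, Viterbi decoding, is shown below under stated assumptions. It uses hypothetical discrete copy-number states with Gaussian emissions on per-bin log2 read-depth ratios and a "sticky" transition matrix. It deliberately ignores allelic fractions, cellular prevalence, normal contamination and ploidy, all of which the authors' hierarchical model estimates jointly.

```python
import math

def viterbi(observations, state_means, sd=0.2, stay_prob=0.99):
    """Most likely hidden-state path for Gaussian emissions and a sticky
    transition matrix (stay with stay_prob, otherwise switch uniformly).
    Requires at least two states."""
    n_states = len(state_means)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emission(x, mean):
        return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    def log_transition(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform initial state distribution.
    scores = [math.log(1.0 / n_states) + log_emission(observations[0], m)
              for m in state_means]
    backpointers = []
    for x in observations[1:]:
        new_scores, pointers = [], []
        for j, mean in enumerate(state_means):
            best_i = max(range(n_states), key=lambda i: scores[i] + log_transition(i, j))
            new_scores.append(scores[best_i] + log_transition(best_i, j) + log_emission(x, mean))
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: scores[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With hypothetical state means of -1.0, 0.0 and 0.585 (log2 of copy number 1, 2 and 3 relative to 2 in a pure diploid sample), the bins `[0.05, -0.02, 0.01, -0.95, -1.1, -1.0, 0.6, 0.55]` decode to `[1, 1, 1, 0, 0, 0, 2, 2]`: a neutral segment, a single-copy loss, then a gain. In a subclonal setting these means would shift toward 0 in proportion to the fraction of cells carrying the event. Modelling that shift is what the authors' approach adds.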

Results: We present simulation results using real data from multiple, spatially separated biopsies from a high-grade serous ovarian carcinoma. For benchmarking, we generated in-silico mixtures of these samples at known proportions. We demonstrate that our method performs with increased sensitivity compared to existing tools that do not account for subclonal heterogeneity. Our work provides an important advance in identifying and quantifying fractional events in heterogeneous tumours.

OPT16 Sunday, July 21: 2:40 p.m. - 3:05 p.m.
Variants affecting exon skipping contribute to health disparities
Room: ICC Lounge 81
Presenting Author: Younghee Lee, The University of Chicago, United States

Additional Authors
Eric Gamazon (University of Chicago, Medicine United States); Wenndy Hernandez (University of Chicago, Medicine United States); Nancy Cox (University of Chicago, Medicine United States);
Abstract:
This poster is based on Proceedings Submission: Alternative splicing (AS) may be one biological factor accounting for cancer health disparities. Although AS is involved in a broad spectrum of cancer pathogenesis, AS differences among human individuals are underestimated. We compiled information on splicing regulatory elements (SREs) and single nucleotide polymorphisms (SNPs) with high population differentiation. For health disparities, we considered candidate SNPs with a high fixation index (Fst >= 0.5; Fst estimates how genetically different populations are). We observed that synonymous and intronic variants within SRE sites tend to have higher Fst values than those outside SREs, suggesting that functional SNPs in SREs are more likely to be under selective pressure. We describe the function(s) of a number of variants that showed phenotypic associations but whose mechanisms were unknown. For example, one of our findings in the TNFRSF1A gene, intronic rs12265291, is predicted to neutralize an SRE and has a high Fst value of 0.88. The exon to the right of this SNP is skipped and participates in encoding the CTNNB1-binding domain. Additionally, rs12265291 is in high linkage disequilibrium (LD) with rs4506565 (LD r2 = 0.53), which has previously been associated with type 2 diabetes. We also found rs12265291 to be in LD with rs216013 (Fst = 0.38, LD r2 = 0.868); this SNP lies in the drug-response gene CACNA1C and has been associated with warfarin maintenance dose requirement. Our study identifies several SNPs that may have a biological impact on human diseases through AS and emphasizes their importance for molecular therapies to reduce health disparities.
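The abstract uses an Fst threshold of 0.5 but does not say which estimator was used. For orientation, here is a minimal sketch of Wright's formulation for a single biallelic SNP, F_ST = (H_T - H_S) / H_T, with equal population weights. This is a textbook estimator chosen here as an assumption. Sample-size-corrected estimators such as Weir and Cockerham's give somewhat different values.

```python
def wright_fst(allele_freqs):
    """Wright's F_ST for one biallelic SNP from per-population allele
    frequencies, weighting populations equally: (H_T - H_S) / H_T."""
    n = len(allele_freqs)
    mean_p = sum(allele_freqs) / n
    # Expected heterozygosity if all populations were pooled.
    h_total = 2 * mean_p * (1 - mean_p)
    # Average expected heterozygosity within each population.
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    if h_total == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_total - h_within) / h_total
```

For example, two populations with allele frequencies 0.9 and 0.1 give H_T = 0.5 and H_S = 0.18, so F_ST = 0.64, which would pass the 0.5 cut-off. Identical frequencies give 0.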

OPT17 Sunday, July 21: 2:40 p.m. - 3:05 p.m.
Accuracy of algorithms and databases for the prediction of deleterious and disease causing mutations in healthy individuals.
Room: ICC Lounge 81
Presenting Author: Raf Winand, KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA / iMinds Future Health Department, Belgium

Additional Authors
Kristien Hens (Maastricht University, Health, Ethics & Society Netherlands); Inge Liebaers (University Hospital Brussels, Centre for Medical Genetics Belgium); Joris Vermeesch (KU Leuven, Centre for Human Genetics Belgium); Yves Moreau (KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, iMinds Future Health Department Belgium); Jan Aerts (KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, iMinds Future Health Department Belgium);
Abstract:
Whole genome sequencing offers an unprecedented opportunity to obtain secondary information not related to the original clinical question. For this information to be useful, assessing its clinical validity is crucial: we must ensure that prediction algorithms and databases accurately predict the phenotype in healthy individuals. To this end, we compared the specificity of several prediction algorithms and databases, such as SIFT, PolyPhen and HGMD, in detecting damaging or disease-causing mutations. We also tested eXtasy, a new disease-centric algorithm developed in-house that incorporates the phenotype of the individual in the analysis.

We analyzed samples from people who are considered healthy from the 1000 Genomes Project, publicly available samples from Complete Genomics, and research samples from the centre for human genetics at our university. In those samples, we looked for mutations in genes associated with severe congenital disorders characterized by extreme dysmorphologies and early onset.

We found large differences between combinations of the different prediction algorithms, databases, datasets and modes of inheritance. Depending on the dataset, 98-100% of individuals had mutations predicted to be damaging by both PolyPhen and SIFT. For HGMD mutations, we found that up to 22% of the samples had mutations annotated as disease-causing. Overall, there were more false positives for autosomal dominant disorders than for autosomal recessive disorders. The results obtained with eXtasy are promising and show a large increase in specificity compared to other prediction algorithms.
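Because every individual in these cohorts is assumed to be healthy, any individual flagged as carrying a disease-causing genotype counts as a false positive, and specificity is 1 minus the false-positive rate. The sketch below shows, under simplifying assumptions, why dominant disorders produce more false positives: a single damaging allele suffices. The data layout and function names are hypothetical. Summing damaging alleles per gene for the recessive rule ignores phase, so two hits in cis would be overcounted.

```python
def carriers_flagged(individuals, disease_genes, mode):
    """Count healthy individuals flagged as affected under a simple rule.

    individuals: dict name -> dict gene -> number of predicted-damaging
        alleles summed over all variants in that gene (phase ignored).
    mode: 'dominant' (>= 1 damaging allele) or 'recessive' (>= 2).
    """
    threshold = {"dominant": 1, "recessive": 2}[mode]
    flagged = 0
    for gene_counts in individuals.values():
        if any(gene_counts.get(g, 0) >= threshold for g in disease_genes):
            flagged += 1
    return flagged

def false_positive_rate(individuals, disease_genes, mode):
    """Every flag in a healthy cohort is a false positive; specificity = 1 - rate."""
    return carriers_flagged(individuals, disease_genes, mode) / len(individuals)
```

In a toy cohort where A carries one damaging allele in gene G1, B is homozygous-damaging in G2, C carries none, and D carries one damaging allele in each of G1 and G2, three of four individuals are flagged under the dominant rule but only B is flagged under the recessive rule.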

OPT18 Sunday, July 21: 2:40 p.m. - 3:05 p.m.
Quantifying the impact of somatic mutations on gene expression networks in cancer
Room: ICC Lounge 81
Presenting Author: Jiarui Ding, University of British Columbia, British Columbia Cancer Research Centre, Canada

Additional Authors
Ali Bashashati (British Columbia Cancer Research Centre, Molecular Oncology Canada); Samuel Aparicio (British Columbia Cancer Research Centre, Molecular Oncology Canada); Anne Condon (University of British Columbia, Computer Science Canada); Sohrab Shah (British Columbia Cancer Research Centre, Molecular Oncology Canada);
Abstract:
To improve the interpretability of large-scale whole genome sequencing studies of human cancers, we aim to computationally estimate the functional impact of individual somatic mutations in patient tumours, providing insight into which mutations drive malignant phenotypes.

We assume that a functional mutation in a gene of interest will have cis-effects on the gene's own expression or trans-effects on the expression of genes in the same biological pathway. In addition, we assume that mutated genes that alter the phenotype are selected during evolution and should therefore accrue at higher-than-expected rates in population studies. We developed a novel probabilistic model that integrates different sources of data and prior information, such as somatic mutations, gene expression and pathway datasets, to predict functional mutations that are likely to have affected the transcriptional profile of a tumour. For each mutation in a dataset, the model outputs the probability that it has a functional impact.

Publicly available TCGA cancer datasets and breast cancer data generated in our lab were used to evaluate the proposed model. Experimental results showed that the model's predictions had higher concordance with genes documented in the Cancer Gene Census than the predictions of the MutationAssessor algorithm. In addition, the model predicted some known driver genes missed by the MutSig algorithm. Finally, predictions on randomly permuted datasets had statistically significantly lower functional probabilities than the original predictions.
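The permutation check above compares functional probabilities from the full model on real versus permuted data. The abstract does not give the details. The sketch below illustrates only the generic label-permutation idea, using a deliberately simple stand-in statistic: the absolute difference in a gene's mean expression between mutated and non-mutated samples, as a crude cis-effect. Both function names and the statistic are hypothetical and are not the authors' model. The mutation labels must contain both mutated and non-mutated samples.

```python
import random
from statistics import mean

def cis_effect(expression, mutated):
    """Toy statistic: |mean expression in mutated - mean in non-mutated samples|."""
    mut = [x for x, m in zip(expression, mutated) if m]
    wt = [x for x, m in zip(expression, mutated) if not m]
    return abs(mean(mut) - mean(wt))

def permutation_p_value(expression, mutated, n_perm=2000, seed=1):
    """Shuffle mutation labels across samples to build a null distribution and
    return the empirical p-value with the standard +1 correction."""
    rng = random.Random(seed)
    observed = cis_effect(expression, mutated)
    labels = list(mutated)  # copy, so the caller's labels are not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if cis_effect(expression, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With expression values `[5.1, 4.9, 5.0, 5.2, 4.8, 8.0, 8.3, 7.9]` and the last three samples mutated, only about 1 in 56 label shuffles reproduces an effect this large, so the p-value falls well below 0.05. If expression is identical in all samples, every shuffle ties the observed statistic and the p-value is 1.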

We show that integrating multiple data types in a probabilistic graphical model can predict individual functional mutations and driver genes. We suggest that this technique is an important step toward personalized treatment strategies informed by genome and transcriptome sequencing.

OPT19 Sunday, July 21: 3:10 p.m. - 3:35 p.m.
Systems Level Analysis of Breast Cancer Reveals the Differences between Lung and Brain Metastasis through Protein-Protein Interactions
Room: ICC Lounge 81
Presenting Author: Hatice Billur Engin, Koc University, Turkey

Additional Authors
Attila Gursoy (Koc University, Computer Engineering Turkey); Baldo Oliva (Pompeu Fabra University, Experimental and Life Sciences Spain); Emre Guney (Pompeu Fabra University, Experimental and Life Sciences Spain); Ozlem Keskin (Koc University, Chemical and Biological Engineering Turkey);
Abstract:
According to the American Cancer Society, breast cancer is the second most common cause of cancer death among women. Death is generally caused by metastasis to another organ rather than by the primary tumor in the breast. A better understanding of the molecular mechanisms of the metastatic process may help to improve clinical methods. To this end, we combined protein structures and protein networks at the systems level to explain genotype-phenotype relationships, and applied this approach to breast cancer metastasis.
We built a comprehensive human PPI network by combining the protein-protein interaction data available in various databases. We then ranked all interactions of this network according to their relevance to genes known to mediate breast cancer metastasis to the brain and lung, and formed two distinct metastasis PPI networks from the highest-ranked interactions.
We performed functional analyses on the brain and lung metastasis PPI networks and observed that the proteins of the lung metastasis network are also enriched in the “Cancer”, “Infectious Diseases” and “Immune System” KEGG pathways. This finding may point to a cause-and-effect relationship between the immune system, infectious diseases and the progression of lung metastasis.
We enriched the metastasis PPI networks with structural information, using both data available in the Protein Data Bank and our own protein interface predictions. In the interface prediction step, the most common protein-protein interface templates in the lung metastasis network were found to come from bacterial proteins. This finding reinforces our claim of a relationship between lung metastasis and infectious diseases.
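The abstract does not say how interactions were ranked by their relevance to metastasis genes. One common network-propagation technique for this kind of task is a random walk with restart from seed genes. A minimal sketch of that technique follows, offered as an assumption rather than as the authors' method. Scoring an interaction as the product of its two endpoint scores is also an assumption, and the gene names are hypothetical.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, n_iter=100):
    """Stationary relevance of each node to the seed genes on an undirected,
    unweighted network; the walker jumps back to a seed with prob. `restart`."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    present = [s for s in seeds if s in neighbors]
    if not present:
        raise ValueError("no seed gene is present in the network")
    restart_vec = {n: 0.0 for n in neighbors}
    for s in present:
        restart_vec[s] = 1.0 / len(present)
    p = dict(restart_vec)
    for _ in range(n_iter):
        nxt = {n: restart * restart_vec[n] for n in neighbors}
        for n, nbrs in neighbors.items():
            share = (1 - restart) * p[n] / len(nbrs)  # spread mass evenly to neighbors
            for m in nbrs:
                nxt[m] += share
        p = nxt
    return p

def rank_interactions(edges, seeds, **kwargs):
    """Rank interactions by the product of their endpoint relevance scores."""
    p = random_walk_with_restart(edges, seeds, **kwargs)
    return sorted(edges, key=lambda e: p[e[0]] * p[e[1]], reverse=True)
```

On a small chain S1-A-B-C-D with an extra leaf E attached to the seed S1, the interaction (S1, A) ranks first and (C, D), the one farthest from the seed, ranks last. A high-ranked subset of such a list could then serve as a metastasis-specific subnetwork.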

OPT20 Sunday, July 21: 3:10 p.m. - 3:35 p.m.
Music-listening regulates innate immune response genes in human peripheral whole blood
Room: ICC Lounge 81
Presenting Author: Chakravarthi Kanduri, University of Helsinki, Finland

Additional Authors
Minna Ahvenainen (University of Helsinki, Medical Genetics Finland); Li Tian (University of Helsinki, Neuroscience Center Finland); Liisa Ukkola-Vuoti (University of Helsinki, Medical Genetics Finland); Pirre Raijas (University of Helsinki, Medical Genetics Finland); Irma Järvelä (University of Helsinki, Medical Genetics Finland); Harri Lähdesmäki (Aalto University School of Science, Information and Computer Science Finland);
Abstract:
Background: The rewarding effect and the physiological benefits of music-listening on human health are well acknowledged, but the underlying molecular mechanisms and biological pathways triggered by music-listening remain largely unknown. Here, using Illumina Human HT-12 v4, we analyzed the gene expression profiles in the peripheral whole blood of 41 subjects before and after music-listening to understand its effect on bodily functions.

Results: Statistical analyses using linear models identified 23 unique differentially expressed genes with functions crucial for cellular innate immune responses. Functional annotation analyses demonstrated a beneficial effect of music-listening on tissue homeostasis. First, network analysis showed that music-listening dampens peripheral immune responses by down-regulating a closely interacting immune-related network associated with functions such as cell death and survival, cell-to-cell signaling and interaction, and inflammatory response. Second, biological pathway analysis showed a dampening effect of music-listening on peripheral immune response-related pathways such as natural killer cell signaling. Third, biological regulator analysis suggested that music-listening may reduce inappropriate immune responses by balancing pro- and anti-inflammatory agents. A further striking finding of our study offers a plausible explanation for how music-listening modulates immune responses. Taken together, these findings suggest that music-listening restrains immune over-activation.
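The abstract reports linear models on a before/after design but gives no further details. For two time points measured on the same subjects, a linear model with a subject term and a time term reduces, for each gene, to a paired t-test on the within-subject differences. A minimal standard-library sketch follows, together with Benjamini-Hochberg FDR adjustment across genes. Converting t to a p-value with the normal distribution is an assumption made to stay within the standard library. It is reasonable at about 40 degrees of freedom but slightly anti-conservative. A dedicated package such as limma would normally be used instead.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_t(before, after):
    """Paired t statistic on within-subject differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def two_sided_p(t):
    """Two-sided p-value via a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `paired_t` would be applied to each probe across the subjects' pre- and post-listening arrays, and `benjamini_hochberg` would be applied once to all resulting p-values before a significance cut-off is chosen.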

Conclusion: These findings provide initial evidence for an effect of music-listening on human gene expression and immune responses. The balanced immune homeostasis observed after music-listening supports the benefits of music-listening for human well-being.

OPT21 Sunday, July 21: 3:10 p.m. - 3:35 p.m.
Biological pathways of musical aptitude
Room: ICC Lounge 81
Presenting Author: Jaana Oikkonen, University of Helsinki, Finland

Additional Authors
Päivi Onkamo (University of Helsinki, Department of Biological and Environmental Sciences Finland); Yungui Huang (Ohio State University, Department of Pediatrics and Statistics United States); Veronica Vieland (Ohio State University, Department of Pediatrics and Statistics United States); Irma Järvelä (University of Helsinki, Department of Medical Genetics Finland);
Abstract:
Humans have developed the perception, production and processing of sounds into the art of music. The understanding of music is innate: even newborn infants can recognize familiar melodies, indicating that the neuronal architecture for processing music is already in place. Musical aptitude has long been recognized as heritable; indeed, its heritability has been estimated to be as high as 0.7. Here, we evaluate the genetic background of musical aptitude.

We measured musical aptitude as a skill of auditory perception: the ability to discriminate pitch, duration and sound patterns in tones. Genome-wide linkage and association scans were performed on 76 informative families, ranging from trios to extended multigenerational families. We used KELVIN, a Bayesian approach that accommodates such large families and quantitative phenotypes.
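KELVIN reports Bayesian measures of linkage evidence, such as the posterior probability of linkage, rather than classical LOD scores, so the sketch below is background only and does not reproduce KELVIN's computation. It computes the textbook two-point LOD score for phase-known meioses, LOD(θ) = log10[θ^R (1 - θ)^NR / 0.5^(R + NR)], where R is the number of recombinant meioses and NR the number of non-recombinant meioses. It also includes a simple grid search for the maximizing recombination fraction. The function names are hypothetical.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """Classical two-point LOD score for phase-known meioses."""
    n = recombinants + non_recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    if likelihood_linked == 0:
        return float("-inf")  # e.g. theta = 0 with at least one recombinant
    return math.log10(likelihood_linked / 0.5 ** n)

def max_lod(recombinants, non_recombinants, grid=None):
    """(best LOD, theta) over a grid of recombination fractions in [0, 0.49]."""
    grid = grid if grid is not None else [i / 100 for i in range(50)]
    return max((lod_score(recombinants, non_recombinants, t), t) for t in grid)
```

Ten non-recombinant meioses give a maximum LOD of 10·log10(2), about 3.01, at θ = 0, which is just above the conventional threshold of 3. Two recombinants in twenty meioses are best explained by θ = 0.1.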

Several genes were identified. Importantly, most of the identified genes are involved in the development of cochlear hair cells or of the inferior colliculus (IC), both of which belong to the auditory pathway. We also confirmed previous findings that chromosome 4 is linked to musical aptitude.

Notably, all of the best associations were located at gene promoters. To study the biological meaning of the sites, we performed promoter analysis for the best-associated genes.

We hypothesize that genes affecting the development of the auditory pathway form the basis of musical abilities and that differences in gene regulation underlie the variation in these skills.

OPT22 Sunday, July 21: 3:40 p.m. - 4:05 p.m.
SplicingCompass: differential splicing detection using RNA-Seq data.
Room: ICC Lounge 81
Presenting Author: Rainer Koenig, University Hospital Jena, Germany

Additional Authors
Moritz Aschoff (DKFZ, Theoretical Bioinformatics Germany); Agnes Hotz-Wagenblatt (DKFZ, Genomics and Proteomics Core Facility Germany); Karlheinz Glatting (DKFZ, Genomics and Proteomics Core Facility Germany); Matthias Fischer (University Children's Hospital Cologne, Department of Pediatric Oncology and Hematology Germany); Roland Eils (DKFZ Heidelberg, Theoretical Bioinformatics Germany);
Abstract:
Alternative splicing is central to cellular processes and substantially increases transcriptome and proteome diversity. Next-generation RNA sequencing is an exciting new technology for analyzing alternative splicing on a large scale. We present a new method and software to predict genes that are differentially spliced between two conditions using RNA-seq data. Our method uses the geometric angles between high-dimensional vectors of exon read counts. With this approach, differential splicing can be detected even when the splicing events are complex and involve previously unknown splicing patterns. We applied our approach to two case studies, including neuroblastoma tumour data from patients with favorable and unfavorable clinical courses. We show the validity of our predictions and the applicability of our method in the context of patient clustering, simulated experiments, and association with specific regulatory splicing factor motifs in the regulated gene sequences.
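The method's core quantity is the angle between a gene's exon read-count vectors in two conditions. A minimal sketch of that calculation alone follows, using invented counts. It does not cover how the published method normalizes counts or decides which angles are significant.

```python
import math

def angle_degrees(u, v):
    """Angle, in degrees, between two non-zero exon read-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("exon count vectors must be non-zero")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp floating-point rounding
    return math.degrees(math.acos(cos))
```

Because the angle ignores vector length, a gene expressed twice as highly with the same exon usage, for example `[100, 100, 100]` versus `[200, 200, 200]`, gives an angle of about 0 degrees. Skipping the middle exon, as in `[200, 0, 200]`, gives about 35.3 degrees. Only changes in relative exon usage register, which is why the approach needs no annotation of specific event types.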

OPT23 Sunday, July 21: 3:40 p.m. - 4:05 p.m.
A web server for the functional characterization of drugs from gene expression following treatment
Room: ICC Lounge 81
Presenting Author: Griet Laenen, KU Leuven, Belgium

Additional Authors
Amin Ardeshirdavani (KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA / iMinds Future Health Department Belgium); Yves Moreau (KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA / iMinds Future Health Department Belgium); Lieven Thorrez (KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA / Department of Development and Regeneration @ Kulak Belgium);
Abstract:
Many drugs exert their therapeutic activities through the modulation of multiple targets. Moreover, this polypharmacology is often associated with both beneficial and adverse off-target effects. For most drugs these targets are largely unknown, and identifying them among thousands of gene products remains difficult. Yet better knowledge of such drug-protein interactions, along with the molecular pathways involved and the associated diseases, could be of substantial value to drug development, in particular for predicting side effects and exploring potential drug repositioning.

DNA microarray technology enables us to observe the effect of drug treatment on the activity of all genes simultaneously, and thus forms an ideal starting point for predicting a drug's mode of action. Hence we have developed an easy-to-use analysis suite for the functional characterization of drugs based on gene expression changes following treatment. Our software provides the tools needed to gain new insights into the biological effects of a drug by integrating (1) preprocessing of gene expression data obtained from different Affymetrix array types; (2) quality assessment and exploratory analysis of these data; (3) genome-wide drug target prioritization; (4) prediction of pathways involved in the drug’s mode of action; (5) identification of associated diseases, enabling side effect prediction and drug repurposing; and (6) result visualization and reporting. Drug target prioritization is performed by an in-house algorithm for network neighborhood analysis, which integrates the expression data with functional protein association information. All of the above functionalities are demonstrated on gene expression data for treatment with well-characterized drugs.
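The in-house network neighborhood algorithm is not described in the abstract. The sketch below shows one simple way to express the general idea, under explicit assumptions: each candidate target is scored by the confidence-weighted mean absolute log fold change of its network neighbors. One rationale for this design is that a drug target's own transcript need not change, while genes downstream of it often do. The scoring rule, data layout and gene names are all hypothetical.

```python
def neighborhood_scores(network, abs_logfc):
    """network: dict gene -> dict neighbor -> association confidence (0, 1].
    abs_logfc: dict gene -> |log2 fold change| after treatment.
    Returns gene -> confidence-weighted mean expression change of its neighbors."""
    scores = {}
    for gene, neighbors in network.items():
        total_weight = sum(neighbors.values())
        if total_weight == 0:
            scores[gene] = 0.0  # no usable associations
            continue
        weighted = sum(w * abs_logfc.get(n, 0.0) for n, w in neighbors.items())
        scores[gene] = weighted / total_weight
    return scores

def prioritize(network, abs_logfc):
    """Candidate targets ordered from highest to lowest neighborhood score."""
    scores = neighborhood_scores(network, abs_logfc)
    return sorted(scores, key=scores.get, reverse=True)
```

With target T1 strongly linked to two strongly changed genes and T2 mostly linked to an unchanged gene, T1 scores about 1.76 and T2 about 0.29, so T1 is prioritized first.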

OPT24 Sunday, July 21: 3:40 p.m. - 4:05 p.m.
KBase: An Integrated Knowledgebase for Predictive Biology and Environmental Research
Room: ICC Lounge 81
Presenting Author: Elizabeth Glass, Argonne National Laboratory, United States

Additional Authors
Robert Cottingham (Oak Ridge National Laboratory, Computational Biology and Bioinformatics United States); Sergei Maslov (Brookhaven National Laboratory, Biology Department United States); Rick Stevens (Argonne National Laboratory, Computing, Environment, and Life Sciences United States); Thomas Brettin (Argonne National Laboratory, Mathematics and Computer Science Division United States); Dylan Chivian (Lawrence Berkeley National Laboratory, Physical Biosciences Division United States); Paramvir Dehal (Lawrence Berkeley National Laboratory, Physical Biosciences Division United States); Nomi Harris (Lawrence Berkeley National Laboratory, Physical Biosciences Division United States); Christopher Henry (Argonne National Laboratory, Mathematics and Computer Science Division United States); Folker Meyer (Argonne National Laboratory, Mathematics and Computer Science Division United States); Jennifer Salazar (Argonne National Laboratory, Computing, Environment, and Life Sciences United States); Doreen Ware (Cold Spring Harbor, Plant Biology United States); David Weston (Oak Ridge National Laboratory, Environmental Sciences Division United States); Brian Davison (Oak Ridge National Laboratory, System Biology & Biotechnology United States); Adam Arkin (Lawrence Berkeley National Laboratory, Physical Biosciences Division United States);
Abstract:
The new Systems Biology Knowledgebase (KBase) is integrating commonly used core tools and their associated data, and building new capabilities on top of the combined data. New functionality allows users to visualize data, create powerful models, or design experiments based on KBase-generated suggestions. Although the integration of different data types will itself be a major offering to users, the project is about much more than data unification. KBase is distinguished from a database or existing biological tools by its focus on interpreting missing information necessary for predictive modeling, on aiding experimental design to test model-based hypotheses, and on delivering quality-controlled data. The project leverages cloud computing and high-performance computing resources across the DOE system of labs to handle the anticipated rapid growth in the data volumes and computing requirements of KBase.

KBase is a collaborative effort designed to accelerate our understanding of microbes, microbial communities, and plants. It is a community-driven, extensible and scalable open-source software framework, and application system. KBase offers free and open access to data, models and simulations, enabling scientists and researchers to build new knowledge, test hypotheses, design experiments, and share their findings.