Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Student Council Symposium


Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
Sunday, July 21st
9:00 AM-10:00 AM
SCS Keynote: Reconstructing axolotl limb regeneration using sin...
Room: Delhi (Ground Floor)
  • Barbara Treutlein, ETH Zurich, Switzerland
10:45 AM-11:00 AM
Integration of Multi-Omics Data for Predicting Individual Colon Cancer Aetiology
Room: Delhi (Ground Floor)
  • Begüm Özemek
  • Uğur Sezerman

Epigenetic changes are heritable, dynamic and reversible changes that play a role in tumorigenesis. DNA methylation is one of the best-studied epigenetic mechanisms because of its roles in gene regulation, genomic imprinting and cell differentiation. miRNAs are small, non-coding, endogenous RNAs that act as regulators of signaling pathways in various cellular mechanisms.
Cancer is affected by both genetic and epigenetic changes. Therefore, in order to understand the mechanisms leading to the formation of cancer, genetic and epigenetic changes should be studied together with a comprehensive approach. On the other hand, complex diseases follow different pathways for each patient, even if they show a similar phenotype. Personalized mechanisms should be identified in order to provide better health care and treatment. However, it is difficult to find the affected pathways for a patient because there are many mechanisms that are affected by cancer.
In this study, we propose a holistic approach that jointly considers changes in gene expression, regions of varying methylation and miRNA transcription levels. DNA methylation, miRNA transcription and gene expression data were first analyzed separately. For each data type, pathway analysis was performed to find the pathways affected by each mechanism. Candidate pathways were then identified by combining these data sources, and finally, pathways that were individually affected by DNA methylation levels or gene expression levels were identified for colorectal adenocarcinoma patients. This study presents the bioinformatics background needed to apply the approach to all kinds of complex diseases, including different types of cancer or Alzheimer's disease.

11:00 AM-11:15 AM
Finding Biomarkers for Precision Cancer Medicine using Patient-Derived Xenografts
Room: Delhi (Ground Floor)
  • Arvind Singh Mer, University of Toronto, Canada
  • Wail Ba-Alawi, University of Toronto, Canada
  • Petr Smirnov, University Health Network, Canada
  • Yi Wang, University of Toronto, Canada
  • Ben Brew, Hospital for Sick Children, Toronto, Ontario, Canada
  • Janosch Ortmann, Centre de recherches mathématiques (CRM), University of Montreal, Canada
  • Ming-Sound Tsao, University Health Network, Toronto, Canada
  • David Cescon, University Health Network, University of Toronto, Canada
  • Anna Goldenberg, SickKids, Canada
  • Benjamin Haibe-Kains, University of Toronto, Canada

Background: One of the key challenges in cancer precision medicine is finding robust biomarkers of drug response. Patient-derived tumor xenografts (PDXs) have emerged as reliable preclinical models since they better recapitulate tumor response to chemo- and targeted therapies. However, the lack of standard tools poses a challenge in the analysis of PDXs with molecular and pharmacological profiles. Efficient storage, access and analysis is key to the realization of the full potential of PDX pharmacogenomic data. We have developed Xeva (XEnograft Visualization & Analysis), an open-source software package for processing, visualization and integrative analysis of a compendium of in vivo pharmacogenomic datasets.

Methods and results: The Xeva package follows the PDX minimum information (PDX-MI) standards and can handle both replicate-based and 1x1x1 experimental designs. We used Xeva to characterize the variability of gene expression and pathway activity across passages. We found that only a few genes and pathways have passage-specific alterations (median intraclass correlation of 0.53 for genes and positive enrichment score for 92.5% of pathways). For example, the activity of the mRNA 3'-end processing and elongation arrest and recovery pathways was strongly affected by model passaging (gene set enrichment analysis false discovery rate [FDR] <5%). We then leveraged our platform to link drug response to the pathways whose activity is consistent across passages by mining the Novartis PDX Encyclopedia (PDXE) data containing 1,075 PDXs spanning 5 tissue types and 62 anticancer drugs. We identified 87 pathways significantly associated with response to 51 drugs (FDR < 5%), including associations between erlotinib response and signaling by EGFR in cancer pathways, and between binimetinib response and MAP kinase activation in TLR cascade.

Conclusions: Among the significant pathway-drug associations, we found novel biomarkers based on gene expressions, copy number aberrations (CNAs) and mutations predictive of drug response (concordance index > 0.60; FDR < 0.05). Xeva provides a flexible platform for integrative analysis of preclinical in vivo pharmacogenomics data to identify biomarkers predictive of drug response, a major step toward precision oncology.
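
The conclusions report biomarkers with a concordance index above 0.60. As a rough illustration of what that statistic measures, the sketch below computes a concordance index between a biomarker and a continuous drug response. It is not taken from Xeva; the function name, the handling of ties, and the toy values are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(predictor, response):
    """Fraction of comparable sample pairs that the predictor orders
    the same way as the observed response (hypothetical helper)."""
    concordant = 0.0
    comparable = 0
    for (p1, r1), (p2, r2) in combinations(zip(predictor, response), 2):
        if r1 == r2:
            continue  # pairs with tied response carry no ordering information
        comparable += 1
        if p1 == p2:
            concordant += 0.5  # tied predictor counts as half concordant
        elif (p1 < p2) == (r1 < r2):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy values: biomarker level and drug response for four models.
print(concordance_index([0.2, 0.5, 0.9, 1.4], [10.0, 12.0, 30.0, 25.0]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect agreement, so a threshold of 0.60 selects biomarkers that order responses modestly better than chance.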

11:15 AM-11:20 AM
SCS Flash-talk: Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
Room: Delhi (Ground Floor)
  • Avi Srivastava, Stony Brook University, United States
  • Robert Patro, Stony Brook University, United States
  • Laraib Iqbal Malik
  • Tom Smith
  • Ian Sudbery

We introduce alevin, a new end-to-end tool for processing droplet-based single-cell sequencing data that performs cell barcode detection and correction, read mapping to the transcriptome, unique molecular identifier (UMI) deduplication and gene count estimation, and final cell barcode whitelisting using a machine learning-based approach. We propose a new algorithm for UMI deduplication that handles reads mapping to multiple genes in a principled and consistent way. This allows the assignment of reads that are simply discarded by other existing tools. For reads whose gene of origin is still ambiguous after the application of our algorithm, we resolve their assignment probabilistically using an efficient expectation-maximization procedure. To assess the accuracy of the quantification estimates of alevin, we compare the gene-level counts from single-cell datasets against multiple bulk, short read sequencing datasets from the same tissue types. We show that the correlation between our estimates and the bulk estimates is higher than that for the other methods, and that the difference is considerable for genes sharing high sequence similarity with other genes. Additionally, we show that our method, even on large datasets, requires less than 15Gb of memory and is up to 10 times faster than the current state-of-the-art industrial pipeline.
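
The probabilistic assignment of ambiguous reads can be illustrated with a small expectation-maximization loop. This is a minimal sketch of the general idea only, not alevin's implementation: the function name, the input layout (unique counts per gene plus groups of molecules compatible with several genes) and the fixed iteration count are all assumptions.

```python
def em_gene_counts(unique_counts, ambiguous, n_iter=100):
    """Estimate per-gene molecule counts (hypothetical helper).

    unique_counts: dict mapping gene -> molecules assigned unambiguously.
    ambiguous: list of (genes, count) pairs, where `genes` is the set of
               genes a group of `count` molecules is compatible with.
    """
    genes = set(unique_counts)
    for gene_set, _ in ambiguous:
        genes |= set(gene_set)

    # Start from a uniform abundance estimate.
    estimate = {g: 1.0 for g in genes}
    for _ in range(n_iter):
        # Unambiguous molecules always count fully for their gene.
        updated = {g: float(unique_counts.get(g, 0)) for g in genes}
        for gene_set, count in ambiguous:
            total = sum(estimate[g] for g in gene_set)
            for g in gene_set:
                # E-step: split ambiguous molecules in proportion to the
                # current estimates; fall back to an equal split if all are 0.
                share = estimate[g] / total if total > 0 else 1.0 / len(gene_set)
                # M-step: accumulate the expected counts.
                updated[g] += count * share
        estimate = updated
    return estimate


counts = em_gene_counts({"GENE_A": 90, "GENE_B": 10},
                        [({"GENE_A", "GENE_B"}, 100)])
print(counts)
```

In this toy input the ambiguous molecules end up split roughly 9:1, following the evidence from the uniquely assigned molecules, instead of being discarded or split evenly.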

11:20 AM-11:25 AM
SCS Flash-talk: Comprehensive pipeline for processing, deconvolution and visualization of complex DNA methylation data
Room: Delhi (Ground Floor)
  • Shashwat Sahay, Saarland University, Germany
  • Tony Kaoma, Luxembourg Institute of Health, Luxembourg
  • Valentin Maurer, University of Heidelberg, Germany
  • Francisco Azuaje, Luxembourg Institute of Health, Luxembourg
  • Joern Walter, Saarland University, Germany
  • Pavlo Lutsik, German Cancer Research Center, Germany
  • Reka Toth, German Cancer Research Center, Germany
  • Petr Nazarov, Luxembourg Institute of Health, Luxembourg
  • Michael Scherer, Max-Planck Institute for Informatics, Germany

DNA methylation is an epigenetic mark that is broadly involved in gene regulation, while specific DNA methylation patterns are cell-type specific and key factors determining cellular identity. Thus, reference DNA methylation profiles can be used to infer the cellular composition of bulk tissue samples using reference-based deconvolution algorithms. These reference profiles are hard to obtain and cell-type identity is not always well-defined. Reference-free deconvolution methods have been created to infer both the proportions and the reference methylomes of underlying cell types in bulk tissue samples. However, these approaches pose challenges for the processing of input data and downstream interpretation.

MeDeCom is an R package for reference-free deconvolution of complex methylomes. The input DNA methylation data matrix must not contain unreliable or otherwise problematic methylation sites, and potentially informative CpGs must be pre-selected. To ease these requirements, we propose a new processing package (DecompPipeline) for reference-free deconvolution algorithms, which is not limited to MeDeCom. DecompPipeline uses Independent Component Analysis to account for common confounding factors including age and sex. Furthermore, we present a substantially revised visualization tool (FactorViz) to explore the results of the most widely used reference-free deconvolution tools. FactorViz aids users in understanding the estimated reference profiles by associating them with phenotypic traits and performing enrichment analysis.
We applied the proposed infrastructure to a lung adenocarcinoma dataset with 461 samples. MeDeCom identified 7 latent methylation components, two of which were enriched for leukocyte specific transcription factor binding sites. The analysis of a whole blood dataset comprising 732 samples revealed substantial heterogeneity across the samples.

Understanding the cellular composition of complex tissues, such as solid cancers or whole blood, is crucial to address heterogeneity between patients in response to drugs or in clinical outcome. DecompPipeline, MeDeCom and FactorViz perform processing, deconvolution and visualization, respectively, of such complex DNA methylation data. In an application to a lung adenocarcinoma dataset, we found indications of tumor infiltrating immune cells. Furthermore, we highlight challenges in using whole blood as surrogate tissue. We envision that reference-free deconvolution will become an indispensable tool for epigenomic studies.

11:25 AM-11:30 AM
SCS Flash-talk: Structure-based drug repositioning reveals a well-known cancer drug as B-cells activation inhibitor.
Room: Delhi (Ground Floor)
  • Melissa F. Adasme, Biotechnology Center TU Dresden, Germany
  • Kristien Van Belle, Interface Valorisation Platform (IVAP), KU Leuven, Belgium
  • Sebastian Salentin, Biotechnology Center TU Dresden, Germany
  • V. Joachim Haupt, Biotechnology Center TU Dresden, Germany
  • Gary S. Jennings, Biotechnology Center TU Dresden, Germany
  • Jörg-Christian Heinrich, Biotechnology Center TU Dresden, Germany
  • Jean Herman, Interface Valorisation Platform (IVAP), KU Leuven, Belgium
  • Ben Sprangers, Laboratory of Molecular Immunology (Rega institute), KU Leuven, Belgium
  • Thierry Louat, Interface Valorisation Platform (IVAP), KU Leuven, Belgium
  • Michael Schroeder, Biotechnology Center TU Dresden, Germany
  • Daniele Parisi, KULeuven / ESAT-STADIUS, Belgium
  • Yves Moreau, KU Leuven, ESAT, Stadius, Belgium

11:30 AM-11:35 AM
SCS Flash-talk: Characterizing allele specific expression using whole exome and transcriptome sequencing to improve the diagnosis of rare genetic diseases.
Room: Delhi (Ground Floor)
  • Numrah Fadra, Mayo Clinic, University of MN, United States
  • Eric Klee, Mayo Clinic, Rochester, MN, United States
  • Garrett Jenkinson, Mayo Clinic, United States

Loss of heterozygosity (LOH) is an important mechanism for studying the impact of rare variants on gene disruption. In this study, we aim to elucidate the characteristics of LOH using transcriptome sequencing on 30 unrelated rare disease patients to augment the value of whole exome sequencing (WES)-based rare disease gene discovery.

A profile for high confidence heterozygous mutations from WES data was generated. A distribution of the allele frequencies for these loci within the transcriptome data was obtained. Using these features as a guide, an algorithm was implemented for accurate identification of LOH at the gene level based on the calculation of allele frequencies for homozygous positions within the transcriptome that display a heterozygous genotype within the exome.
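
The comparison described above, looking for positions that are heterozygous in the exome but appear homozygous in the transcriptome, can be sketched as follows. This is an illustrative reading of the idea, not the authors' algorithm: the record layout, the function name and all thresholds (`min_depth`, the 0.3 to 0.7 heterozygous window, `bias_threshold`) are assumptions chosen for the example.

```python
def allelic_bias_sites(sites, min_depth=10, bias_threshold=0.9):
    """Return (gene, position, RNA alt-allele fraction) for sites that look
    heterozygous in WES but near-homozygous in RNA-seq (hypothetical helper)."""
    flagged = []
    for site in sites:
        wes_total = site["wes_ref"] + site["wes_alt"]
        rna_total = site["rna_ref"] + site["rna_alt"]
        if wes_total < min_depth or rna_total < min_depth:
            continue  # too few reads to trust either allele fraction
        wes_af = site["wes_alt"] / wes_total
        if not 0.3 <= wes_af <= 0.7:
            continue  # not a confident heterozygous call in the exome
        rna_af = site["rna_alt"] / rna_total
        # Strong bias means almost all RNA reads carry one allele.
        if max(rna_af, 1.0 - rna_af) >= bias_threshold:
            flagged.append((site["gene"], site["pos"], rna_af))
    return flagged


example_sites = [
    {"gene": "G1", "pos": 101, "wes_ref": 20, "wes_alt": 22, "rna_ref": 1, "rna_alt": 49},
    {"gene": "G2", "pos": 202, "wes_ref": 25, "wes_alt": 24, "rna_ref": 30, "rna_alt": 28},
    {"gene": "G3", "pos": 303, "wes_ref": 3, "wes_alt": 2, "rna_ref": 0, "rna_alt": 40},
]
print(allelic_bias_sites(example_sites))
```

Only the first toy site is reported: the second shows balanced expression, and the third lacks enough exome coverage to establish heterozygosity.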

Using a whole exome and transcriptome guided method for inference of loci showing LOH, we observed 60-80 mutations per sample showing strong allelic bias in favor of LOH in a cohort of 30 unrelated patients with rare genetic diseases. A total of 110 genes were identified as showing strong evidence of LOH. Five of the 110 genes (~4.5%) are known to be genetically imprinted. On average, 4 mutations per sample were present within regions of low complexity and 5 mutations per sample were present within intronic regions. In all samples transitions outnumbered transversions, except for 4 samples (13%) that showed more transversions than transitions. In the next step, we aim to categorize positions showing LOH into several classes that can explain the presence of LOH at any given position. The classes include protein-truncating variants resulting in loss of function, variants within cryptic splice sites causing nonsense-mediated decay, variants within promoter binding sites or methylation sites, variants within immunology-related genes and variants disrupting long non-coding RNA binding sites. One of the overarching goals of this project is to identify mutations showing strong expression bias of one allele over the other that cannot be explained by known genetic mechanisms. These findings are instrumental for understanding the underlying pathophysiology of rare genetic diseases.

11:35 AM-11:40 AM
SCS Flash-talk: The characterization of Plasmodial GTP Cyclohydrolase I enzyme as a potential anti-malarial drug target using computational approaches
Room: Delhi (Ground Floor)
  • Afrah Khairallah, Rhodes University, Research Unit in Bioinformatics (RUBi), South Africa
  • Vuyani Moses, Rhodes University, Research Unit in Bioinformatics (RUBi), South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Malaria remains a global health problem. The mosquito-borne disease is caused by species of the genus Plasmodium. About a quarter of a billion people are infected annually, including young children, pregnant women and non-immune travelers. The efficacy of current anti-malarial drugs has declined drastically due to the parasites’ increased resistance to them. This poses a great challenge for anti-malarial drug discovery and establishes the need for new treatment strategies. Guanosine triphosphate CycloHydrolase I (GCH1) is a rate-limiting enzyme of the malaria parasite’s folate biosynthesis pathway that has never been targeted before; the enzyme is important for the production of essential cofactors needed for parasite DNA synthesis. In this study, bioinformatics tools were used to screen for new anti-malarial drug candidates that inhibit the GCH1 enzyme of the malaria parasite's de novo folate pathway.

Sequence and phylogenetic analyses were performed, followed by modeling the 3D structure of the malaria parasite Plasmodium falciparum GCH1 enzyme using the homology modeling approach. Potential drug candidates from the South African Natural Compounds Database (https://sancdb.rubi.ru.ac.za/) were screened against the GCH1 enzyme. All-atom molecular dynamics (MD) simulations were performed to analyse the dynamical properties of the protein using the GROningen MAchine for Chemical Simulations (GROMACS) and Chemistry at HARvard Molecular Mechanics (CHARMM) MD packages. Force field (FF) parameters describing the enzyme metal centres were developed using quantum mechanics calculations and a potential energy surface scan approach via the Gaussian09 software package. The use of resources available at the Centre for High-Performance Computing (Cape Town, South Africa) has accelerated this process and opened new perspectives for it.

Findings and conclusion
The GCH1 protein showed selective binding to inhibitor compounds, and a set of promising inhibitors was identified. New FF parameters describing the protein metal bonded terms were derived and validated using CHARMM. The developed FF parameters will be useful for producing accurate and reliable MD simulations, which can aid significantly in the in-silico identification and validation of the identified potential drug candidates for malaria treatment.

11:40 AM-11:45 AM
SCS Flash-talk: GSCALite: A Web Server for Gene Set Cancer Analysis
Room: Delhi (Ground Floor)
  • Anyuan Guo, Huazhong University of Science and Technology, China
  • Feifei Hu

Next generation sequencing (NGS) technology has emerged as a powerful method for cancer genomics analysis. The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx) and other projects have generated a large amount of complex, multi-omics data for cancer and normal samples, providing unprecedented opportunities to understand cancer causal genes and mechanisms, find candidate drug targets, and screen genes associated with phenotypes. Recently, a few excellent web servers have become available, such as cBioPortal, which focuses on genomic variations based on multi-omics data, and GEPIA and Oncomine, which provide expression and survival analysis for single genes. However, cancer initiation, progression and metastasis tend to result from mutation and/or expression alterations of a set of genes or pathways, and the signal of a single gene can be obscured by background noise. Thus, gene set association analysis using large-scale cancer multi-omics and drug sensitivity data is imperative and very useful for cancer research.
Therefore, we developed an interactive web-based application named Gene Set Cancer Analysis (GSCALite) to analyze and visualize the expression, variation and correlation of a gene set in cancers in a flexible manner, with the following functional modules: (i) differential expression in tumor vs normal, and survival analysis; (ii) genomic variation profiles and their association with survival, including single nucleotide variation, copy number variation and methylation; (iii) correlation between gene expression and cancer pathway activity; (iv) miRNA regulatory networks from verified target databases and prediction methods for genes; (v) drug sensitivity influenced by gene expression; (vi) normal tissue expression and eQTL for genes.
GSCALite is a user-friendly web server for dynamic analysis and visualization of gene sets in cancer and of their correlation with drug sensitivity. It is also a time-saving and intuitive tool for unlocking the value of cancer genomics big data, enabling experimental biologists without any programming skills to test hypotheses, and it should be broadly useful to cancer researchers. In addition, it is based on gene set analysis with multi-omics data, which complements analyses based on mRNA expression alone or on single genes.
Availability: GSCALite is available on http://bioinfo.life.hust.edu.cn/web/GSCALite/.

11:45 AM-12:30 PM
SCS Keynote: Bioinformatics and Exploratory Data Analysis in Pharmaceutical Industry. Applications to drug research and development
Room: Delhi (Ground Floor)
  • Fabian Birzele, F. Hoffmann-La Roche Ltd., Switzerland

Fabian Birzele studied Bioinformatics in Munich and holds a PhD in Bioinformatics from the LMU in Munich that he completed in 2009 in the group of Ralf Zimmer. After his PhD he moved to a PostDoc at Boehringer Ingelheim implementing NGS technologies in drug research. In 2011 he joined Roche in Penzberg as bioinformatician working mostly on Oncology projects before he became a group leader for Oncology Bioinformatics in Basel in 2016. Today he is the head of a team of 30 bioinformaticians and biostatisticians called Bioinformatics and Exploratory Data Analysis (BEDA) who support drug development projects across different disease indications from very early stages (target identification and target assessment) down to clinical phase 2 in Roche's Early Research and Development organization (pRED).

1:15 PM-1:30 PM
Evaluation of Single-Molecule Long-Read Sequencing Technologies for Structural Variant Detection in Human Genomes
Room: Delhi (Ground Floor)
  • Nazeefa Fatima
  • Adam Ameur

Chromosomes can undergo various changes such as large deletions and/or insertions, resulting in structural variation differences between individuals. Structural variants (SVs) are a common source of variability in the human genome and are known to be associated with several diseases. SVs often involve complex genomic rearrangements that are difficult to resolve using short read sequencing technologies. New approaches enabled by the latest generation of long-read single-molecule sequencing instruments, provided by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), can produce a sufficient amount of data to enable SV detection across entire human genomes at a reasonable cost.

Previously, we performed PacBio sequencing of two Swedish human genomes, as part of the SweGen 1000 Genomes project (https://swefreq.nbis.se) and uncovered over 17,000 SVs per individual (Ameur et al, 2018). A majority of these SVs were not detectable in short reads. As a follow-up, we have now generated data for the same individuals on ONT’s PromethION system, a new nanopore-based platform known for its higher throughput as compared to PacBio.

We present a pilot study that evaluates nanopore data derived from whole-genome sequencing (WGS) on PromethION in comparison to the Single-Molecule Real-Time (SMRT) reads obtained from the PacBio RSII platform. We performed comparative analyses of the single-molecule technologies in terms of mappability and SV detection, which resulted in an average of 17k and 24k variants across the nanopore and SMRT datasets, respectively. The results will be useful for the large-scale SweGen project, while the study serves as a bioinformatics pipeline for future long-read data analyses and sets a basis for what to consider when designing future PromethION experiments.

1:30 PM-1:45 PM
Resources for repeat protein structure annotation: RepeatsDB and RepeatsDB-Lite
Room: Delhi (Ground Floor)
  • Lisanna Paladin, University of Padua, Italy
  • Damiano Piovesan, University of Padova, Italy
  • Silvio Tosatto, University of Padova, Italy

Tandem repeats (TR) in proteins are ubiquitous in genomes and have been demonstrated to be of fundamental importance in several biological processes. Structural TR modules, called units, determine the repeated region structure, stability and function.
The largest collection of TR proteins detected from structural features is provided by the RepeatsDB database, developed to fill the gap in TR protein annotation. It relies on computational approaches and expert manual curation to detect TRs in Protein Data Bank (PDB) structures. The RepeatsDB annotation pipeline is integrated in RepeatsDB-lite, a web server for the prediction and refinement of TR annotation. RepeatsDB-lite outperforms existing methods in the prediction of TR units. Its interface allows the user to evaluate the prediction by visualizing similarity relationships between units at both the sequence and structure level. The revision process provides us with feedback about the reliability of RepeatsDB-lite predictions. In addition, high-quality annotations can be submitted to RepeatsDB, where currently about 60% of all entries are manually reviewed.
In the database it is now possible to compare the structural annotation of unit positions with secondary structure, fold information, UniProt annotation and Pfam domains. TR unit positions and exon boundaries along the protein are compared to provide insights into the hypothesis of TR evolution through exon duplication. Future work will concentrate on exploiting the structural and evolutionary repeat unit definitions to create Pfam profiles for TR detection from sequence.

1:45 PM-2:00 PM
A Score to Differentiate Between Epithelial and Mesenchymal Cells in Cancer and Normal Tissue
Room: Delhi (Ground Floor)
  • Natalie Davidson, ETH Zürich, Department for Computer Science, Switzerland
  • Prachi Shukla

Background: Cancer cell characterization is exceedingly difficult due in part to abnormal phenotypes caused by dysregulated gene expression. As a result, an active area of research is properly identifying pro-metastatic cells based upon morphological and expression methods. One key process where aberrant expression leads to morphological changes associated with metastasis and treatment resistance is epithelial-to-mesenchymal transitions. Precise identification of mesenchymal populations would greatly aid research efforts and treatment planning, but no accurate pan-cancer method currently exists. The goal of this study was to train a model on RNA sequencing data to accurately score samples on a gradient as either epithelial or mesenchymal.
Description: The score was trained using data from The Cancer Genome Atlas and the Genotype-Tissue Expression project totaling 10,132 samples. Validation of the score was conducted by 10-fold cross validation on the 20% of epithelial or mesenchymal samples withheld from training (Test ROC-AUC: 0.939). Additionally, in several external datasets, we found a significant difference between mesenchymal and epithelial cells; for renal carcinoma biopsy samples (PMID: 26901863; p-value <= 0.0001; t-test), lung cancer cell lines (PMID: 25538889; p-value <= 0.05; t-test), and head/neck squamous cell carcinoma single-cell samples (PMID: 29198524; p-value <= 0.001; t-test). In addition, within The Cancer Genome Atlas samples, the score significantly correlated with epithelial miRNA-200 family members in eighteen cancer types (Pearson’s correlation coefficient > 0.3; q-value <= 0.05; t-test). Finally, to show the clinical relevance of the score, it was applied to uterine cancers which are subdivided into carcinomas, carcinosarcomas, and sarcomas that show progressively increasing expression of a mesenchymal phenotype. Uterine carcinomas correctly scored as epithelial, sarcomas as mesenchymal and carcinosarcomas scored in between. All groups were significantly different from each other (all p-values <= 0.0001; t-test).
Conclusions: The score builds beyond existing methods of epithelial-mesenchymal transition identification because it does not require subjective morphology assessments, nor does it depend solely on cell marker expression which varies by tissue type. Furthermore, the score was trained on vast pan-cancer and normal tissue databases containing 31 distinct tissue types and therefore is versatile with widespread applicability for clinical or academic research.

2:00 PM-2:05 PM
SCS Flash-talk: Towards oligogenic disease prediction with ORVAL: a web-platform to uncover pathogenic variant combinations
Room: Delhi (Ground Floor)
  • Philippe Guillaume, University of Lausanne, Switzerland
  • Alexandre Renaux, Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles, Vrije Universiteit Brussel, Belgium
  • Sofia Papadimitriou, Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles, Vrije Universiteit Brussel, Belgium
  • Nassim Versbraegen, Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles, Belgium
  • Charlotte Nachtegael, Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles, Belgium
  • Simon Boutry, Interuniversity Institute of Bioinformatics in Brussels, de Duve Institute - UCLouvain, Belgium
  • Ann Nowé, Interuniversity Institute of Bioinformatics in Brussels, Vrije Universiteit Brussel, Belgium
  • Tom Lenaerts, Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles, Vrije Universiteit Brussel, Belgium

The vast amount of DNA sequencing data collected from large patient cohorts has helped to identify a large number of disease-related mutations relevant for diagnosis and therapy. While existing bioinformatics methods and resources mainly focus on causal variants in Mendelian diseases, many difficulties remain in analysing more intricate genetic models involving variant combinations in different genes, an essential step for the discovery of the causes of oligogenic diseases. ORVAL (the Oligogenic Resource for Variant AnaLysis) tries to solve this problem by generating networks of pathogenic variant combinations in gene pairs, as opposed to isolated variants in unique genes. This online platform integrates innovative machine learning methods for combinatorial variant pathogenicity prediction and offers several interactive and exploratory tools, such as predicted pathogenicity and protein-protein interaction networks, a ranking of pathogenic gene pairs, as well as visual mappings of the cellular location and pathway information. ORVAL is the first web-based exploration platform dedicated to identifying networks of candidate pathogenic variant combinations to help clinicians and researchers uncover oligogenic causes of more complex diseases. ORVAL is available at https://orval.ibsquare.be.

2:05 PM-2:10 PM
SCS Flash-talk: A Bayesian Markov models-based motif discovery tool for predicting motifs in nucleotide sequences and its web server
Room: Delhi (Ground Floor)
  • Wanwan Ge, MPI-BPC, Germany
  • Christian Roth, MPI-BPC, Germany
  • Johannes Soeding, MPI-BPC, Germany

Abstract: We developed a Bayesian approach for motif discovery using Markov models (BaMMs), which learn nucleotide dependencies within motifs for transcription factor binding. BaMMs efficiently prevent overfitting by automatically adapting model complexity to the amount of available data, so that models can still be estimated reliably up to order k. We have shown previously that higher-order Bayesian Markov Models perform substantially better in ROC analyses than position weight matrices (PWMs) or first-order models [Siebert M, Söding J, NAR, 2016].
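
One way to picture how model complexity can adapt to the amount of data is an interpolated Markov model in which each order-k estimate uses the order-(k-1) estimate as a pseudo-count prior. The sketch below is a simplified illustration under that assumption, not the BaMM software: it is not position-specific, and the function names, the single pseudo-count strength `alpha` and the uniform base distribution are choices made for this example.

```python
from collections import Counter

ALPHABET = "ACGT"


def train_counts(sequences, max_order):
    """Count every k-mer of length 1..max_order+1 in the training sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq)):
            for k in range(max_order + 1):
                if i - k >= 0:
                    counts[seq[i - k:i + 1]] += 1
    return counts


def cond_prob(counts, context, base, alpha=2.0):
    """P(base | context), shrinking towards the shorter context.

    With many observations of `context` the counts dominate; with few,
    the estimate falls back to the lower-order probability.
    """
    if context == "":
        lower = 1.0 / len(ALPHABET)  # assumed uniform prior at order 0
    else:
        lower = cond_prob(counts, context[1:], base, alpha)
    n_context = sum(counts[context + b] for b in ALPHABET)
    return (counts[context + base] + alpha * lower) / (n_context + alpha)


counts = train_counts(["ACGTACGT", "ACGAACGT"], max_order=2)
print(cond_prob(counts, "AC", "G"))  # context seen often: close to the raw frequency
print(cond_prob(counts, "TT", "G"))  # unseen context: falls back to lower orders
```

Because rarely seen contexts contribute little beyond their lower-order prior, a high-order model of this kind behaves like a low-order one wherever data are sparse.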

To bring high-order BaMMs with improved quality to the community and to offer users the possibility to combine various standard analyses, we have now developed the BaMM web server with a user-friendly and largely self-explanatory interface and results pages. The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database, and (iv) browsing and keyword searching in the database. Our motif database contains motifs for over 1000 transcription factors, trained from ChIP-seq databases for human, mouse and other model organisms. In contrast to most other servers, such as JASPAR and HOCOMOCO, we represent sequence motifs not by PWMs but by BaMMs of order 4. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de [Kiesel A, et al, NAR, 2018].

To address the inadequacy of P- and E-values as measures of motif quality, which are poorly correlated with the biological relevance of the motif, we have developed the AvRec score (average recall over the TP-to-FP ratio between 1 and 100). The AvRec score summarizes how well the motif model can distinguish true motif instances from the background.

2:10 PM-2:15 PM
SCS Flash-talk: An Integrated Molecular Modeling and Dynamic Residue Network Analysis Strategy to Identify Allosteric Modulators of Human Heat Shock Proteins
Room: Delhi (Ground Floor)
  • Arnold Amusengeri, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

The need for next-generation anti-cancer drugs cannot be overstated. Because of resistance to multi-drug therapy, new strategies have been proposed to identify high-quality leads against key pro-survival targets in human cancers. The present study combines structure-based drug design fundamentals with dynamic residue network (DRN) concepts to identify South African natural compounds and assess their allosteric regulation propensities. Using high-throughput molecular docking, the previously identified allosteric sites of the heat shock proteins Hsp72 and Hsc70 were screened against the South African Natural Compounds Database (SANCDB; https://sancdb.rubi.ru.ac.za/). Selected protein-hit complexes were further analyzed by molecular dynamics calculations. Discorhabdin N, a marine alkaloid commonly isolated from Latrunculid sponges, bound the allosteric back pocket of the substrate binding domain (SBD) and modulated the dynamic behavior of both protein targets. Further, using DRN analysis via the MD-TASK toolkit, key allosteric communication centers within the proteins were identified, and the implications of ligand binding for these signal-sensitive hotspots were determined. Our findings allowed us to discuss possible allosteric regulatory mechanisms of Discorhabdin N and hence provide a novel approach, developed by the group, for allosteric drug discovery.

2:15 PM-2:20 PM
SCS Flash-talk: FFT-Mutant kit. A novel library for the design of de novo mutations using mathematical modeling techniques, data mining and pattern recognition.
Room: Delhi (Ground Floor)
  • David Medina, CeBiB, Chile
  • Alvaro Olivera-Nappa, Centre for Biotechnology and Bioengineering, Department of Chemical Engineering and Biotechnology, University of Chile, Chile

Designing mutations to obtain desirable biological activities is one of the most recurrent problems in biotechnology. Experimental methods are costly in time and money and limit the search space for mutants. Computational tools offer a powerful solution to this challenge. Current tools mainly use molecular dynamics-derived techniques to evaluate energy changes within the protein and, from these, its stability. However, calculation times scale up with precision. Libraries that facilitate the estimation of mutant stability from physicochemical properties have also been implemented. Nevertheless, the underlying problem of designing mutants with desirable biological activities persists.

As a different approach, we propose FFT-Mutant Kit (Fast Fourier Transform Mutant Kit), a novel tool for designing mutations from linear sequences by digitizing their physicochemical properties. Candidate mutations are evaluated using structural and phylogenetic information together with data mining and pattern recognition techniques.

This library works by encoding the linear sequence of residues as physicochemical properties. Dimensionality reduction techniques are then applied to obtain the most relevant properties. Data sets are created from variants of homologous proteins whose characteristics are known, and the descriptors are digitized by means of a Fast Fourier Transform. The data set is subjected to an exploratory phase of model training: performance distributions are calculated, and statistical tests are applied to select the best models. Finally, meta-learning techniques based on Bayesian probability distributions are used to combine the models. FFT-Mutant kit thus creates prediction/classification models for variants based on frequencies in the Fourier spectrum of physicochemical properties.
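
The encoding step can be sketched as follows: each residue is mapped to a numeric property value, and the resulting signal is transformed into a frequency spectrum. This is only an illustration under stated assumptions and not the FFT-Mutant kit code. The abstract does not say which properties are used, so the Kyte-Doolittle hydropathy index is chosen here as an example, the function names are hypothetical, and a plain discrete Fourier transform is written out for clarity (an FFT would give the same values faster).

```python
import cmath

# Kyte-Doolittle hydropathy index, chosen here as one example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=HYDROPATHY):
    """Turn a protein sequence into a numeric signal, one value per residue."""
    return [scale[residue] for residue in sequence]


def spectrum(signal):
    """Magnitudes of the discrete Fourier transform of the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]
    magnitudes = []
    for k in range(n // 2 + 1):  # frequencies up to the Nyquist limit
        coeff = sum(value * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, value in enumerate(centered))
        magnitudes.append(abs(coeff) / n)
    return magnitudes


wild_type = spectrum(encode("IDIDIDID"))
mutant = spectrum(encode("IDIDIKID"))  # hypothetical D6K substitution
print([round(m, 3) for m in wild_type])
print([round(m, 3) for m in mutant])
```

A single substitution changes the spectrum, and vectors like these could serve as descriptors for the model-training stage.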

To design new mutations and evaluate their potential effect, physicochemical, thermodynamic and phylogenetic properties are considered in a pre-filter stage. If these indicators are within acceptable ranges, the tool applies the trained models to report the expected effect of each mutation on the target variable, the probabilities associated with classification errors, and the physicochemical properties that were relevant for training and for describing the suggested mutations.

Finally, we believe that this tool will be a significant contribution to the design of new mutants with desirable biological activities and a powerful means of studying proteins through the digitization of their physicochemical properties.

2:20 PM-2:25 PM
SCS Flash-talk: AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors
Room: Delhi (Ground Floor)
  • Hui Hu, Huazhong University of Science and Technology, China
  • Anyuan Guo, Huazhong University of Science and Technology, China

The Animal Transcription Factor DataBase (AnimalTFDB) is a resource that aims to provide the most comprehensive and accurate information on animal transcription factors (TFs) and cofactors. AnimalTFDB has been maintained and updated for seven years, and we will continue to improve it. Recently, we updated AnimalTFDB to version 3.0 (http://bioinfo.life.hust.edu.cn/AnimalTFDB/) with more data and functions.
AnimalTFDB contains 125,135 TF genes and 80,060 transcription cofactor genes from 97 animal genomes. Besides the expansion in data quantity, new features and functions have been added: (i) more accurate TF family assignment rules; (ii) classification of transcription cofactors; (iii) TF binding site information; (iv) GWAS phenotype information related to human TFs; (v) TF expression in 22 animal species; (vi) a TF binding site prediction tool to identify potential binding TFs for nucleotide sequences; (vii) a separate web interface for human TFs (HumanTFDB), designed to make better use of the human TF data.
The new version of AnimalTFDB provides comprehensive annotation and classification of TFs and cofactors and will be a useful resource for studies of TFs and transcriptional regulation.

2:30 PM-3:30 PM
SCS Keynote: Big Data: opportunities, pitfalls, remedies
Room: Delhi (Ground Floor)
3:50 PM-4:05 PM
PhyloMagnet: Fast and accurate profiling of short-read meta-omics data using gene-centric phylogenetics
Room: Delhi (Ground Floor)
  • Max Emil Schön, Uppsala University, Sweden
  • Laura Eme, Uppsala University, Sweden
  • Thijs J G Ettema, Uppsala University, Sweden

Metagenomic and metatranscriptomic sequencing analyses have become increasingly popular tools for producing massive amounts of short-read data, often used for the reconstruction of draft genomes or the detection of (active) genes in microbial communities. Unfortunately, sequence assemblies of such datasets generally remain a computationally challenging task. Frequently, researchers are only interested in a specific group of organisms or genes; yet, the assembly of multiple datasets only to identify candidate sequences for a specific question is sometimes prohibitively slow, forcing researchers to select a subset of available datasets. Here we present PhyloMagnet, a workflow to screen meta-omics datasets for taxa and genes of interest using gene-centric assembly and phylogenetic placement of sequences.

Given one or several input short-read datasets, PhyloMagnet aligns the reads to the chosen reference protein families and performs gene-centric assembly based on the computed read-to-protein alignments. As only the aligned part (core) of each read is used for the assembly, no pre-processing such as adapter clipping is needed. The assembled and translated contigs are then aligned to the alignment of the corresponding protein family and placed onto the reference tree to determine their taxonomic annotation. PhyloMagnet identified up to 87% of the genera in an in vitro mock community (MBARC-26) with variable abundances of prokaryotic strains, while the false positive predictions per single gene tree ranged from 0% to 23%. We compared PhyloMagnet to a similar tool that does not use gene-centric assembly and show that PhyloMagnet is faster and has higher precision and sensitivity.

Our results demonstrate that PhyloMagnet can accurately identify short-read sequence datasets that contain sequences for genes and taxa of interest. The workflow allows researchers to explore the microbial diversity of a specific clade, or to assess the presence of a metabolic pathway of interest. For example, when applied to a group of metagenomes for which a set of metagenome-assembled genomes have been published, we could detect the majority of the taxonomic labels that the genomes had been annotated with, showcasing how our tool could be used to screen metagenomic datasets before applying whole-metagenome assembly and binning methods.

4:05 PM-4:20 PM
Genome sequence and comparative genomics of the rubber tree pathogen Pseudocercospora ulei
Room: Delhi (Ground Floor)
  • Sandra Milena Gonzalez Sayer, National University of Colombia, Colombia
  • Diego Mauricio Riaño Pachon, Nuclear Energy Center, São Paulo University, Brazil
  • Fabio Aristizabal Gutierrez, National University of Colombia, Colombia
  • Ibonne Aydee Garcia Romero

Natural rubber is a naturally produced polymer that is the raw material for several products of the medical and automotive industries. Hevea brasiliensis is the main commercial source of rubber because of its physicochemical properties and high yield. Despite being the geographic origin of the species, Latin American countries contribute only 1% of the global rubber trade. This uncompetitive production is due to a disease known as South American leaf blight. The causal agent of this disease is the ascomycete fungus Pseudocercospora ulei, a specialized pathogen with biotrophic nutritional metabolism. Detailed molecular characterization of P. ulei populations, their pathogenicity mechanisms, and the molecular interplay with their hosts is lacking.

We are sequencing the genome of P. ulei with the aim of identifying genes involved in pathogenicity. For this purpose, whole-genome shotgun sequencing was performed on the Pacific Biosciences, Oxford Nanopore, and Illumina platforms. The first version of the assembly was constructed from short reads using the MaSuRCA software and was later improved by integrating the long reads, yielding a hybrid assembly with the same software. Finally, we performed a third assembly from long reads using the Falcon and wtdbg2 software. The quality of each assembly was assessed with QUAST, BlobTools, and BUSCO.

The final assembly size was 90 Mb, split into 1311 contigs, with the largest at 617,313 bp and an N50 of 144,000 bp. This genome size is the largest reported so far within organisms of the Mycosphaerellaceae family. BUSCO results showed that 1253 complete BUSCOs from the Ascomycota lineage were found in the P. ulei genome (95%). Of these, only 2% were duplicated, suggesting either that this genome is haploid or that there is a very low level of polymorphism between chromosomes. Furthermore, we found that DNA rearrangement domains were highly represented in 670 contigs of the P. ulei genome; these domains could be correlated with gene duplication events. We confirmed the presence of this kind of domain in other fungi of the Trichocomaceae family but could not identify them in closely related species such as Pseudocercospora fijiensis and Pseudocercospora pini-densiflorae.
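
For readers unfamiliar with the N50 statistic reported above, it is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch of the usual definition follows; the function name `n50` and the toy contig lengths are illustrative and are not taken from the P. ulei assembly.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # toy lengths, total 32; prints 8
```

Assessment tools such as QUAST report this value alongside the number of contigs and the largest contig.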

4:20 PM-4:35 PM
Disentangling environmental effects in microbial association networks
Room: Delhi (Ground Floor)
  • Ina Maria Deutschmann, Institute of Marine Sciences (ICM-CSIC), Spain
  • Gipsi Lima-Mendez, Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium
  • Anders K. Krabberød, Department of Biosciences, Section for Genetics and Evolutionary Biology (Evogene), University of Oslo, Oslo, Norway
  • Karoline Faust, Department of Microbiology and Immunology, REGA Institute, KU Leuven, Leuven, Belgium
  • Ramiro Logares, ICM-CSIC, Spain
  • Sergio M. Vallina

Ecological interactions among microbes are fundamental for ecosystem functioning, yet most of them remain unknown. High-throughput omics can help unveil microbial interactions by inferring species associations over time or space, which can be represented as networks. Associations in these networks can indicate ecological interactions between species or, alternatively, similar or different environmental preferences, in which case the association is environmentally-driven. Therefore, prior to network analysis and interpretation, it is important to disentangle these associations and determine whether two species are associated because they interact ecologically or because they are both associated with an abiotic or biotic environmental factor.
We developed an approach to determine whether or not two species are associated in a network due to environmental preference. We use a combination of four methods (Sign Pattern, Overlap, Interaction Information, and Data Processing Inequality) that aim to detect which associations in a network are environmentally-driven. Our approach was tested on simulated networks as well as on real marine microbial networks constructed with temporal and spatial community composition data. For the network constructed with 10 years of monthly microbial-plankton abundance data, we found that 14% of the associations were predicted to be environmentally-driven. In the network constructed with 120 samples from the global surface ocean, 16% of the associations were predicted to be environmentally-driven.
We conclude that it is crucial to determine and quantify environmentally-driven associations in microbial association networks in order to generate more accurate hypotheses on ecological interactions. We implemented our approach in a publicly available software tool called EnDED (Environmentally-Driven Edge Detection).
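
Two of the four named methods can be illustrated on a single triplet consisting of two species and one environmental factor. The sketch below is one plausible reading of the Sign Pattern and Data Processing Inequality checks and is not the EnDED implementation. The Pearson correlation, the median-based binarization used for mutual information, the rule that both checks must agree, and the toy abundance values are all assumptions made for this example.

```python
from collections import Counter
from math import log2, sqrt
from statistics import mean, median


def sign(x):
    return (x > 0) - (x < 0)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def binarize(values):
    """Crude discretization: is each value above the median?"""
    m = median(values)
    return [v > m for v in values]


def mutual_information(x, y):
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def environmentally_driven(a, b, env):
    """Flag the a-b association if both checks are consistent with env driving it."""
    # Sign pattern: the a-b sign equals the product of the two environmental signs.
    sign_ok = sign(pearson(a, b)) == sign(pearson(a, env)) * sign(pearson(b, env))
    # Data processing inequality: the a-b edge carries the least information.
    xa, xb, xe = binarize(a), binarize(b), binarize(env)
    dpi_ok = mutual_information(xa, xb) <= min(mutual_information(xa, xe),
                                               mutual_information(xb, xe))
    return sign_ok and dpi_ok


env = [1, 2, 3, 4, 5, 6, 7, 8]
species_a = [5, 2, 3, 4, 4.5, 6, 7, 8]
species_b = [1, 5, 3, 4, 6, 4.5, 7, 8]
print(environmentally_driven(species_a, species_b, env))
```

In this toy case both species track the environmental factor more closely than they track each other, so the edge between them is flagged.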

4:35 PM-4:50 PM
Pleiades Toolkit: Automatic rule-based modeling of bacterial gene regulation enables simulation, prediction, and perturbation of gene responses
Room: Delhi (Ground Floor)
  • Leandro Murgas, Universidad Mayor, Chile
  • Sebastian Contreras-Riquelme, Universidad Mayor, Chile
  • J. Eduardo Martinez, Universidad Mayor, Chile
  • Rodrigo Santibáñez
  • Alberto J.M. Martin, Universidad Mayor, Chile

Gene expression is a key factor in the development and maintenance of life in all organisms. Its regulation is carried out mainly through the action of transcription factors (TFs), although other actors such as ncRNAs are also involved. However, integrating different types of information related to gene regulation at a specific time can be costly, both experimentally and computationally. In this work, we developed a method to construct condition-specific Gene Regulatory Networks (GRNs), i.e., networks depicting the regulatory events in a certain condition, using several types of experimental data collected from different databases. Our method creates GRNs starting from a Gold Standard Network (GSN) that contains all known gene regulations for an organism. Regulations from the GSN that are unlikely to be taking place are removed by applying a series of filters. The method considers different types of experimental evidence, including DNA methylation and accessibility, histone modifications and gene expression, or combinations of these data. In this way, if a given pattern of data generated in the same experimental condition is associated with, for example, inactive TF-binding sites, the respective filter will remove all TF-gene interactions associated with those sites. The implementation is available as a Cytoscape application at https://figshare.com/articles/WeoN_install_zip/7913912.
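
The filtering idea described above can be sketched as a set of edges that is pruned by a list of predicate functions. This is a hedged illustration and not the code of the Cytoscape application: the function names, the layout of the `evidence` dictionary, the expression threshold, and the example values are all hypothetical.

```python
def tf_is_expressed(edge, evidence, threshold=1.0):
    """Keep an edge only if its TF is expressed above a threshold (assumed units)."""
    tf, _target = edge
    return evidence.get("expression", {}).get(tf, 0.0) >= threshold


def site_is_accessible(edge, evidence):
    """Keep an edge unless its binding site is marked as inaccessible."""
    return edge not in evidence.get("inaccessible_sites", set())


def condition_specific_network(gold_standard, evidence, filters):
    """Return the gold-standard edges that pass every filter."""
    return {edge for edge in gold_standard
            if all(keep(edge, evidence) for keep in filters)}


# Toy gold-standard network of (TF, target) pairs with made-up evidence values.
gold = {("crp", "lacZ"), ("fnr", "narG"), ("lexA", "recA")}
evidence = {
    "expression": {"crp": 8.2, "fnr": 0.1, "lexA": 5.0},
    "inaccessible_sites": {("lexA", "recA")},
}
network = condition_specific_network(gold, evidence,
                                     [tf_is_expressed, site_is_accessible])
print(sorted(network))  # [('crp', 'lacZ')]
```

Writing each type of evidence as its own predicate keeps the filters independent, so different combinations can be applied to the same gold standard.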

4:50 PM-4:55 PM
SCS Flash-talk: TraitCapture: An integrated plant genome-phenome-environment discovery toolset
Room: Delhi (Ground Floor)
  • Kevin Murray, ANU, Australia
  • Tim Brown

Agricultural crops require ongoing genomic improvement to satisfy demand and avoid ecological and humanitarian disaster. Advanced breeding technologies are required to provide an adequate rate of improvement. These technologies require statistical association of genotype, phenotype, and environment. While genomic sequencing technology has rapidly accelerated over recent years, this has often been in isolation from phenotypes and environments.

The TraitCapture project develops open-source high-throughput plant phenotyping methods that interface with high-performance genomic sequencing pipelines to associate genotype and phenotype under dynamic environments. Specific sub-projects include high-performance time-series image analysis software, computer vision software for plant phenotype extraction, software and hardware for gigapixel-resolution composite image time-lapse, and software and hardware for enhanced control of plant growth chamber conditions.

TraitCapture tools allow statistical linkage of genotype, phenotype, and environment in crop and model species, as well as keystone tree species of threatened ecosystems.

4:55 PM-5:00 PM
SCS Flash-talk: Comparative genomic analysis in Leishmania spp. reveals new insights into their metabolism and genome regulation
Room: Delhi (Ground Floor)
  • J. Eduardo Martinez, Universidad Mayor, Chile
  • Cristian Molina, Lircaytech, Chile
  • Vinicius Maracaja-Coutinho, Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas Universidad de Chile, Chile
  • Alberto J.M. Martin, Universidad Mayor, Chile

Leishmaniasis is a vector-borne disease caused by different Leishmania species. The one million new cases reported annually have a devastating impact on global health. Notably, our ability to fight the many different strains of these parasites, and the different disease forms they cause, is limited by our understanding of their underlying genetics. Here we investigate the structural and functional differences between the genomes of 26 strains of Leishmania spp. by performing a new structural and functional gene annotation, which combines different methods with an in-house pipeline based on sequence homology, followed by a pangenomic analysis. In particular, we focus on identifying gene regulatory elements and their relationship with the different disease forms. Our results show an open pangenome composed of 16,046 genes. With this combination of methods we improved the annotation of regulatory proteins for these parasites and found that transcription factors differed among strains depending on the associated disease form. These results could improve our understanding of how differences in the pathogenesis of particular Leishmania strains are linked to differences in their genetics.
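
As a brief illustration of the pangenome terms used above: the pangenome is the union of the gene families found in all strains, the core genome is their intersection, and a pangenome is called open when it keeps growing as further genomes are added. The sketch below works on a toy presence/absence table. The strain labels, gene family identifiers, and function names are hypothetical and are not taken from the study.

```python
def pangenome_summary(strain_genes):
    """Count pan, core and accessory gene families from per-strain gene sets."""
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)  # requires at least one strain
    return {"pan": len(pan), "core": len(core), "accessory": len(pan - core)}


def accumulation_curve(strain_genes, order):
    """Pangenome size after adding each strain in the given order."""
    seen, curve = set(), []
    for strain in order:
        seen |= strain_genes[strain]
        curve.append(len(seen))
    return curve


# Toy data: three hypothetical strains and their gene families.
strains = {
    "strain_1": {"g1", "g2", "g3"},
    "strain_2": {"g1", "g2", "g4"},
    "strain_3": {"g1", "g5", "g6"},
}
print(pangenome_summary(strains))
print(accumulation_curve(strains, ["strain_1", "strain_2", "strain_3"]))
```

A curve that is still rising after the last strain, as in this toy case, is the pattern usually described as an open pangenome.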

5:00 PM-5:05 PM
SCS Flash-talk: Establishment of bladder cancer cell line persistently infected with Newcastle Disease Virus
Room: Delhi (Ground Floor)
  • Umar Ahmad
  • Chan Soon Choy
  • Syahril Abdullah
  • Khatijah Yusoff
  • Abhimanyu Veerakumarasivam

Background: Newcastle disease virus (NDV) is an avian oncolytic virus that selectively replicates in and kills many different types of cancer cells and is being developed for cancer virotherapy. Our aim was to establish persistent infection in TCCSUP bladder cancer cells and to characterise it. Description: Cytopathic effects (CPE) were observed, with a progressive acute lytic crisis, when TCCSUP cells were infected with NDV AF2240 at low MOI. A small percentage of cells survived the viral infection after 5 days. This subset of cells was maintained until a subconfluent monolayer formed, and the cells were termed persistently infected cells (TCCSUPPi). A visual difference between the uninfected TCCSUP and TCCSUPPi monolayers was observed at 24 hrs. TCCSUPPi did not show the characteristic elongation and spreading observed in mock-infected TCCSUP. PCR indicated the presence of the viral genome in the TCCSUPPi cells. Flow cytometric analysis demonstrated that 85% of the TCCSUPPi cells at different passages expressed GFP at 24 hrs when infected with recombinant NDV AF2240 (rNDV-GFP). NDV particles were observed in some endosomes of the TCCSUPPi cells by transmission electron microscopy (TEM). Furthermore, TCCSUPPi cells developed an anti-viral state by producing a low level of interferon (IFN) beta; this was reduced by adding anti-IFN beta antibody to the cells, making them more sensitive to NDV re-infection. Conclusions: This study presents the successful development of NDV persistently infected TCCSUP bladder cancer cells (TCCSUPPi) in vitro. The infection of TCCSUP with NDV AF2240 produced a lytic crisis in which the majority of the cells were killed by day 5; only a small number of cells survived the infection and later developed into persistently infected (PI) cells. RT-PCR and TEM analyses revealed that the TCCSUP cells are persistently infected with NDV.
Whole transcriptome (RNA-seq) experiments are underway to elucidate the mechanism involved in the development of persistent infection in TCCSUP cells and to identify possible molecular networks associated with it. Findings from this study will help facilitate the use of NDV for the treatment of bladder cancer in the clinic.

5:05 PM-5:10 PM
SCS Flash-talk: Pyrimethamine compromised Plasmodium falciparum mutant dihydrofolate reductase (DHFR) proteins reveal residue network differences underlying drug resistance in the parasite
Room: Delhi (Ground Floor)
  • Arnold Amusengeri, Rhodes University, South Africa
  • Rolland Bantar Tata, Rhodes University, South Africa
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Malaria remains a public health challenge with high global prevalence and death rates. Among the five Plasmodium species causing human malaria, Plasmodium falciparum is responsible for the most devastating form of the disease. Crucial to Plasmodium parasite replication is the enzyme dihydrofolate reductase (DHFR). DHFR is involved in folate metabolism, which is responsible for generating the parasite DNA base deoxythymidine monophosphate (dTMP). DHFR has long been validated as an antimalarial drug target. However, widespread resistance of DHFR to known therapeutic agents has been reported, due to drug-induced active site mutations. In this work, we aim to understand the effects of different combinations of four point mutations within the active site of Plasmodium falciparum DHFR that are responsible for resistance to the approved antimalarial drug pyrimethamine.
Using homology modelling, missing residues were added to the existing Plasmodium falciparum DHFR crystal structure. Unavailable mutant structures were modeled using the wild type structure as template. Molecular docking was then used to dock pyrimethamine into the wild type and the mutated S108N, N51I+S108N, C59R+S108N, N51I+C59R+S108N, C59R+S108N+I164L and N51I+C59R+S108N+I164L DHFR homology models. Subsequently, all-atom molecular dynamics (MD) simulations and binding free energy computations were performed. We found that the mutations marginally influence protein-ligand stability while significantly weakening pyrimethamine's binding affinity. Next, dynamic residue network (DRN) analysis was used to determine the impact of the mutations on the communication dispositions of DHFR residues. Relative to the wild type, specific mutated models demonstrated non-native connectivity patterns. Furthermore, the presence of pyrimethamine resulted in unique network changes in the pyrimethamine-susceptible wild type compared to the mutants. Such unique differences are suggestive of compromised information flow in the pyrimethamine-bound wild type and could be further explored in designing novel antimalarial drugs.
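
The residue network idea can be sketched in a few lines: residues become nodes, two residues are connected when their representative atoms lie within a distance cutoff, and a per-residue metric such as the average shortest path length is compared between wild type and mutant. This is a simplified illustration and not the analysis pipeline used in the study. The 6.7 Å cutoff, the function names, and the toy coordinates are assumptions, and a real DRN analysis would average such metrics over the frames of an MD trajectory.

```python
from collections import deque
from math import dist


def contact_network(coords, cutoff=6.7):
    """Connect residues whose representative atoms are within cutoff (angstroms)."""
    n = len(coords)
    return {i: [j for j in range(n)
                if j != i and dist(coords[i], coords[j]) <= cutoff]
            for i in range(n)}


def average_shortest_path(network):
    """Mean number of hops from each residue to every residue it can reach."""
    result = {}
    for source in network:
        hops = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search over the unweighted graph
            node = queue.popleft()
            for neighbour in network[node]:
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    queue.append(neighbour)
        others = [h for node, h in hops.items() if node != source]
        result[source] = sum(others) / len(others) if others else float("inf")
    return result


# Toy coordinates: four residues on a line, 5 angstroms apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
print(average_shortest_path(contact_network(coords)))
```

Residues whose average path length shifts between the wild type and a mutant network would be candidates for the altered communication patterns described above.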