Posters - Schedules
Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
C-091: Comprehensive review of tumoral sample simulators: building a realistic gold standard for somatic variant calling validation
Track: General Computational Biology
  • Francesca Longhin, University of Padova, Italy
  • Enidia Hazizaj, University of Padova, Italy
  • Giacomo Baruzzo, University of Padova, Italy
  • Barbara Di Camillo, University of Padova, Italy


Bioinformatic pipelines for variant calling have recently undergone dramatic improvements given the decreasing costs of next-generation sequencing experiments. However, variant discovery in tumoral samples is hindered by the great variety of cancer types, high tumoral heterogeneity, and unpredictability of sequencing errors. A fully characterized validation dataset for somatic variant calling that takes all these aspects into account is still missing. In this work, we performed an extensive review of nine somatic sample simulators (Synggen, SVEngine, BAMSurgeon, VarSim, Xome-Blender, tHapMix, Pysim-sv, SCNVSim, and HeteroGenesis), evaluating their ability to control variant features (such as type, number, position, length, content and zygosity) and tumoral features (such as clonality), to learn variant and error profiles from real data, and to retrieve all files needed for variant calling validation. No individual simulator was able to provide the user full control over both variant features and tumoral features, together with adequate modeling of in-silico sequencing. However, Synggen, with its ad-hoc built-in read simulator that combines three different error models, perfectly emulates the variability of technical noise in real sequencing data, and SVEngine provides the most complete framework for simulating biological variability of tumoral samples, allowing the user to define all variant features for each individual variant.

C-092: GraphST: A novel graph self-supervised contrastive learning method for spatially informed clustering, integration, and deconvolution of spatial transcriptomics
Track: General Computational Biology
  • Jinmiao Chen, Singapore Immunology Network, A*STAR, Singapore, Singapore


Advances in spatial transcriptomics technologies have enabled the gene expression profiling of tissues while retaining their spatial context. Effective exploitation of this data combination requires spatially informed analysis tools to perform three key tasks: spatial clustering, multi-sample integration, and cell type deconvolution. Here, we present GraphST, a novel graph self-supervised contrastive learning method that incorporates spatial location information and gene expression profiles to accomplish all three tasks in a streamlined process while outperforming existing methods in each task. GraphST combines graph neural networks with self-supervised contrastive learning to learn informative and discriminative spot representations by minimizing the embedding distance between spatially adjacent spots and vice versa. With GraphST, we achieved 10% higher clustering accuracy on multiple datasets than competing methods, and better delineated the fine-grained structures in tissues such as the brain and embryo. Moreover, GraphST is the only method that can jointly analyze multiple tissue slices in both vertical and horizontal integration while correcting for batch effects. Lastly, compared to other methods, GraphST’s cell type deconvolution achieved higher accuracy on simulated data. On experimentally acquired data, it better captured spatial niches such as lymph node germinal centers and exhausted tumor-infiltrating T cells in breast tumor tissue.

C-093: Pharmaceutical patent landscaping: A novel approach to understand patents from the drug discovery perspective
Track: General Computational Biology
  • Yojana Gadiya, University of Bonn, Germany
  • Philip Gribbon, Fraunhofer ITMP, Germany
  • Martin Hofmann-Apitius, Fraunhofer SCAI, Germany
  • Andrea Zaliani, Fraunhofer ITMP, Germany


Patents play a crucial role in the drug discovery process by providing legal protection for discoveries and incentivising investments in research and development. By identifying patterns within patent data resources, researchers can gain insight into the market trends and priorities of the pharmaceutical industry, as well as provide additional perspectives on more fundamental aspects such as the emergence of potential new drug targets. In this paper, we used the PEMT to integrate and analyse patent literature for rare diseases (RD) and Alzheimer's disease (AD). This is followed by a systematic review of the underlying patent landscape to decipher trends and applications in patents. We start by discussing organisations involved in R&D in AD and RD. This allows us to gain an understanding of the importance of AD and RD from specific organisational perspectives. Next, we analysed the historical focus of patents for therapeutic targets and correlated them with market scenarios, allowing the identification of prominent targets for a disease. Lastly, we identified repurposed drugs within the two diseases with the help of patents. The study demonstrates the expanded applicability of patent documents from legal to drug discovery, design, and research, thus providing a valuable resource for future drug discovery efforts.

C-094: Detecting the feature of human branch point sequence (BPS) using an integrated prediction model
Track: General Computational Biology
  • Dianjing Guo, The Chinese University of Hong Kong, Hong Kong
  • Yong Cao, Harbin Institute of Technology Shenzhen Graduate School, China


In higher eukaryotes, pre-mRNA splicing is a set of reactions catalyzed by the spliceosome, a complex consisting of small nuclear ribonucleoproteins (U1, U2, U5 and U6 snRNPs). The importance of splicing is illustrated by the fact that 50% of the reported human genetic diseases arise from disruption of splicing by mutations in the splice sites or in the cis-acting splicing regulatory sites. In yeast, it is easy to identify the BPS because it is a nearly invariant UACUAAC sequence with the branch point adenosine (BPA) being the sixth nucleotide, which is exactly complementary to GUAGUA in U2 snRNP. However, BPS characterization in mammals has been a far more complicated task since the BPS is highly variable. In this paper, we propose a novel computational framework integrating candidate BPS and polypyrimidine tract (PPT) information for BPS prediction. A novel scoring system that integrates the BPS and PPT sequence scores was developed to predict the BPS. We demonstrate that our method outperforms previously published methods. Compared to the SVM method, this new method can be easily applied to other mammals and can predict the BPS without the “TNA” structure.
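
The yeast case described above is simple enough to sketch in a few lines. The following Python example is only an illustration of locating the UACUAAC consensus and reporting the branch point adenosine at its sixth position; it is not the integrated BPS/PPT scoring model of the poster, and the function name and toy intron are hypothetical.

```python
import re

# Yeast branch point consensus as given in the abstract (RNA alphabet).
YEAST_BPS = "UACUAAC"
BPA_OFFSET = 5  # sixth nucleotide of the motif, expressed as a 0-based offset


def find_yeast_branch_points(intron):
    """Return 0-based (motif_start, bpa_index) pairs for every consensus hit."""
    seq = intron.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    # The lookahead makes finditer report every start position of the motif.
    for match in re.finditer("(?=" + YEAST_BPS + ")", seq):
        start = match.start()
        hits.append((start, start + BPA_OFFSET))
    return hits


# Made-up intron fragment, for illustration only.
toy_intron = "GUAUGUUACUAACAAUUUAG"
print(find_yeast_branch_points(toy_intron))
```

An exact-match search of this kind is what becomes insufficient in mammals, where the BPS is highly variable and a scoring approach is needed instead.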

C-095: Constrained disorder principle-based second-generation algorithms implement quantified variability signatures to improve the function of complex systems
Track: General Computational Biology
  • Yaron Ilan, Hebrew University of Jerusalem, Israel
  • Hillel Lehman, Hebrew University of Jerusalem, Israel
  • Noa Hurvitz, Hebrew University of Jerusalem, Israel
  • Tal Sigawi, Hebrew University of Jerusalem, Israel


Motivation: Improving the efficacy and overcoming the malfunctions of systems are significant challenges. Variability characterizes all levels of complex biological systems. We reviewed the relevant publications and described a method for improving the systems' function.
Results: The constrained disorder principle (CDP) defines the function of living systems based on their degree of variability. Per the CDP, the boundaries of a system define its function and efficiency. The present paper aims to describe the role of variability in biological systems and the generation of CDP-based second-generation artificial intelligence (AI) algorithms designed to improve effectiveness and correct malfunctions of biological organisms by focusing on implementing personalized variability signatures. The paper describes some of the challenges of first-generation AI systems, focusing on the three-step process of establishing the second-generation platforms, comprising: the use of a pseudorandom number generator in an open-loop system, implementing variability based on feedback in a closed-loop system, and quantifying variability signatures in a personalized way for improving the algorithms' output. Examples of its use in humans are provided. The CDP provides a platform for improving disturbed systems' functions using second-generation AI systems.

C-096: Pancancer transcriptomic profiling identifies key markers of PANoptosis as therapeutic targets
Track: General Computational Biology
  • Raghvendra Mall, Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, United Arab Emirates, United Arab Emirates
  • Ratnakar Bynigeri, St. Jude Children's Research Hospital, United States
  • Rajendra Karki, St. Jude Children's Research Hospital, United States
  • Rk Subbarao Mallireddi, St. Jude Children's Research Hospital, United States
  • Bhesh Raj Sharma, St. Jude Children's Research Hospital, United States
  • Thirumala-Devi Kanneganti, St. Jude Children's Research Hospital, United States


Resistance to programmed cell death (PCD) is a hallmark of cancer. While some PCD components are prognostic in cancer, the roles of many molecules are masked by redundancies and crosstalk between PCD pathways, impeding the development of targeted therapeutics. Recent studies characterizing these redundancies have identified PANoptosis, an innate immune-mediated inflammatory PCD pathway that integrates components from other PCD pathways. Here, we designed a systematic computational framework to determine the pancancer clinical significance of PANoptosis and identify targetable biomarkers. We found that high expression of PANoptosis genes was detrimental in low-grade gliomas (LGG) and kidney renal cell carcinoma (KIRC). ZBP1, ADAR, CASP2, CASP3, CASP4, CASP8 and GSDMD expression consistently had negative effects on prognosis in LGG across multiple survival models, while AIM2, CASP3, CASP4 and TNFRSF10 expression had negative effects for KIRC. Conversely, high expression of PANoptosis genes was beneficial in skin cutaneous melanoma (SKCM), with ZBP1, NLRP1, CASP8 and GSDMD expression consistently having positive prognostic effects. As a therapeutic proof of concept, we treated melanoma cells with a combination therapy that activates ZBP1 and showed that this treatment induced PANoptosis. Overall, through our systematic framework, we identified and validated key innate immune biomarkers that can be targeted to improve patient outcomes in cancers.

C-097: iCSDB: an integrated database of CRISPR screens
Track: General Computational Biology
  • Minseo Kim, Korea Bioinformation Center, South Korea
  • Insu Jang, Korea Bioinformation Center, South Korea
  • Byungwook Lee, Korea Bioinformation Center, South Korea


High-throughput screening based on CRISPR-Cas9 libraries has become an attractive and powerful technique to identify target genes for functional studies. However, accessibility of public data is limited due to the lack of user-friendly utilities and up-to-date resources covering experiments from third parties. Here, we describe iCSDB, an integrated database of CRISPR screening experiments using human cell lines. We compiled two major sources of CRISPR-Cas9 screening: the DepMap portal and BioGRID ORCS. The DepMap portal is itself an integrated database that includes three large-scale CRISPR screening projects. We additionally aggregated CRISPR screens from BioGRID ORCS, which is a collection of screening results from PubMed articles. Currently, iCSDB contains 1375 genome-wide screens across 976 human cell lines, covering 28 tissues and 70 cancer types. Importantly, the batch effects from different CRISPR libraries were removed and the screening scores were converted into a single metric to estimate the knockout efficiency. Clinical and molecular information was also integrated to help users readily select cell lines of interest. Furthermore, we have implemented various interactive tools and viewers to help users choose, examine and compare the screen results at both the gene and guide RNA levels. iCSDB is available at https://www.kobic.re.kr/icsdb/.

C-098: Y chromosome painting assays with OligoY pipeline
Track: General Computational Biology
  • Isabela Almeida, QIMR Berghofer, Australia
  • Henry Bonilla, IB/USP, Brazil
  • Mara Pinheiro, IB/USP, Brazil
  • Antonio B. Carvalho, UFRJ, Brazil
  • Maria D. Vibranovski, IB/USP, Brazil


The heterochromatic and highly repetitive state of the Y chromosome leads to multiple difficulties when assembling its scaffolds and contigs, resulting in a lack of final assembled sequences for it. In Drosophila melanogaster, the Y chromosome has an estimated size of 41 Mb of repeat-rich sequences, but only 10% of them are assembled in the most recent genome release. In contrast, the protocol for designing probes used in full chromosome fluorescent labelling experiments does not include repetitive sequences to avoid off-target hybridization, resulting in <1500 oligopaint probes for this Y, a value at least 10x smaller when compared to the other chromosomes of the same species. Here we present OligoY, a pipeline that allows the design of oligopaint probes for the Y chromosome of any species. While using open-source tools in Bioinformatics, OligoY guarantees the user the autonomy to choose parameters and effectively uses repetitive sequences to design probes that are exclusive to the target chromosome, thus maximising overall efficiency of cytogenetic experiments. After extensive tests and validations in silico and in situ, we verified that the application of the developed pipeline, OligoY, allows staining the Y chromosome without generating off-target signal, despite using repetitive sequences for oligopaint probe design.

C-099: Towards Improved Data Sharing in Food Research: Developing and Standardizing a Data Model
Track: General Computational Biology
  • Hyemi Shin, Korea Food Research Institute, South Korea
  • Jungmin Park, Korea Food Research Institute, South Korea
  • Gitaek An, Korea Food Research Institute, South Korea
  • Hyein Seo, Korea Food Research Institute, South Korea


Although the value and importance of data is increasingly recognized, data generated in the research process is difficult to share and collaborate on due to the different ways researchers collect and manage data and the lack of standardized data models. In particular, the food field produces a wide variety of research data for each field including processing, safety, and functionality. To solve this, we conducted a study to analyze and standardize food research data formats. In this study, we developed a data management plan format and collected and integrated metadata from 20 research projects to identify data types. We categorized the collected data into sample data and result data, and defined data model names for each data type. The essential elements of the selected data models were identified through interviews with food research data experts. We selected 12 and 15 data model names to group sample data and result data, respectively.
Developing a standardized data model can increase the accuracy and consistency of data and facilitate data sharing, reuse, and integration across different platforms and systems. This facilitates preliminary research utilizing existing data and reduces duplicate production of data, ultimately reducing the time and cost of food research.

C-100: Inferring Sex-Specific Genetic Signal in Hypertension by Gene-Based Association Methods on UK-Biobank Data
Track: General Computational Biology
  • Roei Zucker, Hebrew University of Jerusalem, Israel
  • Michael Kovalerchik, Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel


Hypertension is a polygenic disease that affects over 1.2 billion adults aged 30–79 worldwide. It is a major risk factor for renal, cerebrovascular, and cardiovascular diseases. The heritability of hypertension is estimated to be high; nevertheless, the understanding of the underlying mechanisms remains scarce and incomplete. Using a novel method called PWAS (proteome-wide association study) on participants from the UK Biobank (UKB), we discovered 70 statistically significant associated genes, most of which failed to reach significance by the routine GWAS, which is variant-based. Our findings were validated against independent cohorts, including the Finnish Biobank, and confirmed a substantial fraction of the PWAS hypertension-associated genes. The gene-based analyses that were performed on both sexes separately revealed a sex-dependent genetic signal with a stronger component associated with females. Analysis of the measurements for systolic and diastolic blood pressure for the entire UKB cohort confirmed the dominant genetic contribution for females. In this study, we will demonstrate the advantage of applying gene-based association methods over the classical GWAS in interpretability and in identifying sex-specific genetic signals as a lead towards mechanistic understanding of hypertension and related phenotypes.

C-101: Proteome-wide association study uncovers sex-dependent genetic effects in hypertension
Track: General Computational Biology
  • Michal Linial, The Hebrew University of Jerusalem, Israel
  • Roei Zucker, The Hebrew University of Jerusalem, Israel
  • Michael Kovalerchik, The Hebrew University of Jerusalem, Israel


Hypertension is a polygenic disease that affects over 1.2 billion adults worldwide. It is a major risk factor for renal, cerebrovascular, and cardiovascular diseases. The understanding of the underlying mechanisms remains scarce and incomplete. This study covered participants of European ancestry from the UK Biobank, with 74,090 cases diagnosed with essential (primary) hypertension and 200,734 controls. We compared the findings from large-scale GWAS to the gene-based method of proteome-wide association studies (PWAS). PWAS is based on a machine-learning-trained model to assess the impact of any variant on protein functionality. Applying PWAS in a case-control setting, 70 statistically significant associated genes were identified, most of which failed to reach significance in variant-based GWAS. A third of the PWAS-associated genes were replicated in independent cohorts. Gene-based analyses performed separately on females and males revealed sex-dependent genetics with a stronger component associated with females. Analysis of systolic and diastolic blood pressure measurements confirmed a strong female-specific genetic effect. We demonstrated that gene-based approaches provide insight into the biology of hypertension, with top-ranked significant genes that are involved in cellular immunity. We conclude that studying hypertension and blood pressure via gene-based association methods improves interpretability and exposes sex-dependent genetic effects, which enhances clinical utility.

C-102: Single-Cell Deconvolution for Spatial Transcriptomics in Cardiorenal Disease
Track: General Computational Biology
  • Alban Laus Obel Slabowska, Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, Denmark
  • Charles Pyke, Novo Nordisk A/S, Måløv, Denmark, Denmark
  • Leon Eyrich Jessen, Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Denmark
  • Simon Baumgart, Novo Nordisk A/S, Måløv, Denmark, Denmark
  • Vivek Das, Novo Nordisk A/S, Måløv, Denmark, Denmark


High-throughput RNA-sequencing technologies that provide spatial resolution of transcripts, popularly known as spatial transcriptomics, are on the rise. This technology seems highly promising and has an untapped potential for expression-driven discovery in development and disease. Nonetheless, it also faces the central challenge of mixed cell type signals due to limitations in resolution. This is apparent in sequencing-based 10X Visium, where slides have larger spots of 55 μm. This mixed transcriptional signal can pose inferential problems; however, it can theoretically be deconvoluted into underlying cell types. To this end, we developed a systematic deconvolution framework and performed benchmarking in previously unvalidated healthy and disease samples from human coronary arterial and kidney disease. We used Cell2location, RCTD and spatialDWLS, which have previously been shown to perform well in mouse brain and simulated data (1). We show that all three methods are capable of deconvoluting verifiable cell types when benchmarked against expert-provided ground truth based on accuracy scores (0.7-0.73). Kidney podocyte cells and major populations of macrophages, smooth muscle cells and fibroblasts in arteries are all deconvoluted with a high level of agreement. Bayesian Cell2location is more computationally demanding; however, it provides quality solutions when less reference data is available.
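
The abstract does not state how the accuracy scores were computed. One plausible reading, sketched below in Python under that assumption, is the fraction of expert-annotated spots whose dominant deconvoluted cell type matches the expert label. The function names, spot identifiers, and proportions are all hypothetical toy values, not data from the study.

```python
def dominant_cell_type(proportions):
    """Return the cell type with the largest estimated proportion in one spot."""
    return max(proportions, key=proportions.get)


def spot_accuracy(predicted, expert_labels):
    """Fraction of annotated spots whose dominant predicted type matches the label."""
    shared = [spot for spot in expert_labels if spot in predicted]
    if not shared:
        raise ValueError("no spots in common between predictions and annotations")
    correct = sum(
        dominant_cell_type(predicted[spot]) == expert_labels[spot] for spot in shared
    )
    return correct / len(shared)


# Toy deconvolution output: per-spot cell type proportions (made-up numbers).
predicted = {
    "spot_1": {"podocyte": 0.7, "fibroblast": 0.3},
    "spot_2": {"macrophage": 0.2, "smooth_muscle": 0.8},
    "spot_3": {"fibroblast": 0.6, "macrophage": 0.4},
    "spot_4": {"macrophage": 0.55, "fibroblast": 0.45},
}
# Toy expert annotation: one label per spot (also made up).
expert_labels = {
    "spot_1": "podocyte",
    "spot_2": "smooth_muscle",
    "spot_3": "macrophage",
    "spot_4": "macrophage",
}

print(spot_accuracy(predicted, expert_labels))
```

Collapsing each spot to a single dominant type discards the mixture information, so a proportion-aware metric may be preferable when the ground truth itself is given as proportions.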

C-103: hECA: Human Ensemble Cell Atlas as a Virtual Body for “In Data” Cellular Experiments
Track: General Computational Biology
  • Sijie Chen, Tsinghua University, China
  • Lei Wei, Tsinghua University, China
  • Xuegong Zhang, Tsinghua University, China


Profiling the molecular features of all cells with their anatomical and functional attributes is essential for understanding the human body in health and disease. Scientists have been enthusiastic about building such atlases of human cells using single-cell omics technologies. The community has conducted more and more single-cell studies with the rapid development and popularization of single-cell RNA-sequencing technologies. A tremendous amount of single-cell data has been accumulating in the public domain. This suggests the possibility of building cell atlases by assembling such “shot-gun” data from scattered publications. Cell atlas assembly faces several major challenges compared with the shot-gun assembly of the human genome. We proposed a unified information framework for assembling atlases and built the first cell-centric human Ensemble Cell Atlas (hECA) assembled from scattered data. We developed the “in data” cell sorting scheme, which allows extracting cells from the hECA using logic formulas, treating it as a “virtual human body” to investigate scientific questions involving multiple organs and cell types. We also developed a multidimensional coordinate system, UniCoord, for different physical and biological attributes of cells by adopting a supervised variational autoencoder (VAE) neural network model, and trained it on hECA to make it represent the diversity of healthy human cells.
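
The idea of selecting cells with a logic formula can be illustrated with a small Python sketch. This is not the hECA interface: the record layout, helper names, thresholds, and expression values below are hypothetical and only show how a Boolean condition over organ and marker expression picks cells out of a pooled collection.

```python
# Toy pooled cell records (all values are made up for illustration).
cells = [
    {"id": "c1", "organ": "heart", "expr": {"CD3E": 5.2, "PTPRC": 3.1}},
    {"id": "c2", "organ": "lung", "expr": {"CD19": 4.0, "PTPRC": 2.5}},
    {"id": "c3", "organ": "kidney", "expr": {"CD3E": 4.4, "PTPRC": 2.9}},
    {"id": "c4", "organ": "lung", "expr": {"CD68": 6.0, "PTPRC": 1.8}},
    {"id": "c5", "organ": "heart", "expr": {"TNNT2": 9.1}},
]


def expresses(cell, gene, threshold=0.0):
    """True if the cell's expression of `gene` exceeds `threshold`."""
    return cell["expr"].get(gene, 0.0) > threshold


def sort_cells(cell_records, formula):
    """Keep the cells for which the logic formula (a predicate) is true."""
    return [cell for cell in cell_records if formula(cell)]


# Formula: (organ is heart OR lung) AND PTPRC above 1.0 AND NOT CD19 expressed.
selected = sort_cells(
    cells,
    lambda c: c["organ"] in {"heart", "lung"}
    and expresses(c, "PTPRC", 1.0)
    and not expresses(c, "CD19"),
)
print([cell["id"] for cell in selected])
```

Because the formula is an ordinary predicate, conditions spanning several organs and markers compose with the usual `and`, `or`, and `not` operators.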

C-104: Identifying COVID-19 Prone Candidate Biomarkers for Childhood Leukemia
Track: General Computational Biology
  • Judy Bai, Greenhills School, United States
  • Qing Li, University of Michigan, United States


During the pandemic, children were less susceptible to contracting COVID-19, and studies have shown that children diagnosed with leukemia often had a prior COVID-19 infection. Recent studies showed that MDA5 (encoded by IFIH1) is responsible for children’s increased immunity to COVID-19. Our goal is to test the hypothesis that IFIH1 and its regulating miRNAs are biomarkers linked to AML in children. We also wanted to identify candidate genes that protect children from viral infections and leukemia development. Because miRNAs are important regulators of gene expression, we investigated our project goal in the context of miRNA targeting mechanisms.
By checking TarBase for IFIH1 and then searching for genes regulated by its targeting miRNAs, we identified two significant miRNAs, hsa-196a-5p and hsa-196b-5p, and 51 of their target genes that have high expression (>500 TPM) reported in TCGA AML RNA-Seq samples. Protein-protein interaction analysis with the STRING database indicated that two genes, STAT3 and MAP3K1, directly interact with IFIH1. Our DAVID/KEGG pathway analysis results further revealed that the three candidate genes (IFIH1, STAT3, MAP3K1) were also involved in Hepatitis B (p-value < 0.0004). Our research results for the three genes in AML samples indicate that IFIH1 is likely a candidate biomarker for AML.
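
The filtering step described above (collect the targets of the selected miRNAs, then keep those above 500 TPM) can be sketched as follows. This is a minimal Python illustration, assuming expression is summarised as one TPM value per gene; the abstract does not say how TPM was aggregated across samples, and the miRNA names, gene names, and values here are placeholders rather than results from TarBase or TCGA.

```python
def highly_expressed_targets(mirna_targets, tpm_by_gene, min_tpm=500.0):
    """Pool the targets of all miRNAs and keep genes with TPM strictly above min_tpm."""
    candidates = set().union(*mirna_targets.values())
    return sorted(gene for gene in candidates if tpm_by_gene.get(gene, 0.0) > min_tpm)


# Placeholder miRNA-to-target mapping (not real TarBase content).
mirna_targets = {
    "miRNA_A": {"GENE1", "GENE2", "GENE3"},
    "miRNA_B": {"GENE2", "GENE4"},
}
# Placeholder per-gene expression summary in TPM (not real TCGA values).
tpm_by_gene = {"GENE1": 812.0, "GENE2": 120.5, "GENE3": 500.0, "GENE4": 1533.2}

print(highly_expressed_targets(mirna_targets, tpm_by_gene))
```

The comparison is strict, matching the ">500 TPM" wording, so a gene at exactly 500 TPM is excluded.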

C-105: Unlocking the Secrets of Celiac Disease: India's All-Inclusive Database
Track: General Computational Biology
  • Sebatina Louis, SASTRA, India
  • Ragothaman Yennamalli, SASTRA, India


The One-Stop Database is an initiative that aims to create a comprehensive resource for researchers and medical professionals interested in celiac disease. The database includes updated information on genes, protein sequences and structures, -omics data, SNP data and clinical trial information associated with celiac disease. Furthermore, the integration of the BLAST search feature will enable users to query their sequences for similarity with celiac-related proteins. By providing accurate and up-to-date information on celiac disease, the database will be a valuable resource for the scientific and medical communities, with the ultimate goal of advancing the discovery of novel therapeutics and improving our knowledge of the disease. By creating this resource, we aim to bridge the knowledge gap in celiac disease and ultimately improve patient diagnosis and treatment options. Additionally, we have developed SVM and LMST models using machine-learning techniques to classify celiac-inducing and non-inducing proteins. The resource can be accessed at: https://celiacindia.in/

C-106: 3DIV update for 2021: a comprehensive resource of 3D genome and 3D cancer genome
Track: General Computational Biology
  • Jinhyuk Choi, Korea Research Institute of Bioscience and Biotechnology, South Korea
  • Insu Jang, Korea Research Institute of Bioscience and Biotechnology, South Korea
  • Byungwook Lee, Korea Research Institute of Bioscience and Biotechnology, South Korea


Three-dimensional (3D) genome organization is tightly coupled with gene regulation in various biological processes and diseases. In cancer, various types of large-scale genomic rearrangements can disrupt the 3D genome, leading to oncogenic gene expression. However, unraveling the pathogenicity of the 3D cancer genome remains a challenge since closer examinations have been greatly limited due to the lack of appropriate tools specialized for disorganized higher-order chromatin structure. Here, we updated a 3D-genome Interaction Viewer and database named 3DIV by uniformly processing ∼230 billion raw Hi-C reads to expand our contents to the 3D cancer genome. The updates of 3DIV are listed as follows: (i) the collection of 401 samples including 220 cancer cell line/tumor Hi-C data, 153 normal cell line/tissue Hi-C data, and 28 promoter capture Hi-C data, (ii) the live interactive manipulation of the 3D cancer genome to simulate the impact of structural variations and (iii) the reconstruction of Hi-C contact maps by user-defined chromosome order to investigate the 3D genome of the complex genomic rearrangement. In summary, the updated 3DIV will be the most comprehensive resource to explore the gene regulatory effects of both the normal and cancer 3D genome. ‘3DIV’ is freely available at http://3div.kr.

C-107: The Proteomes that Feed the World
Track: General Computational Biology
  • Sophia Hein, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Bernhard Kuster, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Brigitte Poppenberger, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Ralph Hückelhoven, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Caroline Gutjahr, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Mathias Wilhelm, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Dimitri Frischmann, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Christina Ludwig, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Josch Pauling, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Viktoriya Avramova, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Chris-Carolin Schön, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Claus Schwechheimer, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Stephanie Wilhelm, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Corinna Dawid, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Gio Tsiklauri, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Qussai Abbas, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Jiuyue Pan, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Armin Soleymaniniya, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Cemil Can Saylan, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Mario Picciani, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Genc Haljiti, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Ezgi Aydin, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Veronica Ramirez, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Guido Giordano, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Paula Andrade Galan, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Lukas Würstl, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Sebastian Urzinger, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Andrea Piller, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Patrick Röhrl, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany
  • Sarah Brajkovic, Elite Network of Bavaria, School of Life Sciences, Technical University of Munich, Germany


Plants form the foundation of the nutrition that sustains life on Earth. To meet the increasing needs of the human population and tackle climate change, crops can provide protein-rich alternatives to animal-based protein. However, little is known about crop proteomes, which control every aspect of plant life. To address this gap, we launched the international doctoral program "The Proteomes that Feed the World" funded by the Elite Network of Bavaria. We aim to create a Crop Proteome Atlas by charting the proteomes of the 100 most vital crop plants for human nutrition. We established a robust protocol for analyzing plant tissues, and the resulting data will be publicly accessible. We also provide detailed information on our processing pipelines, along with an extensive update on ProteomicsDB. Our dataset serves as a valuable resource for developing tools in plant biology and provides new biological insights, including better genome annotations, cross-species analysis, homology inference, and protein function prediction. Our interdisciplinary team, comprising 16 PhD students and 12 principal investigators, supported by over 30 international partners, has optimized the project workflows. We seek partners to leverage the potential of this data. This Atlas is the first of its kind for many important crop plants.

C-108: Genetic Differentiation of Citrus Using DArTseq and Development of a Rapid Species HRM Identification Kit
Track: General Computational Biology
  • Karolinni Bianchi Britto, Federal University of Espirito Santo, Brazil
  • Francine Alves Nogueira de Almeida, Federal University of Espirito Santo, Brazil
  • Pablo Viana Oliveira, Federal University of Espirito Santo, Brazil
  • Marianna Abdalla Prata Guimarães, Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural, Brazil
  • Greiciane Paneto, Federal University of Espirito Santo, Brazil
  • José Aires Ventura, Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural, Brazil


Presentation Overview: Show

Citrus plants are a diverse group that belongs to the Rutaceae family. Among them, the genus Citrus is highly valued for its economic and nutritional importance. However, identifying the species and variety of seedlings acquired by growers can be challenging, as the leaves are similar. To address this issue, we propose using DArTseq technology to genetically analyze citrus samples and create a rapid species identification kit using the high-resolution melting (HRM) technique. For this, 94 citrus samples from the state of Espirito Santo, Brazil, were sent to the Service of Genetic Analysis for Agriculture (SAGA) in Mexico for analysis. The analysis yielded 64,442 SNP markers and 69,963 SilicoDArT markers. After data filtering, 9,073 SNPs and 3,496 SilicoDArT markers remained, with a polymorphic information content (PIC) of 0.24 and 0.28, respectively. The dendrogram generated for the nine citrus species showed eight clusters. SNPs are being selected using RStudio and Biopython for use in the HRM assay. Among the nine chromosomes, chromosome 2 carried the most SNPs, indicating the need for deeper analysis. We are continuing the analysis, and the expected results are promising.
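The PIC values quoted above summarize how informative a marker is. As a minimal sketch, assuming the classical definition PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j² (the abstract does not state which definition was used, and the function name `pic` is hypothetical), the value for a single marker can be computed from its allele frequencies:

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content of one marker.

    Assumes the classical definition (an assumption, not taken from the abstract):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term for uninformative matings between identical heterozygotes.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross


# A biallelic SNP with balanced alleles reaches the biallelic maximum.
print(pic([0.5, 0.5]))
# A SNP with a rarer minor allele is less informative.
print(pic([0.8, 0.2]))
```

Under this definition a biallelic marker cannot exceed 0.375, which gives a scale for reading the averages of 0.24 and 0.28 reported above.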

C-109: omnideconv: unified next-generation deconvolution guided by single-cell RNA sequencing
Track: General Computational Biology
  • Lorenzo Merotto, Universität Innsbruck, Faculty of Biology, Department of Molecular Biology, Digital Science Center (DiSC), Austria
  • Alexander Dietrich, Chair of Experimental Bioinformatics, Technical University of Munich (TUM), Freising, Germany, Germany
  • Konstantin Pelz, Chair of Experimental Bioinformatics, Technical University of Munich (TUM), Freising, Germany, Germany
  • Katharina Reinisch, Chair of Experimental Bioinformatics, Technical University of Munich (TUM), Freising, Germany, Germany
  • Constantin Zackl, Universität Innsbruck, Faculty of Biology, Department of Molecular Biology, Digital Science Center (DiSC), Austria
  • Federico Marini, Institute of Medical Biostatistics, Epidemiology and Informatics, Johannes Gutenberg University Mainz, Mainz, Germany, Germany
  • Gregor Sturm, Biocenter, Institute of Bioinformatics, Medical University of Innsbruck, 6020 Innsbruck, Austria, Austria
  • Markus List, Chair of Experimental Bioinformatics, Technical University of Munich (TUM), Freising, Germany, Germany
  • Francesca Finotello, Universität Innsbruck, Faculty of Biology, Department of Molecular Biology, Digital Science Center (DiSC), Austria


Presentation Overview: Show

Understanding the cellular composition of complex tissues can help in uncovering disease mechanisms, treatment effects, and biological processes. Cell-type deconvolution methods quantify cellular composition from bulk RNA sequencing data using cell-type-specific transcriptomic signatures. While first-generation deconvolution methods are based on predefined signatures, second-generation deconvolution methods can directly learn these signatures from single-cell RNA sequencing data for virtually any cell type. However, differences in programming language, inputs, semantics, and workflows of these methods complicate their unified execution, and validating them poses additional challenges.
To address these issues, the omnideconv ecosystem was developed. It includes two R packages, omnideconv and SimBu, and a web app, DeconvExplorer, that can facilitate the systematic benchmarking of second-generation methods under different experimental conditions. The packages allow for the invocation of R and Python-based second-generation methods with single functions, and the simulation of pseudo-bulk RNA-seq datasets under different scenarios, respectively. Finally, DeconvExplorer provides a user-friendly web interface to analyze deconvolution results and signatures.
This framework makes second-generation deconvolution methods more accessible and streamlined and can aid in effectively utilizing large single-cell atlases. The omnideconv ecosystem is a novel resource that helps to benchmark second-generation methods and validate context-specific cell-type signatures.
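To illustrate what simulating a pseudo-bulk RNA-seq dataset means, the sketch below sums the counts of single cells sampled according to chosen cell-type fractions. It is a simplified illustration in Python, not the SimBu or omnideconv API (both are R packages); the function name, argument names, and toy data are all hypothetical.

```python
import random


def simulate_pseudo_bulk(cells_by_type, fractions, n_cells, seed=0):
    """Build one pseudo-bulk sample by summing sampled single-cell profiles.

    cells_by_type: dict mapping cell type -> list of per-cell count vectors
    fractions:     dict mapping cell type -> desired fraction of the sample
    n_cells:       total number of cells to draw (with replacement)
    """
    rng = random.Random(seed)
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0] * n_genes
    sampled = {}
    for cell_type, fraction in fractions.items():
        k = round(fraction * n_cells)
        sampled[cell_type] = k
        for _ in range(k):
            cell = rng.choice(cells_by_type[cell_type])
            # Add this cell's counts gene by gene to the bulk profile.
            for gene, count in enumerate(cell):
                bulk[gene] += count
    return bulk, sampled


# Toy data: two genes, each expressed by only one cell type.
cells = {"T cell": [[10, 0], [12, 0]], "B cell": [[0, 5], [0, 7]]}
bulk, sampled = simulate_pseudo_bulk(cells, {"T cell": 0.75, "B cell": 0.25}, n_cells=8)
print(bulk, sampled)
```

Because the true fractions are known by construction, a deconvolution method's estimates on such a sample can be compared directly against them, which is the idea behind benchmarking under controlled scenarios.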

C-110: Redistribution of mutation rates across chromosomal domains in human cancer genomes
Track: General Computational Biology
  • Marina Salvadores, IRB Barcelona, Spain
  • Fran Supek, IRB Barcelona, Spain


Presentation Overview: Show

Somatic mutations in human cells have a heterogeneous genomic distribution, with increased burden in late-replicating, heterochromatic domains. This regional mutation density (RMD) varies between tissues, in association with tissue-specific replication timing (RT) or chromatin organization. We hypothesized that the RMD additionally varies between individual tumors independently of the tissue. Here, we identified three tissue-independent global RMD signatures that describe mutation risk redistribution across megabase-sized domains in >4000 tumors. First, we identified an RMD redistribution preferentially affecting facultative heterochromatin, Polycomb-marked domains, enriched in the B1 subcompartment and in malleable Hi-C domains. This RMD signature strongly reflects recurrent patterns of plasticity in DNA RT and heterochromatin domains linked with a higher expression of cell cycle genes. Consistently, occurrence of this mutation redistribution pattern is associated with altered cell cycle control via loss of activity of the RB1 gene. Second, another independent global RMD signature was associated with loss-of-function of the TP53 pathway, mainly affecting the redistribution of mutation rates within late-RT regions. Our study highlights that RMDs at the domain scale are variable across tumors in a manner independent of tissue-of-origin, but associated with loss-of-function in cell cycle genes, which may trigger the local remodeling of heterochromatin, spatial chromatin contacts or the RT program.

C-111: Single-plant omics: Profiling individual plants in a field to identify processes affecting yield
Track: General Computational Biology
  • Michael Van de Voorde, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Stijn Hawinkel, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Sam De Meyer, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Daniel Cruz, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Ewout Crombez, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Tom De Swaef, Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Belgium
  • Peter Lootens, Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Belgium
  • Hilde Nelissen, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium
  • Isabel Roldán-Ruiz, Department of Plant Biotechnology and Bioinformatics, Ghent University; Plant Sciences Unit, ILVO, Belgium
  • Steven Maere, Department of Plant Biotechnology and Bioinformatics, Ghent University; VIB Center for Plant Systems Biology, Belgium


Presentation Overview: Show

Translating knowledge from lab to field is not straightforward because field conditions are very different from lab conditions. We are therefore developing a new strategy to study the molecular wiring of plant traits directly in the field, based on profiling of individual field-grown plants (single-plant omics). During a recent field trial, we profiled the autumnal rosette leaf transcriptome and a range of phenotypes (before winter and at time of harvest) of 192 plants of the winter-type rapeseed variety Darmor, along with several environmental data layers at individual plant resolution, such as microbiomes and soil nutrient profiles. To analyze this spatial multi-omics dataset, we use unsupervised methods for integration of spatial omics data (MEFISTO) and supervised machine learning methods. The latter model plant phenotypes as a function of other data layers, such as autumnal gene expression, and identify features that potentially influence plant yield. Important features in our yield models include genes involved in vegetative to reproductive phase transition and floral transition, indicating that developmental processes in autumn influence final yield in summer. Conceptual similarity between single-plant and single-cell data allows us to apply methods from the single-cell field, such as trajectory inference, to our single-plant data to further unravel these developmental effects.

C-112: Characterization of recombinant adeno-associated virus gene therapy vectors and process- and product-related impurities with long-read sequencing
Track: General Computational Biology
  • Joseph Saelens, Pfizer, United States
  • Yu-Ting Chen, Pfizer, United States
  • Qinyu Sun, Pfizer, United States
  • Megan Leander, Pfizer, United States
  • Herbert Runnels, Pfizer, United States
  • Reiko Nakashima, Pfizer, United States


Presentation Overview: Show

Background: With the dramatic rise in clinical trials for recombinant adeno-associated virus (rAAV)-based gene therapies, there is increasing demand from regulatory agencies for more standardized and systematic approaches for nucleic acid characterization to mitigate vector toxicity. Single-molecule, real-time (SMRT) sequencing enables interrogation of rAAV genomes and packaged product- and process-related impurities at a single molecule level without fragmentation. This technology can thus address one of the remaining challenges in producing rAAV vectors, which is gaining an understanding of packaged impurities that may impact the efficacy and safety of rAAV vectors.

Method: We have developed a SMRT sequencing and computational workflow to characterize rAAV vectors and DNA impurities. Our approach recovers reads with low base calling accuracy and incorporates barcode scores to profile single-stranded and self-complementary rAAVs.

Results: This method identifies product- and process-related impurities, including truncated rAAV genomes, chimeras of rAAV genomes, and residual plasmid and host cell genomic DNA. In addition, we found that current recommendations restricting the analysis to High Fidelity (≥99% accuracy) reads with high-quality barcode scores (≥80) skew estimations of intact rAAV genomes.

Conclusion: Pairing long read sequencing with the computational tools we have developed offers novel insights into rAAV genome integrity and impurities.

C-113: A web-based system for comparative multi-omic analyses
Track: General Computational Biology
  • Suyeon Wy, Konkuk University, South Korea
  • Daehong Kwon, Konkuk University, South Korea
  • Nayoung Park, Konkuk University, South Korea
  • Minji Gu, Konkuk University, South Korea
  • Jiyeong Ahn, Konkuk University, South Korea
  • Mikang Sim, Konkuk University, South Korea
  • Jeongmin Oh, Konkuk University, South Korea
  • Junyoung Kim, Konkuk University, South Korea
  • Jaebum Kim, Konkuk University, South Korea


Presentation Overview: Show

The accumulation of high-quality genome assemblies has facilitated a more accurate comparison of genomes among multiple species. Furthermore, the availability of various omic data has further extended the scope of such comparative studies to identify the consequences of multi-omic signatures and underlying mechanisms. When performing such comparative multi-omic analyses, the genome-wide comparison of multi-omic data and the visualization of the results are critical. In addition, the visualization needs to be efficient enough to handle a large volume of multi-omic data. However, there is still a lack of applications that fulfill such requirements. In this study, we developed a web-based system for comparative multi-omic analyses. Using the data generation pipelines in our system, users can easily (i) compare multiple genomes, (ii) produce the profiles of omic data, and (iii) perform integrative analyses using the profiles in a web interface. Users can also browse the analysis results in a web interface, which also helps discover genomic regions harboring interesting multi-omic signatures easily. The web interface works very efficiently because of intelligent indexing and multi-level data sampling. Our system will contribute to making the use of multi-omic data easier and more effective.

C-114: FoldMason: Comparative protein structure analysis in the era of next generation structure predictions
Track: General Computational Biology
  • Cameron Gilchrist, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea


Presentation Overview: Show

Understanding the evolution of proteins and their interactions with other molecules is critically important. Recent advancements in prediction of 3D protein structure with AlphaFold2 have revolutionized structural biology, enabling the exploration of the protein universe at a depth previously impossible as structure is conserved beyond the twilight zone of amino acids. Structural alignment, where evolutionarily related residues of multiple structures are grouped together, is the core of structural comparative analysis. However, with hundreds of millions (soon to be billions) of structures available, our current set of comparative alignment tools cannot scale to this enormous volume of data. Here we propose FoldMason, an alignment method capable of aligning huge sets of monomeric proteins. FoldMason is a progressive alignment tool built on top of Foldseek, our tool for rapid searches of massive protein structure databases. Foldseek utilises the 3D-interactions (3Di) alphabet, a novel structural alphabet based on tertiary interactions between neighbouring residues within proteins, to discretize structures, making them amenable to fast sequence alignment algorithms. FoldMason leverages this to construct multiple structure alignments of large protein datasets using a progressive alignment approach. Preliminary results on reference datasets show that FoldMason is orders of magnitude faster than gold-standard tools while maintaining comparable accuracy.

C-115: Facilitating the study of peptides sequences using machine learning strategies and bioinformatics tools
Track: General Computational Biology
  • Gabriel Cabas, Departamento de Ingeniería en Computación, Universidad de Magallanes, Chile
  • Alvaro Olivera-Nappa, Centre for Biotechnology and Bioengineering, Department of Chemical Engineering and Biotechnology, University of Chile, Chile
  • David Medina, Departamento de Ingeniería en Computación, Universidad de Magallanes, Chile


Presentation Overview: Show

Peptides are relevant in several biotechnology applications. These molecules have different biological activities, such as therapeutic, signalling, antimicrobial, and antitumoral. In particular, peptides are attractive as therapeutic agents. New research has fostered an exponential increase in the number of these molecules in general and specialized databases. However, more user-friendly tools are needed so that researchers without bioinformatics or machine learning skills can study peptide sequences. In this work, we developed Pepti-tools, a user-friendly web application that allows peptide analysis using bioinformatics and machine learning methods. On the bioinformatics side, we incorporate methods for phylogenetic analysis of sequences through alignments against databases and multiple sequence alignments. In addition, functional prediction methods (Gene Ontology/Pfam), secondary structure prediction, and structure search are incorporated. Pepti-tools uses physicochemical properties, statistical property comparison, and analysis techniques to integrate sequence characterizers. On the machine learning side, Pepti-tools allows the construction of predictive models and pattern recognition, facilitating the exploration of algorithms, hyperparameters, and numerical representations. Finally, functional activity classification models, prediction of antiviral/HIV activity (IC50), solubility estimation, immunogenicity, and promiscuity probability have been enabled, making Pepti-tools a powerful and highly usable alternative for studying peptide sequences without relying on bioinformatics and machine learning skills.

C-116: Leveraging the Genetic Correlation between Traits Improves the Detection of Epistasis in Genome-wide Association Studies
Track: General Computational Biology
  • Julian Stamp, Brown University, United States
  • Alan Denadel, Brown University, United States
  • Daniel Weinreich, Brown University, United States
  • Lorin Crawford, Microsoft Research New England, United States


Presentation Overview: Show

In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multi-outcome generalization of a recently proposed epistatic detection method that seeks to detect marginal epistasis, i.e., the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of the correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized GWA studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and to approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics.

C-117: Multi-omics workflow for the identification of discriminant markers associated with Trypanosoma cruzi populations
Track: General Computational Biology
  • Hayat Hage, BIOASTER, France
  • Amy Hesketh, BIOASTER, France
  • May Taha, BIOASTER, France
  • Yannick Charretier, BIOASTER, France
  • Audric Cologne, BIOASTER, France
  • Fanny Escudie, Drugs for Neglected Diseases Initiative (DNDi), Switzerland
  • Francisco Olmo, Department of Infection Biology, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom
  • Adrien Saliou, BIOASTER, France
  • Ségolène Arnoux, BIOASTER, France
  • Martin Taylor, Department of Infection Biology, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom
  • John Kelly, Department of Infection Biology, London School of Hygiene and Tropical Medicine (LSHTM), United Kingdom
  • Gilles Courtemanche, BIOASTER, France
  • Eric Chatelain, Drugs for Neglected Diseases Initiative (DNDi), Switzerland
  • Joséphine Abi Ghanem, BIOASTER, France


Presentation Overview: Show

Trypanosoma cruzi (T. cruzi), a kinetoplastid protozoan parasite, is the etiologic agent of Chagas disease, which affects an estimated 8 million people worldwide, mainly in Latin America. Progress in developing improved treatments for Chagas disease is compromised by limitations in our knowledge of the mechanistic processes associated with the persistence of T. cruzi. During its life cycle, the parasite undergoes changes in morphology, metabolism, and gene expression as it passes from the epimastigote replicative stage in the insect midgut to the metacyclic trypomastigote form, which infects humans. Trypomastigotes progress into tissues, where they become amastigotes. We developed a robust method for isolating populations of amastigote parasites, followed by integrated proteome/transcriptome profiling, to identify discriminant markers and pathways associated with dormant (amastigote) versus replicating (epimastigote) parasites.

C-118: Intercellular extrachromosomal DNA copy number heterogeneity drives cancer cell state diversity
Track: General Computational Biology
  • Maja-Celine Stöber, Charité Berlin, Germany
  • Rocío Chamorro González, Charité Berlin, Germany
  • Anton G Henssen, Charité Berlin, Germany
  • Roland F Schwarz, University Hospital Cologne, Germany
  • Kerstin Haase, Charité Berlin, Germany


Presentation Overview: Show

Neuroblastoma is characterised by extensive inter- and intra-tumour genetic heterogeneity and varying clinical outcomes. One possible driver of this heterogeneity is extrachromosomal DNA (ecDNA), which segregates independently to the daughter cells during cell division and can lead to rapid amplification of oncogenes. While ecDNA-mediated oncogene amplification has been shown to be associated with poor prognosis in many cancer entities, the effects of ecDNA copy number heterogeneity on intermediate phenotypes are still poorly understood.
Here, we leverage DNA and RNA sequencing data from the same single cells in cell lines and neuroblastoma patients to investigate these effects. We utilise ecDNA amplicon structures to determine precise ecDNA copy numbers and reveal extensive intercellular ecDNA copy number heterogeneity. We further provide direct evidence for the effects of this heterogeneity on gene expression of cargo genes, including MYCN and its downstream targets, and the overall transcriptional state of neuroblastoma cells.
These results highlight the potential for rapid adaptability of cellular states within a tumour cell population mediated by ecDNA copy number, emphasising the need for ecDNA-specific treatment strategies to tackle tumour formation and adaptation.

C-119: Bridging the Gap between Variant Genotyping and Scientists for Accelerated Innovations in Agriculture
Track: General Computational Biology
  • Rajesh Perianayagam, Karyosoft Inc., United States
  • Vinothraj Sekar, Karyosoft Inc., India
  • Ramesh Dharmaraj, Karyosoft Inc., United States


Presentation Overview: Show

Molecular genetics correlates genotype and phenotype to discover important genomic regions. With the direct integration of NGS technologies into in-house workflows, genotyping has improved dramatically in terms of the number of genome-wide variants, such as SNP and indel markers, and thus the amount of genomic information. However, current variant calling methods are time-consuming and labor-intensive, and they lack a suitable mechanism for managing millions of SNPs and indels in-house, all of which delays innovation. At Karyosoft, we developed Variants, a cloud-based, user-friendly platform that circumvents these issues and reduces the time from the usual 28+ hours per sample at 30x coverage to 4 hours per 8 samples. Our Variants platform is 7 to 12 times faster, can reduce the cost 7- to 12-fold, and can save up to 168 days for 96 samples. Additionally, our cloud-based Variant Mining Studio helps to manage and mine millions of SNPs and indels in seconds. Our platforms are widely applicable in mutation discovery, direct genotyping, custom chip design, and amplicon sequencing. Above all, our user-friendly platform makes variant genotyping easy for scientists with any level of computational skills and empowers them to drive innovation faster.

C-120: Bacterial scRNA-seq reveals heterogenetic plasmid-encoded β-lactamase CTX-M-65 variants mediate a bet-hedging resistance in Klebsiella pneumoniae
Track: General Computational Biology
  • Yan Jiang, Zhejiang University, China
  • Rui Weng, Zhejiang University, China
  • Xueqing Wu, Zhejiang University, China
  • Jiawei Wang, European Bioinformatics Institute, United Kingdom
  • Yunsong Yu, Zhejiang University, China


Presentation Overview: Show

Bacterial pathogens use so-called bet-hedging to switch between different states, improving their chances of developing multiple resistance mechanisms in fluctuating antibiotic conditions, particularly in nosocomial environments. However, the underlying mechanism has not yet been fully investigated because of limitations in methods that can explore bacterial heterogeneity at the sub-population level. Here, we utilized microbial single-cell RNA sequencing (Msc-RNA-seq), a high-throughput bacterial scRNA-seq technique, to profile multi-drug resistant Klebsiella pneumoniae populations at the single-cell level. Msc-RNA-seq employs random primers for in situ reverse transcription and droplets for DNA barcoding, allowing for high sensitivity and throughput. In our experimental scenarios, Msc-RNA-seq detected a median of approximately 700 genes per cell. Downstream scRNA-seq analysis revealed heterogeneity in the K. pneumoniae population under sub-lethal ceftazidime/avibactam exposure and further confirmed our finding in clinical settings: the plasmid-encoded β-lactamase CTX-M-65 and its variant, CTX-M-249, showed bet-hedging resistance against ceftazidime/avibactam and cefotaxime simultaneously. In addition, the workflow and framework we developed, including the experimental protocol and computational pipeline, will facilitate future discoveries in the evolution of resistant bacteria and, more broadly, promote bacterial population studies at the single-cell level.

C-121: Machine learning guides identification of virus antigen specificity based on deep T cell phenotypic profiles
Track: General Computational Biology
  • Florian Schmidt, ImmunoScape Pte Ltd, Singapore
  • Hannah Fields, ImmunoScape Pte Ltd, United States
  • Yovita Purwanti, ImmunoScape Pte Ltd, Singapore
  • Ana Milojkovic, ImmunoScape Pte Ltd, Singapore
  • Syazwani Salim, ImmunoScape Pte Ltd, Singapore
  • Kan Xing Wu, ImmunoScape, Singapore
  • Yannick Simoni, ImmunoScape Pte Ltd, Singapore
  • Antonella Vitiello, ImmunoScape Pte Ltd, United States
  • Dan McLeod, ImmunoScape Pte Ltd, United States
  • Alessandra Nardin, ImmunoScape Pte Ltd, Singapore
  • Evan Newell, ImmunoScape Pte Ltd, Singapore
  • Katja Fink, ImmunoScape Pte Ltd, Singapore
  • Andreas Wilm, ImmunoScape Pte Ltd, Singapore
  • Michael Fehlings, ImmunoScape Pte Ltd, Singapore


Presentation Overview: Show

Following viral infection, the human immune system generates broad and dynamic CD8+ T cell responses to virus antigens. Characterizing such T cell responses makes it possible to understand infection history and its contribution to protective immunity.

We performed in-depth profiling of CD8+ T cells reactive to CMV, EBV and Influenza virus derived antigens in peripheral blood samples from 114 healthy donors and 55 cancer patients using high-dimensional mass cytometry with combinatorial barcoding of peptide-MHC-I multimers and subsequent single cell RNA sequencing/VDJ-CITE-Seq for phenotypes and TCR repertoire analysis of identified antigen-specificities.

We analysed the expression of up to 138 surface markers from more than 500 antigen-specific T cell responses across six different HLA alleles by applying multiple machine learning approaches. Our data revealed unique phenotypic signatures of T cells specific for antigens from different virus categories. Based on these signatures, we built an ML approach to predict virus specificity from bulk CD8+ T cells. We validated our prediction capabilities in silico using an independent sample cohort and in vitro by TCR expression in a Jurkat reporter assay. Our data suggest that machine learning can be used as a statistically rigorous and unbiased way to accurately predict antigen specificity from T cell phenotypes.

C-122: A systematic comparison of novel and existing differential analysis methods for CyTOF data
Track: General Computational Biology
  • Quirin Manz, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany, Germany
  • Judith Bernett, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany, Germany
  • Lis Arend, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany, Germany
  • Melissa Klug, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany, Germany
  • Olga Lazareva, Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany, Germany
  • Jan Baumbach, Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany, Germany
  • Dario Bongiovanni, Department of Internal Medicine I, University hospital rechts der Isar, Technical University of Munich, Munich, Germany, Germany
  • Markus List, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany, Germany


Presentation Overview: Show

Cytometry techniques are widely used to discover cellular characteristics at single-cell resolution. Many data analysis methods for cytometry data focus solely on identifying subpopulations via clustering and testing for differential cell abundance. For differential expression analysis of markers between conditions, only a few tools exist. These tools either reduce the data distribution to medians, discarding valuable information, or have underlying assumptions that may not hold for all expression patterns. Here, we systematically evaluated existing and novel approaches for differential expression analysis on real and simulated CyTOF data. We found that methods using median marker expressions compute fast and reliable results when the data are not strongly zero-inflated. Methods using all data detect changes in strongly zero-inflated markers, but partially suffer from overprediction or cannot handle big datasets. We present a new method, CyEMD, based on calculating the earth mover’s distance between expression distributions, which can handle strong zero-inflation without being too sensitive. Additionally, we developed CYANUS, a user-friendly R Shiny App allowing the user to analyze cytometry data with state-of-the-art tools, including well-performing methods from our comparison. A public web interface is available at https://exbio.wzw.tum.de/cyanus/.
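The contrast between median-based comparison and the earth mover's distance can be made concrete with a small sketch. The code below is not the CyEMD implementation; it is a plain one-dimensional earth mover's distance between two empirical samples, computed as the area between their cumulative distribution functions, with a hypothetical function name and toy values chosen to mimic a zero-inflated marker.

```python
from statistics import median


def emd_1d(xs, ys):
    """Earth mover's distance between two 1-D samples.

    Computed as the integral of |F_x(t) - F_y(t)| over t, where F_x and F_y
    are the empirical cumulative distribution functions of the two samples.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    i = j = 0
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Advance the counters so that i/len(xs) and j/len(ys) are the CDFs at `left`.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        distance += abs(i / len(xs) - j / len(ys)) * (right - left)
    return distance


# Toy zero-inflated marker: most cells show no signal in either condition.
control = [0, 0, 0, 0]
treated = [0, 0, 0, 4]
print(median(control), median(treated))  # both medians are 0
print(emd_1d(control, treated))          # the shift in the tail is still detected
```

In this toy case both medians are zero, so a median-based comparison sees no difference, while the distance is positive because a quarter of the mass moved by four units.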

C-123: Genome-Wide Analysis and Evolutionary Perspective of the Cytokinin Dehydrogenase Gene Family in Wheat (Triticum aestivum L.)
Track: General Computational Biology
  • Ankita Singh, Technology Innovation Institute, UAE, India
  • Priyanka Jain, Amity University, Noida, India, India
  • Sarika Jaiswal, Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India, India
  • Mir Asif Iquebal, Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, India
  • Sundeep Kumar, ICAR-National Bureau of Plant Genetic Resources, New Delhi, India, India
  • Dinesh Kumar, Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, India


Presentation Overview: Show

Cytokinin dehydrogenase (CKX) is a small gene family that regulates the level of cytokinin in plants. In Triticum aestivum, 11 CKX subfamilies were identified with similar gene structures, motifs, domains, cis-acting elements, and an average signal peptide length of 25 amino acids. We performed a genome-wide identification of CKX family members in the Triticum aestivum genome to determine their chromosomal location, gene structure, cis-elements, phylogeny, synteny, and tissue- and stage-specific expression, along with gene ontology. This study also describes the tissue- and stage-specific expression in detail and provides a resource for further analysis of CKX in the regulation of biotic and abiotic stress resistance, growth, and development in Triticum and other cereals, with the aim of higher production and proper management.

C-124: Streamlining Adaptive Immune Response Studies through IMGT®: from immunogenetics data to insights
Track: General Computational Biology
  • Nika Abdollahi, IMGT, the international ImMunoGeneTics Information System, France
  • Taciana Manso, IMGT, the international ImMunoGeneTics Information System, France
  • Gaoussou Sanou, IMGT, the international ImMunoGeneTics Information System, France
  • Merouane Elazami Elhassani, IMGT, the international ImMunoGeneTics Information System, France
  • Joumana Jabado-Michaloud, IMGT, the international ImMunoGeneTics Information System, France
  • Géraldine Folch, IMGT, the international ImMunoGeneTics Information System, France
  • Patrice Duroux, IMGT, the international ImMunoGeneTics Information System, France
  • Véronique Giudicelli, IMGT, the international ImMunoGeneTics Information System, France
  • Sofia Kossida, IMGT, the international ImMunoGeneTics Information System, France


Presentation Overview: Show

IMGT®, the International ImMunoGeneTics Information System®, is the reference resource in immunogenetics and immunoinformatics. Its main objective is to provide the scientific community with the basic knowledge, databases and tools needed to explore the adaptive immune response using IMGT-ONTOLOGY standards. IMGT® is dedicated to advancing research and development in this field, with a focus on three key areas. Axis I centers on identifying and characterising immunoglobulin (IG) and T cell receptor (TR) genes in jawed vertebrates, with the aim of understanding the adaptive immune response. This axis serves as a foundation for the remaining two axes. Axis II focuses on analysing and exploring expressed IG and TR repertoires in normal and pathological situations, achieved by comparing these repertoires with IMGT reference directories. Axis III investigates the 2D and 3D structures of engineered antibodies and TR, together with their functions, amino acid changes and the resulting modifications of their properties. This talk will focus on the most recent features of the IMGT® databases, tools, reference directories and web resources, with an emphasis on their relevance to the current challenges in adaptive immune response studies.

C-125: T-REx. A Tandem Repeat Explorer algorithm for the fast, accurate and simple detection of tandem repeated proteins based on their secondary structures
Track: General Computational Biology
  • David Moyano-Palazuelo, Centro Andaluz de Biología del Desarrollo - CSIC, Spain
  • Walter Santos, Instituto de Biotecnología - Universidad Nacional Autonoma de Mexico, Mexico
  • Damien Devos, Centro Andaluz de Biología del Desarrollo - CSIC, Spain
  • Enrique Merino, Instituto de Biotecnología - Universidad Nacional Autonoma de Mexico, Mexico


Presentation Overview: Show

Tandem repeat proteins (TRPs) are proteins containing repeated units that can be formed either by identical or nearly identical amino acid sequences in the primary structure or by structural patterns that can be superimposed in three-dimensional space. These tandem repeat units occur in various forms across all domains of life, ranging from short dipeptide repeats to longer units. TRPs can serve multiple functions, such as providing structural stability or catalytic activity, among others. However, predicting TRPs solely from their primary sequences is challenging because of their structural complexity and their variability in sequence and length; therefore, methods based on the three-dimensional structure of proteins have been used to obtain the precise position of repeat units. Although these methods yield spectacular results, they come at a substantial computational cost, which makes them unsuitable for large-scale analysis. The launch of AlphaFold marked a milestone in the history of computational biology, with over 200 million predicted structures. To perform large-scale detection of TRPs on AlphaFold-predicted structures, we have developed an ultra-fast prediction method that combines repeated element analysis of the secondary structure with a subsequent three-dimensional evaluation.
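As a rough illustration of what repeated element analysis on secondary structure could look like, the sketch below collapses a per-residue secondary-structure string into helix and strand elements and then searches for the longest tandemly repeated motif of elements. This is not the T-REx algorithm: the H/E/C alphabet, the function names, and the thresholds are assumptions made for the example, and the three-dimensional evaluation step is omitted.

```python
from itertools import groupby


def ss_elements(ss, min_len=3):
    """Collapse a per-residue string (H=helix, E=strand, C=coil)
    into an ordered list of (state, length) for helices and strands."""
    elements = []
    for state, run in groupby(ss):
        length = len(list(run))
        if state in "HE" and length >= min_len:
            elements.append((state, length))
    return elements


def longest_tandem_run(types, min_copies=3):
    """Find the motif of element types repeated back-to-back that covers
    the most elements. Returns (start, motif, copies) or None."""
    best, best_covered = None, 0
    n = len(types)
    for unit in range(1, n // min_copies + 1):
        for start in range(n - unit * min_copies + 1):
            motif = types[start:start + unit]
            copies = 1
            while types[start + copies * unit:start + (copies + 1) * unit] == motif:
                copies += 1
            if copies >= min_copies and copies * unit > best_covered:
                best, best_covered = (start, motif, copies), copies * unit
    return best


# Hypothetical secondary-structure string with three helix-strand units.
ss = "CC" + "HHHHHHCCEEEEECC" * 3
types = "".join(state for state, _ in ss_elements(ss))
hit = longest_tandem_run(types)
```

Working on a short string of elements rather than on atomic coordinates is what keeps a pre-filter of this kind cheap; candidates found this way would still need a structural check.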

C-126: Redundans2: a dynamic assembly pipeline for highly fragmented, heterozygous and hybrid genomes.
Track: General Computational Biology
  • Diego Fuentes Palacios, Barcelona Supercomputing Center BSC & IRB Barcelona - Institute for Research in Biomedicine, Spain
  • Leszek Pryszcz, CRG Barcelona - Centre for Genomic Regulation, Poland
  • Toni Gabaldón, Barcelona Supercomputing Center BSC & IRB Barcelona - Institute for Research in Biomedicine, Spain


Presentation Overview: Show

Assembling highly heterozygous genomes, such as those of primarily hybrid organisms, from short sequencing reads remains challenging because the different haplotypes are difficult to recover accurately. When standard assembly processes encounter highly heterozygous genomes, they tend to collapse homozygous regions and report heterozygous regions in alternative contigs. This creates boundaries between homozygous and heterozygous regions, leading to multiple assembly paths that are difficult to resolve. The result is usually a highly fragmented assembly with a larger total size than expected, which causes problems in downstream analyses, such as fragmented gene model predictions, incorrect gene copy numbers and broken synteny. To address these issues, we present Redundans2, a Python3-based pipeline specifically designed to handle the short-read assembly of heterozygous genomes of small to large size. The pipeline includes a reduction step that recognizes and selectively removes alternative heterozygous contigs and that can be applied to contigs derived from both short and long reads. In addition, Redundans2 supports long reads as well as reference-based strategies for scaffolding. Our method is freely available at https://github.com/Gabaldonlab/redundans.
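The reduction step described above can be pictured with a small sketch: given contig lengths and pairwise alignment summaries, a contig is dropped when it is no longer than its target and is both similar enough and covered enough by it. This is a simplified illustration, not the Redundans2 code; the function name, the input layout, and the threshold values are assumptions chosen for the example.

```python
def reduce_contigs(lengths, hits, min_identity=0.51, min_overlap=0.80):
    """Drop contigs that look like alternative copies of a longer contig.

    lengths: dict mapping contig name to length in bp
    hits: iterable of (query, target, identity, overlap), where overlap
          is the fraction of the query covered by the alignment
    The thresholds are illustrative assumptions.
    """
    removed = set()
    # Visit the shortest queries first so ties are resolved deterministically.
    for query, target, identity, overlap in sorted(hits, key=lambda h: lengths[h[0]]):
        if query == target or query in removed or target in removed:
            continue
        if (lengths[query] <= lengths[target]
                and identity >= min_identity
                and overlap >= min_overlap):
            removed.add(query)
    return [name for name in lengths if name not in removed]


# Made-up contigs: ctg2 is almost fully contained in ctg1, ctg3 only partly.
lengths = {"ctg1": 10_000, "ctg2": 4_000, "ctg3": 3_000}
hits = [
    ("ctg2", "ctg1", 0.93, 0.95),
    ("ctg3", "ctg1", 0.90, 0.40),
]
kept = reduce_contigs(lengths, hits)
```

Here ctg2 is removed as a likely alternative haplotype of ctg1, while ctg3 is kept because too little of it aligns.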

C-127: Inference of mechanistic insights from multi-omics data facilitating the drug discovery process
Track: General Computational Biology
  • Mathias Kalxdorf, Cellzome, a GSK company, Germany


Presentation Overview: Show

Background: Drug development is a costly and challenging process, in part because of high failure rates at late stages of the drug discovery process. Incomplete knowledge of disease mechanisms and of the causal effects induced by perturbing selected drug targets is a key cause of failure. The increasing availability of multi-omics data provides opportunities to address these gaps through quantitative assessment of mechanistic hypotheses.
Method: A two-step approach is proposed to infer mechanisms from multi-omics data. First, agreements between observed data and prior-knowledge molecular interaction graphs are identified. Second, measures of confidence are added to mechanistic hypotheses by joining evidence from causal-reasoning with evidence from multi-omics-based protein activity estimations.
Results: The approach is evaluated using proteomics, phosphoproteomics, and transcriptomics data from pro- and anti-inflammatory macrophages. Results demonstrate how omics layers complement each other to provide mechanistic insights e.g. for key regulators like STAT1 and STAT6.
Conclusion: The presented approach enables quantitative inference of mechanistic insights from complex biological systems, linking disease-causing genes to measured phenotypes and explaining causal routes from drug targets to perturbation-induced effects. This approach can help to identify novel high-confidence drug targets, reveal unfavorable off-target mechanisms, and thereby facilitate the drug discovery process.

C-128: Comparison of Prokaryotic Annotation Tools Using 14,000 Species: Are You Getting the Most Out of Your Genome?
Track: General Computational Biology
  • Mateusz Jundzill, Institute of Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany; Leibniz Center for Photonics in Infection Research (LPI), Jena 07747, Germany, Germany
  • Riccardo Spott, Institute of Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany, Germany
  • Oliwia Makarewicz, Institute of Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany, Germany
  • Mathias W. Pletz, Institute of Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany, Germany
  • Christian Brandt, Institute of Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany, Germany


Presentation Overview: Show

Bacterial genome annotation is key to identifying genes; it provides insight into bacterial biology, metabolic pathways, strain classification, and potential novel drug targets, and aids the development of new treatments. We evaluated the performance of four widely used annotation tools (NCBI Prokaryotic Genome Annotation Pipeline (PGAP), Prokka, Bakta, and eggNOG-mapper) using 14,319 genomes from the Genome Taxonomy Database, each from a unique species. Each genome was also subjected to random deletions to simulate different states of genome assembly (“noise”). In non-modified conditions, PGAP predicted the highest median gene count (3907, IQR: 2890, 5041), while Prokka predicted the lowest (3768, IQR: 2768, 4843). However, PGAP struggled to annotate the predicted genes and had the second-highest median proportion of hypothetical proteins (19%, IQR: 16.5%, 21.8%), compared to Bakta (3%, IQR: 1.5%, 6.3%). Under noise conditions, PGAP retained the best annotation stability. The statistical analysis of how taxa influence annotation quality is still pending, but it is an important cornerstone of this work, as it will let users choose the right strategy for their data. Our preliminary data already highlight the need to choose carefully among the different prokaryotic annotation tools.

C-129: Hypoxia Induces Metabolic Alterations in Kidney Organoids
Track: General Computational Biology
  • Celine C. Berthier, University of Michigan, United States
  • Akihiro Minakawa, University of Michigan, United States
  • Matthew Fisher, University of Michigan, United States
  • Jamal El Saghir, University of Michigan, United States
  • Chenchen He, University of Michigan, United States
  • Virginia Vega Warner, University of Michigan, United States
  • Rajasree Menon, University of Michigan, United States
  • John R. Hartman, University of Michigan, United States
  • Felix H. Eichinger, University of Michigan, United States
  • Viji Nair, University of Michigan, United States
  • Jennifer A. Schaub, University of Michigan, United States
  • Subramaniam Pennathur, University of Michigan, United States
  • Matthias Kretzler, University of Michigan, United States
  • Jennifer L. Harder, University of Michigan, United States


Presentation Overview: Show

We developed and analyzed a novel in vitro hypoxic kidney organoid (K-org) model to study how hypoxia contributes to the development and progression of kidney disease.
K-orgs containing kidney-specific architecture were generated from human pluripotent stem cells and exposed to hypoxic conditions for 24h; immunostaining, ELISA and 13C-glucose flux analysis confirmed expected protein and functional response. Bulk and single cell (sc) transcriptional profiling was performed, and the latter integrated by RPCA using Seurat v4.0 (500-5000 genes, >50% mitochondrial reads/cell). Ten cell-type clusters were identified and analyzed using CellxGene. Sc and bulk RNA findings were integrated with experimental validations.
Differential expression analysis comparing hypoxic to normoxic organoids revealed increased but variable expression of HIF1A and HIF1A targets in podocytes, stromal, proximal tubular, and distal tubular cells. Causal network inference analysis confirmed HIF1A as the top upstream regulator of the observed bulk and sc transcriptional responses. Moreover, key metabolic pathways (glycolysis, sirtuin signaling, gluconeogenesis and mitochondrial dysfunction) were activated.
Computational analyses integrating multiple omics datasets demonstrate that hypoxic K-orgs capture key metabolic pathway perturbations seen in kidney disease. In combination with protein expression and functional studies, these data demonstrate the relevance of the hypoxic K-org model for studying pathomechanisms in kidney disease.

C-130: CAB: A toolkit for carbon-aware bioinformatics
Track: General Computational Biology
  • Anne Hartebrodt, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
  • Julian Matschinske, Bitspark GmbH, Nürnberg, Germany
  • David B. Blumenthal, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany


Presentation Overview: Show

Computing is rapidly becoming one of the major contributors to carbon emissions. Mitigation strategies for reducing greenhouse gas emissions include shifting computation in location or time, making software more efficient, and using older hardware to avoid hardware obsolescence. In the public and especially the medical sector, users may be reluctant to shift their computation to another location due to privacy concerns. Therefore, we suggest time shifting. To date, no convenient tool is available to shift computation to a ‘greener time’ and report the estimated carbon emissions. In this talk, we will present a toolkit for carbon-aware bioinformatics that minimizes the effort users need to use greener energy and report emissions. It consists of an API that returns the most favorable time to run the computation within a given tolerance frame, and a Python package that streamlines the integration of the API into Python code and bioinformatics pipelines. Currently, users need to specify the estimated run time of their task, the percentage of renewable energy, an area code, and a deadline for the task. Users can upload reports on the resource usage of general tasks and pipelines, which will allow us to provide automatic estimates in the future.
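The core of time shifting can be sketched in a few lines: given an hourly forecast of grid carbon intensity, choose the start hour that minimizes the average intensity over the task's run time while still finishing before the deadline. The sketch below is an assumption-laden illustration and does not use the CAB API; the function name, its parameters, and the forecast values are all hypothetical.

```python
def greenest_start(intensity, run_hours, deadline_hours):
    """Pick the start hour with the lowest mean carbon intensity.

    intensity: hourly forecast (e.g. gCO2/kWh), index 0 = now
    run_hours: estimated run time of the task in whole hours
    deadline_hours: the task must finish within this many hours
    Returns (start_hour, mean_intensity).
    """
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    last_start = min(deadline_hours, len(intensity)) - run_hours
    if last_start < 0:
        raise ValueError("task cannot finish before the deadline")
    best_start, best_mean = None, float("inf")
    for start in range(last_start + 1):
        window = intensity[start:start + run_hours]
        mean = sum(window) / run_hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Made-up forecast for the next eight hours.
forecast = [420, 390, 310, 180, 150, 170, 260, 400]
start, mean = greenest_start(forecast, run_hours=2, deadline_hours=8)
tight_start, tight_mean = greenest_start(forecast, run_hours=2, deadline_hours=4)
```

A tighter deadline narrows the set of candidate windows, which is why the tolerance frame supplied by the user directly limits how much can be saved.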

C-131: Identifying Candidate Driver Genes Using Intronic Background Mutation Rate from 8054 Cancer Whole-Genomes
Track: General Computational Biology
  • Ahmed Khalil, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Ivan Galván-Femenía, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Elizaveta Besedina, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Fran Supek, Institute for Research in Biomedicine (IRB Barcelona), Catalan Institution for Research and Advanced Studies (ICREA), Spain


Presentation Overview: Show

The background mutation rate (BMR) in cancer is the neutral accumulation of passenger mutations that occur spontaneously during DNA replication and repair, and is influenced by diverse endogenous and exogenous factors such as mutagen exposures. Estimating the BMR is essential for quantifying selection in cancer evolution studies and identifying driver genes in cancer.
Given the increasing availability of whole-genome sequencing (WGS) data, we have developed HyperInVEx, a Bayesian-regularized Poisson regression model to quantify selection in cancer, refining the previous “InVEx” approach. HyperInVEx estimates the local BMR based on the intronic and intergenic mutations. Confounding by trinucleotide and pentanucleotide composition is stringently accounted for via a locus sampling approach.
Using 8054 cancer whole genomes from 28 cancer types, we demonstrated that our intron-based BMR models local neutral mutation rates more accurately than the covariate signals used by state-of-the-art selection models such as dNdScv and MutSigCV. Using HyperInVEx, we identified many known cancer genes that are also detected by dNdScv and MutSigCV, as well as a long tail of putative cancer driver genes that await replication.
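The intuition behind using introns as a local neutral baseline can be shown with a deliberately simplified calculation: estimate a per-base mutation rate from a gene's introns, scale it to the exonic length, and ask how surprising the observed exonic count is under a Poisson model. This is a plain one-sided Poisson test, not the Bayesian-regularized regression of HyperInVEx, and it ignores nucleotide composition and per-sample effects; all names and numbers below are hypothetical.

```python
import math


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    cdf, term = 0.0, math.exp(-mu)
    for i in range(k):
        cdf += term            # adds P(X = i)
        term *= mu / (i + 1)   # advances to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def excess_mutation_pvalue(intron_mutations, intron_bp, exon_bp, exon_mutations):
    """Expected exonic mutations under the intronic rate, and the
    one-sided p-value for observing at least exon_mutations."""
    rate = intron_mutations / intron_bp
    expected = rate * exon_bp
    return expected, poisson_sf(exon_mutations, expected)


# Made-up gene: 40 mutations in 200 kb of introns, 8 in 10 kb of exons.
expected, p_value = excess_mutation_pvalue(40, 200_000, 10_000, 8)
```

With these toy numbers about two exonic mutations are expected, so eight observed mutations yield a small p-value, hinting at positive selection under the stated simplifications.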

C-132: Classification of human adipocyte cellular phenotypes using brightfield images following CRISPR mediated genetic perturbations
Track: General Computational Biology
  • Niels Tolstrup, Novo Nordisk, Denmark
  • Barak Gilboa, Novo Nordisk Research Centre Oxford, United Kingdom
  • Rikke Nielsen, Novo Nordisk, Denmark
  • Vanessa Jurtz, Novo Nordisk, Denmark
  • Robert Kitchen, Novo Nordisk Research Centre Oxford, United Kingdom
  • Manu Verma, Novo Nordisk Research Centre Oxford, United Kingdom


Presentation Overview: Show

Background:
High-content brightfield imaging enables cost-effective, longitudinal, and high-throughput assessment of the fate of human adipose progenitor (AP) cells cultured in vitro. This is relevant for gaining new insights into human metabolic health. However, it is not clear how to classify genetic perturbations from brightfield images of adipocytes.
Method:
We present a novel workflow for classifying brightfield images of CRISPR genetic perturbations, using literature-based gene function annotations as a proof of concept. Our method encompasses virtual staining of brightfield images using deep neural networks to create fluorescence-like images of neutral lipid droplets, followed by training of Support Vector Machines (SVMs) to distinguish gene loss-of-function effects based on features extracted with CellProfiler. From the SVM results, we calculated the distance of each image to the hyperplane, which helps us determine Z-prime (Z') values to estimate differences in cellular phenotypes following genetic perturbations.
Result:
Our method enables high throughput investigation of novel regulators of AP cell fate from brightfield images with direct impact on human metabolic health.
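The scoring step in the Method section can be illustrated with a short sketch: the signed distance of a feature vector to a separating hyperplane, and the Z-prime factor computed from the distances of two groups using the standard definition Z' = 1 - 3(sd_a + sd_b) / |mean_a - mean_b|. The sketch assumes a linear SVM (the abstract does not state the kernel), and the weights and distance values are made up for illustration.

```python
import math
from statistics import mean, stdev


def signed_distance(weights, bias, features):
    """Signed distance of a point to the hyperplane w.x + b = 0
    (valid for a linear decision function)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score / math.sqrt(sum(w * w for w in weights))


def z_prime(group_a, group_b):
    """Z-prime factor between two groups of scores, using sample
    standard deviations. Values near 1 indicate clean separation."""
    separation = abs(mean(group_a) - mean(group_b))
    if separation == 0:
        raise ValueError("group means are identical")
    return 1.0 - 3.0 * (stdev(group_a) + stdev(group_b)) / separation


# Hypothetical signed distances for knockout and control images.
knockout = [1.9, 2.1, 2.0, 2.2, 1.8]
control = [-2.0, -1.9, -2.1, -2.2, -1.8]
z = z_prime(knockout, control)

# Hypothetical hyperplane and feature vector.
d = signed_distance([3.0, 4.0], -5.0, [3.0, 4.0])
```

Reducing each image to a single signed distance turns a high-dimensional feature comparison into a one-dimensional assay readout, which is what makes an assay-quality metric such as Z' applicable.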

C-133: Improving Survival Risk Prediction with Random Survival Forests for Recurrent Events in Biological Systems
Track: General Computational Biology
  • Juliette Murris, HeKA, Inria, Inserm, Cordeliers Research Centre, Pierre Fabre, Sorbonne University, University Paris Cité, France, France
  • Audrey Lavenu, CIC-1414, Inserm, IRMAR – CNRS 6625, University of Rennes 1, France, France
  • Sandrine Katsahian, CIC-1418, AP-HP, HeKA, Inria, Inserm, Cordeliers Research Centre, Sorbonne University, University Paris Cité, France, France


Presentation Overview: Show

Administrative medical databases contain a wealth of information on patients’ pathways that can be leveraged to improve our understanding of survival outcomes, like disease progression or treatment response. However, to fully capture the complexity of follow-up, it is essential to analyse all events that patients may experience. Recurrent events refer to subsequent occurrences of the same event, such as recurrences or rehospitalizations, which are common in many diseases.
In this context, we present an extension of the random forests algorithm for the analysis of survival data with recurrent events, utilizing concepts from non-parametric survival analysis and statistical learning.
The proposed approach is an ensemble of survival trees that uses the pseudo-score test as the splitting rule and the Nelson-Aalen estimator of the mean cumulative function in each terminal node. To assess the algorithm overall, we computed an adapted concordance index for model discrimination, together with variable importance. Cross-validation was used for hyperparameter optimisation and performance evaluation. We evaluated our methodology on both simulated and real-world data settings, and the results were promising with consistent findings.
The proposed methodology has the potential to facilitate the analysis of recurrent events in biological systems, providing key insights into the underlying mechanisms of survival outcomes.
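For readers unfamiliar with the terminal-node estimate mentioned above, the sketch below computes a Nelson-Aalen type estimate of the mean cumulative function (MCF) for recurrent events: at each event time, the number of events is divided by the number of subjects still under observation, and these increments are summed. It is a minimal illustration on made-up data, not the authors' implementation; the input layout and function name are assumptions.

```python
def mean_cumulative_function(subjects):
    """Nelson-Aalen type estimate of the mean cumulative function.

    subjects: list of (event_times, follow_up_end), where event_times is
    a list of times at which the recurrent event occurred and
    follow_up_end is the time the subject left observation.
    Returns a list of (time, mcf) pairs at the distinct event times.
    """
    event_times = sorted({t for events, _ in subjects for t in events})
    mcf, total = [], 0.0
    for t in event_times:
        at_risk = sum(1 for _, end in subjects if end >= t)
        n_events = sum(events.count(t) for events, _ in subjects)
        total += n_events / at_risk
        mcf.append((t, total))
    return mcf


# Made-up cohort: (rehospitalization times, end of follow-up).
cohort = [([2, 5], 8), ([3], 4), ([], 6), ([5, 7], 9)]
curve = mean_cumulative_function(cohort)
```

Unlike a survival curve, the MCF can exceed 1, since it estimates the expected number of events per subject up to a given time.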

C-134: Annotating phage protein functions with deep learning
Track: General Computational Biology
  • Xiaohao Yang, Monash University, Australia
  • Jinzheng Ren, Australian National University, Australia
  • Jiahui Li, Monash University, Australia
  • Wei Dai, Monash University, Australia
  • Yan Jiang, Zhejiang University, China
  • Christopher Stubenrauch, Monash University, Australia
  • Jiawei Wang, European Bioinformatics Institute, United Kingdom
  • Trevor Lithgow, Monash University, Australia


Presentation Overview: Show

Bacteriophages (phages), viruses that kill bacterial pathogens, are being collected for use in phage therapies, with the intention of applying these bactericidal viruses directly to infection sites in bespoke phage cocktails. Using such a biological agent for infection control requires a deep understanding of the phage. Thus, despite the great and largely unsampled phage diversity available for this purpose, a critical issue hampering the rollout of phage therapy is the poor-quality functional annotation of the majority of phages.

To this end, we have formulated a pipeline, including machine learning-based algorithms that capture informative features and experimental validation, to predict key types of phage proteins. Most recently, we developed PhageProfiler, based on protein language models to annotate phage proteins with over 15 core functions, from the prevailing capsid and tail proteins, to the rare but critical anti-CRISPR and depolymerase proteins. Benefitting from the protein language models that learn patterns from millions of protein sequences across all life domains, PhageProfiler can capture the key characteristics to distinguish phage proteins with different functions. Having been extensively validated on various benchmarking tests and case studies, PhageProfiler represents the state-of-the-art method to accurately annotate phage genomes in a high throughput manner.

C-135: Systematic investigation of molecular mechanisms underlying neurodevelopmental disorders
Track: General Computational Biology
  • Milena Djokic, Institute of Molecular Biology, Mainz, Germany
  • Katja Luck, Institute of Molecular Biology, Mainz, Germany
  • Sivarajan Karunanithi, Institute of Molecular Biology, Mainz, Germany
  • Kristina Hintz, Institute of Molecular Biology, Mainz, Germany


Presentation Overview: Show

Neurodevelopmental disorders (NDDs) such as intellectual disabilities, autism, epilepsy and others are genetic disorders that primarily affect the brain, despite being caused by germline mutations present throughout the body. They are characterized by a wide range of developmental and neurological manifestations, even for a single disorder, indicating a multitude of possible underlying disease mechanisms. We explored the tissue-specific expression patterns of around 1000 NDD risk genes in GTEx to identify subgroups with distinct molecular mechanisms. Using hierarchical agglomerative clustering, as well as gene and disease ontology enrichment, we found that the largest group of genes showed uniform expression across all tissues. This suggests that the molecular context of the gene product in the brain, rather than the gene product itself, causes the brain-specific phenotype observed when the gene is mutated. With this in mind, we are employing an integrative systems approach that combines various types of omics data from the brain and other tissues, evolutionary gene relationships, protein-protein interactions, and mutation data collected from cohorts of NDD patients, to further improve our mechanistic understanding of brain-specific processes in neurodevelopment. Our study has the potential to unravel pathways affected in NDDs, as well as to establish an approach for studying rare diseases in general.

C-136: Deblurring and enhancing the resolution of two-photon microscopy images using a deep neural network and generative model
Track: General Computational Biology
  • Haruhiko Morita, Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan
  • Shuto Hayashi, Department of Computational and Systems Biology, Medical Research Institute, Tokyo Medical and Dental University, Japan
  • Teppei Shimamura, Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan


Presentation Overview: Show

Two-Photon Microscopy (TPM) enables deep-tissue live imaging. However, its axial resolution is inferior to its lateral resolution, which makes it difficult to reconstruct the three-dimensional structure of cellular details such as synapses or microglia spines. Previous approaches either improve deep-tissue TPM images insufficiently or are not suitable for live imaging.

We built a deep neural network that deblurs and improves the axial resolution of TPM images (“deblurring model”). Since we do not have the true structures of objects in TPM images, the deblurring model was trained in combination with a blurring generative model simulating the blurring process of TPM.

For quantitative evaluation, we first applied our model to simulated data and real images of beads, and found that it accurately inferred the true shapes of the objects. We then applied our model to images of axons, and found that it deblurred the images and improved their resolution, yielding clearer cellular shapes.

We expect this method to enable more accurate evaluation of the three-dimensional structure of living cells in deep tissue.

C-137: Statial: A Bioconductor package for identifying spatially-related changes in cell state
Track: General Computational Biology
  • Ellis Patrick, The University of Sydney, Australia
  • Farhan Ameen, The University of Sydney, Australia
  • Sourish Iyengar, The University of Sydney, Australia
  • Shila Ghazanfar, The University of Sydney, Australia


Presentation Overview: Show

The human body comprises over 37 trillion cells with diverse forms and functions, which can exhibit dynamic changes based on their environmental context. Understanding the spatial interactions between cells and changes in their state within the tissue microenvironment is crucial to comprehending the development of human diseases. State-of-the-art technologies such as PhenoCycler, IMC, CosMx, Xenium, and others can deeply phenotype cells in their native environment, providing a high-throughput means of identifying spatially related changes in cell state.

The Statial Bioconductor package offers a suite of complementary approaches for identifying changes in cell state explained by changes in cell type localization. In this presentation, we introduce new functionality in the Statial package that can 1) identify changes in cell state between distinct tissue environments, 2) uncover changes in marker expression associated with cell proximities, and 3) model spatial relationships between cells in the context of hierarchical cell lineage structures. We provide context for these approaches and explain when and why modeling spatial relationships between cells in these ways is appropriate. Finally, we demonstrate how these approaches can be used in a classification setting to predict patient prognosis or treatment response.

C-138: The role of tandem repeats in bacterial functional amyloids - computational analysis
Track: General Computational Biology
  • Alicja Nowakowska, Wrocław University of Science and Technology, Poland
  • Jakub Wojciechowski, Wrocław University of Science and Technology, Poland
  • Natalia Szulc, Wrocław University of Science and Technology, Poland
  • Malgorzata Kotulska, Wrocław University of Science and Technology, Poland


Presentation Overview: Show

Repetitivity and modularity of proteins are two related notions incorporated into multiple evolutionary concepts. We study whether they may also be essential for functional amyloids. Amyloids are proteins that form very regular and usually highly insoluble fibrils, often associated with neurodegeneration. However, recent discoveries revealed that the amyloid structure of a protein can also be beneficial and desired, e.g., to promote cell adhesion. Functional amyloids are proteins whose characteristics differ from those of pathological amyloids such that fibril formation is more tightly controlled by the organism. We propose that repeats in the sequence could regulate the aggregation propensity of these proteins. The multiple symmetric interactions that arise from the repeats may support and strengthen the desirable structural properties of functional amyloids. Our results show that tandem repeats in bacterial functional amyloids have specific characteristics. The pattern of repeats supports an appropriate level of fibril formation and better control of fibril stability. The repeats tend to be more imperfect, which attenuates excessive aggregation propensity. Their desired structure and function are also reinforced by their amino acid profile. Although this study focused on bacterial functional amyloids, due to their importance in biofilm formation, we propose that similar mechanisms could be employed in other functional amyloids that evolution has shaped to aggregate in a desirable manner.

C-139: Genome instability pattern in ovarian cancer
Track: General Computational Biology
  • Nicolò Gnoato, Department of Biology, University of Padua, Italy
  • Angelo Velle, Department of Biology, University of Padua, Italy
  • Stefania Pirrotta, Department of Biology, University of Padua, Italy
  • Laura Masatti, Department of Biology, University of Padua, Italy
  • Enrica Calura, Department of Biology, University of Padua, Italy
  • Chiara Romualdi, Department of Biology, University of Padua, Italy


Presentation Overview: Show

The immune system plays a critical role in recognizing and subsequently eliminating tumor cells. Structural genomic modifications can alter the expression of tumor antigens and immune signaling molecules and affect the tumor's ability to evade the immune response. Through the analysis of copy number and structural variation data from whole genome sequencing of ovarian tissue samples, genomic instability signatures have demonstrated the ability to classify tumor samples into distinct categories of severity. In this study, our objective is to analyze genomic instability in multiple ovarian tumor sites using single-cell sequencing data, assessing copy number and structural variations of the genome. Specifically, the study seeks to test the reliability of these signatures in classifying genomic instability across different sample types, comparing results from both whole genome sequencing and single-cell data. The entire analysis is focused on paired samples derived from the same patients to assess the degree of similarity between the classification results obtained from the two datasets. This work could provide new insights into genomic instability in ovarian tumors and determine whether genomic instability signatures can be applied to single-cell data for improved tumor classification.

C-140: Comparative analysis of gene expression profiles induced by chemicals with the same target molecule
Track: General Computational Biology
  • Yayoi Natsume-Kitatani, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Ken-Ichi Aisaki, National Institute of Health Sciences, Japan
  • Satoshi Kitajima, National Institute of Health Sciences, Japan
  • Jun Kanno, National Institute of Health Sciences, Japan


Presentation Overview: Show

The Percellome database [1], which allows quantitative comparison of gene expression profiles induced by toxic chemicals through the process of estimating mRNA copy number per cell, is a useful resource for inferring the molecular mechanisms of chemical exposure-induced toxicity. This database contains gene expression profiles (number of mRNA copies per cell estimated by the above process) by exposure dose and time in mice for various chemicals. By quantitatively capturing the dynamic changes in gene expression, it is possible to extract “what kind of molecular network changes lead to toxic expression due to chemical exposure”. The targets for our analysis include known PPARα (Peroxisome Proliferator Activated Receptor Alpha) ligands and chemicals that have been suggested to be PPARα ligands based on previous research results (clofibrate, valproic acid, estragole, di(2-ethylhexyl)phthalate, and phenobarbital). We compared the patterns of dynamic changes in gene expression levels of these five chemicals and detected common and unique patterns among them. We report findings on the inferred toxicological mechanisms of these chemicals obtained by tracing how biological responses tied to the characteristics of these chemicals change with dose and time of administration.

[1] Kanno J. et al., J. Toxicol. Sci. 2013;38(4): 643-654

C-141: Role of Alternative Polyadenylation in Noradrenergic-to-Mesenchymal Transition in Neuroblastoma
Track: General Computational Biology
  • Rhea Ahluwalia, University of Toronto, Canada
  • Fupan Yao, University of Toronto, Canada
  • Quang Trinh, Ontario Institute of Cancer Research, Canada
  • Gabrielle Persad, University of Toronto, Canada
  • Brent Derry, University of Toronto, Canada
  • Lincoln Stein, University of Toronto, Canada


Presentation Overview: Show

Neuroblastoma is the most common extracranial tumor in children. Compared to adult cancers, neuroblastoma has a distinctly lower number of somatic mutations, with known drivers including MYCN, NRAS, and ALK. Two distinct cell states in neuroblastoma, adrenergic (ADRN) and mesenchymal (MES), dynamically interconvert in the process of noradrenergic-to-mesenchymal transition (NMT). MES cells are implicated in conferring an additional level of pathogenicity due to their more migratory and therapy-resistant phenotype. Epigenetic mechanisms, known to be involved in neuroblastoma, are likely important in regulating NMT. A recent study linked alternative polyadenylation (APA) to proliferation and neuronal differentiation in neuroblastoma.

Using an integrated computational and experimental approach, we explore whether changes in APA affect NMT. With scRNA-seq of five neuroblastoma cell lines, we identified distinct ADRN and MES populations and compared their usage of 3’UTR polyadenylation sites. Preliminary results show differential ADRN vs. MES 3’UTR usage in 180 genes that include transcription factors and chromatin modifiers. We are establishing an in vitro neuroblastoma APA model by biasing 3’UTR usage to shorter or longer extremes, which will enable us to study the effect of globally truncated or extended 3’UTRs on NMT. Elucidating the role of APA in NMT may reveal novel targetable vulnerabilities in neuroblastoma.

C-142: CytoEXpert – a cytometry analysis tool
Track: General Computational Biology
  • Gábor Beke, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
  • Lubos Klucar, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
  • Dana Cholujova, Cancer Research Institute, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
  • Jana Jakubikova, Cancer Research Institute, Biomedical Research Center, Slovak Academy of Sciences, Slovakia


Presentation Overview: Show

Cytometry is a powerful method used in many areas of biology and medicine, e.g., immunology, haematology, cancer research, and microbiology. It is based on antibodies and allows multiple cell parameters to be measured in a large number of cells. In addition to flow cytometry, there are other, high-throughput, types of cytometry, such as mass cytometry (also known as CyTOF) and imaging cytometry. High-throughput cytometry data analysis requires sophisticated software and algorithms, and often some previous programming knowledge. To overcome these limitations, we developed “CytoEXpert”, a free web portal based on HTML, PHP, MySQL and JavaScript. The portal allows raw cytometry data to be pre-processed and normalized. CytoEXpert comprises a dynamic web-based gating tool, which allows identification of cell populations, and incorporates additional R-based tools, such as SPADE, Citrus, tSNE, viSNE and flowSOM. The results can be further analysed with different statistical tests, PCA or correspondence analysis. Besides tabular outputs, results can be visualized with different types of plots, e.g., heatmaps, box-plots, dot-plots, volcano-plots and others. This work was supported by grants APVV-19-0212, APVV-20-0183, and MZSR 2019/14-BMCSAV-9.

C-145: Genetic association studies of deep-learning derived endophenotypes for neuropsychiatric disorders
Track: General Computational Biology
  • Shane O'Connell, University of Galway, Ireland
  • Pilib Ó Broin, University of Galway, Ireland
  • Dara Cannon, University of Galway, Ireland


Presentation Overview: Show

Genome-wide association studies (GWAS) of complex neuropsychiatric phenotypes are often limited in their ability to detect statistically significant single nucleotide polymorphisms, partly owing to the broad range and variability of considered symptoms. While this limitation can be mitigated in some cases by leveraging the larger sample sizes offered by biobanking initiatives such as the UK Biobank, studies of many common disorders, including schizophrenia and bipolar disorder, remain underpowered. GWAS of intermediate phenotypes, derived from phenotype-associated quantities such as neuroimaging biomarkers, could increase the potential for detecting significant genetic signals by refining the problem space. This concept has recently been explored using GWAS of tabular data derived from brain imaging; however, these approaches do not usually consider non-linear relationships between derived measures and outcomes. Here, we propose the use of deep-learning models, such as convolutional neural networks (CNNs) and autoencoders, to derive secondary phenotypes of neuropsychiatric conditions from neuroimaging data. We apply our methods to an Alzheimer’s disease dataset and compare the genetic properties of the derived phenotypes to primary GWAS results.

C-146: Coevolution of Methionine adenosyltransferase and its regulatory binding protein
Track: General Computational Biology
  • Shohei Yoshida, UCL, United Kingdom


Presentation Overview: Show

Methionine adenosyltransferase (MAT2A, herewith MAT) catalyses the synthesis of S-adenosylmethionine from L-methionine and ATP. MAT is a pharmacologically validated cancer target. Furthermore, a binding protein (MAT2B, herewith BP) was recently shown to bind to and stabilize MAT. While MAT enzymes are ubiquitous in nature, the distribution of BP remains unknown. In addition, the details of the coevolution of MAT and BP and the molecular mechanisms involved remain elusive. To tackle these questions, I investigate the evolution of MAT and BP with computational methods that extract coevolutionary signals in interacting proteins. In addition, molecular dynamics (MD) simulation is employed to understand the interaction between MAT and BP as well as the interaction between BP and other potential ligands. The findings of the computational analysis are complemented by experimental investigations.
Our preliminary results from computational analyses suggest that Craniata present a BP with a conserved C-terminus, while all other organisms possess a version with a shortened C-terminus (BP-like). MD simulations of the MAT-“BP-like” complex imply that “BP-like” proteins show low or no affinity to their MAT counterparts. This finding is confirmed by the ITC experiment.
In conclusion, it seems that the C-terminus of BP is important for binding to MAT.

C-147: Enhancing Real-World Evidence with Biomedical Knowledge Graphs: Exploring Atopic Dermatitis Endotypes
Track: General Computational Biology
  • Francesca Frau, Sanofi, R&D Data and Data Science, Frankfurt am Main, 65926, Germany
  • Paul Loustalot, Quinten Health, 8 rue Vernier 75017, Paris, France
  • Margaux Tornqvist, Quinten Health, 8 rue Vernier 75017, Paris, France
  • Nina Temam, Quinten Health, 8 rue Vernier 75017, Paris, France
  • Jean Cupe, Quinten Health, 8 rue Vernier 75017, Paris, France
  • Franck Augé, Sanofi R&D Data and Data Science, Chilly-Mazarin, France


Presentation Overview: Show

Precision Medicine is defined by the U.S. Food & Drug Administration as “an innovative approach to tailoring disease prevention and treatment that considers differences in people’s genes, environments and lifestyles.”
To succeed in providing personalised medicine to patients, it will be necessary to combine medical, biological and molecular data not only to identify all complex disease subtypes (patient stratification), but also to understand the underlying molecular mechanisms. Biomedical Knowledge Graphs (BKGs) are limited to the integration of prior knowledge data and do not integrate real-world data (RWD) that would allow for the incorporation of patient-level information.
With this work we propose a first step towards using graphs and graph machine learning in a fully integrated precision medicine strategy. We show that RWD can be integrated with a BKG to form a Patient & Biomedical Knowledge Graph. This allows the creation of new patient representations using graph representation learning, which can be used to combine the strength of RWD studies in identifying disease subtypes with the strength of BKGs in bridging medical and molecular information.
We applied our methodology to atopic dermatitis (AD), identifying 7 subgroups of patients, characterising the medical, biological, and molecular evidence of each subtype.

C-148: Enhancing CryoET Image Clarity with a Vision Transformer-Based Denoising Model
Track: General Computational Biology
  • Sung-Eun Jang, Seoul National Univ., South Korea
  • Martin Steinegger, Seoul National Univ., South Korea


Presentation Overview: Show

Cryo-electron tomography (CryoET) is a powerful method for obtaining 3D images of biological samples, offering invaluable insights into cellular structures and their functions. However, this technique encounters challenges, such as radiation damage, low signal-to-noise ratio, and difficulty in determining particle orientation. To address these issues, we introduce a deep learning-based denoising model that employs a Vision Transformer (ViT).

Our approach assesses the ViT's capacity to identify particle locations and morphologies. The model utilizes three input tilt images—anchor, positive, and negative—and predicts the indirect angle between them. The attention heatmap produced by our model demonstrates its focus on particle locations and shapes within the tilt image.

By capitalizing on the model's reliable attention to particles, we anticipate that our approach can be applied to few-shot learning methods, enabling rapid adaptation to new tasks with minimal training data. Additionally, we expect that the ViT can effectively denoise CryoET images and maintain accuracy across diverse noise levels using a noise-to-noise training scheme. Our method aims to provide a versatile and efficient denoising model for CryoET, which could lead to significant advancements in structural biology.

C-149: The spatial organization of enhancers around promoter regions within chromatin contact domains for selected Human cell lines: Structural regulatory landscape
Track: General Computational Biology
  • Abhishek Agarwal, Centre of New Technologies, University of Warsaw, Poland
  • Sevastianos Korsak, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Anup Kumar Halder, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Mateusz Chiliński, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, Poland


Presentation Overview: Show

The multiscale hierarchical spatial structure of the mammalian genome is defined by chromatin loops, TADs, compartments, and chromosomal territories. Chromatin looping is observed when the CTCF protein participates in the loop extrusion process driven by the ring-like cohesin molecular motor. The cohesin-mediated chromatin interactions vary among cell types and conditions. Such spatial variability correlates with the differences in gene expression between those cellular states and contributes to the microscale transcription and DNA replication processes. Our research aims to develop and test the concept of the structural epigenomic landscape (SEL) of regulatory elements around promoter regions for selected cell types and for different individuals of the human population. We will propose a biophysical method to construct probabilistic ensembles of three-dimensional conformations at the scale of genomic domains (i.e. chromatin contact domains - CCDs, or topologically associating domains - TADs), compartments, chromosomal territories and, finally, the whole genome.

C-150: CONNECTOR, fitting and clustering of longitudinal data to reveal a new risk stratification system
Track: General Computational Biology
  • Simone Pernice, Department of Computer Science, University of Torino, Italy
  • Roberta Sirovich, Department of Mathematics G Peano, University of Torino, Italy
  • Elena Grassi, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Marco Viviani, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Martina Ferri, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Francesco Sassi, Candiolo Cancer Institute, FPO-IRCCS, Italy
  • Luca Alessandrì, Department of Molecular Biotechnology and Health Sciences, University of Torino, Italy
  • Dora Tortarolo, Department of Computer Science, University of Torino, Italy
  • Raffaele A Calogero, Department of Molecular Biotechnology and Health Sciences, University of Torino, Italy
  • Livio Trusolino, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Andrea Bertotti, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Marco Beccuti, Department of Computer Science, University of Torino, Italy
  • Martina Olivero, Candiolo Cancer Institute, FPO-IRCCS and Department of Oncology, University of Torino, Italy
  • Francesca Cordero, Department of Computer Science, University of Torino, Italy


Presentation Overview: Show

The transition from evaluating a single time point to examining the entire dynamic evolution of a system is possible only in the presence of the proper framework. The strong variability of dynamic evolution makes the definition of an explanatory procedure for data fitting and clustering a challenging task. We developed CONNECTOR, a data-driven framework able to analyze longitudinal data in a straightforward and revealing way. CONNECTOR is based on a functional clustering method, which provides fitted curves as well as cluster memberships through the estimation of a functional model written using natural cubic splines with random coefficients. CONNECTOR includes a collection of tools that help the user to visualize the data, to properly set the free parameters (the model selection phase) and to inspect the fitting and clustering results.
When used to analyze tumor growth kinetics over time in 1599 patient-derived xenograft growth curves from ovarian and colorectal cancers, CONNECTOR allowed the aggregation of time-series data into informative clusters through an unsupervised approach. We give a new perspective on mechanism interpretation: specifically, we define novel model aggregations and identify unanticipated molecular associations with response to clinically approved therapies. CONNECTOR is freely available under the GNU GPL license at https://qbioturin.github.io/connector.

C-151: Identification of cell-cell communication and ligand-receptor interaction in CKD and NASH through the integration of single cell and spatial transcriptomics.
Track: General Computational Biology
  • Jaime Moreno, Artificial Intelligence and Digital Research (AIDR) - DSI, Novo Nordisk A/S Måløv, Denmark
  • Albin Sandelin, Computational and RNA Biology, University of Copenhagen, Denmark
  • Gianluca Mazzoni, Artificial Intelligence and Digital Research (AIDR) - DSI, Novo Nordisk A/S Måløv, Denmark
  • Vivek Das, Artificial Intelligence and Digital Research (AIDR) - DSI, Novo Nordisk A/S Måløv, Denmark


Presentation Overview: Show

Background: Chronic Kidney Disease (CKD) and Nonalcoholic steatohepatitis (NASH) are multi-factorial metabolic diseases involving an interplay of fibrotic and inflammatory insults. The combination of single-cell (scRNASeq) and spatial transcriptomics (ST) could give an unprecedented molecular understanding of disease at single-cell resolution. Notably, cell-specific ligand-receptor (L-R) interactions, learned across disease stages, have the potential to reveal novel disease features and contribute significantly to the early drug target discovery and validation process.
Methods: We present a systematic analytical framework to combine scRNASeq with ST to pinpoint the L-R pairs that play a role in disease-centric inter-cellular signaling. Our framework uses state-of-the-art methods such as CellChat, Cell2location, and a co-occurrence model to integrate ST and scRNASeq information.
Results: Our framework identified L-R pairs driving the cellular crosstalk in CKD and NASH. These cell-cell interactions co-occur in ST data and can be visualized directly in the tissue slides. Several of those L-R protein pairs are known CKD and NASH drivers, while some are novel potential targets.
Conclusion: This integration of scRNASeq and ST modalities provides a comprehensive understanding of molecular mechanisms in CKD and NASH that would not be attainable with a single technology alone, thus paving the way towards future potential therapeutic targets.

C-152: Identification of novel dosage sensitive driver genes in CNV loci associated with neurodevelopmental disorders.
Track: General Computational Biology
  • Sara Azidane, STALICLA Discovery and Data Science Unit, Spain
  • Xavier Gallego, STALICLA Discovery and Data Science Unit, Spain
  • Lynn Durham, STALICLA, Spain


Presentation Overview: Show

Copy number variants (CNVs) are genome-wide structural variations involving the duplication or deletion of large nucleotide sequences. While these types of variations can be commonly found in humans, large and rare CNVs including coding sequence gains or losses are known to contribute substantially to the development of various neurodevelopmental disorders (NDDs), and particularly to autism spectrum disorder (ASD). Nevertheless, given that these NDD-risk CNVs cover broad regions of the genome, it is particularly challenging to pinpoint the critical gene(s) responsible for the expression of the phenotype. Here we performed a meta-analysis study with 11,570 NDD patients and 4,114 controls from the SFARI-Gene database to identify NDD-risk regions and to later determine which deleted or duplicated genes within these broad regions were driving the phenotypic effects. We identified 38 NDD-risk CNV loci surpassing Bonferroni correction, including 23 novel ones, and provided evidence for dosage-sensitive genes within these regions being significantly enriched for driver gene candidates. Finally, we conducted a burden analysis using 4,194 NDD cases from Decipher and iHART and 2,504 neurotypical controls from the 1000 Genomes Project, which validated the association of 152 dosage sensitive driver genes with risk for NDDs, including 21 novel NDD-risk genes.
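The abstract reports loci "surpassing Bonferroni correction" without naming the underlying association test. As a minimal sketch only, the Python below assumes a one-sided Fisher's exact test on carrier counts in cases versus controls, followed by a Bonferroni threshold. The function names, locus names and all counts are hypothetical toy values, not data from the study.

```python
from math import comb

def fisher_one_sided(case_hits, case_total, ctrl_hits, ctrl_total):
    """One-sided Fisher's exact test (assumed test, not stated in the abstract).

    Returns the probability, under the hypergeometric null, of seeing at
    least `case_hits` CNV carriers among the cases.
    """
    hits = case_hits + ctrl_hits
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    upper = min(hits, case_total)
    numer = sum(
        comb(hits, k) * comb(total - hits, case_total - k)
        for k in range(case_hits, upper + 1)
    )
    return numer / denom

def bonferroni_significant(p_values, alpha=0.05):
    """Flag loci whose p-value passes alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {locus: p <= threshold for locus, p in p_values.items()}

# Hypothetical toy counts: (case carriers, cases, control carriers, controls).
toy_loci = {
    "locus_A": (5, 10, 0, 10),
    "locus_B": (3, 10, 2, 10),
}
p_values = {name: fisher_one_sided(*counts) for name, counts in toy_loci.items()}
significant = bonferroni_significant(p_values)
print(p_values, significant)
```

With these toy counts only locus_A passes the corrected threshold of 0.05 / 2. A real analysis on thousands of loci would divide alpha by the full number of tested regions.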

C-153: Cell deconvolution of ccRCC tissues reveals key predictors of anti-PD-1 treatment outcome
Track: General Computational Biology
  • Florian Jeanneret, CEA, France
  • Pauline Bazelle, CEA, France
  • Sarah Schoch, Lund University, Sweden
  • In Hwa Um, University of St Andrews, United Kingdom
  • Catherine Pillet, CEA, France
  • Assilah Bouzit, CHU Grenoble, France
  • Jean-Alexandre Long, CHU Grenoble, France
  • Jean Luc Descotes, CHU Grenoble, France
  • Delphine Pflieger, CNRS, France
  • Odile Filhol, CEA, France
  • David Harrison, University of St Andrews, United Kingdom
  • Håkan Axelson, Lund University, Sweden
  • Christophe Battail, CEA, France


Presentation Overview: Show

Immune checkpoint blockade (ICB) therapies are now an important tool in the arsenal for the treatment of advanced kidney cancer, prolonging progression-free survival and overall survival. However, only a subset of patients respond to ICB therapies, creating an urgent need for novel approaches to better select patients who may benefit from immunotherapy. Although substantial effort has been devoted to understanding the role of T cells in ICB treatment response, other cell types are also involved in this process. Here, we used primary and metastatic ccRCC samples obtained before ICB treatment and performed cell deconvolution analysis to investigate novel biomarkers of ICB treatment response. We found that several cell types in the TME of metastatic ccRCC samples were highly valuable for defining TME subtypes with significant differences in anti-PD-1 (Nivolumab) treatment response, cancer progression and overall survival. Moreover, differential gene expression analyses between these TME subtypes revealed a 5-gene signature associated with the TME cluster harboring the worst ICB response values. A numerical score was then built to predict the treatment response outcome (overall response rate, ORR) for Nivolumab-treated patients, and it showed strong classification performance (AUC-ROC=0.88) compared to other existing scores (AUC-ROCs ranging from 0.55 to 0.80).
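For readers unfamiliar with the AUC-ROC metric quoted above, the following Python sketch computes it as the fraction of responder/non-responder pairs in which the responder gets the higher score, with ties counted as half. The scores and labels are hypothetical toy values, and the function is a generic illustration rather than the authors' scoring method.

```python
def auc_roc(scores, labels):
    """AUC-ROC via pairwise comparison of positives against negatives.

    labels: 1 for responders, 0 for non-responders (toy convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted scores and observed responses for six patients.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc_roc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.88 and the 0.55 to 0.80 range of existing scores should be read.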

C-154: Trinity of chromatin architects - The coordination of CTCF, RNAPOL2 and Cohesin in shaping the genomic landscape
Track: General Computational Biology
  • Abhishek Aagarwal, Centre of New Technologies, University of Warsaw, Poland
  • Mateusz Chiliński, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Sevastianos Korsak, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Anup Kumar Halder, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  • Dariusz Plewczynski, Centre of New Technologies, University of Warsaw, Poland


Presentation Overview: Show

Genome-wide architectural landscapes of chromatin in the nucleus can be identified by advanced high-throughput sequencing-based 3C-type methods such as Hi-C, ChIA-PET, and HiChIP. The spatial organisation of chromatin in the nucleus is stabilised by structural proteins, such as the CCCTC-binding factor (CTCF), RNAPOL2 and cohesin complex. These proteins play an essential role in establishing long-range chromatin interactions (chromatin loops), facilitating topologically associating domain formation and allowing for the coordination of genes with their corresponding regulatory elements. Here, we discuss the exact role of CTCF, RNAPOL2 and cohesin in shaping chromatin multiscale three-dimensional architecture, particularly how static architecture defined by CTCF is re-shaped by the dynamical activity of cohesin (LEM: loop extrusion model), and re-organized during transcriptional activity by RNAPOL2. We analyse CTCF, cohesin and RNAPOL2 binding sites that account for the topological regulation of chromatin loops, the dynamics of loop extrusion, and phase separation condensates related to the transcriptional factories.

C-155: DReAmocracy-app: An R-shiny tool to highlight candidate repurposed drugs based on aggregated knowledge from prior drug discovery efforts
Track: General Computational Biology
  • Kyriaki Savva, The Cyprus Institute of Neurology & Genetics, Cyprus
  • Margarita Zachariou, The Cyprus Institute of Neurology & Genetics, Cyprus
  • Marilena Bourdakou, The Cyprus Institute of Neurology & Genetics, Cyprus
  • Nikolas Dietis, Medical School, University of Cyprus, Cyprus
  • George M. Spyrou, The Cyprus Institute of Neurology & Genetics, Cyprus


Presentation Overview: Show

Several computational drug repurposing studies, as well as clinical trials testing drugs in different phases, have highlighted candidate repurposed drugs. To our knowledge, the aggregation of information from previous studies has not been widely exploited. To fill this knowledge gap, we performed a weight-modulated majority voting of the modes of action, initial indications and targeted pathways of the drugs in the Drug Repurposing Hub repository. Our method, DReAmocracy, exploits this information to create frequency tables and, finally, a disease suitability score for each drug from the selected library. This method was applied to Alzheimer’s, Parkinson’s and Huntington’s Disease, and Multiple Sclerosis. A super-reference table with drug suitability scores has been created for the four diseases. Based on this methodology, we will present an R-shiny tool, the DReAmocracy-app, which provides the user with the following options: (1) select or upload the library of drug lists from prior efforts on a specific disease, (2) query a registered drug of interest, and (3) adjust the parameters of the weight-modulated majority voting scheme for scoring a drug in terms of its candidacy against a disease.
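As a rough illustration of the weight-modulated majority voting idea described above, the Python sketch below builds frequency tables of annotations from drugs proposed in prior studies and scores a candidate drug by the weighted frequency of its own annotations. The toy drug records, the category names, the weights and the normalization are all illustrative assumptions and do not reproduce the DReAmocracy implementation, which is an R-shiny tool.

```python
from collections import Counter

def frequency_table(prior_drugs, key):
    """Relative frequency of each annotation value (e.g. a mode of action)
    among drugs highlighted by prior studies."""
    counts = Counter(value for drug in prior_drugs for value in drug[key])
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}

def suitability_score(drug, tables, weights):
    """Weighted sum, over annotation categories, of the frequencies of the
    drug's own annotations (hypothetical scoring rule)."""
    score = 0.0
    for key, weight in weights.items():
        table = tables[key]
        score += weight * sum(table.get(value, 0.0) for value in drug[key])
    return score

# Hypothetical library of drugs proposed by prior repurposing efforts.
prior = [
    {"moa": ["HDAC inhibitor"], "pathway": ["chromatin"]},
    {"moa": ["HDAC inhibitor"], "pathway": ["apoptosis"]},
    {"moa": ["HDAC inhibitor"], "pathway": ["apoptosis"]},
    {"moa": ["kinase inhibitor"], "pathway": ["apoptosis"]},
]
weights = {"moa": 2.0, "pathway": 1.0}  # assumed, user-adjustable weights
tables = {key: frequency_table(prior, key) for key in weights}

drug_a = {"moa": ["HDAC inhibitor"], "pathway": ["apoptosis"]}
drug_b = {"moa": ["kinase inhibitor"], "pathway": ["chromatin"]}
score_a = suitability_score(drug_a, tables, weights)
score_b = suitability_score(drug_b, tables, weights)
print(score_a, score_b)
```

In this toy setting, the drug whose annotations are most common among previously proposed candidates receives the higher score. Changing the entries of `weights` shifts how much each annotation category contributes to the vote.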

C-156: Role of glycans in the interaction of N. meningitidis with human cells: a computational study
Track: General Computational Biology
  • Bernardina Scafuri, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Italy
  • Anna Palmieri, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Italy
  • Anna Marabotti, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Italy


Presentation Overview:

Neisseria meningitidis can cross human endothelial cells, causing meningitis. The species-specific mechanism involves an initial interaction between meningococcal type IV pili (Tfp) and the human CD147 receptor [1].
Using computational methods, we predicted the structure of the meningococcal Tfp, which consists of pilins-E and pilins-V; using GlyProt [2] and Sweet [3] software, we constructed various models of the glycosylated human CD147 receptor and glycosylated CD147 receptor of mouse and chimpanzee as control structures. The interaction between meningococcal Tfp and the various CD147 receptors was simulated with Web servers that performed glycosylated protein-protein docking.
The simulations predicted energetically favourable interactions between the glycan bound to asparagine 186 of human CD147, with Neu5Ac sialic acid (typical human) without fucosylations, and meningococcal Tfp. In contrast, when glycan contains Neu5Gc (typical animal) and fucosylations, no positive interactions were predicted. In the simulations conducted between meningococcal Tfp and glycosylated chimpanzee and mouse CD147 (containing once Neu5Ac and once Neu5Gc), no possible interaction was found between involved glycan and meningococcal Tfp.
This study emphasises the importance of glycans in this interaction and of the sialic acid Neu5Ac present on glycan antennae.

References
[1] Bernard et al. Nat Med. 2014 Jul;20(7):725-31.
[2] http://www.glycosciences.de/modeling/glyprot/php/main.php
[3] http://www.glycosciences.de/modeling/sweet2/doc/index.php

C-157: Predicting the effects of SARS-CoV-2 VoCs on human antibody interaction
Track: General Computational Biology
  • Nancy D'Arminio, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano (SA), Italy, Italy
  • Bernardina Scafuri, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano (SA), Italy, Italy
  • Deborah Giordano, National Research Council, Institute of Food Science, Avellino, Italy, Italy
  • Angelo Facchiano, National Research Council, Institute of Food Science, Avellino, Italy, Italy
  • Anna Marabotti, Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano (SA), Italy, Italy


Presentation Overview:

Since the outbreak of SARS-CoV-2 in November 2019, several variants of concern (VOCs) have appeared and spread rapidly worldwide, particularly Omicron sub-lineages such as BA.1, BA.2, BA.3, BA.4 and BA.2.75, the latter unofficially known as Centaurus [PMID: 36366461]. We compiled a non-redundant dataset containing 172 representative spike ensembles bound to different antibodies, manually selected from the PDB database (www.rcsb.org). For each of them, we automatically modelled the variants mentioned above, despite all the anomalies in the PDB files [PMID: 37031054]. We considered the H-bonds and hydrophobic interactions between each different chain of the non-mutated spike and the antibody in the selected complexes using LigPlot [PMID: 21919503] and detected the ionic interactions with an in-house developed Perl script. The complete analyses were supported by R programming (https://www.r-project.org). Lastly, we computed the interactions of the variants and compared them with those of the prior version. This allowed us to determine whether the modifications could alter the way spike chains and antibodies bind to each other. Our results provide information on the behavior of mutant spikes in the presence of antibodies, which is important for both therapeutic development and the evaluation of vaccine efficacy.

C-158: Bioinformatics framework to study molecular and cellular similarity between 3D cell cultures and tissues
Track: General Computational Biology
  • Thierno Sidy Bah, CEA, France
  • Florian Jeanneret, CEA, France
  • Pauline Bazelle, CEA, France
  • Christophe Battail, CEA, France


Presentation Overview:

In recent years, 3D cell cultures (organoids and tumor spheroids) have generated great interest in biological engineering because they seem to allow a better representation of biological complexity than monolayer cell cultures. However, few works have yet focused on the detailed analysis of the levels of molecular and cellular similarity between different 3D culture models and with the corresponding reference tissues. We will present our framework integrating several bioinformatics approaches and their automatisation to facilitate the simultaneous analysis of multiple experimental conditions. We will also show the challenges associated with the comparison of transcriptomic proximity between 3D cultures and reference tissues, simulated from single-cell RNA-seq data, and the strategies we have implemented to investigate this question.

C-159: Application of molecular networking for comparing multiple LC/MS profiles from herbal medicines
Track: General Computational Biology
  • Dongyeop Jang, Gachon University, South Korea
  • Sarah Shin, Korea Institute of Oriental Medicine, South Korea
  • Seulgi Lee, Korea Institute of Oriental Medicine, South Korea
  • Chang-Eop Kim, Gachon University, South Korea
  • Jeeyoun Jung, Korea Institute of Oriental Medicine, South Korea


Presentation Overview:

Herbal medicines, widely employed in ethnomedicine including traditional Asian medicine, are often derived from a mixture of herbs, considered to impart a synergistic effect beyond the capabilities of single-herb extracts. To elucidate the metabolite-level differences between these complex herbal extracts, we employed molecular networking applied to LC/MS profiles obtained from the herbal medicines. Molecular networking provides a tool to visualize structural similarities among precursors in these profiles, and to uncover chemical alterations that arise from the extraction process involving multiple herbs.
To further explore the differences, we quantified the fold-change in precursor abundance between profiles, and assessed the statistical significance of these fold-changes using bootstrap sampling. This approach helps in managing variability in abundance measurements that may be attributed to factors such as ionization efficiency, chromatographic separation, matrix effects, and instrumental variability.
In our study comparing PM and YM extracts, where PM is a composite of eight herbs and YM is derived from six of the eight herbs used in PM in traditional Asian medicine, we revealed significant metabolite-level changes that occur when multiple herbs are co-extracted. Our results underscore the complex chemical interplay in mixed herbal medicines and highlight the value of molecular networking in understanding these interactions at a metabolite level.
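
The bootstrap assessment of fold-changes described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each precursor has several replicate abundance measurements per extract, uses a percentile bootstrap on the log2 ratio of means, and the numbers are invented.

```python
import math
import random

def bootstrap_log2_fold_change(a, b, n_boot=2000, seed=0):
    """Observed log2(mean(b)/mean(a)) with a 95% percentile bootstrap interval."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = math.log2(mean(b) / mean(a))
    boots = []
    for _ in range(n_boot):
        # Resample replicates with replacement within each profile.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        boots.append(math.log2(mean(resampled_b) / mean(resampled_a)))
    boots.sort()
    low = boots[int(0.025 * n_boot)]
    high = boots[int(0.975 * n_boot) - 1]
    return observed, low, high

# Hypothetical replicate abundances of one precursor in two extracts.
extract_1 = [100.0, 110.0, 95.0, 105.0]
extract_2 = [400.0, 420.0, 390.0, 410.0]
observed, low, high = bootstrap_log2_fold_change(extract_1, extract_2)
```

A fold-change would be called significant in this sketch when the interval excludes zero; any multiple-testing correction across precursors is left out.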

C-160: Improving cell-type identification with Gaussian noise-augmented single-cell RNA-seq contrastive learning
Track: General Computational Biology
  • Ibrahim Alsaggaf, Department of Computer Science and Information Systems, Birkbeck, University of London, United Kingdom, United Kingdom
  • Daniel Buchan, Department of Computer Science, University College London, United Kingdom, United Kingdom
  • Cen Wan, Department of Computer Science and Information Systems, Birkbeck, University of London, United Kingdom, United Kingdom


Presentation Overview:

Cell-type identification is an important task for single-cell RNA-seq data analysis. Due to the recent successes of contrastive learning, we propose a novel contrastive learning-based cell-type identification method GsRCL. The experimental results suggest that GsRCL successfully obtained state-of-the-art performance and outperformed other well-known cell-type identification methods.
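
As a rough illustration of what Gaussian noise augmentation in contrastive learning can look like, the sketch below builds two noisy views of each expression profile and evaluates an InfoNCE-style loss on them. It is a generic example under stated assumptions, not the GsRCL implementation: the noise level, temperature, loss form and toy expression vectors are all hypothetical, and no encoder network is trained.

```python
import math
import random

def noisy_view(expression, sigma, rng):
    """Augment one expression profile by adding Gaussian noise to every gene."""
    return [value + rng.gauss(0.0, sigma) for value in expression]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, candidates, positive_index, temperature=0.5):
    """Negative log-probability of picking the positive view among all candidates."""
    logits = [cosine(anchor, c) / temperature for c in candidates]
    log_norm = math.log(sum(math.exp(l) for l in logits))
    return log_norm - logits[positive_index]

rng = random.Random(0)
# Hypothetical toy profiles: three cells, four genes.
cells = [[5.0, 0.0, 1.0, 0.0], [0.0, 4.0, 0.0, 2.0], [1.0, 0.0, 6.0, 0.0]]
views_a = [noisy_view(c, 0.1, rng) for c in cells]
views_b = [noisy_view(c, 0.1, rng) for c in cells]
# Each cell's second view is the positive; the other cells' views are negatives.
losses = [info_nce(views_a[i], views_b, i) for i in range(len(cells))]
```

In a real model the views would first pass through an encoder, and the loss would be minimised so that views of the same cell end up close in the embedding space.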

C-161: A metatranscriptomic approach to study the effect of multiple stressors on the microbial community involved in CPOM degradation in freshwater habitats
Track: General Computational Biology
  • Aman Deep, Biodiversity, University of Duisburg-Essen, Germany
  • Tom Stach, Environmental Metagenomics Research Center, One health Ruhr University Alliance Ruhr, Germany
  • Jens Boenigk, Biodiversity, University of Duisburg-Essen, Germany
  • Daniela Beisser, Biodiversity, University of Duisburg-Essen, Germany


Presentation Overview:

The degradation of coarse particulate organic matter (CPOM) in streams largely depends on microbes, specifically fungi and bacteria. However, the effect of environmental factors, such as temperature and salinity, on the metabolic pathways and enzymatic reactions of these degraders is undetermined. Our aim is to elucidate the role of different fungal and bacterial groups in the enzymatic decomposition of CPOM in the Emscher/Boye and Kinzig catchments, with and without stressors. We use DNA stable isotope probing (DNA-SIP), amplicon, and metatranscriptomic sequencing to study the taxonomical and functional diversity involved in leaf litter degradation. To determine the active taxa involved in degradation, we have used 13C-labeled alder leaves. The DNA of the active degraders of 13C leaves will be extracted by DNA-SIP. These targeted organisms will be further analyzed for metabolic pathways and enzymatic reactions. To analyze the metatranscriptomic sequences, we built a pipeline to preprocess the eukaryotic and prokaryotic mRNA and map it to databases like Mycocosm, CAZy, and NCBI for taxonomic and functional information. While studying the effect of multiple stressors, such as temperature, salinity, and flow velocity, we are testing the hypothesis that function recovers faster than community due to functional redundancy.

C-162: Multi-task transfer learning-based polygenic risk score modeling for identifying individuals with high risk of Alzheimer’s disease
Track: General Computational Biology
  • Sang-Hyuk Jung, University of Pennsylvania, United States
  • Jaeyoung Kim, Sungkyunkwan University, South Korea
  • Hong-Hee Won, Sungkyunkwan University, South Korea
  • Li Shen, University of Pennsylvania, United States
  • Dokyoon Kim, University of Pennsylvania, United States


Presentation Overview:

Alzheimer's disease (AD) is a complex neurodegenerative disorder with multiple underlying biological pathways and mechanisms. Recent studies have shed light on these pathways and mechanisms, yet there remains a need to comprehensively integrate genetic susceptibility from multiple pathways for the effective prediction of AD.
We propose a novel approach based on transfer learning and multi-task polygenic risk score modeling (MT-PRS). The proposed MT-PRS involves learning key biomarkers (Aβ [A], tau processing [T], and neurodegeneration [N]) associated with the development of AD and exploring the role of pathway-specific genetic susceptibility in AD. In addition, we performed a conditional analysis according to APOE status.
We identified significant PRSs for AD and A/T/N biomarkers, with the proposed MT-PRS outperforming these predictors. Moreover, we identified significant pathways associated with AD not only with A/T/N but also with immunity and endocytosis. The identified pathways contributed to predicting AD risk after adjusting for APOE status.
In conclusion, our findings emphasize the usefulness of our approach, which takes into account additional biological pathways, in predicting AD and enhancing biological resolution. The proposed MT-PRS holds the potential to enhance the current understanding of the genetic basis of AD and provide novel insights into personalized prevention and treatment strategies.

C-163: GraphFusion: An Intuitive Web-Based Graph Analytics, Fusion, and Visualization Tool
Track: General Computational Biology
  • Carlos Garcia-Hernandez, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain, Spain
  • Noël Malod-Dognin, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain, France
  • Nataša Pržulj, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain, Serbia


Presentation Overview:

We developed GraphFusion, a light, modular, and scalable platform as a web application to combine several graph analysis methodologies and data fusion tools. It works as a graph analytics and visualization instrument that helps users explore, analyze, and visualize complex networks and graphs. It is an ideal solution for working with social networks, biological networks, and other graph datasets.

GraphFusion's user-friendly interface allows users to readily load and manipulate datasets in a wide range of models or representations, such as undirected graphs, directed graphs, probabilistic graphs, simplets, and hypergraphs.

GraphFusion also includes advanced graph inspection and comparison algorithms to work with, including pairwise analysis, data versus model analysis, network alignment, node clustering, annotations-based enrichment analysis, and a flexible data-fusion section to frame any joint NMF-based (Non-negative Matrix Factorization) optimization problems without having to code the optimization procedure. Those tools can help identify essential nodes and highlight features and communities within a graph or group of related graphs.

Overall, GraphFusion is a powerful and intuitive interface ideal for researchers and data analysts who need to study and visualize complex graph datasets with easy-to-use analysis algorithms and visualization capabilities.

C-164: Exploring cytokine pathways as potential determinants of multimorbidity: a Mendelian randomisation study
Track: General Computational Biology
  • Nikita Hukerikar, Institute of Health Informatics, Faculty of Population Health, University College London, London, UK, United Kingdom
  • Floriaan Schmidt, Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, UK, United Kingdom
  • Chris Finan, Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, UK, United Kingdom
  • Aroon Hingorani, Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, UK, United Kingdom
  • Folkert Asselbergs, Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, UK, United Kingdom
  • Sandesh Chopade, Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, UK, United Kingdom


Presentation Overview:

Multimorbidity, the co-occurrence of two or more distinct diseases, presents a challenge in research and healthcare. Cytokines and their regulators play an important role in acute and chronic inflammatory and immune responses. Moreover, they are pleiotropic, thereby having implications for multimorbid diseases. We aimed to characterise the role of cytokines as determinants of multimorbidity, and identify druggable cytokines relevant for therapeutic interventions in multiple diseases.

We leveraged Mendelian randomisation analysis to identify associations between 139 immune-proteins and 64 traits relevant in multimorbidity. The multiplicity corrected results were annotated with information on druggability, licensed indications and trait-tissue associations, creating a knowledge graph. This network was queried to identify cytokine communities and to identify a prioritised subset of pleiotropic cytokines.

Diseases affected by multiple proteins included inflammatory bowel disease (IBD), lung cancer and type 2 diabetes. 24 proteins were prioritised based on their graph importance, including the druggable proteins C4B, CX3CL1 and IL1RN. Diseases and traits predominantly affected by these proteins included IBD, dementia, cholesterol and asthma.

We found strong genetic support for plasma cytokines partially determining the onset of multimorbidity and identified druggable proteins which might be pursued in drug development programs.

C-165: Cell shape characterization, alignment and comparison using FlowShape
Track: General Computational Biology
  • Casper van Bavel, KU Leuven, Belgium
  • Wim Thiels, KU Leuven, Belgium
  • Rob Jelier, KU Leuven, Belgium


Presentation Overview:

The shape of a cell is tightly controlled, and plays a crucial role during processes such as morphogenesis.
Cell shape is strongly linked to the differentiation of the cells and can further be used to infer important cellular properties such as force generation, cortical tension and adhesion properties.
Modern microscopy and image processing technologies have made it easier than ever to access accurate, high-resolution cell shapes.

We propose FlowShape, a new framework to study the shape of cells.
Our method represents shapes as a single function on a sphere: the mean curvature.
This function is then characterized by decomposing it into a sum of simple functions called spherical harmonics.
We applied these methods to real data from C. elegans embryos.
We show how this decomposition can be applied in a variety of applications:
aligning cells by finding the optimal rotation that matches their shapes,
clustering cells into groups with similar shapes,
scanning for structural features such as lamellipodia and
statistically determining phenotypes that show changes in cell shape.
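
Decomposing a function on the sphere into spherical harmonics can be sketched in a few lines. The example below is not FlowShape itself: it uses a toy function in place of a measured mean curvature, only the real harmonics of degree 0 and 1, and a simple quadrature over roughly evenly spaced points (a Fibonacci lattice), all of which are assumptions made for illustration.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spread unit vectors (x, y, z)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

# Real spherical harmonics of degree 0 and 1, written in Cartesian form.
C0 = 0.5 / math.sqrt(math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))
BASIS = {
    (0, 0): lambda x, y, z: C0,
    (1, -1): lambda x, y, z: C1 * y,
    (1, 0): lambda x, y, z: C1 * z,
    (1, 1): lambda x, y, z: C1 * x,
}

def harmonic_coefficients(f, n_points=2000):
    """Approximate c_lm = integral of f * Y_lm over the sphere by quadrature."""
    points = fibonacci_sphere(n_points)
    area = 4.0 * math.pi / n_points  # equal area weight per sample point
    return {
        lm: area * sum(f(*p) * basis(*p) for p in points)
        for lm, basis in BASIS.items()
    }

def toy_curvature(x, y, z):
    """Hypothetical stand-in for a mean-curvature function on the sphere."""
    return 2.0 + 3.0 * z

coeffs = harmonic_coefficients(toy_curvature)
```

The resulting coefficient vector is the kind of compact descriptor that can then be compared, clustered or rotated; a practical analysis would use many more degrees than shown here.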

C-166: Inferring mutational signatures from distinct copy-number events
Track: General Computational Biology
  • Tom L Kaufmann, Max Delbrueck Center for Molecular Medicine, Germany
  • Roland F Schwarz, Cancer Research Center Cologne Essen (CCCE), University Hospital and University of Cologne, Germany


Presentation Overview:

Somatic copy number alterations include large-scale events, such as chromosome arm-level gains and losses, as well as focal amplifications and deletions, and play a key role in the evolutionary processes that shape cancer genomes. In the case of small-scale events such as point mutations and indels, there exists a list of established mutational signatures that can be linked to distinct exogenous or endogenous exposures such as tobacco use. Despite previous efforts, accurate and meaningful copy-number signatures are still elusive. The biggest obstacle in creating copy-number signatures is that, due to their cascading nature, traditional segment-based representations of copy number do not reveal individual evolutionary events.
Here we introduce a new method for deriving copy-number signatures that explicitly models evolutionary copy-number events. We derive these events using a minimum evolution framework based on our phylogenetic copy-number model MEDICC2 (Kaufmann 2022, Genome Biology) and employ a probabilistic approach to resolve ambiguous evolutionary trajectories.
We demonstrate the power of our approach on an independent simulation of mutational processes and real world data from 2,778 tumors from the Pan Cancer Analysis of Whole Genomes and demonstrate how the extracted copy-number signatures reveal novel insights into the nature of the mutational processes shaping cancer genomes.

C-167: The Neglected Giants: Uncovering the Prevalence and Functional Groups of Huge Proteins
Track: General Computational Biology
  • Aníbal Segurilho Amaral, Centro Andaluz de Biología del Desarrollo (CABD), Spain
  • Damien Devos, Centro Andaluz de Biología del Desarrollo (CABD), Spain


Presentation Overview:

In this study, we focused on an often-overlooked extreme aspect of biology: the outliers of the protein length distribution, specifically those with more than 5000 amino acids, which we refer to as Huge Proteins. By examining UniProtKB we discovered more than 41,000 Huge Proteins, the majority in Eukaryotes and a significant proportion in Prokaryotes. The phyla with the highest propensity for Huge Proteins are Apicomplexa and Fornicata. Moreover, we observed that certain Bacteria, mostly members of the PVC superphylum, have the same tendency for possessing Huge Proteins as the average Eukaryote. To investigate if these proteins represent “real” proteins, we explored several indirect metrics, finding that the vast majority most likely exist. Additionally, we examined the orthologs of these proteins and identified around 7,000 clusters of homologous sequences, revealing functional groups related to key cellular processes such as cytoskeleton organization, or regulating transcription and translation. For Bacteria, the major clusters have functions related to Non-Ribosomal peptide synthesis/Polyketide synthesis, pathogen-host attachment or recognition surface proteins. Further exploration of the domain annotations supported the previously identified functional groups. These findings underscore the need for further investigation of the cellular and ecological roles of these remarkable proteins and their potential impact and applications.

C-168: Systematic discovery of directional regulatory motifs associated with human insulator sites
Track: General Computational Biology
  • Naoki Osato, Waseda University, Japan
  • Michiaki Hamada, Waseda University, Japan


Presentation Overview:

Chromatin interactions are essential in enhancer-promoter interactions (EPIs) and transcriptional regulation. The transcription factor (TF) CTCF, which binds to chromatin interaction anchors, is the main insulator protein for EPIs in vertebrates. However, there is still no overall understanding of the TFs and proteins involved in chromatin interactions and insulator functions. Here, we describe a systematic and comprehensive deep-learning-based approach to identify the DNA-binding motifs of such TFs. We discover 99 directional and non-directional biases of motifs in human fibroblast cells, which include those of 23 TFs related to an insulator function, CTCF, and/or other transcriptional regulations in previous studies. The estimated CTCF orientation bias is consistently proportional to the CTCF orientation rate at chromatin interaction anchors. Non-directional motifs consist only of palindromic motifs of TFs and their interacting TF. These findings reveal that the directional bias of motifs is associated with insulator functions and other chromatin regulations potentially through structural interactions.

C-169: Causal relationships between human mobility and the spread of Covid19 in Spain
Track: General Computational Biology
  • Camila Pontes, Barcelona Supercomputing Center, Spain
  • Miguel Ponce de León, Barcelona Supercomputing Center, Spain
  • Alexandre Arenas, Universitat Rovira i Virgili, Spain
  • Alfonso Valencia, Barcelona Supercomputing Center, Spain


Presentation Overview:

Human mobility is known to be a key factor in the spread of infectious diseases. During the Covid19 pandemic, the rapid spread of the virus caused healthcare systems to collapse in many countries, contributing to a large number of deaths. To avoid these undesirable outcomes, understanding the causal relationships between commuting flows and the spread of infectious diseases is crucial. With this objective in mind, we applied an information theoretic approach called Transfer Entropy, TE(X,Y), to measure the directed influence of the mobility-associated risk on patch X over the Covid19 incidence on another patch Y over time. We first validated our approach using simulated epidemiological data generated by a SIR model called EpiCommute. We then calculated the TE between all provinces in Spain using the time-series data from the Spanish cross-referenced Covid-19 Flow-Maps geographic information system. As a result, we identified the main drivers of the pandemic at each time period, spotting important known epidemiological events such as the outbreak in Lleida during the Summer of 2020 caused by the incoming flow of temporary workers. These results help clarify how human mobility contributes to the dynamic spread of infectious diseases and can be used to inform future non-pharmaceutical interventions.
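
For readers unfamiliar with transfer entropy, the sketch below estimates a lag-1 TE(X→Y) from two discrete time series by simple counting. It is a minimal illustration and not the estimator used in the study: the one-step lag, the plug-in probability estimates and the synthetic binary series are assumptions, and real mobility-risk and incidence series would first need to be discretised or handled with a continuous estimator.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Lag-1 transfer entropy TE(X -> Y) in bits for two discrete series."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_next, y_now)
    singles = Counter(y[:-1])                      # y_now
    n = len(y) - 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        # p(y_next | y_now, x_now) versus p(y_next | y_now)
        p_full = count / pairs_yx[(y_now, x_now)]
        p_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_full / p_self)
    return te

# Synthetic check: y copies x with a one-step delay, so information flows X -> Y.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

In this synthetic case the estimate in the driving direction is close to one bit and the reverse direction is close to zero, which is the asymmetry that makes the measure useful for ranking candidate source regions.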

C-170: Modality agnostic automated integration and annotation of single-cell datasets
Track: General Computational Biology
  • Sikander Hayat, Uniklinik RWTH Aachen, Germany
  • Yang Xu, BROAD Institute, United States


Presentation Overview:

The single-cell revolution has made it possible to investigate heterogeneity in biological systems at single-cell resolution. Multiple modalities and computational methods are available to analyse such datasets. These datasets cover a wide range of biological scenarios such as tissue development, perturbation, and disease phenotypes. As there are no well-established protocols to automatically annotate and optimally integrate these datasets, it is challenging to leverage their full potential for systematic data-driven discovery of disease signatures. For example, different research groups annotate their cell-types manually and the importance of marker genes employed is not shared. This leads to a situation where similar cell-types can be annotated differently using different sets of markers. Furthermore, most existing tools for data integration are not yet interpretable. Moreover, these methods are computationally expensive to use as they require GPUs to perform efficiently. This puts some of these methods out of the reach of researchers without access to expensive computational hardware. To address interpretability, reproducibility and scalability, we have developed a set of tools for automatic annotation (MACA), and integration of different modalities (MASI). I will present our benchmark studies and our newer graph-based approaches to integrate spatial transcriptomics, single-cell chromatin accessibility, DNA methylation, and histone modification data.

C-171: Investigating the Potential of Incorporating Large Language Models into ML/DL Approaches for Enhanced Prediction of Allosteric Sites in Proteins
Track: General Computational Biology
  • Moaaz Ur Rehman Azhar Khokhar, Koc University, Turkey
  • Attila Gürsoy, Koc University, Turkey
  • Özlem Keskin, Koc University, Turkey


Presentation Overview:

Allostery is the process by which binding at one site perturbs a distant site. Allosteric drugs activate or inhibit proteins and offer advantages over non-allosteric drugs. However, the identification of allosteric sites is challenging due to their distance and lack of conservation across protein structures. Machine learning (ML) approaches have been employed to predict allosteric sites, but the performance of these methods needs further improvement. This research investigates the potential of incorporating Large Language Models (LLMs) such as ProtBERT into ML/DL approaches for better prediction of allosteric sites. Preliminary results show that small Multi Layer Perceptrons (MLPs) without LLMs can achieve an F1 score of up to ~50%. This study contributes to research on protein structure and function prediction, potentially enabling identification of allosteric sites for drug discovery and protein engineering.

C-172: The sex-specific genetic architecture of childhood asthma
Track: General Computational Biology
  • Amelie Fritz, Technical University Denmark, Denmark
  • Klaus Bønnelykke, Copenhagen Prospective Studies on Asthma in Childhood, Copenhagen University Hospital, Herlev-Gentofte, Denmark
  • Anders Gorm Pedersen, Technical University Denmark, Denmark


Presentation Overview:

Childhood asthma is the most common reason for hospitalization in early childhood. From epidemiological studies, it is evident that the prevalence is higher in boys than girls. After puberty, it is more prominent in women than men. The heritability of childhood asthma is estimated to be between 60% and 90%. This suggests that the genetic components driving the development of childhood asthma have a sex-specific effect. Yet, most association studies do not consider sex in their analysis.

In this project, a Bayesian logistic regression model with a variant-sex interaction term was developed to identify SNPs that have a sex-specific effect on childhood asthma. Discovery studies were conducted in a dataset of 1189 children with severe asthma from Copenhagen Prospective Studies on Asthma in Childhood and 5094 controls. 77 variants have a posterior probability of interaction higher than 95%. Sex-stratified analysis confirms the sex-specific effect in both data sets.
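
A logistic regression with a variant-sex interaction term is commonly written in the form below. This is a generic textbook formulation given for orientation, not the authors' exact specification: the genotype coding, any additional covariates and the priors placed on the coefficients are not stated in the abstract and are left open here.

```latex
\operatorname{logit}\,\Pr(y_i = 1)
  = \beta_0 + \beta_g\, g_i + \beta_s\, s_i + \beta_{gs}\, g_i s_i
```

Here $y_i$ is the asthma status of child $i$, $g_i$ the genotype at the tested variant (for example, an allele count), $s_i$ the sex indicator, and $\beta_{gs}$ the interaction coefficient whose posterior distribution indicates whether the variant's effect differs between the sexes.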

Variants are found to be part of the genes IL1R1 and CLEC16A, known for being associated with asthma previously, and 4 of the top 9 interacting SNPs are expressed in lung tissue.

C-173: Downstream analysis from single and multi-omics data from Parkinson’s Disease patients
Track: General Computational Biology
  • Efi Athieniti, Cyprus Institute of Neurology and Genetics, Cyprus
  • George M. Spyrou, Cyprus Institute of Neurology and Genetics, Cyprus


Presentation Overview:

This project aims to extract molecular markers and pathogenic mechanisms associated with the progression of Parkinson’s Disease (PD) from the analysis of multi-omics (MO) datasets. We utilise the Parkinson’s Progression Marker Initiative dataset which includes multi-omics datasets for PD patients (blood RNA, miRNA and plasma proteomics).

We obtain pathway enrichment analysis (PEA) results using a single modality enrichment analysis method and two integrative PEA tools: MO Gene Set Analysis (MOGSA) and Multi-Omics Factors Analysis (MOFA). The tool MOGSA produces an integrative pathway enrichment analysis from a combined set of differentially expressed features. The tool MOFA uses dimensionality reduction to produce an integrated view of the three modalities and then performs pathway analysis on this combined output.

We present the enriched processes obtained from the single modality enrichment analysis and the two integrative methodologies to highlight disease mechanisms. We find a high overlap between the pathways obtained from the three methods. The integration methods allow us to re-rank and prioritise pathways that are important across all layers. In addition, pathways with low significance from only one omics layer are discarded, allowing a smaller more confident set to be obtained.

C-174: Multi-layer analysis on imaging and proteomics data from patients with Mild Cognitive Impairment and Alzheimer’s disease
Track: General Computational Biology
  • Sotiroula Afxenti, The Cyprus Institute of Neurology and Genetics, Cyprus
  • Margarita Zachariou, The Cyprus Institute of Neurology and Genetics, Cyprus
  • Marios Tomazou, The Cyprus Institute of Neurology and Genetics, Cyprus
  • Nestoras Karathanasis, The Cyprus Institute of Neurology and Genetics, Cyprus
  • Nancy Lambrianides, The Cyprus Institute of Neurology and Genetics, Cyprus
  • Marios Pantzaris, The Cyprus Institute of Neurology and Genetics, Cyprus
  • George M. Spyrou, The Cyprus Institute of Neurology and Genetics, Cyprus


Presentation Overview:

Integrating the abundance of biomedical information available for a disease is a great challenge, yet it can help to better understand the underlying mechanisms and build more comprehensive profiles. This study aims to develop a computational framework that integrates multi-source data methods and network-based approaches for more precise diagnostic and therapeutic approaches.
To do so, Alzheimer’s disease (AD) is used as a case study. Subjects with normal cognition (CN), mild cognitive impairment (MCI) and AD are collected. MRI measurements, protein expression data and clinical assessments are obtained from the AD Neuroimaging Initiative (ADNI) database.
Single-layer and multi-layer analyses are conducted to obtain molecular and biomedical profiles. List2Net, an in-house tool that represents lists in a network context, is used to create subject-to-subject networks for all different combinations of the CN, MCI and AD groups based on their within-layer and across-layer correlation. Multi-omics factor analysis (MOFA) and the mixOmics tool will be used to obtain an integrated vector of brain imaging and protein expression data. Graph clustering methods are applied to both the single-layer and the multi-layer generated networks and are evaluated against the label-based clustering to assess the contribution of each approach.

C-175: DNA from multiple viral species are associated with Alzheimer’s disease risk
Track: General Computational Biology
  • Marlene Tejeda, Boston University School of Medicine, United States
  • John Farrell, Boston University School of Medicine, United States
  • Congcong Zhu, Boston University School of Medicine, United States
  • Lee Wetzler, Boston University School of Medicine, United States
  • Kathryn Lunetta, Boston University School of Medicine, United States
  • William S. Bush, Case Western Reserve University School of Medicine, United States
  • Eden R. Martin, University of Miami, United States
  • Li-San Wang, University of Pennsylvania, United States
  • Gerard Schellenberg, University of Pennsylvania School of Medicine, United States
  • Margaret A. Pericak-Vance, University of Miami, United States


INTRODUCTION: In this study, we investigated the association between various viral agents and the risk of Alzheimer's disease (AD). We examined a large sample of AD cases and controls by comparing the quantity of viral reads identified in their DNA samples.

METHODS: We used both whole exome sequencing (WES) and whole genome sequencing (WGS) datasets and selected DNA sequence reads that did not align to the human genome, mapped them to viral reference sequences, quantified them, and tested them for association with AD.

RESULTS: Our results showed that several viruses were significant predictors of AD based on machine learning classifiers. Subsequent regression analyses showed that HSV-1 (OR=3.71, P=8.03×10⁻⁴) and HPV-71 (OR=3.56, P=0.02) were significantly associated with AD after Bonferroni correction. The quantity of reads from the phylogenetic family Herpesviridae was significantly associated with AD in several strata of the data (P<0.01). Utilizing a novel propensity score matching algorithm, we found a significant association between HSV-1 and AD (OR=1.10, P=0.02) using a regression model on a sample of 5828 AD cases and 6487 controls.

DISCUSSION: Overall, our findings support the hypothesis that viral infection, particularly HSV-1, is linked to AD risk.

C-176: Prediction and differential analysis of chromatin compartments from Hi-C data
Track: General Computational Biology
  • Cyril Kurylo, INRAE, France
  • Elise Maigné, INRAE, France
  • Matthias Zytnicki, INRAE, France
  • Sylvain Foissac, INRAE, France


Like gene expression, the 3D structural organization of animal genomes varies widely across cell types or environments. Chromosome compartmentalization, for instance, characterizes genomic regions of different properties: active transcription and open chromatin for “A” compartments, compact chromatin and low gene expression for “B” compartments. Available computational tools can process chromosome conformation capture “Hi-C” data from a single experiment and assign compartment types to genomic regions, usually using a PCA-based dimensionality reduction. Much remains to be done to accurately detect compartmentalization differences between groups of samples.
Here we present HiCDOC, a Hi-C data analysis method for the automatic identification and comparison of A/B compartments between groups of Hi-C matrices. Unlike traditional PCA-based methods, HiCDOC performs a constrained K-means clustering to assign A or B compartments to genomic regions in multiple datasets simultaneously, using information from biological replicates to enhance accuracy. A statistic reflects the prediction confidence at each position and identifies regions with significant compartment differences (A=>B or B=>A) between experimental groups. First results show that HiCDOC compares favorably with dcHiC, the only other tool with similar functionalities.
HiCDOC is available as an R Bioconductor package: https://github.com/mzytnicki/HiCDOC

C-177: Variant impact based patient similarity networks for cancer subtype analysis
Track: General Computational Biology
  • Hakime Öztürk, DKFZ, Germany
  • Nagarajan Paramasivam, DKFZ, Germany
  • Simon Kreutzfeldt, NCT, Germany
  • Peter Horak, NCT, Germany
  • Christoph Heilig, NCT, Germany
  • Stefan Fröhling, NCT, Germany
  • Daniel Huebschmann, DKFZ, Germany
  • Oliver Stegle, DKFZ, Germany


Computational methods that decipher rare and private somatic changes can provide critical insights into the underlying mechanisms of cancer development and progression. Identifying potential cancer subtypes that might be associated with diverse biological responses is a key first step to define target therapeutics.

Machine and deep learning (ML/DL) methods that use clinical and/or multi-omics data have been adopted for the identification of cancer subtypes. There also exists a growing collection of sequence-based ML/DL models that accurately predict different epigenetic traits (e.g. transcription factor binding), and allow for estimating the impact of individual somatic aberrations. Applying sequence-based ML/DL on a genome-wide scale makes it possible to augment somatic mutations with a model-based view that captures functionally relevant differences between individuals.

In this study, we adopt SEI, a sequence-based DL model that is trained to predict more than 21K different regulatory activities, to obtain mutation impact embeddings. We first identify mutations with strong impacts through investigating clusters of alternative and reference sequence embeddings. Then, mutation impact embeddings are utilized to generate a patient similarity network (PSN) for unsupervised identification of patient subgroups. The proposed approach provides a novel strategy of utilizing variant impact scores in PSNs for cancer subtyping.

C-178: Prevalence of clonal hematopoiesis of indeterminate potential in World Trade Center first responders
Track: General Computational Biology
  • Myvizhi Esai Selvan, Icahn School of Medicine at Mount Sinai, United States
  • Pei-Fen Kuan, Stony Brook University, United States
  • Xiaohua Yang, Stony Brook University, United States
  • Robert J. Klein, Icahn School of Medicine at Mount Sinai, United States
  • Paolo Boffetta, Stony Brook University, University of Bologna, United States
  • Benjamin J. Luft, Stony Brook University, United States
  • Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai, United States


In the aftermath of the terrorist attacks on the World Trade Center (WTC), the first responders received intense exposure to a complex mix of airborne carcinogens that elevated their cancer risk. However, the development of hematologic malignancy is not well studied. With current molecular genomic testing methods, acquired genetic alterations in hematopoietic precursor cells can be detected even prior to overt hematological manifestations. This finding has been defined as clonal hematopoiesis of indeterminate potential (CHIP). We hypothesized that exposure to WTC debris may have led to CHIP-specific mutations.

This study aims to determine whether i) prevalence of CHIP is elevated in WTC responders and ii) CHIP mutations are associated with phenotypes such as age, ancestry, smoking, WTC debris exposure and blood count parameters. To this end, we performed deep whole exome sequencing of blood in 350 WTC responders. We then analyzed CHIP mutations associated with hematologic malignancy.

Consistent with the literature, we found that the prevalence of CHIP increased with age. Furthermore, we observed that the responders exposed to the WTC debris had significantly higher rates of CHIP mutations than unexposed individuals. These findings will aid in the development of specialized cancer screening programs for WTC responders.

C-179: Prevalence of clonal hematopoiesis of indeterminate potential (CHIP) in Inflammatory bowel disease (IBD) patients
Track: General Computational Biology
  • Myvizhi Esai Selvan, Icahn School of Medicine at Mount Sinai, United States
  • Daniel I. Nathan, Icahn School of Medicine at Mount Sinai, United States
  • Giulia Collatuzzo, University of Bologna, Italy
  • Daniela Guisado, Icahn School of Medicine at Mount Sinai, United States
  • Paolo Boffetta, Stony Brook University, University of Bologna, United States
  • Louis J. Cohen, Icahn School of Medicine at Mount Sinai, United States
  • Bridget K. Marcellino, Icahn School of Medicine at Mount Sinai, United States
  • Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai, United States


Clonal hematopoiesis of indeterminate potential (CHIP) refers to the presence of somatic mutations in blood in hematologic malignancy associated genes, but without any clinical evidence of hematologic disease. However, CHIP is a known risk factor for hematologic malignancy and other systemic diseases. Some factors that increase CHIP prevalence include age, smoking and inflammatory conditions. As inflammatory bowel diseases (IBD), including ulcerative colitis (UC) and Crohn’s disease (CD), are characterized by increased inflammation, we hypothesized that individuals with IBD may have elevated rates of CHIP-specific mutations.

This study aims to characterize the role of disease activity or clinical phenotype in the prevalence of CHIP in IBD patients. To this end, we analyzed CHIP mutations from whole exome sequencing data of IBD patients (587 CD and 441 UC) and 293 controls from Mount Sinai’s IBD cohort, and performed CHIP association analysis using multivariate logistic regression. We then validated our results in an independent cohort.

We found that the prevalence of CHIP mutations increased with age, with TET2, DNMT3A, ASXL1 and PPM1D as the top CHIP genes. Interestingly, UC patients had significantly higher levels of CHIP mutations than controls. These findings will aid in the development of CHIP-screening programs for IBD patients.

C-180: Studying relative RNA localization - From nucleus to the cytosol
Track: General Computational Biology
  • Vasilis F. Ntasis, Centre for Genomic Regulation (CRG), Spain
  • Roderic Guigó Serra, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Spain


RNA localization plays a significant role in gene expression regulation. It has been implicated in buffering protein levels from bursty transcription, nuclear size control, protein localization, and even disease. Hence, estimating transcript localization is of major importance. An approach that many studies have traditionally followed to investigate relative nuclear and cytosolic RNA localization is RNA sequencing (RNA-seq) coupled with cellular fractionation. Nevertheless, transcript quantification estimates obtained independently from nuclear and cytosolic RNA cannot be compared, as the total amount of RNA in each of these cellular compartments is usually unknown. Here we show that if, in addition to nuclear and cytosolic RNA-seq, whole-cell RNA-seq is also performed, then accurate estimates of transcript localization can be obtained. We first establish the theoretical basis that supports this by formalizing mathematically the relationship between the different RNA abundances. Based on that, we designed a method that estimates a localization index for every transcript. We evaluated our methodology on simulated data. Finally, we compared transcript localization in different human cell lines using bulk RNA-seq data from the ENCODE project, and attempted to explain the differences based on features known to regulate RNA localization.
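One way to picture why the whole-cell library resolves the ambiguity is the minimal sketch below. It is an illustration under stated assumptions and not the authors' formulation: each library is assumed to give relative abundances that sum to one, and the whole-cell abundance of every transcript is assumed to be a mixture `whole = beta * nuc + (1 - beta) * cyt`, where `beta` is the unknown fraction of total RNA located in the nucleus. The function names and the definition of the index (the nuclear share of each transcript) are hypothetical.

```python
import numpy as np

def estimate_nuclear_fraction(nuc, cyt, whole):
    """Least-squares estimate of beta in whole = beta*nuc + (1-beta)*cyt.

    All inputs are per-transcript relative abundances (assumed to sum to 1
    within each library). This mixture model is an assumption of the sketch.
    """
    nuc, cyt, whole = (np.asarray(x, dtype=float) for x in (nuc, cyt, whole))
    d = nuc - cyt
    denom = np.dot(d, d)
    if denom == 0.0:
        raise ValueError("nuclear and cytosolic profiles are identical")
    beta = np.dot(d, whole - cyt) / denom
    return float(np.clip(beta, 0.0, 1.0))

def localization_index(nuc, cyt, beta):
    """Hypothetical index: share of each transcript's molecules in the nucleus."""
    nuc = np.asarray(nuc, dtype=float)
    cyt = np.asarray(cyt, dtype=float)
    nuclear_part = beta * nuc
    total = nuclear_part + (1.0 - beta) * cyt
    # Transcripts absent from both fractions get NaN instead of a division error.
    return np.divide(nuclear_part, total,
                     out=np.full_like(total, np.nan), where=total > 0)

if __name__ == "__main__":
    nuc = [0.5, 0.3, 0.2]
    cyt = [0.1, 0.3, 0.6]
    whole = [0.2, 0.3, 0.5]   # toy values built with beta = 0.25
    beta = estimate_nuclear_fraction(nuc, cyt, whole)
    print(beta, localization_index(nuc, cyt, beta))
```

Without the whole-cell profile, `beta` cannot be identified from the two fractionated libraries alone, which is the comparison problem the abstract describes.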

C-181: Detection of genetic anomalies in ITR-carrying plasmids using PacBio CCS sequencing
Track: General Computational Biology
  • Ayush Shekhar Saxena, Regeneron Pharmaceuticals Inc., United States
  • Clarisse L. Jose, Regeneron Pharmaceuticals Inc., United States
  • Sarah Wolf, Regeneron Pharmaceuticals Inc., United States
  • Alexander Lopez, Regeneron Pharmaceuticals Inc., United States
  • Sooraz Bylipudi, Regeneron Pharmaceuticals Inc., United States
  • Steven Reisenweber, Regeneron Pharmaceuticals Inc., United States
  • Poulami Samai, Regeneron Pharmaceuticals Inc., United States
  • Sven Moller-Tank, Regeneron Pharmaceuticals Inc., United States
  • Leah Sabin, Regeneron Pharmaceuticals Inc., United States
  • Calvin Chen, Regeneron Pharmaceuticals Inc., United States
  • Sheldon McKay, Regeneron Pharmaceuticals Inc., United States
  • Gurinder S. Atwal, Regeneron Pharmaceuticals Inc., United States


Gene therapy has the potential to address many loss-of-function genetic disorders by inducing wild-type transgene expression of the faulty gene in the affected cell. Adeno-associated viruses (AAVs) are a promising vector to deliver the transgene because they are replication defective and are not associated with any human disease.

The therapeutic AAV genome contains a gene of interest (GOI) in an ITR – GOI – ITR cassette. ITRs (inverted terminal repeats) allow for this cassette to be packaged inside viral capsids. ITRs are the only viral genetic material that is part of the therapy. Our standard QC of the cultured cassettes in plasmids includes restriction digests, capillary electrophoresis, Sanger sequencing, and Illumina sequencing. With long read PacBio CCS sequencing, we demonstrate limitations in existing QC methods in calling ITR variants.

We discovered a heterogeneous population of plasmids with multiple ITR variants, with both the flip and flop alleles in the plasmid and deletions of the hairpin loops. Existing bioinformatic tools are unable to effectively call these variants even using CCS reads. We circumvent these limitations using a custom bioinformatics pipeline. Our work identifies appropriate methods to support AAV production using plasmids.

C-182: Characterizing tumour inflammatory interactions from spatial transcriptomics
Track: General Computational Biology
  • Gabrielle Persad, Ontario Institute of Cancer Research, University of Toronto, Canada
  • Kaitlin Kharas, The Hospital for Sick Children, University of Toronto, Canada
  • Alexandra Rasnitsyn, The Hospital for Sick Children, University of Toronto, Canada
  • Michael Taylor, Texas Children's Hospital, Baylor College of Medicine, United States
  • Lincoln Stein, Ontario Institute of Cancer Research, University of Toronto, Canada


The microenvironment of solid tumours comprises a wide range of innate and adaptive immune cells, on the basis of which the microenvironment can be classified into one of two types: cold or hot. Hot tumours are defined by the presence of tumour-infiltrating T lymphocytes and molecular signatures of immune activation, while cold tumours lack these hallmarks. The clinical significance of these categories is that hot tumours are more likely to respond well to immune checkpoint blockade therapy. To understand the tumour microenvironment, we need to understand the nature and state of individual cells as well as their juxtaposition. For this project, three pediatric cerebellar brain cancers, known to have variable immune recruitment, will be subject to spatial transcriptomics via the 10X Genomics Visium platform and scRNA-seq. Preliminary findings suggest the presence of three groupings: (1) metabolically active non-dividing tumour cells, (2) rapidly growing tumour cells expressing markers for both transcriptional activity and stemness, as well as cell cycle genes, and (3) an immune infiltrating region. Identifying patterns in the tumour microarchitecture will permit the identification of local interactions involved in immune cell infiltration in various pediatric cerebellar brain tumours and will provide knowledge to develop prognostic and predictive biomarkers to guide therapy.

C-183: Characterization of signatures that inform DNA repair deficiencies in cancer
Track: General Computational Biology
  • Miguel M Álvarez, IRB Barcelona, Spain
  • Marcel McCullough, IRB Barcelona, Spain
  • Fran Supek, IRB Barcelona, Spain


Different samples of the same tumor type can differ in their across-genome mutation rate spectrum, due to having undergone different combinations of mutational processes, such as those arising from DNA repair pathway deficiencies. These mutational processes could be summarized by SBS-based signatures, but most of these are convoluted (i.e. comprising several processes). Also, SBS-based signatures rely on knowledge of the DNA repair deficiencies of the training samples, in order to assign a signature to a specific aetiology. In our approach, each sample is summarized by a specific profile, which consists of a vector of regression coefficients from the associations between local mutation rates and the local activities/abundances of each DNA repair mark included in the model. We also account for factors known to play a role in the mutation rate spectrum, such as replication time and trinucleotide context. Then, via non-negative matrix factorization (NMF), we reduce the dimensions of the all-samples coefficient profile matrix into signatures. Each signature will have a different exposure in each sample, so sample outliers could potentially have a DNA repair pathway deficiency that results in an altered mutation pattern, and therefore in an unexpected association between mutation rate and DNA repair activity.
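The factorization step can be sketched as follows. This is a generic NMF with the standard multiplicative updates for the Frobenius objective, given only as an illustration: the function name `nmf`, the number of signatures `k`, and the assumption that the samples-by-coefficients matrix `V` has already been made non-negative are not taken from the abstract.

```python
import numpy as np

def nmf(V, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (samples x coefficients) as W @ H.

    W holds the exposure of each sample to each of the k signatures and
    H holds the signatures themselves. Uses multiplicative updates.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF needs a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not
        # increase the reconstruction error ||V - W @ H||.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((6, 2)) @ rng.random((2, 5))   # toy rank-2 matrix
    W, H = nmf(V, k=2)
    print(np.linalg.norm(V - W @ H))
```

Samples whose row in `W` differs strongly from the rest would be the outliers of interest in the sense described above.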

C-184: Interpretable generative model of correlation structure of multidimensional biological datasets
Track: General Computational Biology
  • Piotr Stomma, University of Białystok, Institute of Computer Science, Poland
  • Witold Rudnicki, University of Białystok, Institute of Computer Science, Poland


We propose a generative model of artificial datasets that are similar to omics datasets in terms of correlation structure. Our method produces multidimensional data with a desired correlation structure, where the distribution of the generated variables is known. It is an alternative to black-box models and to classical approaches using Cholesky factorization, which face problems when sample sizes are small.
It is based on the local cluster-wise dimensionality reduction of weighted gene correlation network analysis (WGCNA). In the WGCNA approach, one can simulate correlated variables using a low-dimensional projection of the clusters to which they belong. Our approach uses that method iteratively in conjunction with our hierarchical clique-based clustering algorithm. We find multiple basis clusterings using edge weight thresholding to learn the structure at multiple resolutions.
We have compared our approach (S2) to a basic one-level simulation protocol (S1) on a reference dataset of 8673 genes, using network statistics and the partition similarity of clusters found in simulations to clusters found in the reference. As the number of thresholds increases, the clustering coefficient distribution converges. A good fit requires accurately capturing the correlation structure, making the model a useful analytical tool.
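For context, the classical Cholesky approach mentioned above can be sketched in a few lines. This shows the baseline, not the proposed method, and the function name `simulate_correlated` is hypothetical. It factors a target correlation matrix as `L @ L.T` and multiplies independent standard normal draws by `L.T`.

```python
import numpy as np

def simulate_correlated(corr, n_samples, seed=0):
    """Draw n_samples rows of Gaussian data whose correlation matches corr."""
    corr = np.asarray(corr, dtype=float)
    # Requires a positive-definite matrix; numpy raises LinAlgError otherwise.
    L = np.linalg.cholesky(corr)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, corr.shape[0]))
    return z @ L.T

if __name__ == "__main__":
    target = np.array([[1.0, 0.8],
                       [0.8, 1.0]])
    x = simulate_correlated(target, n_samples=20000)
    print(np.corrcoef(x, rowvar=False))
```

A correlation matrix estimated from fewer samples than variables is singular, so the factorization may fail, which is one form of the small-sample problem noted in the abstract.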

C-185: Dynamics of Resistance vs Sensitivity of Bacterial Pathogens
Track: General Computational Biology
  • Swetha Usha Lal, University of Warwick, United Kingdom
  • Xavier Didelot, University of Warwick, United Kingdom
  • Matt Keeling, University of Warwick, United Kingdom


As antibiotic resistance becomes more prevalent, we explore the dynamics of sensitivity versus resistance to antibiotics in bacterial pathogens using multiple epidemiological compartmental models and stochastic simulations. The models employ the fitness cost and advantage of resistance to capture the mutual relationship, coexistence, co-evolution and relative dynamics of sensitive and resistant strains as a function of antibiotic usage. The aim is to determine the antibiotic usage that gives the best treatment outcomes with a reduced risk of resistance emergence and spread.
Stochastic simulations are performed to analyse model behaviour and how the models agree with or differ from one another in their epidemiological dynamics. Inference methods including iterated filtering and particle Markov chain Monte Carlo are performed to demonstrate that the fitness cost and advantage of resistance can be estimated from prevalence data on both susceptible and resistant infections. Model validation and comparison can be used to establish which model, if any, can explain the dataset at hand. This study will help understand model sensitivity on a stochastic scale, whereas previous studies consider a deterministic version.
Once a model and corresponding parameters have been selected and validated, it becomes possible to make predictions on future resistance dynamics under different scenarios of antibiotic use, and make recommendations for optimal use of antibiotics to avoid further increase in resistance.
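A stochastic compartmental simulation of this kind can be sketched with the Gillespie algorithm. The model below is a hypothetical two-strain SIS model written for illustration and is not one of the models in the study: the resistant strain pays a transmission cost (`cost`), the sensitive strain is cleared faster through treatment (`treat`), and all parameter values are arbitrary.

```python
import random

def simulate_two_strain_sis(n=200, i_s=5, i_r=5, beta=0.3, gamma=0.1,
                            cost=0.1, treat=0.05, t_max=50.0, seed=1):
    """Gillespie simulation; returns a list of (time, susceptible, I_s, I_r)."""
    rng = random.Random(seed)
    x = n - i_s - i_r          # uninfected hosts
    t = 0.0
    history = [(t, x, i_s, i_r)]
    while t < t_max:
        rates = [
            beta * x * i_s / n,               # new sensitive infection
            beta * (1 - cost) * x * i_r / n,  # new resistant infection (fitness cost)
            (gamma + treat) * i_s,            # sensitive clearance (recovery + treatment)
            gamma * i_r,                      # resistant clearance (recovery only)
        ]
        total = sum(rates)
        if total == 0:
            break                             # no infections left
        t += rng.expovariate(total)           # waiting time to the next event
        if t > t_max:
            break
        r = rng.random() * total              # choose the event by its rate
        if r < rates[0]:
            x, i_s = x - 1, i_s + 1
        elif r < rates[0] + rates[1]:
            x, i_r = x - 1, i_r + 1
        elif r < rates[0] + rates[1] + rates[2]:
            x, i_s = x + 1, i_s - 1
        else:
            x, i_r = x + 1, i_r - 1
        history.append((t, x, i_s, i_r))
    return history

if __name__ == "__main__":
    final = simulate_two_strain_sis()[-1]
    print(final)
```

Repeating the run over many seeds and over a range of `treat` values gives a distribution of outcomes per treatment level rather than a single deterministic trajectory.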

C-186: On the presence of gastric cancer biomarkers in salivary transcriptome
Track: General Computational Biology
  • Murlidharan Nair, Indiana University South Bend, United States


Gastric cancer affects the lining of the stomach and is known to be one of the leading causes of death worldwide. Several factors contribute to this malady, including environmental factors, genetic factors and Helicobacter pylori infection. These cancers unfortunately get diagnosed only in advanced stages, resulting in poor outcomes for the patient. Development of non-invasive and easy-to-use diagnostic methods can help in catching the disease at an early stage. Towards this, salivary transcriptomic data could reveal potential signatures that indicate the presence of disease. Here we present the results of a comparative analysis of the transcriptome from saliva samples and gastric cancer tissue samples. The results lend credence to considering saliva as an alternative biofluid that can serve diagnostic purposes.

C-188: Comparative Genomic Analysis of Colorectal Cancer Microbiome Bacteria to Discover Novel Relationships
Track: General Computational Biology
  • Harshita Keerthipati, Aspiring Scholars Directed Research Program, United States
  • Shriya Viswanathan, Aspiring Scholars Directed Research Program, United States
  • Clinton Cunha, Aspiring Scholars Directed Research Program, United States


Colorectal cancer (CRC) is uncontrolled tumor growth that starts in either the rectum or the colon. Our research is focused on the microbiome in the gut. The end goal is to target signaling pathways in order to decrease the metastasis and malignancy of gut tumors by increasing the expression of certain bacterial genes in CRC. The probiotic bacteria’s byproducts may play a role in this process. Using the R programming language, we first narrowed our target proteins down to a few that were common among known probiotic bacteria. We then utilized NCBI BLAST to align the genomes of the probiotic bacteria in order to find structural similarities and differences that may play a role in how effective each probiotic bacterium is in inhibiting CRC. Currently, we are analyzing bacteria reported in a recent cancer microbiome review paper to reveal novel phenotypic and genotypic differences at the protein and signaling/pathway levels. We hope to perform protein annotation and KEGG pathway analysis to reveal undiscovered relationships. Eventually, we hope our research will help narrow down specific proteins/pathways in bacteria that microbiology wet-lab researchers can manipulate in order to find cheaper and novel ways to reduce colorectal cancer.

C-189: A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction
Track: General Computational Biology
  • Tony Hauptmann, Johannes Gutenberg University of Mainz, Germany
  • Stefan Kramer, Johannes Gutenberg University of Mainz, Germany


Recent years have seen a surge of novel neural network architectures for multi-omics integration. One important parameter is the integration depth: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods grows steadily. However, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. We developed a comparison framework that trains multi-omics integration methods under equal conditions. We incorporated four recent deep learning methods, early integration, PCA, and a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a drug response data set with multiple omics data. Our experiments confirmed that early integration has the lowest predictive performance. Overall, statistical differences can rarely be observed. In terms of the average ranks of methods, however, Super.FELT performed best in a cross-validation setting and Omics Stacking performed best on the external test set. When faced with a new data set, Super.FELT is a good option in the cross-validation setting, and Omics Stacking is a good option in the external test set setting.

C-190: Eulerian Parameter Inference: Opportunities and Challenges in Preparing a Method for the Harsh Reality
Track: General Computational Biology
  • Vincent Wagner, University of Stuttgart, Germany
  • Nicole Radde, University of Stuttgart, Germany
  • Sebastian Höpfl, University of Stuttgart, Germany


Modeling requires the estimation of model parameters from experimental data. Probabilistic inference returns a distribution, thereby inherently estimating the uncertainty associated with the parameters.
We present Eulerian Parameter Inference (EPI), a probabilistic inference method based on the concept of random variable transformations. The input of EPI is a simulation model and a data distribution that is assumed to be generated by an underlying parameter distribution. EPI estimates this parameter distribution. In practice, the data distribution often has to be estimated from individual samples by using established density estimation approaches. EPI transforms the estimated data distribution into a parameter distribution that is consistent with the observed data. This can be done using only point-wise evaluations of the simulation model and approximations of its derivatives with respect to the parameters, which directly returns a density value in the parameter space. In particular, we do not require an explicit formulation of the inverse mapping from the output to the parameters.
EPI is parameter-free and provably correct if the parameter inference problem is well-posed.
Besides academic examples, we apply EPI to a diverse set of models ranging from algebraic equations over chaotic maps to ordinary differential equation systems, thereby proving its practical applicability.
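The idea of turning a data density into a parameter density through point-wise model evaluations can be illustrated in one dimension with the ordinary change-of-variables rule, q(theta) = p(s(theta)) * |s'(theta)|. The sketch below is only that textbook rule applied to a toy model, and whether it matches the exact formula used by EPI is an assumption. The model `s(theta) = exp(theta)`, the log-normal data density and all function names are hypothetical choices; the derivative is approximated by central finite differences, so no inverse of the model is needed.

```python
import math

def data_density(y):
    """Assumed data density: standard log-normal."""
    if y <= 0:
        return 0.0
    return math.exp(-0.5 * math.log(y) ** 2) / (y * math.sqrt(2 * math.pi))

def model(theta):
    """Toy simulation model mapping a parameter to an observable."""
    return math.exp(theta)

def parameter_density(theta, sim=model, p_data=data_density, h=1e-5):
    """Density in parameter space from one point-wise evaluation.

    Uses q(theta) = p_data(sim(theta)) * |d sim / d theta| with a
    central finite-difference approximation of the derivative.
    """
    deriv = (sim(theta + h) - sim(theta - h)) / (2 * h)
    return p_data(sim(theta)) * abs(deriv)

if __name__ == "__main__":
    for theta in (-1.0, 0.0, 0.5):
        print(theta, parameter_density(theta))
```

For this toy model the resulting parameter density is the standard normal density, since the logarithm of a standard log-normal variable is standard normal.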

C-191: Compositional biases promoting self-assembly establish a link between the genome- and the cell-spatial self-organization
Track: General Computational Biology
  • Audrey Lapendry, ENS de Lyon, France
  • Nicolas Fontrodona, ENS de Lyon, France
  • Audrey Gibert, CNRS, France
  • Didier Auboeuf, INSERM, France


Genes are not randomly distributed in the nuclear space, but are organized within more or less dynamic spatial clusters. This spatial organization of the genome plays a major role in gene expression regulation. Using a variety of experimental datasets, we show that genes in spatial proximity share the same nucleotide composition biases, which could at least in part explain the spatial self-organization of the genome. In addition, co-localized genes with the same biases have a higher probability of being co-regulated by the same transcription factors. They also produce RNAs that share the same nucleotide composition biases, that are co-regulated by the same RNA-binding proteins, and that generate proteins sharing the same amino acid composition biases. As a consequence, proteins produced by co-localized genes share the same physicochemical properties and have a higher probability of belonging to the same cellular sub-compartments. Thus, by analyzing compositional biases as a proxy for the physicochemical properties of genes and their products, we uncover a link between the spatial organization of genes in the nucleus and the spatial organization of their products (i.e. proteins) in the cell.

C-192: An integrated cell atlas of the lung in health and disease
Track: General Computational Biology
  • Lisa Sikkema, Helmholtz Munich, Germany
  • Malte Luecken, Helmholtz Munich, Germany
  • Fabian Theis, Helmholtz Munich, Germany


Single-cell technologies have transformed our understanding of human tissues. Yet, single-cell studies typically capture only a limited number of donors and disagree on cell type definitions. Integrating many datasets can address these limitations of individual studies and capture the variability in the population. Here, we present the integrated Human Lung Cell Atlas (HLCA), combining 49 datasets of the human respiratory system into a single atlas spanning over 2.4 million cells from 486 individuals. The HLCA presents a consensus cell-type re-annotation with matching marker genes, including annotations of rare and previously undescribed cell types. Leveraging the number and diversity of individuals in the HLCA, we identify gene modules that are associated with demographic covariates such as age, sex and BMI, as well as gene modules changing expression along the proximal-to-distal axis of the bronchial tree. Mapping new data to the HLCA enables rapid data annotation and interpretation. Using the HLCA as a reference for the study of disease, we identify shared cell states across multiple lung diseases, including SPP1+ profibrotic monocyte-derived macrophages in COVID-19, pulmonary fibrosis, and lung carcinoma. Overall, the HLCA serves as an example for the development and use of large-scale, cross-dataset organ atlases within the Human Cell Atlas.

C-193: Estimation and Analysis of Cell-Cell Interactions by Axon Guidance Factors from Single-Cell RNA Sequencing Data
Track: General Computational Biology
  • Yoshinori Hayakawa, University of Tsukuba, Japan
  • Haruka Ozaki, University of Tsukuba, Japan


Axon guidance governs the growth direction of axons and forms neural circuits, and is crucially dependent on cell-cell interactions. Understanding these interactions provides insights into how circuit formation is achieved in normal and disease brains and can potentially inform neuroregenerative therapies. Single-cell RNA sequencing (scRNA-seq) technologies hold great promise in providing a comprehensive analysis of cell-to-cell interactions through axon guidance factors. However, the lack of computational methods hinders the realization of this potential.

Here, we present a novel data analysis framework, scAG (Single Cell RNA-seq analysis for Axon Guidance), to uncover cell-cell interactions in axon guidance using scRNA-seq data. scAG employs scRNA-seq data and an Axon Guidance Related Genes (AGRG) ligand-receptor database for interaction detection and analysis, enabling thorough interaction identification, temporal progression analysis, and comparative studies in axon guidance.

We applied scAG to scRNA-seq data from the mouse cerebral cortex, obtained at different developmental stages from wild-type and Fezf2 mutant mice (Di Bella et al., 2021, Nature). Our analysis unveiled stage-specific and mutant-specific cell-type pairs and AGRG ligand-receptor pairs, providing a detailed temporal and comparative overview of cell-cell interactions guiding axon growth. These findings elucidate the intricate cellular coordination necessary for proper neural circuit formation and have implications for understanding neurological disorders.

C-194: Introducing a Novel Simulation Tool for Interconnected Differential Expression Signatures and Its Application to Benchmarking
Track: General Computational Biology
  • Catalina Gonzalez Gomez, Signia Therapeutics; Virpath, Colombia
  • Manuel Rosa Calatrava, Virpath, France
  • Julien Fouret, Signia Therapeutics, France


Presentation Overview: Show

Pharmaceutical research has long used differential gene expression signatures to study external stimuli like pathogenic determinants or small molecule treatments. These signatures measure expression values for multiple tags and are often compared using the concept of connectivity. Despite the scientific community's efforts to produce unbiased datasets for evaluating connectivity-based methods for drug identification and repurposing, the lack of reliable benchmarking data hinders their effectiveness.

To address this, we developed a simulation method for connected differential expression signatures that is based on a three-layer decomposition and relies on a statistical framework with different levels of parametrization.

We benchmarked seven connectivity score methods from the literature using our simulated signatures. We then evaluated the capacity of each method to retrieve the most reversed signatures for a specific query, using the area under the precision-recall curves. Moreover, we introduced a novel application perspective by training a siamese neural network with our simulated data to predict the connectivity score.

Overall, our method is a significant advance in pharmaceutical research, providing a reliable way to simulate connected differential expression signatures. It will help develop and evaluate algorithms for comparing signatures to find the most connected or reversed, leading to more effective drug repurposing.
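
The retrieval evaluation described above can be illustrated with a minimal sketch. It assumes that a more negative connectivity score means a more reversed signature, and it uses average precision as the estimator of the area under the precision-recall curve. Both are assumptions for illustration; the abstract does not state which estimator or score convention was used, and the function name is hypothetical.

```python
def average_precision(scores, is_reversed):
    """Average precision for retrieving truly reversed signatures.

    scores      -- connectivity scores, one per candidate signature
                   (assumption: lower score = more reversed)
    is_reversed -- ground-truth booleans from the simulation
    """
    # Rank candidates from most reversed (lowest score) to least reversed.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_reversed[idx]:
            hits += 1
            # Precision at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example with made-up scores: one true positive is ranked first,
# the other is ranked last.
ap = average_precision([-0.9, 0.2, -0.5, 0.7], [True, False, False, True])
```

Because the simulation provides the ground truth, each scoring method can be run on the same simulated signatures and compared through this single number.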

C-195: Rapid and data-driven generation of synthetic NGS Cancer Datasets with SYNGGEN
Track: General Computational Biology
  • Riccardo Scandino, University of Trento, Italy
  • Federico Calabrese, Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Italy
  • Alessandro Romanel, Centre for Integrative Biology (CIBIO) University of Trento, Italy


Presentation Overview: Show

Advances in next-generation sequencing (NGS) technologies such as whole-exome sequencing (WES) and targeted sequencing (TS) have revolutionized cancer genomics and precision medicine. However, accurate interpretation of somatic genomics profiling results from NGS requires reliable computational tools. That's where synggen comes in: a powerful tool written in the C programming language that enables researchers to rapidly generate realistic synthetic WES and TS datasets for benchmarking.
Synggen closely mimics real-life cancer sequencing scenarios by utilizing non-cancer NGS sequencing files in BAM format to generate reference models and by incorporating user-specified phased germline polymorphisms, complex allele-specific somatic copy number aberrations and point mutations, as well as the clonality of somatic events and overall tumor content of the sample.
To demonstrate the effectiveness of synggen, we simulated two liquid biopsy cfDNA scenarios: cancer data at decreasing tumor content, and cancer data simulating temporal sampling from a patient with dynamic tumor sub-clone populations.
Generating WES reference models using one control sample takes approximately 5 minutes with 4 cores, and 2.5 minutes with 16 cores. Generating a FASTQ file with 100 million reads using the same number of cores requires about 10 minutes and 4 minutes, respectively.
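
As a rough reading of the timings above, the speedup and parallel scaling efficiency between the 4-core and 16-core runs can be worked out as follows. This is only arithmetic on the approximate figures reported in the abstract; the function names are illustrative.

```python
def speedup(t_base, t_new):
    """How many times faster the new run is than the baseline run."""
    return t_base / t_new


def scaling_efficiency(t_base, cores_base, t_new, cores_new):
    """Observed speedup divided by the increase in core count (1.0 = ideal)."""
    return speedup(t_base, t_new) / (cores_new / cores_base)


# Reference model generation: about 5 min on 4 cores, 2.5 min on 16 cores.
model_speedup = speedup(5.0, 2.5)
model_efficiency = scaling_efficiency(5.0, 4, 2.5, 16)

# FASTQ generation (100 million reads): about 10 min on 4 cores, 4 min on 16.
fastq_speedup = speedup(10.0, 4.0)
fastq_efficiency = scaling_efficiency(10.0, 4, 4.0, 16)
```

With these figures, quadrupling the core count gives a 2x speedup for reference model generation and a 2.5x speedup for FASTQ generation.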

C-196: Taxonomic classification of genetically distinct RNA viruses: limits of detection for existing bioinformatics tools
Track: General Computational Biology
  • Igor Sidorov, Leiden University Medical Center, ESCV Network on NGS, Netherlands
  • Jutte de Vries, Leiden University Medical Center, ESCV Network on NGS, Netherlands


Presentation Overview: Show

Viral metagenomics is increasingly used for the detection of viral pathogens in clinical diagnostic settings, and a wide variety of bioinformatic tools are available for taxonomic classification of metagenomic data. A growing number of studies have reported on benchmarking the performance of taxonomic classifiers using experimental and simulated NGS datasets generated for known viruses. However, benchmarking studies focusing on the detection of genetically distinct viral sequences are scarce.
RNA viruses evolve rapidly with a high rate of accruing mutations in their genomes. Classification of the newly emerging RNA viruses with taxonomic classifiers can be a challenge if the changes in the emerging virus genomes are not reflected in the reference genome databases used for classification. How sensitive are the results of taxonomic classification to the mutations in the newly emerging viruses?
In this study we evaluate the performance of taxonomic classifiers of three types (DNA-to-DNA, DNA-to-protein, and mixed) in the detection of several RNA viruses with simulated mutations that mimic evolution of the virus and produce closely related organisms at a controlled relative phylogenetic distance. With this approach, thresholds of phylogenetic distances for effective detection and classification by each tool are established.
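
The idea of producing sequences at a controlled distance from a reference can be sketched as below. This is a deliberately simple stand-in, assuming uniform random substitutions and the proportion of differing sites (p-distance) as the distance measure; the abstract does not describe the mutation model actually used, and the function names are hypothetical.

```python
import random


def mutate(seq, fraction, seed=0):
    """Substitute a fixed fraction of sites, chosen uniformly at random.

    Assumption: substitutions only, no indels, no site-specific rates.
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    n_sub = round(len(seq) * fraction)
    bases = list(seq)
    for pos in rng.sample(range(len(seq)), n_sub):
        # Always pick a base different from the current one, so every
        # chosen site contributes to the distance.
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def p_distance(a, b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


reference = "ACGT" * 25               # toy 100-nt reference
variant = mutate(reference, 0.10)     # 10% of sites substituted
distance = p_distance(reference, variant)
```

Sweeping `fraction` over a range of values and classifying reads drawn from each variant is one way a per-tool detection threshold could be located.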

C-197: ADD: a comprehensive database and benchmarking tool for specificity prediction of adenylation domains in nonribosomal peptide synthetases.
Track: General Computational Biology
  • Aleksandra Kushnareva, Helmholtz Institute for Pharmaceutical Research Saarland, Germany
  • Alexey Gurevich, Helmholtz Institute for Pharmaceutical Research Saarland, Germany


Presentation Overview: Show

Nonribosomal peptide synthetases (NRPS) are modular enzymes that produce many important secondary metabolites, including antibiotics. Adenylation (A) domains within NRPS determine the final product by recognizing and activating its building blocks - specific amino acids. A-domain specificity prediction is vital for exploring metabolites and engineering NRPS pathways. However, the existing prediction tools have limited accuracy. The software developers lack a comprehensive resource with confirmed A-domain specificities and train their tools on ad hoc datasets.

To address this gap, we present ADD, a database encompassing A-domain sequences, specificities, neighbouring domains, biosynthetic gene clusters (BGCs), and producers' taxonomies. With 3459 entries, our database is the largest of its kind and includes both bacterial (3063) and fungal (396) A-domains. ADD incorporates and unifies records from previously published A-domain specificity datasets and MIBiG, the largest collection of experimentally validated links between secondary metabolite BGCs and their products. We complement our database with a benchmarking utility for assessing the quality of specificity prediction algorithms.

We believe ADD will become a useful resource for training and benchmarking A-domain specificity predictors, and might shed light on the evolutionary dynamics of A-domains in bacteria and fungi.

C-198: THEMA: Identification of molecular mechanisms by which Tumor HEterogeneity influences disease outcome: high-dimension Mediation Analysis to link causes and consequences.
Track: General Computational Biology
  • Florence Pittion, Univ. Grenoble Alpes, CNRS, UMR 5525, TIMC, 38000 Grenoble, France, France
  • Olivier François, Univ. Grenoble Alpes, CNRS, UMR 5525, Grenoble INP, TIMC, 38000 Grenoble, France, France
  • Magali Richard, Univ. Grenoble Alpes, CNRS, UMR 5525, TIMC, 38000 Grenoble, France, France


Presentation Overview: Show

Heterogeneity and composition of the tumor have a major impact on tumor growth, division, resistance (to treatment) and metastasis. We want to test to what extent the impact of tumor heterogeneity on disease outcome is explained by the molecular features of the tumor (i.e. gene expression and DNA methylation (DNAm), an epigenetic mark regulating gene expression). Our goal is to develop a new multimodal high-dimension mediation analysis framework to unravel these causal links.
THEMA (our project) will concentrate on pancreatic ductal adenocarcinoma (PDAC) which is a highly heterogeneous cancer and is expected to become the second leading cause of cancer-related mortality by 2025.
Based on methylomes and transcriptomes from public cohorts, we will study the role of DNAm and gene expression in the causal link between tumor heterogeneity and outcome. Then we will perform multimodal high-dimension mediation analysis and question the relationship between the identified mediators (gene expression and DNAm). Finally, we will test how the exposure to treatment affects the mediators.
We expect that THEMA will help to identify molecular mediators of tumor heterogeneity, at both the DNAm and gene expression levels, and offer perspectives in the development of new biomarkers and personalized therapeutic treatments.

C-199: Epigenomic variability and transcriptomics as a novel multiomic complementary approach for personalized nutrition in colorectal cancer patients
Track: General Computational Biology
  • Teresa Laguna, IMDEA Food Institute, Spain
  • Oscar Piette, IMDEA Food Institute, Spain
  • Marco Garranzo, IMDEA Food Institute, Spain
  • Marta Gómez-de-Cedrón, IMDEA Food Institute, Spain
  • Ana Ramírez-de-Molina, IMDEA Food Institute, Spain
  • Enrique Carrillo-De Santa Pau, IMDEA Food Institute, Spain


Presentation Overview: Show

Food natural compounds are of interest as modulators of cancer progression and prognosis, as they participate in cellular processes such as growth and differentiation, DNA repair, programmed cell death and oxidative stress. Here we select dietary biocompounds for specific subgroups of 308 colorectal adenocarcinoma (COAD) samples by finding bioactives with opposite transcriptomic profiles to the subgroup-specific tumoral transcriptomes, hypothesizing they may counteract the cancer gene-expression profiles. First, we selected 2189 CpGs based on their differentially variable methylation between tumor and normal samples by a combination of linear and Bartlett tests. Afterwards, samples were meta-clustered by 1) classifying each sample by 8 different methods (including k-means and hierarchical clustering), 2) building a network and 3) meta-clustering it by the edge-betweenness method. We extracted 6 main subgroups, 2 of them with immune-enriched transcriptomes. Finally, we compared the transcriptomes of the 6 subgroups with the ones of 56 in vitro bioactive studies from GEO by Gene Set Enrichment Analysis (GSEA), resulting in a potential positive effect of resveratrol, japonicone A and vitamin D. In summary, we present a promising in silico strategy to propose specific bioactives as co-adjuvants in cancer treatment. Supported by Spanish PN I+D+i PID2019-110183RB-C21 and FNS-Cloud project H2020-EU.3.2.2.3 863059.

C-200: Prediction of tRNA modifications using machine learning
Track: General Computational Biology
  • Anne Busch, Institute of Computer Science, Johannes Gutenberg Universität, Mainz, Germany
  • Mark Helm, Institute of Pharmaceutical and Biomedical Science, Johannes Gutenberg Universität, Mainz, Germany
  • Andreas Hildebrandt, Institute of Computer Science, Johannes Gutenberg Universität, Mainz, Germany


Presentation Overview: Show

Post-transcriptional RNA modifications have emerged as crucial regulatory elements, exerting influence over diverse cellular processes. Among the 143 distinct modifications identified thus far, tRNA molecules display remarkable diversity with respect to modifications. Accurate prediction of tRNA modifications is essential for unraveling their functional significance and exploring potential therapeutic implications. Although promising results have been achieved using random forest models to predict m1A modifications, ongoing research aims to expand these efforts to encompass additional modifications. We aim to develop deep learning techniques by which the prediction accuracy and efficiency of tRNA modifications can be significantly enhanced. These advanced techniques have the capability to capture intricate patterns and relationships within tRNA sequences, enabling precise identification of modified sites. Moreover, the prediction pipeline presented here incorporates a specially designed, BED-based data format for storing modifications and their corresponding predictions.

By employing well-engineered machine learning models to predict a wide range of tRNA modifications, we aim to provide valuable insights into the regulatory mechanisms underlying tRNA modifications and their implications. Additionally, the use of a user-friendly, genome-viewer-compatible data format enhances accessibility and usefulness for non-computer scientists.

C-201: Proteolizard - A Python-based framework for access, processing, and visualization of timsTOF raw data
Track: General Computational Biology
  • David Teschner, Institute of Computer Science, Johannes Gutenberg University, Mainz, Germany
  • David Gomez-Zepepda, Institute for Immunology, University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  • Thomas Kemmer, Institute of Computer Science, Johannes Gutenberg University, Mainz, Germany
  • Tim Maier, Institute of Computer Science, Johannes Gutenberg University, Mainz, Germany
  • Stefan Tenzer, Institute for Immunology, University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  • Andreas Hildebrandt, Institute of Computer Science, Johannes Gutenberg University, Mainz, Germany


Presentation Overview: Show

Valuable insights into complex disease-driving factors such as heart failure are to be gained by high-throughput omics technologies. Liquid chromatography coupled to mass spectrometry (LC-MS) enables the high throughput profiling of diverse types of molecules such as proteins, peptides, metabolites and lipids. The timsTOF Pro MS raises the dimensionality of generated datasets by an additional ion mobility separation. This results in increased peak capacity and acquisition speed, but the extra dimension significantly increases data complexity and thus requires establishing computationally highly efficient solutions for raw-data processing. Data processing for such complex and voluminous data must be efficient, flexible, and able to be incorporated into existing workflows.

Therefore, we developed Proteolizard, a software toolset bridging high-performance C++ code with user-friendly Python bindings. It enables the seamless integration of timsTOF raw data into Python-centric machine-learning libraries like TensorFlow, PyTorch, and scikit-learn. Proteolizard facilitates effective utilization of multi-core systems or accelerators such as GPUs, and the implementation of novel algorithms based on deep learning, enhancing the analysis of high-dimensional omics data. We specifically implement data access, representation, processing, and visualization in three separate Python modules.

Proteolizard is accessible and offered free-of-charge under the GPL3 license on GitHub: https://github.com/theGreatHerrLebert/proteolizard-[data, algorithm, vis].

C-202: Analysis of Single Nucleotide Variants on Patients with Leukemia
Track: General Computational Biology
  • Amanda Bataycan, Computational Science Program, United States
  • Jonathon Mohl, Computational Science Program; Department of Mathematical Sciences; Border Biomedical Research Center, United States
  • Ming-Ying Leung, Computational Science Program; Department of Mathematical Sciences; Border Biomedical Research Center, United States


Presentation Overview: Show

The purpose of this work is to construct an organized dataset of single nucleotide variants (SNVs) for patients with acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). With Genomic Data Commons and cBioPortal as primary resources, SNV information from whole-exome or whole-genome sequencing data of 149 patients with AML and 603 patients with ALL was obtained. For each patient group, Python scripts were written to compile individual patients’ data files into a single dataset. The resulting AML and ALL datasets allowed us to obtain an overall statistical comparison of SNV counts between tumor and normal samples and between the two leukemia types. The numbers of distinct SNVs found in the AML and ALL datasets were 136071 and 174459 respectively, with the vast majority occurring only in tumor samples. We observed that the number of distinct variants per patient with AML is higher than that for ALL and that the tumor sample variants in both AML and ALL favored mutations that would reduce GC content in genes. These datasets will be used for downstream bioinformatics analyses to compare the two leukemia types with the ultimate aim of identifying SNV effects that can help discover potential targets for gene therapies.
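
The observation about mutations that reduce GC content can be made concrete with a small sketch. It assumes each SNV is available as a (reference base, alternate base) pair; the function names and the toy input are illustrative and are not taken from the authors' scripts.

```python
from collections import Counter

GC_BASES = {"G", "C"}


def gc_effect(ref, alt):
    """Change in G/C count caused by one substitution: -1, 0 or +1."""
    return int(alt in GC_BASES) - int(ref in GC_BASES)


def tally_gc_effects(snvs):
    """Count GC-reducing, GC-neutral and GC-increasing substitutions."""
    labels = {-1: "gc_reducing", 0: "gc_neutral", 1: "gc_increasing"}
    return Counter(labels[gc_effect(ref, alt)] for ref, alt in snvs)


# Made-up (ref, alt) pairs, not real patient data.
example = [("G", "A"), ("C", "T"), ("A", "G"), ("G", "C")]
counts = tally_gc_effects(example)
```

A dataset in which `gc_reducing` clearly outnumbers `gc_increasing` would match the tendency reported for tumor sample variants above.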

C-203: Distinct immune signatures reveal sex differences in patients with sepsis
Track: General Computational Biology
  • Amanda P. Vasconcelos, School of Pharmaceutical Sciences, University of São Paulo, Brazil
  • Thiago D. C. Hirata, Innovation Center, University of São Paulo (INOVA-USP), Brazil
  • André Guilherme C. Martins, Instituto de Pesquisas Tecnológicas, São Paulo (IPT), Brazil
  • Helder T. I. Nakaya, Hospital Israelita Albert Einstein, São Paulo, Brazil


Presentation Overview: Show

Sepsis is a life-threatening condition triggered by an immune response to infection, and it exhibits varying outcomes in patients based on their sex. We aim to investigate the molecular patterns associated with sex differences in sepsis. We retrieved sepsis-related datasets from GEO-NCBI and developed an R-based workflow to analyze the transcriptomic data. Our workflow involved data processing steps including quality control, normalization, outlier identification, and probe summarization. To confirm or determine the sex of samples, we used a set of genes known as immune sex expression signatures (iSEXS) to differentiate between male and female groups. After identifying differentially expressed genes (DEGs) consistently across multiple datasets, we performed a deconvolution analysis with CIBERSORTx to determine the proportion of specific cell types in each group. Additionally, we used CEMiTool to identify gene co-expression modules and explore enriched signaling pathways in each module. Our study found differentiated DEGs in septic patients, with both groups showing increased myelocytes and band neutrophils, but segmented neutrophil percentage being higher only in pre-pubertal males. Our study contributes to the understanding of sex differences in sepsis by revealing differences in gene expression profiles and the disparity of immature neutrophils with regard to sex and age.

C-204: GenoMaker: a data-driven language for interval-based genomic analysis
Track: General Computational Biology
  • Alberto Riva, Bioinformatics Core, ICBR, University of Florida, United States


Presentation Overview: Show

Genomic analysis often involves complex sequences of operations on chromosomal regions, for example combining intervals of different sizes (such as the peaks derived from ChIP-Seq experiments or regions associated with genes), measuring the intensity of a signal in these intervals, and performing differential analysis of these signals. This is especially true in the field of cancer epigenomics, which requires integrating signals for various epigenetic marks with gene expression data. While tools to perform these operations are commonly available, they are not designed to interoperate easily in a reproducible way.

GenoMaker facilitates complex genomic analysis, making it efficient, reproducible, and self-documenting. Operations to be executed are specified in a human-readable, high-level language that hides differences between the various underlying tools, and are translated into a “make” file that is then automatically executed. This allows GenoMaker to take advantage of the useful features of the make command: defining generic rules to express transformations between different types of files, and only performing operations that are necessary, without re-creating files that are already up to date. We present an example of its use in a complex cancer epigenomics project. GenoMaker is currently under development, and is available at https://github.com/uf-icbr-bioinformatics/GenoMaker.
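
The "only perform operations that are necessary" behaviour comes from make's timestamp rule, which can be sketched in a few lines. This illustrates the rule itself, not GenoMaker's code or its language; the function name is hypothetical.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if make's timestamp rule would re-create the target.

    A target is rebuilt when it does not exist, or when any prerequisite
    was modified more recently than the target. A missing prerequisite
    raises an error here, much as make would stop with an error.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

In an interval-based analysis, a signal file computed from a peak file would be regenerated only after the peak file changes, which is why long pipelines can be re-run cheaply.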

C-205: BiochemicalAlgorithms.jl - The Biochemical Algorithms Library in the Julia Language
Track: General Computational Biology
  • Thomas Kemmer, Institute of Computer Science, Johannes Gutenberg University Mainz, Germany
  • Jennifer Leclaire, Institute of Computer Science, Johannes Gutenberg University Mainz, Germany
  • Andreas Hildebrandt, Institute of Computer Science, Johannes Gutenberg University Mainz, Germany


Presentation Overview: Show

As a cornerstone of modern biology, structural bioinformatics is an important tool for the understanding of biomolecular structure, interactions, and dynamics. A plethora of software tools for this purpose have been developed over the recent decades, usually specialized with respect to their focus or target audience. However, they often require in-depth knowledge of both the programming framework and language to be used efficiently. The versatile Julia language bridges the gap between ease of use and high-performance capabilities. In particular, it allows for highly efficient native software with an easily accessible programming interface that can be utilized, e.g., for exploratory data analysis or incorporated into full-fledged data processing pipelines. Here we present BiochemicalAlgorithms.jl, our free and open-source Julia package, built around a comprehensive representation for biomolecular systems, accessible in a DataFrame-based manner. Our representations support multiple common data formats and can be processed through a toolset of data preparation routines, including routines for bond reconstruction or inferring missing atoms. Prepared systems can further be used for energy calculations or structure optimization, with protein docking and simple molecular dynamics simulations being planned as future features. BiochemicalAlgorithms.jl is accompanied by a second Julia package, extending its functionality by visualization capabilities for our systems.

C-206: CDAScorer: A Python package for quickly annotating cell death area severity on plant leaves
Track: General Computational Biology
  • Joshua Williams, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
  • Mark Banfield, John Innes Centre, Norwich Research Park, Norwich, United Kingdom
  • Dan MacLean, The Sainsbury Laboratory, Norwich Research Park, Norwich, United Kingdom


Presentation Overview: Show

Loss of crops to plant disease costs billions of dollars every year. Artificial intelligence, including machine vision, can improve manual lab assays to detect and score the severity of disease, allowing more robust conclusions to be drawn when assessing and comparing potential solutions. The disease lesion image data necessary to train computer vision models is sparse. Existing tools to gather and annotate image data are often unable to capture the unique biological context necessary to draw conclusions. Annotation is therefore typically a slow, manual process, lacking continuity between research groups.

I developed a Python package, CDAScorer, to quickly record coordinate and severity score data for cell death areas (CDAs) on plant leaves. CDAScorer is run from the command line. The user interacts with a graphical application window built using the Tkinter framework, dragging a bounding box around the CDA matching given positional metadata, then entering its score. Using a dataset built with CDAScorer, I will train deep learning models to create a computer vision tool to automatically score CDAs without the subjectivity inherent in qualitative visual scoring. Severity scoring will help to automate and streamline plant resistance breeding programs, supporting the development of climate resilient crops.

C-207: Wearables Detect Malaria Early in a Controlled Human-Infection Study
Track: General Computational Biology
  • Sidhartha Chaudhury, Center for Enabling Capabilities, Walter Reed Army Institute of Research and BHSAI/TATRC, United States
  • Chenggang Yu, The Henry M. Jackson Foundation and BHSAI/TATRC, United States
  • Ruifeng Liu, The Henry M. Jackson Foundation and BHSAI/TATRC, United States
  • Kamal Kumar, The Henry M. Jackson Foundation and BHSAI/TATRC, United States
  • Samantha Hornby, The Henry M. Jackson Foundation and BHSAI/TATRC, United States
  • Christopher Duplessis, Malaria Department, Naval Medical Research Center, United States
  • Joel Sklar, Malaria Department, Naval Medical Research Center, United States
  • Judith Epstein, Malaria Department, Naval Medical Research Center, United States
  • Jaques Reifman, BHSAI/TATRC, United States


Presentation Overview: Show

Observational studies on the use of commercially available wearable devices for infection detection lack the rigor of controlled clinical studies, where time of exposure and onset of infection are exactly known. Towards that end, we carried out a feasibility study using a commercial smartwatch for monitoring of heart rate, skin temperature, and body acceleration on subjects as they underwent a controlled human malaria infection (CHMI) challenge. Subjects were asked to wear the smartwatch for at least 12 hours/day from 2 weeks pre-challenge to 4 weeks post-challenge. Using these data, we developed 2B-Healthy, a Bayesian-based infection prediction algorithm that estimates a probability of infection. We compared the infection probability over time with the time to onset of parasitemia, as determined by a daily FDA-approved blood smear diagnostic. Among 10 CHMI subjects, nine developed parasitemia, with an average time to parasitemia of 12 days. 2B-Healthy detected infection in seven of nine subjects (78% sensitivity), where in six subjects it detected infection 6 days before parasitemia (on average). We also investigated 2B-Healthy on eight control subjects for 4 weeks and obtained a false-positive rate of 6%/week. Our findings demonstrate the feasibility of wearables as a screening device to provide early warning of infection.

C-208: Determinants of tissue-specific copy number and aneuploidy patterns in tumours
Track: General Computational Biology
  • Gokce Senger, European Institute Of Oncology, Italy
  • Fabio Alfieri, European Institute Of Oncology, Italy
  • Martin H Schaefer, European Institute Of Oncology, Italy


Presentation Overview: Show

Copy number alterations (CNAs), ranging from local to whole-chromosome-level, are common in cancer genomes. Different cancer types show distinct patterns of those alterations. However, what shapes those patterns and what causes the differences between tissues is poorly understood. We reasoned that differences in the probability of occurrence of CNAs (e.g. the epigenome at genomic breakpoints or lamina attachment regions), and selection acting on CNAs (e.g. negative selection acting against CNA-induced differential production of proteins, positive selection favouring amplification of specific regions that buffer the detrimental effects of deleterious mutations) would shape the observed tissue-specific patterns of CNAs. To test this, we first identified individual features correlating with the frequency of focal or chromosomal amplification. We identified a number of genomic, transcriptomic and functional features explaining observed tissue-specific CNA patterns. We then fitted multivariable models that significantly improved the prediction of amplification patterns, demonstrating that combining multiple features can be useful to predict tissue-specific CNA patterns. Our ongoing efforts focus on extending our model by adding epigenomic properties of chromosomes. Taken together, our results highlight the need for a systematic analysis of determinants of tissue-specific alteration patterns and might guide our understanding of tissue-specific tumour evolution and, ultimately, therapy response.

C-209: FunARTS, the Fungal bioActive compound Resistant Target Seeker, an exploration engine for target-directed genome mining in fungi
Track: General Computational Biology
  • Turgut Mesut Yılmaz, The University of Tübingen, Germany
  • Mehmet Direnç Mungan, The University of Tübingen, Germany
  • Aileen Berasategui, The University of Tübingen, Germany
  • Nadine Ziemert, The University of Tübingen, Germany


Presentation Overview: Show

There is an urgent need to diversify the pipeline for discovering novel natural products due to the increase in multi-drug resistant infections. Like bacteria, fungi also produce secondary metabolites that have potent bioactivity and rich chemical diversity. To avoid self-toxicity, fungi encode resistance genes which are often present within the biosynthetic gene clusters (BGCs) of the corresponding bioactive compounds. Recent advances in genome mining tools have enabled the detection and prediction of BGCs responsible for the biosynthesis of secondary metabolites. The main challenge now is to prioritize the most promising BGCs that produce bioactive compounds with novel modes of action. With target-directed genome mining methods, it is possible to predict the mode of action of a compound encoded in an uncharacterized BGC based on the presence of resistant target genes. Here we introduce the “Fungal bioActive compound Resistant Target Seeker” (FunARTS) available at https://funarts.ziemertlab.com. This is a specific and efficient mining tool for the identification of fungal bioactive compounds with interesting and novel targets. FunARTS rapidly links housekeeping and known resistance genes to BGC proximity and duplication events, allowing for automated, target-directed mining of fungal genomes. Additionally, FunARTS generates gene cluster networking by comparing the similarity of BGCs from multi-genomes.

C-210: Decoding toxicological tipping points using single-cell high-throughput transcriptomics
Track: General Computational Biology
  • Imran Shah, US Environmental Protection Agency, United States
  • Bryant Chambers, US Environmental Protection Agency, United States
  • David Gallegos, Takeda Pharmaceutical, United States
  • Dennis Eastburn, BioSpyder Technologies, United States
  • Brian Chorley, US EPA, United States


Presentation Overview: Show

Predicting chemical-induced toxicity presents a multifaceted challenge due to its complex nature and our limited understanding of underlying mechanisms. We are reimagining toxicity by elucidating the biological states during adaptation to chemical exposure and tipping points to adversity. Here, we investigate the hypothesis that single-cell high-throughput transcriptomics (sc-HTTr) can help decode tipping points between cellular adaptation and adversity, obscured in gene expression from bulk samples. We treated a human hepatic cell line (HepaRG) with five chemicals that disrupt cellular homeostasis through different pathways, including mitochondrial disruption, oxidative stress, endoplasmic reticulum stress, heat shock, and DNA damage. After dissociating the cells, we used the TempO-LINC platform to generate over 74,000 sc-HTTr profiles with an average of 5,000 genes per cell detected. Clustering the profiles revealed diverse cellular states, including normal, adaptive, autophagic, and apoptotic. We identified putative tipping points marking boundaries between cellular adaptation and adversity by integrating empirical and simulated trajectories based on literature-derived signaling and regulatory networks. This presentation underscores the transformative potential of sc-HTTr in decoding toxicological tipping points, offering a novel perspective in understanding the mechanisms of chemical toxicity and a new approach for estimating human health risks of chemical exposures.

C-211: Three-dimensional in situ mapping of intratumor heterogeneity
Track: General Computational Biology
  • Eduardo Gonzalez-Solares, University of Cambridge, United Kingdom
  • Gregory Hannon, Cancer Research UK Cambridge Institute, United Kingdom
  • Sohrab Shah, Memorial Sloan Kettering Cancer Center, United States
  • Andrew Roth, BC Cancer Research Centre, Canada
  • Dario Bressan, Cancer Research UK Cambridge Institute, United Kingdom
  • Samuel Aparicio, BC Cancer Research Centre, Canada
  • Simon Tavaré, Columbia University, United States
  • Nicholas Walton, Cancer Research UK Cambridge Institute, United Kingdom
  • Atefeh Fatemi, Cancer Research UK Cambridge Institute, United Kingdom
  • Naila Adam, BC Cancer Research Centre, Canada
  • Claire Mulvey, Cancer Research UK Cambridge Institute, United Kingdom
  • Spencer Watson, University of Lausanne - UNIL, Switzerland
  • Tristan Whitmarsh, Cancer Research UK Cambridge Institute, United Kingdom
  • Giulia Lerda, Cancer Research UK Cambridge Institute, United Kingdom
  • Leonardo Sepulveda, Department of Chemistry and Chemical Biology, Harvard University, United States
  • Ciara O'Flanagan, BC Cancer Research Centre, Canada
  • Marta Paez-Ribes, Cancer Research UK Cambridge Institute, United Kingdom
  • Ignacio Vázquez-García, Memorial Sloan Kettering Cancer Center, United States


Presentation Overview:

Single-cell sequencing of tumors enables a detailed understanding of intratumor heterogeneity and the individuality of cells, but it misses the spatial context. Constructing a 3D picture that includes the spatial context of the tumor microenvironment (TME) is critical for understanding the selection of malignant cells with proliferative potential at the tumor front, immune surveillance and suppression of malignant immunogenic clones, and the spatial modes of growth and dispersal that impact tumor-immune co-evolution. At the IMAXT consortium, we used a 4T1 polyclonal mouse model to map a TNBC tumor and its TME at single-cell resolution as a function of immune competency. We employed scRNA-seq and CITE-seq to identify tumor and immune cell states and to design protein panels for single-cell spatial imaging methods (IMC and merFISH). Leveraging scDNA-seq, we identified clones within a mixed tumor population. This work specifically tackles multimodal single-cell integration challenges by presenting an analysis framework and devising strategies for tying together different data types using common anchors. Here, we project cell types and states discovered by single-cell sequencing onto an accurate map of spatial organization in the TME by integrating CITE-seq and IMC. Moreover, with the scDNA-seq, scRNA-seq, and merFISH modalities, we create an accurate spatial map of tumor clones and their TME context.

C-212: Clinical relevance of endogenous retrovirus abnormal expression in colon cancer
Track: General Computational Biology
  • Zhifu Sun, Mayo Clinic, United States
  • Aditya Vijay Bhagwate, Mayo Clinic, United States
  • William Taylor, Mayo Clinic, United States
  • John Kisiel, Mayo Clinic, United States


Presentation Overview:

Genomic sequences integrated from exogenous retroviruses, known as human endogenous retroviruses (ERVs), account for nearly 9% of human DNA. These ERVs are generally silenced through epigenetic mechanisms. Growing evidence shows that many ERVs are re-activated and associated with a variety of diseases, including colon cancer. However, comprehensive profiling of ERVs and their clinical significance is lacking. We systematically profiled the expression of 3,320 ERVs in 307 tumors and 41 adjacent normal tissues using RNA sequencing. ERV expression differed markedly between tumors and their adjacent normal tissues, with most ERVs showing increased expression in tumors. These ERVs were mainly located in intergenic regions or intronic regions of protein-coding genes or lncRNAs. Host or nearby genes of ERVs with increased expression in tumors were significantly enriched in viral protein interactions with cytokines and cytokine receptors. Tumor subtypes defined by ERV expression were significantly associated with the tumors' methylation subtypes, MSI status, and hyper-mutation status. After adjustment for other known covariates, we found that 152 ERVs were significantly associated with disease-specific survival, 51 of which were also differentially expressed. Our comprehensive analysis provides in-depth insights into abnormal ERV expression in colon cancer and its clinical importance for tumor subclassification and clinical outcomes.

C-213: A novel non-linear approach for dealing with challenging batch effects
Track: General Computational Biology
  • Ser Xian Phua, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
  • Wilson Wen Bin Goh, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore


Presentation Overview:

High-dimensional biological data such as transcriptomics and proteomics often suffer from batch effects that arise when technical variables are not controlled during data acquisition. Current batch effect correction methods like ComBat, though robust, frequently struggle with data affected by class imbalance or class-batch confounding. This underscores the need for an approach that can model both batch and class variables effectively during batch effect estimation and correction. In this study, we present a novel nonlinear approach to modelling batch effects. This method uses the underlying empirical cumulative distribution function of the dataset to map class-batch variables. Compared to ComBat and batch mean centering, our batch effect correction method consistently achieves lower Euclidean distances between batch-associated clusters after correction across varying severities of class imbalance, partially class-batch confounded datasets, and different distributions in both simulated and proteomics datasets. Visualization using t-SNE and principal component analysis also shows improved clustering of class variables post-correction. As we anticipate an increase in the prevalence of high-throughput methods, we hope that this approach can address future nuances such as interaction effects and different distributions in the batch effects of high-dimensional data.
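As a rough illustration of the baseline and the evaluation metric named in this abstract, the sketch below applies batch mean centering and measures the Euclidean distance between batch centroids before and after correction. It is not the authors' nonlinear method; the function names, the toy data, and the use of centroids to summarize "batch-associated clusters" are assumptions made for this example.

```python
import numpy as np

def batch_mean_center(X, batches):
    """Subtract each batch's per-feature mean from the samples in that batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    corrected = X.copy()
    for b in np.unique(batches):
        mask = batches == b
        corrected[mask] -= X[mask].mean(axis=0)
    return corrected

def batch_centroid_distance(X, batches, batch_a, batch_b):
    """Euclidean distance between the centroids of two batches."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    centroid_a = X[batches == batch_a].mean(axis=0)
    centroid_b = X[batches == batch_b].mean(axis=0)
    return float(np.linalg.norm(centroid_a - centroid_b))

# Toy data (hypothetical): 40 samples, 5 features, additive shift on batch "b2".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
batches = np.array(["b1"] * 20 + ["b2"] * 20)
X[batches == "b2"] += 3.0

before = batch_centroid_distance(X, batches, "b1", "b2")
corrected = batch_mean_center(X, batches)
after = batch_centroid_distance(corrected, batches, "b1", "b2")
print(f"centroid distance before: {before:.3f}, after: {after:.3e}")
```

Because centering ignores class labels, it also removes any class difference that happens to align with batch, which is the kind of class-batch confounding the abstract says existing methods handle poorly.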

C-214: The relevance of data characteristics: More power, less bias
Track: General Computational Biology
  • Eva Brombacher, Medical Center-University of Freiburg, Germany
  • Clemens Kreutz, Medical Center-University of Freiburg, Germany


Presentation Overview:

Omics technologies produce data with varying characteristics, which, in turn, can influence the performance of downstream analyses, such as normalization methods or statistical tests, and may also be at the core of differing performance results in benchmarking studies.

Here, we show typical data characteristic patterns for selected omics data types – including proteomics, metabolomics (mass spectrometry (MS)- and nuclear magnetic resonance (NMR)-based), RNA-sequencing, microarray, and microbiome data – and demonstrate, using normalization methods as an example, how particular data properties render those methods inapplicable.

Based on our results, we encourage the thorough inspection of omics datasets as to their data characteristics prior to conducting downstream analyses, since the inappropriate use of algorithms on those datasets is prone to introducing bias.

C-215: Predicting pathogenic potential from DNA reads using deep learning
Track: General Computational Biology
  • Salem A. El-Aarag, Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), Egypt
  • Mohamed E Hasan, Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), Egypt
  • Alaa E. Hemeida, Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), Egypt
  • Mahmoud Elhefnawi, National Research Center, Egypt


Presentation Overview:

Background
The reliable detection of emerging novel pathogens from next-generation sequencing data is a key challenge. Traditional approaches that depend on sequence similarity may not be able to detect novel species because closely related reference sequences are unavailable. In contrast, machine learning methods can detect novel pathogens even when the biological context is unavailable.

Method
A list of pathogenic and nonpathogenic bacteria was retrieved from Integrated Microbial Genomes and Microbiomes (IMG/M). One strain per species was included. Nonpathogenic strains of well-known pathogenic species were discarded. This resulted in a list of 446 species (342 pathogens and 67 non-pathogens). We simulated 10 million paired-end Illumina reads per class using InSilicoSeq. Reverse complements of the simulated reads were added to the final list. One-hot encoding was used to represent DNA sequences. The final list was divided into 90% training, 5% validation, and 5% test sets.
Our deep learning model for predicting pathogenicity from DNA sequence reads includes two convolutional neural networks (CNN).
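
The preprocessing steps described above (reverse-complement augmentation, one-hot encoding, and a 90/5/5 split) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the all-zero encoding of ambiguous bases such as N, and the random shuffling before the split are assumptions.

```python
import numpy as np

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(read):
    """Return the reverse complement of a DNA read."""
    return read.upper().translate(COMPLEMENT)[::-1]

def one_hot_encode(read):
    """Encode a read as a (length, 4) matrix; ambiguous bases stay all-zero (assumption)."""
    encoded = np.zeros((len(read), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(read.upper()):
        idx = BASE_INDEX.get(base)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded

def split_indices(n, train=0.90, val=0.05, seed=0):
    """Shuffle n indices and split them into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = round(n * train)
    n_val = round(n * val)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Toy reads (hypothetical); real reads would come from a simulator such as InSilicoSeq.
reads = ["ACGTN", "GGCAT"]
augmented = reads + [reverse_complement(r) for r in reads]
encoded = np.stack([one_hot_encode(r) for r in augmented])
train_idx, val_idx, test_idx = split_indices(100)
print(encoded.shape, len(train_idx), len(val_idx), len(test_idx))
```

The resulting array has one channel per base, which is the input layout a one-dimensional convolutional network would typically consume.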

C-216: ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data
Track: General Computational Biology
  • Milana Frenkel-Morgenstern, Bar-Ilan University, Israel
  • Rajesh Detroja, Bar-Ilan University, Israel


Presentation Overview:

Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of over 20 computational methods showed these to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first 'reference-based' approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk samples and single cells of the K-562 cell line, which was confirmed experimentally.
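
The idea of a reference-based search for junction reads can be illustrated with a toy sketch. It is not ChiTaH's implementation: exact substring matching stands in for a real read aligner, and the `Chimera` class, the `find_junction_reads` function, the `min_overhang` parameter, and the sequences are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chimera:
    name: str
    sequence: str
    junction: int  # index of the first base contributed by the second gene

def find_junction_reads(reads, chimeras, min_overhang=5):
    """Count reads that match a reference chimera exactly and span its junction."""
    counts = {c.name: 0 for c in chimeras}
    for read in reads:
        for c in chimeras:
            start = c.sequence.find(read)
            if start == -1:
                continue  # read does not occur in this reference chimera
            end = start + len(read)
            # Require at least min_overhang bases on each side of the junction.
            if start <= c.junction - min_overhang and end >= c.junction + min_overhang:
                counts[c.name] += 1
                break  # assign each read to the first matching chimera only
    return counts

# Toy reference (hypothetical): 10 bases from "geneA" joined to 10 bases from "geneB".
toy = Chimera(name="geneA-geneB", sequence="ACGTACGTAC" + "TTGGCCAATT", junction=10)
reads = ["CGTACTTGGC", "ACGTACGT", "GGGGGGGG", "ACTTGG"]

strict = find_junction_reads(reads, [toy])
relaxed = find_junction_reads(reads, [toy], min_overhang=2)
print(strict, relaxed)
```

Requiring a minimum overhang on both sides of the junction is a common way to avoid counting reads that only graze the breakpoint. A smaller threshold counts more reads but gives weaker evidence for each one.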