Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
C-031: EXPANSION: EXploring Protein AlterNative SplIcing cONsequence
Track: VarI
  • Chakit Arora, Scuola Normale di Pisa, Italy
  • Natalia De Oliveira Rosa, Scuola Normale di Pisa, Italy
  • Francesco Raimondi, Scuola Normale di Pisa, Italy


Presentation Overview:

We present a web server to explore the functional consequences of protein-coding alternative splice variants, generated by combining information on Differentially Expressed (DE) protein-coding transcripts from cancer genomics with information on domain architecture, protein interaction networks and gene enrichment analysis.
We retrieved all protein-coding Ensembl transcripts using the ensembldb package and computed DE transcripts from RSEM expression estimates for TCGA and GTEx tissues (XenaBrowser). We clustered all the protein sequences coded by each gene and aligned them to map differences between isoforms and UniProt canonical sequences. We mapped InterPro domains, as well as post-translational modifications (PTMs), onto canonical sequences to identify functional splicing events. We retrieved isoform-specific PPIs from IntAct to identify isoform-specific functions.

C-046: TIGER: The gene expression regulatory variation landscape of human pancreatic islets
Track: VarI
  • Vibe Nylander, Oxford Centre for Diabetes, Endocrinology, and Metabolism, Radcliffe Department of Medicine, University of Oxford, United Kingdom
  • David Torrents, Life Sciences Department, Barcelona Supercomputing Center & Institució Catalana de Recerca i Estudis Avançats, Spain
  • Miriam Cnop, ULB Center for Diabetes Research, Université Libre de Bruxelles, Belgium
  • Josep M. Mercader, Barcelona Supercomputing Center, Broad Institute of Harvard and MIT & Massachusetts General Hospital, United States
  • Jorge Ferrer, Center for Genomic Regulation, BIST, CIBERDEM & Imperial College London, Spain
  • Decio L. Eizirik, ULB Center for Diabetes Research, Université Libre de Bruxelles, Belgium
  • Piero Marchetti, Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Italy
  • Leif Groop, Unit of Islet Cell Exocytosis & Unit of Molecular Metabolism, Lund University Diabetes Centre & FIMM, Sweden
  • Anna L. Gloyn, Stanford University, United Kingdom
  • Hindrik Mulder, Unit of Molecular Metabolism, Lund University Diabetes Centre, Sweden
  • Magic Consortium, MAGIC Consortium, United Kingdom
  • Ramon Amela, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Matthieu Defrance, ULB Center for Diabetes Research, Université Libre de Bruxelles, Belgium
  • Lena Eliasson, Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Sweden
  • Ji Chen, Exeter Centre of Excellence for Diabetes Research (EXCEED), University of Exeter Medical School, United Kingdom
  • Lorena Alonso, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Jason M. Torres, Clinical Trial Service Unit and Epidemiological Studies Unit & Wellcome Centre for Human Genetics, Nuffield, United Kingdom
  • Jean-Valery Turatsinze, ULB Center for Diabetes Research, Université Libre de Bruxelles, Belgium
  • Jonathan L.S. Esguerra, Unit of Islet Cell Exocytosis, Lund University Diabetes Centre, Sweden
  • Lorella Marselli, Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Italy
  • Mara Suleiman, Department of Clinical and Experimental Medicine and AOUP Cisanello University Hospital, University of Pisa, Italy
  • Xavier Garcia-Hurtado, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), BIST & CIBERDEM, Spain
  • Montserrat Puiggròs, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Romina Royo, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Irene Miguel-Escalada, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), BIST & CIBERDEM, Spain
  • Goutham Atla, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), BIST & CIBERDEM, Spain
  • Sílvia Bonàs-Guarch, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), BIST & CIBERDEM, Spain
  • Marta Guindo-Martínez, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Ignasi Morán, Life Sciences Department, Barcelona Supercomputing Center (BSC), Spain
  • Anthony Piron, ULB Center for Diabetes Research & Interuniversity Institute of Bioinformatics in Brussels (IB2), Belgium


Presentation Overview:

Although Genome Wide Association Studies (GWAS) have facilitated the discovery of a large list of variants associated with complex disorders, the vast majority of these signals still lack functional interpretation, which remains a challenge for the research community. Transcriptomic and epigenetic studies of disease-related tissues help to elucidate the molecular mechanisms underlying GWAS disease-susceptibility loci. This is the case for pancreatic islets which, despite the many complexities surrounding their accessibility and analysis, aid the understanding of Type 2 Diabetes (T2D). Here, within the context of T2DSystems, a Horizon 2020 project, we developed the Translational human Islet Genotype tissue-Expression Resource (TIGER), a large human islet regulatory expression database (http://tiger.bsc.es) which integrates, in a single platform, the results obtained from extensive expression, expression quantitative trait loci (eQTL), and combined allele-specific expression (cASE) analyses in 514 human islet samples with publicly available summary statistics from islet analyses, including expression array, regulatory elements, and other gene, variant, and disease functional information. The TIGER platform includes tools for visualising, querying, and downloading human islet data, enhancing the study of T2D and other islet-related diseases. It therefore represents a formidable resource to interrogate the molecular aetiology of beta-cell failure.

C-047: A new GWAS method to unravel CNVs associated with ASD and cognitive ability
Track: VarI
  • Cécile Poulain, University of Montréal, CHU Sainte-Justine, Canada
  • Catherine Proulx, University of Montréal, CHU Sainte-Justine, Canada
  • Elise Douard, University of Montréal, CHU Sainte-Justine, Canada
  • Jean Louis Martineau, CHU Sainte-Justine, Canada
  • Zohra Saci, CHU Sainte-Justine, Canada
  • Zdenka Pausova, Hospital for Sick Children, University of Toronto, Toronto, Canada
  • Tomas Paus, CHU Sainte-Justine, Canada
  • Laura Almasy, Children's Hospital of Philadelphia, United States
  • David Glahn, Boston Children's Hospital/Harvard Medical School, United States
  • Guillaume Huguet, University of Montréal, CHU Sainte-Justine, Canada
  • Sébastien Jacquemont, University of Montréal, CHU Sainte-Justine, Canada


Presentation Overview:

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterised by large clinical and genetic heterogeneity. 30% of autism patients also have intellectual disability (ID).
Among the variants associated with ASD, copy number variants (CNVs) are the most frequently identified in the clinic. CNVs are deletions or duplications of genomic regions that may involve one or many genes. ASD-associated variants also have a significant impact on ID risk. No studies of rare variants have been able to clearly separate the effects on ASD risk and cognitive ability.
Aim: To identify rare CNVs that confer a preferential risk for ASD or ID.
Methods: We performed an association study based on genes within CNVs (CNV-GWAS) on a dataset of ~466,000 individuals (nASD=8,408) to identify CNVs implicated in ASD and ID while controlling for their effects on cognition.
Results: We replicated previous associations and also identified 28 new genes associated with ASD. Only 3 of these genes remain significant after adjusting for cognitive ability.
Conclusion: We identified specific ASD associations. These observations and the identification of the biological pathways associated with these genes will lead to a better understanding of the common and specific features of ASD and ID.

C-048: Benchmarking UMI-aware and standard variant callers on synthetic and real ctDNA datasets
Track: VarI
  • Rugare Maruzani, University of Liverpool, United Kingdom
  • Anna Fowler, Department of Health Data Science, University of Liverpool, Liverpool, United Kingdom
  • Liam Brierley, Department of Health Data Science, University of Liverpool, Liverpool, United Kingdom
  • Andrea Jorgensen, Department of Health Data Science, University of Liverpool, Liverpool, United Kingdom


Presentation Overview:

Circulating tumour DNA (ctDNA) has shown great potential as a minimally invasive biomarker for cancer patients. All tissues release fragmented cell free DNA (cfDNA) into the bloodstream via a range of mechanisms. ctDNA is a subset of cfDNA originating from tumour tissues.

Accurate detection of ctDNA variants in next-generation sequencing data is critical in realising the potential of ctDNA analysis as a minimally invasive cancer biomarker. However, ctDNA variants present a challenge for variant calling tools due to the low variant allele frequencies expected in ctDNA NGS data.

In this study we evaluate the performance of six variant callers for detection of low frequency variants in cfDNA datasets. First, we benchmarked callers on a set of synthetic datasets with variants spiked in at 6 different allele frequencies. Next, we benchmarked variant callers on a set of 10 real ctDNA samples.

Our results show Mutect2 displayed high sensitivity with a trade-off in specificity. LoFreq showed a good balance between sensitivity and specificity while bcftools was not suited to low frequency variant calling. Variant caller concordance was <2% in some benchmarking datasets. We provide a reference point to guide researchers on selecting the right tools for ctDNA variant calling applications.
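The benchmarking quantities mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the variant keys, the call sets, and the function names are hypothetical, and precision is used here as a simple stand-in for specificity, which would additionally require a count of true negative sites.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def sensitivity(calls: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of true (spiked-in) variants that the caller recovered."""
    return len(calls & truth) / len(truth) if truth else 0.0


def precision(calls: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of the caller's reported variants that are true variants."""
    return len(calls & truth) / len(calls) if calls else 0.0


def concordance(call_sets: Iterable[Set[Variant]]) -> float:
    """Variants reported by every caller divided by variants reported by any caller."""
    sets = list(call_sets)
    if not sets:
        return 0.0
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Made-up example data, for illustration only.
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 50, "C", "T"), ("chr2", 75, "T", "G")}
caller_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
            ("chr2", 50, "C", "T"), ("chr3", 10, "A", "G")}
caller_b = {("chr1", 100, "A", "T")}

print(sensitivity(caller_a, truth), precision(caller_a, truth))
print(sensitivity(caller_b, truth), precision(caller_b, truth))
print(concordance([caller_a, caller_b]))
```

In this toy example, one caller trades precision for sensitivity while the other reports a single correct call, so the two share only one of the four variants reported by either.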

C-049: Methodological approach for the analysis of WES data in human diseases
Track: Function
  • Kevin Muret, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Claire Dandine-Roulland, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Mallek Mziou, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Vincent Le Goff, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Edith Le Floch, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Eric Bonnet, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France
  • Sophie Hue, Department of Internal Medicine, Centre Hospitalier Henri Mondor, AP-HP, Université Paris-Est Créteil, France
  • Jean-François Deleuze, Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), Evry, France


Presentation Overview:

More than two thirds of the latest articles on WES focus exclusively on candidate genes for the diseases studied. Thus, the large majority of variants beyond these genes are not even considered. We summarize several methodological approaches for the systematic analysis of WES data on patient cohorts. The first approach is the candidate gene approach (with a priori hypotheses). The others are methods without a priori hypotheses: mutational burden study, classical GWAS analysis applied to WES data, and an approach grouping variants to score genes (Burden/SKAT methods). As an illustration, we show our latest results on Hidradenitis suppurativa (HS), a chronic inflammatory dermatological disease for which familial transmission exists in 40% of cases. Our cohort is composed of 100 healthy individuals and 100 HS patients. We identified 260,000 variants. With the first approach, we detected a heterozygous frameshift mutation affecting NCSTN in three individuals. This mutation results in a non-functional protein. The other methods highlight genes related to HS risk factors (smoking and obesity) as well as genes related to inflammatory processes, which are also very consistent with HS. In conclusion, these methodological approaches allow this type of analysis to be systematically performed on cohorts of patients.

C-050: Genome-wide identification and functional assessment of non-coding variants in hypoplastic left heart syndrome (HLHS) patients using survey of regulatory elements (SuRE) assay
Track: VarI
  • Vartika Bisht, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands
  • Ludo Pagie, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands
  • Miriam Smits, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands
  • James Burgess, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands
  • Alexey Pindyurin, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands
  • W. S. Kerstjens-Frederikse, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
  • Cleo Diemen, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
  • Joris Arensbergen, Annogen, Science Park 406, 1098 XH Amsterdam, The Netherlands


Presentation Overview:

Congenital heart disease (CHD) accounts for nearly 1/3 of all major congenital anomalies with a prevalence of 1% in the human population. HLHS is a type of CHD, which results in the malformations of the left-sided structures in the heart. Occurring in about 3% of all infants born with CHD, HLHS is a uniformly fatal lesion which requires early surgical interventions. While whole genome sequencing is increasingly employed to better understand the underlying genetics of HLHS, many cases remain unexplained due to the difficulty of interpreting non-coding mutations. In this project, we employ a massively parallel reporter assay, SuRE, which enables functional comparison of entire genomes to identify mutations that are likely to have an impact on promoter and enhancer activity. By applying this strategy to five HLHS patients, we generate a database of about 4 million functionally annotated variants. With this, we hope to increase our understanding of HLHS genetics and contribute to possible preventive solutions based on early diagnosis. So far, we have identified about 11,000 functionally significant variants in our assay. Currently, we are developing a robust pipeline to further prioritize these variants with the aim to test several potentially causal variants in HLHS model systems.

C-051: Hypothesis-free phenotype prediction within a genetics-first framework
Track: VarI
  • Arun Pandurangan, University of Cambridge, United Kingdom
  • Julian Gough, MRC Laboratory of Molecular Biology, United Kingdom
  • Davide Danovi, Kings College London, United Kingdom
  • Bastian Greshake Tzovoras, The Alan Turing Institute, United Kingdom
  • Adam Sardar, University of Bristol, United Kingdom
  • Minkyung Sung, MRC Laboratory of Molecular Biology, United Kingdom
  • Raju Kalaivani, MRC Laboratory of Molecular Biology, United Kingdom
  • Hashem Shihab, University of Bristol, United Kingdom
  • Himani Tandon, MRC Laboratory of Molecular Biology, United Kingdom
  • Chang Lu, Imperial College London, United Kingdom
  • Natalie Zelenka, University of Bristol, United Kingdom
  • James Williams, Kings College London, United Kingdom
  • Miguel Bernabe-Rubio, Kings College London, United Kingdom
  • Matt Oates, University of Bristol, United Kingdom
  • Ben Smithers, University of Bristol, United Kingdom
  • Hai Fang, Shanghai Jiao Tong University School of Medicine, China
  • Rihab Gam, MRC Laboratory of Molecular Biology, United Kingdom
  • Jan Zaucha, University of Bristol, United Kingdom


Presentation Overview:

Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed ‘rare’, even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method making molecular knowledge-based interpretations for exome-wide non-synonymous variants for phenotypes at the organism and cellular level. By using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods and present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discovery from genetic data after standard tools have been applied.

C-052: aiDIVA: augmented intelligence disease variant analysis
Track: VarI
  • Dominic Boceck, Institute of Medical Genetics and Applied Genomics, University of Tuebingen, Germany
  • Lucia Laugwitz, Institute of Medical Genetics and Applied Genomics, University of Tuebingen, Germany
  • Marc Sturm, Institute of Medical Genetics and Applied Genomics, University of Tuebingen, Germany
  • Stephan Ossowski, Institute of Medical Genetics and Applied Genomics, University of Tuebingen, Germany


Presentation Overview:

The diagnosis of rare Mendelian diseases is challenging, and less than 40% of cases can be solved using exome/genome sequencing. The major difficulties are the accurate pathogenicity classification of all detected variants and the prioritization of the causal variants among thousands of variants. During the last decade many approaches have been developed to estimate the impact of variants on protein function and to classify pathogenicity. However, methods for prioritization of causal variants that integrate additional clinical information and phenotypes are scarce.

We developed a novel clinical decision support system (CDS), aiDIVA, which integrates a large range of molecular features, damage predictors and other information on variants with clinical data and phenotype descriptors for patients and candidate genes to classify and prioritize the variants identified for a given patient. The pathogenicity classification of all variants is based on a random forest model. This classification is used in combination with additional metadata of the patient (e.g. phenotype, inheritance), to generate a “causality ranking” of the variants. Ranking results can directly help a clinician to quickly identify the causal variant. Using more than 3000 solved cases we show that in most cases the causal variant is ranked among the top 25.
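The "ranked among the top 25" evaluation can be sketched as a top-k hit rate over solved cases. This is a minimal illustration and not aiDIVA's code: the function name, the case and variant identifiers, and the data layout are all assumptions made for the example.

```python
from typing import Dict, List


def top_k_rate(rankings: Dict[str, List[str]], causal: Dict[str, str], k: int = 25) -> float:
    """Fraction of cases whose known causal variant is within the first k ranked variants.

    rankings maps a case ID to its variants, ordered from most to least likely causal.
    causal maps a case ID to the variant confirmed as causal for that case.
    """
    if not rankings:
        return 0.0
    hits = sum(1 for case, ranked in rankings.items() if causal.get(case) in ranked[:k])
    return hits / len(rankings)


# Made-up example with two solved cases (hypothetical identifiers).
rankings = {
    "case1": ["v3", "v1", "v2"],
    "case2": ["v9", "v8", "v7"],
}
causal = {"case1": "v1", "case2": "v7"}

print(top_k_rate(rankings, causal, k=2))  # causal variant of case2 is ranked third
print(top_k_rate(rankings, causal, k=3))
```

Reporting the rate for several values of k shows how quickly the causal variant surfaces in the ranking.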

C-053: Quantifying CNVs effect sizes on cognitive ability for brain and non-brain gene-sets.
Track: VarI
  • Thomas Renne, Université de Montréal, Canada
  • Jean Louis Martineau, CHU Sainte Justine, Canada
  • Zohra Saci, CHU Sainte Justine, Canada
  • Zdenka Pausova, Hospital for Sick Children, Toronto, Canada
  • Tomas Paus, CHU Sainte Justine, Canada
  • Laura Almasy, Children's Hospital of Philadelphia, United States
  • David Glahn, Boston Children's Hospital, United States
  • Guillaume Huguet, CHU Sainte Justine, Canada
  • Sébastien Jacquemont, CHU Sainte Justine, Canada


Presentation Overview:

Neurodevelopmental disorders (NDDs) are a spectrum of disorders associated with the nervous system and its development. They are diagnosed in 10% of the general population. The etiology of NDDs partially stems from Copy Number Variants (CNVs), followed by rare Single Nucleotide Variants (SNVs). However, only a few of these variants (the most frequent ones) can explain the diseases. It is therefore difficult for clinicians to estimate which genetic variant may contribute to the neurodevelopmental symptoms in a patient, particularly for ultra-rare mutations.

To address this issue of undocumented CNVs, we propose a novel strategy to understand the general principles governing the effects of variants on cognitive dimensions, using 264k participants. A first model, based on the effect of gene intolerance to loss-of-function on cognitive ability, showed both negative and positive effect sizes. Constraint scores are not correlated with gene expression patterns, so we developed a second model to quantify the effect size of genes over-expressed in tissues or cell types. We found that genes over-expressed in brain and non-brain tissues have a negative effect on cognitive ability. Moreover, genes over-expressed in neurons as well as in accessory and glial brain cells have negative effect sizes. This method opens new perspectives on how to identify gene effect sizes on phenotypes.

C-054: VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models
Track: VarI
  • Weining Lin, University College London, United Kingdom
  • Jude Wells, University College London, United Kingdom
  • Christine Orengo, University College London, United Kingdom


Presentation Overview:

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach VariPred (Variant impact Predictor) outperforms current state-of-the-art methods by using an end-to-end model that only requires the protein sequence as input. By exploiting one of the best performing protein language models (ESM-1b), we established a robust classifier, VariPred, requiring no pre-calculation of structural features or multiple sequence alignments. We compared the performance of VariPred with other representative models including 3Cnet, EVE and ‘ESM variant’. VariPred outperformed all these methods on the ClinVar dataset, achieving an MCC of 0.751 vs. an MCC of 0.690 for the next closest predictor.
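For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the counts in the example are made up for illustration and are not taken from the VariPred evaluation.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 40 pathogenic variants correctly flagged, 45 benign
# variants correctly passed, 5 false alarms, 10 missed pathogenic variants.
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative when the pathogenic and benign classes are imbalanced.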

C-055: INVAR2: A modernised software tool for the detection of circulating tumour DNA in liquid biopsy data
Track: VarI
  • Emma-Jane Ditter, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Richard Bowers, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Matthew Eldridge, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Zhao Cheng, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Angela An, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Nitzan Rosenfeld, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Hui Zhao, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom


Presentation Overview:

When cells in the body die, they release fragments of DNA into the bloodstream. These fragments may come from healthy cells, called cell-free DNA, or from tumour cells, called circulating tumour DNA (ctDNA).
Analysis of ctDNA aims to detect a few molecules of tumour origin for the identification of disease presence or recurrence. INVAR (Integration of VAriant Reads), an analysis pipeline previously published in Wan et al., is a pan-cancer classifier that takes as input sequencing data across a large panel of patient-specific tumour mutations and calculates a score for each sample that corresponds to the log likelihood of ctDNA being present in the sample. INVAR can distinguish healthy from cancer samples with a detection sensitivity down to a few mutant molecules per million by considering locus-specific background error rates, trinucleotide context and fragment lengths.
We present results from a re-engineered version, INVAR2, showing how it can be applied to different types and depths of sequencing data. We describe how the noise suppression methods make it capable of detecting signal in low tumour fraction samples from low mutational burden cancers. We show how INVAR2 can be adapted to work on various analytes and present plans for future improvements.

C-056: XClone: detection of allele-specific subclonal copy number variations from single-cell transcriptomic data
Track: VarI
  • Rongting Huang, The University of Hong Kong, Hong Kong
  • Xianjie Huang, The University of Hong Kong, Hong Kong
  • Yin Tong, The University of Hong Kong, Hong Kong
  • Helen Y.N. Yan, The University of Hong Kong, Hong Kong
  • Suet Yi Leung, The University of Hong Kong, Hong Kong
  • Oliver Stegle, The University of Hong Kong, Hong Kong
  • Yuanhua Huang, The University of Hong Kong, Hong Kong


Presentation Overview:

Somatic copy number variations (CNVs) are major mutations in various cancers, contributing to their development and clonal progression. A few computational methods have been proposed to detect CNVs from single-cell transcriptomic data. Still, the technical sparsity makes it challenging to identify allele-specific CNVs, especially in complex clonal structures. Here we present a statistical method, XClone, to detect haplotype-aware CNVs by integrating expression levels and allelic balance from scRNA-seq data. With well-annotated datasets on multiple cancer types, we demonstrated that XClone could accurately detect different types of allele-specific CNVs, enabling the discovery of the corresponding subclones and dissection of their phenotypic impacts.

C-057: From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data
Track: VarI
  • Daniele Raimondi, KU Leuven, Leuven, Belgium
  • Massimiliano Corso, Institut Jean-Pierre Bourgin, Université Paris-Saclay, INRAE, AgroParisTech, 78000 Versailles, France
  • Piero Fariselli, Department of Medical Sciences, University of Torino, 10123 Torino, Italy
  • Nora Verplaetse, Katholieke Universiteit Leuven, Belgium
  • Antoine Passemiers, Katholieke Universiteit Leuven, Belgium
  • Francesco Codicè, Department of Medical Sciences, University of Torino, 10123 Torino, Italy
  • Yves Moreau, Katholieke Universiteit Leuven, Belgium


Presentation Overview:

In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation, thus delaying the promised breakthroughs in genetics and precision medicine in human genetics, and in phenotype prediction to improve plant adaptation to climate change and resistance to bioaggressors in plant sciences. In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from Whole Genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4, and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from the GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to gradient-based Saliency Map approaches.
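The per-phenotype criterion (Pearson correlation ≥0.4 between predicted and observed values) can be sketched as follows. This is an illustration only and not the Galiana code: the trait names, values, and function names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def count_well_predicted(observed: Dict[str, List[float]],
                         predicted: Dict[str, List[float]],
                         threshold: float = 0.4) -> int:
    """Number of phenotypes whose predictions reach the correlation threshold."""
    return sum(1 for name in observed
               if pearson(observed[name], predicted[name]) >= threshold)


# Made-up values for two hypothetical traits measured on four accessions.
observed = {"trait_a": [1.0, 2.0, 3.0, 4.0], "trait_b": [1.0, 2.0, 3.0, 4.0]}
predicted = {"trait_a": [1.1, 1.9, 3.2, 3.8], "trait_b": [4.0, 1.0, 3.0, 2.0]}

print(count_well_predicted(observed, predicted))
```

Here only the first trait passes the threshold, because the predictions for the second trait are negatively correlated with the observations.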

C-058: In silico comprehensive analysis of coding and non-coding SNPs in human mTOR protein
Track: VarI
  • Tahirah Yasmin, Assistant Professor, Dept of Biochemistry and Molecular Biology, University of Dhaka, Bangladesh


Presentation Overview:

The mammalian/mechanistic target of rapamycin (mTOR) protein is an important growth regulator and has been linked with multiple diseases including cancer and diabetes. Genetic mutations can potentially affect a protein’s structure and hence its functions. In this study, the most deleterious SNPs of mTOR protein have been determined to identify potential biomarkers for various disease treatments. In total 11 nsSNPs have been filtered out of 2178 nsSNPs along with two non-coding variations. All of the nsSNPs were found to destabilize the protein structure and disrupt its function. While R619C, A1513D, and T1977R mutations were shown to alter C alpha distances and bond angles of the mTOR protein, L509Q, R619C and N2043S were predicted to disrupt the mTOR protein’s interaction with NBS1 protein and FKBP1A/rapamycin complex. In addition, one of the non-coding SNPs was shown to alter miRNA binding sites. Characterizing nsSNPs and non-coding SNPs and their harmful effects on a protein’s structure and functions will enable researchers to understand the critical impact of mutations on the molecular mechanisms of various diseases. This will ultimately lead to the identification of potential targets for disease diagnosis and therapeutic interventions.

C-059: Cell-free DNA single nucleotide variant analyser: inferring cancer-derived mutations from liquid biopsy whole-genome sequencing data
Track: VarI
  • Paulius D. Mennea, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Emma-Jane Ditter, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Katrin Heider, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Wendy N. Cooper, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Amit Roshan, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Tommy Kaplan, School of Computer Science and Engineering, Faculty of Medicine, The Hebrew University of Jerusalem, Israel, United Kingdom
  • Hui Zhao, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Nitzan Rosenfeld, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom


Presentation Overview:

Background
Whole-genome sequencing of cell-free DNA (cfDNA) enables tumour profiling via somatic point mutation detection, as the frequency of cancer-specific mutations found in blood reflects the fraction of tumour-derived cfDNA (‘tumour fraction’). Existing variant callers, such as MuTect2 and LoFreq, typically identify cancer mutations only at sufficiently high tumour fractions (≥ 5%-10%), and produce false calls due to sequencing artifacts, clonal haematopoiesis, and germline contamination.
Here, we integrate cancer-associated features and mutational signatures, to filter cfDNA variants and infer tumour-derived mutations.

Method
Input variants called by existing tools are annotated with features such as cfDNA fragment length and region-specific mutational density. Fragment-level information is extracted from cfDNA BAM files, while regional annotations are derived from publicly available tumour data of the Pan-Cancer Analysis of Whole Genomes. A 96-trinucleotide matrix is used for mutational signature re-fitting, allowing accurate identification of relative cancer signature contributions even at 3%-5%.
An optimisation framework iteratively filters input variants by updating the annotated feature cut-offs until the mutational signature re-fitting results in optimal cancer signature attribution and re-fitting accuracy.
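
The signature re-fitting step described in the Method can be illustrated with a minimal sketch. It assumes that re-fitting means finding non-negative exposures that best reconstruct the observed mutation counts from a fixed set of reference signatures, and it uses multiplicative updates for non-negative least squares as one common choice. The function name, the toy 6-channel signatures (instead of 96 trinucleotide channels) and the counts are hypothetical and are not taken from the poster.

```python
def refit_signatures(counts, signatures, iterations=200):
    """Estimate non-negative exposures h so that
    sum_k h[k] * signatures[k] approximates counts.

    Multiplicative updates for non-negative least squares; this is an
    assumed, generic technique, not necessarily the tool's own method.
    """
    n_sig = len(signatures)
    n_ch = len(counts)
    h = [1.0] * n_sig  # positive start keeps every exposure positive
    for _ in range(iterations):
        # Reconstruction of the catalogue with the current exposures.
        fitted = [sum(h[k] * signatures[k][c] for k in range(n_sig))
                  for c in range(n_ch)]
        for k in range(n_sig):
            num = sum(signatures[k][c] * counts[c] for c in range(n_ch))
            den = sum(signatures[k][c] * fitted[c] for c in range(n_ch))
            if den > 0:
                h[k] *= num / den
    return h


# Toy example: two made-up signatures over 6 channels (each sums to 1).
toy_signatures = [
    [0.5, 0.3, 0.1, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.2, 0.3, 0.3],
]
# Counts built as 30 * signature 0 + 70 * signature 1.
toy_counts = [15.0, 16.0, 10.0, 17.0, 21.0, 21.0]

exposures = refit_signatures(toy_counts, toy_signatures)
total = sum(exposures)
relative = [e / total for e in exposures]  # relative signature contributions
print(relative)
```

In a filtering loop of the kind the abstract describes, a re-fit like this would be repeated after each change of feature cut-offs, and the relative contribution of cancer-associated signatures would serve as the quantity to optimise.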

Conclusion
This tool leverages cancer-associated features and mutational signatures to infer the origin of cfDNA variants. It enables cfDNA-specific mutational analysis to enhance comprehensive tumour profiling from blood.

C-060: MAPT subhaplotype associations with chronic traumatic encephalopathy endophenotypes
Track: VarI
  • Xudong Han, Bioinformatics Graduate Program, Boston University, United States
  • Jillian Petrosky, Bioinformatics Graduate Program, Boston University, United States
  • Sarah Bald, Bioinformatics Graduate Program, Boston University, United States
  • Yichi Zhang, Bioinformatics Graduate Program, Boston University, United States
  • Richard Sherva, Boston University, United States
  • Jaeyoon Chung, Boston University, United States
  • Bobak Abdolmohammadi, Boston University, United States
  • Shruti Durape, Boston University, United States
  • Brett Martin, Boston University, United States
  • Joseph Palmisano, Boston University, United States
  • Kurt Farrell, Icahn School of Medicine at Mount Sinai, United States
  • John Farrell, Boston University, United States
  • Jonathan Cherry, Boston University, United States
  • Victor Alvarez, Boston University, United States
  • Bertrand Huber, Boston University, United States
  • Michael Alosco, Boston University, United States
  • Yorghos Tripodis, Boston University, United States
  • Robert Stern, Boston University, United States
  • Thor Stein, Boston University, United States
  • Lindsay Farrer, Boston University, United States
  • John Crary, Icahn School of Medicine at Mount Sinai, United States
  • Ann McKee, Boston University, United States
  • Adam Labadorf, Boston University, United States
  • Jesse Mez, Boston University, United States


Presentation Overview:

Repetitive head impacts (RHI) are the main risk factor for chronic traumatic encephalopathy (CTE), a neurodegenerative disease characterized by tau protein buildup. The MAPT region has nine structural subhaplotypes due to a megabase-long inversion and several copy-number variations. It is implicated in other tauopathies, but has not been investigated in CTE. We investigated associations of genome-wide variants and MAPT subhaplotypes with CTE neuropathology and related endophenotypes such as tau burden in different brain regions.

458 donors of European ancestry from the Understanding Neurologic Injury and Traumatic Encephalopathy (UNITE) Brain Bank with known RHI were evaluated for CTE status, stage, tau burden, and dementia. Donors were genotyped on ~5,000 SNPs across the 17q21.31 region. Missing SNPs were imputed with IMPUTE2 and MAPT subhaplotypes were predicted with SHAPEIT. Regression modeling and permutation testing were used to investigate possible associations between MAPT subhaplotypes and CTE and/or related endophenotypes.

Having at least one H1β1γ1 subhaplotype (sample frequency = 0.39) was found to be significantly associated with dementia (OR=1.90; padj=0.019) and tau burden in 5 brain regions. These findings suggest a relationship between MAPT structural variants and CTE endophenotypes.
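
As a rough illustration of the kind of association testing mentioned above, the sketch below computes an unadjusted odds ratio from a 2x2 table and a two-sided permutation p-value for the difference in outcome rate between carriers and non-carriers. The counts are invented toy data, the function names are hypothetical, and the study itself used regression models with adjusted p-values, which this sketch does not reproduce.

```python
import random


def odds_ratio(carrier_cases, carrier_controls, other_cases, other_controls):
    """Unadjusted odds ratio from a 2x2 table."""
    return (carrier_cases * other_controls) / (carrier_controls * other_cases)


def rate_difference(carrier, outcome):
    """Outcome rate in carriers minus outcome rate in non-carriers."""
    in_carriers = [o for c, o in zip(carrier, outcome) if c]
    in_others = [o for c, o in zip(carrier, outcome) if not c]
    return sum(in_carriers) / len(in_carriers) - sum(in_others) / len(in_others)


def permutation_p_value(carrier, outcome, n_permutations=2000, seed=0):
    """Two-sided permutation test: shuffle outcomes, keep carrier labels."""
    rng = random.Random(seed)
    observed = abs(rate_difference(carrier, outcome))
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(rate_difference(carrier, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Invented toy data: 30 carriers (20 with the outcome), 30 non-carriers (10 with it).
toy_carrier = [1] * 30 + [0] * 30
toy_outcome = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 20

print(odds_ratio(20, 10, 10, 20))
print(permutation_p_value(toy_carrier, toy_outcome))
```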

C-061: Avocato: a computational high-resolution scATAC-seq platform to prioritise non-coding genetic variants
Track: VarI
  • E. Ravza Gür, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Simone G. Riva, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Martin Sergeant, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Liangti Dai, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Christopher Cole, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Edward Sanders, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Stephen Taylor, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Gerton Lunter, Department of Epidemiology, University Medical Center Groningen (UMCG), University of Groningen, Netherlands
  • Jim R. Hughes, MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom


Presentation Overview:

Genome-wide association studies (GWAS) have provided huge amounts of information about human biology. However, the interpretation of GWAS findings has been very slow, since most genetic variants are found in the non-coding regions of the genome, where they affect regulatory elements such as promoters and enhancers (gene regulation). Currently, there is no clear path for identifying causal regulatory variants and linking them to the genes and cell types they affect. ATAC-seq maps chromatin accessibility landscapes in cell types, providing rich information about cell type-specific regulatory elements and their intersection with non-coding variants. Mapping chromatin accessibility profiles at the single-cell level is now possible and is cost- and time-effective. However, important technical challenges still need to be overcome, such as the scalability and complexity of genetic and single-cell epigenomic data and the reproducibility of their analysis. We built a computational platform called Avocato to overcome those challenges. Avocato greatly increases the resolution of scATAC-seq data, allowing for the high-resolution functional prioritisation of non-coding genetic variants and their effector cell types. It provides an end-to-end pipeline to analyse and visualise scATAC-seq data in a user-friendly interface. Together, this study brings us one step closer to comprehensively understanding GWAS associations and elucidating the mechanisms of diseases.

C-062: Identifying rare cell-type specific driver mutations using DNA+Protein single cell sequencing
Track: VarI
  • Matt Field, James Cook University, Australia
  • Raymond Louie, University of New South Wales, Australia
  • Mandeep Singh, Garvan Institute of Medical Research, Australia
  • Chris Goodnow, Garvan Institute of Medical Research, Australia
  • Fabio Luciano, University of New South Wales, Australia


Presentation Overview:

Background: Single cell sequencing is revolutionizing life sciences by interrogating gene expression and other omics modalities at high resolution. The study of autoimmune disease and the identification of genetic drivers is ideally suited to a combination of DNA+Protein single cell sequencing.

Methods: We developed a novel bioinformatics workflow to identify rare driver mutations enriched by immune cell type using MissionBio data. Cell-type annotations are determined using a supervised learning approach that better reflects the gating thresholds employed than the unsupervised method and more accurately annotates duplicates and dead cells. Variant calls are generated using modified GATK best practices, and empirically derived filters (including minimum cell number, allele frequency and cell-type annotations) are applied to produce lists of cell-type enriched high-quality variants. For visual inspection, plots are generated with variants annotated by cell type, and all candidate driver mutations are used to generate an UpSet plot to identify variant co-occurrence.

Results: Across samples, we have identified expanded clones harbouring known GOF somatic mutations in T cell lymphoma driver genes. Some of the known driver variants are found in less than 0.5% of cells (down to ~10 clones) highlighting the power of this approach in characterising rare pathogenic clones in autoimmune disease.

C-063: ReproSNP: Phenotype-specific variant filtering and exploration based on UK Biobank data
Track: VarI
  • Lynn Ogoniak, Institute of Medical Informatics, University of Münster, Münster, 48149, Germany
  • Julian Varghese, Institute of Medical Informatics, University of Münster, Münster, 48149, Germany
  • Alexander Siegfried Busch, University of Münster, Department of General Pediatrics, Münster, 48149, Germany
  • Sarah Sandmann, Institute of Medical Informatics, University of Münster, Münster, 48149, Germany


Presentation Overview:

The UK Biobank (UKB) provides phenotype and genotype data from over 500,000 participants. It is an extremely valuable resource for the analysis of genetically associated diseases, especially those that are rare and/or have very heterogeneous and complex phenotypes. However, undiagnosed cases may be present in the UKB cohort. One example is male infertility: about 7% of men worldwide are estimated to be affected, and for a large proportion of them (60-70%) the medical cause remains unknown. However, only 0.25% of male UKB participants report diagnosed infertility, indicating a high number of unreported cases in the UKB cohort. To address this constraint, we developed the R Shiny application ReproSNP. Two cohorts, potentially infertile vs presumably fertile, are available. Additionally, a selection of phenotypes is provided, which allows for interactive definition of further subgroups. By comparing two subgroups, users can identify and explore candidate variants (based on whole exome sequencing) to validate or exclude them for further research. Visual and tabular output, including statistical tests, is generated automatically, making the application easy and intuitive to use for non-bioinformaticians as well. In conclusion, ReproSNP enables a previously impractical analysis and exploration of infertility-associated variants. The integration of additional phenotypes for research in various medical fields will target a broad range of scientists and clinicians.
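
The abstract does not say which statistical tests ReproSNP runs when two subgroups are compared. As one plausible example, a two-sided Fisher's exact test on carrier counts for a single variant could look like the sketch below (written in Python for brevity, although the application itself is an R Shiny app). The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: subgroup 1 / subgroup 2. Columns: carriers / non-carriers.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in subgroup 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 3 of 4 carriers in subgroup 1, 1 of 4 in subgroup 2.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With whole exome data, a test like this would be repeated for many variants, so a multiple-testing correction would normally follow.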

C-064: Network-Based Prediction Method for Cancer Driver Missense Mutations
Track: VarI
  • Narumi Hatano, Kyoto University, Japan
  • Mayumi Kamada, Kyoto University, Japan
  • Ryosuke Kojima, Kyoto University, Japan
  • Yasushi Okuno, Kyoto University, Japan


Presentation Overview:

With the development of genomic sequencing technology, numerous somatic missense mutations have been detected. Only a small part of them are driver mutations that are involved in cancer development.
Several computational methods have been developed to predict driver mutations. However, even though abnormalities in molecular networks are related to cancer, most of these methods focus on individual variant features and do not consider molecular networks for the prediction.
Here we propose a network-based machine-learning model to predict driver mutations. The model consists of a graph learning part and a driver prediction part. The graph learning part uses a graph neural network to learn a graph structure representing molecular networks and computes graph node vectors. The driver prediction part uses a random forest algorithm to predict driver mutations with individual variant feature vectors and the graph node vectors corresponding to the variant.
The validation results using benchmark datasets showed that the proposed model was better than the conventional model with only individual variant features, and it is useful to consider molecular networks for predicting driver mutations. The proposed model is expected to find driver mutations from many missense mutations that may or may not be associated with cancer development.

C-065: A statistical approach to identify regulatory DNA variations combined with epigenomics data reveals novel non-coding disease genes
Track: VarI
  • Nina Baumgarten, Goethe University, Frankfurt am Main, Germany
  • Chaonan Zhu, Goethe University, Frankfurt am Main, Germany
  • Meiqian Wu, Goethe University, Frankfurt am Main, Germany
  • Yue Wang, Goethe University, Frankfurt am Main, Germany
  • Arka-Provo Das, Goethe University, Frankfurt am Main, Germany
  • Jaskiran Kaur, Goethe University, Frankfurt am Main, Germany
  • Fatemeh Behjati, Goethe University, Frankfurt am Main, Germany
  • Duong, Genome Biologics, Germany
  • Minh Duc Pham, Goethe University, Frankfurt am Main, Germany
  • Maria Duda, Genome Biologics, Germany
  • Laura Rumpf, Goethe University, Frankfurt am Main, Germany
  • Stefanie Dimmeler, Goethe University, Frankfurt am Main, Germany
  • Ting Yuan, Goethe University, Frankfurt am Main, Germany
  • Thorsten Kessler, German Heart Centre Munich, Germany
  • Jaya Krishnan, Goethe University, Frankfurt am Main, Germany
  • Marcel H. Schulz, Goethe University, Frankfurt am Main, Germany


Presentation Overview:

Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences like various traits or diseases. To understand these molecular mechanisms, different approaches are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs) on TF binding.
We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed.
We combined our statistical approach with epigenetic data in a flexible workflow. We also provide additional functionality, e.g., the identification of SNP-specific TFs via a TF enrichment statistic. Applications to large sets of eQTLs and SNPs obtained from genome-wide association studies illustrate the usefulness of our approach for highlighting cell-type specific regulators and disease-associated target genes.
To conclude, our fast and accurate approach allows the evaluation of DNA changes that induce differential TF binding, and the incorporation of epigenetic data results in a cell-type specific prediction.
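
To make the statistical idea concrete, the sketch below fits a plain Laplace distribution to a set of differential binding scores and converts a score into a two-sided tail probability. This is only the textbook Laplace distribution, used here as an assumption; the abstract describes a modified Laplace distribution whose exact form is not given, and the function names and scores are hypothetical.

```python
import math
from statistics import median


def fit_laplace(scores):
    """Maximum-likelihood fit of a Laplace distribution.

    Location is the sample median; scale is the mean absolute deviation
    from that median.
    """
    mu = median(scores)
    b = sum(abs(s - mu) for s in scores) / len(scores)
    return mu, b


def laplace_two_sided_p(score, mu, b):
    """P(|X - mu| >= |score - mu|) for X ~ Laplace(mu, b)."""
    return math.exp(-abs(score - mu) / b)


# Hypothetical background of differential binding scores.
background = [-2.0, -1.0, 0.0, 1.0, 2.0]
mu, b = fit_laplace(background)
print(mu, b, laplace_two_sided_p(3.0, mu, b))
```

A closed-form tail probability of this kind avoids sampling or enumeration for each SNP and TF model, which is one reason a distributional approximation can be fast.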

C-066: Predicting human and viral protein variants affecting COVID-19 susceptibility
Track: VarI
  • Vaishali Waman, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Paul Ashford, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Su Datt Lam, Universiti Kebangsaan Malaysia, Malaysia
  • Mahnaz Abbasian, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Neeladri Sen, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Laurel Woodridge, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Nicola Bordin, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Ian Sillitoe, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Yonathan Goldtzvik, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Jiaxin Wu, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, United Kingdom


Presentation Overview:

We used a structural bioinformatics approach to analyse the impact of missense variants from human and viral proteins, using 3D structures of SARS-CoV-2:human protein complexes (obtained from the PDB or built using AlphaFold2-Multimer). Structure-based analyses indicate that missense variants in several human proteins, including IFIT2, IFIH1, TOM70 and ISG15, are implicated in increased binding affinity to SARS-CoV-2 proteins. Affinity-enhancing variants in these proteins could promote their binding to SARS-CoV-2 proteins instead of their natural protein partners or substrates in immune pathways. We report a catalogue of both common and rare variants in these proteins and discuss their structural and functional impact. We do not observe a specific trend in the occurrence of these variants in particular ethnic groups; however, the occurrence of certain affinity-enhancing variants could lead to increased susceptibility in individuals carrying them. We therefore suggest that monitoring variants in immune proteins using experimental approaches would be helpful.

C-067: Speeding up clinical diagnostics of rare disease by ranking NGS variants based on clinical evidence
Track: VarI
  • Marc Sturm, Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Germany
  • Stephan Ossowski, Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Germany


Presentation Overview:

Today, the main method for clinical diagnostics of rare diseases is short-read exome/genome sequencing (WES/WGS). WES produces about 50k variants per sample and WGS about 5M variants. Ranking these variants according to their relevance for the patient’s disease can shorten the time to diagnosis tremendously.
We developed a novel ranking method based on clinical evidence. The input is a variant list and the patient’s phenotype encoded as HPO terms. Variants are ranked based on their consequence on proteins and on information from gnomAD (allele frequency, O/E), ClinVar (pathogenicity), HGMD (pathogenicity), OMIM (gene-phenotype), HPO (gene-phenotype) and SpliceAI (splicing effect). A retrospective benchmark based on our solved diagnostic cases shows that our method ranks the causal variant in the top 10 in more than 95% of the cases. In more than 80% of cases the causal variant is ranked first.
Most other methods for ranking variants, e.g. Exomiser, LIRICAL and AMELIE, are based on complex models with high computational requirements. As our method uses a simple point-based system, it is very fast (~5s per sample). Based on performance and runtime, our model is suitable as a decision support system for clinical diagnostics.
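
A point-based ranking of the kind described above can be sketched in a few lines. All field names, thresholds and point values below are hypothetical placeholders chosen for illustration; the poster does not list the actual scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    consequence: str           # e.g. "stop_gained", "missense", "synonymous"
    gnomad_af: float           # population allele frequency
    clinvar_pathogenic: bool   # reported as pathogenic in ClinVar
    gene_matches_hpo: bool     # gene linked to one of the patient's HPO terms
    spliceai_score: float      # predicted splicing effect, 0..1


# Hypothetical points per consequence type.
CONSEQUENCE_POINTS = {"stop_gained": 3, "missense": 2, "synonymous": 0}


def score(variant):
    """Sum evidence points; every weight here is an illustrative assumption."""
    points = CONSEQUENCE_POINTS.get(variant.consequence, 1)
    if variant.gnomad_af < 0.001:
        points += 2
    if variant.clinvar_pathogenic:
        points += 3
    if variant.gene_matches_hpo:
        points += 3
    if variant.spliceai_score >= 0.5:
        points += 1
    return points


def rank(variants):
    """Highest-scoring variants first; ties keep their input order."""
    return sorted(variants, key=score, reverse=True)


toy_variants = [
    Variant("var_a", "missense", 0.0001, True, True, 0.1),
    Variant("var_b", "synonymous", 0.2, False, False, 0.0),
    Variant("var_c", "stop_gained", 0.0005, False, True, 0.0),
]
for v in rank(toy_variants):
    print(v.name, score(v))
```

Because scoring is a fixed number of lookups and comparisons per variant, the run time grows linearly with the variant count, which fits the short run time reported in the abstract.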

C-068: A Comparative Study and Normalization Strategies for Structural Variant Analysis
Track: VarI
  • Mayumi Kamada, Kyoto University, Japan
  • Toshiaki Katayama, Database Center for Life Science, Japan
  • Yosuke Kawai, National Center for Global Health and Medicine, Japan


Presentation Overview:

Structural variants (SVs) are large-scale genomic alterations known to be related to disease onset. SV data are accumulating through international large-scale cohort studies. Accurate interpretation of detected SVs requires distinguishing known from novel SVs on the basis of genomic positions and sequence similarity. The inherent complexity and diversity of SVs make comparative analyses laborious, and an extensive manual process is needed to establish correspondence with prior SV data.

In this study, we investigated the differences among various SV detection tools using a high-confidence SV call set and classified them by SV type. We assessed the definition of SVs, the ambiguity of detection region boundaries, and the representation of detection outcomes for approximately ten widely used detection tools.
Subsequent to our investigations, we systematically categorized the discrepancies, taking into account SV types, and formulated criteria for subsequent normalization.

Next, we will implement normalization conforming to the formulated criteria and devise a scoring method to evaluate the similarity between SVs, thereby improving comparative analyses of SVs in genomics research.
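
The scoring method is future work and is not specified in the abstract. One widely used baseline for SV similarity is reciprocal overlap between intervals of the same type on the same chromosome, sketched below as an assumption. The dictionary keys, the 0.5 threshold and the example SVs are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Overlap length divided by the longer interval (half-open coordinates).

    A value of f means the overlap covers at least a fraction f of both SVs.
    """
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    longest = max(a_end - a_start, b_end - b_start)
    return overlap / longest if longest > 0 else 0.0


def same_sv(sv_a, sv_b, min_overlap=0.5):
    """Treat two calls as the same SV if type and chromosome match and the
    reciprocal overlap reaches the (assumed) threshold."""
    if sv_a["type"] != sv_b["type"] or sv_a["chrom"] != sv_b["chrom"]:
        return False
    ro = reciprocal_overlap(sv_a["start"], sv_a["end"], sv_b["start"], sv_b["end"])
    return ro >= min_overlap


known = {"chrom": "chr1", "start": 100, "end": 200, "type": "DEL"}
called = {"chrom": "chr1", "start": 150, "end": 250, "type": "DEL"}
print(reciprocal_overlap(100, 200, 150, 250), same_sv(known, called))
```

Interval overlap alone does not handle insertions or breakpoint ambiguity well, which is part of why type-specific normalization criteria of the kind described above are needed before comparison.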

C-069: Knowledge base for the improvement of nonsense suppression therapies
Track: VarI
  • Nicolas Haas, ICube, University of Strasbourg, France
  • Kirsley Chennen, ICube, University of Strasbourg, France
  • Olivier Poch, ICube, University of Strasbourg, France


Presentation Overview:

Approximately 12% of the 8,000 rare genetic diseases affecting 350 million people worldwide, and 10 to 30% of the cancers linked to tumour suppressor genes, are due to nonsense variations, which introduce a premature termination codon (PTC) in the sequence of a protein-coding gene. During translation, mRNAs carrying a PTC lead to the synthesis of truncated and often dysfunctional proteins, causing disease. Several therapeutic strategies, termed nonsense suppression therapies, have been developed to tackle this problem, aiming to provide therapeutic relief for numerous affected patients. A promising therapeutic approach is to use molecules that induce readthrough of these PTCs. However, current molecules struggle to combine efficacy with low toxicity, and new research is needed to better characterize the nonsense variations targeted by readthrough approaches and, ultimately, to develop new molecules.
In this context, my work focuses on the development of StopKB, a graph-oriented knowledge base designed for the prioritization of PTCs suitable for therapeutic nonsense suppression approaches. StopKB includes all observed nonsense variations as well as associated genes, diseases and phenotypes. Characteristics such as the minor allele frequencies in the population, the deleteriousness status of the nonsense variations, their nucleotide context, and the prevalence of diseases have been integrated into StopKB to determine the best therapeutic targets. A web application completes StopKB, allowing user-friendly and fast exploration of the data according to specific needs.

C-070: Exploring the Spatial Distribution of Persistent SARS-CoV-2 Mutations - Leveraging mobility data for targeted sampling
Track: VarI
  • Riccardo Spott, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  • Mateusz Jundzill, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  • Mara Lohde, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  • Mike Marquet, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  • Mathias W. Pletz, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany
  • Christian Brandt, Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany


Presentation Overview:

Given the rapid spread of SARS-CoV-2 across countries, it is difficult to understand how each variant spreads across districts. We sequenced over 6,500 Alpha genomes (B.1.1.7) across seven months within the German federal state of Thuringia while collecting patients' isolation dates and postal codes. Our dataset is complemented by over 66,000 publicly available German Alpha sequences, to put our data into context, and by mobile service data for Thuringia. We identified the existence and spread of nine persistent mutation variants within the Alpha lineage, each forming a separate phylogenetic cluster with a different spreading pattern within Thuringia. Although cell service data cannot be used to predict the spread with absolute certainty, it is indicative enough to direct scarce surveillance resources to districts where the emergence of these variants is expected. We therefore conclude that combining mobility data with SARS-CoV-2 surveillance via sequencing is a valuable asset for less random and more guided surveillance.

C-071: Benchmark of Common Variant Calling Pipelines vs. the OmicsBox Approach
Track: VarI
  • Enrique Presa, BioBam Bioinformatics S.L., Spain
  • Adolfo López-Cerdán, BioBam Bioinformatics S.L., Spain
  • Stefan Götz, BioBam Bioinformatics S.L., Spain


Presentation Overview:

Genetic Variant Detection plays a crucial role in diverse fields such as biomedical research and plant breeding, where DNA sequences are compared to identify variants associated with a phenotype or disease. With the availability of numerous tools, there are now multiple pipeline options that significantly vary in efficiency and runtime. Our study aims to compare the performance of OmicsBox’s cloud-based pipeline with several commonly used pipelines.
Specifically, we compare OmicsBox’s Variant Calling Pipeline, which utilizes BWA-MEM as the aligner and BCFtools as the variant caller, with four other pipelines: TASSEL-GBS, Stacks, IGST, and Fast-GBS. These pipelines employ popular aligners like BWA-ALN and BWA-MEM, as well as standard variant callers such as TASSEL, BCFtools, and Platypus. We evaluate both the accuracy and computational performance of each workflow. Accuracy is assessed by comparing the number of correctly identified variants in both a Genotyping-by-Sequencing (GBS) dataset and a Whole-Genome Sequencing (WGS) dataset containing the same samples. Computational performance is evaluated based on runtime, taking into consideration the number of CPUs and memory allocation.
Our findings indicate that while standard approaches require several hours for computation, OmicsBox’s Variant Calling Pipeline completes the task within minutes. Moreover, in terms of accuracy, the implementation in OmicsBox also demonstrates superior performance. This benchmark highlights the importance of parallelization and cloud-based solutions when analyzing genetic variation at scale.
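
The accuracy comparison described above can be pictured as a set comparison of variant calls. The sketch below assumes that WGS calls for the same samples serve as the reference set and that a variant is identified by chromosome, position, reference allele and alternate allele; both choices, the function name and the toy calls are assumptions, not details from the poster.

```python
def compare_calls(called, reference):
    """Precision, recall and F1 of a call set against a reference set.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    called, reference = set(called), set(reference)
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"tp": true_positives, "precision": precision,
            "recall": recall, "f1": f1}


# Toy GBS calls from one pipeline and toy WGS calls used as reference.
gbs_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A")]
wgs_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 75, "T", "C"), ("chr3", 10, "A", "C")]

print(compare_calls(gbs_calls, wgs_calls))
```

Since GBS covers only part of the genome, a fair recall figure would restrict the reference set to regions that the GBS library actually covers.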

C-072: A Deep Dive into the Impact of Biotic Stress Resistant Mutations on Rice using AlphaFold2 Structures
Track: VarI
  • Fatima Shahid, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
  • Neeladri Sen, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, United Kingdom
  • Su Datt Lam, Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia


Presentation Overview:

Rice is a vital food source, but its yield is negatively affected by various biotic stresses. This study aims to investigate the impact on protein structure of GWAS-derived missense mutations that confer biotic resistance. Rice AlphaFold structures were chopped into domains using the CATH-Assign protocol. Using the GWAS atlas, our study identified 147 biotic resistance mutations that were mapped to 66 CATH structural domains. We calculated the impact of mutations based on their Grantham score and pathogenicity using MutPred2 and our in-house protein-language model-based predictor VariPred. We also studied the effect of mutations based on their proximity to functional sites. Many of the domains containing these mutations belong to stress-modulating CATH superfamilies, such as the Leucine-rich Repeat Variant, Tetratricopeptide repeat, and P-loop containing nucleotide triphosphate hydrolases. The study revealed that 62 non-conservative mutations were located near functional sites, including evolutionarily conserved FunSites, ligand binding sites, protein-protein interfaces, binding pocket residues, PTM sites, and allosteric sites. This study is the first to use high-quality AlphaFold protein models to predict the pathogenicity and functional sites for pathogen-resistant mutations in plants, providing insights into stress resistance mechanisms and offering strategies for better resistance in rice.

C-073: The DAR database unravels the complex associations between human enzymes and genetic diseases
Track: VarI
  • Giulia Babbi, Biocomputing Group Bologna, University of Bologna, Italy
  • Castrense Savojardo, Biocomputing Group Bologna, University of Bologna, Italy
  • Davide Baldazzi, Biocomputing Group Bologna, University of Bologna (now at CRO Aviano), Italy
  • Elisa Bertolini, Biocomputing Group Bologna, University of Bologna, Italy
  • Pier Luigi Martelli, Biocomputing Group Bologna, University of Bologna, Italy
  • Rita Casadio, Biocomputing Group Bologna, University of Bologna, Italy


Presentation Overview:

The increasing number of human genomic variants related to genetic diseases requires highlighting the complex associations between genes and diseases. We address this problem by considering the Reactome pathways (https://reactome.org/) as a representation of cellular biological processes. A new database, DAR (Diseases And Reactome), is now available on our website (https://dar.biocomp.unibo.it). DAR maps disease-associated enzymes onto Reactome pathways, allowing an overview of all the complex relationships among different pathways. Presently DAR contains 1,494 human enzymes associated with 2,539 genetic diseases derived from OMIM, Humsavar, ClinVar and Monarch. By mapping the set of enzyme-associated diseases (described with MONDO codes) onto the Reactome pathways, we found a Reactome-disease association for 1,525 pathways. A search in DAR allows users to characterize disease-gene-pathway associations. This helps in understanding the biochemistry and molecular biology of a disease across different pathways and can support further analysis for drug repurposing. Furthermore, by grouping pathways according to the Reactome roots, it is possible to establish links among diseases and highlight possible co-morbidities. DAR can help with the annotation of pathogenic gene variants, particularly in the case of rare diseases, as 75% of the collected diseases have an Orphanet identifier.

C-074: In silico exploration of variants within regulatory regions in a cohort of Brazilians with rare diseases and hereditary cancer risk syndromes
Track: VarI
  • Antonio V. C. Coelho, Hospital Israelita Albert Einstein, Brazil
  • Catarina S. Gomes, Hospital Israelita Albert Einstein, Brazil
  • Rafael Lucas Muniz Guedes, Hospital Israelita Albert Einstein, Brazil
  • Rafael S. de Albuquerque, Hospital Israelita Albert Einstein, Brazil
  • Gustavo Santos de Oliveira, Hospital Israelita Albert Einstein, Brazil
  • Rodrigo A. S. Barreiro, Hospital Israelita Albert Einstein, Brazil
  • Luciana Souto Mofatto, Hospital Israelita Albert Einstein, Brazil
  • Lívia Silva Moura, Hospital Israelita Albert Einstein, Brazil
  • Marina Cadena da Matta, Hospital Israelita Albert Einstein, Brazil
  • Marcel Pinheiro Caraciolo, Hospital Israelita Albert Einstein, Brazil
  • Murilo Cervato, Hospital Israelita Albert Einstein, Brazil
  • João Bosco Oliveira, Hospital Israelita Albert Einstein, Brazil



The Brazilian Rare Genomes (GRAR) project aims to improve genomic diagnostics of rare diseases in Brazil. Patients receive a molecular diagnosis based on the presence of pathogenic/likely pathogenic variants following the ACMG guidelines. In this study, we analyzed the transcriptome from peripheral blood mononuclear cells of 250 patients with Negative/Inconclusive reports, investigating the potential regulatory impacts of cis-acting variants that may affect gene expression. Low-expressed transcripts and outlier samples were removed. Quantile normalization was performed and batch effects were removed. We selected 11 narrow peak BED files from the ENCODE database derived from ATAC-Seq experiments and filtered GRAR genotypes within those genomic coordinates. A preliminary analysis, using the MatrixEQTL package v.2.3 for R, suggested 18,837 cis-acting variant/transcript pairs. After grouping by transcript, filtering for the variant with the lowest FDR, and refitting a linear regression model including age, sex, RNA integrity, and recent consanguinity as covariates, 2,414 pairs remained. Most rare variants downregulated the expression of the associated transcript. Twenty-four variants were associated with the expression of genes with phenotypes recognized in the OMIM database. We found three possibly novel tag-SNVs, with phenotypes relevant to the GRAR Project, for further investigation.
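
The filtering step described above (grouping variant/transcript pairs by transcript and keeping the variant with the lowest FDR) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R pipeline; the record layout, the function name, and the example identifiers are all hypothetical.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """One cis-acting variant/transcript association (hypothetical layout)."""
    variant: str
    transcript: str
    fdr: float


def best_variant_per_transcript(pairs):
    """Keep, for each transcript, the pair with the lowest FDR."""
    best = {}
    for pair in pairs:
        current = best.get(pair.transcript)
        if current is None or pair.fdr < current.fdr:
            best[pair.transcript] = pair
    return best


# Made-up records, for illustration only.
pairs = [
    Pair("var1", "txA", 0.04),
    Pair("var2", "txA", 0.001),
    Pair("var3", "txB", 0.03),
]
selected = best_variant_per_transcript(pairs)
```

Each surviving pair would then be refitted with the covariates listed in the abstract; that regression step is not shown here.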

C-075: Modeling endogenous editing outcome of base editor reporter screens with CRISPR-Bean discovers causal variants for cellular LDL uptake
Track: VarI
  • Jayoung Ryu, Harvard Medical School, Massachusetts General Hospital and Broad Institute of Harvard and MIT, Boston, MA, USA, United States
  • Sam Barkal, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Tian Yu, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Matthew Francoer, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Martin Jankowiak, Generate Biomedicines, United States
  • Zhijian Li, Broad Institute of Harvard and MIT and Molecular Pathology Unit, Massachusetts General Hospital, Cambridge, MA, USA, United States
  • Michael Love, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, United States
  • Richard Serwood, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Luca Pinello, Massachusetts General Hospital, Broad Institute of Harvard and MIT and Harvard Medical School, Charlestown, MA, USA, United States



CRISPR base editor screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, frequently leading to incomplete and unintended edits, which confounds the assessment of variant-induced phenotypic effects from such screens.
To overcome these challenges, we have developed CRISPR-Bean, a framework that directly utilizes endogenous editing outcomes for variant effect quantification in base editor screens by combining recently developed base editor reporter assays with Bayesian generative models. We show that base editor reporters faithfully recapitulate endogenous site editing outcomes while being dependent on accessibility. CRISPR-Bean directly models per-guide editing outcomes and target site accessibility when modeling phenotypic impacts of target variants. We deployed CRISPR-Bean on base editor screens with reporters for cellular LDL uptake on LDL-C-associated GWAS candidate variants and coding variants on LDL receptor exons. We show that CRISPR-Bean attains superior performance in variant classification and effect size quantification. With this improved sensitivity, CRISPR-Bean identifies novel coding and noncoding variants that alter LDL uptake, which we further validated and characterized for their mechanisms of action. This work provides a novel and widely applicable approach that improves the power of base editor screens for disease-associated variant characterization.

C-076: Interface-guided phenotyping of coding variants
Track: VarI
  • Kivilcim Ozturk, UC San Diego, United States
  • Rebecca Panwala, UC San Diego, United States
  • Jeanna Sheen, UC San Diego, United States
  • Prashant Mali, UC San Diego, United States
  • Hannah Carter, UC San Diego, United States



Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. In this work, we hypothesized that examining the consequences of perturbing distinct protein interactions could provide a useful abstraction of the phenotypic space reachable by individual amino acid substitutions. To explore this hypothesis, we employed a Perturb-seq style approach to generate mutations at physical interfaces of the transcription factor RUNX1, with the potential to perturb different interactions, and therefore produce transcriptional readouts implicating different aspects of the RUNX1 regulon. We measured the impact of more than 100 mutations on RNA profiles in single myelogenous leukemia cells, and used the profiles to identify functionally distinct groups of RUNX1 mutations, characterize their effects on cellular programs, and study the implications of cancer mutations. The largest concentration of functional mutations clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.

C-077: JACNEx: Joint Analysis of Copy Numbers from Exomes – applications to infertility
Track: VarI
  • Amandine Septier, Univ. Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, France
  • Nicolas Thierry-Mieg, Univ. Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, France



Infertility affects approximately 10-15% of couples, and mounting evidence suggests that 50% of cases have genetic causes.
Whole exome sequencing enables the large-scale detection of genomic variations. However, although single nucleotide variants are identified with reasonable accuracy using state-of-the-art bioinformatics tools, the computational detection of other types of variants such as copy number variations (CNVs) remains challenging. Indeed, many tools for calling CNVs from exome data have been proposed, but independent benchmarks reveal that they largely disagree. Detailed analysis of their shortcomings provided insights that led us to develop JACNEx.
JACNEx performs fine-grained processing of sequencing data to accurately calculate read depths and identify CNV-supporting split-reads. It then constructs groups of similar samples by hierarchical clustering, and models the read depth profile for each exon and each group independently by fitting Gaussian and exponential distributions. Finally, exon-level copy-number likelihoods are combined using a hidden Markov model to call CNVs.
Compared to other tools, JACNEx avoids false-positive calls by taking into account exon-level data quality, seamlessly integrates heterogeneous exome sequencing technologies, and can identify breakpoints via split-reads.
Preliminary investigations using an infertility cohort of 700 exomes reveal JACNEx as the most sensitive and specific exome-based CNV caller. Further validations are underway.
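
The final step described above, combining exon-level copy-number likelihoods with a hidden Markov model, can be illustrated with a generic Viterbi decoder. This is a sketch under stated assumptions, not JACNEx's implementation: the three states, the transition probabilities, and the per-exon likelihoods below are all hypothetical values chosen for illustration.

```python
import math

STATES = ("DEL", "NORMAL", "DUP")  # hypothetical copy-number states


def viterbi(likelihoods, stay=0.9, start=None):
    """Return the most likely state path along consecutive exons.

    likelihoods: one dict per exon mapping state -> P(read depth | state),
                 all strictly positive.
    stay: assumed probability of keeping the same state between adjacent
          exons; the remainder is split evenly among the other states.
    """
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)
    start = start or {s: 1.0 / n for s in STATES}
    # Work in log space to avoid underflow on long exon chains.
    score = {s: math.log(start[s]) + math.log(likelihoods[0][s]) for s in STATES}
    back = []
    for obs in likelihoods[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(
                STATES,
                key=lambda p: score[p] + math.log(stay if p == s else switch),
            )
            trans = stay if prev == s else switch
            new_score[s] = score[prev] + math.log(trans) + math.log(obs[s])
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Four consecutive exons; the two middle ones look deleted (made-up numbers).
likelihoods = [
    {"DEL": 0.01, "NORMAL": 0.98, "DUP": 0.01},
    {"DEL": 0.98, "NORMAL": 0.01, "DUP": 0.01},
    {"DEL": 0.98, "NORMAL": 0.01, "DUP": 0.01},
    {"DEL": 0.01, "NORMAL": 0.98, "DUP": 0.01},
]
path = viterbi(likelihoods)
```

With these toy numbers the decoder calls a two-exon deletion flanked by normal exons. A high `stay` value penalises state changes, so weak single-exon evidence alone would not trigger a call.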

C-078: Onkopus – A Decision Support Framework for Evidence-Based Interpretation of Actionable Biomarkers in Precision Oncology
Track: VarI
  • Nadine Sina Kurz, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Kevin Kornrumpf, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Tim Tucholski, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Vera Gnaß, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Jingyu Yang, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Klara Drofenik, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Tim Beißbarth, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany
  • Jürgen Dönitz, Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), Germany



The current bottleneck in Molecular Tumor Boards (MTBs) is manual annotation and interpretation of increasing amounts of biomarker data, hence there is a need for computational decision support for variant interpretation. We present Onkopus, a modular framework for cancer variant interpretation that predicts the most relevant biomarkers to select the best personalized treatment options. Onkopus provides modules to annotate variants and to predict the functional impact of variants of unknown significance by incorporating relevant information, including biochemical characteristics, variant pathogenicity scores, functional effect prediction and clinical significance data. Additional modules focusing on pathway analysis, protein structural alteration analysis and translational methods for previously unaccounted biomarkers are in progress. Taking all this information into account, Onkopus builds an aggregated, comprehensive overview of a patient to support the selection of personalized treatments. Methods developed for this purpose include the aggregation and normalization of data from public resources, converting sequence coordinates, scoring benign and pathogenic variants, estimating the clinical significance of treatments and data visualization. Onkopus is accompanied by subprojects developing new methods for gene fusions, copy number variations, protein docking, drug classification and methylome analysis. Finally, we compare the accuracy of our variant interpretation method to previous approaches and clinical decisions.

C-079: Neurodevelopmental disorders and cancer networks share pathways; but differ in mechanisms, signaling strength, and outcome
Track: VarI
  • Bengi Ruken Yavuz, Middle East Technical University, Turkey
  • Cansu Demirel, Koc University, Turkey
  • Kaan Arici, Middle East Technical University, Turkey
  • Chung-Jung Tsai, Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Hyunbum Jang, Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Ruth Nussinov, Frederick National Laboratory for Cancer Research, National Cancer Institute, United States
  • Nurcan Tuncbag, Koc University, Turkey



Neurodevelopmental disorders (NDDs) and cancer are connected, with immunity as their common factor. Their clinical presentations differ; however, individuals with NDDs are more likely to acquire cancer. Schizophrenia patients have ∼50% increased risk; autistic individuals also face an increased cancer likelihood. NDDs are associated with specific brain cell types at specific locations, emerging at certain developmental time windows during brain evolution. Their related mutations are germline; cancer mutations are sporadic, emerging during life. At the same time, NDDs and cancer share proteins, pathways, and mutations. Here we ask exactly which features they share, and how despite their commonality, they differ in outcomes. Our pioneering bioinformatics exploration of the mutations, reconstructed disease-specific networks, pathways, and transcriptome profiles of autism spectrum disorder (ASD) and cancers, points to elevated signal strength in pathways related to proliferation in cancer, and differentiation in ASD. Signaling strength, not the activating mutation, is the key factor in deciding cancer versus NDDs.

C-080: Comprehensive Identification and Characterization of Splicing Associated Variants with Coverage Aware Statistical Models
Track: VarI
  • David Wang, University of Pennsylvania, United States
  • Matthew Gazzara, University of Pennsylvania, United States
  • San Jewell, University of Pennsylvania, United States
  • Christopher Brown, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States



Identification and characterization of splicing quantitative trait loci (sQTLs) has emerged as a critical component in understanding the function of noncoding genetic variants implicated in disease. However, a significant number of sQTLs remain undiscovered due to limitations in both splicing quantification and statistical methods. Here we present an sQTL mapping approach that identifies thousands of novel variants that have been recurrently omitted in recent studies. Our method combines event- and transcript-level quantifications to identify variants associated with a more comprehensive set of splicing phenotypes, including intron retention and alternative transcript start/end, which are not considered by existing approaches. We also develop statistical methods to handle discrete and highly correlated multivariate splicing phenotypes, which have more power to detect sQTLs while reducing false discoveries. Through modeling of overdispersed count data, phenotype correlation, missing values, and heteroscedasticity, our model outperforms current methods, which were adapted 'as is' from eQTL studies but are still the standard in the field. Using GTEx as a case study, we show that over 25% of sQTLs are not reported across multiple tissues. To facilitate downstream variant interpretation, we also introduce improved visualization tools and identify novel variants associated with intron retention in Alzheimer’s genes.

C-081: Ending Diagnostic Odysseys with Intelligently Directed Exome Reanalysis
Track: VarI
  • Robert Schuetz, Nationwide Children's Hospital, United States
  • Austin Antoniou, Nationwide Children's Hospital, United States
  • Bimal Chaudhari, Nationwide Children's Hospital, United States
  • Peter White, Nationwide Children's Hospital, United States
  • Daniel Koboldt, Nationwide Children's Hospital, United States
  • Mohammad Marhabaie, Nationwide Children's Hospital, United States
  • Swetha Ramadesikan, Nationwide Children's Hospital, United States



Exome and genome sequencing (ES/GS) are essential for diagnosing patients with rare genetic conditions. However, a genetic diagnosis is identified in less than half of patients. Therefore, we developed a computational variant prioritization tool, CAVaLRi, that contextualizes patients’ phenotype and genotype with pedigree data to identify likely diagnostic variants. CAVaLRi diagnostic probabilities were obtained for every candidate diagnostic variant in undiagnosed ES/GS cases. These data were combined with case metadata and organ system-specific phenotypic data to train an ensemble machine learning classifier, PARADIGM, to distinguish between diagnostic and non-diagnostic cases. Cases with diagnostic variants (n=236) were used as positive patterns in model training, while cases without diagnostic variants or variants of uncertain significance were used as negative patterns (n=175). The highest-scoring 20 cases from an undiagnosed cohort (n=523) were referred to an independent rare disease team for ES/GS reinterpretation. Eight new diagnoses were made. This is approximately three times the expected yield compared to unprioritized reanalysis. PARADIGM has the potential to help prioritize cases that are more likely to yield a diagnosis and shorten diagnostic odysseys for patients with rare genetic conditions.

C-082: Blood Mitochondrial DNA Heteroplasmy Levels Are Associated with Amyloid and Tau
Track: VarI
  • Tong Tong, Bioinformatics Program, Boston University, United States
  • Congcong Zhu, Departments of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, United States
  • John Farrell, Departments of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, United States
  • Xiaoling Zhang, Departments of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, United States
  • Lindsay Farrer, Departments of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, United States



Mitochondrial dysfunction plays an important role in Alzheimer’s Disease (AD) pathogenesis, and one main feature is a disturbed mitochondrial genome. Mitochondrial heteroplasmy is the coexistence of an individual’s wildtype and mutant mitochondrial DNAs (mtDNA), which could lead to deteriorated mitochondria when the proportion of mutants reaches a certain threshold. However, the relationship between mtDNA heteroplasmy level and AD development is unclear. To investigate whether AD and its related pathological traits are associated with heteroplasmy, we leveraged whole genome sequence data from 1,566 human blood samples from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We developed a pipeline, MitoH3, to efficiently and accurately call heteroplasmic variants. Negative binomial regression was performed to assess the associations between heteroplasmy and age, gender, and AD diagnosis. We found that the number of heteroplasmic sites was positively associated with age (β=0.016, P=4.34×10⁻⁴). We also evaluated associations with AD biomarkers from cerebrospinal fluid (CSF) and blood using linear regression. An increased heteroplasmy level was associated with a higher ratio of CSF beta amyloid 42/40 (β=0.070, P=0.017) and a decreased CSF tau level (β=-0.06, P=0.043). These findings suggest that blood mtDNA heteroplasmy level is involved in AD pathologies, although it does not differ between cognitively normal individuals and patients with dementia.

C-083: Whole-genome sequencing identifies rare variants implicating the endocytic pathway in Alzheimer’s disease
Track: VarI
  • Jixin Cao, Fudan University, China
  • Cheng Zhang, Fudan University, China
  • Chun-Yi Zac Lo, Fudan University, China
  • Qihao Guo, Shanghai Jiao Tong University, China
  • Xiaohui Luo, Fudan University, China
  • Zi-Chao Zhang, Fudan University, China
  • Tian-Lin Cheng, Fudan University, China
  • Jingqi Chen, Fudan University, China
  • Xing-Ming Zhao, Fudan University, China



Alzheimer’s disease (AD) irreversibly leads to dementia, with an increasing prevalence as the population ages. Genome-wide association studies for AD have identified more than 70 susceptibility loci, yet there is still substantial heritability missing that might be attributed to rare variants. Here, we performed a whole-genome sequencing analysis on 404 individuals of Chinese ancestry, including 141 sporadic AD cases, to explore likely disease-contributing rare variants. We observed increased burdens of rare, likely deleterious variants in AD, in both coding and functional noncoding regions. Rare variants in the known AD core genes may contribute to the etiologies of about one third of AD cases. Through network analysis, we identified 19 rare variants in 18 potentially novel AD risk genes that are significantly enriched in the endocytic pathways. Functional experiments further showed that the prioritized variant could induce early endosomal dysfunction and exacerbate the AD-associated amyloid processing of APP, making it a probable contributor to AD and a promising target for interventions. Overall, our results highlighted the important role of rare deleterious variants impacting the endocytic pathways in AD genetic etiology.

C-084: Genome Variation Map and GWAS Atlas across multiple species
Track: VarI
  • Shuhui Song, Beijing Institute of Genomics, CAS, China



The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations, and GWAS Atlas (https://ngdc.cncb.ac.cn/gwas/) is a curated resource of genome-wide genotype-to-phenotype associations.
GVM aims to collect and integrate genome variations for a wide range of species, accepts submissions of different variation types from all over the world, and provides free open access to all publicly available data in support of worldwide research activities. In the current release, GVM houses a total of ∼1159 million variants from 49 species, including 18 animals, 28 plants and 3 viruses. Moreover, it incorporates 66,276 individual genotypes and 459,552 manually curated high-quality genotype-to-phenotype associations. Since its inception, GVM has archived genomic variation data of 258,973 samples submitted by worldwide users and served >1 million data download requests. GWAS Atlas incorporates a total of 278,109 curated genotype-to-phenotype associations for 1,444 different traits across 15 species (10 plants and 5 animals) from 830 publications and 3,432 studies. Collectively, as core resources in the National Genomics Data Center, GVM and GWAS Atlas provide valuable genome variations for a diversity of species and thus play an important role in both functional genomics studies and molecular breeding.

C-085: Single-cell mapping of somatic copy number alterations in healthy tissues
Track: VarI
  • Ronja Johnen, CECAD, Germany
  • Luise Nagel, CECAD, Germany
  • Ana Carolina Leote, CECAD, Germany
  • Manuel Lentzen, CECAD, Germany
  • Andreas Beyer, CECAD, Germany



Studying the emergence of somatic copy number alterations (CNAs) in healthy human cells is important to better understand how they contribute to aging, cancer and other diseases.
Here, we present a CNA calling method that infers CNAs in single cells using single-cell RNA-sequencing (scRNA-seq) data. By using scRNA-seq data from healthy tissue samples, it becomes possible to map extremely rare CNA events and to stratify cells by cell type. Our analysis reveals abundant CNAs present in healthy cells, including amplifications of proto-oncogenes and deletions of tumor suppressor genes. Thus, our approach enables the detection of possible oncogenic events at a very early stage. Further, application of this method to human donors of different ages revealed an age-associated increase of CNA events, which opens up the possibility that the accumulation of CNAs may contribute to age-related phenotypes.

C-086: Molecular combing for genetic variant detection in gene editing studies
Track: VarI
  • Rostyslav Makarenko, Genomic Vision, France
  • Engin Altulu, Genomic Vision, France
  • Sana Ahmed-Seghir, Genomic Vision, France
  • Aaron Bensimon, Genomic Vision, France



Targeted genome editing, a technique for modifying the DNA of living cells at specific sites, is on the verge of transforming science and medicine. Precise characterization of large structural variants produced by such modifications remains a challenge to current technologies. We used Molecular Combing to characterize large structural variations as part of a Genome Editing Consortium study. Molecular Combing allows the stretching of thousands of DNA molecules on chemically treated coverslips before hybridization using DNA probes targeting a region of interest. The fluorescent probes are detected using Genomic Vision’s FiberVision® automated fluorescence microscope and FiberStudio® analysis software. With a constant stretching factor of 2 kilobases of DNA per micrometre, Molecular Combing provides an accurate single DNA molecule analysis approach. We have demonstrated that Molecular Combing combined with artificial intelligence algorithms provides a direct visualization tool to effectively detect and quantify large structural variants for gene editing approaches. While detection limits were not identified within the scope of this study, we were able to report the detection of indels of up to 70 kilobases at frequencies up to 0.2%. Molecular Combing could help advance confidence in using genome editing technologies by better characterizing engineered cells to develop safer future therapies.
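
The constant stretching factor quoted above gives a direct conversion between a measured fibre length and a DNA length. The following is a small worked sketch of that arithmetic only; the function names are hypothetical and are not part of the FiberStudio® software.

```python
STRETCH_KB_PER_UM = 2.0  # stretching factor stated in the abstract: 2 kb per micrometre


def fibre_length_to_kb(length_um: float) -> float:
    """Convert a measured fibre length in micrometres to DNA length in kilobases."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    return length_um * STRETCH_KB_PER_UM


def kb_to_fibre_length(length_kb: float) -> float:
    """Convert a DNA length in kilobases to the expected fibre length in micrometres."""
    if length_kb < 0:
        raise ValueError("length must be non-negative")
    return length_kb / STRETCH_KB_PER_UM


# A 70 kb indel corresponds to a 35 micrometre change in measured length.
indel_um = kb_to_fibre_length(70.0)
```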

C-087: Streamlining Genomic Surveillance for Malaria Parasites and Vectors
Track: VarI
  • Matthew Forbes, Wellcome Sanger Institute, United Kingdom
  • Katherine Figueroa, Wellcome Sanger Institute, United Kingdom
  • Antonio Marinho da Silva Neto, Wellcome Sanger Institute, United Kingdom
  • Thomas Maddison, Wellcome Sanger Institute, United Kingdom
  • Simon Suddaby, Wellcome Sanger Institute, United Kingdom
  • Kim Judge, Wellcome Sanger Institute, United Kingdom
  • Andrea Frick-Kretschmer, Wellcome Sanger Institute, United Kingdom
  • Kevin Howe, Wellcome Sanger Institute, United Kingdom



The Wellcome Sanger Institute’s Genomic Surveillance Unit (GSU) was formed in 2022 to accelerate the use and impact of genomic surveillance. We work with partners in both high-income and low-middle-income countries to deliver genomic surveillance products that inform on infectious diseases such as COVID-19 and malaria. Here we present the platform of computational pipelines that we have developed to bring together multiple genomic surveillance products for the surveillance of malaria, from the perspective of both the parasite (Plasmodium) and the vector (Anopheles mosquito). We discuss how we work with partners to enable both surveillance and variant discovery through pipelines designed for data from whole genome and amplicon sequencing, and how we incorporate a system of shared modules to serve similar functions across these pipelines using the Nextflow scientific workflow system. Furthermore, we discuss how our experience of delivering these products in the malaria space created a foundation allowing us to forge a relationship with the UK government as an essential partner in the genomic surveillance of SARS-CoV-2. Finally, we will look ahead to how we plan to extend our platform into the surveillance of other respiratory viruses and into future disease areas.

C-088: Graph theory approaches for exome sequencing data analysis
Track: VarI
  • Marie Coutelier, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Léna Guillot-Noël, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Nisha Kabir, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Giulia Coarelli, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Jean-Loup Méreaux, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Claire-Sophie Davoine, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Alexis Brice, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France
  • Alexandra Durr, Sorbonne Université, Paris Brain Institute, Inserm, CNRS, APHP, Hôpital de la Pitié-Salpêtrière, Paris, France



Spinocerebellar degenerations (SCD) are rare neurodegenerative diseases, caused by mutations in more than 300 genes. Whole exome sequencing (WES) is a technique of choice to identify the causative mutation, offering the best compromise between the amount of data generated, the possibility for clinically relevant interpretation, and the overall cost.

We performed WES in 532 SCD patients and identified causative mutations in only 125, which reflects inherent limitations of WES. Some mutations, such as non-coding or structural variants, cannot be detected by the technology, which covers only the coding genome and relies on short reads. Interpretation of variants in genes not yet implicated in SCD is hindered by their extreme rarity and the absence of second pedigrees mutated in the same gene.

We apply unsupervised clustering methods (k-means, affinity propagation, DBSCAN) to gene networks, drawn from public databases (protein-protein interaction networks) or derived from relevant single-cell RNA sequencing data (gene regulatory networks, coexpression networks). We assess the enrichment of SCD genes in robust clusters, defined by intrinsic metrics (silhouette coefficient) and relevance versus known biological pathways (MSigDB). We then use labeling methods (random walks, PageRank, label propagation), crossed with rare variants from WES, to identify candidate new genes in SCD.
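
One of the labeling methods named above, PageRank with restart at known disease genes (a random walk with restart), can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' pipeline: the network, the placeholder gene names, the damping factor, and the iteration count are all hypothetical.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Score nodes of an undirected graph by a random walk restarting at seeds.

    edges: iterable of (node, node) pairs; seeds: set of nodes present in the graph.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = sorted(neighbours)
    # Restart distribution: uniform over the seed genes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node spreads its damped score evenly over its neighbours.
            share = damping * score[n] / len(neighbours[n])
            for m in neighbours[n]:
                new[m] += share
        score = new
    return score


# Toy chain network G1 - G2 - G3 - G4 - G5 with G1 as the only known disease gene.
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
scores = personalized_pagerank(edges, seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the seed receive higher scores. In a setting like the one described, such scores could then be intersected with genes carrying rare variants, a step not shown here.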

C-089: DenGen database of 2211 unrelated whole genome sequenced Danish individuals
Track: VarI
  • Gül Sude Demircan, Center for Genomic Medicine, Rigshospitalet, Denmark
  • Section For Bioinformatics Bioinformatics Team, Center for Genomic Medicine, Rigshospitalet, Denmark
  • Frederik Otzen Bagger, Center for Genomic Medicine, Rigshospitalet, Denmark



The DenGen database comprises frequencies of genetic variants in the Danish population, offering valuable information for genomics research and clinical diagnostics. The dataset combines both single-nucleotide variants (SNVs) and structural variants (SVs) of 2211 unrelated individuals in a whole-genome sequencing cohort. The data were obtained from routine clinical practice. Paired-end sequencing was performed on a NovaSeq 6000 platform from Illumina using the Illumina DNA PCR-free (tagmentation) kit, with an average read depth of at least 30x. The sequencing data were analyzed using an in-house bioinformatics pipeline, built on GATK's HaplotypeCaller to identify SNVs. For CNV detection, the pipeline utilizes a consensus approach, combining multiple CNV detection algorithms (CNVnator, Delly, Lumpy, and Manta) to address the limitations of individual tools and achieve more robust results. The database provides a collection of aggregated variant frequencies in the Danish population for both researchers and clinicians. The design furthermore allows for refinement of the next N+1 analysis as a specific filter for common variant calls in the Danish population, whether they arise from sequencing errors, tool-specific noise, or population-specific events. The database is available at the Danish National Genome Center (NGC) infrastructure, and the aggregated variant information will be publicly available in an anonymized form.

C-090: Towards automatic ACMG evidence identification for variant interpretation
Track: VarI
  • Francesca Longhin, University of Padova, Italy
  • Alessandro Guazzo, University of Padova, Italy
  • Enrico Longato, University of Padova, Italy
  • Diego Boscarino, AB ANALITICA, Italy
  • Dino Paladin, AB ANALITICA, Italy
  • Nicola Ferro, University of Padova, Italy
  • Barbara Di Camillo, University of Padova, Italy



Next-generation sequencing technologies have been producing a growing volume of genomic data, increasing the challenge of sequence variant interpretation. Identifying disease-causing mutations within millions of variants relies on searching for evidence in support of or against variant pathogenicity, a process governed by the American College of Medical Genetics and Genomics (ACMG) guidelines, which leverage data from the scientific literature. Despite recent improvements towards automation, searching the literature for pieces of evidence for pathogenicity still requires manual curation, a time-consuming process because of the ever-growing number of published papers.
With this work, we provided a reliable, manually curated dataset comprising articles that contain (positive) and do not contain (negative) evidence related to the PS3 and BS3 ACMG criteria. Moreover, we demonstrated that such a dataset can be used to train a predictive model that automatically identifies positive articles.
We performed manual curation on 132 articles retrieved from EuropePMC: 71 positives and 61 negatives. We used a combination of bag-of-words encoding and logistic regression to efficiently identify positive articles, with an F1-score of 0.85. The model’s performance constitutes a clear proof of concept for automatic PS3/BS3 evidence identification. Our dataset represents a useful resource to train further models.
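
The two building blocks named above, bag-of-words encoding and the F1-score, can be sketched in plain Python. This is a minimal illustration and not the authors' code: the vocabulary, the example sentence, and the label vectors are made up, and the logistic regression fit that sits between encoding and evaluation is omitted.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Encode a text as a vector of token counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [counts[token] for token in vocabulary]


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical vocabulary and text, for illustration only.
vocabulary = ["functional", "assay", "variant", "cohort"]
vector = bag_of_words(
    "Functional assay of the variant: the assay shows loss of function.",
    vocabulary,
)

# Made-up labels: 1 = article contains PS3/BS3 evidence, 0 = it does not.
score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a full pipeline, each article's count vector would be the input to the classifier, and the F1-score would be computed on held-out articles.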