VarI

Schedule subject to change
All times listed are in CEST
Thursday, July 27th
8:30-8:40
Opening Remarks
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Julien Gagneur
  • Antonio Rausell
8:40-8:50
Developing a new pipeline for exploring pleiotropy of GWAS data at gene-level
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Yazdan Asgari, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France
  • Pierre-Emmanuel Sugier, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France
  • Taban Baghfalaki, Department of Statistics, Faculty of Mathematical Sciences, Tarbiat Modares University Tehran, Iran, Iran
  • Elise Lucotte, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France
  • Mojgan Karimi, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France
  • Mohammed Sedki, CESP-U1018 and Université Paris-Saclay, Villejuif, France, France
  • Amélie Ngo, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France
  • Benoit Liquet, Université de Pau et des Pays de l’Adour, UMR CNRS 5142, E2S-UPPA, France, France
  • Thérèse Truong, CESP, INSERM, Paris-Saclay University, Paris-Sud University, Gustave Roussy, Villejuif, France, France


Presentation Overview:

The use of gene-level analysis to explore cross-phenotype associations can help to identify pleiotropic genes and uncover common underlying mechanisms between diseases. Despite the increasing number of statistical methods available for studying pleiotropy, there is a lack of pipelines that can efficiently apply gene-level analysis to genome-scale data. To address this, we have developed a user-friendly pipeline for gene-level analysis of cross-phenotype associations from GWAS data using the GCPBayes method. Users can easily modify input parameters and run the entire pipeline through a Shiny application. A step-by-step tutorial is provided on GitHub, as well as a Shiny application to visualize the results. We applied the pipeline to breast and ovarian cancer GWAS data and identified known and novel pleiotropic genes and regions. We also provide recommendations for reducing the computational time when running GCPBayes on genome-scale data.

8:50-9:30
Invited Presentation: Multi-omics characterization of rare heterogeneous tumors
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Matthieu Foll, International Agency for Research on Cancer, France


Presentation Overview:

The molecular characterization of tumors has revolutionized our understanding of cancer biology, as well as how tumors are classified and treated. However, tumor heterogeneity has been pointed out as a major challenge, pushing the research field toward better understanding the complex interaction between the tumor and its microenvironment. Omics techniques together with systems biology approaches have been used to identify biologically and clinically meaningful tumor profiles. In this presentation, I will introduce these concepts, discuss the technical challenges and limitations of these approaches, and provide examples from the Rare Cancers Genomics initiative.

10:00-10:10
Predicting human and viral protein variants affecting COVID-19 susceptibility
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Vaishali Waman, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Paul Ashford, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Su Datt Lam, Universiti Kebangsaan Malaysia, Malaysia
  • Mahnaz Abbasian, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Neeladri Sen, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Laurel Woodridge, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Nicola Bordin, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Ian Sillitoe, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Yonathan Goldtzvik, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Jiaxin Wu, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom
  • Christine Orengo, Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK, United Kingdom


Presentation Overview:

We used a structural bioinformatics approach to analyse the impact of missense variants from human and viral proteins, using 3D structures of SARS-CoV-2:human protein complexes (obtained from PDB and built using AlphaFold2-multimer). Structure-based analyses indicate that missense variants in several human proteins, including IFIT2, IFIH1, TOM70 and ISG15, are implicated in increased binding affinity to SARS-CoV-2 proteins. Affinity-enhancing variants in these proteins could promote their binding to SARS-CoV-2 proteins instead of their natural protein partners or substrates in immune pathways. We report a catalogue of both common and rare variants in these proteins and discuss their structural and functional impact. We do not observe a specific trend in the occurrence of these variants in particular ethnic groups; however, the occurrence of certain affinity-enhancing variants could lead to increased susceptibility in individuals carrying them. We therefore suggest that monitoring variants in immune proteins using experimental approaches would be helpful.

10:10-10:30
Interface-guided phenotyping of coding variants
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Kivilcim Ozturk, UC San Diego, United States
  • Rebecca Panwala, UC San Diego, United States
  • Jeanna Sheen, UC San Diego, United States
  • Prashant Mali, UC San Diego, United States
  • Hannah Carter, UC San Diego, United States


Presentation Overview:

Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. In this work, we hypothesized that examining the consequences of perturbing distinct protein interactions could provide a useful abstraction of the phenotypic space reachable by individual amino acid substitutions. To explore this hypothesis, we employed a Perturb-seq style approach to generate mutations at physical interfaces of the transcription factor RUNX1, with the potential to perturb different interactions, and therefore produce transcriptional readouts implicating different aspects of the RUNX1 regulon. We measured the impact of more than 100 mutations on RNA profiles in single myelogenous leukemia cells, and used the profiles to identify functionally distinct groups of RUNX1 mutations, characterize their effects on cellular programs, and study the implications of cancer mutations. The largest concentration of functional mutations clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.

10:30-10:40
An augmented transformer model trained on family specific variant data leads to improved prediction of variants of uncertain significance
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Dinesh Joshi, Tata Consultancy Services, India
  • Swatantra Pradhan, Tata Consultancy Services, India
  • Rakshanda Sajeed, Tata Consultancy Services, India
  • Rajgopal Srinivasan, Tata Consultancy Services, India
  • Sadhna Rana, Tata Consultancy Services, India


Presentation Overview:

We present an improved method for predicting variants of uncertain significance (VUS) using a transfer learning approach leveraging a pre-trained protein language model. We demonstrate the accuracy of our method by testing it on VUS of the enzyme NAGLU (Alpha-N-acetylglucosaminidase), whose deficiency due to mutations is known to cause a rare genetic disorder, Mucopolysaccharidosis IIIB or Sanfilippo B disease. Our model combines zero-shot log-odds scores and embeddings from the Evolutionary Scale Model (ESM-2) as features for training a supervised model on gene variants functionally related to NAGLU. Our model, augmented with contextual information from the gene family, improves the prediction of VUS in the NAGLU gene and outperforms state-of-the-art pathogenicity predictors. Our results indicate that, for genes that have sparse or no experimental variant impact data, family variant data can serve as proxy training data for making accurate predictions.
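
As a minimal sketch of what a zero-shot log-odds feature can look like: given a language model's probabilities over amino acids at one position, a common formulation scores a substitution as the log probability of the mutant residue minus that of the wild-type residue. The probability table below is made up for illustration, the function name `log_odds_score` is hypothetical, and nothing here calls ESM-2 or reproduces the authors' pipeline.

```python
import math

def log_odds_score(position_probs, wild_type, mutant):
    """Return log p(mutant) - log p(wild_type) at one sequence position.

    position_probs maps amino-acid letters to model probabilities
    (illustrative values here, not real ESM-2 output).
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])

# Hypothetical probabilities at a single position.
probs = {"A": 0.70, "G": 0.20, "P": 0.10}
score = log_odds_score(probs, wild_type="A", mutant="P")
print(score)  # negative: the mutant is less likely than the wild type
```

A score of this kind could then be concatenated with an embedding vector to form the input of a supervised classifier.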

10:40-10:50
Pathogenicity scoring of genetic variants through federated learning across independent institutions reaches comparable or superior performance than the centralized-data model counterparts.
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Nigreisy Montalvo, Clinical Bioinformatics Laboratory, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, France, France
  • Francisco Requena, Clinical Bioinformatics Laboratory, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, France, France
  • Antonio Rausell, Clinical Bioinformatics Laboratory, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, France, France


Presentation Overview:

The clinical assessment of genetic variants often requires the use of bioinformatics scores. Supervised machine-learning models trained on publicly available sets of pathogenic and benign variants have proven valuable for variant prioritization. Yet, large collections of variants generated at hospitals and research institutions remain inaccessible for machine-learning purposes because of privacy and legal constraints. Federated learning (FL) algorithms have recently been developed, enabling multiple institutions to collaboratively train models without sharing their local datasets. Here we evaluated the capacity of FL strategies to classify pathogenic and benign variants as compared to the individual data owners or to the centralized-data model counterparts. FL scenarios across real-world institutions were mimicked by taking advantage of the variant submitter information in the ClinVar database. Specific models for deletion Copy Number Variants as well as for coding and non-coding Single Nucleotide Variants were investigated. These sets presented different degrees of data imbalance and of non-identical feature distributions across institutions. Our results showed that FL models reached competitive or superior performance across the different scenarios, with the FedProx optimization strategy and the lack of batch normalization generally leading to the best results under the hyperparameters evaluated. Our study highlights the benefits of collaborative machine-learning strategies for clinical variant assessment.
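
To illustrate the FedProx idea named above, here is a toy sketch under simplifying assumptions: each client fits a one-parameter linear model `y = w * x` on its own data, adding a proximal penalty `(mu / 2) * (w - w_global)**2` that keeps the local model close to the current global one, and the server averages the client models weighted by sample count. The data, the function names and the hyperparameters `mu`, `lr` and `steps` are all hypothetical; the study itself concerns variant classifiers, not this regression.

```python
def local_update(w_global, data, mu=0.1, lr=0.01, steps=50):
    """One client's FedProx update for the 1-D model y = w * x."""
    w = w_global
    for _ in range(steps):
        # Gradient of the mean squared error on the local data.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        # Gradient of the proximal term (mu / 2) * (w - w_global) ** 2.
        grad += mu * (w - w_global)
        w -= lr * grad
    return w

def federated_round(w_global, clients, mu=0.1):
    """Average the client models, weighted by local sample counts."""
    total = sum(len(d) for d in clients)
    return sum(len(d) * local_update(w_global, d, mu) for d in clients) / total

# Two hypothetical institutions whose data follow a slope close to 2.
clients = [
    [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],
    [(1.0, 1.8), (2.0, 4.1)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(w)  # close to 2; no raw data left either client
```

Setting `mu=0` recovers plain federated averaging, which makes the effect of the proximal term easy to compare in this toy setting.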

10:50-11:00
Hypothesis-free phenotype prediction within a genetics-first framework
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Chang Lu, Imperial College London, United Kingdom
  • Jan Zaucha, University of Bristol, United Kingdom
  • Rihab Gam, MRC Laboratory of Molecular Biology, United Kingdom
  • Hai Fang, Shanghai Jiao Tong University School of Medicine, China
  • Ben Smithers, University of Bristol, United Kingdom
  • Matt Oates, University of Bristol, United Kingdom
  • Miguel Bernabe-Rubio, Kings College London, United Kingdom
  • James Williams, Kings College London, United Kingdom
  • Natalie Zelenka, University of Bristol, United Kingdom
  • Arun Pandurangan, University of Cambridge, United Kingdom
  • Himani Tandon, MRC Laboratory of Molecular Biology, United Kingdom
  • Hashem Shihab, University of Bristol, United Kingdom
  • Raju Kalaivani, MRC Laboratory of Molecular Biology, United Kingdom
  • Minkyung Sung, MRC Laboratory of Molecular Biology, United Kingdom
  • Adam Sardar, University of Bristol, United Kingdom
  • Bastian Greshake Tzovoras, The Alan Turing Institute, United Kingdom
  • Davide Danovi, Kings College London, United Kingdom
  • Julian Gough, MRC Laboratory of Molecular Biology, United Kingdom


Presentation Overview:

Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed ‘rare’, even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method making molecular knowledge-based interpretations for exome-wide non-synonymous variants for phenotypes at the organism and cellular level. By using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods and present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discovery from genetic data after standard tools have been applied.

11:00-11:20
Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Xiao Luo, Bielefeld University, Germany
  • Xiongbin Kang, Bielefeld University, Germany
  • Alexander Schoenhuth, Bielefeld University, Germany


Presentation Overview:

For diseases with a complex genetic architecture, a considerable number of genetic variants that play a role in the disease have not yet been identified as such. Two major causes of this phenomenon have been proposed: first, genetic variants may not simply add up their effects but interact in complex ways; second, as recently suggested by the omnigenic model, variants may interact in a holistic manner to establish disease phenotypes.

We present DiseaseCapsule, a capsule-network-based approach that explicitly aims to capture the hierarchical structure of the underlying genome data, exploiting the non-linear relationships between variants and disease across the whole genome. DiseaseCapsule is the first such approach to operate in a whole-genome manner when predicting disease occurrence from individual genotype profiles.

DiseaseCapsule achieves 86.9% accuracy on hold-out test data in predicting the occurrence of ALS, which is known to have a particularly complex genetic architecture and a heritability of 40%, thereby outperforming all other approaches by large margins. DiseaseCapsule also required considerably less training data to reach optimal performance. The systematic exploitation of the network architecture yielded 922 genes of particular interest, and 644 “non-additive” genes that are crucial factors in DiseaseCapsule but remain masked within linear schemes.

11:20-12:00
Invited Presentation: Network Medicine – From protein-protein to human-machine interactions
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Julien Gagneur

  • Jörg Menche
13:20-13:40
XClone: detection of allele-specific subclonal copy number variations from single-cell transcriptomic data
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Rongting Huang, The University of Hong Kong, Hong Kong
  • Xianjie Huang, The University of Hong Kong, Hong Kong
  • Yin Tong, The University of Hong Kong, Hong Kong
  • Helen Y.N. Yan, The University of Hong Kong, Hong Kong
  • Suet Yi Leung, The University of Hong Kong, Hong Kong
  • Oliver Stegle, The University of Hong Kong, Hong Kong
  • Yuanhua Huang, The University of Hong Kong, Hong Kong


Presentation Overview:

Somatic copy number variations (CNVs) are major mutations driving the development and clonal progression of various cancers. A few computational methods have been proposed to detect CNVs from single-cell transcriptomic data. Still, technical sparsity makes it challenging to identify allele-specific CNVs, especially in complex clonal structures. Here we present a statistical method, XClone, to detect haplotype-aware CNVs by integrating expression levels and allelic balance from scRNA-seq data. With well-annotated datasets on multiple cancer types, we demonstrated that XClone could accurately detect different types of allele-specific CNVs, enabling the discovery of the corresponding subclones and dissection of their phenotypic impacts.

13:40-14:00
Comprehensive Identification and Characterization of Splicing Associated Variants with Coverage Aware Statistical Models
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • David Wang, University of Pennsylvania, United States
  • Matthew Gazzara, University of Pennsylvania, United States
  • San Jewell, University of Pennsylvania, United States
  • Christopher Brown, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview:

Identification and characterization of splicing quantitative trait loci (sQTLs) has emerged as a critical component in understanding the function of noncoding genetic variants implicated in disease. However, a significant number of sQTLs remain undiscovered due to limitations in both splicing quantification and statistical methods. Here we present an sQTL mapping approach that identifies thousands of novel variants that have been recurrently omitted in recent studies. Our method combines event- and transcript-level quantifications to identify variants associated with a more comprehensive set of splicing phenotypes, including intron retention and alternative transcript start/end, which are not considered by existing approaches. We also develop statistical methods to handle discrete and highly correlated multivariate splicing phenotypes, which have more power to detect sQTLs while reducing false discoveries. Through modeling of overdispersed count data, phenotype correlation, missing values, and heteroscedasticity, our model outperforms current methods, which were adapted 'as is' from eQTL studies but are still the standard in the field. Using GTEx as a case study, we show that over 25% of sQTLs are not reported across multiple tissues. To facilitate downstream variant interpretation, we also introduce improved visualization tools and identify novel variants associated with intron retention in Alzheimer’s genes.

14:00-14:10
A statistical approach to identify regulatory DNA variations combined with epigenomics data reveals novel non-coding disease genes
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Nina Baumgarten, Goethe University, Frankfurt am Main, Germany
  • Chaonan Zhu, Goethe University, Frankfurt am Main, Germany
  • Meiqian Wu, Goethe University, Frankfurt am Main, Germany
  • Yue Wang, Goethe University, Frankfurt am Main, Germany
  • Arka-Provo Das, Goethe University, Frankfurt am Main, Germany
  • Jaskiran Kaur, Goethe University, Frankfurt am Main, Germany
  • Fatemeh Behjati, Goethe University, Frankfurt am Main, Germany
  • Duong, Genome Biologics, Germany
  • Minh Duc Pham, Goethe University, Frankfurt am Main, Germany
  • Maria Duda, Genome Biologics, Germany
  • Laura Rumpf, Goethe University, Frankfurt am Main, Germany
  • Stefanie Dimmeler, Goethe University, Frankfurt am Main, Germany
  • Ting Yuan, Goethe University, Frankfurt am Main, Germany
  • Thorsten Kessler, German Heart Centre Munich, Germany
  • Jaya Krishnan, Goethe University, Frankfurt am Main, Germany
  • Marcel H. Schulz, Goethe University, Frankfurt am Main, Germany


Presentation Overview:

Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences such as various traits or diseases. To understand these molecular mechanisms, different approaches are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs), on TF binding.
We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed.
We combined our statistical approach with epigenetic data in a flexible workflow. We further provide additional functionalities, e.g., the identification of SNP-specific TFs through a TF enrichment statistic. Applications to large sets of eQTLs and SNPs obtained from genome-wide association studies illustrate the usefulness of our approach for highlighting cell-type-specific regulators and disease-associated target genes.
To conclude, our fast and accurate approach allows the evaluation of DNA changes that induce differential TF binding, where the incorporation of epigenetic data results in a cell-type-specific prediction.
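
As a rough sketch of how a Laplace approximation can turn a differential binding score into a significance estimate: the snippet below fits the scale of a plain zero-centred Laplace distribution to background scores by maximum likelihood and evaluates a two-sided tail probability. It uses the standard Laplace distribution, not the modified form described in the abstract, and the background scores and function names are invented for illustration.

```python
import math

def fit_laplace_scale(scores):
    """Maximum-likelihood scale b of a zero-centred Laplace distribution."""
    return sum(abs(s) for s in scores) / len(scores)

def tail_probability(score, b):
    """P(|X| >= |score|) for X ~ Laplace(0, b)."""
    return math.exp(-abs(score) / b)

# Hypothetical background differential binding scores.
background = [-0.4, 0.1, 0.3, -0.2, 0.5, -0.1, 0.2, -0.3]
b = fit_laplace_scale(background)
p = tail_probability(1.5, b)
print(b, p)  # a large |score| relative to b gives a small tail probability
```

A closed-form tail like this is one reason a parametric approximation can be faster than estimating significance from an empirical score distribution.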

14:10-14:20
SpliceSM: machine learning discovery of splice-altering variants using susceptibility maps
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Steve Monger, Victor Chang Cardiac Research Institute, Australia
  • Paul Young, Victor Chang Cardiac Research Institute, Australia
  • Eleni Giannoulatou, Victor Chang Cardiac Research Institute, Australia


Presentation Overview:

Genetic variants that disrupt pre-mRNA splicing are a significant cause of pathogenic variants, yet many of these variants remain undetected. Computational prediction plays a crucial role in identifying splice-altering variants from patient genomes. However, current methods have limitations in prediction accuracy, and new methods are required to overcome these challenges.

In this study, we developed SpliceSM, a Python command-line tool that accepts VCF files as input and generates scores reflecting each variant's potential to disrupt splicing. SpliceSM utilizes established prediction features and novel features we have developed, which capture variant susceptibility in various ways. These novel features leverage information from population-scale transcriptomic datasets.

SpliceSM uses an XGBoost classifier that we trained on a dataset of 3,952 splice-altering variants curated from the literature and a balanced set of negatives. Our benchmarking results demonstrate that SpliceSM outperforms existing methods in all areas. Compared to the highest-performing existing method, SpliceSM exhibited substantial improvements in prediction accuracy for variants within splice sites (ROC AUC 0.96 vs. 0.93) and outside of splice sites (PR AUC 0.67 vs. 0.61).

Overall, SpliceSM is a novel method for discovering splice-altering variants with unmatched accuracy.

14:20-15:00
Invited Presentation: Strategies to annotate and interpret non-coding variants in rare disease
Room: Salle Rhone 1
Format: Live-stream

Moderator(s): Antonio Rausell

  • Nicky Whiffin


Presentation Overview:

It is increasingly being recognised that variants outside of protein-coding regions of the genome play an important role in rare disease. However, it can often be difficult to identify and annotate non-coding region variants due to the lack of available tools and limitations in our understanding of the underlying mechanisms. Here, I will discuss our recent approaches to annotate and interpret non-coding region variants, in particular using the Genomics England 100,000 genomes project. I will also discuss our work to elucidate disease-causing mechanisms associated with high-impact variants in untranslated regions, and development of clinical guidelines to enable routine classification of non-coding region variants in clinical settings.

15:30-15:50
Proceedings Presentation: Deep Local Analysis deconstructs protein-protein interfaces and accurately estimates binding affinity changes upon mutation
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Yasser Mohseni Behbahani, Sorbonne Université, France
  • Elodie Laine, Sorbonne Université - Laboratory of Computational and Quantitative Biology (LCQB, CNRS-SU), France
  • Alessandra Carbone, Sorbonne Université, France


Presentation Overview:

The spectacular recent advances in protein and protein complex structure prediction hold promise for reconstructing interactomes at large scale and residue resolution. Beyond determining the 3D arrangement of interacting partners, modeling approaches should be able to unravel the impact of sequence variations on the strength of the association. In this work, we report on Deep Local Analysis (DLA), a novel and efficient deep learning framework that relies on a strikingly simple deconstruction of protein interfaces into small locally oriented residue-centered cubes and on 3D convolutions recognizing patterns within cubes. Merely based on the two cubes associated with the wild-type and the mutant residues, DLA accurately estimates the binding affinity change for the associated complexes. It achieves a Pearson correlation coefficient of 0.735 on about 400 mutations on unseen complexes. Its generalization capability on blind datasets of complexes is higher than the state-of-the-art methods. We show that taking into account the evolutionary constraints on residues contributes to predictions. We also discuss the influence of conformational variability on performance. Beyond the predictive power on the effects of mutations, DLA is a general framework for transferring the knowledge gained from the available non-redundant set of complex protein structures to various tasks. For instance, given a single partially masked cube, it recovers the identity and physico-chemical class of the central residue. Given an ensemble of cubes representing an interface, it predicts the function of the complex. Source code and models are available at http://gitlab.lcqb.upmc.fr/DLA/DLA.git.

15:50-16:10
Modeling endogenous editing outcome of base editor reporter screens with CRISPR-Bean discovers causal variants for cellular LDL uptake
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Jayoung Ryu, Harvard Medical School, Massachusetts General Hospital and Broad Institute of Harvard and MIT, Boston, MA, USA, United States
  • Sam Barkal, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Tian Yu, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Matthew Francoer, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Martin Jankowiak, Generate Biomedicines, United States
  • Zhijian Li, Broad Institute of Harvard and MIT and Molecular Pathology Unit, Massachusetts General Hospital, Cambridge, MA, USA, United States
  • Michael Love, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, United States
  • Richard Serwood, Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA, United States
  • Luca Pinello, Massachusetts General Hospital, Broad Institute of Harvard and MIT and Harvard Medical School, Charlestown, MA, USA, United States


Presentation Overview:

CRISPR base editor screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, frequently leading to incomplete and unintended edits, which confounds the assessment of variant-induced phenotypic effects from such screens.
To overcome these challenges, we have developed CRISPR-Bean, a framework that directly utilizes endogenous editing outcomes for variant effect quantification in base editor screens by combining recently developed base editor reporter assays with Bayesian generative models. We show that base editor reporters faithfully recapitulate endogenous site editing outcomes while being dependent on accessibility. CRISPR-Bean directly models per-guide editing outcomes and target site accessibility when modeling phenotypic impacts of target variants. We deployed CRISPR-Bean on base editor screens with reporters for cellular LDL uptake on LDL-C-associated GWAS candidate variants and coding variants on LDL receptor exons. We show that CRISPR-Bean attains superior performance in variant classification and effect size quantification. With this improved sensitivity, CRISPR-Bean identifies novel coding and noncoding variants that alter LDL uptake, which are further validated and characterized with respect to their mechanism of action. This work provides a novel and widely applicable approach that improves the power of base editor screens for disease-associated variant characterization.

16:10-16:20
From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Daniele Raimondi, KU Leuven, Leuven, Belgium, Belgium
  • Massimiliano Corso, Institut Jean-Pierre Bourgin, Université Paris-Saclay, INRAE, AgroParisTech, 78000 Versailles, France, France
  • Piero Fariselli, Department of Medical Sciences, University of Torino, 10123 Torino, Italy, Italy
  • Nora Verplaetse, Katholieke Universiteit Leuven, Belgium
  • Antoine Passemiers, Katholieke Universiteit Leuven, Belgium
  • Francesco Codicè, Department of Medical Sciences, University of Torino, 10123 Torino, Italy, Italy
  • Yves Moreau, Katholieke Universiteit Leuven, Belgium


Presentation Overview:

In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation. In human genetics, this has delayed the promised breakthroughs in genetics and precision medicine; in plant sciences, it has delayed phenotype prediction aimed at improving plant adaptation to climate change and resistance to bioaggressors. In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best-studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from whole-genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4, and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to Saliency Maps gradient-based approaches.

16:20-16:30
Closing Remarks
Room: Salle Rhone 1
Format: Live from venue

Moderator(s): Antonio Rausell

  • Julien Gagneur
  • Antonio Rausell