Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-12:00
Invited Presentation: Enhancing Multi-Task CNNs for Regulatory Genomics Through Allelic and High-Resolution Training
Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Alexander Sasse

Presentation Overview:

As we age, all cells in our bodies continuously acquire somatic mutations. Despite appearing histologically normal, many tissues become progressively colonised by microscopic clones carrying somatic driver mutations. Some of these clones represent a first step towards cancer whereas others may contribute to ageing and other diseases. However, our understanding of the clonal landscapes of human tissues, and their impact on cancer risk, ageing and disease, remains limited due to the challenge of detecting somatic mutations present in small numbers of cells. In this presentation, I will summarise the methodological advances that have occurred over the last decade that have enabled us to first discover and subsequently interrogate these microscopic clones. In particular, I will focus on our recent development of nanorate sequencing (NanoSeq), a duplex sequencing method with error rates of <5 per billion base pairs, which is compatible with whole-exome and targeted gene sequencing. Deep sequencing of polyclonal samples with single-molecule sensitivity enables the simultaneous detection of mutations in large numbers of clones, yielding accurate somatic mutation rates, mutational signatures and driver mutation frequencies in any tissue. Applying Targeted NanoSeq to 1,042 non-invasive samples of oral epithelium and 371 samples of blood from a twin cohort, we found an unprecedentedly rich landscape of selection, with 46 genes under positive selection driving clonal expansions in the oral epithelium, over 62,000 driver mutations, and evidence of negative selection in some genes. The high number of positively selected mutations in multiple genes provides high-resolution maps of selection across coding and non-coding sites, a form of in vivo saturation mutagenesis.

12:00-12:20
Combining massively parallel reporter assays and graph genomics to assay the regulatory effects of indels and structural variants
Confirmed Presenter: Lindsey Plenderleith, Roslin Institute, University of Edinburgh, United Kingdom

Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Lindsey Plenderleith, Roslin Institute, University of Edinburgh, United Kingdom
  • Rachel Owen, Roslin Institute, University of Edinburgh, United Kingdom
  • Timothy Connelley, Roslin Institute, University of Edinburgh, United Kingdom
  • Musa Hassan, Roslin Institute, University of Edinburgh, United Kingdom
  • Liam Morrison, Roslin Institute, University of Edinburgh, United Kingdom
  • James Prendergast, Roslin Institute, University of Edinburgh, United Kingdom

Presentation Overview:

Many important phenotypes are driven by differences in gene expression caused by variation in regulatory sequences between individuals. Among such variants, the effects of larger changes such as insertion-deletion mutations (indels) and structural variants (SVs) remain understudied relative to single nucleotide variants (SNVs), even though they may often have larger regulatory impacts. We have used the Survey of Regulatory Effects (SuRE) approach, a genome-wide massively parallel reporter assay, to screen the cattle and human genomes to identify SNVs with regulatory effects, and are now leveraging this approach to study the effects of larger variants. The SuRE method, which tests the ability of individual genomic DNA fragments to initiate transcription in an otherwise promoterless plasmid, allows the effects of individual variants to be tested, considerably reducing the confounding impact of linkage disequilibrium. By combining SuRE with a novel graph genomics pipeline we have been able to improve the detection of regulatory effects of indels and SVs. We successfully tested almost 1.4 million indels and SVs, ranging in size from 1 bp to 1.5 kb, and identified around 13,000 with a significant effect on gene expression in primary cattle cells. Work is ongoing to characterise further these potential regulatory variants and their relevance to understanding how indels and SVs shape important phenotypes. These results validate our method as a new tool for evaluating the functional effects of longer variants.

12:20-12:30
Multilingual model improves zero-shot prediction of disease effects on proteins
Confirmed Presenter: Ruyi Chen, The University of Queensland, Australia

Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Ruyi Chen, The University of Queensland, Australia
  • Nathan Palpant, The University of Queensland, Australia
  • Gabriel Foley, The University of Queensland, Australia
  • Mikael Boden, The University of Queensland, Australia

Presentation Overview:

Models for mutation effect prediction in coding sequences rely on sequence-, structure-, or homology-based features. Here, we introduce a novel method that combines a codon language model with a protein language model, providing a dual representation for evaluating effects of mutations on disease. By capturing contextual dependencies at both the genetic and protein level, our approach achieves a 3% increase in ROC-AUC classifying disease effects for 137,350 ClinVar missense variants across 13,791 genes, outperforming two single-sequence-based language models. Notably, the codon language model can uniquely differentiate synonymous from nonsense mutations at the genomic level. Our strategy uses information at complementary biological scales (akin to human multilingual models) to enable protein fitness landscape modeling and evolutionary studies, with potential applications in precision medicine, protein engineering, and genomics.

12:30-12:40
X-MAP: Explainable AI Platform for Genetic Variant Interpretation
Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Marco Anteghini, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
  • Andrea Zauli, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
  • Emidio Capriotti, BioFolD Unit, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy

Presentation Overview:

Genetic variants, particularly missense mutations, can significantly affect protein function and contribute to disease development. Methods like CADD and AlphaMissense are widely used for pathogenicity prediction; however, their integration into existing resources remains limited due to compatibility issues and high computational demands.
We introduce X-MAP, an integrated platform that leverages protein language models to enhance variant effect prediction through a novel embedding-based strategy. This approach captures both local and global protein features, enabling more accurate interpretation of mutation impacts.
Our method generates embeddings for entire protein sequences using multiple state-of-the-art models (ESM2, ESMC, and ESM1v) and extracts contextual information around mutation sites using a dynamic window of four residues on each side. This window size was empirically optimized to balance detailed local structure with computational efficiency.
We evaluated both concatenation and difference-based embedding strategies using rigorous 10-fold cross-validation with XGBoost classifiers on a large dataset of 71,595 genetic variants across 12,666 human proteins. Among all methods, the ESMC concatenation strategy with the 4-residue window achieved the highest performance (Accuracy: 0.84, MCC: 0.66, AUC: 0.90), outperforming the Esnp baseline (Accuracy: 0.82, MCC: 0.64, AUC: 0.82), which relies on full sequence concatenation.
By concentrating on regions directly affected by mutations while retaining global sequence context, X-MAP achieves both accuracy and computational efficiency. We are currently developing a hybrid Transformer-CNN model to further enhance prediction accuracy and interpretability. X-MAP represents a powerful and scalable framework for variant analysis with direct applications in precision medicine and disease research.
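The windowing and the two embedding strategies described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the X-MAP implementation: the function names are hypothetical, the per-residue embeddings are toy numbers rather than ESM output, and mean-pooling over the window is an assumed way of summarising it (the abstract does not say how the window is aggregated).

```python
from typing import List

Vector = List[float]


def window_mean(embeddings: List[Vector], site: int, flank: int = 4) -> Vector:
    """Mean-pool per-residue embeddings over `flank` residues on each side of
    `site` (0-based); the window is clipped at the sequence ends.
    Mean-pooling is an assumption made for this sketch."""
    start = max(0, site - flank)
    end = min(len(embeddings), site + flank + 1)
    window = embeddings[start:end]
    dim = len(window[0])
    return [sum(vec[i] for vec in window) / len(window) for i in range(dim)]


def concat_features(wt: List[Vector], mut: List[Vector], site: int, flank: int = 4) -> Vector:
    """Concatenation strategy: wild-type window vector followed by mutant."""
    return window_mean(wt, site, flank) + window_mean(mut, site, flank)


def diff_features(wt: List[Vector], mut: List[Vector], site: int, flank: int = 4) -> Vector:
    """Difference strategy: mutant window vector minus wild-type window vector."""
    w = window_mean(wt, site, flank)
    m = window_mean(mut, site, flank)
    return [b - a for a, b in zip(w, m)]


# Toy 2-dimensional "embeddings" for a 10-residue protein (not real ESM output).
wild_type = [[float(i), 1.0] for i in range(10)]
mutant = [row[:] for row in wild_type]
mutant[5] = [14.0, 1.0]  # pretend the substitution changed the vector at residue 5

print(concat_features(wild_type, mutant, site=5))
print(diff_features(wild_type, mutant, site=5))
```

Either feature vector could then be passed to a classifier such as the XGBoost models mentioned in the abstract; the concatenated form keeps both contexts, while the difference form halves the feature dimension.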

12:40-12:50
StructGuy: Data leakage free prediction of functional effects of genetic variants.
Confirmed Presenter: Alexander Gress, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Germany

Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Alexander Gress, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Germany
  • Johanna Becher, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Germany
  • Dominique Mias-Lucquin, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Germany
  • Sebastian Keller, Max Planck Institute for Informatics, University of Saarland, Germany
  • Olga Kalinina, Max Planck Institute for Informatics, Germany

Presentation Overview:

In recent years, machine learning models for predicting variant effects on protein function have been dominated by unsupervised models doing zero-shot predictions on the task. In their development, multiplexed assay of variant effect (MAVE) data played only a secondary role, being used for model evaluation, most prominently in the ProteinGym benchmark. The rapidly increasing amount of available MAVE data should be able to fuel novel supervised prediction models, but this is hindered by the data leakage that results when MAVE data are used to train a supervised model. Such models are not able to generalize their predictions to proteins not present in the training data, hence so far they are only used in protein design tasks.
Here, we present the novel random forest-based prediction method StructGuy that overcomes the problem of data leakage by applying sophisticated splits in hyperparameter optimization and feature selection. By removing proteins similar to any proteins in our training data set from ProteinGym, we constructed a dedicated benchmark that aims to evaluate the ability of a supervised model to generalize to proteins not seen in the training data. In this benchmark, we could directly and fairly compare our StructGuy model with all models that are part of the zero-shot substitutions track of ProteinGym, and were able to demonstrate a slightly higher average Spearman's correlation coefficient (0.45 vs. second highest: ProtSSN: 0.43). StructGuy directly applied on ProteinGym results in an average Spearman's correlation coefficient of 0.6.
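As a rough illustration of what a leakage-aware split can look like, the sketch below assigns whole protein groups to folds so that no group contributes variants to more than one fold. It is an assumption-laden example, not StructGuy's actual splitting procedure: the function name `grouped_folds`, the greedy balancing rule, and the idea that a group label stands for a cluster of similar proteins are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def grouped_folds(groups: Sequence[Hashable], n_folds: int) -> List[List[int]]:
    """Assign sample indices to folds so that no group spans two folds.

    `groups[i]` is the group label (e.g. a cluster of similar proteins,
    assumed to be computed elsewhere) of variant i. Groups are placed
    greedily, largest first, into the currently smallest fold.
    """
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for _, members in sorted(by_group.items(), key=lambda kv: -len(kv[1])):
        smallest = min(folds, key=len)
        smallest.extend(members)
    return folds


# Toy example: eight variants from four hypothetical protein clusters.
labels = ["A", "A", "A", "B", "B", "C", "D", "D"]
folds = grouped_folds(labels, n_folds=2)
for fold in folds:
    print(sorted({labels[i] for i in fold}), fold)
```

Using the same grouping for hyperparameter search and feature selection as for the final evaluation is what keeps information about held-out proteins from reaching the model.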

12:50-13:00
Functional characterization of standing variation around disease-associated genes using Massively Parallel Reporter Assays
Confirmed Presenter: Kilian Salomon, Berlin Institute of Health at Charité (BIH), Germany

Room: 04AB
Format: In person

Moderator(s): Emidio Capriotti


Authors List:

  • Kilian Salomon, Berlin Institute of Health at Charité (BIH), Germany
  • Chengyu Deng, University of California, San Francisco (UCSF), United States
  • Jay Shendure, University of Washington, United States
  • Max Schubach, Berlin Institute of Health at Charité (BIH), Germany
  • Nadav Ahituv, University of California, San Francisco (UCSF), United States
  • Martin Kircher, Berlin Institute of Health at Charité (BIH), Germany

Presentation Overview:

A substantial reservoir of disease-associated variants resides in non-coding sequences, particularly in proximal and distal gene regulatory sequences. As part of the NIH Impact of Genomic Variation on Function (IGVF) consortium, we investigated functional genetic variation using Massively Parallel Reporter Assays (MPRAs). We tested >28,000 candidate cis-regulatory regions (cCREs) in the proximity (50kb) of 526 neural, cardiac or clinically actionable genes as well as a random gene set. Within these cCREs, we included >46,000 variants from gnomAD. This included all single nucleotide variants (SNVs) with allele frequency (AF) ≥1% as well as 35,000 rare and singleton variants. Rare variants were prioritized using Enformer (Avsec Ž et al. 2021) to select 70% potentially activating, 15% repressing, and 15% random variants.
Performing this MPRA in NGN2-derived neurons from WTC-11 cells showed that 16% (4045) of cCREs have significantly different activity from negative controls, while 6% (1540) of elements exhibit distinct activity from scrambled controls (dCREs). Among the dCREs, 3.3% are significantly more active and 2.7% are less active. About 3% (1304) of the tested variants showed a significant allelic effect. We observed both common and rare variants with significant allelic effects, with rare variants contributing the larger proportion. Examples of significant common and singleton SNVs include rs11635753 and rs1257445811, affecting SMAD3 and TRIO, respectively, both associated with complex neurological phenotypes. Using Enformer for prioritization resulted in an enrichment in the selected rare variants but also failed to effectively capture regulatory grammar at base resolution.

14:00-14:40
Invited Presentation: Variant Interpretation at Scale, for safer and more effective disease treatment
Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Ellen McDonagh

14:40-15:00
scFunBurd: Quantifying the cellular liability for complex disorders of all rare gene-disrupting variants.
Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Thomas Renne, Université de Montréal, Canada
  • Guillaume Huguet, CHU Sainte Justine, Canada
  • Tomasz Nowakowski, University of California, San Francisco, United States
  • Sébastien Jacquemont, CHU Sainte Justine, Canada

Presentation Overview:

Neurodevelopmental disorders are examples of complex disorders with multidimensional etiologies. This study focuses on Autism Spectrum Disorder (ASD), a prevalent and highly heritable disorder, to illustrate the challenges of identifying rare variants associated with a complex disorder. Previous research has linked only a hundred genes to ASD. However, the majority of gene-disrupting variants and their functions remain unknown. This study aims to develop a cellular burden analysis to associate the rest of the rare gene-disrupting variants with complex disorders on a function-wide scale with the help of transcriptomic datasets.

The study relied on 100,000 phenotyped and sequenced individuals from the SPARK dataset. Transcriptomic data are single-nuclei RNAseq of 150,000 cortical cells from 40 individuals, clustered into 91 developmental cell types. Cell type burdens were computed with logistic regression models of the most cell-type specific genes.

Our results showed that Loss of Function (LoF) variants and CNVs had significant liabilities in neuronal cell types. Interestingly, we also identified significant liabilities for ASD in non-neuronal cell types for LoF variants, which had not been reported before. Moreover, each variant type exhibited unique patterns of cellular liability, highlighting the need to study them individually. Finally, we observed that the cellular burden resulted mostly from genes not previously associated with ASD.

The scFunBurd method effectively identified new functional processes associated with complex disorders, and offers insights into rare variants not yet linked to ASD. This method could therefore be applied to other complex disorders to uncover their functional liabilities.

15:00-15:20
Biostatistical approaches to single-cell perturbation screens to create a prospective map of mutational impact
Confirmed Presenter: Magdalena Strauss, University of Exeter, United Kingdom

Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Magdalena Strauss, University of Exeter, United Kingdom
  • Sarah Cooper, Wellcome Sanger Institute, United Kingdom
  • Matthew Coelho, Wellcome Sanger Institute, United Kingdom
  • Aleksander Gontarczyk, Wellcome Sanger Institute, United Kingdom
  • Qianxin Wu, Wellcome Sanger Institute, United Kingdom
  • Alex Watterson, Wellcome Sanger Institute, United Kingdom
  • John Marioni, European Bioinformatics Institute EMBL-EBI, United Kingdom
  • Mathew Garnett, Wellcome Sanger Institute, United Kingdom
  • Andrew Bassett, Wellcome Sanger Institute, United Kingdom

Presentation Overview:

DNA single nucleotide variants (SNVs) are a major cause of drug resistance in cancer, but for most variants the effects on drug response are still unknown. While new SNVs are discovered at an increasing rate, the interpretation of their impacts presents a major bottleneck in clinical use.
To address this bottleneck, we developed a suite of statistical analysis tools that allowed the creation of a prospective map of mutational impact from new experimental techniques that combine gene editing data with RNA and DNA sequencing readout at the single-cell level. Our tools shed light on the degree of malignancy of individual mutations, on changes in gene regulation resulting from mutations, and on potential drug targets, and also include methods to model the specific noise structure of single-cell data for the gene editing context.
First, we studied IFNγ response across different mutations to the JAK1 gene in colon cancer cells[1], and demonstrated the accuracy of our computational tools by linking genotype with transcriptional phenotype in 9,908 cells for scDNA-seq and 18,978 cells for scRNA-seq, encompassing 97 unique genotypes with low error-rates for known genotype-phenotype relationships.
In a second application[2], we studied the transcriptional profiles of drug-resistant colon cancer cells at scale, following exposure to the drugs dabrafenib and cetuximab. Our approach shed light on transcriptional differences between different types of drug resistance, including drug addiction.

References:

1. Cooper*, Coelho*, Strauss*, et al. Genome Biol 25, 20 (2024).

2. Coelho, Strauss, Watterson, et al. Nat. Genet. (2024).

15:20-15:30
SpliceTransformer predicts tissue-specific splicing linked to human diseases
Confirmed Presenter: Ning Shen, Zhejiang University, China

Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Ningyuan You, Zhejiang University, China
  • Chang Liu, Liangzhu Laboratory, China
  • Ning Shen, Zhejiang University, China

Presentation Overview:

We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations, and occur at different frequencies across tissue types. Importantly, tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. Additionally, we identify single nucleotide variations that cause brain-specific splicing alterations, and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. Finally, SpTransformer analysis of whole exon sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.

15:30-15:40
Cell type-specific epigenetic regulatory circuitry of coronary artery disease loci
Confirmed Presenter: Dennis Hecker, Goethe University Frankfurt, Germany

Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Dennis Hecker, Goethe University Frankfurt, Germany
  • Xiaoning Song, Technical University of Munich, Germany
  • Nina Baumgarten, Goethe University Frankfurt, Germany
  • Anastasiia Diagel, Technical University of Munich, Germany
  • Nikoletta Katsaouni, Goethe University Frankfurt, Germany
  • Ling Li, Technical University of Munich, Germany
  • Shuangyue Li, Technical University of Munich, Germany
  • Ranjan Kumar Maji, Goethe University Frankfurt, Germany
  • Fatemeh Behjati Ardakani, Goethe University Frankfurt, Germany
  • Lijiang Ma, Icahn School of Medicine at Mount Sinai, United States
  • Daniel Tews, Ulm University Medical Center, Germany
  • Martin Wabitsch, Ulm University Medical Center, Germany
  • Johan L.M. Björkegren, Icahn School of Medicine at Mount Sinai, United States
  • Heribert Schunkert, Technical University of Munich, Germany
  • Zhifen Chen, Technical University of Munich, Germany
  • Marcel H. Schulz, Goethe University Frankfurt, Germany

Presentation Overview:

Coronary artery disease (CAD) is the leading cause of death worldwide. Recently, hundreds of genomic loci have been shown to increase CAD risk; however, the molecular mechanisms underlying signals from CAD risk loci remain largely unclear. We sought to pinpoint the candidate causal coding and non-coding genes of CAD risk loci in a cell type-specific fashion. We integrated the latest statistics of CAD genetics from over one million individuals with epigenetic data from 45 relevant cell types to identify genes whose regulation is affected by CAD-associated single nucleotide variants (SNVs) via epigenetic mechanisms. We pursued two approaches. First, we aggregated variations in gene bodies and combined their significance levels while accounting for their linkage disequilibrium structure. Second, we focused on variations that affect transcription factor binding in enhancers. We identified 1,580 genes likely involved in CAD, about half of which have not been associated with the disease so far. Of all the candidate genes, 23.5% represented non-coding RNAs, which are underrepresented in transcriptome-based gene prioritization. Enrichment analysis and phenome-wide association studies linked the novel candidate genes to disease-specific pathways and CAD risk factors, corroborating their disease relevance. We showed that CAD-SNVs affect the binding of transcription factors with cellular specificity. Finally, we conducted a proof-of-concept biological validation for the novel CAD non-coding RNA gene IQCH-AS1. Our study not only pinpoints CAD candidate genes in a cell type-specific fashion but also spotlights the roles of the understudied non-coding RNA genes in CAD genetics.

15:40-15:50
MultiPopPred: A Trans-Ethnic Disease Risk Prediction Method, and its Evaluation on Low Resource Populations
Confirmed Presenter: Ritwiz Kamal, Indian Institute of Technology Madras, India

Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Ritwiz Kamal, Indian Institute of Technology Madras, India
  • Manikandan Narayanan, Indian Institute of Technology Madras, India

Presentation Overview:

Genome-wide association studies (GWAS) aimed at estimating the disease risk of genetic factors have long focused on homogeneous Caucasian populations, at the expense of other understudied non-Caucasian populations. Therefore, active efforts are underway to understand the differences and commonalities in exhibited disease risk across different populations or ethnicities. There is, consequently, a pressing need for computational methods and associated probabilistic models that efficiently exploit these population-specific vs. shared aspects of the genotype-phenotype relation. We propose MultiPopPred, a novel trans-ethnic polygenic risk score (PRS) estimation method that taps into the shared genetic risk across populations and transfers information learned from multiple well-studied auxiliary populations to a less-studied target population. MultiPopPred employs a specially designed penalized shrinkage model of regression and a Nesterov-smoothed objective function optimized via an L-BFGS routine. We present five variants of MultiPopPred based on the availability of individual-level vs. summary-level data and the weightage of each auxiliary population. Extensive comparative analyses performed on simulated genotype-phenotype data reveal that MultiPopPred improves PRS prediction in the South Asian population by 67% in settings with low target sample sizes, by 18% overall across all simulation settings and by 73% overall across all semi-simulated settings, when compared to state-of-the-art trans-ethnic PRS estimation methods. This performance trend is promising and encourages application and further assessment of MultiPopPred under real-world settings.

15:50-16:00
Rethink gender as confounder in non linear PRS for human height prediction
Room: 04AB
Format: In person

Moderator(s): Julien Gagneur


Authors List:

  • Huijiao Yang, Denmark Technical University, Denmark

Presentation Overview:

Polygenic risk score (PRS) models often treat gender as a fixed covariate with presumed causal effects on height, which oversimplifies its complex role in genetic prediction. This work challenges conventional approaches by reconceptualizing gender as a confounder whose genetic signals are entangled with height-associated variants. We introduce a novel representation learning framework that: (1) learns gender-specific patterns directly from GWAS data rather than treating gender as a simple covariate, and (2) systematically disentangles these signals from height-related genetic architecture through contrastive learning and joint entropy minimization.

Our findings show that traditional methods, especially LASSO regression, tend to overestimate the contribution of Y-chromosome SNPs due to their confounding relationship with gender. By using disentangled embeddings, we reveal that many presumed "causal" gender effects may actually reflect correlated genetic patterns. Our framework offers three main contributions: First, it replaces covariate adjustment with an adaptive gender encoder that learns sex-specific patterns directly from GWAS data. Second, it introduces a non-linear PRS model that improves predictive accuracy and interpretation of genetic effects. Third, it provides new biological insights, suggesting that gender’s role in height prediction is less directly causal than current models suggest.

This work advances PRS construction and provides a new perspective on sex-related genetic architecture, showing that treating gender as a confounder enhances genetic prediction and interpretation, with implications for other sexually dimorphic traits.

16:40-17:20
Invited Presentation: Somatic mutations in normal tissues
Room: 04AB
Format: In person

Moderator(s): Antonio Rausell


Authors List:

  • Andrew Lawson

Presentation Overview:

Multi-task Convolutional Neural Networks (CNNs) have emerged as powerful tools for deciphering how genomic sequence determines gene regulatory responses such as chromatin accessibility or transcript abundance. These models can learn the sequence patterns recognized by regulatory factors from the variation across hundreds of thousands of loci in the genome. Their understanding of gene regulatory syntax enables them to be used to predict individual genomic variant effects across the cell types they were trained on, and to point to the affected biological mechanisms. However, our recent study and that of another group (Sasse et al. 2023) revealed in parallel that, despite strong performance on various variant effect prediction benchmarks (Avsec et al. 2021), these models fail to correctly determine how variants affect the direction of gene expression across individuals, an essential capability for associating variants with phenotypes or diseases. To address these limitations and improve model learning from available data, I present two strategies. First, training with sequence variation: we developed a modeling approach that directly contrasts sequence differences to predict allele-specific and personalized functional measurements from RNA-seq, ATAC-seq, and ChIP-seq (Tu, Sasse, and Chowdharry et al. 2025; Spiro and Tu et al. 2025). We applied this approach to data from F1 hybrid mice and from humans with personal whole genome information, with varying degrees of success: while training on allele-resolved data improved predictions of differential signals, training on hundreds of personal genomes did not generalize variant effects to unseen genes. Second, training at higher resolution: we created models that analyze ATAC-seq at base-pair resolution, capturing both overall chromatin accessibility and the distribution of Tn5 transposase insertions (Chandra et al. 2025).
Our results demonstrate that additionally modeling the ATAC-seq profile consistently improves predictions of differential chromatin accessibility. Systematic analysis of the models’ sequence attributions confirms that base-pair resolution training enables the model to learn a more sensitive representation of the regulatory syntax that drives differences between immunocytes.

17:20-17:40
Revisiting Cancer Predisposition: Identifying Altered Genes with Protective and Recessive Effects
Confirmed Presenter: Michal Linial, The Hebrew University of Jerusalem, Israel

Room: 04AB
Format: In person

Moderator(s): Antonio Rausell


Authors List:

  • Michal Linial, The Hebrew University of Jerusalem, Israel
  • Reoi Zucker, The Hebrew University of Jerusalem, Israel
  • Shirel Schreiber, The Hebrew University of Jerusalem, Israel

Presentation Overview:

essential for both preventive and personalized medicine. Genetic studies of cancer predisposition typically identify significant genomic regions through family-based cohorts or genome-wide association studies (GWAS). However, these approaches often lack biological insight and functional interpretation. In this study, we performed a comprehensive analysis of cancer predisposition in the UK Biobank (UKB) cohort using a novel gene-based method to identify interpretable protein-coding genes that are associated with ten major cancer types. Specifically, we applied proteome-wide association studies (PWAS) to detect genetic associations driven by alterations in protein function. Through PWAS, we identified 110 significant gene-cancer associations. Notably, in 44% of these associations, the damaged gene was linked to reduced rather than elevated cancer risk, suggesting a protective effect. Together with classical GWAS, we identified 145 unique genomic loci associated with cancer risk. While many of these regions are supported by external evidence, we have listed 51 novel loci. Additionally, leveraging the ability of PWAS to detect non-additive genetic effects, we found that 46% of PWAS-significant cancer regions exhibited exclusive recessive inheritance, underscoring the importance of overlooked recessive genetic effects. These findings emphasize a refreshed view of predisposition that highlights recessive effects, protective genes, and the interrelation of genes in different cancer types. We provide PWAS Hub as an interactive tool to navigate among genes, cancer phenotypes and heritability modes. We conclude that expanding the list of cancer predisposition genes will benefit early diagnosis, genetic counseling, and personalized risk assessment.

17:40-17:50
COBT: A gene-based rare variant burden test for case-only study designs using aggregated genotypes from public reference cohorts
Room: 04AB
Format: In person

Moderator(s): Antonio Rausell


Authors List:

  • Antoine Favier, Imagine Institute, France
  • Antonio Rausell, Imagine Institute, France
  • Stefania Chounta, Imagine Institute, France
  • Alejandro García Sánchez, Centro Nacional de Análisis Genómico, Spain
  • Fabienne Jabot-Hanin, Imagine Institute, France
  • Xiaoyi Chen, Imagine Institute, France
  • Nicolas Garcelon, Imagine Institute, France
  • Anita Burgun, Imagine Institute, France
  • Manuel Higueras Hernáez, Universidad de la Rioja, Spain
  • Agathe Guilloux, Université Paris Cité, France
  • Alexandre Benmerah, Imagine Institute, France
  • Yoann Martin, Imagine Institute, France
  • Katy Billot, Imagine Institute, France
  • Jean-Michel Rozet, Imagine Institute, France
  • Isabelle Perrault, Imagine Institute, France
  • Valérie Cormier-Daire, Imagine Institute, France
  • Céline Huber, Imagine Institute, France
  • Mohamad Zaidan, Imagine Institute, France
  • Tania Attie-Bitach, Imagine Institute, France
  • Sophie Saunier, Imagine Institute, France

Presentation Overview:

More than 4000 rare genetic diseases have been reported, affecting 1 in 16 people. Yet, around 50% of patients remain undiagnosed after genetic testing. Identifying genotype-phenotype associations remains challenging due to limited cohort sizes and high clinical and genetic heterogeneity. Burden tests of rare variants increase statistical power in case-control designs but are limited in rare disease studies due to the lack of matched controls. Case-only aggregation tests have recently emerged; however, most rely on assessing the number of individuals carrying variants under dominant or recessive models rather than the aggregated number of variants across the cohort, overlooking hypomorphic and modifier variants or heterogeneous inheritance modes requiring additive models. We present the Case-Only Burden Test (COBT), a gene-based rare variant burden test for case-only designs. COBT uses a Poisson parametric test to evaluate the excess of variants in a gene observed in a patient cohort, compared to expectations inferred from general population mutation rates. We validated the statistical assumptions and goodness-of-fit of the method on non-Finnish European individuals from the 1000 Genomes Project. COBT showed low false positive rates, contrasting with the high p-value inflation of alternative case-only rare variant tests. Applied to 478 ciliopathy patients, COBT successfully re-identified known disease genes previously annotated via expert review and uncovered novel candidate genes in undiagnosed patients, consistent with clinical phenotypes. Our results show that COBT can uncover genotype-phenotype associations in case-only retrospective studies of rare-disease cohorts driven by primary as well as by secondary hits with major or modifier clinical roles.
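The core idea of a Poisson test for an excess of variants can be sketched as below. This is a minimal illustration, not the COBT implementation: the function name, the way the expected count is derived (two alleles per patient times a hypothetical cumulative rare-allele frequency for the gene), and the example numbers are all assumptions made for this sketch.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided p-value P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 - P(X <= observed - 1); adequate for a sketch, although
    very small tails lose precision through cancellation.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical gene: cumulative rare-allele frequency of 0.001 in the
# reference population, examined in a cohort of 478 patients.
n_patients = 478
cumulative_allele_frequency = 0.001  # assumed value, for illustration only
expected_alleles = 2 * n_patients * cumulative_allele_frequency

observed_alleles = 6  # assumed count of qualifying variant alleles in the cohort
p_value = poisson_upper_tail(observed_alleles, expected_alleles)
print(f"expected={expected_alleles:.3f} observed={observed_alleles} p={p_value:.2e}")
```

Because the test counts variant alleles across the cohort rather than carrier individuals, it stays additive in the sense described in the abstract; in a genome-wide scan the per-gene p-values would still need multiple-testing correction.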

17:50-18:00
Concluding remarks and prizes
Room: 04AB
Format: In person

Moderator(s): Antonio Rausell


Authors List:

  • VarICOSI Co-chairs