Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

VarI COSI

Schedule subject to change
All times listed are in UTC
Thursday, July 29th
11:00-11:10
Opening Remarks
Format: Live-stream

Moderator(s): Emidio Capriotti

  • Emidio Capriotti, Hannah Carter, Antonio Rausell
11:10-12:00
VarI Keynote: Mutations in viruses and humans.
Format: Live-stream

Moderator(s): Emidio Capriotti

  • Alessandra Carbone, Sorbonne Universite, France

Presentation Overview:

Evolutionary information and coevolution in protein sequences relate to protein interactions, mechanical and allosteric properties, structure, and phenotypic mutational landscapes. Based on these two notions, mutational information can profitably be extracted from few or many conserved sequences, involving single or multiple proteins and possibly entire viral genomes. Simple and elegant algorithms can be designed to reconstruct viral protein-protein interaction networks, to identify residues involved in viral immune escape and to predict phenotypic mutational landscapes. They avoid machine (deep) learning strategies and provide a direct interpretation of the data. The same computational approaches can be applied to human proteins to accurately identify genetic variations involved in human diseases and phenotypic effects.

12:00-12:20
Proceedings Presentation: A variant selection framework for genome graphs
Format: Pre-recorded with live Q&A

Moderator(s): Emidio Capriotti

  • Chirag Jain, Indian Institute of Science, India
  • Neda Tavakoli, Georgia Institute of Technology, United States
  • Srinivas Aluru, Georgia Institute of Technology, United States

Presentation Overview:

Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist. However, the number of recombinant paths in a graph increases combinatorially with the number of variants, which is particularly troublesome when mapping long reads that span greater distances. In practice, a genome graph reference should reflect just enough allelic diversity in a population that sequencing reads remain mappable with a bounded number of differences.

We propose a rigorous algorithmic framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants (e.g., SNPs, indels or structural variants), and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity and provide efficient algorithms for these problems. Our experiments demonstrate that significant graph reduction is achieved in human genome variation graphs using multiple α and δ parameter values corresponding to long and short-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of structural variants can be safely excluded from the human chromosome 1 variation graph.

12:40-12:50
Identifying Factors Important for Conservation at Sites of Synonymous Variation
Format: Pre-recorded with live Q&A

Moderator(s): Emidio Capriotti

  • Abhirami Ram, TCS Research, India
  • Rajgopal Srinivasan, TCS Research, India
  • Uma Sunderam, TCS Research, India

Presentation Overview:

Synonymous mutations can have a deleterious effect leading to disease although they are not protein altering. Variations at genomic sites leading to synonymous variants are often highly conserved across species. Several predictors have been developed to assess the impact of synonymous mutations and are highly dependent on having accurate and validated sets of both deleterious and benign synonymous mutations. However, validated data for deleterious synonymous mutations is sparse, unlike for missense mutations. Rather than develop a model for predicting the pathogenicity of synonymous variants, we seek to understand the relative importance of various factors that lead to conservation at sites of synonymous variants. Our study built machine learning models using various features on a large set of reported and generated synonymous variants that were used in another study to predict the conservation score at the variant site. We used a gradient boosting classifier to classify sites as high or low conservation at different cutoffs. Our experiments report an AUC between 0.74 and 0.77 and the sensitivity was significant. From the features we explored, some alternate allele independent properties were repeatedly flagged as having high impact. These findings provide information for predictors to further improve models for synonymous variant impact.

12:50-13:10
The structural landscape of constrained sites in the human proteome
Format: Pre-recorded with live Q&A

Moderator(s): Antonio Rausell

  • Bian Li, Vanderbilt University, United States
  • Tony Capra, University of California, San Francisco, United States

Presentation Overview:

Quantification of patterns of protein-coding genetic variation within and between species is a cornerstone of evolutionary and functional analyses. However, current approaches for quantifying constraint on proteins either focus on individual sites or the whole protein, without accounting for the functional context of the sequence: 3D structure. Recent growth in databases of genetic variation and protein 3D structure enables the synthesis of protein spatial context into the estimation of site-level constraint. Here, we describe a new framework, called COSMIS, for quantification of the constraint on genetic variation in 3D neighborhoods of each protein site based on a mutation-spectrum-aware statistical model of the expected number of variants. We define a comprehensive map of protein spatial constraint by applying COSMIS to the 3D distribution of >1.88 million human missense variants from gnomAD, covering 47% of all canonical human transcripts. We demonstrate that the COSMIS score is accurate in predicting gene essentiality and variant pathogenicity while also providing biophysical insight into the potential functional roles of constrained sites. We anticipate that the structural landscape of constrained sites identified by COSMIS will facilitate interpretation of patterns of protein-coding constraint in human evolution and prioritization of sites for mechanistic or functional investigation.

13:10-14:00
VarI Keynote: Pan-genomic advances for fighting reference bias
Format: Pre-recorded with live Q&A

Moderator(s): Antonio Rausell

  • Ben Langmead

Presentation Overview:

Sequencing data analysis often begins with aligning sequencing reads to a reference genome, where the reference takes the form of a linear string of bases. But linearity leads to reference bias, a tendency to miss or misreport alignments containing non-reference alleles, which can confound downstream statistical and biological results. This is a major concern in human genomics; we don't want to live in a world where diagnostics and therapeutics are differentially effective depending on how closely our genome matches the reference.

Fortunately, computer science and bioinformatics are meeting this moment. In particular, recent advances allow us to index and align sequencing reads to references that include many population variants. Here I will describe this journey from the early days of efficient genome indexing -- especially the FM index approach behind Bowtie and BWA -- continuing through more modern methods for graph-shaped references and references that include many genomes. I will emphasize recent results that show how to optimize simple and complex pan-genome representations for effective avoidance of reference bias. Finally, I will outline major problems that remain as our field strives to make the transition to more inclusive pan-genomic representations.

14:20-14:40
PolarMorphism enables discovery of genetic variants with shared effect across multiple traits from GWAS summary statistics
Format: Pre-recorded with live Q&A

Moderator(s): Antonio Rausell

  • Joanna von Berg, University Medical Center Utrecht, Netherlands
  • Sander van der Laan, University Medical Center Utrecht, Netherlands
  • Jeroen de Ridder, University Medical Center Utrecht, Netherlands

Presentation Overview:

Genome-wide association studies (GWAS) have uncovered numerous trait-specific single nucleotide polymorphism (SNP) associations. Many SNPs show an effect on more than one trait. One approach to find these is to take the intersection of the significantly associated SNPs for each trait. The power to discover a shared SNP with this approach is low, because both GWAS must have enough power to discover the SNP. We propose a new method, PolarMorphism, that simultaneously analyses summary statistics from multiple traits. We transform the trait-specific z-scores x and y to polar coordinates r and theta, which express each SNP as its distance from the origin and its angle with the x-axis. r is a measure of overall effect and theta is a measure of trait-specificity. We obtain p-values for r from a chi distribution with 2 degrees of freedom. We obtain p-values for theta from a von Mises distribution. This distribution has two parameters: the angular mean and the concentration parameter kappa. We show that there is a relationship between r and kappa under the null hypothesis of only trait-specific effect, and obtain a kappa estimate per SNP. Shared SNPs are defined as having FDR-adjusted p-values < 0.05 for both r and theta.
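
The polar transform and the p-value for r described above can be illustrated with a minimal Python sketch. It assumes the two z-scores are independent standard normal variables under the null, in which case the upper tail of a chi distribution with 2 degrees of freedom has the closed form exp(-r^2 / 2). The function names `to_polar` and `chi2df_pvalue` are hypothetical and are not taken from the PolarMorphism software; the von Mises test for theta and the per-SNP kappa estimate are not shown.

```python
import math


def to_polar(z_x: float, z_y: float) -> tuple[float, float]:
    """Convert two trait-specific z-scores to polar coordinates (r, theta).

    r is the distance from the origin; theta is the angle with the x-axis
    in radians, in the range (-pi, pi].
    """
    r = math.hypot(z_x, z_y)
    theta = math.atan2(z_y, z_x)
    return r, theta


def chi2df_pvalue(r: float) -> float:
    """Upper-tail probability P(R >= r) for a chi distribution with 2 df.

    Assumption: the z-scores are independent standard normals under the
    null, so R follows a Rayleigh distribution with unit scale.
    """
    return math.exp(-0.5 * r * r)


# Illustrative z-scores for one SNP (not data from the study).
r, theta = to_polar(3.0, 4.0)
p_r = chi2df_pvalue(r)
print(f"r = {r:.3f}, theta = {theta:.3f} rad, p(r) = {p_r:.3e}")
```

A SNP with large z-scores for both traits yields a large r and a theta away from either axis, which is the pattern the abstract describes as a shared effect.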

14:40-14:50
Sex differences in genetic architecture in UK Biobank
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Elena Bernabeu, The University of Edinburgh, United Kingdom
  • Albert Tenesa, The University of Edinburgh, United Kingdom
  • James Prendergast, The University of Edinburgh, United Kingdom
  • Konrad Rawlik, The University of Edinburgh, United Kingdom
  • Oriol Canela-Xandri, The University of Edinburgh, United Kingdom
  • Andrea Talenti, The University of Edinburgh, United Kingdom

Presentation Overview:

Despite males and females sharing nearly identical genomes, there are differences between the sexes in complex traits and in the risk of a wide array of diseases. Gene by sex interactions (GxS) are thought to underlie some of these differences. However, the extent and basis of these interactions are poorly understood.

Here we provide insights into the scope and mechanism of GxS across the genome of circa 450,000 individuals of European ancestry and 530 complex traits in the UK Biobank. We found small yet widespread differences in genetic architecture across traits through the calculation of sex-specific heritability, genetic correlations, and sex-stratified genome-wide association studies (GWAS). We also found that, in some cases, sex-agnostic GWAS efforts might be missing loci of interest, and looked into possible improvements in the prediction of high-level phenotypes. Finally, we studied the potential functional role of GxS through sex-biased eQTL and gene-level analyses.

This study marks a broad examination of the genetics of sexual differences. Our findings parallel previous reports, suggesting the presence of sexual genetic heterogeneity across complex traits of generally modest magnitude. Our results suggest the need to consider sex-stratified analyses for future studies in order to shed light on possible sex-specific molecular mechanisms.

14:50-15:00
Identification of Ethnicity-Specific Associations in Multi-Ethnic Genome-Wide Association Studies
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Ming Hin Ng, The Chinese University of Hong Kong, Hong Kong
  • Yingying Wei, The Chinese University of Hong Kong, Hong Kong

Presentation Overview:

The emergence of large-scale biobank studies, exemplified by the UK Biobank, provides us with unprecedented opportunities to decipher the genetic architecture of complex diseases in minority populations. One way to call ethnicity-specific associations is to partition a biobank into different ethnic groups according to self-reported ethnicities and perform association detection for each ethnic group separately. However, biobank studies usually involve admixed individuals, whose genomes originate from multiple ancestries. Unfortunately, despite the urgent need for them, statistical methods for detecting ethnicity-specific associations in multi-ethnic GWAS are lacking. Moreover, computation time and memory consumption are major challenges for analyzing biobank-scale GWAS. Here, we propose a computationally efficient and statistically rigorous testing method, MESA, to infer ethnicity-specific associations for multi-ethnic GWAS. MESA reduces the time complexity from O(K^3(GR+G+N)^3) to only O(KGNR^2) for a GWAS that measures R phenotypes and G genetic variants for N individuals originating from K ancestries, and reduces memory consumption from terabytes to megabytes. Thus, together with further acceleration using stochastic updates, MESA is able to analyze biobank-scale GWAS consisting of hundreds of thousands of individuals. Simulation studies demonstrate that MESA accurately detects ethnicity-specific associations. Application of MESA to the PAGE study discovers novel single nucleotide polymorphisms associated with BMI.

15:00-15:20
Germline variants that influence the tumor immune microenvironment also drive response to immunotherapy
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Hannah Carter, UCSD, United States
  • Meghana Pagadala, UCSD, United States
  • Victoria Wu, Moores Cancer Center, United States
  • Eva Pérez-Guijarro, NCI, United States
  • Hyo Kim, UCSD, United States
  • Andrea Castro, UCSD, United States
  • James Talwar, UCSD, United States
  • Cristian Gonzalez-Colin, La Jolla Institute of Immunology, United States
  • Steven Cao, UCSD, United States
  • Benjamin Schmiedel, La Jolla Institute of Immunology, United States
  • Rany Salem, UCSD, United States
  • Gerald Morris, UCSD, United States
  • Olivier Harismendy, Moores Cancer Center, United States
  • Sandip Patel, Moores Cancer Center, United States
  • Jill Mesirov, UCSD, United States
  • Maurizio Zanetti, Moores Cancer Center, United States
  • Chi-Ping Day, NCI, United States
  • Chun Chieh Fan, UCSD, United States
  • Wesley Thompson, UCSD, United States
  • Glenn Merlino, NCI, United States
  • J Silvio Gutkind, Moores Cancer Center, United States
  • Pandurangan Vijayanand, La Jolla Institute of Immunology, United States

Presentation Overview:

With the continued promise of immunotherapy as an avenue for treating cancer, understanding how host genetics contributes to the tumor immune microenvironment (TIME) is essential to tailoring cancer risk screening and treatment strategies. Using genotypes from over 8,000 European individuals in The Cancer Genome Atlas (TCGA) and 137 heritable tumor immune phenotype components (IP components), we identified and investigated 482 TIME-associated variants. Many TIME-associated variants influence gene activities in specific immune cell subsets, such as macrophages and dendritic cells, and interact to promote more extreme TIME phenotypes. TIME-associated variants were predictive of immunotherapy response in human cohorts treated with immune-checkpoint blockade (ICB) in 3 cancer types, causally implicating specific immune-related genes that modulate myeloid cells of the TIME. Moreover, we validated the function of these genes in driving tumor response to ICB in preclinical studies. Through an integrative approach, we link host genetics to TIME characteristics, informing novel biomarkers for cancer risk and target identification in immunotherapy.

Friday, July 30th
11:00-11:50
VarI Keynote: Sequence to energy and structure.
Format: Live-stream

Moderator(s): Antonio Rausell

  • Ben Lehner

Presentation Overview:

After seven decades of molecular biology we have a reasonable conceptual understanding of how macromolecules work but we still struggle to predict how they are affected by mutations and how these mutations combine to alter even simple biophysical traits. To help address this shortcoming we have developed fast and simple selection assays that allow us to quantify in parallel the effects of many mutations on the folding, binding and aggregation of proteins. Using these assays and massively parallel double mutant cycles allows us to deconvolve the effects of mutations into their underlying causal and additive free energy changes, which allows the combined effects of mutations to be accurately predicted. Applied to protein interaction domains, the approach allows us to comprehensively identify distal allosteric sites regulating binding affinity. Moreover, the residual non-additive interactions between mutations can provide sufficient information on energetic coupling to identify structural contacts in proteins, allowing structures to be solved using only mutagenesis, selection and sequencing.

11:50-12:10
Protein structural consequences of DNA mutational signatures: A meta-analysis of somatic variants and deep mutational scanning data
Format: Pre-recorded with live Q&A

Moderator(s): Antonio Rausell

  • Joseph Ng, King's College London, United Kingdom
  • Franca Fraternali, King's College London, United Kingdom

Presentation Overview:

Signatures of DNA motifs associated with distinct mutagenic exposures have been defined for somatic variants, but little is known about the consequences different mutational processes pose to the cancer cell, particularly the distribution of the resulting variants in the implied proteins and their structural regions (surface, core, interacting interface). Here we first compare the protein-level consequences of six mutational signatures (Aging, APOBEC, POLE, UV, 5-FU and Platinum) characterised by clear DNA motif preferences. By mapping individual substitution events observed in tumours to three-dimensional protein structures, we show that these common somatic mutational signatures are biased against the protein core, consistent with the lower tolerability of substitutions at such functionally important regions. On the other hand, deep mutational scanning (DMS) data allow us to probe the ‘dark matter’ of somatic mutational landscape, exploring variants which are otherwise removed in purifying selection. A computational DMS analysis identifies mutational contexts (5’-G/C[T>G]A/G-3’) which are associated with damaging mutations, by altering physicochemical characteristics of amino acids at the protein core. We argue that comprehensive DMS analysis can contribute to classification of variants according to their true impact to the stability/activity of the affected protein, decoupling this from pathogenicity prediction offered by conventional variant impact classifiers.

12:10-12:20
Physico-chemical and structural features of pathogenic and benign human protein missense variations collected from HUMSAVAR and ClinVar
Format: Pre-recorded with live Q&A

Moderator(s): Antonio Rausell

  • Giulia Babbi, University of Bologna - Biocomputing Group, Italy
  • Castrense Savojardo, University of Bologna - Biocomputing Group, Italy
  • Matteo Manfredi, University of Bologna - Biocomputing Group, Italy
  • Pier Luigi Martelli, University of Bologna - Biocomputing Group, Italy
  • Rita Casadio, University of Bologna - Biocomputing Group, Italy

Presentation Overview:

Modern sequencing technologies provide an unprecedented amount of data about missense single-nucleotide variations leading to changes in protein sequences. For many single residue variations (SRVs), links to genetic diseases are reported. From HUMSAVAR and ClinVar, we collected human SRVs whose effect on human health is annotated as Pathogenic/Likely Pathogenic (P/LP) or Benign/Likely Benign (B/LB).
After merging, the Union dataset contains 3,627 proteins carrying 75,927 SRVs. Of them, 44,543 and 31,384 are labelled as P/LP and B/LB, respectively. The intersection between SRVs from HUMSAVAR and ClinVar is limited: the two datasets share about 5% and 30% of B/LB and P/LP SRVs, respectively. The question arises as to what extent the SRVs from different datasets share physico-chemical and structural features. With computational methods, we characterised solvent accessibility, flexibility and disorder of positions carrying P/LP and B/LB SRVs, and we compared the results obtained on ClinVar, HUMSAVAR and Union datasets. P/LP SRVs are significantly more abundant in buried/rigid positions, while B/LB SRVs occur preferentially in solvent-exposed/flexible regions. P/LP SRVs have a slight tendency to be more abundant than B/LB SRVs in non-disordered regions. Overall, the findings suggest that SRVs deriving from HUMSAVAR and ClinVar, despite their limited overlap, share common physico-chemical and structural features.

12:40-12:55
VarI Sponsor: A unique solution to a (non) unique problem: calling variants in non-uniquely mappable regions using short-read WGS data.
Format: Pre-recorded with live Q&A

Moderator(s): Emidio Capriotti

  • Alexander Kaplun, Variantyx, USA

Presentation Overview:

Non-uniquely mappable reads represent one of the biggest challenges in short-read NGS. With a read length of 100-150 bp, a significant fraction of the genome is non-unique, including many clinically relevant genes such as SMN1, HBA, NEB, PMS2 and others. These ambiguous reads are usually ignored by the variant callers of typical NGS pipelines, leaving many important regions covered yet not analyzed in NGS-based clinical genetic tests. To overcome this problem we have developed a bioinformatic method for detecting variants using non-uniquely mappable reads.
We first identified clinically relevant regions in which non-uniquely mappable reads prevail. For each clinical WGS case the pipeline selects high-quality non-uniquely mappable reads in these regions and combines them to generate a small BAM file. Variant calling is performed over this BAM file, with ploidy adjusted to account for alternative mapping of the reads. The detected variants are marked as located in a non-uniquely mappable region, clearly indicating to the clinical interpretation team that the actual location of the variant might be different. Taking into account the patient’s phenotype allows us to focus on relevant variants, and if such a variant is selected to be included in the genetic test report its exact position can be orthogonally confirmed.

12:55-14:00
VarI Roundtable
Format: Live-stream

Moderator(s): Emidio Capriotti

  • Yana Bromberg, Douglas Fowler, Daniel Gilchrist, Predrag Radivojac
14:20-14:30
The role of exome sequencing in newborn screening
Format: Pre-recorded with live Q&A

Moderator(s): Emidio Capriotti

  • Aashish N. Adhikari, University of California, Berkeley, United States
  • Renata C. Gallagher, University of California, San Francisco, United States
  • Yaqiong Wang, University of California, Berkeley, United States
  • Robert J. Currier, University of California, San Francisco, United States
  • George Amatuni, University of California, San Francisco, United States
  • Laia Bassaganyas, University of California, San Francisco, United States
  • Flavia Chen, University of California, San Francisco, United States
  • Kunal Kundu, University of California, Berkeley, United States
  • Mark Kvale, University of California, San Francisco, United States
  • Sean D. Mooney, University of Washington, United States
  • Robert L. Nussbaum, University of California, San Francisco, United States
  • Savanna S. Randi, University of California, Santa Cruz, United States
  • Jeremy Sanford, University of California, Santa Cruz, United States
  • Joseph T. Shieh, University of California, San Francisco, United States
  • Rajgopal Srinivasan, Tata Consultancy Services, India
  • Uma Sunderam, Tata Consultancy Services, India
  • Hao Tang, California Department of Public Health, United States
  • Dedeepya Vaka, University of California, San Francisco, United States
  • Yangyun Zou, University of California, Berkeley, United States
  • Barbara A. Koenig, University of California, San Francisco, United States
  • Pui-Yan Kwok, University of California, San Francisco, United States
  • Neil Risch, University of California, San Francisco, United States
  • Jennifer Puck, University of California, San Francisco, United States
  • Steven E. Brenner, University of California, Berkeley, United States

Presentation Overview:

Public health newborn screening (NBS) programs provide population-scale ascertainment of rare, treatable conditions that require urgent intervention. Tandem mass spectrometry (MS/MS) is currently used to screen newborns for a panel of rare inborn errors of metabolism (IEMs). The NBSeq project evaluated whole exome sequencing (WES) as an innovative methodology for NBS.

We obtained archived residual dried blood spots (DBS) and data for nearly all IEM cases from the 4.5 million infants born in California between mid-2005 and 2013, and from some infants who screened positive by MS/MS, but were unaffected upon follow-up testing. We analyzed variants within an exome slice of 78 genes associated with the 48 IEMs ascertained by NBS in California. Our automated pipeline considered curated variants, variant predictions, and copy number.

WES had an overall sensitivity of 88% and specificity of 98.4%, compared to 99.0% and 99.8%, respectively, for MS/MS, although effectiveness varied among individual IEMs. Of the 103 affected cases missed by exomes, 50 had no rare missense variants in relevant genes, and 53 had a single autosomal heterozygous variant in a pertinent gene.
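
For readers less familiar with the two metrics reported above, the following Python sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the NBSeq study's data, and the function names are assumptions made for this example.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected cases that the test flags: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected cases that the test clears: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not taken from the study).
print(f"sensitivity = {sensitivity(true_pos=90, false_neg=10):.3f}")
print(f"specificity = {specificity(true_neg=950, false_pos=50):.3f}")
```

A screen that misses affected cases lowers sensitivity, while one that flags unaffected infants lowers specificity, which is why the abstract reports both figures side by side for WES and MS/MS.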

In 12 cases, the initial clinical diagnosis was inconsistent with the gene reported by WES, but the final disorder assignment from our subsequent clinical review was concordant with WES. For example, an individual clinically reported to have IVA had a rare, homozygous, missense variant in ACADSB suggesting 2MBCD deficiency. Another individual, MS/MS positive for VLCADD and MCADD, had been diagnosed as unaffected upon follow-up. WES revealed missense variants in ETFDH that were previously observed compound heterozygous in several late-onset MADD patients.

WES alone was insufficiently sensitive or specific to be a primary screen for most NBS IEMs. However, as a secondary test for infants with abnormal MS/MS screens, WES could reduce false positive results, facilitate timely case resolution, and in some instances even suggest a more appropriate or specific diagnosis than that initially obtained. An alternative pipeline that reported only curated variants had high specificity, 99.4%, but unacceptably low sensitivity, 55%. Thus, though sensitivity of WES alone may be too low to meet standard criteria for NBS, sequencing could potentially identify many treatable conditions that presently go unrecognized until too late for optimal intervention due to lack of an alternative current NBS test.

This study represents the largest-to-date sequencing effort of an entire population of IEM-affected cases, allowing unbiased assessment of current capabilities of WES as a tool for population screening.

14:30-14:40
Validation of genetic variants from NGS data using Deep Convolutional Neural Networks
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Marc Vaisband, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research; University of Bonn, Germany
  • Maria Schubert, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research, Austria
  • Franz Josef Gassner, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research, Austria
  • Roland Geisberger, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research, Austria
  • Richard Greil, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research, Austria
  • Nadja Zaborsky, Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research, Austria
  • Jan Hasenauer, University of Bonn, Germany

Presentation Overview:

One of the most important frontiers in computational biology and biomedicine is the comprehensive analysis of Next-Generation Sequencing (NGS) data. In cancer research in particular, the identification of somatic mutations is vital for the investigation of their effects on disease progression and treatment response. This is done by considering the sequenced tumour DNA and a reference germline sample, and identifying candidate variants by way of comparison. Despite automated filtering, however, sequencing artifacts or alignment errors are often mistakenly flagged as variants. For this reason, researchers must perform extremely time-consuming manual screening. We demonstrate that it is possible to reliably automate this process using Deep Convolutional Neural Networks, which have been behind many recent successes in applied machine learning. Using previously performed manual annotation as input data, we trained a CNN model that recognises sequencing artifacts with high accuracy, achieving a 5-fold cross-validation score of 96%, on par with human reviewers. Moreover, we show how this can be extended to account for artifacts specific to library preparation which require comparison with additional sequencing tracks. Altogether, this allows for a significant reduction in the workload for researchers, and can in the future be integrated into bioinformatics workflows for NGS data processing.

14:40-15:00
CADD-SV -- a framework to score the effects of structural variants in health and disease
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Martin Kircher, Berlin Institute of Health @ Charité, Germany
  • Philip Kleinert, Berlin Institute of Health @ Charité, Germany

Presentation Overview:

Recent technological advances have improved the identification of structural variants (SVs) in human genomes; however, the interpretation of these variants remains challenging. Several methods have been developed that utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive and easy-to-use tool that uses the broad spectrum of available annotations for estimating the effect of SVs and prioritizing functional variants in health and disease is missing.
Here we describe CADD-SV, a method to retrieve and integrate a wide set of annotations across the range and in the vicinity of SVs for functional scoring. We use human and chimpanzee derived alleles as proxy-neutral and contrast them with matched simulated variants as proxy-pathogenic, an approach that has proven powerful in the interpretation of SNVs and short InDels (Kircher & Witten et al, 2014). Our tool uses random forest models and provides an easy-to-use website for scoring variants (https://cadd-sv.bihealth.org).
We show that CADD-SV scores correlate with known pathogenic variants in individual genomes as well as allelic diversity observed across populations. Further, CADD-SV prioritizes somatic variants observed in cancer patients as well as non-coding variants known to affect gene expression, exceeding the performance of AnnotSV and SVScore.

15:00-15:10
CNVxplorer: a web tool to assist clinical interpretation of CNVs in rare disease patients
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Francisco Requena, Imagine Institute for Genetic Diseases, France
  • Hamza Hadj Abdallah, Imagine Institute for Genetic Diseases - Hôpital Necker-Enfants Malades, France
  • Alejandro García, Imagine Institute for Genetic Diseases, France
  • Patrick Nitschké, Imagine Institute for Genetic Diseases, France
  • Sergi Romana, Imagine Institute for Genetic Diseases - Hôpital Necker-Enfants Malades, France
  • Valérie Malan, Imagine Institute for Genetic Diseases - Hôpital Necker-Enfants Malades, France
  • Antonio Rausell, Imagine Institute for Genetic Diseases - Hôpital Necker-Enfants Malades, France

Presentation Overview: Show

Copy Number Variants (CNVs) are an important cause of rare diseases. Array-based Comparative Genomic Hybridization tests yield a ∼12% diagnostic rate, with ∼8% of patients presenting CNVs of unknown significance. CNV interpretation is particularly challenging in genomic regions that do not overlap previously reported structural variants or disease-associated genes. Recent studies showed that a more comprehensive evaluation of CNV features, leveraging both coding and non-coding impacts, can significantly improve diagnostic rates. However, currently available CNV interpretation tools are mostly gene-centric or provide only non-interactive annotations that are difficult to assess in clinical practice. Here we present CNVxplorer, a web server suited for the functional assessment of CNVs in a clinical diagnostic setting. CNVxplorer mines a comprehensive set of clinical, genomic, and epigenomic features associated with CNVs. It provides sequence constraint metrics, impact on regulatory elements and topologically associating domains, as well as expression patterns. The analyses offered cover (a) agreement with patient phenotypes; (b) visualizations of associations among genes, regulatory elements and transcription factors; (c) enrichment in functional and pathway annotations; and (d) co-occurrence of terms across PubMed publications related to the query CNVs. A flexible evaluation workflow allows dynamic re-interrogation in clinical sessions. CNVxplorer is publicly available at http://cnvxplorer.com
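Enrichment analysis of the kind listed under (c) is commonly framed as a one-sided hypergeometric test. The abstract does not say which test CNVxplorer uses, so the sketch below is an assumption for illustration; the function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int, drawn: int, hits: int) -> float:
    """P(X >= hits) when `drawn` genes are sampled without replacement from a
    background of `background` genes, `annotated` of which carry the term."""
    upper = min(annotated, drawn)
    total = comb(background, drawn)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 in a pathway;
# a CNV affects 4 genes, 2 of which are in that pathway.
p_value = hypergeom_enrichment_p(background=20, annotated=5, drawn=4, hits=2)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value would suggest that the genes affected by the CNV carry the annotation more often than expected by chance; in practice such values are corrected for the number of terms tested.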

15:10-15:20
Emerging gain-of-function mutations in disease: their computational interpretation and characterization
Format: Pre-recorded with live Q&A

Moderator(s): Hannah Carter

  • Nidhi Sahni, MD Anderson Cancer Center, United States
  • Stephen Yi, University of Texas at Austin, United States

Presentation Overview: Show

Traditionally, disease-causing mutations were thought to disrupt gene function. However, it is becoming increasingly clear that many deleterious mutations can exhibit a ‘gain-of-function’ (GOF) behavior. Systematic investigation of such mutations has been lacking and largely overlooked. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities. In distinct cell types (with varying genotypes), precise signal transduction controls cell decisions, including gene regulation and phenotypic output. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, what are the common types of genomic aberrations leading to GOF? How do interaction networks undergo rewiring upon GOF mutations? Which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at the RNA and protein regulation levels? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, with the aim of curing disease? To begin to address these questions, in this talk we will cover our recent discoveries regarding GOF disease mutations and their characterization by multi-omics networks. We also discuss advances in bioinformatic and computational resources, highlight the fundamental function of GOF mutations, and consider their potential mechanistic effects in the context of signaling networks.



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
