Posters - Schedules

Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT - 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM CDT - 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT
Virtual: An Evaluation of Variant Annotation Tools – Alamut Batch, Ensembl Variant Effect Predictor (VEP), and Annovar - for Clinical Next Generation Sequencing (NGS) based Genetic Testing
COSI: VarI
  • Sachleen Tuteja, Illinois Mathematics and Science Academy, United States
  • Sabah Kadri, Ann and Robert H. Lurie Children’s Hospital of Chicago, United States
  • Kailee Yap, Ann and Robert H. Lurie Children’s Hospital of Chicago, United States


Next generation sequencing (NGS) technologies are dramatically expanding our ability to perform clinical genetic testing for inherited conditions and complex diseases such as cancer, allowing rapid interrogation of thousands of genes and identification of millions of variants. Variant annotation, the process of assigning functional information to DNA variants based on the standardized Human Genome Variation Society (HGVS) nomenclature, is a fundamental challenge in the analysis of NGS data that has led to the development of many empirically based tools. In this study, we evaluated the performance of three variant annotation tools: Alamut Batch, Ensembl Variant Effect Predictor (VEP) and Annovar, benchmarked against a manually curated ground truth set of 298 variants from the medical exome database at the Molecular Diagnostics Laboratory at Lurie Children’s Hospital. Of the three tools, VEP produced the most accurate variant annotations (correct HGVS nomenclature for 297 of the 298 variants), owing to its use of updated gene transcript versions within the algorithm. Alamut Batch called 296 of the 298 variants correctly; strikingly, Annovar exhibited the greatest number of discrepancies (20 of the 298 variants, 93.3% concordance with the ground truth set). Adoption of validated methods of variant annotation is critical in the post-analytical phases of clinical testing.
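
The concordance figures in this abstract follow directly from the reported counts. A minimal sketch, assuming concordance is simply the percentage of the 298 ground-truth variants annotated correctly; the function name `concordance` is illustrative and not part of any of the three tools:

```python
def concordance(n_correct: int, n_total: int) -> float:
    """Percentage of ground-truth variants that a tool annotated correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


TOTAL = 298  # size of the curated ground-truth set reported in the abstract

# Correct calls per tool as reported: VEP 297, Alamut Batch 296,
# Annovar 298 minus 20 discrepancies.
correct_calls = {"VEP": 297, "Alamut Batch": 296, "Annovar": TOTAL - 20}

for tool, n_correct in correct_calls.items():
    print(f"{tool}: {concordance(n_correct, TOTAL):.1f}% concordance")
```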

Virtual: Bayesian Inference of Local Genetic Correlation
COSI: VarI
  • Xiaoxuan Xia, The Chinese University of Hong Kong, Hong Kong
  • Lin Hou, Tsinghua University, China
  • Yingying Wei, The Chinese University of Hong Kong, China


The genetic correlation quantifies the genetic similarity of two traits. Although estimations of the overall genome-wide genetic correlation have become routine for genome-wide association studies (GWAS), statistical methods that provide robust estimation of local genetic correlations are still lacking.
On the one hand, genetic effects, and hence the local genetic correlations, are very heterogeneous across the genome. On the other hand, compared to the total number of single nucleotide polymorphisms (SNPs), which is on the scale of millions, the number of SNPs contributing to the local genetic correlation of a given region is very small. Very often, for a given trait, due to the noisy estimation of genetic effects, the heritability of some regions can even be estimated to be negative, which makes the inference of genetic correlation impossible.
Here, we propose a rigorous statistical method, BAyesian inference of LOCal genetic correlation (BALOC), to provide robust estimation of local genetic correlations. BALOC is able to take both genotype data and summary statistics as input. Our simulation study demonstrates that BALOC substantially outperforms the state-of-the-art method. The application of BALOC to UK Biobank data reveals novel genetically correlated regions.
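
To see why a negative heritability estimate blocks inference, recall the usual definition of genetic correlation as the genetic covariance divided by the square root of the product of the two heritabilities. The sketch below uses that standard definition only; it is not BALOC, and the numeric inputs are hypothetical:

```python
import math
from typing import Optional


def local_genetic_correlation(cov_g: float, h2_trait1: float,
                              h2_trait2: float) -> Optional[float]:
    """Plug-in estimate r_g = cov_g / sqrt(h2_1 * h2_2) for one region.

    Returns None when either local heritability estimate is not positive,
    because the square root (and hence the correlation) is then undefined.
    """
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        return None
    return cov_g / math.sqrt(h2_trait1 * h2_trait2)


# Hypothetical local estimates for a single region.
print(local_genetic_correlation(0.0004, 0.001, 0.0016))
# A noisy, negative heritability estimate leaves the correlation undefined.
print(local_genetic_correlation(0.0004, -0.0002, 0.0016))
```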

Virtual: Missense3D-PPI: a structure-based prediction algorithm of the impact of missense variants at protein interfaces
COSI: VarI
  • Cecilia Pennica, Imperial College London, United Kingdom
  • Michael J. E. Sternberg, Imperial College London, United Kingdom
  • Alessia David, Imperial College London, United Kingdom


In 2019, we released Missense3D, which identifies stereochemical features that are disrupted by a missense variant, such as the introduction of a buried charge. Missense3D, which has >150 citations and ~7K users in the last year, analyses the effect of missense variants on a single structure. Here we present Missense3D-PPI for the prediction of the impact of missense variants at protein-protein interfaces (PPI).

Our dataset comprised 1,301 interface variants in 441 proteins and 553 PDB complexes. Benchmarking of Missense3D-PPI was performed using a training (320 benign and 320 pathogenic variants) and testing (257 benign and 404 pathogenic) dataset. Structural features affecting PPI were analysed to assess the impact of the variant at PPI.

Missense3D and Missense3D-PPI were run on the test data. The inclusion of these PPI-specific features improved the Matthews Correlation Coefficient (from 0.11 to 0.21) and the accuracy of Missense3D-PPI (from 42% to 56%, p-value of 1 × 10⁻⁹, McNemar’s test). Comparison of Missense3D-PPI with MutaBind2, BeatMusic and mCSM-PPI2 showed that the programs performed similarly on our test data of naturally occurring human missense variants.
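
For readers unfamiliar with the two metrics quoted above, the following sketch computes accuracy and the Matthews Correlation Coefficient from a binary confusion matrix. The counts are hypothetical; they only match the size of the test set (404 pathogenic, 257 benign) and are not the study's results:

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for a binary classifier; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix: pathogenic = positive class.
tp, fn = 170, 234   # 404 pathogenic variants
tn, fp = 200, 57    # 257 benign variants
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

MCC is often preferred over accuracy on unbalanced sets such as this one, since it uses all four cells of the confusion matrix.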

Missense3D-PPI represents a valuable tool to predict the structural effect of missense variants at PPI and will be available from our Missense3D web portal (http://missense3d.bc.ic.ac.uk/).

Virtual: Mutational Signature Extractor: Inferring the number of mutational signature-driven mutations from whole exome sequencing data of a small sample size
COSI: VarI
  • Youngchul Kim, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Yonghong Zhang, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Dae Won Kim, H. Lee Moffitt Cancer Center and Research Institute, United States


Mutational Signature Analysis (MSA) is becoming a standard procedure for in-depth analysis of exome sequencing mutation data. While MSA can be used to extract both mutational signatures and the numbers of mutations they drive from exome sequencing data of any sample size, signatures from data of a small sample size could be unstable and less reliable than those obtained from a large dataset. Therefore, we have developed and present a method that infers the number of mutations that are likely to be attributed to the well-established TCGA mutation signatures in the COSMIC database. Our method can be viewed as an analogue of tumor immune microenvironment deconvolution methods such as CIBERSORT that infer the frequency of tumor-infiltrating immune cells from pre-built gene expression signatures. In an empirical study with a large cohort of metastatic tumor samples, the result of our method is highly correlated with the result obtained from MSA. The advantage of our method is finally exemplified in the analysis of exome sequencing data from a small phase 1 clinical trial of colorectal cancer. In an application of our method to the data, we found that the APOBEC3A-related mutation signature (SBS2) is correlated with response to the study regimen.
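
The idea of attributing mutations to pre-built signatures can be sketched as a non-negative fitting problem: given fixed signature profiles, find non-negative exposures whose mixture best reproduces the observed counts. The abstract does not describe the authors' algorithm, so the sketch below swaps in a generic multiplicative-update least-squares fit, and the four-channel toy signatures are made up (real COSMIC SBS signatures have 96 channels):

```python
def refit_exposures(signatures, counts, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures h such that signatures * h ~= counts.

    signatures: one row per mutation channel; each row holds the probability
                of that channel under each reference signature.
    counts:     observed mutation counts per channel for one sample.
    """
    n_channels = len(signatures)
    n_sigs = len(signatures[0])
    # Start from an even split of the total mutation count.
    h = [float(sum(counts)) / n_sigs] * n_sigs
    for _ in range(n_iter):
        fitted = [sum(signatures[c][k] * h[k] for k in range(n_sigs))
                  for c in range(n_channels)]
        for k in range(n_sigs):
            num = sum(signatures[c][k] * counts[c] for c in range(n_channels))
            den = sum(signatures[c][k] * fitted[c] for c in range(n_channels))
            # Multiplicative update keeps every exposure non-negative.
            h[k] *= num / (den + eps)
    return h


# Toy example: two made-up signatures over four channels (columns sum to 1).
toy_signatures = [
    [0.7, 0.1],
    [0.1, 0.2],
    [0.1, 0.3],
    [0.1, 0.4],
]
# Counts generated from 30 mutations of signature 1 and 10 of signature 2.
toy_counts = [22, 5, 6, 7]
print([round(x, 2) for x in refit_exposures(toy_signatures, toy_counts)])
```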

Virtual: Noncoding variant effect prediction using genome sequence and chromatin structure
COSI: VarI
  • Wuwei Tan, Texas A&M University, United States
  • Yang Shen, Texas A&M University, United States


Whole-genome sequencing experiments are providing growing data on genetic variants beyond coding regions. Computational prediction of non-coding variant effects, often measured as changes in epigenomic profiles, helps reveal causal relationships between non-coding variations and the resulting phenotypic changes. Current methods for predicting non-coding variant effects often rely only on sequence information at the variant sites and their neighbors (within hundreds of base pairs, or bps) and disregard the rest, which constitutes the vast majority of the genome. We introduce a method to utilize the sequence and structure information of the whole genome in an efficient way. Specifically, so that non-coding variant effect prediction considers the contexts of neighbors in both 1D sequence and 3D structure, we represent the additional Hi-C data of chromatin 3D structure as sparse graphs of local regions (1K bps), embed 3D structure data through graph neural networks (GNN), and embed 1D sequence data through convolutional and recurrent neural networks in conjunction with DNA language models. Numerical results indicate that our methods outperform competitors in epigenomic profile prediction, eQTL prediction, and motif identification by correcting the sequence-profile bias in current methods.

Virtual: Somatic mutation analyses of stem-like cells in gingivobuccal oral squamous cell carcinoma reveals DNA damage response genes
COSI: VarI
  • Sachendra Kumar, Indian Institute of Science, India
  • Annapoorni Rangarajan, Indian Institute of Science, India
  • Debnath Pal, Indian Institute of Science, India


Gingivobuccal oral squamous cell carcinoma (OSCC-GB) is a prominent clinical subtype of head and neck squamous cell carcinoma (HNSCC) that occurs among persons who excessively chew smokeless tobacco in India. Understanding the genetic etiology of this disease, especially the role of cancer stem cells (CSCs), remains rudimentary to date. We performed RNA-seq data analysis from OSCC-GB primary tumors based on CD44 CSC surface markers. Our study showed DNA damage associated etiology based on identified Catalogue of Somatic Mutations in Cancer (COSMIC) signatures showing predominant C>T mutations and 1bp T/(A) nucleotide insertions, indicating the role of smokeless tobacco carcinogens in OSCC-GB. The differential somatic mutational, functional damage predictions and survival analyses indicate the role of DNA damage response genes, with CREB-binding protein (CREBBP) as a significant player among them. The identified CREBBP histone acetyltransferase domain frameshift somatic variant (p.Ile1493AsnfsTer26) disrupts the acetyl-CoA ligand-binding site and inhibits histone acetyltransferase activity, indicating a crucial role in the TP53-mediated DNA damage response in CD44-enriched OSCC-GB, interpreted by the molecular dynamics studies. The novel CSC associated somatic variants identified in the study may play a key role in local-regional recurrence, metastasis and chemo-radioresistance that contributes to the high mortality of the Indian OSCC-GB patients.

O-001: Choosing variant interpretation tools for clinical applications: context matters
COSI: VarI
  • Natàlia Padilla Sirera, Vall d'Hebron Institute of Research (VHIR), Spain
  • Josu Aguirre Gomez, Vall d'Hebron Institute of Research (VHIR), Spain
  • Selen Özkan, Vall d'Hebron Institute of Research (VHIR), Spain
  • Casandra Riera, Vall d'Hebron Institute of Research (VHIR), Spain
  • Xavier De La Cruz, Vall d'Hebron Institute of Research (VHIR), Spain


Motivation: Accurate prediction of variant pathogenicity is a key challenge in the clinical application of Next-Generation Sequencing. This situation has favored the development of several in silico tools. However, choosing the optimal tool for our purposes is difficult because of a fact usually ignored: the high variability of clinical context across and within countries, and over time.

Results: We have developed a computational procedure, based on the use of cost models, that allows the simultaneous comparison of an arbitrary number of tools across all possible clinical scenarios. We apply our approach to a set of pathogenicity predictors for missense variants, showing how differences in clinical context translate to differences in tool ranking.
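
A small illustration of how a cost model can make tool ranking depend on clinical context. This is a generic expected-cost formulation assumed for illustration, not necessarily the model implemented in the authors' repository; the tool names, sensitivities, specificities, and costs are hypothetical:

```python
def expected_cost(sensitivity: float, specificity: float,
                  prevalence: float, cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost per variant in one clinical context.

    prevalence: assumed fraction of pathogenic variants in that context.
    cost_fn / cost_fp: assumed costs of a false negative / false positive.
    """
    return (prevalence * (1 - sensitivity) * cost_fn
            + (1 - prevalence) * (1 - specificity) * cost_fp)


def best_tool(tools: dict, prevalence: float,
              cost_fn: float, cost_fp: float) -> str:
    """Name of the tool with the lowest expected cost in this context."""
    return min(tools, key=lambda name: expected_cost(
        *tools[name], prevalence, cost_fn, cost_fp))


# Hypothetical predictors described by (sensitivity, specificity).
tools = {"tool_A": (0.95, 0.70), "tool_B": (0.75, 0.95)}

# Context 1: missed pathogenic variants are expensive.
print(best_tool(tools, prevalence=0.5, cost_fn=10, cost_fp=1))
# Context 2: pathogenic variants are rare and false alarms are expensive.
print(best_tool(tools, prevalence=0.1, cost_fn=1, cost_fp=5))
```

The same two tools swap places between the two contexts, which is the effect the abstract describes.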

Availability: The source code is available at: https://github.com/ClinicalTranslationalBioinformatics/clinical_space_partition

O-002: Development of continuous, protein-specific predictors of the impact of protein sequence variants
COSI: VarI
  • Selen Ozkan, Vall d'Hebron Research Institute, Spain
  • Natàlia Padilla Sirera, Vall d'Hebron Research Institute, Spain
  • Xavier de la Cruz, Vall d'Hebron Research Institute, Spain


To date, computational models developed for predicting the effect of protein sequence variants have mostly focused on predicting a binary version of the effect (benign/pathogenic). However, the need to advance in line with the requirements of Personalized Medicine has created increasing interest in producing less simplified estimates of variant impact. New research efforts are being made to produce continuous estimates comparable to functional assays. Here, we present our approach to this problem following the protein-specific method. We collected deep mutational scanning data available in the literature for individual proteins. We trained a specific predictor for each of twenty-three proteins with a set of sequence- and structure (AlphaFold)-based input features to build a family of predictors of the functional impact of variants. Model performance under a stringent leave-one-position-out cross-validation scheme displays statistically significant predictive ability. The success rates of the protein-specific tools are comparable to or better than those obtained with other tools in the literature. We also investigated whether a given protein-specific predictor can serve for several proteins. Our preliminary results show that cross-predictions may have an accuracy comparable to that of auto-predictions, opening the possibility of extending the use of protein-specific predictors to other proteins in the clinical genome.
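
Leave-one-position-out cross-validation is stricter than random splitting because all variants at a residue position are held out together, so the model never sees another substitution at the tested position. A minimal sketch of such a splitter, assuming each variant is a `(position, mutant_residue)` pair; the data layout and function name are illustrative:

```python
from collections import defaultdict


def leave_one_position_out(variants):
    """Yield (held_out_position, train_indices, test_indices) folds.

    variants: list of (position, mutant_residue) tuples for one protein.
    Every variant at the held-out position goes to the test fold.
    """
    by_position = defaultdict(list)
    for idx, (position, _mutant) in enumerate(variants):
        by_position[position].append(idx)

    for position in sorted(by_position):
        test_idx = by_position[position]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(variants)) if i not in held_out]
        yield position, train_idx, test_idx


# Hypothetical deep mutational scanning variants.
dms_variants = [(10, "A"), (10, "G"), (11, "L"), (12, "V"), (12, "W")]
for position, train_idx, test_idx in leave_one_position_out(dms_variants):
    print(position, train_idx, test_idx)
```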

O-003: DDGun: an untrained predictor of protein stability changes upon amino acid variants
COSI: VarI
  • Ludovica Montanucci, Cleveland Clinic, United States
  • Emidio Capriotti, University of Bologna, Italy
  • Giovanni Birolo, University of Torino, Italy
  • Silvia Benevenuta, University of Torino, Italy
  • Corrado Pancotti, University of Torino, Italy
  • Dennis Lal, Cleveland Clinic, United States
  • Piero Fariselli, University of Torino, Italy


To estimate the functional effect of single amino acid variants in proteins, it is fundamental to predict the change in thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding between the wild-type and the variant protein (∆∆G). Here we present the web server of the DDGun method, which was previously developed for ∆∆G prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. Despite being untrained, DDGun reaches prediction performance comparable to that of trained methods. For the web server version, we updated the protein sequence database used to compute the evolutionary features and compiled two new data sets of protein variants for a blind test of its performance. On these blind data sets of single- and multiple-site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ∆∆G of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides its use for predicting ∆∆G, we suggest that DDGun be adopted as a benchmark method to assess the predictive capabilities of newly developed methods.
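
The performance figure quoted above is a correlation between experimental and predicted ∆∆G values. A minimal sketch of that evaluation, assuming the Pearson coefficient is meant (the abstract says only "correlation coefficient"); the ∆∆G values are hypothetical:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental and predicted ddG values (kcal/mol).
experimental_ddg = [-1.2, 0.3, -2.5, -0.4, 0.9]
predicted_ddg = [-0.8, 0.1, -1.1, -0.9, 0.2]
print(f"r = {pearson_r(experimental_ddg, predicted_ddg):.2f}")
```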

O-004: Mapping functional uORFs in the human genome
COSI: VarI
  • Danielle Gutman, University of Pennsylvania, United States
  • David Lee, University of Pennsylvania, United States
  • Paul Jewell, University of Pennsylvania, United States
  • Lou Ghanem, University of Pennsylvania, United States
  • Nicholas J. Hand, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States


Upstream open reading frames (uORFs) are estimated to occupy half of the 5’ untranslated regions (UTRs) in the human genome. uORF start and stop codons have been shown to be under strong negative selection, and disruption of numerous uORFs has been shown to affect downstream gene translation. In this work we aim to build a comprehensive map of human uORFs and the disease-associated genetic variants that disrupt them. For this, we performed a computational prediction of 1,505,187 putative uORFs, followed by testing for experimental support based on all available human Ribo-Seq data on SRA. Next, we plan to survey all uORF start/stop codons with strong experimental support against the Penn Medicine BioBank (PMBB) (over 45K patients), reproduce them in the UK BioBank (UKBB), and validate variant effects using luciferase assays. Initial analysis of only 10K PMBB patients and 4392 uORFs from 2 human cell lines identified 6 variant-phenotype associations (3/3 validated to affect downstream gene translation) in association with 7 phenotypes (Lee et al Nat Comm 2021).
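
As an illustration of what a putative uORF is, the sketch below scans a 5’ UTR sequence for an ATG followed by an in-frame stop codon. It is a deliberately simplified assumption: it considers only canonical ATG starts and uORFs that terminate inside the UTR, and it is not the prediction pipeline used in the study:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str):
    """Return (start, end) pairs for each ATG with an in-frame stop codon.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    """
    seq = utr5.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon to the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Toy 5' UTR: ATG AAA TAA starting at index 2.
print(find_uorfs("GGATGAAATAAGG"))
```

A variant that changes the ATG or the stop codon of such a uORF is the kind of disruption the abstract proposes to survey.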

O-005: Evaluating the relevance of sequence conservation in the prediction of pathogenic missense variants
COSI: VarI
  • Emidio Capriotti, University of Bologna, Italy
  • Piero Fariselli, University of Torino, Italy


Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been extensively used to predict structure, interactions and functions in macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide variant deleteriousness ranking algorithms consider different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity predictions at the DNA and protein levels. We showed that protein sequence-based information is slightly more informative in the annotation of ClinVar missense variants than that obtained at the DNA level. Furthermore, to achieve the performance of state-of-the-art methods, such as CADD and REVEL, the conservation of reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. These observations indicate that for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.
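
The encoding of conservation as wild-type and mutant residue frequencies can be sketched from a single alignment column. This is an illustrative assumption about how such frequencies might be computed (gaps ignored, no sequence weighting or pseudocounts); the abstract does not specify the exact procedure or name the three features:

```python
from collections import Counter


def residue_frequencies(column: str) -> dict:
    """Relative frequency of each residue in one alignment column, gaps excluded."""
    residues = [aa for aa in column.upper() if aa != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


def conservation_features(column: str, wild_type: str, mutant: str):
    """Return (frequency of wild-type residue, frequency of mutant residue)."""
    freqs = residue_frequencies(column)
    return freqs.get(wild_type, 0.0), freqs.get(mutant, 0.0)


# Hypothetical alignment column at the variant position; variant is L -> P.
column = "LLLLIV-L"
print(conservation_features(column, wild_type="L", mutant="P"))
```

A wild-type residue that dominates the column together with a mutant residue that never appears is the pattern typically read as evidence of a damaging substitution.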

O-006: Predicting functional consequences of mutations using molecular interaction network features
COSI: VarI
  • Kivilcim Ozturk, UCSD, United States
  • Hannah Carter, UCSD, United States


Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein-protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets and demonstrated their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.

O-007: Lessons learned from 10 years of CAGI experiments
COSI: VarI
  • John Moult, University of Maryland, United States


This talk/poster will present the lessons learned from 10 years of the Critical Assessment of Genome Interpretation (CAGI) experiments as written in the consortium’s summary manuscript. CAGI aims to advance the state of the art for computational prediction of genetic variant impact, particularly for variants relevant to human disease. There have been five editions of the CAGI community experiment comprising 50 challenges, in which participants make blind predictions of phenotypes from genetic data, which are evaluated by independent assessors. Overall, the results show that while current methods are imperfect, they already have major utility for research and clinical applications. Missense variant interpretation methods are able to estimate biochemical effects with increasing accuracy. Performance is particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Assessment of methods for regulatory variants and those for complex trait disease risk is less definitive but shows potential performance suitable for auxiliary use in the clinic. Emerging methods and increasing availability of large and robust datasets for training and assessment suggest further progress ahead.

O-008: Identification of variants for familial combined hyperlipidemia
COSI: VarI
  • Alejandro Q. Nato Jr., Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, 25755 USA, United States
  • Hafiz Ata Ul Mustafa, Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, 19104 USA, United States
  • Vinícius Magalhães Borges, Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, 25755 USA, United States
  • Riya K. Patel, Tata Consultancy Services Limited, Edison, NJ, 08837 USA, United States
  • Syed Wasi Haider, Department of Software Verification and Validation, Abbott Laboratories, Saint Paul, MN, 55117 USA, United States
  • Khalid Kunji, Qatar Center for Artificial Intelligence, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, Qatar
  • Jun Fan, Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, 25755 USA, United States
  • Dan Lucas, Charleston Area Medical Center Institute for Academic Medicine, Charleston, WV, 25304 USA, United States
  • Hatim Omar, Adolescent Health Center, Lehigh Valley Reilly Children’s Hospital, Allentown, PA, 18103 USA, United States
  • William Neal, Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, WV, 26506 USA, United States
  • Mohamad Saad, Qatar Center for Artificial Intelligence, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar, Qatar
  • Donald A. Primerano, Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, 25755 USA, United States


Familial combined hyperlipidemia (FCHL) is a complex disease characterized by elevated cholesterol and triglycerides. Our goal is to identify variants associated with FCHL within regions with evidence of linkage to the disease. FCHL families were ascertained by: (1) identifying probands with fasting plasma total cholesterol ≥95th percentile and triglycerides ≥90th percentile for age, sex, and race and (2) having at least 2 participants with FCHL and 1 unaffected relative. From 638 ascertained individuals in 41 extended families, we trimmed pedigrees to retain branches with closely related affected individuals. After trimming, we had a total of 446 individuals in 23 families (size=4-41), of which 180 individuals were genotyped using a ~654K SNP panel; whole exome sequencing (WES) was conducted after GIGI-Pick selection. We used the Pedigree-Based Analysis Pipeline (PBAP) to: (1) perform file manipulations, (2) validate pedigree structures, (3) sub-select a linkage disequilibrium (LD)-reduced panel of markers, and (4) access gl_auto of the MORGAN package to sample inheritance vectors (IVs). Using the sampled IVs, we performed linkage analysis using PBAP to identify region(s) of interest (ROIs). Within the ROI(s), we performed family-based genotype imputation in individuals without WES data, and family-based association analysis to identify variants associated with FCHL.
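
The two ascertainment criteria can be written as simple predicates. This sketch assumes that lipid values have already been converted to percentiles for age, sex, and race, and that "1 unaffected relative" means at least one; the function names are illustrative and not part of PBAP:

```python
def is_proband_candidate(tc_percentile: float, tg_percentile: float) -> bool:
    """Criterion (1): total cholesterol >= 95th and triglycerides >= 90th percentile."""
    return tc_percentile >= 95 and tg_percentile >= 90


def family_qualifies(affected_flags) -> bool:
    """Criterion (2): at least 2 members with FCHL and at least 1 unaffected relative.

    affected_flags: iterable of booleans, one per participating family member.
    """
    flags = list(affected_flags)
    n_affected = sum(1 for affected in flags if affected)
    n_unaffected = len(flags) - n_affected
    return n_affected >= 2 and n_unaffected >= 1


# Hypothetical family: proband at the 97th/92nd percentiles, one affected
# sibling, and one unaffected parent.
print(is_proband_candidate(97, 92), family_qualifies([True, True, False]))
```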

O-009: Information theoretic analysis of synonymous codon usage offers context-dependent metric to assess constraint on synonymous variants
COSI: VarI
  • Blythe Moreland, The Institute for Genomic Medicine, Nationwide Children's Hospital, United States
  • Jeffrey Gaither, The Institute for Genomic Medicine, Nationwide Children's Hospital, United States
  • Peter White, The Institute for Genomic Medicine, Nationwide Children's Hospital, United States


Insight into synonymous codon usage and metrics to assess the impact of synonymous single nucleotide variants (sSNVs) are critically needed to enable consideration of “silent” variants in the genetic diagnosis process. While sSNVs do not alter protein sequence, they may influence several overlaying mRNA processes. Using an information theoretic approach, we evaluated how codon usage in human transcripts varies with sequence context and related disruption of sSNV patterns to constraint in the human population.

For each amino acid group with codons varying at codon position 3 (CP3), we calculate the mutual information (MI) between the distributions of codons and those of neighboring codons and nucleotides. We find that MI, outside of central bicodons, is driven by local maxima at CP3s and demonstrate that this effect is significantly larger than in control contexts. Other local sequence variables account for much, but not all, of this codon-context correlation. We convert this MI to an sSNV score that represents how expected the substitution is in that sequence context. Using TCGA data as a test case, we find relevant pathways enriched in the somatic sSNVs from the extrema of our score’s distribution, thereby demonstrating the biological significance of sSNVs in human disease.
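
The mutual information used above can be estimated from paired observations of a codon and a neighboring context symbol. The sketch below is the plain plug-in estimator in bits, assumed here for illustration; it does not reproduce the authors' controls or their conversion of MI into an sSNV score, and the toy observations are made up:

```python
import math
from collections import Counter


def mutual_information(pairs) -> float:
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi


# Toy data: alanine codon paired with the nucleotide that follows it.
dependent = [("GCC", "A"), ("GCT", "T")] * 10
independent = [("GCC", "A"), ("GCC", "T"), ("GCT", "A"), ("GCT", "T")] * 5
print(mutual_information(dependent), mutual_information(independent))
```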

O-010: Rare disease diagnosis without the need for human intervention using explainable machine learning and similarity matching approaches
COSI: VarI
  • Tarun Karthik Kumar Mamidi, University of Alabama–Birmingham School of Medicine, United States
  • Elizabeth Worthey, University of Alabama–Birmingham School of Medicine, United States
  • Brandon Wilk, University of Alabama–Birmingham School of Medicine, United States
  • Manavalan Gajapathy, University of Alabama–Birmingham School of Medicine, United States
  • Alexander Moss, University of Alabama–Birmingham School of Medicine, United States
  • Donna Brown, University of Alabama–Birmingham School of Medicine, United States
  • James Holt, HudsonAlpha Institute for Biotechnology, United States


Background: Molecular diagnostics aims to identify variants responsible for a patient’s clinical presentation. Manual analysis and prioritization of variants is time consuming and costly. We sought to test a combination of explainable ensemble learning algorithms for variant impact prediction and similarity learning for phenotype based variant prioritization for rapid diagnosis.
Methods: Pathogenic and benign classified variants from ClinVar (527,254) and HGMD (143,850) were annotated with variant frequency, impact, and damage prediction data. An ensemble learning model (DITTO) was then trained on these annotated variants. Patient phenotype terms were analyzed using a similarity learning method (HAZEL) to prioritize genes based on the Human Phenotype Ontology.
Results: We observed 90% accuracy on DITTO variant pathogenicity prediction with explanations. We observed >99% accuracy of HAZEL causal gene identification from our OMIM test dataset. We tested combined application of DITTO and HAZEL on finding causal variants for undiagnosed patients; in 30% of cases we placed the causal variant in the top 5 prioritized variants.
Conclusion: We developed a pipeline using machine learning and similarity methods to identify disease variants in the presence of diverse clinical presentations without the need for human intervention. These methods are being integrated into our rare disease programs to reduce diagnosis time and cost.
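
The "top 5" result reported above is a top-k hit rate: the fraction of cases in which the known causal variant appears among the first k prioritized variants. A minimal sketch of that metric, with hypothetical variant identifiers; it does not implement DITTO or HAZEL:

```python
def top_k_hit_rate(cases, k: int = 5) -> float:
    """Fraction of cases whose causal variant is ranked within the top k.

    cases: list of (ranked_variants, causal_variant) pairs, where
           ranked_variants is ordered from highest to lowest priority.
    """
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(1 for ranked, causal in cases if causal in ranked[:k])
    return hits / len(cases)


# Hypothetical prioritized lists for three patients.
example_cases = [
    (["v1", "v2", "v3", "v4", "v5", "v6"], "v2"),   # hit (rank 2)
    (["v7", "v8", "v9", "v10", "v11", "v12"], "v12"),  # miss (rank 6)
    (["v13", "v14"], "v99"),                        # miss (not ranked)
]
print(f"top-5 hit rate = {top_k_hit_rate(example_cases, k=5):.2f}")
```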

O-011: Pursuing the discovery of impactful genetic variation in selected rat population - Hybrid Rat Diversity Panel
COSI: VarI
  • Rebecca Schilling, Medical College of Wisconsin, United States
  • Melinda Dwinell, Medical College of Wisconsin, United States
  • Anne Kwitek, Medical College of Wisconsin, United States
  • Shur-Jen Wang, Medical College of Wisconsin, United States
  • Mahima Vedi, Medical College of Wisconsin, United States
  • Marek Tutaj, Medical College of Wisconsin, United States
  • Jyothi Thota, Medical College of Wisconsin, United States
  • Ketaki Thorat, Medical College of Wisconsin, United States
  • Akiko Takizawa, Medical College of Wisconsin, United States
  • Jennifer Smith, Medical College of Wisconsin, United States
  • Monika Tutaj, Medical College of Wisconsin, United States
  • Harika Nalabolu, Medical College of Wisconsin, United States
  • Stanley Laulederkind, Medical College of Wisconsin, United States
  • Lynn Lazcares, Medical College of Wisconsin, United States
  • Logan Lamers, Medical College of Wisconsin, United States
  • Mary Kaldunski, Medical College of Wisconsin, United States
  • Morgan Hill, Medical College of Wisconsin, United States
  • G. Thomas Hayman, Medical College of Wisconsin, United States
  • Wendy Demos, Medical College of Wisconsin, United States
  • Jeffrey De Pons, Medical College of Wisconsin, United States


The Hybrid Rat Diversity Panel consists of 33 classic inbred strains and two recombinant inbred panels: 33 FXLE/LEXF strains from Japan and 30 HXB/BXH strains from the Czech Republic. These rats were carefully selected for genetic diversity in order to study genetic loci associated with complex traits. Common human diseases are complex traits and are shaped by the additive effect of many genetic variants. Here we provide insight into variant diversity in a population of 72 rats, the distribution of SNVs and indels in QTLs and genes associated with the disease phenotypes for which the strains were selected, as well as the functional consequences of variants shared between strains representing the same disease model, such as hypertension. We aligned whole genome sequences (Illumina short reads) to the high-quality rat reference mRatBN7.2, performed variant calling (GATK4) and analyzed the impact of variants on specific genes (SnpEff, RGD OLGA ontology tool). As disease-associated loci contain many genes, it is difficult to identify the compromised ones and even more difficult to distinguish causal variants and their phenotypic effects. Researchers increasingly use multiple model organisms, exploiting their respective advantages in pursuit of comprehensive knowledge. To support this, the Rat Genome Database integrates multi-species data and develops tools to improve multilevel navigation and discovery of valuable information.

O-012: Mapping the human RNA G-Quadruplexes and genetic variants that affect them using Transformer based modeling
COSI: VarI
  • Farica Zhuang, University of Pennsylvania, United States
  • Danielle Gutman, University of Pennsylvania, United States
  • Di Wu, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States


Presentation Overview: Show

RNA G-quadruplexes (rG4s) are known to play an important role in gene regulation. Recent advances in sequencing technology have produced experimental data detailing G4 formation in the genome. Following this, various computational methods have been developed to predict whether a G4 is likely to form on a given sequence. While deep learning methods such as convolutional neural networks (CNNs) perform well, these models cannot integrate information from long-range interactions within the input sequence. To better capture upstream and downstream nucleotide context, we trained a transformer-based model, G4mer, on rG4-seeker in vitro G4 RNA data. We show that G4mer outperforms CNN-based models such as G4Detector (Barshai et al., 2021) in predicting G4 sequences and classifying G4s into their subcategories. Using G4mer, we study G4 formation in human 5’ UTRs and search for variants in G4s associated with disease and phenotype.

O-013: Bayesian fine mapping reveals sex-specific genetic regulation in the DLPFC during early prenatal development
COSI: VarI
  • Joseph Lalli, Department of Genetics, University of Wisconsin-Madison, United States
  • Donna Werling, Department of Genetics, University of Wisconsin-Madison, United States


Presentation Overview: Show

Many neurodevelopmental disorders (such as autism and ADHD) are more prevalent in one sex. While cultural factors undoubtedly play a role in this difference, multiple lines of evidence suggest that there is a biological component to these sex differences as well. Despite the growing incidence of these disorders, the role of sex in neurodevelopment remains poorly understood.
Using eQTL analysis, we attempted to measure the role of sex in gene regulation during neurodevelopment. We used a previously published dataset of bulk RNA-seq and whole-genome DNA sequences from the dorsolateral prefrontal cortices (DLPFCs) of 82 human fetuses with gestational ages of 14 to 21 weeks, when sex hormone levels peak during development. Applying Bayesian fine-mapping techniques, we found 335 credible sets of SNPs that affect the expression of 297 nearby genes in a sex-specific manner. Gene ontology analysis suggests that these genes are significantly associated with neurodevelopmental disorders and with the E2F1 and EGR1 regulatory pathways, which are critical for neuroprogenitor proliferation and brain plasticity, respectively. Our work suggests that genes important for neurodevelopment are differentially regulated in male and female DLPFCs during a critical period, providing a potential mechanism for the sex-biased prevalence of some neurodevelopmental disorders.
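
As a rough illustration of what a credible set is, the sketch below builds one from per-SNP posterior probabilities by adding the most probable SNPs until a target coverage is reached. It assumes a single causal variant per signal with probabilities that sum to about 1; the function name, SNP labels, and values are hypothetical and are not taken from the study.

```python
def credible_set(pips: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Return the smallest set of SNPs whose summed posterior probability reaches `coverage`.

    Assumes one causal variant per signal, so the probabilities sum to roughly 1.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    selected: list[str] = []
    cumulative = 0.0
    # Rank SNPs from most to least probable, then add them until coverage is reached.
    for snp, prob in sorted(pips.items(), key=lambda item: item[1], reverse=True):
        selected.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for four SNPs near one gene.
example = {"snp_a": 0.60, "snp_b": 0.30, "snp_c": 0.06, "snp_d": 0.04}
print(credible_set(example))
```

A smaller credible set means the association signal has been narrowed to fewer candidate SNPs.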

O-014: Discovering Significant Evolutionary Trajectories in Cancer Phylogenies
COSI: VarI
  • Leonardo Pellegrina, University of Padova, Italy
  • Fabio Vandin, University of Padova, Italy


Presentation Overview: Show

Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity.
Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors, and have highlighted its extensive diversity across tumors.
While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge.

We present a new algorithm, MASTRO, to discover significantly conserved evolutionary trajectories in cancer.
MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations.
MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors.

We apply MASTRO to data from non-small-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies.

O-015: SNPs and Gene prioritization using summary statistics data of lung function GWAS from different populations
COSI: VarI
  • Afeefa Zainab, Tohoku University, Sendai, Japan
  • Takeshi Obayashi, Tohoku University, Japan
  • Kengo Kinoshita, Tohoku University, Japan


Presentation Overview: Show

The condition of the human respiratory system is often assessed by measuring lung function, expressed as the ratio of forced expiratory volume in one second to forced vital capacity (FEV1/FVC). Lung function is used for the diagnosis and assessment of many respiratory disorders, including chronic obstructive pulmonary disease (COPD). Various genome-wide association studies (GWAS) have identified SNPs associated with disease, but the function of these SNPs is unknown. Post-GWAS analysis is needed to identify candidate causal genes and SNPs among the many reported variants. GWAS from different populations need to be compared in order to identify functional genes associated with disease in all populations, which would help in the development of better treatment and diagnosis options. We performed SNP and gene prioritization using GWAS summary statistics from different populations. Gene mapping was performed with FUMA software to prioritize protein-coding genes from both data sets. The results were then analyzed and compared to obtain genes shared by both populations and genes unique to one population. This study will help in the functional interpretation of reported SNPs and in identifying a common set of genes associated with lung function in different populations, supporting better diagnostic and treatment approaches.

O-016: Comparing genotype calling software performance in Genotyping-by-Sequencing data of F1 outcrossing based on genetics maps quality
COSI: VarI
  • Cristiane Taniguti, Texas A&M University, United States
  • Lucas Taniguti, Mendelics, Brazil
  • Gabriel Gesteira, North Carolina State University, United States
  • Thiago Oliveira, Roslin Institute, United Kingdom
  • Jeekin Lau, Texas A&M University, United States
  • Getúlio Ferreira, University of São Paulo, Brazil
  • Rodrigo Amadeu, Bayer Crop Science, United States
  • David Byrne, Texas A&M University, United States
  • Oscar Riera-Lizarazu, Texas A&M University, United States
  • Guilherme Pereira, Federal University of Viçosa, Brazil
  • Marcelo Mollinari, North Carolina State University, United States
  • Antonio Augusto Garcia, University of São Paulo, Brazil


Presentation Overview: Show

Genetic maps are built from meiotic recombination events identified in a mapping population. Distance in a genetic map is based on the number of recombination events between markers: the fewer the recombination events between two markers, the smaller the genetic distance between them. Markers that are close together are considered linked. The linkage pattern between markers is then used to cluster and order them in linkage groups. Since the properties of genetic linkage are well known, it is possible to identify low-quality markers using linkage analysis. Genotyping errors lead to overestimated recombination frequencies, resulting in inflated genetic distances and poor grouping and ordering of markers. Thus, the quality of a genetic map can be used to validate upstream procedures. Here, we used the Reads2Map (WDL) workflows to test the performance of GATK, Freebayes, SuperMASSA, updog, and polyRAD by building genetic maps with markers identified by these packages in simulated and empirical genotyping-by-sequencing (GBS) data from F1 outcrossing populations. Results showed that Freebayes and GATK produce fewer genotyping errors than updog, polyRAD, and SuperMASSA, but the latter three also provide reliable genotypic data and genetic maps if proper filters, such as filters on genotype probability and non-informative markers, are applied.
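
To make the link between genotyping errors and inflated map distances concrete, the sketch below converts a recombination fraction into map distance with the standard Haldane and Kosambi mapping functions, then applies a deliberately simple error model. The error model (each of two biallelic marker calls is flipped independently with the same probability) and all function names are assumptions for illustration; they are not part of the Reads2Map workflows.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane mapping function: recombination fraction -> distance in centimorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi mapping function: recombination fraction -> distance in centimorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def observed_recombination(r: float, error_rate: float) -> float:
    """Apparent recombination fraction under an assumed symmetric error model.

    Assumption: each of the two marker calls is flipped independently with
    probability `error_rate`. A true recombinant stays visible when both calls
    are right or both are wrong; a non-recombinant looks recombinant when
    exactly one call is wrong.
    """
    e = error_rate
    both_right_or_both_wrong = (1.0 - e) ** 2 + e ** 2
    exactly_one_wrong = 2.0 * e * (1.0 - e)
    return r * both_right_or_both_wrong + (1.0 - r) * exactly_one_wrong


true_r = 0.10
noisy_r = observed_recombination(true_r, error_rate=0.02)
print(f"true r = {true_r:.3f}, Haldane distance = {haldane_cm(true_r):.2f} cM")
print(f"observed r = {noisy_r:.3f}, Haldane distance = {haldane_cm(noisy_r):.2f} cM")
```

Under this toy model, even a small error rate raises the apparent recombination fraction between tightly linked markers, which is why inflated map lengths can flag problems in upstream genotype calling.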

O-017: Incorporating the Genetically Regulated Transcriptome into the Analysis of Protein Expression
COSI: VarI
  • Henry Wittich, Loyola University Chicago, United States
  • Heather Wheeler, Loyola University Chicago, United States


Presentation Overview: Show

Gene regulation encompasses the variety of environmental and genetic mechanisms that control the processes of transcription and translation to create differential gene expression across tissues and between organisms. DNA regulatory elements can be divided into two categories: cis-acting, which are within 1 Mb of their target gene, and trans-acting, which are more than 1 Mb away from their target gene. Transcriptome-wide association studies (TWAS), which perform a multiple linear regression analysis between predicted transcript levels and any measured trait, have proven successful at identifying trans-acting relationships by leveraging both genomic and transcriptomic data. Here I extend this analysis by performing TWAS for protein levels, incorporating this third omics trait into the study of gene regulation. For 3,301 individuals from the INTERVAL cohort and 971 individuals from the TOPMed MESA cohort, I performed TWAS with the software tool PrediXcan to identify associations between predicted transcript levels and plasma protein levels measured with a SOMAscan assay. Transcript levels were predicted in 49 tissues using GTEx models. With this analysis, we replicated several previously identified “master-regulator” regions (loci that regulate the expression of many genes across the genome), and we also found that cis-acting effects are more uniformly shared across tissues.
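
The 1 Mb cis/trans rule described above can be sketched as a small classifier. Measuring distance to the nearest gene boundary, and treating anything on another chromosome as trans, are assumptions made for this example; the function name and coordinates are hypothetical.

```python
def classify_regulatory_variant(
    variant_chrom: str,
    variant_pos: int,
    gene_chrom: str,
    gene_start: int,
    gene_end: int,
    window: int = 1_000_000,
) -> str:
    """Label a variant-gene pair as "cis" or "trans" using a distance window.

    Assumption: distance is measured to the nearest gene boundary, and a
    variant inside the gene body has distance 0.
    """
    if variant_chrom != gene_chrom:
        return "trans"
    if gene_start <= variant_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(variant_pos - gene_start), abs(variant_pos - gene_end))
    return "cis" if distance <= window else "trans"


# Hypothetical gene on chr1 spanning positions 5,000,000 to 5,050,000.
print(classify_regulatory_variant("chr1", 4_500_000, "chr1", 5_000_000, 5_050_000))
print(classify_regulatory_variant("chr1", 8_000_000, "chr1", 5_000_000, 5_050_000))
print(classify_regulatory_variant("chr7", 5_010_000, "chr1", 5_000_000, 5_050_000))
```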

O-018: Hypertension Genetics is Sex-dependent
COSI: VarI
  • Roei Zucker, The Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel


Presentation Overview: Show

Hypertension is among the most prevalent conditions, with an estimated one billion cases worldwide. It increases the risk of renal, cerebrovascular, and cardiovascular diseases. Hypertension is a polygenic disease with a strong environmental contribution. The list of associated genes from GWAS keeps growing (1362 genes, OpenTargets), with only a few having been functionally validated. In this study, we applied a gene-based method called proteome-wide association study (PWAS) to detect associations through the effect of variants on protein function. PWAS aggregates the signal from all UK Biobank (UKBB) variants affecting protein-coding genes and provides three generalized models for dominant, recessive, and combined inheritance. PWAS identified 72 statistically significant associated genes (FDR q-value < 0.05) and 158 with a more relaxed threshold (FDR q-value < 0.1). Only half of the PWAS genes overlap with GWAS. Analyzing females and males from UKBB genotyping separately revealed a strong sex-dependent genetic signal. We found that females carry most of the polygenic signal (28 vs. 6 genes), with only SH2B3 shared between sexes. Several of the female-unique genes are associated with cellular immune function. We conclude that hypertension displays sex-dependent genetics with a substantial recessive inheritance. We show the benefit of PWAS for enhancing interpretability and clinical utility.
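
For readers unfamiliar with FDR q-value thresholds like those quoted above, the sketch below computes Benjamini-Hochberg adjusted values from a list of per-gene p-values. Benjamini-Hochberg is one common way to obtain such values; the abstract does not state which FDR procedure PWAS uses, so this is an illustrative assumption, and the p-values are made up.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p-values.
p_vals = [0.01, 0.04, 0.03, 0.20]
q_vals = benjamini_hochberg(p_vals)
significant = [i for i, q in enumerate(q_vals) if q < 0.05]
print(q_vals, significant)
```

Relaxing the threshold from 0.05 to 0.1 admits more genes at the cost of a higher expected share of false discoveries, which matches the two gene counts reported in the abstract.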