Posters - Schedules
Poster presentations at ISMB/ECCB 2021 will be presented virtually. Authors will pre-record their poster talk (5-7
minutes) and will upload it to the virtual conference platform site along with a PDF of their poster beginning July 19
and no later than July 23. All registered conference participants will have access to the poster and presentation
through the conference and content until October 31, 2021. There are Q&A opportunities through a chat
function and poster presenters can schedule small group discussions with up to 15 delegates during the conference.
Information on preparing your poster and poster talk is available at:
https://www.iscb.org/ismbeccb2021-general/presenterinfo#posters
Ideally authors should be available for interactive chat during the times noted below:
Session A: Sunday, July 25 between 15:20 - 16:20 UTC
Session B: Monday, July 26 between 15:20 - 16:20 UTC
Session C: Tuesday, July 27 between 15:20 - 16:20 UTC
Session D: Wednesday, July 28 between 15:20 - 16:20 UTC
Session E: Thursday, July 29 between 15:20 - 16:20 UTC
Short Abstract: We present a comprehensive analysis of Oxford Nanopore (ONT) sequencing technology compared with short-read techniques, such as Illumina. In our study, we focus on structural variants (SVs), DNA segments at least 50 bp in length that are unique to personal genomes, as identified by the 1000 Genomes Project. We improve the quality of SV identification from whole genome sequencing (WGS) experiments by using a consensus approach. Fifteen gold-standard tools were used to obtain the polished list of SVs for daughters of families from the 1000 Genomes Project, using publicly available datasets from next-generation sequencing experiments performed with both short-read (Illumina) and long-read (ONT) technologies. The results of the SV callers were merged using the novel ConsensuSV algorithm, which integrates the SV sets using machine learning by combining decision trees and neural networks trained and benchmarked on the high-quality SVs from the 1000 Genomes Project. Finally, upon comparing the SV sets obtained from the ConsensuSV algorithm between long and short reads, our findings demonstrate the superiority of ONT across all SV sizes: long-read-based SV inference detected more SVs than short-read-based inference.
Short Abstract: Interpreting whole genome sequencing data is a major challenge, since 98% of variants reside in non-coding genomic “dark matter”, including regulatory elements and non-coding RNA (ncRNA) genes.
The GeneCards® Suite (www.genecards.org/) is a leading integrated biomedical knowledgebase for interpretation of clinical genetics, including the gene-centric GeneCards (PMID:27322403) and disease-centric MalaCards (PMID:27899610). VarElect (PMID:27357693), our NGS phenotype interpreter, leverages this knowledgebase to prioritize associations between genes and disease/phenotype terms. We’ve made significant strides towards optimizing our Suite for effective interpretation of non-coding variants.
GeneHancer (PMID:28605766) is a database of ~400k functionally annotated enhancers, promoters, and their target genes. It integrates information from key epigenetic resources, and is included as a native regulation track at the UCSC genome browser.
GeneCaRNA (PMID:33676929) is a novel gene-centric ncRNA database, integrating data from major gene and transcript resources. GeneCaRNA provides a comprehensive non-redundant view of >220k human ncRNAs of 17 functionally diverse types, such as lncRNAs and miRNAs.
Our novel non-coding compendia provide an indispensable augmentation for VarElect, empowering the prioritization of variant-containing enhancers, promoters and ncRNA genes with respect to diseases, via direct and target-gene mediated links. These capabilities facilitate deciphering the clinical significance of non-coding variants, often elucidating unsolved clinical cases (PMID:32506582).
Short Abstract: Interpretation of genomic variation plays an essential role in monogenic disease, the analysis of cancer and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Yet the field lacks a clear consensus on the appropriate level of confidence to place in variant impact prediction. The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impact of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data.
Over five CAGI editions completed during the past decade, several themes have emerged. Top missense prediction methods are highly statistically significant. Interpretation of non-coding variants shows promise but is not at the level of missense. Bespoke approaches often enhance performance. Conservation-based methods show the most consistent performance. Interpretation of whole-genome data remains an open challenge. However, in certain examples of using clinical data, predictors identified causal variants overlooked by initial analysis in a diagnostic laboratory.
CAGI 6 is currently underway with challenges addressing missense, clinical genomes, cancer, splicing and polygenic risk scores. See: genomeinterpretation.org.
Short Abstract: Shalini Rajagopal, Ayam Gupta, Jalaja Naravula, Anil Kumar S, Praveen Mathur, Anita Simlot, Sudhir Mehta, Chhagan Bihari, Sumita Mehta, Ashwani Kumar Mishra, Krishna Mohan Medicherla, G Bhanuprakash Reddy, PB Kavi Kishor and Prashanth Suravajhala*
Understanding genetic variants is a major focus of our research project. We would like to take this opportunity to present a poster on genetic variants associated with Vitamin K deficiency and to discuss possible future directions of our project. The aim of our study is to detect and quantify differentially expressed genes and variant effects across experimental conditions, which gives information on how genes are regulated and reveals details of an organism's biology. So far, 46 genes have been related to Vitamin K; major genes such as VKORC1, GGCX and VKA are involved in the biological functions, but the mechanism of a few genes is still unknown. Our future work will aim for high sensitivity and specificity in SNP calls for different conditions and sample cohorts. We hope that the research ideas shared in our presentation will be of interest to the scientific community, and that this will help early-career researchers like me to gain experience from this learning curve.
Short Abstract: With the continued promise of immunotherapy as an avenue for treating cancer, understanding how host genetics contributes to the tumor immune microenvironment (TIME) is essential to tailoring cancer risk screening and treatment strategies. Using genotypes from over 8,000 European individuals in The Cancer Genome Atlas (TCGA) and 137 heritable tumor immune phenotype components (IP components), we identified and investigated 482 TIME-associated variants. Many TIME-associated variants influence gene activities in specific immune cell subsets, such as macrophages and dendritic cells, and interact to promote more extreme TIME phenotypes. TIME-associated variants were predictive of immunotherapy response in human cohorts treated with immune-checkpoint blockade (ICB) in 3 cancer types, causally implicating specific immune-related genes that modulate myeloid cells of the TIME. Moreover, we validated the function of these genes in driving tumor response to ICB in preclinical studies. Through an integrative approach, we link host genetics to TIME characteristics, informing novel biomarkers for cancer risk and target identification in immunotherapy.
Short Abstract: The ITHANET portal (www.ithanet.eu) is an expanding, publicly available biomedical resource dedicated to haemoglobinopathies. It provides a manually curated, literature-derived collection of published genetic and epidemiological data, also integrating the latest updates on news, events, publications and many more.
A team of expert biocurators retrieves, validates and annotates information from scientific literature and individual submitters, whilst also incorporating new and updated information from existing public databases (e.g., HbVar, dbSNP, ClinVar).
ITHANET offers a range of inter-linked databases; IthaGenes currently stores annotations for over 3180 variants in over 420 globin-related loci and genes. IthaMaps consists of epidemiological data for over 200 countries, which are illustrated both at a global and regional scale. IthaChrom is a collection of digitized reports of standard diagnostic HPLC analyses on more than 600 haemoglobin variants. Recently, ITHANET developed IthaPhen, an interactive genotype-phenotype database and a tool focused on the characterization and detection of CNVs related to haemoglobinopathies.
ITHANET is the most comprehensive knowledgebase on haemoglobinopathies and the official partner of the HVP Global Globin Network for data storing, curation and sharing within and between countries. ITHANET is coordinating the ClinGen Hemoglobinopathy VCEP, focused on standardizing pathogenicity classification of genetic variants according to the ACMG/AMP guidelines.
Short Abstract: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can aid in the understanding of the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. Recently, researchers have attempted to increase the accuracy of their predictions by incorporating protein dynamics. We present
Short Abstract: Loss-of-Function (LoF) mutations include nonsense Single Nucleotide Polymorphisms (SNPs), frameshift indels and splice site SNPs, which usually lead to premature termination of transcription as well as translation. It has been estimated that a typical human genome harbours 149-182 putative LoF mutations. Identifying disease-causing LoF mutations among these many possible LoF mutations is a major bottleneck when applying whole exome/genome sequencing to the clinical diagnosis of diseases. For our analysis we have used putatively benign and pathogenic LoF mutations (except the splice site SNPs) from publicly available databases. We analysed the impact of LoFs at the transcript level and at the protein level. We combined the transcript- and protein-level results and classified LoF mutations into three groups: (I) LoFs leading to complete loss of protein function, (II) LoFs leading to partial function loss and (III) LoFs with no function loss. We further analysed genes harbouring group I LoF mutations at the systems level using network approaches. Our studies show significant differences between pathogenic and putatively benign LoF mutations at various levels, and this knowledge will be used to develop a machine learning model to distinguish LoFs that are likely to be pathogenic from those that are likely to be benign.
Short Abstract: Synonymous variants are usually neglected in genetic studies. Recently, their functional roles have been increasingly investigated, but not yet systematically with respect to diseases and traits.
We perform such an evaluation based on the genome-wide association studies (GWAS) catalog. Effects on transcription are assessed via expression quantitative trait locus (eQTL) annotations obtained via the tool Qtlizer. Effects on translation are evaluated via codon usage bias, using relative synonymous codon usage (RSCU) as a transcript-level quantification.
There are 101 exclusively synonymous GWAS catalog variants in 94 genes, linked to 3,267 eQTL annotations, of which 99 eQTLs (3%) from 31 variants are flagged as best eQTLs. Notably, variant rs199533 in gene NSF, associated with Parkinson's disease and cancer, has the most eQTLs, indicating a gene regulatory role, while variant rs11568377, associated with systolic blood pressure in sickle cell anemia, affects codons with a large RSCU difference, plausibly interfering with protein folding. Finally, in an extended data set, we show that RSCU distributions for 39 of 119 trait-associated synonymous codons (33%) differ significantly from those of transcriptome-wide protein-coding sequences.
In summary, our results highlight GWAS variants and biological mechanisms for follow-up studies and suggest that functional roles of synonymous, disease-associated variants may be more common than intuitively expected.
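For readers unfamiliar with RSCU, the sketch below computes it for a single coding sequence using the commonly used definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. This is only an illustration, not the authors' pipeline. The function name `rscu` and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in `cds`.

    RSCU = observed codon count / (total count for the amino acid
    / number of synonymous codons). Stop codons are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if aa == "*" or total == 0:
            continue  # stop codons and unobserved amino acids
        expected = total / len(members)
        for codon in members:
            values[codon] = counts[codon] / expected
    return values


# Toy example (hypothetical sequence): four alanine codons, three of them GCT.
toy = rscu("GCTGCTGCTGCC")
```

An RSCU of 1.0 means a codon is used as often as expected under uniform synonymous usage; values above or below 1.0 indicate over- or under-representation.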
Short Abstract: Functional data and structural context can aid our understanding of the specific roles each residue in a protein plays and offer insight into the potential for a variant to be associated with disease. Researchers can make use of an ever-increasing variety of data from different sources at the gene/protein level or at the level of single nucleotides or amino acids. The PepVEP platform collates functional and structural data from various EMBL-EBI resources at the protein residue level and maps to genomic coordinates to allow users to query any position in the proteome.
The data include protein-protein and protein-ligand interactions, both from every structure in the PDB and from experimentation, as well as mutagenesis experiments that directly assess specific variation and all publicly available human variants from healthy and disease populations. The data can be retrieved programmatically via an API, which accepts a variety of inputs, or through a user interface with additional features such as the variant position in every structure of the protein.
PepVEP is regularly updated with the latest data and gives clinical geneticists and researchers a single location to gather information on any specific missense change at any position in the proteome to understand its potential impact.
Short Abstract: Modern sequencing technologies provide an unprecedented amount of data about missense single-nucleotide variations leading to changes in protein sequences. For many single residue variations (SRVs), links to genetic diseases are reported. From HUMSAVAR and ClinVar, we collected human SRVs whose effect on human health is annotated as Pathogenic/Likely Pathogenic (P/LP) or Benign/Likely Benign (B/LB).
After merging, the Union dataset contains 3,627 proteins carrying 75,927 SRVs. Of these, 44,543 and 31,384 are labelled as P/LP and B/LB, respectively. The intersection between SRVs from HUMSAVAR and ClinVar is limited: the two datasets share about 5% and 30% of B/LB and P/LP SRVs, respectively. This raises the question of to what extent the SRVs from different datasets share physico-chemical and structural features. With computational methods, we characterised solvent accessibility, flexibility and disorder of positions carrying P/LP and B/LB SRVs, and we compared the results obtained on the ClinVar, HUMSAVAR and Union datasets. P/LP SRVs are significantly more abundant in buried/rigid positions, while B/LB SRVs occur preferentially in solvent-exposed/flexible regions. P/LP SRVs have a slight tendency to be more abundant than B/LB SRVs in non-disordered regions. Overall, the findings suggest that SRVs deriving from HUMSAVAR and ClinVar, despite their limited overlap, share common physico-chemical and structural features.
Short Abstract: Members of the broad family of integral membrane proteins are indispensable components of living cells. Understanding their function and stability is thus a major focus of biomedical and biotechnological research, considering, for example, that the majority of FDA-approved drugs target this class of proteins. The aim of this investigation is to understand how mutations affect the stability of membrane proteins. We start by defining a series of statistical potentials derived from a non-redundant set of membrane protein structures, which describe the stability properties of this class of proteins better than standard potentials derived from globular proteins. We then combine all the information gathered from these potentials using an artificial neural network approach and construct a prediction model called BraneMuSiC that is able to predict how point mutations affect the folding free energy for a set of about 300 mutations inserted in proteins with known structure. Application to test sets further confirms the accuracy of our predictions and shows that BraneMuSiC outperforms state-of-the-art methods for predicting folding free energy changes. Our method will thus be of importance in protein design, in order to rationally modify the biophysical characteristics of membrane proteins, and in evaluating the deleteriousness of genetic variants that target them.
Short Abstract: Protein-protein interactions drive virtually all biological processes in living organisms and are necessary for cellular machinery. Disease-causing point mutations on a protein interface affect its ability to interact with its partners and interrupt the physiological mechanism of the cell. Hence, it is crucial to develop a systematic and accurate approach for assessing the impact of mutations on the formation and stability of protein complexes. Here, we report on a deep learning approach to directly estimate the changes in binding affinity upon mutation. Given the importance of local structural variations for this purpose, we implemented a siamese architecture that takes as input the local 3D environments around the mutation site in the wild-type and mutated forms of the complex. Thanks to the use of locally oriented frames, the architecture is invariant to 3D translations and rotations. The 3D environments are extracted from conformations generated by the Rosetta-Backrub algorithm, which explicitly models the flexibility of the backbone and side-chains and accounts for their fluctuations around the native state. We evaluated the performance of our approach against experimental binding affinity measurements from SKEMPI-2.0. Our predictive performance on a completely blind test with 50 complexes (one mutation per complex) is comparable to or better than the state of the art.
Short Abstract: The deposition of amyloid fibrils is a characteristic of a variety of diseases including Alzheimer’s disease (AD). Proteins and peptides with a tendency to form such depositions are called amyloidogenic; one example is the Aβ peptide, the primary component of the amyloid plaques characteristic of AD. In this work we analyzed how missense SNPs (msSNPs) affect amyloidogenic proteins. Amyloidogenic proteins were collected from AmyCo, the Amyloidoses Collection, a repository created by our lab that contains information about amyloidoses and diseases related to amyloid deposition. msSNPs were extracted from dbSNP, ClinVar and UniProt. Statistical analyses, such as the Chi-squared test, were performed to determine, for example, whether alterations to residue properties are correlated with pathogenic msSNPs. Additional analysis focused on msSNPs found in amyloidogenic-prone segments as predicted by AMYLPRED2. It was shown that msSNPs located in those segments are more likely to be pathogenic. To explore how msSNPs contribute to the onset of disease, Aβ Precursor Protein (APP) and AD were used as an example. Pathogenic msSNPs of APP are mostly gathered in and around the Aβ sequence, affecting the proteolytic cleavage of APP or the tendency of Aβ to aggregate. APP variants have a significant role in AD and should be considered when designing pharmaceuticals.
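As an illustration of the kind of Chi-squared test mentioned in the abstract above, the sketch below tests independence in a 2x2 contingency table, for example pathogenic versus benign msSNPs inside versus outside predicted segments. The counts are invented for demonstration and are not the authors' data. The function name is hypothetical, and no continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    `table` is ((a, b), (c, d)) of observed counts. Returns the test
    statistic and the p-value for one degree of freedom. Assumes every
    row and column total is non-zero; no Yates correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom the survival function has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = inside / outside predicted segments,
# columns = pathogenic / benign.
stat, p = chi_squared_2x2(((10, 20), (30, 40)))
```

With real data, a library routine such as SciPy's contingency-table test would normally be preferred; the stdlib version here only makes the arithmetic explicit.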
Short Abstract: Despite a consensus that regulatory mutations play an important role in disease, computational tools supporting their identification are limited and frequently unavailable for the recent genome build. Here, we rebuild the ReMM score for prioritizing non-coding mutations in the GRCh38 human genome assembly. We contrast a curated set of 406 regulatory variants causative for Mendelian disorders with millions of human-derived sequence alterations (as a proxy for non-pathogenic variation). We use a set of 26 genomic features combining epigenetic profiles, species conservation and density of disease and population variants to train a hyper-ensemble random forest model. The entire workflow is based on Snakemake, which improves reproducibility and facilitates adaptation of the model for future genome releases and inclusion of new features. We achieve an average precision of 0.57 on our data, which compares favorably to the original ReMM version for the GRCh37 build (0.50). We observe moderate correlation of scores (0.72) between genome builds, which we ascribe to the changes in the feature sets as well as adjustments in feature importance of the model. Our work provides a reliable tool for scoring pathogenicity of human regulatory variants and will facilitate further developments of the ReMM score. GRCh38 scores are available at doi.org/10.5281/zenodo.4768448.
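The average precision metric reported above can be illustrated with a minimal sketch. It ranks variants by score and averages the precision measured at each true positive. This is a generic formulation, not the authors' evaluation code. The function name and the toy labels and scores are hypothetical, and tied scores are ordered arbitrarily by the sort.

```python
def average_precision(labels, scores):
    """Average precision for binary labels ranked by descending score.

    `labels` holds 1 (e.g. pathogenic) or 0 (e.g. proxy-benign);
    `scores` holds the matching prediction scores. Ties in score are
    not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return precision_sum / hits


# Toy example with invented values: positives ranked 1st and 3rd of four.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Because positives are rare in this setting (406 curated variants against millions of background alterations), average precision is more informative than accuracy, which a trivial all-negative predictor would maximise.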
Short Abstract: One of the most important frontiers in computational biology and biomedicine is the comprehensive analysis of Next-Generation Sequencing (NGS) data. In cancer research in particular, the identification of somatic mutations is vital for the investigation of their effects on disease progression and treatment response. This is done by considering the sequenced tumour DNA and a reference germline sample, and identifying candidate variants by way of comparison. Despite automated filtering, however, sequencing artifacts or alignment errors are often mistakenly flagged as variants. For this reason, researchers must perform extremely time-consuming manual screening. We demonstrate that it is possible to reliably automate this process using Deep Convolutional Neural Networks (CNNs), whose utility has been behind many recent successes in applied machine learning. Using previously performed manual annotation as input data, we trained a CNN model that recognises sequencing artifacts with high accuracy, achieving a 5-fold cross-validation score of 96%, on par with human reviewers. Moreover, we show how this can be extended to account for artifacts specific to library preparation, which require comparison with additional sequencing tracks. Altogether, this allows for a significant reduction in the workload for researchers, and can in the future be integrated into bioinformatics workflows for NGS data processing.