July 13, 2024
10:40-10:45
Welcome
Track: HiTSeq

Room: 517d


July 13, 2024
10:45-11:40
Invited Presentation: Unsupervised learning approaches for genomics to decipher structure and dynamics of 3D genome organization and gene regulatory networks
Confirmed Presenter: Sushmita Roy
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Sushmita Roy

Presentation Overview:

Advances in genomic technologies have substantially expanded our repertoire of high-dimensional datasets that capture different modalities such as the transcriptome, epigenome and chromosome conformation across many different cellular contexts. An open challenge is to effectively analyze these datasets to extract meaningful structures such as cell types, chromosomal domains, gene modules and regulatory networks. Unsupervised machine learning that aims to extract structure, often low-dimensional, from unlabeled data is a powerful paradigm for unbiased analysis of omic datasets. In this talk, I will present two examples of such approaches, non-negative matrix factorization (NMF) and graph structure learning, to tackle problems in regulatory genomics. We consider multi-task extensions of NMF for examining three-dimensional organization of the genome. Our results show that NMF is a powerful approach for analyzing 3D genome organization from Hi-C assays that can recover biologically meaningful topological units and their dynamics. In the second part of my talk, I will present factorization and graph learning approaches to single cell omic datasets. Using our approaches we have identified key gene expression programs and cell type-specific gene regulatory networks that are informative of cell state and fate specification in different dynamic processes such as cellular differentiation and reprogramming.

July 13, 2024
11:40-12:00
An Adaptive K-Nearest Neighbor Graph Optimized for Single-cell and Spatial Clustering
Confirmed Presenter: Qi Liu, Vanderbilt University Medical Center, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Jia Li, Vanderbilt University Medical Center
  • Yu Shyr, Vanderbilt University Medical Center
  • Qi Liu, Vanderbilt University Medical Center

Presentation Overview:

Unsupervised clustering is crucial for characterizing cellular heterogeneity in single-cell and spatial transcriptomics analysis. While conventional clustering methods have difficulty in identifying rare cell types, approaches specifically tailored for detecting rare cell types gain their ability at the cost of poorer performance for grouping abundant ones. We introduce aKNNO, a method to identify abundant and rare cell types simultaneously based on an adaptive k-nearest neighbor graph with optimization. Unlike traditional kNN graphs, which require a predetermined and fixed k value for all cells, aKNNO selects k for each cell adaptively based on its local distance distribution. This adaptive approach enables accurate capture of the inherent cellular structure. Through extensive evaluation across 38 simulated scenarios and 20 single-cell and spatial transcriptomics datasets spanning various species, tissues, and technologies, aKNNO consistently demonstrates its power in accurately identifying both abundant and rare cell types. Remarkably, aKNNO outperforms conventional and even specifically tailored methods by uncovering both known and novel rare cell types without compromising clustering performance for abundant ones. Most notably, when utilizing transcriptome data alone, aKNNO delineates stereotyped fine-grained anatomical structures more precisely than integrative approaches combining expression with spatial locations and/or histology images, including GraphST, SpaGCN, BayesSpace, stLearn, and DR-SC.
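
To illustrate the adaptive-k idea in the overview above, here is a minimal Python sketch. It is not the aKNNO implementation: the abstract does not state how k is derived from the local distance distribution, so the rule used here (cut each cell's sorted neighbor distances at the largest gap) is an assumption, as are the names adaptive_knn and k_max.

```python
import numpy as np

def adaptive_knn(points, k_max):
    """Toy adaptive kNN graph: each point gets its own k (hypothetical rule)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)

    k_eff = min(k_max, n - 1)  # cannot have more neighbors than other points
    neighbors = {}
    for i in range(n):
        order = np.argsort(dist[i])[:k_eff]
        d = dist[i, order]
        if len(d) > 1:
            # Assumed rule: keep neighbors up to the largest jump in distance.
            k = int(np.argmax(np.diff(d))) + 1
        else:
            k = len(d)
        neighbors[i] = order[:k].tolist()
    return neighbors
```

With a fixed k of 5, a three-cell rare population would be forced to link to cells from other populations. Under the gap rule, each of its cells keeps only its two close neighbors.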

July 13, 2024
12:00-12:20
Proceedings Presentation: Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads
Confirmed Presenter: Yuan Gao, University of Maryland, College Park
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Dongze He, University of Maryland
  • Yuan Gao, University of Maryland
  • Spencer Skylar Chan, University of Maryland
  • Natalia Quintana-Parrilla, University of Puerto Rico
  • Rob Patro, University of Maryland

Presentation Overview:

Motivation: Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty in even the seemingly straightforward task of elucidating the splicing status of the molecules from which the underlying sequenced fragments are drawn. This difficulty arises, in part, from the limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because of a lack of definitive evidence. We are therefore in need of methods that can recover the splicing status of ambiguous reads which, in turn, can lead to more accuracy and confidence in downstream analyses.
Results: We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of reads and identify the true gene origin of multi-gene mapped reads.
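
The way two such models could be combined is sketched below in Python. This is a toy illustration under stated assumptions, not Forseti's code: the Gaussian fragment length model, its mean and standard deviation, the product scoring rule, and the names fragment_length_prob and splicing_posterior are all hypothetical. Priming-site probabilities are passed in as plain numbers rather than computed by a trained binding affinity model.

```python
import math

def fragment_length_prob(length, mean=300.0, sd=80.0):
    """Hypothetical stand-in for a fitted fragment length distribution."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def splicing_posterior(candidates):
    """Normalize the best fragment score per splicing status.

    candidates maps a status ("spliced", "unspliced") to a list of
    (priming_site_probability, implied_fragment_length) pairs, one per
    putative fragment linking the read alignment to a nearby priming site.
    """
    scores = {
        status: max(
            (p_site * fragment_length_prob(length) for p_site, length in frags),
            default=0.0,
        )
        for status, frags in candidates.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # No evidence either way: fall back to a uniform assignment.
        return {status: 1.0 / len(scores) for status in scores}
    return {status: score / total for status, score in scores.items()}
```

A read whose only unspliced explanation implies an implausibly long fragment is assigned to the spliced status with high probability, while a read with no usable priming site under either status stays ambiguous.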

July 13, 2024
12:20-12:40
Invited Presentation: Computational Advances In Multiomics Analysis Using HiFi Sequencing
Confirmed Presenter: Liz Tseng
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Liz Tseng

Presentation Overview:

PacBio HiFi sequencing has been used to generate the latest and most complete version of the human genome and has ushered in a new era of bioinformatics development. The two main characteristics of HiFi data – long read length and high accuracy – are critical for applications that require near-perfect consensus sequencing or long-range phasing information.
In this workshop, we will describe the bioinformatics tools that have been developed for HiFi data. These tools often address genetic puzzles that were previously challenging or impossible to solve with short reads. For example, Paraphase for resolving segmental duplications, StarPhase for diplotyping important pharmacogenetic genes (e.g. HLA, CYP2D6), and TRGT for repeat expansion profiling. In other cases, new methods were developed based on the unique nature of HiFi sequencing that include methylation signals (e.g. MethBat). Beyond the genome, the ability to sequence full-length transcripts without the need for computational assembly has brought about new long-read-aware tools that address isoform classification (e.g. SQANTI3), fusion detection (e.g. pbfusion, CTAT-LR-fusion), and quantification (e.g. Oarfish). Together, these tools reveal novel insights across the whole spectrum of genetic applications, from uncovering de novo mutations in rare diseases, detecting allele-specific methylation patterns, to helping design new therapeutic targets in neurodegenerative diseases.

July 13, 2024
14:20-14:40
Proceedings Presentation: Sigmoni: classification of nanopore signal with a compressed pangenome index
Confirmed Presenter: Vikram Shivakumar, Johns Hopkins University, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Vikram Shivakumar, Johns Hopkins University
  • Omar Ahmed, Johns Hopkins University
  • Sam Kovaka, Johns Hopkins University
  • Mohsen Zakeri, Johns Hopkins University
  • Ben Langmead, Johns Hopkins University

Presentation Overview:

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.
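
The quantization step mentioned above can be pictured with a short Python sketch. The current range (60 to 120 pA), the six-letter alphabet, and the function name quantize_signal are illustrative assumptions, not Sigmoni's actual parameters. The index matching that follows quantization is not shown.

```python
import numpy as np

def quantize_signal(signal_pa, low=60.0, high=120.0, n_bins=6):
    """Map raw current values (pA) to letters, one letter per picoamp range."""
    # Interior bin edges: n_bins equal-width ranges between low and high.
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    # Values outside the assumed range fall into the first or last bin.
    clipped = np.clip(np.asarray(signal_pa, dtype=float), low, high)
    bins = np.digitize(clipped, edges)
    return "".join(chr(ord("a") + int(b)) for b in bins)

print(quantize_signal([50, 65, 75, 95, 119, 130]))  # prints "aabdff"
```

Once the signal is a string over a small alphabet, it can be queried against a text index in the same way as a nucleotide sequence.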

July 13, 2024
14:40-15:00
Proceedings Presentation: Label-guided seed-chain-extend alignment on annotated De Bruijn graphs
Confirmed Presenter: Harun Mustafa, ETH Zurich, Switzerland
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Harun Mustafa, ETH Zurich
  • Mikhail Karasikov, ETH Zurich
  • Nika Mansouri Ghiasi, ETH Zurich
  • Gunnar Rätsch, ETH Zurich
  • André Kahles, ETH Zurich

Presentation Overview:

Motivation: Exponential growth in sequencing databases has motivated scalable De Bruijn graph-based (DBG) indexing for searching these data, using annotations to label nodes with sample IDs. Low-depth sequencing samples correspond to fragmented subgraphs, complicating finding the long contiguous walks required for alignment queries. Aligners that target single-labelled subgraphs reduce alignment lengths due to fragmentation, leading to low recall for long reads. While some (e.g., label-free) aligners partially overcome fragmentation by combining information from multiple samples, biologically-irrelevant combinations in such approaches can inflate the search space or reduce accuracy.

Results: We introduce a new scoring model, multi-label alignment (MLA), for annotated DBGs. MLA leverages two new operations: To promote biologically-relevant sample combinations, Label Change incorporates more informative global sample similarity into local scores. To improve connectivity, Node Length Change dynamically adjusts the DBG node length during traversal. Our fast, approximate, yet accurate MLA implementation has two key steps: a single-label seed-chain-extend aligner (SCA) and a multi-label chainer (MLC). SCA uses a traditional scoring model adapting recent chaining improvements to assembly graphs and provides a curated pool of alignments. MLC extracts seeds from SCA’s alignments, produces multi-label chains using MLA scoring, then finally forms multi-label alignments. We show via substantial improvements in taxonomic classification accuracy that MLA produces biologically-relevant alignments, decreasing average weighted UniFrac errors by 63.1–66.8% and covering 45.5–47.4% (median) more long-read query characters than state-of-the-art aligners. MLA’s runtimes are competitive with label-free alignment and substantially faster than single-label alignment.

Availability: https://github.com/ratschlab/mla.

July 13, 2024
15:00-15:20
Compressed Indexing for Pangenome Substring Queries
Confirmed Presenter: Stephen Hwang, XDBio Program, Johns Hopkins School of Medicine
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Stephen Hwang, XDBio Program
  • Nathaniel K. Brown, Department of Computer Science
  • Omar Y. Ahmed, Department of Computer Science
  • Katharine Jenike, Department of Computer Science
  • Sam Kovaka, Department of Computer Science
  • Michael C. Schatz, Department of Computer Science
  • Ben Langmead, Department of Computer Science

Presentation Overview:

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8x smaller than a comparable KMC3 index and 11.4x smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5x faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
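
To make the two query types concrete, the Python sketch below computes them by brute force. It only shows what a membership or conservation query returns for a window. It does not reflect how a MEMO index is built or searched, and the function names are hypothetical.

```python
def conservation_query(reference, genomes, start, end, k):
    """For each k-mer in reference[start:end], count the genomes containing it."""
    counts = []
    for i in range(start, end - k + 1):
        kmer = reference[i:i + k]
        counts.append(sum(kmer in genome for genome in genomes))
    return counts

def membership_query(reference, genome, start, end, k):
    """For each k-mer in reference[start:end], report presence in one genome."""
    return [c == 1 for c in conservation_query(reference, [genome], start, end, k)]

genomes = ["ACGTACGT", "ACGTTCGT", "TTTTTTTT"]
print(conservation_query("ACGTACGT", genomes, 0, 8, 4))  # prints [2, 1, 1, 1, 2]
```

This naive version rescans every genome for every k-mer, which is the cost that a compressed index is meant to avoid. Note that k is a query-time argument here, mirroring the point that the index itself is not tied to one substring length.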

July 13, 2024
15:20-15:40
Sequence-to-graph alignment based copy number calling using a flow network formulation
Confirmed Presenter: Hugo Magalhães, Institute for Medical Biometry and Bioinformatics, Medical Faculty
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Hugo Magalhães, Institute for Medical Biometry and Bioinformatics
  • Timofey Prodanov, Institute for Medical Biometry and Bioinformatics
  • Jonas Weber, Institute of Medical Microbiology and Hospital Hygiene
  • Gunnar Klau, Algorithmic Bioinformatics
  • Tobias Marschall, Institute for Medical Biometry and Bioinformatics

Presentation Overview:

Variation of copy number (CN) between individuals has been associated with phenotypic differences. Consequently, CN calling is an important step for disease association and identification, as well as in genome assembly. Traditionally, sequencing reads are mapped to a linear reference genome, after which CN is estimated based on observed read depth. This approach, however, leads to inconsistent CN assignments and is hampered by sequences not represented in a linear reference. To address this issue, we propose a method for CN calling with respect to a graph genome using a flow network formulation.
The tool processes read alignments to any bidirected genome graph, and calculates CN probabilities for every node according to the Negative Binomial distribution and total base pair coverage across the node. Integer linear programming is then employed to find a maximum likelihood flow through the graph, resulting in CN predictions for each node. This way, the method achieves consistent CN assignments across the graph.
The proposed method is capable of processing a wide variety of input graphs and read mappings from different sequencing technologies. We processed reads aligned to a Verkko assembly graph for HG02492 (HGSVC) using high coverage mixed HiFi and ONT-UL reads in under 2 hours using one thread and <2Gb peak memory. For 18% of nodes, the method produced different CN values than those expected from read depth alone, showcasing how the graph topology informs CN assignment. Further applications include CN assignment as part of diploid/polyploid (pan)genome assembly workflows.
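
The per-node step described above, CN probabilities from a Negative Binomial model of base pair coverage, can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the abstract: the mean/dispersion parameterization, the dispersion value, the small background rate used for CN 0, the uniform prior over copy numbers, and all function and parameter names. The flow-based integer linear program that makes assignments consistent across the graph is not shown.

```python
import math

def nb_log_pmf(x, mean, dispersion):
    """Log-probability of count x under a Negative Binomial (mean, dispersion)."""
    r = dispersion
    p = r / (r + mean)  # "success" probability in the standard parameterization
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(p) + x * math.log1p(-p)
    )

def copy_number_probabilities(observed_bases, node_length, haploid_depth,
                              max_cn=4, dispersion=20.0, background=0.01):
    """Probability of each CN in 0..max_cn for one node, with a uniform prior."""
    log_liks = []
    for cn in range(max_cn + 1):
        # Expected aligned bases grow linearly with CN; CN 0 keeps a small
        # background rate (an assumption) so stray alignments are tolerated.
        mean = max(cn, background) * haploid_depth * node_length
        log_liks.append(nb_log_pmf(observed_bases, mean, dispersion))
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]  # stable normalization
    total = sum(weights)
    return [w / total for w in weights]
```

For a 1,000 bp node at 15x haploid depth, 30,000 aligned bases make CN 2 the most probable value. A read-depth-only caller would stop here, whereas the flow formulation can override such a local call when neighboring nodes disagree.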

July 13, 2024
15:40-16:00
Targeted genotyping of complex polymorphic genes using short and long reads
Confirmed Presenter: Timofey Prodanov, Institute for Medical Biometry and Bioinformatics, Heinrich Heine University
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Timofey Prodanov, Institute for Medical Biometry and Bioinformatics
  • Tobias Marschall, Institute for Medical Biometry and Bioinformatics

Presentation Overview:

The human genome contains numerous highly polymorphic loci, rich in tandem repeats and structural variants. There, read alignments are often ambiguous and unreliable, resulting in hundreds of disease-associated genes being inaccessible for accurate variant calling. In such regions, structural variant callers show limited sensitivity, k-mer based tools cannot exploit full linkage information of a sequencing read, and gene-specific methods cannot be easily extended to process more loci. Improved ability to genotype highly polymorphic genes can increase diagnostic power and uncover novel disease associations.
We present a targeted tool Locityper, capable of genotyping complex polymorphic loci using both short- and long-read whole genome sequencing, including error-prone ONT data. For each target, Locityper recruits WGS reads and aligns them to possible locus haplotypes (e.g. extracted from a pangenome). By optimizing read alignment, insert size, and read depth profiles across haplotypes, Locityper efficiently estimates the likelihood of each haplotype pair. This is achieved by solving integer linear programming problems or by employing stochastic optimization.
Across 256 challenging medically relevant loci and 40 HPRC Illumina datasets, 95% of Locityper haplotypes were accurate (QV, Phred-scaled divergence, ≥33), compared to 27% of haplotypes reconstructed from the phased NYGC call set. In leave-one-out (LOO) evaluation, Locityper produced 60% accurate haplotypes, a fraction that will increase with larger reference panels as >91% of haplotypes were very close (ΔQV≤5) to the best available haplotypes. Overall, 82% of 1KGP trio haplotypes were concordant. Finally, across 36 HLA genes, LOO Locityper correctly predicted the protein product in 94% of cases, outperforming the specialized HLA genotyper T1K at 78%.
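
For readers unfamiliar with the QV threshold used above, the Python sketch below converts between divergence and a Phred-scaled value. It assumes the standard Phred convention, QV = -10 · log10(divergence). The abstract does not spell out how divergence itself is measured, and the function names are hypothetical.

```python
import math

def phred_qv(divergence):
    """Phred-scaled quality for a per-base divergence in (0, 1]."""
    if divergence <= 0.0:
        return math.inf  # identical sequences have unbounded QV
    return -10.0 * math.log10(divergence)

def divergence_from_qv(qv):
    """Inverse mapping: the divergence that corresponds to a given QV."""
    return 10.0 ** (-qv / 10.0)
```

Under this convention, QV ≥ 33 corresponds to a divergence of at most about 0.0005, roughly one difference per 2,000 bases, and a ΔQV of 5 is about a threefold change in divergence.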

July 13, 2024
16:40-17:00
VISTA: An integrated framework for structural variant discovery
Confirmed Presenter: Varuni Sarwal, UCLA, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Varuni Sarwal, UCLA
  • Seungmo Lee, UCLA
  • Jianzhi Yang, USC
  • Sriram Sankararaman, UCLA
  • Mark Chaisson, USC
  • Eleazar Eskin, UCLA
  • Serghei Mangul, USC

Presentation Overview:

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. With advances in whole genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect the full range of SVs present in a sample. Here, we report an integrated structural variant calling framework, VISTA (Variant Identification and Structural Variant Analysis), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools, which ignore variant length and coverage, VISTA executes various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle (GIAB) gold standard SV set, haplotype-resolved de novo assemblies from The Human Pangenome Reference Consortium (HPRC), along with an in-house PCR-validated mouse gold standard set. VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.

July 13, 2024
17:00-18:00
Invited Presentation: Long-read sequencing and pangenome perspective of structural variation
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Evan Eichler, University of Washington