The SciFinder tool lets you search Titles, Authors, and Abstracts of talks and panels. Enter your search term below and your results will be shown at the bottom of the page. You can also click on a track to see all the talks given in that track on that day.

Results

July 14, 2024
10:40-11:40
Invited Presentation: Why and how long reads are used to improve gene isoform quantification
Confirmed Presenter: Kin Au
Track: HiTSeq

Room: 517d

Authors List:

  • Kin Au

July 14, 2024
11:40-12:00
Telomere-to-telomere assembly by preserving contained reads
Confirmed Presenter: Sudhanva Shyam Kamath, Indian Institute of Science, Bangalore
Track: HiTSeq

Room: 517d
Format: Live Stream

Authors List:

  • Sudhanva Shyam Kamath, Indian Institute of Science
  • Mehak Bindra, Indian Institute of Science
  • Debnath Pal, Indian Institute of Science
  • Chirag Jain, Indian Institute of Science

Presentation Overview:

Automated telomere-to-telomere (T2T) de novo assembly of diploid and polyploid genomes remains a formidable task. A string graph is a commonly used assembly graph representation in overlap-based algorithms. The string graph formulation employs graph simplification heuristics, which drastically reduce the count of vertices and edges. One of these heuristics involves removing the reads contained in longer reads. However, this procedure is not guaranteed to be safe. In practice, it occasionally introduces gaps in the assembly by removing all reads covering one or more genome intervals. The factors contributing to such gaps remain poorly understood. In this work, we mathematically derived the frequency of observing a gap near a germline and a somatic heterozygous variant locus. Our analysis shows that (i) an assembly gap due to contained read deletion is an order of magnitude more frequent in Oxford Nanopore reads than in PacBio HiFi reads due to differences in their read-length distributions, and (ii) this frequency decreases with an increase in the sequencing depth. Drawing cues from these observations, we addressed the weakness of the string graph formulation by developing the RAFT assembly algorithm. RAFT fragments reads and produces a more uniform read-length distribution. The algorithm retains spanned repeats in the reads during the fragmentation. We empirically demonstrate that RAFT significantly reduces the number of gaps using simulated datasets. Using real Oxford Nanopore and PacBio HiFi datasets of the HG002 human genome, we achieved a twofold increase in the contig NG50 and the number of haplotype-resolved T2T contigs compared to Hifiasm.
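The gap mechanism described in this abstract can be illustrated with a toy model. The following is a minimal sketch, not the RAFT algorithm or any assembler's code. It assumes reads are plain intervals on a diploid genome, that a shorter read looks "contained" in a longer one whenever the longer read spans it and the two are sequence-identical there (same haplotype, or no heterozygous site inside the shorter read), and that a haplotype interval counts as covered only by reads sampled from that haplotype. The names `Read`, `is_contained`, `drop_contained`, and `coverage_gaps`, and all coordinates, are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    name: str
    hap: int    # haplotype the read was sampled from (1 or 2)
    start: int  # inclusive coordinate
    end: int    # exclusive coordinate


def is_contained(short: Read, long: Read, het_sites: Set[int]) -> bool:
    """True if `short` would look contained in `long` to an overlapper."""
    if (long.end - long.start) <= (short.end - short.start):
        return False  # only strictly longer reads can contain (toy tie-break)
    if not (long.start <= short.start and short.end <= long.end):
        return False
    if short.hap == long.hap:
        return True
    # Reads from different haplotypes are identical only if the shorter
    # read covers no heterozygous site.
    return not any(short.start <= p < short.end for p in het_sites)


def drop_contained(reads: List[Read], het_sites: Set[int]) -> List[Read]:
    """Apply the contained-read removal heuristic."""
    return [r for r in reads
            if not any(is_contained(r, s, het_sites) for s in reads)]


def coverage_gaps(reads: List[Read], hap: int, genome_len: int) -> List[Tuple[int, int]]:
    """Intervals of haplotype `hap` not covered by any read sampled from it."""
    covered = [False] * genome_len
    for r in reads:
        if r.hap == hap:
            for pos in range(r.start, r.end):
                covered[pos] = True
    gaps, gap_start = [], None
    for pos, ok in enumerate(covered):
        if not ok and gap_start is None:
            gap_start = pos
        elif ok and gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, genome_len))
    return gaps


# Toy data: heterozygous sites at 20 and 80 on a genome of length 100.
het_sites = {20, 80}
reads = [
    Read("A1", 1, 0, 100),   # long haplotype-1 read
    Read("B1", 2, 0, 45),    # haplotype-2 read covering the site at 20
    Read("B2", 2, 40, 60),   # haplotype-2 read with no heterozygous site
    Read("B3", 2, 55, 100),  # haplotype-2 read covering the site at 80
]
kept = drop_contained(reads, het_sites)
print([r.name for r in kept])
print(coverage_gaps(kept, 2, 100))
```

In this toy, B2 is the only read bridging B1 and B3 on haplotype 2, yet it is indistinguishable from a substring of A1, so removing it leaves haplotype 2 without coverage between B1 and B3.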

July 14, 2024
12:00-12:20
Rawsamble: Overlapping and Assembling Raw Nanopore Signals using a Hash-based Seeding Mechanism
Confirmed Presenter: Can Firtina, ETH Zurich, Switzerland
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Can Firtina, ETH Zurich
  • Maximilian Mordig, Max Planck Institute for Intelligent Systems
  • Joël Lindegger, ETH Zurich
  • Harun Mustafa, ETH Zurich
  • Sayan Goswami, ETH Zurich
  • Stefano Mercogliano, ETH Zurich
  • Yan Zhu, University of Toronto
  • Andre Kahles, ETH Zurich
  • Onur Mutlu, ETH Zurich

Presentation Overview:

Although raw nanopore signal mapping to a reference genome is widely studied to achieve highly accurate and fast mapping of raw signals, mapping to a reference genome is not possible when the corresponding reference genome of an organism is either unknown or does not exist. To circumvent such cases, all-vs-all overlapping is performed to construct a de novo assembly from overlapping information. However, such an all-vs-all overlapping of raw nanopore signals remains unsolved due to its unique challenges, such as 1) generating multiple and accurate mapping pairs per read, 2) performing similarity search between a pair of noisy raw signals, and 3) performing space- and compute-efficient operations for portability and real-time analysis.

We introduce Rawsamble, the first mechanism that can quickly and accurately find overlaps between raw nanopore signals without translating them to bases. We find that Rawsamble can 1) find overlaps while meeting the real-time requirements with throughput on average around 200,000 bp/sec, 2) share a large portion of overlapping pairs with minimap2 (37.12% on average), and 3) lead to constructing long assemblies from these useful overlaps. Finding overlapping pairs from raw signals is critical for enabling new directions that have not been explored before for raw signal analysis, such as de novo assembly construction from overlaps that we explore in this work. We believe these overlaps can be useful for many other new directions coupled with real-time analysis.

Rawsamble is integrated in RawHash and available at https://github.com/CMU-SAFARI/RawHash.

July 14, 2024
14:20-14:40
Proceedings Presentation: Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets
Confirmed Presenter: Igor Martayan, Univ Lille, France
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Igor Martayan, Univ Lille
  • Bastien Cazaux, Univ. Lille
  • Antoine Limasset, CNRS
  • Camille Marchet, CNRS

Presentation Overview:

In this paper, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage’s concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano’s scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing k-mer set methods, particularly in dynamic scenarios. Unique to this work, CBL stands out as the only known exact k-mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss Army knife structure for k-mer set management.
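The notion of a smallest cyclic rotation mentioned in this abstract can be shown in a few lines. This is only an illustrative sketch of that one idea, not the CBL structure or the API of its Rust library. The function names `canonical_rotation` and `restore` are hypothetical, and the quadratic scan below is chosen for clarity (linear-time methods such as Booth's algorithm exist).

```python
from typing import Tuple


def canonical_rotation(kmer: str) -> Tuple[str, int]:
    """Return the lexicographically smallest rotation of `kmer` and the
    offset i such that rotation == kmer[i:] + kmer[:i]."""
    return min((kmer[i:] + kmer[:i], i) for i in range(len(kmer)))


def restore(rotation: str, offset: int) -> str:
    """Invert canonical_rotation: recover the original k-mer."""
    k = len(rotation)
    return rotation[k - offset:] + rotation[:k - offset]


rot, off = canonical_rotation("GATTACA")
print(rot, off)
print(restore(rot, off))
# Every rotation of the same k-mer maps to the same canonical string.
print(canonical_rotation("TTACAGA")[0])
```

Storing the canonical rotation together with its offset keeps the representation exact: the original k-mer can always be recovered, while all rotations of a k-mer share one canonical string.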

July 14, 2024
14:40-15:00
Proceedings Presentation: Learning Locality-Sensitive Bucketing Functions
Confirmed Presenter: Xin Yuan, Pennsylvania State University, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Xin Yuan, Pennsylvania State University
  • Ke Chen, Pennsylvania State University
  • Xiang Li, Pennsylvania State University
  • Qian Shi, Pennsylvania State University
  • Mingfu Shao, Pennsylvania State University

Presentation Overview:

Many tasks in sequence analysis ask to identify biologically related sequences in a large set. Edit distance is widely used in these tasks as a measure. To avoid all-vs-all pairwise comparisons and save on expensive edit distance computations, locality-sensitive bucketing (LSB) functions have been proposed. Formally, a (d1,d2)-LSB function sends sequences into multiple buckets with the guarantee that pairs of sequences of edit distance at most d1 can be found within the same bucket while those of edit distance at least d2 do not share any. LSB functions generalize the locality-sensitive hashing (LSH) functions and admit favorable properties, making them potentially ideal solutions to the above problem. However, constructing LSB functions for practical use is scarcely possible. In this work, we aim to utilize machine learning techniques to train LSB functions. With the development of a novel loss function and insights into neural network structures that can extend beyond this specific task, we obtained LSB functions that exhibit nearly perfect accuracy for certain (d1,d2). Compared to the state-of-the-art method OMH, the trained LSB functions achieve a 2- to 5-fold improvement in the sensitivity of recognizing similar sequences. An experiment on analyzing erroneous cell barcode data is also included to demonstrate the application of the trained LSB functions.
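The (d1,d2)-LSB guarantee defined in this abstract can be checked by brute force on a small sequence set. The sketch below is not the learned function from the talk. It uses a hand-written toy bucketing, `deletion_buckets`, which sends a sequence to itself and to every string obtained by deleting one character. The checker `lsb_violations` and both function names are hypothetical, and d1 < d2 is assumed.

```python
from itertools import combinations, product
from typing import Callable, List, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def deletion_buckets(s: str) -> Set[str]:
    """Toy bucketing: the sequence itself plus all single-character deletions."""
    return {s} | {s[:i] + s[i + 1:] for i in range(len(s))}


def lsb_violations(seqs: List[str], bucket_fn: Callable[[str], Set[str]],
                   d1: int, d2: int) -> List[Tuple[str, str, int, str]]:
    """List pairs that break the (d1, d2)-LSB guarantee (assumes d1 < d2)."""
    bad = []
    for s, t in combinations(seqs, 2):
        dist = edit_distance(s, t)
        shared = bool(bucket_fn(s) & bucket_fn(t))
        if dist <= d1 and not shared:
            bad.append((s, t, dist, "close pair shares no bucket"))
        elif dist >= d2 and shared:
            bad.append((s, t, dist, "distant pair shares a bucket"))
    return bad


# All 16 sequences of length 4 over a two-letter alphabet.
seqs = ["".join(p) for p in product("AC", repeat=4)]
print(len(lsb_violations(seqs, deletion_buckets, 1, 3)))
print(len(lsb_violations(seqs, deletion_buckets, 1, 2)))
```

The toy bucketing satisfies the (1,3) guarantee because two sequences sharing a bucket are each within one edit of it, hence within two edits of each other. It fails for (1,2): for example, ACAA and CAAA are at distance 2 but both reduce to CAA.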

July 14, 2024
15:00-15:20
Proceedings Presentation: Fast Multiple Sequence Alignment via Multi-Armed Bandits
Confirmed Presenter: Kayvon Mazooji, University of Illinois Urbana-Champaign, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Kayvon Mazooji, University of Illinois Urbana-Champaign
  • Ilan Shomorony, University of Illinois at Urbana-Champaign

Presentation Overview:

Multiple sequence alignment is an important problem in computational biology with applications that include phylogeny and the detection of remote homology between protein sequences. UPP is a popular software package that constructs accurate multiple sequence alignments for large datasets based on ensembles of Hidden Markov Models (HMMs). A computational bottleneck for this method is a sequence-to-HMM assignment step, which relies on the precise computation of probability scores on the HMMs. In this work, we show that we can speed up this assignment step significantly by replacing these HMM probability scores with alternative scores that can be efficiently estimated. Our proposed approach utilizes a Multi-Armed Bandit algorithm to adaptively and efficiently compute estimates of these scores. This allows us to achieve alignment accuracy similar to that of UPP with a significant reduction in computation time.
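The general idea of using a bandit to pick the best-scoring candidate without computing every score exactly can be sketched as follows. This is a generic successive-elimination routine with Hoeffding-style confidence radii, not the authors' algorithm and not part of UPP. It assumes each candidate (standing in for an HMM) has a score equal to the mean of many per-term values in [0, 1], and that evaluating one term is the unit of cost. The function name, the data, and the simplified radius (no union bound over arms and rounds) are all assumptions for illustration.

```python
import math
import random
from typing import List, Tuple


def successive_elimination(arms: List[List[float]], delta: float = 0.05,
                           seed: int = 0) -> Tuple[int, int]:
    """arms[a] holds the per-term scores of candidate a, each in [0, 1].
    Returns (index of the arm judged to have the highest mean,
             number of terms evaluated)."""
    rng = random.Random(seed)
    n = len(arms[0])
    # Each arm reveals its terms in a random order, without replacement.
    orders = [rng.sample(range(n), n) for _ in arms]
    sums = [0.0] * len(arms)
    active = set(range(len(arms)))
    pulls = 0
    for t in range(1, n + 1):
        for a in active:
            sums[a] += arms[a][orders[a][t - 1]]
            pulls += 1
        # Simplified Hoeffding radius after t samples per active arm.
        radius = math.sqrt(math.log(2 / delta) / (2 * t))
        best_lower = max(sums[a] / t for a in active) - radius
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = {a for a in active if sums[a] / t + radius >= best_lower}
        if len(active) == 1:
            break
    best = max(active, key=lambda a: sums[a] / t)
    return best, pulls


# Toy data: three candidates with well-separated mean scores.
data_rng = random.Random(1)
true_means = [0.2, 0.5, 0.8]
arms = [[data_rng.uniform(m - 0.05, m + 0.05) for _ in range(1000)]
        for m in true_means]
best, pulls = successive_elimination(arms)
print(best, pulls, "of", len(arms) * 1000, "terms evaluated")
```

Clearly inferior candidates are dropped after a few dozen samples, so most of the evaluation budget goes to separating the close contenders. That is the source of the savings over computing every score in full.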

July 14, 2024
15:20-15:40
Contrasting and Combining Transcriptome Complexity Captured by Short and Long RNA Sequencing Reads
Confirmed Presenter: Seong Woo Han, University of Pennsylvania, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Seong Woo Han, University of Pennsylvania
  • San Jewell, University of Pennsylvania
  • Andrei Thomas-Tikhonenko, University of Pennsylvania
  • Yoseph Barash, University of Pennsylvania

Presentation Overview:

High-throughput short-read RNA sequencing has given researchers unprecedented detection and quantification capabilities of splicing variations across biological conditions and disease states. However, short-read technology is limited in its ability to identify which isoforms are responsible for the observed sequence fragments and how splicing variations across a gene are related. In contrast, more recent long-read sequencing technology offers improved detection of underlying full or partial isoforms but is limited by high error rates and low throughput, hindering its ability to accurately detect and quantify all splicing variations in a given condition.

To better understand the underlying isoforms and splicing changes in a given biological condition, it’s important to be able to combine the results of both short- and long-read sequencing, together with the annotation of known isoforms. To address this need, we develop MAJIQ-L, a tool to visualize and quantify splicing variations from multiple data sources. MAJIQ-L combines transcriptome annotation, the output of long-read-based isoform detection tools, and MAJIQ-based (Vaquero-Garcia et al. (2016, 2023)) short-read RNA-Seq analysis of local splicing variations (LSVs). We analyze which splice junction is supported by which type of evidence (known isoforms, short reads, long reads), followed by the analysis of matched short- and long-read human cell line datasets. Our software can be used to assess any future long-read technology or algorithm, and combine it with short-read data for improved transcriptome analysis.

July 14, 2024
15:40-16:00
Quantum Computing for Genomic Analysis
Confirmed Presenter: Sergii Stre
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • James Bonfield, Wellcome Sanger Institute
  • Tony Burdett, European Bioinformatics Institute
  • Peter Clapham, Wellcome Sanger Institute
  • Josh Cudby, University of Cambridge
  • Robert Davies, Wellcome Sanger Institute
  • Richard Durbin, University of Cambridge
  • David Holland, Wellcome Sanger Institute
  • Aditya Jain, University of Cambridge
  • James McCafferty, Wellcome Sanger Institute
  • Yanisa Sunthornyotin, European Bioinformatics Institute
  • Andrew Whitwham, Wellcome Sanger Institute
  • Orson Ye, University of Cambridge
  • David Yuan, European Bioinformatics Institute
  • Sergii Stre

Presentation Overview:

Many essential tasks in genomic analysis are extremely difficult for classical computers because the underlying problems are inherently hard to solve efficiently with classical (empirical) algorithms. Quantum computing offers novel possibilities with algorithmic techniques capable of achieving provable speedups over existing classical exact algorithms in large-scale genomic analyses. Our work utilizes PhiX174, SARS-CoV-2, and human genome data to explore quantum algorithms and data encoding techniques to pave the way for analysis with better time and space efficiency.

We take a two-pronged approach:

1) Algorithm Development: We will design novel quantum algorithms for MSA subproblems and heuristic methods (QAOA) for de novo assembly.

2) Data Encoding and State Preparation: We develop efficient quantum circuits to encode genomic data and reduce the computational overhead with a variety of techniques, including tensor network methods. This facilitates data encoding into quantum states for Machine Learning applications.

Starting with the PhiX174 genome, we will test our quantum algorithms with provable theoretical speedup compared to classical methods. This allows us to scale the approach to larger and more complex genomes like SARS-CoV-2 and the human genome. We'll develop efficient encoding strategies and optimize quantum circuits to minimize resource needs for the current hardware. To test how noise sources that appear in a variety of hardware implementations affect the computation, we are using recently developed tensor network contraction methods for efficient small-scale classical simulation.

This project aims to identify problem settings where utilizing quantum computing will be the most beneficial to unlocking the vast potential of genomics in healthcare. By studying classical computational bottlenecks and developing ways to speed them up, we aim to achieve a deeper understanding of human health and pathogens.

July 14, 2024
16:40-17:00
Proceedings Presentation: Adaptive Digital Tissue Deconvolution
Confirmed Presenter: Franziska Görtler, Department of Oncology and Medical Physics, Haukeland University Hospital
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Franziska Görtler, Department of Oncology and Medical Physics
  • Malte Mensching-Buhr, Department of Medical Bioinformatics
  • Ørjan Skaar, Computational Biology Unit
  • Stefan Schrod, University Medical Center Göttingen
  • Thomas Sterr, Institute of Theoretical Physics
  • Andreas Schäfer, Institute of Theoretical Physics
  • Tim Beissbarth, University Medicine Göttingen
  • Anagha Joshi, Department of Clinical Science
  • Helena U. Zacharias, University Medical Center Schleswig-Holstein; Kiel University
  • Sushma Nagaraja Grellscheid, Computational Biology Unit
  • Michael Altenbuchinger, Department of Medical Bioinformatics

Presentation Overview:

Motivation: The inference of cellular compositions from bulk and spatial transcriptomics data increasingly complements data analyses. Multiple computational approaches have been suggested, and recently machine learning techniques have been developed to systematically improve estimates. Such approaches allow the inference of additional, less abundant cell types. However, they rely on training data which do not capture the full biological diversity encountered in transcriptomics analyses; data can contain cellular contributions not seen in the training data and as such, analyses can be biased or blurred. Thus, computational approaches have to deal with unknown, hidden contributions. Moreover, most methods are based on cellular archetypes which serve as a reference; e.g., a generic T-cell profile is used to infer the proportion of T-cells. It is well known that cells adapt their molecular phenotype to the environment and that pre-specified cell archetypes can distort the inference of cellular compositions.
Results: We propose Adaptive Digital Tissue Deconvolution (ADTD) to estimate cellular proportions of pre-selected cell types together with possibly unknown and hidden background contributions. Moreover, ADTD adapts prototypic reference profiles to the molecular environment of the cells, which further resolves cell-type specific gene regulation from bulk transcriptomics data. We verify this in simulation studies and demonstrate that ADTD improves on existing approaches in estimating cellular compositions. In an application to bulk transcriptomics data from breast cancer patients, we demonstrate that ADTD provides insights into cell-type specific molecular differences between breast cancer subtypes.
Availability and implementation: A Python implementation of ADTD and a tutorial are available at https://doi.org/10.5281/zenodo.7548362 (doi:10.5281/zenodo.7548362).

July 14, 2024
17:00-17:20
Maximizing accuracy of cellular deconvolution (ACeD)
Confirmed Presenter: Jonathan Bard, State University of New York at Buffalo, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Jonathan Bard, State University of New York at Buffalo
  • Norma Nowak, State University of New York at Buffalo
  • Satrajit Sinha, State University of New York at Buffalo
  • Michael Buck, State University of New York at Buffalo

Presentation Overview:

Bulk RNA-sequencing has been a mainstay for biomedical research since its inception. In cancer alone, the TCGA project has examined 33 cancer types with over 20,000 samples. Each sample has a wealth of patient information associated with it, from survival records to several data modalities including copy number, microbiome, methylation and transcriptomic profiling at the bulk tissue level. However, the challenge with bulk tissue profiling, like RNA-seq, is that the assay measures the average expression across all the cells in the sample, thus hiding cellular heterogeneity. Leveraging cellular deconvolution, these datasets can be used to infer cell type composition and molecular heterogeneity. However, accurate deconvolution is contingent upon using a high-quality single-cell reference dataset with proper cell-type cluster resolution. Therefore, there is a fundamental need for methodology to quantify single-cell dataset quality for deconvolution with optimization of cell-type cluster resolution. To address this challenge, we developed a novel computational strategy to identify the optimal cell-type clustering resolution that maximizes deconvolutional performance. Our R-based software package (ACeD) provides the research community with a valuable toolset to evaluate reference set quality and optimize data upstream of reference-based deconvolution algorithms, enhancing our analysis and understanding of the tumor microenvironment.

July 14, 2024
17:20-17:40
Evolution of genomic and epigenomic heterogeneity in prostate cancer from tissue and liquid biopsy
Confirmed Presenter: Marjorie Roskes, Weill Cornell Medicine, United States
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Marjorie Roskes, Weill Cornell Medicine
  • Alexander Martinez Fundichely, Weill Cornell Medicine
  • Weiling Li, Weill Cornell Medicine
  • Sandra Cohen, Weill Cornell Medicine
  • Hao Xu, McGill University
  • Shahd Elnaggar, Barnard College
  • Anisha Tehim, Cornell University
  • Metin Balaban, Princeton University
  • Chen Khuan Wong, Memorial Sloan Kettering Cancer Center
  • Yu Chen, Memorial Sloan Kettering Cancer Center
  • Ben Raphael, Princeton University
  • Ekta Khurana, Weill Cornell Medicine

Presentation Overview:

Castration Resistant Prostate Cancer (CRPC) is an aggressive disease that is highly plastic. Although histologically there are two subtypes of CRPC: adenocarcinoma and neuroendocrine, we have shown it has four distinct molecular subtypes exhibiting differential chromatin and transcriptomic profiles. These are CRPC-AR (androgen receptor dependent), CRPC-WNT (Wnt pathway dependent), CRPC-SCL (stem-cell like), and CRPC-NE (neuroendocrine). During treatment with AR signaling inhibitors, patient tumors can evolve to different subtypes. Clinical identification of these subtypes and mechanistic understanding of the genomic and epigenomic heterogeneity accompanying this evolution is a huge challenge. To address this, we have amassed a unique cohort of 60 CRPC patients with various subtypes from whom cell-free DNA (cfDNA) was collected at various clinically relevant time points and whole-genome sequencing (WGS) was performed. For 24 of these patients, time-matched tissue RNA-seq was performed. We estimated epigenetic/transcriptomic heterogeneity in tissue by deconvolution of bulk RNA-seq data. We performed nucleosomal profiling from cfDNA WGS to infer tumor chromatin accessibility and estimate each epigenetic subtype’s fractional contribution. We can detect the different subtypes in cfDNA and find that CRPC-SCL patients exhibit more heterogeneity than other subtypes in both tissue and cfDNA, likely indicating the transitory state of this subtype. We calculated allele-specific, genome-wide copy number alterations in cfDNA, and can track the parallel evolution of genomic and epigenomic events, e.g. AR gains track with increasing CRPC-AR fraction over time. Our study shows that, beyond biomarker development, cfDNA WGS can be used for characterizing the epigenomic and genomic evolution of patient tumors.

July 14, 2024
17:40-18:00
Accurate and robust bootstrap inference of single-cell phylogenies by integrating sequencing read counts
Confirmed Presenter: Rija Zaidi, University College London Cancer Institute, United Kingdom
Track: HiTSeq

Room: 517d
Format: In Person

Authors List:

  • Rija Zaidi, University College London Cancer Institute
  • Simone Zaccaria, University College London Cancer Institute

Presentation Overview:

Recent single-cell DNA sequencing (scDNA-seq) technologies have enabled the parallel investigation of thousands of individual cells. This is required for accurately reconstructing tumour evolution, during which cancer cells acquire a multitude of different genetic alterations. Although the evolutionary analysis of scDNA-seq datasets is complex due to their unique combination of errors and missing data, several methods have been developed to infer single-cell tumour phylogenies by integrating estimates of the false positive and false negative error rates. This integration relies on the assumption that errors are uniformly distributed both within and across cells. However, this assumption does not always hold; error rates depend on sequencing coverage, which is not constant within or across cells in a sequencing experiment due to, e.g., copy-number alterations and the replication status of a cell, limiting the accuracy of existing methods.

To address this challenge, we developed a novel single-cell phylogenetic method that integrates raw sequencing read counts into a statistical framework to robustly correct the errors and missing data. Specifically, our method includes bootstrapping to robustly correct for high error frequency genomic positions and a fast probabilistic heuristic based on hypothesis testing to distinguish the remaining errors from truly observed genotypes. We demonstrate the improved accuracy and robustness of our method compared to existing approaches across several simulation settings. To demonstrate its impact, we applied our method to 42,009 breast cancer cells and 19,905 ovarian cancer cells, revealing more accurate phylogenies consistent with larger genetic alterations.