July 24, 2025
8:40-9:40
Invited Presentation: Pangenome based analysis of structural variation
Confirmed Presenter: Tobias Marschall, Heinrich Heine University, Germany
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person
Moderator(s): Francisco De La Vega


Authors List:

  • Tobias Marschall, Heinrich Heine University

Presentation Overview:

Breakthroughs in long-read sequencing technology and assembly methodology enable the routine de novo assembly of human genomes to near completion. Such assemblies open a door to exploring structural variation (SV) in previously inaccessible regions of the genome. The Human Pangenome Reference Consortium (HPRC) and the Human Genome Structural Variation Consortium (HGSVC) have produced high-quality genome assemblies, which provide a basis for comparative genome analysis using pangenome graphs.

First, we will ask how a pangenomic resource like this can be leveraged in order to better analyze structural variants in samples with short-read whole-genome sequencing (WGS) data. In a process called genome inference, implemented in the PanGenie software, we can use a pangenome reference to infer the haplotype sequences of individual genomes to a quality clearly superior to standard variant calling workflows. This process allows us to detect more than twice the number of structural variants per genome from short-read WGS and therefore provides an opportunity for genome-wide association studies to include these SVs.

Second, we introduce Locityper, a tool specifically designed for targeted genotyping of complex loci using short and long-read whole genome sequencing. For each target, Locityper recruits and aligns reads to local haplotypes and finds the likeliest haplotype pair by optimizing read alignment, insert size and read depth profiles. Locityper accurately genotypes up to 194 of 256 challenging medically relevant loci (95% haplotypes at QV33), an 8.8-fold gain compared to 22 genes achieved with standard variant calling pipelines. Furthermore, Locityper provides access to hyperpolymorphic HLA genes and other gene families, including KIR, MUC and FCGR.

July 24, 2025
9:40-10:00
Invited Presentation: Resolving Paralogues and Multi-Copy Genes with Nanopore Long-Read Sequencing
Confirmed Presenter: Sergey Nurk
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person
Moderator(s): Francisco De La Vega


Authors List:

  • Sergey Nurk

July 24, 2025
11:20-11:40
Proceedings Presentation: GreedyMini: Generating low-density DNA minimizers
Confirmed Presenter: Arseny Shur, Bar-Ilan University, Israel
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Shay Golan, Reichman University and University of Haifa
  • Ido Tziony, Bar-Ilan University
  • Matan Kraus, Bar-Ilan University
  • Yaron Orenstein, Bar-Ilan University
  • Arseny Shur, Bar-Ilan University

Presentation Overview:

Motivation:
Minimizers are the most popular k-mer selection scheme in algorithms and data structures analyzing high-throughput sequencing (HTS) data. In a minimizer scheme, the smallest k-mer by some predefined order is selected as the representative of a sequence window containing w consecutive k-mers, which results in overlapping windows often selecting the same k-mer. Minimizers that achieve the lowest frequency of selected k-mers over a random DNA sequence, termed the expected density, are desired for improved performance of HTS analyses. Yet, no method to date exists to generate minimizers that achieve minimum expected density. Moreover, for k and w values used by common HTS algorithms and data structures there is a gap between densities achieved by existing selection schemes and the theoretical lower bound.

Results:
We developed GreedyMini, a toolkit of methods to generate minimizers with low expected or particular density, to improve minimizers, to extend minimizers to larger alphabets, k, and w, and to measure the expected density of a given minimizer efficiently. We demonstrate over various combinations of k and w values, including those of popular HTS methods, that GreedyMini can generate DNA minimizers that achieve expected densities very close to the lower bound, and both expected and particular densities much lower compared to existing selection schemes. Moreover, we show that GreedyMini's k-mer rank-retrieval time is comparable to common k-mer hash functions. We expect GreedyMini to improve the performance of many HTS algorithms and data structures and advance the research of k-mer selection schemes.
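
As a rough illustration of the minimizer scheme described in the Motivation, the following Python sketch selects the smallest k-mer in each window of w consecutive k-mers and measures the fraction of k-mer positions selected on one sequence. It is not GreedyMini: the function names and the default lexicographic order are assumptions made for this example, and a density measured on a single sequence only approximates the expected density.

```python
import random


def minimizer_positions(seq, k, w, order=None):
    """Start positions of the k-mers selected by a (k, w) minimizer scheme.

    `order` maps a k-mer to a sortable key; lexicographic order is used
    when it is omitted (an assumption made for this sketch).
    """
    key = order if order is not None else (lambda kmer: kmer)
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        # Pick the smallest k-mer in the window, breaking ties to the left.
        best = min(range(start, start + w),
                   key=lambda i: (key(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def density(seq, k, w, order=None):
    """Fraction of k-mer positions selected on this particular sequence."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


if __name__ == "__main__":
    rng = random.Random(0)
    dna = "".join(rng.choice("ACGT") for _ in range(2000))
    print(minimizer_positions("ACGTACGT", k=2, w=3))  # [0, 1, 4]
    print(density(dna, k=5, w=4))
```

Passing a different `order` (for example, a lookup into a precomputed ranking of k-mers) changes which k-mers are selected and therefore the density, which is the quantity a minimizer generator tries to lower.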

July 24, 2025
11:40-12:00
Proceedings Presentation: Exploiting uniqueness: seed-chain-extend alignment on elastic founder graphs
Confirmed Presenter: Nicola Rizzo, University of Helsinki, Finland
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Nicola Rizzo, University of Helsinki
  • Manuel Cáceres, Aalto University
  • Veli Mäkinen, University of Helsinki

Presentation Overview:

Sequence-to-graph alignment is a central challenge of computational pangenomics. To overcome the theoretical hardness of the problem, state-of-the-art tools use seed-and-extend or seed-chain-extend heuristics for alignment. We implement a complete seed-chain-extend alignment workflow based on indexable elastic founder graphs (iEFGs) that, unlike general graphs, support linear-time exact searches. We show how to construct iEFGs, find high-quality seeds, chain, and extend them at the scale of a telomere-to-telomere assembled human chromosome. Our sequence-to-graph alignment tool and the scripts to replicate our experiments are available at https://github.com/algbio/SRFAligner.

July 24, 2025
12:00-12:20
FroM Superstring to Indexing: a space-efficient index for unconstrained k-mer sets using the Masked Burrows-Wheeler Transform (MBWT)
Confirmed Presenter: Ondřej Sladký, Charles University, Czechia
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Ondřej Sladký, Charles University
  • Pavel Veselý, Charles University
  • Karel Brinda, INRIA/IRISA Rennes

Presentation Overview:

The exponential growth of DNA sequencing data calls for efficient solutions for storing and querying large-scale k-mer sets. While recent indexing approaches use spectrum-preserving string sets (SPSS), full-text indexes, or hashing, they often impose structural constraints or demand extensive parameter tuning, limiting their usability across different datasets and data types. Here, we propose FMSI, a minimally parametrized, highly space-efficient membership index and compressed dictionary for arbitrary k-mer sets. FMSI combines approximated shortest superstrings with the Masked Burrows-Wheeler Transform (MBWT). Unlike traditional methods, FMSI operates without predefined assumptions on k-mer overlap patterns but exploits them when available. We demonstrate that FMSI offers superior memory efficiency for processing queries over established indexes such as SSHash, Spectral Burrows-Wheeler Transform (SBWT), and Conway-Bromage-Lyndon (CBL), while supporting fast membership and dictionary queries. Depending on the dataset, k, or sampling, FMSI offers 2–3x space savings for processing queries over all state-of-the-art indexes; only a space-optimized SBWT (without indexing reverse complement) matches its memory efficiency in some cases but is 2–3x slower. Overall, this work establishes superstring-based indexing as a highly general, flexible, and scalable approach for genomic data, with direct applications in pangenomics, metagenomics, and large-scale genomic databases.
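
To make the idea of representing an arbitrary k-mer set by a superstring more concrete, here is a minimal Python sketch. It assumes a simple convention that the abstract does not spell out: a superstring is paired with a 0/1 mask, and a k-mer belongs to the set if some occurrence of it starts at a position whose mask bit is 1. The function names are hypothetical, the membership query is a plain scan, and nothing here reflects FMSI's actual MBWT-based index.

```python
def kmers_from_masked_superstring(superstring, mask, k):
    """Decode the k-mer set: keep k-mers starting at positions with mask bit '1'."""
    return {
        superstring[i:i + k]
        for i in range(len(superstring) - k + 1)
        if mask[i] == "1"
    }


def contains(superstring, mask, k, query):
    """Naive membership query by scanning every occurrence of `query`."""
    pos = superstring.find(query)
    while pos != -1:
        if mask[pos] == "1":
            return True
        pos = superstring.find(query, pos + 1)
    return False


if __name__ == "__main__":
    # The same superstring can encode different sets via different masks.
    print(kmers_from_masked_superstring("ACGTA", "11100", 3))
    print(kmers_from_masked_superstring("ACGTA", "10100", 3))
    print(contains("ACGTA", "10100", 3, "CGT"))  # False
```

Under this convention the k-mers need not overlap in any particular pattern: non-member k-mers that appear in the superstring are simply masked out.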

July 24, 2025
12:20-12:40
The Alice assembler: dramatically accelerating genome assembly with MSR sketching
Confirmed Presenter: Roland Faure, Institut Pasteur, Paris
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Roland Faure, Institut Pasteur
  • Jean-François Flot, Université libre de Bruxelles
  • Dominique Lavenier, CNRS / IRISA

Presentation Overview:

The PacBio HiFi technology and the R10.4 Oxford Nanopore flowcells are transforming the genomic world by producing, for the first time, sequencing reads that are both long and accurate. The low error rate of these reads opens new avenues for computational optimizations. However, genome and particularly metagenome assembly using high-fidelity reads still faces challenges. Current assemblers (e.g., Flye, hifiasm, metaMDBG) struggle to efficiently resolve highly similar haplotypes (homologous chromosomes, bacterial strains, repeats) while maintaining computational speed, creating a gap between rapid and haplotype-resolved methods.

We investigated this issue using several datasets, including a human gut microbiome sequencing dataset and a diploid genome, finding that hifiasm_meta and metaFlye required over a month of CPU time to produce an assembly, while metaMDBG, which collapses similar strains, assembles the same dataset in four days.

We present Alice, a new assembler that introduces a new sequence sketching method called MSR sketching to bridge this gap and efficiently produce haplotype-resolved assemblies for both genomic and metagenomic datasets. On the aforementioned human gut dataset, Alice completed the assembly in just 7 CPU hours. Furthermore, the analysis of the assemblies revealed that Alice missed <1% of abundant 31-mers (≥20x coverage), compared to >15% missed by both metaMDBG and hifiasm_meta.

Overall, our results indicate that Alice accelerates assembly dramatically while providing high quality assemblies, offering a powerful new tool for the field.

July 24, 2025
12:40-13:00
BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences
Confirmed Presenter: Noam Teyssier, Arc Institute, United States
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Noam Teyssier, Arc Institute
  • Alexander Dobin, Arc Institute

Presentation Overview:

Modern genomics routinely generates billions of sequencing records per run, typically stored as gzip-compressed FASTQ files. This format's inherent limitations—single-threaded decompression and sequential parsing of irregularly sized records—create significant bottlenecks for bioinformatics applications that would benefit from parallel processing. We present BINSEQ, a family of simple binary formats designed for high-throughput parallel processing of sequencing data. The family includes BINSEQ, optimized for fixed-length reads with true random access capability through two-bit encoding, and VBINSEQ, supporting variable-length sequences with optional quality scores and block-based organization. Both formats natively handle paired-end reads, eliminating the need for synchronized files. Our comprehensive evaluation demonstrates that BINSEQ formats deliver substantial performance improvements across bioinformatics workflows while maintaining competitive storage efficiency. Both formats achieve up to 32x faster processing than compressed FASTQ and continue to scale with increasing thread counts where traditional formats quickly plateau due to I/O bottlenecks. These advantages extend to complex workflows like alignment, with BINSEQ formats showing 2-5x speedups at higher thread counts when tested with tools like minimap2 and STAR. Storage requirements remain comparable to or better than existing formats, with BINSEQ (610.35 MB) similar to gzip-compressed FASTA (647.29 MB) and VBINSEQ (509.89 MB) approaching CRAM (491.85 MB) efficiency. To facilitate adoption, we provide high-performance libraries, parallelization APIs, and conversion tools as free, open-source implementations. BINSEQ addresses fundamental inefficiencies in genomic data processing by considering modern parallel computing architectures.
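
The link between two-bit encoding, fixed-length reads, and random access can be sketched in a few lines of Python. This is a generic illustration, not the BINSEQ file layout: the bit order, the absence of a header, and the function names are all assumptions made for the example, and bases other than A, C, G, T are not handled.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, two bits per base (four bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(buf, length):
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(buf[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


def record_at(blob, index, read_len):
    """Random access: every record has the same byte size, so the offset
    of record `index` is a multiplication rather than a scan."""
    size = (read_len + 3) // 4
    return unpack(blob[index * size:(index + 1) * size], read_len)


if __name__ == "__main__":
    reads = ["ACGTA", "TTTTT", "GATTA"]
    blob = b"".join(pack(r) for r in reads)
    print(len(blob))               # 6
    print(record_at(blob, 2, 5))   # GATTA
```

Because record boundaries are computable, independent threads could each decode their own slice of the file without first parsing everything before it, which is the property that sequential, irregularly sized FASTQ records lack.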

July 24, 2025
14:00-14:20
Proceedings Presentation: Ultrafast and Ultralarge Multiple Sequence Alignments using TWILIGHT
Confirmed Presenter: Yu-Hsiang Tseng, University of California San Diego, United States
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Yu-Hsiang Tseng, University of California San Diego
  • Sumit Walia, University of California San Diego
  • Yatish Turakhia, University of California San Diego

Presentation Overview:

Motivation: Multiple sequence alignment (MSA) is a fundamental operation in bioinformatics, yet existing MSA tools are struggling to keep up with the speed and volume of incoming data. This is because the runtimes and memory requirements of current MSA tools become untenable when processing large numbers of long input sequences and they also fail to fully harness the parallelism provided by modern CPUs and GPUs.
Results: We present TWILIGHT (Tall and Wide Alignments at High Throughput), a novel MSA tool optimized for speed, accuracy, scalability, and memory constraints, with both CPU and GPU support. TWILIGHT incorporates innovative parallelization and memory-efficiency strategies that enable it to build ultralarge alignments at high speed even on memory-constrained devices. On challenging datasets, TWILIGHT outperformed all other tools in speed and accuracy. It scaled beyond the limits of existing tools and performed an alignment of 1 million RNASim sequences within 30 minutes while utilizing less than 16 GB of memory. TWILIGHT is the first tool to align over 8 million publicly available SARS-CoV-2 sequences, setting a new standard for large-scale genomic alignment and data analysis.
Availability: TWILIGHT’s code is freely available under the MIT license at https://github.com/TurakhiaLab/TWILIGHT. The test datasets and experimental results, including our alignment of 8 million SARS-CoV-2 sequences, are available at https://zenodo.org/records/14722035.

July 24, 2025
14:20-14:40
Proceedings Presentation: CREMSA: Compressed Indexing of (Ultra) Large Multiple Sequence Alignments
Confirmed Presenter: Mikaël Salson, CRIStAL, UMR 9189 Université de Lille
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Mikaël Salson, CRIStAL
  • Arthur Boddaert, Université de Lille
  • Awa Bousso Gueye, Université de Lille
  • Laurent Bulteau, CNRS - Université Gustave Eiffel
  • Yohan Hernandez-Courbevoie, Université de Lille
  • Camille Marchet, CNRS
  • Nan Pan, LIX - Ecole Polytechnique
  • Sebastian Will, Ecole Polytechnique
  • Yann Ponty, CNRS/LIX

Presentation Overview:

Recent viral outbreaks motivate a systematic collection of pathogenic genomes, including a strong focus on genomic RNA, in order to accelerate their study and monitor the emergence and spread of variants. Due to their limited length and the temporal proximity of their collection, viral genomes are usually organized and analyzed as oversized Multiple Sequence Alignments (MSAs). Such MSAs are largely ungapped, and mostly homogeneous on a column-wise level but not at a sequential level due to local variations, hindering the performance of sequential compression algorithms.

In order to enable an efficient manipulation of MSAs, including subsequent statistical analyses, we introduce CREMSA (Column-wise Run-length Encoding for Multiple Sequence Alignments), a new index that builds on sparse bitvector representations to compress an existing or streamed MSA, all the while allowing for an expressive set of accelerated requests to query the alignment without prior decompression.

Using CREMSA, a 65GB MSA consisting of 1.9M SARS-CoV-2 genomes could be compressed into 22MB using less than half a gigabyte of main memory, while supporting access requests on the order of 100ns. Such a speed-up enables a comprehensive analysis of covariation over this very large MSA. We further assess the impact of the sequence ordering on the compressibility of MSAs and propose a resorting strategy that, despite the proven NP-hardness of an optimal sort, induces greatly increased compression ratios at a marginal computational cost.
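
The intuition behind column-wise run-length encoding can be shown with a toy Python sketch: when most sequences agree in a column, that column collapses to a handful of runs, and a cell can still be read back without decompressing anything. The sketch uses plain sorted lists and binary search in place of the sparse bitvectors mentioned in the abstract, and all function names are hypothetical; it is not CREMSA's data structure.

```python
from bisect import bisect_right


def encode_column(column):
    """Run-length encode one column as (run start rows, run symbols)."""
    starts, symbols = [], []
    for row, sym in enumerate(column):
        if not symbols or symbols[-1] != sym:
            starts.append(row)
            symbols.append(sym)
    return starts, symbols


def encode_msa(rows):
    """Encode an alignment (list of equal-length strings) column by column."""
    return [encode_column(col) for col in zip(*rows)]


def access(encoded, row, col):
    """Read cell (row, col) by binary search over the run starts."""
    starts, symbols = encoded[col]
    return symbols[bisect_right(starts, row) - 1]


if __name__ == "__main__":
    msa = ["ACGT", "ACGT", "ATGT", "ATGT"]
    enc = encode_msa(msa)
    print(sum(len(starts) for starts, _ in enc))  # 5 runs for 16 cells
    print(access(enc, 2, 1))                      # T
```

The example also hints at why row order matters: placing similar sequences next to each other lengthens the runs in each column, so fewer runs are needed overall.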

July 24, 2025
14:40-15:00
Proceedings Presentation: LYCEUM: Learning to call copy number variants on low coverage ancient genomes
Confirmed Presenter: Mehmet Alper Yilmaz, Bilkent University, Turkey
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Mehmet Alper Yilmaz, Bilkent University
  • Ahmet Arda Ceylan, Bilkent University
  • Gun Kaynar, Carnegie Mellon University
  • A. Ercument Cicek, Bilkent University

Presentation Overview:

Motivation: Copy number variants (CNVs) are pivotal in driving phenotypic variation that facilitates species adaptation. They are significant contributors to various disorders, making ancient genomes crucial for uncovering the genetic origins of disease susceptibility across populations. However, detecting CNVs in ancient DNA (aDNA) samples poses substantial challenges due to several factors: (i) aDNA is often highly degraded; (ii) contamination from microbial DNA and DNA from closely related species introduce additional noise into sequencing data; and finally, (iii) the typically low coverage of aDNA renders accurate CNV detection particularly difficult.
Conventional CNV calling algorithms, which are optimized for high coverage read-depth signals, underperform under such conditions.
Results: To address these limitations, we introduce LYCEUM, the first machine learning-based CNV caller for aDNA. To overcome challenges related to data quality and scarcity, we employ a two-step training strategy. First, the model is pre-trained on whole genome sequencing data from the 1000 Genomes Project, teaching it CNV-calling capabilities similar to conventional methods. Next, the model is fine-tuned using high-confidence CNV calls derived from only a few existing high-coverage aDNA samples. During this stage, the model adapts to making CNV calls based on the downsampled read depth signals of the same aDNA samples. LYCEUM achieves accurate detection of CNVs even in typically low-coverage ancient genomes. We also observe that the segmental deletion calls made by LYCEUM show correlation with the demographic history of the samples and exhibit patterns of negative selection in line with natural selection.
Availability: LYCEUM is available at https://github.com/ciceklab/LYCEUM.

July 24, 2025
15:00-15:20
POPSICLE: a probabilistic method to capture uncertainty in single-cell copy-number calling
Confirmed Presenter: Lucrezia Patruno, University College London Cancer Institute, Cancer Research UK Lung Cancer Centre of Excellence
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Lucrezia Patruno, University College London Cancer Institute
  • Sophia Chirrane, University College London Cancer Institute
  • Simone Zaccaria, University College London Cancer Institute

Presentation Overview:

During tumour evolution, cancer cells acquire somatic copy-number alterations (CNAs), which are frequent genomic alterations resulting in the amplification or deletion of large genomic regions. Recent single-cell technologies allow the accurate investigation of CNA rates and their underlying mechanism by performing whole-genome sequencing of thousands of individual cancer cells in parallel (scWGS-seq). While several methods have been developed to identify the most likely CNAs from scWGS-seq data, the high levels of variability in these data make the accurate inference of point estimates for CNAs (i.e., a single value for the most likely copy number) challenging. Moreover, given that variability increases with increasing copy numbers, this is especially true when considering high amplifications and highly aneuploid cells, which play a key role in cancer. However, to date, existing methods are limited to the inference of point estimates for CNAs in single cells and do not capture their related uncertainty.
To address these limitations, we introduce POPSICLE, a novel probabilistic approach that computes the probability of having different copy numbers for every genomic region in each single cell. Using simulations, we show that POPSICLE improves ploidy and CNA inference for up to 20% of the genome in 90% of cells. Using a dataset comprising more than 60,000 breast and ovarian cancer cells, we show how POPSICLE leverages uncertainty to improve the identification of genes that are recurrently highly amplified and might play a key role in tumour progression.
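
The difference between a point estimate and a probability over copy numbers can be illustrated with a deliberately simple Python sketch. It assumes a Poisson read-count model with a known expected number of reads per copy and a uniform prior; these modelling choices, the parameter names, and the function name are assumptions made for this example and are not POPSICLE's model.

```python
import math


def copy_number_posterior(read_count, reads_per_copy, max_cn=10):
    """Posterior over copy numbers 0..max_cn for one genomic bin in one cell,
    under a Poisson likelihood and a uniform prior (toy assumptions)."""
    log_like = []
    for cn in range(max_cn + 1):
        # Small floor so that copy number 0 tolerates a few stray reads.
        mean = max(cn * reads_per_copy, 1e-3)
        log_like.append(
            read_count * math.log(mean) - mean - math.lgamma(read_count + 1)
        )
    top = max(log_like)
    weights = [math.exp(v - top) for v in log_like]  # stable normalisation
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    low = copy_number_posterior(read_count=100, reads_per_copy=50.0)
    high = copy_number_posterior(read_count=400, reads_per_copy=50.0)
    print(low.index(max(low)), round(max(low), 3))
    print(high.index(max(high)), round(max(high), 3))
```

In this toy model the most likely copy number is 2 for the first bin and 8 for the second, but the posterior is visibly flatter at the higher copy number, because neighbouring copy numbers predict read counts that differ less in relative terms. A point estimate would hide that difference in confidence.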

July 24, 2025
15:20-15:40
MutSuite: A Toolkit for Simulating and Evaluating Mutations in Aligned Sequencing Reads
Confirmed Presenter: Kendell Clement, University of Utah, United States
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Kendell Clement, University of Utah

Presentation Overview:

Simulated sequencing reads containing known mutations are essential for developing, testing, and benchmarking mutation detection tools. Most existing simulation tools introduce mutations into synthetic reads and then realign them to a reference genome prior to downstream analysis. However, this realignment step can obscure the true position of insertions and deletions, introducing ambiguity and potential error in evaluation. In particular, the alignment process can shift the apparent location of insertions and deletions, complicating efforts to assess recall and precision of variant callers.
To address this limitation and support the development of more accurate and sensitive mutation detection algorithms, we developed MutSim, a tool that introduces substitutions, insertions, and deletions directly into aligned reads (e.g., in BAM files). By avoiding realignment, MutSim ensures that each simulated mutation remains at its exact specified position, enabling precise evaluation of variant caller performance.
MutSim is part of a larger toolkit we call MutSuite, which also includes MutRun, a companion tool that automates the execution of variant calling software on simulated datasets, and MutAgg, which aggregates and summarizes results across multiple variant callers for performance comparison. Together, these tools provide a robust and flexible framework for mutation simulation and benchmarking. MutSuite is open-source and freely available at: https://github.com/clementlab/mutsuite.
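
The positional ambiguity of insertions and deletions mentioned above can be reproduced with a toy Python example: deleting any single base inside a homopolymer run yields the same read, so an aligner that sees only the resulting sequence cannot know which position was intended. The reference string and helper name are made up for illustration and have nothing to do with MutSim's implementation.

```python
def delete(seq, pos, length=1):
    """Return `seq` with `length` bases removed starting at 0-based `pos`."""
    return seq[:pos] + seq[pos + length:]


if __name__ == "__main__":
    ref = "GCAAAATC"  # positions 2..5 form a run of four A's
    mutated = {pos: delete(ref, pos) for pos in range(2, 6)}
    for pos, seq in mutated.items():
        print(pos, seq)
    # Four different simulated positions, one observable sequence.
    print(len(set(mutated.values())))  # 1
```

A benchmark that records the simulated position as 4, say, while the aligner reports the deletion at position 2, would count a correct call as a miss unless positions are normalised or, as described above, the mutation is written directly into the aligned read.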

July 24, 2025
15:40-16:00
Landscape of The Dark Genome’s variants and their influence on cancer
Confirmed Presenter: Joao P. C. R. Mendonca, Rigshospitalet, Denmark
Track: HiTSeq: High Throughput Sequencing Algorithms & Applications

Room: 01A
Format: In person

Authors List:

  • Joao P. C. R. Mendonca, Rigshospitalet
  • Kristoffer Staal Rohrberg, Rigshospitalet
  • Peter Holst, Hervolution Therapeutics
  • Frederik Otzen Bagger, Rigshospitalet

Presentation Overview:

Human endogenous retroviruses (HERVs) are remnants of ancient viral infections that now make up ~8% of the human genome. Although typically silenced, HERVs can become reactivated in cancer and are emerging as biomarkers and immunotherapeutic targets. However, their clinical utility is limited by challenges in resolving individual loci due to high sequence similarity, incomplete genome annotations, and an overreliance on linear reference genomes. To address this, we constructed a variational pangenome using long-read sequencing data from Genome in a Bottle and the Platinum Pedigree projects. This approach enables accurate detection of single nucleotide variants (SNVs), insertions/deletions (indels), and structural variants (SVs) in a reference-free manner, revealing polymorphic HERV insertions absent from the human reference genome. By integrating data from the Copenhagen Prospective Personalized Oncology (CoPPO) biobank, we link these variants to HERV expression in cancer, distinguishing potentially pathogenic variants from benign ones. We combine pangenome-informed annotations with locus-specific expression quantification tools to resolve HERV transcription at individual loci and connect specific sequence variants to tumorigenesis and immune modulation. Our findings enhance the resolution of HERV mapping across individuals and cancer types, uncovering previously inaccessible variation in a historically overlooked portion of the genome. This work not only improves our understanding of HERV-driven disease mechanisms but also lays the groundwork for variant-informed biomarker discovery and therapeutic targeting in precision oncology.