Schedule subject to change
All times listed are in EDT
Saturday, July 13th
10:40-11:00
HPC-AI Support for Singapore’s Bioinformaticians and Computational Biologists
Confirmed Presenter: Shoba Ranganathan, NSCC, Australia

Room: 524c
Format: In Person


Authors List:

  • Shoba Ranganathan, NSCC, Australia

Presentation Overview:

NSCC Singapore was established in 2015 as a National Research Infrastructure and manages Singapore’s first national petascale facility with high-performance computing (HPC) resources. NSCC supports the HPC and Artificial Intelligence (AI) needs of the Singapore research community, accelerating innovative solutions, exemplified by select biological and biomedical case studies.

11:00-11:20
Traversing the mouse-human interface with a knowledge graph of analytic and data services
Confirmed Presenter: Robyn Ball, The Jackson Laboratory, USA

Room: 524c
Format: In Person


Authors List:

  • Robyn Ball, The Jackson Laboratory, USA
  • Matthew Gerring, The Jackson Laboratory, USA
  • Alexander Berger, The Jackson Laboratory, USA
  • Molly Bogue, The Jackson Laboratory, USA
  • Hongping Liang, The Jackson Laboratory, USA
  • David Walton, The Jackson Laboratory, USA
  • Vivek Philip, The Jackson Laboratory, USA
  • Erich Baker, Belmont University, USA
  • Elissa Chesler, The Jackson Laboratory, USA

Presentation Overview:

Functional genomics has generated a wealth of gene regulatory information across species, and research has shown that variants can be identified within each species that have similar effects on orthologous targets. We developed a knowledge graph across data resources and analytical services for cross-species analysis that includes GenomeMUSter (https://muster.jax.org), the Mouse Phenome Database (https://phenome.jax.org/), the meta-analysis server integrated with GenomeMUSter and the extensive collection of mouse phenotypic measurements in MPD, GeneWeaver (https://geneweaver.org), and VariantGraph db.

- GenomeMUSter is a comprehensive and uniformly dense mouse variant resource comprising imputed and measured allelic state data for 657 mouse strains at 106.8M sites.
- The Mouse Phenome Database (MPD) is an NIH-recognized Biomedical Data Repository, curated and annotated with community standard ontologies, and integrated with a suite of analytical tools, including the meta-analysis server.
- GeneWeaver is a curated repository of genomic experimental results and ontology resources with an analytical tool suite that enables users to perform cross-species integrated functional genomics.
- The VariantGraph database is a Neo4j graph comprising 32B relations among mouse and human variants, transcripts, genes, and regulatory peaks that enables evidence-based identification of variants and genes with similar effects on orthologous targets.

We will demo the knowledge graph and illustrate approaches to identify and characterize mouse-human effects with orthologous targets using motivating examples of cross-species multi-population multi-trait analytical approaches.

11:20-11:40
Network analyses for functional annotation with FunCoup tools
Confirmed Presenter: Erik Sonnhammer, Stockholm University, Sweden

Room: 524c
Format: In Person


Authors List:

  • Erik Sonnhammer, Stockholm University, Sweden
  • Davide Buzzao, Stockholm University, Sweden

Presentation Overview:

The FunCoup database (https://FunCoup.org) provides comprehensive functional association networks of genes/proteins that were inferred by integrating massive amounts of multi-omics data, combined with orthology transfer. We will showcase how its unique online tools can be used to gain functional insights and answer scientific questions with network biology.

11:40-12:20
Advances towards comprehensive and accurate whole genome analysis at scale using DRAGEN accelerated algorithms
Confirmed Presenter: Rami Mehio

Room: 524c
Format: In Person


Authors List:

  • Rami Mehio

Presentation Overview:

Research and medical genomics require comprehensive and scalable solutions to drive
the discovery of novel disease targets, evolutionary drivers, and genetic markers with
clinical significance. This necessitates a framework to identify all types of variants
independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present
DRAGEN, which uses novel methods based on multigenome mapping, hardware
acceleration, and machine-learning-based variant detection to provide novel insights
into individual genomes with ~30 min of computation time (from raw reads to variant
detection). DRAGEN outperforms all other state-of-the-art methods in speed and
accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates
specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN,
GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability,
accuracy, and innovations to further advance the integration of comprehensive
genomics for research and medical applications.

14:20-14:40
Enhancing Clinical Trial Outcomes with AI-based Predictive Biomarker Discovery via Contrastive Learning
Confirmed Presenter: Etai Jacob

Room: 524c
Format: In Person


Authors List:

  • Etai Jacob

Presentation Overview:

Modern clinical trials capture numerous clinicogenomic measurements. Manual discovery of predictive biomarkers is challenging. We introduce a framework which explores predictive biomarkers in a systematic and unbiased manner using contrastive learning. Applied to real data, our framework found biomarkers identifying IO-treated individuals who survive longer than those treated with chemotherapy.

14:40-15:00
Miqa: Automating bioinformatics testing, evaluation and validation for real-time performance data & instant bug detection on every code change
Confirmed Presenter: Venus Lau, Magna Labs, USA

Room: 524c
Format: In Person


Authors List:

  • Venus Lau, Magna Labs, USA

Presentation Overview:

Evaluation of bioinformatics pipeline performance (accuracy, robustness, and consistency) is critical both for developing and optimizing top-class algorithms, and for proving and maintaining the quality and reproducibility of these tools. Bioinformatics data is large, complex, and challenging to test, and many teams lack the time and resources to test effectively and efficiently, slowing development and introducing downstream risks and maintenance burdens.

Miqa is a biologist-friendly software quality assurance (QA) platform that can automate bioinformatic tool validation, benchmarking, or troubleshooting as frequently as every code change. It allows you to build custom tests and benchmarking metrics for any data type, and is agnostic to programming language (Python, R, C++, etc.), workflow and containerization technologies (Nextflow, Snakemake, Docker), and cloud/execution platforms. We will demonstrate how to easily set up automated software tests, customize QA metrics, and generate interactive reports for comprehensive bioinformatic evaluation within minutes, on common data types like BAM, FASTQ, VCF, BED, and CSV/TSV/JSON, as well as custom pipeline outputs, and how it can be applied to a variety of technology types and disciplines.

15:00-15:20
UniProt: The Universal Protein resource: new features, access and tools for protein data
Confirmed Presenter: Aurélien Luciani, EMBL-EBI, United Kingdom

Room: 524c
Format: In Person


Authors List:

  • Aurélien Luciani, EMBL-EBI, United Kingdom

Presentation Overview:

After more than 20 years of existence, the Universal Protein resource (UniProt) is now a fundamental component of the bioinformatics and molecular biology community, providing a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information. Recognizing the continuous evolution in data analysis needs and technology, and the exponential growth of biological data, UniProt has undergone a significant update to enhance its interface, functionalities, and overall user experience. This presentation aims to introduce these transformative changes to the participants of the conference.

Our presentation will:
- Showcase the updated look and improved navigational functionalities of the new UniProt website.
- Detail the advancements in the API that facilitate more efficient data retrieval and integration.
- Highlight the enhanced data visualization tools that provide intuitive insights into protein functions, interactions, and structures.
- Demonstrate the optimized processes for data export and sharing, which cater to both academic and industrial research needs.
- Engage with both new and veteran UniProt users to gather feedback and discuss potential future enhancements, helping us define a development roadmap that is based on user feedback.

15:40-16:00
Accelerating Bioinformatics Workflows with Interactive High-Performance Computing
Confirmed Presenter: Camilo Buscaron

Room: 524c
Format: In Person


Authors List:

  • Camilo Buscaron

Presentation Overview:

Bioinformatics research continues to advance at an increasing scale with the help of techniques such as next-generation high-throughput sequencing algorithms, computational mass spectrometry, and computational biophysics, and the availability of tools to support automation of bioinformatics processes and workloads. With this growth, biological data is accumulating at an unprecedented rate, demanding high-performance and high-throughput computing technologies to process such datasets. The use of hardware accelerators and massively parallel heterogeneous compute systems accelerates the processing of big data in high-performance computing environments by enabling higher degrees of parallelism, thereby increasing computational throughput. In this talk, we will discuss the state-of-the-art architectures enabling the acceleration and growth of computational biology workloads and algorithms.

Monday, July 15th
10:40-11:00
Utilizing Pre-Treatment Lab Values & Whole-Lung Radiomics for Modeling Survival Risk for ICB in the mNSCLC Setting
Confirmed Presenter: Kedar Patwardhan, AstraZeneca, United States

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Kedar Patwardhan, AstraZeneca, United States

Presentation Overview:

At AstraZeneca Oncology Data Science, we are committed to unlocking the potential of AI/ML-driven data science. Here we demonstrate that pre-treatment clinical lab values and non-invasive CT imaging features can be used to model survival risk in the mNSCLC setting. This is an important step towards improving patient access to immunotherapy.

11:00-11:20
Enhancing Genomic Research through National Collaboration: The Role of Canada's National Data Platform
Confirmed Presenter: Felipe Pérez-Jvostov

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Felipe Pérez-Jvostov

Presentation Overview:

National data infrastructure is a critical enabler of Canada’s genomic research and community-driven collaboration. The success of such infrastructure is dependent on its capacity to address growing demands for data and its ability to enable a diverse range of resource-intensive computational activities. This presentation will focus on the challenges and opportunities of establishing such national data infrastructure in the Canadian context, and highlight the importance of data interoperability across domains and data types to fuel research and innovation in genomic science and beyond.

11:20-11:40
The Missense3D portal: Structure-based evaluation of missense variants including protein complexes and transmembrane regions
Confirmed Presenter: Alessia David, Imperial College London, United Kingdom

Room: 524c
Format: Live Stream

Moderator(s): Jennifer Kelly


Authors List:

  • Gordon Hanna, Imperial College London, United Kingdom
  • Tarun Khanna, Imperial College London, United Kingdom
  • Cecilia Pennica, Imperial College London, United Kingdom
  • Suhail Islam, Imperial College London, United Kingdom
  • Michael Sternberg, Imperial College London, United Kingdom
  • Alessia David, Imperial College London, United Kingdom

Presentation Overview:

Missense3D (http://missense3d.bc.ic.ac.uk/) predicts the impact of missense variants on protein structure and reports their structural impact, e.g., burial of charged residues. A user can assess the impact of a variant on a monomeric structure, including its transmembrane region, or on a protein complex. Missense3D accepts any structure, including AlphaFold models.

11:40-12:00
Integrated Pathway/Genome/Omics Informatics in Pathway Tools and BioCyc
Confirmed Presenter: Suzanne Paley

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Suzanne Paley

Presentation Overview:

An overview of the BioCyc website and Pathway Tools software suite, which features an extensive array of capabilities covering genome informatics, pathway informatics, regulatory informatics, and omics data analysis. Several new capabilities will be demonstrated, including multi-omics visualization tools, a new genome browser, and the comparative genome dashboard.

12:00-12:20
CATH and TED: Protein structure classification in the age of AI
Confirmed Presenter: Nicola Bordin

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Nicola Bordin

Presentation Overview:

CATH, now up to date with the Protein Data Bank, has created the TED resource together with the group of David Jones at UCL, classifying over 200m domains from AFDBv4 within the CATH classification and identifying over 7k novel folds. TED offers community access via a dedicated web resource, facilitating data visualization and downloads.

14:20-14:40
GPCRVS – a machine learning system for GPCR drug discovery
Confirmed Presenter: Paulina Dragan

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Paulina Dragan
  • Dorota Latek

Presentation Overview:

The number of GPCR structures in PDB and their active ligands has
recently become sufficient to apply machine learning in compound
activity recommendation systems for drug design. GPCRVS [1] is an
efficient machine learning system [2, 3, 4] for the online assessment of
the compound activity against several GPCR targets, including peptide
and protein-binding GPCRs, the most difficult for virtual screening [3].
GPCRVS evaluates compounds in terms of their activity range,
pharmacological effects, and binding modes. GPCRVS evaluates compounds
ranging from classical small molecules to short peptides. Results of
activity class assignment and binding affinity prediction are provided
in comparison with known active ligands of each GPCR receptor type. A
multi-class classification in GPCRVS, handling incomplete and fuzzy
biological data, was validated on ChEMBL-retrieved training data sets
for class B GPCRs and chemokine CC and CXC receptors.

Acknowledgments: National Science Centre in Poland (2020/39/B/NZ2/00584).

Availability: https://gpcrvs.chem.uw.edu.pl

References:

[1] D. Latek, K. Prajapati, M. Merski, P. Dragan, P. Osial. GPCRVS – a
machine learning system for GPCR drug discovery, submitted.

[2] P. Dragan, K. Joshi, A. Atzei, D. Latek. Keras/TensorFlow in Drug
Design for Immunity Disorders. Int. J. Mol. Sci. 2023, 24, 15009.

[3] P. Dragan, M. Merski, S. Wisniewski, S.G. Sanmukh, D. Latek.
Chemokine Receptors - Structure-Based Virtual Screening Assisted by
Machine Learning. Pharmaceutics 2023, 15(2), 516.

[4] M. Mizera, D. Latek. Ligand-receptor interactions and machine
learning in GCGR and GLP-1R drug discovery. Int. J. Mol. Sci. 2021,
22(8), 4060.

Author: Dorota Latek

14:40-15:00
Modelling multi-omic, real-world data reveals immunogenomic drivers of resistance to cancer immunotherapy
Confirmed Presenter: Martin Miller, AstraZeneca, United Kingdom

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Martin Miller, AstraZeneca, United Kingdom

Presentation Overview:

At AstraZeneca’s Oncology Data Science, we are committed to unlocking the potential of AI/ML-driven data science. Here, we model clinical endpoints together with >10,000 DNA- and RNA-profiled tumour samples from patients progressing on immune checkpoint blockade (ICB). We uncover that the post-ICB tumour microenvironment is fundamentally different in acquired vs primary resistance.

15:00-15:20
Decoding the grammar of DNA using Natural Language Processing
Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Sonika Tyagi

Presentation Overview:

DNA is the blueprint defining all living organisms. Therefore, understanding the nature and
function of DNA is at the core of all biological studies. Rapid advances in DNA sequencing
and computing technologies over the past few decades have resulted in large quantities of DNA data
generated for diverse experiments, exceeding the growth of all major social media platforms
and astronomy data combined. However, biological data is both complex and
high-dimensional, and is difficult to analyse with conventional methods.
Machine learning is naturally well suited to problems with a large volume of data and
complexity. In particular, applying Natural Language Processing to the genome is
intuitive, since DNA is a natural language. Unique challenges exist in Genome-NLP over
natural languages, including the difficulty of word segmentation or corpus comparison.
To tackle these challenges, we developed the first automated and open-source genomeNLP
workflow that enables efficient and accurate knowledge extraction on biological data,
automating and abstracting preprocessing steps unique to biology. This lowers the barrier to
performing knowledge extraction for both machine learning practitioners and computational
biologists.

15:20-15:40
Transform Healthcare and Life Science with Biomedical Foundation Models and Quantum Computing
Confirmed Presenter: Filippo Utro, IBM Research, United States

Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Filippo Utro, IBM Research, United States

Presentation Overview:

In recent years, foundation models (FMs) and quantum computing (QC) have sparked significant interest in healthcare and life science. This talk explores the latest efforts of IBM Research in FMs and QC that aim to accelerate discovery in healthcare and life science. I will delve into three FMs that we are developing to accelerate drug discovery, and into our efforts on QC, in particular quantum machine learning as a powerful tool, discussing some of its applications spanning from genomics to diagnostics in medical research. We will also discuss technical challenges, envisioning the new era of FMs and QC in healthcare and life science.

15:40-16:00
Ontologic: developing and deploying tools for collaborative computational biology
Room: 524c
Format: In Person

Moderator(s): Jennifer Kelly


Authors List:

  • Eli Pollock