July 14, 2024: ISMB 2024: Day 3 Highlights and Recap

Day 3 of ISMB 2024 started before all the hustle and bustle for some of our attendees who rose bright and early to join the Serene Stretch Symposium in Level 2, Hall Viger.

The rest of the day kept everyone busy with 10 COSI tracks, 2 special sessions, and the Success Circle networking event, which allowed those in attendance to connect with thought leaders in the field.

Keynote Address

Our Day 3 keynote address was given by Dr. Guillaume Bourque with his talk entitled, “Human Genome 2.0: Why a Pangenome Graph is Better for Genetic and Epigenetic Analyses.” His talk started with an overview of the motivation for genomic analyses, emphasizing the limitations encountered in mapping reads to a reference genome due to genetic diversity within populations, which can lead to sequences being ignored or incorrectly mapped, ultimately introducing reference bias.

To address the limitations, Dr. Bourque discussed the use of pangenome graphs that represent the shared segments of individual genomes and capture both common sequences and structural variants. Despite the human genome being mostly linear, these graphs encode genetic diversity and provide a more comprehensive representation.
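For readers new to the idea, here is a minimal toy sketch of a pangenome-style sequence graph: segments are nodes, and each genome is a path through them. The segment IDs, sequences, and haplotype names are invented for illustration and are not taken from the talk or from the draft human pangenome; real pangenome graphs are built with dedicated tooling and are far larger.

```python
# Toy sequence graph: nodes are sequence segments, genomes are paths (walks).
# All IDs, sequences, and haplotype names below are hypothetical examples.
SEGMENTS = {
    "s1": "ACGT",
    "s2": "G",         # allele carried by the reference
    "s3": "T",         # alternate allele at the same site (a SNP)
    "s4": "TTGA",
    "s5": "CCCCCCCC",  # insertion carried by a single haplotype
    "s6": "GG",
}

PATHS = {
    "ref":  ["s1", "s2", "s4", "s6"],
    "hapA": ["s1", "s3", "s4", "s6"],
    "hapB": ["s1", "s2", "s4", "s5", "s6"],
}


def edges(paths):
    """Collect the directed edges implied by consecutive segments in each path."""
    result = set()
    for walk in paths.values():
        result.update(zip(walk, walk[1:]))
    return result


def spell(segments, walk):
    """Reconstruct the linear sequence of one genome by following its path."""
    return "".join(segments[seg_id] for seg_id in walk)


def shared_segments(paths):
    """Segments traversed by every genome in the graph."""
    return set.intersection(*(set(walk) for walk in paths.values()))


def missing_from_reference(paths, ref="ref"):
    """Segments present in some genome but absent from the linear reference."""
    all_segments = {seg_id for walk in paths.values() for seg_id in walk}
    return all_segments - set(paths[ref])


if __name__ == "__main__":
    for name, walk in PATHS.items():
        print(name, spell(SEGMENTS, walk))
    print("shared:", sorted(shared_segments(PATHS)))
    print("not in reference:", sorted(missing_from_reference(PATHS)))
```

In this toy, a read drawn from the `s5` insertion has nowhere to align on the linear `ref` path but matches a node in the graph, which is the intuition behind reduced reference bias.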

Dr. Bourque emphasized that the timing for adopting genome graphs is ideal due to advancements in long-read sequencing technology, improved genome assembly methods, and the recent release of the first draft human pangenome reference. This reference, containing 47 phased, diploid assemblies from diverse individuals, has revealed significant additional genomic content and variation in haplotype structure. The pangenome approach has shown improvements in variant calling, reducing errors in small variant discovery by 34% and boosting structural variant detection by 104% when using short reads.

The keynote address also covered the application of pangenome graphs in epigenetic analyses, particularly in ChIP-seq studies. Additionally, the pangenome approach has identified millions of additional CpGs, expanding our understanding of the epigenome. Dr. Bourque concluded by highlighting the opportunities presented by the new pangenome reference, including exploring new biological questions, developing novel algorithms, and building tools for annotation and visualization on graphs, all while reducing bias in genomic analyses.

Thanks to Genome Canada for sponsoring the Bioinformatics in Canada Special Session at ISMB 2024!

Session Recaps

BioInfo-Core

The bioinfo-core COSI brings together managers and staff working in bioinformatics core facilities around the world. In our first full-day session, we had a mix of presentations, panel discussions, and breakout groups.

Talks:

Swapnil Sawant from Phoenix Bioinformatics spoke about the comprehensive modernization of TAIR (The Arabidopsis Information Resource, a website and database with 600k users globally). The team started with legacy technology more than 20 years old, where changes took a long time and maintenance costs were high. By keeping the interface largely the same and moving to a new technology platform, they were able to substantially improve performance for users and make the system easier and cheaper to maintain.
Francesco Lescai gave an excellent introduction to and overview of Nextflow: he covered the problems it helps solve for cores, gave us a feel for what it looks like, and discussed the nf-core community.
Nikhil Kumar offered another view of Nextflow as it is used at Memorial Sloan Kettering Cancer Center, where they maintain shareable Nextflow modules within their center; he described how they set this up and how they use it to good effect.
Dena Leshkowitz spoke about UTAP2, a popular and user-friendly pipeline allowing non-computational users to easily perform transcriptomic and epigenomic data processing and analysis.
Grace Pigeau discussed the massive amount of data being generated at OICR and some of their approaches and strategies to manage and remove or store this data once they have processed and generated the results.
In a rather memorable and unusual rhyming talk, George Bell discussed the command-line, Linux-style scripts and tools his group writes to let end users perform different types of downstream analysis in R, etc. Users who might not want to bother learning a whole language are still willing to run a script on the command line.
Patricia Carvajal Lopez spoke about the Bioinformatics core facility competency framework, an effort to better define the role of a bioinformatics core facility scientist at three different levels plus a managerial level using competencies and knowledge, skills, and attributes.
Michael Laszloffy showed and discussed Dimsum, a dashboard for quality control, project tracking, turnaround time reporting, and more. This allows OICR to better keep track of the status of all their projects and keep on top of their turnaround time and progress, set priorities, and better communicate with end users.
Aliye Hashemi presented on protein classification using Delaunay tessellation, a way of representing a protein using points in 3D space. That data is then fed into a neural network in order to classify proteins.

Panels:

AI/LLMs in cores: what are we doing now?
- Panelists: Dexter Pratt, Nancy Li, Michelle Brazas, Dukka KC
- This was a really excellent and wide-ranging discussion on everything from training users on how to use AI and LLMs, using generative AI to help develop training materials, incorporating AI chat agents into web interfaces, how to set up the infrastructure so that users can access and query various models on their own without too much hassle, open source vs. commercial models, etc. We all wrote down many notes to take home and explore further.
New technologies in cores
- Panelists: Madelaine Gogol, Lorena Pantano Rubino
- How do cores manage to find the time and resources to be able to tackle new technologies? How do they avoid the situation that the first person to do some new type of project has to pay for all the development time? Perhaps more questions were raised than answered. One suggestion was that the first pilot project gets a discount on the sequencing and the project then becomes a collaborative project resulting in a publication. There was some discussion of the point at which you stop treating a project as a one-off and start treating it as part of a pipeline.

Breakout groups:
We broke into three breakout groups based on what the people in the room were interested in discussing.

Pipelines – Nextflow was quite popular, but the group discussed how transitioning to a workflow tool requires some time and effort. You must also evaluate how the new approach compares to what you have done before.
Cost recovery models and management – basically it’s challenging. Some cores use a combination of techniques or approaches. Some charge by the hour, but it’s not really fair to clients who are first to try a new type of project. Different analysts also may take different amounts of time on the same type of project. There is also a fixed cost model for a particular type of project, but there is a risk that you estimate the cost wrong. Some people estimate a minimum number of hours required and if the project goes over, then an additional discussion is needed with the collaborator.
Managing big data – What data actually DOES need to be saved, and what data needs to be made FAIR? What data can you upload to repositories for longer term storage? In some groups, every project actually has a data management plan, but the compliance with that plan is not always there. iRODS was mentioned as a solution, but for one group it wasn’t an option for other reasons, so they went with a second choice, which was more DIY. Starfish was mentioned as a commercial solution. Some places have a data steward, which is a role helping guide people through what metadata they need, what they should store where (locally or cloud) and for how long, etc.

BioVis

We started the day with a keynote from Melanie Tory (The Roux Institute, Northeastern University). She gave an engaging and insightful talk titled “When Visualization Meets AI: Exploring Opportunities.” After the keynote, Zeynep H. Gümüş (Icahn School of Medicine at Mount Sinai) presented work titled “PRIMAVO: Precision Immune Monitoring Assay Visualization Online,” for which she later received the Best Abstract Award. The first session was then closed by Eric Mörth (Harvard Medical School), who presented a talk on mixed reality titled “The Best of Both Worlds: Blending Mixed Reality and 2D Displays in a Hybrid Approach for Visual Analysis of 3D Tissue Maps.”

After lunch, we had a series of seven abstract talks. The session was opened by Sehi L'Yi (Harvard Medical School), who presented a talk "Understanding Visualization Authoring for Genomics Data through User Interviews." After that, Suzanne Paley (SRI International) introduced "New BioCyc Visualization Tools for Genome Exploration and Comparison." Yannis Nevers (Université de Lausanne) discussed "Matreex: Compact and Interactive Visualization of Large Gene Families" and Hiruna Samarakoon (University of New South Wales) showcased the "Interactive Visualization of Raw Nanopore Signal Data with Squigualiser." Towards the end of the session, Devin Lange (University of Utah) gave two presentations: "Aggregate Annotated Single-Cell Heatmap Visualizations" and "Aardvark: Composite Visualizations of Trees, Time-Series, and Images." Finally, Komlan Atitey (National Institutes of Health) wrapped up the session with "Boosting Data Interpretation with GIBOOST to Enhance Visualization of High-Dimensional Data."

In the last session, we enjoyed an online proceedings presentation by Xiaocheng Zeng from Tsinghua University, titled "Unveil Cis-acting Combinatorial mRNA Motifs by Interpreting Deep Neural Network." Last but not least, Fritz Lekschas gave his keynote talk on “The Insight's in the Details: Challenges and Opportunities for BioVis Software Tools”. We then wrapped up the day with a closing ceremony where the Best Abstract Award was presented to Osho Rawal, Edgar Gonzalez-Kozlova, Sacha Gnjatic, and Zeynep H. Gümüş for their work on PRIMAVO.

iRNA

Many iRNA participants had dinner at Restaurant Les Soeurs Grises where, in addition to sampling Québécois cuisine, they were also subjected to our traditional iRNA quiz measuring their depth (or lack) of knowledge of Montreal. The second day of iRNA started with a keynote by Chris Burge from MIT, who presented the KATMAP algorithm, an approach that uses in vitro binding data and knockdown/RNA-seq to derive splicing activity maps of RNA binding proteins to enable advanced interpretation of RNA splicing. We then had a varied set of talks covering diverse organisms, RNA foundational models and their uses, RNA design, and the prediction of different RNA features and regulation, including translation initiation, protein-RNA binding, the extent and conservation of translational efficiency covariation, polyadenylation, G4 RNA presence, and RNA structure. We finished the day with our fourth keynote talk by Jérôme Waldispühl from McGill University, who presented a data-driven structural virtual screening pipeline which uses an augmented classification of RNA base pairs combined with graph machine learning methods to identify promising candidate molecule binding.

NetBio

We are excited to share some highlights from today’s NetBio afternoon session at ISMB 2024! The session kicked off with an inspiring keynote by Sergio Baranzini on "Towards Semantic Representation and Causal Inference in Biomedicine: Challenges and Applications."

We had one proceedings talk (doi: 10.1093/bioinformatics/btae250) and three selected talks covering various network biology methods and applications, including modelling metastatic progression, network inference for neuropsychiatric disorders, single-cell based network inference, and community detection.

Thanks to all the speakers and attendees for the great discussions. We are looking forward to a full day tomorrow, including the first part of our poster session. Stay tuned!

RegSys

Day 1, July 13, 2024:

We launched RegSys 2024 with a Keynote presented by Carl de Boer from The University of British Columbia where he shared his insights on “Continual improvement of cis-regulatory models”, in particular how a DREAM challenge has helped determine the best cis regulatory modeling choices. His team also developed an API to enable continuous benchmarking of models and tasks. One important note was made on how to avoid data leakage caused by sequence similarities across the genomes: sometimes our models are not learning, they are memorizing data. We then had two selected talks by Peter Koo and Melanie Weilert where they presented tools to interpret large scale deep learning models to identify regulatory mechanisms. Peter showed his lab’s work on tools focused on solving this problem, and Melanie showed how these tools can be applied to learn novel biological insights. We closed the morning session with five amazing flash talks representing the high-level poster session put together by the RegSys Program committee.
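To make the data leakage point concrete, here is a minimal sketch of one generic way to keep near-duplicate sequences on the same side of a train/test split. It is not the method presented in the keynote: the k-mer Jaccard similarity, the 0.5 threshold, the single-linkage clustering, and all function names are assumptions chosen for illustration. Real workflows typically rely on alignment-based homology detection or on holding out whole chromosomes.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(seqs, k=4, threshold=0.5):
    """Single-linkage clustering: link any pair at or above the threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def split_by_cluster(seqs, test_fraction=0.4, k=4, threshold=0.5):
    """Assign whole clusters to train or test so similar sequences never straddle the split."""
    clusters = sorted(cluster_by_similarity(seqs, k, threshold), key=len)
    target = test_fraction * len(seqs)
    train, test = [], []
    for members in clusters:
        # Fill the test set with the smallest clusters first, up to the target size.
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical toy sequences: two near-duplicate pairs plus one unrelated sequence.
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "GATTACAGAT"]
    train_idx, test_idx = split_by_cluster(toy)
    print("train:", train_idx, "test:", test_idx)
```

A purely random split of the toy sequences could place one member of a near-duplicate pair in training and the other in testing, which rewards memorization; splitting by cluster removes that shortcut, at the cost of a test set whose size only approximates the requested fraction.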

The second session of the day was opened by a keynote by Jian Ma from Carnegie Mellon University, presenting his lab’s work on how 3D features modulate gene function (structure/function), and how we can use 3D data to analyze single-cell Hi-C data and separate sub cell types. We closed this session with two selected talks and one proceedings talk by Bill Noble, Gabriel Dolsten and Ghulam Murtaza, respectively. They presented methods to analyze bulk and single-cell Hi-C data, showing how low-coverage Hi-C data can be compensated and the application of loop identification to characterize differences between closely related cell types.

Finally, the last session of the day was kicked off with two selected talks by Emily Maciejewski and Mirae Kim, where both presenters showed their work on detection of methylation sites across species (CMImpute) and across tissues (UBERON plus Minipatch), highlighting the relevance of genome annotation and data curation. We closed this session with a Keynote by Michael Hoffman from Princess Margaret Cancer Centre, presenting strategies to predict transcription factor binding from expression, cross species conservation, open chromatin and motif data. This Virtual ChIP-seq work is highly relevant, as it is an impossible task to generate all ChIP-seq data for all tissues and cell types.

The Success Circle event this evening was a hit! This unique take on traditional thought-leader sessions, designed to foster meaningful connections and facilitate knowledge sharing among attendees, was an inspiring evening for all involved!

Bioinformatics.ca Celebrates 25 Years!

On Saturday night, Bioinformatics.ca celebrated 25 years of hosting workshops and building up bioinformatics across Canada with a reception allowing members, new and old, to celebrate together at ISMB 2024! Congratulations on 25 great years, Bioinformatics.ca. Cheers to 25 more!

Coming Up Tomorrow, Monday, July 15

7:30am – Serene Stretch Symposium
8:40am – Welcome and Day 4 Distinguished Keynote: Dr. Martin Steinegger
10:00am and 4:00pm – Caffeinate and Connect with Exhibitors
12:20pm – Poster Session with Lunch
Sessions starting at 10:40am:
- 3DSIG
- BOSC
- CAMDA
- Equity and Diversity in Computational Biology Research (ends at 12:20pm)
- EvolCompGen
- General Computational Biology
- MLCSB
- NetBio
- Tech Track (ends at 4:00pm)
- VarI
- WEB (ends at 4:00pm)
Sessions starting at 4:40pm:
- TransMed
6:15pm – ISCB Town Hall