Schedule subject to change
All times listed are in EDT
Sunday, July 14th
10:40-11:15
Building a Future-Proof Resource: The Comprehensive Modernization of TAIR
Confirmed Presenter: Swapnil Sawant, Phoenix Bioinformatics, United States

Room: 525
Format: Live Stream

Moderator(s): Madelaine Gogol


Authors List:

  • Swapnil Sawant, Phoenix Bioinformatics, United States
  • Trilok Prithvi, Phoenix Bioinformatics, United States
  • Tanya Z. Berardini, Phoenix Bioinformatics, United States
  • Leonore Reiser, Phoenix Bioinformatics, United States
  • Xingguo Chen, Phoenix Bioinformatics, United States
  • Kartik Khosa, Phoenix Bioinformatics, United States

Presentation Overview:

Since its inception in 1999, The Arabidopsis Information Resource (TAIR) has been an essential global resource for plant biologists, curating gene function data for Arabidopsis thaliana from peer-reviewed literature into a "gold standard" annotation set. Our recent modernization, the first major infrastructure enhancement since 2011, aims to improve TAIR's performance, scalability, and accessibility.

The legacy system, built on a monolithic architecture with large servers, had become costly and inefficient, leading to technical debt and frequent feature breakdowns. To address these issues and keep pace with technological advances, we've modernized TAIR's infrastructure by developing a robust data pipeline, adopting Docker for deployment, and transitioning to a microservices architecture. This includes using lightweight servers, AWS S3 for scalable storage, and Apache Solr for fast search capabilities.
These changes have significantly boosted performance, reducing load times by up to 50% and search times by 70%. This shift not only enhances user experience but also cuts costs by moving away from expensive, large-scale servers, solidifying TAIR's position as a cutting-edge platform for the foreseeable future.
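As a rough illustration of the search layer described above, the sketch below queries an Apache Solr core through its standard /select endpoint from Python. The host, core name, field, and example locus are hypothetical placeholders and not details of TAIR's actual deployment.

#!/usr/bin/env python3
"""Hypothetical sketch of querying an Apache Solr core over HTTP.
The host, core name ("genes"), and field are placeholders, not TAIR's real setup."""
import requests

SOLR_URL = "http://localhost:8983/solr/genes/select"  # assumed local Solr core

def search_genes(term: str, rows: int = 10) -> list:
    # q is the Solr query string; wt=json asks Solr to return a JSON response.
    params = {"q": f"symbol:{term}", "rows": rows, "wt": "json"}
    resp = requests.get(SOLR_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

if __name__ == "__main__":
    for doc in search_genes("AT1G01010"):
        print(doc)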

Our presentation will detail the technical challenges and solutions of this upgrade, the integration of modern tools, and our strategies for virtual team collaboration, which has been crucial in this transition. The modernization of TAIR serves as a model for other organizations looking to update their digital resources, demonstrating how leveraging current technology can enhance efficiency, reduce expenses, and improve service delivery.

11:15-12:20
Panel: AI and LLMs in cores: How are we using them now?
Room: 525
Format: In Person

Moderator(s): Madelaine Gogol


Authors List:

Presentation Overview:

A panel discussion around the adoption and use of LLMs and AI within bioinformatics core facilities or similar settings. What works, what doesn't? Practical advice and best practices.

14:20-14:50
Invited Presentation: Streamlining Bioinformatics Pipelines with Nextflow: A Scalable, Portable, Reproducible, and Collaborative Solution.
Confirmed Presenter: Francesco Lescai

Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

  • Francesco Lescai

Presentation Overview:

Bioinformatics cores face significant challenges when running pipelines across multiple computing environments, including portability, scalability, and reproducibility issues. Nextflow, an open-source workflow manager, offers a comprehensive solution to these pain points. This talk will present the key features and advantages of Nextflow, showcasing its ability to ease the deployment and execution of bioinformatics pipelines across diverse environments, from local clusters to cloud and high-performance computing infrastructures. By leveraging Nextflow's containerisation, cloud-native design, and automated resource management, researchers and core facilities can ensure seamless pipeline execution, reduce costs, and increase collaboration. Nextflow's capabilities for automatic report generation and quality control enable the tracking of pipeline performance and data quality, ensuring that results are reliable and reproducible. Additionally, Nextflow's flexibility and customization options allow it to respond to the needs of a diverse range of stakeholders. Further value is provided by the nf-core community, which adds standardisation and best-practice pipelines on top of Nextflow's capabilities, as well as key accessibility features, allowing workflows to be easily executed by both experts and beginners.
We will demonstrate how Nextflow's adoption can overcome common challenges, such as environment limitations, data management, and resource allocation, ultimately accelerating scientific discovery and improving analysis services.
By providing a unified and transparent way to manage complex bioinformatics workflows, Nextflow enables bioinformatics cores to focus on their mission: delivering high-quality results and advancing scientific knowledge.
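For readers who have not used Nextflow, the sketch below shows how a core facility might wrap a standard nf-core pipeline launch in a small Python helper so that every run uses the same profile, input, and output conventions. The pipeline choice, file names, and wrapper itself are illustrative assumptions, not part of the talk.

#!/usr/bin/env python3
"""Illustrative launcher for an nf-core pipeline run via Nextflow.
Pipeline, file names, and profile are assumptions for the example."""
import subprocess
import sys

def run_rnaseq(samplesheet: str, outdir: str, profile: str = "docker") -> int:
    # -profile selects the execution environment (docker, singularity, or a
    # cluster profile); --input/--outdir are standard nf-core parameters;
    # -resume reuses cached task results when a run is restarted.
    cmd = [
        "nextflow", "run", "nf-core/rnaseq",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "-resume",
    ]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    sys.exit(run_rnaseq("samplesheet.csv", "results"))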

14:50-15:10
Leveraging the NF-Core Framework for sharable institutional Nextflow modules at Memorial Sloan Kettering Cancer Center
Confirmed Presenter: Nikhil Kumar, Memorial Sloan Kettering Cancer Center, United States

Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

  • Nikhil Kumar, Memorial Sloan Kettering Cancer Center, United States
  • Eric Buehler, Memorial Sloan Kettering Cancer Center, United States
  • Yu Hu, Memorial Sloan Kettering Cancer Center, United States
  • Anne Noronha, Memorial Sloan Kettering Cancer Center, United States
  • Rashmi Naidu, Memorial Sloan Kettering Cancer Center, United States
  • Yixiao Gong, Memorial Sloan Kettering Cancer Center, United States
  • John Orgera, Memorial Sloan Kettering Cancer Center, United States
  • Adam Price, Memorial Sloan Kettering Cancer Center, United States
  • Christopher Bolipata, Memorial Sloan Kettering Cancer Center, United States
  • Michael Berger, Memorial Sloan Kettering Cancer Center, United States
  • Aijazuddin Syed, Memorial Sloan Kettering Cancer Center, United States
  • Mark Donoghue, Memorial Sloan Kettering Cancer Center, United States
  • Ronak Shah, Memorial Sloan Kettering Cancer Center, United States

Presentation Overview:

Four 10-minute talks about different aspects of pipelines within the context of core facilities or related settings:

Nikhil Kumar (nikhilkumar516@gmail.com) - Leveraging the NF-Core Framework for sharable institutional Nextflow modules at Memorial Sloan Kettering Cancer Center.

Dena Leshkowitz (dena.leshkowitz@weizmann.ac.il): UTAP2: User-friendly Transcriptome and Epigenome Analysis Pipeline

Grace Pigeau (gpigeau@oicr.on.ca): Managing Big Data in a High-Throughput Genomics Pipeline

George Bell (gbell@wi.mit.edu): Novel Linux-style code helps us all down the road

UTAP2: User-friendly Transcriptome and Epigenome Analysis Pipeline
Confirmed Presenter: Dena Leshkowitz, Weizmann Institute of Science, Israel

Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

  • Dena Leshkowitz, Weizmann Institute of Science, Israel
  • Bareket Dassa, Weizmann Institute of Science, Israel
  • Jaime Prilusky, Weizmann Institute of Science, Israel
  • Noa Wigoda, Weizmann Institute of Science, Israel
  • Gil Stelzer, Weizmann Institute of Science, Israel
  • Jordana Lindner, Weizmann Institute of Science, Israel

Presentation Overview:

UTAP2 empowers researchers to unlock the mysteries of gene expression and epigenetic modifications with ease. This user-friendly, open-source pipeline, built by unit programmers and deep sequencing analysts, streamlines transcriptome and epigenome data analysis, handling everything from raw sequences to gene or peak counts and the annotation of differentially expressed genes or genomic regions. Results are delivered in organized folders and rich reports packed with plots, tables, and links for effortless interpretation. Since the debut of the original UTAP version [1] in 2019, it has been used for thousands of runs at the Weizmann Institute and has accumulated 118 citations, highlighting its scientific contribution.
UTAP2 is available to the broader biomedical research community as open-source software. With a single image, it can be easily installed on both local servers and cloud platforms, allowing users to leverage parallel cluster resources (detailed information and installation instructions are provided on our documentation site: https://utap2.readthedocs.io/en/latest/).
Reference:
1. "UTAP: User-friendly Transcriptome Analysis Pipeline", BMC Bioinformatics 2019, 20(1):154 (PMID: 30909881)

15:10-15:30
Managing Big Data in a High-Throughput Genomics Pipeline
Confirmed Presenter: Grace Pigeau, Ontario Institute for Cancer Research, Canada

Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

  • Grace Pigeau, Ontario Institute for Cancer Research, Canada
  • Heather Armstrong, Ontario Institute for Cancer Research, Canada
  • Michael Laszloffy, Ontario Institute for Cancer Research, Canada
  • Dillan Cooke, Ontario Institute for Cancer Research, Canada
  • Alexander Fortuna, Ontario Institute for Cancer Research, Canada
  • Alexis Varsava, Ontario Institute for Cancer Research, Canada
  • Ally Wu, Ontario Institute for Cancer Research, Canada
  • Beatriz Lujan Toro, Ontario Institute for Cancer Research, Canada
  • Bernard Lam, Ontario Institute for Cancer Research, Canada
  • Jessica Miller, Ontario Institute for Cancer Research, Canada
  • Xuemei Luo, Ontario Institute for Cancer Research, Canada
  • Ryan Falkenberg, Ontario Institute for Cancer Research, Canada
  • Morgan Taschuk, Ontario Institute for Cancer Research, Canada
  • Lawrence Heisler, Ontario Institute for Cancer Research, Canada

Presentation Overview:

The Genome Sequence Informatics (GSI) team at OICR handled the analysis and processing of over 1.2 petabytes of data in 2023. The resources required to store such large amounts of data are expensive and difficult to manage. Additionally, data processing demands will increase significantly in 2024 with the onboarding of two new sequencers and the migration to larger-capacity flow cells. To manage this increased data output, we are implementing changes to data tracking and more aggressive data removal.
Assays available through the genomics core consist of a distinct set of samples, defined as a case, which are analyzed together, producing a variety of deliverable files and reports. Once all work on a case is complete, the associated data can be scheduled for deletion. However, data from cases that use our clinical reporting assays must be retained for two years. To accommodate this, data is automatically backed up over multiple stages to cloud storage before being deleted. First, cases that are complete and ready for archiving are identified by an automated pipeline operations system. The raw sequence data and any files directly used by the clinical report are encrypted and automatically backed up to a file storage web service. Archive status and metadata are tracked in a local database. If needed, archive retrieval is straightforward to initiate, and the files are recalled for reload into the production pipeline. This allows the team to meet accreditation requirements and ensure data integrity without requiring continually increasing storage capacity.
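As a loose sketch of the archiving pattern described above (not OICR's actual system), the Python below uploads a case's deliverable files to cloud object storage with server-side encryption and records archive status in a local database. The bucket name, table schema, and the choice of S3/boto3 and SQLite are assumptions.

#!/usr/bin/env python3
"""Illustrative case-archiving sketch: upload files to object storage and
track archive status locally. Names and storage choices are assumptions."""
import sqlite3
from pathlib import Path

import boto3

BUCKET = "example-case-archive"  # hypothetical bucket
DB_PATH = "archive_status.db"    # hypothetical local tracking database

def archive_case(case_id: str, files: list) -> None:
    s3 = boto3.client("s3")
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archives (case_id TEXT, key TEXT, status TEXT)"
    )
    for path in map(Path, files):
        key = f"{case_id}/{path.name}"
        # Request server-side encryption on upload; the abstract does not
        # specify the actual encryption scheme used in production.
        s3.upload_file(str(path), BUCKET, key,
                       ExtraArgs={"ServerSideEncryption": "AES256"})
        conn.execute("INSERT INTO archives VALUES (?, ?, ?)",
                     (case_id, key, "archived"))
    conn.commit()
    conn.close()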

Novel Linux-style code helps us all down the road
Confirmed Presenter: George Bell, Whitehead Institute, United States

Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

  • George Bell, Whitehead Institute, United States
  • Bingbing Yuan, Whitehead Institute, United States
  • Troy Whitfield, Whitehead Institute, United States
  • M Inmaculada Barrasa, Whitehead Institute, United States
  • Xinlei Gao, Whitehead Institute, United States

Presentation Overview:

Python, Matlab, and especially R --
all have code bases that can help you go far.
But for biologists who can't program,
asking them to try can lead to, "No way, ma'am!"
In contrast, when sending a biologist to the command line,
they typically respond, "Sure -- that'd be fine!"
As a result, getting R and python packages into scripts
doesn't cause any coding conflicts.
Typing the command provides the syntax,
so then you'll know all the practical facts.
We can recommend libraries like edgeR and DESeq2,
and give everyone great analytic methods to pursue.
And specialized figures like UpSet, Sankey, and waterfall,
can be easily created, even in places like Montreal.
We provide input, output and sample commands
which are accessible by all -- no one misunderstands.
So try out the scripts on our web site,
It can increase your efficiency to a new height.
https://github.com/whitehead/barc
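As a minimal sketch of the command-line-wrapper pattern the poem describes, the Python below exposes a DESeq2 analysis as a simple command with self-documenting syntax. The script name, arguments, and R entry point are hypothetical, not the BaRC scripts themselves.

#!/usr/bin/env python3
"""Hypothetical command-line wrapper around a DESeq2 analysis script."""
import argparse
import subprocess

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Run a DESeq2 differential expression analysis from the command line."
    )
    parser.add_argument("counts", help="tab-delimited gene counts matrix")
    parser.add_argument("design", help="sample design/metadata table")
    parser.add_argument("output", help="output file for differential expression results")
    args = parser.parse_args()

    # Running the command with -h prints the syntax above; the statistics are
    # delegated to an R script (assumed name) that wraps DESeq2.
    subprocess.run(
        ["Rscript", "run_deseq2.R", args.counts, args.design, args.output],
        check=True,
    )

if __name__ == "__main__":
    main()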

15:30-16:00
Panel: New technologies in cores: single-cell, spatial, etc.
Room: 525
Format: In Person

Moderator(s): Alberto Riva


Authors List:

Presentation Overview:

How do cores find the time and resources for development on new technologies? Lessons learned and useful tools we have found when managing these new approaches.

16:40-17:10
Short Talks, Various topics
Room: 525
Format: In Person

Moderator(s): Shannan Ho Sui


Authors List:

  • Various presenters

Presentation Overview:

Patricia Carvajal-López (pati@ebi.ac.uk) - Competency framework profiles to reflect career progression within bioinformatics core facility scientists

Michael Laszloffy (MLaszloffy@oicr.on.ca) - Dimsum: a dashboard for quality control, project tracking, turnaround time reporting and more

Aliye Hashemi (ahashem@gmu.edu) - Protein Classification Using Delaunay Tessellation

17:10-18:00
Panel: Breakout Groups
Room: 525
Format: In Person

Moderator(s): Shannan Ho Sui


Authors List:

Presentation Overview:

Attendees will form breakout groups to discuss topics of interest from the day, along with other suggested topics, in order to share knowledge and ideas between cores and make connections with others in similar roles. Groups will then report their findings back to the larger group.