Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide



COSI Track Presentations

Schedule subject to change
Monday, July 22nd
10:20 AM-10:30 AM
Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned
Room: Shanghai 1/2 (Ground Floor)
  • Yang Fann, National Institutes of Health, United States

Recently, machine learning (ML) and artificial intelligence (AI) have come to dominate data science research, including bioinformatics. To meet the growing demand from investigators who want to adopt these technologies, the traditional bioinformatics core needs to adapt and add new skill sets and services. Although bioinformatics cores in different countries are set up to fit their own institutional environments and research needs, supporting these services requires common, fundamental changes in staffing and infrastructure, such as adding data analysts, informatics scientists, and algorithm and application developers. This short talk will share our experience and lessons learned from the challenges of transitioning our bioinformatics core to provide ML/AI services to the diverse neuroscience investigators at NIH in the United States.

10:30 AM-10:40 AM
Supporting single cell RNA-seq analysis: A Core's Perspective
Room: Shanghai 1/2 (Ground Floor)
  • Shannan Ho Sui, Harvard School of Public Health, United States

Recent advances in single cell transcriptomics make it possible to examine the gene expression profiles of thousands of individual cells, providing unprecedented insights into tissue heterogeneity, development and pathogenesis. Since 2015, the Harvard Chan Bioinformatics Core (http://bioinformatics.sph.harvard.edu) has worked closely with the Harvard Medical School (HMS) Single Cell Core (https://iccb.med.harvard.edu/single-cell-core) to standardize data analysis for the InDrop droplet barcoding system and to address demand for single cell analyses within the Harvard community. Here we describe our approach to building single cell analytical expertise and infrastructure through our partnership with the Single Cell Core and multiple research labs. We outline the challenges we faced and our current best practices for data analysis. Our pipeline, implemented within the bcbio-nextgen framework (https://bcbio-nextgen.readthedocs.io/), handles multiple UMI schemes to accommodate different single cell technologies (e.g. Drop-seq, Seq-well, Bio-Rad ddSeq, 10X). We also describe our approach to managing single cell projects, with their long and iterative analysis timelines, increased complexity, and requirement for rigorous experimental design, data management, computing infrastructure and methods evaluation. Because of these factors, we have expanded our bioinformatics training program to include single cell RNA-seq. With this program, we hope to develop analysis expertise within the community and an understanding of the methods and intricacies inherent to the technology - ultimately leading to better designed and more successful single cell RNA-seq experiments.

10:40 AM-10:50 AM
Conda and Bioconda, the best thing since sliced bread
Room: Shanghai 1/2 (Ground Floor)
  • Devon Ryan, MPI-IE, Germany

Two of the primary challenges faced by bioinformaticians are (1) installing software and (2) creating environments to facilitate reproducible research. The Bioconda project uses the Conda package and environment management system to deliver thousands of up-to-date bioinformatics packages, with full dependency resolution, that can be easily installed in isolated environments. Bioconda also automatically creates Docker containers for all packages, to facilitate analyses on cloud-based platforms or in other environments where standard package installation is avoided. We give a brief overview of Conda and Bioconda and their relationship to other projects, such as Biocontainers and Bioconductor, and show how they can be used to easily obtain and distribute packages in isolated, reproducible environments.

10:50 AM-11:00 AM
Improving project management and tracking with Asana and Toggl
Room: Shanghai 1/2 (Ground Floor)
  • Sara Brin Rosenthal, UCSD Center for Computational Biology and Bioinformatics, United States

Computational biology and bioinformatics cores encounter some specific challenges in managing projects and working with clients. We discuss these challenges and present some ideas for solving them using project management tools including Asana and Toggl. We present an example use case and workflow for how a project progresses through our pipeline from initial consultation to final manuscript or submitted grant.

11:00 AM-11:10 AM
Bioinformatics training (in the context of a core)
Room: Shanghai 1/2 (Ground Floor)
  • Radhika Khetani, Harvard School of Public Health, United States

Wet-lab biologists who begin using next-generation sequencing (NGS) as a tool face 3 major challenges:
(1) Communicating with collaborators/cores about their experiment and analysis results, including understanding how and why certain computational methods were employed.
(2) Designing an experiment that has the necessary power to yield accurate results.
(3) Finding training that covers (1) and (2) above and helps them become independent in analyzing their own data.

The training program at our core provides workshops to address these challenges for our clients, as well as the larger community our core serves. For the last 4.5 years we have been providing hands-on workshops on basic topics (R, shell, high-performance computing) and advanced topics (bulk RNA-seq, ChIP-seq, single-cell RNA-seq, variant calling) with a focus on experimental design, analysis best practices, data management, and basic data skills.

During this talk, we will discuss how we continue to fund this program that provides training at subsidized rates, how we structure these workshops, and how we try to ensure that students/community have access to current training resources given the ever-changing data analysis best practices.

11:10 AM-11:20 AM
Development of bioinformatics workshop by a core facility
Room: Shanghai 1/2 (Ground Floor)
  • Alberto Riva, Bioinformatics Core, ICBR, University of Florida, United States

The ICBR Bioinformatics Core at the University of Florida is increasingly being asked to provide bioinformatics training to the very large and diverse community of researchers it serves. Organizing an 8-hour Bioinformatics 101 workshop presented us with several challenges. First, the workshop was aimed at students and postdocs who had a very wide range of interests and computational skills; we therefore had to identify topics of sufficiently general interest that could be approached in a few hours even by attendees with no preexisting experience with computational tools. Second, although the goal of the workshop was to provide an introduction to bioinformatics and not to turn attendees into independent bioinformaticians, we aimed to give attendees hands-on experience working in a cluster environment. Finally, organizing and delivering a free workshop conflicted with our core's operating model, which is based on chargeback.

We will describe how, by partnering with the UF Health Cancer Center, the Health Science Library, and UF Research Computing, we leveraged resources available in the UF environment to make this workshop possible and successful. We believe our experience represents a viable model for similar endeavors at UF and elsewhere.

11:20 AM-11:55 AM
Small Group Discussions - BIOINFO-CORE
Room: Shanghai 1/2 (Ground Floor)
  • Everyone

Unconference-style small group breakout sessions

11:55 AM-12:20 PM
Small Group Reports - BIOINFO-CORE
Room: Shanghai 1/2 (Ground Floor)
  • Everyone

Small groups report their findings back to the larger group

12:20 PM-12:35 PM
nf-core - A community effort to collect a curated set of pipelines built using Nextflow
Room: Shanghai 1/2 (Ground Floor)
  • Harshil Patel, Bioinformatics and Biostatistics, The Francis Crick Institute, United Kingdom

The standardization, portability, and reproducibility of analysis pipelines are well-known problems within the bioinformatics community. Most pipelines are designed for on-premises execution, and the associated software dependencies are tightly coupled with the local compute environment. This leads to poor pipeline portability and poor reproducibility of the ensuing results - both of which are fundamental requirements for the validation of scientific findings. Here, we introduce nf-core: a framework that provides a community-driven, peer-reviewed platform for the development of best practice analysis pipelines written in Nextflow. Key obstacles in pipeline development such as portability, reproducibility, scalability and unified parallelism are inherently addressed by all nf-core pipelines. We are also continually developing a suite of tools that assist in the creation and development of both new and existing pipelines. Our primary goal is to provide a platform for high-quality, reproducible bioinformatics pipelines that can be utilized across various institutions and research facilities.