Description
Developing software or contributing to package development often seems out of reach for those starting as package users, especially in the bioinformatics field, where this journey typically begins. Furthermore, the transition from user to developer predominantly occurs in the Global North community. This workshop aims to bridge this gap by bringing this knowledge to the Latin American community, fostering more open-source software developers in bioinformatics.
Learning Objectives
Master good coding practices in R.
Transform functions into an R package.
Understand the structure of an R package.
Create documentation following best practices.
Learn the submission process to rOpenSci.
Intended Audience and Level
This tutorial is aimed at participants with prior experience and advanced knowledge of R, particularly those interested in bioinformatics and software development.
Description
This tutorial introduces students to methods for studying biological systems using various OMIC data types, employing a multi-omics approach. The goal is to identify potential interactions or correlations between entities across different OMIC layers. For instance, a given transcript (transcriptomic layer) can be correlated with multiple proteins (proteomic layer). The methods explored in this tutorial are chosen based on the type of biological questions addressed and the available data types. These methods allow visualization, exploration, and summarization of relevant relationships among OMIC datasets.
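As a rough illustration of the cross-layer correlation idea described above, the following minimal sketch computes the Pearson correlation between one transcript and several proteins measured in the same samples. It is written in Python purely for illustration (the course material itself is in R), and the abundance values and the names `protein_A`, `protein_B`, and `protein_C` are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical abundance values for five samples (illustrative only).
transcript = [1.0, 2.0, 3.0, 4.0, 5.0]
proteins = {
    "protein_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the transcript
    "protein_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the transcript rises
    "protein_C": [3.0, 1.0, 4.0, 1.0, 5.0],   # weaker relationship
}

# Correlate the single transcript with every protein in the other layer.
correlations = {name: pearson(transcript, values) for name, values in proteins.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

Real analyses would involve thousands of features per layer, multiple-testing correction, and often more robust association measures. This sketch only shows the basic one-to-many comparison.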
The need for OMIC integration approaches is growing as more research projects simultaneously measure different types of high-dimensional data. Integrative analysis methods are crucial for handling this complexity, which in turn allows researchers to answer more complex questions and gain a more comprehensive understanding of biological systems.
In this training, we will first present the fundamental data pre-processing concepts and their impact on integration, including missing data and data distribution. Then, we will explore various multi-omic integration approaches, including:
Data-driven, using existing functional annotations such as biological pathways.
Dimensionality reduction, for example Multi-Omics Factor Analysis (MOFA) and Joint and Individual Variation Explained (JIVE).
Network theory, for example Weighted Gene Co-expression Network Analysis (WGCNA).
These approaches will result in different types of integration, including horizontal, parallel, and hierarchical integration, which depend mainly on the available OMICs. This course aims to equip participants with a robust understanding of current integration approaches, empowering them to address specific biological questions using available OMIC data. Upon completion, participants should be able to identify questions suitable for multi-omics integration and choose appropriate methods tailored to their scenarios. More importantly, they should be able to tackle new questions as competent decision-makers.
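The pre-processing concepts mentioned above (missing data and data distribution) can be sketched with a small example. The snippet below fills a missing value with the mean of the observed values and then standardizes the feature to z-scores. It is written in Python for illustration only (the course material is in R); mean imputation is just one simple option assumed here, not necessarily the method taught in the course, and the values are made up.

```python
from math import sqrt

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def zscore(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# One hypothetical feature measured in five samples, with one missing value.
raw = [10.0, None, 14.0, 12.0, 16.0]
imputed = impute_mean(raw)
scaled = zscore(imputed)

print(imputed)
print([round(v, 2) for v in scaled])
```

Scaling each OMIC layer in this way puts features measured on very different scales on a comparable footing before integration. Note that the choice of imputation method can itself influence downstream results.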
We offer a multi-level tutorial suitable for trainees with both beginner and intermediate skills in OMIC data analysis. Each topic includes one or more exercises designed to deepen the understanding of critical concepts through hands-on experience on selected datasets. These hands-on exercises are designed to accommodate trainees with heterogeneous backgrounds (e.g., life sciences, computer sciences) and varying proficiency levels in OMIC data analysis. The material provided includes theoretical content and hands-on exercises with all necessary code, presented as R Markdown documents. All materials are hosted in GitLab repositories, which will be publicly available. Materials are in English, but the course can be given in English or Spanish.
Participants will benefit from the teaching team's experience as computational biologists at the Bioinformatics and Biostatistics Hub of Institut Pasteur (IP) in Paris, collaborating with IP’s wet lab scientists. This collaboration encompasses data analysis, method development, and bioinformatics/biostatistics training for the campus. In addition to courses for PhD students at Institut Pasteur and for French universities, the teaching team has extensive experience in international training activities across Latin America (Ecuador, Colombia, Peru), Africa (Tunisia), Asia (Vietnam), and European consortiums.
Drawing on its experience as lecturers and instructors, the teaching team believes that providing comprehensive training to students and researchers on the formal basis of bioinformatic and biostatistic methodologies is paramount for fostering successful collaboration between wet and dry lab environments. Indeed, regardless of the specific research subject in life sciences, there is a significant need for an advanced understanding of data analysis.
Learning Objectives
Become familiar with the principles, flavours and challenges of OMIC integration.
Understand fundamental data pre-processing concepts and their impact on integration, including missing data (data imputation) and data distribution (normalization and scaling).
Demystify the use of complex integrative methods through hands-on application of statistical approaches such as dimensionality reduction and network-based methods.
Develop an informed perspective on the key issues to make methodological choices according to the data available and the biological question(s).
Apply and integrate good practices for data analysis in R/RStudio.
Intended Audience and Level
This tutorial is open to all scientists (students, postdocs, and researchers) involved in projects comprising various types of OMICs datasets. No prior knowledge of integration methods for OMICs data is required. However, previous experience with any OMICs data analysis (transcriptomics, proteomics, epigenomics) is highly encouraged. Familiarity with R/RStudio is necessary, including understanding the RStudio environment, R syntax, and package installation, though coding expertise is not mandatory.