ISCBacademy 2021 Webinars




ISCBacademy is an online webinar series including the ISCB COSI, COVID webinars, Indigenous Voices and practical tutorials. We aim to inspire, connect, and communicate the science, providing hands-on experience in accessing and using newly developed bioinformatics tools while ensuring best practices for rigour and reproducibility.


  • January 14, 2021 - Approaching Indigenous communities on their own terms in microbiome research by Matthew Anderson, Ohio State University - Hosted by ISCB
  • January 26, 2021 - The ISCB Competency Framework: what is it and how does it support bioinformatics education and training? by Cath Brooksbank, EMBL-European Bioinformatics Institute - Hosted by ISCB
  • January 28, 2021 - SaGePhy: A phylogenetic simulation framework for gene and subgene evolution by Soumya Kundu, University of Connecticut - Hosted by EvolCompGen
  • February 18, 2021 - Responsibilities for the Stewardship of Indigenous Data in Open Science by Stephanie Russo Carroll, University of Arizona - Hosted by ISCB
  • March 17, 2021 - Recognising Indigenous Rights in Digital Sequence Information by Maui Hudson, University of Waikato - Hosted by ISCB
  • May 25, 2021 - The Evolution of the Data Sharing Culture in Structural Biology by Helen Berman, Rutgers University - Hosted by ISCB
  • June 10, 2021 - How did they get there? Genetic History of Native Americans in the Central Andes by Victor Borda, University of Maryland - Hosted by ISCB
  • June 28, 2021 - Early publication access and EMBL-EBI bio-molecular data tackle COVID-19 by Matt Pearce, Michael Parkin, EBI - Hosted by ISCB
  • September 7, 2021 - Protein Structure Prediction in a Post-AlphaFold2 World by Mohammed AlQuraishi, Columbia University - Hosted by 3D-SIG
  • September 9, 2021 - Bezos to Bottlenecks: The Chasm between Altruism & the Amerindigenous by Joseph Yracheta, Johns-Hopkins University - Hosted by ISCB
  • September 14, 2021 - Open Sourcing Ourselves - Together by Mad Price Ball, Open Humans - Hosted by BOSC
  • September 21, 2021 - Resolving and avoiding design conflicts in ontology development and deployment by Maria Keet, University of Cape Town - Hosted by Bio-Ontologies
  • October 5, 2021 - Alternative approach for discovering relationship between bacteriophages and antimicrobial resistance by Roumyana Yordanova, Hokkaido University and Bulgarian Academy of Sciences - Hosted by CAMDA
  • October 12, 2021 - Injecting Life into Visualizations for Biomedical Research by Marc Streit, Johannes Kepler University Linz - Hosted by BioVis
  • October 20, 2021 - SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms by Seán O’Donoghue, Andrea Schafferhans, Neblina Sikta - Hosted by ISCB
  • October 20, 2021 - Multi-Omic Data and Clinical Risk Factor Integration to Build Interpretable Predictive Models for Type 1 Diabetes by Bobbi-Jo Webb-Robertson, Pacific Northwest National Laboratory - Hosted by CompMS
  • October 22, 2021 - Metabolic modelling of microbial interactions in microbiomes by Aarthi Ravikrishnan, Karthik Raman, Dinesh Kumar - Hosted by ISCB
  • October 22, 2021 - Indigenous Pharmacogenomics and Implications for Personalized Medicine by Katrina Claw, University of Colorado - Hosted by ISCB
  • October 26, 2021 - Practicals in next-generation sequencing - Programming course in a generalist school can truly be fun, even in lockdown by Marie Sémon, ENS de Lyon - Hosted by Education
  • November 2, 2021 - Scalable inference of phylogenetic networks by Claudia Solis-Lemus, Wisconsin Institute for Discovery - Hosted by EvolCompGen
  • November 9, 2021 - Inferring functions of the essential genes for life by Mark Wass, University of Kent - Hosted by Function
  • November 23, 2021 - Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes by Kristopher Brannan, University of California at San Diego - Hosted by iRNA
  • November 26, 2021 - Elements of Style in Reproducible Workflow Creation and Maintenance: A Hands-on Tutorial by Anne Deslattes Mays, Christina Chatzipantsiou - Hosted by ISCB
  • November 29, 2021 - Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads by Kishwar Shafin, University of California Santa Cruz - Hosted by HiTSeq
  • November 30, 2021 - Longitudinal genome resolved metagenomics by Christopher Quince, Earlham Institute - Hosted by MICROBIOME
  • December 9, 2021 - DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of enhancers by Bernardo Almeida, Research Institute of Molecular Pathology - Hosted by MLCSB
  • December 14, 2021 - A Multi-Objective Genetic Algorithm to Find Active Modules in Multiplex Biological Networks by Elva Maria Novoa del Toro, Toxalim/INRAE - Hosted by NetBio

    Approaching Indigenous communities on their own terms in microbiome research
    by Matthew Anderson

    January 14, 2021

    Principles of individual consent and sample deidentification stand as pillars of modern biomedical research but are flawed with respect to certain populations. Indigenous peoples have historically been targeted by unethical practices that continue into the present even when best practices for conducting research with human subjects are followed. This has led some studies in American Indian/Alaskan Native (AI/AN) populations to include additional safeguards that are reinforced through these communities’ unique legal status as domestic dependent nations. Yet the use of microbiome datasets generally lacks restrictions on data sharing and other protections, because such data are perceived as unable to significantly affect public health or individual welfare, despite over a decade of work demonstrating the importance of microbial populations in human development, metabolism, and immunopathologies. Additionally, raw datasets can contain large proportions of human-derived reads that include information on the host and not just microbes. Current projects in partnership with the Cheyenne River Sioux Tribe serve as new models of community partnerships to address issues of sovereignty in human and non-human datasets.



    The ISCB Competency Framework: what is it and how does it support bioinformatics education and training?
    by Cath Brooksbank

    January 26, 2021

    Demand for the application of data science techniques to life science research is accompanied by an increased need for bioinformatics expertise across a broad range of professionals – from lab-based molecular life-scientists through computer scientists to software engineers; furthermore, the applications of data-driven biology are just as varied, encompassing fundamental life-science, medicine, agriculture and environmental science. Educating and training the individuals who choose career paths in this varied and fast-moving field is therefore challenging, and educators can struggle to keep up with the needs of employers.

    The ISCB competency framework was developed by the ISCB Education Committee in consultation with a global community of bioinformatics professionals to bridge this gap. It provides a minimum information standard defining the competencies required, and the levels they’re required at, for a range of roles that require bioinformatics expertise, and it provides a tool to support bioinformatics educators in developing courses and curricula that meet the needs of employers.

    In this webinar I will explain why the ISCB adopted a competency-based approach, describe the newly released version 3 of the framework, summarise how educators and trainers can use the framework to develop new learning interventions or update pre-existing ones, and outline how the ISCB is planning to support a competency-based approach to bioinformatics education and training in the future, both through continuing improvement of the framework and through initiatives to encourage the recognition of courses and curricula that make use of it.



    SaGePhy: A phylogenetic simulation framework for gene and subgene evolution
    by Soumya Kundu

    January 28, 2021

    SaGePhy (pronounced sage-phy) is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees, and subgene or (protein) domain trees using a probabilistic birth–death process that allows for gene and subgene duplication, horizontal gene and subgene transfer, and gene and subgene loss. SaGePhy implements a range of important features not generally found in other phylogenetic simulation frameworks; these include the ability to simulate (i) subgene or domain level events inside one or more gene families, (ii) both additive and replacing horizontal gene and subgene/domain transfers, (iii) distance-biased horizontal transfers, and (iv) probabilistic sampling of species tree and gene tree nodes, respectively, for gene- and domain-family birth. SaGePhy therefore makes it possible to perform more realistic simulation of gene and subgene/domain evolution.
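    To make the birth-death idea concrete, here is a minimal Python sketch (not SaGePhy's code or interface) that simulates gene copy number along a single species-tree branch under a linear birth-death process with a duplication rate and a loss rate. The function name and parameters are hypothetical, and horizontal transfer and subgene/domain events, which SaGePhy also models, are deliberately left out.

```python
import math
import random


def simulate_gene_copies(branch_length, dup_rate, loss_rate, start_copies=1, rng=None):
    """Simulate gene copy number along one branch with the Gillespie algorithm.

    Each copy independently duplicates at rate dup_rate and is lost at rate
    loss_rate (a linear birth-death process). Transfers are not modelled.
    Returns the number of copies surviving at the end of the branch.
    """
    rng = rng or random.Random()
    t = 0.0
    n = start_copies
    per_copy_rate = dup_rate + loss_rate
    while n > 0 and per_copy_rate > 0:
        # Waiting time to the next event among all n copies
        t += rng.expovariate(n * per_copy_rate)
        if t > branch_length:
            break
        # Choose the event type in proportion to its rate
        if rng.random() < dup_rate / per_copy_rate:
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


# Usage: the replicate mean should approach start_copies * exp((dup - loss) * t)
rng = random.Random(42)
runs = [simulate_gene_copies(1.0, 1.0, 0.5, rng=rng) for _ in range(20000)]
print(sum(runs) / len(runs), math.exp(0.5))
```

    Comparing the replicate mean against the analytical expectation is a quick sanity check that the event rates are wired correctly before layering on transfers or domain-level events.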



    Responsibilities for the Stewardship of Indigenous Data in Open Science
    by Stephanie Russo Carroll

    February 18, 2021

    As big data, open data, and open science advance to increase access to complex and large datasets for innovation, discovery, and decision-making, Indigenous Peoples’ rights to control and access their data within these data environments remain limited. Indigenous Data Sovereignty focuses on the protection of Indigenous rights and interests in the control and governance of Indigenous data. Indigenous data interests stretch across diverse disciplinary fields connecting community data governance ambitions with institutional and individual responsibilities in practice. Given this reach, a range of initiatives have been developed to strategically build new capabilities for strengthening control and governance of Indigenous data. These initiatives draw on a variety of methods and tactics across law, policy, ethics, and infrastructure. Applying these new tools and mechanisms in open science shifts Indigenous Peoples from invisibility within data ecosystems to vibrant contributors to open science.



    Recognising Indigenous Rights in Digital Sequence Information
    by Maui Hudson

    March 17, 2021

    Indigenous concerns about genomic research have been strongly articulated over the past few years, with accompanying suggestions about how to improve relationships with Indigenous communities and the practice of research. Discussions are now moving towards how Indigenous rights can be recognised in the context of Digital Sequence Information, including the recognition of provenance and the sharing of protocols and permissions through labelling systems like Local Contexts.



    The Evolution of the Data Sharing Culture in Structural Biology
    by Helen Berman

    May 25, 2021

    The Protein Data Bank was established 50 years ago in 1971. In this Webinar I will describe its evolution from a small repository to a large international data resource. The roles that the many stakeholders played in creating a data sharing culture and how science has benefited from that culture will be discussed.



    How did they get there? Genetic History of Native Americans in the Central Andes
    by Victor Borda

    June 10, 2021

    The Central Andes, which extend from southern Ecuador to southern Peru, were the homeland of civilizations that reached state-level societies in pre-Columbian times. The term “Central Andes” does not refer solely to the highland mountains but also includes the regions on both slopes, to the east (Amazon) and west (Pacific Coast). Cultural interactions involving these regions have been described over the last 5000 years. Here we describe genetic evidence that these cultural connections were accompanied by gene flow across the Andes, with Northern Peru as one of the main scenarios for these movements.



    Early publication access and EMBL-EBI bio-molecular data tackle COVID-19
    by Matt Pearce, Michael Parkin

    June 28, 2021

    The COVID-19 Data Portal (CDP) and Europe PMC’s full-text collection of COVID-19 preprints represent two efforts by EMBL-EBI to make data available to promote coronavirus research.

    The COVID-19 Data Portal (CDP) was launched in April 2020 to provide access to SARS-CoV-2 and COVID-19 biomolecular data in an accessible manner. The data portal is part of the European COVID-19 Data Platform, which is provided by EMBL's European Bioinformatics Institute (EMBL-EBI), ELIXIR, partners from the ReCoDID and VEO projects and the European Open Science Cloud. National portals complement covid19dataportal.org, together representing a broad international collaboration.

    In recognition of many researchers publishing their COVID-19 results rapidly via preprints during the pandemic, Europe PMC (https://europepmc.org/), an EMBL-EBI database for life science literature, launched a project in July 2020 to make the full text of COVID-19 preprints available for reading and reuse via a standard XML format. Preprints are linked to journal-published articles, open peer review materials, as well as underlying data in community databases, including PDBe, ENA, and many more. The full text corpus of COVID-19 preprints with an open access license or similar is made available for download via a public API and FTP site, enabling deeper analysis.



    Protein Structure Prediction in a Post-AlphaFold2 World
    by Mohammed AlQuraishi

    September 7, 2021

    AlphaFold2 burst on the life sciences stage in late 2020 with the remarkable claim that protein structure prediction has been solved. In this talk I will argue that in some fundamental sense the core scientific problem of static structure prediction is finished, but that further maturation is necessary before AlphaFold2 and similar systems can address biological questions beyond those of structure determination itself. I will outline some of these necessary developments and highlight one in particular: the prediction of structure from individual protein sequences. I will describe present challenges and opportunities, and our efforts to tackle them by combining advances in protein language modeling with end-to-end differentiable structure prediction, presenting new results on the prediction of orphan and de novo designed proteins. Time permitting, I will end by speculating on what abundant availability of structural information might mean for the future of biology.



    Bezos to Bottlenecks: The Chasm between Altruism & the Amerindigenous
    by Joseph Yracheta

    September 9, 2021

    Background and Aim:
    American Indians suffer from higher rates of several conditions, such as diabetes, chronic kidney disease and cardiovascular disease, and from disproportionate exposures to metals and/or other toxic environmental hazards. Indigenous people in the rest of the Americas (Latin Indigenous) and Polynesia show remarkable similarities despite not having a common ancestry. Exposure to colonization and its long-lasting systemic effects is, however, common to these groups. Gene-environment studies are key to creating interventions for these groups. This includes the internal environment of the cell and its myriad nucleic interactions in the cytosol, mitochondria, nucleus and virome.

    Conclusions:
    Few studies or institutions have explained the impact of multifactorial research or unique Amerindigenous Dynamic Architecture and Omic substructure to community decision makers. Nor have they tried in any meaningful way to bridge the disconnect between funders and implementers.
    Systemically biased socio-economic realities that negatively impact Indigenous communities are likely to be breached only by scientists, lawyers, ethicists and public relations experts from Indigenous communities. Successful research can only be achieved by creating a trustworthy system, not by creating trusting participants. Increasing the numbers of trained professionals in and around the research endeavor is the only way to account for and respond to the historic mistrust of Indigenous communities, where internal dialogue and explication of human and environmental interactions can lead to robust and transformative research.

    Keywords:
    American Indian, Omics, Environmental Exposure, Exposome, Amerindigenous, Community Engagement, ELSI, Informed Consent, Systemic Racism



    Open Sourcing Ourselves - Together
    by Mad Price Ball

    September 14, 2021

    Open source refers to the practice of making software freely available, re-usable, and adaptable. We might also ask: how can we apply open source to understanding ourselves as humans -- our genomes, health, or behavior? While navigating concerns about privacy and consent, the principles of openness should also prompt us to consider what we can do to enable others. How can we make it more open for people to research themselves? Open source communities have come to understand that it takes more than just sharing code: it requires building a community. These same principles also apply to individual and collective research about our health. Drawing on my work with the Personal Genome Project and Open Humans, I share insights and lessons I've learned in efforts to collect, share, and analyze our personal data to better understand ourselves.



    Resolving and avoiding design conflicts in ontology development and deployment
    by Maria Keet

    September 21, 2021

    Ontology development draws on science, engineering, and philosophy to represent subject domain knowledge formally so that it can be used to enhance information systems. This process involves resolving ontological differences and choosing between conflicting axioms, which arise for various reasons. Examples include different foundational ontologies, alternative design patterns for the prospective ontology’s use case, and an ontology language’s expressivity limitations.
    Instead of ad hoc decision-making, science- and engineering-based modelling guidance, with methods and tools, can alleviate these issues by supporting meaning negotiation and conflict resolution in a systematic way. In this talk, I will discuss common conflicts and typical steps toward resolution, including available tool support. A similar situation with trade-offs exists when deploying ontologies for ontology-based data access and integration, which we shall touch upon as well. Use cases, tools, and experiments span several subject domains, such as avian influenza, horizontal gene transfer, and metabolic pathways.



    Alternative approach for discovering relationship between bacteriophages and antimicrobial resistance
    by Roumyana Yordanova

    October 5, 2021

    Recent interest in the relationship between bacteriophages and antimicrobial resistance, in the context of contemporary microbiology related to medicine and pharmaceutics, is driven by the potential contribution of phages to the growing problem of antimicrobial resistance. A number of research studies confirm [1] or question [2] the role of bacteriophages in the dissemination of antimicrobial resistance genes.
    A major objective of the CAMDA challenge is to learn more about the relationship between viruses, their hosts and antimicrobial resistance genes, in order to determine whether antimicrobial resistance can indeed spread through phages. This study focuses on discovering relationships and possible dependencies between bacteriophages and antimicrobial resistance based on data collected from city environments all over the world. Our approach combines several methods that assess the differential abundance of phages, their diversity across samples, their impact on antimicrobial resistance categories, and their associations with antimicrobial resistance genes (ARGs). The relationship between phages, their hosts and antimicrobial resistance is also explored with a Bayesian spatial model.
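    The abstract does not say which diversity measure was used. As one common choice, assumed here, the Shannon index H' = -Σ p_i ln p_i can be computed per sample from phage read counts. The function name and the counts below are hypothetical, for illustration only.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical phage read counts per city sample (one entry per phage taxon)
samples = {
    "city_A": [120, 30, 0, 50],
    "city_B": [25, 25, 25, 25],
}
for name, counts in samples.items():
    print(name, round(shannon_diversity(counts), 3))
```

    An evenly spread sample reaches the maximum H' = ln(number of taxa), so city_B scores higher than the more skewed city_A.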



    Injecting Life into Visualizations for Biomedical Research
    by Marc Streit

    October 12, 2021

    Biology has become increasingly data-driven. Visualization is now an important part of the data science toolbox. Many researchers, however, still think of visualization primarily as a means to communicate insights rather than a fundamental building block of the discovery process.

    One effective way to make sense of large and heterogeneous biological data is to combine the strengths of visualization with the power of analytical reasoning, automated analysis, and modern AI capabilities. This powerful combination can lead to discoveries that neither a computer nor a human could make alone.

    I will start this talk by giving examples of interactive web-based visualization tools that were designed for the purpose of drug discovery and cancer research. In the second part of the talk, I will show how low-dimensional embeddings of high-dimensional data can be used for understanding and explaining complex models and processes.
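    As a minimal sketch of a low-dimensional embedding, the snippet below uses principal component analysis computed with NumPy's SVD. PCA is a stand-in chosen here for simplicity; the talk may rely on other embedding methods, and the data matrix is random placeholder data.

```python
import numpy as np


def pca_embed(X, n_components=2):
    """Project the rows of X (samples x features) onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # placeholder: 50 samples, 10 features
embedding = pca_embed(X)
print(embedding.shape)
```

    Each row of `embedding` is a 2D point that can be plotted and linked back to the original sample, which is the basic hook for interactive exploration of a high-dimensional dataset.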



    SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms
    by Seán O’Donoghue, Andrea Schafferhans, Neblina Sikta

    October 20, 2021

    We will discuss our recent modelling study of the 3D structures of all SARS-CoV-2 proteins. Using HMMs, we generated 2,060 models that span 69% of the viral proteome (https://doi.org/10.15252/msb.202010079). These models revealed viral mimicry and hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses. The models also revealed new insights into viral replication.

    To make these models accessible, we devised a structural coverage map, a concise visual summary of what is known — and not known — about viral protein structures. We used the map to create the Aquaria-COVID resource (https://aquaria.ws/covid), designed to help researchers use the 79 structural states identified in our work to understand COVID-19 mechanisms, and to draw attention to the 31% of the viral proteome that remains structurally unknown or ‘dark’.
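    The coverage figures above come down to asking which residues are spanned by at least one model. A minimal sketch, assuming models are given as hypothetical 1-based inclusive residue ranges, computes the covered fraction and lists the remaining "dark" regions; the protein length and ranges are made up and are not taken from the study.

```python
def covered_positions(protein_length, model_ranges):
    """Residues (1-based) spanned by at least one model range (inclusive)."""
    covered = set()
    for start, end in model_ranges:
        covered.update(range(max(1, start), min(protein_length, end) + 1))
    return covered


def structural_coverage(protein_length, model_ranges):
    """Fraction of residues covered by at least one model."""
    return len(covered_positions(protein_length, model_ranges)) / protein_length


def dark_regions(protein_length, model_ranges):
    """Maximal runs of residues with no model, as (start, end) tuples."""
    covered = covered_positions(protein_length, model_ranges)
    regions, start = [], None
    for pos in range(1, protein_length + 1):
        if pos not in covered and start is None:
            start = pos  # a dark run begins
        elif pos in covered and start is not None:
            regions.append((start, pos - 1))  # a dark run ends
            start = None
    if start is not None:
        regions.append((start, protein_length))
    return regions


# Hypothetical 100-residue protein with overlapping and separate models
models = [(1, 30), (20, 50), (80, 100)]
print(structural_coverage(100, models), dark_regions(100, models))
```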

    We will also discuss a new resource we developed to help combat emerging viral strains by streamlining the use of protein structures in variant analysis (https://doi.org/10.1101/2021.09.10.459756). All structural data on a variant can be accessed via simple URLs: for example, https://aquaria.app/SARS-CoV-2/S?L452R specifies the L452R variant in 'S', i.e., the 'spike' protein of SARS-CoV-2.
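    The variant URLs follow the pattern shown in the example above. The sketch below, with hypothetical helper names, builds such a URL and checks that the variant is a single amino-acid substitution written as reference residue, position, and alternate residue. The pattern is inferred from that one example, so any other URL forms the service may accept are not covered.

```python
import re

# One-letter codes for the 20 standard amino acids
AA = "ACDEFGHIKLMNPQRSTVWY"
VARIANT_RE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


def parse_variant(variant):
    """Split a substitution such as 'L452R' into ('L', 452, 'R')."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {variant!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def aquaria_variant_url(protein, variant, organism="SARS-CoV-2"):
    """Build a URL of the form https://aquaria.app/<organism>/<protein>?<variant>."""
    parse_variant(variant)  # reject malformed variants before building the URL
    return f"https://aquaria.app/{organism}/{protein}?{variant}"


print(aquaria_variant_url("S", "L452R"))
```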



    Multi-Omic Data and Clinical Risk Factor Integration to Build Interpretable Predictive Models for Type 1 Diabetes
    by Bobbi-Jo Webb-Robertson

    October 20, 2021

    Type 1 diabetes (T1D) is a chronic autoimmune disease that results from autoimmune destruction of insulin-producing pancreatic beta-cells. T1D progresses through stages, and clinical diabetes is generally preceded by the appearance of diabetes-related autoantibodies (IA) without symptoms. As the cause of the disease remains elusive, multiple diabetes cohorts, such as the Diabetes Autoimmunity Study in the Young (DAISY; http://www.daisycolorado.org/) and The Environmental Determinants of Diabetes in the Young (TEDDY; https://teddy.epi.usf.edu/), have been established to collect information longitudinally to gain insights into the biological mechanisms driving changes in the progression of the disease from a pre-symptomatic IA to symptomatic T1D state. These prospective cohort studies have reported potential demographic, immune, genetic, metabolomic, and proteomic markers statistically associated with IA or the progression from IA to T1D. However, these markers alone are not highly predictive of T1D outcomes at an individual level. This presentation will describe an approach for integration and feature selection of these various risk factors and multi-omics measurements via machine learning, enabling a better understanding of the biological mechanisms driving IA and/or T1D and identifying clinically relevant biomarkers to predict patient-level progression to these disease endpoints.



    Metabolic modelling of microbial interactions in microbiomes
    by Aarthi Ravikrishnan, Karthik Raman, Dinesh Kumar

    October 22, 2021

    About the Tutorial
    Recent years have seen the emergence of microbiomes as important axes of human health and disease. Microbial communities abound in various regions of the human body, notably the gut, skin and oral cavity, and are increasingly used in industrial fermentation and wastewater treatment. Many algorithms and tools have been developed to study microbial communities, particularly the metabolic interactions that drive and sustain them. In this tutorial, we seek to provide a brief overview of the key modelling paradigms that have been used to study microbiomes, focussing on two broad classes of methods: (a) constraint-based modelling, which models microbial metabolic networks in terms of the fluxes of their constituent reactions, and (b) graph-based modelling, which represents microbial interactions as a complex graph capturing the exchange of metabolites between the constituent organisms and thereby sheds light on the nature of the interactions between them. We will first introduce participants to the fundamental concepts of metabolic modelling, with a special emphasis on microbial communities. We will then delve deeper into different techniques for understanding interactions in a microbial community. Lastly, we will showcase some representative tools and methods that will enable participants to apply the theory to real-life examples and understand the nature of interactions between different kinds of organisms in community settings.
    At the end of the tutorial, the participants will have an understanding of:
      • Broad applications of metabolic modelling to model microbiomes
      • Databases and resources for microbiome modelling
      • Key constraint-based methods that can be used to understand microbiomes
      • Key graph-based methods that can aid in understanding metabolic exchanges
      • Tools for microbiome modelling such as COBRA toolbox (specific algorithms) or MetQuest
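    Constraint-based modelling of the kind described above is usually solved as a linear program. The sketch below is a toy flux balance analysis of a hypothetical two-organism community using SciPy's `linprog`, not the COBRA toolbox or MetQuest: organism 1 takes up glucose and secretes acetate, organism 2 grows on that acetate, and the program maximizes organism 2's growth flux while requiring a minimum growth flux for organism 1. All reactions, bounds, and the cross-feeding setup are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Reactions (columns):
#   v0: glucose uptake by organism 1
#   v1: organism 1 converts glucose into biomass
#   v2: organism 1 converts glucose into secreted acetate
#   v3: organism 2 converts acetate into biomass
# Internal metabolites (rows) must be at steady state: S @ v = 0
S = np.array([
    [1, -1, -1, 0],   # glucose inside organism 1
    [0, 0, 1, -1],    # shared acetate pool
])
bounds = [
    (0, 10),     # glucose uptake capped at 10 flux units
    (4, 1000),   # organism 1 must grow at least 4 units
    (0, 1000),
    (0, 1000),
]
# linprog minimizes, so negate the objective to maximize v3
c = np.array([0, 0, 0, -1])
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print(res.status, res.x)
```

    With these bounds, organism 2 can only grow on the glucose that organism 1 diverts to acetate, so the optimum is v3 = 10 - 4 = 6; real community models differ mainly in scale and in how the shared metabolite pools are defined.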

    Training Materials
    https://github.com/RamanLab/ISCB-Academy-Tutorial-Community-Modelling

    Target Audience
    Familiarity with Python is necessary. Python and Matlab must be installed prior to the tutorial along with select packages and toolboxes. Instructions will be available in the GitHub repository by 1st October, 2021.

    About the Hosts
    Karthik Raman is an Associate Professor at the Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras. Karthik’s research group works on the development of algorithms and computational tools to understand, predict and manipulate complex biological networks. Broadly spanning computational aspects of synthetic and systems biology, key areas of research in his group encompass microbiome analysis, in silico metabolic engineering, biological network design and biological data analysis. Karthik also co-ordinates the Centre for Integrative Biology and Systems Medicine and is a core member of the Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI). Karthik teaches courses on computational biology and systems biology at IIT Madras, and has also authored a textbook on Computational Systems Biology. Aarthi Ravikrishnan is a postdoctoral fellow and a team lead at the Genome Institute of Singapore. Her research interests are predominantly in microbiome analytics and developing computational methods to understand the role of the microbiome in health. Her team focusses on understanding the skin and gut microbiomes through data generation and analysis. She, along with Karthik Raman, has co-authored a book on Systems-level modelling of microbial communities.



    Indigenous Pharmacogenomics and Implications for Personalized Medicine
    by Katrina Claw

    October 22, 2021

    The integration of genomic technology into health care settings has the potential to transform healthcare through increased personalization of medical decisions. In particular, pharmacogenomics research on drug disposition and response can tailor and improve medication regimens for all patients by informing tests of function-altering variation in drug metabolism and transport genes. Unfortunately, Indigenous peoples remain underrepresented in pharmacogenomics research. Effective strategies to create research partnerships between tribal communities and genomic researchers are often lacking, yet such partnerships are needed for trustworthy research. We review what is currently known about pharmacogenetic variation in Indigenous communities and highlight work related to nicotine metabolism and tobacco cessation as an example of successful collaborations in pharmacogenetic research relating genotype-phenotype associations. We discuss the challenges and opportunities related to the implementation of personalized drug therapy in the community using ethical engagement and collaborative approaches.



    Practicals in next-generation sequencing - Programming course in a generalist school can truly be fun, even in lockdown
    by Marie Sémon

    October 26, 2021

    Next-generation sequencing (NGS) has become a staple tool in biology over the past decades, which makes it necessary to teach students how to analyse such data. However, this is challenging, particularly for students unfamiliar with code and command-line tools. The biology Master's programme at ENS de Lyon has set up a practical course in which we teach students to build a reproducible NGS data analysis pipeline that generates near-publication-quality results from raw sequencing data. Students deposit their work in a git repository. We engage students by allowing them to choose from a broad range of projects, and strengthen group and student interaction through flipped classrooms. Despite having to run the course remotely due to the pandemic, we achieved great success (as evidenced by student feedback) through the use of virtual machines for computing, chat applications for communication and screen-sharing, and strong involvement from both students and teachers.



    Scalable inference of phylogenetic networks
    by Claudia Solis-Lemus

    November 2, 2021

    Phylogenetic network inference plays an important role in the reconstruction of the tree of life, given the widespread gene flow among different organisms. However, there are many challenges in the inference of reticulate evolution, such as network reconstruction and interpretation, and the difficulty of summarizing network uncertainty. In this talk, I will explain the current difficulties in network statistical inference and present a new scalable method based on pseudolikelihood theory. I will also present extensions of standard trait evolution tools to networks, such as phylogenetic regression or ANOVA, ancestral trait reconstruction, and Pagel's lambda test of phylogenetic signal. All the new tools are implemented in the open-source Julia package PhyloNetworks.
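    Pagel's lambda, mentioned above, rescales how much trait covariance is attributed to shared ancestry. A minimal NumPy sketch for a tree-like, ultrametric covariance matrix multiplies the off-diagonal entries by lambda, keeps the diagonal, and scores each lambda with a Brownian-motion log-likelihood in which the mean and rate are profiled out. This is not the PhyloNetworks implementation (which is written in Julia and derives the covariance from a network, and whose network definition of lambda is not assumed here); the 3-taxon matrix and trait values are hypothetical.

```python
import numpy as np


def pagel_lambda(V, lam):
    """Scale shared-ancestry (off-diagonal) covariances by lam; keep the diagonal."""
    V = np.asarray(V, dtype=float)
    out = lam * V
    np.fill_diagonal(out, np.diag(V))
    return out


def bm_loglik(y, V):
    """Gaussian log-likelihood of traits y under covariance V, with the
    GLS mean and the ML rate sigma^2 plugged in."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Vinv = np.linalg.inv(V)
    ones = np.ones(n)
    mu = (ones @ Vinv @ y) / (ones @ Vinv @ ones)
    r = y - mu
    sigma2 = (r @ Vinv @ r) / n
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)


# Hypothetical ultrametric tree ((A,B),C) of height 1, with A and B sharing 0.6
V = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
y = np.array([1.0, 1.2, 3.0])
grid = np.linspace(0, 1, 11)
best = max(grid, key=lambda lam: bm_loglik(y, pagel_lambda(V, lam)))
print(best)
```

    Lambda = 0 collapses the tree to a star (independent taxa) and lambda = 1 recovers the original Brownian-motion covariance; a likelihood-ratio test between these fits is one way to assess phylogenetic signal.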

    Click here to watch

    Hosted by:



    Inferring functions of the essential genes for life
    by Mark Wass

    November 9, 2021

    Identifying the smallest genome capable of supporting life has been a long-term quest in synthetic biology. Progress has been steady, and a few years ago a bacterial genome based on Mycoplasma mycoides and containing only 438 protein-coding genes was engineered. Strikingly, the function of more than a third (149) of these proteins was unknown, demonstrating how limited our knowledge and understanding of the functions essential for life remain. In this talk, I will present our recent work using an array of bioinformatics approaches to characterise these proteins and infer their functions. I will discuss the insights we gained into the functions essential for life and reflect on what our findings show about the field of protein function prediction.

    Click here to watch

    Hosted by:



    Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes
    by Kristopher Brannan

    November 23, 2021

    RNA-binding proteins (RBPs) are critical regulators of gene expression and RNA processing that are required for gene function. Yet the dynamics of RBP regulation in single cells are unknown. To address this gap in understanding, we developed STAMP (Surveying Targets by APOBEC-Mediated Profiling), which efficiently detects RBP–RNA interactions. STAMP does not rely on ultraviolet cross-linking or immunoprecipitation and, when coupled with single-cell capture, can identify RBP-specific and cell-type–specific RNA–protein interactions for multiple RBPs and cell types in single, pooled experiments. Pairing STAMP with long-read sequencing yields RBP target sites in an isoform-specific manner. Finally, Ribo-STAMP leverages small ribosomal subunits to measure transcriptome-wide ribosome association in single cells. STAMP enables the study of RBP–RNA interactomes and translational landscapes with unprecedented cellular resolution.
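
    As its name suggests, STAMP marks target RNAs through APOBEC activity; APOBEC enzymes deaminate cytidine to uridine, which shows up as a C-to-T mismatch after reverse transcription and sequencing. The Python sketch below assumes per-site base counts on the transcript strand have already been extracted from aligned reads and flags reference-C sites with a high C-to-T fraction. The record type, function name, and thresholds are hypothetical placeholders, not values from the STAMP pipeline; a real analysis would also compare against controls and filter known SNPs.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    chrom: str
    pos: int
    ref: str  # reference base on the transcript strand
    c: int    # reads showing C at this position
    t: int    # reads showing T (candidate C-to-U edit)

def edited_sites(sites, min_coverage=10, min_fraction=0.1):
    """Return (chrom, pos, fraction) for reference-C sites with enough C->T signal.

    Thresholds are illustrative placeholders only.
    """
    hits = []
    for s in sites:
        if s.ref != "C":
            continue  # only cytidines can be APOBEC targets
        depth = s.c + s.t
        if depth < min_coverage:
            continue  # too few reads to estimate an edit fraction
        frac = s.t / depth
        if frac >= min_fraction:
            hits.append((s.chrom, s.pos, frac))
    return hits

# Made-up example counts
sites = [
    SiteCounts("chr1", 100, "C", c=15, t=5),  # 25% C->T at depth 20
    SiteCounts("chr1", 200, "C", c=3, t=2),   # depth 5, too shallow
    SiteCounts("chr1", 300, "A", c=0, t=10),  # not a reference C
    SiteCounts("chr1", 400, "C", c=50, t=1),  # about 2% C->T, below threshold
]
print(edited_sites(sites))
```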

    Click here to watch

    Hosted by:



    Elements of Style in Reproducible Workflow Creation and Maintenance: A Hands-on Tutorial
    by Anne Deslattes Mays, Christina Chatzipantsiou

    November 26, 2021

    In this short, three-hour course, we will introduce learners to elements of style in building and containerizing small, single-function processes that facilitate reproducible workflow creation and execution. We will show how to keep these processes up to date and how to alert their creator to their functional state (working or failing) using GitHub Actions, a feature of GitHub.

    This hands-on course will use a small example to convey the structure, philosophy and approach needed to achieve this outcome. The course seeks to demystify powerful methods for achieving platform independence and platform interoperability, and to make them accessible. Using a simple, pre-baked RNA-seq analysis example to demonstrate these techniques, we will break down each construction step and walk learners through it. Learners will be introduced to Conda, Docker, GitHub and the workflow language Nextflow. If time permits, we will also show how these containerized processes can be represented in a second workflow language (e.g. Common Workflow Language or WDL).

    By the end of the course, learners will understand these Elements of Style and will know how Conda, Docker, GitHub, Zenodo and Nextflow enable reproducible research. Moreover, all the steps will remain on GitHub, so learners can return to them and reproduce them on their own after the course. Learners will also be shown how JupyterLab notebooks facilitate literate programming. Through their participation in the class, learners will come to understand FAIR (findability, accessibility, interoperability and reusability) best practices. We ask all participants to create GitHub, Zenodo and ORCID accounts before the course. Only minimal background knowledge of the command line and simple shell commands is required, and the repository supports some self-learning to help participants acquire this knowledge.

    GitHub: https://github.com/ISCB-Academy/Elements-of-Style-Reproducible-Workflow-Creation-Maintenance-Tutorial

    Capacity: 20

    Hosted by:



    Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
    by Kishwar Shafin

    November 29, 2021

    Long-read sequencing has the potential to transform variant detection by reaching difficult-to-map regions and routinely linking adjacent variants to enable read-based phasing. Third-generation nanopore sequencing produces long reads, but current methods for interpreting its novel pore-based signal have unique error profiles, making accurate analysis challenging. In this talk, I will introduce PEPPER-Margin-DeepVariant, a haplotype-aware variant calling pipeline that produces state-of-the-art variant calling results with nanopore data. This nanopore-based method outperforms short-read-based single-nucleotide variant identification at whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails.

    Click here to watch

    Hosted by:



    Longitudinal genome resolved metagenomics
    by Christopher Quince

    November 30, 2021

    Extracting prokaryotic genomes directly from metagenome assemblies has uncovered a wealth of novel microbial diversity, both in the environment and in host-associated microbiomes. The technique is particularly powerful when coupled with longitudinal sampling of the same community, because it can then also be used to understand changes in community structure. I will give an overview of bioinformatics methods for resolving genomes directly from multiple metagenomic samples. I will briefly explain short-read assembly and binning, followed by the evaluation of bins as metagenome-assembled genomes (MAGs). These principles will be illustrated with a large-scale binning of MAGs from anaerobic digestion reactors. I will then introduce our pipeline for resolving sub-populations within MAGs, STRONG (STrain Resolution ON assembly Graphs): https://github.com/chrisquince/STRONG. I will compare it with alternative methods that obtain strains from metagenomes de novo and apply it to a study of human fecal microbiome transplants.
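
    Evaluating bins as MAGs usually means estimating completeness and contamination from single-copy marker genes (for example with CheckM) and assigning a quality tier. The Python sketch below applies the completeness and contamination thresholds of the MIMAG standard (high quality: over 90% complete, under 5% contaminated; medium: at least 50% complete, under 10% contaminated; low: under 50% complete, under 10% contaminated). It deliberately ignores the rRNA and tRNA requirements of the full high-quality criteria, and the function name is hypothetical.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Assign a simplified MIMAG-style tier from percentage estimates.

    Omits the rRNA/tRNA gene requirements of the full high-quality criteria.
    """
    if contamination >= 10:
        return "unqualified"  # too contaminated for any MIMAG draft tier
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"

# Made-up (completeness, contamination) estimates for three bins
bins = {"bin_1": (95.0, 2.0), "bin_2": (95.0, 7.0), "bin_3": (30.0, 1.0)}
for name, (comp, cont) in bins.items():
    print(name, mimag_tier(comp, cont))
```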

    Click here to watch

    Hosted by:



    DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of enhancers
    by Bernardo Almeida

    December 9, 2021

    Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been considered impossible. Here we built a deep learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally non-equivalent instances of the same TF motif that are determined by motif-flanking sequence and inter-motif distances. We validated these rules experimentally and demonstrated their conservation in humans by testing more than 40,000 wild-type and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
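
    Deep learning models that predict activity directly from DNA sequence typically take the sequence as a one-hot matrix, and convolutional filters in the first layer behave like position weight matrices slid along that matrix, which is one way such a model can pick up TF motifs. The NumPy sketch below shows only this input encoding and a single hand-made filter; it is not the DeepSTARR architecture, and the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as an (L, 4) array in A, C, G, T order; N and other symbols become all zeros."""
    index = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        i = index.get(base)
        if i is not None:
            out[pos, i] = 1.0
    return out

def scan(onehot: np.ndarray, pwm: np.ndarray) -> np.ndarray:
    """Slide a (k, 4) weight matrix along the sequence, like a single conv filter without bias."""
    k = pwm.shape[0]
    return np.array([np.sum(onehot[i:i + k] * pwm)
                     for i in range(onehot.shape[0] - k + 1)])

x = one_hot("ACGTN")
print(scan(x, one_hot("GT")))  # the filter scores highest where "GT" occurs
```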

    Click here to watch

    Hosted by:



    A Multi-Objective Genetic Algorithm to Find Active Modules in Multiplex Biological Networks
    by Elva Maria Novoa del Toro

    December 14, 2021

    One of the most challenging tasks in computational biology is integrating complementary biological data produced by different experimental sources. Our goal here is to combine expression data and biological networks to identify “active modules”, i.e. subnetworks of interacting genes/proteins associated with expression changes in different biological contexts. We developed MOGAMUN, a multi-objective genetic algorithm that finds subnetworks in a multiplex network that are both dense and deregulated overall. We compared the performance of MOGAMUN with several state-of-the-art methods for active module identification. We also applied MOGAMUN to identify active modules for a rare monogenic disease, facioscapulohumeral muscular dystrophy (FSHD). MOGAMUN is available as a Bioconductor package.
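
    Because MOGAMUN optimizes more than one objective, candidate subnetworks cannot simply be ranked by a single score. Multi-objective genetic algorithms commonly use Pareto dominance instead: a candidate stays on the first front if no other candidate is at least as good on every objective and strictly better on one. The Python sketch below assumes each subnetwork has already been scored as a hypothetical (density, mean deregulation) pair, both to be maximized; how MOGAMUN actually computes these scores, breeds new candidates, and preserves diversity is not shown.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (density, mean deregulation), both maximized

def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: List[Score]) -> List[int]:
    """Indices of candidates not dominated by any other candidate (the first front)."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Made-up scores for four candidate subnetworks
scores = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_front(scores))  # candidate 2 is dominated by candidate 1
```

    Keeping the whole front, rather than a single best subnetwork, lets the search return trade-offs between very dense and strongly deregulated modules.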

    Click here to watch

    Hosted by:
