Knowledge Guided Machine Learning in Biology
Tuesday, May 11, 2021
The study of biological systems holds great promise for understanding the origin and evolution of life and the interplay of biological processes with environmental effects, which influences policy decisions relating to public health and conservation. While the state of the art for understanding biological systems has conventionally relied on numerical or statistical models for making predictions or performing in silico experimentation, these techniques struggle to capture the nonlinear response of many natural systems. On the other hand, machine learning (ML) methods, which can extract highly complex and nonlinear patterns and models solely from data, are increasingly being considered as promising alternatives for scientific discovery in biological applications. However, black-box ML methods, which are developed and deployed agnostic to underlying scientific theories, face several barriers in understanding real-world biological systems, primarily because ML-ready data in biological applications are not available at the scales possible in commercial applications of ML (e.g., on benchmark problems in computer vision and speech recognition). As a result, there is a growing realization in the scientific community of the need for a deeper integration of scientific knowledge with machine learning frameworks, referred to as the paradigm of Knowledge Guided Machine Learning (KGML). While this emerging paradigm has already started to show successful applications in a number of disciplines including fluid dynamics, particle physics, computational chemistry, and climate science, there is a need for concerted efforts to realize the full potential of “KGML in biology” by integrating complex forms of biological knowledge (available as process-based models, ontologies, rules, heuristics, etc.) with ML methodologies.
This session will bring together leading scientists working on the frontiers of KGML in biological applications to discuss some of the common challenges and opportunities in this emerging field. We anticipate participation by academic and non-academic professionals and graduate students. The recent maturation of machine learning approaches, environmental data infrastructure, informatics techniques, and data terminologies makes KGML an exciting area of research.
For more information: https://sites.google.com/view/glbio-kgml/home
Genotype to Phenotype in Model and Non-Model Organisms
Wednesday, May 12, 2021
Part I: Model Organisms
Part II: Non-Model Organisms
- Lenore J. Cowen (Tufts University)
- Jane Greenberg (Drexel University)
- Judith Klein-Seetharaman (Colorado School of Mines)
- Nastassja A. Lewinski (Virginia Commonwealth University)
- Hollie Putnam (University of Rhode Island)
- Hannah G. Reich (University of Rhode Island)
- Liza M. Roger (Virginia Commonwealth University)
- Rohit Singh (Massachusetts Institute of Technology)
Comprehensive understanding of a gene’s function requires going beyond a high-level annotation of its molecular function and biological process to a more detailed characterization of its specific role in the relevant signaling, epigenetic or enzymatic context as well as its interactions with DNA and other proteins. An integrative approach across species or conditions, combining omics with physiology-based approaches, can advance understanding and potentially lead to testable hypotheses. We are particularly excited about bioinformatics research that adapts computational tools and extends data infrastructures for model organisms to investigate phenotypes of non-model organisms, using integrative and systems-based approaches to overcome the relative lack of data in the latter.
To invite a diverse set of research presentations on this topic, we propose splitting the topic into model and non-model organisms. We believe that the core community that attends GLBIO is already very interested in the genotype-to-phenotype problem as it relates to human health and disease, and may contribute some interesting talks for Part I. Non-model organisms, and in particular organisms whose importance lies outside biomedical applications, have not historically been well represented in this community, and we are excited to invite interesting speakers and bring this new area to Part II of our session and this conference.
For more information: https://corals.cs.tufts.edu/glbio2021/
Taxonomic Names and Metadata: A Framework for Big Data Interoperability
Thursday, May 13, 2021
Research involving organisms depends critically on a long-standing tradition of defining and assigning names to species and higher groups of organisms (taxonomy) and on conventions for recording metadata on where, when, and under what conditions specimens of organisms were observed or captured (provenance). Taxon names are governed by nomenclatural codes that aim to ensure uniqueness and universality of all taxon names. However, different groups of organisms (animals, bacteria, algae, fungi, and plants) are governed by different codes, and some repetition exists. Principles are in place for restoring order to the surprisingly large number of cases where the same taxon has been assigned multiple names (synonymy). Similarly, standards have been adopted for recording metadata describing the provenance of biological specimens (e.g., Darwin Core). Less formalized are metadata for describing the specimens themselves, digital analogs (images) of specimens, or anatomical and morphological features of specimens, which frequently are the objects of study. This special session will feature abbreviated presentations on challenges with taxonomic names and metadata associated with biological specimens, including their associated images, omics, and environmental datasets currently being used in research projects. Presenters will highlight approaches that have been adopted to address these challenges. The presentations will frame the issues as general questions in need of answers, and current approaches as intermediate solutions. The session will conclude with a discussion that will explore more robust solutions to the challenges of using taxonomic names and metadata in research.
For more information: http://glbio2021.tnm.tubri.org/