Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


CAMDA COSI

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in UTC
Wednesday, July 28th
11:00-12:00
CAMDA Keynote: Gene deregulations driving cancer at single patient resolution
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • Francesca Ciccarelli

Presentation Overview:

A major challenge in cancer biology is identifying the gene alterations that promote cancer. Current methods tackle this issue by detecting signs of positive selection, i.e. identifying genes whose mutations recur across cancer samples because they confer selective growth advantages on the mutated cancer cells. However, in cancers where mutation recurrence across samples is low and the mutational landscape is highly variable but mostly flat, these approaches are clearly insufficient. We have developed a novel approach that allows a comprehensive characterisation of the genes contributing to cancer, one patient at a time. Our approach applies machine learning to identify aberrations in individual patients based on their similarity to known cancer-driver alterations rather than on how frequently they occur across patients. In my talk, I will explain how this approach works and describe how we have experimentally validated some of these predictions to confirm their role in cancer. I will give examples of how better knowledge of the complete repertoire of driver genes at single-patient resolution contributes to our understanding of carcinogenesis.

12:00-12:20
CAMDA Challenges - Overview
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • Wenzhong Xiao, Massachusetts General Hospital, Harvard Medical School, United States
12:40-13:20
Filter Drug-induced Liver Injury (DILI) Literature with Natural Language Processing and Ensemble Learning
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Xianghao Zhan, Stanford University, United States
  • Fanjin Wang, University College London, United Kingdom
  • Olivier Gevaert, Stanford University, United States

Presentation Overview:

Drug-induced liver injury (DILI) is an adverse drug effect characterized by abnormal liver test results, and it may lead to acute liver failure. As DILI is a key assessment for new drug candidates, DILI events are reported in publications describing clinical practice as well as preliminary in vitro and in vivo experiments. Conventionally, the large corpus of publications is screened manually to label DILI-related reports, which substantially limits processing speed. Natural language processing (NLP) techniques make automatic text processing possible. Here, we report a model for filtering DILI literature that uses four NLP text vectorization techniques and ensemble learning. The model combining TF-IDF with logistic regression outperformed the others, with an AUROC of 0.990, an accuracy of 0.957, and an AUPRC of 0.990. An ensemble built from 12 models achieved similar performance with the fewest false-negative cases. Both models performed well on the hold-out validation data, where the ensemble model reached a higher accuracy of 0.954 and an F1 score of 0.955. Additionally, we identified important words in positive and negative predictions by interpreting the models. Overall, the ensemble model achieved satisfactory classification results and can help researchers quickly filter DILI-related literature.
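
To make the best-performing configuration (TF-IDF features fed to logistic regression) concrete, here is a minimal, dependency-free sketch. It is not the authors' code: the toy corpus, the smoothed IDF formula (similar to scikit-learn's default smoothing), and the hyperparameters `lr` and `epochs` are illustrative assumptions; a real pipeline would normally use a library such as scikit-learn.

```python
import math
from collections import Counter

def tfidf_fit(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    return vocab, idf

def tfidf_transform(doc, vocab, idf):
    """L2-normalized TF-IDF vector for one tokenized document (unseen words ignored)."""
    tf = Counter(doc)
    vec = [tf[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy corpus: 1 = DILI-related, 0 = not DILI-related
docs = [
    "drug induced liver injury hepatotoxicity case".split(),
    "acute liver failure after drug exposure".split(),
    "hepatotoxicity and elevated liver enzymes".split(),
    "cardiac arrhythmia in elderly patients".split(),
    "renal function and blood pressure study".split(),
    "bone density in postmenopausal women".split(),
]
labels = [1, 1, 1, 0, 0, 0]
vocab, idf = tfidf_fit(docs)
X = [tfidf_transform(d, vocab, idf) for d in docs]
w, b = train_logreg(X, labels)
query = tfidf_transform("drug hepatotoxicity liver".split(), vocab, idf)
print(predict_proba(query, w, b) > 0.5)
```

An ensemble like the 12-model one described above would combine the outputs of several such vectorizer/classifier pairs; how the authors weighted them is not stated here.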

13:20-13:40
Comparative analysis of information-theory-based statistical methods and transformer-based machine learning techniques for text classification
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Arsentii Ivasiuk, Bogomoletz Institute of Physiology, Kyiv, Ukraine
  • Ihor Stepanov, Taras Shevchenko National University of Kyiv, Ukraine
  • Stanislav Zubenko, Institute of Molecular Biology and Genetics of NASU, Kyiv, Ukraine
  • Alina Frolova, The Institute of Molecular Biology and Genetics of NASU, Ukraine

Presentation Overview:

We implemented an information-theory-based statistical approach and compared it with modern transformers on a relevant practical task: classifying biomedical papers related to Drug-Induced Liver Injury (DILI) as part of CAMDA 2021 Challenge 2. We propose our statistical method as a first approach to text classification because it is very fast, easily interpretable, and can be scaled by extending the training dataset. Unlike models such as SciBERT, which require computational and data resources for pre-training, the statistical approach needs no additional data sources or pre-training. Moreover, most text classification tasks do not require very high accuracy or complex analysis of language structure. Our information-theory-based method is reliable and shows robust performance with balanced precision and recall. It is also useful for dataset analysis, because it estimates the distribution of words among classes and each word's influence in the text. This information can be used for feature selection by more complex models, such as neural network classifiers that use word embeddings.
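
The abstract does not give the exact statistic, so the sketch below uses one standard information-theoretic measure as a stand-in: the mutual information (information gain) between the presence of a word and the class label. Words that split the classes cleanly score highest, which is the kind of per-word influence estimate that could feed feature selection. The toy documents are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, word):
    """Mutual information (bits) between presence of `word` and the class label."""
    classes = sorted(set(labels))
    h_class = entropy([labels.count(c) for c in classes])
    with_w = [lab for d, lab in zip(docs, labels) if word in d]
    without_w = [lab for d, lab in zip(docs, labels) if word not in d]
    h_cond = 0.0
    for subset in (with_w, without_w):
        if subset:
            h_cond += len(subset) / len(labels) * entropy([subset.count(c) for c in classes])
    return h_class - h_cond

# Hypothetical documents as word sets; 1 = DILI, 0 = non-DILI
docs = [set(s.split()) for s in [
    "hepatotoxicity liver drug", "liver injury drug", "liver enzymes elevated",
    "cardiac drug trial", "renal trial outcome", "bone density trial",
]]
labels = [1, 1, 1, 0, 0, 0]
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)
print(ranked[:2])  # ['liver', 'trial']
```

Note that "trial" ranks as high as "liver": information gain rewards any word that separates the classes, whichever class it indicates.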

13:40-13:50
The CAMDA Contest Challenges TextNetTopics: Applied on Literature AI for Drug Induced Liver Injury
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Malik Yousef, Zefat College, Israel

Presentation Overview:

In this study, we apply TextNetTopics to textual data in response to the CAMDA challenge. TextNetTopics is a novel approach that performs feature selection by considering topics (groups of words) rather than individual words, as in the traditional bag-of-words approach. The approach therefore performs topic selection rather than word selection. TextNetTopics is based on the generic approach of grouping and scoring/ranking.
The approach outputs a ranking of significant topics, together with the performance of a model built from the top-ranked topics. TextNetTopics outperforms other feature selection approaches and achieves high performance when the model is applied to the validation data provided by CAMDA.
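
The grouping and scoring/ranking idea can be sketched as follows, assuming topics are already given as word groups (for example from a topic model). The scorer here is a deliberately simple, hypothetical stand-in ("predict positive if the document contains any topic word"); TextNetTopics itself builds a classifier per topic, and this sketch does not reproduce that step.

```python
def topic_score(docs, labels, topic_words):
    """Hypothetical toy scorer: accuracy of predicting 'positive' whenever a
    document contains at least one word from the topic."""
    preds = [1 if doc & topic_words else 0 for doc in docs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def rank_topics(docs, labels, topics, scorer=topic_score):
    """Group -> score -> rank: score every topic (word group), best first."""
    scored = [(name, scorer(docs, labels, words)) for name, words in topics.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def top_k_features(topics, ranking, k):
    """Feature set = union of the words in the k best-ranked topics."""
    selected = set()
    for name, _ in ranking[:k]:
        selected |= topics[name]
    return selected

# Hypothetical documents (word sets), labels and topics
docs = [{"liver", "hepatotoxicity", "drug"}, {"liver", "injury"}, {"jaundice", "drug"},
        {"cardiac", "trial"}, {"renal", "trial"}, {"bone", "density"}]
labels = [1, 1, 1, 0, 0, 0]
topics = {
    "hepatic": {"liver", "hepatotoxicity", "jaundice", "injury"},
    "trials": {"trial", "cohort"},
    "pharma": {"drug", "dose"},
}
ranking = rank_topics(docs, labels, topics)
selected = top_k_features(topics, ranking, 2)
print([name for name, _ in ranking])  # ['hepatic', 'pharma', 'trials']
```

Selecting whole topics keeps related words together, so the resulting model can be read in terms of themes rather than isolated terms.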

13:50-14:00
Medical text classification using dynamic time warping (DTW) and a CNN-BiLSTM hybrid model
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Anika Liu, University of Cambridge, United Kingdom
  • Sanjay Rathee, University of Cambridge, United Kingdom
  • Nicholas M Katritsis, University of Cambridge, United Kingdom
  • Gehad Youssef, University of Cambridge, United Kingdom
  • Woochang Hwang, University of Cambridge, United Kingdom
  • Lilly Wollman, University of Cambridge, United Kingdom
  • Namshik Han, University of Cambridge, United Kingdom
  • Méabh MacMahon, LifeArc, United Kingdom

Presentation Overview:

Medical text classification is important in the drug and biomedical discovery process. Traditional text mining techniques can identify patterns in text based on a curated list of domain-specific words. However, these approaches depend on the subject matter, and additional keywords often have to be supplied as new terminology is introduced. We developed a hybrid model that does not require any curated words from the subject matter (drug-induced liver toxicity) and could potentially be transferred to a different domain. Our model exploits three features of a passage of text: its sequence, its structure and its higher dimensionality. We developed a hybrid CNN-bidirectional LSTM architecture using word embeddings as input, and simultaneously trained SVM and Naive Bayes classifiers using tf–idf data as input. We then took a majority vote across the classifiers. If the classifiers reached a consensus, we considered this a stable prediction. If they disagreed, we used dynamic time warping (DTW) as the final arbiter: DTW was used to align the tf–idf representations of every abstract, and the prevalence of DILI or Non-DILI labelled abstracts among the 0.5% most closely aligned abstracts was taken as the prediction.
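
The consensus-then-DTW arbitration can be sketched as below. Assumptions, not details from the abstract: each abstract is represented as a numeric sequence (for example per-token tf–idf weights in text order), "consensus" means all classifiers agree, and the reference set is a list of `(sequence, label)` pairs. The DTW itself is the classic dynamic-programming recurrence with absolute-difference cost.

```python
import math
from collections import Counter

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(votes, query, reference, fraction=0.005):
    """Return the shared vote if all classifiers agree; otherwise label the query
    by the majority label among the `fraction` most closely aligned references."""
    if len(set(votes)) == 1:  # consensus -> stable prediction
        return votes[0]
    ranked = sorted(reference, key=lambda item: dtw_distance(query, item[0]))
    k = max(1, int(len(ranked) * fraction))  # always keep at least one neighbour
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Hypothetical reference abstracts as short numeric sequences
reference = [([0.1, 0.2, 0.3], "DILI"), ([0.9, 0.8, 0.7], "Non-DILI")]
query = [0.1, 0.2, 0.25]
print(classify(["DILI", "Non-DILI", "DILI"], query, reference))  # DILI
```

DTW is quadratic per pair, so comparing a disputed abstract against a large reference set is the costly step; restricting it to disagreements keeps that cost down.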

14:20-14:30
Robustness and reproducibility of computational genomics tools
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Pelin Icer Baykal, Georgia State University, United States
  • Dhrithi Deshpande, University of Southern California, United States
  • Serghei Mangul, University of Southern California, Los Angeles, United States

Presentation Overview:

Reproducibility and robustness are two important factors in assessing the reliability of bioinformatics analyses performed with genomic tools. Assessing these criteria requires repeating experiments across lab facilities, which is usually costly and time-consuming. In this study, we propose a novel method that generates computational replicates by randomly shuffling the order of reads and by taking the reverse complement of the reads. Although our method cannot capture the full variability of real technical replicates, it provides a robust lower bound on the reproducibility of genomic tools. We analyzed three groups of genomic tools: read alignment tools, structural variant (SV) detection tools and RNA-seq quantification tools. We observed substantial variability across these tools. However, the chosen transcriptome quantification tools were not affected by the shuffled data, although they were affected by the reverse-complemented data. The proposed method will help the biomedical community assess the robustness and reproducibility of genomic tools and choose the tools most appropriate for their needs. Furthermore, our method will enable routine, cost-free evaluation of newly published tools at scale.
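
The two replicate types described above can be sketched in a few lines. This is an illustration, not the authors' tool: reads are assumed to be already parsed into `(name, sequence, quality)` tuples (FASTQ parsing is omitted), and the quality string is reversed together with the sequence so that each score stays attached to its base.

```python
import random

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse-complement a DNA sequence; N stays N."""
    return seq.translate(COMPLEMENT)[::-1]

def shuffled_replicate(reads, seed=0):
    """Replicate containing the same reads in a random (seeded) order."""
    out = list(reads)
    random.Random(seed).shuffle(out)
    return out

def reverse_complement_replicate(reads):
    """Replicate in which every read is reverse-complemented; qualities are
    reversed so each score still belongs to the same base."""
    return [(name, reverse_complement(seq), qual[::-1]) for name, seq, qual in reads]

reads = [("r1", "ACGTT", "IIIIH"), ("r2", "GGCAN", "IIHH#"), ("r3", "TTAGC", "HHIII")]
print(reverse_complement_replicate(reads)[0])  # ('r1', 'AACGT', 'HIIII')
```

For paired-end data, both mate files must be shuffled with the same permutation; calling `shuffled_replicate` on both lists with the same seed does this because equal-length lists receive the same shuffle.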

14:30-14:40
Impact of Epigenetics in SARS-CoV-2 Infection with Proposed Epi-Drugs for Coronaviruses
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Pragya Chaturvedi, Central University of Punjab, Bathinda, India
  • Sudhanshu Singh, Amity University Rajasthan, India

Presentation Overview:

We are frequently exposed to animal viruses through the food we eat, the pets we keep, and our contact with nature. The vast majority of viruses that enter our bodies pass harmlessly through our respiratory and gastrointestinal tracts or are destroyed by our immune systems. However, on rare occasions, an animal virus encounters a human host and begins to replicate, completing its entire life cycle within human cells and amplifying a single virion into a population of many. Replication of an animal virus within a human host is the key moment in the zoonotic process. Epigenetic regulation alters the function of a gene locus without changing the underlying DNA sequence, and in this way links genotype and phenotype. Although antiviral drugs have long been used to treat various viral diseases, epi-drugs are now being proposed to treat these diseases because of the epigenetic implications found in these infections. Epi-drugs are small agents that can reverse some epigenetic changes.

14:40-14:50
In silico evaluation of SARS-CoV-2 primers performance
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Paweł Łabaj, Malopolska Centre of Biotechnology, Jagiellonian University, ul. Gronostajowa 7A, 30-387 Krakow, Poland
  • Wojciech Branicki, Malopolska Centre of Biotechnology, Jagiellonian University, ul. Gronostajowa 7A, 30-387 Krakow, Poland
  • Alina Frolova, Institute of Molecular Biology and Genetics of NASU, 150, Zabolotnogo Str., Kyiv, 03143, Ukraine
  • Michał Kowalski, Malopolska Centre of Biotechnology, Jagiellonian University, ul. Gronostajowa 7A, 30-387 Krakow, Poland
  • Witold Wydmański, Malopolska Centre of Biotechnology, Jagiellonian University, ul. Gronostajowa 7A, 30-387 Krakow, Poland
  • Krzysztof Pyrć, Malopolska Centre of Biotechnology, Jagiellonian University, ul. Gronostajowa 7A, 30-387 Krakow, Poland

Presentation Overview:

Throughout the course of the SARS-CoV-2 pandemic, diagnostic laboratories and researchers around the world have observed that different clades/lineages may affect COVID-19 diagnosis, leading to false results that allow the virus to spread further unnoticed. We hypothesized that SNPs can drastically decrease the diagnostic power and value of primer sets. In vitro amplification results for SARS-CoV-2 strengthened this hypothesis, so we evaluated in silico how variability in SARS-CoV-2 genomes at primer/probe binding sites may affect their interactions, and we suggest the best combinations for further consideration. We downloaded nearly 1.5 million SARS-CoV-2 genomes from GISAID, applied quality filters, and analyzed 15 publicly available primer/probe sets with our Python library pyprimer. We found that five sets are the most susceptible to the currently most abundant clades/lineages; mismatches covering their binding sites are present in current Variants of Concern (VOC). The five best-performing primer sets can still detect almost all VOC with high overall accuracy. Nonetheless, the secondary structure of some of the best-performing primers raises concerns, as it resembles that of retracted sets which produced false positives through dimerization.
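
A simplified sketch of the kind of per-genome check involved: compare a primer with the genome sequence at its binding site and count mismatches, separately tallying those near the primer's 3' end, where mismatches are generally most disruptive to extension. This is not the pyprimer API; the function name, the `window` size of 5 bases, and the assumption that the binding site has already been located and extracted on the forward strand are all illustrative. Ambiguous genome bases (for example N) simply count as mismatches here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def mismatch_profile(primer, site, reverse=False, window=5):
    """Return (total mismatches, mismatches in the 3'-most `window` bases).

    `primer` is written 5'->3'. `site` is the genome sequence at the binding
    site on the forward strand; a reverse primer matches its reverse complement."""
    target = reverse_complement(site) if reverse else site
    if len(target) != len(primer):
        raise ValueError("primer and binding site must have equal length")
    mismatches = [i for i, (p, t) in enumerate(zip(primer, target)) if p != t]
    three_prime = sum(1 for i in mismatches if i >= len(primer) - window)
    return len(mismatches), three_prime

print(mismatch_profile("ACCGTTAG", "ACCGTTCG"))  # (1, 1)
```

Aggregating these profiles over many genomes (for example the fraction of genomes in each lineage with at least one 3' mismatch) gives a simple per-lineage susceptibility estimate for a primer set.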

14:50-15:00
Mechanistic models of COVID-19 disease maps to model SARS-CoV-2 infection and antiviral interventions
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Kinza Rian, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  • Marina Esteban-Medina, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  • Carlos Loucera, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  • María Peña-Chilet, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  • Joaquin Dopazo, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain

Presentation Overview:

Scope: Mechanistic models of human cell signaling can provide the functional link between gene-level data and the cell phenotype, highlighting specific disease-related cellular mechanisms. Here we aimed to study COVID-19 in more detail, trying to capture the complexity of the molecular mechanisms of the disease. To this end, we used Hipathia, which uses gene expression data to produce profiles of signaling circuit activity within pathways that describe the molecular mechanisms behind different biological scenarios; these profiles serve as proxies for the disease-related cell functionalities that the circuits trigger.

Methodology: To evaluate the effect of SARS-CoV-2 infection on human cells, we first expanded the SARS-CoV-2 virus-human interactome from existing KEGG pathways to obtain the subset of affected signaling circuits. Then, to test the methodology, we used an RNA-seq dataset from SARS-CoV-2-infected and healthy individuals from GEO and performed a differential signaling analysis. In addition, we applied the same analysis to the set of pathways from the COVID-19 Disease Map to assess differences between the two conditions.

Conclusions: These disease models can help illuminate deregulated disease mechanisms and provide insight into the processes underlying COVID-19, opening new avenues for potential intervention points and new drug repurposing strategies.
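
To illustrate what "signaling circuit activity" means, the sketch below propagates a signal through a small circuit. The rule follows our understanding of the published Hipathia model and should be checked against the package documentation: a node's signal is its normalized value multiplied by the combined incoming activation, 1 - ∏(1 - s_a), and by ∏(1 - s_i) over incoming inhibitions, with an initial signal of 1 entering the receptor. The circuit and node values are hypothetical.

```python
from math import prod

def propagate(nodes, edges, values, source):
    """Propagate a signal through a circuit given as a DAG.

    nodes:  node ids in topological order
    edges:  (src, dst, sign) with sign +1 = activation, -1 = inhibition
    values: normalized node values in [0, 1] (e.g. from gene expression)
    The source (receptor) node receives an incoming signal of 1."""
    signal = {}
    for n in nodes:
        act = [signal[s] for s, d, sign in edges if d == n and sign > 0]
        inh = [signal[s] for s, d, sign in edges if d == n and sign < 0]
        if n == source:
            act = [1.0]
        incoming = 1 - prod(1 - s for s in act) if act else 0.0
        signal[n] = values[n] * incoming * prod(1 - s for s in inh)
    return signal

# Hypothetical circuit: R activates A and I; A activates effector E; I inhibits E
nodes = ["R", "A", "I", "E"]
edges = [("R", "A", 1), ("R", "I", 1), ("A", "E", 1), ("I", "E", -1)]
values = {"R": 0.9, "A": 0.8, "I": 0.5, "E": 1.0}
sig = propagate(nodes, edges, values, source="R")
print(round(sig["E"], 3))  # 0.396
```

The signal reaching the effector node is the circuit's activity; computing it per sample and comparing infected versus healthy groups is the idea behind a differential signaling analysis.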

15:00-15:20
CAMDA 1st day summary
Format: Live-stream

Moderator(s): Wenzhong Xiao

  • Joaquin Dopazo, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
Thursday, July 29th
11:00-11:40
CAMDA Invited: 5 myths about AI and its implication to regulatory science at FDA
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • Weida Tong
11:40-12:20
DILIc: An AI-based classifier to search for Drug-Induced Liver Injury Literature
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Anika Liu, University of Cambridge, United Kingdom
  • Sanjay Rathee, University of Cambridge, United Kingdom
  • Méabh MacMahon, LifeArc.Org, United Kingdom
  • Nicholas Katritsis, University of Cambridge, United Kingdom
  • Gehad Youssef, University of Cambridge, United Kingdom
  • Woochang Hwang, University of Cambridge, United Kingdom
  • Lilly Wollman, University of Cambridge, United Kingdom
  • Namshik Han, University of Cambridge, United Kingdom

Presentation Overview:

Drug-Induced Liver Injury (DILI) is the most frequent cause of acute liver failure in the majority of western countries[4] and a major cause of attrition of novel drug candidates[2]. Manually trawling the literature for DILI papers is the main route for obtaining data from DILI studies, which makes it an inefficient process prone to human error. An automated AI model capable of retrieving DILI-related papers from the vast ocean of literature could therefore be invaluable to the drug discovery community. In this project, we built an artificial intelligence (AI) model combining Natural Language Processing (NLP) and Machine Learning (ML) to address this problem. Keywords from NLP are processed by the apriori pattern-mining algorithm to extract relevant patterns, which are used to estimate initial weightings for an ML classifier. Along with pattern importance and frequency, a list of FDA-approved drugs mentioning DILI adds extra confidence to the classification. Together, these methods yield a DILI classifier with 94.91% cross-validation accuracy and 94.14% external validation accuracy. An R Shiny app capable of classifying single or multiple entries will be developed to enhance the user experience.
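
The pattern-mining step can be illustrated with a compact version of the Apriori level-wise search for frequent keyword sets. How DILIc turns the mined patterns into classifier weightings is not described in the abstract and is not shown; the keyword sets and the `min_support` threshold below are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets (frozenset -> support) via the Apriori level-wise search.

    transactions: list of keyword sets (one per abstract)
    min_support:  minimum fraction of transactions containing the itemset"""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical keyword sets extracted from four abstracts
transactions = [{"hepatotoxicity", "liver", "drug"}, {"liver", "drug", "injury"},
                {"liver", "drug"}, {"cardiac", "drug"}]
frequent = apriori(transactions, min_support=0.5)
print(frequent[frozenset({"liver", "drug"})])  # 0.75
```

The prune step relies on the Apriori property (every subset of a frequent itemset is frequent), which is what keeps the candidate space small.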

12:40-13:00
dialogí: A text-mining approach for the identification of DILI-related literature with automated concept extraction
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Nicholas M Katritsis, University of Cambridge, United Kingdom
  • Anika Liu, University of Cambridge, United Kingdom
  • Gehad Youssef, University of Cambridge, United Kingdom
  • Sanjay Rathee, University of Cambridge, United Kingdom
  • Méabh MacMahon, LifeArc.Org, United Kingdom
  • Woochang Hwang, University of Cambridge, United Kingdom
  • Lilly Wollman, University of Cambridge, United Kingdom
  • Namshik Han, University of Cambridge, United Kingdom

Presentation Overview:

Drug-induced liver injury (DILI) is one of the most common reasons for the withdrawal of drug candidates. Among the cases of DILI, detecting unexpected (idiosyncratic) liver injury poses an interesting challenge, since this is not directly tied to the (dose-dependent) toxicity of a drug or its metabolites. As such, literature search remains a major tool for sourcing DILI-related information, which often comes directly from clinical practice.

Here, we present dialogí, a text-mining tool that combines different Natural Language Processing (NLP) approaches with a linear classifier to differentiate between DILI-positive and DILI-negative PubMed abstracts. A single DILI-positive paper often mentions multiple drugs, most of which are unrelated to DILI. We therefore extend our tool with a framework that tries to identify and extract the key (DILI-positive) drugs on a paper-by-paper basis.

The aforementioned classifier was trained on 11,200 equally-split DILI-positive and -negative PubMed abstracts, including titles, and was validated (internally) on the remaining 2,800 abstracts, resulting in a precision of 94.8% and recall of 93.5%. On external validation, the model displayed precision and recall of 93.3% and 94.9%, respectively, with an accuracy of 94.1%.

13:00-13:20
Fine-tuning a pretrained RoBERTa model for optimizing relevant biomedical literature search
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Valentyn Bezshapkin, Małopolska Centre of Biotechnology, Poland

Presentation Overview:

The transformer architecture has dominated natural language processing in recent years. It has two main advantages over previous-generation LSTM-based models. First, the self-attention mechanism allows word context to be used across longer sequences. Second, unlike LSTMs, transformers have no recurrence, so they do not need to process a sequence step by step during training. As a result, training can be efficiently parallelized, drastically reducing training times on massive datasets and improving model scalability. Owing to these advantages, large pretrained domain-oriented models have emerged that require only a few epochs of fine-tuning for a specific task. These models have become an "industry standard", and we explored their applicability to searching for relevant biomedical literature.
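
The self-attention mechanism mentioned above is, in its standard single-head form, softmax(QKᵀ/√d_k)V. The pure-Python sketch below shows that computation on toy inputs; the identity projection matrices are placeholders for learned weights, and real models use batched matrix operations, multiple heads and positional information, none of which is shown here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V with Q = X Wq, K = X Wk, V = X Wv."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    out = []
    # Each query row depends only on Q, K and V, not on earlier outputs
    # (no recurrence), so in practice all rows are computed in parallel.
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, embedding size 2 (toy values)
I2 = [[1.0, 0.0], [0.0, 1.0]]             # identity projections for illustration
out = self_attention(X, I2, I2, I2)
print(len(out), len(out[0]))  # 3 2
```

Because every token attends to every other token directly, context from distant positions reaches each output in a single step rather than being carried through a recurrent state.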

13:20-14:00
Discovering relationship between bacteriophages and antimicrobial resistance
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Maya Zhelyazkova, Sofia University, Bulgaria
  • Roumyana Yordanova, Hokkaido University, Japan
  • Stefan Tsonev, AgroBioInstitute, Bulgaria
  • Iliyan Mihaylov, Sofia University, Bulgaria
  • Miroslav Zoric, IFVC, Novi Sad, Serbia
  • Stefan Kirov, Bristol-Myers-Squibb, NJ, United States
  • David Danko, Weill Cornell Medical College, New York, United States
  • Christopher Mason, Weill Cornell Medical College, New York, United States
  • Dimitar Vassilev, Sofia University, Bulgaria

Presentation Overview:

Recent focus on the relationship between bacteriophages and antimicrobial resistance in medically and pharmaceutically oriented microbiology is driven by the potential contribution of phages to the growing problem of antimicrobial resistance. Several research studies either confirm [1] or question [2] the role of bacteriophages in disseminating antimicrobial resistance genes.
A major objective of the CAMDA challenge is to learn more about the relationship between viruses, their hosts and antimicrobial resistance genes, in order to determine whether antimicrobial resistance can indeed spread through phages. This study focuses on discovering relationships and possible dependencies between bacteriophages and antimicrobial resistance, based on data collected from different city environments around the world. Our analyses combine several methods that assess the differential abundance of phages, their diversity across samples, their impact on antimicrobial resistance categories, and their associations with antimicrobial resistance genes (ARGs). The relationship between phages, their hosts and antimicrobial resistance is also explored with a Bayesian spatial model.
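
The abstract does not name the diversity metric used; the Shannon index, H' = -Σ pᵢ ln pᵢ over taxa with nonzero relative abundance pᵢ, is one standard choice and is sketched below. The per-sample phage counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical phage read counts per sample (columns: phage taxa)
samples = {
    "city_A": [10, 10, 10, 10],  # perfectly even community
    "city_B": [37, 1, 1, 1],     # dominated by a single phage
}
for name, counts in samples.items():
    print(name, round(shannon_index(counts), 3))
```

An even community of S taxa reaches the maximum value ln S (about 1.386 for four taxa), while a community dominated by one taxon scores much lower, which is what makes the index useful for comparing samples.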

14:20-15:10
CAMDA Keynote: Microbiome Data Science: from the Earth microbiome to the Global virome
Format: Live-stream

Moderator(s): Wenzhong Xiao

  • Nikos Kyrpides

Presentation Overview:

Microbiome research is rapidly transitioning into data science. The unprecedented volume of microbiome data being generated poses significant challenges for standards and management strategies, but also brings great new opportunities that can fuel discovery. Computational analysis of microbiome samples containing previously uncultured organisms is currently advancing our understanding of the structure and function of entire microbial communities and expanding our knowledge of the genetic and functional diversity of individual microorganisms. I will describe some of our computational approaches and emphasize the value of integrating data processing to enable the exploration of large metagenomic datasets and the discovery of novelty. I will discuss current approaches and success stories in discovering novel phylogenetic lineages and exploring the viral dark matter.

15:10-15:20
CAMDA 2nd day summary
Format: Live-stream

Moderator(s): Wenzhong Xiao

  • Paweł P. Łabaj, Małopolska Centre of Biotechnology of Jagiellonian University, Poland
Friday, July 30th
11:00-11:40
CAMDA Invited: Mechanistic models in emerging infectious diseases: learning from COVID-19
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • María Peña-Chilet, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
11:40-12:20
Targeting the host response in COVID-19 by integration of metabolic modeling and cheminformatics
Format: Pre-recorded with live Q&A

Moderator(s): Joaquin Dopazo

  • Gonghua Li, Kunming Institute of Zoology, Chinese Academy of Sciences, China
  • Fei-Fei Han, Massachusetts General Hospital, Harvard Medical School, United States
  • Qing-Peng Kong, Kunming Institute of Zoology, Chinese Academy of Sciences, China
  • Wenzhong Xiao, Massachusetts General Hospital, Harvard Medical School, United States

Presentation Overview:

The host response to SARS-CoV-2 is critical to disease onset, progression, and outcome in COVID-19 patients. Since the virus hijacks host cellular metabolism for its replication, we hypothesized that properly modulating the metabolism of host cells can efficiently reduce viral production and propagation. Here, we used the pathway information of the COVID-19 Disease Map and developed viral-host Metabolic Modeling (vhMM) to screen in silico for candidates that modulate host metabolism against SARS-CoV-2. Our results showed that impaired mitochondrial function is the most significant change in human cells after viral infection. We also identified five enzymes involved in de novo purine synthesis as candidate drug targets and NAD+ as a candidate metabolite for nutritional support. In addition, we integrated the results of metabolic modeling with cheminformatics to identify small molecules against COVID-19 that target the viral biomass reaction and other essential reactions in infected host cells. Fourteen candidates were identified among FDA-approved drugs, which can be further tested for repurposing against COVID-19.

12:40-13:00
UTRCOV2: Unraveling T cell responses for long-term protection against SARS-CoV-2 infection
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Dongyuan Wu, Department of Biostatistics, University of Florida, United States
  • Runzhi Zhang, Department of Biostatistics, University of Florida, United States
  • Susmita Datta, Department of Biostatistics, University of Florida, United States

Presentation Overview:

Because of the COVID-19 pandemic, there is an urgent global need for vaccines that prevent the disease. To date, several manufacturers have made efforts to develop vaccines against SARS-CoV-2. A more detailed understanding of the mechanism of T cell responses to SARS-CoV-2 would help future vaccine designs achieve long-term protection against the disease. In this study, we first detected differentially expressed (DE) genes between healthy donors and COVID-19 patients, and then built separate healthy and COVID-19 networks among those genes. For each network, we identified modules and obtained hub genes for each module. Furthermore, we evaluated the differential connectivity of each gene between the two networks. The results may improve insight into gene expression associated with CD4+ T cells and expand our understanding of COVID-19.
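
Differential connectivity can be sketched as the change in a gene's number of network neighbours between the two conditions. The network construction here is an assumption: genes are linked when the absolute Pearson correlation of their expression across samples reaches a hypothetical `threshold` of 0.8 (weighted co-expression networks are also common and may be what the authors used). The expression values are toy data.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity(expr, threshold=0.8):
    """Degree of each gene in a network linking genes with |Pearson r| >= threshold.
    expr maps gene -> expression values across samples."""
    genes = list(expr)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return degree

def differential_connectivity(expr_healthy, expr_covid, threshold=0.8):
    """Per-gene change in degree: COVID-19 network minus healthy network."""
    k_h = connectivity(expr_healthy, threshold)
    k_c = connectivity(expr_covid, threshold)
    return {g: k_c[g] - k_h[g] for g in k_h}

healthy = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [3, 1, 4, 2]}
covid = {"G1": [1, 2, 3, 4], "G2": [3, 1, 4, 2], "G3": [2, 4, 6, 8]}
print(differential_connectivity(healthy, covid))  # {'G1': 0, 'G2': -1, 'G3': 1}
```

Genes whose degree changes strongly between conditions are candidates for rewired regulation, even when their own expression level changes little.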

13:00-13:20
Proceedings Presentation: Investigation of REFINED CNN ensemble learning for anti-cancer drug sensitivity prediction
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Omid Bazgir, Texas Tech University, United States
  • Souparno Ghosh, University of Nebraska-Lincoln, United States
  • Ranadip Pal, Texas Tech University, United States

Presentation Overview:

Motivation: Predicting anti-cancer drug sensitivity for individual cell lines with deep learning models is a significant challenge in personalized medicine. Recently developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) CNN (Convolutional Neural Network) based models have shown promising results in improving drug sensitivity prediction. The primary idea behind REFINED-CNN is to represent high-dimensional vectors as compact images with spatial correlations that can benefit from CNN architectures. However, the mapping from a high-dimensional vector to a compact 2D image depends on the a priori choice of distance metric and projection scheme, with limited empirical procedures to guide these choices.
Results: In this article, we consider an ensemble of REFINED-CNN models built under different choices of distance metrics and/or projection schemes, which can improve upon a single-projection REFINED-CNN model. Results, illustrated using the NCI60 and NCI-ALMANAC databases, demonstrate that the ensemble approaches can provide significant improvement in prediction performance compared with individual models. We also develop a theoretical framework for combining different distance metrics to arrive at a single 2D mapping. Results demonstrated that distance-averaged REFINED-CNN produced performance comparable to that of the stacking REFINED-CNN ensemble, but at a significantly lower computational cost.
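
The distance-averaging idea can be sketched as follows. This is a simplification, not the paper's theoretical framework: we assume each metric's feature-by-feature distance matrix is scaled to [0, 1] by its largest entry and the scaled matrices are averaged element-wise. The metrics (Euclidean and 1 minus Pearson correlation) and the toy feature profiles are illustrative; the subsequent REFINED step that places features on a 2D grid from this matrix is not shown.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_distance(u, v):
    """1 - Pearson correlation between two feature profiles (range 0 to 2)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1 - cov / (su * sv)

def distance_matrix(features, metric):
    """Pairwise distances between features (each feature = values across samples)."""
    return [[metric(f, g) for g in features] for f in features]

def scale_to_unit(D):
    """Divide by the largest entry so that different metrics share a [0, 1] range."""
    top = max(max(row) for row in D) or 1.0
    return [[d / top for d in row] for row in D]

def averaged_distance(features, metrics):
    """Element-wise mean of the scaled distance matrices from several metrics."""
    mats = [scale_to_unit(distance_matrix(features, m)) for m in metrics]
    n = len(features)
    return [[sum(M[i][j] for M in mats) / len(mats) for j in range(n)] for i in range(n)]

# Hypothetical features measured across four samples
features = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
D = averaged_distance(features, [euclidean, correlation_distance])
print(D[0][1] < D[0][2])  # True
```

Averaging produces a single mapping and therefore a single CNN to train, which is consistent with the lower computational cost reported relative to stacking several REFINED-CNN models.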

13:20-13:40
Proceedings Presentation: PathCNN: Interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Jung Hun Oh, Memorial Sloan Kettering Cancer Center, United States
  • Wookjin Choi, Virginia State University, United States
  • Euiseong Ko, University of Nevada, Las Vegas, United States
  • Mingon Kang, University of Nevada, Las Vegas, United States
  • Allen Tannenbaum, Stony Brook University, United States
  • Joseph Deasy, Memorial Sloan Kettering Cancer Center, United States

Presentation Overview:

Motivation: Convolutional neural networks (CNNs) have achieved great success in the areas of image processing and computer vision, handling grid-structured inputs and efficiently capturing local dependencies through multiple levels of abstraction. However, a lack of interpretability remains a key barrier to the adoption of deep neural networks, particularly in predictive modeling of disease outcomes. Moreover, because biological array data are generally represented in a non-grid structured format, CNNs cannot be applied directly.
Results: To address these issues, we propose a novel method, called PathCNN, that constructs an interpretable CNN model on integrated multi-omics data using a newly-defined pathway image. PathCNN showed promising predictive performance in differentiating between long-term survival (LTS) and non-LTS when applied to glioblastoma multiforme (GBM). The adoption of a visualization tool coupled with statistical analysis enabled the identification of plausible pathways associated with survival in GBM. In summary, PathCNN demonstrates that CNNs can be effectively applied to multi-omics data in an interpretable manner, resulting in promising predictive power while identifying key biological correlates of disease.

13:40-14:00
Proceedings Presentation: Asynchronous Parallel Bayesian Optimization for AI-driven Cloud Laboratories
Format: Pre-recorded with live Q&A

Moderator(s): Wenzhong Xiao

  • Trevor Frisby, Carnegie Mellon University, United States
  • Zhiyun Gong, Carnegie Mellon University, United States
  • Christopher Langmead, Carnegie Mellon University, United States

Presentation Overview:

Motivation: The recent emergence of cloud laboratories (collections of automated wet-lab instruments that are accessed remotely) presents new opportunities to apply Artificial Intelligence and Machine Learning in scientific research. Among these is the challenge of automating the optimization of experimental protocols to maximize data quality.
Results: We introduce a new deterministic algorithm, called PROTOCOL (PaRallel OptimizaTiOn for ClOud Laboratories), that improves experimental protocols via asynchronous, parallel Bayesian optimization. The algorithm achieves exponential convergence with respect to simple regret. We demonstrate PROTOCOL in both simulated and real-world cloud labs. In the simulated lab, it outperforms alternative approaches to Bayesian optimization both in its ability to find optimal configurations and in the number of experiments required to find the optimum. In the real-world lab, the algorithm makes progress towards the optimal setting.
Availability: PROTOCOL is available both as a stand-alone Python library and as part of an R Shiny application at https://github.com/clangmead/PROTOCOL
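
For readers unfamiliar with the term, simple regret is commonly defined, for maximizing an objective f over a configuration space 𝒳 after T experiments, as the gap between the best achievable value and the best value observed so far. This is the standard textbook definition, not notation taken from the PROTOCOL paper:

```latex
r_T \;=\; f(\mathbf{x}^\ast) \;-\; \max_{1 \le t \le T} f(\mathbf{x}_t),
\qquad
\mathbf{x}^\ast \;=\; \arg\max_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x})
```

Exponential convergence means that r_T shrinks exponentially as evaluations accumulate; the precise rate and the assumptions on f under which it holds are stated in the paper.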

14:20-15:10
CAMDA Cafe - Grand challenges of our times
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • Joaquin Dopazo, Bioinformatics Área, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
15:10-15:20
Award announcement and closing remarks
Format: Live-stream

Moderator(s): Joaquin Dopazo

  • David P Kreil



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
