Short Abstract: The goal of the challenge was to predict the probability that a patient will develop cancer within 12 months based on mammographic images. Several studies have shown that breast density is an important breast cancer risk factor [1,2]. In fact, they indicate that an increase in breast density is one of the strongest indicators of developing breast cancer. To address the Digital Mammography DREAM Challenge, we proposed an approach based on a statistical analysis of breast density to automatically detect the development of breast cancer. The basic idea was to represent the texton distribution as a topological map. We first extracted a dictionary of words: for each image in the training data, textons of size 200×200, spaced 100 pixels apart, were extracted, and a set of features was computed for each texton. We used the k-means method to cluster the textons of each input image into centers and compute their frequencies, then selected, across the whole training dataset, the centers with a frequency higher than 0.02. Finally, we applied k-means again to extract the most representative centers, which form the words of our dictionary. The extracted dictionary is used to map each input image to a co-occurrence table: each cell in the table represents a texton in the image, and its value is the number of the nearest word in the dictionary. From the co-occurrence table of each image we computed the histogram and extracted the following features: mean, variance, standard deviation, skewness, kurtosis, entropy, and energy. We also computed the Haralick features and the features proposed by Soh, using four orientations and one distance. In the test phase, we extract the feature vector of each input image, then use the Euclidean distance to retrieve the images most similar to the query image.
We submitted two runs to the Digital Mammography DREAM Challenge Sub-Challenge 1. In the first run, we used the fifty nearest images to compute the voting score of the retrieved images; in the second, we used one hundred images. The first run obtained an area under the receiver operating characteristic curve (AUC) of 0.53, which ranked #23 on the leaderboard, and the second run obtained an AUC of 0.55.
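The retrieval and voting step described above can be illustrated with a minimal sketch. The abstract does not say how the voting score is computed, so this example assumes it is the fraction of the k nearest training images (by Euclidean distance) that carry a positive label. The function names and the toy 2-D feature vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_vote_score(query, train_features, train_labels, k):
    """Fraction of the k nearest training images labeled positive (1).

    Assumption: the 'voting score' is a simple unweighted vote.
    """
    order = sorted(range(len(train_features)),
                   key=lambda i: euclidean(query, train_features[i]))
    nearest = order[:k]
    return sum(train_labels[i] for i in nearest) / len(nearest)

# Toy data: hypothetical 2-D feature vectors; label 1 = positive case.
train_features = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
train_labels = [0, 0, 1, 1, 1]

score_k3 = knn_vote_score([1.0, 0.9], train_features, train_labels, k=3)
score_k5 = knn_vote_score([1.0, 0.9], train_features, train_labels, k=5)
```

In the submitted runs k would be 50 or 100; a larger k averages over more retrieved images and so gives a smoother but less local score.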
Short Abstract: Clustering is an important step in identifying previously uncharacterized populations in single-cell RNA-sequencing (scRNA-seq) data. However, existing scRNA-seq clustering algorithms have the following disadvantages: (1) lack of scalability to large-scale datasets; (2) loss of data quality during dimension reduction; and (3) long running times, especially for large-scale single-cell datasets. To process large-scale scRNA-seq data effectively while preserving cell-to-cell distances in a reduced dimension, we present SHARP (https://github.com/shibiaowan/SHARP), a scalable algorithm based on ensemble random projection and multi-layer meta-clustering. SHARP first employs a divide-and-conquer strategy to partition the data into several smaller blocks. Then, by adopting sparse random projection (RP), SHARP runs much faster than traditional dimension-reduction methods such as PCA or tSNE, while minimizing the information loss during dimension reduction. To make SHARP robust, several RPs are applied to each block, and the individual RP-based clustering results are merged by a weighted-based meta-clustering (wMetaC) method. Finally, a similarity-based meta-clustering (sMetaC) method integrates the block-wise clustering results by considering the average cell-to-cell correlations of each block. SHARP has the following advantages over existing state-of-the-art scRNA-seq clustering algorithms: (1) it scales to more than 1.3 million single cells; (2) it preserves cell-to-cell distances well in a reduced-dimensional space; (3) it is much faster than most existing clustering algorithms; and (4) it is robustly accurate in terms of clustering performance.
Comprehensive benchmarking tests on 16 public scRNA-seq datasets demonstrate that SHARP outperforms other clustering algorithms in terms of speed and accuracy, especially for large datasets (more than 40,000 cells), where SHARP ran at least 20 times faster than previous clustering algorithms. Moreover, to the best of our knowledge, SHARP is the first R package that scales to analyzing 1.3 million single cells in less than one hour. SHARP can also be used for single-cell data visualization and marker gene identification.
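To make the idea of ensemble sparse random projection concrete, here is a small sketch in Python (SHARP itself is an R package, and this is not its implementation). It assumes an Achlioptas-style sparse matrix with sparsity parameter s = 3; the function names, matrix sizes, and the choice of three ensemble members are illustrative assumptions only.

```python
import math
import random

def sparse_rp_matrix(n_features, n_components, s=3, seed=0):
    """Achlioptas-style sparse random projection matrix.

    Entries are +v, -v, 0 with probabilities 1/(2s), 1/(2s), 1 - 1/s,
    where v = sqrt(s / n_components), so squared norms are preserved
    in expectation.
    """
    rng = random.Random(seed)
    v = math.sqrt(s / n_components)
    matrix = []
    for _ in range(n_features):
        row = []
        for _ in range(n_components):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(v)
            elif u < 1 / s:
                row.append(-v)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix

def project(cells, matrix):
    """Project each cell (a list of gene values) into the reduced space."""
    n_components = len(matrix[0])
    return [[sum(x * matrix[g][c] for g, x in enumerate(cell))
             for c in range(n_components)]
            for cell in cells]

# Toy block: 5 hypothetical cells x 200 genes, reduced to 20 dimensions.
rng = random.Random(42)
cells = [[rng.random() for _ in range(200)] for _ in range(5)]

# Ensemble: several independent projections of the same block.
ensemble = [project(cells, sparse_rp_matrix(200, 20, seed=k)) for k in range(3)]
```

Each ensemble member would then be clustered separately and the results merged by a meta-clustering step; that merging is not shown here.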
Short Abstract: Malaria is still a global health burden, despite numerous coordinated eradication efforts. One obstacle to eradication is the malaria parasite’s quick adaptation to anti-malarial drugs, resulting in drug resistance. Resistance mechanisms are poorly understood for many anti-malarials, making it difficult to overcome these costly evolutionary changes. Recent reports from Southeast Asia of resistance against Artemisinins (Art), the last line of defense against multi-drug resistant malaria, raise concerns of a significant setback in eradication efforts. Understanding the changing biology of malaria parasites as they acquire resistance may be key to preventing the spread of resistance. Here we introduce the DREAM of Malaria challenge, the first DREAM challenge that focuses on an infectious disease agent, aimed at predicting the changing biology of Art-resistant malaria. This two-part challenge will predict Art resistance states of malaria isolates based on transcription data and predict synergistic anti-malarial combinations in drug-resistant isolates. The first part of the Malaria DREAM challenge has two sub-challenges: i) predicting Art-resistance status measured as in vivo clearance rate using transcriptional data from sensitive and resistant parasites; and ii) predicting Art-resistance status measured as in vitro IC50. The aim of this challenge is to determine whether transcriptional data obtained from Art-resistant malaria field isolates could explain the complex Art resistance landscape in Southeast Asia. Unique aspects of the sub-challenges include a largely unannotated genome, the malaria parasite’s “just in time” transcription, and the peculiar nature of the resistance phenotype. The second challenge aims at determining which drugs may prove most effective in combination with Art.
The second challenge will build on knowledge gained from the first challenge, utilizing transcriptional signals to determine if there are any potential untapped combination therapies that may exploit the changed biology of Art-resistant isolates. Between the two challenges, the DREAM of Malaria challenge aims at understanding and attacking anti-malarial resistance, preventing any further treatment failures, and taking a large step towards eradication.
Short Abstract: Single-cell biology has the potential to revolutionize biomedical research by resolving cellular dynamics that cannot be resolved using bulk gene expression measurements. A drawback of this technology is that the physical location of cells in the tissue of origin is lost; a possible solution is to use location-specific expression patterns to trace back the location of single cells. The DREAM Single Cell Transcriptomics Challenge was designed to identify analytical strategies for the spatial localization of single cells from the Drosophila embryo by matching single-cell transcriptomic profiles to in situ expression measurements, using only a subset of the 84 driver genes needed in previous studies. We hypothesized that 1) genes that are important in predicting the coordinates of single cells from RNA-Seq data via random forest models are among the most important ones to consider when matching single-cell expression to reference in situ expression patterns, and that 2) the in situ patterns that best match a given single cell should be close to the predicted location of that cell, as learned from mapping single-cell expression to the gold-standard coordinates. The improvement in single-cell localization accuracy due to the latter hypothesis can be thoroughly evaluated in a post-challenge analysis, given access to the scores of predictions that do and do not rely on it.
Short Abstract: One of the great potential tools of computational biology is the ability to give researchers a higher-resolution view of in situ gene expression via a digital recreation of target organisms. The recent DREAM Single Cell Transcriptomics Challenge aimed to help realize such a tool using Drosophila melanogaster as the target organism and Drop-seq technology, which enables profiling of the entire transcriptome at the single-cell level. However, the Drop-seq process loses the 3-dimensional structure of the organism from which the cells originate. Thus, the thrust of the challenge was to design a method for recreating the 3-dimensional structure from the information obtained by Drop-seq.
Short Abstract: The Challenge requires participants to accurately reconstruct the location of single cells in the Drosophila embryo from in situ hybridization data and single-cell sequencing data. The most challenging part is that fewer in situ hybridization genes are available in each sub-challenge. Based on prior knowledge, we try to predict the expression of genes highly related to the in situ hybridization genes and thereby extend the set of genes that can be used to predict cell position. Gene Ontology (GO) and Protein-Protein Interaction (PPI) networks are two data sources widely used as prior knowledge in bioinformatics. In previous work on single-cell sequencing data, researchers found that combining GO and PPI networks with neural networks greatly helps the clustering of scRNA-seq data. Inspired by this work, we use prior knowledge from GO and PPI networks to increase the number of available in situ hybridization genes and then apply machine learning classification algorithms to extend the in situ hybridization data to these genes. To calculate the spatial location of each cell, we combined the normalized data for each cell with the expanded in situ hybridization data and calculated the MCC score to determine its predicted location.
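The MCC-based matching step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that both the cell's expression and the in situ reference have been binarized to 0/1 per gene, and that the predicted location is the reference position with the highest Matthews correlation coefficient (MCC). The function names and the toy atlas are hypothetical.

```python
import math

def mcc(cell, location):
    """Matthews correlation coefficient between two binary gene vectors."""
    tp = sum(1 for c, r in zip(cell, location) if c == 1 and r == 1)
    tn = sum(1 for c, r in zip(cell, location) if c == 0 and r == 0)
    fp = sum(1 for c, r in zip(cell, location) if c == 1 and r == 0)
    fn = sum(1 for c, r in zip(cell, location) if c == 0 and r == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the score is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_location(cell, atlas):
    """Index of the atlas position with the highest MCC, plus all scores."""
    scores = [mcc(cell, location) for location in atlas]
    best = max(range(len(atlas)), key=lambda i: scores[i])
    return best, scores

# Toy example: one cell and three candidate positions over six genes.
cell = [1, 0, 1, 1, 0, 0]
atlas = [
    [1, 0, 1, 1, 0, 0],  # identical pattern
    [0, 1, 0, 0, 1, 1],  # opposite pattern
    [1, 0, 0, 1, 0, 1],  # partial match
]
best, scores = best_location(cell, atlas)
```

Extending the gene set, as the abstract proposes, simply lengthens the vectors compared here, which can help separate positions that look identical on a small gene subset.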
Short Abstract: The problem under discussion is to use single-cell RNAseq transcriptomic data to predict the X, Y, and Z coordinates of a cell and to select the most important features. We used three methodologies to identify the most informative features: Deep Neural Networks (NN), LASSO, and random selection. Random selection was used to define a baseline accuracy for our analysis. Furthermore, we decided not to use the predicted X, Y, and Z values directly to derive our top-10 locations per cell. Instead, we applied inference techniques to the trained models to obtain lists of the 20 / 40 / 60 most important in situ genes, which could then be fed into DistMap. We ran NN and LASSO on all 3 subchallenges and compared the resulting gene lists against random. Neural nets performed better for subchallenge #2, and LASSO performed better for subchallenges #1 and #3. Importantly, the training data and approach differ between LASSO and Neural Networks. For instance, the number of features (# of genes from RNASeq data used) and the number of observations (max MCC versus 95%) differ between the approaches. Therefore, the differences in performance likely reflect various decisions made during preprocessing and not only the algorithm. This was intentional, as our goal wasn’t to determine whether NN outperformed LASSO on exactly the same data, but which independent approach performed best for each subchallenge. The details of LASSO are described in subchallenges #1 and #3, and the details of Neural Networks are described in subchallenge #2. See https://www.synapse.org/#!Synapse:syn17055868/wiki/585917 for the steps that helped us establish a baseline and determine which algorithm (Neural Net or LASSO) to use for each subchallenge.
Short Abstract: The traditional drug discovery approach is to identify a suitable target for a disease and find a compound that binds to the target. In this approach, the structure of a compound is considered its most important feature because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of drugs that bind to the target have been selected as drug candidates. However, compounds that are not structural analogs may still achieve the desired response and may be used for the disease. A new drug discovery method based on drug response, and not solely on drug structure, is necessary; therefore, we propose a drug response-based drug discovery model called ReSimNet.
Short Abstract: The DREAM Single Cell Transcriptomics Challenge is a community-wide effort to seek computational solutions for spatial mapping of single cells in tissues using single-cell RNAseq data and a reference atlas obtained from in situ hybridization data. We approached this problem by combining unsupervised and supervised machine learning algorithms and obtained promising results. First, to find a set of the most informative genes, an unsupervised feature selection method was designed to optimize two biologically rational metrics based on the consistency between gene expression similarity and cell proximity. The “gold standard” locations of the cells to be predicted were not used at this stage, thus significantly reducing the chance of overfitting. Second, a Particle Swarm Optimization (PSO) algorithm was used to learn proper weights for the different genes in order to maximize matches between the predicted locations and the “gold standard” locations. Cross-validation was performed to avoid overfitting. Finally, the information embedded in the cell topology was used to improve the predicted cell-location scores by weighted averaging of scores from neighboring locations. While our own evaluation shows that all three components are important for the performance of the algorithm, post-challenge analysis will be performed to evaluate the contribution of the individual components and the biological significance of the selected genes and their associated weights.
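The final smoothing step, weighted averaging of location scores over neighboring locations, can be sketched as below. The abstract does not give the weighting scheme, so this example assumes a fixed weight on a location's own score and the remainder spread evenly over its neighbors. The function name, the `self_weight` parameter, and the toy neighbor graph are all hypothetical.

```python
def smooth_scores(scores, neighbors, self_weight=0.5):
    """Blend each location's score with the mean score of its neighbors.

    Assumption: self_weight goes to the location's own score and
    (1 - self_weight) to the unweighted mean of its neighbors.
    """
    smoothed = []
    for i, own in enumerate(scores):
        nbrs = neighbors.get(i, [])
        if nbrs:
            nbr_mean = sum(scores[j] for j in nbrs) / len(nbrs)
            smoothed.append(self_weight * own + (1 - self_weight) * nbr_mean)
        else:
            # Isolated location: nothing to average with.
            smoothed.append(own)
    return smoothed

# Toy example: four locations on a line, each linked to adjacent ones.
scores = [0.2, 0.9, 0.4, 0.1]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smoothed = smooth_scores(scores, neighbors)
```

The effect is that an isolated high score surrounded by low-scoring neighbors is pulled down, while a cluster of moderately high scores reinforces itself.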