Posters

20th Annual International Conference on
Intelligent Systems for Molecular Biology

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category Z - ''

Z01 - Epi-Letter - a bioinformatics tool for epigenomic signature discovery

Short Abstract: The concept of a chromatin-based epigenetic code, proposed more than ten years ago, associates specific combinations of chromatin marks with different gene expression states and their maintenance. High-throughput technologies like microarray profiling or next generation sequencing enable us to examine the validity of the concept, by profiling transcriptomes and multiple chromatin marks for many different samples, conditions and organisms. The large amounts of generated data require efficient and instructive computational methods to identify biologically relevant correlations and to challenge the hypothesis of an epigenetic code.

Here, we introduce a generally applicable bioinformatics method to group epigenetic information across genome-wide chromatin data sets. It automatically classifies homogeneous chromatin-based signals into discrete categories, and transforms the categories into so-called epi-letters. Each genomic region can then be represented as a combined string of epi-letters referring to different chromatin marks. This synoptic compilation can be used for further clustering or de novo discovery of common epigenomic signatures and can be visualized by applying the DNA motif sequence logo concept.

We will present the results of the Epi-Letter method using published data from twelve chromatin marks (including DNA methylation) in the model organism Arabidopsis thaliana. We further examine the results for association of the signatures with relevant biological functions such as gene expression level, tissue specificity and gene ontology functional categories and compare them with previously published results.
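
The epi-letter encoding described above can be sketched as follows. This is only a minimal illustration: the abstract does not say how signals are classified, so the fixed thresholds, the four-letter alphabet, the mark names and the function names are all assumptions.

```python
from bisect import bisect_right


def to_epi_letter(signal, thresholds, alphabet="ABCD"):
    """Map one chromatin-mark signal value to a discrete letter.

    thresholds must be sorted ascending and hold len(alphabet) - 1 cut-offs.
    """
    return alphabet[bisect_right(thresholds, signal)]


def epi_string(region_signals, thresholds_per_mark):
    """Concatenate one letter per mark (in sorted mark order) for a region."""
    return "".join(
        to_epi_letter(value, thresholds_per_mark[mark])
        for mark, value in sorted(region_signals.items())
    )


# Hypothetical cut-offs and signal values, for illustration only.
thresholds = {"H3K27me3": [0.5, 1.5, 3.0], "H3K4me3": [1.0, 2.0, 4.0]}
region = {"H3K4me3": 2.5, "H3K27me3": 0.2}
print(epi_string(region, thresholds))  # one letter per mark: "AC"
```

Strings built this way can be passed to ordinary string clustering or motif tools, which is the property the abstract relies on.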

Z02 - Widgets integration in Mobyle to enable Bioinformatics data visualization and editing

Short Abstract: Mobyle is an open source framework and web portal specifically designed for the integration of bioinformatics software and databanks. Its web interface allows scientists, without installing anything locally, to use command line-based bioinformatics tools to perform analyses on remote computing resources. The high level of integration between the different tools provided enables and guides users throughout the construction of potentially complex protocols, chaining interactively successive tasks in an exploratory mode, or automating their execution with workflows. However, it can be extremely tedious for users to understand and edit complex bioinformatics data (e.g., multiple sequence alignments, phylogenetic trees, structural data) in their native often text-based formats, using traditional web interfaces. To solve this issue, Mobyle now offers the ability to use integrated rich browser-embedded components, significantly reducing the previously mentioned usability problem. This is achieved by using a combination of server-based tools to perform analyses and client-side components to seamlessly visualize and manually edit web-hosted data. Relying on the open and widely used protocols and formats of the web, the provided framework makes it easy to integrate additional "rich" components such as Java applets and Ajax-based or Flash-based components.

Z03 - Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome

Short Abstract: We attempt to quantify the difficulty of reproducing previously published research in computational biology for different classes of user and suggest ways in which the situation might be improved. We developed a workflow for a previously published paper (Kinnings et al. “The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications”, PLOS Computational Biology, November 2010). The computational method reported in the article was expressed as a workflow that can be executed in a workflow system. We observed that the software originally used had changed and therefore the experiment was hard to reproduce exactly as it was carried out. We kept careful records of the amount of effort required to develop the workflow and reproduce the work. We attempt to quantify reproducibility by assigning a reproducibility score that exposes the cost of omitting from the published paper important information that caused problems in creating the workflow. We assign a reproducibility score for 1) researchers familiar with the research area, 2) researchers with basic bioinformatics expertise, 3) researchers with no expertise in bioinformatics. We also consider reproducibility by the authors who did the original work and may need to reproduce the method to update or extend the results published, noting that some authors may move away from the lab and their notes may no longer be available. Finally, we developed desiderata for reproducibility and guidelines to authors of papers that include complex computational methods who wish to facilitate reproducibility and reuse by other researchers.

Z04 - Estimating the benefits of additional sequencing

Short Abstract: We describe a novel method to accurately estimate how many distinct molecules are present in a sequencing library and how deeply the library should be sequenced in order to observe them based on the information from a small sequenced sample. Our method is based on empirical Bayesian estimates for species abundance derived in the context of capture-recapture experiments. Previous algorithms for these estimates were not sufficiently robust for applications associated with high-throughput sequencing. We leverage rational function approximations to obtain stable estimates that extrapolate accurately over two orders of magnitude. In the context of sequencing experiments, these estimates allow us to sequence a few million reads and estimate what can be learned by sequencing hundreds of millions of reads from the same library.
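
As background, the classical Good-Toulmin series from the species-abundance literature gives a feel for this kind of extrapolation. The abstract does not name the estimator it builds on, so treating Good-Toulmin as the starting point is an assumption, and the function names and toy reads below are hypothetical. The plain series is only reliable for up to a doubling of the sample (t <= 1), which is the kind of instability that more robust approximations would need to address.

```python
from collections import Counter


def counts_histogram(read_ids):
    """Return n_j: how many distinct molecules were observed exactly j times."""
    per_molecule = Counter(read_ids)
    return Counter(per_molecule.values())


def good_toulmin_new_molecules(hist, t):
    """Expected number of NEW distinct molecules after t-fold more sequencing.

    Classical alternating series: sum over j of (-1)^(j+1) * t^j * n_j.
    """
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in hist.items())


# Toy sample: each string stands for the molecule a read came from.
reads = ["m1", "m1", "m2", "m3", "m3", "m3", "m4", "m5"]
hist = counts_histogram(reads)
print(dict(hist))
print(good_toulmin_new_molecules(hist, 0.5))
```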

Z05 - Dao: a novel programming language for bioinformatics

Short Abstract: Dao is a novel programming language with advanced features that make it well suited for developing bioinformatics applications. These features include: (a) optional typing with type inference, which has the advantages of both static typing and dynamic typing; (b) Just-In-Time (JIT) compiling, which makes numeric computation efficient; (c) built-in support for concurrent programming, which simplifies multi-core programming; (d) concurrent garbage collector, which makes multi-threading safer; (e) transparent C interfaces, which make embedding and extending Dao very simple.

Dao also has a number of other less prominent but important features that make the language very expressive and easy to use. Such features include built-in support for string pattern matching, built-in numeric array types, variant type (disjoint union), enumerated symbol type (a combination of C++ enum types and Ruby objects), typed function decorators, and code section methods (similar to Ruby's code blocks).

Currently the main disadvantage of Dao for bioinformatics is the lack of a rich set of bioinformatics-specific modules, but this may change soon. A Clang-based tool (ClangDao) is under development and is almost complete. With this tool, it should be very easy to create bindings for existing bioinformatics libraries written in C or C++. In fact, ClangDao has successfully created bindings for a number of large (non-bioinformatics) libraries. Screening and prioritizing bioinformatics and other scientific libraries for binding will be the next important step to prepare Dao for the bioinformatics field.

Dao is currently released under the LGPL license. A more permissive license (such as MIT) is planned for future releases.

Z06 - CellOrganizer: Image-derived Generative Models

Short Abstract: The goal of systems biology is to simulate cellular behavior using computational models. Due to the large number of proteins within a cell and their behavioral complexity, the location and function of all proteins under all conditions cannot be directly observed. CellOrganizer attempts to solve this problem by creating a conditional generative framework for modeling complex cellular behavior in silico. These models can be used to combine data from a large number of experiments to simulate cellular response for a large number of proteins and structures simultaneously. The largest advantage of this approach is that simulations can predict novel behavior of unobserved proteins under a variety of conditions using the conditionality of unobserved proteins on observed models for a given experiment. Currently CellOrganizer can learn conditional generative models from two- or three-dimensional fluorescent imaging data. It can synthesize instances for cell shape, nuclear shape, chromatin texture, vesicular organelle number, size, shape and position, and microtubule distribution. It is important to note that CellOrganizer can be applied to a variety of cell types and has been successfully used for modeling human HeLa and mouse NIH 3T3 cells. Specifically it combines a variety of vesicular proteins such as LAM and TfR with microtubule distribution models within a single cell and nuclear framework model. We show that the learned models are statistically accurate by using a classifier trained on the 3D HeLa cell dataset to determine classification accuracy of each generative pattern within a synthesized cell and nucleus.

Z07 - The Three-Dimensional Architecture of a Bacterial Genome and Its Alteration by Genetic Perturbation

Short Abstract:

Z08 - Comparing Normal Modes Of Protein Structures Using WEBnm@ 2.0

Short Abstract: In the field of molecular modeling, normal modes analysis has been shown to be an effective computational method to study the movements of proteins, especially at the domain level. WEBnm@ (http://apps.cbu.uib.no/webnma/home) is a web-tool which provides access to calculations of these modes on C-alpha atoms of protein structures and various analyses with output in R plots or raw data points. We have improved the efficiency for input processing and have added new functionality that interprets the normal modes calculated. In the Single Analysis section, we have included an interactive visualisation of the lowest six mode vectors, the calculation of the correlation matrix based on all the modes vectors and the overlap analysis with another conformation of the same structure. The newest section, Comparative Analysis, calculates and compares the normal modes of a set of aligned protein structures, which is currently not available in any other tool. It includes analyses such as the fluctuations profile, deformation energies and the conformational overlap analysis. For this part, more than one structure can be submitted along with a FASTA file of their alignment. In addition to the updates, we have also provided a SOAP web-service for a more programmable interface for both of these sections.
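
The overlap between a mode vector and a conformational change is commonly taken to be the absolute cosine between the mode and the displacement between two conformations. The sketch below assumes that standard definition; the abstract does not spell out the formula WEBnm@ uses, and the function name and toy vectors are hypothetical.

```python
import math


def mode_overlap(mode, displacement):
    """Absolute cosine between a normal-mode vector and a displacement vector.

    Both arguments are flat lists of coordinates of equal length
    (for example x, y, z for every C-alpha atom). Returns a value in [0, 1].
    """
    dot = sum(m * d for m, d in zip(mode, displacement))
    norm_mode = math.sqrt(sum(m * m for m in mode))
    norm_disp = math.sqrt(sum(d * d for d in displacement))
    return abs(dot) / (norm_mode * norm_disp)


# Toy vectors: a mode along x compared with displacements along x and y.
print(mode_overlap([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # parallel
print(mode_overlap([1.0, 0.0, 0.0], [0.0, 3.0, 0.0]))  # orthogonal
```

A value near 1 means the mode alone describes the direction of the conformational change well; a value near 0 means it contributes little.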

Z09 - A structural systems biology approach to polypharmacological drug discovery

Short Abstract:

Z10 - Why CDRs are not what you think they are or How to identify the real antigen binding sites

Short Abstract: Identification of the residues within an antibody (Ab) that recognize and bind the antigen (Ag) is at the heart of immunological research. Complementarity Determining Regions (CDRs) are considered a proxy for the sites of Ag recognition and binding and are typically discerned by searching for the regions that are most different, in sequence or in structure, between Abs. Therefore, the most widely used bioinformatic immunological tools are those aimed at identifying CDRs. In this study we compare the residues identified by CDR identification tools to residues that are found experimentally to bind the Ag. We found that >20% of Ag binding residues fall outside the CDRs these methods identify. Thus, we conclude, these widely used methods may not constitute a comprehensive strategy for Ag binding site identification. By analyzing all Ab-Ag complexes, we found that virtually all Ag binding residues fall within regions of structural consensus between Abs, and that these regions are organized along the sequence of the Ab chains. Moreover, we show that residues that fall outside CDRs are at least as important to Ag binding as residues within CDRs. On the other hand, Ag binding residues that fall outside the structural consensus regions but within CDRs show a marginal energetic contribution to Ag binding. The high affinity and specificity of Ab-Ag interactions are fundamental for understanding the biological activity of these molecules. Correct identification of the residues that mediate these interactions is crucial for numerous molecular applications in immunological research as well as in diagnostics and therapy.

Z11 - Copy number aberrations affecting adhesion genes involved in the development of the cerebellar vermis are associated with autism spectrum disorders

Short Abstract: We investigated neurodevelopmental dysfunctions in autism spectrum disorders (ASD) by an integrative analysis including the largest genome-wide studies on associations between copy number aberrations (CNA) and ASD, the BioGPS tissue atlas, the Allen-Brain-Atlas, and in situ hybridization histochemistry data from the developing mouse brain. In contrast to the original association studies, we considered "ASD-candidate-genes", each of which is the only CNA-impaired gene in an ASD case and is therefore presumably causing ASD. For extracting ASD-candidate-genes, we developed an analysis pipeline for rare and small CNAs.
The independently identified ASD-candidate-genes include neurexins (CNTNAP2, NRXN1), catenin (CTNNA3), cadherin (CDH13), and contactins (CNTN5, CNTN6). Gene set enrichment analysis showed that significant biological processes are all related to cell and synaptic adhesion, postsynaptic density, membrane and synapse. In data from the BioGPS, the Cancer-Genome-Anatomy-Project, and the Allen-Brain-Atlas, ASD-candidate-genes have significantly different variations in their expression values in cerebellum compared to other genes; in the Allen-Brain-Atlas, cerebellar vermis lobes I-II, III, VI, and VIII were most significant. In situ hybridization histochemistry data indicate that ASD-candidate-genes are primarily expressed in the developing mouse cerebellum.
Our results, which hint at the vermis as the location of ASD’s pathogenesis, are consistent with pathological studies of ASD cases (90% of the examined brains showed well-defined cerebellar abnormalities). Studies on children with vermal lesions also showed phenotypes similar to autism. The high percentage, 60-80%, of ASD cases showing motoric deficits again hints at the cerebellum. We explain the 4:1 male to female ratio in ASD by the regulatory influence of estrogen on the development of the cerebellum.

Z12 - Next Generation Personalized Medicine Strategies Incorporating Genetic Dynamics and Single Cell Heterogeneity May Lead to Improved Outcomes

Short Abstract: Cancers are heterogeneous and genetically unstable. Current practice of personalized medicine tailors therapy to heterogeneity between cancers of the same organ type. However, it does not yet systematically address heterogeneity at the single cell level within a single individual’s cancer or the dynamic nature of cancer, due to genetic and epigenetic change as well as transient functional changes. We have developed a mathematical model of targeted therapy of cancer incorporating genetic evolutionary dynamics and single cell heterogeneity, and examined simulated clinical outcomes. Analyses of an illustrative case and of a virtual clinical trial of over 3 million evaluable “patients” demonstrate that augmented (and sometimes counterintuitive) next-generation personalized medicine strategies may lead to superior patient outcomes compared to the current personalized medicine approach. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression, generally focusing on the average, static, and current properties of the sample. Next-generation strategies also consider minor sub-clones, dynamics, and predicted future tumor states. Our methods allow systematic study and evaluation of next-generation personalized medicine strategies. These findings may in turn suggest global adjustments and enhancements to translational oncology research paradigms.

Z13 - Prediction of genetic interactions using network topology

Short Abstract: Motivation: Genetic Interaction (GI) detection impacts the understanding of human disease and the ability to design personalised treatment. The mapping of every GI in most organisms (even yeast and worm) is far from completion due to the combinatorial amount of gene deletions and knockdowns needed. Computational techniques to predict new interactions based only on network topology have been developed but never applied to GI networks.
Methods: We applied several neighbourhood-based and network-embedding techniques to yeast and worm GI networks to predict new links. To investigate the true robustness of each approach, we removed links uniformly at random from the networks and analysed how sparsification impacts prediction. We also tested if a biologically meaningful network topology can be modelled by adding links uniformly at random to the aforementioned sparsified networks.
Results: We show that topological prediction of GIs is possible with unexpectedly high precision. Node-neighbourhood-based techniques perform better when the network is dense, while network-embedding approaches present similar performance in both dense and sparse networks, with Minimum Curvilinear Embedding attaining the best result. We also demonstrate that a random network re-densification process cannot re-generate the topology shaped by the evolution of past biological processes.
Conclusion: Computational prediction of GIs is a strong tool to aid high-throughput GI determination and untangle the complex relationships between genotype and phenotype. An innovative technique, inspired by cybernetic principles, should self-adapt its prediction in relation to different sparsity levels.
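
To illustrate what a node-neighbourhood-based predictor does, the sketch below ranks unlinked gene pairs by their number of shared interaction partners (the common-neighbours index). The abstract does not list the specific neighbourhood measures used, so this choice, the function name and the toy network are assumptions.

```python
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every currently unlinked node pair by its shared-neighbour count."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:  # only score candidate (missing) links
            scores[(u, v)] = len(neighbours[u] & neighbours[v])
    return scores


# Toy genetic-interaction network with hypothetical gene names.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4"), ("g4", "g5")]
scores = common_neighbour_scores(edges)
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for pair, score in ranked:
    print(pair, score)
```

Removing edges at random before scoring, then checking whether the removed pairs rank highly, mirrors the sparsification test described in the Methods. It also shows why such scores degrade on sparse networks: fewer links leave fewer shared neighbours to count.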

Z14 - Endosymbiont hunting in the metagenome of Asian citrus psyllid (Diaphorina citri)

Short Abstract: The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus (Las), causal agent of citrus greening. To gain a better understanding of endosymbiont and pathogen ecology and develop improved detection strategies for Las, DNA from D. citri was sequenced to 108X coverage. Initial analyses have focused on Wolbachia, an alpha-proteobacterial primary endosymbiont typically found in the reproductive tissues of ACP and other arthropods. The metagenomic sequences were mined for wACP reads using BLAST and 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using Velvet and MIRA3 assemblers over a range of parameter settings. The resulting wACP contigs were annotated using the RAST pipeline and compared to Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was selected for scaffolding with Minimus2, SSPACE and SOPRA using large insert mate-pair libraries. The wACP scaffolds were compared to wPip using Abacas and Mauve contig mover to orient and order the contigs. The functional annotation of scaffolds was evaluated by comparing it to the wPip genome using RAST. The draft assembly was verified using an OrthoMCL-based comparison to the 4 sequenced Wolbachia genomes. We expanded the scope of endosymbiont characterization beyond wACP using 16S rDNA and partial 23S rDNA analysis as a guide. Results will be presented regarding endosymbionts, their potential interactions and their impact on the disease of citrus greening.

Z15 - An Event-Driven Model Describing Gene Cluster Evolution in Bacteria

Short Abstract: Operons are important bacterial genomic features that contain several genes under common regulatory mechanisms. However, the evolution of operons is poorly understood. Often, operon genes are organized in the same order in which they catalyze reactions in the metabolic pathway. These features not only allow operons to control large blocks of metabolic functions in bacteria, but also give researchers a method to assign function to unknown genes encountered in genome studies.

Several models were suggested to explain operon evolution, but none explain all types of operons. Here we aim to create a universal framework to describe operon evolution, analogous to the cost-function commonly used to describe DNA sequence evolution. Sequence evolution is described by assigning costs to substitutions and indels; here we assign costs to events in operon evolution that include gene gain, gene loss, operon split, duplication, and fusion.

Using this method, we can identify different mechanisms of operon evolution, and identify whether there is any preference to specific evolutionary trajectories. We discuss several types of proposed models, and how our cost-function describes them. Our most surprising finding is that in many cases operons containing non-homologous genes but sharing the same cellular function also share similar evolutionary trajectories.
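
The event-cost idea can be sketched as follows, by analogy with scoring substitutions and indels in sequence alignment. The five event types come from the abstract; the numeric costs, function names and example trajectories are purely hypothetical placeholders.

```python
# Hypothetical costs per evolutionary event; the abstract gives no values.
EVENT_COSTS = {
    "gain": 1.0,
    "loss": 1.0,
    "split": 2.0,
    "duplication": 1.5,
    "fusion": 2.0,
}


def trajectory_cost(events, costs=EVENT_COSTS):
    """Total cost of one proposed sequence of operon-evolution events."""
    return sum(costs[event] for event in events)


def cheapest_trajectory(candidates, costs=EVENT_COSTS):
    """Pick the candidate trajectory with the lowest total cost."""
    return min(candidates, key=lambda events: trajectory_cost(events, costs))


# Two hypothetical explanations for the same observed operon difference.
candidates = [
    ["duplication", "loss"],
    ["split", "fusion", "gain"],
]
best = cheapest_trajectory(candidates)
print(best, trajectory_cost(best))
```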

Z16 - Cistromic analysis reveals novel insights into hepatic CREB regulatory mechanisms

Short Abstract: The regulation of gluconeogenic genes by CREB in the mammalian liver is a central process in the maintenance of blood glucose homeostasis, and is disrupted in major metabolic diseases such as obesity and type 2 diabetes. However, the precise molecular mechanisms by which CREB achieves functional specificity in different physiological contexts remain to be elucidated. In vitro studies have suggested that fasting-induced phosphorylation of CREB promotes binding to a specific subset of CREB recognition sequences, providing a potential mechanism by which a physiological cue induces the expression of some CREB target genes, but not others. To examine this mechanism in vivo, we performed chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-seq) of CREB in mouse liver for both the fed and fasted state. We developed a robust analysis method to leverage multiple biological replicates and quantitatively compare the extent of CREB-DNA interactions genome-wide between physiological conditions. Our results demonstrate that CREB-DNA binding is independent of physiological cues in vivo, and support a model in which CREB remains constitutively bound to DNA in hepatocytes regardless of metabolic state. Further computational analysis of the CREB cistrome revealed additional insights into the potential mechanisms by which CREB differentially activates subsets of target genes. In particular, we identified a novel hybrid of CREB and C/EBP binding sequences with functional and mechanistic implications. Taken together, our results demonstrate the importance of a robust, quantitative approach for ChIP-seq, and provide new biological insights into the potential use of CREB as a therapeutic target in major metabolic diseases.

Z17 - Finding sphingolipid pathway modulating genes from Ontology Fingerprint derived gene networks

Short Abstract: Sphingolipids play numerous roles in eukaryotes, and the basics of sphingolipid metabolism are known; however, little insight exists about additional components potentially involved in regulation and/or metabolite availability. Integrating biomedical literature, ontology, network analyses, and experimental investigation, we inferred novel genes that could modulate the yeast sphingolipid pathway. We first constructed a novel gene network via pairwise comparison of all yeast genes’ ontology fingerprints (a set of ontology terms overrepresented in PubMed abstracts linked to a gene, along with their corresponding enrichment p-values). The network was further refined using a Bayesian hierarchical model to identify candidate genes based on their combined connectivity to the sphingolipid gene subnetwork. Top-ranked candidates were enriched with genes that displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed that the deletion strain of one of these genes, PFA4, which has the highest sensitivity to myriocin among all candidates, was rescued from myriocin by exogenous sphingolipids. These results strongly suggest that the elevated myriocin sensitivity of the pfa4Δ strain is a consequence of an altered sphingolipid pathway. With quantified relevance among genes, our network adds a unique dimension to biological networks and can be applied to many novel analyses. Such applications can expand our existing knowledge of biological pathways related to human disease, and help identify novel drug targets and biomarkers for prognostic and diagnostic purposes.

Z18 - Investigating the prevalence of post-translational modification crosstalk

Short Abstract: Many post-translational modifications (PTMs) have now been identified by proteomics but for many of these PTMs we do not know the functional role that they play. PTM crosstalk is now being identified experimentally, where multiple PTMs act together to regulate cellular processes. Performing a bioinformatics based genome wide assessment in both human and yeast we identify the potential for extensive crosstalk between PTMs and propose that this is a widespread mechanism for the regulation of protein function. We relate the presence of crosstalk to the role of PTMs in protein-protein interactions (PPIs). In five different species we have identified the same correlations or anti-correlations between the number of protein-protein interactions a protein has and the presence of specific PTMs, demonstrating the genome wide role of PTMs in regulating PPIs. A structural analysis supports this observation showing at least three types of PTM that are enriched at interfaces. Additionally in human and yeast we observe that proteins with the potential for crosstalk have a higher number of PPIs than proteins without crosstalk. Our results are important for the understanding of the role of PTMs in regulating cellular processes and can also be used to improve methods for the prediction of PPIs, PTM sites and the analysis of the functional effect of genetic variants.

Z19 - Buchner: A homology independent, computationally efficient method for new enzyme discovery.

Short Abstract: The success of cellulosic biofuels and of artificial photosynthesis relies on the discovery of efficient enzymes and the bioprocesses in which they are involved. The rapid pace in microbial genome sequencing, especially deep metagenome sequencing, has enabled rapid identification of millions of genes without cultivation, thereby providing a large pool for identifying new enzymes with desirable biochemical traits. Homology search with millions of genes against a rapidly growing database, however, becomes increasingly impractical due to its computational burden.
To overcome this limitation, we developed a homology-independent approach that is computationally efficient for new enzyme identification. The resulting software, “Buchner”, achieves not only comparable sensitivity for proteins with linear homology, but also near linear computational performance with the growth in both queries and databases.
We systematically tested Buchner for computing efficiency and accuracy in new enzyme prediction. We selected a subset of 1000 families having at least 1000 members from the Pfam-A database and extracted frequencies of amino acid dimers with distance 0-5 as feature vectors. In collaboration with NERSC, we could successfully learn 1.5 billion elements of training data in 20 minutes with less than 5GB memory per process using 800 processes. The average prediction accuracy among 800 families tested was 97.33%, and the time elapsed to calculate 160K records of the testing set was less than 3 minutes.
In conclusion, Buchner represents a computationally efficient and accurate protein function prediction method scalable to millions of genes.
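
The dimer-frequency features mentioned above can be sketched as follows. This assumes that "distance 0-5" means the number of residues between the two members of a pair (0 for adjacent residues) and that frequencies are normalised per distance. Neither detail is stated in the abstract, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_dimer_features(seq, max_gap=5):
    """Return 20 * 20 * (max_gap + 1) dimer frequencies for one protein.

    Feature order: gap 0 (AA, AC, ..., YY), then gap 1, and so on.
    Pairs containing non-standard residues are counted in the denominator
    but have no feature of their own.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        for i in range(len(seq) - gap - 1):
            counts[(seq[i], seq[i + gap + 1], gap)] += 1
    features = []
    for gap in range(max_gap + 1):
        total = max(len(seq) - gap - 1, 0)  # number of pairs at this gap
        for a, b in product(AMINO_ACIDS, repeat=2):
            features.append(counts[(a, b, gap)] / total if total else 0.0)
    return features


vec = gapped_dimer_features("ACAC")
print(len(vec))  # 2400 features per sequence
```

Because the vector has a fixed length regardless of sequence length, it can be fed to a standard classifier without any alignment step, which is what makes the approach homology-independent.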

Z20 - Peptides, Proteins and Networks: Removing obstacles and getting on the road to systems integration.

Short Abstract: We highlight the utility of peptide de-novo network inference in providing a novel framework for biological interpretation, quality control, inference of protein abundance, assistance in resolving degenerate peptide-protein mappings, and biomarker signature discovery. Using this work as a basis, we then outline a network-guided classification framework to assist with peptide to protein mapping. Our approach involves first separating the bipartite peptide-protein annotation graph into distinct classes, greatly reducing the size of the problem. Then, by examining the trend concordance, topological overlap, and co-expression module assignment (leveraging the de-novo network inference) of peptides for a given protein, we can decide on the most reasonable set of proteins that would produce the observed data. In cases where peptide signal stems from multiple proteins, a mixture model can estimate the proportion of signal deriving from each parent protein. This allows us to discriminate between similar protein family members, improve protein identification and provide higher confidence data for downstream analysis and data integration.
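
One way to read "separating the bipartite peptide-protein annotation graph into distinct classes" is as finding its connected components, so that each ambiguous mapping problem can be solved independently. That reading is an assumption, as are the function name and the toy identifiers in this sketch.

```python
from collections import defaultdict


def connected_classes(peptide_to_proteins):
    """Split the bipartite peptide-protein graph into connected components."""
    adjacency = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            adjacency[("pep", peptide)].add(("prot", protein))
            adjacency[("prot", protein)].add(("pep", peptide))
    seen, classes = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        classes.append(component)
    return classes


# Hypothetical mapping: PEPB is shared (degenerate) between P1 and P2.
mapping = {"PEPA": ["P1"], "PEPB": ["P1", "P2"], "PEPC": ["P3"]}
classes = connected_classes(mapping)
for component in classes:
    print(sorted(component))
```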

Z21 - iASeq: Integrating Multiple ChIP-seq Datasets for Detecting Allele-specific Binding

Short Abstract: In diploid organisms, certain genes can be expressed, methylated or regulated in an allele-specific manner. These allele-specific events (AS) are of high interest for phenotypic diversity and disease susceptibility. Next generation sequencing technologies provide opportunities to study AS globally. However, little is known about the mechanism of AS. For instance, the patterns of allele-specific binding (ASB) across different Transcription Factors (TFs) and histone modifications (HMs) are unclear. Moreover, the limited number of reads on heterozygous SNPs results in a low signal-to-noise ratio when calling AS. Here, we propose a Bayesian hierarchical model to study ASB by jointly analyzing multiple ChIP-seq studies. The model is able to learn the patterns of ASB across studies and make substantial improvement in calling ASB. We apply our model to 57 ChIP-seq studies for GM12878 in ENCODE. Compared to single-study-based statistics, its accuracy of identifying ASB increased more than 40% according to two gold standards. In addition, we reveal correlation patterns across different TFs and HMs for ASB. In principle, the model can also be applied to call AS for multiple RNA-seq and MeDIP-seq studies.
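
To make the low-read-count problem concrete, the sketch below applies an exact two-sided binomial test to the reference and alternate read counts at one heterozygous SNP in a single study. This is a common single-study baseline and is used here only as an assumed example; it is not the hierarchical model proposed in the abstract, and the function name and counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value against a 50:50 allele ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # Probability of a result at least as extreme on the smaller side;
    # doubling it is valid because Binomial(n, 0.5) is symmetric.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(allelic_imbalance_pvalue(9, 1))  # 10 reads, strong imbalance
print(allelic_imbalance_pvalue(3, 0))  # 3 reads, complete imbalance
print(allelic_imbalance_pvalue(5, 5))  # balanced
```

With only three reads, even complete imbalance is not significant at the usual 0.05 level, which is why pooling information across studies can help.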

Z22 - Improvement of mitochondrial cleavage prediction

Short Abstract: In eukaryotic organisms, the vast majority of mitochondrial proteins are coded in the nucleus. Mitochondria therefore import proteins from the cytosol through several known pathways. A large fraction of mitochondrial proteins are synthesized with cleavable N-terminal presequences, but prediction of this cleavage is still challenging. In recent years, large-scale mitochondrial proteomics research has provided large data sets of mitochondrial protein cleavage (Vögtle et al., 2009; Huang et al., 2009) and led to the discovery of a novel intermediate protease, Icp55 (Vögtle et al., 2009).
We curated proteomic datasets and developed MoiraiSP (Mitochondrial matrix targeting Signal Predictor), a novel SVM-based mitochondrial cleavage site predictor trained on proteome-scale datasets. Fortunately, it is known that mitochondrial proteins containing a matrix targeting signal (MTS) are almost always first cleaved by MPP (Mitochondrial Processing Peptidase) and that cleavage sites of this enzyme almost always have an arginine at the -2 position (the “R-2 rule”). Using this kind of prior biological knowledge, we have defined a labeled dataset for three proteases: MPP, Oct1, and Icp55 (for yeast). In fact, our predictor is the first to explicitly utilize the knowledge that plants lack the Oct1 protease.
As measured by the Matthews correlation coefficient (MCC), MoiraiSP shows better performance than the widely used tool TargetP (yeast: 0.794 vs. 0.582; plant: 0.733 vs. 0.659) and enables discrimination among cleavage by the three proteases.
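For readers unfamiliar with the metric used in this comparison, the following is a minimal sketch of how a Matthews correlation coefficient is computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from MoiraiSP or TargetP.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it less sensitive to class imbalance than plain accuracy.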

Z23 - Ion Torrent Variant Caller

Short Abstract: Ion Torrent variant caller is a software suite that uses sequencing data produced by PGM (Personal Genome Machine) to make accurate variant calls. Here we present a robust statistical framework that uses the unique error model of the PGM to accurately predict the likelihood of each variant. The software achieved both very good accuracy and speed.

Z24 - Detecting Fusions and Tandem Duplications in Acute Myeloid Leukemia Transcriptomes Using Barnacle

Short Abstract: Chimeric transcripts are RNA molecules that cannot be explained with traditional models of alternative splicing. These encompass a variety of chimera types, and we focus our study on three: fusions, partial tandem duplications (PTDs), and internal tandem duplications (ITDs). Two processes that can result in chimeric transcripts are genomic rearrangement and trans-splicing, a type of splicing that involves multiple pre-mRNA molecules. In this work, we present results from Barnacle, our analysis tool that detects and characterizes fusion, PTD, and ITD events in assembled RNA-seq data. We ran Barnacle on several deeply sequenced acute myeloid leukemia (AML) samples. Among the events that Barnacle predicts in these libraries are several known to be important in AML: fusions between the PML and RARα genes, PTDs in MLL, and ITDs in FLT3. These results for a well-studied cancer with known and clinically significant chimeric transcripts suggest that Barnacle will be a valuable tool for detecting and characterizing such events in RNA-seq data.

Z25 - Transcriptome analysis reveals extensive alternative splicing coupled with nonsense-mediated mRNA decay in a human cell line

Short Abstract: To broaden our understanding of the surveillance and regulatory functions of nonsense-mediated mRNA decay (NMD) in humans, we globally surveyed the isoforms targeted by this pathway with RNA-Seq analysis on HeLa cells in which NMD had been inhibited. We identified almost 2,500 moderately-to-highly expressed isoforms derived from 1,924 genes as putative NMD targets based on the presence of a premature termination codon and significantly increased abundance when NMD was inhibited. Of the 8,358 expressed genes that were alternatively spliced, 1,900 produced NMD-targeted isoforms (23%). Our results are consistent with previously inferred unproductive isoforms while discovering numerous previously uncharacterized ones. NMD-targeted isoforms were derived from genes involved in many functional categories, but most prevalently RNA splicing and processing. Our results support the hypothesis that alternative splicing coupled with NMD can serve as a post-transcriptional regulation mechanism (AS-NMD, RUST), and affects genes involved with a wide variety of cell functions and pathways.

Z26 - Expression Profiling of Archival Tissues for Long-term Health Studies

Short Abstract: Over 20 million archival tissue samples are stored annually in the United States as formalin-fixed, paraffin-embedded (FFPE) tissue blocks, but RNA degradation during fixation and storage has prevented their use for transcriptional profiling. New and highly sensitive assays for whole-transcriptome microarray analysis of FFPE tissues are now available, but the resulting data include noise and variability for which methods developed for previous expression arrays are inadequate. Standard assessments of RNA do not predict expression data quality from these clinical samples, and these recent technologies have not yet been tested on a large scale. Existing bioinformatic methods for quality control on fresh-frozen tissues have thus not been sufficient to open up such a rich data resource for translational studies. We have thus developed and validated novel methods of assessing, summarizing, and visualizing expression data quality for population-scale whole-genome expression studies using previously inaccessible FFPE tissues. We present them here along with the two largest such studies to date, comprising over 1,100 novel samples, in addition to a meta-analysis with 12 published FFPE microarray studies. This includes the analysis of a population-based study of 1,003 colorectal cancer samples from participants in the Nurses’ Health Study and Health Professionals Follow-up Study, two long-term prospective health studies. Quality control methods were additionally developed and validated in a second study we performed of 168 breast cancer and autopsy samples. Thus, this study establishes validated quality control measures at the levels of study, sample, and probe for archival tissue gene expression profiling.

Z27 - Detection of Somatic Copy Number Alterations Employing Targeted Exome Capture Sequencing

Short Abstract: The research community at large is expending considerable resources to sequence the genomes of tumors and other human diseases. Much of the effort is being focused on comprehensive sequencing of coding regions of the genome using targeted exome capture (i.e., “whole exome sequencing”). The primary goal of targeted exome sequencing is to identify non-synonymous mutations that potentially have functional consequences. Here, we demonstrate that whole exome sequencing data can also be analyzed for the purpose of comprehensively monitoring somatic copy number alterations by benchmarking the technique against conventional array CGH. A series of 17 matched tumor and normal tissues from patients with metastatic castration-resistant prostate cancer was used for this assessment. We show that targeted exome sequencing reliably identifies gene amplifications such as androgen receptor (AR), and deletions such as PTEN, which are known to be common in advanced prostate cancer. Taken together, these data suggest that targeted exome sequencing data can be effectively leveraged for the detection of somatic copy number alterations in cancer.

Z28 - Selecting microbial communities of interest based on marker gene sequence data

Short Abstract: Microbial populations are often surveyed in high throughput by sequencing marker genes such as the 16S rRNA gene, with a subset of communities subsequently selected for further in-depth metagenomic, metabolomic, or other investigations. This selection process is currently often ad hoc. We have thus developed and validated a collection of principled sample selection methodologies targeting specific biological characteristics of interest in microbial communities. These methods include selection enriching for microbial clades of interest, maximizing whole-community diversity, identifying the most distinct samples, subsampling a representative set of total study variation, and predictively discriminating among multiple phenotypes. The accuracy of each selection technique and, importantly, its influence on subsequent microbial community properties were evaluated on a controlled set of simulated sequence data sets. Maximizing diversity, for example, is a common current heuristic which proved prone to undersampling the extremes of overall community configurations. Selecting specific clades of interest not only enriched those clades but also differentially enriched others based on community structure. Sampling methodologies targeting the selection of representative community variation were shown to minimize community enrichment in selected samples, while selection of the most distinct samples exhibited more enrichment, biasing the sampled communities away from the initial survey population. Finally, each selection method was also applied to thousands of 16S sequence data sets from the Human Microbiome Project and from inflammatory bowel disease patients in order to identify concordance with paired metagenomic shotgun assays. These methodologies have been implemented for public use as the microPITA (Picking Interesting Taxonomic Abundances) software package.
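As an illustration of one of the selection criteria named above (maximizing whole-community diversity), the following minimal sketch ranks samples by Shannon diversity and keeps the top k. The choice of the Shannon index, the function names, and the toy counts are assumptions made for this example; the abstract does not state which diversity measure microPITA uses.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def select_most_diverse(samples, k):
    """Return the names of the k samples with the highest Shannon index.

    `samples` maps a sample name to a list of taxon counts.
    """
    ranked = sorted(samples, key=lambda name: shannon_index(samples[name]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical taxon counts for three samples.
    toy = {"a": [10, 10, 10, 10], "b": [40, 0, 0, 0], "c": [20, 10, 5, 5]}
    print(select_most_diverse(toy, 2))
```

A ranking of this kind favors even, species-rich samples, which is consistent with the abstract's observation that the heuristic can undersample the extremes of community configurations.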

Z29 - Identify druggable targets using microenvironments matching

Short Abstract: The ability of a target protein to bind small, drug-like molecules with high affinity is referred to as druggability. It is an important criterion in the target selection phase of drug discovery. Known small-molecule drugs occupy a limited area of chemical space and their binding sites share common features. We collected drug-binding sites with known 3D structures, representing the optimal microenvironments for drug binding. Our method, DrugFEATURE, evaluates the similarity between two sites by matching the key microenvironments that confer molecular recognition. Given a target, we compare it to the representative set of drug-binding sites and identify similar subsites (clusters of key microenvironments). These subsites confer molecular recognition of chemical groups of drugs. Our hypothesis is that the frequency of similar subsites is associated with the druggability of the given target. We therefore define the Predicted Druggability Index (PDI) to link a protein target to drug chemical space. We benchmarked PDI using the Abbott dataset, for which extensive experimental assessment of druggability has been performed. PDI correlates with the hit rate from NMR-based screening and is associated with the capacity for high-affinity drug binding. We also benchmarked PDI using known druggable and difficult sites and found it to be a useful indicator of druggability. One of our ongoing efforts focuses on evaluating the druggability of all functional domains in the Protein Data Bank. We aim to discover novel druggable domains and families for which no binding of drug-like molecules has yet been found. Our predictions should provide useful guidance for drug discovery.

Z30 - Simultaneous host-parasite transcriptomes provide insight into malarial host-parasite interactomes

Short Abstract: Molecular interactions are key to the ability of a parasite to enter and persist in its host. However, our understanding of the genes and proteins involved in these interactions is only partial, even in the best understood systems. We have extended the popular concept of using correlated gene expression profiles to identify molecular interactions within one species to the interspecific (host-parasite) case. We show for the first time that genes in different species with correlated expression are more likely to encode proteins which interact or are otherwise involved in host-parasite interaction. We go on to examine predicted host-parasite interactions between the malaria parasite and both its mammalian host and insect vector.
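The idea of cross-species correlated expression can be sketched as follows: given host and parasite expression values measured on the same set of samples, compute the Pearson correlation for every host-parasite gene pair and keep the strongly correlated pairs. The function names, the gene identifiers, and the 0.9 cutoff are assumptions for illustration; the abstract does not specify the correlation measure or threshold used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(vx * vy)


def correlated_pairs(host, parasite, threshold=0.9):
    """List (host_gene, parasite_gene, r) with |r| >= threshold.

    `host` and `parasite` map gene names to expression values measured
    on the same samples in the same order.
    """
    pairs = []
    for h_gene, h_values in host.items():
        for p_gene, p_values in parasite.items():
            r = pearson(h_values, p_values)
            if abs(r) >= threshold:
                pairs.append((h_gene, p_gene, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical profiles over four paired samples.
    host = {"hostA": [1, 2, 3, 4]}
    parasite = {"parX": [2, 4, 6, 8], "parY": [1, -1, 1, -1]}
    print(correlated_pairs(host, parasite))
```

In practice the candidate pairs would still need multiple-testing control and biological validation before being treated as predicted interactions.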

Z31 - Matrix geometry determines optimal cancer migration strategy and modulates response to therapeutic agents

Short Abstract: Cell motility is required for many biological processes, including cancer metastasis. The molecular requirements for migration and morphology of migrating cells can vary considerably depending on matrix geometry; therefore, predicting the optimal migration strategy or the effect of experimental perturbation is difficult. Here, we present a model of single cell motility that encompasses actin polymerisation based protrusions, cell cortex asymmetry, membrane blebbing, local heterogeneity, cell-extracellular matrix adhesion, and varying extracellular matrix geometries. This is used to explore the theoretical requirements for rapid migration in different matrix geometries. Confined matrix geometries cause profound changes in the relationship of adhesion and contractility to cell velocity; indeed cell-matrix adhesion is dispensable for migration in discontinuous confined environments. The utility of the model is shown by predicting the effect of different drugs and integrin depletion in vivo based only on simple in vitro measurements. Multiphoton intravital imaging of melanoma is used to verify bleb-driven migration of both melanoma and endothelial cells at tumour margins, and the predicted response to drugs.

Z33 - Regulatory Network Structure as the Dominant Determinant of Transcription Factor Evolutionary Rate in Yeast

Short Abstract: The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. However, since trans-level variation is known to account for much of the gene expression variation between strains, studying the evolution of trans-factors is crucial to understanding regulatory network evolution. Here, we systematically assess the different genomic and network-level determinants of transcription factor (TF) evolutionary rate in yeast and how they compare to those of generic proteins. We develop a novel method to demonstrate that transcription factors possess significantly distinct trends relating evolutionary rate to various genomic features, such as mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We then go on to show that the strongest predictor of transcription factor evolutionary rate is the median evolutionary rate of its target genes, followed by the fraction of target genes which are species-specific. After decomposing the regulatory network into positive and negative edges, we found that this effect is limited to activating regulatory relationships. This work is the first to establish the modularity of TF-target protein evolution and highlights key evolutionary differences between positive and negative regulation systems. We have also demonstrated that systems-level properties can leave evolutionary traces of comparable effect size to physical features such as interaction degree and expression level and that TF evolution in particular is best understood through a regulatory network-level perspective.

Z34 - Fast and accurate metagenomic profiling of microbial community composition using unique clade-specific marker genes

Short Abstract: Identifying which organisms populate a microbial community and in what proportions is crucial for characterizing human-associated microbiomes. Shotgun sequencing allows biological function and phylogenetic composition to be assayed simultaneously, but existing taxonomic profiling methods are impractical for the scope of current datasets. We propose MetaPhlAn, a novel approach incorporating clade-specific marker genes identified computationally using 2,887 reference genomes. The resulting catalog of 400,000 genes permits unambiguous taxonomic assignments from metagenomic data more accurately and >50 times faster than current approaches. The method was evaluated on terabases of short reads in addition to ten synthetic metagenomes, achieving correlations with true organismal relative abundances over 0.99 for high-complexity and log-normally distributed communities. Applied to the 691 metagenomes of the Human Microbiome Project, MetaPhlAn profiled the microbial species populating all 15 assayed body sites together with their abundance pattern signatures. Specifically, on 51 vaginal microbiomes, MetaPhlAn agreed closely with 16S-based results and further identified the Lactobacillus species forming five distinct microbiome types. An analysis of marine ecosystems confirmed detection of archaeal organisms and MetaPhlAn's applicability and accuracy even in communities with limited numbers of sequenced reference genomes. Finally, MetaPhlAn allowed us to perform a meta-analysis integrating 263 samples from the HMP and MetaHIT projects, providing the largest metagenomic community profiling to date of the human gut microbiota. This dataset highlights a range of dominant Bacteroides species among these American and European cohorts, and it suggests complexity at the species level beyond that captured by the recently proposed gut enterotypes.
MetaPhlAn is available at http://huttenhower.sph.harvard.edu/metaphlan.
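To make the marker-gene idea concrete, here is a minimal sketch of turning per-clade marker read counts into relative abundances by normalizing for total marker length. This is a simplified illustration under stated assumptions (the function name, the inputs, and the length normalization are hypothetical); it is not MetaPhlAn's actual implementation.

```python
def relative_abundances(marker_hits, marker_lengths):
    """Estimate clade relative abundances from marker-gene read counts.

    marker_hits:    clade -> number of reads mapped to that clade's markers
    marker_lengths: clade -> total length (bp) of that clade's markers
    Each clade's reads are divided by its marker length, then the
    per-clade values are normalized to sum to 1.
    """
    coverage = {}
    for clade, hits in marker_hits.items():
        length = marker_lengths[clade]
        if length <= 0:
            raise ValueError(f"marker length for {clade!r} must be positive")
        coverage[clade] = hits / length
    total = sum(coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage}
    return {clade: value / total for clade, value in coverage.items()}


if __name__ == "__main__":
    # Hypothetical counts: equal reads, but clade B has three times
    # as much marker sequence, so its estimated abundance is lower.
    print(relative_abundances({"A": 100, "B": 100}, {"A": 1000, "B": 3000}))
```

Because each marker is specific to a single clade, reads hitting it can be assigned without ambiguity, which is what allows a simple normalization of this kind.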

Z35 - The Landscape of Somatic Structural Variations in Human Cancer Genomes

Short Abstract: The cancer genome is known to harbor various somatic rearrangements. However, the full spectrum of these alterations and their underlying mechanisms remain poorly understood. Here, we performed a comprehensive identification of somatic structural variations (SVs) and the mechanisms generating them, using high-coverage whole-genome sequencing data of tumor and matched normal samples from 48 individuals across five tumor types (glioblastoma, ovarian, colon, prostate and multiple myeloma). By analyzing a total of 160 billion Illumina short reads, we identified 4,555 somatic SVs with a true positive rate of 91%. The patterns of rearrangements are highly variable across tumor types and among individuals, with translocations (46%) being the most abundant, followed by deletions (36%) and tandem duplications (18%). Our detailed reconstruction of the events responsible for CDKN2 loss and for EGFR and CDK4 gain in glioblastoma revealed much more complex sets of events than previously assumed, sometimes involving dozens of fragments. Our analysis of the breakpoints at base pair resolution shows that focal CDKN2 loss is often generated by non-homologous end joining but could also be generated by microhomology-mediated end joining or template switching mechanisms. Focal amplifications are sometimes generated by complex tandem duplications via a template switching mechanism. This study provides new insights into cancer genome rearrangements and their contribution to cancer progression.

Z36 - Computing with Chromatin Modification

Short Abstract: In living cells, DNA is wrapped around histone octamers to make the nucleosomes that comprise chromatin. The histones and DNA can be modified with chemical groups that are added, removed and recognized by multi-functional molecular complexes. Here we present a computational model, in which chromatin modifications are information units that can be written onto a one-dimensional chromatin memory. Chromatin-modifying complexes are modeled as read-write rules that operate on several adjacent nucleosomes. We illustrate the use of this “chromatin computer” by writing programs to solve problems that cannot be solved with finite state automata or logic circuits. We show the execution of these programs on a chromatin computer simulator, and provide animated snapshots of the intermediate states of the nucleosome memory. We model additional features of biological chromatin, resulting in more efficient computation. This formalism is useful both analytically, to model chromatin biology, and theoretically, as a programming paradigm.
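The read-write rule formalism can be illustrated with a small simulator sketch. Here the nucleosome memory is a list of mark symbols and each rule rewrites a short window of adjacent nucleosomes. The rule encoding, the symbols "M", "U" and "B", and the leftmost-match scheduling are assumptions for this example; the abstract does not describe the authors' rule syntax or simulator.

```python
def step(memory, rules):
    """Apply the first matching rule at the leftmost matching position.

    memory: list of mark symbols, one per nucleosome
    rules:  list of (pattern, replacement) tuples of equal length
    Returns (new_memory, changed).
    """
    for i in range(len(memory)):
        for pattern, replacement in rules:
            width = len(pattern)
            if tuple(memory[i:i + width]) == tuple(pattern):
                new_memory = memory[:i] + list(replacement) + memory[i + width:]
                return new_memory, True
    return memory, False


def run(memory, rules, max_steps=1000):
    """Rewrite until no rule applies or max_steps is reached."""
    state = list(memory)
    for _ in range(max_steps):
        state, changed = step(state, rules)
        if not changed:
            break
    return state


if __name__ == "__main__":
    # Hypothetical rule: a modified nucleosome "M" spreads its mark to an
    # unmodified right-hand neighbour "U"; "B" acts as a boundary because
    # no rule matches it.
    spread = [(("M", "U"), ("M", "M"))]
    print(run(["M", "U", "B", "U"], spread))
```

Even this toy rule set shows how local rewriting of adjacent nucleosomes produces behaviour such as mark spreading that halts at a boundary element.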

Z37 - CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction

Short Abstract: The Critical Assessment of Genome Interpretation (CAGI, pronounced 'kā-jē) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. In this assessment, participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors. The CAGI experiment culminates with a community workshop and publications to disseminate results, assess our collective ability to make accurate and meaningful phenotypic predictions, and better understand progress in the field. A long-term goal for CAGI is to improve the accuracy of phenotype and disease predictions in clinical settings.
This presentation will focus on the practical implications of CAGI 2011 results on a diversity of challenges. The presentation will summarize the state-of-the-art in identifying the impact of variants in a metabolic enzyme and in an oncogene, and thus the appropriate use of such methods in basic and clinical research. CAGI has revealed the relative strengths of different prediction approaches, and the best will be described.
CAGI also explored genome-scale data, showing unexpected successes in predicting Crohn’s disease from exomes, as well as disappointing failures in using genome and transcriptome data to distinguish discordant monozygotic twins with asthma. Predictors had promising complementary approaches in predicting distinct response of breast cancer cell lines to a panel of drugs. Predictors also made measurable progress in predicting a diversity of phenotypes present in the personal genome project participants.
Current information including additional challenges is available at the CAGI website at http://genomeinterpretation.org.

Z38 - Data-driven Prediction of Drug Effects and Interactions

Short Abstract: Adverse drug events remain a leading cause of morbidity and mortality around the world and many are not detected during clinical trials. Fortunately, regulatory agencies and other institutions maintain large collections of adverse event reports, and these databases present an opportunity to study drug effects from patient population data. However, confounding factors such as concomitant medications, patient demographics, and reasons for prescribing a drug often are uncharacterized in spontaneous reporting systems (for example, patient medical histories), and these omissions can limit the use of quantitative signal detection methods used in the analysis of such data. Here, we present an adaptive data-driven approach for correcting these factors in cases for which the covariates are unknown or unmeasured and combine this approach with existing methods to improve analyses of drug effects using three test data sets. We also present a comprehensive database of drug effects (OFFSIDES) and a database of drug-drug interaction side effects (TWOSIDES). To demonstrate the biological use of these new resources, we used them to identify drug targets, predict drug indications, and discover drug class interactions. We then corroborated 47 (P < 0.0001) of the drug class interactions using an independent analysis of electronic medical records. Patients taking combined treatment of selective serotonin reuptake inhibitors and thiazides had a significantly increased incidence of prolonged QT. We conclude that confounding effects from covariates in observational clinical data can be controlled in data analyses and thus improve the detection and prediction of adverse drug effects and interactions.
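As background for the "quantitative signal detection methods" mentioned above, the following sketch computes one widely used disproportionality statistic, the proportional reporting ratio (PRR), from a 2x2 table of spontaneous reports. It is offered only as an example of this family of methods; the abstract does not say which statistics the authors used, and the function name and counts are hypothetical.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports without the drug, without the event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("both drug and comparison groups need reports")
    if c == 0:
        raise ValueError("PRR is undefined when the comparison group has no events")
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(proportional_reporting_ratio(a=20, b=80, c=10, d=190))
```

A statistic like this compares raw reporting proportions and does not adjust for co-medication or indication, which is the kind of confounding the abstract sets out to correct.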

Z39 - Microbial Phylogenomics

Short Abstract: Foodborne pathogens such as Escherichia coli and Salmonella strains are of serious concern to public health. An estimated 2 to 4 million cases of salmonellosis occur in the U.S. annually, and the incidence is rising globally. Many methods have been developed for the detection and analysis of foodborne pathogens. Conventional culture methods require many days for presumptive results; several rapid methods are available but still require days to confirm results. With these methods, new diagnostics for closely related bacterial strains are slow to develop. There is also an increasing need for a greater understanding of the evolution and biology of pathogens, which are becoming more and more of a concern to public health.

However, advances both in rapid next generation DNA sequencing technology and in phylogenetic analytical methods for large datasets across many taxa have made it possible to compare whole genomes of bacteria, which is important to both clinical and regulatory science agencies. These advances enable a better and much more detailed understanding of the biology and evolution of bacterial strains, and of how they are increasingly becoming a danger to public health.

The ability to construct whole phylogenies of complete bacterial genomes, through advances in rapid genome annotation, is becoming more apparent. The development of novel gene clustering and analytical pipelines, coupled with the rapid phylogenetic analysis presented in this work, will enable a better understanding of bacterial genomics and help protect public health. This work presents recent results and methods for constructing phylogenies from entire genomes of bacteria, namely Bacterial Phylogenomics.

Z40 - Fragmentation-free LC-MS can identify hundreds of proteins

Short Abstract:

Z41 - A Statistical Method to Estimate DNA Copy Number from Illumina High-density Methylation Arrays

Short Abstract: For the first time, we report that Illumina high-density methylation arrays can also be used to estimate DNA copy numbers. To illustrate this statistical method, we used the Illumina HM450K methylation array to characterize the copy number aberrations of the HT-29 colon cancer cell line. The result was validated against the gold standard, the Affymetrix SNP array. We use the CAMDA 2010 glioblastoma data set to demonstrate that our novel statistical method can potentially lower the cost and reduce the processing time of large-scale profiling studies where both DNA copy number and methylation status are of interest. Our new method, named methylCNV, is implemented in the Lumi package of Bioconductor.

Z42 - Novel meta-analysis method for high throughput data allows repositioning of two FDA-approved drugs for treating organ transplant patients

Short Abstract: We hypothesized that there is a common mechanism of rejection across all organ transplants. We applied a novel meta-analysis method to 8 gene expression data sets from 4 different transplanted organs, consisting of 236 human biopsy samples. By combining evidence from multiple data sets, our method identifies differentially expressed genes despite a number of confounding factors, including different protocols for sample collection and hybridization, microarray platforms, and organ-specific expression profiles. We found 102 statistically significantly over-expressed genes across all data sets. Analysis of these genes using BioGPS showed that our method is not affected by tissue-specific expression. Using leave-one-organ-out analysis, we narrowed these down to a set of 12 genes, termed the core immune response module (CIRM). After validating over-expression of the CIRM in an independent cohort of 109 human renal transplant biopsies using microarrays and Q-PCR, we hypothesized that pharmacological modulation of a gene in the CIRM will reduce the number of graft-infiltrating cells. Notably, six of the 12 genes in the CIRM are targets of 7 FDA-approved drugs, of which two are currently used as immunosuppressive drugs in transplantation. In order to test this hypothesis, we treated cardiac transplants in completely HLA-mismatched mice using dasatinib and atorvastatin, which target LCK and CXCL10, respectively, from the CIRM gene set, and compared them against cyclosporine. Both drugs significantly reduced the number of infiltrating cells in the graft, by as much as cyclosporine. Dasatinib and atorvastatin also significantly reduced macrophages, dendritic cells and natural killer cells, which cyclosporine did not.

Z44 - Network-based inference of signature genes in Alzheimer’s disease progression as an attractor of a disease state

Short Abstract: Alzheimer's disease (AD) is the most common cause of dementia among the elderly. With the advent of the aging society, 42.3 million people are estimated to suffer from dementia worldwide by 2020. The cost of caring for the increasing number of AD patients will rise dramatically. To address this issue, the development of biomarkers for diagnosing early progression of AD is urgently needed to achieve preemptive medicine for AD. Here, we inferred the gene regulatory network of AD progression based on AD microarray data, and inferred the master regulators in the network. We collected hippocampal microarray data from 9 control subjects and 22 AD subjects (7 incipient, 8 moderate, and 7 severe AD subjects). According to the AD progression stage, we identified the stage-specific genes of AD progression. We then employed the ARACNe algorithm to infer the gene regulatory network of AD pathogenesis. The master regulators (transcription factors) in the network were inferred. For the master regulators and the genes they regulate, we estimated an expression potential on a gene expression state plane for the 31 gene expression states of the 9 control subjects and 22 AD subjects. We then calculated gradients on the expression potential field to show “disease progression trajectories” among gene expression states, that is, disease states. By examining disease progression trajectories on the expression potential field defined by master regulators and their regulators, we inferred signature genes of Alzheimer’s disease progression. Our network-based approach is expected to provide more precise candidate genes for biomarkers for diagnosing AD progression.

Z45 - Network analysis of the progression of Alzheimer’s disease by gene expression data

Short Abstract: In a protein interaction network (PIN), hub proteins are classified into innermodular proteins and intermodular proteins. Innermodular hubs always co-express with peripheral proteins within the same modules. Intermodular hubs do not always co-express with the same proteins, and modulate different modules. Taylor et al. showed that some innermodular hubs in good-outcome breast cancer patients become intermodular hubs in poor-outcome patients, dynamically affecting modules.
We hypothesized that constantly expressed hub proteins that belong to different modules across the progression stages of a disease have the potential to contribute significantly to disease progression. Finding these hub proteins is expected to provide clues for both the elucidation of disease mechanisms and the development of biomarkers.
In this study, we sought the influential hub proteins dynamically altering modules in Alzheimer's disease (AD).
We identified expressed genes in each brain region and in each pathological progression stage (Braak stage) using a gene expression dataset covering six brain regions across six Braak stages of AD postmortem brain. We then constructed spatio-temporal expressed PINs by mapping expressed genes onto the human PIN. The spatio-temporal expressed PINs were divided into modules. We found 118 hub proteins that belonged to different modules along AD progression. Of these hub proteins, the SNCAIP gene was expressed in the most brain regions. Interestingly, the SNCAIP gene is known to be associated with Parkinson's disease (a neurodegenerative disease, like AD). We suggest that SNCAIP is a hub gene that perturbs interactions along with AD progression.

Z46 - Optimizing functional genomics screening strategies for drug target prediction

Short Abstract: Developing new drugs is a lengthy and expensive process. In comparison, many compounds have been identified from natural sources but their activity on living cells has not been characterized. Recent studies have proven the utility of chemical genomics based on yeast functional genomics tools for the discovery of compounds’ modes of action. Specifically, the chemical genetic interactions of a particular compound across a large deletion strain collection should mimic the genetic interactions of the corresponding target. One limitation of this approach, however, is that it requires a relatively high volume of compound given the size of the deletion collection to be queried. As a solution to this problem, we propose a method to identify a small subset of the deletion collection that is the most informative for discovering compounds’ modes of action. We have applied this method to the yeast genetic interaction dataset and identified a diagnostic strain set comprising around 5% of the yeast non-essential deletion collection. We show that even with many fewer genes, the diagnostic set performs comparably to complete chemical-genetic profiles. We also demonstrate that our method is much better than alternative strategies based on selection of random genes or interaction hubs. Large-scale chemical genomics screens of natural compound libraries based on this diagnostic set of genes are currently in progress.

Z47 - A draft map of cis-regulatory sequences in the mouse genome

Short Abstract: As the most widely used mammalian model organism, mice play a critical role in biomedical research for mechanistic study of human development and diseases. Today, functional sequences in the mouse genome are still poorly annotated, lagging significantly behind that of several other model organisms. We report here a map of nearly 300,000 murine cis-regulatory sequences, representing active promoters, enhancers and CTCF binding sites experimentally determined from a diverse set of 19 tissues and cell types. This map provides functional annotation to nearly 11% of the mouse genome, and over 70% of conserved, non-coding sequences. We define tissue-specific enhancers and identify potential transcription factors regulating gene expression in each tissue or cell type. Finally, we demonstrate that cis-regulatory sequences are organized into domains of coordinately regulated enhancers and promoters. Our results provide a resource for the annotation of functional elements in the mammalian genome, and study of regulatory mechanisms for tissue-specific gene expression.
