20th Annual International Conference on
Intelligent Systems for Molecular Biology


Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find your poster acceptance email, which contains a confirmation link; click on it and follow the instructions.

If you need further assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category G - 'Genetic Variation Analysis'
G01 - Co-expression analysis reveals differential immune response according to p53 status in breast cancer
Short Abstract: Studies on immune response in breast cancer have identified signatures that might predict therapeutic sensitivity, therapeutic efficacy and prognosis. This underscores the need for robust characterization of immune system status and its relation to known tumor driver mechanisms. Apart from its key roles in the cell cycle, DNA replication and repair, p53 also participates in immune regulation. We therefore characterized the immune response and other related processes according to p53 status.

Methods: A cross-platform compilation of 362 expression profiles of breast cancer samples with known p53 mutation status was prepared from previously published datasets. Class-specific co-expression modules were inferred based on Spearman correlation and permutation-based significance. In each group, gene pairs showing class-specific significant correlation were carried forward to Gene Ontology enrichment analysis. Module gene promoters were analyzed for de novo motif enrichment.
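A minimal sketch of the correlation step, assuming that "permutation-based significance" means shuffling one gene's values across samples and recomputing Spearman's rho. The function names and the class-specificity rule (significant in one class but not in the other) are illustrative assumptions, not the authors' code.

```python
import random


def rank(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; treat as no correlation
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(rank(x), rank(y))


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    rx, ry = rank(x), rank(y)
    observed = abs(pearson(rx, ry))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ry)  # break the pairing between the two genes
        if abs(pearson(rx, ry)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def class_specific(pair_a, pair_b, alpha=0.01, n_perm=1000):
    """True if the gene pair is significant in class A but not in class B."""
    return (permutation_pvalue(*pair_a, n_perm=n_perm) < alpha
            and permutation_pvalue(*pair_b, n_perm=n_perm) >= alpha)
```

In a real analysis the resulting p-values would also need multiple-testing correction across all gene pairs.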

Results: The mutant p53 group of breast cancer samples showed significant enrichment of genes involved in various immune response processes, including the T-cell receptor signaling pathway, T-cell development, activation and proliferation, NK cell regulation and interferon gamma production. The wild-type p53 group showed enrichment of cell adhesion, collagen, cell cycle and response to steroid hormone stimulus. A module of the mutant p53 group showed enrichment of a motif with 85% similarity to V$AML_Q6 (AML).

Conclusion: We identified consistent enrichment of genes involved in immune response processes in mutant p53 breast cancer. Our findings extend previous studies that reported immune metagenes based on differential expression methods.
G02 - Identification of Activated Cryptic 5’ Splice Sites using Structure Profiles and Odds Measure
Short Abstract: This poster is based on a proceedings submission.
The activation of cryptic 5’ splice sites (5’SSs) is often related to human hereditary diseases. DNA-based mutation screening strategies are commonly used to recognize cryptic 5’SSs, because features of the local DNA sequence can influence the choice of cryptic 5’SSs.
To improve the identification of cryptic 5’SSs, we developed a structure-based method, named SPO (structure profiles and odds measure), which combines two parameters, a structural feature derived from the hydroxyl radical cleavage pattern and an odds measure, to assess the likelihood that a cryptic 5’SS is activated in competition with its paired authentic 5’SS. The SPO algorithm achieves higher prediction accuracy than current tools for identifying activated cryptic 5’SSs, including MaxEnt, MDD, MM, WMM, S&S, Ri and ΔG. In addition, the ΔSPO scores predicted by the SPO algorithm correlate more strongly with the strength of cryptic 5’SS activation than the scores from the other seven methods.
In conclusion, the SPO algorithm improves the identification of cryptic 5’SSs, can be applied in designing mutagenesis experiments for various splicing events, and may help investigate the relationship between structural variants and human hereditary diseases.
G03 - Large-scale analysis of adult human brain gene expression patterns
Short Abstract: We present results of analyzing a recently released detailed atlas of gene expression in the human brain. Our motivation is the complexity of the transcriptome and cellular content of the brain (over 200 cell types, which collectively express over 80% of the genes in the genome). Detailed genome-wide data sets can provide insight into the evolutionary, developmental and functional processes governing the observed structures. Previously we showed that in the mouse brain, the dominant expression pattern across the brain is characterized by variance in the expression of known cell type markers, suggesting an inverse relationship between local glial and neuronal content (French et al. (2011) Frontiers in Neuroinformatics 5:12). In the current work we analyzed two high-resolution human gene expression profiles provided by the Allen Institute for Brain Science, comprising RNA levels of genes across hundreds of brain regions. We applied principal component analysis and other multivariate approaches to these data and report that the patterns found in the mouse are largely conserved in human. We provide evidence that this pattern can also be found within individual brain regions across donors. An intriguing aspect of the data is a small number of genes that show strongly discordant patterns between mouse and human. Our findings may be used to identify functionally important differences between the mouse and human brain, and to identify new cell-type-specific genes associated with the patterns.
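The principal component step can be sketched as follows, assuming a matrix with brain regions as rows (samples) and genes as columns. The function name `pca` is hypothetical, and this is a generic SVD-based PCA rather than the authors' exact procedure.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the column-centred matrix X (regions x genes).

    Returns region scores, gene loadings and the fraction of total
    variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # centre each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance ratio per component
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T               # genes x components
    return scores, loadings, explained[:n_components]
```

On real data, `np.argsort(-np.abs(loadings[:, 0]))[:k]` would list the k genes that contribute most to the first component, which could then be compared against known glial and neuronal markers.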
G04 - Prediction of operons in microbial genomes by integrating diverse information sources
Short Abstract: The number of published sequenced genomes has been growing in recent years, and at present about 2000 microbial genomes are fully sequenced. The next step after sequencing is to predict genes and their functions from the sequence. Here we used Gene Cluster information to generate protein interaction networks. We hypothesize that, in bacteria, protein interaction networks derived from Gene Cluster information and integrated with other sources of information could likewise improve methods for predicting operons. Previous studies have shown that genes within the same operon tend to be separated by short intergenic distances in bacteria. However, predictions based on intergenic distance alone yield many false positives, so other sources of information must be added to bring specificity to an acceptable level. Progress has been made toward a more generalized method for operon prediction based on a variety of diverse information sources, including codon usage statistics and the identification of promoter and terminator sequences. However, very little has been done to examine the relative contribution of these features, individually and in combination, to operon prediction in genomes other than the genome(s) on which a prediction program is trained. We validate predicted Gene Cluster pairs against known operons from E. coli K12 and B. subtilis, because only these two organisms have a substantial number of experimentally verified operons. We examine whether features based on intergenic distance and protein function are informative for both genomes. Moreover, we examine how combining interactions predicted by the other methods can improve Gene Cluster operon prediction.
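To make the intergenic-distance feature concrete, here is a minimal sketch of a distance-based log-likelihood score for adjacent same-strand gene pairs, trained on pairs known to lie within an operon versus across an operon boundary. This is the common distance-only baseline the abstract mentions, not the Gene Cluster method. The bin width, pseudocount and function names are assumptions.

```python
import math
from collections import Counter


def distance_bin(d, width=20):
    # Floor division puts overlapping genes (negative distances) in negative bins.
    return d // width


def train_distance_llr(operon_dists, boundary_dists, width=20, pseudo=1.0):
    """Return f(d) = log P(d | same operon) / P(d | operon boundary).

    operon_dists: intergenic distances (bp) of adjacent same-strand gene
    pairs known to be in the same operon; boundary_dists: distances of
    pairs known to lie on either side of an operon boundary.
    """
    same = Counter(distance_bin(d, width) for d in operon_dists)
    diff = Counter(distance_bin(d, width) for d in boundary_dists)
    n_bins = len(set(same) | set(diff)) + 1          # +1 slot for unseen bins
    n_same = len(operon_dists) + pseudo * n_bins     # Laplace-smoothed totals
    n_diff = len(boundary_dists) + pseudo * n_bins

    def llr(d):
        b = distance_bin(d, width)
        p_same = (same.get(b, 0) + pseudo) / n_same
        p_diff = (diff.get(b, 0) + pseudo) / n_diff
        return math.log(p_same / p_diff)

    return llr
```

A positive score favours "same operon". Additional features such as protein-interaction evidence could be added as further log-likelihood terms, which is the kind of integration the abstract proposes.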
G05 - Tamandua: A Pipeline for Annotation and Statistical Analysis of Differential Gene Expression in Amazonian Species
Short Abstract: Statistical analysis of gene expression data is not a trivial task for biologists. In addition, efficient functional annotation of DNA sequences is a major requirement for interpreting experimental results from Serial Analysis of Gene Expression (SAGE-SOLiD). To facilitate these studies, we have developed Tamandua, a user-friendly, R-based annotation and statistical analysis pipeline. For sequence cleaning, Tamandua uses SAET, a program from the SOLiD software packages. For alignments, Tamandua is configured to run local BLAST. A Java interface lets users choose the model organism against which their data are aligned and annotated, according to the scope of the study. To analyze differential gene expression, Tamandua uses Bioconductor packages for the R language. A dispersion plot is generated to show the genes whose expression increases or decreases relative to the control. The genes are then annotated based on NCBI Reference Sequences (RefSeq). Graphics and tables are generated automatically to show all the results. We used SAGE-SOLiD sequencing to generate a dataset derived from Escherichia coli and the tambaqui fish (Colossoma macropomum), and processed the data with Tamandua. This analysis of SAGE-SOLiD data demonstrates that Tamandua is well suited for handling data from both model and non-model organisms.
G06 - The impact of natural genetic variation on gene expression dynamics
Short Abstract: DNA sequence variation causes changes in gene expression, which in turn have profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes such as cancer. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of sequence variation on static expression levels is relatively well studied, much less is known about its influence on the dynamics of expression changes.
Here, we show that defining and detecting eQTL affecting expression dynamics is non-trivial. We propose to distinguish “static”, “conditional” and “dynamic” eQTL and suggest new strategies for mapping these eQTL classes.
By using murine mRNA expression data from four stages of hematopoiesis, we demonstrate that eQTL from these three classes yield associations with different biochemical meanings. Dynamic and conditional eQTL form different sets, although both are derived from the same expression data. Among a total of 3,953 eQTL, we find 2,791 static eQTL, while 643 eQTL are exclusively active in one cell state and 70 eQTL regulate expression dynamics during cell-state transitions. This reveals substantial effects of individual genetic variation on cell-state-specific expression regulation. Great care is therefore needed when linking individual expression variation in one cell type to physiological phenotypes in other cell types or even other tissues.
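The three eQTL classes can be illustrated with a minimal sketch for one SNP-gene pair measured in several ordered cell states. It assumes that "static" means associated in every state, "conditional" means associated in exactly one state, and "dynamic" means the genotype is associated with the change in expression between consecutive states. A fixed |Pearson r| cut-off stands in for a proper significance test. All names and thresholds are hypothetical.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def classify_eqtl(genotypes, expr_by_state, r_cut=0.6):
    """Label one SNP-gene pair as static, conditional and/or dynamic.

    genotypes: allele dosages (0/1/2) per individual.
    expr_by_state: {state: expression per individual}, in differentiation order.
    |Pearson r| >= r_cut is a placeholder for a proper significance test.
    """
    states = list(expr_by_state)
    hits = [s for s in states
            if abs(pearson(genotypes, expr_by_state[s])) >= r_cut]
    dynamic = []
    for a, b in zip(states, states[1:]):
        # Genotype effect on the change in expression between two states.
        delta = [eb - ea for ea, eb in zip(expr_by_state[a], expr_by_state[b])]
        if abs(pearson(genotypes, delta)) >= r_cut:
            dynamic.append((a, b))
    labels = set()
    if hits and len(hits) == len(states):
        labels.add("static")
    elif len(hits) == 1:
        labels.add("conditional")
    if dynamic:
        labels.add("dynamic")
    return labels
```

As the example in the test shows, an association confined to one state typically also produces a genotype effect on the change between states. This is one reason the two labels can overlap and need to be defined carefully.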
G07 - An RNAseq pipeline for analysis of pre-mRNA processing
Short Abstract: Alternative splicing is a powerful mechanism that allows eukaryotic cells to generate a wide range of mature mRNA molecules from a relatively small number of genes. The mechanisms regulating (alternative) splicing remain largely elusive. The paradigm that the introns of a pre-mRNA are spliced consecutively has recently been challenged, especially for genes with a large number of introns. RNAseq, a powerful technology using deep sequencing for transcriptome profiling and transcript quantification, is usually performed on mature mRNA and does not allow the analysis of other splicing stages.
Sequencing (pre-)mRNA at different stages of splicing could provide insight into mRNA maturation. However, such data require dedicated algorithms that deal with mixtures of pre- and mature mRNA and evaluate the splicing order.
We developed an approach to analyze different mRNA splicing stages. We combined an aligner that efficiently maps reads to both exon-exon and exon-intron junctions with tools we created for correcting biases in RNAseq data and for classifying sequence reads according to splicing stage. We distinguish three major stages of splicing: pre-, intermediate- and post-splicing. This classification reveals significant differences in intron coverage in several multi-exon genes and differences in sequence coverage within large introns. These data are supported by reads indicating novel splice junctions and possible recursive splicing events. We created a pipeline that performs the whole analysis, but all scripts can also be used independently. Our method is platform-independent and can work with any RNA-seq reads, but it was evaluated on Illumina HiSeq and GAII data.
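The read classification step can be sketched as follows. The sketch assumes each read is given as its aligned genomic blocks, so a gap between consecutive blocks is a splice junction, together with annotated intron coordinates. It defines "intermediate" as a read that shows one intron spliced out while still containing intronic sequence. The function name and this definition are assumptions, not the authors' rules.

```python
def classify_read(blocks, introns):
    """Assign a read to a splicing stage from its alignment blocks.

    blocks: sorted (start, end) half-open genomic segments the read aligns to;
    a gap between consecutive blocks is a splice junction.
    introns: annotated (start, end) half-open intron coordinates.
    """
    gaps = {(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])}
    spliced = any(intron in gaps for intron in introns)
    intronic = any(s < i_end and e > i_start
                   for s, e in blocks for i_start, i_end in introns)
    if spliced and intronic:
        return "intermediate"   # an intron spliced out, intron sequence still present
    if spliced:
        return "post"
    if intronic:
        return "pre"
    return "uninformative"      # purely exonic, no junction evidence
```

Counting these labels per intron would give the kind of stage-specific intron coverage profiles described above.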
G08 - Towards a functional genomics screen for DNA-binding effectors in eukaryotic plant pathogens
Short Abstract: Plant pathogens achieve virulence through the secretion of proteins, known as effectors, which are thought to modulate host cell signalling and suppress immunity. However, the function of many effectors in eukaryotic pathogens is currently unknown. The recent identification of transcription activator-like (TAL) DNA-binding effectors from a bacterial pathogen makes the existence of DNA-binding effectors in eukaryotic pathogens highly likely. In the current work, a support vector machine (SVM) for predicting DNA-binding function from sequence is applied to a benchmark dataset of protein sequences with Gene Ontology function annotations from the eukaryote S. cerevisiae. The SVM values from two models were used to select stringent thresholds, so that DNA-binding function predictions could be made with high specificity. This method was then applied to candidate effector protein sequences from two economically important crop plant pathogens, Phytophthora infestans (the Irish potato famine pathogen) and Phytophthora capsici, which share tomato as a common host. Predictions were made for 42 candidate protein effectors from P. infestans (which include the RXLR sequence motif) and 318 candidate protein effector sequences from P. capsici (which encode CRN effector domains). Predictions for 5 P. infestans and 3 P. capsici candidate effectors met the thresholds determined from the benchmark dataset for DNA-binding function. Data from localisation and yeast two-hybrid (Y2H) assays were collated for these effector sequences, and additional protein function prediction tools were used to assess supporting evidence for DNA-binding function. The results are presented, and the candidate target effectors selected for further experimental validation are discussed.
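One way to pick a stringent SVM cut-off from a benchmark set is sketched below. The cut-off is chosen from the scores of known non-DNA-binding proteins so that the false-positive rate stays at or below 1 - specificity. The rule requiring both models to pass their thresholds is a hypothetical combination; the abstract does not state how the two models were combined.

```python
import math


def specificity_threshold(neg_scores, specificity=0.99):
    """Choose a cut-off t from negative scores so that calling 'score > t'
    keeps the false-positive rate on known negatives at or below 1 - specificity."""
    if not 0 < specificity <= 1:
        raise ValueError("specificity must be in (0, 1]")
    s = sorted(neg_scores)
    allowed = math.floor((1 - specificity) * len(s) + 1e-9)  # tolerated false positives
    return s[len(s) - allowed - 1]


def predict_dna_binding(score_a, score_b, thr_a, thr_b):
    # Hypothetical combination rule: both SVM models must exceed their thresholds.
    return score_a > thr_a and score_b > thr_b
```

The small epsilon guards against floating-point round-off, for example (1 - 0.9) * 10 evaluating to slightly less than 1.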
G09 - Bioinformatics analysis of milk micro-RNAs
Short Abstract: The presence of non-coding RNAs in various living organisms has shed new light on the complex nature of gene regulation. These RNA molecules work in complex DNA-RNA-protein networks to control gene regulation in response to changes in environmental conditions. One major subclass of non-coding RNAs is microRNA (miRNA). Each miRNA can regulate the expression of hundreds of genes in a sequence-specific manner. Recent studies have found miRNAs in various body fluids, with record concentrations in milk and colostrum. Thus, milk miRNAs may not only be informative markers of both lactation status and maternal physiology; we also propose that they may act as information-carrying signals that facilitate the timely delivery of maternal signals to the young. However, not much is known about the functions of milk miRNAs. The effects of viral miRNAs on host genes are well documented, and a recent study reported the presence of plant food miRNAs in human blood as well as their activity on gene expression. Our aim is to explore the possible roles of miRNAs in the post-partum regulation of the lactating mammary gland or in the development of the young. Towards this, publicly available gene expression datasets for the lactating mammary gland and for colostrum or milk miRNA content are analyzed to identify conserved or lineage-specific milk-enriched miRNAs. miRNA target prediction algorithms, Gene Ontology (GO) and pathway analysis tools are used on the predicted networks to investigate the potential of milk miRNAs as markers of lactation physiology or as signaling molecules transmitted from mother to child.
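Many miRNA target prediction algorithms start from seed complementarity. The minimal sketch below finds 7mer sites in a 3'UTR that are complementary to miRNA nucleotides 2-8. It does not implement any specific published algorithm: conservation, site context and 8mer/6mer variants are omitted, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Target site complementary to miRNA nucleotides 2-8 (7mer seed match)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def find_seed_matches(mirna, utr):
    """0-based start positions of seed-matched sites in a 3'UTR sequence."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    hits, i = [], utr.find(site)
    while i != -1:
        hits.append(i)
        i = utr.find(site, i + 1)
    return hits
```

For example, the seed of human let-7a-5p (UGAGGUAGUAGGUUGUAUAGUU) gives the site CUACCUC.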
G10 - Systematically cataloging transcripts that are differentially expressed in HIV infected human cells by transcriptome deep sequencing analysis
Short Abstract: Next-generation sequencing (NGS) enables both de novo reconstruction of the transcriptome and quantification of the abundance of known and unannotated transcripts. To better understand host responses to HIV infection at the systems level, we aimed to compile a comprehensive catalog of host transcripts differentially expressed during HIV infection using NGS. SupT1 cells, a CD4+ T lymphoblast cell line, were infected with HIV-1 LAI. At both 12 and 24 hours post-infection, the transcriptomes of infected cells and matched mocks were analyzed using two complementary versions of NGS: mRNA-seq, which sequences polyadenylated transcripts to evaluate mature mRNA, and total RNA-seq, which uses rRNA-depleted RNA to sequence both polyadenylated and non-polyadenylated transcripts, thereby uncovering coding and non-coding RNAs. In total, we obtained 260 million mapped single-end (75 nt) reads for mRNA-seq and 250 million mapped paired-end (2 x 90 nt) reads for total RNA-seq. Newly discovered transcripts were cross-checked against current human genome annotations, producing a master, non-redundant list of transcripts. All transcripts were then individually quantified across replicate samples to identify genes differentially expressed during HIV infection. Here we show that the identification of differentially expressed transcripts can be significantly improved through: 1) improved annotation through transcript assembly; 2) combining analyses at both the gene and transcript levels; and 3) complementary transcriptome sequencing strategies. Compared to typical analyses of canonical protein-coding genes, additional insights into host responses to HIV infection are gained through non-coding RNAs, novel transcripts and intronic transcriptional activity. Detailed results of our analysis will be presented.
G11 - An analysis pipeline for FAIRE-seq data
Short Abstract: Formaldehyde-assisted isolation of regulatory elements followed by sequencing (FAIRE-seq) is a straightforward method that allows the genome-wide identification of nucleosome-depleted regions which often have regulatory functions. Here we used FAIRE-seq to identify chromosomal segments responsive to 3-amino-1,2,4-triazole (3-AT) treatment to unravel the genomic response to an intracellular accumulation of H2O2 in Arabidopsis thaliana.

The FAIRE-seq analysis that we perform can be divided into four general steps: (i) mapping of the reads to the reference genome; (ii) identification of the enriched regions; (iii) analysis of the overlapping and nearest genes and their connection to Gene Ontology (GO) categories, evaluating the over-representation of GO terms for a specific function, location and/or process; and (iv) identification of motifs in the enriched regions. Our goal is an all-encompassing workflow that takes care of these details, letting the user focus on the biological question.
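The GO over-representation test in step (iii) is commonly a one-sided hypergeometric test. The sketch below assumes M genes in the background, K of them annotated with the term, n genes linked to enriched regions, and k of those annotated. The function name is hypothetical, and multiple-testing correction across terms is left out.

```python
from math import comb


def go_enrichment_pvalue(k, n, K, M):
    """P(X >= k) for X ~ Hypergeometric(M total, K annotated, n drawn)."""
    upper = min(n, K)
    # Integer arithmetic keeps the tail sum exact before the final division.
    tail = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return tail / comb(M, n)
```

For instance, drawing all 5 annotated genes when selecting 5 out of 10 has probability 1/252.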

In step (i) we perform automated data format conversion and sorting; this part can be parallelized, handling control and treatment data at the same time. The user can then easily configure peak-calling parameters for different packages using a configuration file in YAML format. For the identified enriched regions, a Perl module in step (iii) links the enriched regions with the previously parsed annotation and GO identifiers; all these data are organized in a MySQL relational database using the Perl DBI module. Other modules are included for R statistics/graphics and motif analysis. We expect this new pipeline to be widely useful, simplifying a data workflow that generally has to go through several steps.
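Linking enriched regions to overlapping or nearest genes can be sketched as below. The linear scan is for clarity only; a real pipeline would use an interval index or a tool such as BEDTools. The coordinates are half-open, and the gene names in the test are hypothetical.

```python
def nearest_gene(peak, genes):
    """Return (gene_name, distance_bp) for the gene closest to a peak.

    peak: (chrom, start, end); genes: iterable of (name, chrom, start, end).
    Intervals are half-open; overlapping genes have distance 0.
    """
    chrom, ps, pe = peak
    best, best_d = None, None
    for name, gc, gs, ge in genes:
        if gc != chrom:
            continue
        if ps < ge and pe > gs:
            d = 0                 # peak overlaps the gene body
        elif pe <= gs:
            d = gs - pe           # gene lies downstream of the peak
        else:
            d = ps - ge           # gene lies upstream of the peak
        if best_d is None or d < best_d:
            best, best_d = name, d
    return best, best_d
```

The returned gene identifiers could then be joined to GO annotations for the enrichment test.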
G12 - The Functional Genomics Production Team Software Suite
Short Abstract: The Functional Genomics Production Team at the European Bioinformatics Institute are responsible for the development of a large suite of software designed to facilitate the submission, curation, exchange, integration and annotation of functional genomics datasets, including microarray and high throughput sequencing datasets. Here we present an overview of our publicly available open source tools.

There are three main categories of software in the Functional Genomics Production Team (FGPT) software suite: tools for automation of curation and dataflow, software to support working with community accepted data formats such as MAGE-TAB and ontology annotation tools.

Dataflow software includes Conan (an extremely lightweight workflow system), tab2mage (a series of Perl modules facilitating data submissions to ArrayExpress), and tools for downloading and converting data from GEO and ENA.

Data format tools include Limpopo (a MAGE-TAB and SampleTab parser), MageComet (a semi-automated curation tool for MAGE-TAB), Corona (a curation interface for modifying any functional genomics data resources that can be accessed via a RESTful API), Experiment Checker (a Perl module for validating MAGE-TAB documents) and several tab2mage Perl modules for converting common data formats into MAGE-TAB documents.

Ontology tools include ZOOMA (a tool for determining optimal ontology mappings and annotations), bubastis (an ontology "diff" tool), OntoCAT (a tool for interacting with a variety of ontology resources, including BioPortal and OLS), urigen (a service and Protege plugin for collaboratively generating new ontology URIs) and Fuzzy Recognizer (a Perl module for text-based matching to ontology terms).
G13 - Reconstruction of Operon Structures in Prokaryotes from Short Directional RNA-seq Reads Using a Hidden Markov Model
Short Abstract: Although gene expression in prokaryotes has been studied for decades, many aspects of the prokaryotic transcriptome remain poorly understood. In particular, how do transcript structures vary under different growth conditions and growth phases? How are gene expression levels quantitatively regulated? And what portion of a prokaryotic genome is transcribed, including small RNAs and antisense transcripts? With the availability of new generations of sequencing technologies, RNA-seq has provided an unprecedented opportunity to address these important questions. However, accurately assembling the short reads from RNA-seq experiments in prokaryotes is a highly challenging task for at least two reasons. First, due to the pervasive nature of prokaryotic RNA polymerase, there are many promiscuous transcripts, making the transcriptome rather noisy. Second, because of the labile nature of prokaryotic RNAs, some RNA fragments might be lost during sequencing library preparation, resulting in incomplete coverage of transcribed regions. To address these problems, we have developed a Hidden Markov Model based algorithm that reconstructs transcripts/operons using RNA-seq reads mapped to the reference genome as input. When tested on a strand-specific RNA-seq dataset of Escherichia coli K12 str. MG1655 under a variety of growth conditions and growth phases, our algorithm outperforms all tested state-of-the-art transcriptome assembly algorithms in prediction accuracy. With RNA-seq techniques becoming routine in the research community, we hope our algorithm and toolkit will be useful for efficiently and accurately surveying the dynamic transcriptional activity and regulation mechanisms in prokaryotic cells.
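The general HMM idea can be sketched with a toy two-state model decoded by the Viterbi algorithm over binned, strand-specific read counts. The states are "off" (background from promiscuous transcription) and "on" (transcribed unit), each with Poisson emissions. This is a generic illustration under assumed parameters, not the authors' model. The emission rates, the self-transition probability `p_stay` and all names are assumptions.

```python
import math


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_transcribed(counts, lam_off=0.5, lam_on=10.0, p_stay=0.99):
    """Most likely off/on state path for per-bin read counts on one strand.

    Returns the path and the half-open bin intervals called 'on'.
    """
    states = ("off", "on")
    lam = {"off": lam_off, "on": lam_on}
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial state distribution.
    score = {s: math.log(0.5) + poisson_logpmf(counts[0], lam[s]) for s in states}
    backptrs = []
    for c in counts[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + trans(p, s))
            ptr[s] = prev
            new_score[s] = score[prev] + trans(prev, s) + poisson_logpmf(c, lam[s])
        score = new_score
        backptrs.append(ptr)

    path = [max(states, key=score.get)]          # best final state
    for ptr in reversed(backptrs):               # trace back
        path.append(ptr[path[-1]])
    path.reverse()

    segments, start = [], None
    for i, s in enumerate(path):
        if s == "on" and start is None:
            start = i
        elif s == "off" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return path, segments
```

Raising `p_stay` penalises state switches more heavily, so short coverage dips, such as those caused by RNA lost during library preparation, tend to be bridged rather than splitting a transcript.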
G14 - Discovery of motif-based regulatory signatures in whole genome methylation experiments
Short Abstract: This poster is based on Proceedings Submission 105

Motivation: High-throughput experimental techniques enable a paradigm shift from isolated *omics approaches to comprehensive systems biology studies. The vast amounts of integrated data generated by this shift must be supported by powerful computational biology techniques. Presented here is an application that analyzes sites of methylation identified by Methyl Binding Domain 2 pulldown of genomic DNA followed by massively parallel sequencing. The automated pipeline takes the sequence data from the methylation peak calling stage through to functionally annotated regulatory signatures. These signatures include unique transcription factor binding site motifs and correlations of methylation peaks with mRNA expression profiles.
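Transcription factor binding site motifs are often detected by scanning sequences with a position weight matrix. The sketch below converts per-position base probabilities into log2-odds scores against a uniform background and reports forward-strand windows above a threshold. The background model, pseudocount and threshold are assumptions, and the abstract does not say which motif tools the pipeline uses.

```python
import math

BASES = "ACGT"


def log_odds_pwm(prob_columns, background=0.25, pseudo=0.01):
    """Convert per-position base probabilities into log2-odds scores."""
    pwm = []
    for col in prob_columns:
        total = sum(col[b] for b in BASES) + 4 * pseudo
        pwm.append({b: math.log2((col[b] + pseudo) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for forward-strand windows scoring >= threshold."""
    w = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

A complete scan would also score the reverse complement of each peak sequence.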

Results: The pipeline has been successfully applied to murine hematopoietic stem cells, multipotent hematopoietic progenitors and differentiated red cell precursors, allowing changes in the associated transcription and expression profiles to be evaluated as a vertical analysis. Horizontal characterizations of transcriptional profiles were performed using methylation profiles of monocytes from Chronic Myelomonocytic Leukemia patients undergoing treatment with the methylation inhibitor 5-aza-cytidine. In both studies the results agree strongly with the existing literature, demonstrating the applicability of the pipeline to large-scale systems biology problems.

Availability: The source code for the pipeline presented here is made available under the GNU General Public License, version 3.0 (GPLv3) through the Google Project Hosting at http://code.google.com/p/nextgen-signatures
G15 - iPlantTF: a Genome-scale Plant Transcription Factor and Transcription Regulator Identification and Classification Server
Short Abstract: Progress in high-throughput sequencing technologies has significantly advanced the study of gene expression regulatory mechanisms at the whole-genome level in plants. This urgently calls for high-throughput bioinformatics systems that can effectively mine large-scale sequences for the systematic identification of important regulatory elements, such as transcription factor and transcription regulator genes. We developed a web-based analysis server, iPlantTF, which integrates 1) a sophisticated back-end high-performance parallel computing prediction module that identifies and classifies plant transcription factors and transcription regulators in user-submitted sequences with high prediction accuracy and coverage, and 2) intuitive web interfaces for users to submit large-scale sequences and retrieve analysis results.

iPlantTF integrates conserved domain patterns for ~100 published transcription factor and transcription regulator families in plants. These patterns were manually curated to ensure prediction quality. The back-end prediction module employs InterProScan, a popular protein domain search tool, to search for unique domains in the user-submitted sequences, and then screens for potential transcription factors by referring to the conserved domain patterns of each transcription factor family. To enable high-throughput analysis, we optimized InterProScan by trimming its databases to include only relevant domain information, and further accelerated the back-end prediction module using our in-house parallel computing platform, BioGrid, which can effectively use a 400-core Linux cluster. With these optimizations, iPlantTF can analyze genome-scale sequences; in our test, for example, iPlantTF analyzed the genome of the model plant Arabidopsis thaliana within eight minutes. iPlantTF is publicly available at http://plantgrn.noble.org/iPlantTF/.
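Rule-based family assignment from domain search hits can be sketched as follows, assuming each family is described by a set of required domains and a set of forbidden domains. The three rules below are illustrative only (RAV proteins carry both AP2 and B3 domains) and are not iPlantTF's curated patterns.

```python
# Illustrative rules: family -> (required domains, forbidden domains).
RULES = {
    "RAV": ({"AP2", "B3"}, set()),
    "AP2/ERF": ({"AP2"}, {"B3"}),
    "WRKY": ({"WRKY"}, set()),
}


def classify_by_domains(domains, rules=RULES):
    """Families whose required domains are all present and forbidden ones absent."""
    found = set(domains)
    return sorted(family for family, (required, forbidden) in rules.items()
                  if required <= found and not (forbidden & found))
```

Forbidden domains allow closely related families that share a domain to be separated, as with RAV and AP2/ERF here.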
G16 - Signal distributions of assay driven probe subtypes on the Affymetrix SNP6.0 and their effects on array correspondence.
Short Abstract: Apart from amplicon length and GC bias, the impact of the sample generation method and of probe design limitations has not been carefully explored for Affymetrix SNP6 assays. Without discounting effects due to sampling, the detection of rare alleles, the sensitivity of association methods or the adequacy of their implementations, it seemed that the impact of the primary assay elements deserved stringent examination. Indeed, the combination of assay steps and probe specifications yields several clearly distinct marker subtypes in the Affymetrix SNP6 assay: computational modeling of the Affymetrix SNP 6.0 assay reveals a complicated data space. In this work we subtype the probes according to how assay steps affect their targets. We present a visual walk-through of these results, graphically demonstrating the different signal distributions of the groups that emerge from the array. Lastly, we show that careful consideration of the assay parameters and biophysical properties gives an accurate idea of the true correspondence between arrays. The value of the approach is twofold: an informed approach to subtyping measurements permits better choices of statistical tests during data processing, and eliminating problematic subsets of the data will diminish non-biological variation in class-specific ways and improve the power and consistency of these studies.
G17 - Variants Affecting Exon Skipping Contribute to Complex Traits
Short Abstract: DNA variants that affect alternative splicing and the relative quantities of different transcripts of a gene have been shown to be risk alleles for some Mendelian diseases. However, for complex traits, where any single contributing gene or variant has a low odds ratio, very few studies have investigated splicing variants. The overarching goal of this study is to discover and characterize the role that variants affecting alternative splicing may play in the emergence of complex traits, which include a significant number of common human diseases. Specifically, we hypothesize that single nucleotide polymorphisms (SNPs) in splicing regulatory elements can be computationally characterized to accurately identify variants affecting splicing, and that these variants may contribute to the etiology of complex diseases as well as to inter-individual variability. We leverage high-throughput expression profiling to 1) experimentally validate our in silico-identified skipped exons and 2) characterize the molecular role of intronic genetic variation in alternative splicing events in the context of complex human traits and diseases. Furthermore, we propose that intronic SNPs act as genetic regulators within splicing regulatory elements, and we show that their associated exon skipping events often affect protein domains. We find that SNPs associated with human complex traits are enriched among intronic splicing enhancers. This finding raises the possibility that therapies targeting alternative splicing mechanisms may be of value in treating disease.
G18 - Allele specific expression changes after induction of inflammation
Short Abstract: Recent advances in RNA and DNA sequencing technology have enabled a more detailed picture of gene expression and genomic differences to emerge. One particularly interesting aspect is the difference in expression between the two alleles of a gene within a single individual, one inherited from the mother and one from the father. Any such allele-specific expression (ASE) could indicate an allele-specific cis-acting genetic factor. ASE thereby provides an efficient means to explore the functional effects of genomic variation and can help identify functional variants in the extensive conserved non-coding part of the genome. In this study we assessed ASE in human white blood cells with and without treatment with the immune-inducing agent LPS by performing RNA-seq on several individuals. This allowed us to study ASE of transcripts that are potentially of special importance in inflammation. Further, to find candidate haplotypes responsible for the observed allelic differences, we conducted whole-genome genotyping of the subjects from whom the RNA was obtained. Genotyping also makes it possible to identify carriers of risk alleles associated with atherosclerosis from a previous genome-wide association study. Preliminary results indicate that about 5% of all genes show ASE. Searching for variants where the treatment induced a change in allele specificity, we detected a total of 117 unique significant variants among all individuals, of which ten were found in two or more individuals.
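A common first-pass ASE test at a heterozygous SNP is an exact two-sided binomial test of reference versus alternative read counts against a 50:50 expectation. The sketch below assumes that test; the abstract does not specify which statistic was used. Because the null binomial with p = 0.5 is symmetric, doubling the smaller tail gives the two-sided p-value.

```python
from math import comb


def ase_binomial_p(ref, alt):
    """Two-sided exact binomial p-value for ref vs alt read counts under p = 0.5."""
    n = ref + alt
    k = min(ref, alt)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

In practice, reference-mapping bias and overdispersion between replicates would need to be accounted for, and p-values corrected for the number of SNPs tested.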
G19 - RNA-Seq Analysis of Eucalyptus Genotypes that Differ in Carbon Allocation
Short Abstract: The global demand for wood, combined with its diversified uses, requires the generation of trees that sequester and accumulate more carbon, providing differentiated raw material for the production of pulp, paper, charcoal and even cellulosic ethanol. Since differences in levels of gene expression may largely explain the observed phenotypic variation, and there is great variability among species of Eucalyptus, we performed a gene expression analysis of four contrasting Eucalyptus genotypes to gain insight into the mechanisms that lead to differences in carbon allocation. Leaf, xylem and root samples from each genotype were used for transcriptome sequencing with Illumina HiSeq technology. After quality control, the reads were mapped to the Eucalyptus grandis reference genome using TopHat, and gene expression analysis was performed using CuffDiff. Pairwise comparisons were carried out between tissues and genotypes, and statistical tests were performed to assess differential expression. The analysis was executed within Galaxy, which also allowed us to visualize the read mappings. CummeRbund was used to generate result tables and charts. Gene Ontology terms were assigned to the genes using InterProScan, and BiNGO was used to identify enriched terms. We generated a total of 89.3 Gb of reads, and between 70.75% and 90.33% of them were aligned to the reference genome. We then computed FPKM values, which allowed us to identify 26,190 differentially expressed genes out of 44,974 predicted genes. We expect to unveil candidate genes for carbon allocation that will be further investigated by transgenic overexpression and down-regulation approaches.
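FPKM normalizes a fragment count by transcript length in kilobases and by library size in millions of mapped fragments: FPKM = fragments x 10^9 / (length_bp x total_mapped). The sketch below shows only this arithmetic. Cufflinks/CuffDiff additionally estimate effective lengths and apportion multi-mapping fragments, so their values will differ. Using the table's own count sum as the library size in `fpkm_table` is a simplifying assumption.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def fpkm_table(counts, lengths, total_mapped=None):
    """FPKM for each gene; defaults to the sum of counts as the library size."""
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {g: fpkm(c, lengths[g], total) for g, c in counts.items()}
```

For example, 1,000 fragments on a 2 kb transcript in a library of 10 million mapped fragments give an FPKM of 50.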
Short Abstract: Bayesian network structure learning (BNSL) methods provide an important approach for predicting functional relationships among biomolecules, including transcriptional regulatory relationships. Gene expression patterns can be analyzed by BNSL methods to infer such functional and regulatory relationships. Pituitary adenylate cyclase-activating polypeptide (PACAP) is implicated in a number of neurodegenerative and behavioral disorders. In this study, Bayesian networks were used to find relationships involving PACAP and signaling molecules also associated with the same set of neurodegenerative and behavioral disorders. Data were drawn from a compendium of microarray data from the PhenoGen database, representing mRNA expression levels in the whole brain of a variety of mouse strains. A subset of relevant genes, including those for growth factors and their receptors, effector molecules, and protein kinases and their substrates, was used in the analysis. Established regulatory relationships among members of the mitogen-activated protein kinase family served as the initial structure ("must-be-present" edges) for the BNSL analysis. Version 2.2 of the Bayesian Network Inference with Java Objects (Banjo) toolkit was employed. Both search algorithms available in Banjo, Simulated Annealing and Greedy Search, were used separately to generate a single highest-scoring network. Each search algorithm was paired with two move strategies: evaluating a single random local move, or evaluating all local moves, at each step. The generated networks were compared with each other and examined for consistency with the literature. The analysis revealed novel regulatory relationships, demonstrating the utility of Bayesian network structure learning in designing experiments to elucidate signaling relationships in pharmacology.
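To make the search described above concrete, here is a minimal sketch, assuming fully discretized expression data, of greedy structure learning that evaluates all local moves at each step and keeps "must-be-present" edges fixed. It does not reproduce Banjo: the BIC score, the restriction to edge additions and deletions (no reversals), and all names and toy data are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product


def bic_family(data, child, parents, arity):
    """BIC score of one node given a parent list, for discrete data rows (dicts)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per parent configuration.
    n_params = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, u, v):
    """True if adding edge u -> v would close a directed cycle (v is an ancestor of u)."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def greedy_search(data, arity, required=()):
    """Steepest ascent: each step scores every single-edge addition or deletion
    and applies the best one until nothing improves. Edges in `required`
    (the hypothetical "must-be-present" set) are placed first and never deleted."""
    nodes = list(arity)
    parents = {x: set() for x in nodes}
    for u, v in required:
        parents[v].add(u)
    score = {x: bic_family(data, x, sorted(parents[x]), arity) for x in nodes}
    while True:
        best_gain, best_move = 0.0, None
        for u, v in product(nodes, nodes):
            if u == v or (u, v) in required:
                continue
            candidate = set(parents[v])
            if u in candidate:
                candidate.remove(u)  # deletion move
            elif not creates_cycle(parents, u, v):
                candidate.add(u)  # addition move
            else:
                continue
            # The score decomposes over families, so only node v's term changes.
            gain = bic_family(data, v, sorted(candidate), arity) - score[v]
            if gain > best_gain + 1e-12:
                best_gain, best_move = gain, (v, candidate)
        if best_move is None:
            return parents
        v, candidate = best_move
        parents[v] = candidate
        score[v] = bic_family(data, v, sorted(candidate), arity)


# Hypothetical discretized expression states: Y tracks X, Z is independent.
rows = [{"X": x, "Y": x, "Z": z} for x in (0, 1) for z in (0, 1)] * 25
arity = {"X": 2, "Y": 2, "Z": 2}
print(greedy_search(rows, arity))
print(greedy_search(rows, arity, required=(("Y", "X"),)))
```

On this toy data the search adds only X -> Y; when Y -> X is fixed as a must-be-present edge, the reverse edge is blocked because it would create a cycle, showing how prior knowledge constrains the learned structure. Replacing "best move" with "one random move accepted by a temperature schedule" would turn this into a simulated annealing variant.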
G21 - A framework for functional variant detection in human genomes and Drosophila melanogaster
Short Abstract: Several public tools have been developed to detect and quantify functional variants among the coding variants found in next-generation sequencing data, in order to improve our understanding of genotype-phenotype relationships. Prediction of functional variants in non-coding regions, however, remains challenging. Here we propose a core framework of features based on multiple-species conservation. This framework not only extends functional variant prediction to non-coding regions, but also provides a core set of features for building species-specific models. The performance of these models is demonstrated on variants found in the human genome as well as on variants found in Drosophila melanogaster.
Based on data from the Human Gene Mutation Database, we are extending this framework to predict the effects of all variants in annotated intron and promoter regions of the human genome. Accuracy is being evaluated by cross-validation. Tests on somatic cancer mutations in coding regions show that our model successfully selects variants that are common among cancer patients. Sets of functional variants at different FDR thresholds are available at http://tingchenlab.usc.edu/sinbad.
Finally, we examine how different types of population genetic information, based on 35 individual genomes from the Drosophila melanogaster Genetic Reference Panel and 35 individual genomes collected in Winters, CA, contribute to functional variant detection in Drosophila melanogaster.
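The abstract releases variant sets at several FDR thresholds but does not say how the FDR was controlled. Assuming, for illustration only, that each variant has a p-value, the standard Benjamini-Hochberg step-up procedure below shows how different thresholds yield nested variant sets. The variant IDs and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues: dict, fdr: float) -> set:
    """Benjamini-Hochberg step-up procedure: return the IDs declared
    significant at the given false discovery rate."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * fdr:
            cutoff = rank  # step-up: keep the largest rank that passes
    return {variant for variant, _ in ranked[:cutoff]}


# Hypothetical per-variant p-values (assumed; not from the SinBaD data).
pvals = {"var1": 0.001, "var2": 0.008, "var3": 0.035, "var4": 0.039, "var5": 0.27}
for threshold in (0.01, 0.05):
    print(threshold, sorted(benjamini_hochberg(pvals, threshold)))
```

Because the procedure steps up from the largest passing rank, var3 is selected at FDR 0.05 even though its own comparison (0.035 > 0.03) fails; the stricter threshold returns a subset of the looser one.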
G22 - Fine-tuning the scales of hierarchical gene co-expression programs for discovering microRNA regulations and functions
Short Abstract: MicroRNAs (miRNAs) play important roles in many biological and physiological processes. Although hundreds of miRNAs have been discovered, only a few have been linked to specific cellular functions. MiRNA target gene set enrichment analysis of differentially expressed genes is a very useful approach for inferring miRNA regulation from genome-wide gene expression profiles generated under different biological conditions. To increase the sensitivity of miRNA target gene set enrichment analysis, we developed a method, named Hi-GSA, that incorporates hierarchical gene co-expression information into the enrichment analysis. By analyzing miRNA target gene set enrichment along the hierarchical gene co-expression programs, the method can identify functional miRNAs that are linked only to sub-clusters of the differentially expressed genes at fine scales. Hi-GSA discovered many more literature-supported miRNA regulations than Fisher's Exact Test (FET) and Gene Set Enrichment Analysis (GSEA), both on simulated datasets and on two real biological datasets. For the TNF-stimulated vascular endothelial cell dataset, the results suggest that the miR-23~27~24 cluster may cooperatively regulate distinct pathways in TNF signaling, and that miR-27 and miR-381 may regulate angiogenic sprouting by targeting several Notch signaling pathway ligands. For the colorectal cancer dataset, Hi-GSA reported 19 onco-miRNA regulations, including 7 supported by the literature (miR-21, miR-92, miR-19, etc.).
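Hi-GSA is benchmarked against Fisher's Exact Test (FET). The sketch below is that FET baseline, not Hi-GSA itself: a one-sided test, computed as the hypergeometric upper tail, of whether a miRNA's predicted targets are over-represented in a gene set. Applying it once to a full differentially expressed gene list and once to a hypothetical co-expression sub-cluster illustrates why scanning finer scales can help. All counts are invented for illustration.

```python
from math import comb


def fet_enrichment_p(overlap: int, n_targets: int, cluster_size: int, n_genes: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of observing at least `overlap` predicted miRNA targets in a
    gene set of `cluster_size` genes drawn from `n_genes` genes, of which
    `n_targets` are predicted targets.
    """
    upper = min(n_targets, cluster_size)
    hits = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(n_genes, cluster_size)  # exact integers, one final division


# Hypothetical miRNA with 200 predicted targets among 10,000 genes.
# Full list of 1,000 differentially expressed genes: 25 targets (about 20 expected).
print(fet_enrichment_p(25, 200, 1000, 10000))
# A co-expression sub-cluster of 100 of those genes: 12 targets (about 2 expected).
print(fet_enrichment_p(12, 200, 100, 10000))
```

In this toy case the enrichment is not significant on the full list but very strong in the sub-cluster: a signal confined to part of the differentially expressed genes is diluted at the coarsest scale.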
Short Abstract: Breast cancer is the leading cause of cancer-related deaths among women in the developed nations of the world. Over the past few years, increasing concern has been expressed about the need for therapeutic research on natural agents that can selectively inhibit the growth of breast cancer cells (BCC) with minimal or no side effects compared with synthetic counterparts such as tamoxifen. Given the paucity of knowledge in this area, we aim to investigate the modulating effects of sesame phytoestrogens (the lignans sesamin, sesamolin, sesamol, etc.) obtained from Sesamum radiatum leaves on MCF7 cells, and to compare these effects with those of propyl pyrazole triol (PPT, a selective ERα agonist) and β-estradiol (E2, an agonist of both the ERα and ERβ subtypes) in an in vitro assay model.
We use the human MCF7 cell assay model because MCF7 is an estrogen-responsive breast cancer cell line capable of expressing both estrogen receptor subtypes, estrogen receptor-alpha (ERα) and estrogen receptor-beta (ERβ).
In previous studies, we showed that sesame phytoestrogenic lignans and other phytochemicals modulate estrogen receptor activity in the testis and epididymis, among the other reproductive organs studied (Shittu et al, 2007, 2008). The hypothesis explored in the present study is that sesame phytoestrogen-regulated cellular pathways can selectively inhibit ERα, thereby arresting breast cancer cell growth by altering the function of the nuclear proteins (transcription factors) that BCCs use to enhance ERα synthesis.
G24 - Lysine Catabolism via Aminoadipic Semialdehyde (AASA) in Prokaryotes: Towards Stress Adaptation?
Short Abstract: The saccharopine pathway of lysine catabolism has been linked to stress responses in plants and animals, where overexpression of the aminoadipic semialdehyde dehydrogenase gene (aasadh) protected against osmotic and oxidative stress. Recently, we demonstrated the existence of active forms of lysine-ketoglutarate reductase (LKR) and saccharopine dehydrogenase (SDH) in the marine alphaproteobacterium Silicibacter pomeroyi. Here, we used bioinformatics to characterize the occurrence, genomic organization and phylogenetic relationships of lkr and sdh genes among prokaryotic species with fully sequenced genomes. Surprisingly, only 27 of the 1,479 species analyzed contain the lkr and sdh genes, but 323 species have aasadh orthologs. An sdh-related gene located adjacent to an aasadh gene was identified in the S. pomeroyi genome. This gene, annotated as lysine dehydrogenase (lysdh), encodes an enzyme that directly converts lysine into aminoadipic semialdehyde (AASA), and was further identified in 159 organisms, 36 of which carry it in the lysdh-aasadh configuration. Most of these organisms live in stressful environments. A gene encoding lysine aminotransferase (LAT), which also catalyzes the direct conversion of lysine into AASA, was identified in 51 other organisms that have an aasadh homolog but lack lkr, sdh and lysdh. In S. pomeroyi, lkr, sdh, lysdh and aasadh were found adjacent to stress-related genes. We used real-time PCR to evaluate the expression of lkr, sdh, lysdh and aasadh in this bacterium and found that they are induced by salt stress. We hypothesize that bacteria evolved diverse pathways to produce AASA from lysine in order to protect themselves against environmental stress.
G25 - Finding transcriptomic modules mediating plant defense responses to herbivory using systems biology
Short Abstract: Given the importance of gene-gene interactions and their outputs in plant defense mechanisms, a major activity in plant systems biology has been to identify these interactions systematically through the analysis of omics data sets. Since gene regulatory networks are dynamical systems whose structure is fully revealed only in their temporal response to perturbations, time series add a further dimension to factors such as conditions, developmental stages, tissue types and genetic interventions, resulting in a complex factorial experimental design. The aim of this study is to address biological questions more effectively using an integrative approach that combines non-targeted metabolomics data, transcriptomics data, and literature-derived metabolic and genomic pathway information. Using this approach, we are studying the ecological interactions of Nicotiana attenuata, a wild tobacco plant found in the deserts of the southwestern US, with its specialist herbivore Manduca sexta. Comparing two tissues over a time series following induction helped us find transcriptomic modules that are tissue- or treatment-specific, study systemic signaling in plants, identify the role of root tissue in above-ground herbivory, and hypothesize a role for diurnal rhythms in the defense mechanism. A few genes important in regulating these pathways have been identified and are undergoing functional characterization.
G26 - Genome-wide de novo discovery of cis-regulatory modules in yeast using a graph-based method
Short Abstract: In eukaryotes, transcription factor (TF) binding sites (TFBSs) are often organized in clusters, called cis-regulatory modules (CRMs), to achieve complex regulation. Most current CRM prediction methods depend on information about known regulatory modules and/or TFBS motifs, such as position-specific weight matrices. Thus, they are unable to find new types of TF combinations or new TFBSs. One of the major obstacles to predicting novel combinatorial TF regulation is the high false positive rate of de novo motif finding tools. To circumvent this problem, we developed a graph-based methodology to find CRMs at genome scale. In our approach, we couple motif discovery with CRM discovery, on the assumption that motif predictions can be improved by exploiting their combinatorial information genome-wide. Specifically, motifs are predicted from each group of orthologous promoters and grouped into windows. A graph is then constructed with windows from all genes in the target genome as nodes, and the similarity between the motifs contained in two windows as the weight of the edge connecting them. A clustering method is then applied to the graph. Since real CRMs are usually present in multiple promoters of co-expressed genes, they are likely to form clusters, while spurious motifs and windows with little in common are likely to remain isolated. Thus, windows in a cluster can be predicted as putative CRMs. When tested on seven yeast genomes, our method efficiently recovered known CRMs and predicted new ones.
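The window graph described above can be sketched as follows. The abstract does not name its similarity measure or clustering algorithm, so this is a simplified stand-in under stated assumptions: each window is reduced to a set of hypothetical motif labels (in place of a motif-to-motif similarity score), edges are kept when the Jaccard similarity reaches a threshold, and clusters are the connected components of the thresholded graph. All window and motif names are invented.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Similarity of two windows as the fraction of shared motif labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_windows(windows: dict, min_weight: float = 0.5, min_size: int = 2) -> list:
    """Connect windows whose motif similarity reaches min_weight, then return
    connected components with at least min_size windows as candidate CRMs."""
    parent = {w: w for w in windows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(windows, 2):
        if jaccard(windows[a], windows[b]) >= min_weight:
            parent[find(a)] = find(b)  # union the two components

    components = {}
    for w in windows:
        components.setdefault(find(w), []).append(w)
    # Isolated windows (likely spurious motifs) are discarded.
    return [sorted(c) for c in components.values() if len(c) >= min_size]


# Hypothetical promoter windows (gene_window) with de novo motif labels.
windows = {
    "geneA_w1": {"M1", "M2"},
    "geneB_w3": {"M1", "M2", "M5"},
    "geneC_w2": {"M1", "M2"},
    "geneD_w1": {"M7"},
    "geneE_w4": {"M3", "M9"},
}
print(cluster_windows(windows))
```

Windows sharing the M1/M2 combination across three genes form one cluster, while the isolated windows are dropped, mirroring the reasoning that real CRMs recur across co-expressed promoters. A weighted graph-clustering algorithm could replace the connected-components step without changing the overall design.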
