ISMB/ECCB 2011 Posters

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology

Accepted Posters

Category 'G' - 'Functional Genomics'

Poster G01

Large-Scale Computational Identification of Regulatory SNPs

Alberto Riva University of Florida

Short Abstract: rSNP-MAPPER is a web-based tool to identify SNPs that potentially affect a Transcription Factor Binding Site (TFBS) to a significant extent (regulatory SNPs, or rSNPs). rSNP-MAPPER builds on MAPPER, a previously developed application for the computational detection of TFBSs in DNA sequences. We have provided MAPPER with the ability to analyze two variants of the same sequence (each containing one allele of a SNP), determining whether the substitution results in a significant change in the TFBS predictive score.

The application provides an intuitive and flexible interface. The user may search for potential rSNPs in the promoters of one or more genes, which can be specified as a list or chosen from the members of a pathway. Alternatively, the user may specify a set of SNPs to be analyzed by uploading a list of SNP identifiers or providing the coordinates of a genomic region; rSNP-MAPPER will determine which SNPs lie within a TFBS and compute the corresponding score changes. Finally, the user can provide two alternative sequences (wildtype and mutant): the system will analyze them to determine the location of variants, to identify potential TFBSs, and to determine the effect of the variants on the TFBS scores. rSNP-MAPPER is optimized to efficiently perform all these operations on a large scale, allowing for the fast annotation of thousands of SNPs.

We present the architecture of rSNP-MAPPER, and we describe its usage through several examples that demonstrate its ability to correctly identify previously known rSNPs, or to predict new ones with high confidence.
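
The allele comparison described in this abstract can be illustrated with a minimal sketch. It assumes a position weight matrix (PWM) as a simple stand-in for MAPPER's binding-site models, which the abstract does not specify; the count matrix, the pseudocount, and the helper names `log_odds_matrix`, `best_score` and `score_change` are all hypothetical.

```python
import math

# Hypothetical 4-position count matrix for an illustrative factor (not taken from MAPPER).
COUNTS = {
    "A": [8, 1, 1, 7],
    "C": [1, 1, 7, 1],
    "G": [1, 7, 1, 1],
    "T": [0, 1, 1, 1],
}
BACKGROUND = 0.25   # uniform base frequency (assumption)
PSEUDOCOUNT = 0.5   # avoids log(0) for unseen bases (assumption)


def log_odds_matrix(counts):
    """Convert raw counts to log2-odds scores against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * PSEUDOCOUNT
        column = {}
        for b in "ACGT":
            freq = (counts[b][i] + PSEUDOCOUNT) / total
            column[b] = math.log2(freq / BACKGROUND)
        matrix.append(column)
    return matrix


def best_score(seq, matrix):
    """Return the highest-scoring window over all start positions in seq."""
    width = len(matrix)
    return max(
        sum(matrix[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def score_change(ref_seq, alt_seq, matrix):
    """Score of the alternative allele minus score of the reference allele."""
    return best_score(alt_seq, matrix) - best_score(ref_seq, matrix)
```

A negative `score_change` indicates that the alternative allele weakens the best predicted site; deciding whether the change is "significant" would need a threshold or null model that this sketch does not attempt.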

Poster G02

Systematic functional characterization of cancer genomes using large scale pooled shRNA screening

Barbara Weir The Broad Institute

Shuba Gopal (The Broad Institute, RNAi platform); Jesse Boehm (The Broad Institute, Cancer Program); Glenn Cowley (The Broad Institute, RNAi platform); Hiu Wing Cheung (The Broad Institute, Cancer Program); Terence Wong (The Broad Institute, Cancer Program); Pablo Tamayo (The Broad Institute, Cancer Program); Nicolas Stransky (The Broad Institute, Cancer Program); Jordi Barretina (The Broad Institute, Cancer Program); Aviad Tsherniak (The Broad Institute, Cancer Program); Josh Gould (The Broad Institute, Cancer Program); Scott Rusin (The Broad Institute, RNAi platform); Justin Scott (The Broad Institute, RNAi platform); Aravind Subramanian (The Broad Institute, Cancer Program); Levi Garraway (The Broad Institute, Cancer Program); Matthew Meyerson (The Broad Institute, Cancer Program); Jill Mesirov (The Broad Institute, Computational Biology and Bioinformatics Program); David Root (The Broad Institute, RNAi platform); William Hahn (The Broad Institute, Cancer Program);

Short Abstract: A wealth of information is being generated with the goal of characterizing all somatic alterations in cancer genomes across various tumor types. A complementary approach to these cancer genomic efforts involves the systematic functional assessment of genes and their requirement for producing malignant phenotypes. This project uses a genome scale loss-of-function screening approach in a diverse panel of cancer cell lines to identify genes involved in cancer pathogenesis. We developed a complete pipeline of computational methods for the processing and analysis of RNAi pooled screening data. Viability data obtained from Affymetrix custom barcode array measurements were processed at the shRNA level, then combined into gene scores and utilized in higher level analyses. We demonstrated that the relative effects on proliferation, as measured in a retest of 350 individual shRNAs in low throughput, agreed well with the quantification of shRNA pooled screen data. A class-discrimination feature selection method was used to identify and rank shRNAs by their ability to distinguish specified classes, including lineage of origin or mutational status. Specifically, we applied our analyses to the ovarian lineage and revealed genes, including PAX8, that were both essential in ovarian cancer cells and reside in recurrently amplified regions found by analyses of ovarian cancer genomes from The Cancer Genome Atlas project. This scheme of integrating structural and functional genomics efforts provides a general and efficient course to identifying cancer drivers, novel targets for cancer therapeutics and previously unknown dependencies.

Poster G03

Annotating New Genes: From In Silico Screening to Validations by Experiments

Shizuka Uchida Max-Planck-Institute for Heart and Lung Research

Pascal Gellert (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling); Katharina Jenniches (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling); Mizue Teranishi (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling); David John (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling); Piera De Gaspari (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling); Thomas Braun (Max-Planck-Institute for Heart and Lung Research, Dept. of Cardiac Development and Remodelling);

Short Abstract: In recent years, the advent of high-throughput analytical techniques, such as microarrays and serial analysis of gene expression (SAGE), has led to a rapid accumulation of biological data. The large size of databases precludes manual analysis and renders unsystematic approaches obsolete. To cope with these new challenges and to facilitate efficient data analyses, numerous academic and commercial software packages and databases have been developed. Yet, genes to which no biological function has been assigned compromise the usability of these data. To facilitate functionally annotating these so-called ‘novel genes’, an in silico screening of such genes has been developed by focusing especially on their expression patterns, namely their “tissue-enrichment” and a knowledge database called “C-It” (http://C-It.mpi-bn.mpg.de) has been developed. Emphasis has also been placed on “tissue-specific” isoforms by developing a tool to analyze Affymetrix’s Exon Array, called “Exon Array Analyzer (EAA)” (http://EAA.mpi-bn.mpg.de/).
With these algorithms and tools, ~1,000 genes that are enriched in a tissue but have not been characterized are currently being annotated. To this end, a project called “1,000 Genes Project” has been initiated to functionally annotate these evolutionarily conserved, tissue-enriched genes with unknown functions using various model organisms (mouse, chicken and zebrafish) and in vitro models (ES cells). In order to benefit from this project, we are also screening for such genes with possible relations to human diseases by incorporating SNP information. At this conference, we would like to share some of our preliminary data on evolutionarily conserved, tissue-enriched genes with unknown function.

Poster G04

Maximizing the number of aligned reads from RNA-Seq data

Thomas Bonfert Ludwig-Maximilians-University Munich

Gergely Csaba (Ludwig-Maximilians-University Munich, Institute for Informatics); Ralf Zimmer (Ludwig-Maximilians-University Munich, Institute for Informatics); Caroline C. Friedel (Ruprecht-Karls-University Heidelberg, Institute for Molecular Biotechnology);

Short Abstract: RNA sequencing by next generation sequencing technologies (RNA-Seq) provides a novel way to characterize the transcriptome of different cell types. One important application is the detection of different transcripts from the same genetic locus.
For this purpose, the position of reads on the transcriptome has to be determined first. If an organism with a sequenced and well-annotated genome is investigated, a common approach is to align the generated reads against a reference transcriptome. Based on the alignment, the reads are mapped onto exons and exon-exon junctions of annotated genes. This mapping then provides the basis for quantification and analysis of the transcriptome. Unfortunately, depending on the quality of the sequenced reads, this approach generally fails to identify the origin of a significantly large number of reads.
In order to address this problem, we have developed a pipeline which maximizes the number of reads whose origin can be explained by investigating several alternative possibilities. This pipeline is suitable for RNA-Seq data for any organism with available transcriptome and genome annotation. In a first step, a transcriptome mapping is performed as described above. Second, alignment to the genome identifies both incompletely spliced transcripts as well as novel splicing events. Finally, novel transcripts are predicted and potential sources of contaminations are analyzed. Our evaluation results show that using this advanced read processing pipeline, the number of aligned reads can be increased considerably compared to alternative approaches.
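
The staged strategy described above can be sketched as a simple cascade in which reads left unexplained by one stage are passed on to the next. This is only an illustration under stated assumptions: the function names `assign_reads` and `exact_matcher` are hypothetical, and exact substring matching stands in for a real aligner, which the abstract does not name.

```python
def exact_matcher(references):
    """Toy stand-in for an aligner: a read 'aligns' if it occurs in any reference."""
    return lambda read: any(read in ref for ref in references)


def assign_reads(reads, stages):
    """Pass reads through the stages in order and record the first stage that explains each.

    stages is a list of (stage_name, matches) pairs, where matches(read) returns a bool.
    Reads are used as dictionary keys, so they are assumed to be unique here.
    """
    assignment = {}
    remaining = list(reads)
    for name, matches in stages:
        still_unexplained = []
        for read in remaining:
            if matches(read):
                assignment[read] = name
            else:
                still_unexplained.append(read)
        remaining = still_unexplained
    return assignment, remaining


# Illustrative toy sequences (hypothetical, not real data).
stages = [
    ("transcriptome", exact_matcher(["ACGTACGTTT"])),
    ("genome", exact_matcher(["GGGACGTACGTAAATTTCCC"])),
    ("contamination", exact_matcher(["TTTTGGGGCCCC"])),
]
assignment, unexplained = assign_reads(["ACGTT", "GTAAAT", "GGGGCC", "NNNNN"], stages)
```

Ordering the stages from most to least trusted reference means a read is attributed to the genome or to a contaminant only after the annotated transcriptome has failed to explain it.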

Poster G05

Trypanosoma cruzi gene expression analysis in response to gamma radiation

Priscila Grynberg Universidade Federal de Minas Gerais

Danielle Passos-Silva (Universidade Federal de Minas Gerais, Departamento de Bioquímica e Imunologia); Marina Mourão (Centro de Pesquisas René Rachou/FIOCRUZ, Grupo de Genômica e Biologia Computacional); Roberto Hirata-Jr (Universidade de São Paulo, Instituto de Matemática e Estatística); Andrea Macedo (Universidade Federal de Minas Gerais, Departamento de Bioquímica e Imunologia); Carlos Machado (Universidade Federal de Minas Gerais, Departamento de Bioquímica e Imunologia); Daniella Bartholomeu (Universidade Federal de Minas Gerais, Departamento de Parasitologia); Gloria Franco (Universidade Federal de Minas Gerais, Departamento de Bioquímica e Imunologia);

Short Abstract: Among the various types of damage caused by ionizing radiation, the lesions of greatest biological importance are double-stranded DNA breaks. Damage tolerance requires an efficient apparatus for recognition and repair of lesions, and these characteristics define the degree of resistance of various organisms to this type of stress. Trypanosoma cruzi is highly resistant to gamma rays: a 500 Gy dose induces genomic DNA fragmentation, but the karyotype is gradually repaired and the chromosomal band pattern is restored in less than 48 hours. In this study, we aimed to compare gene expression in T. cruzi epimastigotes irradiated or not with a sub-lethal dose of 500 Gy, through microarray experiments. Our results showed that irradiation caused an arrest in cell growth, but did not significantly affect the integrity of RNA molecules, contrary to what was seen in DNA molecules. Gene expression was affected in a time-dependent manner: the peak of down-regulated genes (composed mostly of genes with known function) and the peak of up-regulated genes (composed mostly of genes of unknown function) occurred after 4 and 96 hours, respectively. Four maxicircle genes and those coding for retrotransposon hot spot proteins (RHSs) were strongly induced after 48 hours. However, genes related to basal metabolic functions showed a decrease in expression levels. There was no induction in the expression of DNA repair genes, except for the tyrosyl-DNA phosphodiesterase 1 gene.

Poster G06

Mining of Transcriptome of a climacteric fruit species Diospyros kaki Thunb for functional genomics

Farzana Rahman University of Glamorgan

Dr. Gaurav Sablok (Huazhong Agricultural University, Shizishan, Wuhan, China 430070, Key Lab of Horticultural Plant Biology (MOE)); Dr. Tatiana Tatarinova (University of Glamorgan, Pontypridd, Wales, UK, Computational Biology Group);

Short Abstract: Our current research focuses on developing computational resources for analysis of the genome of persimmon (Diospyros kaki). Persimmon is an economically important climacteric fruit species, which has promising medicinal value. The genus Diospyros is widely distributed in Asia, Africa, and America and is characterised by complex ploidy levels.
In order to provide an insight into the developmental stages of D. kaki, we conducted a comparative analysis of developmental stage specific EST libraries. A total of 2,529 putative tentative unigenes were identified in the Mature Fruit (MF) library whereas 3,775 tentative unigenes were identified in the Ovary and Young Fruit (OYF) library.
Comparative mining of SSRs between the two EST libraries detected 325 EST-SSRs in 296 putative unigenes in the MF library, an occurrence of 11.7% with a frequency of 1 SSR/3.16 kb, whereas the OYF library showed an EST-SSR occurrence of 10.8%, with 407 EST-SSRs in 352 putative unigenes and a frequency of 1 SSR/2.92 kb. SNPs and indels were also identified and compared between the two cDNA libraries using a redundancy and co-segregation based approach. Overall, the OYF library has 20.94 SNPs/indels per 100 bp, whereas the MF library has only 0.74 SNPs/indels per 100 bp. We detected 506 SSR primer pairs that can be utilized to infer genetic diversity in persimmon. We also found the potentially conserved miRNA family 159 expressed during the developmental stages in the OYF library, which correlates with the potential involvement of the miR159 family in development and with the relative distribution of SNP and SSR-FDM markers.
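
As an illustration of how EST-SSRs and a "1 SSR per N kb" frequency might be computed, the following is a minimal sketch using regular expressions. The abstract does not state which tool or repeat criteria were used, so the minimum repeat counts in `MIN_REPEATS` and the function names `find_ssrs` and `kb_per_ssr` are assumptions made for this example only.

```python
import re

# Hypothetical minimum repeat counts per motif length (di-, tri-, tetranucleotide).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC' or 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


def kb_per_ssr(n_ssrs, total_bp):
    """Average number of kilobases of sequence per detected SSR."""
    return total_bp / 1000 / n_ssrs


# Toy sequence with an (AC)7 and a (GAT)5 repeat.
example = "GG" + "AC" * 7 + "CC" + "GAT" * 5 + "C"
ssrs = find_ssrs(example)
```

The primitive-motif filter prevents the same repeat from being reported twice, for instance (AC)8 also matching as (ACAC)4, and drops homopolymer runs.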

Poster G07

An integrated analysis of molecular aberrations in NCI-60 cell lines

Chen-Hsiang Yeang Academia Sinica

Short Abstract: Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. As large-scale screenings of multiple types of molecular aberrations become increasingly common, a computational model integrating multiple types of information is essential for the analysis of the comprehensive data. We propose an integrated modeling framework to identify the statistical and putative causal relations of various molecular aberrations and gene expressions in cancer. We sequentially applied three layers of logistic regression models with increasing complexity and uncertainty regarding the possible mechanisms connecting molecular aberrations and gene expressions. We applied the layered models to the integrated datasets of NCI-60 cancer cell lines and validated the results with large-scale statistical and biological analysis. Furthermore, we discovered/reaffirmed the following prominent links: (1) Protein expressions are generally consistent with mRNA expressions. (2) Several gene expressions are modulated by composite local aberrations. (3) Amplification of chromosome 6q in leukemia elevates the expression of MYB, and the downstream targets of MYB on other chromosomes are up-regulated accordingly. (4) Amplification of chromosome 3p and hypo-methylation of PAX3 together elevate MITF expression in melanoma, which up-regulates the downstream targets of MITF. (5) Mutations of TP53 are negatively associated with its direct target genes. These validated analysis results justify the utility of the layered models for the incoming flow of cancer genomic data.

Poster G08

CNAmet: an R package for integrating copy number, methylation and expression data

Riku Louhimo University of Helsinki

Sampsa Hautaniemi (University of Helsinki, Institute of Biomedicine);

Short Abstract: Genomic instability is a key enabling characteristic of tumorigenesis. Identification of regions with copy number alterations has uncovered deregulated tumor suppressors and oncogenes which play key roles in tumor progression and drug response. In addition to copy number changes, gene expression is regulated by DNA methylation. Gene copy number, methylation and expression can all be measured with microarrays which enables their computational integration. The goal of integrating copy number and expression data is to characterize genes essential to cancer progression. This is achieved by identifying genes that are both amplified and upregulated or deleted and downregulated. However, gene upregulation can also be caused by hypomethylation (decrease in methylation of cytosine and adenosine residues in DNA) and downregulation caused by hypermethylation. Since both copy number and methylation alterations can deregulate genes, integration of these three types of data should improve the characterization of essential genes in cancer. We introduce the CNAmet R package which integrates high-throughput copy number, methylation and expression data. Our primary goal is to identify genes that are simultaneously amplified, hypomethylated and upregulated, or deleted, hypermethylated and downregulated. To our knowledge CNAmet is the only software package for the three-way integration of copy number, methylation and expression. We illustrate the usefulness of CNAmet by analyzing copy number, methylation and expression data from 174 patients with glioblastoma multiforme, which is the most aggressive type of brain cancer, as well as 196 ovarian cancer patients.
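
The three-way selection rule stated in this abstract can be sketched as follows. This is not CNAmet's interface or its statistical scoring (CNAmet is an R package and its method is not detailed here); it only illustrates the selection criterion, assuming that each gene has already been given discrete calls. The encoding, the function name `classify_genes`, and the gene labels are hypothetical.

```python
def classify_genes(calls):
    """Split genes into concordantly activated and concordantly silenced sets.

    calls maps gene -> (copy_number, methylation, expression), each in {-1, 0, +1}:
      copy_number: +1 amplified,       -1 deleted
      methylation: +1 hypermethylated, -1 hypomethylated
      expression:  +1 upregulated,     -1 downregulated
    """
    activated, silenced = [], []
    for gene, (cn, meth, expr) in sorted(calls.items()):
        if cn == 1 and meth == -1 and expr == 1:
            activated.append(gene)   # amplified, hypomethylated and upregulated
        elif cn == -1 and meth == 1 and expr == -1:
            silenced.append(gene)    # deleted, hypermethylated and downregulated
    return activated, silenced


# Placeholder gene labels, not real results.
calls = {
    "GENE_A": (1, -1, 1),
    "GENE_B": (-1, 1, -1),
    "GENE_C": (1, 0, 1),    # amplified and upregulated, but methylation unchanged
    "GENE_D": (1, -1, -1),  # discordant expression
}
activated, silenced = classify_genes(calls)
```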

Poster G09

Identification of transcription factor binding sites from ChIP-seq data at high-resolution

Anaïs Bardet Institute of Molecular Pathology

Jonas Steinmann (Institute of Molecular Biotechnology, Knoblich Group); Alexander Stark (Institute of Molecular Pathology, Stark Group);

Short Abstract: Gene expression is mainly regulated at the transcriptional level and achieved through the binding of transcription factors to enhancers. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) enables the determination of transcription factor binding sites in vivo. Computational tools are available to predict binding sites as regions with significantly enriched read-counts. In our experience, however, available tools often merge closely spaced binding sites, which frequently occur due to homotypic clusters of binding sites, into larger regions.

Here, we present a new computational tool, Peakzilla, which aims at identifying individual transcription factor binding sites, even when closely spaced. It achieves an improved spatial resolution, pushed to half the lengths of the sonicated DNA fragments, by empirically modeling this fragment size and the bimodal distribution of reads from both strands.

Peakzilla frequently splits large peak regions identified by other methods into multiple individual regions. These are bona fide transcription factor binding sites as we find them to be enriched in sequence motifs of the respective transcription factor. Peakzilla’s increased spatial resolution also improves the precision of peak-summit locations, as assessed by their distance to the nearest motif, outperforming other methods. It also provides an option to conveniently visualize the genome-wide score profile (similar to the ChIP read density) and to estimate if the ChIP-seq data allow peak-calling at saturating conditions.

Poster G10

Human gene coexpression landscape and network: location of house-keeping and tissue-specific genes

Alberto Risueño Fundacion de Investigacion del Cancer de la Universidad de Salamanca

Celia Fontanillo (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group); Javier De Las Rivas (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group);

Short Abstract: Gene coexpression analyses provide information on co-regulated genes across different conditions, making it possible to identify transcriptional relationships between genes. Such relationships may be related to common gene evolution and usually correspond to involvement in common functional processes at the cellular or physiological level. Microarray technology measuring genome-wide expression has been shown to be very suitable for coexpression studies. In order to explore the coexpression landscape of human genes in a systemic way, covering most of the human body transcription events, we developed a bioinformatic analysis using a large set of microarrays from different human tissues. Within this set, we selected samples from 32 different tissues to keep a balanced body-wide coverage, each one including 3 biological replicates. This selection provided a perfect clustering in a non-supervised test. Next, coexpression between all possible gene pairs was calculated in two ways by combining parametric/non-parametric signal and correlation algorithms: MAS5-Spearman and RMA-Pearson. This provides a robust collection of binary gene-to-gene correlations that allows the construction of a gene coexpression network. In parallel to this approach, we analysed all EST data from UniGene libraries to build a database reflecting the presence/absence of ESTs per gene and tissue. In this way, both the microarray data and the EST data make it possible to determine whether a pair of genes is coexpressed in a given tissue. Finally, we analysed both datasets to identify the human house-keeping and tissue-specific genes. We compare our results with two previously published datasets, and we locate both types of genes in the coexpression network.
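
The idea of keeping only gene pairs supported by both a parametric and a non-parametric correlation can be sketched as below. This is a simplified illustration: the abstract applies Spearman to MAS5 signals and Pearson to RMA signals, whereas the sketch uses a single expression matrix, and the threshold and the function names are assumptions. Profiles are assumed to be non-constant, since a constant profile has zero variance.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_edges(profiles, threshold):
    """Return gene pairs whose Pearson and Spearman correlations both reach the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if (pearson(profiles[g1], profiles[g2]) >= threshold
                and spearman(profiles[g1], profiles[g2]) >= threshold):
            edges.append((g1, g2))
    return edges


# Toy profiles across five samples (hypothetical values).
profiles = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 11], "g3": [5, 3, 4, 1, 2]}
edges = coexpression_edges(profiles, threshold=0.9)
```

Requiring agreement between the two coefficients is one simple way to discard pairs whose correlation is driven by a few outlying samples.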

Poster G11

Functional Annotation of Regulatory Variants: A Systems Biology Approach to Translational Bioinformatics

Konrad Karczewski Stanford University

Joel Dudley (Stanford University, Systems Medicine); Nicholas Tatonetti (Stanford University, Biomedical Informatics); Stephen Landt (Stanford University, Genetics); Atul Butte (Stanford University, Systems Medicine); Michael Snyder (Stanford University, Genetics); Russ Altman (Stanford University, Genetics);

Short Abstract: Many genomic variants are discovered outside of genes, where their functional consequences are more difficult to characterize. However, as many of these variants are associated with disease, it is likely that they affect molecular physiology at the level of gene regulation. We investigate the role of variants in regulatory regions, on both transcription factor cooperativity as well as disease pathophysiology. First, we developed the Allele Binding Cooperativity (ABC) test and the ALPHABIT pipeline, which utilizes variation in transcription factor binding among individuals to discover combinations of factors and their targets. Second, we demonstrate a systematic approach to combine disease association, transcription factor binding, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In this systematic approach, we close a major loop in biological context-free association studies and assign putative function to many disease-associated SNPs. In this way, we apply findings from systems biology in a translational approach.

Poster G12

Genomes and Evolutionary Analyses of Robust Oil Production in Microalgae

Kang Ning Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences

Dongmei Wang (Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Functional Genomics); Jian Xu (Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Functional Genomics);

Short Abstract: Microalgae are considered a promising biodiesel feedstock. However, the genetic diversity and genomic evolution of microalgal traits underlying robust oil production remain ill defined. In this work, we have studied a seven-member phylogenome of Nannochloropsis, a phylogenetically distinct group of unicellular microalgae capable of rapid growth and robust neutral-lipid production in large-scale cultivation.

We first assembled the draft genomes of these seven Nannochloropsis strains from next generation sequencing data (both 454 and Solexa platforms), with results indicating their small genome size (25~40 Mbp). To explore the evolutionary history of Nannochloropsis, we compared its proteome separately to a series of evolutionarily representative organisms: Arabidopsis (land plant), Cyanidioschyzon (red algae) and Chlamydomonas (green algae). The phylogenetic analyses indicate the unique evolutionary position of Nannochloropsis among microalgae as well as among eukaryotes. Additionally, we analyzed horizontal gene transfer (HGT) events within the seven Nannochloropsis strains, and discovered a couple of HGT events introduced at different stages (before and after the division of Nannochloropsis). Furthermore, the analyses of protein-coding genes have shown that the biological process “response to stress” has the highest non-synonymous/synonymous ratio, which is consistent with the dynamic phenotype of Nannochloropsis under different conditions (high light, low nitrogen, etc.).

The seven-member phylogenome, the only microalgal phylogenome and the only Eustigmatophyceae genomes available so far, revealed for the first time a comprehensive genomic hierarchy of a eukaryotic algal genus. Our research showed the striking diversity of oil-producing modes in microalgae and provided novel sequence-based and functional markers for high-throughput genotype-based screening and development of oil-production traits in microalgae.

Poster G13

Orange Bioinformatics: Data Analytics through Visual Programming

Marko Toplak University of Ljubljana

Ales Erjavec (University of Ljubljana, Faculty of computer and information science); Janez Demsar (University of Ljubljana, Faculty of computer and information science); Minca Mramor (University of Ljubljana, Faculty of computer and information science); Crtomir Gorup (University of Ljubljana, Faculty of computer and information science); Gregor Rot (University of Ljubljana, Faculty of computer and information science); Gad Shaulsky (Baylor College of Medicine, Department of Molecular and Human Genetics); Tomaz Curk (University of Ljubljana, Faculty of computer and information science); Blaz Zupan (University of Ljubljana, Faculty of computer and information science);

Short Abstract: Data analysis is becoming the bottleneck of scientific discovery. Vast amounts of data generated by high-throughput experiments need to be analysed in context of internal or publicly available data and knowledge, and background knowledge can aid analysis immensely.

Orange Bioinformatics framework (http://orange.biolab.si) aims to decrease the barrier between the experimentalist and the data, reducing the time from data acquisition to the first biomedical insights gained through visual analytics. The experimentalist works in a visual programming environment where computational units called widgets are connected into a data flow schema. Orange widgets are high level operations or visualizations, specially tailored to the needs of molecular biologists. For instance, they implement gene selection, enrichment analysis, or exploration of genes from the data set in KEGG pathways. They also provide access to publicly available data, like GEO data sets, Biomart, GO, KEGG, Atlas, and ArrayExpress. While most widgets are specific enough to be used independently, the real magic lies in their interconnection. Furthermore, the experimentalist can directly, in the same interface, use powerful visualization, network exploration and data mining techniques from the Orange data mining framework.

Orange widgets are crafted in Python, and could in principle combine other libraries, like SciPy and BioPython. Widgets interfacing to R are in development. The framework is freely available and open source.

Poster G14

Analysis of protein functional sites encoding features

Irina Medvedeva Institute of Cytology and Genetics SB RAS

Vladimir Ivanisenko (Institute of Cytology and Genetics SB RAS, Sector of Computer Proteomics);

Short Abstract: Earlier we constructed the computer system SitEx (http://www-bionet.sscc.ru/sitex/), which includes a database mapping the positions of protein functional sites (FS) onto the exon-intron structure of the encoding gene, together with BLAST and 3DExonScan search. Testing this system on 955 proteins with known PDB structures revealed that exons encoding FS are more likely to be found and are more conserved than those not encoding FS.

We have shown that FS discontinuity across the exon structure is significantly less than expected by chance and that a protein FS tends to be encoded by a single exon or by neighboring exons. Then we analyzed codon usage and discovered that the exons coding FS amino acids on the 5'-end have less optimal codon usage. We also found that the frequency of codons encoding FS on the exon border in phases 1 and 2 is significantly greater than for the others. This could be the result of the unification of the genes that encode a single FS.

The extended version contains information about SNPs in FS and the projection of FS positions onto homologous and paralogous genes. Based on this information we found that SNP frequency in functional site amino acids is lower than in their surroundings. We observed the same effect when analyzing FS areas and protein domain areas. The analysis of the physical and chemical properties of the amino acid changes caused by SNPs reveals that acidity changes more often than would be expected by chance, and that mass and volume change significantly more often than would be expected.

Poster G15

Improved hit selection in phenotypic screens by combining the information about phenotype strength and specificity

Nikolay Samusik MPI-CBG

Martin Stoeter (Max Planck Institute of Cell Biology and Genetics, TDS); Michael Vanlandewijck (Ludwig Institute for Cancer Research, Uppsala Branch); Aristidis Moustakas (Ludwig Institute for Cancer Research, Uppsala Branch); Yannis Kalaidzidis (Max Planck Institute of Cell Biology and Genetics, -); Marino Zerial (Max Planck Institute of Cell Biology and Genetics, -);

Short Abstract: Image-based multiparametric RNAi and chemical screens have become an important tool in systems biology and drug discovery. A typical output of a screen is a set of multidimensional vectors normalized to the negative control, so that the length of the vector corresponds to the phenotype strength, and the direction corresponds to the kind of phenotype. The standard hit selection approach, chi2, takes only vector length into account, omitting the phenotype direction. However, because the number of observed phenotypes is typically limited, vectors tend to group around a few distinct phenotypic directions. These groups can be identified by clustering, and the cluster centers estimate the phenotypic directions. We propose a model in which every measured vector is a linear combination of a specific phenotypic component that is collinear with the center of the respective cluster, and a normally distributed noise component with a covariance matrix that is estimated from the negative control set. Since the multidimensional space is sparse, it is unlikely that a random noise vector with a null specific phenotype component will fall close to one of the phenotypic directions. Based on this, we propose a novel scoring measure, Phenoscore, defined as the probability that the length of the vector's specific phenotypic component (collinear with the cluster center) is higher than the expected length of a random noise vector in the given direction. We validate the performance of Phenoscore on a dataset from a chemical screen for TGFb inhibitors and show up to 70% improvement of ROC AUC at FPR <1%, compared to chi2.
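
The geometric idea behind this model, splitting each measured vector into a part collinear with a cluster center and a residual, can be sketched as follows. The sketch does not compute Phenoscore itself, since the abstract does not give its formula; it assumes the data have already been whitened so that plain Euclidean length is meaningful, and the function names are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def length(v):
    """Euclidean length, the quantity a length-only hit selection would rank by."""
    return math.sqrt(dot(v, v))


def decompose(vector, cluster_center):
    """Split vector into a component collinear with cluster_center and a residual."""
    scale = dot(vector, cluster_center) / dot(cluster_center, cluster_center)
    specific = [scale * c for c in cluster_center]
    residual = [v - s for v, s in zip(vector, specific)]
    return specific, residual


# Two toy measurements of equal length but different direction (hypothetical values).
center = [1.0, 0.0, 0.0]
on_direction = [4.0, 3.0, 0.0]
off_direction = [0.0, 5.0, 0.0]
specific_on, residual_on = decompose(on_direction, center)
specific_off, residual_off = decompose(off_direction, center)
```

Both toy vectors have length 5, so a length-only criterion cannot separate them, whereas the specific component is 4 for the first and 0 for the second.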

Poster G16

An open source library for miRNA target prediction with a novel comparative genomics approach

Charles Vejnar University of Geneva

David Gatfield (University of Lausanne, Biology); Ueli Schibler (University of Geneva, Biology); Evgeny Zdobnov (University of Geneva, GEDEV);

Short Abstract: Post-transcriptional regulation via microRNA (miRNA) contributes to fundamental processes such as cell growth, development, and differentiation, yet miRNA target prediction remains a challenging problem. As there is no publicly available complete target prediction software, we implemented three approaches to predict miRNA targets in an open source library covering the full range of published prediction models. These approaches include the thermodynamics of RNA-RNA interactions, the target site sequence features (i.e. the TargetScan context score), and a probabilistic model. Additionally we introduced a novel comparative genomics approach.

Our novel comparative genomics approach measures differences in substitution rates in miRNA target sites to test for alternative modes of selection pressure (acceleration or conservation) on these sites. With this quantification of evolutionary pressures, together with target site functionality prediction and miRNA age estimates, we have gained new insight into miRNA regulatory network evolution.

Our all-in-one implementation of miRNA target prediction (manuscript in preparation) allowed us to compare these methods and test their efficiency without any mapping biases (as prediction sets are usually produced by different research groups). This comparison was done within the framework of knocking down an endogenously highly expressed miRNA (miR-122 molecules represent about 70% of miRNAs in hepatocytes), and with published expression data on mRNA and protein levels. Together with our open source target prediction software, this study enables the research community to efficiently use all state-of-the-art target prediction methods.

Poster G17

From data to disease: Identification of genes with cell lineage specific expression in the human kidney

Casey Greene Princeton University

Wenjun Ju (University of Michigan, Department of Nephrology); Felix Eichinger (University of Michigan, Department of Nephrology); Matthias Kretzler (University of Michigan, Department of Nephrology); Olga Troyanskaya (Princeton University, Lewis-Sigler Institute for Integrative Genomics);

Short Abstract: Knowledge of tissue-specific gene action is critical to understanding the biology of complex metazoans such as humans. For many human tissues, cell-lineage-specific high-throughput experiments are infeasible due to the difficulty of obtaining such samples and genomic aberrations caused by immortalization processes. We have developed an iterative machine learning methodology for discovering cell-lineage-specific gene expression from high-throughput non-cell-lineage-resolved data. We have implemented this methodology in the PILGRM (Platform for Interactive Learning by Genomics Results Mining) system and applied this strategy to the discovery of genes with kidney podocyte (visceral epithelial cell) specific expression patterns. Podocytes are important cells in the kidney glomerulus that play a key role in the kidney’s function. Mutations or alterations of podocyte-specific proteins underlie the hereditary proteinuric syndromes and many acquired glomerular diseases. Identification of novel podocyte-specific genes will help us understand the regulation of physiological properties of the podocyte and the mechanisms of its cellular response to disease or injury. We use multiple validation strategies, including high-throughput experiments in model organisms and low-throughput approaches (double immunofluorescence of human kidney biopsies and immunohistochemical staining). We show that our predictions are highly enriched for podocyte-specific genes. Furthermore, genes that we predict to be podocyte expressed are associated with important clinical outcomes, including glomerular filtration rate, an important clinical measure of kidney function.

Poster G18

Identification and classification of protein subfamilies using top-down phylogenetic tree reconstruction

Eduardo Costa Katholieke Universiteit Leuven

Celine Vens (Katholieke Universiteit Leuven, Department of Computer Science); Hendrik Blockeel (Katholieke Universiteit Leuven, Department of Computer Science);

Short Abstract: Identifying protein subfamilies is an important step in gene function prediction, which remains a challenge in the post-genomic era. Phylogenomic analysis has been used to tackle this task, as an alternative to homology-based methods. Current phylogenomic methods first build a complete phylogenetic tree, then extract clusters from it that hopefully correspond to protein subfamilies. We propose a novel phylogenomic method that differs from the existing methods in two important ways: (1) it builds the phylogenetic tree top-down, rather than bottom-up, and stops when clusters are found; thus it avoids constructing the whole tree; (2) it associates particular mutations with each division into subclusters. The novel method identifies subfamilies with comparable accuracy to that of existing methods, but is much more efficient (and can therefore identify subfamilies in much larger sets of proteins), and identifies key mutations by which different subfamilies can be recognized, allowing easy classification of new proteins.
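
The top-down idea, in which each division is tied to a particular residue at a particular alignment position, can be sketched as follows. This is only an illustration under stated assumptions: the sequences are hypothetical and pre-aligned, and the split criterion (the most balanced split) is a stand-in chosen for brevity, not the heuristic used by the authors.

```python
from collections import Counter

def best_split(seqs):
    """Find the alignment column whose most common residue splits the set most evenly."""
    best = None
    length = len(next(iter(seqs.values())))
    for pos in range(length):
        residue, count = Counter(s[pos] for s in seqs.values()).most_common(1)[0]
        balance = min(count, len(seqs) - count)
        if balance > 0 and (best is None or balance > best[0]):
            best = (balance, pos, residue)
    return best

def top_down(seqs, min_size=2):
    """Recursively divide the sequences, stopping early once clusters are small enough."""
    if len(seqs) <= min_size:
        return [sorted(seqs)]
    split = best_split(seqs)
    if split is None:  # no column separates the sequences any further
        return [sorted(seqs)]
    _, pos, residue = split
    left = {name: s for name, s in seqs.items() if s[pos] == residue}
    right = {name: s for name, s in seqs.items() if s[pos] != residue}
    return top_down(left, min_size) + top_down(right, min_size)

# Hypothetical aligned sequences.
seqs = {"p1": "ACDE", "p2": "ACDF", "p3": "GCHE", "p4": "GCHF"}
clusters = top_down(seqs)
print(clusters)
```

Because every split is a test of the form "position equals residue", a new sequence can be assigned to a cluster by following the same tests from the root, without rebuilding the tree.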

Poster G19

Exploring muscle hypertrophy transcriptional signature through knowledge-based gene association networks

Francesca Mulas University of Pavia

Angelo Nuzzo (University of Pavia, Centre for Tissue Engineering); Flavio Ronzoni (University of Pavia, Di.Me.S., Human Anatomy Dept.); Maurilio Sampaolesi (University of Pavia, Di.Me.S., Human Anatomy Dept.); Blaž Zupan (University of Ljubljana, Faculty of Computer and Information Science); Riccardo Bellazzi (University of Pavia, Dipartimento di Informatica e Sistemistica);

Short Abstract: With the increasing number of biological knowledge bases, a variety of approaches have been developed in computational biology for the inclusion of prior knowledge in the data analysis process. For instance, various functional gene annotations contained in biological repositories, such as the Gene Ontology and the biomedical literature, can be exploited to visualize gene similarities by means of graphs called gene association networks. These networks represent genes in nodes and connect a pair of genes if their annotation profile is similar. We present a procedure for the construction of association networks for a set of 'focus genes' based on their annotations in Gene Ontology and the Medical Subject Headings from NCBI's PubMed. While most of the proposed strategies associate genes on the basis of co-citation in the publications or shared biological functions, our method uses approaches from text mining to characterize each gene by highlighting its most significant annotation terms.
The proposed procedure has been applied to a transcriptional analysis of myogenic precursor cells derived from wild type and MAGIC-F1 (met activating genetically improved chimeric factor 1) transgenic mice. MAGIC-F1 is an engineered protein that was previously demonstrated to induce muscle hypertrophy. Starting from a list of muscle-specific genes and muscle growth regulators, we obtained networks that emphasize the results of global transcriptome analysis for the focus genes and their annotation-neighbours. In addition, our strategy allowed us to identify novel gene candidates potentially involved in muscle hypertrophy.
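
The construction of an association network from annotation profiles can be sketched in a few lines. The gene names, annotation terms, TF-IDF-style weighting and 0.5 threshold below are all illustrative assumptions, not the authors' procedure; the sketch only shows how weighting rare terms more heavily and comparing profiles by cosine similarity yields edges between similarly annotated genes.

```python
import math
from itertools import combinations

# Hypothetical annotation sets; real profiles would come from GO and MeSH.
annotations = {
    "geneA": {"muscle development", "cell growth", "signaling"},
    "geneB": {"muscle development", "cell growth"},
    "geneC": {"signaling", "apoptosis"},
}

def weighted_profiles(annotations):
    """Weight each term by inverse document frequency, so rarer terms count more."""
    n = len(annotations)
    df = Counter_like(annotations)
    return {gene: {t: math.log(n / df[t]) + 1.0 for t in terms}
            for gene, terms in annotations.items()}

def Counter_like(annotations):
    """Count in how many genes each term occurs."""
    df = {}
    for terms in annotations.values():
        for term in terms:
            df[term] = df.get(term, 0) + 1
    return df

def cosine(p, q):
    """Cosine similarity between two sparse term-weight profiles."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0

def association_edges(annotations, threshold=0.5):
    """Connect each pair of genes whose profiles are at least `threshold` similar."""
    profiles = weighted_profiles(annotations)
    return {(a, b) for a, b in combinations(sorted(profiles), 2)
            if cosine(profiles[a], profiles[b]) >= threshold}

edges = association_edges(annotations)
print(edges)
```

In this toy example only geneA and geneB share enough weighted terms to be connected; geneC shares a single common term with geneA and none with geneB.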

Poster G21

Analyzing high resolution nucleosome structure by DNase-seq

Deborah Winter Duke University

Lingun Song (Duke University, IGSP); Sayan Mukherjee (Duke University, IGSP); Terrence Furey (University of North Carolina, Chapel Hill, Department of Genetics); Gregory Crawford (Duke University, IGSP);

Short Abstract: The majority of the human genome is wrapped around histone proteins to form nucleosomes that can be packaged into higher-order structures. These chromatin structures contribute to the compaction of DNA within the nucleus and their arrangement plays a role in gene transcription. Previously, we used the high-throughput assay, DNase-seq, to identify DNaseI hypersensitive sites (DHS), or nucleosome depleted regions, across the genome. Over 30 years ago, gel electrophoresis showed that DNaseI tends to digest DNA, outside of DHS, with approximately 10.4bp periodicity: this matches the bending of the DNA helix around the histone octamer. We detected this pattern even in aggregate DNase-seq data from multiple cell types, suggesting precise nucleosome positioning is relatively conserved. From this high-resolution data, we discovered periodicity within individual loci and located regions of stable nucleosome organization. As anticipated, Fourier analysis established that this data set is dominated by 10-11bp signals. We confirmed these findings across the genome through a convolution operation where fittings representing the 10.4bp pattern demonstrated better agreement than fittings with alternatively spaced patterns. When we equivalently analyzed data from FAIRE-seq – another high-throughput assay for open chromatin but whose detection model we expect would be independent of nucleosome position – we did not observe similar results. On a broader level, we can identify higher-order compaction, such as 30nm fibre, by exploring the pattern of nucleosomes exposed to DNA digestion. Our goal is to use this data to understand the 3D organizational program of chromatin and its impact on regulation.
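
The periodicity check can be illustrated on synthetic data. The sketch below assumes a made-up cut-count profile with a 10.4 bp oscillation and a hypothetical list of candidate periods; it evaluates the Fourier power at each candidate directly and is meant only to show the principle, not to reproduce the authors' analysis.

```python
import math

def power_at_period(signal, period):
    """Squared magnitude of the Fourier component with the given period (in bp)."""
    mean = sum(signal) / len(signal)
    re = sum((x - mean) * math.cos(2 * math.pi * i / period) for i, x in enumerate(signal))
    im = sum((x - mean) * math.sin(2 * math.pi * i / period) for i, x in enumerate(signal))
    return re * re + im * im

# Synthetic profile with a 10.4 bp oscillation (illustrative, not real data).
signal = [5.0 + 2.0 * math.cos(2 * math.pi * i / 10.4) for i in range(520)]

candidates = [8.0, 9.0, 10.4, 12.0, 14.0]  # hypothetical periods to compare
best = max(candidates, key=lambda p: power_at_period(signal, p))
print(best)
```

On real DNase-seq data the profile would be noisy, so the comparison between the 10.4 bp pattern and alternatively spaced patterns would need to be assessed across many loci rather than on a single window.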

Poster G22

Estimating differential expression of transcripts with RNA-seq by using Bayesian Inference

Peter Glaus The University of Manchester

Antti Honkela (University of Helsinki, Helsinki Institute for Information Technology HIIT); Magnus Rattray (The University of Sheffield, The Sheffield Institute for Translational Neuroscience);

Short Abstract: High-throughput sequencing enables expression analysis at the level of individual transcripts.
The analysis of transcriptome expression levels and differential expression estimation requires a probabilistic approach to properly account for the ambiguity caused by shared exons and finite read sampling as well as the intrinsic biological variance of transcript expression.
Another important factor is the biological variance itself, which, as we show in our analysis, can be substantial and may depend on the transcript expression level.
To avoid false positive differential expression calls, one has to anticipate the intrinsic variance of the transcript expression levels using empirical prior knowledge and information from replicates where they exist.

We present a Bayesian approach for estimation of transcript expression level from RNA-seq experiments.
Inferred relative expression is in the form of a probability distribution represented by samples of the distribution obtained from a Markov chain Monte Carlo inference method applied to a generative model of the read data.
In addition to implementing the regular Gibbs sampling algorithm, we provide a comparison with collapsed Gibbs sampling, in which some of the parameters are marginalized in order to obtain faster convergence.

We propose a novel method for differential expression analysis across replicates which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level dependent prior.
We demonstrate the advantages of our method using an RNA-seq dataset (Xu G. et al., RNA 2010) with technical and biological replication for both studied conditions.

Poster G23

ARH-Seq: Alternative Splicing Robust Prediction by Entropy on RNA-Seq data

Axel Rasche MPI Molecular Genetics

Short Abstract: Several experimental platforms exist for the analysis of alternative splice variants, such as RNA-Seq, microarrays, and EST libraries. Advanced high-throughput technologies demand adequate methods for the analysis of the generated data. However, development of the statistical analysis still lags behind the progress in analysis techniques. In particular, method performance has not been evaluated systematically across different methods and technologies. This is surprising, as splicing prediction poses a stronger challenge than differential expression because exons and genes have different expression levels and a coupling between transcription and splicing has been proposed.
The focus is on de novo differential splicing predictions, allowing the detection of splicing events not known from the transcript libraries, as is possible in diseases. A statistical model is developed to analyse splicing events based on a modified entropy function (ARH - Alternative splicing Robust prediction by Entropy) in order to judge the significance of alternative splicing. This method covers different technologies such as transcriptome sequencing or exon array data sets. In particular, RNA-Seq data make it possible to integrate different information resources such as exon and junction expression. Rather than combining separate predictions on the two resources, ARH-Seq is able to fuse the measurements of both aspects.
The major drawback in developing and comparing new splicing prediction methods is the lack of controlled data sets. Results from alternative splicing prediction methods are compared on different Illumina sequencing tissue data sets with fixed preprocessing and confirmed splicing events from manually curated resources.
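
How an entropy function can flag a splicing event may be easier to see with a toy calculation. The sketch below is not the ARH statistic: it is a plain Shannon-entropy illustration with hypothetical per-exon log2 ratios between two conditions, in which deviations from the gene-level median are converted into a distribution over exons and the distance from maximal entropy is reported.

```python
import math

def splicing_probabilities(log_ratios):
    """Convert per-exon log2 ratios into a distribution over exons.

    Deviations are taken from the gene-level median so that a uniform change
    across all exons (plain differential expression) cancels out.
    """
    ordered = sorted(log_ratios)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    weights = [2 ** abs(r - median) for r in log_ratios]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_deficit(log_ratios):
    """Maximal entropy minus observed entropy; zero when all exons behave alike."""
    p = splicing_probabilities(log_ratios)
    h = -sum(q * math.log2(q) for q in p if q > 0)
    return math.log2(len(p)) - h

uniform_change = [1.0, 1.0, 1.0, 1.0]   # whole gene up-regulated, no splicing signal
one_exon_differs = [0.0, 0.0, 0.0, 3.0]  # a single exon deviates strongly

print(entropy_deficit(uniform_change))
print(entropy_deficit(one_exon_differs))
```

A gene whose exons all change together scores zero, while a gene with one deviating exon scores above zero, which is the qualitative behaviour an entropy-based splicing measure relies on.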

Poster G24

Expression analysis of intronic non-coding RNAs revealed novel androgen-regulated ncRNAs in prostate cancer cell line

Vinicius Maracaja-Coutinho Universidade de São Paulo

Felipe Beckedorff (Universidade de São Paulo, Departamento de Bioquímica, Instituto de Química); Murilo Amaral (Universidade de São Paulo, Departamento de Bioquímica, Instituto de Química); Yuri Moreira (Universidade de São Paulo, Departamento de Bioquímica, Instituto de Química); João Setubal (Virginia Tech, Virginia Bioinformatics Institute); Eduardo Reis (Universidade de São Paulo, Departamento de Bioquímica, Instituto de Química); Sergio Verjovski-Almeida (Universidade de São Paulo, Departamento de Bioquímica, Instituto de Química);

Short Abstract: Recent evidence shows that non-coding RNAs (ncRNAs) transcribed from intronic and intergenic genomic regions play important roles in transcriptional regulation. Here we applied next-generation sequencing and a custom-built strand-specific 44k intronic-exonic oligoarray to reveal the expression patterns of non-coding and protein-coding transcripts in the LNCaP prostate cancer cell line with or without androgen stimulus. The 454 RNA-seq showed that most of the intronic ncRNAs (56%) identified in both androgen-treated and control cells are novel ESTs. We identified a set of 69 intronic non-coding transcripts and 1,416 protein-coding genes differentially regulated by androgen (FDR<5%). We also explored the expression profile of LNCaP cells with the 44k oligoarray at seven different times of exposure to androgen (t = 0, 1, 3, 6, 12, 18 and 24 hours). This analysis revealed 507 intronic non-coding transcripts and 2,783 protein-coding genes differentially expressed (FDR<5%) over time under androgen treatment in comparison to vehicle-treated cells. Out of the 69 RNA-seq differentially expressed transcripts, 51 (73%) were represented by intronic contigs probed on the oligoarray. Of these, 39 (76%) were identified as expressed in both approaches. Finally, when cross-referenced to public datasets of epigenetic markers for active chromatin (H3K4me1, H3K4me2, H3K4me3) and androgen receptor motifs identified by ChIP-seq, a set of 291 (57%) intronic transcripts, out of the 507 identified by our oligoarray analysis, showed enrichment (p-value<0.05) of these marker elements in the genome within up to ~16kb upstream of the androgen-regulated ncRNAs. These results reveal a set of new, hormone- and epigenetically-regulated intronic ncRNAs.
Supported by FINEP, FAPESP, CNPq and CAPES.

Poster G25

Visualizing RNA-Seq fragment alignments

Harold Pimentel University of California, Berkeley

Adam Roberts (University of California, Berkeley, Computer Science); Cole Trapnell (Broad Institute and Harvard University, Pathology); Lior Pachter (University of California, Berkeley, Mathematics, Molecular Cell Biology and Computer Science);

Short Abstract: The increasing use of sequencing technology for functional genomics has spurred the development of visualization tools that allow users to view fragment alignments to reference genomes. Examples of such tools include new browsers such as IGV, and modifications to existing systems such as the UCSC browser. In terms of visualizing RNA-Seq data, these and other browsers generally employ the same approach. Fragments are distributed according to the genomic coordinates of their alignments and displayed as “tracks”, usually above transcript annotations. The information in the tracks is then the most direct way to verify estimates of transcript abundances that are based on fragment counts. The interpretation of results hinges on the ability to evaluate the quality of the alignments, and the alignments to a reference genome are also helpful in validating transcript exon-intron structures via “spliced alignments”.

We present a tool that helps in the interpretation of fragment alignments to a reference genome by coloring reads according to transcript compatibility and abundance. This reveals the likely origin of fragments, and facilitates the assessment of RNA-Seq experiments. Spurious fragment alignments can be easily identified as the fragments are assigned to transcripts by inverting a recent probabilistic assignment technique via maximization of likelihood functions whose parameters represent the transcript abundances. Errors in mapping or annotation can also be flagged for further investigation. The probabilistic assignment of fragments to transcripts requires a non-trivial computation, yet our approach is efficient while allowing for seamless integration with existing next-generation sequence visualization tools.

Poster G26

Impact of copy number alterations on gene expression of triple negative breast cancer patients

Catalin Barbacioru Life Technologies

Short Abstract: Triple negative breast cancer (TNBC) is characterized by the absence of expression of estrogen receptor, progesterone receptor and Her2-neu, and accounts for 15% of all breast cancer diagnoses. Targeted therapies have a reasonable likelihood of improving the cure rate of early stage TNBC, and promising therapies discovered in the metastatic setting are rapidly advancing through clinical trials. We present an integrated analysis of matched normal and tumor whole genome data, along with tumor transcriptome sequencing data, from multiple individuals with metastatic chemo-resistant TNBC.

In each case, two independent 1.5kb Mate-pair libraries were generated for both tumor and germ-line derived genomic DNA and sequenced using SOLiD version 4.0 paired 50mers to a target of 30x depth. The tumor transcriptome was sequenced on four replicates and compared to transcriptome sequencing from ethnicity-matched population-based control hyperplastic breast tissue. Genome analysis was performed using multiple aligners and variant callers. Transcriptome alignment was performed using Life Technologies Bioscope pipeline, and differential expression analysis was performed using EdgeR and DESeq. We prioritized annotated germline and somatic variants by integrative analysis with differential expression results. Several striking examples of intronic events correlating with either altered splicing, differential expression or allelic imbalance were observed in genes relevant to cancer treatment, suggesting that transcriptomic data may have high value in interpreting somatic events that fall outside of coding regions. Final integration of data was validated through knowledge mining and convergence of somatic events and expression.

For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use.

Poster G27

GeneProf: Integrated Analysis of High-Throughput Sequencing Data

Florian Halbritter University of Edinburgh

Simon Tomlinson (University of Edinburgh, MRC Centre for Regenerative Medicine / Institute for Stem Cell Research);

Short Abstract: The promise of Next-Generation Sequencing (NGS) to deliver large volumes of inexpensive data is now becoming a reality. Unfortunately, many biologists and non-specialist users can only make limited use of available data, because they lack the expertise required to analyse and interpret these data. This skills-gap is likely to become wider in future as technologies and analysis methods rapidly develop and diversify.

We sought to improve accessibility of NGS data to a diverse range of users by developing a graphical software suite, called GeneProf. GeneProf consists of two components: A Java web application front-end and a custom-built job-scheduling system operating a network of high-performance compute nodes. This architecture relieves users of the need to set up specialised software and hardware and provides intuitive, tiered access to state-of-the-art analysis methods: Novice users with a bare minimum of data analysis experience can easily and quickly construct complex pipelines using an array of step-by-step wizards, while more advanced users benefit from GeneProf's versatile workflow designer to define highly customised procedures. Computer programmers can extend the application by implementing additional workflow components using a well-documented API. Data analyses tightly couple the research data at hand with the processing workflows and all steps are fully transparent, traceable and reproducible. Completed analyses may be shared and linked in research publications.

We have used GeneProf to re-analyse a wealth of published data. All results are readily accessible and can be explored by any visitor of the website. Additionally, registered users can straightforwardly integrate public data in their own experiments.

Poster G28

Allele specific analysis of gene expression in mouse embryonic stem cell line

Robert Ivanek Friedrich Miescher Institute for Biomedical Research

Dirk Schübeler (Friedrich Miescher Institute for Biomedical Research, Epigenetics);

Short Abstract: The extent to which genetic variation (single nucleotide polymorphisms) may influence transcription factor binding and chromatin structure, and thus contribute to variation in gene expression and phenotypes, is still not well understood. To address this question we performed whole genome sequencing of a murine embryonic stem (ES) cell line derived from a mouse with a mixed 129xC57BL6/J genetic background. By intersecting our sequencing data with publicly available data from the Sanger Mouse Genomes Project, we found that approximately one fifth of the genome of the ES cell line is heterozygous and eligible for allele-specific analysis. For the same ES cell line we generated ChIP-seq profiles of several chromatin marks and transcription factors, as well as unbiased transcriptome maps using strand-specific total RNA sequencing. We established an integrated approach for the analysis of the allele-specific distribution of chromatin marks and transcription factors and their effects on gene expression levels, which could be used as a guide for similar studies.

Poster G29

Condition-specific networks of transcriptional responses in human immune cells from multivariate analysis of microarray data

Wanseon Lee EMBL-EBI

Leo Lahti (Aalto University School of Science and Technology, Department of Information and Computer Science); Misha Kapushesky (EMBL-EBI, Functional Genomics); Johan Rung (EMBL-EBI, Functional Genomics);

Short Abstract: The human immune system responds to stress and stimuli through a complex regulatory system, provided through a number of specialized cell types. The response pattern at the gene expression level reflects this complexity and is still not well understood: it is largely condition- and cell-specific, yet only a minority of genes are expressed exclusively in any single type of immune cell.
We present a multivariate analysis of gene expression networks across a substantial number of cell types and conditions to investigate the regulatory mechanisms behind unique and shared responses of the immune system. Gene expression data from 30 experiments on immune cells from blood of over 600 healthy individuals were manually collected from the ArrayExpress database and additional curation of annotation was performed.
To investigate condition-specific behaviors of the interaction network the probabilistic NetResponse algorithm (Lahti et al. 2011) was applied. The algorithm detects subnetworks with distinct transcriptional responses in subsets of conditions. It is independent of predefined classifications for genes or conditions, providing tools for unsupervised analysis of network activation patterns. It performs an agglomerative network search, iteratively merging interacting genes with coordinated expression changes. A variational infinite Gaussian mixture model is used to quantify such dependency, and to detect and characterize the condition-specific responses in each subnetwork.
The results provide condition-specific activation patterns in immune cells as well as a global view of interaction networks in the immune response across diverse physiological conditions, with potential for further investigation into the function of the immune system.

Poster G30

The Limpopo MAGE-TAB Parser

Tony Burdett European Bioinformatics Institute

Joe White (Dana-Farber Cancer Institute, Dana-Farber Cancer Institute); Niran Abeygunawardena (European Bioinformatics Institute, Functional Genomics); Vincent Xue (European Bioinformatics Institute, Functional Genomics); Helen Parkinson (European Bioinformatics Institute, Functional Genomics);

Short Abstract: MAGE-TAB is a widely used tab-delimited text document format designed for exchange and submission of gene expression data. It aims to be easy for biologists to use, create and edit and consists of a group of related text files that are both “human-readable” and can be constructed using a spreadsheet or text editor.

The MAGE-TAB specification [1] imposes a strict series of rules on the files comprising a MAGE-TAB document. These cannot be easily checked by a human reading the document or detected by a simple text parser; programmatic validation is required. To address these challenges we present the Limpopo MAGE-TAB parser, a standard canonical Java parser for MAGE-TAB format.
Limpopo contains the parser, a specification-compliant object model of the MAGE-TAB structure written in Java, and an API for accessing the data contained within a MAGE-TAB document, to support applications that consume MAGE-TAB. It also includes a plugin framework for extending the parser to add custom validation rules (on top of the basic syntactic validation provided) or the ability to write data from the Limpopo object model, for example to store MAGE-TAB data in a database.

Limpopo is being used in several projects, including ArrayExpress, the Gene Expression Atlas, and Annotare. These three projects all use validator extensions to add custom validation rules, and the first two also use the write functionality to store data in their respective databases.

[1] Rayner, T. et al. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics 2006, 7:489
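
To make the idea of programmatic validation concrete, the following sketch reads IDF-style rows (a tag in the first column, values in the remaining columns) and applies a simple rule check. It is written in Python purely for illustration and does not use or mirror the Limpopo API, which is a Java library; the function names and the set of required tags are assumptions made for this example.

```python
import csv
import io

# Hypothetical rule set for illustration; the real rules come from the specification.
REQUIRED_TAGS = {"Investigation Title", "SDRF File"}

def parse_idf(text):
    """Parse tab-delimited rows into a mapping from tag to its list of values."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fields[row[0].strip()] = [v for v in row[1:] if v.strip()]
    return fields

def validate(fields):
    """Return a list of human-readable problems; empty when none are found."""
    problems = []
    for tag in sorted(REQUIRED_TAGS):
        if not fields.get(tag):
            problems.append(f"missing value for '{tag}'")
    return problems

example = "Investigation Title\tExample study\nPerson Last Name\tDoe\tRoe\n"
fields = parse_idf(example)
problems = validate(fields)
print(problems)
```

Keeping parsing and rule checking in separate functions reflects the general design described above, where custom validation rules are layered on top of basic syntactic parsing.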

Poster G31

The Functional Genomics Software Suite

Tomasz Adamusiak European Bioinformatics Institute

Tony Burdett (European Bioinformatics Institute, Functional Genomics); Anna Farne (European Bioinformatics Institute, Functional Genomics); Adam Faulconbridge (European Bioinformatics Institute, Functional Genomics); Emma Hastings (European Bioinformatics Institute, Functional Genomics); Natalja Kurbatova (European Bioinformatics Institute, Functional Genomics); James Malone (European Bioinformatics Institute, Functional Genomics); Ravensara Travillian (European Bioinformatics Institute, Functional Genomics); Eleanor Williams (European Bioinformatics Institute, Functional Genomics); Helen Parkinson (European Bioinformatics Institute, Functional Genomics);

Short Abstract: The Functional Genomics Production Team at the European Bioinformatics Institute have produced a large suite of software for data submissions, working with the MAGE-TAB format, annotating data against the Experimental Factor Ontology and automating curation processes of microarray and high throughput sequencing data. Here we present an overview of our publicly available tools.

Tab2MAGE is a series of Perl modules that facilitate data submissions to ArrayExpress. It includes support for parsing and validating a variety of formats, including MAGE-TAB, MAGE-ML and MIAMExpress. It also contains a lightweight process tracking system and taxon specific template generation system implemented in Ruby on Rails with a MySQL backend.

Limpopo is a MAGE-TAB parser written in Java. It provides functionality for validating MAGE-TAB with custom rules and exporting MAGE-TAB data to any other format (including your own database).

OntoCAT is a Java library that abstracts a number of common ontology-related tasks for interacting with ontology resources, including local files and public repositories. OntoCAT R is an R-package designed to enable ontology traversal and search in R environment. ZOOMA is a tool that provides optimal ontology mappings for user supplied terms, facilitating automatic data annotation to preferred ontologies.

OntologyMapper is a Perl package providing fuzzy concept recognition and lexical matching based on a list of terms and any target ontology.

Bubastis is an ontology diff tool used to compare two ontologies (typically versions of the same ontology) in OWL or OBO format for logical differences in classes and identifying new or deleted classes.

Poster G32

Novel Method for Removal of Sequence Biases in ChIP-seq Data

Joanna Raczynska UT Southwestern Medical Center

Dominika Borek (UT Southwestern Medical Center, Biochemistry); Zbyszek Otwinowski (UT Southwestern Medical Center, Biochemistry);

Short Abstract: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is an increasingly popular method to study protein-DNA interactions in a genome-wide manner. Yet, the enrichment of sequenced fragments at a particular genomic location does not directly correspond to the desired signal (i.e. protein binding) due to various biases. These are: underrepresentation of sequences having extreme GC-composition (either GC or AT-rich) or uneven base frequencies at particular positions within the sequenced fragments. In some ChIP-seq datasets this effect can be very large, leading to uninterpretable results or, worse, false biological conclusions. Therefore, correction of the biases is necessary for any meaningful interpretation, especially when looking for correlations between different datasets.
Some effort has already been made to account for the described effects [Schwartz et al, 2011, PLoS One 6(1)]. However, none of the existing procedures takes into account the base frequencies at individual positions outside of the sequenced fragment which, as we have found, also significantly influence the bias.
We developed a new method to normalize ChIP-seq data with respect to the reference genome. It takes into account the general base composition (like GC-content) of each read together with its genomic context. Also, it accounts for different base frequencies at individual positions relative to the starting position of a read, not only within the actual read, but also outside of it. The method is based on a Poisson-regression procedure to find a set of coefficients that then allow for calculation of weights for each read based on the sequence of its genomic context.

Attention Poster Authors: Posters should be at most 1.30 m (130 cm) high x 0.90 m (90 cm) wide. Fasteners (Velcro / double-sided tape) will be provided on site; please DO NOT bring tape, tacks or pins.

Posters Display Schedule:

Odd Numbered posters:

Set-up timeframe: Sunday, July 17, 7:30 a.m. - 10:00 a.m.
Author poster presentations: Monday, July 18, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Monday, July 18, 2:30 p.m. - 3:30 p.m.*

Even Numbered posters:

Set-up timeframe: Monday, July 18, 3:30 p.m. - 4:30 p.m.
Author poster presentations: Tuesday, July 19, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Tuesday, July 19, 2:30 p.m. - 4:00 p.m.*

* Posters that are not removed by the designated time may be taken down by the organizers and discarded. Please be sure to remove your poster within the stated timeframe.

Delegate Posters Viewing Schedule

Odd Numbered posters:
On display Sunday, July 17, 10:00 a.m. through Monday, July 18, 2:30 p.m.
Author presentations will take place Monday, July 18: 12:40 p.m.-2:30 p.m.

Even Numbered posters:
On display Monday, July 18, 4:30 p.m. through Tuesday, July 19, 2:30 p.m.
Author presentations will take place Tuesday, July 19: 12:40 p.m.-2:30 p.m.

Want to print a poster in Vienna? Try these options:

Repacopy, next to the congress venue

Also at Karlsplatz, in the Ring Center, Kärntner Str. 42

If you need your poster on a thicker material, you may also use a plotter service next to Karlsplatz: http://schiessling.at/portfolio/
