20th Annual International Conference on
Intelligent Systems for Molecular Biology


Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category Q - ''
Q01 - Information Explorer: a Suite of Tools for Cross-study Phenotype Mapping
Short Abstract: Databases such as dbGaP represent extremely valuable resources of data that have been assembled across multiple cohorts. The increasing development of cost-effective high-throughput genotyping and sequencing technologies is resulting in vast amounts of genetic data. While such databases were formed in order to archive and distribute the results of previously performed genetic association analyses, an increasing number of studies have provided de-identified individual-level genotypic and phenotypic data that are made available to outside researchers who have obtained the appropriate authorization. While the amount of data made available has increased dramatically in recent years, relatively little has been done to facilitate phenotype harmonization across studies. Researchers currently have to analyze and compare increasingly large numbers of variables, with varying degrees of documentation associated with them, to obtain the desired information. Information Explorer is a suite of tools we are developing that would allow researchers to (1) quickly obtain the information needed to assess whether a specific study will be useful for the hypothesis of interest; (2) exclude variables that do not meet research criteria; (3) ascertain which studies have combinations of phenotype and genetic information of interest; and (4) more easily expand research questions beyond the most basic main effects to more complex analyses such as gene-by-environment interactions and multivariate tests incorporating multiple phenotypes. The increased utility will also enable larger meta-analyses to be performed, as researchers will be able to home in more quickly on outcomes, exclusionary variables and covariates of interest, leading to increased statistical power to detect genetic associations.
Q02 - Genome-wide Identification of SNPs in MicroRNA Genes and the SNP Effects on MicroRNA Target Binding and Biogenesis
Short Abstract:
Jing Gong, Yin Tong, Hong-Mei Zhang, Kai Wang, Tao Hu, Ge Shan, Jun Sun, An-Yuan Guo

ABSTRACT
MiRNAs are studied as key regulators of gene expression involved in different diseases. Several single nucleotide polymorphisms (SNPs) in miRNA genes or target sites (miRNA-related SNPs) have been shown to be associated with human diseases by affecting miRNA-mediated regulatory function. To systematically analyze miRNA-related SNPs and their effects, we performed a genome-wide scan for SNPs in human pre-miRNAs, miRNA flanking regions and target sites, and designed a pipeline to predict their effects on miRNA-target interaction. As a result, we identified 48 SNPs in human miRNA seed regions and thousands of SNPs in 3' untranslated regions with the potential to either disturb or create miRNA-target interactions. Furthermore, we experimentally confirmed 7 loss-of-function SNPs and 1 gain-of-function SNP by luciferase assay. This is the first experimental validation of a SNP in a miRNA creating a novel miRNA target binding. All useful data were compiled into miRNASNP, a user-friendly free online database (http://www.bioguo.org/miRNASNP/). These data will be a useful resource for studying miRNA function, identifying disease-associated miRNAs, and advancing personalized medicine.

KEY WORDS: miRNASNP; database; target loss and gain; frequency
Q03 - AdaPatch: a method to detect regions under positive selection on viral protein structures
Short Abstract: Influenza A virus causes acute short-term infections associated with three to five million cases of severe illness, and about 250 000 to 500 000 deaths every year. Single-stranded RNA viruses are known to evolve rapidly to adapt to environmental conditions, such as establishment in a novel host or increasing immunity within the current host population. Knowledge of viral protein regions under positive selection is therefore crucial for surveillance and, eventually, vaccine prediction and drug development. We have developed AdaPatch, a method that searches for dense and spatially distinct clusters of sites under positive selection on the surface of viral proteins. We determine positive selection based on dN/dS ratios of genetic changes inferred by considering the phylogenetic structure of the data, and then use a graph cut algorithm to identify such clusters.
For the hemagglutinin protein of seasonal human influenza A viruses, our predicted sites significantly overlap with known antigenic and receptor-binding sites. From the structure and sequence data of the 2009 swine-origin influenza A/H1N1 hemagglutinin and PB2 protein, we identified regions that show evidence of evolution under positive selection since introduction of the virus into the human population. The changes in PB2 overlap with sites reported to be associated with mammalian adaptation of the influenza A viruses. We provide our technique as a plain and fast web service and suggest that an application of AdaPatch to the protein structures of viruses of yet unknown adaptive behavior could identify further candidate regions that are important for host-virus interaction.
Q04 - Spatial Modeling of the Brain Tumor Perivascular Niche
Short Abstract: Glioblastomas are heterogeneous in nature, intermingling with a wide range of stromal cells. These cells include vascular cells, microglia, and other cell types that provide a specialized niche for stem-like tumor cells. These interactions result in spatio-temporal dynamics that are not well characterized by population averages. Current mathematical models of cancer, derived at the population level, are not well suited for examining the effects of intercellular signaling with this perivascular niche. Here we propose a multiscale agent-based model of perivascular niche dynamics that links phenomena occurring at the subcellular, cellular, and tissue levels. This model can be used to test hypotheses concerning the role of microenvironmental signals in the maintenance of the brain tumor stem cell population and their effects on therapeutic interventions.
Q05 - Genetic and Metabolic Characterization of Insomnia
Short Abstract: Insomnia is reported to chronically affect 10-15% of the adult population. However, very little is known about the genetics and metabolism of insomnia. Here we surveyed 10,038 Korean subjects whose genotypes have been previously profiled on a genome-wide scale. About 16.5% reported insomnia and displayed distinct metabolic changes reflecting an increase in insulin secretion, a higher risk of diabetes, and disrupted calcium signaling. Insomnia-associated genotypic differences were highly concentrated within genes involved in neural function. The most significant SNPs resided in ROR1 and PLCB1, genes known to be involved in bipolar disorder and schizophrenia, respectively. Putative enhancers, as indicated by the histone mark H3K4me1, were discovered within both genes near the significant SNPs. In neuronal cells, the enhancers were bound by PAX6, a neural transcription factor that is essential for central nervous system development. Open chromatin signatures were found on the enhancers in human pancreas, a tissue where PAX6 is known to play a role in insulin secretion. In PLCB1, CTCF was found to bind downstream of the enhancer and interact with PAX6, suggesting that it may inhibit gene activation by PAX6. PLCB4, a circadian gene located immediately downstream of PLCB1, was identified as a candidate target gene. Hence, dysregulation of ROR1, PLCB1, or PLCB4 by PAX6 and CTCF may be one mechanism that links neural and pancreatic dysfunction not only in insomnia but also in the relevant psychiatric disorders that are accompanied by circadian rhythm disruption and metabolic syndrome.
Q06 - Comparison of IBD detection methods with a focus on rare variants
Short Abstract: Identity by descent (IBD) between two individuals means that both inherited the same DNA sequence from a common ancestor. Detection of IBD tracts is important for population genetics and association studies. IBD detection methods perform well for family studies where pedigrees are available and for common single nucleotide variants (SNVs). However, recent genotyping projects utilizing next generation sequencing comprise unrelated individuals and detect mostly rare variants. Currently, rare variants are of high interest in genetics because they are assumed to cause complex human diseases. However, their association with a disease is hard to detect, as standard tests on rare variants yield low power. IBD mapping can be used to increase the power by two approaches. First, SNVs can be grouped based on IBD and their joint effect subsequently tested for disease association. Second, local genetic similarities between individuals can be measured by IBD and used for association tests, as implemented in the sequence kernel association test (SKAT).
With a focus on rare variants, we compare the two most commonly used IBD detection techniques, BEAGLE's fastIBD and PLINK, on simulated data with implanted rare IBD tracts. Both methods miss a large proportion of short tracts and tracts that are tagged by few minor alleles. Overall, fastIBD has slightly higher power than PLINK but also a higher false discovery rate. fastIBD systematically overestimates the length of IBD tracts, while PLINK estimates it well. However, the exact location and length of an IBD tract are essential for identifying disease loci by IBD mapping.
Q07 - Non-Identifiable Pedigrees and a Bayesian Solution
Short Abstract: Some methods aim to correct or test for relationships or to reconstruct the pedigree, or family tree. We show that these methods cannot resolve ties between candidate relationships because of non-identifiability of the pedigree likelihood, which is the probability of inheriting the data under the pedigree model. This means that no likelihood-based method can produce a correct pedigree inference with high probability. This lack of reliability is critical both for health and forensics applications.

Pedigree inference methods use a structured machine learning approach where the objective is to find the pedigree graph that maximizes the likelihood. Known pedigrees are useful for both association and linkage analysis which aim to find the regions of the genome that are associated with the presence and absence of a particular disease. This means that errors in pedigree prediction have dramatic effects on downstream analysis.

In this paper we present the first discussion of multiple typed individuals in non-isomorphic pedigrees where the likelihoods are non-identifiable. While there were previously known non-identifiable pairs, we give an example having data for multiple individuals.

Additionally, we provide a deeper understanding of the general discrete structures driving these non-identifiability examples, as well as results to guide algorithms that wish to examine only identifiable pedigrees. This paper introduces a general criterion for identifiability. We suggest a method for dealing with non-identifiable likelihoods: use Bayes' rule to obtain the posterior from the likelihood and prior.
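As a minimal illustration of the Bayesian suggestion above, the sketch below applies Bayes' rule to a few candidate pedigrees. The pedigree names, likelihood values and priors are hypothetical and chosen only to show that two pedigrees with identical likelihoods can still be ranked once a prior is supplied; it is not the method presented in the poster.

```python
def posterior(likelihoods, priors):
    """Posterior over candidate pedigrees via Bayes' rule."""
    unnormalized = {p: likelihoods[p] * priors[p] for p in likelihoods}
    total = sum(unnormalized.values())
    return {p: value / total for p, value in unnormalized.items()}


# Hypothetical values: pedigree_A and pedigree_B are non-identifiable,
# i.e. they assign the same likelihood to the observed data.
likelihoods = {"pedigree_A": 0.02, "pedigree_B": 0.02, "pedigree_C": 0.005}
priors = {"pedigree_A": 0.5, "pedigree_B": 0.3, "pedigree_C": 0.2}

post = posterior(likelihoods, priors)
for name in sorted(post, key=post.get, reverse=True):
    print(f"{name}: {post[name]:.3f}")
```

With equal likelihoods, the posterior ratio between pedigree_A and pedigree_B equals the ratio of their priors, so the ranking depends entirely on how the prior is chosen.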
Q08 - A maximum likelihood-based algorithm for removing recombinations leading to more accurate phylogenies
Short Abstract: Algorithms for constructing phylogenies assume that the entire alignment being studied has a single, common ancestry; this condition is violated in recombined sequences (Posada et al., 2002). Hence, in order to reconstruct the history of a naturally transformable species, it is necessary to distinguish vertically inherited point mutations, which are informative about relationships between taxa, from horizontally acquired sequences, which may introduce many polymorphisms simultaneously but actually represent just a single mutational event.
A variety of methods, primarily based on Bayesian statistics (Didelot and Falush, 2007) and Hidden Markov Models (Husmeier, 2005), use the concept of a 'clonal frame' in bacteria: that sufficiently closely related isolates would share a clonally descended fraction of their chromosomes, interrupted by a number of dispersed loci that had undergone recombination since their divergence (Milkman and Bridges, 1993). The most accurate algorithm (Didelot and Falush, 2007) has been successfully applied to MLST datasets; however, its computational intensity makes it impractical for large, whole genome alignments.
We present a maximum likelihood-based algorithm that removes the recombinations from each taxon through analysis of the patterns of polymorphisms occurring on each branch, analogous to the Bayesian implementation of Didelot and Falush. It iteratively removes recombinations and rebuilds the phylogeny. It has been tested on bacterial genomes of varying sizes and numbers of strains, producing results in a feasible amount of time, with a memory footprint small enough to fit on a typical desktop computer.
The source code is available from https://github.com/sanger-pathogens/gubbins under the GPL2 open source licence.
Q09 - De novo assembly of a viral quasispecies
Short Abstract: Genetic reproduction in most organisms is subject to very low error rates due to high-fidelity error checking mechanisms built into the organism's DNA replication machinery. Certain viruses, like HIV and HCV, lack such error checking mechanisms and thus are highly prone to mutations during replication. These high mutation rates, coupled with rapid replication, lead to the formation of a community of highly similar, but distinct, genomes within an infected host, referred to as a viral quasispecies. Characterizing the genetic variations in a quasispecies for viruses like HIV is an important tool in coordinating treatment plans for infected individuals, or understanding the evolutionary traits of a quasispecies at a molecular level. Approaches for characterizing quasispecies communities using traditional sequencing techniques are expensive and time consuming; however, second generation sequencing technologies enable researchers to deeply sequence an entire quasispecies community quickly and with reduced effort and expense. With few exceptions, assembly software is designed to reconstruct a single genome from a set of sequencing reads rather than a community of highly similar genomes. Some approaches to assembling quasispecies communities are described in the literature (ShoRAH, ViSpA, QuRe); however, these implementations require a reference sequence that is highly similar to the sequencing data and may not scale to assemble the volume of data produced by some next-generation sequencers. Thus, we describe and implement a de novo assembly method for reconstructing a quasispecies using de Bruijn graphs, and assess the relative abundance of predicted variants with an approach inspired by hidden Markov models.
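To make the de Bruijn graph idea concrete, here is a minimal sketch, not the authors' implementation, of how such a graph can be built from reads and how nodes with more than one successor mark positions where similar genomes diverge. The function names, the toy reads and the choice of k are all assumptions for illustration; the abundance estimation described in the abstract is not reproduced, and edge counts here are raw k-mer counts only.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with k-mer counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def branching_nodes(graph):
    """Nodes with several successors: candidate variant sites."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Hypothetical reads from two variants that differ at one position.
reads = ["ACGTAC", "ACGTTC", "ACGTAC"]
graph = de_bruijn_graph(reads, k=4)
branches = branching_nodes(graph)

for node in branches:
    print(node, dict(graph[node]))
```

In this toy input the graph branches at a single node, and the counts on the two outgoing edges give a crude hint of how many reads support each variant.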
Q10 - Assessment of the Genome of the Netherlands as a novel imputation reference dataset
Short Abstract: Genetic imputation is the statistical process of utilizing phased haplotypes from a large reference population in order to increase the number of markers in a GWAS. In this study we assess the imputation efficiency of a novel reference set based on the pilot version of the Genome of the Netherlands (GoNL), consisting of whole genome, high coverage sequencing data from 48 Dutch trios (144 individuals). To estimate the benefit of using a population-specific reference set, we imputed 745 Dutch individuals from a Celiac Disease study, genotyped on Illumina hap550. For comparison we used the European panel of 1000 Genomes (1KG), which contains approximately twice as many individuals (286) as GoNL. Preliminary results on chromosome 20 indicate that GoNL contains genotypes for 99.3% of the HapMap550 SNP-set, whereas 1KG covers only 72%. To measure the imputation efficiency, we plot the percentage of imputed SNPs that exhibit R2>0.8 for various Minor Allele Frequency (MAF) bins. Results show that GoNL performs on par with 1KG for all MAF bins, with a slight improvement for rare alleles (MAF<0.05). For evaluation, the same individuals were also genotyped on the Immunochip platform. The concordance between the imputed genotypes and Immunochip increased from 97% with 1KG to 98% with GoNL. Future plans include the assessment of GoNL for additional populations, alone or combined with haplotypes from other datasets. The computation takes place in the e-BioGrid infrastructure, whereas the submission of the pipelines, monitoring of the jobs and management of the data is facilitated in the compute-MOLGENIS environment.
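The efficiency measure described above (the share of imputed SNPs with R2>0.8 within each MAF bin) can be sketched as follows. The bin edges, function names and the (MAF, R2) values are hypothetical, and the R2 values are assumed to come from whatever imputation software was used; this is only an illustration of the summary statistic, not the study's pipeline.

```python
from bisect import bisect_right


def maf_bin(maf, bin_edges):
    """Return the (low, high) bin containing maf, or None if out of range."""
    if maf < bin_edges[0] or maf > bin_edges[-1]:
        return None
    # Bins are half-open [low, high); the last bin includes its upper edge.
    idx = min(bisect_right(bin_edges, maf) - 1, len(bin_edges) - 2)
    return (bin_edges[idx], bin_edges[idx + 1])


def fraction_well_imputed(snps, bin_edges, r2_threshold=0.8):
    """snps: iterable of (maf, r2) pairs, one per imputed SNP."""
    totals, passed = {}, {}
    for maf, r2 in snps:
        key = maf_bin(maf, bin_edges)
        if key is None:
            continue
        totals[key] = totals.get(key, 0) + 1
        passed[key] = passed.get(key, 0) + (r2 > r2_threshold)
    return {key: passed[key] / totals[key] for key in sorted(totals)}


# Hypothetical (MAF, R2) pairs and bin edges.
snps = [(0.01, 0.6), (0.03, 0.85), (0.07, 0.9),
        (0.3, 0.95), (0.4, 0.7), (0.5, 0.99)]
edges = [0.0, 0.05, 0.1, 0.2, 0.5]
fractions = fraction_well_imputed(snps, edges)

for (low, high), frac in fractions.items():
    print(f"MAF {low}-{high}: {100 * frac:.1f}% of SNPs with R2 > 0.8")
```

Bins that contain no SNPs are omitted from the result rather than reported as zero, so an empty bin is not mistaken for poor imputation.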
Q11 - Mapping eQTLs in the human brain by measuring allele-specific expression
Short Abstract: A detailed map of genetic variants affecting gene expression (eQTLs) in the human brain is of great importance. However, the generation of a detailed map of eQTLs in the human brain faces two critical barriers: First, it is difficult to have a large collection of high quality post-mortem brain specimens, especially from prenatal and early postnatal periods, when many important developmental events occur and some of the psychiatric disorders begin to manifest. Second, there are numerous inter-sample and inter-specimen confounding factors affecting the measurement of gene expression in post-mortem human brain tissue, such as age, gender, ethnicity, disease status, drug usage, tissue dissection and ante- and post-mortem conditions.
Existing methods for detecting eQTLs using microarrays are badly affected by these inter-sample confounding factors. We developed an efficient new approach using RNA-Seq that can effectively overcome these inter-sample confounding factors and thus dramatically increase the power of eQTL mapping. The central idea is to measure the expression disparity between the two alleles of a gene, based on the heterozygous SNPs harbored in sequencing reads, and then to detect local SNPs accounting for this expression disparity. By focusing on the comparison of allele expression within each sample, inter-sample confounding factors are greatly reduced or eliminated from the study.
We developed this approach and built a software platform, thus providing a strong basis for large-scale studies in the near future to create a detailed map of eQTLs in the developing and adult human brain.
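The allele-level comparison described above can be illustrated with a simple test for allelic imbalance at one heterozygous SNP. The abstract does not state which test the authors use; the exact two-sided binomial test below (null hypothesis: both alleles are equally expressed) is an assumption, chosen because it is a common first choice, and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_p(ref_count, alt_count):
    """Exact two-sided binomial p-value for equal allelic expression (p = 0.5)."""
    n = ref_count + alt_count
    if n == 0:
        return 1.0
    k = min(ref_count, alt_count)
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical read counts covering one heterozygous SNP in one sample.
p_balanced = allelic_imbalance_p(10, 10)
p_skewed = allelic_imbalance_p(0, 10)
print(p_balanced, p_skewed)
```

Because both alleles are measured in the same sample, factors such as age or post-mortem conditions affect both counts alike, which is why the comparison is less sensitive to inter-sample confounders.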
Q12 - Efficient detection of associated SNP pairs over multiple disease GWAS
Short Abstract: In genome-wide association studies (GWAS), it has been hypothesised that pairs of SNPs could provide stronger associations than individual SNPs[1]. A number of methods have been proposed for this task, but they are often evaluated with artificial data, for reasons of both scalability and interpretation. Evaluations of this type are limited by our limited understanding of SNP interactions in real data. There is also little data on the replicability of markers detected in pairwise analysis, with replicability being a known issue in univariate GWAS.

In this work, we present a novel and fast algorithm, Genome-wide Interaction Search (GWIS), for exhaustively evaluating SNP pairs. We have created CPU and GPU implementations which can search an entire dataset in less than 2 hours and 10 minutes for CPU and GPU, respectively, which is significantly faster than comparable algorithms[1].
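For orientation, the sketch below shows what an exhaustive pairwise scan looks like in its simplest brute-force form. It is not the GWIS algorithm, whose scoring and optimisations are not described here; the Pearson chi-square statistic on joint genotypes, the function names and the toy data are all assumptions for illustration.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty genotype combinations
                stat += (observed - expected) ** 2 / expected
    return stat


def pair_statistic(geno_a, geno_b, is_case):
    """Chi-square on a 9 x 2 table: joint genotype (0/1/2 each) vs status."""
    table = [[0, 0] for _ in range(9)]
    for a, b, case in zip(geno_a, geno_b, is_case):
        table[3 * a + b][1 if case else 0] += 1
    return chi_square(table)


def scan_pairs(genotypes, is_case):
    """Score every SNP pair; genotypes maps SNP id -> per-individual codes."""
    scores = [(pair_statistic(genotypes[a], genotypes[b], is_case), a, b)
              for a, b in combinations(sorted(genotypes), 2)]
    return sorted(scores, reverse=True)


# Hypothetical toy data: status depends on snp1 and snp2 jointly,
# while neither SNP is associated with status on its own.
genotypes = {
    "snp1": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp2": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp3": [0, 0, 0, 0, 1, 1, 1, 1],
}
is_case = [False, True, True, False, False, True, True, False]
results = scan_pairs(genotypes, is_case)
print(results[0])
```

The number of pairs grows quadratically with the number of SNPs, which is why a naive scan like this one does not scale to genome-wide data without the kind of algorithmic and hardware acceleration the abstract refers to.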

We present results from running our algorithm over datasets for seven different diseases, obtained from the Wellcome Trust Case Control Consortium[4]. All datasets clearly show SNP pairs that have considerably stronger association with disease, compared with their individual component SNPs. In many cases, our method picked up SNP pairs whose individual components would not have been detected using univariate analysis. In a separate experiment, running our algorithm over five independent celiac datasets[2,3], we detected the same pairs of SNPs in the different datasets, suggesting that our method is detecting pairs with biological relevance.

References
1. Wan, X., et al., Am. J. Hum. Genet., 2010
2. van Heel, D., et al., Nat. Genet., 2007
3. Dubois, P., et al., Nat. Genet., 2010
4. Burton, P., et al., Nature, 2007
Q13 - An Integrated Genomic Variant Database System for Preliminary Study of SNP-related CNV Analysis
Short Abstract: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) have contributed to the discovery of the genetic basis of complex diseases such as diabetes, cancer, and psychiatric diseases. However, large-scale GWAS based on SNP genotyping have explained only a small portion of the heritable variation of these diseases. Copy number variation (CNV) studies are a good alternative for addressing the missing heritability problem, because CNVs play important roles in both disease susceptibility and gene dosage for genetic risk. Compared with GWAS, however, CNV association studies have limitations that make analysis difficult. One explanation is that current technologies do not detect the exact breakpoints that identify copy number genotypes on a genome-wide scale, owing to low resolution and platform specificity. A stage for filtering out spurious detections based on quality metrics is also required to analyze CNVs more accurately. Recently, Gamazon et al. reported that well-defined polymorphic CNVs are well tagged by SNPs and that those CNVs are more likely to affect multiple expression traits than frequency-matched variants. Therefore, providing information on accurately ascertained polymorphic CNVs and the allele frequency of each genotype in a population is very useful for CNV association studies and for studying ethnic differences in certain CNVs across populations. For these reasons, we developed an integrated genomic variant database system for preliminary study of SNP-related CNV analysis. This system integrates the CNV regions from 1000 Genomes, the Wellcome Trust Case Control Consortium, the Genome Medicine Institute, the Database of Genomic Variants, and dbSNP build 130.