Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category J - 'Pathogen informatics'
J01 - Characterization, Genetic Diversity of Tomato Yellow Leaf Curl Virus Egyptian Isolate
Short Abstract: Tomato yellow leaf curl virus (TYLCV) threatens the production of tomatoes both in Egypt and worldwide. A virus was isolated from naturally infected tomato and pepper plants grown in different areas of Egypt. The investigated isolate was characterized using electron microscopy, serology, and biological tests. Polymerase chain reaction (PCR) with degenerate primers was used to amplify partial sequences (~530 bp) of the begomovirus coat protein gene (Cp) from samples of diseased plants. DNA sequence analyses of the Egyptian isolate revealed high nucleotide sequence identity (94.5%) to isolates of TYLCV in GenBank. TYLCV-EG contains one open reading frame (ORF), which encodes 176 amino acid residues with a molecular weight of 19.651 kDa. Twenty amino acids were detected in TYLCV-EG, starting with alanine (A) and ending with tyrosine (Y). Phylogenetic analysis of the Cp gene sequence of TYLCV-EG suggests that the sequence variations observed in this isolate, and in some of those identified and published in GenBank, may be attributable to intra-specific recombination events involving some TYLCV isolates. The great variability of TYLCV isolates worldwide should be considered when breeding programs for virus resistance are established. A tomato line tolerant or resistant to a particular TYLCV isolate may not be as effective against another, distantly related virus isolate.
J02 - Machine Learning Driven Prediction of Pathogenicity
Short Abstract: In recent years, the number of sequenced and annotated organisms has continuously increased, and a large variety of pathogenic organisms have been collected. It has become a major goal of virologists and bacteriologists to pinpoint those sequences that ultimately induce pathogenicity. Since large data sets need to be processed, methods from bioinformatics and statistical learning are in strong demand to support this process.
This contribution reports on the application of machine learning to the classification of virulence-related protein sequences in bacteria. In a first step, we define positive and negative data sets from the Virulence Factor Database and annotated entries in UniProt, respectively. From these data, we generate a variety of features based on protein sequences. These features range from simple count statistics of the occurrence of amino acids to more indirectly inferred information related to the protein structure. The inclusion of this indirectly inferred information is a major novelty compared with existing approaches. We implement various supervised learning strategies and evaluate their respective advantages. Additionally, we experiment with a semi-supervised learning strategy to handle insufficiently annotated protein data, which are common, especially for newly emerging pathogens, e.g. in zoonoses. Furthermore, feature importance lets us discern the most relevant features, which in turn aids the biological interpretation of the findings.
In conclusion, the approach presented here is a powerful method to discriminate previously unknown protein sequences with respect to their potential to affect the virulence of an organism.
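The "simple count statistics of the occurrence of amino acids" mentioned in this abstract can be illustrated with an amino acid composition vector. This is only a minimal sketch, not the authors' feature pipeline: the function name `aa_composition` and the example sequence are hypothetical, and ignoring non-standard residue symbols is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Assumption: non-standard symbols (e.g. X, B, Z) are simply ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical sequence; the X is skipped, leaving 7 counted residues.
features = aa_composition("MKKLLAXA")
```

A vector like this could be fed to any supervised classifier; the structure-related features described in the abstract would need additional prediction tools.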
J03 - De novo assembly and improvement of 50,000 prokaryotic genomes
Short Abstract: De novo assembly provides a more complete picture of the underlying genome than performing a reference-based alignment alone. The problem is that producing good quality assemblies from short read data alone is still a non-trivial process. The underlying genome may not be adequately represented in the assembly, or it may be too fragmented to be of much use in analysis. We have developed an assembly and improvement pipeline which was used to produce reasonable quality draft assemblies for 50,000 genomes, encompassing more than 300 prokaryotic species. Approximately one million hours of computation time were required to assemble the 25 terabases of Illumina short read data. For each strain, Velvet was used to create multiple assemblies by varying the k-mer size. The one with the best N50 was chosen. The contigs of the assembly were scaffolded by iteratively running SSPACE. Gaps, identified as one or more Ns, were then targeted for closure by running 120 iterations of GapFiller. Finally, the reads were aligned back to the improved assembly using SMALT, and a set of statistics was produced for assessing the quality of the assembly. All the assemblies were created in a standardised manner and required no input from the user, meaning that all the results are reproducible. The results show that 95% of the assemblies produced are within 10% of the genome size of a related high quality reference assembly. A mean of 94% of reads map back to the assemblies generated. The median number of contigs in the resulting assemblies is 44.
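The step of choosing the assembly "with the best N50" can be sketched as follows. This is an illustration, not the pipeline's code: it assumes "best" means the highest N50, and the contig lengths and the names `n50` and `best_assembly` are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_assembly(assemblies):
    """Return the k-mer size whose assembly has the highest N50 (assumed criterion)."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))


# Hypothetical contig lengths for three k-mer sizes.
assemblies = {
    31: [500, 400, 300, 200, 100],
    41: [900, 300, 200, 100],
    51: [700, 700, 100],
}
best_k = best_assembly(assemblies)
```

In this toy input the three assemblies have the same total length, so N50 alone separates them; real pipelines usually also check total size and contig count.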
J04 - Probabilistic models for CRISPR spacer content evolution
Short Abstract: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is an adaptive and heritable immune system in prokaryotes. Immunity is encoded in an array of targeting sequences called spacers. Each spacer sequence provides specific immunity to parasites, e.g., phages or plasmids. Even between closely related strains, spacer content is very dynamic and evolves quickly. Standard models of nucleotide evolution cannot be applied to quantify its rate of change, since processes other than single nucleotide changes determine its evolution.

We present probabilistic models for the change of CRISPR spacer arrays over time. They can account for the different processes of insertion and deletion observed empirically. In the basic model, spacers are unordered and deletions affect only single spacers. As insertions were observed to occur at the beginning of the array, insertions are constrained to that position in the second model. Finally, in the third model, a fragment of adjacent spacers can be deleted in a single event. Parameters of these models are estimated for pairs of arrays by maximum likelihood, accounting for unobserved events.
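The ordered-insertion and fragment-deletion processes described above can be illustrated with a small forward simulation. This is a sketch under stated assumptions, not the authors' model or estimator: constant rates, a per-spacer deletion rate, and a uniform fragment length between 1 and `max_fragment` are all hypothetical choices, as is the function name `simulate_array`.

```python
import random


def simulate_array(initial, insertion_rate, deletion_rate, time,
                   max_fragment=1, rng=None):
    """Gillespie-style simulation of one spacer array (newest spacer first).

    Insertions add a new spacer at index 0. Each deletion event removes a
    block of 1..max_fragment adjacent spacers starting at a random position
    (assumed fragment-length distribution).
    """
    rng = rng or random.Random()
    array = list(initial)
    next_id = max(array, default=0) + 1
    t = 0.0
    while True:
        total_rate = insertion_rate + deletion_rate * len(array)
        if total_rate == 0:
            break
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > time:
            break
        if rng.random() < insertion_rate / total_rate:
            array.insert(0, next_id)
            next_id += 1
        else:
            start = rng.randrange(len(array))
            size = rng.randint(1, max_fragment)
            del array[start:start + size]
    return array
```

With `max_fragment=1` this loosely corresponds to single-spacer deletions with insertions at the array start; larger values mimic fragment deletions. Simulated pairs of arrays of this kind are what a maximum likelihood procedure would be evaluated on.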

In simulations, parameters are well estimated on average under the models presented here. There is a bias in the rate estimation with fragment deletions. We find that parameters are estimated robustly under incorrect models as well.

Finally, different Yersinia pestis data sets are analyzed. The results between data sets are largely congruent, but the models also capture the variation in spacer diversity among the data sets.
J05 - Insights into the B-cell response in a natural human infection: high-throughput mapping of epitopes using next-generation peptide chips
Short Abstract: The full set of specificities in the human antibody response to a natural infection remains largely unexplored. Here, we used next-generation high-density peptide microarrays to demonstrate, for the first time, that it is feasible to map hundreds of B-cell epitopes from a complex human infection. In this work, we have analyzed the B-cell immune response in humans with Chagas disease, which is caused by a protozoan parasite.
The chip consists of a tiling array of ~200K 15-mer peptides, which together cover >500 individual proteins with a maximal 14-mer overlap. Antibody pools from healthy individuals and infected patients were assayed on a single chip, and the data were processed to obtain a disease-specific signal for each position in a protein.
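The tiling design (15-mer peptides with a 14-residue overlap, i.e. a step of one residue) can be sketched as below. The function name `tile_peptides` and the example sequence are hypothetical; the actual chip design may handle protein ends differently.

```python
def tile_peptides(protein, length=15, overlap=14):
    """Return (start, peptide) pairs tiling a protein sequence.

    Consecutive peptides share `overlap` residues, so the step between
    start positions is length - overlap.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than peptide length")
    return [
        (start, protein[start:start + length])
        for start in range(0, len(protein) - length + 1, step)
    ]


# Hypothetical 20-residue sequence: yields 6 overlapping 15-mers.
peptides = tile_peptides("MKTAYIAKQRQISFVKSHFS")
```

A protein of n residues yields n - 14 peptides under this scheme, which is how a few hundred proteins can account for roughly 200K peptides.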
A test set of previously known Chagas antigens with fine-mapped epitopes was used to assess our performance on linear B-cell epitope identification. Performance on this task was excellent, with an area under the ROC curve of 0.92.
Discrimination of antigens from non-antigens is a more challenging task, however. Using a threshold of 20u (1u = background interquartile range) and a setup with an antigen-non-antigen ratio of 1%, we were able to detect 20 out of the set of 45 known antigens (44.4%) with a positive predictive value (PPV) of 91%, corresponding to 2 false positive predictions. Applying this threshold to the complete set of proteins analyzed on the chip, we detected 78 novel potential antigens, with an average of 1.5 epitopes per protein.
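The reported figures are consistent with the standard definitions of sensitivity and PPV, as this short check shows. It assumes the 2 false positives are counted against the 20 correctly detected antigens; the variable names are illustrative.

```python
true_positives = 20   # known antigens detected at the 20u threshold
false_positives = 2   # non-antigens called positive (as reported)
known_antigens = 45   # size of the known-antigen set

# Fraction of known antigens that were detected.
sensitivity = true_positives / known_antigens
# Fraction of positive calls that were true antigens.
ppv = true_positives / (true_positives + false_positives)

summary = f"sensitivity={sensitivity:.1%}, PPV={ppv:.0%}"
```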
These findings open the door to complete B-cell response maps of complex human infections.
J06 - Processing and analysis of a whole-proteome microarray and the prediction of a diagnostic antigen set for the pathogen Bacillus anthracis
Short Abstract: Microbial pathogens express a large number of proteins during the course of infection. Many of these are ubiquitous or ‘housekeeping’ proteins and are found in many eubacteria. A key step in investigating the pathogenesis of a disease is to identify antigens involved in pathogenicity or survival. These can be used as targets for vaccine or drug development or as a basis for diagnostic tests. Core, disease-specific “sero-diagnostic” antigens or antigen profiles can be used to reliably diagnose infectious disease, or to inform vaccine design strategies. High-throughput protein microarrays are an excellent tool for this type of study: all of the individual expressed proteins from an infectious agent are printed on a microarray chip, so the entire proteome can be interrogated using immune sera. This approach has been used successfully to study the humoral immune response to immune-dominant protein antigens for a number of important pathogens (e.g. F. tularensis, B. pseudomallei).
A Bacillus anthracis protein microarray from a commercial source was used to analyse the antigen recognition profile of sera (IgG & IgA) from a number of anthrax-exposed/infected human groups in comparison to negative controls. Because of the potential for significant cross-reactivity with non-anthrax Bacillus proteins, and the particular make-up of the array (i.e. negative and positive controls that confound the analysis), we undertook a number of pre-processing, normalization and differential expression analysis strategies in order to generate a robust set of anthracis-specific antigens. Analyses were undertaken in R using functions from several libraries, and in commercially available software.
J07 - Genome characterization and assembly of an emerging S. Infantis strain reveal the reasons for its spread
Short Abstract: Since 2007, a rapid emergence of S. enterica serotype Infantis (S. Infantis) has been observed in Israel, making S. Infantis the most prevalent serovar (2008-2011). To better understand this phenomenon, we performed phenotypic and molecular characterization of the emerging clone in relation to historical isolates. Phenotypic arrays and antimicrobial analysis showed that the emerging strain has multiple resistances to diverse antimicrobial compounds and an enhanced ability to adhere to and invade non-phagocytic host cells. Furthermore, we demonstrated that the emerging clone has higher resistance to oxidative stress and superior biofilm formation.
Whole genome sequencing was applied to determine the sequence of a large (> 200-kb) plasmid harbored by the emerging strain. This unique plasmid was found to mediate the resistances to tetracyclines (by the tetAR genes) and to trimethoprim (by the dfrA1 cassette). Additionally, the plasmid encodes two chaperone-usher fimbrial operons, and putative colonization factors, which may contribute to the enhanced adhesion, invasion and biofilm formation of the emerging strain. Sequence differences found on the chromosome provided mechanistic explanations for the resistances to nalidixic acid (a gyrA mutation) and to nitrofurantoin (nonsense mutations in nfsA and nfsB genes). Collectively, our data show that multiple episomal and chromosomal differences in the emerging strain are expected to increase its fitness in the environment and during host infection.
J08 - High Resolution Melting of Vertebrate Cytochrome B gene in Mosquito Blood Meals is Effective in Resolving Feeding Patterns of Potential Arbovirus Disease Vectors in Suba and Baringo Districts, Kenya
Short Abstract: Arboviral disease pathogens circulate on the shores and islands of Lake Victoria and Lake Baringo, Kenya. Both habitats have a variety of ecological regions that support a wide diversity of mosquito vectors as well as potential avian, reptilian and mammalian host species that are critical to arboviral disease epidemiology. We investigated the vertebrate blood-meal sources of field-collected blood-fed mosquitoes sampled by CO2-baited CDC light traps and BG sentinel traps along the shores and adjacent islands of Lake Victoria, Suba District, and Lake Baringo in 2011 and 2012. Blood-fed mosquitoes of the genera Aedeomyia, Aedes, Anopheles, Culex, Mansonia, and Mimomyia were trapped, sorted and morphologically identified by standard protocols. DNA was extracted from midgut contents and processed individually to determine feeding preferences by high resolution melting (HRM) assay of the vertebrate mitochondrial cytochrome B (Cyt b) gene, and the results were statistically analyzed and clustered using Rotor Gene Q ScreenClust software and R. The results showed blood meals corresponding to domestic, peridomestic and sylvatic cycles. The method also proved efficient and powerful in resolving mixed blood meals. These results show the ability of mosquitoes to feed on a diverse range of hosts, a fact that could lead to interaction of vectors with possible arboviral reservoir hosts. An integrated control approach targeting humans, wildlife and domestic animals will form an integral part of a control strategy for mosquito-borne arboviral diseases in both areas.
J09 - Genotyping Complex Hepatitis B Virus Dual Infections Involving Recombinant Forms Using Population-Based Sequence Data
Short Abstract: Background:
Correct identification of the Hepatitis B Virus (HBV) genotype from patient samples is a common task in routine diagnostics, as HBV genotypes influence outcomes of antiviral therapy. We developed the dual infection (DI) model to genotype dual infections using population-based sequence data (Beggel et al., 2012). We now present an extension of the DI model that addresses dual infections involving recombinant forms by using the jumping profile Hidden Markov Model (jpHMM) developed by Schultz et al. (2006).

Methods:
Genotype- and position-specific nucleotide distributions (GPSND) were estimated using 1791 full genome HBV sequences acquired from GenBank. The DI model was combined with jpHMM to identify recombination sites in case of dual infections. Test data was generated by randomly combining triplets of sequences of distinct genotype from the GenBank data-set to simulate dual infections of non-recombinant and recombinant sequences.

Results:
The combination of the DI model and jpHMM was able to genotype 96.4% of 1000 generated test sequences correctly. The accuracy on position level was 98.4%. 98.0% of the simulated recombination sites were identified correctly with a median deviation from the ground truth location of 5 base-pairs.

Conclusions:
Dual infections of recombinant and non-recombinant sequences of distinct genotypes can be identified and genotyped reliably. Recombination sites can be localized precisely by the use of the jpHMM. The model can therefore help to reveal HBV dual infections in complex population-based sequence data, which in turn can help to optimize antiviral therapy regimens.
J10 - HIV Mutation Browser: Mutations from full text articles
Short Abstract: Extracting mutations from the vast scientific literature and mapping them to the corresponding protein sequence is a challenge. The effect of mutations on the structure of HIV and their role in drug resistance in AIDS patients are important research topics in virology. As an initial step towards building an inventory of mutations in HIV, we have scanned 108594 full text articles related to HIV-1, extracted the point mutations mentioned in these articles and mapped them back to the HIV-1 proteins. An interactive web application has been developed to serve the mutations, the corresponding sentences from the literature and cross references to NCBI PubMed. It also visualizes the mutations in a multiple sequence alignment view and on the corresponding protein 3D structure. It will soon be open to the public.

URL: http://hivmut.org
J11 - GsVIN: An analytical platform for genome-scale virus-host interaction network
Short Abstract: A virus-host interaction network offers a broad perspective on viral infection mechanisms and disease etiology. While the concept of studying virus-host interaction networks is not new, there are no bioinformatics tools or applications to facilitate such analysis on the genome scale. We developed a computational framework, GsVIN, to implement the analysis and comparison of virus-host interaction networks for all human viruses. First, we built a database containing 13,058 virus-host protein-protein interactions (PPIs) involving 674 viral proteins and 2,388 human proteins. Second, because the virus-host PPI data were concentrated on a few of the most studied human viruses, i.e. HIV, HCV and EBV, we sought to expand the virus-host PPI network of each virus by two methods: i) adding more nodes to the network based on sequence homology between proteins from different viruses; ii) using protein domain-domain interaction (DDI) data to construct PPIs between viral and human proteins. Third, Human Protein Atlas data were incorporated into GsVIN to make it possible to construct context-specific interaction networks. Lastly, we developed new tools to view the virus-host interaction networks and to compare interaction networks between multiple viruses and human, or between different tissues for the same virus. Importantly, the analytical tool can apply GO enrichment to identify the GO modules that are shared and those that differ between virus-human interaction networks. This is particularly useful for gaining insight into the functions of newly emerging viral genomes. The GsVIN analytical platform is freely accessible at http://www.viralportal.org/GsVIN/
J12 - Exploring the full spectrum of human-pathogen molecular events across the biomedical literature using text mining
Short Abstract: Approximately 1,400 reported human pathogenic species can be identified in the biomedical literature. Research into the underlying molecular mechanisms, and in particular the molecular interactions involved in each of these, is vital for understanding these infections and developing treatments. However, despite the abundance of knowledge on these interactions, this information is not available in machine-readable form and as such cannot be used for full-scale analysis. Here we report wikipathogens.org, a full-scale database constructed to enable semi-automated curation of pathogen-related molecular interactions. We have used the existing contextual text mining database constructed in BioContext to extract and aggregate molecular interactions from all of the available biomedical text onto pathogen-specific domains. Furthermore, we have supplemented each molecular event with pathogenic relevancy rankings and additional biological contexts (e.g. virulence, vaccine availability etc.) to provide researchers with more informative context. The data are presented in a way that allows curation of the text-mining-derived results, to ensure that the reported information has been extracted correctly and can then be used for more accurate biological analyses. We provide analyses of these results by comparing data retrieved across different pathogens to highlight key human proteins targeted by pathogens in their pathogenesis.
J13 - Identification of Soybean MicroRNAs Involved in Phytophthora sojae Infection by Deep Sequencing
Short Abstract: Small RNAs, as important regulatory molecules, play a crucial role in plant resistance to pathogen infection. In this study, we use high-throughput next-generation sequencing technologies and bioinformatics approaches to analyze the small RNA population in soybean upon infection with Phytophthora sojae.
The small RNA libraries were constructed using P. sojae-infected and mock-infected soybean root tissues from two cultivars, Harosoy and Williams82. The libraries were mapped to the soybean genome and microRNAs (miRNAs) were identified. We searched for known miRNAs in miRBase and predicted novel miRNAs. Both known and novel miRNAs were subjected to expression profiling. Among the identified miRNAs, 11 known ones and 13 novel candidates exhibited differential expression levels in the P. sojae-infected tissues. Further verification was conducted through Northern blot experiments.
miRNAs negatively regulate target gene expression by inducing mRNA cleavage and/or translation repression. We then predicted potential target genes of the differentially expressed miRNAs. Some predicted targets of these miRNAs, such as nucleotide binding leucine-rich repeat proteins (NB-LRR) and auxin receptors are known to regulate plant defense response. Moreover, miR1507 and miR2109, whose targets are mainly NB-LRR genes, have been computationally predicted to produce secondary phased small interfering RNAs (siRNAs) from their target loci. These phased siRNAs are also induced by P. sojae infection.
This study demonstrates that P. sojae infection triggers expression changes in specific small RNA species, which may play a role in plant immunity by regulating the expression of defense-related genes.
J14 - Dual RNA seq of pathogen and host in Legionella pneumophila infection
Short Abstract: Legionella pneumophila is a common cause of severe community-acquired pneumonia. This intracellular pathogen manipulates host immunoreactions by interfering with intracellular signalling pathways and gene transcription, thus enabling efficient bacterial replication within pulmonary cells.
We aim to characterize the expression and regulation of both host and pathogen genes in L. pneumophila infection, their functional consequences for bacterial replication, and potential bacterial interventions in host cell pathways, particularly the microRNA pathway. In addition, we aim to identify novel differentially expressed bacterial sRNAs and to test the hypothesis that they might be injected into the host cell cytosol.
We perform dual RNA-seq of human macrophages infected with L. pneumophila Corby, and the host and pathogen transcriptomes are analyzed in parallel at different time points throughout the infection process. Such a dual approach makes it possible to study gene expression in an in vivo model, to understand how a pathogen behaves within the host environment, and to gain a deeper understanding of the infection process.
However, such an approach has not yet been applied to mixed bacterium-eukaryote model systems, and it raises both experimental and computational challenges, such as the different RNA contents of host and pathogen, the required sequencing depth, normalization strategies, and experiments that enrich for particular RNA types. In this work we address these challenges and develop a computational pipeline for the analysis and interpretation of such data, including differential expression of both host and pathogen transcripts, identification of human microRNAs and bacterial sRNAs, and differential regulation. In addition, we discuss some results from the first sequencing runs and the follow-up experiments performed to test the effect of candidate miRNAs, pathogenic bacterial genes and sRNAs on the progression of infection.
J15 - A comprehensive Human herpesvirus interactome by data integration
Short Abstract: Human herpes simplex virus 1 (HSV-1) is a large double-stranded DNA virus and the archetypal member of the herpesvirus family. The most common symptom of infection is cold sores but it can also lead to rare and fatal encephalitis.

We present a comprehensive HSV-1 intra-viral protein-protein interaction network. The interactome could help characterize poorly understood protein complexes that are essential to viral lifecycle functions and could form potential future drug targets.

We combine evidence from multiple sources: direct binary protein-protein interactions from an in-house database that collates interactions from multiple online repositories, interactions transferred from homologues and interactions inferred from domain-domain databases.

Direct protein-protein interactions are included in the database using the following four developments:

1) Transfer of interactions by homology matching using HHblits.

2) Interaction confidence scoring by tracking all lines of evidence. This scoring allows for a comprehensive interaction network that integrates multiple sources and facilitates filtering to remove potential false positives.

3) Use of a novel in-house literature-mining tool to create corpora from full-text sources, implemented using the UIMA Java framework for managing unstructured data.

4) Application of 'geometric de-noising', a recent graph-theory approach used to assess the final network for noise. This approach allows us to predict plausible novel interactions and provides a comparative validation of existing interaction scores.

The network should help further understanding of the virus and suggest sub-networks (and hence protein sub-complexes) that could help in future characterisation of the architecture of the entire virus.
J16 - Modeling Viral Life Progression
Short Abstract: The sequence of viral events taking place upon infection of the cell has been studied in several viral species. Quantitative progression of the viral life cycle can be characterized through time course measurement of marker species (e.g., a specific viral protein) produced as viral intermediates. Ideally, the measured marker concentration starts from zero and monotonically accumulates up to its maximum as more and more cells go through the desired phase of the viral life cycle. In practice, however, this is not the case, as the measurements can be confounded by several other factors that obscure the true dynamics of viral progression. In this work we propose a parametric model based on ordinary differential equations to account for initial contamination, marker decay, and experimental noise in such experiments. We apply our model to data from nine viral intermediates produced during HIV-1 infection to generate a high-resolution picture of the various steps of its life cycle, and demonstrate how these pathogen dynamics can be used to describe the transcriptional response in the host cell.
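To make the idea of such a model concrete, here is a toy ordinary differential equation with an initial contamination term and first-order marker decay. It is not the model from the poster: the logistic timing of production, the parameter names, and the simple Euler integration are all assumptions chosen for illustration, and experimental noise (which would enter when fitting to data) is left out.

```python
import math


def simulate_marker(times, amplitude, t_half, width, decay,
                    contamination, dt=0.01):
    """Euler integration of dm/dt = amplitude * p(t) - decay * m.

    m(0) = contamination models marker present before infection starts.
    p(t) is the derivative of a logistic curve centred at t_half, i.e. the
    assumed rate at which cells pass through the phase of interest.
    """
    def production(t):
        s = 1.0 / (1.0 + math.exp(-(t - t_half) / width))
        return s * (1.0 - s) / width

    m, t, values = contamination, 0.0, []
    for target in sorted(times):
        while t < target - 1e-12:
            step = min(dt, target - t)
            m += step * (amplitude * production(t) - decay * m)
            t += step
        values.append(m)
    return values


# Hypothetical parameters: without decay the marker plateaus near
# contamination + amplitude; with decay it rises and then falls.
no_decay = simulate_marker([0, 30], 2.0, 10.0, 1.0, 0.0, 0.5)
with_decay = simulate_marker([0, 30], 2.0, 10.0, 1.0, 0.2, 0.5)
```

Fitting parameters such as `t_half` to measured time courses (for example by least squares) would then give an estimate of when each life cycle step occurs.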
J17 - Detection of virulence factors in bacterial pathogens via integrated network analysis of genome wide transposon mutagenesis deep sequencing data
Short Abstract: Genome-wide transposon mutagenesis (e.g. Tn5) of bacterial pathogens and subsequent mutant selection on antibiotic plates yield thousands of mutant populations with at least one transposon insertion per organism. Optionally, the resulting mutant populations can be grown under specific conditions (e.g. human blood serum). Deep sequencing of bacterial DNA fragments, with the transposon mosaic end as PCR amplification and sequencing primer, produces millions of short reads that exclusively cover genomic regions adjacent to inserted transposons.

Short reads, cleaned of adaptor contamination, are aligned to the prokaryotic genome, enabling the identification of transposon integration sites with base-level resolution. The distribution of insertion sites and their corresponding read counts is an indicator of the essentiality of the surrounding genomic region under the applied condition. Because of the small ratio between insertion sites and genome length, essential regions in the bacterial genome are detected using unsupervised spatial (r-scan) statistics. Advanced comparisons between mutant libraries for different conditions identify factors that are essential only under particular conditions. For a deeper understanding of the functional importance of the potentially essential regions, we integrate the r-scan statistic into a network analysis. With a computationally intensive but exact network search algorithm, it is possible to find optimal and suboptimal maximum-scoring subgraphs representing functional modules (significantly essential subnetworks for the tested condition).
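The r-scan idea can be sketched as follows: for sorted insertion sites, an r-scan is the span covered by r consecutive gaps, and unusually long spans point to regions that tolerate no insertions. This is an illustrative sketch only; it ranks spans by length and omits read counts and the significance testing a real analysis needs. The names `r_scans` and `widest_regions` and the example coordinates are hypothetical.

```python
def r_scans(sites, r=1):
    """Return (start, end, span) for every run of r consecutive gaps
    between sorted, de-duplicated insertion sites."""
    ordered = sorted(set(sites))
    return [
        (ordered[i], ordered[i + r], ordered[i + r] - ordered[i])
        for i in range(len(ordered) - r)
    ]


def widest_regions(sites, r=1, top=3):
    """Rank r-scans by span; the widest are candidate essential regions."""
    return sorted(r_scans(sites, r), key=lambda scan: scan[2], reverse=True)[:top]


# Hypothetical insertion coordinates with one large insertion-free stretch.
insertion_sites = [120, 180, 260, 2300, 2350, 2410, 2500]
candidates = widest_regions(insertion_sites, r=1, top=1)
```

Comparing such candidate regions between libraries grown under different conditions is one way to look for condition-specific essentiality.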

This integrated genome-wide approach enables the identification not only of essential genes but also of essential functional modules and pathways under specific conditions of interest (e.g. virulence factors), and thus helps to find potential drug targets in infectious bacteria.
J18 - Computational prediction of RNA-interacting bacteria effectors and their targets in host.
Short Abstract: A common virulence mechanism of bacterial pathogens is the injection of effectors into host cells. These effectors typically target proteins, such as specific components of signaling pathways, to establish colonization. Effectors that target nucleic acids in host cells have not been well characterized. Only TAL effectors of the plant pathogen Xanthomonas are known to target host DNA to facilitate bacterial infection, and RNA-interacting effectors are largely unknown. Hence, efficient and cost-effective methods for computational identification and characterization of effector-RNA interactions are needed to select potential candidates.

In this study, we aim to predict effectors that have the potential to interact with nucleic acids and understand their mechanisms in pathogenicity. We use sequence-based analysis to predict conserved domains in bacterial effectors. This is followed by selection of effectors containing RNA-interacting domains (RI-domains). For this purpose, large sets of RI-domains were retrieved computationally from several domain databases based on their annotation and curated manually. Furthermore, classification of selected effectors is carried out based on functional attributes of domains and ranked by their homology. The top candidates will be subjected to co-immunoprecipitation assays to validate their RNA-interacting potential and identify their RNA-binding partners.

Computational and experimental methods will be integrated in an iterative manner to develop and optimize a pipeline for the identification of RNA-interacting effectors and their RNA targets in host cells. Furthermore, this pipeline can be used as a tool to assess the potential of a protein to interact with nucleic acids in different biological systems and to gain new insights into protein function.