### Theme Presentation Schedule

Attention conference presenters: please review the Speaker Information Page.

Highlights, Late Breaking Research and Proceedings Track submissions are presented by scientific theme as part of the combined Theme Presentation schedule.

(PT) - Correlates of deleteriousness: why we create different variant scores
Theme:
Date: TBA
Room: TBA

Presenting author: Martin Kircher

Session Chair:

Presentation Overview:

3D Sig (PT) - Birth and Future of Multiscale Modeling of Macromolecules
Theme:
Date: TBA
Room: TBA

Presenting author: Michael Levitt

Session Chair:

Presentation Overview:

3D Sig-Intro (PT) - A Multiscale Introduction of Michael Levitt
Theme:
Date: TBA
Room: TBA

Presenting author: Ilan Samish

Session Chair:

Presentation Overview:

AKES 03 - Giannoulatou (PT) - How to implement quality assurance in clinical diagnostic laboratories?
Theme:
Date: TBA
Room: TBA

Presenting author: Eleni Giannoulatou

Session Chair:

Presentation Overview:

AKES 03 - Ho (PT) - Bioinformatics Software Testing and Quality Assurance
Theme:
Date: TBA
Room: TBA

Presenting author: Joshua Ho

Session Chair:

Presentation Overview:

AKES 06 - Brooksbank (PT) - My Career
Theme:
Date: TBA
Room: TBA

Presenting author: Cath Brooksbank

Session Chair:

Presentation Overview:

AKES 06 - Goble (PT) - A grant reviewer's perspective
Theme:
Date: TBA
Room: TBA

Presenting author: Carole Goble

Session Chair:

Presentation Overview:

AKES 06 - Mulder (PT) - Applied Knowledge Exchange session on "How to navigate a bioinformatics career path"
Theme:
Date: TBA
Room: TBA

Presenting author: Nicky Mulder

Session Chair:

Presentation Overview:

AKES 06 - QA (PT) - Q&A on session: How to plan your career but also be aware of your opportunities
Theme:
Date: TBA
Room: TBA

Presenting author: Cath Brooksbank, Winston Hide, Nicky Mulder

Session Chair:

Presentation Overview:

AKES 06 - Rodgers (PT) - Being judged: an editor’s view
Theme:
Date: TBA
Room: TBA

Presenting author: Peter Rodgers

Session Chair:

Presentation Overview:

AKES 06 - Winhide (PT) - Win Hide's 2015: A career odyssey
Theme:
Date: TBA
Room: TBA

Presenting author: Winston Hide

Session Chair:

Presentation Overview:

Awards (PT) - Awards and Closing
Theme:
Date: TBA
Room: TBA

Presenting author: -

Session Chair:

Presentation Overview:

ISCB Award (PT) - ISCB Award
Theme:
Date: TBA
Room: TBA

Presenting author: -

Session Chair:

Presentation Overview:

KN01 (PT) - Fun With Large Structures and Masses of Sequence
Theme:
Date: TBA
Room: TBA

Presenting author: Michael Levitt

Session Chair:

Presentation Overview:

KN02 (PT) - Understanding microbial community function and the human microbiome in health and disease
Theme:
Date: TBA
Room: TBA

Presenting author: Curtis Huttenhower

Session Chair:

Presentation Overview:

KN03 (PT) - Genome regulation during embryonic development
Theme:
Date: TBA
Room: TBA

Presenting author: Eileen Furlong

Session Chair:

Presentation Overview:

KN04 (PT) - Reversible DNA rearrangement as a switch for cell type in yeasts
Theme:
Date: TBA
Room: TBA

Presenting author: Kenneth Wolfe

Session Chair:

Presentation Overview:

KN05 (PT) - How Lucky I Have Been
*Nature Publishing Group has provided a journal cover photo used in this presentation
http://www.nature.com/nature/journal/v302/n5908/index.html
Theme:
Date: TBA
Room: TBA

Presenting author: Cyrus Chothia

Session Chair:

Presentation Overview:

KN06 (PT) - neXtProt 2015 highlights: SPARQL endpoint and biocuration efforts around the human protein variome
Theme:
Date: TBA
Room: TBA

Presenting author: Amos Bairoch

Session Chair:

Presentation Overview:

OP01 (PT) - Role of the DPAGT1/β-catenin/YAP signaling network in oral squamous cell carcinoma
Theme: Bioinformatics of Disease and Treatment / Functional Genomics
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: Wicklow Hall 2B

Presenting author: Vinay Kartha, United States

Session Chair:

Presentation Overview:
Progression of oral squamous cell carcinoma (OSCC) to metastasis involves complex changes in epithelial cell growth, survival and migration. While the roles of protein N-glycosylation and of the Wnt/β-catenin and Hippo pathways in cancer have been highlighted independently, the interplay between these pathways in promoting tumor metastasis is less well understood. Prior studies have found this co-dependent homeostatic pathway network to be deregulated in OSCC, where it plays a vital role in tumorigenesis. However, identifying the exact mediators of these changes remains challenging, and doing so is crucial to the discovery of novel and lasting OSCC therapeutics. Here, we apply a multi-omic profiling approach to identify potential regulators of OSCC pathogenic pathway activity, combining OSCC cell line gene expression profiles with large-scale public genomic data. Gene expression signatures of genetic knockdowns of DPAGT1 (a gene crucial to protein N-glycosylation) and of TAZ and YAP (two transcriptional activators involved in the Hippo pathway) were derived in SCC2 cells. High-throughput gene expression data from primary human OSCC in The Cancer Genome Atlas (TCGA) were then projected onto these signatures and analyzed for association with clinical features, including tumor grade and stage. By scoring samples according to their level of pathway deregulation, and by additionally leveraging TCGA copy number alteration (CNA), DNA methylation and somatic mutation data, we identify potential genetic and epigenetic regulators of human OSCC development in the context of the DPAGT1/β-catenin/YAP signaling network, paving the way to discovering targets for OSCC therapy.

OP02 (PT) - LAPRAS: An Integrative Model Incorporating Heterogeneous Datasets to Discover Genetic Etiology of Autism Spectrum Disorder
Theme: Bioinformatics of Disease and Treatment / Functional Genomics
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: Wicklow Hall 2B

Presenting author: Sumaiya Nazeen, United States

Session Chair:

Presentation Overview:
Autism spectrum disorder (ASD), prevalent in 1% of the population, refers to a group of complex neurodevelopmental disorders sharing the common feature of dysfunctional reciprocal social interaction. There is compelling evidence that genetic factors are a predominant cause of ASD; however, the genetic heterogeneity underlying ASD makes it challenging to gain conclusive biological insights into the disease. Most general-purpose gene prioritization methods and ASD-specific gene network methods are limited by their reliance on the protein-protein interaction (PPI) network and/or co-expression network alone, and do not properly use other types of ASD-related information available in the literature. We believe that understanding the complex genetic background of ASD requires a strategy that can integrate multiple forms of data. To this end, we present a computational method termed LAPRAS (LAsso-Penalized logistic Regression based gene ASsociation) that incorporates ASD-specific DNA copy number variations, PPI network topology, phenotypic similarities of diseases, and pathway knowledge from the literature. We provide a ranked list of genes in descending order of their probability of association with ASD. The top-ranked genes are overrepresented in neurological pathways, cell adhesion pathways, and signal transduction pathways pertinent to brain, cellular assembly and communication, synaptic development, and neuronal development. The most significant sub-networks discovered among the top-ranked genes are overrepresented in gastrointestinal disorders, nervous system development, hereditary developmental disorders, and organismal abnormalities, suggesting the existence of subclasses of ASD. This integrative method is novel and outperforms other state-of-the-art gene ranking methods.
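
To illustrate the kind of model LAPRAS is named after, here is a minimal sketch of L1-penalized (lasso) logistic regression fitted by proximal gradient descent in plain Python. It assumes each gene is already encoded as a numeric feature vector with a 0/1 label for known ASD genes. The abstract does not say how the copy number, PPI, phenotype and pathway evidence is encoded or which solver LAPRAS uses, so the function names, the penalty `lam` and the step size are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, step=0.5, iters=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    X holds one feature vector per gene; y holds 0/1 labels (1 = known ASD gene).
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean logistic loss, then soft-thresholding,
        # which sets weakly supported weights exactly to zero.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def rank_genes(genes, X, w, b):
    """Order genes by predicted probability of association, highest first."""
    prob = {g: sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for g, xi in zip(genes, X)}
    return sorted(genes, key=lambda g: prob[g], reverse=True)
```

Soft-thresholding is what makes the penalty useful for this task. Features that carry little signal receive a weight of exactly zero, and the fitted model still scores every gene, so the genes can be ranked by predicted probability.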

OP03 (PT) - Combined strategy to detect somatic point mutations from circulating DNA by targeted sequencing
Theme: Bioinformatics of Disease and Treatment / Genetic Variation Analysis
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: Wicklow Hall 2B

Presenting author: Nicola Casiraghi, Italy

Session Chair:

Presentation Overview:
We developed a computational method that combines genetic knowledge and empirical signal to readily detect and quantify somatic point mutations in cell-free DNA by fully exploiting single-base-resolution information from targeted next-generation sequencing data of a patient's plasma (case) and a matched germline sample (control). First, each targeted base is tested in both cases and controls for allelic fraction, local coverage and the number of reads supporting the alternative allele(s). The distribution of allelic fractions in controls is used to determine the cut-off corresponding to the desired detection specificity. Second, to mitigate the impact of potential strand bias, we implemented a combination of standard Fisher's exact and odds ratio tests with an ad hoc analysis of the distribution of reference/alternative strand proportions in the study cohort. Third, control samples are used to build a locus-specific error model that estimates the probability that the observed case allelic fraction is indeed evidence of a somatic event. Fourth, a comparison of expected versus observed ratios of non-synonymous to synonymous substitution rates in targeted control genes serves as an additional quality check. Finally, if the targeted design allows estimation of case tumor content and local somatic copy number state, the method also evaluates point mutation detectability stratified by locus coverage (false negative rates). The robustness of our combined strategy was tested across a range of coverage depths by in silico downsampling. We will present the efficacy of the strategy on 46 plasma samples from 15 metastatic patients recently profiled with a targeted panel spanning 40 kb across eight cancer genes at 1500X mean coverage.
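
The first two filtering steps can be sketched in plain Python. This illustrates the idea under stated assumptions and is not the authors' implementation. The sketch takes the cut-off as the empirical quantile of control allelic fractions at the requested specificity. It checks strand bias with a two-sided Fisher's exact test on reference/alternative by forward/reverse read counts. The helper names and `alpha = 0.05` are hypothetical, and the locus-specific error model and the non-synonymous/synonymous check are not shown.

```python
import math


def af_cutoff(control_afs, specificity=0.99):
    """Allelic-fraction cut-off from the control distribution: the smallest
    observed control value with at least `specificity` of controls at or below it."""
    ordered = sorted(control_afs)
    k = math.ceil(specificity * len(ordered)) - 1
    k = min(len(ordered) - 1, max(0, k))
    return ordered[k]


def _table_prob(x, row1, col1, n):
    """Hypergeometric probability of a 2x2 table whose top-left cell is x."""
    return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]], e.g.
    rows = reference/alternative reads, columns = forward/reverse strand."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables with the same margins that are
    # no more likely than the observed one (small tolerance for rounding).
    p = 0.0
    for x in range(lo, hi + 1):
        px = _table_prob(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):
            p += px
    return min(p, 1.0)


def is_candidate(case_af, cutoff, strand_p, alpha=0.05):
    """Keep a case call if it exceeds the control cut-off and shows no strand bias."""
    return case_af > cutoff and strand_p >= alpha
```

A locus passes only if both conditions hold. In a fuller pipeline the resulting candidates would then go to the error model and the coverage-stratified detectability check described above.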

OP04 (PT) - Transcriptomics of rare cell populations in the aging neural stem cell lineage
Theme: Bioinformatics of Disease and Treatment / other
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: Wicklow Hall 2B

Presenting author: Katja Hebestreit, United States

Session Chair:

Presentation Overview:
Neural stem cell niches in the adult brain are the locations where neural stem cells produce the new neurons necessary for the maintenance and plasticity of brain function. With age, neural stem cell niches deteriorate, with a decline in neural stem cell proliferation and in the production of new neurons. To examine the transcriptional landscape of neural stem cells during aging, we obtained RNA-seq data from freshly isolated cells along the neural stem cell lineage from young and old mice. Because cell numbers per replicate were very low and differences with age were expected to be subtle, we captured unwanted variance in the data using surrogate variable analysis. We used limma to detect differentially expressed genes between cell types and between old and young samples for each cell type. We found strong gene expression differences between the cell types, especially between quiescent and activated cell types. Intriguingly, quiescent neural stem cells show transcriptional changes with age, whereas activated neural stem cells do not seem to have an aging signature. Using pathway enrichment analysis, we found that quiescent and activated neural stem cells use different primary pathways to carry out different modes of proteostasis, with quiescent neural stem cells favoring autophagy and activated neural stem cells using the proteasome pathway. As defective proteostasis is a hallmark of aging, it is an interesting candidate for further investigation into why activated neural stem cells are protected from transcriptional aging.

OP05 (PT) - Improving Clustal Omega's sequence alignment accuracy with annotated profile Hidden Markov Models
Theme: Sequence Analysis / Sequence Analysis
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: Wicklow Hall 2B

Presenting author: Quan Le, Ireland

Session Chair:

Presentation Overview:
Clustal Omega is the latest member of the Clustal family of sequence alignment programs; it can take an additional profile HMM as input to improve alignment accuracy. In this experiment, we use HMMER 3.0 and pfam_scan to annotate each sequence in the set to be aligned with profile HMMs from the Pfam database, and then supply the annotated profile HMMs as extra inputs to Clustal Omega to improve alignment quality. Using one Pfam profile HMM per alignment, we obtain positive results on all 5 reference sets of the sequence alignment benchmark BAliBASE 3.0 (improvements in average total column score range from 2.4% for reference 3 to more than 20% for reference 1 version 1). Where multiple Pfam profile HMMs hit the sequences to be aligned, we are running initial experiments using concatenated profile HMMs to further improve alignment quality.
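
The total column (TC) score quoted above measures the fraction of columns in the reference alignment that the test alignment reproduces exactly. The following is a minimal sketch. It assumes both alignments contain the same sequences in the same order, as equal-length strings with `-` for gaps. Unlike BAliBASE's own scoring, it counts every reference column rather than only the annotated core blocks.

```python
def column_signatures(alignment):
    """Describe each column by the index of the residue each sequence
    contributes to it (None where that sequence has a gap '-')."""
    next_index = [0] * len(alignment)
    signatures = []
    for col in range(len(alignment[0])):
        sig = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                sig.append(None)
            else:
                sig.append(next_index[s])
                next_index[s] += 1
        signatures.append(tuple(sig))
    return signatures


def total_column_score(reference, test):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = column_signatures(reference)
    test_cols = set(column_signatures(test))
    return sum(1 for c in ref_cols if c in test_cols) / len(ref_cols)
```

For example, shifting one gap in a two-sequence alignment of five columns breaks two columns, so the score drops from 1.0 to 0.6.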

OP06 (PT) - Investigating evolutionary models of genome structure in aggressive prostate cancer
Theme: Bioinformatics of Disease and Treatment / Sequence Analysis
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: Wicklow Hall 2B

Presenting author: Marek Cmero, Australia

Session Chair:

Presentation Overview:
Tumour evolution is a complex and multifaceted process. Recently, many approaches have arisen for inferring the evolutionary dynamics of tumour cell populations from point-mutation and copy-number data. However, the role of structural variations (SVs) in cancer evolution, particularly balanced rearrangements, has been less thoroughly explored. We present a method for reconstructing cancer phylogeny from multiple samples from a single patient using large-scale genomic aberrations, and apply it to prostate cancer, which is particularly rearrangement-driven. We demonstrate that tumour phylogenies can be reconstructed from rearrangement data alone, and we further expand our model to characterise subclonal SVs. We illustrate our methods by applying them to longitudinal samples from patients undergoing second-line anti-hormone therapy to gain insight into the mechanisms of castration resistance.

OP07 (PT) - Systematic characterization of the disease and tissue distributions for identification of novel drug targets
Theme: Bioinformatics of Disease and Treatment / Systems Biology and Networks
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: Wicklow Hall 2B

Presenting author: David Westergaard, Denmark

Session Chair:

Presentation Overview:
The identification and validation of drug targets remains a major obstacle in drug development. To date, the majority of drug targets fall into four classes: G protein-coupled receptors, nuclear receptors, ion channels, and kinases (Overington et al., 2006). Illuminating the Druggable Genome (IDG) is an NIH initiative that will aid the discovery of novel targets by integrating heterogeneous methods and data sources. To this end, we have developed two novel resources, DISEASES and TISSUES, which project evidence onto proteins from the STRING database and onto two controlled vocabularies, namely the Disease Ontology and the BRENDA Tissue Ontology. The use of controlled vocabularies ensures full interoperability between the two resources.

The DISEASES and TISSUES resources both integrate heterogeneous evidence from manually curated databases, high-throughput experiments, and automatic literature mining. DISEASES integrates disease-gene associations from Genetics Home Reference, UniProt, DistiLD and COSMIC. TISSUES is a database of gene expression in human tissues based on publicly available data from microarrays, RNA sequencing, mass spectrometry and immunohistochemical staining. Both resources also contain evidence from co-mentioning in Medline abstracts. Using gold standards, we calibrate quality scores across evidence types and estimate a confidence level for each association.

The resources described here are publicly available under a CC-BY-4.0 license at http://diseases.jensenlab.org and http://tissues.jensenlab.org
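
One simple way to turn a raw evidence score into a confidence level with a gold standard is binned calibration. Within each score bin, the confidence is the fraction of associations confirmed by the gold standard. The sketch below assumes equal-width bins and 0/1 gold-standard labels. The abstract does not say which calibration function DISEASES and TISSUES use, and in practice one calibration would be fitted per evidence type.

```python
def calibrate(scores, gold_labels, n_bins=4):
    """Return a function mapping a raw score to the fraction of gold-standard
    positives among training associations in the same equal-width score bin."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # guard against all scores being equal

    def bin_of(s):
        return min(int((s - lo) / width), n_bins - 1)

    hits, totals = [0] * n_bins, [0] * n_bins
    for s, y in zip(scores, gold_labels):
        b = bin_of(s)
        totals[b] += 1
        hits[b] += y
    # Empty bins have no estimate (None).
    rates = [h / t if t else None for h, t in zip(hits, totals)]

    def confidence(score):
        # Scores outside the training range are clamped to the end bins.
        return rates[bin_of(min(max(score, lo), hi))]

    return confidence
```

Equal-width bins keep the example short. A smooth fitted curve would avoid jumps at bin edges, at the cost of choosing a functional form.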

OP08 (PT) - Bioinformatic Analysis of Long Non-coding RNAs in Neuroblastoma
Theme: Bioinformatics of Disease and Treatment / Systems Biology and Networks
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: Wicklow Hall 2B

Presenting author: Kate Killick, Ireland

Session Chair:

Presentation Overview:
Neuroblastoma is an embryonic childhood cancer arising from the neural crest progenitor cells of the sympathetic nervous system. It is the most common extracranial pediatric tumor, accounting for approximately 15% of all childhood cancer deaths. Amplification of the MYCN gene is found in 25% of neuroblastoma tumors, and the degree of amplification is correlated with patient outcome. Non-coding RNAs have no protein-coding potential, yet they have been shown to play a role in a diverse range of cellular functions, including cell differentiation and embryonic development. In particular, over the last several years a large body of literature has emerged supporting a role for long non-coding RNAs (lncRNAs) in many types of cancer. Novel lncRNAs have the potential to serve as diagnostic markers and therapeutic targets in this complex disease. In parallel, improved methods for examining the transcriptome have enabled advances in identifying and understanding non-coding RNAs. Here, bioinformatic analyses were used to identify lncRNAs from RNA-seq data from a range of MYCN-amplified neuroblastoma cell lines. Time course data from a cell line over-expressing MYCN were also examined, as were data from a neuroblastoma cell line treated with a retinoid compound known to induce differentiation of neuroblastoma tumors into mature neurons, rendering them benign. Collectively, these results demonstrate the induction of lncRNAs by MYCN in neuroblastoma, identify a subset of lncRNAs involved in neuroblastoma cell fate, and offer a new perspective for neuroblastoma research.

OP09 (PT) - A new molecular signature approach for prediction of driver cancer pathways from transcriptional data
Theme: Bioinformatics of Disease and Treatment / Systems Biology and Networks
Cancelled
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: Wicklow Hall 2B

Presenting author: Boris Reva, United States

Session Chair:

Presentation Overview:
Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken.

Here we introduce a new approach to predict the status of driver cancer pathways based on weighted sums of gene expression levels, or signature functions, derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian, and colon cancer tumors. The activation status of these driver pathways was then characterized using RNA sequencing data by constructing signature functions in training datasets and then testing the accuracy of the signatures in test datasets.

The signature functions perform well in separating tumors with nominated active pathways from tumors with no genomic signs of activation (average AUC = 0.83), systematically exceeding the accuracies obtained by the SVM method that we employed as a control approach. A typical pathway signature is composed of ~20 biomarker genes that are unique to a given pathway and cancer type. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that transcriptional signatures can generally provide an alternative to DNA sequencing methods for detecting specific driver pathways.
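
A signature function of the kind described here, and the AUC used to evaluate it, can be sketched in a few lines. The weights below are written by hand for two hypothetical genes `A` and `B`; the abstract does not say how the weights are fitted on the training data. AUC is computed as the Mann-Whitney probability that an active-pathway tumor outscores an inactive one, with ties counted as one half.

```python
def signature_score(expression, weights):
    """Signature function: weighted sum of expression values over the signature genes.

    expression: dict gene -> expression value for one tumor
    weights:    dict gene -> weight (genes missing from a sample contribute 0)
    """
    return sum(w * expression.get(g, 0.0) for g, w in weights.items())


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical two-gene signature and four tumors.
weights = {"A": 1.0, "B": -0.5}
active = [{"A": 3, "B": 1}, {"A": 2, "B": 0}]
inactive = [{"A": 1, "B": 2}, {"A": 2.5, "B": 1}]
pos = [signature_score(t, weights) for t in active]
neg = [signature_score(t, weights) for t in inactive]
```

Here `pos` is `[2.5, 2.0]` and `neg` is `[0.0, 2.0]`. Three pairs are won and one is tied, so `auc(pos, neg)` is 0.875.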

OP10 (PT) - Evaluating and optimizing variant calling: a comparison of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data
Theme: Genetic Variation Analysis / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: Wicklow Hall 2B

Presenting author: Sarah Sandmann, Germany

Session Chair:

Presentation Overview:
There are various next-generation sequencing (NGS) techniques, all of them striving to replace Sanger sequencing as the gold standard. The ongoing development of NGS methods has greatly reduced the turnaround time and cost of sequencing. However, false positive calls of SNVs, and especially of indels, are a widely known problem for basically all NGS sequencers.

We developed optimized variant calling pipelines for three common NGS sequencers, considering both SNVs and short indels. Amplicon-based targeted sequencing of 20 genes known to be recurrently mutated in myelodysplastic syndromes (MDS) was performed in parallel on the Roche 454, Ion Torrent PGM and Illumina NextSeq500 platforms. Diagnostic material from MDS patients, part of which was sequenced twice on each sequencing platform, formed the basis of the optimization, representing the learning cohort. Where required, called variants were confirmed by Sanger sequencing of the original patient material.

We calculated various parameters to characterize both SNVs and indels. Instead of setting arbitrary thresholds for each parameter, however, we combined them in generalized linear models that return, for each variant, the probability of being a true positive. A single threshold for each model was chosen to maximize both sensitivity and positive predictive value.

Subsequently, we compared the three NGS platforms and their previously optimized variant calling pipelines. Sequencing data from additional MDS patients with lab-validated SNVs and indels formed the basis of the comparison, representing the validation cohort.
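
The following is a minimal sketch of the GLM step. It assumes the per-variant parameters are already computed and standardized. A logistic regression (a binomial GLM with logit link), fitted by gradient descent, returns a probability per variant, and a single threshold is then chosen. The abstract does not say how sensitivity and positive predictive value were traded off. This sketch assumes the F1 score, their harmonic mean, as the criterion, and all names and example values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that a variant with feature vector x is a true positive."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, iters=5000):
    """Binomial GLM with logit link, fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(p):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def choose_threshold(probs, labels):
    """Scan candidate thresholds and keep the one with the best F1 score,
    i.e. the harmonic mean of sensitivity and positive predictive value."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        calls = [p >= t for p in probs]
        tp = sum(1 for c, l in zip(calls, labels) if c and l)
        fp = sum(1 for c, l in zip(calls, labels) if c and not l)
        fn = sum(1 for c, l in zip(calls, labels) if not c and l)
        sens = tp / (tp + fn) if tp + fn else 0.0
        ppv = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

With the model fixed, the threshold scan is the only tuning step, which mirrors the single threshold per model described above.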

OP11 (PT) - Bacterial vaccine design using reverse vaccinology
Theme: Pathogen informatics / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: Wicklow Hall 2B

Presenting author: Ashley Heinson

Session Chair:

Presentation Overview:
Reverse Vaccinology (RV) uses computational approaches to identify vaccine candidates in the genomes of bacterial pathogens. Vaccine development for bacterial pathogens is at a critical juncture due to widespread antibiotic resistance. Previously, our group was the first to apply machine learning approaches to the identification of vaccine candidates in an RV pipeline. The current study aims to dramatically enhance RV by increasing the size of the training data, expanding the number of biologically relevant bioinformatics programs used for protein annotation, and employing nested cross-validation. A literature search identified 200 vaccine candidates, each defined as a protein that conferred significant protection in an animal model following immunization and subsequent challenge with a bacterial pathogen. These positive training data were paired with negative training data and annotated with 30 bioinformatic tools capable of annotating protein data, yielding a total of 200 annotation features. A support vector machine was trained on these data and compared with previous analyses that used smaller training data sets, fewer protein annotation tools, and improper models of cross-validation. Although nested cross-validation led to a reduction in accuracy compared with previous methods (which were overfit), increasing the size of the training data set and expanding the number of protein annotation tools led to higher accuracies (>92%). In conclusion, we have dramatically improved previous RV approaches, such that our trained classifier can now be used to select novel vaccine candidates in the genomes of bacterial pathogens for validation in animal models.
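
Nested cross-validation separates hyperparameter tuning (inner loop) from performance estimation (outer loop). This is why it yields lower but less optimistic accuracy than the earlier, overfit estimates. The sketch below shows only the loop structure. To stay self-contained, it swaps the SVM for a k-nearest-neighbour classifier whose `k` is the tuned hyperparameter, and it uses hypothetical fold counts and names.

```python
def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = sum(yi for _, yi in nearest)
    return 1 if 2 * votes > k else 0


def make_folds(indices, n_folds):
    """Deterministic interleaved split of a list of indices into n_folds folds."""
    return [indices[i::n_folds] for i in range(n_folds)]


def accuracy(train_idx, test_idx, X, y, k):
    tX, ty = [X[i] for i in train_idx], [y[i] for i in train_idx]
    hits = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, ks=(1, 3, 5), outer=3, inner=3):
    """Return (chosen k, outer-fold accuracy) for each outer fold."""
    all_idx = list(range(len(X)))
    results = []
    for test_idx in make_folds(all_idx, outer):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]

        def inner_score(k):
            # Tune k with cross-validation inside the outer training set only.
            scores = []
            for inner_test in make_folds(train_idx, inner):
                inner_held = set(inner_test)
                inner_train = [i for i in train_idx if i not in inner_held]
                scores.append(accuracy(inner_train, inner_test, X, y, k))
            return sum(scores) / len(scores)

        best_k = max(ks, key=inner_score)
        # The outer test fold never influenced the choice of k.
        results.append((best_k, accuracy(train_idx, test_idx, X, y, best_k)))
    return results
```

The reported accuracy is the mean of the outer-fold accuracies. Each outer fold may select a different `k`, which is expected.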

OP12 (PT) - Pathway relevance ranking for tumor samples through network-based data integration
Theme: Systems Biology and Networks / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: Wicklow Hall 2B

Presenting author: Lieven Verbeke, Belgium

Session Chair:

Presentation Overview:
We present a new pathway relevance ranking method that can prioritize pathways according to the information contained in any combination of tumor-related omics datasets. Key to the method is the conversion of all available data into a single network representation containing not only genes but also individual patient samples. Additionally, all data are linked through a network of previously identified molecular interactions. The performance of the new method is demonstrated by applying it to breast and ovarian cancer datasets from The Cancer Genome Atlas. Integrating gene expression, copy number, mutation and methylation data illustrates the method's potential to identify key pathways in breast cancer development that are shared by different molecular subtypes. Interestingly, certain pathways were ranked equally important for different subtypes, even when the underlying (epi)genetic disturbances were diverse. The pathway ranking method was also able to identify subtype-specific pathways. Often the score of a pathway could only be explained by a combination of genetic and epigenetic disturbances, stressing the need for a network-based data integration approach. The analysis of ovarian tumors, as a function of survival-based subtypes, demonstrated the method's ability to correctly identify key pathways, irrespective of tumor subtype. A differential analysis of survival-based subtypes revealed several pathways with higher importance for the poor-outcome patient group than for the good-outcome patient group. Many of the pathways with higher importance for the poor-outcome group could be related to ovarian tumor proliferation and survival.
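
The abstract does not specify the ranking algorithm. The sketch below is one illustration of network-based integration, not necessarily the authors' method. It runs a random walk with restart from a patient-sample node over a combined sample/gene/interaction graph, then scores each pathway by the summed visiting probability of its genes. Node names, the restart probability and the fixed iteration count are assumptions.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iters=100):
    """Visiting probabilities of a walker on an undirected graph that jumps back
    to the seed nodes with probability `restart` at each step (power iteration)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    nodes = list(neighbours)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(iters):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            # Spread the non-restart mass evenly over u's neighbours.
            share = (1 - restart) * p[u] / len(neighbours[u])
            for v in neighbours[u]:
                nxt[v] += share
        p = nxt
    return p


def rank_pathways(scores, pathways):
    """Sum node scores over each pathway's genes; return pathway names, highest first."""
    totals = {name: sum(scores.get(g, 0.0) for g in genes) for name, genes in pathways.items()}
    return sorted(totals, key=totals.get, reverse=True)
```

Because samples are nodes in the same graph as genes, seeding the walk at one sample (or a group of samples from a subtype) gives a per-sample or per-subtype pathway ranking.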

OP13 (PT) - ContiBAIT: An R Package for Genome Finishing Using Strand-seq
Theme: Genome Organization and Annotation / Genome Organization and Annotation
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: Wicklow Hall 2B

Presenting author: Kieran O’Neill, Canada

Session Chair:

Presentation Overview:
Strand-seq is a method for directional, low-coverage sequencing of DNA template strands in single cells. Taken together, Strand-seq data from multiple cells of the same organism provide genomic distance information. This can be used to improve the quality of early-build reference genomes made up of many contigs with no bridging sequence, first by grouping contigs from the same chromosome together, and second by ordering contigs within chromosomes. We present ContiBAIT, an R package for genome finishing using Strand-seq.

For grouping contigs into chromosomes, ContiBAIT uses a custom clustering method based on a Chinese restaurant process. Contigs are then reoriented using a greedy algorithm that optimises for global inter-contig distance. Contig groups showing close strand similarity following reorientation are merged.

For ordering contigs within a putative chromosome, ContiBAIT computes the strand distance between all pairs of contigs. The problem then becomes one of finding the lowest-weight Hamiltonian path over the contigs, which can be reformulated into a travelling salesman problem. ContiBAIT then finds the best ordering of contigs using the TSP package.

To validate contig clustering, we applied ContiBAIT to an early build of the mouse genome (mm2), with coordinates lifted over to mm10. ContiBAIT was able to assign most contigs with sufficient read depth for Strand-seq analysis to the correct chromosome (median F-measure = 0.91).

To validate contig ordering, we applied ContiBAIT to artificial contigs sampled from mm10, of sizes 1 Mb, 500 kb and 250 kb. Some chromosomes were well ordered (Pearson's rho = 0.99), while others had large sections that were locally well ordered but incorrectly ordered relative to each other.

OP14 (PT) - Novel brain-specific miRNA discovery using small RNA sequencing in post-mortem human brain
Theme: Genome Organization and Annotation / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: Wicklow Hall 2B

Presenting author: Christian Wake, United States

Session Chair:

Presentation Overview:
MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression mainly through translational repression of target mRNA molecules. More than 2,700 human miRNAs have been identified, and some are known to display tissue-specific patterns of expression. Here, we use high-throughput small RNA sequencing to discover novel and possibly brain-specific miRNAs in 94 human post-mortem prefrontal cortex samples from patients with Huntington's disease or Parkinson's disease and from individuals with normal neuropathology. Using a custom analysis pipeline, we identified 66 novel miRNA candidates that originate in both intergenic and intragenic regions of the genome. Twenty-one of the candidate miRNAs show sequence similarity to known mature miRNA sequences and may be novel members of known miRNA families, while the remaining 45 may constitute previously undiscovered miRNA families that are specific to the brain. Preliminary differential expression analysis between neurodegenerative disease and normal samples identified expression differences in a small number of these novel miRNAs. These results suggest that some of these novel miRNAs may be not only unique to the brain but also involved in neurodegenerative disease processes.

OP15 (PT) - Computationally efficient approach for novel transcript discovery across a large RNA-seq dataset reveals glioblastoma-associated lncRNAs
Theme: Genome Organization and Annotation / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: Wicklow Hall 2B

Presenting author: Maria Laaksonen, Finland

Session Chair:

Presentation Overview:
The availability of RNA-sequencing data from human tumors and normal tissues has led to the discovery of hundreds of tissue-specific transcripts. Uncovering novel transcripts typically requires computationally expensive de novo transcriptome assembly, and combining assemblies across samples has proven challenging. To search for new transcripts in large RNA-seq cohorts, we developed a computational approach that directly identifies unannotated genomic loci that are variably expressed within a sample set, or differentially expressed between two sample sets. These loci are then subjected to gene structure analysis, allowing full transcript structures to be identified in a data-driven manner. Our approach was validated by rediscovering a set of well-annotated genes: we were able to correctly rebuild known gene structures and identify the typical structural features of protein-coding genes even when only a single exon of the gene was given as input.

We applied our approach to RNA-seq data from 169 primary glioblastoma samples from The Cancer Genome Atlas (TCGA). We identified 53 unannotated transcripts that did not contain good-quality open reading frames, indicating that they were lncRNAs. The expression of 20 out of 22 high-confidence lncRNAs was validated by PCR in at least one glioblastoma cell line. Clinical association analyses in the TCGA glioma cohort revealed that the expression profiles of a subset of the lncRNAs are associated with patient survival, tumor grade and/or IDH1 mutation status. Functional analysis of lncRNA knockdowns was performed in glioblastoma cells to evaluate their significance in disease aggressiveness.

OP16 (PT) - Tau Protein Related Acetylation of Histone 3 Lysine 9 in the Human Brain
Theme: Epigenetics / Bioinformatics of Disease and Treatment
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: Wicklow Hall 2B

Presenting author: Hans-Ulrich Klein, United States

Session Chair:

Presentation Overview:
Tau protein and amyloid-β peptide accumulation in the brain are two hallmarks of Alzheimer’s Disease (AD). Recent studies suggest that epigenetic mechanisms are likely to play a key role in the pathogenesis of AD. Here, we profiled the active histone mark H3K9ac genome-wide by ChIP-seq in 669 post-mortem human brain samples to detect alterations of the epigenome induced by tau. RNA-seq was performed on 500 samples to assess the effect on transcription. We considered changes in local H3K9ac domains as well as in large genomic regions, and distinguished alterations primarily associated with tau from those associated with amyloid.

We identified 26,384 H3K9ac domains, which occurred primarily at promoters (15,225) and enhancers (8,071). H3K9ac levels at promoters were positively correlated with transcription, although H3K9ac alone was not sufficient for transcription. Tau protein load was significantly associated with H3K9ac levels in 5,980 domains, a much broader impact than that of amyloid (610 domains). Domains positively associated with tau showed strong enrichment (p < 10^-16) for binding sites of CTCF, which regulates chromatin structure. Indeed, we found large genomic regions showing concordant tau-associated increases in H3K9ac, and average transcription in these regions was consistently up-regulated. Strikingly, effect sizes within the regions were highly correlated with each region's proportion of open chromatin.

Our results demonstrate a genome-wide change in chromatin structure in AD that is mediated by tau. Tau is known to cause heterochromatin relaxation in Drosophila models, and CTCF could be a key factor in the pathogenic process of chromatin opening.

OP17 (PT) - Low concordance of differential DNA methylation analysis methods
Theme: Epigenetics / Sequence Analysis
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: Wicklow Hall 2B

Presenting author: Helen McCormick, Australia

Session Chair:

Presentation Overview:
DNA methylation is one of the most widely used markers for studying epigenetic contributions to phenotypic variation and disease. Several methods for analyzing genome-wide DNA methylation data are in common use, but their performance has not been rigorously evaluated. We systematically assessed and compared four packages, methylSig, methylKit, eDMR and DSS, using an empirical dataset of 12 reduced representation bisulphite sequencing libraries (6 test, 6 control). Surprisingly, we observed very low concordance among these commonly used model-based and binomial test-based approaches: using equivalent pre-processing and filtering parameters for each method, we found that the four methods identified significant differentially methylated cytosines with a concordance rate of less than 1%. Similarly low concordance was observed when identifying differentially methylated regions from tiled data. Our study highlights the need for systematic, simulation-based approaches to reliable differential methylation analysis. This concept of simulation will be discussed in the context of the growing use of epigenomic data in human medicine.
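
The abstract does not say how the concordance rate was computed. As a minimal sketch, assuming concordance is the fraction of cytosines called by every method among those called by any method (an intersection-over-union definition that is our assumption, not necessarily the authors'), the calculation could look like this; the method names and calls are hypothetical toy data:

```python
def concordance_rate(calls_by_method):
    """Fraction of sites called by all methods among sites called by any method.

    calls_by_method maps a method name to an iterable of site identifiers,
    e.g. (chromosome, position) tuples. Intersection over union is an
    ASSUMED definition of "concordance", used here for illustration only.
    """
    site_sets = [set(sites) for sites in calls_by_method.values()]
    if not site_sets:
        return 0.0
    union = set().union(*site_sets)
    if not union:
        return 0.0
    shared = set.intersection(*site_sets)
    return len(shared) / len(union)


# Hypothetical DMC calls from four methods (toy data, not from the study).
calls = {
    "method_a": {("chr1", 100), ("chr1", 250), ("chr2", 40)},
    "method_b": {("chr1", 100), ("chr2", 40)},
    "method_c": {("chr1", 100), ("chr3", 7)},
    "method_d": {("chr1", 100), ("chr1", 250)},
}
print(concordance_rate(calls))  # 1 shared site out of 4 distinct sites -> 0.25
```

Other definitions (pairwise overlaps, or overlap relative to the smallest call set) would give different numbers on the same calls, which is one reason reported concordance figures should state their definition.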

OP18 (PT) - Computational method for detecting patterns of epigenetic changes from time series ChIP-seq data
Theme: Epigenetics / Functional Genomics
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: Wicklow Hall 2B

Presenting author: Petko Fiziev, United States

Session Chair:

Presentation Overview:
Histone modifications are associated with important regulatory regions, such as promoters and distal enhancers, that control gene expression. Time-course genome-wide maps of these epigenetic marks have become available in a growing number of biological settings, including stem cell reprogramming and differentiation, adipogenesis, cardiac development, circadian rhythms, embryogenesis and lymphocyte development. However, our understanding of the underlying cellular processes remains limited, because current bioinformatics tools often fail to make full use of the temporal aspects of these data. Here, we present a novel computational method for the systematic detection of major classes of spatio-temporal patterns of epigenetic change. The method takes as input a series of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments for a single histone mark, performed at consecutive time points during a given biological process. It uses a probabilistic mixture model that explicitly models the spatio-temporal nature of the data to identify regions in which the mark expands or contracts significantly over time, or holds steady. Furthermore, it incorporates information from replicate experiments at each time point, which can increase its accuracy. We present applications of the method to publicly available data from T-cell development that help to elucidate the regulatory dynamics underlying this process.

OP19 (PT) - Human paralog genes share regulatory elements and co-localize in the three-dimensional chromatin architecture
Theme: Functional Genomics / Genome Organization and Annotation
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: Wicklow Hall 2B

Presenting author: Jonas Ibn-Salem, Germany

Session Chair:

Presentation Overview:
Paralog genes arise from gene duplication events during evolution. The resulting sequence similarity between paralogs often leads to proteins with similar structures and functions in common pathways. It might therefore be useful for the cell to co-regulate paralog genes. However, since paralogs often also have slightly different functions, for example through alternative domains, it might also be useful for cells to express only one of several paralogs for a specific function or response.
Eukaryotic genes are regulated by transcription factors binding to distal enhancer elements, which form looping interactions with the transcription machinery at gene promoters. We hypothesised that paralog genes share common regulatory mechanisms that allow both co-regulation and exclusive expression.

To test this hypothesis, we integrated paralogy annotations with genome-wide datasets of enhancer-promoter associations and with genome-wide chromatin interaction data from Hi-C experiments in human cells.

Using carefully sampled control datasets that take the linear co-localisation of paralogs into account, we show that paralog gene pairs share a significant number of common enhancer elements. Furthermore, they are located in the same topologically associating domain significantly more often than expected, and therefore cluster not only in the linear genome but also in the three-dimensional chromatin structure of the nucleus.

Together, our results indicate that human paralog gene pairs share common regulatory mechanisms. We will further integrate expression data from different tissues and functional annotations of genes to support our findings that paralog genes tend to be expressed either collectively or exclusively, depending on the cell's functional needs.

OP20 (PT) - Comprehensive analysis of association between heterogeneity and translation of 5’ leaders
Theme: Functional Genomics / Genome Organization and Annotation
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: Wicklow Hall 2B

Presenting author: Paul Korir, Ireland

Session Chair:

Presentation Overview:
There is overwhelming evidence for translation of upstream open reading frames (uORFs) in the 5’ leaders of many mammalian mRNAs. Translation of uORFs often inhibits translation of annotated coding ORFs (acORFs) and allows regulation in response to changes in cellular conditions. We hypothesised that 5’ leader heterogeneity among alternative transcripts (due to alternative transcription initiation and splicing) is associated with the synthesis of mRNAs that encode the same protein product but are regulated differently, depending on the uORF organization of their 5’ leaders.

To explore the relationship between translation of 5’ leaders and their heterogeneity, we carried out bioinformatic analyses of publicly available ribosome footprinting data. Our analyses involve identifying high-confidence translated regions and then estimating various facets of 5’ leader heterogeneity across alternative transcripts. We devised a simple peak-calling method for ribosome footprints in 5’ leaders, treating such peaks as a proxy for 5’ leader translation. We defined heterogeneity over the set of transcript isoforms associated with a pair of translation termination sites for non-overlapping genes. We reasoned that this approach would emphasise the effect of heterogeneity because such transcripts differ only in their mRNA leader regions, which confer the bulk of regulatory activity. We examined the effect on translation of several key aspects of heterogeneity, such as alternative initiation and/or splicing, mean leader length across isoforms, sequence content (uAUGs, GC content, regulatory motifs such as terminal oligopyrimidine (TOP) tracts, codon bias) and secondary structure. Finally, we performed functional analyses on extreme cases of heterogeneity to identify enriched gene categories.

OP21 (PT) - A systems biology characterization of the anti-cancer compound Vorinostat.
Theme: Systems Biology and Networks / Proteomics
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: Wicklow Hall 2B

Presenting author: Christopher Woelk

Session Chair:

Presentation Overview:
Vorinostat is a histone deacetylase inhibitor (HDACi) used to treat refractory cutaneous T-cell lymphoma (CTCL) and is being investigated as a component of “shock and kill” strategies to cure HIV. By inhibiting deacetylation, Vorinostat leads to the acetylation of histones and the relaxation of chromatin. However, little is known about its other mechanisms of action or its off-target effects. Therefore, the effects of Vorinostat on primary CD4 T cells were evaluated using a systems biology approach. Cells were isolated from 10 healthy donors and either treated with 1 µM Vorinostat for 24 hours or left untreated. Protein extracts from 4 donors were subjected to iTRAQ labeling and characterized by two-dimensional liquid chromatography-mass spectrometry quantitative proteomics. RNA was isolated from 6 donors and subjected to transcriptomic analysis (Illumina HT12 v4 microarrays). Differentially expressed genes (DEGs) and proteins (DEPs), as well as differentially expressed phosphorylated (DPPs) and acetylated (DAPs) proteins, were identified using Limma. Data integration was primarily achieved by using all four data types to construct a single protein interaction network. The addition of proteomic data revealed a much more detailed protein interaction network, including many nodes regulated not at the transcriptional level but at the post-translational level. In addition, HMGA1 was differentially expressed at the transcript, protein, and acetylated protein levels. This protein is of particular interest because it may repress transcription from the HIV promoter and thus limit the effectiveness of Vorinostat in HIV cure strategies.

OP22 (PT) - A high-resolution gene expression atlas of epistasis between gene-specific transcription factors reveals new mechanisms for genetic interactions
Theme: Functional Genomics / Systems Biology and Networks
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: Wicklow Hall 2B

Presenting author: Patrick Kemmeren, Netherlands

Session Chair:

Presentation Overview:
Recent studies have systematically exposed large numbers of non-additive genetic interactions, most of which are functionally uncharacterized. To investigate such genetic interactions between gene-specific transcription factors (GSTFs) in Saccharomyces cerevisiae, we systematically analysed 72 GSTF pairs by DNA microarray analysis of double and single deletion mutants. These pairs were selected on the basis of previously published growth-based genetic interactions as well as similarity in DNA binding properties. The result is a high-resolution atlas of gene expression-based genetic interactions that provides systems-level insight into GSTF epistasis. The atlas confirms known genetic interactions and exposes new ones. Importantly, the data can be used to elucidate the mechanisms that underlie individual genetic interactions. Evidence is provided for two previously uncharacterized mechanisms, "Buffering by induced dependency" and "Alleviation by derepression". These mechanisms demonstrate how negative genetic interactions can occur between seemingly unrelated pathways and how positive genetic interactions can indirectly expose parallel- rather than same-pathway relationships. The study provides general insights into the complex nature of epistasis and results in new models for genetic interactions, most of which do not fall into easily recognizable within- or between-pathway relationships.

OP23 (PT) - A novel approach to identify highly connected and differentially expressed subnetworks reveals underlying biological processes in endometrial cancer metastasis
Theme: Functional Genomics / Systems Biology and Networks
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: Wicklow Hall 2B

Presenting author: Kanthida Kusonmano, Norway

Session Chair:

Presentation Overview:
Differential expression analyses based on high-throughput data have been used to study molecular changes between phenotypes of interest. Beyond the typical analysis of deriving a ranked list of individual genes that differ significantly between the studied conditions, several methods have been developed to identify differentially expressed genes as a set to facilitate functional interpretation. One main approach is gene set analysis, which evaluates the functional enrichment of differentially expressed genes based on publicly available gene sets. Network analysis, meanwhile, tries to identify functional modules of interacting genes from the data under study. Here we present a novel approach that combines features of both gene set and network analyses by detecting subnetworks based on internal relations in the studied data and assessing their differential expression with a well-known gene set method, Gene Set Enrichment Analysis (GSEA). The subnetworks are derived by integrating a priori gene-gene interactions (here, protein-protein interactions) with expression correlations. We demonstrate our approach on endometrial cancer data, comparing aggressive primary tumors with metastases. The detected differentially expressed subnetworks provide biological insights into the metastatic setting and display interesting expression trends with tumor aggressiveness. A few subnetworks are also significantly linked to patient disease-specific survival. The study provides new findings in the metastatic context that merit further follow-up studies.
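
GSEA scores a gene set (here, a subnetwork) with a running-sum enrichment score over a ranked gene list: the sum steps up at genes in the set, weighted by their ranking metric, and steps down at genes outside it. A minimal sketch of that weighted score with exponent p = 1, omitting GSEA's permutation-based significance testing; the gene names and scores are hypothetical, and this is not the authors' subnetwork pipeline:

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes: genes sorted from most up- to most down-regulated.
    scores: the ranking metric for each gene, in the same order.
    gene_set: genes in the (sub)network being tested.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("set needs non-zero-scored genes and must not cover the whole list")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_weight  # step up on a set member
        else:
            running -= 1.0 / n_miss  # step down on a non-member
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and subnetwork.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [2.5, 1.8, 0.9, -0.4, -1.2, -2.0]
print(round(enrichment_score(ranked, metric, {"G1", "G2"}), 6))  # 1.0: set at the top
```

A set concentrated at the bottom of the ranking yields a negative score, which is how GSEA distinguishes up- from down-regulated sets.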

OP24 (PT) - Rapamycin treatment of normal human fibroblasts increases the transcriptional abundance of genes involved in cytokine-cytokine receptor signaling
Theme: Systems Biology and Networks / Functional Genomics
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: Wicklow Hall 2B

Presenting author: Kimberly MacKay, Canada

Session Chair:

Presentation Overview:
Background: Rapamycin is an immunosuppressant drug that is currently used to prevent transplant organ rejection. It is also being investigated as a potential therapy for many other diseases. Its effects on cytoplasmic and genomic function have been extensively studied in model organisms. However, it is unclear what effect rapamycin has on gene expression in normal human primary cells.

Objective: To determine the global impact rapamycin has on gene expression in normal human fibroblasts.

Methods: RNA-seq was performed on proliferative and rapamycin-treated human fibroblasts. SeqMonk was used to calculate the fold-change difference in transcriptional abundance by comparing the read counts of the two datasets. Using Cytoscape and ReactomeFI, a protein interaction network was constructed from the genes that showed at least a 5-fold change in transcriptional abundance. The resulting network was annotated using biological process, molecular function and cellular component terms from the Gene Ontology Consortium, as well as pathway annotation terms from the Kyoto Encyclopedia of Genes and Genomes.

Conclusions: Rapamycin treatment of normal human fibroblasts resulted in 537 genes showing a 5-fold or greater change in transcriptional abundance. The network analysis revealed significant enrichment for genes associated with PI3K-AKT signaling, linking our observations to rapamycin’s established cytoplasmic target. The most significant pathway annotation was cytokine-cytokine receptor interaction, with many of these genes belonging to the Interleukin-6 signaling pathway. Prolonged exposure to rapamycin and the production of cytokines such as Interleukin-6 could produce sufficient cellular stress to drive normal human primary cells into senescence.

OP25 (PT) - DNA methylation-dependent transcription regulatory networks elucidate dynamics of transcription regulatory circuitry in cancers
Theme: Systems Biology and Networks / Epigenetics
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Wicklow Hall 2B

Presenting author: Xuerui Yang, China

Session Chair:

Presentation Overview:
Context-dependent DNA methylation plays a critical role in regulating gene transcription, and thereby serves as an important epigenetic marker or regulator in many biological processes and in complex diseases such as cancer. However, DNA methylation has rarely been taken into account as a significant factor in de novo reconstructions of cancer type-specific transcription regulatory networks. The present study set out to systematically assess the involvement of DNA methylation in transcription regulatory circuitry in cancer. We took advantage of the multi-dimensional profiling data on DNA methylation and gene expression in tumors of different cancer types from The Cancer Genome Atlas consortium, and developed an integrative analysis pipeline based on conditional mutual information to quantify the cooperative regulatory effects of CpG site methylation and transcription factor activity on gene expression. Our genome-wide analysis shows that DNA methylation and transcription factors indeed cooperate to control gene expression. To map the interplay between these two major determinants of gene expression, a DNA Methylation-dependent Transcription Regulatory Network (MeTRN), the first of its kind, was assembled for each of 19 major cancer types and broadly validated using public ChIP-seq and DNaseI-seq data. Comparison of these networks across cancer types showed that the context-specificity of transcriptional circuits can be largely attributed to the context-dependent nature of DNA methylation patterns. In summary, MeTRN recapitulates an epigenetic scheme that implements the dynamics of transcription regulatory circuitry across cancers via context-dependent DNA methylation marks, and thereby serves as a new basis for further mechanistic studies of gene expression dysregulation in cancers.
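
The abstract names conditional mutual information (CMI) as the basis of the pipeline but does not give the estimator. As a hedged sketch, the plug-in estimate of I(X; Y | Z) = sum over (x, y, z) of p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))] for already-discretized samples is shown below; reading X as binned transcription factor activity, Y as binned target gene expression and Z as binned CpG methylation is our illustrative assumption, as is the discretization itself:

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete samples.

    I(X;Y|Z) = sum p(x,y,z) * log( p(z) p(x,y,z) / (p(x,z) p(y,z)) ).
    """
    n = len(x)
    if n == 0 or not (n == len(y) == len(z)):
        raise ValueError("x, y and z must be non-empty and of equal length")
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), count in c_xyz.items():
        # Raw counts replace probabilities inside the log: the 1/n factors cancel.
        ratio = (c_z[zi] * count) / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (count / n) * math.log(ratio)
    return cmi


# Toy example: within each methylation state z, TF state x fully determines y.
x = [0, 1, 0, 1]
y = [0, 1, 0, 1]
z = [0, 0, 1, 1]
print(round(conditional_mutual_information(x, y, z), 4))  # 0.6931, i.e. log(2)
```

Plug-in estimates are biased upward for small samples, so in practice they are usually paired with permutation-based significance testing.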

OP26 (PT) - Next generation sequencing of human tumor xenografts is significantly improved by prior depletion of mouse cells
Theme: Bioinformatics of Disease and Treatment / Genetic Variation Analysis
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Wicklow Hall 2B

Presenting author: Stefan Tomiuk, Germany

Session Chair:

Presentation Overview:
Human tumor xenografts represent the gold standard method for research areas such as drug discovery, cancer stem cell biology, and metastasis prediction.
During the growth phase in vivo, xenografted tissue is vascularized and infiltrated by cells of mouse origin. Consequently, mouse-derived reads can be expected to have a strong impact on downstream NGS analyses.
To overcome these limitations, we developed a fast and easy method (MCD) for the comprehensive depletion of mouse cells using automated tissue dissociation and magnetic cell sorting (MACS). We performed whole exome sequencing of bulk human tumor xenografts from lung, bladder, and kidney cancer, and compared the results with samples depleted of mouse cells. A significant increase in read counts (33%) was observed after MCD, indicating improved sample quality.
We mapped the reads of all samples against the human and mouse genomes and determined their putative origin. On average, 12% of reads from non-depleted samples were assigned to mouse cells; MCD reduced this to 0.3%.
MCD had a strong impact on SNP calling: 63 ± 10% of all SNPs predicted for the non-depleted samples could no longer be detected after MCD, while 18 ± 1% were specific to the depleted xenograft samples, probably due to higher coverage.
Taken together, MCD significantly improves the analysis of human tumor xenografts by NGS. Because this effect was observed even though human-sequence-specific selection had been carried out during exome enrichment, the influence on whole-genome and transcriptome sequencing is expected to be even more pronounced.

OP27 (PT) - The Developmental Transcriptome for Lytechinus variegatus
Theme: Systems Biology and Networks / Sequence Analysis
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Wicklow Hall 2B

Presenting author: Emily Speranza, United States

Session Chair:

Presentation Overview:
Embryonic development is arguably the most complex process an organism undergoes during its lifetime, and is best approached from a systems-level perspective. The sea urchin has become a valuable model organism for understanding developmental specification, morphogenesis, and evolution. As a non-chordate deuterostome, the sea urchin occupies an important evolutionary niche between protostomes and vertebrates. Lytechinus variegatus (Lv) is an Atlantic Ocean species that has been studied for many years and has provided important insights into signal transduction, patterning, and morphogenetic changes during embryonic/larval development. The Pacific Ocean species Strongylocentrotus purpuratus (Sp) is well studied, particularly for gene regulatory networks and cis-regulatory analyses. A well-annotated genome and transcriptome are available for Sp, but similar resources have not been developed for Lv. Here, we provide an analysis of the Lv transcriptome at 11 time points during embryonic/larval development. Based on the expression of a conserved set of genes, we find that the late pluteus larval stage most closely matches the phylotypic vertebrate pharyngula stage, suggesting that conservation of this temporal gene expression pattern predates the appearance of the chordates. Using principal component analysis, we show that the major transitions in variation of embryonic transcription divide the developmental time series into four temporally sequential groups, a result corroborated by k-means cluster analysis, specification network analysis, and metabolic network analysis. Together, these analyses indicate that sea urchin development includes sequential intervals of relatively stable gene expression states punctuated by more abrupt transitions.

OP28 (PT) - Protein interaction interfaces and genetic variation
Theme: Protein Structure and function Prediction and Analysis / Genetic Variation Analysis
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2B

Session Chair:

Presentation Overview:
There are currently more than 62 million known single nucleotide polymorphisms (SNPs), and this number is doubling every two years, driven by the falling cost of sequencing. Although many methods have been developed to predict the effect of non-synonymous SNPs on biological function and disease, few have focused on SNPs at protein-protein and protein-ligand interaction interfaces. Interfaces are essential sites for protein function and adaptation, and are key to most biological processes. The effects of non-disease intra- and inter-species variation occurring at such interaction surfaces remain largely unexplored. The availability of over 105,000 protein three-dimensional structures allows the structural context of many SNPs at interfaces to be examined in atomic detail. Here, we present ProIntVar, a computational framework for mapping SNPs onto structures in order to study the features of variation at protein-protein and protein-ligand interfaces. ProIntVar enables the systematic analysis of genetic variation at protein interaction surfaces by integrating structural and sequencing data from several biological databases and resources. Genetic variants are analyzed in the context of functional families (FunFams), which are derived from structurally and functionally related protein domains classified in CATH (Class, Architecture, Topology, Homology). Examining variation at protein interaction interfaces helps to infer which residues are key to the function of the interface in a broader evolutionary sense. This approach has the potential to identify correlated adaptation, disease susceptibility and non-specific protein-drug interactions in the human population that are due to sequence variation.

OP29 (PT) - Length-independent canonical forms of antibody Complementarity Determining Regions
Theme: Protein Structure and function Prediction and Analysis / other
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2B

Presenting author: Jaroslaw Nowak

Session Chair:

Presentation Overview:
Antibodies are Y-shaped proteins used by the immune system to bind and potentially neutralize foreign objects (antigens) that have entered the body. The antigen-combining site of an antibody consists primarily of six hypervariable loops (L1-L3, H1-H3), known as the Complementarity Determining Regions (CDRs), which together determine the antibody's binding properties. Five of the six CDRs (L1, L2, L3, H1, H2) adopt only a small number of discrete conformations called canonical classes. Previous work in this area assumes that CDRs of different lengths should, by default, belong to different classes. We exploited dynamic time warping, an algorithm originally designed for comparing temporal sequences that vary in speed, to measure the similarity between loops of different lengths, and used density-based clustering to classify CDRs into length-independent canonical classes. Length independence allows us to cluster a larger number of CDRs into a smaller number of classes than the length-dependent approach, and it also improves the accuracy of canonical class prediction from sequence. We have also found that co-clustered CDRs of different lengths tend to show similar sequence patterns, even when they are encoded by genes from different subgroups, pointing to greater functional redundancy in the immune loci than previously known.
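
Dynamic time warping compares two sequences of different lengths by letting an element of either sequence align to several consecutive elements of the other. A minimal sketch, assuming each CDR loop is represented as a list of backbone (phi, psi) dihedral angles in degrees and using a summed angular difference as the per-residue cost; both choices are our assumptions, since the abstract does not specify the loop representation or distance:

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def residue_cost(r1, r2):
    """Per-residue cost: summed angular difference of two (phi, psi) pairs."""
    return angle_diff(r1[0], r2[0]) + angle_diff(r1[1], r2[1])


def dtw_distance(loop_a, loop_b):
    """Classic O(n*m) dynamic time warping distance between two loops."""
    n, m = len(loop_a), len(loop_b)
    # D[i][j] = minimal cumulative cost of aligning loop_a[:i] with loop_b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = residue_cost(loop_a[i - 1], loop_b[j - 1])
            # Diagonal step = match; vertical/horizontal steps repeat a residue.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


# Hypothetical loops: the longer one repeats its first residue conformation.
short_loop = [(-60.0, -45.0), (-120.0, 130.0)]
long_loop = [(-60.0, -45.0), (-60.0, -45.0), (-120.0, 130.0)]
print(dtw_distance(short_loop, long_loop))  # 0.0: the extra residue is absorbed by warping
```

A pairwise DTW distance matrix computed this way could then be passed to a density-based clustering method that accepts precomputed distances.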

OP30 (PT) - Determining the winning SH3 coalition: how cooperative game theory reveals the importance of domain residues in peptide binding
Theme: Protein Structure and function Prediction and Analysis / Proteomics
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2B

Presenting author: Ashley Conard, United States

Session Chair:

Presentation Overview:
Cell signaling relies on protein-protein and protein-peptide interactions involving signaling domains, which typically recognize specific peptide motifs. For instance, SH3 domains preferentially bind proline-rich amino-acid motifs. Phage-display experiments allow one to determine those motifs and whether surface or core domain mutants gain or lose preference for particular peptide motifs. Here, we present an approach that uses the Shapley Value (SV) from Cooperative Game Theory to determine the importance of seven residues in the hydrophobic core of the Fyn SH3 domain. The core positions and the residues in those positions are the players of a cooperative game in which the worth of each coalition is measured by its capacity to discriminate binding from non-binding mutants for certain classes of peptides. The players (positions or residues) can be seen as the features of SH3 mutants in a binary classification task. Essentially, we use a feature selection method based on the SV to assign a pay-off to each core position and residue. We quantify their importance in promoting peptide binding, as well as their joint effects and interactions, which we represent as networks. Our results suggest that the Fyn SH3 domain must contain different signatures of amino acids to promote binding to different peptide classes. This analysis highlights residue importance for proper domain function, which helps to scale conservation profiles (e.g. WebLogo) by adding functionally relevant properties. Together, these results provide a novel and effective approach to understanding the role that core residues, alongside the commonly investigated binding-site residues, play in binding specific peptides.
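
The Shapley value of player i averages its marginal contribution over all coalitions S that exclude it, weighting each by |S|!(n - |S| - 1)!/n!. With seven core positions there are only 2^7 = 128 coalitions, so exact computation is cheap. A minimal sketch; the value function below is a hypothetical toy stand-in, whereas in the study a coalition's worth is its ability to discriminate binding from non-binding mutants:

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition without each player.

    value maps a frozenset of players to the worth of that coalition.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        phi[p] = total
    return phi


# Hypothetical toy game: a coalition "discriminates" only if it contains P1 and P2.
def toy_value(coalition):
    return 1.0 if {"P1", "P2"} <= coalition else 0.0


phi = shapley_values(["P1", "P2", "P3"], toy_value)
print({p: round(v, 3) for p, v in phi.items()})  # {'P1': 0.5, 'P2': 0.5, 'P3': 0.0}
```

The values sum to the worth of the full coalition (efficiency), and a player that never changes any coalition's worth, like P3 here, receives zero.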

OP31 (PT) - Detecting small structural variants with SoftSV using soft-clipping information
Theme: Sequence Analysis / Genetic Variation Analysis
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2B

Presenting author: Christoph Bartenhagen, Germany

Session Chair:

Presentation Overview:
Numerous tools for detecting structural variations (SVs) have been developed in recent years, including our own contribution, SoftSV. However, a gap remains between small indels, which can be detected by gapped alignments, and large SVs (many hundreds or thousands of bp), which can be reconstructed from paired-end reads or read-depth information. Filling this gap remains difficult and often demands special algorithms for split-read alignment directly at the breakpoints, which only a few of the published tools implement for this range of SVs.

We initially developed SoftSV for large SVs and have now extended our approach to small and medium-sized deletions, tandem duplications and inversions (starting at 20 bp). As with large rearrangements, we detect their exact breakpoints on the premise that no threshold filters out SVs with low support or reads with low mapping quality or ambiguous mappings. Our greedy approach exploits any kind of soft-clipped alignment and reconstructs the breakpoint sequence simply by comparing the soft-clipped reads at the start and end of an SV.

Using simulated data and four real datasets from the 1000 Genomes Project, we evaluated the sensitivity and precision of SoftSV and four other tools. Our results show that sensitive and reliable SV detection depends on many factors, such as read length, coverage and SV type. On simulated datasets, SoftSV consistently achieved sensitivities and PPVs between 80% and 100% for all SV types from 75 bp reads and 10-15x sequence coverage upwards, without requiring any parameter configuration by the user.

SoftSV is freely available at http://sourceforge.net/projects/softsv.
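
Soft-clipped bases are recorded with the S operation in a SAM CIGAR string. As a hedged illustration of the kind of evidence SoftSV builds on (not SoftSV's actual code), the sketch below extracts leading and trailing soft-clip lengths and reports the reference coordinate at which each clip starts, a candidate breakpoint; the function names are hypothetical:

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def soft_clips(cigar):
    """Return (leading, trailing) soft-clip lengths; hard clips are ignored."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def clip_positions(pos, cigar):
    """Candidate breakpoints (1-based reference coordinates) from one read.

    pos is the SAM POS field: the first aligned reference base. A leading
    clip reports pos itself; a trailing clip reports the last aligned base.
    """
    leading, trailing = soft_clips(cigar)
    ref_span = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    breakpoints = []
    if leading:
        breakpoints.append(pos)
    if trailing:
        breakpoints.append(pos + ref_span - 1)
    return breakpoints


print(clip_positions(1000, "70M30S"))            # [1069]
print(clip_positions(500, "5H10S40M2D30M20S"))   # [500, 571]
```

Reads clipped at the same coordinate can then be grouped, and their clipped sequences compared, to assemble the sequence across a breakpoint.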

OP32 (PT) - Using reference-free compressed data structures to analyse thousands of human genomes
Theme: Sequence Analysis / Population Genetics, Variation and Evolution
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2B

Presenting author: Thomas Keane

Session Chair:

Presentation Overview:
We are rapidly approaching the point where we have sequenced the genomes of hundreds of thousands of human individuals. The scale-up of human population sequencing has enabled us to detect sequence variants down to extremely low minor allele frequencies, explore ancient human lineages, and use genomics to screen for disease-causing mutations. The Burrows-Wheeler transform (BWT) and FM-index have been widely employed as highly compressed, searchable data structures by read aligners and for de novo assembly. We explored the use of BWTs to store and compress the raw sequencing reads of 2535 individuals from 26 human populations in the 1000 Genomes Project. We show that it is possible to achieve compression of 0.09 bytes per bp (including sample metadata), far better than any existing sequencing data format. A key feature of this population BWT is that, as more individuals are added to the structure, more identical read sequences are observed and compression becomes ever more efficient. BWTs are inherently reference-free, so all the raw sequencing data can be rapidly queried for non-reference haplotypes and viral integrations. We use the BWT to assess the support in the raw data for the predicted 1000 Genomes haplotypes, to investigate population support along different versions of the human reference genome, and to evaluate sequence specific to different versions of the reference, with and without support in the population. We develop methods to derive accurate reference-free genotypes for both single-base variants and short indels.
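
The compression gain from repeated reads comes from the BWT placing characters that precede identical contexts next to each other, producing long runs that run-length encoding stores cheaply. A naive sketch for illustration only (sorting all rotations is quadratic in memory and nothing like how population-scale BWTs are actually built), computing the BWT of a short string terminated by a '$' sentinel, its run-length encoding, and the inverse transform:

```python
from itertools import groupby


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; text must end with one '$'."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must contain exactly one '$', at the end")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def run_lengths(s):
    """Run-length encode a string as (character, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def inverse_bwt(transformed):
    """Rebuild the sorted rotation table column by column; return the row ending in '$'."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    return next(row for row in table if row.endswith("$"))


print(bwt("banana$"))                     # annb$aa
print(run_lengths(bwt("ACGACGACG$")))     # [('G', 3), ('$', 1), ('A', 3), ('C', 3)]
print(inverse_bwt("annb$aa"))             # banana$
```

Three copies of the toy "read" ACG collapse into four runs; with thousands of genomes, repeated reads lengthen runs rather than adding new ones. For scale, 0.09 bytes per bp is 0.72 bits per base, compared with 2 bits per base for plain 2-bit nucleotide packing.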

OP33 (PT) - Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations
Theme: Sequence Analysis / Population Genetics, Variation and Evolution
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2B

Presenting author: Sergio Pulido Tamayo, Belgium

Session Chair:

Presentation Overview:
Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides the information needed to reconstruct these haplotypes and the frequencies at which they occur. However, this reconstruction is technically non-trivial, especially in clonal systems with a relatively low mutation frequency. The low number of segregating sites in such systems adds ambiguity to haplotype phasing and thus hampers the reconstruction of genome-wide haplotypes from sequence overlap information alone.

We therefore present EVORhA, a haplotype reconstruction method that complements the phasing information in non-empty read overlaps with the frequency estimates of inferred local haplotypes. Simulations show that, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, this additional frequency information still allows EVORhA to reliably reconstruct genome-wide haplotypes. On real data, we show that the method can reconstruct the population composition of evolved bacterial populations and decompose mixed bacterial infections in clinical samples.
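The intuition that local haplotypes in neighbouring windows can be linked through their estimated frequencies, even when no read spans both windows, can be illustrated with a deliberately simplified rank-matching rule. This is not EVORhA's algorithm; the function name, the equal-count assumption and the example frequencies are hypothetical.

```python
def link_by_frequency(left, right):
    """Pair local haplotypes of two adjacent windows by frequency rank.

    left/right map a local haplotype string to its estimated frequency.
    Toy assumption: both windows contain the same number of haplotypes, so the
    k-th most frequent haplotype on the left is linked to the k-th on the right.
    Returns (left_hap, right_hap, frequency_difference) triples.
    """
    if len(left) != len(right):
        raise ValueError("toy model assumes equal haplotype counts per window")

    def by_frequency(window):
        return sorted(window.items(), key=lambda kv: kv[1], reverse=True)

    return [(lh, rh, abs(lf - rf))
            for (lh, lf), (rh, rf) in zip(by_frequency(left), by_frequency(right))]


window1 = {"AC": 0.60, "GT": 0.30, "AT": 0.10}  # hypothetical local haplotypes
window2 = {"TT": 0.32, "CA": 0.58, "GG": 0.10}
for left_hap, right_hap, diff in link_by_frequency(window1, window2):
    print(left_hap, "->", right_hap, f"(|delta f| = {diff:.2f})")
```

A large frequency difference flags a doubtful link; a practical method would combine this signal with read-overlap evidence and handle windows with different numbers of haplotypes.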

OP34 (PT) - Simple Rapid RNA-seq Analysis with Unique Gapped q-Grams
Theme: Sequence Analysis / Functional Genomics
Date: Sunday, July 12, 4:10 pm - 4:40 pm
Room: Wicklow Hall 2B

Presenting author: Sven Rahmann, Germany

Session Chair:

Presentation Overview:
We present a new simple approach to RNA-seq gene expression analysis that avoids separate read mapping and feature counting by constructing an index with the following property:
Each gene (exons and exon junctions) is represented by its q-grams (substrings of length q, e.g. q=16), or, more generally, by gapped q-grams with a given shape.
These sets of q-grams are reduced to gene-specific ones, i.e., all q-grams that occur in more than one gene are discarded.
Now each of the 4^q possible q-grams is either not present, specific for a gene, or present in more than one gene.
We build an index that recognises the specific q-grams and maps them to their respective genes.
Optimisation of the q-gram mask results in high sensitivity and specificity, as we show with several examples.

We iterate over a read's q-grams and count the number of hits to each gene. Careful analysis of these counts makes it possible to pick the correct gene (or to declare the read unmappable or ambiguous) at unprecedented speed.
We thus obtain raw gene counts in a much simpler and computationally less demanding way than with standard approaches.
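The indexing and counting steps described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: it uses a Python dictionary instead of an optimized index, a made-up shape `##-##` (four care positions over a span of five) instead of an optimized q=16 mask, and a hypothetical "clear winner by at least `min_margin` hits" rule in place of the authors' analysis.

```python
from collections import Counter


def qgrams(seq, shape):
    """Yield gapped q-grams: keep positions where shape has '#', skip '-'."""
    care = [i for i, s in enumerate(shape) if s == "#"]
    for start in range(len(seq) - len(shape) + 1):
        yield "".join(seq[start + i] for i in care)


def build_index(genes, shape):
    """Map each gene-specific q-gram to its gene; drop q-grams found in >1 gene."""
    owner, shared = {}, set()
    for gene, seq in genes.items():
        for g in set(qgrams(seq, shape)):
            if g in owner:
                shared.add(g)  # already seen in an earlier gene: not specific
            else:
                owner[g] = gene
    return {g: gene for g, gene in owner.items() if g not in shared}


def assign_read(read, index, shape, min_margin=1):
    """Return the best-supported gene, or None if unmappable or ambiguous."""
    hits = Counter(index[g] for g in qgrams(read, shape) if g in index)
    if not hits:
        return None  # no gene-specific q-gram: unmappable
    ranked = hits.most_common(2)
    if len(ranked) == 2 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # no clear winner: ambiguous
    return ranked[0][0]


genes = {"geneA": "ACGTACGGTCAG", "geneB": "TTGCAGGCATTC"}  # hypothetical genes
index = build_index(genes, "##-##")
print(assign_read("CGTACGGT", index, "##-##"))
```

Raw gene counts are then simply the tally of `assign_read` results over all reads.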

Further analysis (e.g., differential expression, implicated pathways) can proceed as before.
The poster compares the running times and gene counts obtained with our method and with standard methods, showing that we achieve equivalent results with much less computational work.

We also outline possible extensions of the approach, including variant-tolerance and fusion gene detection.

OP35 (PT) - Nomenclature of the olfactory receptor gene family
Theme: Comparative Genomics / Sequence Analysis
Date: Sunday, July 12, 4:10 pm - 4:40 pm
Room: Wicklow Hall 2B

Presenting author: Tsviya Olender, Israel

Session Chair:

Presentation Overview:
Olfactory receptors (ORs) are G protein-coupled receptors with a crucial role in odor detection. A typical mammalian genome contains ~1,000 OR genes and pseudogenes; however, the number of functional ORs varies among species, reflecting their adaptation to different environments, a process that involves gene duplication and deletion events. While the current human OR nomenclature is based on sequence similarity classification, no such nomenclature has yet been adopted for other mammals, which conceals important structural and functional insights. The difficulty stems from the complex orthology relationships among ORs. We developed the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier that assigns a human-based nomenclature to any OR gene by detecting hierarchical similarity relationships between any two species. We used the MMS algorithm to compare the mouse and rat OR repertoires to those of human, dog and opossum, and assigned a symbol to each rodent gene. In mouse, 31% of the assigned symbols were identical to human symbols, reflecting orthology. A further 63% of the symbols were classified into predefined OR subfamilies; the remainder (6%) were classified into novel OR subfamilies. In rat, 86% of the genes were assigned the same symbol as their mouse ortholog. The proposed nomenclature was further supported by synteny and phylogenetic analyses. Using symbol comparison alone, we identified species-specific expansions in mouse, rat and human, demonstrating the power of this unified nomenclature system as a framework for studying mammalian OR evolution. The nomenclature will be extended to other mammals in due course.
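The "mutual maximum" idea behind MMS resembles the familiar reciprocal-best-hit criterion. The sketch below shows only that first pairing step on a toy similarity table (gene names and scores are hypothetical); it does not reproduce the hierarchical, subfamily-level assignment that MMS performs for genes left unpaired.

```python
def mutual_maximum_pairs(sim):
    """Return (a, b) pairs in which each gene is the other's highest-scoring partner.

    sim maps (gene_from_species_1, gene_from_species_2) to a similarity score.
    Ties are resolved in favour of the first pair encountered.
    """
    best_for_a, best_for_b = {}, {}
    for (a, b), score in sim.items():
        if a not in best_for_a or score > sim[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or score > sim[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


similarity = {  # hypothetical human (h*) vs mouse (m*) OR similarity scores
    ("h1", "m1"): 0.9, ("h1", "m2"): 0.5,
    ("h2", "m1"): 0.8, ("h2", "m2"): 0.7,
    ("h3", "m3"): 0.6,
}
print(mutual_maximum_pairs(similarity))
```

Here h2 and m2 stay unpaired because m1 prefers h1; such genes are the ones a hierarchical scheme would go on to place into subfamilies.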

OP36 (PT) - Site-specific evolution of selected post-synaptic protein complexes
Theme: Sequence Analysis / Systems Biology and Networks
Date: Sunday, July 12, 4:10 pm - 4:40 pm
Room: Wicklow Hall 2B

Presenting author: Maciej Pajak

Session Chair:

Presentation Overview:
Sequence conservation analysis of proteins belonging to the post-synaptic proteome (PSP) has previously revealed that key synaptic protein classes are present in primitive organisms that preceded the emergence of nervous systems.
Recent studies suggest that evolution of the PSP may be responsible for the emergence of complex neural system function and behaviour, but these analyses assess evolution only at the whole-protein level.

We have developed an analysis workflow that integrates codon-resolution selection pressure estimates with domain and motif data to allow refinement of our understanding of domain-centric functionalisation of the PSP.

We show the application of this workflow to the activity-regulated cytoskeleton-associated protein (Arc) complex, a set of 26 Arc-interacting proteins. Arc is highly conserved among placental mammals and plays a significant role in the post-synaptic density as a major regulator of long-term synaptic plasticity, the presumed molecular correlate of learning and memory.

Maximum likelihood phylogenetic inference for proteins of the Arc interactome, followed by site-by-site selection pressure analysis using a fixed effects likelihood (FEL) method, reveals a small set of positively selected sites as well as many regions under strong negative selection. Mapping these sites onto known and predicted binding domains and post-translational modification sites allows inference of key domain-level functionalisation events during Arc complex evolution and provides a rational basis for prioritising regions for functional studies.
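Once per-site selection estimates are available, the mapping step reduces to classifying codon sites and intersecting them with domain coordinates. The sketch assumes per-site (dN, dS, p-value) triples, as an FEL-style analysis reports, and a significance level of 0.1; the input format, threshold, site values and domain names are hypothetical and not taken from the Arc study.

```python
def classify_sites(site_stats, alpha=0.1):
    """Label codon sites from FEL-style estimates.

    site_stats maps a 1-based codon position to (dN, dS, p_value). Sites with
    p_value < alpha are called positive (dN > dS) or negative (dN < dS);
    all others are left undetermined.
    """
    labels = {}
    for pos, (dn, ds, p) in site_stats.items():
        if p >= alpha or dn == ds:
            labels[pos] = "undetermined"
        elif dn > ds:
            labels[pos] = "positive"
        else:
            labels[pos] = "negative"
    return labels


def sites_in_domains(labels, domains, label="positive"):
    """Return, per domain, the sorted sites carrying the given label.

    domains maps a domain name to inclusive (start, end) codon coordinates.
    """
    return {name: sorted(p for p, lab in labels.items() if lab == label and start <= p <= end)
            for name, (start, end) in domains.items()}


stats = {12: (2.1, 0.3, 0.02), 40: (0.1, 1.5, 0.001), 41: (0.2, 1.1, 0.03), 90: (1.0, 0.8, 0.6)}
domains = {"domain_1": (1, 30), "domain_2": (35, 60)}  # hypothetical domain coordinates
labels = classify_sites(stats)
print(sites_in_domains(labels, domains, "positive"))
print(sites_in_domains(labels, domains, "negative"))
```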

SP01 (PT) - James Joyce's Ulysses: A Bioinformatics Perspective
Theme:
Date: TBA
Room: TBA

Presenting author: David Searls

Session Chair:

Presentation Overview:

SS01-Part A (PT) - Translational Medicine: the current landscape and future directions
Theme:
Date: TBA
Room: TBA

Presenting author: Winston Hide

Session Chair:

Presentation Overview:

SS01-Part B (PT) - Standards and ontologies in harmonisation efforts of clinical data
Theme:
Date: TBA
Room: TBA

Presenting author: Philippe Rocca-Serra

Session Chair:

Presentation Overview:

SS01-Part C (PT) - eTRIKS, European Translational Information and Knowledge Management Services
Theme:
Date: TBA
Room: TBA

Presenting author: David Henderson

Session Chair:

Presentation Overview:

SS02-Part A (PT) - Chromosome organization & polymer entanglements: Insights from computer simulations
Theme:
Date: TBA
Room: TBA

Presenting author: Angelo Rosa

Session Chair:

Presentation Overview:

SS02-Part D (PT) - Assessing the limits of restraint-based 3D modeling of genomes and genomic domains.
Theme:
Date: TBA
Room: TBA

Presenting author: Marc A. Marti-Renom

Session Chair:

Presentation Overview:

SS02-Part E (PT) - Mining chromatin interactions: challenges in data integration and classification
Theme:
Date: TBA
Room: TBA

Presenting author: Yoli Shavit

Session Chair:

Presentation Overview:

SS04-Part A (PT) - Crowd-Sourced Benchmarking of Somatic Single Nucleotide Variant Detection
Theme:
Date: TBA
Room: TBA

Presenting author: Paul Boutros

Session Chair:

Presentation Overview:

SS04-Part B (PT) - Structural Variant Detection in DNA
Theme:
Date: TBA
Room: TBA

Presenting author: Anna Lee

Session Chair:

Presentation Overview:

SS04-Part C (PT) - novoBreak: robust characterization of structural breakpoints in cancer genomes
Theme:
Date: TBA
Room: TBA

Presenting author: Ken Chen

Session Chair:

Presentation Overview:

TP001 (PT) - Synthetic long read technologies in genome phasing and beyond
Theme: Genes
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: The Liffey A

Presenting author: Volodymyr Kuleshov, Stanford University, United States

Dan Xie, Stanford University, Genetics
Rui Chen, Stanford University, Genetics
Dmitry Pushkarev, Stanford University, Physics
Zhihai M, Stanford University, Genetics
Tim Blauwkamp, Illumina, Inc., R&D
Michael Kertesz, Stanford University, Bioengineering
Serafim Batzoglou, Stanford University, Computer Science
Michael Snyder, Stanford University, Genetics

Session Chair: Siu Ming Yiu

Presentation Overview:
New synthetic long read technologies are finally giving us tools for studying unresolved aspects of the human genome, such as structural variation and genomic phase. In this talk, we will present a new phasing technology based on TruSeq synthetic long reads that places more than 99% of human genomic variants into highly accurate, megabase-long haplotypes. Its low cost and excellent performance bring haplotyping one step closer to being a routine tool for studying allele-specific phenomena such as differential methylation.

At the core of this technology is a novel phasing algorithm called Prism, which augments long-read phasing with statistical methods; this dramatically reduces sequencing requirements, increases haplotype length by almost 10x, and supports any long-read phasing technology. More generally, through this and other ongoing work in metagenomics and de novo assembly, we will demonstrate how synthetic long reads combined with sophisticated algorithms can help solve important problems in genomics.
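The basic idea of long-read phasing, linking heterozygous sites through reads that cover several of them, can be shown with a greedy toy. This is not Prism, which additionally uses statistical methods; reads here are hypothetical dictionaries mapping a heterozygous-site index to the observed allele (0 or 1), and each site is linked only to its immediate neighbour.

```python
def phase_consecutive(reads, n_sites):
    """Greedy toy phasing of heterozygous sites 0..n_sites-1 (n_sites >= 1).

    reads: list of dicts {site_index: allele (0 or 1)} observed on one read.
    Returns haplotype blocks as lists of (site, allele on haplotype 1);
    haplotype 2 carries the complementary alleles.
    """
    blocks = [[(0, 0)]]  # arbitrarily put allele 0 of site 0 on haplotype 1
    for i in range(n_sites - 1):
        same = opposite = 0
        for read in reads:
            if i in read and i + 1 in read:
                if read[i] == read[i + 1]:
                    same += 1
                else:
                    opposite += 1
        if same == opposite == 0:
            blocks.append([(i + 1, 0)])  # no read links the sites: start a new block
        else:
            prev_allele = blocks[-1][-1][1]
            next_allele = prev_allele if same >= opposite else 1 - prev_allele
            blocks[-1].append((i + 1, next_allele))
    return blocks


reads = [{0: 0, 1: 1, 2: 1}, {1: 1, 2: 1}, {0: 1, 1: 0}, {3: 0, 4: 1}]  # hypothetical
print(phase_consecutive(reads, 5))
```

The gap between sites 2 and 3, which no read spans, splits the output into two blocks; longer reads, or statistical inference across such gaps, are what lengthen the phased blocks.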

TP002 (PT) - PAGER: Constructing PAGs and new PAG-PAG Relationships for Network Biology
Theme: Systems
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: Liffey Hall 2

Presenting author: Zongliang Yue, Indiana University-Purdue University Indianapolis, United States

Madhura Kshirsagar, Indiana University-Purdue University Indianapolis, United States
Thanh Nguyen, Indiana University–Purdue University Indianapolis, United States
Michael Neylon, Indiana University-Purdue University Indianapolis, United States
Liugen Zhu, Indiana University–Purdue University Indianapolis, United States
Timothy Ratliff, Purdue University, United States
Jake Chen, Indiana University-Purdue University Indianapolis, United States

Session Chair: Igor Jurisica

Presentation Overview:
In this paper, we describe a new database framework for integrative “gene-set, network, and pathway analysis” (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the database are organized into P-type, A-type, and G-type PAGs with a standard three-letter-code naming convention. The PAGER database currently compiles 44,313 genes from 5 species including human, 38,663 PAGs, 324,830 gene-gene relationships, and 3,174,323 PAG-PAG relationships of two types: co-membership based and regulatory relationship based. To help users assess each PAG’s biological relevance, we developed a cohesion measure called the Cohesion Coefficient (CoCo), which distinguishes biologically significant PAGs from random PAGs with an area under the curve (AUC) of 0.98. Users can search and retrieve PAGs through the PAGER online web interface. PAGER also enables advanced users to build PAG-PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability.

The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/.
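Co-membership relationships between gene sets are typically scored by how much two sets overlap relative to chance. The sketch uses a one-sided hypergeometric test as an assumed scoring rule; PAGER's actual statistic is not stated in this abstract, and the universe size, the 0.05 cutoff and the example sets are hypothetical. It requires Python 3.8+ for `math.comb`.

```python
from itertools import combinations
from math import comb


def overlap_pvalue(a, b, universe_size):
    """One-sided hypergeometric P(overlap >= observed) for two gene sets."""
    k = len(a & b)
    n_a, n_b, n = len(a), len(b), universe_size
    tail = sum(comb(n_a, x) * comb(n - n_a, n_b - x) for x in range(k, min(n_a, n_b) + 1))
    return tail / comb(n, n_b)


def comembership_edges(pags, universe_size, max_p=0.05):
    """List (pag1, pag2, shared_genes, p_value) for significantly overlapping pairs."""
    edges = []
    for (name1, genes1), (name2, genes2) in combinations(sorted(pags.items()), 2):
        p = overlap_pvalue(genes1, genes2, universe_size)
        if p <= max_p:
            edges.append((name1, name2, len(genes1 & genes2), p))
    return edges


pags = {  # hypothetical gene sets drawn from a universe of 100 genes
    "PAG_A": {"g1", "g2", "g3", "g4", "g5"},
    "PAG_B": {"g1", "g2", "g3", "g4", "g9"},
    "PAG_C": {"g20", "g21"},
}
print(comembership_edges(pags, universe_size=100))
```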

TP003 (PT) - MSProGene - Integrative proteogenomics beyond six-frames and single nucleotide polymorphisms
Theme: Proteins
Date: Sunday, July 12, 10:10 am - 10:30 am
Room: The Liffey B

Presenting author: Franziska Zickmann, Robert Koch Institute, Germany

Bernhard Renard, Robert Koch Institute, Germany

Session Chair: Anna Tramontano

Presentation Overview:
Summary: Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript levels. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions.
Existing analysis methods still depend mainly on six-frame translations or on reference protein databases extended with transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frame translation artificially inflates the target database sixfold, and SNP integration requires a suitable database summarizing results from previous experiments.
We overcome these limitations with MSProGene, a new method for integrative proteogenomic analysis based on customized, RNA-Seq-driven transcript databases. MSProGene is independent of existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized with a maximum-flow algorithm, which also allows it to resolve the ambiguity of shared peptides during protein inference.
We applied MSProGene to three data sets and show that it enables reliable, accurate, database-independent prediction at the gene and protein levels and additionally identifies novel genes.

Availability: MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/
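The idea of optimizing a combined RNA-Seq/peptide network with maximum flow can be illustrated on a toy graph: peptides receive capacity from a source according to hypothetical spectral evidence, a shared peptide connects to every transcript it matches, and transcript-to-sink capacities stand in for hypothetical RNA-Seq support. This network design and all numbers are assumptions, not MSProGene's model; the solver is a textbook Edmonds-Karp implementation.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps directed edges (u, v) to integer capacities. Assumes no pair of
    antiparallel edges, so per-edge flow can be read back from the residual graph.
    Returns (flow_value, {edge: flow}).
    """
    residual = defaultdict(int)
    neighbours = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].add(v)
        neighbours[v].add(u)  # reverse residual edge
    value = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)  # bottleneck capacity
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        value += push
    return value, {e: c - residual[e] for e, c in capacity.items()}


BIG = 10**9  # effectively unbounded peptide-to-transcript capacity
network = {
    ("source", "pepA"): 3, ("source", "pepS"): 4,  # hypothetical spectral evidence
    ("pepA", "tx1"): BIG,                          # unique peptide
    ("pepS", "tx1"): BIG, ("pepS", "tx2"): BIG,    # shared peptide
    ("tx1", "sink"): 5, ("tx2", "sink"): 2,        # hypothetical RNA-Seq support
}
value, flow = max_flow(network, "source", "sink")
print(value, flow[("pepS", "tx1")], flow[("pepS", "tx2")])
```

Because the unique peptide already uses part of tx1's capacity, the shared peptide's evidence is split between the two transcripts, which is one way a flow formulation can apportion shared peptides.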

TP004 (PT) - Computational dissection of transcriptional heterogeneity in single-cell RNA-Seq studies
Theme: Genes
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: The Liffey A

Presenting author: Oliver Stegle, EMBL European Bioinformatics Institute

Session Chair: Siu Ming Yiu

Presentation Overview:
Many key biological processes are driven by differences in the regulatory landscape between single cells. Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility of finding new, physiologically relevant sub-populations of cells. A key bioinformatics challenge in analyzing these data is to comprehensively account for the different sources of variation between cells so that biologically relevant signatures can be reliably identified.

To address this, we develop a computational approach to dissect single-cell transcriptome variation, accounting for known and hidden sources of variation. We validate this latent variable model on single-cell data from labeled cellular states before applying it to data generated from asynchronously differentiating T cells. By accounting for cell-to-cell correlations due to the cell cycle, we show how single-cell RNA-Seq data can be used to place individual cells on the trajectory between undifferentiated and differentiated cells.
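The simplest way to account for a single known source of variation, such as a per-cell cell-cycle score, is to regress it out gene by gene. The sketch does exactly that with ordinary least squares; it is a deliberately simpler stand-in for the latent variable model described above (which also handles hidden factors), and the gene names and values are hypothetical.

```python
def regress_out(expr, covariate):
    """Remove the per-gene linear effect of one known per-cell covariate.

    expr maps gene -> list of expression values (one per cell); covariate is a
    list of the same length (e.g. a cell-cycle score per cell).
    Returns gene -> residuals, i.e. expression with the fitted linear trend removed.
    """
    n = len(covariate)
    mean_x = sum(covariate) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate is constant; nothing to regress out")
    residuals = {}
    for gene, ys in expr.items():
        mean_y = sum(ys) / n
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, ys)) / sxx
        residuals[gene] = [y - mean_y - slope * (x - mean_x) for x, y in zip(covariate, ys)]
    return residuals


cell_cycle_score = [0.0, 1.0, 2.0, 3.0]  # hypothetical per-cell scores
expr = {"cycling_gene": [1.0, 3.0, 5.0, 7.0], "other_gene": [1.0, 0.0, 0.0, 1.0]}
print(regress_out(expr, cell_cycle_score))
```

A gene driven entirely by the covariate ends up with near-zero residuals, while variation unrelated to it is preserved for downstream clustering or trajectory analysis.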

TP005 (PT) - Metabolome-scale de novo pathway reconstruction using regioisomer-sensitive graph alignments
Theme: Systems
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: Liffey Hall 2

Presenting author: Masaaki Kotera, Tokyo Institute of Technology, Japan

Yasuo Tabei, Japan Science and Technology Agency, Japan
Yoshihiro Yamanishi, Kyushu University, Japan

Session Chair: Igor Jurisica

Presentation Overview:
Motivation: Recent advances in mass spectrometry and related metabolomics technologies enable rapid and comprehensive analysis of a huge number of metabolites; however, biosynthetic and biodegradation pathways are known for only a small portion of metabolites, and most metabolic pathways remain uncharacterized.
Results: In this study, we develop a novel method for supervised de novo metabolic pathway reconstruction with an improved graph alignment-based approach in the reaction-filling framework. We propose a novel chemical graph alignment algorithm, which we call PACHA (Pairwise Chemical Aligner), to detect regioisomer-sensitive connectivities between aligned substructures of two compound structures. Unlike other existing graph alignment methods, PACHA can efficiently detect only one common subgraph between two compounds. Our results show that the proposed method outperforms previous descriptor-based methods and existing graph alignment-based methods in enzymatic reaction-likeness prediction for isomer-enriched reactions, and that it is also useful for reaction annotation, which assigns potential reaction characteristics such as EC numbers and PIERO terms to substrate-product pairs. Finally, we make a comprehensive enzymatic reaction-likeness prediction for all possible uncharacterized compound pairs, suggesting potential metabolic pathways for newly predicted substrate-product pairs.

TP006 (PT) - Revising human protein coding gene numbers
Theme: Proteins
Date: Sunday, July 12, 10:30 am - 10:50 am
Room: The Liffey B

Presenting author: Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain

Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Structural and Computational Biology Programme
David Juan, Spanish National Cancer Research Centre (CNIO), Structural and Computational Biology Programme
Iakes Ezkurdia, Centro Nacional de Investigaciones Cardiovasculares, CNIC, National Bioinformatics Institute (INB)
Jesus Vazquez, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Laboratorio de Proteómica Cardiovascular
Adam Frankish, Wellcome Trust Sanger Institute, HAVANA group
Jennifer Harrow, Wellcome Trust Sanger Institute, HAVANA group
Mark Diekhans, University of California Santa Cruz (UCSC), Center for Biomolecular Science and Engineering
Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), National Bioinformatics Institute (INB)

Session Chair: Anna Tramontano

Presentation Overview:
In this paper, we map peptides from 7 large-scale proteomics studies to protein-coding genes from the human genome. While we identified peptides for more than 96% of genes that evolved before the Bilateria, we did not find peptides for primate-specific genes, for genes without protein-like features or for genes with poor cross-species conservation. We describe a set of 2,001 genes that are potentially non-coding, based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We show that many of these genes behave more like non-coding genes than protein-coding genes, and suggest that many may not code for proteins. Their inclusion in the human protein-coding gene catalogue is being revised as part of the ongoing human genome annotation effort.

TP007 (PT) - Scaffolding Draft Genomes with Nanopore Reads
Theme: Genes
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: The Liffey A

Presenting author: Rene Warren, BC Cancer Agency, Canada

Rene Warren, BC Cancer Agency, Genome Sciences Centre
Benjamin Vandervalk, BC Cancer Agency, Genome Sciences Centre
Steven Jones, BC Cancer Agency, Genome Sciences Centre
Inanc Birol, BC Cancer Agency, Genome Sciences Centre

Session Chair: Siu Ming Yiu

Presentation Overview:

TP008 (PT) - Unbiased Metabolic Pathway Analysis of Large Networks by Metabolomics Integration
Theme: Systems
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: Liffey Hall 2

Presenting author: Christian Jungreuthmayer, Austrian Centre of Industrial Biotechnology, Austria

Matthias Gerstl, Austrian Centre of Industrial Biotechnology, Metabolic Modeling Group
Juergen Zanghellini, Austrian Centre of Industrial Biotechnology, Metabolic Modeling Group

Session Chair: Igor Jurisica

Presentation Overview:
In the presentation, we will introduce the theoretical concept of our novel approach, discuss the main aspects of its numerical implementation and illustrate its biological relevance. We will then give a brief demonstration of our toolkit, which is open-source software freely available from our website. The presentation goes beyond published work by showing that the number of relevant pathways can be reduced even further: using a novel method based on linear programming, we show that only small subsets of all pathways can simultaneously carry a thermodynamically feasible flux.
We identify these phenotypically relevant subsets in a medium-scale E. coli model and show that they are characterized by their ability to maximize biomass and ATP production, consistent with evolutionary interpretations of cell behavior.
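Two of the conditions discussed above can be checked directly for any candidate flux distribution: steady state (S·v = 0) and thermodynamic feasibility, meaning every active reaction runs in the direction of negative Gibbs free energy change. The sketch checks both for a hypothetical three-reaction chain with invented ΔG values; it is only a feasibility test, not the authors' linear-programming method for finding pathway subsets.

```python
def is_feasible(S, v, delta_g, tol=1e-9):
    """Check steady state and thermodynamic feasibility of a flux vector.

    S: stoichiometric matrix as a list of rows (one per internal metabolite),
       columns are reactions. v: flux per reaction. delta_g: Gibbs free energy
       change of each reaction in its forward direction (e.g. kJ/mol).
    """
    steady_state = all(abs(sum(s * f for s, f in zip(row, v))) <= tol for row in S)
    # An active reaction must carry flux in the direction opposite to the sign of delta G.
    thermodynamic = all(abs(f) <= tol or f * dg < 0 for f, dg in zip(v, delta_g))
    return steady_state and thermodynamic


# Hypothetical chain: R1 takes up A, R2 converts A -> B, R3 secretes B.
S = [[1, -1, 0],   # metabolite A
     [0, 1, -1]]   # metabolite B
print(is_feasible(S, [2, 2, 2], [-5.0, -3.0, -1.0]))   # balanced and downhill
print(is_feasible(S, [2, 2, 2], [-5.0, 3.0, -1.0]))    # R2 would run uphill
print(is_feasible(S, [2, 1, 2], [-5.0, -3.0, -1.0]))   # A accumulates
```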

TP009 (PT) - IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis
Theme: Proteins
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: The Liffey B

Presenting author: Yana Safonova, St. Petersburg State University, Russian Federation

Stefano Bonissone, University of California, San Diego, United States
Eugene Kurpilyansky, St. Petersburg Academic University, Russian Federation
Ekaterina Starostina, St. Petersburg State University, Russian Federation
Alla Lapidus, St. Petersburg State University, Russian Federation
Jeremy Stinson, Genentech, United States
Laura Depalatis, Genentech, United States
Wendy Sandoval, Genentech, United States
Jennie Lill, Genentech, United States
Pavel Pevzner, University of California, San Diego, United States

Session Chair: Anna Tramontano

Presentation Overview:
The analysis of circulating antibody concentrations in serum (the antibody repertoire) is a fundamental yet poorly studied problem in immunoinformatics. The two current approaches to analysing antibody repertoires, next-generation sequencing (NGS) and mass spectrometry (MS), present difficult computational challenges because antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutation. Therefore, the protein database required to interpret spectra from circulating antibodies must be custom-built for each individual. While such a database can be constructed via NGS, NGS reads are error-prone, and even a single nucleotide error precludes identification of a peptide by standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm, which performs error correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires.

Availability: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms.
The source code is available from http://bioinf.spbau.ru/igtools.
Contact: ppevzner@ucsd.edu
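How abundance can help correct sequencing errors in immunosequencing reads can be seen with a greedy toy that merges rare reads into a more abundant read within a small Hamming distance. IgRepertoireConstructor's actual error correction is more elaborate; this merge rule, the distance threshold and the reads are assumptions, and such a rule can also wrongly absorb genuine low-frequency variants.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance, or None for sequences of different length."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def greedy_correct(reads, max_dist=1):
    """Merge each distinct read into the first (most abundant) center within max_dist."""
    counts = Counter(reads)
    centers, corrected = [], Counter()
    # Visit distinct sequences from most to least abundant (ties broken alphabetically).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for center in centers:
            d = hamming(seq, center)
            if d is not None and d <= max_dist:
                corrected[center] += n  # treat seq as an erroneous copy of center
                break
        else:
            centers.append(seq)  # no close, more abundant sequence: new center
            corrected[seq] += n
    return corrected


reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3  # hypothetical reads
print(dict(greedy_correct(reads)))
```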

TP010 (PT) - Quality score compression improves genotyping accuracy
Theme: Genes
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: The Liffey A

Presenting author: Deniz Yorukoglu, Massachusetts Institute of Technology, United States

Y. William Yu, Massachusetts Institute of Technology, Mathematics
Jian Peng, Massachusetts Institute of Technology, Mathematics

Session Chair: Siu Ming Yiu

Presentation Overview:
In this presentation, we show how to recover quality information directly from sequence data using the compression tool “Quartz,” rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on reads in FASTQ format but can be trivially modified to discard quality scores in other formats with sequence-quality score pairs.

Counterintuitively, discarding 95% of quality scores resulted in improved genotyping, implying that compression need not come at the expense of accuracy. In contrast to previous results, we show that there is a happy medium between two extremes: completely discarding quality scores costs accuracy, while quality score recalibration to improve variant calling accuracy generally decreases compressibility. In between, we can get both good compression and improved accuracy.
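One way to discard quality information that carries little value can be sketched as follows: bases covered by a trusted k-mer (for example, one found in a dictionary of common k-mers) receive a single fixed high quality, while other bases keep their original score. This is a simplified, exact-match illustration under stated assumptions; the trusted k-mer set, k=3 and the replacement character 'I' (Phred 40 in Phred+33 encoding) are hypothetical choices, not Quartz's parameters.

```python
def smooth_qualities(read, quals, trusted_kmers, k, high_q="I"):
    """Set the quality of every base covered by a trusted k-mer to high_q.

    read and quals are equal-length strings (FASTQ-style); trusted_kmers is a
    set of k-mers assumed to be reliable (hypothetical input). Bases not covered
    by any trusted k-mer keep their original quality character.
    """
    if len(read) != len(quals):
        raise ValueError("read and quality strings must have equal length")
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in trusted_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(high_q if c else q for c, q in zip(covered, quals))


print(smooth_qualities("ACGTTGCA", "#+5?&0-!", {"ACG", "TGC"}, k=3))
```

Because most smoothed positions share one symbol, the quality string becomes far more compressible, while the unusual bases most relevant to variant calling keep their original scores.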

TP011 (PT) - Refined elasticity sampling for Monte Carlo-based identification of stabilizing network patterns
Theme: Systems
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Presenting author: Dorothee Childs, European Molecular Biology Laboratory, Heidelberg, Germany

Sergio Grimbs, Jacobs University Bremen, Germany
Joachim Selbig, University of Potsdam and Max-Planck Institute for Molecular Plant Physiology, Germany

Session Chair: Igor Jurisica

Presentation Overview:
Motivation: Structural kinetic modeling (SKM) is a framework to analyse whether a metabolic steady state remains stable under perturbation, without requiring detailed knowledge about individual rate equations.
It provides a representation of the system's Jacobian matrix that depends solely on the network structure, steady state measurements, and the elasticities at the steady state.
For a measured steady state, stability criteria can be derived by generating a large number of structural kinetic models (SK-models) with randomly sampled elasticities and evaluating the resulting Jacobian matrices. The elasticity space can be analysed statistically in order to detect network positions that contribute significantly to the perturbation response.
Here we extend this approach by examining the kinetic feasibility of the elasticity combinations created during Monte Carlo sampling.

Results: Using a set of small example systems, we show that the majority of sampled SK-models would yield negative kinetic parameters if they were translated back into kinetic models. To overcome this problem, we formulate a simple criterion that mitigates the sampling of such infeasible models.
After evaluating the small example pathways, we used the methodology to study two steady states of the neuronal TCA cycle and the intrinsic mechanisms responsible for their stability or instability. The statistical elasticity analysis confirms that several elasticities are jointly coordinated to control stability and that mutations in the enzyme alpha-ketoglutarate dehydrogenase are the main potential source of instability.
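The Monte Carlo step of structural kinetic modelling can be illustrated on a hypothetical two-metabolite chain with positive feedback: elasticities are sampled at random, the Jacobian J = Λθ is assembled, and the steady state counts as stable when all eigenvalues of J have negative real parts (for a 2x2 matrix: trace < 0 and determinant > 0). The network, the unit fluxes and concentrations, and the sampling ranges are assumptions; the kinetic-feasibility criterion introduced in the talk is not included.

```python
import random


def jacobian(a, b, c):
    """J = Lambda @ theta for a toy chain (-> X1 -> X2 ->) where X2 activates v1.

    All steady-state fluxes and concentrations are set to 1, so Lambda equals the
    stoichiometric matrix N. a, b, c are the normalized elasticities of v1 to X2,
    v2 to X1 and v3 to X2 (hypothetical parameterization).
    """
    lam = [[1, -1, 0],   # X1: produced by v1, consumed by v2
           [0, 1, -1]]   # X2: produced by v2, consumed by v3
    theta = [[0, a],     # v1 depends on X2 (positive feedback)
             [b, 0],     # v2 depends on X1
             [0, c]]     # v3 depends on X2
    return [[sum(lam[i][k] * theta[k][j] for k in range(3)) for j in range(2)]
            for i in range(2)]


def is_stable_2x2(J):
    """Both eigenvalues have negative real parts iff trace < 0 and det > 0."""
    trace = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return trace < 0 and det > 0


def fraction_stable(n_samples=20000, max_feedback=2.0, seed=0):
    """Monte Carlo estimate of the fraction of sampled SK-models that are stable."""
    rng = random.Random(seed)
    stable = 0
    for _ in range(n_samples):
        J = jacobian(rng.uniform(0, max_feedback), rng.uniform(0, 1), rng.uniform(0, 1))
        stable += is_stable_2x2(J)
    return stable / n_samples


print(fraction_stable())
```

For this toy, det(J) = b(c - a), so stability requires the feedback elasticity a to stay below c; with the chosen ranges about a quarter of the sampled models are stable, and inspecting which elasticities separate stable from unstable samples is the statistical analysis SKM builds on.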

TP012 (PT) - Bumps and traffic lights along the translation of secretory proteins
Theme: Proteins
Date: Sunday, July 12, 11:40 am - 12:00 pm
Room: The Liffey B

Presenting author: Michal Linial, The Hebrew University of Jerusalem, Israel

Session Chair: Anna Tramontano

Presentation Overview:
Protein translation is the most energetically expensive operation in the cell, so its speed and allocation of resources are tightly controlled. In this study, we show that across the entire proteomes of yeast, fly, human, worm, plant and cow, the N-terminal segment shows no unique properties, whereas a distinct signal is associated with signal peptide (SP)-containing proteins. We found an N-terminal pattern that slows down the translation rate of the SP proteome. We critically analyze these observations from statistical and evolutionary perspectives. We then generalize our observation to other groups of proteins that are governed by ‘speed controls’. Specifically, the codon pattern and its prevalence were tested for GPI-anchored proteins and for proteins containing a mitochondrial transit peptide. In all cases, a “speed control” pattern is recorded for all tested organisms. We conclude that tuning the translation of a nascent protein is essential for coping with the constraints imposed by the protein's cellular fate.

TP013 (PT) - De Novo Meta-Assembly of Ultra-deep Sequencing Data
Theme: Genes
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: The Liffey A

Presenting author: Stefano Lonardi, University of California, Riverside, United States

Hamid Mirebrahim, University of California, Riverside, United States
Timothy J. Close, University of California, Riverside, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
We introduce a new divide-and-conquer approach to de novo genome assembly in the presence of ultra-deep sequencing data (i.e., coverage of 1,000x or higher). Our proposed meta-assembler, SLICEMBLER, partitions the input data into optimal-sized “slices” and uses a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) to assemble each slice individually. SLICEMBLER uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly.

To improve its efficiency, SLICEMBLER uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8,000x coverage) and simulated data show that SLICEMBLER significantly improves the quality of the assembly compared to the performance of the base assembler. In fact, most of the time SLICEMBLER generates error-free assemblies. We also show that SLICEMBLER is much more robust to high sequencing error rates than the base assembler. SLICEMBLER can be accessed at http://slicembler.cs.ucr.edu/
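The slicing and voting steps can be sketched as below. This is a simplified illustration, not SLICEMBLER itself: the per-slice assembly with an external tool is omitted, contigs vote only on exact matches (whereas the described method uses a generalized suffix tree to find frequent contigs or fractions thereof), and the slice-coverage parameter and example contigs are hypothetical.

```python
import random
from collections import Counter


def make_slices(reads, genome_length, read_length, slice_coverage, seed=0):
    """Randomly partition reads into slices of about slice_coverage-fold coverage."""
    reads_per_slice = max(1, round(slice_coverage * genome_length / read_length))
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + reads_per_slice] for i in range(0, len(shuffled), reads_per_slice)]


def majority_contigs(assemblies):
    """Keep contigs reported verbatim by a strict majority of slice assemblies."""
    votes = Counter(c for contigs in assemblies for c in set(contigs))
    needed = len(assemblies) // 2 + 1
    return sorted(c for c, v in votes.items() if v >= needed)


slices = make_slices(list(range(100)), genome_length=1000, read_length=100, slice_coverage=2)
print([len(s) for s in slices])
# Hypothetical contig sets, as if each slice had been assembled separately.
print(majority_contigs([["AAA", "CCC"], ["AAA", "GGG"], ["AAA", "CCC"]]))
```

Each slice keeps coverage in the range where a standard assembler performs well, and voting suppresses slice-specific errors such as the contig "GGG" reported by only one slice.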

TP014 (PT) - Dynamic networks reveal key players in aging
Theme: Systems
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Presenting author: Tijana Milenkovic, University of Notre Dame, United States

Fazle Faisal, University of Notre Dame, Department of Computer Science and Engineering
Han Zhao, University of Notre Dame, Department of Computer Science and Engineering

Session Chair: Igor Jurisica

Presentation Overview:
Studying human aging is of societal importance. Analyses of gene expression or sequence data have been indispensable for studying human aging, but they typically ignore interconnections between genes (proteins). Since proteins interact to keep us alive, and biological networks (BNs) model exactly these interactions, BN research should further our understanding of aging. Because different data types can give complementary biological insights, we integrate current static BNs with aging-related expression data to form dynamic, age-specific BNs. We then study cellular changes with age in these BNs to identify key players in aging. Also, analogous to sequence alignment, we use BN alignment to transfer aging-related knowledge from well-studied model species to the poorly studied human between conserved network regions. In the process, we propose a novel BN alignment method that outperforms existing ones. We validate the aging-related candidates resulting from our integrative, dynamic, and comparative BN analyses by linking them to aging-related cellular processes and diseases.
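One common way to turn a static network into age-specific snapshots is to keep only interactions whose two proteins are both expressed at a given age, then compare node properties across ages. The sketch assumes a fixed expression threshold and uses degree change as a simple signal; the threshold, gene names and values are hypothetical, and the talk's actual integration and key-player scoring may differ.

```python
def age_specific_network(edges, expression, age, threshold=1.0):
    """Keep edges whose endpoints are both expressed at or above threshold at this age."""
    active = {g for g, by_age in expression.items() if by_age.get(age, 0.0) >= threshold}
    return {(u, v) for u, v in edges if u in active and v in active}


def degree(edges):
    """Node degree in an undirected edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def degree_change(edges, expression, young, old, threshold=1.0):
    """Degree at the old age minus degree at the young age, for every gene."""
    d_young = degree(age_specific_network(edges, expression, young, threshold))
    d_old = degree(age_specific_network(edges, expression, old, threshold))
    genes = set(d_young) | set(d_old)
    return {g: d_old.get(g, 0) - d_young.get(g, 0) for g in sorted(genes)}


static_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}  # hypothetical static network
expression = {  # hypothetical expression levels at ages 20 and 80
    "g1": {20: 2.0, 80: 0.1}, "g2": {20: 2.0, 80: 2.0},
    "g3": {20: 0.5, 80: 3.0}, "g4": {20: 2.0, 80: 2.0},
}
print(degree_change(static_edges, expression, young=20, old=80))
```

Genes whose network neighbourhood changes most between ages (here g3) would be natural candidates for further scrutiny.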

TP015 (PT) - Protein (Multi-) Location Prediction: Utilizing Interdependencies via a Generative Model
Theme: Proteins
Date: Sunday, July 12, 12:00 pm - 12:20 pm
Room: The Liffey B

Presenting author: Hagit Shatkay, University of Delaware, United States

Sebastian Briesemeister, University of Tuebingen, Germany
Oliver Kohlbacher, University of Tuebingen, Germany
Ramanuja Simha, University of Delaware, United States

Session Chair: Anna Tramontano

Presentation Overview:
Motivation: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually treat locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins.

Results:
We introduce a probabilistic generative model for protein localization and develop a system based on it, called MDLoc, that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier, MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results obtained using MDLoc significantly improve upon those of an initial, simpler classifier, as well as upon results reported by other top systems.

MDLoc is available at: http://www.eecis.udel.edu/~compbio/mdloc.

TP016 (PT) - Misassembly Detection using Paired-End Sequence Reads and Optical Mapping Data
Theme: Genes
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: The Liffey A

Presenting author: Martin Muggli, Colorado State University, United States

Simon Puglisi, University of Helsinki, Finland
Roy Ronen, University of California, San Diego, United States
Christina Boucher, Colorado State University, United States

Session Chair: Siu Ming Yiu

Presentation Overview:
Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called MISSEQUEL that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and Francisella tularensis, and used real optical mapping data for rice and budgerigar.

Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of Francisella tularensis, and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar.

Availability: MISSEQUEL can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/

TP017 (PT) - Understanding multicellular function and disease with human tissue-specific networks
Theme: Systems
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Presenting author: Aaron Wong, Princeton University, United States

Arjun Krishnan, Princeton University, Lewis-Sigler Institute for Integrative Genomics
Casey Greene, Dartmouth, Department of Genetics, Norris Cotton Cancer Center, Institute for Quantitative Biomedical Sciences, The Geisel School of Medicine
Emanuela Ricciotti, University of Pennsylvania, Department of Pharmacology, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine
Rene Zelaya, Dartmouth, Department of Genetics, The Geisel School of Medicine
Daniel Himmelstein, University of California, San Francisco, Biology and Medical Informatics
Ran Zhang, Princeton University, Department of Molecular Biology
Boris Hartmann, Icahn School of Medicine at Mount Sinai, Department of Neurology
Elena Zaslavsky, Icahn School of Medicine at Mount Sinai, Department of Neurology
Stuart Sealfon, Icahn School of Medicine at Mount Sinai, Department of Neurology
Daniel Chasman, Brigham and Women's Hospital and Harvard Medical School, Division of Preventive Medicine
Garret FitzGerald, University of Pennsylvania, Perelman School of Medicine, Department of Pharmacology, Institute for Translational Medicine & Therapeutics
Kara Dolinski, Princeton University, Lewis-Sigler Institute for Integrative Genomics
Tilo Grosser, University of Pennsylvania, Institute for Translational Medicine and Therapeutics
Olga Troyanskaya, Princeton University, Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics

Session Chair: Igor Jurisica

Presentation Overview:
Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation and reveal genes’ changing functional roles across tissues. We introduce NetWAS, which combines genes with nominally significant GWAS p-values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT (http://giant.princeton.edu), provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS, and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes across more than one hundred human tissues and cell types.

TP018 (PT) - Global view of the protein universe
Theme: Proteins
Date: Sunday, July 12, 12:20 pm - 12:40 pm
Room: The Liffey B

Presenting author: Rachel Kolodny, University of Haifa, Israel

Nir Ben-Tal, George S. Wise Faculty of Life Sciences, Tel Aviv University, Department of Biochemistry and Molecular Biology
Sergey Nepomnyachiy, Polytechnic Institute of New York University, Department of Computer Science and Engineering

Session Chair: Anna Tramontano

Presentation Overview:
To explore protein space globally, we represent all similarities among a representative set of domains as networks. In the “domain network”, edges connect domains that share “motifs”, i.e., significantly sized segments of similar sequence and structure; in the “motif network”, edges connect recurring motifs that appear in the same domain. These networks offer a way to organize protein space and to examine how the definition of “evolutionary relatedness” among domains influences their structure. At excessively strict thresholds the networks fall apart; at very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network comprises two regions, “discrete” and “continuous”. The discrete region consists of isolated islands, each generally corresponding to a fold; the continuous region is dominated by domains with alternating alpha and beta elements. The networks can also suggest evolutionary paths between domains and be used for protein search and design.
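The threshold behaviour described above can be illustrated by counting connected components of a similarity network as the cutoff is relaxed. This is a hypothetical sketch using union-find over made-up domain scores; the authors' actual similarity measure and threshold values are not reproduced here.

```python
from typing import Dict, List, Tuple


def count_components(nodes: List[str], edges: List[Tuple[str, str, float]], threshold: float) -> int:
    """Count connected components when only edges with score >= threshold are kept."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(nodes)
    for u, v, score in edges:
        if score < threshold:
            continue  # edge too weak at this cutoff
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


if __name__ == "__main__":
    domains = ["d1", "d2", "d3", "d4", "d5"]
    # Hypothetical motif-sharing scores between pairs of domains.
    links = [("d1", "d2", 0.9), ("d2", "d3", 0.5), ("d3", "d4", 0.3), ("d4", "d5", 0.8)]
    for t in (0.95, 0.7, 0.4, 0.1):
        print(t, count_components(domains, links, t))
```

With these toy scores the network goes from five isolated domains at the strictest cutoff to a single component at the laxest, mirroring the two extremes described in the abstract.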

TP019 (PT) - Developments to the Combined Annotation Dependent Depletion (CADD) framework for estimating deleteriousness of human genetic variation
Theme: Disease
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Auditorium

Presenting author: Martin Kircher, University of Washington, United States

Daniela Witten, University of Washington, Biostatistics
Preti Jain, Columbia University, Pathology and Cell Biology
Brian O'Roak, Oregon Health and Science University, School of Medicine
Gregory Cooper, HudsonAlpha Institute for Biotechnology
Jay Shendure, University of Washington, Genome Sciences

Session Chair: Yana Bromberg

Presentation Overview:
The interpretation of human genetic variation on a genome-wide scale is a crucial challenge in both research and clinical settings. Available annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. missense changes). We developed a broadly applicable metric that objectively weights and integrates the large, diverse, and otherwise unwieldy collection of annotation data available. Combined Annotation Dependent Depletion (CADD) integrates these annotations by contrasting variants that survived natural selection with simulated mutations. We show that CADD-based scores correlate with allelic diversity, pathogenicity of both coding and non-coding variants, and experimentally measured regulatory effects, and also highly rank causal variants within individual genome sequences. We pre-compute SNV scores for the whole human genome and enable scoring of short InDels (http://cadd.gs.washington.edu). We describe our method and discuss the integration of additional annotations as well as methodological improvements that we have made over the last year.
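Alongside raw scores, CADD publishes PHRED-like scaled scores, in which a scaled score of 10 marks the top 10% of raw scores and 20 the top 1%, ranked against all possible single-nucleotide substitutions of the reference genome. The sketch below shows that rank-based transform, -10·log10(rank/N), on an arbitrary toy list; `phred_scale` is a hypothetical helper, not part of the CADD software.

```python
import math
from typing import List


def phred_scale(raw_scores: List[float]) -> List[float]:
    """PHRED-like scaling: -10 * log10(rank / N), where rank 1 is the highest raw score.

    Illustrative sketch only; ties are broken by input order.
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / n)
    return scaled


if __name__ == "__main__":
    toy = [float(i) for i in range(100)]  # hypothetical raw scores
    s = phred_scale(toy)
    print(round(s[99], 3), round(s[90], 3), round(s[0], 3))
```

With 100 toy scores, the single best variant lands at 20 (top 1%), the tenth best at 10 (top 10%), and the lowest at 0.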

TP020 (PT) - Structural features of the 5-colors Drosophila chromatin types
Theme: Genes
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey A

Presenting author: Davide Bau, National Center for Genomic Analysis, Spain

Session Chair: Reinhard Schneider

Presentation Overview:
Advances in genomic technologies and the development of new analytical methods (e.g. Hi-C) have provided better insight into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized in Topologically Associating Domains (TADs), large interacting domains that are conserved among different cell types.
The Drosophila genome is also folded into TADs, which are packaged into a mosaic of five principal chromatin types, each defined by a unique combination of proteins. The five chromatin types differ substantially in their genome coverage, numbers of domains, and numbers of genes [1]. To determine whether these TADs correspond to functional domains defined by epigenetic marks, Hou et al. [2] examined the composition of chromatin types within physical domains, following the 5-color classification described in [1]. To determine whether these “chromatin color blocks” have characteristic structural features, we studied the relationship between the 3D architecture of selected regions of the Drosophila genome and their chromatin color. Using Hi-C data at 10 kb resolution, we found that the analyzed regions have structural features characteristic of their functional signatures. Although the present data resolution does not allow different chromatin types to be distinguished unambiguously by simple comparison of their structural features, our results show that different chromatin types have specific structural characteristics that correlate with their functional roles, with active and inactive chromatin types showing significantly different structural characteristics.

[1] Filion et al. Cell, 143(2), 212–224.
[2] Hou et al. Molecular Cell, 48(3), 471–484.

TP021 (PT) - Genome-wide detection of intervals of genetic heterogeneity associated with complex traits
Theme: Systems
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: Liffey Hall 2

Presenting author: Felipe Llinares-Lopez, ETH Zürich, Switzerland

Dominik Grimm, ETH Zürich, Switzerland
Dean Bodenham, ETH Zurich, Switzerland
Udo Gieraths, ETH Zurich, Switzerland
Mahito Sugiyama, Osaka University, Japan
Beth Rowan, Max Planck Institute for Developmental Biology, Germany
Karsten Borgwardt, ETH Zurich, Switzerland

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Genetic heterogeneity, the phenomenon that several sequence variants give rise to the same phenotype, is of great interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: 1) they require the definition of an exact interval in the genome to be tested for genetic heterogeneity, potentially missing highly relevant intervals, or 2) they suffer from an enormous multiple hypothesis testing problem due to the large number of candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals.

Results: Here, we present an approach that overcomes both problems: It allows one to automatically find all contiguous sequences of SNPs in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana GWAS data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping.
Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes.

Availability: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html
Contact: felipe.llinares@bsse.ethz.ch
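The size of the search space is easy to see in code: with L SNPs there are L(L+1)/2 contiguous intervals. The sketch below enumerates them and, as one simple assumption, treats a sample as carrying an interval if it has a minor allele at any SNP inside it. The function name and 0/1 encoding are hypothetical, and the multiple-testing correction that the abstract describes is deliberately left out.

```python
from typing import Dict, List, Tuple


def interval_carrier_counts(
    genotypes: List[List[int]], is_case: List[bool]
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """For every contiguous SNP interval [start, end], count carriers.

    genotypes[s][j] is 1 if sample s carries the minor allele at SNP j.
    A sample carries an interval if it carries the minor allele at any
    SNP in it (logical OR). Returns {(start, end): (carriers, case_carriers)}.
    """
    n_snps = len(genotypes[0]) if genotypes else 0
    result: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for start in range(n_snps):
        carrier = [False] * len(genotypes)  # OR-aggregate, extended one SNP at a time
        for end in range(start, n_snps):
            for s, row in enumerate(genotypes):
                carrier[s] = carrier[s] or row[end] == 1
            total = sum(carrier)
            cases = sum(1 for s, c in enumerate(carrier) if c and is_case[s])
            result[(start, end)] = (total, cases)
    return result


if __name__ == "__main__":
    g = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical: 3 samples x 3 SNPs
    labels = [True, True, False]           # first two samples are cases
    for interval, counts in sorted(interval_carrier_counts(g, labels).items()):
        print(interval, counts)
```

Extending each interval one SNP at a time reuses the OR from the previous step, so each interval costs one pass over the samples rather than one pass per SNP it contains.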

TP022 (PT) - Inferring mechanism of DNA double-strand break formation using sequencing data
Theme: Genes / Disease
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey B

Presenting author: Maga Rowicka, University of Texas Medical Branch, United States

Maga Rowicka, University of Texas Medical Branch, Biochemistry and Molecular Biology

Session Chair: Janet Kelso

Presentation Overview:
Double-stranded DNA breaks (DSBs) are the most dangerous form of DNA damage. Despite many studies on the mechanisms of DSB formation, our knowledge of them remains very incomplete, owing to a lack of appropriate techniques for detecting DSBs accurately genome-wide. We recently developed a method to label DSBs in situ followed by deep sequencing (BLESS), and used it to map DSBs in human cells with a resolution 2-3 orders of magnitude better than previously achieved. Here, we will show how mathematical modelling and numerical simulations can elucidate and quantify various mechanisms of DSB formation. This paradigm of using in silico experiments as the method of choice for discovering and quantifying global, genome-wide rules and chromatin context dependence should also benefit other systems studied using omics data.

TP023 (PT) - Big Data, AI, and Evolution: Towards a Calculus for Precision Medicine
Theme: Disease
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Auditorium

Presenting author: Olivier Lichtarge, Baylor College of Medicine, United States

Martin Lisewski, Baylor College of Medicine, Human and Molecular Genetics
Angela Wilkins, Baylor College of Medicine, Human and Molecular Genetics
Panagiotis Katsonis, Baylor College of Medicine, Human and Molecular Genetics

Session Chair: Yana Bromberg

Presentation Overview:
Slide 1 will break the problem of computing personalized therapy into steps, each one a paper. Slides 2-4 will discuss the Cell paper: a network compression approach to integrate and analyze structured Big Data from databases, culminating in the discovery of the target and mechanism of the best anti-malarial drug, which can be used in future drug screens. Slides 5-6 will expand integration to unstructured data from the entire literature using AI, with an application to p53 biology. Slides 7-9 will turn to the inclusion of personalized information into the network by accurately scoring individual genome variations. Illustrations will summarize winning performance in predicting deleterious mutations at the CAGI blind competition and an application to head and neck cancer. Slide 10 will summarize the strategy, key results and future directions.

TP024 (PT) - A comparative encyclopedia of DNA elements in the mouse genome
Theme: Genes
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Liffey A

Presenting author: Feng Yue, The Pennsylvania State University, United States

Yong Cheng, Stanford University, Genetics
Alessandra Breschi, Centre for Genomic Regulation and UPF, Bioinformatics and Genomics Group
Jeff Vierstra, University of Washington, Genome Sciences
Weisheng Wu, Computational Medicine & Bioinformatics, Michigan
Tyrone Ryba, New College of Florida, Biology
Ricard Sandstrom, University of Washington, Genome Sciences
Zhihai Ma, Stanford University, Genetics
Carrie Davis, Cold Spring Harbor Laboratory, Functional Genomics
Benjamin Pope, Florida State University, Biological Science
Yin Shen, University of California San Diego, Ludwig Institute for Cancer Research
John Stamatoyannopoulos, University of Washington, Genome Sciences
Michael Snyder, Stanford University, Genetics
Roderic Guigo, Centre for Genomic Regulation and UPF, Bioinformatics and Genomics Group
Thomas Gingeras, Cold Spring Harbor Laboratory, Functional Genomics
David Gilbert, Florida State University, Biological Science
Ross Hardison, The Pennsylvania State University, Department of Biochemistry and Molecular Biology and Center for Comparative Genomics and Bioinformatics

Session Chair: Reinhard Schneider

Presentation Overview:
As the premier model organism in biomedical research, the laboratory mouse shares the vast majority of protein-coding genes with humans, but significant differences exist between the two mammals, posing considerable challenges in the modeling of human diseases. The mouse ENCODE consortium produced more than 1,000 coordinated datasets, including transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains, in over 100 mouse cell types and tissues. By comparative analysis with data from human ENCODE, we found that although the majority of gene expression and cis-regulatory elements are conserved between the two species, a substantial fraction of gene regulatory elements appear to be species-specific, and these species-specific elements are enriched for genes involved in certain pathways, such as the immune system and metabolic processes, suggesting that different gene pathways evolve at distinct rates. Our work also provides a valuable resource for research into mammalian biology and mechanisms of human disease.

TP025 (PT) - Identification of causal genes for complex traits
Theme: Systems
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: Liffey Hall 2

Presenting author: Farhad Hormozdiari, University of California, Los Angeles, United States

Gleb Kichaev, University of California, Los Angeles, United States
Wen-Yun Yang, University of California, Los Angeles, United States
Bogdan Pasaniuc, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants have been validated as causal. We define “causal variants” as variants that are responsible for the association signal at a locus. Unlike association studies, which benefit from linkage disequilibrium (LD), identifying causal variants at associated loci is challenging mainly because LD makes many nearby variants closely correlated, and these must be distinguished from one another. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations.

Results: In this work, we propose CAVIAR-Gene, a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared to existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2.

The software is freely available for download at genetics.cs.ucla.edu/caviar
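The idea of a “minimally sized set of genes” can be illustrated under a strong simplification: exactly one causal gene, with per-gene posterior probabilities that sum to 1. Under that assumption, taking genes in decreasing order of posterior until the total reaches a target probability ρ gives the smallest such set. This is a sketch of that simplified case, not CAVIAR-Gene itself, which handles multiple causal variants and population structure; `rho_gene_set` and the probabilities are hypothetical.

```python
from typing import Dict, List


def rho_gene_set(posteriors: Dict[str, float], rho: float) -> List[str]:
    """Smallest gene set whose total posterior mass reaches rho.

    Assumes exactly one causal gene, so the posteriors are mutually exclusive
    probabilities summing to 1; under that assumption, taking genes in
    descending order of posterior is optimal.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    chosen: List[str] = []
    mass = 0.0
    for gene, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        if mass >= rho:
            break
        chosen.append(gene)
        mass += p
    return chosen


if __name__ == "__main__":
    post = {"G1": 0.6, "G2": 0.25, "G3": 0.1, "G4": 0.05}  # hypothetical posteriors
    print(rho_gene_set(post, 0.8))
    print(rho_gene_set(post, 0.5))
```

Raising ρ trades a larger candidate set for a higher chance that the causal gene is inside it.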

TP026 (PT) - Comparing Genomes with Rearrangements and Segmental Duplications
Theme: Genes / Disease
Date: Sunday, July 12, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Mingfu Shao, EPFL, Switzerland

Bernard Moret, EPFL, Switzerland

Session Chair: Janet Kelso

Presentation Overview:
Motivation: Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability.

Results: We study the comparison of two genomes under a model including general rearrangements (through DCJ) and segmental duplications. We formulate the comparison as an optimization problem, and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the ILP formulation yields an exact yet practical algorithm, the first practical method to provide exact solutions to the problem of comparing two arbitrary genomes under rearrangements and duplications. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications), and compare its performance with that of the state-of-the-art method MSOAR (an approximation method), using both simulations and real data. On simulated datasets our method outperforms MSOAR by a significant margin; on 5 well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons.

Availability: http://lcbb.epfl.ch/softwares/coser
Contact: mingfu.shao@epfl.ch
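For intuition about the DCJ part of the model, the duplication-free case has a closed form: for two genomes made only of circular chromosomes over the same N genes, the DCJ distance is N minus the number of cycles in their adjacency graph. The sketch below computes just that special case; it does not handle segmental duplications, linear chromosomes or the ILP described above, and all names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Extremity = Tuple[int, str]  # (gene, "t" for tail or "h" for head)


def adjacencies(genome: List[List[int]]) -> Dict[Extremity, Extremity]:
    """Map each gene extremity to its neighbour in a genome of circular chromosomes."""
    partner: Dict[Extremity, Extremity] = {}
    for chrom in genome:
        for i, a in enumerate(chrom):
            b = chrom[(i + 1) % len(chrom)]  # circular: the last gene joins the first
            right = (abs(a), "h" if a > 0 else "t")  # exit end of signed gene a
            left = (abs(b), "t" if b > 0 else "h")   # entry end of signed gene b
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a: List[List[int]], genome_b: List[List[int]]) -> int:
    """DCJ distance N - C for circular genomes in which each gene occurs exactly once."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if set(adj_a) != set(adj_b):
        raise ValueError("genomes must contain the same genes, each exactly once")
    n_genes = len(adj_a) // 2
    seen: Set[Extremity] = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:  # alternate A-adjacencies and B-adjacencies until the cycle closes
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
            if x == start:
                break
    return n_genes - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))  # inversion of gene 2
    print(dcj_distance_circular([[1, 2, 3]], [[1, 2], [3]]))  # fission
```

An inversion of a single gene, or the fission of one circular chromosome into two, each comes out at distance 1, as expected for a single DCJ operation.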

TP027 (PT) - Optimizing cancer genome sequencing and analysis
Theme: Disease
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Malachi Griffith, Washington University, United States

Malachi Griffith, Washington University, The Genome Institute
Christopher Miller, Washington University, The Genome Institute
Obi Griffith, Washington University, The Genome Institute
Kilannin Krysiak, Washington University, The Genome Institute
Zachary Skidmore, Washington University, The Genome Institute
Avinash Ramu, Washington University, The Genome Institute
Jason Walker, Washington University, The Genome Institute
Ha Dang, Washington University, The Genome Institute
Lee Trani, Washington University, The Genome Institute
David Larson, Washington University, The Genome Institute
Ryan Demeter, Washington University, The Genome Institute
Michael Wendl, Washington University, The Genome Institute
Rachel Austin, Washington University, The Genome Institute
Vincent Magrini, Washington University, The Genome Institute
Sean McGrath, Washington University, The Genome Institute
Amy Ly, Washington University, The Genome Institute
Shashikant Kulkarni, Washington University, The Genome Institute
Joshua McMichael, Washington University, The Genome Institute
Matt Cordes, Washington University, The Genome Institute
Catrina Fronick, Washington University, The Genome Institute
Robert Fulton, Washington University, The Genome Institute
Christopher Maher, Washington University, The Genome Institute
Li Ding, Washington University, The Genome Institute
Jeffery Klco, Washington University, The Genome Institute
Elaine Mardis, Washington University, The Genome Institute
Timothy Ley, Washington University, The Genome Institute
Richard Wilson, Washington University, The Genome Institute

Session Chair: Yana Bromberg

Presentation Overview:
Tumors are typically sequenced to depths of 75-100x (exome) or 30-50x (whole genome). We demonstrate that current sequencing paradigms based on this coverage are inadequate for tumors that are impure, aneuploid, and/or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312x) whole genome sequencing and exome capture (up to ~433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested 7 alignment algorithms and 7 single-nucleotide variant callers, and validated ~200,000 putative SNVs by sequencing them to mean depths of ~1,000x. Additional targeted sequencing provided over 10,000x coverage, and ddPCR assays provided up to ~250,000x sampling of selected sites (using up to 2 µg of input DNA per assay). Using these data, we evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource.
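A back-of-the-envelope binomial model shows why impure or subclonal tumors need deeper coverage. Assuming independent, error-free reads, the probability of seeing at least k variant-supporting reads at depth n for a variant at allele fraction f is P(X ≥ k) with X ~ Binomial(n, f). The function and the parameter values below are illustrative assumptions, not values from the study.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Simplifying assumptions: reads are independent and error-free, and a site
    counts as detected once at least min_alt_reads reads show the variant.
    """
    return 1.0 - sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )


if __name__ == "__main__":
    # Hypothetical subclonal variant at 5% allele fraction, requiring 4 supporting reads.
    for depth in (30, 100, 300):
        print(depth, round(detection_probability(depth, 0.05, 4), 4))
```

For example, `detection_probability(30, 0.05, 4)` is far lower than `detection_probability(300, 0.05, 4)`, so at typical whole-genome depths most such low-fraction variants would simply go unseen.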

TP028 (PT) - Cypiripi: exact genotyping of CYP2D6 using High Throughput Sequencing Data
Theme: Genes
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Liffey A

Presenting author: Salem Malikic, Simon Fraser University, Canada

Ibrahim Numanagić, Simon Fraser University, Canada
Victoria Pratt, Indiana University School of Medicine, United States
Todd Skaar, IUPUI, United States
David A. Flockhart, Indiana University School of Medicine, United States
S. Cenk Sahinalp, Simon Fraser University, Canada

Session Chair: Reinhard Schneider

Presentation Overview:
Motivation: CYP2D6 is a highly polymorphic gene which encodes the CYP2D6 enzyme, involved in the metabolism of 20-25% of all clinically prescribed drugs and other xenobiotics in the human body. CYP2D6 genotyping is recommended prior to treatment decisions involving one or more of the numerous drugs sensitive to CYP2D6 allelic composition. In this context, High Throughput Sequencing (HTS) technologies provide a promising time-efficient and cost-effective alternative to currently used genotyping techniques. To achieve accurate interpretation of HTS data, however, one needs to overcome several obstacles, such as high sequence similarity and genetic recombination between CYP2D6 and the evolutionarily related pseudogenes CYP2D7 and CYP2D8, high copy number variation among individuals, and the short read lengths generated by HTS technologies.

Results: In this work, we present the first algorithm to computationally infer CYP2D6 genotype at basepair resolution from HTS data. Our algorithm is able to resolve complex genotypes, including alleles that are the products of duplication, deletion and fusion events involving CYP2D6 and its evolutionarily related cousin CYP2D7. Through extensive experiments using simulated and real datasets we show that our algorithm accurately solves this important problem with potential clinical implications.

Availability: Cypiripi is available at http://sfu-compbio.github.io/cypiripi.
Contact: S. Cenk Sahinalp (cenk@sfu.ca)

TP029 (PT) - Exploring disease etiology through a large-scale mapping of deleterious genes to cell types
Theme: Systems
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Presenting author: Alex Cornish, Imperial College London, United Kingdom

Ioannis Filippis, Imperial College London, Life Sciences
Alessia David, Imperial College London, Life Sciences
Michael Sternberg, Imperial College London, Life Sciences

Session Chair: Nicolas Le Novere

Presentation Overview:
While the majority of diseases are manifested within a specific anatomical structure, known disease-associated alleles are often inherited and therefore present throughout the body. Understanding how these ubiquitous alleles produce localized disease is key to understanding the mechanisms that drive disease. We have developed a novel approach, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes on cell type-specific interactomes to identify the cell types most likely to be affected by the alleles. Cell type-specific interactomes were created through the integration of protein-protein interaction (PPI) data and cell type-specific expression data from the FANTOM5 project. We conducted text-mining of the PubMed database to produce an independent map of disease-associated cell types, which we used to validate our method. Our method identifies previously suggested associations, along with associations that warrant further study. These include the association between mast cells and multiple sclerosis (MS); mast cells are currently being targeted in an MS phase 2 clinical trial. Furthermore, we used the associations identified by our method to construct a pathogenic cell type-based diseasome, offering insight into diseases linked by common etiology. The dataset produced represents the first large-scale mapping of diseases to their pathogenic cell types. Overall, we demonstrate that the GSC method links disease-associated genes to the phenotypes they produce, one of the key goals of systems biology.
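As a rough illustration of gene set compactness, one could build a cell type-specific interactome by keeping PPI edges between genes expressed in that cell type, then score how close the disease genes sit to each other. The sketch below uses mean pairwise shortest-path distance as a stand-in compactness score; this is a simplified proxy chosen for illustration, not the published GSC statistic, and all names and data are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_interactome(ppi: Iterable[Tuple[str, str]], expressed: Set[str]) -> Graph:
    """Keep only PPI edges whose two proteins are both expressed in the cell type."""
    graph: Graph = {g: set() for g in expressed}
    for u, v in ppi:
        if u in expressed and v in expressed:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Unweighted shortest-path lengths from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_pairwise_distance(graph: Graph, genes: List[str]) -> float:
    """Lower values mean the gene set sits closer together in this interactome.

    Unreachable pairs, or genes absent from the interactome, are charged a
    penalty of len(graph), which exceeds any possible shortest-path length.
    """
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes")
    penalty = len(graph)
    total = 0
    for a, b in pairs:
        total += bfs_distances(graph, a).get(b, penalty) if a in graph else penalty
    return total / len(pairs)


if __name__ == "__main__":
    ppi = [("A", "B"), ("B", "C"), ("C", "D")]
    cell_x = build_interactome(ppi, {"A", "B", "C", "D"})
    cell_y = build_interactome(ppi, {"A", "B", "D"})  # C not expressed here
    disease_genes = ["A", "B", "D"]
    print(mean_pairwise_distance(cell_x, disease_genes))
    print(mean_pairwise_distance(cell_y, disease_genes))
```

In the toy example, gene C is missing from cell type Y's interactome, which disconnects D from A and B, so the same disease gene set looks less compact there than in cell type X.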

TP030 (PT) - 3D hotspots of recurrent retroviral insertions reveal long-range interactions with cancer genes
Theme: Genes / Disease
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Jeroen de Ridder, Delft University of Technology, Netherlands

Sepideh Babaei, Delft University of Technology, Delft Bioinformatics Lab
Waseem Akhtar, Netherlands Cancer Institute, Division of Molecular Genetics
Johann de Jong, Netherlands Cancer Institute, Division of Molecular Carcinogenesis
Marcel Reinders, Delft University of Technology, Delft Bioinformatics Lab

Session Chair: Janet Kelso

Presentation Overview:
Genomically distal mutations can contribute to deregulation of cancer genes by engaging in chromatin interactions. To study this, we overlay viral cancer-causing insertions obtained in a murine retroviral insertional mutagenesis screen with genome-wide chromatin conformation capture data. In this talk, we show that insertions tend to cluster in 3D hotspots within the nucleus. The identified hotspots are significantly enriched for known cancer genes, and bear the expected characteristics of bona fide regulatory interactions, such as enrichment for transcription factor binding sites. Additionally, we observe a striking pattern of mutually exclusive integration, indicating that insertions in these loci target the same gene, either in their linear genomic vicinity or in their 3D spatial vicinity. Our findings shed new light on the repertoire of targets obtained from insertional mutagenesis screening and underline the importance of considering the genome as a 3D structure when studying the effects of genomic perturbations.
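Mutual exclusivity between two loci is often assessed with a one-sided Fisher exact test asking whether fewer tumors than expected carry insertions at both. The sketch below computes that hypergeometric lower-tail p-value; it is a generic test given for illustration, and the abstract does not state which statistic the authors used.

```python
from math import comb


def mutual_exclusivity_pvalue(n_tumours: int, n_a: int, n_b: int, n_both: int) -> float:
    """One-sided Fisher exact test for too few co-occurrences.

    n_a and n_b tumours carry an insertion at locus A and locus B, n_both at
    both. Under independence, the overlap X is hypergeometric; returns P(X <= n_both).
    """
    total = comb(n_tumours, n_b)
    tail = 0
    for k in range(n_both + 1):
        # Ways to choose the n_b B-tumours with exactly k of them also in A.
        tail += comb(n_a, k) * comb(n_tumours - n_a, n_b - k)
    return tail / total


if __name__ == "__main__":
    # Hypothetical screen: 100 tumours, 20 hit at each locus, none at both.
    print(mutual_exclusivity_pvalue(100, 20, 20, 0))
```

With 20 hits at each locus out of 100 tumours, about 4 double hits would be expected by chance, so observing none gives a small p-value.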

TP031 (PT) - Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival
Theme: Disease
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: A. Grant Schissler, The University of Arizona, United States

Vincent Gardeux, The University of Arizona, United States
Qike Li, The University of Arizona, United States
Ikbel Achour, The University of Arizona, United States
Haiquan Li, The University of Arizona, United States
Walter W Piegorsch, The University of Arizona, United States
Yves A Lussier, The University of Arizona, United States

Session Chair: Yana Bromberg

Presentation Overview:
Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanism-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g., an equivalent of a gene expression fold-change).

Results: In this study, we extend our previous approach by applying the statistical Mahalanobis distance to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples Mahalanobis Distance (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations) without inflating the false positive rate in a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant, and predictive of breast cancer survival (p<0.05, n=80 invasive carcinomas; TCGA RNA-sequences).
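As a reminder of the underlying statistic, the sketch below computes a Mahalanobis distance between a subject's two paired samples (for example tumour and normal) over the genes of one pathway. How N-OF-1-PATHWAYS-MD estimates the covariance matrix is not stated in this abstract, so the sketch simply takes `cov` as an input; the dictionary layout and function name are hypothetical.

```python
# Illustrative sketch; the function name and input layout are assumptions.
import numpy as np


def pathway_mahalanobis(sample_a, sample_b, genes, cov):
    """Mahalanobis distance between two paired samples of one subject.

    sample_a, sample_b: dicts mapping gene -> (log-scale) expression.
    genes: the pathway's member genes, in the order used by cov.
    cov: covariance matrix of the pathway genes (estimated elsewhere).
    """
    diff = np.array([sample_a[g] - sample_b[g] for g in genes], dtype=float)
    # Solve cov @ z = diff rather than forming an explicit inverse.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))
```

With an identity covariance the score reduces to the Euclidean norm of the per-gene log fold-changes; a non-trivial covariance down-weights genes that are naturally variable or correlated.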

Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method quantifies the magnitude and biological significance of a patient's deregulated pathways, derived solely from that patient's transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that could complement the clinical interpretation of a patient's acquired or inherited DNA polymorphisms and mutations. In addition, the method can be applied to diseases in which DNA changes may not be relevant, thus expanding the “interpretable ‘omics” of single subjects (e.g., the personalome).

Availability: http://www.lussierlab.net/publications/N-of-1-pathways
Contact: yves@email.arizona.edu

TP032 (PT) - Insights into Secondary Metabolism from a Global Analysis of Prokaryotic Biosynthetic Gene Clusters
Theme: Genes / Proteins
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Liffey A

Presenting author: Marnix Medema, Wageningen University, Netherlands

Peter Cimermancic, University of California, San Francisco, Department of Bioengineering and Therapeutic Sciences and the California Institute for Quantitative Biosciences
Jan Claesen, University of California, San Francisco, Department of Bioengineering and Therapeutic Sciences and the California Institute for Quantitative Biosciences
Kenji Kurita, University of California, Santa Cruz, Department of Chemistry and Biochemistry
Eriko Takano, University of Manchester, Manchester Institute of Biotechnology, Faculty of Life Sciences
Andrej Sali, University of California, San Francisco, Department of Bioengineering and Therapeutic Sciences and the California Institute for Quantitative Biosciences
Roger Linington, University of California, Santa Cruz, Department of Chemistry and Biochemistry
Michael Fischbach, University of California, San Francisco, Department of Bioengineering and Therapeutic Sciences and the California Institute for Quantitative Biosciences

Session Chair: Reinhard Schneider

Presentation Overview:
Bacterial secondary metabolism is of major importance to society, as it is the source of large numbers of antibiotics, anticancer agents, and other important bioactive compounds. The genes encoding the biosynthetic pathways to make these molecules are usually grouped together on the chromosome in so-called biosynthetic gene clusters (BGCs). In our recent paper (Cell 158: 412-421, 2014), we describe a novel algorithm to effectively identify BGCs, and apply this to perform a systematic analysis of BGCs throughout the prokaryotic tree of life. Network analysis of the predicted BGCs revealed numerous large gene cluster families, most of which are uncharacterized. We experimentally characterized the largest of these, which is widespread among bacteria and encodes the biosynthesis of molecules that appear to protect their hosts against oxidative stress. Finally, a detailed evolutionary genomic analysis of all known and predicted BGCs revealed how the astonishing molecular diversity of microbial secondary metabolism continuously evolves.
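The abstract does not describe how the BGC similarity network was built. One common way to group gene clusters into families, sketched here under that assumption, is to link two BGCs when the Jaccard index of their protein-domain content reaches a threshold and to take connected components as families. The domain labels, the 0.5 threshold and the function names are hypothetical.

```python
# Illustrative sketch; similarity measure and threshold are assumptions.
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets of protein-domain labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_cluster_families(bgc_domains, threshold=0.5):
    """Connected components of the network linking BGCs with Jaccard >= threshold."""
    parent = {name: name for name in bgc_domains}

    def find(x):
        # Follow parent pointers to the component root, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bgc_domains, 2):
        if jaccard(bgc_domains[a], bgc_domains[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    families = {}
    for name in bgc_domains:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())
```

The all-pairs comparison is quadratic in the number of clusters, which is fine for a sketch but would need indexing or pre-filtering at the scale of the whole prokaryotic tree of life.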

TP033 (PT) - Integrative Random Forest for Gene Regulatory Network Inference
Theme: Systems
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: Liffey Hall 2

Presenting author: Francesca Petralia, Icahn School of Medicine at Mount Sinai, United States

Pei Wang, Icahn School of Medicine at Mount Sinai, United States
Jialiang Yang, Icahn School of Medicine at Mount Sinai, United States
Zhidong Tu, Icahn School of Medicine at Mount Sinai, United States

Session Chair: Nicolas Le Novere

Presentation Overview:

TP034 (PT) - In silico phenotyping via co-training for improved phenotype prediction from genotype
Theme: Genes / Disease
Date: Sunday, July 12, 3:30 pm - 3:50 pm
Room: The Liffey B

Presenting author: Damian Roqueiro, ETH Zurich, Switzerland

Menno Witteveen, ETH Zurich, Switzerland
Verneri Anttila, Broad Institute of MIT and Harvard, United States
Gisela Terwindt, Leiden University Medical Center, Netherlands
Arn van den Maagdenberg, Leiden University Medical Center, Netherlands
Karsten Borgwardt, ETH Zurich, Switzerland

Session Chair: Janet Kelso

Presentation Overview:
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction.

Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium.
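Co-training, as described here, uses two views of each patient. The minimal sketch below assumes a genotype view and a clinical-record view stored as NumPy arrays, uses a deliberately simple nearest-centroid classifier in place of whatever learner the authors used, and lets each view add its most confident pseudo-labels to a shared training pool (a common simplification of the original two-classifier scheme). Labels 0 and 1 stand in for the two migraine types and `-1` marks unlabelled patients; all names are hypothetical, and both classes need at least one labelled patient.

```python
# Illustrative co-training sketch; classifier and names are assumptions.
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class (labels 0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def confidence(centroids, X):
    """Positive favours class 1; the magnitude is the gap between distances."""
    d0 = np.linalg.norm(X - centroids[0], axis=1)
    d1 = np.linalg.norm(X - centroids[1], axis=1)
    return d0 - d1


def co_train(X_genotype, X_clinical, y, n_rounds=5, per_round=2):
    """Return labels with pseudo-labels filled in; y uses -1 for unlabelled."""
    y = np.array(y, copy=True)
    for _ in range(n_rounds):
        for X_view in (X_genotype, X_clinical):
            unlabelled = np.flatnonzero(y == -1)
            if unlabelled.size == 0:
                return y
            labelled = y != -1
            centroids = fit_centroids(X_view[labelled], y[labelled])
            score = confidence(centroids, X_view[unlabelled])
            # Each view pseudo-labels the patients it is most confident about;
            # the other view then trains on the enlarged pool.
            top = np.argsort(-np.abs(score))[:per_round]
            y[unlabelled[top]] = (score[top] > 0).astype(int)
    return y
```

The enlarged label vector would then be used to train the genotype-only phenotype predictor, which is where the gain in training set size pays off.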

Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction.

TP035 (PT) - FERAL: Network Based Classifier with Application to Breast Cancer Outcome Prediction
Theme: Disease
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Auditorium

Presenting author: Amin Allahyar, Delft University of Technology, Netherlands

Jeroen De Ridder, Delft University of Technology, Netherlands

Session Chair: Yana Bromberg

Presentation Overview:
Motivation: Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalized patient care. To improve the performance of the initial molecular classifiers and the consistency of their discovered markers, Network-based Outcome Prediction methods (NOPs) have been proposed. In spite of the initial claims, recent studies revealed that neither performance nor consistency can be improved using these methods. NOPs typically rely on the construction of meta-genes by averaging the expression of several genes connected in a network that encodes protein interactions or pathway information. In this paper, we expose several fundamental issues in NOPs that impede prediction power and the consistency of discovered markers, and that obscure biological interpretation.

Results: To overcome these issues, we propose FERAL, a network-based classifier that hinges upon the Sparse Group Lasso, which performs simultaneous selection of marker genes and training of the prediction model. An important feature of FERAL, and a significant departure from existing NOPs, is that it uses multiple operators to summarize genes into meta-genes. This gives the classifier the opportunity to select the most relevant meta-gene for each gene set. Extensive evaluation revealed that the discovered markers are markedly more stable across independent datasets. Moreover, interpretation of the marker genes detected by FERAL reveals valuable mechanistic insight into the aetiology of breast cancer.
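The two ingredients named above can be sketched as follows: summarizing each gene set with several operators so that every summary becomes a candidate meta-gene, and the proximal step of the Sparse Group Lasso penalty, which lets whole gene sets or individual meta-genes drop out of the model. The particular operators (mean, median, max, min), the grouping of all summaries of one gene set into a single group, and the function names are assumptions for illustration; the abstract does not list FERAL's operators or its solver.

```python
# Illustrative sketch; operator set, grouping and names are assumptions.
import numpy as np

OPERATORS = {
    "mean": lambda block: block.mean(axis=1),
    "median": lambda block: np.median(block, axis=1),
    "max": lambda block: block.max(axis=1),
    "min": lambda block: block.min(axis=1),
}


def build_meta_genes(expr, gene_sets):
    """expr: samples x genes array; gene_sets: list of column-index lists.

    Returns the meta-gene matrix and a group id per column, so that all
    operator summaries of one gene set share a group.
    """
    columns, groups = [], []
    for group_id, idx in enumerate(gene_sets):
        block = expr[:, idx]
        for op in OPERATORS.values():
            columns.append(op(block))
            groups.append(group_id)
    return np.column_stack(columns), np.array(groups)


def sparse_group_lasso_prox(w, groups, lam1, lam2, step):
    """Proximal step for lam1*||w||_1 + lam2*sum_g sqrt(p_g)*||w_g||_2."""
    # Element-wise soft-thresholding handles the lasso part...
    out = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
    # ...then each group is shrunk as a whole, or zeroed if its norm is small.
    for g in np.unique(groups):
        mask = groups == g
        norm = np.linalg.norm(out[mask])
        thresh = step * lam2 * np.sqrt(mask.sum())
        out[mask] = 0.0 if norm <= thresh else out[mask] * (1.0 - thresh / norm)
    return out
```

Inside a proximal-gradient loop, a gradient step on the classification loss would be followed by this proximal step, so the surviving non-zero weights pick out both the relevant gene sets and the best summary operator within each.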

TP036 (PT) - Understanding operon evolution using an event-driven model and phylogenetic visualizations
Theme: Genes / Proteins
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Liffey A

Presenting author: Iddo Friedberg, Miami University, United States

David Ream, Miami University, Microbiology
Asma Bankapur, Miami University, Microbiology

Session Chair: Reinhard Schneider

Presentation Overview:
Gene blocks are genes co-located on the chromosome. In many cases, gene blocks are conserved between bacterial species, sometimes as operons, when genes are co-transcribed. The conservation is rarely absolute: gene loss, gain, duplication, block splitting, and block fusion are frequently observed. An open question in bacterial molecular evolution is how gene blocks form and break up, for which several models have been proposed. These models, however, are not generally applicable to all types of gene blocks, and consequently cannot be used to broadly compare and study gene block evolution. To address this problem, we introduce an event-based method for tracking gene block evolution in bacteria.

In my talk I will explain this method and demonstrate a new visualization technique we call phylomatrices. I will show how we can easily gauge operon conservation and discover interesting clade-based aberrations as well as horizontal gene transfers.
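The abstract names the kinds of events (loss, gain, duplication, splitting) without giving the model. Purely as an illustration, the sketch below counts three such events for one gene block in one target genome, given the homologs found there and their gene-order positions. The event definitions, the `max_gap` rule and the gene names are simplifying assumptions, not the published event-based method.

```python
# Illustrative sketch; event definitions and names are assumptions.
def block_events(reference_block, target_hits, max_gap=1):
    """Count simplified events for one gene block in one target genome.

    reference_block: gene names of the block in the reference genome.
    target_hits: (gene name, gene-order index) pairs for homologs found in
    the target genome; consecutive genes on the chromosome differ by 1.
    """
    reference = set(reference_block)
    copies = {}
    for gene, _ in target_hits:
        if gene in reference:
            copies[gene] = copies.get(gene, 0) + 1
    # Deletion: a reference gene with no homolog in the target.
    deletions = sum(1 for g in reference if g not in copies)
    # Duplication: every copy beyond the first.
    duplications = sum(n - 1 for n in copies.values())
    # Split: a break between neighbouring hits wider than max_gap genes.
    order = sorted(pos for gene, pos in target_hits if gene in reference)
    splits = sum(1 for a, b in zip(order, order[1:]) if b - a > max_gap)
    return {"deletions": deletions, "duplications": duplications, "splits": splits}
```

Summing such counts between pairs of genomes gives a crude event distance that could be laid out against a phylogeny, which is roughly the kind of comparison a phylomatrix makes visible.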

TP037 (PT) - Gene network inference by fusing data from diverse distributions
Theme: Systems
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: Liffey Hall 2

Presenting author: Marinka Zitnik, University of Ljubljana, Slovenia

Blaz Zupan, University of Ljubljana, Slovenia

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as those from next-generation sequencing, often violate this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of data sets.

Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed data sets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of data sets offers substantial gains relative to inference of separate networks for each data set. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies.

TP038 (PT) - The human splicing code reveals new insights into the genetic determinants of disease
Theme: Genes / Disease
Date: Sunday, July 12, 3:50 pm - 4:10 pm
Room: The Liffey B

Presenting author: Brendan Frey, University of Toronto, Canada

Hui Xiong, University of Toronto, Engineering and Medicine
Babak Alipanahi, University of Toronto, Engineering and Medicine
Leo Lee, University of Toronto, Engineering and Medicine
Hannes Bretschneider, University of Toronto, Engineering and Medicine
Daniele Merico, University of Toronto, Engineering and Medicine
Ryan Yuen, University of Toronto, Engineering and Medicine
Yimin Hua, University of Toronto, Engineering and Medicine
Serge Gueroussov, University of Toronto, Engineering and Medicine
Hamed Najafabadi, University of Toronto, Engineering and Medicine
Tim Hughes, University of Toronto, Engineering and Medicine
Quaid Morris, University of Toronto, Engineering and Medicine
Yoseph Barash, University of Toronto, Engineering and Medicine
Adrian Krainer, University of Toronto, Engineering and Medicine
Nebojsa Jojic, University of Toronto, Engineering and Medicine
Steve Scherer, University of Toronto, Engineering and Medicine
Ben Blencowe, University of Toronto, Engineering and Medicine

Session Chair: Janet Kelso

Presentation Overview:
To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.

TP039 (PT) - Integrating Different Data Types by Regularized Unsupervised Multiple Kernel Learning with Application to Cancer Subtype Discovery
Theme: Disease
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: The Auditorium

Presenting author: Nora Katharina Speicher, Max Planck Institute for Informatics, Germany

Nico Pfeifer, Max Planck Institute for Informatics, Germany

Session Chair: Yana Bromberg

Presentation Overview:
Despite ongoing cancer research, available therapies are still limited in quantity and effectiveness, and making treatment decisions for individual patients remains a hard problem. Established subtypes, which help guide these decisions, are mainly based on individual data types. However, the analysis of multidimensional patient data involving the measurements of various molecular features could reveal intrinsic characteristics of the tumor. Large-scale projects accumulate this kind of data for various cancer types, but we still lack the computational methods to reliably integrate this information in a meaningful manner. Therefore, we apply and extend current approaches to multiple kernel learning for dimensionality reduction. On the one hand, we add a regularization term to avoid overfitting during the optimization procedure; on the other hand, we show that one can even use several kernels per data type, thereby relieving the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.
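To make the idea of several kernels per data type concrete, the sketch below builds Gaussian kernels at a few widths for each data type, centres and trace-normalizes them so they are on a comparable scale, and forms a weighted sum. In the described method the weights are learned with a regularized objective; here they are simply passed in, and the kernel family, the `gammas` grid and all function names are assumptions.

```python
# Illustrative sketch; kernel family, widths and names are assumptions.
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix over the rows (patients) of X."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def center_and_normalize(K):
    """Center in feature space and scale to unit trace so kernels are comparable."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    return Kc / np.trace(Kc)


def candidate_kernels(data_types, gammas=(0.1, 1.0, 10.0)):
    """Several kernels per data type, so no single width has to be chosen up front."""
    return [center_and_normalize(rbf_kernel(X, g)) for X in data_types for g in gammas]


def combine(kernels, weights):
    """Weighted sum of kernels; the method learns these weights, here they are given."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))
```

Each data type (for example expression or methylation) would normally be standardized first; note that the trace normalization fails if all patients have identical rows, since the centred kernel is then zero.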

We have identified biologically meaningful subgroups for five different cancer types. Survival analysis revealed significant differences between the survival times of the identified subtypes, with P-values comparable to or even better than those of state-of-the-art methods. Moreover, our resulting subtypes reflect combined patterns from the different data sources, and we demonstrate that input kernel matrices with little information have less impact on the integrated kernel matrix. Our subtypes show different responses to specific therapies, which could eventually assist in treatment decision making.

TP040 (PT) - Deconvolving Molecular Signatures of Interactions Between Microbial Colonies
Theme: Genes / Proteins
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: The Liffey A

Presenting author: Yeu-Chern Harn, University of North Carolina, Chapel Hill, United States

Matthew Powers, University of North Carolina, Chapel Hill, United States
Elizabeth Shank, University of North Carolina, Chapel Hill, United States
Vladimir Jojic, University of North Carolina, Chapel Hill, United States

Session Chair: Reinhard Schneider

Presentation Overview:
Motivation: The interactions between microbial colonies through chemical signaling are not well understood. A microbial colony can use different molecules to inhibit or accelerate the growth of other colonies. A better understanding of the molecules involved in these interactions could lead to advancements in health and medicine. Imaging mass spectrometry (IMS) applied to co-cultured microbial communities aims to capture the spatial characteristics of the colonies’ molecular fingerprints. These data are high-dimensional and require computational analysis methods to interpret.

Results: Here we present a dictionary learning method that deconvolves spectra of different molecules from IMS data. We call this method MOLecular Dictionary Learning (MOLDL). Unlike standard dictionary learning methods, which assume Gaussian-distributed data, our method uses the Poisson distribution to capture the count nature of mass spectrometry data. Our method also incorporates universally applicable information on common ion types of molecules in MALDI mass spectrometry. This greatly reduces model parametrization and increases deconvolution accuracy by eliminating spurious solutions. Moreover, our method leverages the spatial nature of IMS data by assuming that nearby locations share similar abundances, thus avoiding overfitting to noise. Tests on simulated data sets show that this method performs well in recovering molecule dictionaries. We also tested our method on real data measured on a microbial community composed of two species. We confirmed through follow-up validation experiments that our method recovered true and complete signatures of molecules. These results indicate that our method can reliably discover molecules in IMS data, and hence can help advance the study of interactions between microbial colonies.
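The central modelling choice, a Poisson rather than Gaussian likelihood for count data, can be illustrated with plain non-negative matrix factorization under the Kullback-Leibler divergence, whose standard multiplicative updates (Lee and Seung) maximize a Poisson likelihood. This is a simplified stand-in, not MOLDL: it omits the ion-type constraints and the spatial smoothness prior described above, and the array layout (m/z bins by pixels) and function name are assumptions.

```python
# Illustrative Poisson (KL) NMF; not MOLDL, which adds ion-type and spatial priors.
import numpy as np


def poisson_nmf(V, n_molecules, n_iter=500, seed=0, eps=1e-12):
    """Factor counts V (m/z bins x pixels) as W @ H under a Poisson likelihood.

    W: one spectrum per molecule (columns); H: abundance of each molecule per pixel.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, n_molecules)) + eps
    H = rng.random((n_molecules, n)) + eps
    for _ in range(n_iter):
        # H update: ratio of observed to modelled counts, projected through W.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W update: the same rule with the roles of W and H exchanged.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Because the updates are multiplicative, W and H stay non-negative throughout, which matches the physical constraint that spectra and abundances cannot be negative.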

Availability: The code used in this paper is available at: https://github.com/frizfealer/IMS_project

TP041 (PT) - Inferring orthologous gene regulatory networks using interspecies data fusion
Theme: Systems
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: Liffey Hall 2

Presenting author: Christopher Penfold, University of Warwick, United Kingdom

Jonathan Millar, University of Warwick, United Kingdom
David Wild, University of Warwick, United Kingdom

Session Chair: Nicolas Le Novere

Presentation Overview:
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between, related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant, counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second, hierarchical approach, network information is propagated via an unobserved “hypernetwork”. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression.
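The Gaussian-process component mentioned above can be illustrated in isolation: the sketch fits a zero-mean GP with a squared-exponential kernel to one gene's expression time series and returns the posterior mean at new time points. The kernel choice, hyperparameter values and function names are assumptions, and the graph-kernel coupling between species is not shown.

```python
# Illustrative GP regression; kernel and hyperparameters are assumptions.
import numpy as np


def sq_exp_kernel(t1, t2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior_mean(t_obs, y_obs, t_new, noise=0.1, **kernel_args):
    """Posterior mean of a zero-mean GP fitted to one gene's time series."""
    K = sq_exp_kernel(t_obs, t_obs, **kernel_args) + noise**2 * np.eye(len(t_obs))
    K_star = sq_exp_kernel(t_new, t_obs, **kernel_args)
    alpha = np.linalg.solve(K, y_obs)  # K^{-1} y without an explicit inverse
    return K_star @ alpha
```

In a network-inference setting, the smooth posterior trajectories (or their derivatives) of candidate regulators are what a model would relate to each target gene's dynamics.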

Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, whilst the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of S. cerevisiae data and networks to inform inference of networks in the fission yeast S. pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase.

Availability: Matlab code is available from a temporary anonymous url for peer review http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/

TP042 (PT) - Associating enhancers with TH2 memory differentiation and asthma susceptibility
Theme: Genes / Disease
Date: Sunday, July 12, 4:10 pm - 4:30 pm
Room: Liffey Hall 2

Presenting author: Lukas Chavez, German Cancer Research Institute, Germany

Session Chair: Janet Kelso

Presentation Overview:
A characteristic feature of asthma is the aberrant accumulation, differentiation or function of memory CD4(+) T cells that produce type 2 cytokines (TH2 cells). By mapping genome-wide histone modification profiles for subsets of T cells isolated from peripheral blood of healthy and asthmatic individuals, we identified enhancers with known and potential roles in the normal differentiation of human TH1 and TH2 cells. We discovered disease-specific enhancers in T cells that differ between healthy and asthmatic individuals. Enhancers that gained the histone H3 Lys4 dimethyl (H3K4me2) mark during TH2 cell development showed the highest enrichment for asthma-associated single nucleotide polymorphisms (SNPs), which supported a pathogenic role for TH2 cells in asthma. In silico analysis of cell-specific enhancers revealed transcription factors, microRNAs and genes potentially linked to human TH2 cell differentiation. Our results establish the feasibility and utility of enhancer profiling in well-defined populations of specialized cell types involved in disease pathogenesis.
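The abstract reports enrichment of asthma-associated SNPs in a class of enhancers without naming the test. A one-sided hypergeometric test is one standard choice; the sketch below assumes a universe of `n_total` SNPs, of which `n_in_enhancers` fall in the enhancer set, and asks how surprising it is that `k` of `n_asthma` asthma-associated SNPs land there. All names are hypothetical.

```python
# Illustrative hypergeometric enrichment test; names are assumptions.
from math import comb


def enrichment_pvalue(n_total, n_in_enhancers, n_asthma, k):
    """One-sided P(X >= k) that asthma SNPs fall in enhancers by chance alone."""
    if not 0 <= k <= n_asthma <= n_total:
        raise ValueError("counts are inconsistent")
    upper = min(n_in_enhancers, n_asthma)
    tail = sum(
        comb(n_in_enhancers, i) * comb(n_total - n_in_enhancers, n_asthma - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_asthma)
```

Comparing this p-value across enhancer classes (for example, enhancers that gain H3K4me2 during TH2 development versus other classes) is one way to rank which class is most enriched, although linkage between nearby SNPs would violate the test's independence assumption in real data.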

TP043 (PT) - Reconstructing 16S rRNA genes in metagenomic data
Theme: Genes
Date: Monday, July 13, 10:10 am - 10:30 am
Room: The Liffey A

Presenting author: Yanni Sun, Michigan State University, United States

Jikai Lei, Michigan State University, United States
James Cole, Michigan State University, United States
Cheng Yuan, Michigan State University, United States

Session Chair: Jerome Waldispuhl

Presentation Overview:
Metagenomic data, which contain sequenced DNA reads of uncultured microbial species from environmental samples, provide a unique opportunity to thoroughly analyze microbial species that have never been identified before. Reconstructing 16S ribosomal RNA, a phylogenetic marker gene, is usually required to analyze the composition of the metagenomic data. However, the massive volume of the data, high sequence similarity between related species, skewed microbial abundance, and the lack of reference genes make 16S rRNA reconstruction difficult. Generic de novo assembly tools are not optimized for assembling 16S rRNA genes.

In this work, we introduce a targeted rRNA assembly tool, REAGO (REconstruct 16S ribosomal RNA Genes from metagenOmic data). It addresses the above challenges by combining secondary structure-aware homology search, properties of rRNA genes, and de novo assembly. Our experimental results show that our tool correctly recovers more rRNA genes than several popular generic metagenomic assembly tools and specially designed rRNA reconstruction tools.

Availability: The source code of REAGO is freely available on GitHub.
Contact: chengy@msu.edu and yannisun@msu.edu

TP044 (PT) - Inferring parental genomic ancestries using pooled semi-Markov processes
Theme: Systems
Date: Monday, July 13, 10:10 am - 10:30 am
Room: The Liffey B

Presenting author: James Zou, Microsoft Research, United States

Eran Halperin, Tel Aviv University, Israel
Esteban Burchard, University of California San Francisco, United States
Sriram Sankararaman, Harvard Medical School, United States

Session Chair: Hidde de Jong

Presentation Overview:
Motivation: A basic problem of broad public and scientific interest is to use the DNA of an individual to infer the genomic ancestries of the parents. In particular, we are often interested in the fraction of each parent's genome that comes from specific ancestries (e.g., European, African, Native American). This has many applications, ranging from understanding the inheritance of ancestry-related risks and traits to quantifying human assortative mating patterns.

Results: We model the problem of parental genomic ancestry inference as a pooled semi-Markov process. We develop a general mathematical framework for pooled semi-Markov processes and construct efficient inference algorithms for these models. Applying our inference algorithm to genotype data from 231 Mexican trios and 258 Puerto Rican trios where we have the true genomic ancestry of each parent, we demonstrate that our method accurately infers parameters of the semi-Markov processes and parents' genomic ancestries. We additionally validated the method on simulations. Our model of pooled semi-Markov process and inference algorithms may be of independent interest in other settings in genomics and machine learning.
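To show what "pooled" means here, the sketch below simulates the generative side only: two hidden ancestry tracks, one for the haplotype transmitted by each parent, each a semi-Markov process with roughly exponential tract lengths, and the observed per-marker count of each ancestry, which is all the child's genotype reveals. The tract-length distribution, parameter values, ancestry labels and function names are assumptions, and no inference algorithm is shown.

```python
# Illustrative generative sketch; distributions and names are assumptions.
import random


def semi_markov_track(length, ancestries, mean_tract, rng):
    """Ancestry label per marker, built from tracts of rounded exponential length."""
    track = []
    label = rng.choice(ancestries)
    while len(track) < length:
        tract = max(1, round(rng.expovariate(1.0 / mean_tract)))
        track.extend([label] * tract)
        # Semi-Markov jump: switch to a different ancestry at the end of a tract.
        others = [a for a in ancestries if a != label]
        label = rng.choice(others) if others else label
    return track[:length]


def pooled_ancestry(length, ancestries, mean_tract=50.0, seed=0):
    """Simulate two transmitted parental tracks and the pooled count per marker."""
    rng = random.Random(seed)
    maternal = semi_markov_track(length, ancestries, mean_tract, rng)
    paternal = semi_markov_track(length, ancestries, mean_tract, rng)
    pooled = [
        {a: int(m == a) + int(p == a) for a in ancestries}
        for m, p in zip(maternal, paternal)
    ]
    return maternal, paternal, pooled
```

Inference runs the other way: given only `pooled`, it must estimate the parameters of the two hidden processes, and through them each parent's ancestry proportions.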

TP045 (PT) - A hierarchical Bayesian model for flexible module discovery in three-way time series data
Theme: Disease / Other
Date: Monday, July 13, 10:10 am - 10:30 am
Room: Liffey Hall 2

Presenting author: David Amar, Tel Aviv University, Israel

Daniel Yekutieli, Tel Aviv University, Israel
Adi Maron-Katz, Tel Aviv University, Israel
Talma Hendler, Tel Aviv University, Israel
Ron Shamir, Tel Aviv University, Israel

Session Chair: Yves Moreau

Presentation Overview:
Motivation: Detecting modules of coordinated activity is fundamental in the analysis of large biological studies. For two-dimensional data (e.g. genes x patients) this is often done via clustering or biclustering. More recently, studies monitoring patients over time have added another dimension. Analysis is much more challenging in this case, especially when time measurements are not synchronized. New methods that can analyze 3-way data are thus needed.

Results: We present a new algorithm for finding coherent and flexible modules in 3-way data. Our method can identify both core modules that appear in multiple patients and patient-specific augmentations of these core modules that contain additional genes. Our algorithm is based on a hierarchical Bayesian data model and Gibbs sampling. The algorithm outperforms extant methods on both simulated and real data. The method successfully dissected key components of septic shock response from time series measurements of gene expression. Detected patient-specific module augmentations were informative for disease outcome. In analyzing brain fMRI time series of subjects at rest, it detected the pertinent brain regions involved.

Availability: R code and data are available at http://acgt.cs.tau.ac.il/twigs/

TP046 (PT) - A genome-wide map of hyper-edited RNA reveals numerous new sites
Theme: Genes
Date: Monday, July 13, 10:30 am - 10:50 am
Room: The Liffey A

Presenting author: Erez Levanon, Bar-Ilan University, Israel

Hagit Porath, Bar-Ilan University, Faculty of Life Sciences
Shai Carmi, Columbia University, Department of Computer Science

Session Chair: Jerome Waldispuhl

Presentation Overview:
Adenosine-to-inosine editing is one of the most frequent post-transcriptional modifications, manifested as A-to-G mismatches when comparing RNA sequences with their source DNA. Recently, a number of RNA-seq data sets have been screened for the presence of A-to-G editing, and hundreds of thousands of editing sites have been identified. Here we show that existing screens missed the majority of sites by ignoring reads with excessive ('hyper') editing that do not easily align to the genome. We show that careful alignment and examination of the unmapped reads in RNA-seq studies in human reveal numerous new sites, usually many more than originally discovered, and in precisely those regions that are most heavily edited. Specifically, we more than double the number of detected sites in several published screens. We also identify thousands of new sites in mouse, rat, opossum and fly. Our results establish that hyper-editing events account for the majority of editing sites.
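The key idea, that heavily edited reads fail to align until the A-to-G changes are neutralized, can be sketched with small helpers: one that collapses A to G in a sequence (so a hyper-edited read matches its source DNA once both are transformed), and two that, with the original sequences restored in alignment, count A-to-G mismatches and flag reads dominated by them. The thresholds, the same-strand assumption and the function names are illustrative, not the authors' pipeline.

```python
# Illustrative sketch; thresholds and names are assumptions.
def collapse_a_to_g(seq):
    """Three-letter transform: after it, hyper-edited reads match their source DNA."""
    return seq.upper().replace("A", "G")


def a_to_g_sites(genome_segment, read):
    """Offsets where the genome has A but the aligned read has G (same strand)."""
    if len(genome_segment) != len(read):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (g, r) in enumerate(zip(genome_segment, read)) if g == "A" and r == "G"]


def is_hyper_edited(genome_segment, read, min_sites=5, max_other_mismatches=1):
    """Flag reads whose mismatches are dominated by A-to-G changes."""
    sites = a_to_g_sites(genome_segment, read)
    other = sum(
        1 for g, r in zip(genome_segment, read)
        if g != r and not (g == "A" and r == "G")
    )
    return len(sites) >= min_sites and other <= max_other_mismatches
```

A full pipeline would align the transformed unmapped reads to a transformed genome, map the hits back to the original sequences, and then apply a check like `is_hyper_edited`, also handling the reverse strand (T-to-C on the opposite strand), which this sketch ignores.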

TP047 (PT) - Adapt-Mix: Learning local genetic correlation structure improves summary statistics based analyses
Theme: Systems
Date: Monday, July 13, 10:30 am - 10:50 am
Room: The Liffey B

Presenting author: Brielin Brown, University of California at Berkeley, United States

Celeste Eng, University of California San Francisco, United States
Scott Huntsman, University of California San Francisco, United States
Donglei Hu, University of California San Francisco, United States
Dara Torgerson, University of California San Francisco, United States
Esteban Burchard, University of California, San Francisco, United States
Noah Zaitlen, University of California San Francisco, United States
Danny Park, University of California San Francisco, United States

Session Chair: Hidde de Jong

Presentation Overview:
Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants, and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics based methods rely on global “best guess” reference panels in order to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure, and is not feasible when appropriate reference panels are missing or small. Here we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics based methods in arbitrary populations.

Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics based methods: imputation and joint-testing. When using our method as opposed to the current standard of “best guess” reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing.
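To illustrate why the reference correlation structure matters for a summary-statistics method such as imputation, the sketch below mixes LD (correlation) matrices from several reference panels with fixed weights and then imputes untyped z-scores with the standard conditional-Gaussian formula. Adapt-Mix learns its mixture weights from the data; here they are given, and the ridge term, weights and function names are assumptions.

```python
# Illustrative sketch; weights are fixed here, whereas Adapt-Mix learns them.
import numpy as np


def mixed_ld(panel_lds, weights):
    """Weighted mixture of reference-panel LD (correlation) matrices."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the mixture is a correlation matrix
    return sum(w * R for w, R in zip(weights, panel_lds))


def impute_z(R, z_typed, typed_idx, untyped_idx, ridge=0.1):
    """Conditional-Gaussian imputation of untyped z-scores from typed ones."""
    R_tt = R[np.ix_(typed_idx, typed_idx)] + ridge * np.eye(len(typed_idx))
    R_ut = R[np.ix_(untyped_idx, typed_idx)]
    return R_ut @ np.linalg.solve(R_tt, z_typed)
```

The ridge term guards against near-singular LD among typed variants; with a poorly matched reference panel, the off-diagonal entries of `R` and hence the imputed z-scores can be badly off, which is the failure mode a local, data-driven mixture is meant to reduce.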

Availability: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt mix

TP048 (PT) - CANDL: Coarsely Aligning Networks with Diffusion and Landmarks
Theme: Disease / Other
Date: Monday, July 13, 10:30 am - 10:50 am
Room: Liffey Hall 2

Presenting author: Benjamin Hescott, Tufts University, United States

Inbar Fried, Tufts University, Computer Science Department
Anthony Cannistra, Tufts University, Computer Science Department
Carter Casey, Tufts University, Computer Science Department
Adam Piel, Tufts University, Computer Science Department
Mark Crovella, Boston University, Computer Science Department
Benjamin Hescott, Tufts University, Computer Science Department

Session Chair: Yves Moreau

Presentation Overview:
In this work we shift focus in the global network alignment problem away from identifying local structural similarities and toward finding coherent, functionally related groups of genes across species. We introduce CANDL (Coarsely Aligning Networks with Diffusion and Landmarks). Unlike previous methods that seek to conserve local motifs, CANDL identifies neighborhoods that are functionally similar. To do this, CANDL incorporates two key innovations. First, it uses a small set of known homologs to establish a set of landmarks that form the basis for a metric embedding of network nodes. Second, CANDL embeds the network using metrics known to capture functionally relevant network structure, namely random walk commute time and eigenvectors of the Laplacian heat kernel. We show that CANDL captures functionally coherent neighborhood mappings considerably better than current state-of-the-art aligners. To do so we introduce two new validation tests based on functional coherence: cross-validation using known homologs, and similarity of GO terms in neighborhoods. In the process, we also identify and quantify previously overlooked limitations of structural network alignment techniques that arise due to network automorphisms.
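
The landmark idea can be illustrated with a short sketch. It is not CANDL's implementation, and the heat-kernel eigenvectors are omitted. The sketch does three things:

1. It computes random-walk commute times from the pseudo-inverse of the graph Laplacian.
2. It describes each node by its commute times to a few landmark nodes. Known homologs are stood in for by a hypothetical index pairing.
3. It maps each node of one network to the node of the other network with the closest landmark profile.

All function names are hypothetical, and the graph is assumed to be undirected, unweighted and connected.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small dense matrices)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def commute_times(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Commute times C(i,j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij) for a connected graph."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    vol = sum(lap[i][i] for i in range(n))  # sum of degrees
    # For a connected graph, inv(L + J/n) - J/n is the Moore-Penrose pseudo-inverse L+.
    shifted = invert([[lap[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    lp = [[shifted[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    return [[vol * (lp[i][i] + lp[j][j] - 2.0 * lp[i][j]) for j in range(n)]
            for i in range(n)]


def embed(ct: Matrix, landmarks: List[int]) -> List[List[float]]:
    """Coordinates of each node = its commute times to the landmark nodes."""
    return [[row[lm] for lm in landmarks] for row in ct]


def coarse_align(emb_a: List[List[float]], emb_b: List[List[float]]) -> Dict[int, int]:
    """Map each node of network A to the node of B with the closest landmark profile."""
    return {i: min(range(len(emb_b)), key=lambda j: math.dist(pa, emb_b[j]))
            for i, pa in enumerate(emb_a)}


# A path graph is symmetric under reversal (an automorphism), so structure alone
# cannot orient it; two hypothetical landmark pairs (A0<->B4, A4<->B0) break the tie.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
ct = commute_times(5, path)
print(coarse_align(embed(ct, [0, 4]), embed(ct, [4, 0])))  # {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
```

The example hints at why automorphisms trouble purely structural aligners: without the landmarks, both orientations of the path are equally good matches.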

TP049 (PT) - Simultaneous reconstruction of microRNA-target and ceRNA networks
Theme: Genes
Date: Monday, July 13, 10:50 am - 11:10 am
Room: The Liffey A

Presenting author: Pavel Sumazin, Baylor College of Medicine, United States

Session Chair: Jerome Waldispuhl

Presentation Overview:
We introduce a method for simultaneous prediction of microRNA-target interactions and their mediated competitive endogenous RNA (ceRNA) interactions. Using high-throughput validation assays in breast cancer cell lines, we show that our integrative approach significantly improves on microRNA-target prediction accuracy as assessed by both mRNA and protein level measurements. Our biochemical assays support nearly 500 microRNA-target interactions with evidence for regulation in breast-cancer tumors. Moreover, these assays constitute the most extensive validation platform for computationally inferred networks of microRNA-target interactions in breast-cancer tumors, providing a useful benchmark to ascertain future improvements.

TP050 (PT) - Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast
Theme: Systems
Date: Monday, July 13, 10:50 am - 11:10 am
Room: The Liffey B

Presenting author: Mathieu Clément-Ziza, University of Cologne, Germany

Francesc X Marsellach, University College London, Department of Genetics, Evolution & Environment
Sandra Codlin, University College London, Department of Genetics, Evolution & Environment
Manos A Papadakis, Technical University of Denmark, Center for Biological Sequence Analysis
Susanne Reinhardt, Technische Universität Dresden, BIOTEC
Maria Rodriguez-Lopez, University College London, Department of Genetics, Evolution & Environment
Stuart Martin, University College London, Department of Genetics, Evolution & Environment
Samuel Marguerat, Imperial College London, MRC Clinical Sciences Centre
Alexander Schmidt, University of Basel, University of Basel
Eunhye Grace Lee, University College London, Department of Genetics, Evolution & Environment
Christopher T Workman, Technical University of Denmark, Center for Biological Sequence Analysis
Jürg Bähler, University College London, Department of Genetics, Evolution & Environment
Andreas Beyer, University of Cologne, CECAD

Session Chair: Hidde de Jong

Presentation Overview:
Our current understanding of how natural genetic variation affects gene expression beyond well-annotated coding genes is still limited. The use of deep sequencing technologies for the study of expression quantitative trait loci (eQTLs) has the potential to close this gap. Here, we generated the first recombinant strain library for fission yeast and conducted an RNA-seq-based QTL study of the coding, non-coding, and antisense transcriptomes. We show that the number of distal effects (trans-eQTLs) greatly exceeds the number of local effects (cis-eQTLs) and that non-coding RNAs are as likely to be affected by eQTLs as protein-coding RNAs. We identified a genetic variant of swc5 that modifies the levels of many RNAs, with effects on both sense and antisense transcription, and downstream effects on the histone composition at promoters. The strains, methods, and datasets generated here provide a rich resource for future studies.

TP051 (PT) - Bayesian inference of viral fitness landscapes in the quasispecies model
Theme: Disease / Other
Date: Monday, July 13, 10:50 am - 11:10 am
Room: Liffey Hall 2

Presenting author: David Seifert, ETH Zurich, Switzerland

Francesca Di Giallonardo, University Hospital Zurich, Division of Infectious Diseases and Hospital Epidemiology
Karin J. Metzner, University Hospital Zurich, Division of Infectious Diseases and Hospital Epidemiology
Huldrych F. Günthard, University Hospital Zurich, Division of Infectious Diseases and Hospital Epidemiology
Niko Beerenwinkel, ETH Zurich, D-BSSE

Session Chair: Yves Moreau

Presentation Overview:
QuasiFit is a Bayesian MCMC sampler for inferring intra-host viral fitness landscapes from next-generation sequencing data. To estimate fitness, QuasiFit uses cross-sectional genetic data and assumes the viral quasispecies to be in mutation-selection equilibrium. With the inferred posterior fitness distribution, effects such as epistasis and neutral genotype networks can be determined, which will be helpful in judging which viral strains are highly fit and driving intra-host evolution. We applied QuasiFit to infer the viral fitness landscapes in two HIV-infected patients. By using intra-host data, QuasiFit enables learning of host-specific, personalized viral fitness landscapes.
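
To make the equilibrium assumption concrete, here is a sketch of the deterministic forward model (Eigen's quasispecies equation). It is not QuasiFit's Bayesian sampler. The set-up is as follows:

- Genotypes are binary strings of length L, and u is the per-site mutation rate.
- The mutation matrix is Q_ij = u^d (1 − u)^(L − d), where d is the Hamming distance between genotypes i and j.
- The mutation-selection equilibrium is the dominant eigenvector of Q·diag(f), found here by power iteration.

An MCMC sampler would use a model of this kind in the opposite direction, scoring candidate fitness vectors against observed haplotype frequencies. The binary genotype encoding, the function names and the example fitness values are assumptions for illustration.

```python
from typing import List


def mutation_matrix(length: int, u: float) -> List[List[float]]:
    """Q[i][j] = probability that genotype j (a bit string) is copied as genotype i."""
    n = 2 ** length
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = bin(i ^ j).count("1")  # Hamming distance between the two genotypes
            q[i][j] = u ** d * (1.0 - u) ** (length - d)
    return q


def equilibrium(fitness: List[float], u: float, iters: int = 2000) -> List[float]:
    """Dominant eigenvector of Q diag(f), normalised to frequencies (power iteration)."""
    n = len(fitness)
    length = n.bit_length() - 1
    if 2 ** length != n:
        raise ValueError("need one fitness value per binary genotype (2**L values)")
    q = mutation_matrix(length, u)
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [sum(q[i][j] * fitness[j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)  # equals the current mean fitness, since Q's columns sum to 1
        x = [v / total for v in y]
    return x


def mean_fitness(fitness: List[float], x: List[float]) -> float:
    """At equilibrium the dominant eigenvalue equals the population mean fitness."""
    return sum(f * v for f, v in zip(fitness, x))


fitness = [2.0, 1.0, 1.0, 1.5]  # hypothetical fitness of genotypes 00, 01, 10, 11
freqs = equilibrium(fitness, u=0.01)
print([round(v, 4) for v in freqs], round(mean_fitness(fitness, freqs), 4))
```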

TP052 (PT) - NGS-Logistics: federated analysis of NGS sequence variants across multiple locations.
Theme: Genes
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: The Liffey A

Presenting author: Amin Ardeshirdavani, KU Leuven, Belgium

Erika Souche, KU Leuven, Center of Human Genetics
Luc Dehaspe, KU Leuven, Center of Human Genetics
Jeroen Van Houdt, KU Leuven, Center of Human Genetics
Joris Robert Vermeesch, KU Leuven, Center of Human Genetics
Yves Moreau, KU Leuven, ESAT - STADIUS

Session Chair: Jerome Waldispuhl

Presentation Overview:
As many personal genomes are now being sequenced, collaborative analysis of these genomes has become essential to gain biomedical knowledge effectively from those sequencing efforts. However, analysis of personal genomic data raises important confidentiality issues. We propose a methodology called NGS-Logistics for federated analysis of sequence variants from personal genomes that helps alleviate these problems. Our method allows querying the genome both for a set of samples to which the user has authorized direct access (active data set) and for the whole set of samples. The query results are statistics that do not breach data confidentiality but allow further exploration of the data. Relevant samples outside the active data set can be identified through pseudonymous identifiers so that researchers can negotiate access to these samples with the authorized party. This approach minimizes the impact on data confidentiality while enabling powerful data analysis by gaining access to important rare samples.
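
The query model can be sketched as follows. This is a hypothetical illustration, not NGS-Logistics code. Each site keeps its genotypes locally and answers a variant query with aggregate counts. It returns full sample identifiers only for the querying user's active data set; every other sample appears as a keyed-hash (HMAC) pseudonym. A central node can therefore merge the answers without receiving raw data. The `Site` class, its field names, the per-site secret key and the use of HMAC for pseudonyms are all assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str]  # hypothetical key: (chromosome, position, alt allele)


@dataclass
class Site:
    name: str
    secret: bytes                       # per-site key; never leaves the site
    genotypes: Dict[str, Set[Variant]]  # sample id -> variants carried
    active: Set[str] = field(default_factory=set)  # samples the user may see directly

    def pseudonym(self, sample_id: str) -> str:
        """Stable pseudonymous identifier that only this site can map back."""
        digest = hmac.new(self.secret, sample_id.encode(), hashlib.sha256).hexdigest()
        return f"{self.name}:{digest[:12]}"

    def query(self, variant: Variant) -> dict:
        """Aggregate answer for one variant; raw genotypes stay at the site."""
        carriers = [s for s, vs in self.genotypes.items() if variant in vs]
        return {
            "carriers": len(carriers),
            "samples": len(self.genotypes),
            "active_carriers": sorted(s for s in carriers if s in self.active),
            "other_carriers": sorted(self.pseudonym(s) for s in carriers
                                     if s not in self.active),
        }


def federated_query(sites: List[Site], variant: Variant) -> dict:
    """Merge per-site answers into cohort-wide statistics."""
    answers = [site.query(variant) for site in sites]
    return {
        "carrier_frequency": sum(a["carriers"] for a in answers)
        / sum(a["samples"] for a in answers),
        "active_carriers": [s for a in answers for s in a["active_carriers"]],
        "pseudonymous_carriers": [p for a in answers for p in a["other_carriers"]],
    }


v = ("chr1", 12345, "T")
site_a = Site("siteA", b"keyA", {"A1": {v}, "A2": set()}, active={"A1", "A2"})
site_b = Site("siteB", b"keyB", {"B1": {v}, "B2": {v}, "B3": set()})
print(federated_query([site_a, site_b], v))
```

A researcher who finds an interesting pseudonymous carrier would then contact the owning site, which alone can resolve the pseudonym.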

TP053 (PT) - Co-analysis of transcriptome, exome, and protein interaction network information in cancers points to therapeutically targetable mutations
Theme: Systems / Disease
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: The Liffey B

Presenting author: Sarah-Jane Schramm, The University of Sydney, Australia

Shila Ghazanfar, The University of Sydney, School of Mathematics and Statistics
Sarah-Jane Schramm, The University of Sydney, Sydney Medical School at Westmead Millennium Institute
John T. Ormerod, The University of Sydney, School of Mathematics and Statistics
Graham J. Mann, The University of Sydney, Sydney Medical School at Westmead Millennium Institute; Melanoma Institute Australia
Jean Yee Hwa Yang, The University of Sydney, School of Mathematics and Statistics; Melanoma Institute Australia

Session Chair: Hidde de Jong

Presentation Overview:
A long-standing goal in cancer research is to describe the landscape of mutations responsible for neoplastic development and progression. Improved understanding of how gene and protein networks function in cancers would lead to identification of potential therapeutic targets, paving the way for advances in disease management at the level of individual patients. Using melanoma as a model disease, we recently found that differences in the coordination of gene co-expression among protein-protein interaction (PPI) networks were significantly associated (p<0.05) with patient survival. Moreover, these survival-related networks showed significant increases in the number of functional mutations present, relative to networks without such gene co-expression disruption. These findings suggest that increased functional mutation burden may be a pathogenic mechanism behind the differential network behavior observed. If true, these mutations would form a selectable basis for the accumulation of disturbances during tumorigenesis and would be important drivers of disease progression and clinical outcome. Extending these analyses, we have recently shown in unpublished work that our original findings are reproducible in other cancers, including lung squamous cell carcinoma (p<0.02) and serous ovarian cancer (p<0.05). Subsequent literature-based analysis reveals that these survival-related networks are highly relevant to the biology underlying tumor behaviour. These findings may guide the identification of therapeutically targetable mutations, including outside the exome.

TP054 (PT) - Finding Novel Molecular Connections between Developmental Processes and Disease
Theme: Disease / Other
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Presenting author: Donna Slonim, Tufts University, United States

Heather Wick, Johns Hopkins University, Human Genetics
Daniel Kee, Braintree Payments,
Keith Noto, Ancestry, Inc,
Jill Maron, Tufts University, Pediatrics
Donna Slonim, Tufts University, Computer Science

Session Chair: Yves Moreau

Presentation Overview:
Experiences during early development can affect lifelong health and disease risk. In our study, we have identified significant and surprising links between diseases and several tissue-specific developmental processes. Our work relies on a novel approach whose strength comes from pooling disease genes across related diseases, overcoming problems posed by limited information about gene-disease associations. We demonstrate the efficacy of the pooling method by evaluation on withheld data. We further validate the links between developmental processes and disease by demonstrating that our results, collectively, recover expected connections, such as those between heart development and cardiovascular disorders. We also describe some of the more surprising connections we found, several of which are consistent with other molecular evidence or recent literature. Finally, we present a web-based application that enables users to perform the same analysis for any set of genes of interest, and includes a visualization tool for exploration of the results.

TP055 (PT) - The functional importance of synonymous mutations in cancer and microbes analyzed in massive genomic data sets
Theme: Genes
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: The Liffey A

Presenting author: Fran Supek, Centre for Genomic Regulation, Spain

Belén Miñana, Centre For Genomic Regulation, Barcelona, Gene Regulation, Stem Cells and Cancer Program
Juan Valcárcel, Centre For Genomic Regulation, Barcelona, Gene Regulation, Stem Cells and Cancer Program
Toni Gabaldón, Centre For Genomic Regulation, Barcelona, Bioinformatics and Genomics Program
Ben Lehner, Centre For Genomic Regulation, Barcelona, EMBL/CRG Systems Biology Research Unit
Anita Kriško, Mediterranean Institute for Life Sciences, Split, Biology of Robustness
Tea Copić, Mediterranean Institute for Life Sciences, Split, Biology of Robustness

Session Chair: Jerome Waldispuhl

Presentation Overview:
Synonymous mutations do not change the encoded amino acids, but are known to have subtle effects on protein translation and on regulation of splicing. We have examined the prevalence of synonymous mutations among somatic changes catalogued across ~3800 human cancer genomes (Supek et al, Cell 2014). Oncogenes harbor an excess of synonymous mutations when compared against the broader gene set and intronic mutation rates. Such mutations were likely to alter exonic splicing enhancer/silencer motifs; RNA-Seq data indicated this leads to aberrantly spliced transcripts. Next, we analyzed the synonymous codon usage biases in 910 prokaryotic genomes (Krisko et al, Genome Biol 2014). Here, we found associations of codon biases within orthologous gene clusters to environmental preferences of microbes, and used this to predict the adaptive value of genes for aerobic, hot, or hypersaline environments. Out of 200 novel functional annotations for COG groups thus obtained, we experimentally validated 35/44 tested predictions.

TP056 (PT) - A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks.
Theme: Systems / Disease
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: The Liffey B

Presenting author: Mohammed AlQuraishi, Harvard Medical School, United States

Grigoriy Koytiger, Harvard Medical School, Systems Biology
Anne Jenney, Harvard Medical School, Systems Biology
Gavin MacBeath, Harvard Medical School, Systems Biology
Peter Sorger, Harvard Medical School, Laboratory of Systems Pharmacology

Session Chair: Hidde de Jong

Presentation Overview:
Functional interpretation of genomic variation is critical to understanding human disease, but it remains difficult to predict the effects of specific mutations on protein interaction networks and the phenotypes they regulate. We describe an analytical framework based on multiscale statistical mechanics that integrates genomic and biophysical data to model the human SH2-phosphoprotein network in normal and cancer cells. We apply our approach to data in The Cancer Genome Atlas (TCGA) and test model predictions experimentally. We find that mutations mapping to phosphoproteins often create new interactions but that mutations altering SH2 domains result almost exclusively in loss of interactions. Some of these mutations eliminate all interactions, but many cause more selective loss, thereby rewiring specific edges in highly connected subnetworks. Moreover, idiosyncratic mutations appear to be as functionally consequential as recurrent mutations. By synthesizing genomic, structural and biochemical data, our framework represents a new approach to the interpretation of genetic variation.

TP057 (PT) - Uncovering the mechanisms modulating cardiac electrophysiology using systems genetics approaches in recombinant inbred rat strains
Theme: Disease / Other
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Presenting author: Michiel Adriaens, Maastricht University, Netherlands

Aida Moreno-Moral, Imperial College London, Integrative Genomics and Medicine
Elisabeth Lodder, AMC, Experimental Cardiology
Carol Ann Remme, AMC, Experimental Cardiology
Rianne Wolswinkel, AMC, Experimental Cardiology
Enrico Petretto, Imperial College London, Integrative Genomics and Medicine
Stuart Cook, Imperial College London, Integrative Genomics and Medicine
Connie Bezzina, AMC, Experimental Cardiology

Session Chair: Yves Moreau

Presentation Overview:
Genome-wide association studies have identified many common genetic variants affecting susceptibility to cardiac arrhythmias and sudden cardiac death (SCD). But uncovering the underlying disease mechanisms remains a substantial challenge, as the required resources for the human heart are sparse and underpowered. Hence, the only means to paint the full picture is to complement insights derived from human studies with systems genetics approaches in statistically powerful animal models. In this study we use 29 BXH/HXB recombinant inbred (RI) rat strains, a strong model for uncovering the mechanisms modulating cardiac electrical function. Prolonged ECG indices of conduction and repolarization are risk factors for cardiac arrhythmias and SCD, and here we combine such indices with genotyping and RNA-seq transcriptomics data. In these data we search for quantitative trait loci (QTL): genetic markers associated with changes in a quantitative trait, i.e. an ECG index or gene expression level. Using a Bayesian systems genetics framework, we identified multiple candidate genes and networks. One of these genes is Acbd4: a nearby genetic marker appears to modulate the expression of this gene (eQTL). Additionally, the same marker is associated with PR prolongation (ecgQTL). The protein product of Acbd4 plays a role in vesicle formation, deregulation of which is known to be linked to heart disease. Acbd4’s co-expression network is significantly positively correlated with PR duration and partly conserved in human, suggesting that the underlying mechanism may be of clinical relevance as well. Validation of our findings is currently ongoing.
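
A classical single-marker scan conveys the QTL idea. This sketch is not the Bayesian systems genetics framework used in the study. For each marker, strains are grouped by their homozygous genotype. The LOD score (n/2)·log10(RSS_null / RSS_marker) then compares a single-mean model with one mean per genotype. The data layout, the function names and the example values are hypothetical. Significance thresholds would normally come from permutations, which are not shown.

```python
import math
from typing import List


def marker_lod(genotype: List[int], trait: List[float]) -> float:
    """Single-marker LOD: (n/2) * log10(RSS_null / RSS_marker), one mean per genotype."""
    n = len(trait)
    mean = sum(trait) / n
    rss_null = sum((y - mean) ** 2 for y in trait)
    if rss_null == 0.0:
        return 0.0  # constant trait: nothing to explain
    rss_marker = 0.0
    for g in set(genotype):
        group = [y for gt, y in zip(genotype, trait) if gt == g]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0.0:
        return math.inf  # marker explains the trait perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan(markers: List[List[int]], trait: List[float]) -> List[float]:
    """LOD score of every marker for one trait (e.g. an ECG index or a transcript level)."""
    return [marker_lod(g, trait) for g in markers]


# Hypothetical data: 6 RI strains, 2 markers (0/1 = parental allele), one trait.
markers = [[0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1]]
trait = [1.0, 1.1, 3.0, 3.2, 0.9, 2.9]
lods = scan(markers, trait)
best = max(range(len(markers)), key=lambda k: lods[k])
print(best, [round(v, 2) for v in lods])
```

Running the same scan with a transcript level as the trait gives an eQTL, and with an ECG index gives an ecgQTL. A marker that scores highly for both, as described for Acbd4, links the two.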

TP058 (PT) - Robust reconstruction of gene expression profiles from reporter gene data using linear inversion
Theme: Genes
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: The Liffey A

Presenting author: Valentin Zulkower, INRIA Grenoble-Rhône-Alpes, France

Michel Page, INRIA Grenoble-Rhône-Alpes, IAE Grenoble, France
Delphine Ropers, INRIA Grenoble-Rhône-Alpes, UJF Grenoble, France
Johannes Geiselmann, INRIA Grenoble-Rhône-Alpes, UJF Grenoble, France
Hidde de Jong, INRIA Grenoble-Rhône-Alpes, France

Session Chair: Jerome Waldispuhl

Presentation Overview:
Motivation: Time-series observations from reporter gene experiments are commonly used for inferring and analyzing dynamical models of regulatory networks. The robust estimation of promoter activities and protein concentrations from primary data is a difficult problem due to measurement noise and the indirect relation between the measurements and quantities of biological interest.

Results: We propose a general approach based on regularized linear inversion to solve a range of estimation problems in the analysis of reporter gene data, notably the inference of growth rate, promoter activity, and protein concentration profiles. We evaluate the validity of the approach using in-silico simulation studies, and observe that the methods are more robust and less biased than indirect approaches usually encountered in the experimental literature based on smoothing and subsequent processing of the primary data. We apply the methods to the analysis of fluorescent reporter gene data acquired in kinetic experiments with Escherichia coli. The methods are capable of reliably reconstructing time-course profiles of growth rate, promoter activity, and protein concentration from weak and noisy signals at low population volumes. Moreover, they capture critical features of those profiles, notably rapid changes in gene expression during growth transitions.

Availability: The methods described in this paper are made available as a Python package (LGPL licence) and also accessible through a fr/ibis/wellinverter.
Contact: Hidde.de-Jong@inria.fr
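
As a hedged illustration of regularized linear inversion, consider estimating a growth-rate profile μ(t) from log-absorbance. This is not the authors' package, whose models and interface are not reproduced here. The sketch relies on three relations:

1. The observations satisfy ln A(t_i) − ln A(t_0) ≈ Δt · Σ_{k≤i} μ_k.
2. Written as y = Hx, the estimate minimizes ‖y − Hx‖² + λ‖Dx‖², where D takes second differences.
3. Equivalently, the estimate solves (HᵀH + λDᵀD)x = Hᵀy.

The rectangle-rule discretization, the choice of D, the value of λ and the function names are assumptions.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def invert_profile(y: List[float], dt: float, lam: float) -> List[float]:
    """Estimate x from y_i = dt * sum_{k<=i} x_k (+ noise) by minimising
    ||y - Hx||^2 + lam * ||Dx||^2, where D takes second differences."""
    n = len(y)
    h = [[dt if k <= i else 0.0 for k in range(n)] for i in range(n)]
    d = [[0.0] * n for _ in range(max(n - 2, 0))]
    for r in range(len(d)):
        d[r][r], d[r][r + 1], d[r][r + 2] = 1.0, -2.0, 1.0
    # Normal equations of the penalised least-squares problem: (H^T H + lam D^T D) x = H^T y
    a = [[sum(h[r][i] * h[r][j] for r in range(n))
          + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    b = [sum(h[r][i] * y[r] for r in range(n)) for i in range(n)]
    return solve(a, b)


# Hypothetical example: noisy log-absorbance generated from a known growth rate.
random.seed(1)
dt = 0.25
true_mu = [0.6 + 0.2 * math.sin(dt * (i + 1)) for i in range(40)]
log_abs, acc = [], 0.0
for mu in true_mu:
    acc += mu * dt
    log_abs.append(acc + random.gauss(0.0, 0.02))
est_mu = invert_profile(log_abs, dt, lam=0.05)
print(max(abs(e - m) for e, m in zip(est_mu, true_mu)))
```

Unlike smoothing the data and then differentiating, the penalty acts directly on the quantity being estimated. Profiles that are linear in time are not penalized at all.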

TP059 (PT) - Biological network modeling helps finding genetic determinants of metastatic colon cancer
Theme: Systems / Disease
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: The Liffey B

Presenting author: Inna Kuperstein, Institut Curie – U900 INSERM - Mines ParisTech, France

Maia Chanrion, Institut Curie-CNRS UMR 144, Centre de Recherche
David Cohen, Institut Curie – U900 INSERM - Mines ParisTech, Bioinformatics and Computational Systems Biology of Cancer
Emmanuel Barillot, Institut Curie – U900 INSERM - Mines ParisTech, Bioinformatics and Computational Systems Biology of Cancer
Daniel Louvard, Institut Curie-CNRS UMR 144, Bioinformatics and Computational Systems Biology of Cancer
Sylvie Robine, Institut Curie-CNRS UMR 144, Centre de Recherche
Andrei Zinovyev, Institut Curie – U900 INSERM - Mines ParisTech, Bioinformatics and Computational Systems Biology of Cancer

Session Chair: Hidde de Jong

Presentation Overview:
Epithelial-to-mesenchymal transition (EMT) initiates metastases in cancer; however, the key players in the process are still debated. We constructed a comprehensive map of the EMT signaling network and performed a structural analysis that highlighted the network's organization principles and reduced its complexity down to core regulatory routes. Using the reduced network, we compared combinations of single and double mutants for achieving the EMT phenotype, predicted that a combination of p53 knock-out and Notch overexpression would induce EMT, and suggested the molecular mechanism. This prediction led to the generation of a colon cancer mouse model with metastases in distant organs. We confirmed in invasive human colon cancer samples that EMT markers are associated with modulation of Notch and p53 gene expression in a similar manner as in the mouse model, supporting a synergy between these genes that permits EMT induction. Together, the computational and experimental approaches led to the discovery of a new metastasis mechanism in colon cancer.

TP060 (PT) - Relating Essential Proteins To Drug Side-Effects Using Canonical Component Analysis
Theme: Disease / Other
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Presenting author: Tianyun Liu, Stanford University, United States

Session Chair: Yves Moreau

Presentation Overview:
We identified molecular mechanisms of drug side-effects by associating drugs to essential proteins using canonical component analysis.

TP061 (PT) - Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes
Theme: Disease
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Auditorium

Presenting author: Mark Leiserson, Brown University, United States

Fabio Vandin, Brown University, Computer Science and Computational Molecular Biology
Hsin-Ta Wu, Brown University, Computer Science and Computational Molecular Biology
Jason Dobson, Brown University, Computer Science, Computational Molecular Biology, and Molecular Biology, Cell Biology & Biochemistry
Alexandra Papoutsaki, Brown University, Computer Science
Beifang Niu, Washington University in St. Louis, Genome Institute
Michael McLellan, Washington University in St. Louis, Genome Institute
Michael Lawrence, Broad Institute of MIT and Harvard, Cancer Genomics Informatics
Abel Gonzalez-Perez, Pompeu Fabra University, Department of Experimental and Health Sciences
David Tamborero, Pompeu Fabra University, Department of Experimental and Health Sciences
Gregory Ryslik, Yale University, Biostatistics
Yuwei Cheng, Yale University, Statistical Genomics and Proteomics
Nuria Lopez-Bigas, Pompeu Fabra University, Department of Experimental and Health Sciences
Li Ding, Washington University in St. Louis, The Genome Institute
Benjamin Raphael, Brown University, Computer Science and Computational Molecular Biology

Session Chair: Paul Horton

Presentation Overview:
A key challenge in cancer genomics is to identify mutations that drive cancer in a cohort of tumor samples. These mutations often target genetic regulatory and signaling pathways and protein complexes, each including multiple genes. We present the HotNet2 (diffusion oriented subnetworks) algorithm for identifying significantly mutated subnetworks in a protein interaction network. HotNet2 uses an insulated heat diffusion process to simultaneously encode the local topology of a protein and its mutations when identifying significantly mutated (hot) subnetworks. We applied HotNet2 to The Cancer Genome Atlas Pan-Cancer dataset, comprising 3110 tumor samples from twelve cancer types. HotNet2 identified significantly mutated subnetworks overlapping well-known cancer pathways, protein complexes with recently characterized roles in cancer (e.g. SWI/SNF and BAP1), and less characterized complexes (including the condensin and cohesin complexes). Our presentation will also include recent extensions and applications of the HotNet2 algorithm.
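
The insulated diffusion step can be sketched in a few lines. The sketch follows the published HotNet2 formulation:

- The diffusion matrix is F = β(I − (1 − β)W)⁻¹, where W is the column-normalized adjacency matrix and β is the restart (insulation) parameter.
- The exchanged-heat matrix is E = F·diag(h), where h holds the mutation scores.

As a simplification, the sketch keeps a link only when heat above δ flows in both directions and then returns connected components. HotNet2 itself uses strongly connected components of the directed graph and calibrates δ and significance by permutation. The example values of β and δ and all function names are assumptions, and every node must have at least one edge.

```python
from collections import deque
from typing import List, Set, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small dense matrices)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def diffusion_matrix(n: int, edges: List[Tuple[int, int]], beta: float) -> Matrix:
    """F = beta * inv(I - (1 - beta) * W), with W the column-normalised adjacency matrix."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # assumes no isolated nodes
    m = [[(1.0 if i == j else 0.0) - (1.0 - beta) * adj[i][j] / deg[j] for j in range(n)]
         for i in range(n)]
    return [[beta * v for v in row] for row in invert(m)]


def hot_subnetworks(n: int, edges: List[Tuple[int, int]], heat: List[float],
                    beta: float, delta: float) -> List[Set[int]]:
    """Components of nodes exchanging heat E = F diag(h) above delta in both directions."""
    f = diffusion_matrix(n, edges, beta)
    e = [[f[i][j] * heat[j] for j in range(n)] for i in range(n)]  # heat i receives from j
    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if j not in seen and e[i][j] > delta and e[j][i] > delta:
                    seen.add(j)
                    comp.add(j)
                    queue.append(j)
        components.append(comp)
    return components


# Hypothetical network: a mutated triangle (0, 1, 2) attached to an unmutated tail.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
heat = [5.0, 5.0, 5.0, 0.0, 0.0, 0.0]
print(hot_subnetworks(6, edges, heat, beta=0.4, delta=0.05))
```

Because W's columns sum to 1, each column of F also sums to 1. Each node therefore redistributes exactly its own heat, and β controls how much of it stays local.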

TP062 (PT) - Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data
Theme: Genes
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Liffey A

Presenting author: Fabian J. Theis, Helmholtz Zentrum München; German Research Center for Environmental Health, Germany

Laleh Haghverdi, Helmholtz Center Munich, Germany
Nikola S. Mueller, Helmholtz Center Munich, Germany
Andrea Ocone, Helmholtz Center Munich, Germany

Session Chair: Uwe Ohler

Presentation Overview:
Motivation: High-dimensional single-cell snapshot data are becoming widespread in the systems biology community as a means to understand biological processes at the cellular level. However, because temporal information is lost with such data, mathematical models have been limited to capturing only static features of the underlying cellular mechanisms.

Results: Here, we present a modular framework that recovers the temporal behaviour from single-cell snapshot data and reverse-engineers the dynamics of gene expression. The framework combines a dimensionality reduction method with a cell time-ordering algorithm to generate pseudo time-series observations. These are in turn used to learn transcriptional ODE models and to perform model selection on structural network features. We apply the framework first to synthetic data and then to real hematopoietic stem cell data, to reconstruct gene expression dynamics along differentiation pathways and infer the structure of a key gene regulatory network.
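
The ordering step of such a pipeline can be sketched with the simplest possible choices. These are assumptions, not the presented framework, which uses its own dimensionality reduction and ordering algorithms. The cells' expression profiles are reduced to their first principal component, found by power iteration on the covariance matrix, and each cell's projection serves as its pseudo-time. The sign of a principal component is arbitrary, so a hypothetical `root` cell (for example, a known stem cell) is used to fix the direction.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def first_pc(data: Matrix, iters: int = 500) -> Tuple[List[float], Matrix]:
    """First principal component (power iteration on the covariance) and centred data."""
    n, g = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(g)]
    centred = [[row[k] - means[k] for k in range(g)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(g)]
           for a in range(g)]
    v = [1.0 / g ** 0.5] * g  # start vector; must not be orthogonal to the component
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(g)) for a in range(g)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            break  # no variance left to follow
        v = [c / norm for c in w]
    return v, centred


def pseudotime(data: Matrix, root: Optional[int] = None) -> List[float]:
    """Position along the first PC, oriented so `root` lies on the early side."""
    v, centred = first_pc(data)
    proj = [sum(c * w for c, w in zip(row, v)) for row in centred]
    if root is not None and proj[root] > 0:
        proj = [-p for p in proj]
    start = min(proj)
    return [p - start for p in proj]


# Hypothetical snapshot: 6 cells x 3 genes, rows in arbitrary (unknown) time order.
cells = [[3.0, 6.0, -3.0], [0.0, 0.0, 0.0], [5.0, 10.0, -5.0],
         [1.0, 2.0, -1.0], [4.0, 8.0, -4.0], [2.0, 4.0, -2.0]]
pt = pseudotime(cells, root=1)
print(sorted(range(len(cells)), key=lambda i: pt[i]))  # [1, 3, 5, 0, 4, 2]
```

The resulting ordering turns the snapshot into a pseudo time series that an ODE model could then be fitted to. A linear projection only suits a single, unbranched trajectory.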

TP063 (PT) - Inference of interactions between chromatin modifiers and histone modifications: from ChIP-Seq data to chromatin-signaling
Theme: Systems
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: The Liffey B

Presenting author: Ho-Ryun Chung, Max-Planck-Institut F. Molekulare Genetik, Germany

Juliane Perner, Max Planck Institute for Molecular Genetics, Computational Molecular Biology
Julia Lasserre, Max Planck Institute for Molecular Genetics, Computational Molecular Biology
Sarah Kinkley, Max Planck Institute for Molecular Genetics, Otto-Warburg-Laboratories
Martin Vingron, Max Planck Institute for Molecular Genetics, Computational Molecular Biology

Session Chair: Russell Schwartz

Presentation Overview:
Chromatin modifiers and histone modifications form chromatin-signaling networks that regulate and drive transcription. In many cases, interactions between chromatin modifiers and histone modifications have only been studied in vitro or are based on the analysis of a few genes. Due to the biased nature of these experimental approaches and the dynamic complexity of chromatin-signaling networks, many interactions remain undiscovered. To recover novel interactions between chromatin modifiers and histone modifications, we applied computational methods to genome-wide ChIP-Seq data. The identified chromatin-signaling network recovered several previously described interactions and revealed as yet unknown interactions. We experimentally verified two of these interactions, linking H4K20me1 with members of the Polycomb Repressive Complexes 1 and 2. These findings demonstrate that our computational method identifies interactions with experimental support and leads to novel biological insights, underlining its power in unraveling the connectivity of highly dynamic chromatin-signaling networks.

TP064 (PT) - Using the Power of Big Data and Crowdsourcing for Catalyzing Breakthroughs in Amyotrophic Lateral Sclerosis (ALS)
Theme: Systems / Disease
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: Liffey Hall 2

Presenting author: Robert Kueffner, Helmholtz Center Munich, Germany

Zach Neta, Prize4Life, Prize4Life
Gustavo Stolovitzky, IBM, Translational Systems Biology and Nanobiotechnology

Session Chair: Knut Reinert

Presentation Overview:
We developed a crowdsourced DREAM Challenge to predict ALS disease progression using clinical trial data. The data are complex and non-uniform as they were measured by different laboratories. Therefore an important step was the harmonization of the different data sets.
On these clinical data, tree-based ensemble regression techniques proved most effective for machine learning. Based on the accuracy of the winning algorithms, we will present a simulation model to estimate the expected reduction in the number of patients needed for a clinical trial. The best-performing submissions also outperformed the predictions of a group of world-leading clinicians. One important outcome of the challenge was the identification of novel predictors of progression rate, potentially offering novel insights into disease mechanisms. We will also discuss our registrant survey, in which we identified factors that motivated potential solvers to participate or discouraged them from doing so.

TP065 (PT) - Statistical Assessment of Darwinian Selection for Mitochondrial Mutations in Cancer
Theme: Disease
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Auditorium

Presenting author: Thomas LaFramboise, Case Western Reserve University, United States

Session Chair: Paul Horton

Presentation Overview:
Somatic mitochondrial DNA (mtDNA) mutations accumulate in human cancers, although the mutations’ roles in tumorigenesis are unclear and subject to some debate. In contrast to the nuclear genome’s two copies per cell, the mitochondrial genome – although very small at 16,568 bp – is typically present at hundreds to thousands of copies per cell. This complicates the analysis of mtDNA variants, since they may be present at a continuous range of abundances between 0% and 100%, as opposed to the 0%, 50% or 100% discrete levels for nuclear genome variants. Furthermore, the per-cell copy number of the mitochondrial genome often shifts dramatically between the tumor and surrounding normal tissue, although the reasons for this phenomenon and its role, if any, in tumor development are unclear. To address these issues in a rigorous manner, we perform an analysis of cancer-specific mutational patterns and copy number changes in the whole mitochondrial genomes of 7,817 patient samples across 14 tumor types. We develop and apply statistical tests to query selection for somatic and inherited variants in mitochondrial DNA. We find specific tumor types and specific genes that show particularly prominent signals of positive selection. Since selection implies function, our results support the role of mtDNA mutations as causative factors in the initiation and development of human cancer.

TP066 (PT) - Large-scale Imputation of Epigenomic Datasets for Systematic Annotation of Diverse Human Tissues
Theme: Genes
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Liffey A

Presenting author: Jason Ernst, University of California Los Angeles, United States

Manolis Kellis, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory

Session Chair: Uwe Ohler

Presentation Overview:
With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

TP067 (PT) - Genome-wide modelling of transcription kinetics reveals patterns of RNA processing delays
Theme: Systems
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Antti Honkela, University of Helsinki, Finland

Antti Honkela, University of Helsinki, Helsinki Institute for Information Technology HIIT, Dept. of Computer Science
Jaakko Peltonen, Aalto University, Helsinki Institute for Information Technology HIIT, Dept. of Computer Science
Hande Topa, Aalto University, Helsinki Institute for Information Technology HIIT, Dept. of Computer Science
Iryna Charapitsa, Institute for Molecular Biology Mainz
Filomena Matarese, Radboud University Nijmegen, Nijmegen Centre for Molecular Life Sciences
Korbinian Grote, Genomatix Software GmbH
George Reid, Institute for Molecular Biology Mainz
Neil D. Lawrence, University of Sheffield, Department of Computer Science
Magnus Rattray, University of Manchester, Faculty of Life Sciences

Session Chair: Russell Schwartz

Presentation Overview:
Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles due to differences in transcription time, degradation rate and RNA processing kinetics. Recent studies have shown that a splicing-associated RNA processing delay can be significant. We introduce a joint model of transcriptional activation and mRNA accumulation which can be used for inference of transcription rate, RNA processing delay and degradation rate given genome-wide data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a non-parametric statistical modelling approach which allows us to capture a broad range of activation kinetics, and use Bayesian parameter estimation to quantify the uncertainty in the estimates of the kinetic parameters.

We apply the model to data from estrogen receptor (ER-α) activation in the MCF-7 breast cancer cell line. We use RNA polymerase II (pol-II) ChIP-Seq time course data to characterise transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 minutes between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated processing delays in many genes.
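
As an illustration of the kind of mechanistic model described above, here is a minimal sketch assuming a delay differential equation of the form dm/dt = b0 + b·p(t - Δ) - d·m(t), where p(t) is pol-II activity, Δ is the RNA processing delay and d is the degradation rate. The function names, parameter names and the step-shaped activity profile are hypothetical; the sketch only integrates the equation forward with Euler steps and does not reproduce the non-parametric activation model or the Bayesian parameter estimation used in the work itself.

```python
import numpy as np


def simulate_mrna(p, t, b0, b, d, delay, m0=0.0):
    """Integrate dm/dt = b0 + b * p(t - delay) - d * m(t) with forward Euler.

    p: callable giving transcriptional (pol-II) activity at a time point.
    t: 1-D array of equally spaced, increasing time points (minutes).
    b0, b: basal and activity-dependent production rates (illustrative).
    d: mRNA degradation rate.
    delay: processing delay between transcription and mature mRNA.
    """
    dt = t[1] - t[0]
    m = np.empty(len(t), dtype=float)
    m[0] = m0
    for i in range(1, len(t)):
        # Production at the previous step uses activity from `delay` minutes earlier.
        production = b0 + b * p(t[i - 1] - delay)
        m[i] = m[i - 1] + dt * (production - d * m[i - 1])
    return m


def step_activation(s):
    """Hypothetical activity profile: transcription switches on at time 0."""
    return 1.0 if s >= 0 else 0.0


# Example: step activation at t = 0, with and without a 20-minute delay.
t = np.linspace(0.0, 240.0, 2401)  # 0.1-minute steps
delayed = simulate_mrna(step_activation, t, b0=0.0, b=1.0, d=0.05, delay=20.0)
immediate = simulate_mrna(step_activation, t, b0=0.0, b=1.0, d=0.05, delay=0.0)
```

With a 20-minute delay, mature mRNA stays at zero until the delay has elapsed and then approaches the steady state b/d = 20, lagging behind the zero-delay curve; this offset is the qualitative pattern a delay parameter is meant to capture.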

TP068 (PT) - A community computational challenge to predict the activity of pairs of compounds
Theme: Systems / Disease
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: Liffey Hall 2

Presenting author: Gustavo Stolovitzky, IBM Research / Mt Sinai Hospital, United States

Andrea Califano, Columbia University, Department of Systems Biology
James Costello, University of Colorado, Department of Pharmacology
Mukesh Bansal, Columbia University, Department of Systems Biology

Session Chair: Knut Reinert

Presentation Overview:
Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods, four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction.

TP069 (PT) - Identifying driver genomic alterations in cancers by searching minimum-weight, mutually exclusive sets
Theme: Disease
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Xinghua Lu, University of Pittsburgh, United States

Session Chair: Paul Horton

Presentation Overview:
An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms. It is well known that somatic genome alterations (SGAs) affecting genes that encode proteins within a common signaling pathway exhibit mutual exclusivity: these SGAs usually do not co-occur in the same tumor. This property has been used, with some success, as an objective function to guide the search for driver mutations. However, mutual exclusivity alone is not sufficient to indicate that the genes affected by such SGAs are in a common pathway. Here, we propose a novel, signal-oriented framework for identifying driver SGAs, in which mutual exclusivity is required only among tumors whose SGAs perturb a common signal, rather than across all tumors as in previous methods. We apply this framework to the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) data from TCGA and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways.
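
To make the mutual-exclusivity property concrete, here is a minimal sketch that measures, for a candidate gene set on a binary tumor-by-gene alteration matrix, how many tumors the set covers and how many extra (co-occurring) alterations it contains. This is a generic illustration, not the authors' minimum-weight search; the optional tumor mask stands in, hypothetically, for the signal-oriented idea of assessing exclusivity only among tumors that share a perturbed signal, and the function name and toy matrix are assumptions.

```python
import numpy as np


def coverage_and_overlap(alterations, genes, tumors=None):
    """Coverage and overlap of a candidate gene set.

    alterations: 2-D 0/1 array, rows = tumors, columns = genes.
    genes: column indices of the candidate gene set.
    tumors: optional boolean mask selecting the tumors to consider
            (hypothetical stand-in for "tumors sharing a perturbed signal").
    Returns (coverage, overlap): coverage = tumors with at least one alteration
    in the set; overlap = alterations beyond one per covered tumor, so a
    perfectly mutually exclusive set has overlap 0.
    """
    sub = np.asarray(alterations)[:, genes]
    if tumors is not None:
        sub = sub[np.asarray(tumors, dtype=bool)]
    coverage = int(np.count_nonzero(sub.any(axis=1)))
    overlap = int(sub.sum()) - coverage
    return coverage, overlap


# Hypothetical 6-tumor x 3-gene alteration matrix.
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],  # genes 0 and 1 co-occur in this tumor
    [0, 0, 0],
])
all_tumors = coverage_and_overlap(A, [0, 1])  # (4, 1)
subset = coverage_and_overlap(A, [0, 1], tumors=[1, 1, 0, 1, 0, 0])  # (3, 0)
```

Scores such as the Dendrix weight (coverage minus overlap) combine these two quantities; restricting the set of tumors changes which co-occurrences count against a gene set.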

TP070 (PT) - Correcting for sample heterogeneity in epigenome-wide association studies
Theme: Genes
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Liffey A

Presenting author: James Zou, Microsoft Research, United States

Session Chair: Uwe Ohler

Presentation Overview:
In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and we validate our method by comparison with the state-of-the-art approach.

TP071 (PT) - Widespread degradation of transcripts by splicing and nonsense-mediated mRNA decay (NMD) includes ultraconserved targets whose regulation by alternative splicing and NMD is conserved between kingdoms
Theme: Systems
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Steven Brenner, University of California, Berkeley, United States

Liana Lareau, University of California, Berkeley, QB3 Institute

Session Chair: Russell Schwartz

Presentation Overview:
Ultraconserved elements, unusually long regions of perfect sequence identity, are found in genes encoding numerous RNA-binding proteins, including SR splicing factors. Expression of these genes is regulated via alternative splicing of the ultraconserved regions to yield mRNAs that are degraded by nonsense-mediated mRNA decay (NMD), a process termed unproductive splicing. We have found that unproductive splicing affects all human SR genes, but rather than being ancestral, it arose independently in nearly every case. We demonstrate that unproductive splicing of the splicing factor SRSF5 is conserved among all animals and is even observed in fungi; this is a rare example of alternative splicing conserved between kingdoms, yet its effect is to trigger mRNA degradation. As the gene duplicated, the ancient unproductive splicing was lost in paralogs, and distinct unproductive splicing evolved rapidly and repeatedly to take its place.

TP072 (PT) - Improving compound-protein interaction prediction by building up highly credible negative samples
Theme: Systems / Disease
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Presenting author: Yanni Sun, Fudan University, China

Jianjiang Sun, Fudan University, China
Jihong Guan, Tongji University, China
Jie Zheng, Nanyang Technological University, Singapore
Shuigeng Zhou, Fudan University, China
Hui Liu, Changzhou University, China

Session Chair: Knut Reinert

Presentation Overview:
Motivation: Computational prediction of compound-protein interactions is of great importance for drug design and development, as genome-scale experimental validation of compound-protein interactions is not only time-consuming but also prohibitively expensive. Although an increasing number of validated interactions is available, the performance of computational prediction approaches is severely impeded by the lack of reliable negative compound-protein interaction samples. A systematic method of screening reliable negative samples is therefore critical to improving the performance of in silico prediction methods.

Results: This paper aims at building up a set of highly credible negative samples of compound-protein interactions via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins, and achieve remarkable performance, it is rational to identify potential negative samples based on the converse negative proposition: proteins dissimilar to every known or predicted target of a compound are not likely to be targeted by the compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, the protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers; all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples, for both human and C. elegans. We then verified the negative samples on three existing prediction models (the bipartite local model, Gaussian kernel profile and Bayesian matrix factorization) and found that the performance of these models also improved significantly on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training an SVM classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases.
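
The screening rule stated above (a protein dissimilar to every known target of a compound is an unlikely target) can be sketched as a simple threshold filter. This minimal sketch assumes a single precomputed protein-protein similarity matrix with values in [0, 1] and a hand-picked threshold; the framework described in the abstract integrates many more data sources, and the function name, matrix and threshold here are hypothetical.

```python
import numpy as np


def screen_negative_targets(similarity, known_targets, threshold):
    """Return indices of proteins whose similarity to every known target is below threshold.

    similarity: square array of protein-protein similarities in [0, 1], diagonal = 1.
    known_targets: indices of proteins known (or predicted) to bind the compound.
    threshold: similarity cut-off; proteins below it for all targets are kept.
    """
    sim = np.asarray(similarity, dtype=float)
    # Highest similarity of each protein to any known target.
    max_sim = sim[:, known_targets].max(axis=1)
    # Known targets have max_sim = 1, so they are excluded whenever threshold <= 1.
    return np.flatnonzero(max_sim < threshold)


# Hypothetical similarity matrix for 4 proteins.
S = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.4],
    [0.2, 0.1, 0.4, 1.0],
])
negatives = screen_negative_targets(S, known_targets=[0], threshold=0.3)
```

Adding a second known target shrinks the candidate negative set, since a protein must now be dissimilar to both targets to qualify.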

Availability: Supplementary files and a preliminary Web server of this work are available at: http://admis.fudan.edu.cn/negative-cpi/

TP073 (PT) - An optimized chemical genomics pipeline for genome-wide discovery of new molecular probes from large compound collections
Theme: Disease
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: Chad Myers, University of Minnesota, United States

Scott Simpkins, University of Minnesota, Biomedical Informatics and Computational Biology
Justin Nelson, University of Minnesota, Biomedical Informatics and Computational Biology
Jeff Piotrowski, University of Wisconsin-Madison, Great Lakes Bioenergy Research Center
Raamesh Deshpande, University of Minnesota, Computer Science and Engineering
Sheena Li, RIKEN, Chemical Genomics Research Group
Jacqueline Barber, RIKEN, Chemical Genomics Research Group
Hamid Safizadeh, University of Minnesota, Electrical and Computer Engineering
Reika Okamoto, RIKEN, Chemical Genomics Research Group
Mami Yoshimura, RIKEN, Chemical Genomics Research Group
Tamio Saito, RIKEN, Chemical Genomics Research Group
Hiroyuki Osada, RIKEN, Chemical Genomics Research Group
Minoru Yoshida, RIKEN, Chemical Genomics Research Group
Charles Boone, University of Toronto, Donnelly Centre for Cellular and Biomolecular Research
Chad Myers, University of Minnesota, Computer Science and Engineering

Session Chair: Paul Horton

Presentation Overview:
As an alternative to “target-centric” approaches to drug discovery, we have developed an ultra-high-throughput yeast chemical genomics assay that allows the prediction of a compound’s gene- and process-level targets across the entire genome. This methodology provides a novel and informative way to screen compounds for specific bioactivities. We applied it to screen more than 13,000 compounds of diverse origins (synthetic compounds, natural products and their derivatives, and clinically relevant compounds). We obtain high-confidence process-level predictions for over 10% of the screened compounds. At the current level of throughput, we can screen more than 10,000 compounds and generate genome-wide target predictions within a few months, demonstrating that we have developed an efficient, high-throughput method to assess genome-wide bioactivities.

TP074 (PT) - Histone variants delineate the transcription orientation at enhancers
Theme: Genes
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Liffey A

Presenting author: Kyoung-Jae Won, University of Pennsylvania, United States

Kyoung-Jae Won, University of Pennsylvania, Genetics
Inchan Choi, University of Pennsylvania, Genetics
Benjamin Garcia, University of Pennsylvania, Biochemistry and Biophysics

Session Chair: Uwe Ohler

Presentation Overview:
Genome-wide localization analyses using chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) against four histone variants (H3.1, H3.3, H2A.Z and macroH2A) identified various combinations of histone variants (histone variant codes). H2A.Z was highly enriched at promoters, whereas H3.3 and H3.1 were observed in the body and the 3’UTR of active genes. While the majority of distal regulatory regions were enriched for H3.3 and/or H2A.Z, we newly identified a group of regulatory regions enriched in H3.1 and in macroH2A, the histone variant associated with repressive marks, indicating that histone variants are deposited at regulatory regions to assist gene regulation. Systematic analysis identified both symmetric and asymmetric patterns of histone variant (H3.3 and H2A.Z) occupancy at intergenic regulatory regions. Strikingly, these directional patterns were associated with RNA Polymerase II (PolII). These asymmetric patterns correlated with enhancer activities measured by global run-on sequencing (GRO-seq) data. We also showed that skewed histone variant patterns at enhancers facilitate enhancer activity. Our study indicates that H2A.Z and H3.3 delineate the orientation of transcription at enhancers, as observed at promoters.

TP075 (PT) - Genome-wide ceRNA networks
Theme: Systems
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Liffey B

Presenting author: Mario Flores, University of Texas at San Antonio, United States

Session Chair: Russell Schwartz

Presentation Overview:
Post-transcriptional regulation of gene expression can be modeled as a competitive endogenous RNA (ceRNA) network in which mRNAs compete for binding by microRNAs (miRs). Previous research shows that this competition maintains and fine-tunes the levels of protein-coding genes, and that disruption of the network contributes to phenotypic conditions such as cancer. Based on our previous studies, we provided a tool (TraceRNA) for reconstructing ceRNA networks around a gene of interest (GoI). The approach used in TraceRNA, although practical and useful for gene-based studies, provides only a partial landscape of ceRNA mechanisms and phenotypes. Moreover, TraceRNA offers an ad hoc approach to the study of the ceRNA phenomenon. In this work we present a formal genome-wide approach to the study of ceRNA networks. This novel, formal treatment of the ceRNA phenomenon provides new perspectives on ceRNA networks and their specific phenotypes. We divide the study of genome-wide ceRNA networks into three main parts: network construction, analysis of network components by network perturbation, and network stability.

TP076 (PT) - ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes
Theme: Genes / Other
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: Liffey Hall 2

Presenting author: Siavash Mirarab, University of Texas at Austin, United States

Tandy Warnow, The University of Illinois at Urbana-Champaign, United States

Session Chair: Knut Reinert

Presentation Overview:
Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL (ECCB 2014), which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined (Mirarab et al., 2014a). ASTRAL heuristically solves an NP-hard problem in polynomial time by constraining the search space to a set of allowed “bipartitions”. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent.

Results: We present a new version of ASTRAL, which we call ASTRAL-II. We will show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes), and has substantially better accuracy under some conditions. ASTRAL’s running time is $O(n^2k|X|^2)$, and ASTRAL-II’s running time is $O(nk|X|^2)$, where n is the number of species, k is the number of loci, and X is the set of allowed bipartitions for the search space.

Availability: ASTRAL-II is available in open source at https://github.com/smirarab/ASTRAL.
Contact: smirarab@gmail.com

TP077 (PT) - Using collective expert judgements to evaluate quality measures of mass spectrometry images
Theme: Data
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2A

Presenting author: Andrew Palmer, EMBL, Germany

Ekaterina Ovchinnikova, EMBL, Germany
Mikael Thune, Denator, Sweden
Regis Lavigne, Inserm U1085, France
Blandine Guevel, Inserm U1085, France
Andrey Dyatlov, Uni Bremen, Germany
Olga Vitek, Northeastern University, United States
Charles Pineau, Inserm U1085, France
Mats Boren, Denator, Sweden
Theodore Alexandrov, EMBL, Germany

Session Chair: Robert F. Murphy

Presentation Overview:
Motivation: Imaging Mass Spectrometry (IMS) is a maturing technique of molecular imaging. Confidence in the reproducible quality of IMS data is essential for its integration into routine use. However, the predominant method for assessing quality is visual examination, a time-consuming, unstandardised and non-scalable approach. So far, the problem of assessing quality has only been marginally addressed, and existing measures do not account for the spatial information of IMS data. Importantly, no approach exists for the unbiased evaluation of potential quality measures.

Results: We propose a novel approach for evaluating potential quality measures by creating a gold-standard set using collective expert judgements, upon which we evaluated image-based measures. To produce a gold standard, we engaged 80 IMS experts, each to rate the relative quality between 52 pairs of ion images from MALDI-TOF IMS datasets of rat brain coronal sections. Experts’ optional feedback on their expertise, the task and the survey showed that (i) they had diverse backgrounds and sufficient expertise, (ii) the task was properly understood, and (iii) the survey was comprehensible. A moderate inter-rater agreement was achieved, with a Krippendorff’s alpha of 0.5. A gold-standard set of 634 pairs of images with accompanying ratings was constructed and showed a high agreement of 0.85. Eight families of potential measures with a range of parameters and statistical descriptors, giving 143 measures in total, were evaluated. Both signal-to-noise and spatial-chaos-based measures performed well, with correlations of 0.7 to 0.9 with the gold-standard ratings. Moreover, we showed that a composite measure with linear coefficients (trained on the gold standard with regularised least squares optimisation and lasso) showed a strong linear correlation of 0.94 and an accuracy of 0.98 in predicting which image in a pair was of higher quality.
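
For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of Krippendorff's alpha for nominal labels, computed from the coincidence matrix o_ck as alpha = 1 - (n - 1) * sum over c != k of o_ck / sum over c != k of n_c * n_k, where n_c are the marginal totals and n is the number of pairable ratings. The nominal metric is an assumption made for brevity (the abstract does not state which metric was used for the pairwise ratings), and the labels and data in the example are made up.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that different raters
    gave to one item (missing ratings are simply left out).
    """
    # Coincidence matrix: every ordered pair of ratings within a unit adds 1/(m - 1).
    coincidence = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit rated only once cannot be paired
        for a, b in permutations(values, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n.
    totals = Counter()
    for (a, _), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return 1.0  # alpha is undefined with a single label; this sketch returns 1.0
    return 1.0 - (n - 1) * observed / expected


# Hypothetical ratings: "L" = left image better, "R" = right image better.
ratings = [["L", "R"], ["L", "L"], ["R", "R"]]
alpha = krippendorff_alpha_nominal(ratings)  # 4/9, about 0.444
```

Alpha is 1 for perfect agreement and near 0 when disagreement matches what chance pairing of the observed labels would produce, which is why a value of 0.5 is read as moderate.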

Availability: The anonymised data collected from the survey and the Matlab source code for data processing can be found at: https://github.com/alexandrovteam/IMS_quality.

TP078 (PT) - Phenome-driven Disease Genetics Prediction Towards Drug Discovery
Theme: Disease
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: The Auditorium

Presenting author: Rong Xu, Case Western Reserve University, United States

Li Li, Case Western Reserve University, United States
Guo-Qiang Zhang, Case Western Reserve University, United States
Yang Chen, Case Western Reserve University, United States

Session Chair: Paul Horton

Presentation Overview:
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies use only one source of human disease phenotype data. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data.

Methods: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed a Disease Manifestation Network (DMN). We combined DMN with mimMiner, a widely used phenotype database in disease gene prediction studies. We developed a network analysis approach to predict disease-gene associations from the integrated disease phenotype networks and a gene network.

Results: Our approach achieved significantly improved performance over a baseline method that used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analyses, our approach achieved areas under the curve (AUCs) of 90.7% and 90.3%, which are significantly higher than 84.2% (p

TP079 (PT) - Integrated reporters reveal distinct pathways of gene silencing in Drosophila
Theme: Genes
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: The Liffey A

Presenting author: Guillaume Filion, Center for Genomic Regulation, Spain

Session Chair: Uwe Ohler

Presentation Overview:
Recent genome-wide mapping studies in eukaryotes have shown that most transcriptionally silent domains lack repressive histone marks and repressors of transcription, raising the question of what makes genes in these regions silent. Here we set out to answer this question by assaying position effects genome-wide for several transcription reporters. To this end, we used a shotgun approach called TRIP (Thousands of Reporters Integrated in Parallel) to insert identical reporter genes at different loci of the Drosophila genome and measure their expression. We obtained expression data for more than 85,000 integrated reporters under eight different promoters, constituting the largest dataset of position effects available to date. We identified 10-100 kb domains of either high or low reporter activity. These domains are similar for different reporter constructs, showing that they correspond to the underlying organization of the genome. Although the domains are similar between constructs, the degree of response to the context varies from promoter to promoter, yet the constructs are equally permeable to the neighboring chromatin. We identified novel protein signatures associated with the repression of reporter genes. One of them consists of chromatin proteins associated with transcriptionally active regions, with a deficit of DMAP1, which suggests that this protein is critical for the expression of reporters. Overall, our results reveal that the effect of the chromatin context on transcription results from multiple processes at work simultaneously.

TP080 (PT) - Alternatively spliced homologous exons have ancient origins and are highly expressed at the protein level
Theme: Proteins
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: The Liffey B

Presenting author: Michael Liam Tress, Spanish National Cancer Research Centre (CNIO), Spain

Michael Tress, Spanish National Cancer Research Centre (CNIO), Structural Biology and Computational Programme
Federico Abascal, Spanish National Cancer Research Centre (CNIO), Structural Biology and Computational Programme
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Structural Biology and Computational Programme
Juan Rodriguz, Spanish National Cancer Research Centre (CNIO), Structural Biology and Computational Programme
Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), Spanish National Bioinformatics Unit (INB)
Iakes Ezkurdia, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Unidad de Proteómica
Jesus Vazquez, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Laboratorio de Proteómica Cardiovascular
Angela del Pozo, Hospital Universitario La Paz, Laboratorio de Proteómica Cardiovascular

Session Chair: Russell Schwartz

Presentation Overview:
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Although large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, the results have been contradictory.

Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing detectable by high-resolution mass spectrometry. While we identified peptides for almost 64% of human protein-coding genes, we detected just 282 splice events. We demonstrate that this is fewer splice events than would be expected, and show that most genes have a single dominant isoform at the protein level.
The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from their frequency in the genome. These homologous exon substitution events were remarkably conserved (all the homologous exons we identified evolved more than 460 million years ago), and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing is a strong indication that isoforms generated from homologous exons may have important cellular roles.

TP081 (PT) - Pooled Assembly of Metagenomic Data: Chimeric Contigs Enable Better Annotation and Discovery of New Marine Bacteria
Theme: Genes / Other
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: Liffey Hall 2

Presenting author: Dietlind Gerloff, Foundation for Applied Molecular Evolution, United States

Session Chair: Knut Reinert

Presentation Overview:
A metagenomic sample can contain billions of cells and thousands of different genomes. The set of sequencing reads derived from it will be sparse by comparison and will underrepresent this complexity by orders of magnitude. Additionally, metagenome annotation is confounded by short reads that capture only small fragments of genes, and by the small fraction of known microbes represented in sequence databases, often described as "the culturable 1%". These difficulties, which include distinguishing known from novel species, often affect the majority of reads in a data set. In our paper, we demonstrate quantitatively how careful assembly of marine metagenomic pyrosequencing reads within, but also across, datasets can alleviate annotation problems. Our results outline exciting prospects for data sharing in the metagenomics community. In follow-on work, we have developed a new "geographic profiling" approach that allows us to use chimeric contigs obtained through pooled assembly for (low-cost) discovery of new species in old data.

TP082 (PT) - Interactive and exploratory visual analytics of epigenome-wide data
Theme: Data
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2A

Presenting author: Hector Corrada Bravo, University of Maryland, United States

Florin Chelaru, University of Maryland, Computer Science

Session Chair: Robert F. Murphy

Presentation Overview:
Data visualization is an integral aspect of the analysis of epigenomic experimental results. Commonly, the data shown in visualization tools are the output of analyses performed in computing environments like _Bioconductor_. These two essential aspects of data analysis, algorithmic/statistical analysis and visualization, are usually distinct and disjoint, but are most effective when used iteratively. We will introduce epigenomics data visualization tools that provide tight-knit integration with computational and statistical modeling and data analysis: _Epiviz_ (_http://epiviz.cbcb.umd.edu_), a web-based genome browser application, and the _Epivizr_ Bioconductor package, which provides interactive integration with _R/Bioconductor_ sessions. This combination of technologies permits interactive visualization within a state-of-the-art functional genomics analysis platform. The web-based design of our tools facilitates the reproducible dissemination of interactive data analyses in a user-friendly platform.

TP083 (PT) - Interactome based drug discovery
Theme: Disease
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: The Auditorium

Presenting author: Gaurav Chopra, University of California, San Francisco, United States

Gaurav Chopra, UCSF & SUNY-Buffalo, Diabetes Center (UCSF) & Biomedical Informatics (SUNY)
Ram Samudrala, SUNY-Buffalo, Biomedical Informatics

Session Chair: Paul Horton

Presentation Overview:
We have developed a Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando/), funded by a 2010 NIH Director's Pioneer Award, that analyzes compound-proteome interaction signatures to determine drug behavior, in contrast to traditional single- (or few-) target approaches. Our platform implements a modeling pipeline that generates an interaction matrix between 3,733 approved human drugs and 48,278 proteins using a hierarchical chem- and bio-informatic fragment-based docking with dynamics protocol (~1 billion predicted interactions evaluated, considering multiple binding sites per protein). The platform then treats similar interaction signatures across all proteins as indicative of similar functional behavior, and dissimilar signatures as indicative of off- and anti-target (side) effects, in effect inferring homology of compound/drug behavior at a proteomic level. The benchmarking accuracy of this approach in ranking compounds for over 650 indications/diseases is ~36%, in contrast to accuracies of ~0.2% obtained when using scrambled control matrices. We prospectively validated “high value” predictions in in vitro and in vivo preclinical studies for more than a dozen indications, including type 1 diabetes, herpes, dental caries, dengue, tuberculosis, malaria, hepatitis B, and different cancers. Our drug prediction accuracy is ~35% across the nine indications, where 57/162 compounds validated thus far show comparable or better activity than an existing drug, or micromolar inhibition at the cellular level, and serve as novel repurposable therapies. Our approach is broadly applicable beyond repurposing, enables personalized and precision medicine, and foreshadows a new era of faster, safer, and cheaper drug discovery.
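
The core idea of comparing drugs by their proteome-wide interaction signatures, rather than by a single target, can be sketched as ranking the rows of a drug-by-protein interaction matrix by their distance to a query drug's row. This minimal sketch uses root-mean-square deviation (RMSD) between signatures as the distance; that metric, the function name and the toy scores are illustrative assumptions, not a description of the CANDO implementation.

```python
import numpy as np


def rank_by_signature(interactions, query):
    """Rank drugs by how closely their interaction signature matches a query drug's.

    interactions: 2-D array, rows = drugs, columns = proteins (predicted scores).
    query: row index of the query drug.
    Returns the other drug indices, most similar (lowest RMSD) first.
    """
    matrix = np.asarray(interactions, dtype=float)
    diffs = matrix - matrix[query]
    rmsd = np.sqrt((diffs ** 2).mean(axis=1))
    order = np.argsort(rmsd, kind="stable")
    return [int(i) for i in order if i != query]


# Hypothetical interaction scores for 4 drugs against 3 proteins.
scores = np.array([
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.5],
    [0.1, 0.9, 0.0],
    [0.9, 0.1, 0.4],
])
ranked = rank_by_signature(scores, query=0)  # [3, 1, 2]
```

Because every protein contributes to the distance, two drugs can rank as similar even if they share no single high-affinity target, which is the behavior-level comparison the platform description emphasizes.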

TP084 (PT) - Predicting the human epigenome from DNA motifs
Theme: Genes
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: The Liffey A

Presenting author: John Whitaker, Janssen Pharmaceutical Companies of Johnson & Johnson, United States

Wei Wang, UCSD, Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine
Zhou Chen, UCSD, Department of Chemistry and Biochemistry
Kai Zhang, UCSD, Department of Chemistry and Biochemistry

Session Chair: Uwe Ohler

Presentation Overview:
The epigenome is established and maintained by the site-specific recruitment of chromatin-modifying enzymes and their cofactors. Identifying the cis elements that regulate epigenomic modification is critical for understanding the regulatory mechanisms that control gene expression patterns. We present Epigram, an analysis pipeline that predicts histone modification and DNA methylation patterns from DNA motifs. The identified cis elements represent interactions with the site-specific DNA-binding factors that establish and maintain epigenomic modifications. We cataloged the cis elements in embryonic stem cells and four derived lineages and found numerous motifs with location preferences, such as at the center of H3K27ac or at the edges of H3K4me3 and H3K9me3, which provides mechanistic insight into how the epigenome is shaped.

TP085 (PT) - Modeling Ribosome Profiling Data with Bayesian Hidden Markov Models
Theme: Systems
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: The Liffey B

Presenting author: Brandon Malone, Max Planck Institute for Biology of Ageing, Germany

Brandon Malone, Max Planck Institute for Biology of Ageing, Computational RNA Biology and Ageing
Florian Aeschimann, Friedrich Miescher Institute for Biomedical Research, Epigenetics
Jieyi Xiong, Max Planck Institute for Biology of Ageing, Computational RNA Biology and Ageing
Helge Grosshans, Friedrich Miescher Institute for Biomedical Research, Epigenetics
Christoph Dieterich, Max Planck Institute for Biology of Ageing, Computational RNA Biology and Ageing

Session Chair: Russell Schwartz

Presentation Overview:
Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing the occupancy of ribosomes on messenger RNA (mRNA) at base-pair resolution. Because the ribosome translates mRNA into protein, information about its occupancy offers a detailed view of ribosome density and position, which could be used to discover new upstream open reading frames, alternative start codons and new isoforms. These data also allow the study of translational dynamics, such as decoding speed and ribosome pausing. Despite the wealth of information offered by ribo-seq, current analysis techniques have focused on coarse, gene-level statistics. In this work, we propose a hidden Markov model (HMM) approach to predict ribosome occupancy and translation at base-pair resolution. We use state-of-the-art learning algorithms to fit the parameters of our model, which correspond to biologically meaningful quantities such as expected ribosome occupancy. We further extend the model with Bayesian hyperparameters to quantify the uncertainty of the learned parameters. Preliminary evaluation shows that, compared with using the raw profile, the HMM achieves a much higher true positive rate, and a higher overall AUC, in identifying proteomics-verified coding regions.
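The abstract does not describe the model's states or emission distributions. Purely as an illustration of base-pair-level HMM decoding, the sketch below runs the Viterbi algorithm on a hypothetical two-state model (untranslated vs. translated) over a binarized per-nucleotide read signal. The states, the start, transition and emission probabilities, and the 0/1 encoding are all assumptions; this is not the authors' Bayesian HMM.

```python
import math

# Hypothetical two-state model (an assumption, not the authors' model):
# state 0 = untranslated, state 1 = translated.
STATES = (0, 1)
START = {0: 0.9, 1: 0.1}
TRANS = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.05, 1: 0.95}}
# Assumed probability that a nucleotide position carries at least one footprint read.
P_READ = {0: 0.1, 1: 0.7}


def log_emission(state, has_read):
    """Log-probability of observing a read (has_read = 1) or no read (0)."""
    p = P_READ[state]
    return math.log(p if has_read else 1.0 - p)


def viterbi(observations):
    """Most likely hidden state path for a list of 0/1 read indicators."""
    scores = {s: math.log(START[s]) + log_emission(s, observations[0]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS[prev][s]) + log_emission(s, obs)
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


reads = [0] * 5 + [1] * 6 + [0] * 5
print(viterbi(reads))
```

With these assumed parameters, the six consecutive positions carrying reads are decoded as translated. The high self-transition probability smooths away isolated reads or gaps, which is one reason HMMs are attractive for noisy per-position profiles.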

TP086 (PT) - Deciphering cocktail party of biological, technical and artefactual signals in tumoural transcriptomes
Theme: Genes / Other
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: Liffey Hall 2

Presenting author: Emmanuel Barillot, Institut Curie, France

Anne Biton, University of California, Department of Medicine
Emmanuel Barillot, Institut Curie, Bioinformatics and Systems Biology of Cancer
Francois Radvanyi, Institut Curie, Molecular Oncology

Session Chair: Knut Reinert

Presentation Overview:
Large-scale projects are generating massive numbers of molecular profiles of tumour samples. It remains a challenge to unravel their complexity into the action of relatively few independent signals. This ambitious task can be approached with blind source separation methods such as Independent Component Analysis (ICA). We analysed data on nine different cancers from 21 patient cohorts and 6671 tumours, and identified their commonalities as well as cancer type-specific characteristics. Through careful interpretation of the ICA results, we distinguished the signals coming from tumour cells from those coming from the tumour microenvironment, and clearly identified signals associated with technology and with biases related to different treatments of tumour tissue. We showed that the information captured in independent components is also reflected in anatomopathological staining microscopy images. Analysis of one of the bladder cancer-specific ICA components led to a new hypothesis on the role of the PPARG gene, which was experimentally verified.

TP087 (PT) - A generic methodological framework for studying single cell motility in high-throughput time-lapse data
Theme: Data
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: Wicklow Hall 2A

Presenting author: Alice Schoenauer Sebag, Mines ParisTech - INSERM - Agro Paristech, France

Céline Raulet-Tomkiewicz, INSERM - Paris V, France
Robert Barouki, INSERM - Paris V, France
Jean-Philippe Vert, Mines ParisTech - Institut Curie, France
Thomas Walter, Institut Curie, France

Session Chair: Robert F. Murphy

Presentation Overview:
Motivation: Motility is a fundamental cellular attribute that plays a major part in processes ranging from embryonic development to metastasis. Single cell motility is traditionally studied by live cell imaging, but such studies have so far been limited to low throughput. To study cell motility systematically at a large scale, we need robust methods to quantify cell trajectories in live cell imaging data.

Results: The primary contribution of this paper is to present MotIW, a generic workflow for the study of single cell motility in High-Throughput (HT) time-lapse screening data. It is composed of cell tracking, cell trajectory mapping to an original feature space, and hit detection according to a new statistical procedure. We show that this workflow is scalable and demonstrate its power by application to simulated data, as well as large-scale live cell imaging data. This application enables the identification of an ontology of cell motility patterns in a fully unsupervised manner.

Availability: Python code and examples available at http://cbio.ensmp.fr/~aschoenauer/motiw.html
Contact: thomas.walter@mines-paristech.fr

TP088 (PT) - A Study Of Common Disease Using The Human Phenotype Ontology
Theme: Data
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: Liffey Hall 2

Presenting author: Tudor Groza, Garvan Institute of Medical Research, Australia

Tudor Groza, Garvan Institute of Medical Research, Kinghorn Center for Clinical Genomics
Sebastian Köhler, Charité-Universitätsmedizin Berlin, Institute for Medical and Human Genetics
Dawid Moldenhauer, Charité-Universitätsmedizin Berlin, Institute for Medical and Human Genetics
Nicole Vasilevsky, Oregon Health & Science University, Library Department
Gareth Baynam, King Edward Memorial Hospital, Genetic Services of Western Australia
Lynn Schriml, University of Maryland School of Medicine, Department of Epidemiology and Public Health
Warren Kibbe, National Cancer Institute, Center for Biomedical Informatics and Information Technology
Tim Beck, University of Leicester, Department of Genetics
Anthony Brookes, University of Leicester, Department of Genetics
Andreas Zankl, The Children's Hospital at Westmead, Department of Medical Genetics
Nicole Washington, Lawrence Berkeley National Laboratory, Berkeley Bioinformatics Open-source Projects
Christopher Mungall, Lawrence Berkeley National Laboratory, Berkeley Bioinformatics Open-source Projects
Suzanna Lewis, Lawrence Berkeley National Laboratory, Berkeley Bioinformatics Open-source Projects
Melissa Haendel, Oregon Health & Science University, Department of Medical Informatics & Clinical Epidemiology
Peter Robinson, Charité-Universitätsmedizin Berlin, Institute for Medical and Human Genetics

Session Chair: Ioannis Xenarios

Presentation Overview:
Deep phenotyping, the precise and comprehensive analysis of individual phenotypic abnormalities for the purpose of translational research, diagnostics, or personalized care, depends on computational resources to capture the phenotype of patients or diseases and integrate it with other relevant information such as genomic variation. The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence variation data, and translational research, but a comparable resource has not been available for common disease. This presentation introduces disease models for 3,145 common human diseases comprising a total of 132,006 annotations to terms of the HPO, which enabled us to build a common disease phenotypic network, as well as to study the phenotypic and genetic overlap across common diseases.

TP089 (PT) - Protein Structures in the PDB Show the Temperature Dependence of Hydrophobicity
Theme: Proteins
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: The Liffey B

Presenting author: Sanne Abeln, VU University, Netherlands

Session Chair: Francisco Melo Ledermann

Presentation Overview:
The hydrophobic effect is the main driving force in protein folding. One can estimate the relative strength of this hydrophobic effect for each amino acid by mining a large set of experimentally determined protein structures. However, the hydrophobic force is known to be strongly temperature dependent. This temperature dependence is thought to explain the denaturation of proteins at low temperatures (cold denaturation). Here we investigate whether it is possible to extract this temperature dependence directly from a large set of protein structures determined at different temperatures.
Using NMR structures filtered for sequence identity, we extracted hydrophobicity propensities for all amino acids in five temperature ranges (spanning 265-340 K). These propensities show that hydrophobicity becomes weaker at lower temperatures, in line with current theory. Put differently, the temperature dependence of the hydrophobic effect has a measurable influence on protein structures. Moreover, this work provides a method for probing the temperature dependence of each amino acid type individually, which is difficult to obtain by direct experiment.

TP090 (PT) - Pancancer analysis of DNA methylation-driven genes using MethylMix
Theme: Disease
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: The Liffey A

Presenting author: Olivier Gevaert, Stanford University, United States

Session Chair: Louxin Zhang

Presentation Overview:
Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet few algorithms exploit the vast amount of available DNA methylation data to identify hypo- and hypermethylated genes in cancer. We developed MethylMix, a novel computational algorithm that identifies differentially methylated genes that are also predictive of transcription. We apply MethylMix to twelve individual cancer sites and, additionally, to all cancer sites combined in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis of the combined cancer sites reveals ten pancancer clusters reflecting new similarities across malignantly transformed tissues.

TP091 (PT) - Exploiting Ontology Graph for Predicting Sparsely Annotated Gene Function
Theme: Data
Date: Tuesday, July 14, 10:30 am - 10:50 am
Room: Liffey Hall 2

Presenting author: Sheng Wang, University of Illinois at Urbana-Champaign, United States

Hyunghoon Cho, Massachusetts Institute of Technology, United States
Chengxiang Zhai, University of Illinois at Urbana, United States
Bonnie Berger, Massachusetts Institute of Technology, United States
Jian Peng, University of Illinois at Urbana-Champaign, United States

Session Chair: Ioannis Xenarios

Presentation Overview:
Motivation: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool for refining and enhancing existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge: any prediction algorithm that considers each label independently faces a paucity of information and is thus prone to capturing non-generalizable patterns in the data, resulting in poor predictive performance. A variety of algorithms exist for function prediction, but none properly addresses this “overfitting” issue for sparsely annotated functions, or does so in a manner that scales to the tens of thousands of functions in the human catalog.

Results: We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions.

Availability: https://github.com/wangshenguiuc/clusDCA
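The abstract does not describe clusDCA's internals. Its name points to diffusion component analysis (DCA), which, to our understanding, starts from a random-walk-with-restart (RWR) diffusion profile for each gene in the interaction network. The sketch below computes one such profile on a toy graph by power iteration. The restart probability, the graph and the function name are assumptions, and the later steps (low-dimensional embedding and transferring information between related GO labels) are not shown.

```python
def rwr(adjacency, start, restart=0.5, iters=100):
    """Random walk with restart (RWR) by power iteration.

    adjacency: {node: set of neighbours} for an undirected graph in which
    every node has at least one neighbour.
    Returns {node: steady-state probability} of finding a walker that
    jumps back to `start` with probability `restart` at each step.
    """
    nodes = list(adjacency)
    prob = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Continue the walk: spread (1 - restart) of n's mass over its neighbours.
            share = (1.0 - restart) * prob[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        # Restart: return `restart` of the total mass (which stays 1) to the start node.
        nxt[start] += restart
        prob = nxt
    return prob


# Hypothetical toy interaction network: a path a - b - c - d.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
profile = rwr(network, "a")
print({node: round(p, 3) for node, p in profile.items()})
```

Genes whose diffusion profiles are similar occupy similar network neighbourhoods, which is the intuition behind using such profiles, rather than direct neighbours alone, as features for function prediction.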

TP092 (PT) - Protein Structure Novelty has Regressed 20 Years
Theme: Proteins
Date: Tuesday, July 14, 10:30 am - 10:50 am
Room: The Liffey B

Presenting author: John-Marc Chandonia, Lawrence Berkeley National Laboratory, United States

John-Marc Chandonia, Berkeley National Lab, Physical Biosciences Division
Steven Brenner, University of California, Berkeley, Department of Plant and Microbial Biology

Session Chair: Francisco Melo Ledermann

Presentation Overview:
The number of new protein structures deposited in the PDB every month has steadily increased and now exceeds 750. On average, fewer than 15 of these (roughly 2%) represent the first solved structure from a Pfam protein family. Fifteen families per month is the lowest rate at which families have been structurally characterized in nearly 20 years, despite vastly more efficient technology. Today, fewer than half as many families are newly structurally characterized each month as during the heyday of Structural Genomics, between 2003 and 2007. Because the rate of sequencing has outpaced the rate of structural characterization of families, the fraction of large protein families with a known structure peaked 7 years ago and is now 10% lower than at its peak. This makes curation of protein structure classification databases easier, but makes interpretation of sequence variation more challenging than it would otherwise be.

TP093 (PT) - MEMCover: Integrated Analysis of Mutual Exclusivity and Functional Network Reveals Dysregulated Pathways Across Multiple Cancer Types
Theme: Disease
Date: Tuesday, July 14, 10:30 am - 10:50 am
Room: The Liffey A

Presenting author: Yoo-Ah Kim, NCBI/NLM/NIH, United States

Dongyeon Cho, NCBI/NLM/NIH, United States
Phuong Dao, NCBI/NLM/NIH, United States
Teresa Przytycka, NCBI/NLM/NIH, United States

Session Chair: Louxin Zhang

Presentation Overview:
The data gathered by the Pan-Cancer initiative have created an unprecedented opportunity for illuminating common features across different cancer types. However, separating tissue-specific features from cross-cancer signatures has proven challenging. One often-observed property of the mutational landscape of cancer is the mutual exclusivity of cancer-driving mutations. Although studies of individual cancer types suggested that mutually exclusive pairs often share the same functional pathway, the relationship between cross-cancer mutual exclusivity and functional connectivity has not been investigated before. Here we classify mutual exclusivity into three basic classes: within-tissue-type exclusivity, across-tissue-type exclusivity, and between-tissue-type exclusivity. We then combine across-cancer mutual exclusivity with interaction data to uncover pan-cancer dysregulated pathways. Our new method, Mutual Exclusivity Module Cover (MEMCover), identified not only previously known pan-cancer dysregulated subnetworks but also novel subnetworks whose cross-cancer role had not been well appreciated. In addition, we demonstrate the existence of mutual exclusivity hubs, putatively corresponding to cancer drivers with strong growth advantages. Finally, we show that while mutually exclusive pairs within or across cancer types predominantly interact functionally, pairs in the between-tissue-type exclusivity class are more often disconnected in functional networks.
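To make "mutual exclusivity" concrete for a single gene pair, the sketch below counts the samples covered by either gene and scores exclusivity as the fraction of covered samples mutated in exactly one of the two. This toy score, the data layout (a set of mutated sample IDs per gene) and the example data are assumptions for illustration. MEMCover's actual statistics, its three tissue-type classes and its network integration are not reproduced here.

```python
def exclusivity(mutated_a, mutated_b):
    """Toy mutual-exclusivity score for two genes (an illustrative assumption).

    mutated_a, mutated_b: sets of sample IDs carrying a mutation in each gene.
    Returns (coverage, score): coverage is the number of samples mutated in
    at least one gene; score is the fraction of those samples mutated in
    exactly one gene (1.0 = perfectly exclusive).
    """
    covered = mutated_a | mutated_b
    if not covered:
        return 0, 0.0
    exactly_one = mutated_a ^ mutated_b  # symmetric difference
    return len(covered), len(exactly_one) / len(covered)


# Hypothetical example: two genes hit in largely disjoint sets of tumours.
gene_a = {"s1", "s2", "s3", "s4"}
gene_b = {"s4", "s5", "s6"}
print(exclusivity(gene_a, gene_b))
```

A pair with high coverage and a score near 1.0 is the pattern expected when two alterations hit the same pathway, so that one is usually sufficient in a given tumour.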

TP094 (PT) - SwissTargetPrediction: a web server for target prediction of bioactive small molecules
Theme: Data
Date: Tuesday, July 14, 10:50 am - 11:10 am
Room: Liffey Hall 2

Presenting author: David Gfeller, University of Lausanne, Switzerland

Aurelien Grosdidier, Swiss Institute of Bioinformatics, Molecular modelling
Matthias Wirth, Swiss Institute of Bioinformatics, Molecular modelling
Antoine Daina, Swiss Institute of Bioinformatics, Molecular modelling
Olivier Michielin, Swiss Institute of Bioinformatics, Molecular modelling
Vincent Zoete, Swiss Institute of Bioinformatics, Molecular modelling

Session Chair: Ioannis Xenarios

Presentation Overview:
Large-scale phenotypic screening initiatives increasingly allow researchers to test the functional impact of small molecules in different eukaryotic species. However, for most bioactive compounds the targets are only partially known. Here, we introduce a new computational approach to predict the targets of bioactive small molecules based on a combination of chemical similarity measures [Gfeller et al. Bioinformatics, Dec 2013]. We further investigate the use of target homology to transfer small molecule-target interactions across organisms. Interestingly, when orthology and paralogy relationships are considered separately, we find that mapping small molecule interactions among orthologs significantly improves prediction accuracy, while including paralogs lowers it. Overall, our work provides a novel approach to accurately predict the targets of small molecules by combining different kinds of chemical similarity measures and, for the first time, integrates target homology to leverage data from different species. The method is accessible at http://www.swisstargetprediction.ch.
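The abstract does not name the chemical similarity measures it combines. A widely used measure is the Tanimoto coefficient between binary fingerprints. The sketch below represents fingerprints as sets of "on" bit indices and ranks candidate targets by the highest Tanimoto similarity between the query and any known ligand of each target. The fingerprints, target names and max-similarity scoring rule are illustrative assumptions, not the SwissTargetPrediction method.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def rank_targets(query_fp, ligands_by_target):
    """Rank targets by the best similarity between the query and a known ligand."""
    scores = {
        target: max(tanimoto(query_fp, fp) for fp in ligands)
        for target, ligands in ligands_by_target.items()
        if ligands
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints (bit indices) for ligands of two made-up targets.
known = {
    "TARGET_A": [{1, 2, 3, 4}, {2, 3, 9}],
    "TARGET_B": [{10, 11, 12}],
}
query = {1, 2, 3, 5}
print(rank_targets(query, known))
```

Transferring interactions across orthologs, as the abstract describes, would amount to merging the ligand lists of orthologous targets before scoring.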

TP095 (PT) - Large-Scale Model Quality Assessment for Improving Protein Tertiary Structure Prediction
Theme: Proteins
Date: Tuesday, July 14, 10:50 am - 11:10 am
Room: The Liffey B

Presenting author: Jianlin Cheng, University of Missouri-Columbia, United States

Debswapna Bhattacharya, University of Missouri-Columbia, United States
Jilong Li, University of Missouri-Columbia, United States
Renzhi Cao, University of Missouri-Columbia, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview:
Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment methods to select the best-predicted models, but these cannot consistently select relatively better models or rank a large number of models well.

Results: Here, we develop a novel large-scale model quality assessment method in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model quality assessment methods to generate consensus model rankings, followed by model refinement based on model combination (i.e., averaging). Our experiment demonstrates that the large-scale model quality assessment approach is more consistent and robust in selecting models of better quality than any individual quality assessment method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked 3rd out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and 2nd according to the total scores of the best of the five models predicted for these domains. MULTICOM’s outstanding performance in the extremely competitive 2014 CASP11 experiment shows that our large-scale quality assessment approach, together with model clustering, is a promising solution to one of the two major problems in protein structure modeling.

Availability: The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/.
Contact: chengji@missouri.edu
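The abstract says that 14 quality assessment (QA) methods are combined into consensus rankings, but not how. One simple consensus, shown here only as an assumption, is to rank the models under each QA method and then sort them by mean rank. The QA method names, model names and scores are hypothetical, and a higher score is assumed to mean a better model.

```python
def consensus_ranking(scores_by_method):
    """Order models by their mean rank across QA methods (best first).

    scores_by_method: {method_name: {model_id: score}}, higher score = better.
    Every method is assumed to score the same set of models.
    """
    rank_sums = {}
    for scores in scores_by_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            rank_sums[model] = rank_sums.get(model, 0) + rank
    n_methods = len(scores_by_method)
    mean_ranks = {model: total / n_methods for model, total in rank_sums.items()}
    return sorted(mean_ranks, key=mean_ranks.get)


# Hypothetical scores from three QA methods for three models.
qa_scores = {
    "qa1": {"m1": 0.71, "m2": 0.64, "m3": 0.52},
    "qa2": {"m1": 0.55, "m2": 0.60, "m3": 0.40},
    "qa3": {"m1": 0.80, "m2": 0.75, "m3": 0.79},
}
print(consensus_ranking(qa_scores))
```

Averaging ranks rather than raw scores sidesteps the fact that different QA methods report scores on different scales. It also illustrates why a consensus can be more robust than any single method: one method's outlier ranking is diluted by the others.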

TP096 (PT) - Integrated exome and transcriptome sequencing reveals ZAK isoform usage in gastric cancer
Theme: Disease
Date: Tuesday, July 14, 10:50 am - 11:10 am
Room: The Liffey A

Presenting author: Jinfeng Liu, Genentech, United States

Mark McCleland, Genentech, Pathology
Eric Stawiski, Genentech, Molecular Biology
Florian Gnad, Genentech, Bioinformatics and Computational Biology
Oleg Mayba, Genentech, Bioinformatics and Computational Biology
Peter Haverty, Genentech, Bioinformatics and Computational Biology
Steffen Durinck, Genentech, Molecular Biology
Ying-Jiun Chen, Genentech, Molecular Biology
Christiaan Klijn, Genentech, Bioinformatics and Computational Biology
Suchit Jhunjhunwala, Genentech, Bioinformatics and Computational Biology
Michael Lawrence, Genentech, Bioinformatics and Computational Biology
Hanbin Liu, Genentech, Bioinformatics and Computational Biology
Yinan Wan, Genentech, Bioinformatics and Computational Biology
Vivek Chopra, Genentech, Pathology
Murat Yaylaoglu, Genentech, Pathology
Wenlin Yuan, Genentech, Molecular Biology
Connie Ha, Genentech, Molecular Biology
Houston Gilbert, Genentech, Non-clinical Biostatistics
Jens Reeder, Genentech, Bioinformatics and Computational Biology
Gregoire Pau, Genentech, Bioinformatics and Computational Biology
Jeremy Stinson, Genentech, Molecular Biology
Howard Stern, Genentech, Pathology
Gerard Manning, Genentech, Bioinformatics and Computational Biology
Thomas Wu, Genentech, Bioinformatics and Computational Biology
Richard Neve, Genentech, Discovery Oncology
Frederic de Sauvage, Genentech, Molecular Biology
Zora Modrusan, Genentech, Molecular Biology
Somasekar Seshagiri, Genentech, Molecular Biology
Ron Firestein, Genentech, Pathology
Zemin Zhang, Genentech, Bioinformatics and Computational Biology

Session Chair: Louxin Zhang

Presentation Overview:
Integrative data analysis of genomic and transcriptomic alterations has become critical to our understanding of disease drivers and to personalized cancer therapy. Here, we describe the first comprehensive characterization of paired exomes and transcriptomes of 48 primary tumors and 21 cell lines from gastric cancer, the second leading cause of cancer mortality worldwide. We found that more than half of our patient collection could potentially benefit from targeted therapies. We performed a systematic analysis of both mutation-dependent aberrant splicing and mutation-independent splicing isoforms in gastric cancer, and identified 55 splice-site mutations accompanied by aberrant splicing products and about 200 genes with differential isoform usage between tumors and normal tissues. Among genes in cancer pathways found to have altered splicing in tumors, we discovered that the long isoform of ZAK kinase was preferentially upregulated in several cancer types, and isoform-specific oncogenic properties of ZAK were subsequently confirmed by functional validation.

TP097 (PT) - Inferring Models of Multiscale Copy Number Evolution for Single-Tumor Phylogenetics
Theme: Genes / Disease
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: The Auditorium

Presenting author: Russell Schwartz, Carnegie Mellon University, United States

E. Michael Gertz, NCBI/NLM/NIH, United States
Darawalee Wangsa, NCI/NIH, United States
Thomas Ried, NCI/NIH, United States
Alejandro Schaffer, NCBI/NLM/NIH, United States
Salim Akhter Chowdhury, Carnegie Mellon University, United States

Session Chair: Niko Beerenwinkel

Presentation Overview:
Motivation: Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and individual patients. Previous work on inferring phylogenies of single tumors by copy number evolution assumed models of uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting to more realistic tumor evolution models.

Results: We propose a framework for inferring models of tumor progression from single-cell gene copy number data, including variable rates for different gain and loss events. We propose a new algorithm for identifying the most parsimonious combinations of single-gene and single-chromosome events, and we extend it via dynamic programming to include genome duplications. We implement an expectation maximization (EM)-like method to estimate mutation-specific and tumor-specific event rates concurrently with tree reconstruction. Applying our algorithms to real cervical cancer data identifies key genomic events in disease progression consistent with prior literature. Classification experiments on cervical and tongue cancer datasets lead to improved prediction accuracy for the metastasis of primary cervical cancers and for tongue cancer survival.

Availability: Our software (FISHtrees) and two datasets are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.

TP098 (PT) - Entropy-scaling search of massive biological data
Theme: Data
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Presenting author: Noah Daniels, Massachusetts Institute of Technology, United States

Y. William Yu, MIT, Mathematics
Bonnie Berger, MIT, Mathematics & CSAIL
David Danko, MIT, CSAIL

Session Chair: Ioannis Xenarios

Presentation Overview:
The continual onslaught of new omics data has forced upon scientists the fortunate problem of having too much data to analyze. Luckily, many datasets exhibit well-defined structure that can be exploited to design smarter analysis tools. We introduce an entropy-scaling data structure for similarity search, a fundamental operation in data science: given a database of low fractal dimension, it scales in both time and space with the entropy of that underlying database. Using these ideas, we present accelerated versions of standard tools for practitioners in three domains (high-throughput drug screening, metagenomics, and protein structure search), none of which lose any specificity or lose significant sensitivity: a 12x speedup of small molecule similarity search (SMSD) with less than 4% loss in sensitivity; a 673x speedup of BLASTX with less than 5% loss in sensitivity; and a 10x speedup of protein structure search (FragBag) with less than 0.2% loss in sensitivity.
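The central idea behind such a data structure can be sketched as a two-stage search. First, the database is partitioned into clusters of radius at most r_c around representative points. A coarse search then compares the query only against the representatives, and a fine search scans only the clusters whose representative lies within r + r_c of the query. For any metric distance, the triangle inequality guarantees that this still finds every point within r of the query. The greedy clustering, the one-dimensional absolute-difference metric and all function names below are simplifying assumptions, not the authors' implementation.

```python
def build_clusters(points, radius, dist):
    """Greedy clustering: each point joins the first representative within `radius`."""
    clusters = []  # list of (representative, members)
    for p in points:
        for rep, members in clusters:
            if dist(rep, p) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(query, r, clusters, cluster_radius, dist):
    """Return all points within r of query, scanning only candidate clusters."""
    hits = []
    for rep, members in clusters:
        # Coarse step: by the triangle inequality, a cluster whose representative
        # is farther than r + cluster_radius from the query cannot contain a hit.
        if dist(query, rep) <= r + cluster_radius:
            # Fine step: exhaustive check inside the candidate cluster only.
            hits.extend(p for p in members if dist(query, p) <= r)
    return sorted(hits)


def absdiff(a, b):
    """Toy metric on numbers (an assumption standing in for a sequence or structure distance)."""
    return abs(a - b)


data = [1.0, 1.2, 1.5, 5.0, 5.3, 9.0, 9.1, 9.4]
clusters = build_clusters(data, radius=0.5, dist=absdiff)
print(range_search(5.1, 0.3, clusters, cluster_radius=0.5, dist=absdiff))
```

When the data are highly redundant (low entropy), there are few clusters, so the coarse step is cheap. When the data have low fractal dimension, few clusters fall inside the r + r_c ball, so the fine step is cheap as well.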

TP099 (PT) - cNMA: A framework of encounter complex-based normal mode analysis to model conformational changes in protein interactions
Theme: Proteins
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: The Liffey B

Presenting author: Yang Shen, Texas A&M University, United States

Tomasz Oliwa, Toyota Technological Institute at Chicago, United States

Session Chair: Francisco Melo Ledermann

Presentation Overview:
Motivation: It remains both a fundamental and a practical challenge to understand and anticipate the motions and conformational changes of proteins during their associations. Conventional normal mode analysis (NMA) based on the anisotropic network model (ANM) addresses this challenge by generating normal modes that reflect the intrinsic flexibility of proteins, which follows a conformational selection model for protein-protein interactions. However, earlier studies have also found cases where conformational selection alone could not adequately explain conformational changes, and other models have been proposed. Moreover, there is a pressing need to construct a much reduced but still relevant subset of protein conformational space in order to improve computational efficiency and accuracy in protein docking, especially for difficult cases with significant conformational changes.

Method and Results: Considering both conformational selection and induced fit models, we extend ANM to include concurrent but differentiated intra- and intermolecular interactions and develop an encounter complex-based NMA (cNMA) framework. Theoretical analysis and empirical results over a large data set of significant conformational changes indicate that cNMA generates conformational vectors that approximate conformational changes, with contributions from both intrinsic flexibility and intermolecular interactions, considerably better than conventional NMA, which considers only intrinsic flexibility. The empirical results also indicate that a straightforward application of conventional NMA to an encounter complex often does not improve upon NMA of the individual protein under study, and that intra- and intermolecular interactions need to be properly differentiated. Moreover, in addition to the induced motions of the protein under study, the induced motions of its binding partner and the coupling between the two sets of protein motions present in a near-native encounter complex contribute to the improved performance. A study that isolates and assesses the sole contribution of intermolecular interactions to the improvements over conventional NMA further validates the additional benefit from induced-fit effects. Taken together, these results provide new insights into the molecular mechanisms underlying protein interactions and new tools for dimensionality reduction in flexible protein docking.

Availability: Source code is available upon request.

TP100 (PT) - Reconstruction of clonal trees and tumor composition from multi-sample sequencing data
Theme: Genes / Disease
Date: Tuesday, July 14, 12:00 pm - 12:20 pm
Room: The Auditorium

Presenting author: Mohammed El-Kebir, Brown University, United States

Layla Oesper, Brown University, United States
Hannah Acheson-Field, Brown University, United States
Ben Raphael, Brown University, United States

Session Chair: Niko Beerenwinkel

Presentation Overview:
Motivation: DNA sequencing of multiple samples from the same tumor provides data to analyze the process of clonal evolution in the population of cells that give rise to a tumor.

Results: We formalize the problem of reconstructing the clonal evolution of a tumor using single-nucleotide mutations as the Variant Allele Frequency Factorization Problem (VAFFP). We derive a combinatorial characterization of the solutions to this problem and show that the problem is NP-complete. We derive an integer linear programming solution to the VAFFP in the case of error-free data and extend this solution to real data with a probabilistic model for errors. The resulting AncesTree algorithm is better able to identify ancestral relationships between individual mutations than existing approaches, particularly in ultra-deep sequencing data when high read counts for mutations yield high confidence variant allele frequencies.
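A constraint often used when relating a clonal tree to variant allele frequencies (VAFs) is the sum condition. For heterozygous, copy-neutral mutations under a perfect phylogeny, a mutation's VAF in each sample must be at least the sum of the VAFs of its children in the tree. The sketch below only checks a candidate tree against error-free VAFs; it is not the AncesTree integer linear program and does not model sequencing error. The tree encoding (a child-to-parent dict), the mutation names and the VAF values are hypothetical.

```python
def satisfies_sum_condition(parent_of, vaf, tol=1e-9):
    """Check the sum condition for a candidate mutation tree.

    parent_of: {mutation: parent mutation, or None for the root}.
    vaf: {mutation: [VAF in sample 1, VAF in sample 2, ...]}.
    Returns True if, in every sample, each mutation's VAF is at least
    the sum of its children's VAFs (up to a small tolerance).
    """
    children = {m: [] for m in parent_of}
    for m, parent in parent_of.items():
        if parent is not None:
            children[parent].append(m)
    n_samples = len(next(iter(vaf.values())))
    for m, kids in children.items():
        for s in range(n_samples):
            if sum(vaf[k][s] for k in kids) > vaf[m][s] + tol:
                return False
    return True


# Hypothetical two-sample data: "a" is truncal, "b" and "c" are subclonal.
vafs = {"a": [0.5, 0.5], "b": [0.3, 0.1], "c": [0.15, 0.35]}
print(satisfies_sum_condition({"a": None, "b": "a", "c": "a"}, vafs))  # b and c as siblings
print(satisfies_sum_condition({"a": None, "b": "a", "c": "b"}, vafs))  # chain a -> b -> c
```

Here the branching tree is consistent with both samples, whereas the linear chain fails in the second sample, where "c" (0.35) cannot descend from "b" (0.1). Multiple samples prune the space of admissible trees in exactly this way.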

TP101 (PT) - MeSHLabeler: Improving the Accuracy of Large-scale MeSH indexing by Integrating Diverse Evidence
Theme: Data
Date: Tuesday, July 14, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Presenting author: Shanfeng Zhu, Fudan University, China

Shengwen Peng, Fudan University, China
Junqiu Wu, Central South University, China
Chengxiang Zhai, UIUC, United States
Hiroshi Mamitsuka, Kyoto University, Japan
Ke Liu, Fudan University, China

Session Chair: Ioannis Xenarios

Presentation Overview:
Motivation: Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, the Medical Text Indexer (MTI), to assist MeSH annotation; it uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as predictions by separately trained MeSH classifiers, can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple types of evidence for MeSH annotation.

Methods: We propose a novel framework, MeSHLabeler, that integrates multiple types of evidence for accurate MeSH annotation using "learning to rank". The evidence includes predictions from MeSH classifiers, KNN, pattern matching and MTI, as well as the correlations between different MeSH terms. Each MeSH classifier is trained independently, so prediction scores from different classifiers are not directly comparable. To address this issue, we have developed an effective score normalization procedure that improves prediction accuracy.

Results: MeSHLabeler won first place in Task 2A of the 2014 BioASQ challenge, achieving a Micro F-measure of 0.6248 on 9,040 citations provided by the challenge. This is about 9.15% higher, in relative terms, than the 0.5724 obtained by MTI.

Availability: The software is available upon request.
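
The abstract does not describe MeSHLabeler's normalization procedure. As a generic illustration only, one simple way to make independently trained per-heading classifiers comparable is to replace each raw score with its percentile among that classifier's scores on held-out citations; the normalized values could then serve as features for a learning-to-rank model. The functions `percentile_normalizer` and `rank_headings`, and the scores used with them, are hypothetical.

```python
from bisect import bisect_right


def percentile_normalizer(reference_scores):
    """Map raw scores of one classifier to empirical percentiles in [0, 1].

    reference_scores: that classifier's raw outputs on held-out citations.
    """
    ref = sorted(reference_scores)

    def normalize(score):
        # Fraction of reference scores less than or equal to `score`
        return bisect_right(ref, score) / len(ref)

    return normalize


def rank_headings(raw_scores, normalizers, top_k=3):
    """Rank candidate MeSH headings by percentile-normalized classifier score."""
    normalized = {h: normalizers[h](s) for h, s in raw_scores.items()}
    return sorted(normalized, key=normalized.get, reverse=True)[:top_k]
```

With reference scores of 0.1 to 0.4 for one classifier and 10 to 40 for another, a raw 0.35 from the first (percentile 0.75) outranks a raw 25 from the second (percentile 0.5), even though 25 is the larger number.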

TP102 (PT) - Sequence co-evolution gives 3D contacts and structures of protein complexes
Theme: Proteins
Date: Tuesday, July 14, 12:00 pm - 12:20 pm
Room: The Liffey B

Presenting author: Charlotta Schaerfe, University of Tübingen/Harvard Medical School, Germany

Thomas Hopf, Technische Universität München/Harvard Medical School, Department of Informatics/Systems Biology
João Rodrigues, Utrecht University, Computational Structural Biology Group
Anna Green, Harvard Medical School, Systems Biology
Oliver Kohlbacher, University of Tübingen, Department of Computer Science
Chris Sander, Memorial Sloan Kettering Cancer Center, Computational Biology Center
Alexandre Bonvin, Utrecht University, Computational Structural Biology Group
Debora Marks, Harvard Medical School, Systems Biology

Session Chair: Francisco Melo Ledermann

Presentation Overview:
The interactions of proteins with other biomolecules are essential for all biological activity, and the accurate prediction of protein-protein interaction partners and interface residues has therefore been of great interest to the scientific community. Here we present a method, EVcomplex, that predicts such information from the evolutionary sequence record alone by exploiting residue coevolution between proteins.
This method could have far-reaching implications for topics ranging from the determination of the actual binding partners and binding sites in large protein complexes to whole-genome interactome prediction. In the presentation I will show that the evolutionary record allows us to predict novel protein-protein interactions as well as alternative binding conformations without additional knowledge of the proteins’ 3D structures.
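
EVcomplex infers evolutionary couplings with a global statistical model of paired alignments. The sketch below substitutes plain mutual information (MI) between columns of two paired alignments, a simpler local statistic that, unlike a global model, does not separate direct from indirect correlations; it only illustrates how candidate inter-protein residue pairs can be ranked from sequences alone. The functions `mutual_information` and `top_interprotein_pairs` and the toy alignments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (nats) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))) with counts converted to frequencies
        mi += (c / n) * math.log(c * n / (pa[a] * pb[b]))
    return mi


def top_interprotein_pairs(msa_a, msa_b, top_k=5):
    """Rank (i, j) column pairs, i in protein A and j in protein B, by MI.

    msa_a and msa_b are lists of equal-length aligned sequences whose rows are
    paired, i.e. row k of each comes from the same organism.
    """
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*msa_b))
    scores = [
        (mutual_information(ca, cb), i, j)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    scores.sort(reverse=True)
    return [(i, j, round(s, 3)) for s, i, j in scores[:top_k]]
```

For example, with `msa_a = ["AKL", "AKV", "DKL", "DKV"]` and `msa_b = ["RE", "RQ", "EE", "EQ"]`, the two perfectly covarying column pairs (0, 0) and (2, 1) rank highest.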

TP103 (PT) - Reconstructing the Evolutionary History of Tumors
Theme: Genes / Disease
Date: Tuesday, July 14, 12:20 pm - 12:40 pm
Room: The Auditorium

Presenting author: Amit Deshwar, University of Toronto, Canada

Shankar Vembu, University of Toronto, CCBR
Christina Yung, Ontario Institute for Cancer Research, Informatics & Biocomputing Program
Gun Ho Jang, Ontario Institute for Cancer Research, Informatics & Biocomputing Program
Lincoln Stein, Ontario Institute for Cancer Research, Informatics & Biocomputing Program

Session Chair: Niko Beerenwinkel

Presentation Overview:
Tumors often contain multiple genetically diverse subpopulations. Reconstructing the genotypes of these subpopulations, by determining which somatic tumor-associated mutations each contains, is of considerable interest for understanding tumor development and treatment response. While automated reconstruction methods have advanced substantially, many fundamental questions about this problem remain unanswered. Many subclonal reconstruction methods, including ours, attempt to reconstruct the evolutionary history of the tumor as a means to assign complete genotypes to each subpopulation. I will discuss the current state of the field and our latest work on this problem. I will introduce PhyloWGS, a Bayesian method that is the first to use both CNVs and SNVs to perform phylogenetic subclonal reconstruction. PhyloWGS returns a distribution over possible subclonal reconstructions, enabling the identification of the portions of a reconstruction that are highly certain and those that are not.

TP104 (PT) - Knowledge-driven geospatial location resolution for phylogeographic models of virus migration
Theme: Data
Date: Tuesday, July 14, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Presenting author: Graciela Gonzalez, Arizona State University, United States

Tasnia Tahsin, Arizona State University, United States
Rachel Beard, Arizona State University, United States
Mari Firago, Arizona State University, United States
Robert Rivera, Arizona State University, United States
Matthew Scotch, Arizona State University, United States
Davy Weissenbacher, Arizona State University, United States

Session Chair: Ioannis Xenarios

Presentation Overview:
Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles.

In this paper, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles in order to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation including a distance heuristic, a population heuristic, and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a "metadata heuristic").

For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers that rely on geospatial metadata of GenBank sequences.
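
A toy version of how such heuristics can be combined, assuming each ambiguous place name has already been matched to gazetteer candidates: prefer candidates in the country recorded in the GenBank metadata, then fall back to the most populous candidate. The `Candidate` schema, the `resolve` function, the ordering of the two heuristics and the population figures in the example are assumptions for illustration; the distance heuristic is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One gazetteer entry for an ambiguous place name (hypothetical schema)."""
    name: str
    country: str
    population: int


def resolve(candidates, genbank_country=None):
    """Choose a gazetteer entry for an ambiguous toponym.

    Metadata heuristic: keep candidates whose country matches the country in
    the GenBank record, if any match. Population heuristic: among the rest,
    pick the most populous. The distance heuristic is not modelled here.
    """
    pool = candidates
    if genbank_country is not None:
        matching = [c for c in candidates if c.country == genbank_country]
        if matching:
            pool = matching
    return max(pool, key=lambda c: c.population)
```

For two "Paris" entries with made-up populations, one in France and one in the United States, the French entry wins by default, while a GenBank record listing the United States selects the American one.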

TP105 (PT) - Finding Optimal Interaction Interface Alignments between Biological Complexes
Theme: Proteins
Date: Tuesday, July 14, 12:20 pm - 12:40 pm
Room: The Liffey B

Presenting author: Xuefeng Cui, King Abdullah University of Science and Technology, Saudi Arabia

Hammad Naveed, King Abdullah University of Science and Technology, Saudi Arabia
Xin Gao, King Abdullah University of Science and Technology, Saudi Arabia

Session Chair: Francisco Melo Ledermann

Presentation Overview:
Motivation: Biological molecules perform their functions through
interactions with other molecules. Structure alignment of interaction
interfaces between biological complexes is an indispensable step in detecting
their structural similarities, which are key to understanding their
evolutionary histories and functions. Although various structure alignment
methods have been developed to successfully assess the similarities of protein
structures or certain types of interaction interfaces, existing alignment tools
cannot directly align arbitrary types of interfaces formed by protein, DNA or
RNA molecules. Specifically, they require a "blackbox preprocessing" to
standardize interface types and chain identifiers. Yet their performance is
limited and sometimes unsatisfactory.

Results: Here we introduce a novel method, PROSTA-inter, that
automatically determines and aligns interaction interfaces between two
arbitrary types of complex structures. Our method uses sequentially remote
fragments to search for the optimal superimposition. The optimal residue
matching problem is then formulated as a maximum weighted bipartite matching
problem to detect the optimal sequence order-independent alignment. Benchmark
evaluation on all non-redundant protein-DNA complexes in PDB shows significant
performance improvement of our method over TM-align and iAlign (with the
"blackbox preprocessing"). Two case studies where our method discovers, for
the first time, structural similarities between two pairs of functionally
related protein-DNA complexes are presented. We further demonstrate the power
of our method on detecting structural similarities between a protein-protein
complex and a protein-RNA complex, which is biologically known as a protein-RNA
mimicry case.
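
The residue-matching step can be sketched as follows, assuming the two interfaces have already been superimposed (the fragment-based superposition search is not shown). Residue pairs are weighted by a TM-score-like kernel and SciPy's `linear_sum_assignment` solves the maximum weighted bipartite matching, so the resulting alignment ignores sequence order. The function `order_independent_alignment`, the weighting kernel, `d0` and `max_dist` are illustrative choices, not PROSTA-inter's actual scoring.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def order_independent_alignment(coords_a, coords_b, d0=3.0, max_dist=5.0):
    """Match residues of two superimposed interfaces by maximum weighted bipartite matching.

    coords_a, coords_b: (n, 3) and (m, 3) arrays already in a common frame.
    Pair weights use a TM-score-like kernel 1 / (1 + (d/d0)^2); d0 and the
    max_dist cutoff are illustrative assumptions.
    Returns (i, j) pairs; sequence order plays no role in the matching.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    weight = 1.0 / (1.0 + (dist / d0) ** 2)
    weight[dist > max_dist] = 0.0  # disallow far-apart pairs
    rows, cols = linear_sum_assignment(weight, maximize=True)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if weight[i, j] > 0]
```

Because the matching sees only spatial proximity, shuffling the residue order of one interface leaves the recovered correspondence unchanged, which is the property an order-independent alignment needs.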

TP106 (PT) - Importance of rare copy number alterations for personalized tumor characterization
Theme: Disease
Date: Tuesday, July 14, 2:00 pm - 2:20 pm
Room: The Auditorium

Presenting author: Andreas Beyer, University of Cologne, Germany

Andreas Beyer, University of Cologne, CECAD
Betty Friedrich, ETH Zurich, IMSB
Michael Seifert, TU Dresden, ZIH

Session Chair: Natasa Przulj

Presentation Overview:
Copy number alterations (CNAs) of large genomic regions are frequent in many tumor types, but only a few of them are assumed to be relevant for the cancerous phenotype. It has proven exceedingly difficult to ascertain rare mutations that might have strong effects in individual patients. Here, we show that a genome-wide transcriptional regulatory network inferred from gene expression and gene copy number data of 768 human cancer cell lines can be used to quantify the impact of individual patient-specific gene CNAs on cancer-specific survival signatures. The model was highly predictive of gene expression in 4,548 clinical samples originating from 13 different tissues. Focused analysis of tumors from six tissues revealed that, in an individual patient, a combination of up to 100 gene CNAs directly or indirectly affected the expression of clinically relevant survival signature genes. Importantly, rare patient-specific mutations (< 1% in a given cohort) often had stronger effects on signature genes than frequent mutations. Subsequent integration with genomic data suggests that frequency variation among high-impact genes is mainly driven by gene location rather than gene function. Our framework contributes to the individualized quantification of cancer risk, along with determining individual key risk factors and their downstream targets.

TP107 (PT) - An Integrated Mass Spectrometry-Computational Approach for Modelling Large Protein Assemblies
Theme: Proteins
Date: Tuesday, July 14, 2:00 pm - 2:20 pm
Room: The Liffey B

Presenting author: Argyris Politis, King's College London

Session Chair: Donna Slonim

Presentation Overview:
We present an integrated mass spectrometry (MS)-computational method for modelling the structure and dynamics of large protein assemblies. The method computationally integrates orthogonal data sets derived from native MS, ion mobility MS and labelling MS experiments, which differ in resolution and information content. We assessed the method on its ability to reproduce the native structures of five benchmark complexes with varying levels of MS-derived data. We then applied it to characterize the 3D architecture of the yeast eukaryotic initiation factor eIF3 in complex with eIF5.

TP108 (PT) - Accurate phasing of allele-specific copy-numbers for inferring tumour evolution with probe-level resolution
Theme: Disease
Date: Tuesday, July 14, 2:20 pm - 2:40 pm
Room: The Auditorium

Presenting author: Roland Schwarz, European Molecular Biology Laboratory - European Bioinformatics Institute

Roland Schwarz, European Molecular Biology Laboratory, European Bioinformatics Institute

Session Chair: Natasa Przulj

Presentation Overview:
Accurate reconstruction of the evolutionary history of cancer in the patient and quantification of intra-tumour heterogeneity are current challenges in cancer genomics. The accuracy of tree inference from genomic rearrangements depends on the quality of the phasing of copy-numbers: the assignment of major and minor copy-numbers to the two physical parental alleles. So far phasing has been done using evolutionary criteria alone, a heuristic and computationally expensive procedure which impedes probe-level resolution tree reconstruction.

Here we present a novel phasing algorithm, which extends our previous work on allele-specific segmentation of copy-numbers. Using the shared genetic background of multiple samples from the same patient, we assign copy-numbers to physical alleles based on the bi-allelic frequency distribution of heterozygous SNPs. In combination with our previously established evolutionary phasing algorithm, this provides a new, accurate and fast phasing method that leverages the available SNP data effectively. This is a crucial step towards probe-level resolution tree inference on genomic rearrangement events in cancer and exact quantification of genetic heterogeneity for routine applications in translational cancer research.
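
A simplified illustration of the allele-frequency idea, assuming two samples from the same patient share germline heterozygous SNPs in a segment with allelic imbalance: the SNP alleles elevated in a reference sample mark one physical haplotype, and checking whether the same alleles are elevated in a second sample tells whether both samples gain on the same parental allele. The function `phase_against_reference` is a hypothetical helper, not the published algorithm.

```python
import numpy as np


def phase_against_reference(baf_ref, baf_other):
    """Decide whether a segment's major allele in one sample is the same
    physical allele as in a reference sample.

    baf_ref, baf_other: B-allele frequencies of the same heterozygous SNPs in
    one segment, measured in two samples from the same patient.
    Returns +1 (same physical allele), -1 (swapped) or 0 (no usable signal).
    """
    baf_ref = np.asarray(baf_ref, dtype=float)
    baf_other = np.asarray(baf_other, dtype=float)
    # +1 for SNP alleles on the reference major haplotype, -1 otherwise
    haplotype = np.sign(baf_ref - 0.5)
    # Project the other sample's allelic imbalance onto that haplotype
    score = np.mean(haplotype * (baf_other - 0.5))
    return int(np.sign(score))
```

A result of -1 means the major copy number in the second sample sits on the other parental allele, which is exactly the case that evolutionary criteria alone must otherwise resolve heuristically.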

TP109 (PT) - In silico prediction of physical protein interactions and characterization of interactome orphans
Theme: Proteins
Date: Tuesday, July 14, 2:20 pm - 2:40 pm
Room: The Liffey B

Presenting author: Igor Jurisica, Princess Margaret Cancer Centre, Canada

Max Kotlyar, UHN
Chiara Pastrello, UHN
Flavia Pivetta, CRO
A Losardo, CRO
Christian A. Cumbaa, UHN
Han Li, SLRI
Z Ding, MDA
Tania Naranian, SLRI
Yun Niu, Nanjing University
F Vafaee, USW
Julia Petschnigg, UCL
Gordon Mills, MDA
Andrea Jurisicova, SLRI
Igor Stagljar, U Toronto, CCBR
Roberta Maestro, CRO

Session Chair: Donna Slonim

Presentation Overview:
Protein interaction networks represent an essential infrastructure for systems biology. However, about 20% of human proteins have no known interactions and another 33% have five or fewer. Many of these proteins play important roles in disease and are potential drug targets. To reduce this “disease-related sparseness” of the human interactome, we introduced a data mining-based method, FpClass, and used it to predict 250,452 high-confidence PPIs among 10,529 proteins, including 1,089 interactome orphans. Compared to previous methods, FpClass achieved better agreement with experimentally detected PPIs. Using three bioassays, we validated 137 of 233 tested predictions, including five involving orphans now shown to interact with p53. Overall, validation achieved 74% sensitivity with 53% FDR. To better understand why some proteins have few known interactions, we investigated their properties and found that they are significantly younger, more tissue-specific, and more likely to be extracellular than other proteins. However, additional challenges prevent systematic study of these proteins.

TP110 (PT) - Inferring clonal evolution from single-cell sequencing data
Theme: Disease
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: The Auditorium

Presenting author: Edith Ross, University of Cambridge

Edith Ross, University of Cambridge, Cancer Research UK Cambridge Institute
Florian Markowetz, University of Cambridge, Cancer Research UK Cambridge Institute

Session Chair: Natasa Przulj

Presentation Overview:
Tumour evolution leads to genetic intra-tumour heterogeneity, which poses major challenges to cancer therapy. While this heterogeneity has been documented in several cases, many details of the underlying evolutionary processes are still unknown.

Studying pathways of tumour evolution promises to provide insights into early stages of cancer development and to allow predictions about whether or not early-stage tumours are likely to progress to more aggressive forms. So far, most methods for inferring tumour phylogenies use bulk sequencing data. However, they struggle to deconvolute the mixed signal into separate clones and their corresponding genotypes.

Here, we present oncoNEM, a probabilistic method for inferring intra-tumour evolutionary lineage trees from noisy exome- or genome-wide single-cell sequencing data. OncoNEM is based on the nested structure of mutations observed between cells and jointly infers the tree structure, the number of clones and their composition.

We evaluate the accuracy of oncoNEM in the controlled setting of a simulation study and demonstrate that (i) our method can accurately infer trees of tumour evolution despite the high allelic dropout rates of current single-cell sequencing technologies, (ii) it is robust to inaccuracies in the estimation of model parameters and (iii) it substantially outperforms competing methods.
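
To illustrate how noisy single-cell calls can be scored against a candidate lineage tree, here is a sketch of a common per-entry error model with a false-positive rate `alpha` and a false-negative rate `beta`, the latter absorbing allelic dropout. It assumes candidate true genotypes have already been derived from a tree and a clone assignment; the tree search itself is not shown, and `genotype_log_likelihood` is a hypothetical function, not oncoNEM's code.

```python
import math


def genotype_log_likelihood(observed, truth, alpha, beta):
    """Log-likelihood of noisy single-cell calls given true genotypes.

    observed: rows of 0/1/None calls (None = missing), one row per cell.
    truth: rows of 0/1 true genotypes for the same cells and mutations.
    alpha: false-positive rate P(call 1 | true 0).
    beta: false-negative rate P(call 0 | true 1), e.g. from allelic dropout.
    Missing calls contribute nothing.
    """
    table = {  # key: (observed call, true genotype)
        (0, 0): math.log(1 - alpha),
        (1, 0): math.log(alpha),
        (0, 1): math.log(beta),
        (1, 1): math.log(1 - beta),
    }
    total = 0.0
    for obs_row, true_row in zip(observed, truth):
        for obs, tru in zip(obs_row, true_row):
            if obs is not None:
                total += table[(obs, tru)]
    return total
```

Comparing this score across candidate trees, each implying its own `truth` matrix, is one way to see why a method can stay accurate under high dropout: a missed mutation costs only `log(beta)` rather than ruling a tree out.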

TP111 (PT) - Cereblon as a gateway for pharmacologically induced teratogenicity
Theme: Proteins
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: The Liffey B

Presenting author: Andrei Lupas, Max-Planck-Institute for Developmental Biology, Germany

Iuliia Boichenko, Max-Planck-Institute for Developmental Biology, Protein Evolution
Mateusz Korycinski, Max-Planck-Institute for Developmental Biology, Protein Evolution
Hongbo Zhu, Max-Planck-Institute for Developmental Biology, Protein Evolution
Murray Coles, Max-Planck-Institute for Developmental Biology, Protein Evolution
Fabio Zanini, Max-Planck-Institute for Developmental Biology, Evolutionary Dynamics and Biophysics Group
Marcus Hartmann, Max-Planck-Institute for Developmental Biology, Protein Evolution
Birte Hernandez Alvarez, Max-Planck-Institute for Developmental Biology, Protein Evolution

Session Chair: Donna Slonim

Presentation Overview:
In the public perception, thalidomide mainly evokes images of children with stunted limbs. Less known is its ongoing importance for treating multiple myeloma and leprosy. Interest in its further pharmacological development thus remains high, but is hindered by the limited understanding of its teratogenic side-effects. Even the main target of thalidomide in the human body, cereblon, was unknown until recently. Given the intractability of cereblon for biochemical studies, we analyzed the evolution of its thalidomide-binding domain and used sequence-structure relationships to identify a prokaryotic model system, which we validated in vitro and in vivo, in a zebrafish fin development assay. In computational and experimental searches, we identified uridine as the first biological, universally available ligand. We also found that a surprisingly large number of pharmacologically important substances with known teratogenic effects act through the same binding site as thalidomide, identifying cereblon as a gateway for teratogenicity in the human body.

TP112 (PT) - Exploring the structure and function of temporal networks with dynamic graphlets
Theme: Systems
Date: Tuesday, July 14, 3:30 pm - 3:50 pm
Room: The Auditorium

Presenting author: Tijana Milenkovic, University of Notre Dame, United States

Huili Chen, University of Notre Dame, United States
Yuriy Hulovatyy, University of Notre Dame, United States

Session Chair: Natasa Przulj

Presentation Overview:
Motivation: With the increasing availability of real-world temporal networks, how can these data be studied efficiently? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. One can then apply established static methods to the resulting aggregate network(s), but in the process lose valuable temporal information either completely or at the interface between snapshots, respectively. Here, we develop a novel approach that studies a temporal network more explicitly by capturing inter-snapshot relationships.

Results: We base our methodology on well-established graphlets (subgraphs), which have proven useful in numerous contexts in static network research. We develop new theory to allow graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets differs from existing dynamic network approaches based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome these limitations of temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging.

TP113 (PT) - Computational saturated mutagenesis for mapping protein binding landscapes and identifying affinity- and specificity-enhancing mutations
Theme: Proteins
Date: Tuesday, July 14, 3:30 pm - 3:50 pm
Room: The Liffey B

Presenting author: Julia Shifman, Hebrew University of Jerusalem, Israel

Yonatan Aizner, Hebrew University of Jerusalem, Biological Chemistry
Jason Shirian, Hebrew University of Jerusalem, Biological Chemistry
Oz Sharabi, Hebrew University of Jerusalem, Biological Chemistry

Session Chair: Donna Slonim

Presentation Overview:
We developed an in silico saturation mutagenesis protocol that allows us to scan any binding interface with all amino acids and to predict changes in the free energy of binding due to all single mutations, thereby constructing binding landscapes for various protein-protein interactions (PPIs). We tested the protocol on two evolutionarily different classes of PPIs, high-affinity and multispecific PPIs, and demonstrated that their binding landscapes are remarkably different. Wild-type sequences of high-affinity complexes are nearly optimized for binding and contain only a handful of mutations that further enhance binding affinity. In contrast, sequences of multispecific proteins lie far from the fitness maximum, presenting multiple possibilities for improvement. In both examples, we show that our computational predictions agree well with experimental results and allow for successful identification of affinity- and specificity-enhancing mutations and of cold-spot positions, where mutations to several amino acids improve affinity.
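
The enumeration and classification steps can be sketched as below, assuming a separate energy function supplies a predicted ΔΔG for each mutant (no specific energy function is implied). The functions `single_mutants` and `cold_spots`, and the cut-offs that define an affinity-enhancing mutation and a cold spot, are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(sequence, positions):
    """Enumerate every single substitution at the given 0-based interface positions."""
    for pos in positions:
        wt = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa


def cold_spots(ddg, min_improving=3, threshold=-0.5):
    """Positions where several substitutions are predicted to improve binding.

    ddg maps (pos, wt, mut) -> predicted change in binding free energy
    (kcal/mol; negative means tighter binding). min_improving and threshold
    are illustrative cut-offs, not values from the study.
    """
    counts = {}
    for (pos, _wt, _mut), value in ddg.items():
        if value <= threshold:
            counts[pos] = counts.get(pos, 0) + 1
    return sorted(pos for pos, n in counts.items() if n >= min_improving)
```

Each scanned position yields 19 mutants, so a full landscape over k interface positions requires 19k energy evaluations.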

TP114 (PT) - Data visualization and modeling using Atlas of Cancer Signaling Network predicts clinical outcome
Theme: Systems
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: The Auditorium

Presenting author: Inna Kuperstein, Institut Curie –U900 INSERM - Mines ParisTech, France

Inna Kuperstein, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Eric Bonnet, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Eric Viara, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Maia Chanrion, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Hien-Anh Nguyen, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
David Cohen, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Laurence Calzone, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Luca Grieco, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Christophe Russo, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Maria Kondratova, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Marie Dutreix, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Sylvie Robine, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Emmanuel Barillot, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer
Andrei Zinovyev, Institut Curie –U900 INSERM - Mines ParisTech, Computational Systems Biology of Cancer

Session Chair: Natasa Przulj

Presentation Overview:
The successful application of bioinformatics and systems biology methods to the analysis of high-throughput data in cancer research depends on the availability of global and detailed reconstructions of signaling networks amenable to computational analysis. The Atlas of Cancer Signaling Network (ACSN) is an interactive and comprehensive map of molecular mechanisms implicated in cancer that includes tools for map navigation, visualization and analysis of molecular data in the context of signaling network maps. Constructing and updating ACSN involves manual literature curation and the participation of experts in the corresponding fields. The cancer-oriented content of ACSN is original and covers major mechanisms involved in cancer progression. Cell signaling mechanisms are depicted in detail, together creating a seamless ‘geographic-like’ map of molecular interactions frequently deregulated in cancer. The map can be browsed with the NaviCell web interface, which uses the Google Maps engine and the semantic zooming principle. The associated web blog provides a forum for commenting on and curating the ACSN content. ACSN allows users to upload heterogeneous omics data onto the maps for visualization and functional analysis. We suggest several scenarios for applying ACSN in cancer research to visualize high-throughput data. In addition, we show a study on drug sensitivity prediction using ACSN. Finally, we describe how the epithelial-to-mesenchymal transition (EMT) signaling network from the ACSN collection has been used to find metastasis inducers in colon cancer through network analysis. ACSN may support data analysis and interpretation, patient stratification, prediction of treatment response and resistance to cancer drugs, and the design of novel treatment strategies.

TP115 (PT) - Using Kernelized Partial Canonical Correlation Analysis To Study Directly Coupled Side Chains and Allostery in Small G Proteins
Theme: Proteins
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: The Liffey B

Presenting author: Mu Zhu, University of Waterloo, Canada

Forbes Burkowski, University of Waterloo, Canada
Mu Zhu, University of Waterloo, Canada

Session Chair: Donna Slonim

Presentation Overview:
Motivation: Inferring structural dependencies among a protein’s side
chains helps us understand their coupled motions. It is known that
coupled fluctuations can reveal pathways of communication used for
information propagation in a molecule. Side-chain conformations are
commonly represented by multivariate angular variables, but existing
partial correlation methods that can be applied to this inference task
are not capable of handling multivariate angular data. We propose
a novel method to infer direct couplings from this type of data, and
show that this method is useful for identifying functional regions and
their interactions in allosteric proteins.

Results: We developed a novel extension of canonical correlation
analysis (CCA), which we call “kernelized partial CCA” (or simply
KPCCA), and used it to infer direct couplings between side chains,
while disentangling these couplings from indirect ones. Using the
conformational information and fluctuations of the inactive structure
alone for allosteric proteins in the Ras and other Ras-like families,
our method identified allosterically important residues not only as
strongly coupled ones but also in densely connected regions of the
interaction graph formed by the inferred couplings. Our results were
in good agreement with other empirical findings. By studying distinct
members of the Ras, Rho, and Rab sub-families, we show further that
KPCCA was capable of inferring common allosteric characteristics in
the small G protein super-family.
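
A linear, non-kernelized sketch of the idea, assuming side-chain conformations are given as dihedral angles in radians: angles are embedded as (cos, sin) pairs, the influence of a third set of side chains Z is regressed out of both X and Y, and the largest canonical correlation of the residuals measures direct coupling. KPCCA replaces these linear maps with kernel feature maps; that step, and the regularization it needs, are not shown. All function names here are hypothetical.

```python
import numpy as np


def _embed(angles):
    """Map an (n, k) array of angles (radians) to (n, 2k) cos/sin features."""
    angles = np.asarray(angles, dtype=float)
    return np.hstack([np.cos(angles), np.sin(angles)])


def _residualize(x, z):
    """Remove the part of x that is linear in z (plus an intercept)."""
    design = np.hstack([np.ones((len(z), 1)), z])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef


def _inv_sqrt(cov, eps=1e-10):
    """Inverse matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T


def partial_cca(x_angles, y_angles, z_angles=None):
    """Largest canonical correlation between side chains X and Y, given Z."""
    x, y = _embed(x_angles), _embed(y_angles)
    if z_angles is not None:
        z = _embed(z_angles)
        x, y = _residualize(x, z), _residualize(y, z)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = len(x)
    cxx, cyy, cxy = x.T @ x / n, y.T @ y / n, x.T @ y / n
    m = _inv_sqrt(cxx) @ cxy @ _inv_sqrt(cyy)
    return float(np.linalg.svd(m, compute_uv=False)[0])
```

If X and Y both track a third side chain Z, their raw canonical correlation is high but their partial correlation given Z is near zero, which is how indirect couplings are disentangled from direct ones.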

TP116 (PT) - Detecting Molecular Similarities Between Allergenic And Metazoan Parasitic Proteins: Allergy In The Light of Immunity
Theme: Proteins
Date: Tuesday, July 14, 4:10 pm - 4:30 pm
Room: The Liffey B

Presenting author: Nicholas Furnham, London School of Hygiene and Tropical Medicine

Nidhi Tyagi, European Molecular Biology Laboratory, European Bioinformatics Institute
Edward Farnell, University of Cambridge, Department of Pathology
Colin Fitzsimmons, University of Cambridge, Department of Pathology
Stephanie Ryan, University of Edinburgh, Institute of Immunology and Infection Research
Rick Maizels, University of Edinburgh, Institute of Immunology and Infection Research
David Dunne, University of Cambridge, Department of Pathology
Janet Thornton, European Molecular Biology Laboratory, European Bioinformatics Institute
Nicholas Furnham, London School of Hygiene & Tropical Medicine, Department of Pathogen Molecular Biology

Session Chair: Donna Slonim

Presentation Overview:
Allergic reactions closely resemble the responses involved in acquiring a significant degree of immunity against metazoan parasites, as both elicit a similar immunoglobulin E (IgE) immune response. Based on the hypothesis that IgE-mediated immune responses evolved to provide extra protection against metazoan parasites rather than to cause allergy, we predict that environmental allergens share key molecular properties with metazoan parasite antigens that are specifically targeted by IgE. Using large-scale computational studies, we have established molecular similarity between parasite proteins and allergens and are able to predict the regions of parasite proteins that potentially share similarity with the IgE-binding region(s) of allergens. Nearly half of the 2,445 parasite proteins that show significant similarity to allergenic proteins fall within the 10 most abundant allergenic protein domain families. Our experimental studies support these predictions, and we present the first confirmed example, in a worm, of a protein similar to the commonest plant pollen allergen, confirming that it is targeted by IgE in people exposed to infection in a schistosomiasis-endemic area of Uganda. The identification of such similarities explains the ‘off-target’ effects of the IgE-mediated immune system in allergy.

TT01 (PT) - Biomolecular sequence analysis with Jalview and JPred4
Theme:
Date: Monday, July 13, 10:10 am - 11:10 am
Room: Wicklow Hall 2B

Presenting author: Geoffrey Barton

Session Chair:

Presentation Overview:
The updated JPred4 protein secondary structure prediction service has enhanced result visualisations and a REST API for high-throughput analysis. The Jalview multiple sequence alignment workbench now supports linked cDNA/protein analysis and integration with UCSF Chimera and the PDBe query services for 3D structure discovery.

TT02 (PT) - BioJS 2.0: more flexible, modular, open web biological visualization
Theme:
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: Wicklow Hall 2B

Presenting author: Manuel Corpas

Session Chair:

Presentation Overview:
We present BioJS 2.0, the newly updated BioJavaScript library for visualising biological data. With this upgrade, BioJS has adopted npm, Browserify and other JavaScript technologies to give it the interactive look and feel we all recognise from modern web applications.

TT03 (PT) - The European Variation Archive
Theme:
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: Wicklow Hall 2B

Presenting author: Francisco Lopez-Domingo

Session Chair:

Presentation Overview:
The European Variation Archive (EVA) is a genetic variation resource covering variants of any type across species and populations, including both germline and cancer genomes. EVA is an open-access Big Data provider that enables users to efficiently retrieve relevant information from its knowledge base. The current implementation consists of an HTML5 web interface and a NoSQL backend.

TT04 (PT) - Sensitive and fast homology searches with HMMER via the Web
Theme:
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: Wicklow Hall 2B

Presenting author: Robert Finn

Session Chair:

Presentation Overview:
Homology searches allow information to be transferred between related sequences. The HMMER website (http://www.ebi.ac.uk/Tools/hmmer) provides fast, accurate and sensitive searches. Through the coupling of algorithmic developments, hardware and intuitive data visualisations, large sequence databases can be searched and the results analysed and filtered in close to real time.

TT05 (PT) - The Universal Protein Resource (UniProt): New Development on Proteomes, Variation and Proteomics
Theme:
Date: Monday, July 13, 2:00 pm - 2:20 pm
Room: Wicklow Hall 2B

Presenting author: Benoit Bely

Session Chair:

Presentation Overview:
The demonstration will present how to use the Proteomes portal on UniProt.org to find data about non-redundant and redundant proteomes.
It will also demonstrate the content and use of proteomics and variation data in UniProt.

TT06 (PT) - ProDomAs: A Web Server for Assigning Protein Domains Using a Neural Network
Theme:
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: Wicklow Hall 2B

Presenting author: Changiz Eslahchi, Iran

Session Chair:

Presentation Overview:
The ProDomAs web server implements a novel automatic algorithm for assigning the structural domains of a protein chain. Results show that ProDomAs outperforms other automatic methods. The web server can be used to decompose and visualise structural domains in 1D and 3D representations and is available at http://bs.ipm.ir/softwares/prodomas.

TT07 (PT) - Multiple Sequence Alignment using Clustal Omega
Theme:
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: Wicklow Hall 2B

Presenting author: Fabian Sievers, Ireland

Session Chair:

Presentation Overview:
Clustal Omega is a multiple sequence alignment program that can align virtually any number of protein sequences, and of nucleotide sequences of comparable length. It uses Hidden Markov Models to align individual sequences and profiles. Clustal Omega is faster than high-accuracy aligners and more accurate than faster aligners, as measured on protein structure-derived benchmarks.

TT08 (PT) - 20 years of CATH: new methods to predict the structure and function of novel protein sequences
Theme:
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2B

Presenting author: Ian Sillitoe

Session Chair:

Presentation Overview:
New tools in CATH for predicting protein function from sequence, investigating the evolution of structural domains, and accessing representative datasets and bleeding-edge annotations.

TT09 (PT) - Phyre2: Protein modeling and analysis made easy
Theme:
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2B

Presenting author: Mark Wass

Session Chair:

Presentation Overview:
The Phyre2 web portal is used by over 40,000 researchers and provides rapid, accurate 3D modelling of a protein sequence, with an intuitive, easy-to-use interface for exploring the results. The demonstration will provide an overview of the range of tools available in Phyre2.

TT10 (PT) - Aquaria: Simplifying discovery and insight from protein structures
Theme:
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: Wicklow Hall 2B

Presenting author: Seán O'Donoghue, Australia

Session Chair:

Presentation Overview:
This Tech Track talk will present Aquaria (http://aquaria.ws), a new web resource that streamlines and simplifies the generation of insight from protein structures. The talk will outline how Aquaria can be used both by biologists (via its web-based user interface) and by bioinformaticians (via its API).

TT11 (PT) - Illumina BaseSpace® Apps - Accelerating Bioinformatics Analysis with Easy, Accurate, and Efficient tools for Next Generation Sequencing Data
Theme:
Date: Tuesday, July 14, 10:10 am - 11:10 am
Room: Wicklow Hall 2A

Presenting author: Raymond Tecotzky, United States

Session Chair:

Presentation Overview:
We will give an overview followed by a rapid, real-time demonstration of several apps, showing the power of state-of-the-art informatics analysis of next-generation genomic sequencing data and the compelling results of those analyses.

TT12 (PT) - A comparison of predicted promoters between hg19 and hg38 reference genomes
Theme:
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: Wicklow Hall 2B

Presenting author: Alexander Kaplun, United States

Session Chair:

Presentation Overview:
We will discuss the results of comparing the computationally predicted promoters in the hg19 and hg38 reference genomes, along with insights gained into the distribution of regulatory features such as known transcription factor binding sites, ChIP fragments and DNase I hypersensitive sites.

TT13 (PT) - "Bringing structure to biology" on a device near you
Theme:
Date: Tuesday, July 14, 10:30 am - 10:50 am
Room: Wicklow Hall 2B

Presenting author: Swanand Gore

Session Chair:

Presentation Overview:
We present significant enhancements to macromolecular structure representation at PDBe (http://pdbe.org):

- REST API with interactive documentation (http://pdbe.org/api)

- Responsive webpages with concise representation of key data features including biological assemblies; domain mappings; function; literature; experiment and validation (e.g. http://pdbe.org/2qk9)

- Search interface with powerful auto-suggestion and faceted presentation of results (http://www.ebi.ac.uk/pdbe/entry/search/index)

TT14 (PT) - A tale of ice and fire: Stability and change in bioinformatics framework services at EMBL-EBI
Theme:
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: The Liffey A

Presenting author: Andrew Cowley

Session Chair:

Presentation Overview:
We will describe recent changes to the Job Dispatcher framework and the EBI Search engine at EMBL-EBI, highlighting the need to maintain compatibility while adding new features and retiring deprecated services.

TT15 (PT) - Chipster VM - comprehensive package of NGS data analysis tools and reference data, with an intuitive GUI
Theme:
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: Wicklow Hall 2A

Presenting author: Eija Korpelainen, Finland

Session Chair:

Presentation Overview:
Chipster is free, open-source software for analysing high-throughput data. It is available as a ready-to-run virtual machine containing a comprehensive collection of analysis tools and reference data. Users can save workflows and visualise data interactively through a GUI. This talk covers Chipster from both the user's and the administrator's point of view.

TT16 (PT) - What’s new in the Galaxy (Project)?
Theme:
Date: Tuesday, July 14, 12:00 pm - 12:20 pm
Room: The Liffey A

Presenting author: Dave Clements, United States

Session Chair:

Presentation Overview:
An update on the Galaxy Project, a free and open-source data integration and analysis platform for life science research (http://galaxyproject.org). This talk will highlight recent and upcoming work on multi-sample analysis, user interface improvements, updates to tool usage and definition, the use of Docker for tool integration, and integration with interactive environments.

TT17 (PT) - XEML Interactive Designer, Metadata Management Tool for Genotypes, Environmental Conditions and Sampling Strategy
Theme:
Date: Tuesday, July 14, 12:00 pm - 12:20 pm
Room: Wicklow Hall 2A

Presenting author: Benjamin Dartigues, France

Session Chair:

Presentation Overview:
XEML-Lab Designer, a metadata capture workflow that enables the intuitive design of experiments, is back. Originally designed to facilitate the sharing of environmental data related to experiments, the new version is now available for all standard platforms. This presentation summarises all available functionalities as well as downstream data integration and analysis.

TT18 (PT) - RiboGalaxy: a platform for the alignment, analysis and visualization of ribo-seq data.
Theme:
Date: Tuesday, July 14, 12:20 pm - 12:40 pm
Room: The Liffey A

Presenting author: Audrey Michel, Ireland

Session Chair:

Presentation Overview:
We will present RiboGalaxy (http://ribogalaxy.ucc.ie/), a freely available Galaxy-based web server specifically tailored for pre-processing, aligning and analysing ribo-seq data, with visualisation functionality provided by GWIPS-viz (http://gwips.ucc.ie). We will demonstrate how users can pre-process, align and analyse their ribo-seq data with the tools and workflows hosted on RiboGalaxy.

TT19 (PT) - @Note2 – open-source computational tools for biomedical text mining
Theme:
Date: Tuesday, July 14, 12:20 pm - 12:40 pm
Room: Wicklow Hall 2A

Presenting author: Miguel Rocha, Portugal

Session Chair:

Presentation Overview:
The @Note2 project (http://www.anote-project.org) provides a number of user-friendly open-source computational tools for biomedical text mining, including tools for information retrieval and publication management, named entity recognition and relationship extraction, and a manual curation environment for annotating documents. This demonstration will present the main @Note functionalities and technological features.

TT20 (PT) - Zegami: Software for high throughput image exploration
Theme:
Date: Tuesday, July 14, 2:00 pm - 2:20 pm
Room: Liffey Hall 2

Presenting author: Stephen Taylor

Session Chair:

Presentation Overview:
We have developed Zegami, a tool that can display thousands of images, movies, stacked images or even 3D objects in a web browser. Users can then search, sort, filter and group the images based on their associated metadata. The software is free for academic use.

TT21 (PT) - Publishing with Genomics, Proteomics & Bioinformatics
Theme:
Date: Tuesday, July 14, 2:00 pm - 2:40 pm
Room: Wicklow Hall 2A

Presenting author: Jialei Xie, China

Session Chair:

Presentation Overview:
The open-access journal Genomics, Proteomics & Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences, and the Genetics Society of China. GPB publishes high-quality articles in omics and bioinformatics from around the world, in a variety of article types. GPB is indexed in PubMed/MEDLINE, Chemical Abstracts, Scopus and other databases.

TT22 (PT) - Secure, Scalable and Cost Effective Computing for Large Scale Genomics Analysis on Amazon Web Services
Theme:
Date: Tuesday, July 14, 2:00 pm - 3:00 pm
Room: Wicklow Hall 2B

Presenting author: Angel Pizarro, United States

Session Chair:

Presentation Overview:
Scalability, security, and low costs make cloud computing attractive for genomic sequencing data analysis and storage. We will discuss how researchers are using Amazon Web Services (AWS) to create scalable, performant, collaborative, and cost effective solutions for genomics, health, and big-data workflows. Come learn about how AWS can help your research.

TT23 (PT) - CIViC: Crowdsourcing the Clinical Interpretation of Variants in Cancer
Theme:
Date: Tuesday, July 14, 2:20 pm - 2:40 pm
Room: Liffey Hall 2

Presenting author: Malachi Griffith, United States

Session Chair:

Presentation Overview:
Interpreting the clinical relevance of cancer variants is a significant bottleneck in clinical cancer sequencing pipelines, delaying the adoption of precision medicine. To address this, we present CIViC (www.civicdb.org), a crowdsourcing interface for the clinical interpretation of variants in cancer.

TT24 (PT) - WebChemistry: a platform for the detection, validation, comparison, and characterization of structural patterns in biomacromolecules
Theme:
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Presenting author: David Sehnal, Czech Republic

Session Chair:

Presentation Overview:
We present the WebChemistry platform, a set of tools for in silico analysis of structural bioinformatics data. The tools give the user the ability to detect, validate, compare, and characterize any pattern of interest (such as binding and catalytic sites, sequences, or channels) and are freely accessible at http://ncbr.muni.cz/WebChemistry.

TT25 (PT) - Realising the Infrastructure for Systems Biology Europe
Theme:
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: Wicklow Hall 2A

Presenting author: Carole Goble, University of Manchester

Session Chair:

Presentation Overview:
ISBE will offer European researchers easy access to state-of-the-art facilities, data, models and training by interconnecting national systems biology centres and making their collective expertise, resources and services readily available.

This will improve the standardisation of biological data and models across different laboratories, making them easier to combine and reuse.

TT26 (PT) - FAIRDOM: FAIR Data and Model Management in Systems Biology
Theme:
Date: Tuesday, July 14, 3:30 pm - 3:50 pm
Room: Wicklow Hall 2A

Presenting author: Natalie Stanford and Carole Goble, Netherlands

Session Chair:

Presentation Overview:
FAIRDOM (http://fair-dom.org/) enables the systems biology community to produce Findable, Accessible, Interoperable and Reusable Data, Operating procedures and Models. Heterogeneous data and models from systems investigations can be aggregated and interlinked in the central FAIRDOMHub, for stewardship and publication, or in a local FAIRDOM instance, for collaboration and sharing.

TT27 (PT) - Bioinformatics Cloud Services for Life Sciences
Theme:
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: Wicklow Hall 2A

Presenting author: Christophe Blanchet, France

Session Chair:

Presentation Overview:
The French Institute of Bioinformatics (IFB), the French node of ELIXIR, aims to provide scientists with bioinformatics services backed by the necessary computing and storage capacity, through a user-friendly and easily adjustable solution. A selection of bioinformatics software and pipelines has been integrated as turnkey cloud services, now available to scientists.

TT28 (PT) - GenomeSpace: An environment for frictionless bioinformatics
Theme:
Date: Tuesday, July 14, 4:10 pm - 4:30 pm
Room: Wicklow Hall 2A

Presenting author: Sara Garamszegi, United States

Session Chair:

Presentation Overview:
GenomeSpace is a platform that supports an open community of genomics analysis tools. Using GenomeSpace, researchers can construct analyses requiring communication and data transfer between diverse tools such as Web applications and client-side tools. GenomeSpace supports cloud-based storage, handles file format compatibility issues automatically, and enables reproducibility of multi-tool analyses.

VC03 (PT) - The FATHMM Family of Predictors
Theme:
Date: TBA
Room: TBA

Presenting author: Colin Campbell

Session Chair:

Presentation Overview:

WK01_PartA (PT) - Top Ten Tips for Setting up a Bioinformatics Course
Theme:
Date: TBA
Room: TBA

Presenting author: Gabriella Rustici

Session Chair:

Presentation Overview:

WK01_PartB (PT) - Learn Bioinformatics by Doing Bioinformatics
Theme:
Date: TBA
Room: TBA

Session Chair:

Presentation Overview:

WK01_PartC (PT) - Train-the-trainer: from bioinformatician to bioinformatics trainer
Theme:
Date: TBA
Room: TBA

Presenting author: Annette McGrath

Session Chair:

Presentation Overview:

WK01_PartD (PT) - Experience Exchange: Focus on NGS Course
Theme:
Date: TBA
Room: TBA

Presenting author: Francis Ouellette, Gabriella Rustici, Dave Clements, Annette McGrath

Session Chair:

Presentation Overview:

WK02_PartAB (PT) - Part A: The role of core facilities when everyone is a bioinformatician; Part B: Bioinformatics core facilities as service providers
Theme:
Date: TBA
Room: TBA

Presenting author: Davide Cittaro and Sven Nahnsen

Session Chair:

Presentation Overview:

WK02_PartCD (PT) - Part C: Maintaining a Publicly Used Analysis Infrastructure; Part D: The business of core services
Theme:
Date: TBA
Room: TBA

Presenting author: Madelaine Gogol and Jim Cavalcoli

Session Chair:

Presentation Overview: