ISMB ECCB 2009

Highlights Track Presentation Schedule

Highlights Track: HL01
The Plurality of Prognostic Gene Signatures for Cancer
Monday, June 29 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Paul Boutros, Ontario Institute for Cancer Research, Canada
Presentation Overview:
Many diseases exhibit highly variable prognosis, with some individuals responding to therapy and others not. Despite this variability, patients are often treated identically. One major goal of modern medicine is to identify biomarkers that predict the optimal therapy for each patient. Although diverse computational techniques have been developed, existing methods have two key weaknesses: they overfit training datasets, and they yield non-overlapping biomarkers that complicate clinical validation. We addressed these weaknesses in the context of lung cancer. First, we developed a non-linear biomarker-identification technique by coupling unsupervised machine learning to gradient-descent optimization. This algorithm identified a six-gene biomarker that was validated on eight independent datasets comprising 589 patients. Second, we devised a technique for estimating the null distribution of biomarkers that can be used to obtain an unbiased ranking of biomarkers; our six-gene biomarker is in the 99.98th percentile. More importantly, this analysis reveals that over 500,000 unique lung cancer biomarkers exist. Thus, our work resolves two key questions in the field of biomarker identification: the over-fitting and non-overlapping biomarker problems. Although we focus on lung cancer, our techniques are directly applicable to other diseases, and we are currently applying them elsewhere. Preliminary results from breast cancer and schizophrenia will be presented.
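
The null-distribution ranking described above can be approximated empirically: score many random gene sets of the same size and report the percentage of them that the real signature beats. The Python sketch below assumes a hypothetical `score_fn` that maps an array of gene indices to a number where larger means better prognostic separation (for example, -log10 of a survival-test p-value); it illustrates the idea and is not the authors' code.

```python
import numpy as np

def random_signature_null(score_fn, n_genes, set_size, n_draws, rng):
    """Score n_draws random gene sets, each made of set_size distinct genes."""
    return np.array([
        score_fn(rng.choice(n_genes, size=set_size, replace=False))
        for _ in range(n_draws)
    ])

def empirical_percentile(observed_score, null_scores):
    """Percentage of random-set scores strictly below the observed score."""
    null_scores = np.asarray(null_scores, dtype=float)
    return 100.0 * np.mean(null_scores < observed_score)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-gene effects; a set's score is the sum over its genes.
    gene_effect = rng.normal(size=1000)

    def score_fn(idx):
        return gene_effect[idx].sum()

    null = random_signature_null(score_fn, n_genes=1000, set_size=6,
                                 n_draws=10_000, rng=rng)
    top_six = np.argsort(gene_effect)[-6:]
    print(empirical_percentile(score_fn(top_six), null))
```

Using a strict "<" means random sets that tie with the signature count against it, which keeps the percentile conservative.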

ISMB/ECCB 2009 Blog

HL01: Paul Boutros - The Plurality of Prognostic Gene Signatures for Cancer

lung cancer is highly heterogeneous in classifications and outcomes -- how to separate those patients that would and would not benefit from chemotherapy after surgery - Andrew Su

previous studies based on gene expression analysis have not replicated. why? weak statistical methods - Andrew Su

method: "modified steepest descent" - Andrew Su

first assess genes individually, then take best gene and evaluate best pair, take top pair and screen for top triple, etc. - Andrew Su

identified 6-gene marker set -- STX1A, HIF1A, CCT3, MAFK, RNF5, HLA-DPB1 - Andrew Su

classifier in training data: survival difference p = 1E-5 (not surprising) - Andrew Su

8 separate studies: p = 0.02 (409 patients) - Andrew Su

evaluation: take 10 million random six-gene sets; better than 99.9999% of background, drops to ~90% on other training data, merged 99.98% - Andrew Su

but, what are the ~450,000 random sets that are better? Could this explain non-overlap of previous microarray studies? - Andrew Su

evaluate genes based on marker plurality -- how many good predictors does a gene occur in? - Andrew Su
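
The "modified steepest descent" in the notes above reads like greedy forward selection: start with the single best gene, then repeatedly add whichever remaining gene most improves the set's score. A minimal Python sketch under that reading follows; `score_fn` is a hypothetical callable (larger = better survival separation), and the coupling to unsupervised learning mentioned in the abstract is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Signature = Tuple[str, ...]

def greedy_forward_selection(genes: Sequence[str],
                             score_fn: Callable[[Signature], float],
                             max_size: int) -> List[Tuple[Signature, float]]:
    """Grow a gene signature one gene at a time.

    At each step every unused gene is tried as an extension of the current set,
    and the best-scoring extension is kept. Returns (signature, score) per size.
    """
    selected: Signature = ()
    path: List[Tuple[Signature, float]] = []
    for _ in range(max_size):
        candidates = [g for g in genes if g not in selected]
        if not candidates:
            break
        # Score every one-gene extension; ties are broken by gene name.
        best_score, best_gene = max((score_fn(selected + (g,)), g) for g in candidates)
        selected = selected + (best_gene,)
        path.append((selected, best_score))
    return path

# Toy usage with hypothetical additive gene weights:
weights = {"a": 1.0, "b": 3.0, "c": 2.0, "d": -1.0}
path = greedy_forward_selection(list(weights), lambda s: sum(weights[g] for g in s), 3)
# path[-1] == (("b", "c", "a"), 6.0)
```

Greedy search is fast but can miss gene sets that are only informative jointly, which is one reason to check the result against a random-set null distribution.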

Highlights Track: HL02
Modeling Ecological and Genetic Diversity in Bacteria
Monday, June 29 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Eric Alm, MIT, United States
Presentation Overview:
Defining species boundaries is a major challenge for microbiologists. This is because bacteria are difficult to study in their natural habitat, and they can exchange genes across species boundaries (however these may be defined). One promising direction is to combine environmental DNA sequence data together with ecological metadata to infer: (i) populations adapted to a habitat/niche; (ii) the structure of the habitat underlying a series of ecological measurements; and (iii) the history of ecological adaptation within a lineage. We describe a quantitative model (AdaptML) that infers the evolutionary history of ecological differentiation for a collection of ocean bacteria, revealing populations specific for different seasons and life-styles, and in more recent work we have used the predictions of the AdaptML algorithm to target entire populations of bacteria for complete genome sequencing. This population genomic approach is helping to elucidate the underlying microevolutionary processes shaping bacterial species.

ISMB/ECCB 2009 Blog

HL02: Eric Alm - Modeling Ecological and Genetic Diversity in Bacteria

So far this is similar to http://ff.im/4x4Vg of the BioPathways SIG yesterday - Ruchira S. Datta

this lets us study sympatric speciation in the ocean - Ruchira S. Datta

inspired by Mandel, ..., Ruby, "A single regulatory gene is sufficient to alter bacterial host range": what is the genetic basis for adaptation to the niches we found (zooplankton-associated vs small-particle associated)? - Ruchira S. Datta

want to do ecologically-driven population genomics - Ruchira S. Datta

wanted to sequence all genomes in each of the subtrees - Ruchira S. Datta

asked for 20 genomes in grant application, was met with "?!" - Ruchira S. Datta

sat next to Eric Lander at a dinner, who said "20 genomes? why not 100?", so got funding from Broad Institute - Ruchira S. Datta

want to model changes within a population of genomes; gene flow and recombination within the population - Ruchira S. Datta

wanted to find recombination breakpoints - Ruchira S. Datta

dimorphic sites, with one nucleotide changing: supports splits, leading to tree with single mutation - Ruchira S. Datta

Highlights Track: HL03
Whole genome analysis of mtDNA natural evolution in human and in cancer
Monday, June 29 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Eitan Rubin, Ben Gurion University, Israel
Presentation Overview:
mtDNA is an exceptional model genome for bioinformatics. It is the only genome that has been sequenced in 2400 different individuals. In this work we demonstrate how comparative whole-genome analysis reveals strong patterns that recur in human evolution and in cancer.

ISMB/ECCB 2009 Blog

HL03: Eitan Rubin - Whole genome analysis of mtDNA natural evolution in human and in cancer

in advance of the 1000 Genomes Project, can we use mitochondrial DNA sequencing? advantages: small size (16 kb) and maternal inheritance (allows human phylogeny) - Andrew Su

mitochondria have known role in cancer, but unclear if the mito genome (and mutations therein) are relevant - Andrew Su

method: compare tumor and normal mtDNA -- patient-matched. --> mutational landscape --> eliminate SNPs in normals --> find high recurrence. BUT, nothing found... - Andrew Su

method 2: from mutational landscape, look for co-mutations: two patients with multiple mutations observed - Andrew Su

one "flip-flop" observed -- mutation in tumor in one patient and in normal in another -- bi-stability? - Andrew Su

compare evolution in normal population as reference; create trees via neighbor joining, create inner nodes, find deepest branch with mutation; conclusion: cancer uses SNPs that arise early in human evolution - Andrew Su

relates to African vs European haplotypes? - Andrew Su

conclusion: mtDNA is under selection in cancer; strategy of finding multiple SNPs in a subset of tumors; functional prediction of 25 mtDNA SNPs - Andrew Su

philosophical question -- are tumors different species? unicellular parasites? - Andrew Su

A tumor is like a parasite branching off as a separate species from the human - Michael Kuhn

Highlights Track: HL04
The role of the RNA folding free energy in the evolution of the influenza virus
Monday, June 29 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Panayiotis Benos, University of Pittsburgh, United States
Presentation Overview:
The influenza A virus genome consists of eight single-stranded RNA segments of negative polarity. Current efforts to understand viral host specificity have largely focused on the amino acid differences between avian and human isolates. The results presented here demonstrate that the RNA folding free energy (FFE) of the influenza polymerase genes plays a key role in the evolution and host specificity of the virus. In particular, we found that the distribution of the FFEs is significantly different between human and avian isolated strains, with human isolates having generally higher FFEs (less stringent RNA structures). When avian polymerase genes are introduced into the human population, their FFEs shift toward higher values over the years. Infection experiments in mammalian cells growing at different temperatures show that human isolated viruses cannot propagate efficiently at higher temperatures, and more recent results (not in the paper) show the opposite: avian isolates cannot propagate efficiently at lower temperatures. Taken together, our data suggest for the first time that RNA structure stability is important for the emergence and host shift of influenza A virus. The fact that the cells' temperature affects virus propagation in mammalian cells has important consequences for prevention and therapeutic strategies.

ISMB/ECCB 2009 Blog

HL04: Panayiotis Benos - The role of the RNA folding free energy in the evolution of the influenza virus

Highlights Track: HL05
Proteomics first approach for discovering sub-network targets in cancer
Monday, June 29 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Rod Nibbe, Case Western Reserve University, United States
Presentation Overview:
Using a proteomics-first approach, we identified many targets significant for late-stage human colon cancer. These targets were used to seed a search in a well-annotated PPI network for subnetworks possibly significant for the late-stage phenotype. We devised a method to score some of the subnetworks found, using an information-theoretic (mutual information) approach based on a complement of transcription data (microarray) as a surrogate for subnetwork activity. The subnetworks were pruned to leave significant targets, and extended one hop to infer functional relevance and inform follow-on experiments. The significant targets in one subnetwork were validated by label-free mass spectrometry or western blot, and found to be coordinately regulated at the level of protein and mRNA. Overall, the work outlines a novel quantitative approach for extending the results of proteomic profiling to find disease discriminators at the level of protein subnetworks (and thus function), and drives target selection for in vitro/in vivo verification.
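
Scoring a subnetwork by mutual information with transcription data can be sketched as follows: summarize the subnetwork as one activity value per sample, discretize it, and compute its mutual information with the phenotype labels. In this Python sketch the mean z-score activity and the quantile binning are assumptions borrowed from common network-based classification practice, not necessarily the authors' exact procedure; the function names are hypothetical.

```python
import numpy as np

def subnetwork_activity(expr, member_rows):
    """Per-sample activity: mean z-score of the subnetwork's genes.

    expr is a (genes x samples) matrix; member_rows indexes the subnetwork genes.
    Genes with zero variance would divide by zero and should be filtered first.
    """
    sub = expr[member_rows]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return z.mean(axis=0)

def mutual_information(activity, labels, n_bins=3):
    """Mutual information (bits) between quantile-binned activity and class labels."""
    activity = np.asarray(activity, dtype=float)
    labels = np.asarray(labels)
    inner_edges = np.quantile(activity, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(activity, inner_edges)
    mi = 0.0
    for b in np.unique(bins):
        p_b = np.mean(bins == b)
        for c in np.unique(labels):
            p_bc = np.mean((bins == b) & (labels == c))
            if p_bc > 0:
                mi += p_bc * np.log2(p_bc / (p_b * np.mean(labels == c)))
    return mi
```

For paired normal/tumour samples, `labels` would simply be 0 for normal and 1 for tumour; a subnetwork whose activity perfectly separates two balanced groups scores 1 bit.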

ISMB/ECCB 2009 Blog

HL05: Rod Nibbe - Proteomics first approach for discovering sub-network targets in cancer

Move beyond single targets, an aim to identify concerted effects - Oliver Hofmann

Understanding the pathophysiology of a late-stage colon cancer phenotype - Oliver Hofmann

proteomics data + legacy data + transcriptomics --> quantitative sub-network specific to disease phenotype - Michael Kuhn

Top-down proteomics approach: paired normal / tumour biopsies from 12 patients - Oliver Hofmann

Differential image analysis to identify significantly changing proteins, MS/MS of excised spots results in lists of significant proteins associated with the phenotype - Oliver Hofmann

the paper: http://www.ncbi.nlm.nih.gov/pubmed/19098285 - Michael Kuhn

map cancer-significant genes onto PPI db, look for sub-networks enriched in cancer proteins - Michael Kuhn

find 4 significant subnetworks, extend by one hop and look for evidence in lit - Michael Kuhn

Pruned to core network members, expanded by one step for functional inference - Oliver Hofmann

Looks like a fairly standard MetaCore Meta-analysis to me so far. Not quite sure how they scored the subnetworks with the array data, need to check the paper - Oliver Hofmann

Highlights Track: HL06
Bayesian Inference of Selection Histories in Six Mammalian Genomes
Monday, June 29 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Tomas Vinar, Comenius University in Bratislava, Slovak Republic
Presentation Overview:
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of ~16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR

ISMB/ECCB 2009 Blog

HL06: Tomas Vinar - Bayesian Inference of Selection Histories in Six Mammalian Genomes

The ratio of nonsynonymous to synonymous mutations may identify functional selection. - Gabriele Sales

Detect positive selection by comparing orthologous genes in six mammalian genomes: human, chimp, macaque, mouse, rat and dog. - Gabriele Sales

One would like to identify which branches of the phylogenetic tree are under positive selection for a given gene. - Gabriele Sales

Algorithm summary: 1) take genes under positive selection under likelihood-ratio tests (544); 2) pre-compute the likelihood of every gene under all 511 possible histories of the tree (using PAML); 3) compute many iterations of Gibbs sampling (very fast); 4) compute history posterior probabilities from the samples. - Gabriele Sales

Most genes (from the 544 originally selected) appear to have switched between evolutionary models (positive selection and nonselection): 95% at least once, 53% at least twice. - Gabriele Sales
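
Steps 2 to 4 of the algorithm summary can be sketched as a two-level Gibbs sampler. In this minimal Python version the per-gene log-likelihoods under each history are taken as given (in the talk they came from PAML), and the prior is an assumption: each branch b is independently under positive selection with an unknown probability p_b drawn from a Beta(1, 1) distribution. The sampler alternates between drawing a history for every gene and redrawing the branch probabilities; the published model may differ in its priors and parameterization, and the function name is hypothetical.

```python
import numpy as np

def gibbs_selection_histories(loglik, histories, n_iter=2000, burn_in=500, seed=0):
    """Estimate P(history | data) for each gene by Gibbs sampling.

    loglik:    (G, H) array; log-likelihood of gene g under history h (precomputed).
    histories: (H, B) 0/1 array; row h marks the branches under positive selection.
    Assumed prior: branch b is selected with probability p_b, p_b ~ Beta(1, 1).
    """
    rng = np.random.default_rng(seed)
    loglik = np.asarray(loglik, dtype=float)
    histories = np.asarray(histories, dtype=float)
    n_genes, n_hist = loglik.shape
    p = np.full(histories.shape[1], 0.5)
    counts = np.zeros((n_genes, n_hist))
    for it in range(n_iter):
        # Log prior of every history given the current branch probabilities.
        log_prior = histories @ np.log(p) + (1.0 - histories) @ np.log(1.0 - p)
        log_post = loglik + log_prior
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # Draw one history per gene from its conditional posterior.
        z = np.array([rng.choice(n_hist, p=post[g]) for g in range(n_genes)])
        # Redraw each branch's selection probability (Beta-Bernoulli update).
        n_sel = histories[z].sum(axis=0)
        p = np.clip(rng.beta(1.0 + n_sel, 1.0 + n_genes - n_sel), 1e-12, 1 - 1e-12)
        if it >= burn_in:
            counts[np.arange(n_genes), z] += 1
    # Posterior history probabilities are the post-burn-in sample frequencies.
    return counts / counts.sum(axis=1, keepdims=True)
```

For the six-species tree in the talk, `histories` would have 511 rows, one per candidate history.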

Highlights Track: HL07
Computational approach to model peptide antigenicity
Monday, June 29 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Carlos Camacho, University Of Pittsburgh, United States
Presentation Overview:
It is well known that relatively unstable peptides bearing only partial structural resemblance to native protein can trigger antibodies recognizing higher-order structures found in native protein. Based on sound thermodynamic principles and computational modeling, this work reveals that the stability of immunogenic protein-like motifs is a critical parameter rationalizing the diverse humoral immune responses induced by different linear peptide epitopes. In this paradigm, peptides with a minimal amount of stability (ΔG_X < 0 kcal/mol) around a protein-like motif (X) are capable of inducing antibodies with similar affinity for both peptide and native protein, less stable peptides (ΔG_X > 0 kcal/mol) trigger antibodies recognizing the full protein but not the peptide, and unstable peptides (ΔG_X > 8 kcal/mol) fail to generate antibodies against either peptide or protein. Immunization experiments involving peptides derived from the autoantigen histidyl-tRNA synthetase verify that selected peptides with varying relative stabilities predicted by molecular dynamics simulations induce antibody responses consistent with this theory. Collectively, these studies provide insight pertinent to the structural basis of immunogenicity and, at the same time, validate this form of thermodynamic and molecular modeling as an approach to probe the development/evolution of humoral immune responses.
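
The three stability regimes above amount to a simple decision rule on ΔG_X. The Python sketch below encodes only the thresholds stated in the abstract; ΔG_X itself would come from the molecular dynamics simulations, which are not reproduced here. The function name is hypothetical, and the handling of the exact boundary values 0 and 8 kcal/mol is an assumption because the abstract does not specify it.

```python
def predicted_antibody_response(delta_g_x: float) -> str:
    """Qualitative humoral response predicted from the stability (kcal/mol)
    of a peptide's protein-like motif X, using the thresholds in the abstract.
    Boundary handling at exactly 0 and 8 kcal/mol is an assumption."""
    if delta_g_x < 0:
        return "antibodies bind both peptide and native protein"
    if delta_g_x <= 8:
        return "antibodies bind native protein but not peptide"
    return "no antibodies against peptide or protein"
```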

ISMB/ECCB 2009 Blog

HL07: Carlos Camacho - Computational approach to model peptide antigenicity

Highlights Track: HL08
Insights into corn genes derived from large-scale cDNA sequencing
Monday, June 29 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Nickolai Alexandrov, Ceres, Inc., United States
Presentation Overview:
We present a large portion of the transcriptome of Zea mays, including 31,552 fully sequenced non-redundant cDNA clones. These and other previously sequenced transcripts have been aligned with genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in codons. Many transcripts that have lower GC content have dicot homologs, but the high-GC transcripts are more specific to the grasses. The high-GC class is also enriched with intronless genes. This evolutionary divergence may be the result of horizontal gene transfer from species that not only differed in GC content but possibly lacked introns, perhaps from outside the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein-coding genes in Zea.
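
The two gene classes are defined by GC content at the third codon position (GC3). A small Python helper for computing GC3 from an in-frame coding sequence is sketched below; the function name is hypothetical, and the abstract does not say how the boundary between the two classes was drawn, so no cut-off is assumed here.

```python
def gc3(cds: str) -> float:
    """Fraction of codons in an in-frame coding sequence with G or C at position 3."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_positions) / len(third_positions)
```

For example, `gc3("ATGGCCAAA")` looks at G, C and A and returns 2/3. Computed over all transcripts, the resulting distribution is where a two-class split would show up.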

ISMB/ECCB 2009 Blog

HL08: Nickolai Alexandrov - Insights into corn genes derived from large-scale cDNA sequencing

ab initio gene prediction with accuracy of only 45% for protein coding genes - Diego M. Riaño-Pachón

many more discoveries by mapping transcript to genomic sequences - Diego M. Riaño-Pachón

corn genes divided into two groups by GC content, unknown why - Marcel Martin

Corn genome several times bigger than rice genome - Diego M. Riaño-Pachón

2400 Mb - Marcel Martin

There are a couple of million ESTs available - Diego M. Riaño-Pachón

less than 50000 full length cDNA, around 10000 high quality - Diego M. Riaño-Pachón

thorough quality checking was done - Marcel Martin

longer 5' UTRs than in Arabidopsis and rice - Diego M. Riaño-Pachón

estimation of no. of genes by same method that was used for human genome, result: approx. 50000 genes - Marcel Martin

Highlights Track: HL09
Inferring pathway activity toward precise disease classification
Monday, June 29 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Eunjung Lee, KAIST, Korea, Rep.
Presentation Overview:
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in cancer due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than the expression levels of individual genes or proteins. We propose a new pathway-based classification procedure in which markers are encoded not as individual genes, nor as the set of genes making up a known pathway, but as subsets of "condition-responsive genes" (CORGs) within those pathways. Using expression profiles from seven different microarray studies, we show that the accuracy of this method is significantly better than both conventional gene-based and pathway-based diagnostics. Furthermore, the identified CORGs may facilitate the development of effective diagnostic markers and the discovery of molecular mechanisms underlying disease.
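
The CORG idea can be sketched as a greedy search within one pathway: rank member genes by how well each separates the two phenotypes, then add genes in that order while the combined activity keeps getting more discriminative. In the Python sketch below, expression is assumed to be z-normalized per gene already, activity is the sum of member z-scores divided by the square root of the number of members, the discriminative score is a Welch t-statistic, and the pathway's direction is taken from the sign of the summed gene t-scores. These are assumptions for illustration; the published procedure may differ in its details, and the function names are hypothetical.

```python
import numpy as np

def welch_t(x, labels):
    """Welch t-statistic comparing samples labelled 1 against samples labelled 0."""
    a, b = x[labels == 1], x[labels == 0]
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

def find_corgs(z, labels):
    """Greedy CORG-style selection for one pathway.

    z: (genes x samples) expression matrix, z-normalized per gene.
    Returns (indices of selected genes, per-sample pathway activity).
    """
    labels = np.asarray(labels)
    gene_t = np.array([welch_t(row, labels) for row in z])
    direction = 1.0 if gene_t.sum() >= 0 else -1.0
    # Candidates: genes changing in the pathway's dominant direction, strongest first.
    order = [int(i) for i in np.argsort(-direction * gene_t) if direction * gene_t[i] > 0]
    if not order:
        return [], np.zeros(z.shape[1])

    def activity(idx):
        return z[idx].sum(axis=0) / np.sqrt(len(idx))

    chosen = order[:1]
    best = abs(welch_t(activity(chosen), labels))
    for gene in order[1:]:
        trial = chosen + [gene]
        score = abs(welch_t(activity(trial), labels))
        if score <= best:
            break  # adding this gene no longer improves discrimination
        chosen, best = trial, score
    return chosen, activity(chosen)
```

The returned activity vector is what a downstream classifier would use in place of the individual genes' expression values.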

ISMB/ECCB 2009 Blog

HL09: Eunjung Lee - Inferring pathway activity toward precise disease classification

Highlights Track: HL10
Evolution of genome structure: what statistics can tell us about the biology of chromosomes
Monday, June 29 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Aaron Darling, University of California-Davis, United States
Presentation Overview:
The profound effect recombination has on genome evolution remains poorly understood. Recombination can give rise to myriad genomic mutations, including large-scale deletions, lateral gene transfers, duplications, and even genome rearrangements. In this talk, I examine patterns in how homologous recombination gives rise to genomic inversions. The process of evolution by inversion has been modeled using a Bayesian statistical framework, and under that model we infer the phylogenetic history of inversions among nine Yersinia genomes. We statistically confirm that inversions are generally short and that bacteria exhibit a "bilateral genomic symmetry" between the origin and terminus of chromosome replication. We add to the previous controversy over breakpoint reuse rates by discovering that hotspots of breakpoint reuse localize near the origin of replication. We illustrate how statistical confidence intervals can be derived for breakpoint reuse rates. Finally, we discover a canonical configuration for the origin and terminus of replication in Yersinia. Our work highlights how advances in the combinatorial theory of genome rearrangement can lead to novel statistical inference methods, which can in turn offer new insight into genomic biology.

ISMB/ECCB 2009 Blog

HL10: Aaron Darling - Evolution of genome structure: what statistics can tell us about the biology of chromosomes

Got into this area studying Yersinia pestis - Allyson Lister

However, he thinks there is a lot of structure in the process, even if we get a pattern that suggests uniform breakage - Allyson Lister

Inversion events in bacteria are generally scattered across the chromosome. - Allyson Lister

Mauve is an excellent package for visualizing rearrangements - David Sexton

"Seevolution" visualizes inversions - Marcel Martin

Breakpoints near the origin of the replication are 5-6x more frequent than at the terminus of replication. - Allyson Lister

Blog post: http://themindwobbles.wordpress.com/2009/06/29/hl10-evolution-of-genome-structure-what-statistics-can-tell-us-about-the-biology-of-chromosomes/ - Allyson Lister

Highlights Track: HL11
Using side effects of medicines to identify drug targets
Monday, June 29 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Michael Kuhn, EMBL Heidelberg, Germany
Presentation Overview:
This talk is based on the study "Drug target identification using side-effect similarity", published in Science (July 2008). It describes a computational method to predict whether two drugs share targets based on phenotypic side-effect similarities. While earlier studies focused on molecular or cellular features, we employed a global analysis of drugs and their side effects to predict sharing of drug targets. Applied to 746 marketed drugs, a network of 1018 side-effect-driven drug-drug relations became apparent, 261 of which are formed by chemically dissimilar drugs from different therapeutic indications. We experimentally tested 20 of these unexpected drug-drug relations and validated 13 implied drug-target relations by in vitro binding assays, of which 11 reveal inhibition constants equal to or less than 10 micromolar. Nine of these were tested and confirmed in cell assays, documenting the feasibility of using phenotypic information to infer molecular interactions and hinting at new uses of marketed drugs.

ISMB/ECCB 2009 Blog

HL11: Michael Kuhn - Using side effects of medicines to identify drug targets

By drugs he means small molecules - David Sexton

side effects of drugs are human phenotypes - David Sexton

drugs with similar side effects may share targets - David Sexton

some side effects are common and uninformative - David Sexton

sideeffects.embl.de - David Sexton
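
The notes make two points a side-effect similarity measure has to respect: drugs sharing side effects may share targets, and very common side effects carry little information. A simple way to capture both, swapped in here for illustration (the published method used its own, more elaborate weighting), is a weighted Jaccard index in which each side effect is weighted by -log of the fraction of drugs that list it. The function names are hypothetical.

```python
import math
from collections import Counter

def side_effect_weights(drug_side_effects):
    """Down-weight common side effects: w(s) = -log(fraction of drugs listing s)."""
    n = len(drug_side_effects)
    counts = Counter(s for effects in drug_side_effects.values() for s in set(effects))
    return {s: -math.log(c / n) for s, c in counts.items()}

def weighted_jaccard(a, b, weights):
    """Weight of shared side effects divided by weight of all side effects of a or b."""
    a, b = set(a), set(b)
    union = sum(weights[s] for s in a | b)
    if union == 0:
        return 0.0
    return sum(weights[s] for s in a & b) / union
```

A side effect reported for every drug gets weight 0, so it never makes two drugs look similar on its own.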

Highlights Track: HL12
Gene Loss Under Neighbourhood Selection Following Whole Genome Duplication And The Reconstruction Of The Ancestral Diploid
Monday, June 29 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: David Sankoff, University of Ottawa, Canada
Presentation Overview:
How can we construct a phylogeny based on gene order if some of the genomes under study are descendants of whole genome doubling events? We have integrated a "guided genome halving algorithm" and a median genome routine to heuristically solve the small phylogeny problem, and have applied it to data containing thousands of sets of homologs among the poplar (tetraploid), grapevine (diploid) and papaya (diploid) genomes. We have been able to reconstruct the last diploid ancestor of poplar before its genome was doubled. We can then follow the evolution of duplicate gene pairs, and assess the mechanism that determines which of the two is likely to be lost and whether there is a bias towards losing adjacent genes on the same strand.

ISMB/ECCB 2009 Blog

HL12: David Sankoff - Gene Loss Under Neighbourhood Selection Following Whole Genome Duplication And The Reconstruction Of The Ancestral Diploid

Highlights Track: HL13
The Human Phenotype Ontology
Monday, June 29 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Peter Robinson, Charité - Universitätsmedizin Berlin, Germany
Presentation Overview:
We will describe the Human Phenotype Ontology and explain how to use it to annotate and analyze human disease. In addition, we will present new methods for performing clinical diagnostics using ontological similarity measures in the HPO and methods for calculating P-values for the scores, as well as new results on the relationship between phenotypic modules (all genes/proteins related to phenotypic features of the HPO) and the network characteristics of these proteins in the protein interactome.

ISMB/ECCB 2009 Blog

HL13: Peter Robinson - The Human Phenotype Ontology

problem with OMIM: no controlled vocabulary - Michael Kuhn

therefore: build ontology for OMIM for all descriptions used at least twice - Michael Kuhn

synonyms were merged - Michael Kuhn

refinement purely manual - Michael Kuhn

~8800 terms in HPO - Michael Kuhn

http://www.human-phenotype-ontology.org - Allyson Lister

They have a procedure which calculates phenotypic similarity by finding their most-specific common ancestor. - Allyson Lister

can calculate phenotypic similarity with the HPO, using information content of the most specific ancestor (root: info cont. of 0) - Michael Kuhn

phenotypic similarity of disease: take average over similarities of the phenotypes - Michael Kuhn

can cluster the human phenome - Michael Kuhn
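
The similarity procedure in the notes (information content of the most specific common ancestor, averaged over a disease's phenotypes) can be sketched in Python on a toy ontology. Assumptions: the ontology is a dict mapping each term to its parent terms; IC is -log of the fraction of diseases annotated with a term or any of its descendants; and disease similarity averages, over the query terms, the best match in the other disease. Parsing the real HPO files is not shown, and all function names are hypothetical.

```python
import math

def ancestors(term, parents):
    """All ancestors of term in the DAG, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(annotations, parents):
    """IC(t) = -log(fraction of diseases annotated with t or a descendant of t)."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        covered = set().union(*(ancestors(t, parents) for t in terms))
        for t in covered:
            counts[t] = counts.get(t, 0) + 1
    return {t: -math.log(c / n) for t, c in counts.items()}

def term_similarity(t1, t2, parents, ic):
    """IC of the most informative common ancestor (the root has IC 0)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(ic.get(t, 0.0) for t in common) if common else 0.0

def disease_similarity(terms1, terms2, parents, ic):
    """Average over terms1 of the best-matching term in terms2."""
    if not terms1 or not terms2:
        return 0.0
    return sum(max(term_similarity(a, b, parents, ic) for b in terms2)
               for a in terms1) / len(terms1)
```

Because the root annotates every disease, its IC is 0, so two phenotypes that share only the root contribute nothing to the similarity.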

Highlights Track: HL14
Visualizing Genomic Dark Matter: Repeat Probability Clouds in the Human Genome
Monday, June 29 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: David Pollock, University of Colorado School of Medicine, United States
Presentation Overview:
We have no clear idea about where half the human genome comes from. This is the dark matter of the genome. The part of the genome we do know about is about 90% derived from repetitive elements (mostly transposable elements) and 10% derived from protein- or RNA-encoding genes. It is reasonable to presume that the dark matter is made up of the same things in similar proportions, but that the original sequences have mutated so much that we can no longer identify them easily. The newly developed "repeat probability cloud" approach is a means of identifying the signature of repeat structure in large genomes that is invisible to standard approaches. In a similar manner to how dark matter is visualized in the physical universe, we visualize genomic dark matter by its perturbing effect on sequence space. Other advantages of this method are that it is extremely fast and that it does not require prior knowledge of transposable element structure. The application of this technique is expected to revolutionize our understanding of the human genome and its origins in the evolution of ancestral mammalian genomes.

ISMB/ECCB 2009 Blog

HL14: David Pollock - Visualizing Genomic Dark Matter: Repeat Probability Clouds in the Human Genome

Distribution of sequence types in the human genome: 1.5% exons; 44% repetitive sequences; 54.5% unknown (dark matter). - Gabriele Sales

RepeatMasker is not performing too well at identifying repeats; it misses ~50% of MIR fragments in the human genome, for example. - Gabriele Sales

The dominant approach to repeat identification is based on the search of consensus sequences for repeat families (e.g. RepBase). Drawbacks: slow; not very sensitive to fragmentary matches; unable to identify novel repeats. - Gabriele Sales

De novo methods: RECON, PILER, RepeatFinder, RepeatScout. - Gabriele Sales

Evolutionary de novo repeat analysis: better detection of imperfect repeats and fragment left over from extensive duplication and divergence. - Gabriele Sales

Consensus based methods reduce multiple sequences to a single one; P-clouds group similar, high-copy L-mers together (16-mers for mammals, E<1 per genome). - Gabriele Sales

Next step: mapping P-clouds onto the genome using a sliding window. - Gabriele Sales

Example: human chromosome X. Time required on a single workstation: 45 min (vs. >10 days for RECON and 8hrs for RepeatScout). - Gabriele Sales

Application to the whole human genome. 38% repetitive (P-clouds and RepeatMasker), 7% (only RM), 27% (PC only), 28% non-repetitive. - Gabriele Sales
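
The P-cloud idea (group a high-copy k-mer with its near-identical, also frequent variants, then map the groups back onto the sequence) can be sketched in Python at toy scale. This is a simplified illustration, not the published algorithm: clouds here contain only one-mismatch neighbours, the thresholds `core_min` and `neighbor_min` are hypothetical, and mapping marks every base covered by a cloud k-mer instead of applying the sliding-window rule described in the talk. For mammalian genomes the notes mention k = 16.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def hamming1_neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers that differ from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for b in alphabet:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]

def build_clouds(counts, core_min, neighbor_min):
    """Highest-copy unassigned k-mer seeds a cloud; frequent one-mismatch
    neighbours that are not yet assigned join it."""
    assigned, clouds = set(), []
    for kmer, c in counts.most_common():
        if c < core_min:
            break
        if kmer in assigned:
            continue
        cloud = {kmer} | {n for n in hamming1_neighbors(kmer)
                          if counts.get(n, 0) >= neighbor_min and n not in assigned}
        assigned |= cloud
        clouds.append(cloud)
    return clouds

def annotate(seq, k, clouds):
    """Mark every base covered by at least one k-mer that belongs to a cloud."""
    members = set().union(*clouds) if clouds else set()
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in members:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

On a sequence made of a short tandem repeat followed by unique sequence, the repeat bases are marked and the unique tail is not.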

Highlights Track: HL15
Analyzing risk factor of heart disease by a computational lipidology approach
Monday, June 29 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Katrin Huebner, University Heidelberg, Germany
Presentation Overview:
In our work, clinical investigation meets computational simulation to analyze blood lipid values beyond "bad" and "good" cholesterol, in order to understand the in vivo mechanisms leading to atherosclerosis and heart disease. Following an introduction to the clinical relevance of lipoproteins, the experimental setup and a novel modeling setup are explained. The main results presented are virtual lipoprotein profiles that closely match clinical values from healthy subjects, model-based predictions of mimicked disorders in underlying molecular processes, and alterations in high-resolution lipoprotein profiles.

ISMB/ECCB 2009 Blog

HL15: Katrin Huebner - Analyzing risk factor of heart disease by a computational lipidology approach

model with 6 lipoprotein components, each one may take part in up to 20 kinetic processes - Michael Kuhn

e.g. synthesis, uptake, exchange with tissue/lipids, ... - Michael Kuhn

simulation of lipoprotein particles, particles have an internal state --> observed kinetic heterogeneity - Michael Kuhn

test if simulation results correspond to clinical data - Michael Kuhn

virtual blood profiles closely match experimental values - Michael Kuhn

hypercholesterolemia: accumulation of cholesterol, especially LDL cholesterol. decrease the LDL receptor parameter to mimic malfunction of the receptor in vivo. simulated lipids follow clinical observations - Michael Kuhn

lipoprotein profile depends on initial composition (e.g. influence by diet) - Michael Kuhn

however, there are some dissimilar initial compositions which result in similar lipid profiles - Michael Kuhn

increase resolution to make sure this is not a sample artifact - Michael Kuhn

look for alterations in the high-resolution profile that show no difference in the low-resolution profile - Michael Kuhn
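The LDL-receptor experiment in these notes can be illustrated with a deliberately simplified sketch. It assumes a single LDL-cholesterol pool with constant synthesis and first-order, receptor-mediated uptake. This is not the six-component, particle-based model from the talk. In this toy model the steady state is synthesis / k_uptake, so lowering the receptor parameter raises plasma LDL. The function name, parameter names and values are hypothetical.

```python
def simulate_ldl(synthesis, k_uptake, ldl0=0.0, dt=0.01, steps=20000):
    """Euler integration of dL/dt = synthesis - k_uptake * L.

    Hypothetical one-pool toy model; units are arbitrary.
    """
    ldl = ldl0
    for _ in range(steps):
        ldl += dt * (synthesis - k_uptake * ldl)
    return ldl


healthy = simulate_ldl(synthesis=1.0, k_uptake=0.5)
impaired = simulate_ldl(synthesis=1.0, k_uptake=0.25)  # halved receptor activity
print(f"healthy steady state:  {healthy:.2f}")   # analytic value 1.0 / 0.5 = 2.0
print(f"impaired steady state: {impaired:.2f}")  # analytic value 1.0 / 0.25 = 4.0
```

The talk's model adds many more processes and a per-particle internal state. The qualitative direction of the receptor experiment is the same, though: weaker uptake leads to LDL accumulation.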

Highlights Track: HL16
MotifMap: a human genome-wide map of candidate regulatory motif sites.
Monday, June 29 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Pierre Baldi, UC Irvine, United States
Presentation Overview:
Comprehensive identification of all regulatory elements encoded in the human genome is a fundamental need in biomedical research. So far, only a small fraction of these elements have been identified experimentally, so there is great interest in discovering regulatory elements systematically through computational means. We describe how to use comparative genomics to derive the first comprehensive map of regulatory elements in the human genome, taking advantage of the recent availability of 18 mammalian genomes. We developed a new scoring scheme for detecting regulatory elements, called the Bayesian branch length score (BBLS). It accounts for the phylogenetic relationships between the species being compared and for the motif-matching score in each individual species, while remaining robust to alignment errors and missing sequences. Using BBLS, we were able to predict 1.5 million regulatory sites in the human genome with an FDR below 50%, corresponding to 380 regulatory motifs in the Transfac database. The method is particularly effective for 155 motifs, for which over 121 thousand sites can be mapped with an FDR below 10%.
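As we understand it, BBLS generalizes the plain branch length score (BLS). The BLS is the total branch length of the smallest subtree of the phylogeny that connects the species in which a motif match is found. The sketch below computes only that plain, non-Bayesian BLS. It does not handle motif-match probabilities, ancestral states or missing sequences. The toy tree, its branch lengths and the function name are hypothetical. On a tree, a branch lies on the connecting subtree exactly when species with the motif occur on both sides of it.

```python
# Hypothetical toy phylogeny: child -> (parent, branch length).
TREE = {
    "human": ("primates", 0.1),
    "chimp": ("primates", 0.1),
    "primates": ("root", 0.2),
    "mouse": ("rodents", 0.3),
    "rat": ("rodents", 0.3),
    "rodents": ("root", 0.1),
}


def branch_length_score(tree, species_with_motif):
    """Total length of the minimal subtree connecting species with a motif match."""
    total = len(species_with_motif)
    marked_below = {}  # node -> number of motif-bearing species in its subtree
    for leaf in species_with_motif:
        node = leaf
        while node is not None:
            marked_below[node] = marked_below.get(node, 0) + 1
            node = tree[node][0] if node in tree else None
    # The branch above `child` is on the connecting subtree iff it separates marked species.
    return sum(
        length
        for child, (_parent, length) in tree.items()
        if 0 < marked_below.get(child, 0) < total
    )


print(round(branch_length_score(TREE, {"human", "chimp", "mouse"}), 3))  # 0.8
print(branch_length_score(TREE, {"human"}))  # 0 (a single species spans no branch)
```

Dividing by the total tree length (1.1 here) gives a score between 0 and 1, which is a common way to compare BLS values across trees.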

ISMB/ECCB 2009 Blog

HL16: Xiaohui Xie - MotifMap: a human genome-wide map of candidate regulatory motif sites.

Highlights Track: HL17
Learning from Resequencing Data: What To Do When the $1000 Genome Arrives?
Tuesday, June 30 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Gregory Kryukov, Brigham & Women's Hospital / Harvard Medical School, United States
Presentation Overview:
We investigated the potential of resequencing all exons in a clinical population to discover genes underlying human complex phenotypes. Computer simulations based on currently available deep resequencing data show that genes meaningfully affecting a human trait can be identified in an unbiased fashion, although large sample sizes would be required to achieve substantial power.

ISMB/ECCB 2009 Blog

HL17: Gregory Kryukov - Learning from Resequencing Data: What To Do When the $1000 Genome Arrives?

If it were not for the great variability among individuals, medicine might as well be a science and not an art (Osler 1892) (did I get this right? :)) - Allyson Lister

Mendelian diseases can be characterized by linkage analysis (classical association studies) - Oliver Hofmann

Currently, the method of choice is GWAS. Today's talk is about another type of AS that is based on sequencing and on rare rather than common SNPs. - Allyson Lister

Association studies based on sequencing will allow for understanding rare SNPs - David Sexton

New sequencing technologies, ways to collect clinical populations and exon capturing approaches can revolutionize the search for genes underlying human phenotypes - Oliver Hofmann

@Allyson: sounds about right :) - Oliver Hofmann

Sequencing will uncover low frequency variants - David Sexton

All genes have rare coding variants, and while sequencing will uncover low frequency variants the power to detect their associations is reduced and the multiple testing correction becomes very stringent - Oliver Hofmann

resequencing AS (RAS). The theory here is that most new missense mutations are functional, most new missense mutations are only weakly deleterious, and most functional missense mutations are likely to influence phenotype in the same direction. - Allyson Lister

How many individuals need to be sequenced to find variants with minimal effect? - David Sexton
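One common way to use the assumption that most functional missense mutations push the phenotype in the same direction is to collapse all rare missense variants in a gene into a single per-gene carrier count. The cases are then compared with the controls, which is a "burden" test. The sketch below is an illustration of that idea under this assumption, not necessarily the statistic used in the talk. It uses a one-sided Fisher exact test and a Bonferroni threshold. The carrier counts, sample sizes and number of genes tested are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    Rows: cases / controls; columns: carriers / non-carriers of rare missense variants.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    upper = min(row1, col1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)) / denom


def burden_test(case_carriers, n_cases, control_carriers, n_controls):
    """Collapse a gene's rare missense variants into carrier counts; test enrichment in cases."""
    return fisher_one_sided(
        case_carriers,
        n_cases - case_carriers,
        control_carriers,
        n_controls - control_carriers,
    )


GENES_TESTED = 20_000  # assumed number of genes tested
alpha = 0.05 / GENES_TESTED  # Bonferroni-corrected per-gene threshold
p = burden_test(case_carriers=30, n_cases=1000, control_carriers=10, n_controls=1000)
print(f"p = {p:.2e}, significant after correction: {p < alpha}")
```

Even a threefold excess of carriers in cases gives a p-value that is nominally significant but far from the corrected threshold. This illustrates the notes' point that detecting rare-variant associations requires large sample sizes.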

Highlights Track: HL18
Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods
Tuesday, June 30 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Christophe Dessimoz, ETH Zurich, Switzerland
Presentation Overview:
The identification of orthologs, pairs of homologous genes in different species that started diverging through speciation events, is a central problem in genomics. It has applications in many research areas, including comparative genomics, phylogenetics, protein function annotation, and genome rearrangement. An increasing number of projects aim at inferring orthologs from complete genomes, but little is known about their relative accuracy or coverage. Since the exact evolutionary history of entire genomes remains largely unknown, predictions can only be validated indirectly, that is, in the context of the different applications of orthology. The few comparison studies published so far have assessed orthology exclusively from the expectation that orthologs have conserved protein function. In the present work, we introduce a methodology to verify orthology in terms of phylogeny. We also perform a comprehensive comparison of nine leading ortholog inference projects and two methods using both phylogenetic and functional tests. The results show large variations in performance among the different projects, which indicates that the choice of orthology database can have a strong impact on any downstream analysis.

ISMB/ECCB 2009 Blog

HL18: Christophe Dessimoz - Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods

Types of homology according to Fitch, 1970 - Diego M. Riaño-Pachón

Orthologous relationships are not transitive - Diego M. Riaño-Pachón

In-paralogues and out-paralogues, defined by the relation between speciation and gene duplication - Diego M. Riaño-Pachón

Genes with the same function: equivalogs or isofunctional homologues - Diego M. Riaño-Pachón

Homologues in the same genomic context: isopositional orthologues - Diego M. Riaño-Pachón

No biological meaning in the COG clustering algorithm - Diego M. Riaño-Pachón

InParanoid, only pairs of genomes - Diego M. Riaño-Pachón

Orthology inference: by best bidirectional hits or species/gene tree reconciliation - Diego M. Riaño-Pachón

assessment of orthology inference methods - Diego M. Riaño-Pachón

assessment; previous work: Hulsen, Huynen et al., Genome Biology, 2006 http://www.ncbi.nlm.nih.gov/pubmed/16613613 and Chen, Mackey et al., PLoS ONE, 2007 http://www.ncbi.nlm.nih.gov/pubmed/17440619 - Michael Kuhn
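The notes list best bidirectional hits as one of the two main inference strategies. Here is a minimal sketch of it for two genomes. Gene A and gene B are called orthologs when each is the other's highest-scoring hit. The gene names and scores are made up, and in practice they would come from an all-against-all similarity search in both directions.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _score) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return ortholog pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (e.g. alignment bit scores) between genomes A and B.
A_VS_B = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 120}
B_VS_A = {("b1", "a1"): 245, ("b2", "a2"): 175, ("b2", "a3"): 118, ("b2", "a1"): 88}
print(bidirectional_best_hits(A_VS_B, B_VS_A))  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 is dropped even though its best hit is b2. If a2 and a3 are in-paralogs, this is the known weakness of strict best bidirectional hits: one-to-many orthology is reduced to a single pair.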

Highlights Track: HL19
A Mathematical Framework for the Selection of an Optimal Set of Peptides for Epitope-Based Vaccines
Tuesday, June 30 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Oliver Kohlbacher, Eberhard-Karls-Universität, Germany
Presentation Overview:
Due to their manifold advantages (e.g., safety, ease of production, analytical control) and their applicability in personalized medicine, epitope-based vaccines (EVs) have recently been attracting significant interest, in particular as a therapeutic strategy for cancer and infectious diseases. EVs trigger an immune response via target-specific immunogenic peptides (epitopes). A crucial step in the design of an EV is the selection of the epitopes to be included. Depending on the number of candidate epitopes, the diversity of the target population, and the immunological requirements, the epitope selection can become a very complex problem. The epitope selection problem poses an interesting and novel bioinformatics problem. We present a mathematical framework to find an optimal set of epitopes for an EV. Given a set of epitopes, the framework efficiently identifies the epitopes most likely to elicit a broad and potent immune response in the target population. We can translate the epitope selection problem into an integer linear program, which allows an easy adaptation to different variants of the EV design problem. Among the few published computational approaches, our approach is the first to identify an optimal epitope set. This mathematical framework will prove to be a valuable tool in vaccine design.

ISMB/ECCB 2009 Blog

HL19: Oliver Kohlbacher - A Mathematical Framework for the Selection of an Optimal Set of Peptides for Epitope-Based Vaccines

OptiTope webserver: www.epitoolkit.org/optitope - Janet Kelso

Still need to: get real immunogenicity data rather than using MHC binding as a proxy, confirm that additivity holds - Janet Kelso
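The abstract casts epitope selection as an integer linear program. For a handful of candidates, an objective of the same kind can be solved by plain enumeration, which makes the trade-off between total immunogenicity and population coverage easy to see.

This sketch makes four assumptions:
- The objective is additive: the sum over selected epitopes of allele-frequency-weighted immunogenicity, matching the additivity assumption mentioned in the notes.
- A fixed number k of epitopes is selected.
- An optional constraint sets a minimum number of covered alleles.
- An allele counts as covered if some selected epitope scores above a chosen threshold for it.

All allele frequencies, scores and function names are hypothetical. Instances of realistic size would be handed to an ILP solver rather than enumerated.

```python
from itertools import combinations

# Hypothetical allele frequencies in the target population.
ALLELE_FREQ = {"A*02:01": 0.45, "A*01:01": 0.30, "B*07:02": 0.25}

# Hypothetical predicted immunogenicity of each candidate epitope for each allele.
IMMUNOGENICITY = {
    "EP1": {"A*02:01": 0.9, "A*01:01": 0.0, "B*07:02": 0.1},
    "EP2": {"A*02:01": 0.8, "A*01:01": 0.1, "B*07:02": 0.0},
    "EP3": {"A*02:01": 0.0, "A*01:01": 0.7, "B*07:02": 0.2},
    "EP4": {"A*02:01": 0.1, "A*01:01": 0.0, "B*07:02": 0.9},
}


def expected_score(epitope):
    """Population-weighted immunogenicity of one epitope (additive model)."""
    return sum(ALLELE_FREQ[a] * i for a, i in IMMUNOGENICITY[epitope].items())


def select_epitopes(k, min_covered_alleles=0, threshold=0.5):
    """Exhaustively pick the k epitopes with the highest summed score.

    An allele counts as covered if some selected epitope scores above `threshold` for it.
    """
    best, best_value = None, float("-inf")
    for subset in combinations(sorted(IMMUNOGENICITY), k):
        covered = {
            a for e in subset for a, i in IMMUNOGENICITY[e].items() if i > threshold
        }
        if len(covered) < min_covered_alleles:
            continue  # violates the coverage constraint
        value = sum(expected_score(e) for e in subset)
        if value > best_value:
            best, best_value = subset, value
    return best


print(select_epitopes(2))                         # ('EP1', 'EP2')
print(select_epitopes(2, min_covered_alleles=2))  # ('EP1', 'EP4')
```

Without a coverage constraint, the two best epitopes both target the most frequent allele. Requiring two covered alleles trades a little total score for a broader response, which is the kind of requirement the ILP formulation makes easy to add.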

Highlights Track: HL20
Prediction of Binding Sites on Proteins Using the Gaussian Network Model
Tuesday, June 30 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Burak Erman, Koc University, Turkey
Presentation Overview:
Residues at the binding sites of the ligand and receptor of several enzyme-inhibitor and antibody-antigen complexes are predicted from the slowest (for the ligand) and fastest (for the receptor) modes of motion by the Gaussian Network Model applied to unbound molecules.

ISMB/ECCB 2009 Blog

HL20: Burak Erman - Prediction of Binding Sites on Proteins Using the Gaussian Network Model

Energy and environment are a closed system, thermodynamic modeling of system with entropy, energy, volume and position of the residues - Roland Krause

A probability function over the average values of the system can be used to formulate the average of the extensive variables and the conjugate variables. - Roland Krause

Harmonic fluctuations models the relationship between the residues in the system. - Roland Krause

Higher-order correlations can be used to obtain the energy fluctuations of the residues, which are proportional. - Roland Krause

The highest mode is the interesting end of the decomposition. - Roland Krause

HIV protease as a model system, active site and binding interfaces are identified as important residues as fast modes. System behaves as an energy sieve. - Roland Krause

Residues at the surface are energy gates, residues connecting two surfaces are hubs. Residues along the interaction pathway are anticorrelated to the important ones. - Roland Krause

Example of the peptide binding HLA protein (1OGT) shows 6 important residues by highest mode analysis. - Roland Krause

One AA binds the peptide, the other stabilizes the interaction. - Roland Krause

[Structure pictures not shown] - Roland Krause
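The abstract predicts binding residues from the slowest (ligand) and fastest (receptor) Gaussian Network Model modes. Below is a minimal sketch of the standard GNM computation:
- build the Kirchhoff (connectivity) matrix from C-alpha coordinates with a distance cutoff,
- diagonalize it,
- read per-residue mode profiles as squared eigenvector components.

The 7 Å cutoff is a commonly used value, not taken from the talk. The coordinates are a toy straight chain rather than a real structure such as 1OGT. The rule for calling residues "binding sites" from these profiles is not reproduced here.

```python
import numpy as np


def gnm_modes(coords, cutoff=7.0):
    """Gaussian Network Model: eigendecomposition of the Kirchhoff matrix.

    coords: (N, 3) array of C-alpha positions in angstroms.
    Returns eigenvalues (ascending) and eigenvectors with the zero mode dropped;
    assumes the contact network is connected, so there is exactly one zero mode.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -contact.astype(float)                 # -1 for each contact
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))   # degree on the diagonal
    values, vectors = np.linalg.eigh(kirchhoff)
    return values[1:], vectors[:, 1:]


def mode_profile(vectors, mode_index):
    """Squared eigenvector components: per-residue contribution to one mode."""
    return vectors[:, mode_index] ** 2


# Toy 20-residue straight chain with 3.8 A spacing (only sequence neighbours in contact).
coords = np.column_stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)])
values, vectors = gnm_modes(coords)
slow = mode_profile(vectors, 0)    # slowest non-zero mode (used for the ligand)
fast = mode_profile(vectors, -1)   # fastest mode (used for the receptor)
print("slowest-mode profile peaks at residue", int(slow.argmax()))
print("fastest-mode profile peaks at residue", int(fast.argmax()))
```

In a folded protein, the fastest modes tend to localize on tightly packed residues. This is one way to read the "energy sieve" and "hub" remarks in the notes above.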

Highlights Track: HL21
The miRNA/siRNA saturation effect - transfection of small RNAs compromise gene regulation by endogenous microRNAs
Tuesday, June 30 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Debora Marks, Harvard Medical School, United States
Presentation Overview:
Transfection of siRNAs or miRNAs (microRNAs) into cells typically lowers gene expression of hundreds of genes, assessed by decreases in protein or mRNA levels, but increases in gene expression have also been observed. One explanation for unexpected up-regulation upon miRNA or siRNA over-expression is a reduction in the effective function of the endogenous microRNAs. This may result from competition between exogenous and endogenous RNAs for the intracellular small RNA protein machinery. We tested the validity of this explanation by computational analysis of more than 150 si/miRNA transfection experiments in 7 different cell types for which genome-wide mRNA changes were measured using microarrays. After verifying the expected down-regulation of genes with UTRs that contain target sites for the exogenously introduced small RNAs, we show that genes with target sites for endogenous miRNAs are significantly up-regulated. Confirming this result we observe this competition effect with protein expression changes and a striking correlation of the dose response and temporal response of up-regulated genes with down-regulated genes after siRNA transfections. These findings have broad implications for the design and interpretation of experiments using small RNAs and for the design of clinical trials using siRNA therapeutics.

ISMB/ECCB 2009 Blog

HL21: Debora Marks - The miRNA/siRNA saturation effect - transfection of small RNAs compromise gene regulation by endogenous microRNAs

Debora Marks has had to cancel her presentation due to a family emergency. - bb

Highlights Track: HL22
Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins
Tuesday, June 30 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Dannie Durand, Carnegie Mellon University, United States
Presentation Overview:
The challenge of homology identification in multidomain families with varied domain architectures is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. We first present an extension of the traditional model of homology to include domain insertions and a manually curated benchmark of well-studied mammalian families. We next introduce Neighborhood Correlation, a novel method that identifies homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. We show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them.

ISMB/ECCB 2009 Blog

HL22: Dannie Durand - Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins

Multidomain proteins are difficult to categorize because different parts have different histories. - Gabriele Sales

Multidomain homologs: finding homologs is an important aspect of functional genomics. - Roland Krause

Song et al 2008 PLoS Computational Biology 4(5) - Allyson Lister

http://www.neighborhoodcorrelation.org/ - Roland Krause

Paper: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2377100 - Gabriele Sales

Genes that share common ancestry tend to have similar structure and function. - Gabriele Sales

also use to build comparative map of synteny - Ruchira S. Datta

Sequence comparison can be used to identify chromosomal regions that share common ancestry. - Gabriele Sales

this is called "spatial genomics" - Ruchira S. Datta

How do multidomain proteins fit this picture? - Gabriele Sales
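Neighborhood Correlation compares where two sequences sit in the sequence similarity network rather than how similar they are to each other. Our reading of Song et al. (2008) is that it is a Pearson correlation between the two sequences' log-transformed similarity scores against all sequences in the database. The sketch below follows that reading. The floor value for missing hits, the exact transformation and the toy score matrix are assumptions.

```python
import numpy as np


def neighborhood_correlation(scores, floor=1.0):
    """Pairwise Neighborhood Correlation from a similarity-score matrix.

    scores: (N, N) array of pairwise similarity scores (e.g. BLAST bit scores),
    with 0 where no hit was reported. Scores are floored, log-transformed, and
    each pair of rows (neighbourhood profiles) is compared by Pearson correlation.
    """
    logged = np.log10(np.maximum(scores, floor))
    return np.corrcoef(logged)  # rows are treated as variables


# Hypothetical scores: A1 and A2 are one family, B another, and D shares a
# single inserted domain with both A1 and B.
SCORES = np.array([
    [500, 400, 0, 60],   # A1
    [400, 500, 0, 0],    # A2
    [0, 0, 500, 60],     # B
    [60, 0, 60, 500],    # D
])
nc = neighborhood_correlation(SCORES)
print(np.round(nc, 2))
```

A1 and D have a direct hit, but their neighborhoods differ, so their correlation is far lower than that of A1 and A2. This is the distinction between shared ancestry and a shared inserted domain that the abstract describes.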

Highlights Track: HL23
Models from experiments: combinatorial perturbations of cancer cells
Tuesday, June 30 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Sven Nelander, University of Gothenburg, Sweden
Presentation Overview:
We present a novel method for deriving network models from molecular profiles of perturbed cellular systems. The network models aim to predict quantitative outcomes of combinatorial perturbations, such as drug pair treatments or multiple genetic alterations. Mathematically, we represent the system by a set of nodes, representing molecular concentrations or cellular processes, a perturbation vector and an interaction matrix. After perturbation, the system evolves in time according to differential equations with built-in nonlinearity, similar to Hopfield networks, capable of representing epistasis and saturation effects. For a particular set of experiments, we derive the interaction matrix by minimizing a composite error function, aiming at accuracy of prediction and simplicity of network structure. To evaluate the predictive potential of the method, we performed 21 drug pair treatment experiments in a human breast cancer cell line (MCF7) with observation of phospho-proteins and cell cycle markers. The best derived network model rediscovered known interactions and contained interesting predictions. Possible applications include the discovery of regulatory interactions, the design of targeted combination therapies and the engineering of molecular biological networks.

ISMB/ECCB 2009 Blog

HL23: Sven Nelander - Models from experiments: combinatorial perturbations of cancer cells

What happens when you change more than one thing in a cell? - Allyson Lister

A defining feature of cancer cells is a breakdown in regulation. - Allyson Lister

A defining feature of cancer cells is a breakdown in regulation by acquired GOF (gain of function) or LOF (loss of function) mutations. - Allyson Lister

One of the advances in cancer pharmacology has been compounds that target the function of the proteins in these pathways. - Allyson Lister

Approaches to CP (combinatorial perturbations) include metabolic flux models, and what we need is an equivalent for the cancer cells. - Allyson Lister

Each perturbation pattern consists of 1, 2 or 3 perturbations in sequence. This pattern is converted into a phosphoprotein and phenotype pattern. - Allyson Lister

use a linear differential equation - Allyson Lister

Start with good tradeoffs between model fit and simplicity. - Allyson Lister

They use a MC simulation, which results in the probabilities of functional interaction. - Allyson Lister

They adapted CoPIA to cancer genomics data, and the main issue was speed. - Allyson Lister
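The abstract describes nodes, a perturbation vector u, an interaction matrix W, and Hopfield-like differential equations with built-in nonlinearity. A minimal sketch of such a model follows. It assumes the form dx/dt = eps * tanh(W x + u) - alpha * x; the exact equations and parameters used in the talk may differ. The model is integrated to steady state for single and combined perturbations. The network, drug vectors and function name are hypothetical. Fitting W to data (the composite error function and the MC sampling in the notes) is not shown.

```python
import numpy as np


def steady_state(w, u, eps=1.0, alpha=1.0, dt=0.01, steps=5000):
    """Integrate dx/dt = eps * tanh(W x + u) - alpha * x from x = 0 with Euler steps."""
    x = np.zeros(len(u))
    for _ in range(steps):
        x = x + dt * (eps * np.tanh(w @ x + u) - alpha * x)
    return x


# Hypothetical 3-node network: node 0 activates node 1, node 1 activates node 2,
# and node 2 feeds back negatively on node 0. W[i, j] is the effect of node j on node i.
W = np.array([
    [0.0, 0.0, -0.5],
    [1.5, 0.0, 0.0],
    [0.0, 1.5, 0.0],
])

drug_a = np.array([-1.0, 0.0, 0.0])  # inhibits node 0
drug_b = np.array([0.0, -1.0, 0.0])  # inhibits node 1
for name, u in [("A", drug_a), ("B", drug_b), ("A+B", drug_a + drug_b)]:
    print(name, np.round(steady_state(W, u), 3))
```

Because of the tanh saturation, the response to the drug pair is generally not the sum of the single-drug responses. This non-additivity is what lets such models represent epistasis and saturation effects.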

Highlights Track: HL24
Discovery of a hidden sequence motif conserved in the bacterial type III secretion signal: implications for structure, drug discovery and host-pathogen systems models.
Tuesday, June 30 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Jason McDermott, Pacific Northwest National Laboratory, United States
Presentation Overview:
We describe a novel computational method for identification of type III secreted virulence effectors in bacteria and characterization of a putative secretion signal from apparently unrelated sequences. We will discuss preliminary experimental results supporting our model of a partially disordered structure that may be widely conserved in the absence of sequence similarity and show how these predictions support development of systems models of host-pathogen interactions.

ISMB/ECCB 2009 Blog

HL24: Jason McDermott - Discovery of a hidden sequence motif conserved in the bacterial type III secretion signal: implications for structure, drug discovery and host-pathogen systems models.

Salmonella typhimurium is the main model organism for this work. It builds its own niche using the T3SS in the macrophages, kills many people each year. In the US, it's mainly a food pathogen. - Roland Krause

Good model system for Salmonella typhi, the agent of typhoid fever, a human-restricted pathogen. - Roland Krause

Includes two T3SS, injecting effectors into the host cell. 47 known effectors. - Roland Krause

A problem in the T3SS, which consists of the injectisome and the secreted effectors, is the remote homology of the effectors. The N-terminus of the effectors is required for secretion, but no similarity could be found. - Roland Krause

Typically not part of the 3D structure; appears to be disordered. How could the diverse sequences have the same function? A different mechanism for every effector? Cognate chaperones help the effectors thread the protein into the injectisome. But you can delete the chaperone. - Roland Krause

Hypothesis for this talk: There is a common mechanism in the effectors. Typically BLAST was used to identify the secretion family. - Roland Krause

Previous analyses have worked for some organisms but not across species. The group developed a machine learning method (SVMs) based on the primary sequence, including the nucleotide sequence, using unaligned sequence as input. Used Pseudomonas syringae with 35 known effectors, and the rest of the genome as negative examples. - Roland Krause

ROC is in the 0.05 range. - Roland Krause

Applied this to Chlamydia trachomatis, an almost intractable obligate intracellular pathogen. - Roland Krause

A similar paper appeared back to back with this paper, and another paper was published recently. http://www.ncbi.nlm.nih.gov/pubmed/19526054 - Roland Krause

Highlights Track: HL25
Re-examining the connection between the network topology and essentiality
Tuesday, June 30 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Teresa Przytycka, NIH, United States
Presentation Overview:
We investigate the reason for the correlation between degree and essentiality observed in a number of yeast networks. Based on an analysis of six genome-wide protein interaction networks compiled from diverse sources of interaction data, we rejected the previously proposed hypotheses and put forward an alternative explanation. We argued that the majority of hubs are essential due to their involvement in groups of densely connected proteins, many of them presumably protein complexes enriched in essential proteins.

ISMB/ECCB 2009 Blog

HL25: Teresa Przytycka - Re-examining the connection between the network topology and essentiality

Highlights Track: HL26
Built-in loops allow versatility in domain-domain interactions: Lessons from self-interacting domains
Tuesday, June 30 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Eyal Akiva, The Hebrew University, Israel
Presentation Overview:
The function of most proteins depends on their interaction with other proteins. It was shown that many protein-protein interactions are mediated by protein domains, and that there are distinct domain pairs that are used repeatedly as interaction mediators in various protein contexts. However, not all protein pairs with the corresponding domains that may mediate interaction do interact. It is conceivable that there are intra-domain structural and sequence features that play a role in determining the interaction potential of domains. Here, we discover such features by comparing domains that, on the one hand, mediate homodimerization of proteins and, on the other, occur in different proteins that are monomeric. This comparison uncovered surface loops that can be considered as determinants of the interactions. There are enabling loops, which mediate the domain interactions, and disabling loops that prevent the interactions. The presence of the enabling/disabling loops is consistent with the fulfillment/prevention of the interaction and is highly preserved in evolution. Thus, along with the preservation of structural elements that enable interaction, evolution maintains elements intended to prevent unwanted interactions. Our results extend the hierarchy of attributes that establish the modularity of domain-mediated protein-protein interactions, and provide a novel approach for predicting domain-domain interactions.

ISMB/ECCB 2009 Blog

HL26: Eyal Akiva - Built-in loops allow versatility in domain-domain interactions: Lessons from self-interacting domains

Highlights Track: HL27
Regulatory networks define phenotypic classes of human stem cell lines
Tuesday, June 30 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Igor Ulitsky, Tel Aviv University, Israel
Presentation Overview:
Hundreds of different human cell lines from embryonic, fetal and adult sources are referred to as stem cells. Yet they range from pluripotent cells (typified by embryonic stem cells, which are capable of virtually unlimited proliferation and differentiation) to adult stem cell lines, which can generate a far more limited repertoire of differentiated cell types. The rapid increase in reports of new sources of stem cells and their anticipated value to regenerative medicine calls for a general, reproducible method for classification of these cells. We have created and analyzed a database of global gene expression profiles that enables the analysis of cultured human stem cells in the context of a wide variety of pluripotent, multipotent and differentiated cell types. We categorized a collection of 150 cell samples, and discovered that pluripotent stem cell lines group together, whereas other cell types, including brain-derived neural stem cell lines, are very diverse. In addition, we uncovered a protein-protein network that is shared by the pluripotent stem cells. Our results offer a new strategy for classifying stem cells and support the idea that pluripotency and self-renewal are under tight control by specific molecular networks.

ISMB/ECCB 2009 Blog

HL27: Igor Ulitsky - Regulatory networks define phenotypic classes of human stem cell lines

want to understand molecular basis of pluripotency; need to ensure safety of stem cell therapy (don't want to create cancers) - Michael Kuhn

mRNA profiling of pluripotent and multipotent stem cells. using non-negative matrix factorization to group the cell types - Michael Kuhn

12 clusters work well, get mostly homogeneous clusters - Michael Kuhn

Matisse approach, Ulitsky & Shamir BMC Systems Biol, 2007 - Michael Kuhn

create subnetwork of genes upregulated in pluripotent cells, have reason to believe that the subnetwork contains most of the pluripotency machinery. highly expressed in plurip. cells, not highly expressed in differentiated cells - Michael Kuhn

also look for microRNAs; profile 800 miRNAs in 26 cell lines: Laurent et al., Stem Cells 2008. stem cell groups share similar miRNA profiles - Michael Kuhn

look for differences between embryonic and other cell lines. clustered miRNA had a tendency to be differentially regulated in embr. cell lines - Michael Kuhn

major seed sequence (AAGUGC) contained in miRNAs from 4 clusters (21 miRNAs, 18 are upregulated in embr. cells), also have such families in other species (non-homologous to human) - Michael Kuhn

find complement of seed sequences in differentially regulated genes; look for pathways co-regulated by miRNAs, identify 57 such "PRMs"; find subnetwork of proteins which are jointly regulated, e.g. in cell cycle progression; find new submodule, which could be important in protection from neurodegeneration - Michael Kuhn

Login servers still down. Talk notes at http://www.fiamh.info/articles/17/ismbeccb-2009-day-2 - Oliver Hofmann
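The notes mention grouping cell types with non-negative matrix factorization (NMF). The sketch below shows one standard way to do this:
- factor a genes-by-samples expression matrix V into non-negative factors W and H using the Lee-Seung multiplicative updates for squared error,
- assign each sample to the "metagene" with the largest coefficient in H, a common convention.

Whether the authors used this exact variant, loss or assignment rule is an assumption. The toy data (two blocks of co-expressed genes) and the function names are hypothetical.

```python
import numpy as np


def nmf(v, rank, iterations=500, seed=0):
    """Non-negative matrix factorisation V ~ W H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = v.shape
    w = rng.random((n_genes, rank)) + 1e-3
    h = rng.random((rank, n_samples)) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(iterations):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def cluster_samples(expression, rank):
    """Assign each sample (column) to the metagene with the largest coefficient."""
    _w, h = nmf(expression, rank)
    return h.argmax(axis=0)


# Toy data: genes 0-2 are high in samples 0-2, genes 3-5 in samples 3-5, plus noise.
rng = np.random.default_rng(1)
expression = np.kron(np.eye(2), np.ones((3, 3))) * 10 + rng.random((6, 6))
labels = cluster_samples(expression, rank=2)
print(labels)
```

The rank plays the role of the number of clusters. The notes report that 12 clusters worked well for the real data, which would correspond to rank=12 here.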

Highlights Track: HL28
Confirming alternative protein isoforms in Drosophila
Tuesday, June 30 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: Michael Tress, Spanish National Cancer Research Center (CNIO), Spain
Presentation Overview:
Alternative splicing of messenger RNA permits the formation of a wide range of mature RNA transcripts and has the potential to generate a diverse spectrum of functional proteins. While there is extensive evidence for large-scale alternative splicing at the transcript level, there have been no comparable studies validating the existence of alternatively spliced protein isoforms. Two recent large-scale proteomics studies generated extensive, high-quality peptide catalogs from the Drosophila melanogaster proteome. The analysis of this proteomic data confirmed the presence of multiple alternative gene products for over a hundred Drosophila genes and for the first time demonstrated the large-scale expression of alternatively spliced gene products. The fact that evidence for alternatively spliced isoforms came from proteomics studies confirms that these alternative isoforms must be expressed in sufficient quantity and be stable enough in vivo to be detected. However, the study suggested that many of the alternative gene products are likely to have regions that are disordered in solution, and that specific proteomics methodologies may be required to identify these isoforms. The analysis highlights the growing importance of proteomics in the validation of predicted proteins and points the way towards further research in this area.

ISMB/ECCB 2009 Blog

HL28: Michael Tress - Confirming alternative protein isoforms in Drosophila

Highlights Track: HL29
Biomedical Discovery Acceleration
Tuesday, June 30 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Lawrence Hunter, University of Colorado School of Medicine, United States
Presentation Overview:
Recent technology has made it possible to do experiments that show hundreds or even thousands of genes play a role in a disease or other biological phenomena. Interpreting these experimental results in the light of everything that has ever been published about any of those genes is often overwhelming, and the failure to take advantage of all prior knowledge may impede biomedical research. The computer program described in this paper “reads” the biomedical literature and molecular biology databases, “reasons” about what all that information means to this experiment, and “reports” on its findings in a way that makes digesting all of this information far more efficient than ever before possible. Analysis of a large, complex dataset with this tool led rapidly to the creation of a novel hypothesis about the role of several genes in the development of the tongue, which was then confirmed experimentally.

ISMB/ECCB 2009 Blog

HL29: Lawrence Hunter - Biomedical Discovery Acceleration

single genes don't explain phenotypes, have to understand gene sets. genes work together in complex, dynamic groups. need to be able to apply data from other disciplines to our data. but: knowledge has increased so much that it's impractical to screen all potentially relevant papers. - Michael Kuhn

example: Relaxin I. Why do some people not respond to beta blockers during heart failure? Cardiologists didn't know about relaxin I, well known in the context of childbirth. - Michael Kuhn

"Genes do not respect disciplinary boundaries." - Michael Kuhn

3R system: "Hanalyzer", Reading (import dbs, extract knowledge)/Reasoning (infer additional knowledge)/Reporting (help biologists explain data / generate new hypotheses). - Michael Kuhn

idea: align knowledge networks with data networks. nodes: fixed entities, e.g. gene db ids or ontology terms. links: potentially uncertain relations between the nodes. (i.e. probabilistic network). compare multiple sources of info into combined edge - Michael Kuhn

data network: data from experiments; knowledge network: data from literature. combine data and knowledge, visualize in two ways: either equal weight, or logistics (emphasizes data) - Michael Kuhn

Cytoscape plugin "common attributes", also see Hanalyzer demo on YouTube, has links to data sources - Michael Kuhn

can look at correlated genes, first check out which genes are known and then extend the network to find new genes that may be involved in the process: get new hypotheses. check localization of proteins: could validate predictions. - Michael Kuhn

example question: What is the role of CAV3 in muscle? create network of genes, GO terms and other ontology terms - Michael Kuhn

Highlights Track: HL30
Automated Analysis of Patterns in Human Protein Atlas Images
Tuesday, June 30 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: Robert Murphy, Carnegie Mellon University, United States
Presentation Overview:
This paper describes the first approach to automatically analyze all major subcellular patterns in tissue images from the Human Protein Atlas.

ISMB/ECCB 2009 Blog

HL30: Robert Murphy - Automated Analysis of Patterns in Human Protein Atlas Images

I think it's based on this paper: http://murphylab.web.cmu.edu/publications/146-newberg2008-asap.pdf - Michael Kuhn

Pubmed link: http://www.ncbi.nlm.nih.gov/pubmed/18435555 - Michael Kuhn

Highlights Track: HL31
Toward Automated, Practical Provision of Need-Based, High-Utility Text to Diverse Biomedical Users and Database Curators
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: Victoria Hall
Presenting author: Hagit Shatkay, Queen's University, Canada
Presentation Overview:
This work is concerned with bridging the gap between the actual text needs of biomedical users (database curators being one example) and text-mining methods. Biomedical text mining aims to serve a diverse community of scientists by identifying relevant information within scientific text. We note that there is no "average biologist" client; different users have distinct needs. Specifically, evaluation efforts (BioCreative, TREC) noted that database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists often search for high-confidence facts about genes and proteins. We have recently introduced a multi-dimensional categorization and annotation scheme, applicable to a wide variety of biomedical text, while supporting specific biomedical retrieval and extraction tasks, including the identification of methods and experimental evidence. Along with it, we developed a large annotated corpus (10,000 sentences tagged by eight annotators), and trained and tested machine learning classifiers to automatically categorize text based on the tagging scheme. We have also developed models to handle noise and disagreements among annotators. We show that automatic annotation of text along multiple useful dimensions is highly feasible, and that our new framework for scientific sentence categorization is applicable in practice. Among other categories, our classifier accurately identifies methodology and experimental statements.

ISMB/ECCB 2009 Blog

HL31: Hagit Shatkay - Toward Automated, Practical Provision of Need-Based, High-Utility Text to Diverse Biomedical Users and Database Curators

Highlights Track: HL32
Network modeling of human interactome and phenome
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Rui Jiang, Tsinghua University, China
Presentation Overview:
We model the genome- and phenome-wide human gene-disease relationship using simple regression models, and show that the correlation between protein network distance and disease phenotype similarity accurately predicts disease genes. We perform genome-wide candidate gene prioritization for over 5000 diseases, revealing a comprehensive and modular genetic landscape of human disease. We also provide quantitative evidence supporting the correlation between phenotypic overlap and genetic overlap in human diseases, which implies a concordance between the topology of disease network and gene network. We then introduce the network alignment technique to compare the topology of the protein network and disease network, leading to the discovery of 39 disease families and corresponding causal gene networks, as well as a novel network alignment-based disease gene prediction approach and the high-quality predictions for 70 human diseases. The related paper has been featured by Nature Publishing Group in four areas: Genetics, Pathology, Systems Biology and Biotechnology, and the paper is also highlighted in Nature China. The paper became the journal's most accessed article by September of 2008, and has now been cited by 7 papers according to ISI. The predicted disease genetic landscape is publicly available, and has been visited by over 1000 researchers around the world.

ISMB/ECCB 2009 Blog

HL32: Rui Jiang - Network modeling of human interactome and phenome

most current methods to predict disease genes are based on "guilt by association" - Michael Kuhn

e.g. approach based on protein-protein interaction networks - Michael Kuhn

these methods depend on seed genes, which limits the power of such methods - Michael Kuhn

Question: can one borrow seed genes of other diseases for predicting the basis of a disease without known genetic basis? - Michael Kuhn

consider disease as a multi-layered network: sequence -- molecules/proteins -- clinical traits -- disease - Michael Kuhn

sharing of nodes and edges between diseases - Michael Kuhn

can calculate similarity of diseases in their clinical traits - Michael Kuhn

linear regression model to explain similarity between diseases based on contributions from genes (that are close in the PPI network) - Michael Kuhn

approach called CIPHER - Michael Kuhn

concordance score for gene--disease links - Michael Kuhn
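A rough sketch of a CIPHER-style concordance score, under our assumptions: a candidate gene's closeness to a disease is the sum of exp(-L^2) over that disease's known genes (L = shortest-path length in the PPI network), and the concordance score is the Pearson correlation between the target disease's phenotype similarity to other diseases and the gene's closeness to them. This is only how we understood the idea from the talk; the network and similarity values are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_paths(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search distances from `source` in an unweighted PPI graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(graph: Graph, gene: str, disease_genes: Set[str]) -> float:
    """Assumed closeness: sum of exp(-L^2) over a disease's known genes."""
    dist = shortest_paths(graph, gene)
    return sum(math.exp(-dist[g] ** 2) for g in disease_genes if g in dist)


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def concordance(graph, gene, target, phenotype_sim, disease_genes) -> float:
    """Correlate the target's phenotype similarity to other diseases with the
    candidate gene's network closeness to those diseases' known genes."""
    others = [d for d in disease_genes if d != target]
    sims = [phenotype_sim[target][d] for d in others]
    close = [closeness(graph, gene, disease_genes[d]) for d in others]
    return pearson(sims, close)


# Hypothetical toy data: a linear PPI chain G1-G2-G3-G4-G5
ppi = build_graph([("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")])
known = {"D1": {"G1"}, "D2": {"G5"}, "D3": {"G3"}}
sim = {"D0": {"D1": 0.9, "D2": 0.1, "D3": 0.5}}  # D0 has no known genes
for g in ("G2", "G4"):
    print(g, round(concordance(ppi, g, "D0", sim, known), 3))
```

In this toy, G2 sits next to the genes of the diseases D0 resembles most (D1, D3) and scores positively, while G4 sits closest to D2's gene and scores negatively. D0 needs no known genes of its own, which is the point of borrowing seed genes from other diseases.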

Highlights Track: HL33
Mitochondrial beta-barrel Outer Membrane Proteins, All Accounted For?
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Paul Horton, AIST, Computational Biology Research Center, Japan
Presentation Overview:
Mitochondrial beta-barrel Outer Membrane Proteins (MBOMPs) are an important class of proteins which include the essential proteins Tom40 and Sam50 and the highly abundant VDAC protein. (Until our challenge) it has been thought that Eukaryotic genomes such as yeast would have 100's of such proteins [Wimley, Cur. Opin. Struct. Bio. 2003], an estimate largely based on analogy to bacteria, which share a common ancestor with mitochondria and have numerous families of beta-barrel Outer Membrane Proteins. Interestingly, despite these high estimates and the availability of the complete genomes of many Eukaryotic organisms, only 5 families of MBOMPs are currently known (Tom40, Sam50, VDAC, Mdm10, Mmm2). In our talk we will present substantial evidence to support the provocative hypothesis that the 5 known families of MBOMPs represent all or nearly all of the MBOMPs in the entire Eukaryotic world. This conclusion is based on 1) our initial analysis [Imai et al. Cell 2008] of the recently discovered beta-signal ([Kutik et al. Cell 2008]) for MBOMP membrane integration, and 2) further analysis covering all UniProt Eukaryotic proteins as well as a search for possible "bacteria-like" MBOMPs without a beta-signal.
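To make the beta-signal idea concrete, here is a minimal motif scan over the C-terminal region of a sequence. Everything specific in it is an assumption for illustration: the pattern (polar, any, glycine, any, any, hydrophobic, any, hydrophobic), the residue classes and the 30-residue window are ours, so check the consensus against Kutik et al. before using anything like this on real data.

```python
import re
from typing import List, Tuple

# Assumed residue classes for the illustration
POLAR = "STNQDEKRHY"
HYDROPHOBIC = "AVLIMFWC"
# Assumed pattern: polar, any, G, any, any, hydrophobic, any, hydrophobic.
# The lookahead lets overlapping candidates be reported.
BETA_SIGNAL = re.compile(f"(?=([{POLAR}].G..[{HYDROPHOBIC}].[{HYDROPHOBIC}]))")


def beta_signal_hits(seq: str, c_terminal_window: int = 30) -> List[Tuple[int, str]]:
    """Return (0-based position, match) for candidates in the C-terminal window."""
    start = max(0, len(seq) - c_terminal_window)
    region = seq[start:].upper()
    return [(start + m.start(1), m.group(1)) for m in BETA_SIGNAL.finditer(region)]


# Hypothetical sequences, not real MBOMPs
print(beta_signal_hits("M" * 40 + "SAGKKLAV" + "EE"))
print(beta_signal_hits("SAGKKLAV" + "M" * 40))
```

The second call finds nothing because the same motif lies outside the C-terminal window, which is the kind of positional constraint that makes a signal-based genome-wide search tractable.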

ISMB/ECCB 2009 Blog

HL33: Paul Horton - Mitochondrial beta-barrel Outer Membrane Proteins, All Accounted For?

Highlights Track: HL34
Network-based prediction of human tissue-specific metabolism
Wednesday, July 1 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Tomer Shlomi, Technion, Israel
Presentation Overview:
A major challenge in studying metabolic processes in mammals is that different tissues are characterized by distinct metabolic functions whose direct in vivo investigation is difficult. Here we present the first computational method that successfully obtains a large-scale, tissue-specific description of human metabolism. Our approach is based on integrating tissue-specific gene and protein expression data with an existing comprehensive reconstruction of the global metabolic network. Applying the method to predict tissue-specific metabolic activity for 10 human tissues reveals that post-transcriptional regulation plays a central role in shaping tissue-specific metabolic activity profiles. The predicted tissue specificity of metabolic disease-causing genes and of metabolite exchange with biofluids are shown to go markedly beyond that manifested in the enzyme expression data, and are validated via large-scale mining of tissue-specificity data. Our results lay down the computational basis for the genome-wide study of normal and abnormal human metabolism in a tissue-specific manner.

ISMB/ECCB 2009 Blog

HL34: Tomer Shlomi - Network-based prediction of human tissue-specific metabolism

so far similar to Sunday's biopathways SIG talk: http://friendfeed.com/ismbeccb2009/59604510/biopathways-sig-tomer-shlomi-predicting - Michael Kuhn

want to predict metabolic activity of human genes - Michael Kuhn

problem: expression level does not directly reflect metabolic activity because of post-transcriptional regulation - Michael Kuhn

(4% up-regulated, 16% down-regulated) - Michael Kuhn

large-scale validation using tissue-specific literature data - Michael Kuhn

tissue-specificity of disease genes: if a gene causes a disease in a certain tissue, it is likely to be active in this tissue - Michael Kuhn

method is capable of predicting tissues where the genes are post-transcriptionally up-regulated - Michael Kuhn

highly expressed neighboring genes give hints that a certain gene will be active - Michael Kuhn

MEAT: metabolic expression analysis tool - Michael Kuhn

follow-up work: biomarkers from genetic disorders, see SIG talk linked at the top and the paper: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2683725 - Michael Kuhn
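The notes describe inferring metabolic activity from expression plus network context. As a toy illustration only (the actual method works with flux constraints, which this does not), the sketch below brute-forces which reactions are active in a hypothetical three-reaction pathway, using a crude steady-state proxy and maximizing agreement with expression calls.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy network: reaction -> (internal metabolites consumed, produced)
REACTIONS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "R1": (set(), {"A"}),  # uptake producing A
    "R2": ({"A"}, {"B"}),  # A -> B
    "R3": ({"B"}, set()),  # export of B
}
# Hypothetical expression calls for the enzymes catalysing each reaction
EXPRESSION = {"R1": "high", "R2": "low", "R3": "high"}


def is_feasible(active: FrozenSet[str]) -> bool:
    """Crude steady-state proxy: every internal metabolite touched by an active
    reaction must be both produced and consumed by active reactions."""
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for r in active:
        c, p = REACTIONS[r]
        consumed |= c
        produced |= p
    return (consumed | produced) <= (consumed & produced)


def consistency(active: FrozenSet[str]) -> int:
    """Count reactions whose predicted state agrees with the expression call."""
    score = 0
    for r, call in EXPRESSION.items():
        if (call == "high" and r in active) or (call == "low" and r not in active):
            score += 1
    return score


def best_activity() -> FrozenSet[str]:
    names = sorted(REACTIONS)
    subsets = chain.from_iterable(combinations(names, k) for k in range(len(names) + 1))
    feasible = (frozenset(s) for s in subsets if is_feasible(frozenset(s)))
    return max(feasible, key=consistency)


print(sorted(best_activity()))
```

Here the lowly expressed R2 is predicted active because the highly expressed R1 and R3 cannot carry flux without it, which mirrors the notes' point that highly expressed neighbours hint that a gene is active even when its own expression suggests otherwise.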

Highlights Track: HL35
Context-specific BLAST detects twice as many homologous proteins as BLAST
Wednesday, July 1 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Johannes Soeding, University of Munich, Germany
Presentation Overview:
We present a context-specific approach to sequence comparison that drastically improves sensitivity and alignment quality in comparison with conventional search methods. Our context-specific version of BLAST, CS-BLAST, achieves an over two-fold increase in sensitivity at the same specificity and speed; the iterative version, CSI-BLAST, finds as many homologs after two iterations as PSI-BLAST after five iterations.

ISMB/ECCB 2009 Blog

HL35: Johannes Soeding - Context-specific BLAST detects twice as many homologous proteins as BLAST

Not only about BLAST, but also about models of protein evolution. - Gabriele Sales

Importance of BLAST searches: 400,000 searches run at NCBI per day. - Gabriele Sales

BLAST (1990) and PSI-BLAST (1997) cited 45,000 times. - Gabriele Sales

Protein bioinformatics relies heavily on sequence searching. - Gabriele Sales

Sequence alignments are a special case of profile alignments. - Gabriele Sales

The score of an alignment can be thought of as the log of the ratio of the probability of the mutations needed to go from sequence X to Y over the background probability of Y. - Gabriele Sales

Context specific substitution matrices used with success, among others, for protein structure prediction (Rice & Eisenberg 1997, Huang & Bystroff 2006). - Gabriele Sales

Their approach: take 6 neighbours to the left and to the right of a residue. - Gabriele Sales

The search uses a sliding window over the sequence; the mutation probability is computed by looking at a precalculated library of profiles. - Gabriele Sales

Profiles are built out of homology relations found with BLAST. The library contains 1 million profiles. - Gabriele Sales
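A simplified sketch of the context-specific scoring idea: the score for aligning residue b to query position i is a log-odds, log2(q_i(b) / f(b)), where q_i mixes the central columns of library profiles, each weighted by how well it explains the 13-residue window (6 neighbours on each side) around i. Assumptions: uniform background frequencies, a hypothetical two-profile library instead of the large library mentioned above, and no pseudocounts or position-dependent weights, so this is not CS-BLAST's actual implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {a: 1 / 20 for a in AA}  # assumption: uniform background frequencies
HALF = 6                              # 6 neighbours on each side, as in the notes
WIDTH = 2 * HALF + 1

Column = Dict[str, float]
Profile = Tuple[float, List[Column]]  # (prior weight, WIDTH columns)


def biased_column(favored: str, mass: float = 0.8) -> Column:
    """Toy column putting `mass` on the favored residues, the rest spread evenly."""
    others = [a for a in AA if a not in favored]
    col = {a: mass / len(favored) for a in favored}
    col.update({a: (1 - mass) / len(others) for a in others})
    return col


def window(seq: str, i: int) -> List[Optional[str]]:
    """Residues i-HALF..i+HALF; None outside the sequence."""
    return [seq[j] if 0 <= j < len(seq) else None for j in range(i - HALF, i + HALF + 1)]


def context_emission(seq: str, i: int, library: List[Profile]) -> Column:
    """Mix the central columns of library profiles, weighted by how well
    each profile explains the window around position i."""
    log_post = []
    for prior, cols in library:
        lp = math.log(prior)
        for aa, col in zip(window(seq, i), cols):
            if aa is not None:
                lp += math.log(col[aa])
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    z = sum(weights)
    return {b: sum(w / z * cols[HALF][b] for w, (_, cols) in zip(weights, library)) for b in AA}


def cs_score(seq: str, i: int, b: str, library: List[Profile]) -> float:
    """Log-odds score (bits) for aligning residue b to position i of seq."""
    return math.log2(context_emission(seq, i, library)[b] / BACKGROUND[b])


# Hypothetical two-profile library (the real one is far larger)
LIBRARY: List[Profile] = [
    (0.5, [biased_column("AVLIMF")] * WIDTH),
    (0.5, [biased_column("DEKRNQST")] * WIDTH),
]
hydrophobic_ctx, polar_ctx = "LIVAMFLIVAMFL", "DEKRNQSTDEKRN"
print(cs_score(hydrophobic_ctx, 6, "V", LIBRARY), cs_score(hydrophobic_ctx, 6, "D", LIBRARY))
print(cs_score(polar_ctx, 6, "V", LIBRARY), cs_score(polar_ctx, 6, "D", LIBRARY))
```

The same substitution gets different scores in different contexts: V scores positively in the hydrophobic window and D negatively, and the preference reverses in the polar window, which a single fixed substitution matrix cannot express.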

Highlights Track: HL36
Leveraging the context-specific coordination of transcript and metabolite concentrations to discover gene-metabolite interactions.
Wednesday, July 1 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Patrick Bradley, Princeton University, United States
Presentation Overview:
Metabolite concentrations can regulate gene expression, which can in turn regulate metabolic activity. The extent to which functionally related transcripts and metabolites show similar patterns of concentration changes, however, remains unestablished. We measure and analyze the metabolomic and transcriptional responses of Saccharomyces cerevisiae to carbon and nitrogen starvation. Our analysis demonstrates that transcripts and metabolites show coordinated response dynamics. Furthermore, metabolites and gene products whose concentration profiles are alike tend to participate in related biological processes. To identify specific, functionally related genes and metabolites, we develop an approach based on Bayesian integration of the joint metabolomic and transcriptomic data. This algorithm finds interactions by evaluating transcript-metabolite correlations in light of the experimental context in which they occur and the class of metabolite involved. It effectively predicts known enzymatic and regulatory relationships, including a gene-metabolite interaction central to the glycolytic-gluconeogenetic switch. This work provides quantitative evidence that functionally related metabolites and transcripts show coherent patterns of behavior on the genome scale and lays the groundwork for building gene-metabolite interaction networks directly from systems-level data.

ISMB/ECCB 2009 Blog

HL36: Patrick Bradley - Leveraging the context-specific coordination of transcript and metabolite concentrations to discover gene-metabolite interactions.

[apologizes to bloggers for the long title we'd have to write down :) ] - Michael Kuhn

want to find missing edges in metabolic maps, especially for non-model organisms - Michael Kuhn

e.g. pathogens which have evolved novel pathways - Michael Kuhn

want to find different types of edges (e.g. regulation) - Michael Kuhn

gene-metabolite interactions: AMP kinase senses concentrations of AICAR, AMP and ATP - Michael Kuhn

how do transcription and metabolism influence each other on a global scale? - Michael Kuhn

paper: Bradley PH, Brauer MJ et al, PLoS Comp Bio, Jan 2009. http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000270 - Michael Kuhn

starve yeast for either carbon or nitrogen, look at transcriptome and metabolomic response - Michael Kuhn

SVD (singular value decomposition) suggests that transcr. and metab. responses are coordinated - Michael Kuhn

4 broad classes of metabolites: glycolysis; TCA cycle; biosynthetic intermediates; amino acids - Michael Kuhn
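As a baseline for the integration described above, a sketch that simply ranks transcript-metabolite pairs by the Pearson correlation of their time courses. The talk's Bayesian approach additionally weighs correlations by experimental context and metabolite class, which this does not attempt; the profiles and names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def rank_pairs(transcripts: Dict[str, List[float]],
               metabolites: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """All transcript-metabolite pairs, strongest |correlation| first."""
    pairs = [(g, m, pearson(tv, mv))
             for (g, tv), (m, mv) in product(transcripts.items(), metabolites.items())]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)


# Hypothetical time courses after a starvation shift (same time points for both)
transcripts = {"GENE_1": [1.0, 2.0, 3.0, 4.0], "GENE_2": [4.0, 3.0, 3.0, 1.0]}
metabolites = {"MET_X": [0.5, 1.0, 1.4, 2.1]}
for gene, met, r in rank_pairs(transcripts, metabolites):
    print(gene, met, round(r, 2))
```

Ranking by absolute correlation keeps anti-correlated pairs (such as a transcript falling as a metabolite accumulates), which may also reflect regulation.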

Highlights Track: HL37
Classification, Evolution, and Assembly of Protein Complexes
Wednesday, July 1 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Emmanuel Levy, Universite de Montreal, Canada
Presentation Overview:
A homomer is formed by self-interacting copies of a protein unit. This is functionally important, as in allostery, and structurally crucial because mis-assembly of homomers is implicated in disease. Homomers are widespread, with 50-70% of proteins with a known quaternary state assembling into such structures. Despite their prevalence, little is known about the mechanisms that drive their formation, both at the level of evolution and assembly in the cell. Here we present an analysis of over 5,000 unique atomic structures and show that the quaternary structure of homomers is conserved in over 70% of protein pairs sharing as little as 30% sequence identity. Where quaternary structure is not conserved, a detailed investigation revealed well-defined evolutionary pathways by which proteins transit between different quaternary structure types. Furthermore, we show by perturbing subunit interfaces within complexes and by mass spectrometry analysis, that the (dis)assembly pathway mimics the evolutionary pathway. These data represent a molecular analogy to Haeckel's evolutionary paradigm of embryonic development, where an intermediate in the assembly of a complex represents a form that appeared in its own evolutionary history. Our model of self-assembly allows reliable prediction of evolution and assembly of a complex solely from its crystal structure.

ISMB/ECCB 2009 Blog

HL37: Emmanuel Levy - Classification, Evolution, and Assembly of Protein Complexes

Highlights Track: HL38
MeltDB: A software platform for the analysis and integration of Metabolomics Experiment Data
Wednesday, July 1 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Heiko Neuweger, Bielefeld University, Germany
Presentation Overview:
The recent advances in metabolomics have created the potential to measure the levels of hundreds of metabolites, which are the end products of cellular regulatory processes. The automation of sample acquisition and subsequent analysis in high-throughput instruments capable of measuring metabolites poses a challenge for the systematic storage and computational processing of the experimental datasets. Whereas a multitude of specialized software systems for individual instruments and preprocessing methods exists, there is clearly a need for a free and platform-independent system that allows the standardized and integrated storage and analysis of data obtained from metabolomics experiments. Here, we present the web-based and platform-independent MeltDB system, which provides functionality to consistently store, organize and annotate the datasets generated in metabolomics experiments. The system offers functionality for the preprocessing of mass spectrometry datasets in the file formats netCDF, mzXML and mzData. The results of the preprocessing are visualized and integrated within a functional genomics context, and access to higher-level statistical analysis is provided via the MeltDB web interface.

ISMB/ECCB 2009 Blog

HL38: Heiko Neuweger - MeltDB: A software platform for the analysis and integration of Metabolomics Experiment Data

Highlights Track: HL39
Evolutionary potentials for protein structure and function prediction
Wednesday, July 1 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Francisco Melo, P. Universidad Catolica de Chile, Chile
Presentation Overview:
We describe a new type of potential for protein structure prediction, called 'evolutionary potentials' (EvPs). In contrast to current potentials, which are derived from a set of non-redundant protein structures, the EvPs described here exploit the evolutionary record of all known proteins that adopt a specific fold. This study involved large-scale computations such as the structural comparison and clustering of the complete Protein Data Bank, the comparison at the sequence level of the non-redundant proteins against all known proteins at the NCBI database (about 7 million proteins), the building of 3D structure models for all proteins present in each multiple sequence alignment and the derivation of about 19,000 EvPs. The performance of EvPs was assessed for the task of fold assessment. It was demonstrated that EvPs outperform a typical representative potential and that the increase in performance is not a consequence of the amount of information retrieved. As an extension of the already published paper, recent results when using EvPs in the detection of distantly related proteins that would adopt a similar structure, as well as in the detection of key residues for protein function, will be presented for specific and biologically relevant example cases.

ISMB/ECCB 2009 Blog

HL39: Francisco Melo - Evolutionary potentials for protein structure and function prediction

protein structure prediction through: fold recognition and comparative modeling, or ab initio structure prediction - Ruchira S. Datta

after making a model, its quality is assessed to decide whether to use it or discard it - Ruchira S. Datta

usually have several iterations of comparative modeling - Ruchira S. Datta

will speak about model quality assessment, an important ingredient in this process - Ruchira S. Datta

below 40% sequence identity, have larger errors in predicted structure, mostly due to sequence alignment error - Ruchira S. Datta

one way of dealing with this is to produce many alignments and use model quality assessment to assess the errors of the produced models - Ruchira S. Datta

detect errors due to incorrect templates and misalignment, which can occur at template-target identities of 20%, 25%, or 30% - Ruchira S. Datta

knowledge-based potentials, mean force potentials, or statistical potentials are scoring functions for model assessment - Ruchira S. Datta

states are represented using geometrical descriptors - Ruchira S. Datta

obtained scores represent pseudo-energies - Ruchira S. Datta
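The pseudo-energy idea in these notes is the inverse-Boltzmann relation E(s) = -kT ln(f_obs(s) / f_ref(s)), with s a geometric state such as a distance bin. A minimal sketch with hypothetical counts, using a pseudocount of 1 as our own smoothing choice. As we read the abstract, EvPs draw their observed counts from the evolutionary record of a single fold rather than from a non-redundant set of structures; the formula below is the generic one.

```python
import math
from typing import Dict, Iterable


def statistical_potential(obs: Dict[str, int], ref: Dict[str, int],
                          kT: float = 0.593, pseudo: float = 1.0) -> Dict[str, float]:
    """Inverse-Boltzmann pseudo-energy per state: E = -kT * ln(f_obs / f_ref).
    kT = 0.593 kcal/mol corresponds to roughly 298 K."""
    states = sorted(set(obs) | set(ref))
    n_obs = sum(obs.get(s, 0) + pseudo for s in states)
    n_ref = sum(ref.get(s, 0) + pseudo for s in states)
    return {s: -kT * math.log(((obs.get(s, 0) + pseudo) / n_obs) /
                              ((ref.get(s, 0) + pseudo) / n_ref))
            for s in states}


def model_energy(states: Iterable[str], potential: Dict[str, float]) -> float:
    """Score a model by summing the pseudo-energies of its observed states."""
    return sum(potential[s] for s in states)


# Hypothetical counts of one residue-pair type in two distance bins
observed = {"<6A": 80, "6-10A": 20}
reference = {"<6A": 50, "6-10A": 50}
pot = statistical_potential(observed, reference)
print(pot, model_energy(["<6A", "<6A", "6-10A"], pot))
```

States seen more often than the reference expects get negative (favourable) pseudo-energies, so a model whose geometry matches native-like statistics scores lower.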

Highlights Track: HL40
Histone modifications at human enhancers reflect global cell-type-specific gene expression
Wednesday, July 1 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Gary Hon, University of California, San Diego, United States
Presentation Overview:
Although it is known that gene expression is driven by promoters, enhancers, and insulators, the relative roles of these regulatory elements in this process are not clear. Here we identify these elements in multiple cell types and investigate their roles in cell-type-specific gene expression. We observed that the chromatin state at promoters and CTCF binding at insulators are largely invariant across diverse cell types. In contrast, enhancers are marked with highly cell-type-specific histone modification patterns, strongly correlate with cell-type-specific gene expression programs on a global scale, and are functionally active in a cell-type-specific manner. Our results define over 100,000 potential transcriptional enhancers in the human genome, significantly expanding the current catalogue of human enhancers and highlighting the role of these elements in cell-type-specific gene expression.

ISMB/ECCB 2009 Blog

HL40: Gary Hon - Histone modifications at human enhancers reflect global cell-type-specific gene expression

Different cells are different because they express different genes. - Cass Johnston

Gene expression profiles from microarray studies can be used to identify cell types - Cass Johnston

Regulation of transcription still not really understood. Coding regions are well studied, but most of the genome is non-coding. Presumably these regions contain regulatory elements. - Cass Johnston

Some regulatory elements are known. Promoters, Enhancers, Insulators. - Cass Johnston

Enhancers are the least understood elements in transcriptional regulation. - Gabriele Sales

How to find enhancers genome wide? - Cass Johnston

Enhancers can be found by: sequence elements (>2500 TF motifs). - Gabriele Sales

sequence based searches? >2500 transcription factor motifs... - Cass Johnston

can look at motifs, but they are highly degenerate - Mikhail Spivakov

But such motifs are short and highly degenerate. - Gabriele Sales
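A back-of-envelope way to see why short, degenerate motifs are a weak signal on their own: the expected number of chance matches in random sequence. This assumes uniform and independent base composition (real genomes are not), and the motif is hypothetical.

```python
from typing import Dict

# Number of bases each IUPAC nucleotide code matches
IUPAC: Dict[str, int] = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(motif: str) -> float:
    """Chance a random position matches, assuming uniform, independent bases."""
    p = 1.0
    for code in motif.upper():
        p *= IUPAC[code] / 4
    return p


def expected_hits(motif: str, seq_len: int, both_strands: bool = True) -> float:
    """Expected chance matches in a random sequence of length seq_len."""
    positions = max(0, seq_len - len(motif) + 1)
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(motif)


# Hypothetical 8-bp motif; 3e9 bp is roughly the size of the human genome
print(round(expected_hits("TTGNNWCA", 3_000_000_000)))
```

Even a modestly degenerate 8-mer is expected to match millions of times by chance genome-wide, which is why sequence alone is a poor guide to where enhancers are.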

Highlights Track: HL41
Comparative analysis of crystal interfaces of homologous proteins
Wednesday, July 1 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Roland Dunbrack, Fox Chase Cancer Center, United States
Presentation Overview:
Many proteins act as homooligomers. Often there is little direct evidence of what interfaces are present in the biologically active oligomer(s). Most such oligomers are derived from examination of crystal interfaces, but these are mostly hypothetical, based on surface area and observation of specific kinds of interactions. We have compared interfaces in crystals across different crystal forms of proteins, and correlated the existence of an interface in all or most crystal forms as evidence in favor of biological relevance. We used previously published benchmarks as well as monomeric and oligomeric structures solved by solution NMR to validate this assumption. We find that presence of a similar interface in two or more crystals when sequence identity is less than 90% correlates highly with the benchmark data. The data indicate that, of three publicly available sources of biological assemblies (PDB, PQS, and PISA), PISA is most consistent with interfaces observed in large numbers of crystal forms. Comparative crystal analysis is better at identifying likely monomers when there are large interfaces present in crystals which automated methods tend to identify as biologically relevant. The cytosolic sulfotransferases provide a particularly interesting example of this kind of analysis.

ISMB/ECCB 2009 Blog

HL41: Roland Dunbrack - Comparative analysis of crystal interfaces of homologous proteins

two papers: Xu JMB 381, 487-507 2008: http://www.ncbi.nlm.nih.gov/pubmed/18599072 , missed 2nd one - Michael Kuhn

homooligomers: 45-60% of all proteins in the PDB are homooligomers - Michael Kuhn

is a protein an oligomer in solution? - Michael Kuhn

can determine using gel filtration, analytical ultracentrifugation, cross-linking + SDS-PAGE - Michael Kuhn

sources of information: PDB/PQS/PISA - Michael Kuhn

need to distinguish between asymmetric and biological units - Michael Kuhn

Xu, Canutescu, Dunbrack, Bioinformatics, 2006: look at overlap of the 3 info sources - Michael Kuhn

66% total intersection - Michael Kuhn

see different annotations for the same protein family - Michael Kuhn

shows an example where a dimer is formed by a small interface, the biounit dbs often don't get this correct - Michael Kuhn
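A rough sketch of the comparative idea: describe each interface as a set of residue-residue contacts, and count in how many crystal forms a similar interface recurs. The contact representation (residue numbers in a shared alignment numbering), the Jaccard similarity and the 0.5 threshold are our assumptions, not necessarily the measure used in the talk; the data are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Contact = Tuple[int, int]  # residue numbers in a common (aligned) numbering


def normalize(contacts: Iterable[Contact]) -> FrozenSet[Contact]:
    """Treat contacts as unordered pairs, since both chains are copies of one protein."""
    return frozenset((min(i, j), max(i, j)) for i, j in contacts)


def jaccard(a: FrozenSet[Contact], b: FrozenSet[Contact]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def forms_with_interface(query: Iterable[Contact],
                         crystal_forms: List[List[Set[Contact]]],
                         threshold: float = 0.5) -> int:
    """Count crystal forms containing at least one interface similar to `query`."""
    q = normalize(query)
    return sum(
        any(jaccard(q, normalize(iface)) >= threshold for iface in interfaces)
        for interfaces in crystal_forms
    )


# Hypothetical interfaces from three crystal forms of homologous proteins
query = {(10, 45), (12, 47), (14, 49)}
forms = [
    [{(10, 45), (12, 47), (14, 49), (16, 51)}],  # same interface, slightly larger
    [{(100, 200)}],                              # unrelated crystal contact
    [{(47, 12), (49, 14)}],                      # same contacts, listed the other way round
]
print(forms_with_interface(query, forms))
```

An interface that recurs across independent crystal forms is less likely to be a packing artefact, which is the evidence the abstract uses in favour of biological relevance.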

Highlights Track: HL42
Exploring the human genome with functional maps
Wednesday, July 1 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Curtis Huttenhower, Princeton University, United States
Presentation Overview:
Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular protein, pathway, or disease if given a functional map summarizing the data and interactions most relevant to his or her area of interest. Using a regularized Bayesian integration system, we provide maps of functional activity and interaction networks in over 200 areas of human cellular biology, each including information from ~30,000 genome-scale experiments. Key to these analyses is the ability to efficiently summarize this large data collection from a variety of biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and association of novel genes and pathways with known genetic disorders. Experimental investigation of five specific genes (AP3B1, ATP6AP1, BLOC1S1, LAMP2 and RAB11A) has confirmed novel roles for these proteins in the proper initiation of macroautophagy in human fibroblasts. Our functional maps can be explored using HEFalMp, a web interface allowing interactive visualization and investigation of this large body of information.

ISMB/ECCB 2009 Blog

HL42: Curtis Huttenhower - Exploring the human genome with functional maps

HEFalMp: data integration for human genomic data. - Gabriele Sales

Integration of genomic data and prior knowledge using Bayesian integration. - Gabriele Sales

Each resulting network is based on a specific functional context. - Gabriele Sales

Input data is huge: 10,000s of experiments, billions of data points. - Gabriele Sales

But also very sparse. - Gabriele Sales

Our prior knowledge comes from Gene Ontology - Venkata P. Satagopam

Human case sources: genomic data (interactions, microarrays, sequences); prior knowledge (229 biological processes from GO and KEGG). - Gabriele Sales

There is so much information about humans that the independence assumption at the base of naive Bayes classification is violated. Used mutual information for regularization. - Gabriele Sales

start with predicted gene or gene product interaction network - Ruchira S. Datta

Functional mapping methodology. Start from gene interaction network. - Gabriele Sales
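A minimal naive Bayes sketch of this kind of integration: each dataset contributes a discretized evidence bin for a gene pair, and the posterior that the pair is functionally related comes from counts learned against a gold standard. This deliberately keeps the independence assumption that the notes say is violated for human data; the mutual-information regularization is not reproduced, and the dataset names and bins are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List, Set, Tuple

Features = Dict[str, str]  # dataset name -> discretized evidence bin


@dataclass
class NaiveBayes:
    pseudo: float = 1.0
    class_counts: Counter = field(default_factory=Counter)
    bin_counts: DefaultDict[Tuple[str, bool], Counter] = field(
        default_factory=lambda: defaultdict(Counter))
    bins_seen: DefaultDict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def train(self, examples: List[Tuple[Features, bool]]) -> None:
        """examples: gene-pair evidence with a gold-standard label (related or not)."""
        for feats, label in examples:
            self.class_counts[label] += 1
            for ds, b in feats.items():
                self.bin_counts[(ds, label)][b] += 1
                self.bins_seen[ds].add(b)

    def posterior(self, feats: Features) -> float:
        """P(functionally related | evidence). Datasets missing for a pair are
        simply skipped, which is how naive Bayes copes with sparse data."""
        total = sum(self.class_counts.values())
        log_p = {}
        for label in (True, False):
            lp = math.log((self.class_counts[label] + self.pseudo) / (total + 2 * self.pseudo))
            for ds, b in feats.items():
                counts = self.bin_counts[(ds, label)]
                n_bins = len(self.bins_seen[ds] | {b})
                lp += math.log((counts[b] + self.pseudo) /
                               (sum(counts.values()) + self.pseudo * n_bins))
            log_p[label] = lp
        top = max(log_p.values())
        z = sum(math.exp(v - top) for v in log_p.values())
        return math.exp(log_p[True] - top) / z


# Hypothetical gold standard: pairs sharing an annotated process labelled True
nb = NaiveBayes()
nb.train([({"coexpr": "high", "ppi": "yes"}, True)] * 3
         + [({"coexpr": "low", "ppi": "no"}, False)] * 3
         + [({"coexpr": "high"}, False)])
print(round(nb.posterior({"coexpr": "high", "ppi": "yes"}), 2))
print(round(nb.posterior({"coexpr": "low"}), 2))
```

Training one such model per functional context (for example, per biological process) would give one network per context, in the spirit of the notes above.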

Highlights Track: HL43
Disordered flanks prevent peptide aggregation
Wednesday, July 1 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Sanne Abeln, FOM Institute for Atomic and Molecular Physics [AMOLF], Netherlands
Presentation Overview:
In their natural cellular environment proteins are dissolved in a concentrated aqueous solution of biomolecules. Even under such crowded conditions, proteins must not aggregate; aggregates may be cytotoxic or compromise the biological function of the peptide. Evolutionary pressure generally ensures that proteins do not aggregate in their natural biochemical environment. A well-known mechanism to prevent aggregation is the folding of proteins. Here we report a different mechanism that can prevent the aggregation of proteins. Recently, it was discovered that many proteins contain regions that are disordered (not folded) in their natural environment. We show with coarse-grained simulations that embedding small hydrophobic binding motifs in disordered regions can prevent aggregation: the disordered regions of different proteins sterically hinder the formation of aggregates. Moreover, our simulations show that the disordered regions have no adverse effect on the biological (signalling) function of the binding motifs, because they do not obstruct the binding and folding of the binding motif on its specific substrate.

ISMB/ECCB 2009 Blog

HL43: Sanne Abeln - Disordered flanks prevent peptide aggregation

Highlights Track: HL44
Benchmarking tools in Metabolic Pathway Analysis
Wednesday, July 1 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Luis de Figueiredo, Friedrich-Schiller-Universität Jena, Germany
Presentation Overview:
Metabolic Pathway Analysis is a growing field within Systems Biology. For the reconstruction and prediction of metabolic pathways, the concept of elementary flux modes has turned out to be very useful. It takes into account stoichiometry and mass balance at steady state not only for monomolecular reactions but also for reactions of higher molecularity. Alternative approaches have been proposed, which are based on graph theory and neglect the mass balance of co-substrates and byproducts. Here, we present three benchmark examples by which pathway analysis tools can be compared. They concern the question whether even-chain fatty acids can be converted into sugars in animals, whether Mycoplasma hominis can convert glucose into pyruvate and whether human red blood cells can salvage hypoxanthine. Moreover, new results and directions will be given for improving stoichiometry-based tools in order to deal with genome-scale metabolic networks.

ISMB/ECCB 2009 Blog

HL44: Luis de Figueiredo - Benchmarking tools in Metabolic Pathway Analysis

how to represent a metabolic system? it's a thermodynamically open system, we need to draw a boundary around it - Ruchira S. Datta

we approximate that the system reaches a steady state - Ruchira S. Datta

certain metabolites form a pool and are removed from the representation of the steady state - Ruchira S. Datta

Metabolic system can be represented as stoichiometric matrix - Venkata P. Satagopam

get the stoichiometric matrix; the solution set is a polyhedral convex cone. similarly if using other constrained representations - Ruchira S. Datta

Elementary Flux Mode: minimal set of enzymes that can operate at steady state - Ruchira S. Datta

See also here: http://ff.im/4vaOK - Ruchira S. Datta

1st benchmark system - conversion of fatty acids to carbohydrate - Venkata P. Satagopam

how to convert fatty acids to carbohydrate? Weinman et al 1957 showed a certain pathway is not possible - Ruchira S. Datta

model used by Weinman et al in 1957 - Venkata P. Satagopam
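The definitions in these notes (steady state S v = 0, elementary flux modes as minimal sets of reactions that can operate at steady state) can be checked by brute force on a tiny network. This sketch relies on the rank criterion that a support is elementary exactly when the stoichiometric matrix restricted to it has a one-dimensional null space whose vector uses every reaction in it; it enumerates all reaction subsets, so it only works for toy networks, and the network is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[Fraction]]


def rref(m: Matrix) -> Tuple[Matrix, List[int]]:
    """Reduced row echelon form (exact arithmetic) and pivot columns."""
    m = [row[:] for row in m]
    pivots: List[int] = []
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def null_space(m: Matrix, n: int) -> List[List[Fraction]]:
    """Basis of {v : m v = 0} for a matrix with n columns."""
    red, pivots = rref(m)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -red[row][f]
        basis.append(v)
    return basis


def elementary_modes(S: Matrix, reversible: Sequence[bool]) -> List[List[Fraction]]:
    """Brute-force EFMs; exponential in the number of reactions.
    A fully reversible mode is reported once, in one direction."""
    n = len(reversible)
    modes = []
    for k in range(1, n + 1):
        for support in combinations(range(n), k):
            basis = null_space([[row[j] for j in support] for row in S], k)
            if len(basis) != 1 or any(x == 0 for x in basis[0]):
                continue
            for sign in (1, -1):  # orient so irreversible reactions run forward
                v = [sign * x for x in basis[0]]
                if all(reversible[j] or x > 0 for j, x in zip(support, v)):
                    full = [Fraction(0)] * n
                    for j, x in zip(support, v):
                        full[j] = x
                    modes.append(full)
                    break
    return modes


def is_steady_state(S: Matrix, v: Sequence[Fraction]) -> bool:
    return all(sum(a * x for a, x in zip(row, v)) == 0 for row in S)


# Hypothetical network, all irreversible: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
# Rows are internal metabolites A, B, C; columns are reactions R1..R5.
S = [[Fraction(x) for x in row] for row in ([1, -1, -1, 0, 0],
                                            [0, 1, 0, -1, 0],
                                            [0, 0, 1, 0, -1])]
for mode in elementary_modes(S, [False] * 5):
    print([int(x) for x in mode], is_steady_state(S, mode))
```

It reports the two routes, through B and through C; running both at once is also a steady state but is not elementary, because it contains each of them.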

Highlights Track: HL45
From the detection of functional regions towards function annotation in proteins
Wednesday, July 1 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: Nir Ben-Tal, Tel Aviv University, Israel
Presentation Overview:
The identification of functional regions in proteins may aid in function annotation, mutation analysis and drug discovery. In the talk I will present PatchFinder, a method for the identification of functionally important regions in proteins with known three-dimensional structure. The method is based on the assumption that these regions are often evolutionarily conserved in order to retain the effectiveness of the protein. My colleagues and I compiled the N-Func database of 757 proteins of unknown function, whose structure is known, and used PatchFinder to predict their functional regions. In some cases we suggested what the function might be. N-Func and PatchFinder are available as a web server. Obviously, some of N-Func's proteins bind DNA. In my talk I will also present a new method that we developed for discrimination between proteins that do and do not bind DNA. The method is based on various characteristics of the protein and the (predicted) functional region. We used it to predict DNA-binding proteins in the N-Func database.

ISMB/ECCB 2009 Blog

HL45: Nir Ben-Tal - From the detection of functional regions towards function annotation in proteins

Highlights Track: HL46
Prioritizing functional modules mediating genetic perturbations and phenotypic effects
Wednesday, July 1 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Li Wang, University of Southern California, United States
Presentation Overview:
How variation in DNA leads to variation in phenotypes is a question open to interpretation. Here we present a global strategy based on the Bayesian network framework to prioritize the functional modules mediating genetic perturbations and their phenotypic effects among a set of overlapping candidate modules. We take lethality in Saccharomyces cerevisiae and human cancer as two examples to show the superiority of this approach over the traditional hypergeometric enrichment test, which ignores the interrelationships among modules.
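The traditional hypergeometric enrichment test serves as the talk's baseline. It asks whether a module contains more phenotype-associated genes (for example lethal genes) than expected by chance. Here is a minimal Python sketch using only the standard library. The variable names are illustrative, and the Bayesian network approach itself is not reproduced. Because each module is tested independently, overlaps between modules are ignored, which is the limitation the abstract points to.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric.
    N: genes in total, K: genes in the module,
    n: genes with the phenotype (e.g. lethal), k: phenotype genes in the module."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 genes, a 5-gene module, 5 lethal genes,
# all 5 of which fall in the module.
p = hypergeom_enrichment_p(N=10, K=5, n=5, k=5)  # 1/252, about 0.004
```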

ISMB/ECCB 2009 Blog

HL46: Li Wang - Prioritizing functional modules mediating genetic perturbations and phenotypic effects

validation: ortholog lethal ratio: the proportion of yeast genes whose orthologs in another species, specifically C. elegans, are lethal - Ruchira S. Datta

hypothesis about ortholog lethal ratio: nonlethal genes in lethal complexes > nonlethal genes in nonlethal complexes - Ruchira S. Datta

hypergeometric enrichment test was not significantly different, but the Bayesian network model was validated. they think this is because the hypergeometric test produced many false positives - Ruchira S. Datta

gene lethality is more conserved at the module level - Ruchira S. Datta

use module-based rather than gene-based mapping to transfer phenotypes across species - Ruchira S. Datta

Highlights Track: HL47
Fitting multiple components into a cryoEM map of their assembly
Wednesday, July 1 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: Keren Lasker, Tel Aviv University, Israel
Presentation Overview:
Models of macromolecular assemblies are essential for a mechanistic description of cellular processes. Such models are increasingly obtained by fitting atomic-resolution structures of components into a density map of the whole assembly. Yet, current density-fitting techniques are frequently insufficient for an unambiguous determination of the positions and orientations of all components. In the first part of the talk, we will describe MultiFit, a method for simultaneously fitting atomic structures of components into their assembly density map at resolutions as low as 25 Å. In MultiFit, the positions and orientations of the components are optimized with respect to a scoring function that includes the quality-of-fit of components in the map, the protrusion of components from the map envelope, and the shape complementarity between pairs of components. The scoring function is optimized by our new exact inference optimizer DOMINO (Discrete Optimization of Multiple INteracting Objects) that efficiently finds the global minimum in a discrete sampling space. In the second part of the talk, we will demonstrate the utility of MultiFit for modeling the configuration of large macromolecular assemblies.
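The optimisation problem described above combines per-component fit and protrusion terms with pairwise complementarity terms over a discrete set of candidate placements. Its structure can be sketched in Python. This is a brute-force enumeration, exponential in the number of components. It is not DOMINO, which performs exact inference far more efficiently. The score tables are hypothetical numbers, with lower meaning better.

```python
from itertools import product


def best_assembly(unary, pairwise):
    """Exhaustive minimisation over discrete placements.
    unary[c][p]: score of component c at candidate placement p
        (e.g. quality-of-fit plus envelope-protrusion penalty; lower is better).
    pairwise[(a, b)][(pa, pb)]: score for components a and b at placements
        pa and pb (e.g. shape complementarity); missing entries count as 0.
    Returns (best_score, placement_per_component)."""
    best_score, best_assignment = float("inf"), None
    for assignment in product(*(range(len(u)) for u in unary)):
        score = sum(unary[c][p] for c, p in enumerate(assignment))
        for (a, b), table in pairwise.items():
            score += table.get((assignment[a], assignment[b]), 0.0)
        if score < best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment


# Hypothetical scores: two components with two candidate placements each.
unary = [[1.0, 2.0], [1.0, 1.5]]
pairwise = {(0, 1): {(0, 0): 5.0,    # placements clash
                     (1, 0): -3.0}}  # placements complement each other
score, placement = best_assembly(unary, pairwise)  # (0.0, (1, 0))
```

Component 0 fits best on its own at placement 0. Once the pairwise term is included, however, placement 1 wins. This is why the components are fitted simultaneously rather than one at a time.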

ISMB/ECCB 2009 Blog

HL47: Keren Lasker - Fitting multiple components into a cryoEM map of their assembly

cryoEM has become a standard tool for structural characterization of large protein complexes - Anne Tuukkanen

Complex modeling using EM maps by minimizing an objective function. The objective function includes terms for geometric complementarity, a fitting score and a term for envelope penetration. - Anne Tuukkanen

Highlights Track: HL48
Architecture of CpG methylation in the human genome
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: K2
Presenting author: Israel Steinfeld, Technion - Israel Institute of Technology, Israel
Presentation Overview:
Constitutively unmethylated regions in the genome significantly contribute to open chromatin domains within a sea of global transcriptional repression. This constitutive unmethylated status is commonly thought to be directly associated with the density of CpG dinucleotides. We will present data and analysis results from genome-wide CpG-island methylation profiling of multiple human tissue samples. We will show how unmethylated regions (UMRs) seem to be formed during early embryogenesis, not as a result of CpG-ness, but rather through the recognition of specific sequence motifs closely associated with transcription start sites. We will describe the computational methods used in the analysis, including methylation status calling, motif discovery, GO enrichment and machine learning techniques. We will introduce a new class of nonpromoter UMRs that become de novo methylated in a tissue-specific manner during development and present experimental validation of our findings. In short, we show that UMRs influence genome structure and have a dynamic role in development.

ISMB/ECCB 2009 Blog

HL48: Israel Steinfeld - Architecture of CpG methylation in the human genome

DNA methylation: modification of cytosines in CpG dinucleotides, maintained across cell divisions - Marcel Martin

CG dinucleotide content in HG: 1%, expected: 4.5% - Marcel Martin

CpG islands: regions on DNA that contain many CpGs. 28000 islands annotated in HG. almost all of them are near gene promoters - Marcel Martin
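The depletion noted above (about 1% observed versus roughly 4.5% expected) is usually quantified with the CpG observed/expected ratio. Under independence, the expected CpG count is (number of C × number of G) / length. Here is a minimal Python sketch. It assumes the classic Gardiner-Garden and Frommer criteria (length >= 200 bp, GC content > 50%, observed/expected > 0.6). The talk's island annotation may use different rules, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) of a DNA sequence.
    Expected CpG count under independence = (#C * #G) / length."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")  # "CG" occurrences cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden & Frommer style thresholds (assumed defaults)."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp
```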

mDIP: methyl-DNA immunoprecipitation assay, similar to ChIP-chip. 244k DNA methylation array - Marcel Martin

island methylation score (IMS): average signal over all probes mapped to the island. bimodal distribution. housekeeping genes are methylated (i.e., on one side of the distribution) - Marcel Martin

approx. 15 samples (different tissues). most islands are not methylated (~70%) - Marcel Martin

Brandeis M. et al., Sp1 elements protect a CpG island from de novo methylation, Nature 371, September 1994 - Marcel Martin

DRIM Discovering Rank Imbalanced Motifs: http://bioinfo.cs.technion.ac.il/drim - Marcel Martin
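DRIM ranks sequences (for example by methylation status) and looks for motifs whose occurrences are concentrated at the top of the list. To our understanding, it scores this imbalance with a minimum-hypergeometric (mHG) statistic: the smallest hypergeometric tail probability over all cutoffs of the ranked list. A minimal Python sketch of that score follows. The function names are hypothetical. The sketch also omits the correction for having minimised over cutoffs, which a real motif finder needs.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n of N items, B of which are hits."""
    return sum(comb(B, i) * comb(N - B, n - i)
               for i in range(b, min(B, n) + 1)) / comb(N, n)


def mhg(labels):
    """Minimum-hypergeometric score of a ranked 0/1 list
    (1 = the sequence at that rank contains the motif).
    Returns (smallest tail probability, cutoff at which it occurs)."""
    N, B = len(labels), sum(labels)
    best, best_cutoff, hits = 1.0, 0, 0
    for n in range(1, N + 1):
        hits += labels[n - 1]  # motif occurrences among the top n sequences
        p = hypergeom_tail(N, B, n, hits)
        if p < best:
            best, best_cutoff = p, n
    return best, best_cutoff
```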

use machine learning to distinguish between methylated and unmethylated islands - Marcel Martin

UMR: unmethylated region - Marcel Martin

Highlights Track: HL49
Robust simplifications of multiscale biochemical networks
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Andrei Zinovyev, Institut Curie, France
Presentation Overview:
Model reduction, i.e. simplifying complex models to simpler ones while preserving some important (dynamical) features, is a necessary technique in almost all projects on biochemical network modeling, but unfortunately there is not yet a solid methodology that can be systematically applied to systems biology models. Several model reduction approaches exist that are based on time scale separation. A textbook notion of the 'limiting' reaction can be applied to a few simple network structures. In this paper we show how the notion of the 'limiting' reaction rate can be applied to large and complex networks. The general view that we develop is that in a particular time window, only a small dominant subset of reactions determines the behaviour of the biochemical model, but this subset can change with time. We develop an algorithm of model reduction in linear networks, for which the dominant system is unique. We show that hierarchical linear biochemical networks are the only case in which the topology of the network completely determines its dynamical features. We develop a notion of the dominant subsystem for non-linear networks and demonstrate how it can be applied to simplify a complex model of NFkB signalling. Our approach generates a hierarchy of simplified models, some levels of which can be compared with existing models of NFkB signalling.
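The textbook 'limiting step' notion mentioned above can be checked on the simplest linear network: a cycle A1 -> A2 -> ... -> An -> A1 with first-order rate constants k_i. At steady state every reaction carries the same flux w, so c_i = w / k_i. Conservation of the total amount b then gives w = b / sum(1/k_i). When one constant is much smaller than the others, w is approximately b · k_min. The Python sketch below uses hypothetical rate constants and does not reproduce the authors' algorithm for general networks.

```python
def cycle_steady_state(ks, total=1.0):
    """Cycle A1 -> A2 -> ... -> An -> A1 where reaction i has rate k_i * c_i.
    At steady state all reactions carry the same flux w, so c_i = w / k_i,
    and sum(c_i) = total gives w = total / sum(1 / k_i)."""
    w = total / sum(1.0 / k for k in ks)
    return w, [w / k for k in ks]


ks = [100.0, 1.0, 1000.0, 50.0]  # hypothetical, well-separated constants
w, conc = cycle_steady_state(ks)
w_limiting = min(ks) * 1.0       # limiting-step approximation: w ~ k_min * total
# w is about 0.970 versus 1.0 from the approximation, and almost all of the
# material sits in the substrate of the slowest reaction (conc[1] is about 0.97).
```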

ISMB/ECCB 2009 Blog

HL49: Andrei Zinovyev - Robust simplifications of multiscale biochemical networks

model reduction: simplify models in order to understand - Ruchira S. Datta

e.g., "Consider a spherical cow..." A farmer hires a physicist to help with milk production; half a year later the physicist delivers a paper beginning thusly. - Ruchira S. Datta

complexity of data and complexity of models: non-identifiable models still have robust properties - Ruchira S. Datta

See e.g. Chen et al, Molecular Systems Biology 5:239, 2009 - Ruchira S. Datta

given only the order relations between the model parameters, can we provide robust first and second order solutions? - Ruchira S. Datta

biological systems are hierarchical and multiscale (an observation, not a theorem) - Ruchira S. Datta

structure: functional modules, motifs; scales: time scales, concentration scales - Ruchira S. Datta

this makes it possible to neglect small quantities in favor of larger ones, given proper theory - Ruchira S. Datta

asymptotic approximations in chemical kinetics: quasi-equilibrium (fast reactions), quasi-steady state, ..., quasistationary - Ruchira S. Datta

enzymatic catalysis in quasiequilibrium vs quasistationary approximations gives very different results - Ruchira S. Datta
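The difference noted above shows up in the basic enzyme mechanism E + S <-> C -> E + P, with constants k1, k_-1 and k2. The quasi-equilibrium approximation (Michaelis and Menten) gives the constant K = k_-1 / k1. The quasi-steady-state approximation (Briggs and Haldane) gives K_M = (k_-1 + k2) / k1. Both plug into v = k2·E0·S / (K + S). Here is a minimal Python sketch with hypothetical parameter values.

```python
def rate(S, E0, k2, K):
    """Michaelis-Menten-type rate law v = k2 * E0 * S / (K + S)."""
    return k2 * E0 * S / (K + S)


def qe_vs_qss(k1, k_minus1, k2, S, E0=1.0):
    """Rates under the two approximations for E + S <-> C -> E + P."""
    K_qe = k_minus1 / k1            # quasi-equilibrium: binding step is fast
    K_qss = (k_minus1 + k2) / k1    # quasi-steady state for the complex C
    return rate(S, E0, k2, K_qe), rate(S, E0, k2, K_qss)


# Hypothetical parameters: catalysis much faster than unbinding.
v_qe, v_qss = qe_vs_qss(k1=1.0, k_minus1=0.1, k2=10.0, S=1.0)
# v_qe is about 9.09 and v_qss about 0.90, a roughly tenfold difference.
# With k2 << k_minus1 (e.g. k_minus1=10.0, k2=0.01) the two nearly coincide.
```

The ordering of the rate constants decides which approximation is appropriate. This is the kind of order relation between parameters that the talk builds on.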

Highlights Track: HL50
Rapid sampling of molecular motion with prior information constraints: Insights into channel gating and domain swapping
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Ora Schueler-Furman, The Hebrew University of Jerusalem, Israel
Presentation Overview:
Proteins are active, flexible machines that perform a range of different functions. Innovative experimental approaches may now provide limited partial information about conformational changes along motion pathways of proteins. There is therefore a need for computational approaches that efficiently incorporate prior information into motion prediction schemes. We present PathRover, a framework designed for the integration of prior information into the motion-planning algorithm of Rapidly-exploring Random Trees (RRT). Each suggested motion pathway comprises a sequence of low-energy, clash-free conformations that satisfy a number of prior information constraints, derived from experimental data or from expert intuition. The incorporation of prior information in an outright fashion narrows down the vast search in the typically high-dimensional conformational space, leading to dramatic reduction in running time. Hybridization of low-energy pathways is then performed using a novel algorithm for the efficient alignment and comparison of molecular motion pathways (similar to string matching algorithms). The suggested framework can serve as an effective, complementary tool for Molecular Dynamics, Normal Mode Analysis, and other prevalent techniques for predicting motion in proteins. We used PathRover to explore in detail molecular motions of domain swapping, substrate binding, and ion channel gating.
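The core Rapidly-exploring Random Tree loop that PathRover builds on can be sketched in two dimensions. In this Python sketch, a single validity predicate stands in for the low-energy, clash-free and prior-information checks. The toy 'wall' constraint, the step size and all names are hypothetical, and real conformational spaces are high-dimensional. Invalid extensions are simply rejected, which is how constraints prune the search.

```python
import math
import random


def rrt(start, goal, is_valid, step=0.5, goal_tol=0.5, max_iter=10000,
        bounds=(0.0, 10.0), seed=0):
    """Basic Rapidly-exploring Random Tree in a 2D box.
    is_valid(point) stands in for 'low energy, clash free and consistent with
    the prior-information constraints'. Returns a list of points from start
    to (near) goal, or None if no path was found within max_iter samples."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]
    for _ in range(max_iter):
        # Random sample in the box, with a 10% bias towards the goal.
        if rng.random() < 0.1:
            target = goal
        else:
            target = (rng.uniform(*bounds), rng.uniform(*bounds))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        t = min(step, d) / d  # move at most `step` towards the sample
        new = (near[0] + t * (target[0] - near[0]),
               near[1] + t * (target[1] - near[1]))
        # Only the new node is checked; a real planner would also check the edge.
        if not is_valid(new):
            continue  # reject extensions that violate the constraints
        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:  # walk back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


def free(p):
    """Hypothetical constraint: a wall at 4 <= x <= 6 blocks everything below y = 7."""
    x, y = p
    return not (4.0 <= x <= 6.0 and y < 7.0)


path = rrt((1.0, 1.0), (9.0, 1.0), free)
```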

ISMB/ECCB 2009 Blog

HL50: Ora Schueler-Furman - Rapid sampling of molecular motion with prior information constraints: Insights into channel gating and domain swapping

Highlights Track: HL51
GraphWeb: functional analysis of genomic networks
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: Victoria Hall
Presenting author: Jüri Reimand, University of Tartu, Estonia
Presentation Overview:
Deciphering heterogeneous cellular networks of transcriptional regulation, protein interactions and metabolism is a great challenge of current systems biology. Such networks contain modules of interacting genes and proteins that potentially share regulatory mechanisms and common function. Here, we present GraphWeb (http://biit.cs.ut.ee/graphweb/), a web server for network analysis and module discovery. GraphWeb (i) integrates heterogeneous and multispecies data into networks; (ii) discovers topological network modules and (iii) interprets the modules using Gene Ontology, pathways, regulatory motifs and microRNA targets.

ISMB/ECCB 2009 Blog

HL51: Jüri Reimand - GraphWeb: functional analysis of genomic networks

A tool for large, genome-scale networks is applied to yeast data. - Roland Krause

The modular nature of molecular cell biology can be seen in today's large networks. A broad overview of the different, complex networks. - Roland Krause

Many biological entities can be seen as networks, e.g. proteins and genes, measured by DNA binding, physical and genetic interactions, which can be combined into heterogeneous networks. - Roland Krause

Edges are not born equal, we need edge-weights, easy in gene expression networks. SAGA and SWI/SNF example. Assign local weights to rank edges in the network - Roland Krause

Data sets are not born equal. Small-scale data sets are more reliable than global approaches. - Roland Krause

# Not sure I agree fully here, but he is merely motivating the weights approach. - Roland Krause

Some data sets are trusted more than others. - Roland Krause

Different scores for global and local weights can be combined. - Roland Krause

Hairball. - Roland Krause

Everything is interconnected, no way to interpret. Needs to be dissected. - Roland Krause
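The weighting idea in these notes scales local edge weights by how much each dataset is trusted, and the resulting hairball is then dissected into modules. Both steps can be sketched as follows. The combination rule (local weight × dataset trust, summed over datasets), the threshold and the use of connected components as 'modules' are hypothetical simplifications. They are not GraphWeb's actual scoring or module-discovery algorithms. The protein names are also made up.

```python
from collections import defaultdict


def weighted_modules(edges, dataset_trust, min_weight):
    """edges: (u, v, local_weight, dataset) tuples; dataset_trust: dataset -> weight.
    Hypothetical scheme: combined edge weight = sum over datasets of
    local_weight * dataset_trust. Edges below min_weight are dropped and the
    connected components of what remains are returned as 'modules'."""
    weight = defaultdict(float)
    for u, v, local, ds in edges:
        if u != v:  # ignore self-interactions in this sketch
            weight[frozenset((u, v))] += local * dataset_trust[ds]
    adj = defaultdict(set)
    for pair, w in weight.items():
        if w >= min_weight:
            u, v = tuple(pair)
            adj[u].add(v)
            adj[v].add(u)
    seen, modules = set(), []
    for start in list(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        modules.append(component)
    return modules


edges = [
    ("SAGA1", "SAGA2", 0.9, "small_scale"),
    ("SAGA2", "SAGA3", 0.8, "small_scale"),
    ("SWI1", "SWI2", 0.7, "small_scale"),
    ("SAGA3", "SWI1", 0.9, "global_screen"),
]
trust = {"small_scale": 1.0, "global_screen": 0.3}  # hypothetical trust levels
modules = weighted_modules(edges, trust, min_weight=0.5)
```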

Highlights Track: HL52
Characterizing transcriptome plasticity using whole-genome tiling arrays and machine learning
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Georg Zeller, Friedrich Miescher Laboratory of the Max Planck Society, Germany
Presentation Overview:
Currently, whole-genome tiling arrays are still a cost-effective technology to quantitatively monitor transcriptomes. We present machine learning methods for data normalization and de novo transcript identification and show substantially improved accuracy of the resulting predictions compared to competing methods. Application to Arabidopsis tiling array data revealed thousands of new transcripts missing in current annotations, including ones with a stress-dependent expression pattern. We moreover characterized the transcriptomes of mutants impaired in various steps of RNA processing.

ISMB/ECCB 2009 Blog

HL52: Georg Zeller - Characterizing transcriptome plasticity using whole-genome tiling arrays and machine learning

Motivation: find all genes in the model plant Arabidopsis thaliana. - Gabriele Sales

profile transcriptome changes under stress - Michael Kuhn

tiling arrays: cost-effective, but noisy - Michael Kuhn

normalization pipeline: background correction to reduce imaging artifacts - Michael Kuhn

quantile normalization: comparison between arrays - Michael Kuhn

Normalization: background correction, quantile normalization between arrays, transcript normalization (to remove probe sequence bias) - Gabriele Sales

transcript normalization: probe sequence bias reduced - Michael Kuhn

Not all exon probes show the same level. - Gabriele Sales

assume that ideal transcripts have constant level of expression, try to predict deviation between actual and ideal signal intensities - Michael Kuhn

Sequence-based quantile normalization: exon and intron sequences are normalized to 2 distinct levels - sebi
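Standard quantile normalization between arrays appears in the normalization pipeline above. It replaces each value by the mean of the values with the same rank across arrays, so all arrays share one intensity distribution. Here is a minimal Python sketch, assuming equally long arrays and breaking ties by position. The sequence-based and transcript normalization steps from the talk are not shown.

```python
def quantile_normalize(arrays):
    """arrays: list of equally long lists of probe intensities (one per array).
    Each value is replaced by the mean, across arrays, of the values with the
    same rank, so every array ends up with the same distribution.
    Ties are broken by position (no tie averaging)."""
    n = len(arrays[0])
    orders = [sorted(range(n), key=lambda i: a[i]) for a in arrays]
    rank_means = [sum(a[order[r]] for a, order in zip(arrays, orders)) / len(arrays)
                  for r in range(n)]
    result = []
    for order in orders:
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = rank_means[r]  # value of rank r goes back to its probe
        result.append(out)
    return result


normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```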

Highlights Track: HL53
Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Chris Taylor, EMBL-EBI, United Kingdom
Presentation Overview:
Minimum information checklists specify the information that should be provided when reporting research. They promote transparency and data accessibility, and support more thorough quality assessment, increasing the value of data sets and, by extension, the competitiveness of the originators and the host database(s). However, with no mechanisms to coordinate checklist development, establishing the number of extant checklists and tracking their evolution were both challenging exercises. Furthermore, overlaps in scope between checklists and arbitrary decisions on wording and structure almost guaranteed significant incompatibilities. Consequently, representatives of several checklist development projects began the MIBBI (Minimum Information for Biological and Biomedical Investigations) project (http://mibbi.org/). MIBBI has two broad goals: to provide access to checklists and their developers (a 'one-stop shop'), and to foster the development of a new, integrated checklist 'suite' by the participant communities. Since publication last year, the MIBBI project has found increasing favour with both journals and funders. The Portal now lists twenty-nine projects and the Foundry is generating new modules. Tools are also emerging to help researchers follow MIBBI guidelines. Overall, we have much of interest to the bioinformatics community: as significant consumers of research data, its members stand to benefit greatly from the standardisation that MIBBI is encouraging.

ISMB/ECCB 2009 Blog

HL53: Chris Taylor - Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project

Standards are hugely dependent on their respective communities - Allyson Lister

Many biological experiments fit an ISA (Investigation-Study-Assay) structure. - Allyson Lister

You should then use three types of standards: syntax (FuGE, ISA-TAB etc), semantics, and scope. MIBBI is all about scope. - Allyson Lister

Why do we care about standards? Data exchange, comprehensibility, and scope for reuse. - Allyson Lister

"Metaprojects": FuGE, OBI, ISA-TAB - draw together many different domains and present in structure/semantics useful across all. - Allyson Lister

When the independent MI projects overlap, arbitrary decisions on wording and substructuring make integration difficult. This makes it hard to take parts of different guidelines - not very modular. This is what MIBBI helps with. - Allyson Lister

MIBBI promotes gradual integration of checklists. - Allyson Lister

Nature Biotechnology 26, 889 - 896 (2008) doi:10.1038/nbt0808-889 - Allyson Lister

Blog post: http://themindwobbles.wordpress.com/2009/07/02/hl53-promoting-coherent-minimum-reporting-guidelines-for-biological-and-biomedical-investigations-the-mibbi-project-ismb-2009/ - Allyson Lister

Highlights Track: HL54
Search and discovery of recurring patterns with interactomes
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: Victoria Hall
Presenting author: Mona Singh, Princeton University, United States
Presentation Overview:
Searching for recurring patterns in biological data has been the backbone of much research and analysis in bioinformatics. For example, within the realm of sequence analysis, the search for recurring or similar patterns has given rise to extensive work on sequence alignments and sequence motif discovery, and has resulted in large sequence motif libraries. Our goal is to begin to develop the analogous techniques for biological networks, by building a framework for searching and mining interaction networks in order to reveal and systematize the recurring protein and interaction patterns within them. I will describe:
(1) Our formalism, network schemas, for representing recurring patterns within interactomes. Network schemas specify descriptions of proteins (e.g., their molecular functions or putative domains) and the topology of interactions among them. They can describe domain-domain interactions, signaling and regulatory pathways or more complex network patterns.
(2) Our fast algorithms for searching for user-supplied network schemas in arbitrary biological networks.
(3) Our framework for systematically uncovering recurring, over-represented schemas in physical interaction networks.
(4) An application of our methods to the yeast and human interactomes, where we identify hundreds of recurring and over-represented network schemas of various complexity and provide several lines of evidence of their functional importance.
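A network schema with a linear topology (a chain of protein descriptions) can be searched for with a simple depth-first enumeration. The Python sketch below is a hypothetical illustration only. It handles linear schemas only, uses made-up protein names and annotations, and is far less efficient than the algorithms described in the talk.

```python
from collections import defaultdict


def find_path_schema(edges, labels, schema):
    """Find all simple paths whose i-th protein carries annotation schema[i].
    edges: iterable of (u, v) interactions; labels: protein -> set of
    annotations (e.g. domains); schema: list of annotations (linear pattern)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    matches = []

    def extend(path):
        if len(path) == len(schema):
            matches.append(tuple(path))
            return
        wanted = schema[len(path)]
        for nxt in adj[path[-1]]:
            if nxt not in path and wanted in labels.get(nxt, ()):
                extend(path + [nxt])

    for protein, annotations in labels.items():
        if schema and schema[0] in annotations:
            extend([protein])
    return matches


# Hypothetical toy interactome.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")]
labels = {"P1": {"kinase"}, "P2": {"SH2"}, "P3": {"kinase"},
          "P4": {"TF"}, "P5": {"TF"}}
hits = find_path_schema(edges, labels, ["SH2", "kinase", "TF"])  # [("P2", "P3", "P4")]
```

A symmetric schema such as kinase-SH2-kinase is reported once in each direction.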

ISMB/ECCB 2009 Blog

HL54: Mona Singh - Search and discovery of recurring patterns with interactomes

Hairballs. - Roland Krause

Different large scale data sets, genetic, phosphorylation etc. exist but how to interpret? - Roland Krause

Add protein annotations: sequence, structure, motifs, domains, functional characterization. - Roland Krause

Particularly interested in interaction domains, analyzing cellular organization and interactomes. Can we discover and analyze recurring patterns? - Roland Krause

Analogy to multiple alignment of protein sequences with PROSITE patterns - Roland Krause

For network schemas, nodes are descriptions of proteins, e.g. domains. An extension of the network by attributes, e.g. PFAM or PROSITE. - Roland Krause

Examples of network schemas are homologous pathways. - Roland Krause

NetGrep has a visual interface. - Roland Krause

No details of the algorithm, tricky problem but can solve it fast. - Roland Krause

Triangles, quad topology (linear combination of four), Y star topology of four. - Roland Krause

Highlights Track: HL55
The Computational Exploration of (Alternative) Splicing Mechanisms
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Michael Sammeth, CRG/IMIM/UPF, Spain
Presentation Overview:
The recent advent of high-throughput methods that allow for sequencing the whole RNA complement of cell populations has given a first impression of the number of novel exons and spliceforms that are expected to be added to transcriptome annotations over the next years. In this respect, alternative splicing (AS) is increasingly coming to the fore as the molecular mechanism mainly responsible for creating the plethora of different combinations (which ultimately account for organism complexity) from the limited reservoir of genetically inherited information. In order to keep pace with the flood of data, we have developed a generic model of AS based on a universal definition of its atomic unit, called 'event', and a method that can efficiently retrieve all such events from huge datasets (i.e., millions of annotations) containing a high level of noise (i.e., sequencing errors). In the highlighted publication we focus on comparing these generic events to the ones that have usually been considered in computational analyses across 12 different species. Additionally, we now show several examples in which the characterization of such events provides bona fide propositions for kinetic models to explain the molecular mechanism of splicing in general.

ISMB/ECCB 2009 Blog

HL55: Michael Sammeth - The Computational Exploration of (Alternative) Splicing Mechanisms

Numbering all possible splice variant sites, assign ASCII symbol to each site, "alternative splicing code" that captures all differences between alternative splicing variation - sebi

ASTALAVISTA web service generates these strings at http://genome.imim.es/astalavista - sebi
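The idea is to number the splice sites within an event and mark each one as an acceptor ('-') or donor ('^'). It can be sketched in Python. This is a simplified, hypothetical re-implementation for forward-strand coordinates. The real ASTALAVISTA code also covers transcript starts and ends and has its own conventions, so the ordering rule here is an assumption. With it, exon skipping comes out as 0,1-2^, which to our understanding matches the published code for that event.

```python
def as_code(variants):
    """variants: one list per splice variant of (position, kind) sites inside
    the event, kind '-' for an acceptor and '^' for a donor.
    Sites are numbered 1..n by genomic position over all variants; each
    variant becomes the concatenation of its site numbers and symbols,
    or '0' if it has no site inside the event."""
    positions = sorted({pos for v in variants for pos, _ in v})
    number = {pos: i + 1 for i, pos in enumerate(positions)}
    coded = []
    for v in variants:
        sites = [(number[pos], kind) for pos, kind in sorted(v)]
        code = "".join(f"{n}{kind}" for n, kind in sites) or "0"
        coded.append(([n for n, _ in sites], code))
    coded.sort()  # assumed ordering: '0' first, then by site numbers
    return ",".join(code for _, code in coded)


# Hypothetical coordinates on the forward strand.
print(as_code([[], [(200, "-"), (300, "^")]]))  # exon skipping: 0,1-2^
print(as_code([[(300, "-"), (400, "^")], [(100, "-"), (200, "^")]]))  # 1-2^,3-4^
```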

ESTs can be truncated, and still splicing events can be detected as a sub-structure - sebi

"Bubbles" are cyclic graphs, complete events of splicing. Can construct bubble hierarchies for very complex splicing loci. - sebi

Workflow: Gene annotation -> database of events -> visualization with ASTALAVISTA; optionally add RNA-Seq data for expression levels of events (even simulated data from the Flux Capacitor) - sebi

Highlights Track: HL56
Nominalization and alternations in the language of molecular biology: Implications for text mining
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Kevin Cohen, University of Colorado School of Medicine, United States
Presentation Overview:
We present data on the understudied phenomenon of nominalization in the language of molecular biology and demonstrate its implications for the design of biomedical text mining systems.

ISMB/ECCB 2009 Blog

HL56: Kevin Cohen - Nominalization and alternations in the language of molecular biology: Implications for text mining

nominalizations are dominant in biomedical texts, e.g. "expression" is much more used than "express" - Michael Kuhn

Cohen et al, 2008 [not a very useful reference]: nominalizations are more difficult to handle than verbs, but can yield higher precision - Michael Kuhn

http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0003158 - Michael Kuhn

alternation: variations like active/passive. much less characterized for nouns than for verbs - Michael Kuhn

e.g. pre-nominal arguments: agent: "phenobarbital treatment" or patient: "cancer treatment" - Michael Kuhn

previous work has tried to handle nominalizations, e.g. Ono et al (2001): interactions, association, complex and binding - Michael Kuhn

genescene: http://doi.ieeecomputersociety.org/10.1109/JCDL.2003.10011 [?] - Michael Kuhn

use in-house tool to do annotation of nominalizations - Michael Kuhn

alternations are very diverse, contrary to previous prediction that there would only be a limited number of alternations in scientific literature (sub-language model) - Michael Kuhn

Highlights Track: HL57
FunCoup: global networks of functional coupling in eukaryotes
Thursday, July 2 - 11:45 a.m. - 12:10 p.m.
Room: Victoria Hall
Presenting author: Erik Sonnhammer, Stockholm Bioinformatics Centre, Sweden
Presentation Overview:
Interactomes computationally predicted via data integration are becoming an increasingly popular tool and context for biological research. However, merging disparate data sources and presenting relevant parts of a global network is not trivial. FunCoup, an optimised Bayesian framework and a web resource, was developed to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex membership, and metabolic or signalling links. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimise the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 datasets in seven organisms, and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. The networks, which are the largest interactome reconstructions to date, are freely available for download and query at http://FunCoup.sbc.su.se. The site allows detailed graphical and tabular analysis of subnetworks around query genes, as well as comparative analysis of orthologous networks in multiple species.

ISMB/ECCB 2009 Blog

HL57: Erik Sonnhammer - FunCoup: global networks of functional coupling in eukaryotes

How to reconstruct networks - experimental networks are incomplete. - Roland Krause

Up to 300,000 interactions are proposed for human - Roland Krause

only 35,000 known - Ruchira S. Datta

Each experimental method can give you more than 20% of the interactions. - Roland Krause

experiments have high false negative and false positive rates - Ruchira S. Datta

e.g., false positives from in vitro experiments: the interaction may never happen in a living cell - Ruchira S. Datta

Interactions have to be combined and evaluated. - Roland Krause

FunCoup - Venkata P. Satagopam

there are many kinds of evidence for functional coupling - Ruchira S. Datta

Lots of evidence for functional coupling, not only from PPI but also from localization, gene expression, interacting domains, TFBS, miRNAs. - Roland Krause
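Combining many kinds of evidence is commonly done with a naive Bayesian scheme. Each evidence type contributes a log-likelihood ratio log(P(e | coupled) / P(e | not coupled)), and the ratios are summed as if the evidence types were independent. A prior then turns the sum into a posterior probability. The Python sketch below shows only that generic idea, with hypothetical likelihood tables. FunCoup's optimised framework and its transfer of evidence between orthologs are not reproduced.

```python
import math


def combined_llr(evidence, likelihoods):
    """Naive-Bayes evidence combination.
    evidence: evidence type -> observed bin, e.g. {"coexpression": "high"}.
    likelihoods: evidence type -> bin -> (P(bin | coupled), P(bin | not coupled)).
    Returns the sum of log-likelihood ratios, treating evidence types as independent."""
    total = 0.0
    for etype, observed in evidence.items():
        p_coupled, p_not = likelihoods[etype][observed]
        total += math.log(p_coupled / p_not)
    return total


def posterior(llr, prior):
    """Posterior probability of coupling from a summed LLR and a prior."""
    log_odds = math.log(prior / (1.0 - prior)) + llr
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood tables, e.g. estimated from gold-standard pairs.
likelihoods = {
    "coexpression": {"high": (0.4, 0.1), "low": (0.6, 0.9)},
    "colocalization": {"same": (0.8, 0.4), "different": (0.2, 0.6)},
}
llr = combined_llr({"coexpression": "high", "colocalization": "same"}, likelihoods)
p = posterior(llr, prior=0.5)  # LLR = ln 4 + ln 2 = ln 8, so p = 8/9
```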

Highlights Track: HL58
A new method for high-resolution gene expression analysis
Thursday, July 2 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Caroline Friedel, Ludwig Maximilians - University Munich, Germany
Presentation Overview:
We present a novel approach for measuring both RNA transcription and decay in a single experimental setting. We show that this approach increases the sensitivity for differentially expressed genes and temporal kinetics of transcriptional regulation. Furthermore, alterations in transcription and decay can be distinguished and rates of RNA turnover can be determined with superior accuracy. This provides new insights into gene regulation and is important for quantitative systems biology modelling.
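Measuring newly transcribed RNA alongside total RNA makes it possible to estimate decay rates under simple assumptions. Suppose a transcript is at steady state and decays with first-order rate λ. The fraction of it synthesised during a labeling period of length t is then new/total = 1 - exp(-λt). So λ = -ln(1 - new/total) / t, and the half-life is ln 2 / λ. The Python sketch assumes immediate, uniform labeling and correctly normalised new/total ratios. The function names are hypothetical.

```python
import math


def decay_rate(new_fraction, labeling_time):
    """First-order decay rate from the fraction of newly transcribed RNA
    after labeling for labeling_time, assuming steady state:
    new/total = 1 - exp(-lambda * t)  =>  lambda = -ln(1 - new/total) / t."""
    return -math.log(1.0 - new_fraction) / labeling_time


def half_life(new_fraction, labeling_time):
    """RNA half-life (same time unit as labeling_time)."""
    return math.log(2) / decay_rate(new_fraction, labeling_time)


# Hypothetical measurement: half of the transcript is newly made after 60 min,
# which implies a half-life of 60 min.
t_half = half_life(0.5, 60.0)
```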

ISMB/ECCB 2009 Blog

HL58: Caroline Friedel - A new method for high-resolution gene expression analysis

Metabolic tagging of newly transcribed RNA - Diego M. Riaño-Pachón

Able to measure decay and de novo synthesis - Diego M. Riaño-Pachón

this method tries to solve a long-standing problem in transcriptomics, - Diego M. Riaño-Pachón

i.e., bias for differential expression of short-lived RNAs - Diego M. Riaño-Pachón

and thus in subsequent analysis - Diego M. Riaño-Pachón

solution is to perform gene expression analysis of newly transcribed RNA - Diego M. Riaño-Pachón

using 4-thiouridine - Diego M. Riaño-Pachón

Kenzelmann et al PNAS 2007 http://www.ncbi.nlm.nih.gov/pubmed/17405863 - Diego M. Riaño-Pachón

after tagging, normal analysis techniques for microarray or RNA-seq can be used - Diego M. Riaño-Pachón

but of course new methods could exploit features specific to the tagging of newly transcribed RNA - Diego M. Riaño-Pachón

Highlights Track: HL59
Comparative community assessments for applied biomedical text mining: BioCreative II challenge and metaservices.
Thursday, July 2 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Florian Leitner, Spanish National Cancer Research Centre (CNIO), Spain
Presentation Overview:
Comparative evaluation of computational tools applied to biological data is crucial for monitoring improvements over time and identifying competitive strategies, while making the tools available through service encapsulation and unification promotes their use. In the interest of biomedical text mining, the BioCreative initiative has contributed substantially to the assessment of text mining tools applied to biologically relevant tasks. The second BioCreative initiative not only had a considerable impact on the development of text mining systems for the extraction of biological entities and protein interactions, but also motivated the implementation of the first text mining meta-server for biology, the BioCreative MetaServer (BCMS). The BCMS is a key infrastructure for the upcoming BioCreative II.5 event, where text mining developers, biological annotation databases, article authors and publishers contribute their efforts to improving information access to full-text articles.
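Community assessments such as BioCreative compare systems against gold-standard annotations. They typically report precision, recall and their harmonic mean (F-score). Here is a minimal Python sketch using exact matching of annotation tuples. The tuple formats are hypothetical, and the official evaluations have task-specific matching rules not shown here.

```python
def prf(predicted, gold):
    """Precision, recall and F-score for sets of annotations, e.g.
    (doc_id, start, end) mention spans or (protein_a, protein_b) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # exact matches only
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


gold = {("d1", 0, 5), ("d1", 10, 15), ("d2", 3, 8)}
predicted = {("d1", 0, 5), ("d2", 3, 8), ("d2", 20, 25), ("d2", 30, 33)}
p, r, f = prf(predicted, gold)  # 0.5, 2/3, 4/7
```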

ISMB/ECCB 2009 Blog

HL59: Florian Leitner - Comparative community assessments for applied biomedical text mining: BioCreative II challenge and metaservices.

Highlights Track: HL60
Studying alternative splicing regulatory networks through partial correlation analysis
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: Victoria Hall
Presenting author: Liang Chen, University of Southern California, United States
Presentation Overview:
Alternative pre-mRNA splicing is an important gene regulation mechanism for expanding proteomic diversity in higher eukaryotes. The rapid accumulation of high-throughput data provides an unprecedented opportunity to understand the complicated alternative splicing regulatory network. However, existing statistical and computational methods still lag behind the advanced technologies. Sorting out the coordinated and combinatorial alternative splicing regulatory network poses a major challenge for the post-genomic era. We developed statistical methods to derive signals from high-throughput exon array data or high-throughput sequencing data to understand the "splicing codes" in gene regulation. Partial correlation analysis was proposed to identify the association links between co-spliced exons and links between alternative exons and their regulators. The reconstructed splicing regulatory networks can help us better understand the coordinated and combinatorial nature of alternative splicing regulation.
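Partial correlation asks whether two variables (for example the inclusion levels of two exons) remain correlated after removing the effect of a third (for example a splicing regulator). Here is a minimal Python sketch of the first-order case, r_xy.z = (r_xy - r_xz·r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), with constructed toy data. The talk's method, which conditions on many variables, is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Constructed toy data: z drives both x and y; the added deviations are
# orthogonal to z and to each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]   # z + [1, -1, -1, 1, 1, -1, -1, 1]
y = [2, 3, 2, 3, 4, 5, 8, 9]   # z + [1, 1, -1, -1, -1, -1, 1, 1]
```

In the toy data, x and y share the driver z. Their raw correlation is 0.84, yet their partial correlation given z is zero, because the link is explained entirely by z.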

ISMB/ECCB 2009 Blog

HL60: Liang Chen - Studying alternative splicing regulatory networks through partial correlation analysis


Highlights Track: HL61
Global Measures of Uncertainty: Long Overdue in Computational Molecular Biology
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Lee Newberg, Wadsworth Center, United States
Presentation Overview:
High-dimensional (high-D) discrete prediction and estimation problems are arguably the most common inference problems in computational biology, ranging from sequence alignment to network inference. Regardless of the procedure employed, when it delivers a single answer, that answer is a point estimate selected from the set of all possible solutions, the solution ensemble. In high-D discrete spaces, the immense size of these ensembles almost always portends considerable uncertainty in estimation. Nevertheless, our field has paid little attention to global measures of the uncertainty of these estimates. Specifically, the almost complete absence of confidence limits is a major oversight of our community, most embarrassingly including me. In this talk I describe credibility limits, i.e., Bayesian confidence limits (Webb), and two procedures for efficiently obtaining them: a sampling-based procedure (Webb) and a more general dynamic programming (DP) algorithm (Newberg). This intentionally provocative talk will focus on the statistical underpinnings of prediction and estimation in discrete high-D spaces, the glaring need to delineate the uncertainty of estimates in these spaces, credibility limits for this delineation, the value of such measures in comparing estimators in the absence of "gold standards", and an illustration of these principles on sequence alignment.
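Neither procedure is detailed in the abstract. As a minimal sketch of the idea behind a sampling-based credibility limit, assuming we already have solutions drawn from the posterior, one can report the smallest radius around the point estimate that contains a chosen fraction (here 95%) of the samples. This is an illustration of the concept, not Webb's or Newberg's algorithm; Hamming distance between binary-encoded solutions is a hypothetical choice, and for sequence alignment a distance between alignments would take its place.

```python
import math

def hamming(a, b):
    """Number of positions at which two equal-length solutions differ."""
    return sum(x != y for x, y in zip(a, b))

def credibility_radius(point, samples, level=0.95, distance=hamming):
    """Smallest radius r such that at least `level` of the sampled solutions
    lie within distance r of the point estimate (an empirical quantile)."""
    dists = sorted(distance(point, s) for s in samples)
    k = math.ceil(level * len(dists))  # number of samples the ball must contain
    return dists[max(k, 1) - 1]

# Hypothetical posterior samples over 6-bit solutions.
unimodal = ["000000"] * 90 + ["000001"] * 10
bimodal = ["000000"] * 50 + ["111111"] * 50
print(credibility_radius("000000", unimodal))  # 1
print(credibility_radius("000000", bimodal))   # 6
```

The same point estimate gets a tight limit when the posterior is concentrated and a limit spanning the whole space when the posterior is bimodal, which is exactly the information a single score or E-value does not convey.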

ISMB/ECCB 2009 Blog

HL61: Lee Newberg - Global Measures of Uncertainty: Long Overdue in Computational Molecular Biology


two papers: webb-robertson & lawrence; newberg ... lawrence - Michael Kuhn

what is a good e-value? - Diego M. Riaño-Pachón

E- and p-values are small when random data is unlikely to do well - Michael Kuhn

E-values and p-values tell you about random data but not whether there are other solutions. - Roland Krause

but there still might be other solutions with equally good E- or p-values - Michael Kuhn

need to define confidence / credibility limits - Michael Kuhn

e-values and p-values are poor proxies to credibility - Diego M. Riaño-Pachón

an example from RNA 2D prediction, MFE is not the best representative - Diego M. Riaño-Pachón

today's goal: compute a global measure of representativeness of a point estimate - Michael Kuhn

solution spaces are immense, but we often choose a particular point estimate... misleading e.g. if there's a bimodal solution - Michael Kuhn
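For the E-value/p-value exchange above: in the Karlin-Altschul statistics used for local sequence alignment scores, the E-value of a score S is the expected number of chance hits scoring at least S when comparing sequences of lengths m and n (K and λ depend on the scoring system), and the p-value is the probability of at least one such hit:

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad p = 1 - e^{-E}
```

For small E the two are nearly equal (E = 0.01 gives p ≈ 0.00995). Both are statements about random sequences only, which is why they cannot say whether other, equally good solutions exist on the real data.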

Highlights Track: HL62
A Complete Neandertal Mitochondrial Genome Sequence Determined by High-Throughput Sequencing
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Richard E. Green, Max Planck Institute for Evolutionary Anthropology, Germany
Presentation Overview:
Recent advances in high-throughput sequencing have opened new vistas in ancient DNA genomics. Following these advances, we have embarked on an effort to retrieve and assemble the genome of our closest extinct relative, the Neandertal. As a prelude to the complete nuclear genome, we have recovered, sequenced and assembled several complete mitochondrial genomes from Neandertal fossil bones. The extreme depth of sequencing coverage in these assemblies allows for a comprehensive and qualitative assessment of the issues inherent in ancient DNA sequencing, alignment, and assembly. We encoded this knowledge into a custom assembler for ancient DNA. Using this assembler, we generated six complete Neandertal mtDNA genomes. Comparison among Neandertals and with human and other great ape mtDNA sequences confirms that the Neandertal mtDNA lineage diverged roughly 600,000 years ago. Furthermore, these data reveal low genetic diversity among late Neandertals and imply an effective population size of fewer than 3,500 females.
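The abstract does not give read counts, but the notion of sequencing depth can be made concrete with the Lander-Waterman expectation C = N·L/G (N reads of length L over a genome of length G) and the Poisson estimate e^(-C) for the fraction of bases left uncovered. A minimal sketch, assuming the human mtDNA reference length (16,569 bp) as a stand-in for the Neandertal mitochondrial genome, short reads like the roughly 60 nt fragments mentioned in the discussion, and a hypothetical read count; real ancient DNA coverage is uneven, so the Poisson model is only a rough guide:

```python
import math

MTDNA_LENGTH = 16569  # assumption: human mtDNA reference (rCRS) length in bp

def mean_coverage(n_reads, read_length, genome_length=MTDNA_LENGTH):
    """Lander-Waterman expected depth C = N * L / G."""
    return n_reads * read_length / genome_length

def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model, e^-C."""
    return math.exp(-coverage)

# Hypothetical run: 10,000 mtDNA reads of about 60 nt each.
c = mean_coverage(10_000, 60)
print(f"depth ~{c:.1f}x, expected uncovered fraction {fraction_uncovered(c):.2e}")
```

Because mtDNA is present in many copies per cell, even a modest share of mitochondrial reads in an ancient DNA library can translate into the deep, repeated coverage that makes systematic error analysis possible.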

ISMB/ECCB 2009 Blog

HL62: Richard Green - A Complete Neandertal Mitochondrial Genome Sequence Determined by High-Throughput Sequencing


Neandertals: closest extinct relative. existed 400000-30000 yrs ago - Marcel Martin

chimps are closest living relatives, but diverged longer ago - Marcel Martin

first DNA from extinct species was in 1985 from the Quagga - Marcel Martin

Neandertal mitochondrial genome, as it is present at about 1000 copies per cell vs 2 copies of nuclear DNA - sebi

mitochondrial genome is easier to recover since there are many copies per cell - Marcel Martin

http://www.mitomap.org/WorldMigrations.pdf - Marcel Martin

Mitochondrial genome is useful for tracking maternal lineages, and it accumulates mutations slowly -- ideal for building trees - sebi

seems like there was no interbreeding between Neandertals and modern humans - Marcel Martin

Deep sequencing with high-throughput next-generation sequencing; this used to be done by direct PCR - sebi

ancient DNA fragments are just 60nt in length - Marcel Martin