Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

TransMed: Translational Medical Informatics

COSI Track Presentations

Schedule subject to change
Sunday, July 8th
10:15 AM-10:20 AM
TransMed: Introduction
Room: Columbus IJ
  • Venkata Satagopam, University of Luxembourg
10:20 AM-10:30 AM
Opening talk on behalf of the organizing committee
Room: Columbus IJ
  • Reinhard Schneider, University of Luxembourg
10:30 AM-11:10 AM
Interpreting newborn genomes
Room: Columbus IJ

This presentation will cover three projects in interpretation of human genomic variation.

We have developed an analysis protocol whose distinctive features enabled solving clinical cases. Applied to exomes from newborn patients with undiagnosed primary immune disorders, it helped guide appropriate treatment and family genetic counseling, and helped families avoid a diagnostic odyssey.

This inspired a project that explores the feasibility of sequencing to augment or supersede mass spectrometry for pervasive public health newborn screening. We sequenced exomes from de-identified dried blood spots of nearly all newborns affected with any metabolic disorder screened by tandem mass spectrometry (MS/MS) in California from 2006 to 2013 (around 1300 out of around 4.45 million screened). Our preliminary analysis indicates that several affected individuals lack any obviously damaging mutations in genes responsible for their metabolic disorders. We also found some cases where exomes confidently implicated a disorder different from the original diagnosis by the metabolic center clinician, suggesting that sequencing information would have been valuable for proper clinical diagnoses in some cases. While still not sufficiently specific for screening of all disorders, exomes could facilitate timely and more precise clinical resolution for some disorders.

To conclude, I will briefly present results from The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\), a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation.

11:10 AM-11:20 AM
A systematic computational approach in translational medicine to integrate transcriptional profiles to clinical and structural changes for outcome prediction in Diabetic Kidney Disease
Room: Columbus IJ
  • Viji Nair, University of Michigan, United States
  • Claudiu Komorowsky, Universität Erlangen-Nürnberg, Germany
  • Jennifer Harder, University of Michigan, United States
  • Bradley Godfrey, University of Michigan, United States
  • Carine Boustany, Boehringer Ingelheim Pharmaceuticals Inc., United States
  • Kevin Lemley, University of Southern California Keck School of Medicine, United States
  • Robert Nelson, National Institute of Diabetes and Digestive and Kidney Diseases, United States
  • Matthias Kretzler, University of Michigan, United States

Diabetic kidney disease progression is the major cause of end-stage renal disease (ESRD). Clinicians currently rely on GFR and ACR, which are relatively inaccurate biomarkers, to evaluate disease progression. Integrating structural changes with molecular profiles using machine learning methods may identify functional correlates of early structural damage that predict subsequent disease progression.
Transcriptomic profiling and quantitative morphometric scoring were performed on kidney biopsies. Transcriptional co-expression modules were generated and associated with traits using Weighted Gene Coexpression Network Analysis. Pathway and network analyses were performed on the associated genes.
The tubulointerstitial damage measure, VvInt, showed strong association with molecular signatures from 4 modules. Inflammatory, cell-cell/cell-matrix interaction and metabolic pathways were enriched. A subset of these transcripts correlated with GFR and ACR measured ~10 years after biopsy.
An integrative translational approach captures mechanisms and biomarkers activated at a preclinical disease stage. 81% of these transcripts were also regulated in an advanced-stage diabetic cohort. This overlap suggests that the fingerprints in the early diabetic cohort might provide a crucial entry point for early intervention targets and novel noninvasive biomarkers with predictive clinical utility. A similar approach in the glomerular tissue and traits identified markers that could classify patients with and without ESRD (AUC = 0.86).

11:20 AM-11:40 AM
Proceedings Presentation: Association Mapping in Biomedical Time Series via Statistically Significant Shapelet Mining
Room: Columbus IJ
  • Christian Bock, ETH Zurich, Switzerland
  • Thomas Gumbsch, ETH Zurich, Switzerland
  • Michael Moor, ETH Zurich, Switzerland
  • Bastian Rieck, ETH Zurich, Switzerland
  • Damian Roqueiro, ETH Zurich, Switzerland
  • Karsten Borgwardt, ETH Zurich, Switzerland

Motivation: Most modern intensive care units continuously record the physiological and vital signs of patients. When processed, these data can be used to extract temporal signatures (biomarkers) that help physicians understand the biological complexity of many syndromes. However, most biological biomarkers suffer from either poor predictive performance or weak explanatory power. Recent developments in time series classification focus on discovering shapelets, i.e., subsequences that are most predictive in terms of class membership. Shapelets have the advantage of combining an interpretable component—their shape—with high predictive performance. Currently, most shapelet discovery methods do not rely on statistical tests to verify the significance of individual shapelets. Therefore, it is of the utmost importance to identify statistically significant associations between the shapelets of physiological biomarkers and patients that exhibit certain phenotypes of interest. This would enable the discovery and subsequent ranking of novel physiological signatures that are interpretable, statistically validated, and accurate predictors of clinical endpoints.
Results: We present a novel and scalable method for scanning time series and identifying discriminative patterns that are statistically significant. The significance of a shapelet is evaluated while considering the problem of multiple hypothesis testing and mitigating it by efficiently pruning untestable shapelet candidates with Tarone’s method. We demonstrate the utility of our method by discovering patterns in a patient’s vital signs (heart rate, respiratory rate, and systolic blood pressure) that are early indicators of the severity of a future sepsis event, i.e., an inflammatory response to an infective agent that can lead to organ failure and death, if not treated in time.
Availability: We make our method and the scripts that are required to reproduce the experiments publicly available at https://github.com/BorgwardtLab/S3M.
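
The idea of testing a shapelet for association and pruning untestable candidates can be illustrated with a minimal sketch. This is not the S3M implementation: the toy series, the distance threshold, and all function names are assumptions made for illustration. A shapelet "occurs" in a series if some window lies within the threshold; the occurrences and class labels form a 2x2 table that is tested with a one-sided Fisher's exact test. Tarone's observation is that the table margins alone bound the smallest p-value a candidate could ever reach, so a candidate whose bound exceeds the corrected significance level can be discarded without counting as a tested hypothesis.

```python
from math import comb, sqrt

def min_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    return min(
        sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )

def hypergeom_pmf(a, x, n1, n):
    """P(exactly a of the x occurrences fall among the n1 cases), margins fixed."""
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)

def fisher_one_sided(a, x, n1, n):
    """One-sided Fisher p-value: P(A >= a) given the table margins."""
    return sum(hypergeom_pmf(k, x, n1, n) for k in range(a, min(x, n1) + 1))

def min_attainable_p(x, n1, n):
    """Smallest one-sided p-value any table with these margins can reach."""
    return hypergeom_pmf(min(x, n1), x, n1, n)

# Toy data (hypothetical): two cases (label 1) and two controls (label 0).
series = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
labels = [1, 1, 0, 0]
shapelet = [1, 2, 1]
threshold = 0.5  # assumed distance cut-off for "shapelet occurs"

occurs = [min_distance(s, shapelet) <= threshold for s in series]
n, n1, x = len(series), sum(labels), sum(occurs)
a = sum(1 for o, y in zip(occurs, labels) if o and y == 1)

p_value = fisher_one_sided(a, x, n1, n)
p_min = min_attainable_p(x, n1, n)
print(occurs, p_value, p_min)
```

In a full search, the bound would be evaluated for every candidate before any p-value is computed, so that only testable candidates enter the multiple testing correction.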

11:40 AM-12:00 PM
Proceedings Presentation: LONGO: An R Package for Interactive Gene Length Dependent Analysis for Neuronal Identity
Room: Columbus IJ
  • Matthew J. McCoy, Washington University, United States
  • Alexander J. Paul, Saint Louis University, United States
  • Matheus B. Victor, Washington University, United States
  • Michelle Richner, Washington University, United States
  • Harrison W. Gabel, Washington University, United States
  • Haijun Gong, Saint Louis University, United States
  • Andrew S. Yoo, Washington University, United States
  • Tae-Hyuk Ahn, Saint Louis University, United States

Motivation: Reprogramming somatic cells into neurons holds great promise to model neuronal development and disease. The efficiency and success rate of neuronal reprogramming, however, may vary between different conversion platforms and cell types, thereby necessitating an unbiased, systematic approach to estimate neuronal identity of converted cells. Recent studies have demonstrated that long genes (>100 kb from transcription start to end) are highly enriched in neurons, which provides an opportunity to identify neurons based on the expression of these long genes.
Results: We have developed a versatile R package, LONGO, to analyze gene expression based on gene length. We propose a systematic analysis of long gene expression (LGE) with a metric termed the long gene quotient (LQ) that quantifies LGE in RNA-seq or microarray data to validate neuronal identity at the single-cell and population levels. This unique feature of neurons provides an opportunity to utilize measurements of LGE in transcriptome data to quickly and easily distinguish neurons from non-neuronal cells. By combining this conceptual advancement and statistical tool in a user-friendly and interactive software package, we intend to encourage and simplify further investigation into LGE, particularly as it applies to validating and improving neuronal differentiation and reprogramming methodologies.

12:00 PM-12:20 PM
Proceedings Presentation: Driver gene mutations based clustering of tumors: methods and applications
Room: Columbus IJ
  • Wensheng Zhang, Xavier University of Louisiana, United States
  • Erik Flemington, Tulane University, United States
  • Kun Zhang, Xavier University of Louisiana, United States

Motivation: Somatic mutations in proto-oncogenes and tumor suppressor genes constitute a major category of causal genetic abnormalities in tumor cells. The mutation spectra of thousands of tumors have been generated by The Cancer Genome Atlas (TCGA) and other whole genome (exome) sequencing projects. A promising approach to utilizing these resources for precision medicine is to identify genetic similarity-based subtypes within a cancer type and relate the pinpointed subtypes to the clinical outcomes and pathologic characteristics of patients.
Results: We propose two novel methods, ccpwModel and xGeneModel, for mutation-based clustering of tumors. In the former, binary variables indicating the status of cancer driver genes in tumors and the genes’ involvement in the core cancer pathways are treated as the features in the clustering process. In the latter, the functional similarities of putative cancer driver genes and their confidence scores as the “true” driver genes are integrated with the mutation spectra to calculate the genetic distances between tumors. We apply both methods to the TCGA data of 16 cancer types. Promising results are obtained when these methods are compared to state-of-the-art approaches as to the associations between the determined tumor clusters and patient race (or survival time). We further extend the analysis to detect mutation-characterized transcriptomic prognostic signatures, which are directly relevant to the etiology of carcinogenesis.
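
As a rough illustration of what a mutation-based genetic distance between tumors can look like, the sketch below computes the Jaccard distance between sets of mutated driver genes. It is a generic baseline, not ccpwModel or xGeneModel: it ignores pathway membership, functional similarity, and driver confidence scores, and the tumor identifiers and mutation sets are made up.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of mutated genes (0.0 if both are empty)."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

# Hypothetical tumors and their mutated driver genes.
tumors = {
    "T1": {"TP53", "KRAS"},
    "T2": {"TP53", "KRAS", "SMAD4"},
    "T3": {"PIK3CA"},
}

# Pairwise distances, keyed by ordered pairs of tumor identifiers.
dist = {
    (i, j): jaccard_distance(tumors[i], tumors[j])
    for i in tumors
    for j in tumors
    if i < j
}
print(dist)
```

A distance table of this kind could then be passed to any standard clustering procedure; the two methods in the abstract replace the plain set overlap with richer, biology-aware features.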

12:20 PM-12:40 PM
Proceedings Presentation: Learning with multiple pairwise kernels for drug bioactivity prediction
Room: Columbus IJ
  • Anna Cichonska, University of Helsinki, Finland
  • Tapio Pahikkala, University of Turku, Finland
  • Sandor Szedmak, Aalto University, Finland
  • Heli Julkunen, Aalto University, Finland
  • Antti Airola, University of Turku, Finland
  • Markus Heinonen, Aalto University, Finland
  • Tero Aittokallio, University of Helsinki, Finland
  • Juho Rousu, Aalto University, Finland

Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g., drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs.
Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3 120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem.
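
Why the pairwise matrix never needs to be formed can be seen from a standard Kronecker product identity, sketched below. This is a generic illustration and is not taken from the pairwiseMKL code; the kernels, the coefficient matrix, and the function names are assumptions. If the kernel between pairs (d, t) and (d', t') is the product Kd[d][d'] * Kt[t][t'], then multiplying the full pairwise kernel by a coefficient vector equals the small matrix product Kd X Kt^T, where X holds the coefficients arranged as a drugs-by-targets matrix.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

def kron_matvec(kd, kt, x):
    """Pairwise-kernel product computed as Kd X Kt^T, without the big matrix."""
    return matmul(matmul(kd, x), transpose(kt))

def explicit_matvec(kd, kt, x):
    """Same product, summing over every entry of the implicit pairwise kernel."""
    nd, nt = len(kd), len(kt)
    return [
        [
            sum(kd[d][d2] * kt[t][t2] * x[d2][t2] for d2 in range(nd) for t2 in range(nt))
            for t in range(nt)
        ]
        for d in range(nd)
    ]

# Hypothetical 2-drug and 3-target kernels with a 2x3 coefficient matrix.
kd = [[1.0, 0.5], [0.5, 1.0]]
kt = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 2.0]]
x = [[1, 2, 3], [4, 5, 6]]

fast = kron_matvec(kd, kt, x)
slow = explicit_matvec(kd, kt, x)
print(fast)
```

With n drugs and m targets, the explicit pairwise kernel has (n*m)^2 entries, whereas the factored form only touches the n-by-n and m-by-m kernels.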

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:20 PM
Tracing genomic histories and environmental influences in cancer development
Room: Columbus IJ

Cancer develops as a result of various mutational processes that act upon cells, imposing specific mutational footprints in their genomes. Recent advances in computational methodologies for inferring these mutational footprints have allowed us to retrace patterns of cancer development to the original environmental or intrinsic risk factors. I will discuss some of the progress achieved in this area and exemplify our recent use of the methodology to elucidate aetiology subtypes in oesophageal adenocarcinoma, an aggressive cancer with poor prognosis and increasing incidence in the Western world. Along with genomic alterations, another constraint during cancer development and progression is the tumour microenvironment, which shapes cancer trajectories either by promoting tumour invasion or by suppressing abnormal growth. I will discuss several insights into tumour immunity obtained recently from large-scale sequencing efforts by combining mutational profiles and immune cell abundance estimates from bulk tumour data. Employing data integration and machine learning approaches can help uncover genomic instability characteristics and specific mutational triggers associated with immune phenotypes in cancer. I will illustrate some of the approaches we have used to explore evolutionary dynamics and the tumour-immune crosstalk in primary tumour samples from oesophageal cancer cohorts.

2:20 PM-2:40 PM
Tracing genomic histories and environmental influences in cancer development
Room: Columbus IJ

2:40 PM-3:00 PM
Proceedings Presentation: GSEA-InContext: Identifying novel and common patterns in expression experiments
Room: Columbus IJ
  • Rani Powers, University of Colorado, United States
  • Andrew Goodspeed, University of Colorado, United States
  • Harrison Pielke-Lombardo, University of Colorado, United States
  • Aik-Choon Tan, University of Colorado, United States
  • James Costello, University of Colorado, United States

Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment where less than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment.

Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis.

Availability and Implementation: GSEA-InContext implemented in Python, supplemental results, and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext
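
The gene-set permutation null described in the Motivation can be sketched as follows. This is a simplified illustration, not the GSEA or GSEA-InContext code: the running sum is unweighted (standard GSEA weights hits by their ranking metric), the gene names are made up, and the function names are assumptions. Random gene sets of the same size are drawn uniformly from the ranked list; as the abstract describes, GSEA-InContext replaces this uniform assumption with information from background experiments.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on a hit, down on a miss,
    and return the running value with the largest absolute deviation."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """One-sided p-value for positive enrichment against random same-size gene sets."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = len(gene_set)
    null = [
        enrichment_score(ranked_genes, set(rng.sample(ranked_genes, size)))
        for _ in range(n_perm)
    ]
    exceed = sum(1 for s in null if s >= observed)
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical ranked list of 20 genes; the gene set sits at the very top.
ranked = [f"g{i}" for i in range(20)]
gene_set = {"g0", "g1", "g2"}
es, p = permutation_p_value(ranked, gene_set)
print(es, p)
```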

3:00 PM-3:20 PM
Proceedings Presentation: AnoniMME: Bringing Anonymity to the Matchmaker Exchange Platform for Rare Disease Gene Discovery
Room: Columbus IJ
  • Bristena Oprisanu, University College London, United Kingdom
  • Emiliano De Cristofaro, University College London, United Kingdom

Motivation: Advances in genome sequencing and genomics research are bringing us closer to a new era of personalized medicine, where healthcare can be tailored to the individual’s genetic makeup, and to more effective diagnosis and treatment of rare genetic diseases. Much of this progress depends on collaborations and access to genomes, and thus a number of initiatives have been introduced to support seamless data sharing. Among these, the Global Alliance for Genomics and Health runs a popular platform, called Matchmaker Exchange, which allows researchers to perform queries for rare genetic disease discovery over multiple federated databases. Queries include gene variations which are linked to rare diseases, and the ability to find other researchers that have seen or have interest in those variations is extremely valuable. Nonetheless, in some cases, researchers may be reluctant to use the platform since the queries they make (thus, what they are working on) are revealed to other researchers, and this creates concerns with privacy and competitive advantage.
Contributions: We present AnoniMME, a novel framework geared to enable anonymous queries within the Matchmaker Exchange platform. The framework, building on a cryptographic primitive called Reverse Private Information Retrieval (PIR), lets researchers anonymously query the federated platform in a multi-server setting. Specifically, they write their query, along with a public encryption key, anonymously in a public database. Responses are also supported, so that other researchers can respond to queries by providing their encrypted contact details.
Availability and Implementation: https://github.com/bristena-op/AnoniMME.
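
The notion of writing anonymously into a database held by several servers can be illustrated with a toy two-server XOR scheme. This is only a sketch of the general private-write idea and should not be read as AnoniMME's protocol: the class and function names, the slot layout, and the 64-bit message size are assumptions, and the public key and encryption steps mentioned in the abstract are omitted. The client splits its write into two random-looking vectors; each server alone sees uniform noise, yet the XOR of both server states contains the message in the chosen slot.

```python
import secrets

def make_write_shares(n_slots, index, message, n_bits=64):
    """Split 'write message at index' into two vectors whose XOR is that write."""
    share_a = [secrets.randbits(n_bits) for _ in range(n_slots)]
    share_b = list(share_a)
    share_b[index] ^= message  # the two shares differ only by the message at index
    return share_a, share_b

class Server:
    """Holds one share of the database and XORs incoming write shares into it."""
    def __init__(self, n_slots):
        self.state = [0] * n_slots

    def apply(self, share):
        self.state = [s ^ v for s, v in zip(self.state, share)]

def combine(server_a, server_b):
    """Reveal the database by XORing the two server states slot by slot."""
    return [x ^ y for x, y in zip(server_a.state, server_b.state)]

# Hypothetical query encoded as an integer, written into slot 2 of 4.
server_a, server_b = Server(4), Server(4)
share_a, share_b = make_write_shares(4, 2, 0xBEEF)
server_a.apply(share_a)
server_b.apply(share_b)
database = combine(server_a, server_b)
print(database)
```

The sketch assumes the two servers do not collude; if they compare the shares they receive, the slot and message are revealed.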

3:20 PM-3:30 PM
Activity landscapes of cancer cell lines predict drug response
Room: Columbus IJ
  • Martin Frejno, Technical University of Munich, Germany
  • Benjamin Ruprecht, Merck & Co., United States
  • Chen Meng, Technical University of Munich, Germany
  • Alexander Hogrebe, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
  • Jana Zecha, Technical University of Munich, Germany
  • Dominic Helm, Proteomics Core Facility, EMBL, Heidelberg, Germany
  • Thomas Oellerich, Goethe University, Germany
  • Sebastian Scheich, Goethe University, Germany
  • Hans-Michael Kvasnicka, Goethe University, Germany
  • Enken Drecoll, Technical University of Munich, Germany
  • Wilko Weichert, Technical University of Munich, Germany
  • Bernhard Kuster, Technical University of Munich, Germany

In recent years, proteomic profiling of cancer cell lines combined with quantification of their response to drugs has proven to be useful for the identification of protein biomarkers of drug sensitivity and resistance. However, given that phosphorylation-based signaling is known to play a major role in determining drug response, phosphoproteomic profiling can provide a different angle on predicting drug sensitivity of cancer cell lines by focusing on their activity landscapes. Here, we profiled the proteomes and phosphoproteomes of 125 cancer cell lines using label-free mass spectrometry to a depth of >10,000 proteins and >55,000 phosphorylation sites (p-sites). We applied a wide range of computational approaches, including elastic net and concordance analysis, to integrate these data with publicly available drug sensitivity measurements, identify proteomic and phosphoproteomic markers of drug response and suggest novel kinase-substrate relationships. The results not only recapitulated known drug-gene/protein interactions, but also suggested novel biomarkers predicting drug responses, which were subsequently validated in vitro, in vivo and on the patient level. These results suggest that, in combination with advanced computational methods, the activity profiling of cell lines has important value in translational research.

3:30 PM-3:40 PM
A generalizable and interpretable deep learning model for predicting microsatellite instability from routine histopathology images
Room: Columbus IJ
  • Renyu Zhang, Toyota Technological Institute at Chicago, United States
  • Boleslaw Osinski, University of Chicago, United States
  • Timothy Taxter, Tempus Labs, United States
  • Jason Perera, Tempus Labs, United States
  • Denise Lau, Tempus Labs, United States
  • Aly Khan, Toyota Technological Institute at Chicago, United States

Microsatellite instability is an important genomic phenotype that can direct clinical treatment decisions, especially in the context of cancer immunotherapies. We introduce a new deep learning framework, AMIBA, to predict microsatellite instability status directly from routine histopathology slides. To facilitate adoption of our framework in a clinical setting, we combine recent advances in deep learning to improve the generalizability of our predictive model on tumor types not observed in training, and provide a means to visually interpret topological and morphological features that influence predictions. Taken together, AMIBA introduces a novel modality for microsatellite instability diagnostic testing, with a potential to profoundly expand access for testing at the earliest time point in a cancer diagnosis.

3:40 PM-3:50 PM
PROMIS-Med: Precise and Reproducible OMICS-Data Management and Integrative System for Precision Medicine
Room: Columbus IJ
  • Zeeshan Ahmed, University of Connecticut Health Center, United States
  • Bruce T. Liang, University of Connecticut Health Center, United States

To improve the quality and transition of healthcare, robust big data management platforms are necessary to analyze heterogeneous genomics and healthcare data of high volume, velocity, variety and veracity. Healthcare data includes information about patient lifestyle, medical history, visits to the practice, wet lab and imaging tests, diagnoses, medications, surgical procedures, consulted providers and genomics profile. Adequate and analytic access to healthcare and genomics data has the potential to revolutionize the field of medicine by developing a better understanding of biological mechanisms and modelling complex biological interactions by integrating and analyzing knowledge in a holistic manner. To effectively meet the goals of implementing a system for precision medicine, significant efforts are required from experts in various disciplines, located within one or multiple organizational units. One of the major challenges is to establish an efficient and secure workflow that can connect all units to streamline transparent data flow, quality inspection, processing, analysis and sharing. Here we present a new, user-friendly, HIPAA-compliant precision medicine platform, PROMIS-Med, for complex and large-scale healthcare and genomics data management, analysis and visualization. PROMIS-Med is managing healthcare data of over 800,000 patients and supporting integrative processing and analysis of genomics data of various kinds.

3:50 PM-4:00 PM
Integrative Genomics Analysis Identifies Distinct Prognostic Subgroups In Pediatric Cancers
Room: Columbus IJ
  • Lei Huang, University of Chicago, United States
  • Riyue Bao, University of Chicago, United States
  • Jorge Andrade, University of Chicago, United States

Pediatric cancers are generally rare and have not been investigated as thoroughly as adult cancers. Tumor microenvironments are shaped by many variables including genetics, transcriptomics and epigenetics. However, recent studies of pediatric cancers focus more on genetic landscapes under clinical factors than on cancer prognosis analysis. Data-driven integration of multi-omics data and clinical information offers a novel solution for identifying patient subgroups, understanding tumor heterogeneity and developing biomarkers and therapies.
We applied the Affinity Network Fusion (ANF) method to the RNAseq, miRNAseq, and DNA methylation array data of Rhabdoid Tumor (RT, n=57) and Wilms Tumor (WT, n=117) from the TARGET database. We identified two prognostic subgroups with significant survival differences in each cancer type. In RT, we integrated mRNA and miRNA expression data to stratify the patients (log-rank p-value = 0.025), while in WT, the patients were stratified using either mRNA-DNA methylation data (log-rank p-value = 0.001) or mRNA-miRNA expression data (log-rank p-value = 0.013). Gene functional analysis reveals distinctive pathways between the patient subgroups, which warrant further interrogation. This unsupervised integrative approach can be easily adapted to a wide spectrum of cancers and provides guidance on evidence-based design of clinical trials and treatments tailored to each patient’s unique genetic background.

4:00 PM-4:40 PM
Coffee Break
4:40 PM-4:50 PM
Building and Using a Gen3 Data Commons for Translational Medicine
Room: Columbus IJ
  • Christopher Meyer, University of Chicago, United States
  • Xiangyan Kuang, University of Chicago, United States
  • Yilin Xu, University of Chicago, United States
  • Francisco Ortuno, University of Chicago, United States
  • Zac Flamig, University of Chicago, United States
  • Christina Yung, University of Chicago, United States
  • Robert Grossman, University of Chicago, United States

The data commons paradigm aims to accelerate scientific discoveries by facilitating cross-project analyses through harmonization of ingested data curated from a variety of sources. The Gen3 software stack is a suite of open source software for hosting data commons in a secure, scalable platform for applications. Gen3 includes five main services for authentication and authorization, GraphQL based searching, curating submissions against a metadata dictionary, mapping data GUIDs to locations, and an interactive website.

The process of building a Gen3 Data Commons and using it requires harmonizing datasets by creating a standardized data dictionary of variable names and using this dictionary for data ingestion and co-analyses. Since all Gen3 Data Commons share a common infrastructure, open-source tools and apps can be developed for analyses that span different datasets, even across different commons.

We describe the Gen3 software components in detail and discuss steps required for creating data dictionaries. We then demonstrate our cloud-based workspace for data analysis and visualization, which supports Python and R Jupyter notebooks, ShinyR applications, and Docker/CWL analysis pipelines. These tools represent real use cases as they interoperate with existing Gen3 Data Commons, including the Brain Commons, BloodPAC Commons, Environmental Data Commons, and the NIAID Data Hub.

4:50 PM-5:00 PM
Identifying Crohn's disease signal from variome analysis
Room: Columbus IJ
  • Yanran Wang, Rutgers University, United States
  • Yuri Astrakhan, United States
  • Britt-Sabina Petersen, Christian-Albrechts-University of Kiel, Germany
  • Stefan Schreiber, Christian-Albrechts-University of Kiel, Germany
  • Andre Franke, Christian-Albrechts-University of Kiel, Germany
  • Yana Bromberg, Rutgers University, United States

Background: After years of research, the cause of Crohn’s disease (CD) remains unknown. Its accurate diagnosis, however, can help in managing the disease and preventing its onset. Whole exome sequencing (WES) provides a new way of evaluating CD predisposition and can help identify new disease genes and pathways.
Method and Results: We developed AVA,Dx (Analysis of Variation for Association with Disease), a machine learning-based method that uses WES data alone to highlight CD genes and predict individual CD status. AVA,Dx first predicts changes in function of genes due to individual-specific genetic variation. Then, it maps the resulting gene-function vectors to individual CD status. In testing, AVA,Dx differentiated three quarters of the CD patients from healthy controls with 71% precision. Importantly, we were able to account for batch effects, which enabled accurate prediction of CD status for individuals from a separately sequenced cohort. Furthermore, some of the genes selected by our method as relevant to CD were not previously identified, but were significantly enriched in some known CD pathways.
Conclusions: AVA,Dx highlights new CD genes and pathways and accurately predicts CD status. AVA,Dx techniques may also help improve our understanding of other complex diseases in the future.

5:00 PM-5:10 PM
Secure genome crowdsourcing for million-individual association studies
Room: Columbus IJ
  • Hyunghoon Cho, Massachusetts Institute of Technology, United States
  • David J. Wu, Stanford University, United States
  • Bonnie Berger, Massachusetts Institute of Technology, United States

Most sequenced genomes are currently stored in strict access-controlled repositories. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and may aid in the discovery of new drug targets. However, concerns over genetic data privacy may deter individuals from contributing their genomes to scientific studies and in many cases, prevent researchers from sharing data with the scientific community. Although several cryptographic techniques for secure data analysis exist, none scales to computationally intensive analyses, such as GWAS. Here we describe an end-to-end protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable 'secure genome crowdsourcing,' allowing individuals to contribute their genomes to a study without compromising their privacy.
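
How a statistic can be computed without any party seeing individual genotypes can be sketched with additive secret sharing, one building block commonly used in secure multiparty computation. The sketch below is illustrative only and is not the protocol from the talk: the modulus, the three-party setting, the toy genotypes, and the function names are all assumptions. Each genotype is split into random shares that sum to the true value modulo a prime; every party adds up the shares it holds, and only the combined totals reveal the aggregate allele count.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus for the shares

def share(value, n_parties=3):
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod PRIME."""
    return sum(shares) % PRIME

# Hypothetical minor-allele counts (0, 1, or 2) for five individuals at one variant.
genotypes = [0, 1, 2, 1, 0]
n_parties = 3

# Each party accumulates only the shares it receives; no party sees a genotype.
party_totals = [0] * n_parties
for g in genotypes:
    for p, s in enumerate(share(g, n_parties)):
        party_totals[p] = (party_totals[p] + s) % PRIME

total_allele_count = reconstruct(party_totals)
print(total_allele_count)
```

Sums are the easy case; the quality control and population stratification steps named in the abstract involve more complex operations, which is where protocol design and scaling become the hard part.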

5:10 PM-5:50 PM
Clinical and translational informatics: An overview
Room: Columbus IJ

The field of biomedical informatics is at an exciting stage of progression. The past decade has seen the establishment of translational bioinformatics and clinical research informatics as disciplines unto themselves, and clinical informatics is the newest specialty for medical board certification. The explosion of molecular data coupled with clinical data on actual patients holds the potential to define an entirely new taxonomy of disease based on mechanistic insights rather than macroscopic observations. A number of large-scale initiatives around the world aim to leverage novel methods and "big data" to derive new knowledge and enable precision medicine: the ability to target the right person with the right intervention at the right time. In addition, informatics enables a learning healthcare system in which data collected in the course of clinical care may be used for knowledge discovery. This talk will give an overview of the landscape of clinical and translational informatics and provide context for how bioinformatics methods and discoveries are improving human health.

5:50 PM-6:00 PM
Concluding Remarks
Room: Columbus IJ