WORKSHOPS - Monday, 27 and Friday, 31 October

Practical Bioinformatics MiniCourse on "Gene Functional Networks and Protein Interactomics"

Presenter:  Javier De Las Rivas, Consejo Superior de Investigaciones Cientificas (CSIC)
Session 1 (8:30 - 10:30, 2h) Bioinformatic tools for functional Enrichment Analysis: annotation of selected gene lists.

- Functional biological information and annotation: GO, KEGG, InterPro (orthogonal biological databases).

- Functional Enrichment Analysis (EA): from single to modular methods.

- Using EA tools to annotate gene lists: DAVID (single), GSEA (gene sets), GeneCodis (modular)

- The problems of redundant and general terms: post-enrichment tool GeneTermLinker (postEA)
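As a rough illustration of the single-term enrichment idea listed above, the sketch below computes an upper-tail hypergeometric p-value for the overlap between a selected gene list and the genes annotated to one term. This is an assumption about the general statistical approach, not a description of how DAVID, GSEA or GeneCodis are implemented; the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated to the term
    selected:   size of the gene list being tested
    overlap:    genes in the list that carry the term
    """
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only.
p = enrichment_p_value(population=20, annotated=5, selected=4, overlap=2)
print(f"p = {p:.4f}")
```

In practice one such test is run per annotation term, so the resulting p-values are normally corrected for multiple testing before interpretation.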

Session 2 (10:30 - 12:30, 2h) Construction of gene functional networks: linking genes that share common functions.

- From co-annotation and enrichment to the construction of functional networks:

- Using an R-Bioconductor tool to build functional networks

- Analysis of functional modules and groups: distance

Session 3 (13:30 - 15:30, 2h) Protein interaction networks: definition, databases, tools.

- Definition and properties of protein interaction networks

- Protein interaction networks compared to coexpression networks and biological pathways

- Using on-line tools to build protein interaction networks: APID, STRING, GeneMANIA, PSICQUIC

- Binary networks: protein-drug interactions (STITCH)

Session 4 (15:30 - 17:30, 2h) Construction and analysis of gene/protein networks using Cytoscape

- Import biomolecular networks in Cytoscape

- Visualize and explore biomolecular networks in Cytoscape

- Topological analysis of biomolecular networks: node centrality and module finding

- From networks to pathways: a human functional protein interaction network
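To make the notion of node centrality mentioned above concrete, here is a minimal sketch of normalized degree centrality on a small undirected network. It is not how Cytoscape computes its statistics; the function name and the interaction pairs are hypothetical placeholders.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected network.

    edges: iterable of (node_a, node_b) pairs. Duplicate pairs and
    self-interactions are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by the maximum possible number of neighbors (n - 1).
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical interaction pairs; identifiers are placeholders.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(edges)
for node in sorted(centrality):
    print(node, round(centrality[node], 3))
```

Nodes with values close to 1 interact with almost every other node in the network and are candidate hubs.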

Software needed on the computers:
  1. R (version 3.0.1 or later)
  2. Bioconductor (version 2.14)
  3. Cytoscape (version 3.0 or later and, if possible, also version 2.6.3)
Coverage Analysis for NGS data experiments

Presenter: Marcel Caraciolo, Genomika Diagnósticos Lab
Abstract: With the advent of next-generation sequencing platforms and tools, the growing volume of alignment output from bioinformatics pipelines is raising the quality standards expected at the analysis step. Verifying adequate read coverage is an important task to ensure robust results along the coding regions of the genome, especially in clinical sequencing, where the focus is on genetic disorders, most of which are gene related. In this workshop we will present the techniques and main tools for coverage analysis, including GATK and BEDTools, among other open-source software. By the end, the audience will have learned the basics of how to evaluate the quality of sequencing experiments and the main metrics that must be considered in this task.

Keywords: Sequencing Analysis; Genetic Variation Analysis; Coverage Analysis
Relevance to the community: Coverage analysis is becoming an important step in sequencing pipelines, and bioinformaticians need adequate tools and filtering information to evaluate their NGS data. Unlike the tools already popular in the community for analyzing FASTQ files (raw data), coverage analysis goes a step further, to the level of coding genes and exons (BAM and BED files), where the goal is to check, for instance from exome sequencing data, the efficiency of the capture and the coverage along the target regions under investigation. Given the lack of tutorials on this topic, we decided to propose this workshop with hands-on practice on the main tools used in this analysis and the basic knowledge needed for further exploration.
Topics to be covered:
1. Introduction to Coverage Analysis (30 min)
1.1 Understanding the basic concepts and their applications: DNA-seq; RNA-seq; transcript analysis
1.2 Defining coding regions; loci coordinates
1.3 BAM and BED formats

2. Defining the important metrics to evaluate (30 min)
2.1 Understanding read depth; completeness; coverage; annotations; variability
2.2 Limitations that affect the analysis
2.3 Main visualizations
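
As a small illustration of two of the metrics named above, the sketch below computes the mean read depth of a target region and its completeness, assuming completeness means the fraction of positions covered at or above a chosen depth threshold. The function name, the threshold and the depth values are hypothetical; real per-base depths would come from a tool such as BEDTools or GATK.

```python
def coverage_summary(depths, min_depth):
    """Return (mean depth, completeness) for per-base read depths.

    depths:    list of read depths, one per position in the target region
    min_depth: depth a position must reach to count as adequately covered
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Hypothetical per-base depths for one short target region.
depths = [12, 30, 28, 0, 9, 21, 40, 20]
mean_depth, completeness = coverage_summary(depths, min_depth=20)
print(f"mean depth = {mean_depth:.1f}, completeness = {completeness:.2%}")
```

The example shows why both numbers matter: a region can reach an acceptable mean depth while a sizeable share of its positions still falls below the threshold.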

3. Hands-on with BEDTools; GATK; Chanjo (2 h 30 min)
3.1 Practice with bedtools coverage commands
3.2 GATK depth of coverage
3.3 Plotting graphs with R and Python
3.4 Automated coverage with Chanjo (open-source tool from our lab)

4. Conclusions and further steps (30 min)
4.1 Summing up into one report and its interpretation
4.2 Further steps on coverage analysis
4.3 References and suggested tools.
Intro to BioPerl for Pipeline building in Bioinformatics

Presenter:  Rodrigo Gonzalo Parra, Univ. of Buenos Aires
Abstract: Perl was the first widely used programming language for Bioinformatics, and it remains popular among Bioinformaticians, with several large international consortia and projects providing Perl APIs to access their data (Ensembl and UniProt, among others). The BioPerl project is a large library of modules that allows rapid and easy development of code for Bioinformatics applications. We propose to conduct a workshop on “Intro to BioPerl for Pipeline building in Bioinformatics”. The workshop will review the main aspects of Perl, introduce the BioPerl library and several of its most commonly used modules, and present the concept of pipelines, with strategies for linking existing tools with students’ own code to quickly automate groups of analyses.

This workshop will mainly be useful for new Bioinformatics students as well as Computer Scientists and Biologists (with some programming knowledge) who are looking to get into Bioinformatics, thus helping to grow the Bioinformatics community in Latin America. Additionally, the organizers of this workshop are also organizers of the First Latin American Student Council Symposium (LA-SCS), which is being held the day before the ISCB-LA conference. We see this workshop both as a way to reach out to more students about the LA-SCS and the activities of the Student Council, and as a way to raise some funds from part of the registration fees to help with the organization of the next LA-SCS.
The proposed workshop will run for 6-8 hours (depending on venue availability and the conference schedule) and will cover the following topics:
1) Perl syntax overview: scalars, arrays, hashes, control structures, loops, text handling, useful Perl functions.
2) How to build custom functions and handle their arguments.
3) Simple algorithms to manipulate large amounts of data. Introduction to awk.
4) Automatic retrieval of data from the main Bioinformatics databases using RESTful frameworks.
5) Introduction to BioPerl
6) A glance over the most used BioPerl packages.
7) CPAN and installation of external packages.
8) Pipes into external tools. Examples of HMMER, ClustalW and others.
9) How to easily generate graphics in your pipeline with gnuplot.

PATRIC Bioinformatics Resource Center workshop featuring data and analysis tools for bacterial pathogens


Presenters: Rebecca Wattam and Maulik Shukla, Virginia Bioinformatics Institute

PATRIC is the Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community’s work on bacterial infectious diseases by integrating vital pathogen information with rich data and analysis tools. PATRIC refines the scope of available bacterial phylogenomic data from numerous sources specifically for the bacterial research community, in order to save biologists time and effort when conducting comparative analyses. The freely available PATRIC platform provides an interface for biologists to discover data and information and to conduct comprehensive comparative genomics and other analyses in a one-stop shop. PATRIC is an NIH-NIAID-funded project of Virginia Tech’s Cyberinfrastructure Division.


9:00 – 10:00: Introduction & data overview

10:00 – 12:00: Comparative genomics: protein families and pathways

12:00 – 13:00: Lunch

13:00 – 14:00: RNASeq pipeline

14:00 – 15:00: Comparative transcriptomics

15:00 – 16:00: More hands on: questions and answers

1) Consistent annotation across all sequenced bacterial species from GenBank and other sources via RAST.

2) A wealth of associated, descriptive genome metadata parsed from a variety of sources in over 60 fields, such as isolation source and geographic location.

3) Data integration across sources, data types, molecular entities, and organisms. Data types include genomics, transcriptomics, protein-protein interactions, 3D protein structures, sequence typing data, and metadata.

4) Various search tools help you easily find data and genomes of interest, such as single specific genomes, whole genome sets within certain taxa, and sets of genomes that share common metadata.

5) Upload your own data and analyze it privately and securely with PATRIC analysis tools by itself or against public datasets.

6) A personal workspace to permanently save groups of genomic data, gene associations, and uploaded private data. Free downloadable data and analysis results.