Oral Presentation 43

Simple Local Assembly Program

Presenter: Adam Spargo, Wellcome Trust Sanger Institute
Authors: Adam Spargo, Zemin Ning

Abstract: We present a simple local assembly program that will be used in the contig assembly stage of the Phusion2 pipeline. Phusion [1] clusters sequencing reads by shared long k-mer words; these clusters are then assembled in parallel, currently using Phrap [2]. This pipeline was very successful with Sanger sequencing technology, but second-generation sequencing technologies have raised several issues: (i) Phrap cannot handle very high-coverage data, so clusters must be kept small; (ii) Phrap cannot make use of read-pairs, so its contigs require extensive post-processing by Phusion, both to join them via read-pairs and to break them at mis-assemblies; (iii) long-running Phrap jobs destroy the previously effective parallelization of Phusion; and (iv) Phrap cannot handle all of the different second-generation technologies effectively, which makes a hybrid approach to genome sequencing more difficult than necessary. The local assembler has been implemented using the overlap-layout-consensus methodology, with libraries from the Smalt alignment tool [3] and the Boost Graph Library [4]. We detail this implementation and then report on our investigations into algorithms for overlap-graph disambiguation using read-pairs, defined nucleotide positions and read-depth. Re-using robust, multi-threaded libraries allows us to implement new algorithms quickly and to concentrate our research on developing new methods that make the best of the available technologies. Results show the disambiguation of graphs generated from carefully constructed simulation data for various classes of repeats, as well as from real data.
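The clustering step that feeds the local assembler (grouping reads that share long k-mer words) can be illustrated with a small sketch. This is not Phusion's actual code. It assumes reads are plain ACGT strings, treats a k-mer and its reverse complement as the same word (a "canonical" k-mer), and merges reads transitively with a union-find structure. The `max_count` parameter is a hypothetical repeat filter added for illustration: k-mers found in more than `max_count` reads are ignored, so that repeat-derived words do not collapse unrelated reads into one cluster.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (ACGT only, by assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    # Use the lexicographically smaller of a k-mer and its reverse complement,
    # so reads from opposite strands share the same word.
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def cluster_reads_by_kmers(reads, k, max_count=None):
    """Return clusters (sorted lists of read indices) linked by shared k-mers.

    max_count is a hypothetical repeat filter: canonical k-mers occurring in
    more than max_count reads are skipped and never merge clusters.
    """
    # Index: canonical k-mer -> set of reads containing it.
    kmer_reads = defaultdict(set)
    for idx, seq in enumerate(reads):
        for start in range(len(seq) - k + 1):
            kmer_reads[canonical(seq[start:start + k])].add(idx)

    # Merge every group of reads that shares a (non-repeat) k-mer.
    parent = list(range(len(reads)))
    for owners in kmer_reads.values():
        if max_count is not None and len(owners) > max_count:
            continue  # likely a repeat-derived k-mer; skip it
        owners = sorted(owners)
        for other in owners[1:]:
            union(parent, owners[0], other)

    # Collect reads by their union-find root.
    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(parent, idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    reads = ["ACGTACGGA", "TACGGATTC", "GGGCCCTTA"]
    print(cluster_reads_by_kmers(reads, k=5))  # [[0, 1], [2]]
```

Each resulting cluster would then be handed to an assembler independently, which is what makes this stage easy to parallelize. Real pipelines also tune k and the repeat threshold to the read length and coverage; the values above are purely illustrative.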

[1] Mullikin JC, Ning Z. The Phusion assembler. Genome Research 2003;13(1):81-90.
[2] Phrap: www.phrap.org/
[3] Smalt: www.sanger.ac.uk/resources/software/smalt/
[4] Boost Graph Library: www.boost.org/doc/libs/1_44_0/libs/graph/doc/index.html

