ISCB-Asia/SCCG 2012 Proceedings Talk

A highly accurate heuristic algorithm for the haplotype assembly problem

Fei Deng, Wenjuan Cui & Lusheng Wang
City University of Hong Kong, Computer Science

Abstract

In this paper, we study the haplotype assembly problem (also known as single individual haplotyping). The input to this problem is a set of fragments sequenced from the two copies of a chromosome of a single individual, and their locations in the chromosome, which can be pre-determined by aligning the fragments to a reference DNA sequence. The goal here is to reconstruct two haplotypes (h1, h2) from the input fragments. Minimum error correction (MEC) is a frequently used model for solving this problem. The haplotype assembly problem with MEC aims to minimize the total number of conflicts (errors) between the fragments and the constructed haplotypes (h1,h2).

We propose a heuristic algorithm for the haplotype assembly problem with MEC. We have tested our algorithm on a set of benchmark datasets. Experiments show that our algorithm can give very accurate solutions. It outperforms most of the existing programs when the error rate of the input fragments is high.