Tokyo Tech News

GHOSTZ: A faster sequence homology search algorithm based on database subsequence clustering

Published: May 14, 2015

Sequence homology searches are widely used in genome studies. New DNA sequencers produce large amounts of sequence data, so sequence databases continue to grow in size.

As a result, homology searches require huge amounts of computational time, especially for metagenomic analysis. In metagenomic analysis, environmental samples (from soil, the sea, the human body, and so on) frequently include DNA sequences from many different species, and the reference database often does not contain closely related genome sequences. This means that more sensitive approaches are required to identify novel genes. Even general homology search analyses using BLASTX become difficult because of their computational cost.

Now, Yutaka Akiyama and colleagues at Tokyo Institute of Technology have developed a faster homology search method based on database subsequence clustering, and implemented it as GHOSTZ. The source code is freely available for download.

This method clusters similar subsequences from a database, then uses the triangle inequality to reduce the number of alignment candidates, which makes the seed search and ungapped extension more efficient.
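The filtering idea can be illustrated with a minimal Python sketch. It is not the GHOSTZ implementation: it assumes Hamming distance as the metric, fixed-length subsequences, and a single cluster whose member-to-representative distances were computed in advance. The names `hamming`, `build_cluster`, and `filter_members` are hypothetical. For any metric, the distance from a query to a cluster member is at least |d(query, representative) - d(representative, member)|, so members whose lower bound already exceeds the threshold can be skipped without being compared to the query.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_cluster(representative, subseqs):
    """Precompute each member's distance to the cluster representative."""
    return [(s, hamming(representative, s)) for s in subseqs]


def filter_members(query, representative, members, max_dist):
    """Return members that may lie within max_dist of the query.

    members is a list of (subsequence, distance_to_representative) pairs.
    Only one distance is computed at search time: query to representative.
    """
    d_qr = hamming(query, representative)
    candidates = []
    for subseq, d_rm in members:
        # Triangle inequality: d(query, member) >= |d_qr - d_rm|.
        lower_bound = abs(d_qr - d_rm)
        if lower_bound <= max_dist:
            candidates.append(subseq)  # cannot be ruled out, keep it
    return candidates
```

For example, with representative `"ACDEFG"` and members `"ACDEFG"`, `"ACDEFH"`, and `"WWWWWW"`, a query of `"ACDEFG"` with `max_dist=1` keeps only the first two members. The filter never discards a true hit, but surviving candidates still need an exact comparison.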

When measured with metagenomic data, GHOSTZ is ~2.2-2.8 times faster than RAPSearch and ~185-261 times faster than BLASTX.

The algorithm was designed for functional and taxonomic annotation in metagenome analysis, but it could also prove to be a useful tool in proteome research.

Figure. Similarity filtering using triangle inequality among subsequences.

Table. Computation times for the SRR407548 reads against the KEGG GENES database. The acceleration ratio is relative to BLASTX using a single thread.

Reference

Authors:
Shuji Suzuki, Masanori Kakuta, Takashi Ishida, and Yutaka Akiyama.
Title of original paper:
Faster sequence homology searches by clustering subsequences.
Journal, volume, pages and year:
Bioinformatics 31(8), 1183-1190 (2015).
Affiliations:
Department of Computer Science, Tokyo Institute of Technology; Education Academy of Computational Life Sciences, Tokyo Institute of Technology.
DOI :

Further information

Professor Yutaka Akiyama
Bioinformatics Laboratory, Tokyo Institute of Technology
Email akiyama@cs.titech.ac.jp
Tel +81-35-734-3645
