TY - JOUR
T1 - MAFFT
T2 - A novel method for rapid multiple sequence alignment based on fast Fourier transform
AU - Katoh, Kazutaka
AU - Misawa, Kazuharu
AU - Kuma, Kei Ichi
AU - Miyata, Takashi
N1 - Funding Information:
We thank Drs N. Iwabe, H. Suga and D. Hoshiyama for helpful comments. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
PY - 2002/7/15
Y1 - 2002/7/15
N2 - A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
AB - A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
UR - http://www.scopus.com/inward/record.url?scp=0037100671&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037100671&partnerID=8YFLogxK
U2 - 10.1093/nar/gkf436
DO - 10.1093/nar/gkf436
M3 - Article
C2 - 12136088
AN - SCOPUS:0037100671
VL - 30
SP - 3059
EP - 3066
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 14
ER -