TY - JOUR
T1 - Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions
AU - Kaneko, Takakazu
AU - Sato, Shusei
AU - Kotani, Hirokazu
AU - Tanaka, Ayako
AU - Asamizu, Erika
AU - Nakamura, Yasukazu
AU - Miyajima, Nobuyuki
AU - Hirosawa, Makoto
AU - Sugiura, Masahiro
AU - Sasamoto, Shigemi
AU - Kimura, Takaharu
AU - Hosouchi, Tsutomu
AU - Matsuno, Ai
AU - Muraki, Akiko
AU - Nakazaki, Naomi
AU - Naruo, Kaoru
AU - Okumura, Satomi
AU - Shimpo, Sayaka
AU - Takeuchi, Chie
AU - Wada, Tsuyuko
AU - Watanabe, Akiko
AU - Yamada, Manabu
AU - Yasuda, Miho
AU - Tabata, Satoshi
PY - 1996
Y1 - 1996
AB - The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of λ clones and long PCR products which were used for gap-filling. The accuracy of the sequence was guaranteed by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program and similarity search to databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, in which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. By adding rRNA and tRNA genes, therefore, the genome has a very compact arrangement of protein- and RNA-coding regions. A notable feature on the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.
KW - Genome sequencing
KW - Potential protein genes
KW - Synechocystis PCC6803
UR - http://www.scopus.com/inward/record.url?scp=0030606607&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030606607&partnerID=8YFLogxK
U2 - 10.1093/dnares/3.3.109
DO - 10.1093/dnares/3.3.109
M3 - Article
C2 - 8905231
AN - SCOPUS:0030606607
VL - 3
SP - 109
EP - 136
JO - DNA Research
JF - DNA Research
SN - 1340-2838
IS - 3
ER -