Discrepancies between human DNA, mRNA and protein reference sequences and their relation to single nucleotide variants in the human population

Research output: Contribution to journal › Article


Abstract

The protein-coding sequences in the human reference genome GRCh38, the RefSeq mRNA database and the UniProt protein database are sometimes inconsistent with each other because of polymorphisms in the human population, but the overall landscape of the discordant sequences has not been clarified. In this study, we comprehensively listed the discordant bases and regions between the GRCh38, RefSeq and UniProt reference sequences, based on the genomic coordinates of GRCh38. By assigning alternative allele frequencies to the discordant bases, we observed that the RefSeq sequences are more likely than GRCh38 and UniProt to represent the major alleles. Since some reference sequences carry minor alleles, functional and structural annotations may be based on alleles that are rare in the human population, thereby biasing these analyses. Some of the differences between RefSeq and GRCh38 reflect biological differences at known RNA-editing sites. The definitions of the coding regions are frequently complicated by possible micro-exons within introns and by SNVs with high alternative allele frequencies near exon-intron boundaries. The mRNA or protein regions missing from GRCh38 were mainly due to small deletions, and these sequences need to be identified. Taken together, our results clarify the overall consistency of the reference sequences and the inconsistencies that remain between them.
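The allele-frequency comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `DiscordantBase` record, its field names, and the example values are all hypothetical, and the sketch assumes a biallelic site where GRCh38 carries the reference allele and RefSeq carries the alternative allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscordantBase:
    """Hypothetical record for one base where GRCh38 and RefSeq disagree."""
    chrom: str
    pos: int          # 1-based GRCh38 coordinate
    grch38_base: str  # allele in the reference genome
    refseq_base: str  # allele in the mRNA reference (the alternative allele here)
    alt_freq: float   # alternative allele frequency in the population


def major_allele_source(site: DiscordantBase) -> str:
    """Return which reference carries the more frequent allele at this site.

    Assumes a biallelic site, so the GRCh38 allele frequency is 1 - alt_freq.
    """
    if site.alt_freq > 0.5:
        return "RefSeq"
    if site.alt_freq < 0.5:
        return "GRCh38"
    return "tie"


# Illustrative values only, not data from the study.
common_alt = DiscordantBase("chr1", 1000, "A", "G", alt_freq=0.8)
rare_alt = DiscordantBase("chr2", 2000, "C", "T", alt_freq=0.1)
result_common = major_allele_source(common_alt)
result_rare = major_allele_source(rare_alt)
```

Under these assumptions, a discordant site with a high alternative allele frequency is one where GRCh38 holds the minor allele, which is the situation the abstract warns may bias annotation.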

Original language: English
Journal: Database : the journal of biological databases and curation
Volume: 2016
Publication status: Published - 2016 Jan 1

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
