Background: Genome-wide association studies have revealed associations between single-nucleotide polymorphisms (SNPs) and phenotypes such as disease symptoms and drug tolerance. To address the small sample size for rare variants, association studies tend to group gene or pathway level variants and evaluate the effect on the set of variants. One of such strategies, known as the sequential kernel association test (SKAT), is a widely used collapsing method. However, the reported p-values from SKAT tend to be biased because the asymptotic property of the statistic is used to calculate the p-value. Although this bias can be corrected by applying permutation procedures for the test statistics, the computational cost of obtaining p-values with high resolution is prohibitive. Results: To address this problem, we devise an adaptive SKAT procedure termed AP-SKAT that efficiently classifies significant SNP sets and ranks them according to the permuted p-values. Our procedure adaptively stops the permutation test when the significance level is outside some confidence interval of the estimated p-value for a binomial distribution. To evaluate the performance, we first compare the power and sample size calculation and the type I error rates estimate of SKAT, SKAT-O, and the proposed procedure using genotype data in the SKAT R package and from 1000 Genome Project. Through computational experiments using whole genome sequencing and SNP array data, we show that our proposed procedure is highly efficient and has comparable accuracy to the standard procedure. Conclusions: For several types of genetic data, the developed procedure could achieve competitive power and sample size under small and large sample size conditions with controlling considerable type I error rates, and estimate p-values of significant SNP sets that are consistent with those estimated by the standard permutation test within a realistic time. This demonstrates that the procedure is sufficiently powerful for recent whole genome sequencing and SNP array data with increasing numbers of phenotypes. Additionally, this procedure can be used in other association tests by employing alternative methods to calculate the statistics.
- Genome wide association study
- Multiple test
- Rare variants
ASJC Scopus subject areas