In gene-gene interaction analysis of single nucleotide polymorphism (SNP) data, empty cells arise in the genotype contingency table more often than in single-SNP association studies, because the joint genotypes of two SNPs are spread over nine (3 × 3) combinations rather than three. Empty cells make some regression coefficients unidentifiable when the regression model is fitted, and it is unclear whether the degrees of freedom (d.f.) for testing interactions should be reduced for such sparse contingency tables. Boolean Operation-based Screening and Testing (BOOST) is an exhaustive gene-gene interaction search method that uses a fixed d.f. of four (the most conservative choice) for the chi-squared null distribution of the likelihood ratio test for gene-gene interaction under a logistic regression model. In this paper, the choice of d.f. is investigated theoretically by introducing a decomposition of the type I error. An adaptive method that uses the observed d.f. can be less conservative than the fixed-d.f. method, thereby enhancing power. In simulated data, type I error rates were usually better controlled with the adaptive method under various scenarios for Gaussian linear regression and logistic regression, including prospective and retrospective sampling designs, as well as for artificial data that mimic actual genome-wide SNP data. When applied to publicly available simulated datasets, the adaptive method showed improved power over the fixed method.
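The abstract does not spell out how the observed d.f. is computed, so the following is only a minimal sketch under stated assumptions. It assumes codominant coding (two indicator columns per SNP), treats a genotype combination as empty when no individual carries it, and takes the observed d.f. to be the number of identifiable interaction coefficients: the rank of the full-model design over the non-empty cells minus the rank of the main-effects design. The functions `observed_df`, `chi2_sf` and `lrt_pvalues` are hypothetical names, the likelihood ratio statistic is taken as given rather than fitted, and cells that contain only cases or only controls (which cause separation in logistic regression) are not handled.

```python
import math
from fractions import Fraction
from itertools import product


def main_effect_row(g1, g2):
    # Hypothetical codominant coding: genotype 0 is the reference and each SNP
    # contributes indicators I(g == 1) and I(g == 2), plus an intercept.
    return [1, int(g1 == 1), int(g1 == 2), int(g2 == 1), int(g2 == 2)]


def full_row(g1, g2):
    # Main effects plus the four products of indicators (interaction terms).
    inter = [int(g1 == a) * int(g2 == b) for a in (1, 2) for b in (1, 2)]
    return main_effect_row(g1, g2) + inter


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def observed_df(counts):
    """Identifiable interaction d.f. for a 3x3 table of joint genotype counts."""
    cells = [(g1, g2) for g1, g2 in product(range(3), repeat=2) if counts[g1][g2] > 0]
    full = matrix_rank([full_row(*c) for c in cells])
    main = matrix_rank([main_effect_row(*c) for c in cells])
    return full - main


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df) with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start at df = 1 or 2 and step up by two using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k = 2 - df % 2
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def lrt_pvalues(stat, counts, fixed_df=4):
    """Return (fixed-d.f. p-value, adaptive p-value) for an interaction LRT statistic."""
    df = observed_df(counts)
    # With no identifiable interaction term there is nothing to test.
    adaptive = 1.0 if df == 0 else chi2_sf(stat, df)
    return chi2_sf(stat, fixed_df), adaptive


# Example: the (2, 2) genotype combination is empty, so only 3 interaction
# coefficients are identifiable; the statistic 8.0 is made up for illustration.
counts = [[120, 80, 15], [85, 60, 9], [14, 7, 0]]
fixed_p, adaptive_p = lrt_pvalues(8.0, counts)
print(observed_df(counts), round(fixed_p, 4), round(adaptive_p, 4))
```

With the (2, 2) cell empty the observed d.f. is 3, and for the same statistic the adaptive p-value falls below 0.05 while the fixed 4-d.f. p-value does not, which illustrates the kind of power gain described above. Exact rational arithmetic is used for the rank so that the answer for these small design matrices does not depend on a floating-point tolerance.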