Machine learning for effectively avoiding overfitting is a crucial strategy for the genetic prediction of polygenic psychiatric phenotypes

Yuta Takahashi, Masao Ueki, Gen Tamiya, Soichi Ogishima, Kengo Kinoshita, Atsushi Hozawa, Naoko Minegishi, Fuji Nagami, Kentaro Fukumoto, Kotaro Otsuka, Kozo Tanno, Kiyomi Sakata, Atsushi Shimizu, Makoto Sasaki, Kenji Sobue, Shigeo Kure, Masayuki Yamamoto, Hiroaki Tomita

Research output: Article (peer-reviewed)

1 Citation (Scopus)

Abstract

The accuracy of previous genetic studies in predicting polygenic psychiatric phenotypes has been limited, mainly because of insufficient power to distinguish truly susceptible variants from null variants and the resulting overfitting. A novel prediction algorithm, Smooth-Threshold Multivariate Genetic Prediction (STMGP), was applied to improve the genome-based prediction of psychiatric phenotypes by reducing overfitting through variant selection and a penalized regression model. Prediction models were trained on a cohort of 3685 subjects in Miyagi Prefecture and validated on an independently recruited cohort of 3048 subjects in Iwate Prefecture, Japan. Genotyping was performed using HumanOmniExpressExome BeadChip arrays. The target phenotype was depressive symptoms; we also used simulated phenotypes with varying complexity and various effect-size distributions of risk alleles. The prediction accuracy and degree of overfitting of STMGP were compared with those of state-of-the-art models (polygenic risk scores, genomic best linear unbiased prediction, summary-data-based best linear unbiased prediction, BayesR, and ridge regression). In predicting depressive symptoms, STMGP showed the highest prediction accuracy and the lowest degree of overfitting among the models compared, although the differences in prediction accuracy were not statistically significant. Simulation studies suggested that STMGP achieves better prediction accuracy for moderately polygenic phenotypes. Our investigations suggest the potential usefulness of STMGP for predicting polygenic psychiatric conditions while avoiding overfitting.
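The abstract describes STMGP as reducing overfitting by selecting variants and then fitting a penalized regression model, and it assesses overfitting by comparing performance in a training cohort with performance in an independent validation cohort. The Python sketch below illustrates only that general two-stage idea on simulated genotypes; it is not the authors' STMGP algorithm. The marginal-correlation screen, the closed-form ridge penalty, the function names (`screen_then_ridge` and its helpers), the simulation settings, and the use of training R² minus validation R² as the overfitting measure are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical illustration only: a generic "screen, then penalize" predictor.
# This is NOT the authors' STMGP implementation; the marginal-correlation
# screen and the closed-form ridge fit are simplified stand-ins.


def standardize(X, mean=None, sd=None):
    """Center and scale genotype columns; reuse training stats for validation."""
    if mean is None:
        mean = X.mean(axis=0)
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0  # monomorphic variants would otherwise divide by zero
    return (X - mean) / sd, mean, sd


def screen_variants(Xs, y, n_keep):
    """Keep the n_keep variants with the largest absolute marginal correlation."""
    yc = y - y.mean()
    corr = Xs.T @ yc / (len(y) * yc.std())  # Pearson r for standardized columns
    return np.argsort(-np.abs(corr))[:n_keep]


def fit_ridge(X, y, lam):
    """Closed-form ridge on centered data: beta = (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - y.mean()))


def r_squared(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)


def screen_then_ridge(X_tr, y_tr, X_va, y_va, n_keep, lam):
    """Select variants and fit on the training cohort only, then score both cohorts."""
    Xs_tr, mu, sd = standardize(X_tr)
    Xs_va, _, _ = standardize(X_va, mu, sd)  # no validation data used in fitting
    keep = screen_variants(Xs_tr, y_tr, n_keep)
    beta = fit_ridge(Xs_tr[:, keep], y_tr, lam)
    pred_tr = y_tr.mean() + Xs_tr[:, keep] @ beta
    pred_va = y_tr.mean() + Xs_va[:, keep] @ beta
    r2_tr = r_squared(y_tr, pred_tr)
    r2_va = r_squared(y_va, pred_va)
    return keep, r2_tr, r2_va, r2_tr - r2_va  # last value: assumed overfitting gap


# Simulated data (assumption): two independent cohorts, 20 causal variants.
rng = np.random.default_rng(0)
n_tr, n_va, p, n_causal = 600, 500, 2000, 20
maf = rng.uniform(0.05, 0.5, size=p)
X_tr = rng.binomial(2, maf, size=(n_tr, p)).astype(float)
X_va = rng.binomial(2, maf, size=(n_va, p)).astype(float)
effects = np.zeros(p)
effects[:n_causal] = rng.normal(0.0, 0.3, size=n_causal)
y_tr = X_tr @ effects + rng.normal(size=n_tr)
y_va = X_va @ effects + rng.normal(size=n_va)

results = {}
for n_keep in (20, 200, 2000):
    results[n_keep] = screen_then_ridge(X_tr, y_tr, X_va, y_va, n_keep, lam=50.0)
    _, r2_tr, r2_va, gap = results[n_keep]
    print(f"{n_keep:5d} variants: train R2={r2_tr:.3f} "
          f"validation R2={r2_va:.3f} gap={gap:.3f}")
```

Varying `n_keep` and `lam` shows how the gap between training and validation R² responds to the number of retained variants and the strength of the penalty; the actual STMGP selection rule and penalty differ from this stand-in.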

Original language: English
Article number: 294
Journal: Translational Psychiatry
Volume: 10
Issue number: 1
Publication status: Published - Dec 1, 2020

ASJC Scopus subject areas

  • Psychiatry and Mental Health
  • Cellular and Molecular Neuroscience
  • Biological Psychiatry

