TY - JOUR
T1 - Improved metabolomic data-based prediction of depressive symptoms using nonlinear machine learning with feature selection
AU - Takahashi, Yuta
AU - Ueki, Masao
AU - Yamada, Makoto
AU - Tamiya, Gen
AU - Motoike, Ikuko N.
AU - Saigusa, Daisuke
AU - Sakurai, Miyuki
AU - Nagami, Fuji
AU - Ogishima, Soichi
AU - Koshiba, Seizo
AU - Kinoshita, Kengo
AU - Yamamoto, Masayuki
AU - Tomita, Hiroaki
N1 - Funding Information:
This work was supported by a grant from the Strategic Research Program for Brain Sciences from the Japan Agency for Medical Research and Development (AMED) under Grant number JP19dm0107099 and the Tohoku Medical Megabank Project from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan and AMED under Grant numbers JP19km0105001 and JP19km0105002. We are grateful to Drs. Atsushi Hozawa, Shinichi Kuriyama, Ichiro Tsuji, Naoko Minegishi, Takako Takai-Igarashi, Nobuo Fuse, Osamu Tanabe, Junichi Sugawara, Tadashi Ishii, Kiyoshi Ito, Eiichi N. Kodama, Yasuyuki Taki, Masao Nagasaki, Ritsuko Shimizu, Akito Tsuboi, Kichiya Suzuki, Hiroshi Tanaka, Hiroshi Kawame, Hiroaki Hashizume, Shinichi Higuchi, Nobuo Yaegashi, Shigeo Kure, Sadayoshi Ito, and all faculty and staff of the Tohoku University Tohoku Medical Megabank Organization (http://www.megabank.tohoku.ac.jp/english/a191201/) for establishing the cohort that provided the materials and information analyzed in this study, as well as the participants of the Tohoku Medical Megabank Project for supporting this study.
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
AB - To address major limitations in algorithms for the metabolite-based prediction of psychiatric phenotypes, a novel prediction model for depressive symptoms based on nonlinear feature selection machine learning, the Hilbert–Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) algorithm, was developed and applied to a metabolomic dataset with the largest sample size to date. In total, 897 population-based subjects were recruited from the communities affected by the Great East Japan Earthquake; 306 metabolite features (37 metabolites identified by nuclear magnetic resonance measurements and 269 characterized metabolites based on the intensities from mass spectrometry) were used to build prediction models for depressive symptoms as evaluated by the Center for Epidemiologic Studies-Depression Scale (CES-D). Nested fivefold cross-validation was used to develop and evaluate the prediction models. The HSIC Lasso-based prediction model showed better predictive power than the other prediction models, including Lasso, support vector machine, partial least squares, random forest, and neural network. L-leucine, 3-hydroxyisobutyrate, and gamma-linolenyl carnitine frequently contributed to the prediction. We have demonstrated that the HSIC Lasso-based prediction model, which integrates nonlinear feature selection, showed improved predictive power for depressive symptoms based on metabolome data, and identified risk metabolites based on nonlinear statistics in the Japanese population. Further studies should apply HSIC Lasso-based prediction models to populations of different ethnicities to investigate the generalizability of each risk metabolite for predicting depressive symptoms.
UR - http://www.scopus.com/inward/record.url?scp=85084963127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084963127&partnerID=8YFLogxK
U2 - 10.1038/s41398-020-0831-9
DO - 10.1038/s41398-020-0831-9
M3 - Article
C2 - 32427830
AN - SCOPUS:85084963127
VL - 10
JO - Translational Psychiatry
JF - Translational Psychiatry
SN - 2158-3188
IS - 1
M1 - 157
ER -