TY - JOUR
T1 - Direct conditional probability density estimation with sparse feature selection
AU - Shiga, Motoki
AU - Tangkaratt, Voot
AU - Sugiyama, Masashi
N1 - Funding Information:
Motoki Shiga was supported by JSPS KAKENHI 25870322. Masashi Sugiyama was supported by JSPS KAKENHI 23120004 and AOARD. The authors thank Dr. Ichiro Takeuchi, Nagoya Institute of Technology, for kindly providing his source code.
Publisher Copyright:
© 2015, The Author(s).
PY - 2015/9/17
Y1 - 2015/9/17
AB - Regression is a fundamental problem in statistical data analysis, which aims at estimating the conditional mean of the output given the input. However, regression is not informative enough if the conditional probability density is multi-modal, asymmetric, and heteroscedastic. To overcome this limitation, various estimators of the conditional density itself have been developed, and a kernel-based approach called least-squares conditional density estimation (LS-CDE) was demonstrated to be promising. However, LS-CDE still suffers from large estimation error if the input contains many irrelevant features. In this paper, we therefore propose an extension of LS-CDE called sparse additive CDE (SA-CDE), which allows automatic feature selection in CDE. SA-CDE applies kernel LS-CDE to each input feature in an additive manner and penalizes the whole solution with a group-sparse regularizer. We also give a subgradient-based optimization method for SA-CDE training that scales well to large, high-dimensional data sets. Through experiments with benchmark and humanoid robot transition data sets, we demonstrate the usefulness of SA-CDE in noisy CDE problems.
KW - Conditional density estimation
KW - Feature selection
KW - Sparse structured norm
UR - http://www.scopus.com/inward/record.url?scp=84939259738&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939259738&partnerID=8YFLogxK
U2 - 10.1007/s10994-014-5472-x
DO - 10.1007/s10994-014-5472-x
M3 - Article
AN - SCOPUS:84939259738
SN - 0885-6125
VL - 100
SP - 161
EP - 182
JO - Machine Learning
JF - Machine Learning
IS - 2-3
ER -
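
The abstract describes SA-CDE as per-feature kernel LS-CDE blocks tied together by a group-sparse penalty, so that coefficient blocks of irrelevant features are driven exactly to zero. The Python sketch below illustrates that idea only; it is not the paper's algorithm. The additive model form, the grid approximation of the integral term in the least-squares objective, the proximal-gradient update with group soft-thresholding (standing in for the paper's subgradient scheme), and all names and hyperparameters (fit_sa_cde, gaussian_kernel, n_centers, sigma, lam) are assumptions made for illustration.

import numpy as np

def gaussian_kernel(a, b, sigma):
    """Pairwise 1-D Gaussian kernel matrix between vectors a (n,) and b (m,)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

def fit_sa_cde(X, y, n_centers=20, sigma=0.5, lam=0.1, n_iter=500, seed=0):
    """Illustrative sketch of sparse additive conditional density estimation.

    Assumed model: p(y|x) ~ sum_d alpha_d . phi_d(x_d, y), where each basis in
    block d is a Gaussian kernel on feature x_d times a Gaussian kernel on y.
    A group-lasso penalty on the blocks alpha_d zeroes out irrelevant features.
    """
    n, D = X.shape
    B = n_centers
    rng = np.random.RandomState(seed)
    idx = rng.choice(n, B, replace=False)
    Uc, Vc = X[idx], y[idx]                    # kernel centers in x and y

    # h = (1/n) sum_i phi(x_i, y_i)
    Ky = gaussian_kernel(y, Vc, sigma)                             # (n, B)
    Phi = np.hstack([gaussian_kernel(X[:, d], Uc[:, d], sigma) * Ky
                     for d in range(D)])                           # (n, D*B)
    h = Phi.mean(axis=0)

    # H = (1/n) sum_i \int phi(x_i, y) phi(x_i, y)^T dy, approximated on a
    # grid (the Gaussian integral has a closed form; a grid keeps this short).
    y_grid = np.linspace(y.min() - 3 * sigma, y.max() + 3 * sigma, 200)
    dy = y_grid[1] - y_grid[0]
    Kg = np.tile(gaussian_kernel(y_grid, Vc, sigma), (1, D))       # (G, D*B)
    H = np.zeros((D * B, D * B))
    for i in range(n):
        kx = np.concatenate([gaussian_kernel(X[i:i+1, d], Uc[:, d], sigma)[0]
                             for d in range(D)])                   # (D*B,)
        Phig = Kg * kx                                             # (G, D*B)
        H += Phig.T @ Phig * dy
    H /= n

    # Proximal-gradient loop with group soft-thresholding per feature block,
    # minimizing 0.5 a'Ha - h'a + lam * sum_d ||a_d||_2.
    alpha = np.zeros(D * B)
    lr = 1.0 / (np.linalg.norm(H, 2) + 1e-12)    # step from the Lipschitz bound
    for _ in range(n_iter):
        alpha -= lr * (H @ alpha - h)
        for d in range(D):
            blk = alpha[d * B:(d + 1) * B]
            nrm = np.linalg.norm(blk)
            shrink = max(0.0, 1.0 - lr * lam / nrm) if nrm > 0 else 0.0
            alpha[d * B:(d + 1) * B] = shrink * blk

    alpha = alpha.reshape(D, B)
    selected = [d for d in range(D) if np.linalg.norm(alpha[d]) > 1e-8]
    return alpha, selected

# Toy check: y depends only on the first of five input features, so the
# group-sparse penalty should keep (roughly) only block 0.
rng = np.random.RandomState(1)
X = rng.randn(200, 5)
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.randn(200)
alpha, selected = fit_sa_cde(X, y, lam=0.05)
print("feature blocks kept by the group-sparse penalty:", selected)

A practical estimator in the LS-CDE family would additionally clip negative coefficients and renormalize the predicted density over y at test time; those steps are omitted here to keep the feature-selection mechanism in focus.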