TY - JOUR
T1 - Detecting outliers in high-dimensional neuroimaging datasets with robust covariance estimators
AU - Fritsch, Virgile
AU - Varoquaux, Gaël
AU - Thyreau, Benjamin
AU - Poline, Jean-Baptiste
AU - Thirion, Bertrand
N1 - Funding Information: This work was supported by a Digiteo DIM-Lsc grant (HiDiNim project, No. 2010-42D). The data were acquired within the Imagen project. JBP was partly funded by the Imagen project, which receives research funding from the E.U. Community’s FP6, LSHM-CT-2007-037286. This manuscript reflects only the author’s views and the Community is not liable for any use that may be made of the information contained therein.
PY - 2012/10
Y1 - 2012/10
N2 - Medical imaging datasets often contain deviant observations, the so-called outliers, due to acquisition or preprocessing artifacts or resulting from large intrinsic inter-subject variability. These can undermine the statistical procedures used in group studies as the latter assume that the cohorts are composed of homogeneous samples with anatomical or functional features clustered around a central mode. The effects of outlying subjects can be mitigated by detecting and removing them with explicit statistical control. With the emergence of large medical imaging databases, exhaustive data screening is no longer possible, and automated outlier detection methods are currently gaining interest. The datasets used in medical imaging are often high-dimensional and strongly correlated. The outlier detection procedure should therefore rely on high-dimensional statistical multivariate models. However, state-of-the-art procedures, based on the Minimum Covariance Determinant (MCD) estimator, are not well-suited for such high-dimensional settings. In this work, we introduce regularization in the MCD framework and investigate different regularization schemes. We carry out extensive simulations to provide backing for practical choices in absence of ground truth knowledge. We demonstrate on functional neuroimaging datasets that outlier detection can be performed with small sample sizes and improves group studies.
KW - High-dimension
KW - Minimum covariance determinant
KW - Neuroimaging
KW - Outlier detection
KW - Robust estimation
UR - http://www.scopus.com/inward/record.url?scp=84866444366&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866444366&partnerID=8YFLogxK
U2 - 10.1016/j.media.2012.05.002
DO - 10.1016/j.media.2012.05.002
M3 - Article
C2 - 22728304
AN - SCOPUS:84866444366
VL - 16
SP - 1359
EP - 1370
JO - Medical Image Analysis
JF - Medical Image Analysis
SN - 1361-8415
IS - 7
ER -