We introduce a general variable selection procedure that can be applied to several parametric multivariate problems, including principal components and regression. The aim is to identify a small subset of the original variables that can ‘better explain’ the model through nonparametric relationships. The method typically yields some noisy uninformative variables and some variables that are strongly related because of their general dependence; our aim is to help understand the underlying structures in a given data set. The asymptotic behaviour of the proposed method is considered, and some real and simulated data sets are analysed as examples.
DOI : 10.1051/ps/2016022
Keywords: Variable selection, regression, principal components analysis
Authors: Fraiman, Ricardo; Gimenez, Yanina; Svarc, Marcela
@article{PS_2016__20__463_0,
author = {Fraiman, Ricardo and Gimenez, Yanina and Svarc, Marcela},
title = {Seeking relevant information from a statistical model},
journal = {ESAIM: Probability and Statistics},
pages = {463--479},
year = {2016},
publisher = {EDP Sciences},
volume = {20},
doi = {10.1051/ps/2016022},
zbl = {1353.62070},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ps/2016022/}
}
TY - JOUR
AU - Fraiman, Ricardo
AU - Gimenez, Yanina
AU - Svarc, Marcela
TI - Seeking relevant information from a statistical model
JO - ESAIM: Probability and Statistics
PY - 2016
SP - 463
EP - 479
VL - 20
PB - EDP Sciences
UR - https://www.numdam.org/articles/10.1051/ps/2016022/
DO - 10.1051/ps/2016022
LA - en
ID - PS_2016__20__463_0
ER -
%0 Journal Article
%A Fraiman, Ricardo
%A Gimenez, Yanina
%A Svarc, Marcela
%T Seeking relevant information from a statistical model
%J ESAIM: Probability and Statistics
%D 2016
%P 463-479
%V 20
%I EDP Sciences
%U https://www.numdam.org/articles/10.1051/ps/2016022/
%R 10.1051/ps/2016022
%G en
%F PS_2016__20__463_0
Fraiman, Ricardo; Gimenez, Yanina; Svarc, Marcela. Seeking relevant information from a statistical model. ESAIM: Probability and Statistics, Volume 20 (2016), pp. 463-479. doi: 10.1051/ps/2016022