We observe a matrix of independent, identically distributed Gaussian random variables which are centered except for elements of some submatrix of size where the mean is larger than some . The submatrix is sparse in the sense that and tend to 0, whereas and tend to infinity. We consider the problem of selecting the random variables with significantly large mean values, as was also considered by [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)]. We give sufficient conditions on as a function of and and construct a uniformly consistent procedure in order to do sharp variable selection. We also prove the minimax lower bounds under necessary conditions which are complementary to the previous conditions. The critical values separating the necessary and sufficient conditions are sharp (we show exact constants), whereas [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)] only prove rate optimality and focus on suboptimal computationally feasible selectors. Note that rate optimality in this problem leaves out a large set of possible parameters, where we do not know whether consistent selection is possible.
Keywords: Estimation, minimax testing, large matrices, selection of sparse signal, sharp selection bounds, variable selection
Butucea, Cristina 1, 2 ; Ingster, Yuri I.  ; Suslina, Irina A. 3
@article{PS_2015__19__115_0,
author = {Butucea, Cristina and Ingster, Yuri I. and Suslina, Irina A.},
title = {Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix},
journal = {ESAIM: Probability and Statistics},
pages = {115--134},
year = {2015},
publisher = {EDP Sciences},
volume = {19},
doi = {10.1051/ps/2014017},
mrnumber = {3374872},
zbl = {1330.62169},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ps/2014017/}
}
TY - JOUR AU - Butucea, Cristina AU - Ingster, Yuri I. AU - Suslina, Irina A. TI - Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix JO - ESAIM: Probability and Statistics PY - 2015 SP - 115 EP - 134 VL - 19 PB - EDP Sciences UR - https://www.numdam.org/articles/10.1051/ps/2014017/ DO - 10.1051/ps/2014017 LA - en ID - PS_2015__19__115_0 ER -
%0 Journal Article %A Butucea, Cristina %A Ingster, Yuri I. %A Suslina, Irina A. %T Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix %J ESAIM: Probability and Statistics %D 2015 %P 115-134 %V 19 %I EDP Sciences %U https://www.numdam.org/articles/10.1051/ps/2014017/ %R 10.1051/ps/2014017 %G en %F PS_2015__19__115_0
Butucea, Cristina; Ingster, Yuri I.; Suslina, Irina A. Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics, Tome 19 (2015), pp. 115-134. doi: 10.1051/ps/2014017
, , and , Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 (2006) 559–1047. | MR | Zbl
, and , Detection of an anomalous clusters in a network. Ann. Statist. 39 (2011) 278–304. | MR | Zbl
E. Arias-Castro, E.J. Candès and Y. Plan, Global Testing and Sparse Alternatives: ANOVA, Multiple Comparisons and the Higher Criticism. Preprint arXiv:1007.1434 (2010). | MR | Zbl
, and , Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 (2005) 2402–2425. | MR | Zbl
E. Arias-Castro and K. Lounici, Variable selection with exponential weights and -penalization. Preprint arxiv:1208.2635 (2012).
and , Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 (1995) 289–3300. | MR | Zbl
and , Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electron. J. Stat. 2 (2008) 1224–1241. | MR | Zbl
, and , Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009) 1705–1732. | MR | Zbl
and , Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 (2013) 2652–2688. | MR | Zbl
C. Butucea and G. Gayraud, Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations. Preprint arxiv:1301.4660 (2013). | MR
, and , Estimation and confidence sets for sparse normal mixtures. Ann. Statist. 35 (2007) 2421–2449. | MR | Zbl
and , Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory. 57 (2011) 2342–2359 | MR | Zbl
and , Exact matrix completion via convex optimization. Found. Comput. Math. 9 (2009) 717–772. | MR | Zbl
and , Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 (2012) 2359–2763. | MR | Zbl
and , Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 (2004) 962–994. | MR | Zbl
, , and , Maximum entropy and the nearly black object. With Discussion. J. Roy. Statist. Soc., Ser. B. 54 (1992) 41-81. | MR | Zbl
P. Embrechts, C. Klüppelberg and T. Mikosch, Modelling extremal events: for insurance and finance.Springer (1997). | MR | Zbl
, Recovering low-rank matrices from few coefficients in any basis IEEE, Information theory 57 (2010) 1548–1566. | MR | Zbl
, Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Stat. 6 (1997) 47–69. | MR | Zbl
Yu.I. Ingster and N.A. Stepanova, Adaptive selection of sparse regression function components. Zapiski Nauchn. Sem. POMI ZAI (2012).
Yu.I. Ingster and I.A. Suslina, Nonparametric goodness-of-fit testing under gaussian models. Vol. 169 of Lect. Notes in Statist. Springer-Verlag, New York (2003). | MR | Zbl
and , On a detection of a signal of known shape in multichannel system. Zapiski Nauchn. Sem. POMI 294 (2002) 88–112, Transl. J. Math. Sci. 127 (2002) 1723–1736. | MR | Zbl
, and , Matrix completion from noisy entries. J. Mach. Learn. Res. 11 (2010) 2057–2078. | MR | Zbl
M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, Minimax localization of structural information in large noisy matrices. NIPS (2011).
, and , Nuclear norm penalization and optimal rates for noisy low rank matrix completion. Ann. Statist. 39 (2011) 2302–2329. | MR | Zbl
and , Rodeo: sparse, greedy nonparametric regression. Ann. Statist. 36 (2008) 28–63. | MR | Zbl
, A simpler approach to matrix completion. J. Machine Learning 12 (2011) 3413–3430. | MR | Zbl
and , Estimation of high-dimensional low-rank matrices, Ann. Statist. 39 (2011) 887–930. | MR | Zbl
and , On the maximal size of Large-Average and ANOVA-fit Submatrices in a Gaussian Random Matrix. Bernoulli 19 (2013) 275–294. | MR | Zbl
, , and , Finding Large Average Submatrices in High Dimensional Data. Ann. Appl. Statist. 3 (2009) 985–1012. | MR | Zbl
A.B. Tsybakov, Introduction to nonparametric statistics. Springer Ser. Stat. Springer, New-York (2009). | MR | Zbl
, Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 (2012) 38-90. | MR | Zbl
Cité par Sources :






