Type I error rate control for testing many hypotheses: a survey with proofs

Roquain, Etienne

[Une revue du contrôle de l’erreur de type I en test multiple]

Journal de la société française de statistique, Tome 152 (2011) no. 2, pp. 3-38.

Abstract

This paper surveys recent advances in type I error rate control for multiple testing. We consider the problem of controlling the $k$-family-wise error rate (kFWER, the probability of making $k$ or more false discoveries) and the false discovery proportion (FDP, the proportion of false discoveries among the discoveries). The FDP is controlled either via its expectation, the so-called false discovery rate (FDR), or via its upper-tail distribution function. We aim to derive general and unified results together with concise and simple mathematical proofs. Although the paper is mainly a survey, it also provides some new contributions for controlling the kFWER and the upper-tail distribution function of the FDP. In particular, we derive a new procedure, based on the quantiles of the binomial distribution, that controls the FDP under independence.
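To make the two error criteria concrete, here is a minimal Python sketch (not taken from the paper) that computes the FDP of a given rejection set and checks whether at least $k$ false discoveries occurred; the kFWER is the probability of that event. The function names `false_discoveries`, `fdp` and `at_least_k_false_discoveries`, as well as the convention that the FDP equals 0 when nothing is rejected, are assumptions made for illustration.

```python
from typing import Sequence


def false_discoveries(rejected: Sequence[bool], is_true_null: Sequence[bool]) -> int:
    """Count the rejected hypotheses that are in fact true nulls."""
    if len(rejected) != len(is_true_null):
        raise ValueError("inputs must have the same length")
    return sum(1 for r, h0 in zip(rejected, is_true_null) if r and h0)


def fdp(rejected: Sequence[bool], is_true_null: Sequence[bool]) -> float:
    """False discovery proportion of one rejection set.

    Assumed convention: the FDP is 0 when no hypothesis is rejected.
    """
    n_rejected = sum(1 for r in rejected if r)
    if n_rejected == 0:
        return 0.0
    return false_discoveries(rejected, is_true_null) / n_rejected


def at_least_k_false_discoveries(
    rejected: Sequence[bool], is_true_null: Sequence[bool], k: int
) -> bool:
    """Event whose probability is the kFWER: k or more false discoveries."""
    return false_discoveries(rejected, is_true_null) >= k


# Hypothetical example: 5 hypotheses, 4 rejections, 1 of them a true null.
rejected = [True, True, True, False, True]
is_true_null = [True, False, False, True, False]
print(fdp(rejected, is_true_null))                              # 0.25
print(at_least_k_false_discoveries(rejected, is_true_null, 2))  # False
```

In these terms, the FDR is the expectation of `fdp(...)` over the randomness of the data, while controlling the upper tail means bounding the probability that `fdp(...)` exceeds a chosen threshold.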

Keywords: multiple testing, type I error rate, false discovery proportion, family-wise error, step-up, step-down, positive dependence
French keywords (translated): multiple testing, type I error, false discovery rate, false discovery probability, positive dependence

@article{JSFS_2011__152_2_3_0,
     author = {Roquain, Etienne},
     title = {Type {I} error rate control for testing many hypotheses: a survey with proofs},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {3--38},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {152},
     number = {2},
     year = {2011},
     mrnumber = {2821220},
     zbl = {1316.62115},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2011__152_2_3_0/}
}

Roquain, Etienne. Type I error rate control for testing many hypotheses: a survey with proofs. Journal de la société française de statistique, Tome 152 (2011) no. 2, pp. 3-38. http://www.numdam.org/item/JSFS_2011__152_2_3_0/
