Choix d’estimateurs basé sur le risque de Kullback-Leibler
Journal de la société française de statistique, Tome 151 (2010) no. 1, pp. 38-57.

Choosing among estimators is a central problem in statistics. The best-known criterion is the Akaike information criterion (AIC), which was introduced as an estimate, up to a constant, of the Kullback-Leibler risk. However, a particular value of AIC has no direct interpretation, and the variability of the criterion is often ignored. We present several approaches for estimating differences of Kullback-Leibler risks. The proposed criteria can be used in parametric, non-parametric or semi-parametric settings. An extension of these criteria to incomplete data is presented. Several applications to survival data are described: the choice of smooth estimators of the hazard function, the choice between estimators from a proportional hazards model and from a stratified model, and the choice between estimators from Markov and non-Markov models. Finally, building on this work, criteria are defined for choosing between estimators based on different observations.
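As an illustration of why a difference of criteria is easier to read than a single AIC value, here is a minimal sketch, not the article's own procedure. It compares two parametric estimators on complete (uncensored) data using the AIC difference divided by 2n, a normalization described in the literature on estimating differences of Kullback-Leibler risks. The survival times are made up, the function names are hypothetical, and the exponential and log-normal families were picked only because their maximum likelihood estimates have closed forms.

```python
import math


def exponential_loglik(xs):
    """Maximized log-likelihood of an exponential model (1 parameter)."""
    n = len(xs)
    total = sum(xs)
    rate = n / total  # closed-form maximum likelihood estimate
    return n * math.log(rate) - rate * total


def lognormal_loglik(xs):
    """Maximized log-likelihood of a log-normal model (2 parameters)."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE of the variance
    return sum(
        -v - 0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        for v in logs
    )


def aic(loglik, n_params):
    """Akaike information criterion: -2 log-likelihood + 2 * parameters."""
    return -2.0 * loglik + 2.0 * n_params


def normalized_aic_difference(xs):
    """(AIC_exponential - AIC_lognormal) / (2n).

    A negative value favours the exponential estimator, a positive
    value the log-normal one.
    """
    n = len(xs)
    aic_exp = aic(exponential_loglik(xs), 1)
    aic_logn = aic(lognormal_loglik(xs), 2)
    return (aic_exp - aic_logn) / (2 * n)


# Made-up survival times, for illustration only.
times = [0.4, 1.1, 2.3, 0.7, 3.8, 1.9, 0.2, 5.1, 2.7, 1.4]
d = normalized_aic_difference(times)
print(f"normalized AIC difference: {d:.4f}")
```

Dividing by 2n puts the quantity on a per-observation log-likelihood scale, so values from samples of different sizes can be compared. The sketch returns only a point estimate and does not address the variability of the criterion or incomplete data.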

Keywords: AIC, censored data, incomplete data, Kullback-Leibler risk, model selection, cross-validation

@article{JSFS_2010__151_1_38_0,
     author = {Liquet, Benoit},
     title = {Choix d{\textquoteright}estimateurs bas\'e sur le risque de {Kullback-Leibler}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {38--57},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {151},
     number = {1},
     year = {2010},
     mrnumber = {2679309},
     zbl = {1316.62012},
     language = {fr},
     url = {http://www.numdam.org/item/JSFS_2010__151_1_38_0/}
}