Tail index estimation based on survey data

Bertail, Patrice; Chautru, Emilie; Clémençon, Stéphan

doi:10.1051/ps/2014011

Bertail, Patrice ¹,² ; Chautru, Emilie ³ ; Clémençon, Stéphan ⁴

¹MODAL’X - Université Paris Ouest, 92001 Nanterre, France
²Laboratoire de Statistique, CREST, France
³Laboratoire AGM - Université de Cergy-Pontoise, 95000 Cergy-Pontoise, France
⁴Institut Mines-Télécom - LTCI UMR Télécom ParisTech/CNRS No. 5141, 75634 Paris, France.

ESAIM: Probability and Statistics, Volume 19 (2015), pp. 28-59

Abstract

This paper is devoted to tail index estimation in the context of survey data. Assuming that the population of interest is described by a heavy-tailed statistical model, we prove that the survey scheme plays a crucial role in the design of consistent inference methods for extremes. As simulation experiments reveal, ignoring the sampling plan generally induces a significant bias, jeopardizing the accuracy of the extreme value statistics thus computed. The focus here is on the celebrated Hill method for tail index estimation: we show how to modify it in order to take the survey design into account. Precisely, under specific conditions on the first- and second-order inclusion probabilities, we establish the consistency of the variant of the Hill estimator we propose. Additionally, its asymptotic normality is proved in a specific situation. The application of this limit result to building Gaussian confidence intervals is thoroughly discussed and illustrated by numerical results.
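To make the idea of a design-aware Hill estimator concrete, here is a minimal sketch under stated assumptions. It contrasts the classical Hill estimator with a Hill-type estimator that weights each sampled unit by the inverse of its first-order inclusion probability (Horvitz-Thompson weights). The weighted form, the function names, the Pareto population and the inclusion-probability rule are all illustrative assumptions; the estimator defined in the paper may differ, in particular in how the number of extremes k is chosen.

```python
import math
import random


def hill(values, k):
    """Classical Hill estimate of gamma = 1/alpha from the k largest values."""
    xs = sorted(values, reverse=True)
    threshold = xs[k]  # (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def weighted_hill(values, incl_probs, k):
    """Hill-type estimate with weights 1/pi_i (illustrative assumption,
    not necessarily the estimator studied in the paper)."""
    pairs = sorted(zip(values, incl_probs), key=lambda p: p[0], reverse=True)
    threshold = pairs[k][0]
    num = sum(math.log(x / threshold) / pi for x, pi in pairs[:k])
    den = sum(1.0 / pi for _, pi in pairs[:k])
    return num / den


def poisson_sample(population, incl_prob, rng):
    """Poisson survey scheme: each unit is kept independently with its own
    inclusion probability incl_prob(x). Returns (value, probability) pairs."""
    sample = []
    for x in population:
        pi = incl_prob(x)
        if rng.random() < pi:
            sample.append((x, pi))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical heavy-tailed population: Pareto with alpha = 2 (gamma = 0.5).
    population = [rng.paretovariate(2.0) for _ in range(20000)]
    # Hypothetical informative design: larger values are sampled more often.
    incl = lambda x: min(1.0, 0.05 + 0.1 * math.log(x))
    sample = poisson_sample(population, incl, rng)
    xs = [x for x, _ in sample]
    pis = [pi for _, pi in sample]
    k = 100
    print("unweighted Hill:", hill(xs, k))
    print("weighted Hill:  ", weighted_hill(xs, pis, k))
```

When all inclusion probabilities are equal, the weights cancel and the weighted version reduces to the classical Hill estimator, which is a quick sanity check on the sketch.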


Classification: 62D05, 62F12, 62G32
Keywords: Survey sampling, tail index estimation, Hill estimator, Poisson survey scheme, rejective sampling

@article{PS_2015__19__28_0,
     author = {Bertail, Patrice and Chautru, Emilie and Cl\'emen\c{c}on, St\'ephan},
     title = {Tail index estimation based on survey data},
     journal = {ESAIM: Probability and Statistics},
     pages = {28--59},
     year = {2015},
     publisher = {EDP-Sciences},
     volume = {19},
     doi = {10.1051/ps/2014011},
     mrnumber = {3374868},
     zbl = {1351.62019},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ps/2014011/}
}

TY  - JOUR
AU  - Bertail, Patrice
AU  - Chautru, Emilie
AU  - Clémençon, Stéphan
TI  - Tail index estimation based on survey data
JO  - ESAIM: Probability and Statistics
PY  - 2015
SP  - 28
EP  - 59
VL  - 19
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ps/2014011/
DO  - 10.1051/ps/2014011
LA  - en
ID  - PS_2015__19__28_0
ER  -

%0 Journal Article
%A Bertail, Patrice
%A Chautru, Emilie
%A Clémençon, Stéphan
%T Tail index estimation based on survey data
%J ESAIM: Probability and Statistics
%D 2015
%P 28-59
%V 19
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ps/2014011/
%R 10.1051/ps/2014011
%G en
%F PS_2015__19__28_0

Bertail, Patrice; Chautru, Emilie; Clémençon, Stéphan. Tail index estimation based on survey data. ESAIM: Probability and Statistics, Volume 19 (2015), pp. 28-59. doi: 10.1051/ps/2014011
