In multivariate analysis, identifying outliers is particularly difficult when the underlying distribution is skewed and/or heavy-tailed. In this paper, we propose a very simple outlier identification method that is well suited to such distributions and has low computational complexity.
Keywords: outlier identification, skewed multivariate distribution, heavy-tailed multivariate distribution, Tukey $g$-and-$h$ distribution
@article{JSFS_2016__157_2_90_0,
  author = {Verardi, Vincenzo and Vermandele, Catherine},
  title = {Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {90--114},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {157},
  number = {2},
  year = {2016},
  mrnumber = {3554075},
  zbl = {1358.62053},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2016__157_2_90_0/}
}
Verardi, Vincenzo; Vermandele, Catherine. Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions. Journal de la société française de statistique, Volume 157 (2016) no. 2, pp. 90-114. http://www.numdam.org/item/JSFS_2016__157_2_90_0/