Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions
Journal de la société française de statistique, Tome 157 (2016) no. 2, pp. 90-114.

In multivariate analysis, identifying outliers is particularly difficult when the underlying distribution is skewed and/or heavy-tailed. In this paper, we propose a very simple outlier identification tool that is well suited to such distributions and keeps the computational complexity low.

Keywords: outlier identification, skewed multivariate distribution, heavy-tailed multivariate distribution, Tukey $g$-and-$h$ distribution
@article{JSFS_2016__157_2_90_0,
     author = {Verardi, Vincenzo and Vermandele, Catherine},
     title = {Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {90--114},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {157},
     number = {2},
     year = {2016},
     mrnumber = {3554075},
     zbl = {1358.62053},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2016__157_2_90_0/}
}
Verardi, Vincenzo; Vermandele, Catherine. Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions. Journal de la société française de statistique, Tome 157 (2016) no. 2, pp. 90-114. http://www.numdam.org/item/JSFS_2016__157_2_90_0/
