Flexible modelling in statistics: past, present and future
Journal de la société française de statistique, Tome 156 (2015) no. 1, pp. 76-96.

At a time when more and more data are becoming available, and when these data exhibit rather complex structures (significant departure from symmetry, heavy or light tails), flexible modelling has become an essential task for statisticians as well as for researchers and practitioners in domains such as economics, finance and environmental sciences. This is reflected in the wealth of existing proposals for flexible distributions; well-known examples include Azzalini’s skew-normal, Tukey’s g-and-h, mixture distributions and two-piece distributions, to name but a few. My aim in the present paper is to provide an introduction to this research field that is useful to both novices and professionals. After a brief description of the research stream itself, I narrate the gripping history of flexible modelling, starring emblematic heroes from the past such as Edgeworth and Pearson, then describe three of the most widely used flexible families of distributions, and finally give an outlook on future research in flexible modelling by posing challenging open questions.

Keywords: heavy and light tails, skewness and kurtosis, skew-normal distribution, tests for symmetry and normality, transformation approach, two-piece distributions
@article{JSFS_2015__156_1_76_0,
     author = {Ley, Christophe},
     title = {Flexible modelling in statistics: past, present and future},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {76--96},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {156},
     number = {1},
     year = {2015},
     zbl = {1316.62023},
     mrnumber = {3338241},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2015__156_1_76_0/}
}
Ley, Christophe. Flexible modelling in statistics: past, present and future. Journal de la société française de statistique, Tome 156 (2015) no. 1, pp. 76-96. http://www.numdam.org/item/JSFS_2015__156_1_76_0/
