Distributions to model overdispersed count data
[Distributions pour le comptage de données surdispersées]
Journal de la société française de statistique, Tome 157 (2016) no. 2, pp. 39-63.

In the early twentieth century, only a few count distributions (the binomial and Poisson distributions) were commonly used in modeling. These distributions fail to fit bimodal or overdispersed data, especially data arising from phenomena in which the occurrence of one event increases the chance of further events. New count distributions, known as “contagious” distributions, were introduced to model such phenomena. This group, which includes the negative binomial, Neyman, Thomas and Pólya-Aeppli distributions, can be expressed either as mixtures of standard distributions or as stopped-sum distributions, that is, sums of random variables in which the number of terms is itself random. They accommodate bimodality and overdispersion and can fit highly irregular frequency distributions. The aim of this literature review is to 1) explain how these distributions emerged, 2) describe each of them, focusing on their different characterizations, their basic properties, and their practical utility, and 3) compare their strengths and weaknesses by fitting them to overdispersed real count data (bovine tuberculosis cases).
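
As an illustration of the stopped-sum representation mentioned in the abstract, the following minimal sketch simulates a Neyman type A variable as a Poisson number of clusters, each contributing a Poisson number of counts. It is not taken from the article: the function names and the parameter values (cluster rate 2, mean cluster size 3) are assumptions chosen for illustration. Under this construction the variance-to-mean ratio should be close to 1 plus the mean cluster size, whereas a plain Poisson variable has ratio 1.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method.

    Suitable for small means only; hypothetical helper for this sketch.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def neyman_type_a_draw(rng, cluster_rate, cluster_size_mean):
    """Stopped sum: a Poisson number of clusters, each with a Poisson size."""
    n_clusters = poisson_draw(rng, cluster_rate)
    return sum(poisson_draw(rng, cluster_size_mean) for _ in range(n_clusters))


def dispersion_index(sample):
    """Sample variance divided by sample mean (equals 1 for Poisson data)."""
    mean = sum(sample) / len(sample)
    variance = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    return variance / mean


# Illustrative parameter values, not estimates from the article's data.
rng = random.Random(0)
sample = [neyman_type_a_draw(rng, 2.0, 3.0) for _ in range(20000)]
print("mean:", sum(sample) / len(sample))
print("dispersion index:", dispersion_index(sample))
```

A dispersion index well above 1 is the overdispersion that the binomial and Poisson distributions cannot reproduce.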

Keywords: Discrete distributions, Overdispersion, Mixture distributions, Stopped-sum distributions, Negative binomial distribution, Neyman distribution
@article{JSFS_2016__157_2_39_0,
     author = {Coly, Sylvain and Yao, Anne-Fran\c{c}oise and Abrial, David and Charras-Garrido, Myriam},
     title = {Distributions to model overdispersed count data},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {39--63},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {157},
     number = {2},
     year = {2016},
     zbl = {1357.92005},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2016__157_2_39_0/}
}
Coly, Sylvain; Yao, Anne-Françoise; Abrial, David; Charras-Garrido, Myriam. Distributions to model overdispersed count data. Journal de la société française de statistique, Tome 157 (2016) no. 2, pp. 39-63. http://www.numdam.org/item/JSFS_2016__157_2_39_0/