New insights into Approximate Bayesian Computation
Annales de l'I.H.P. Probabilités et statistiques, Volume 51 (2015) no. 1, pp. 376-403.

Approximate Bayesian Computation (ABC for short) is a family of computational techniques which offer an almost automated solution in situations where evaluation of the posterior likelihood is computationally prohibitive, or whenever suitable likelihoods are not available. In the present paper, we analyze the procedure from the point of view of k-nearest neighbor theory and explore the statistical properties of its outputs. We discuss in particular some asymptotic features of the genuine conditional density estimate associated with ABC, which is an interesting hybrid between a k-nearest neighbor and a kernel method.

The term "Approximate Bayesian Computation" (ABC for short) refers to a family of Bayesian techniques for simulating from a probability distribution when the posterior likelihood is unavailable or cannot be evaluated numerically. In this paper, we consider the procedure from the point of view of k-nearest neighbor theory, focusing in particular on the statistical properties of the algorithm's outputs. This leads us to analyze the asymptotic behavior of a conditional density estimate naturally associated with ABC, which is used in practice and combines the characteristics of a k-nearest neighbor estimate with those of a kernel method.
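
The k-nearest neighbor reading of ABC described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy Gaussian model, the function names `knn_abc` and `kernel_density`, the bandwidth `h` and the values of `n_sims` and `k` are all assumptions chosen for illustration. Parameters are drawn from the prior, data are simulated for each draw, the k draws whose simulated data fall closest to the observation are kept, and a kernel estimate is then built on the retained parameters.

```python
import numpy as np

def knn_abc(y_obs, thetas, ys, k):
    """Return the k parameter draws whose simulated data lie closest to y_obs."""
    dist = np.abs(ys - y_obs)
    # argpartition places the indices of the k smallest distances first.
    nearest = np.argpartition(dist, k - 1)[:k]
    return thetas[nearest]

def kernel_density(grid, sample, h):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (sample.size * h * np.sqrt(2.0 * np.pi))

# Hypothetical toy model (an assumption, not taken from the paper):
# prior theta ~ N(0, 1), one observation y | theta ~ N(theta, 1).
rng = np.random.default_rng(0)
n_sims, k, y_obs = 200_000, 2_000, 1.0
thetas = rng.normal(0.0, 1.0, size=n_sims)  # draws from the prior
ys = rng.normal(thetas, 1.0)                # one simulated data point per draw

accepted = knn_abc(y_obs, thetas, ys, k)    # k-nearest neighbor step (data space)
grid = np.linspace(-4.0, 5.0, 901)
density = kernel_density(grid, accepted, h=0.15)  # kernel step (parameter space)
```

In this conjugate toy model the exact posterior is N(y_obs/2, 1/2), so the mean and variance of `accepted` should both lie near 0.5. The selection step is a nearest neighbor rule in the data space and the smoothing step is a kernel method in the parameter space, which is the hybrid structure the abstract refers to.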

DOI: 10.1214/13-AIHP590
Classification: 62C10, 62F15, 62G20
Keywords: approximate Bayesian computation, nonparametric estimation, conditional density estimation, nearest neighbor methods, mathematical statistics
@article{AIHPB_2015__51_1_376_0,
     author = {Biau, G\'erard and C\'erou, Fr\'ed\'eric and Guyader, Arnaud},
     title = {New insights into {Approximate} {Bayesian} {Computation}},
     journal = {Annales de l'I.H.P. Probabilit\'es et statistiques},
     pages = {376--403},
     publisher = {Gauthier-Villars},
     volume = {51},
     number = {1},
     year = {2015},
     doi = {10.1214/13-AIHP590},
     zbl = {06412909},
     language = {en},
     url = {http://www.numdam.org/articles/10.1214/13-AIHP590/}
}
Biau, Gérard; Cérou, Frédéric; Guyader, Arnaud. New insights into Approximate Bayesian Computation. Annales de l'I.H.P. Probabilités et statistiques, Volume 51 (2015) no. 1, pp. 376-403. doi: 10.1214/13-AIHP590. http://www.numdam.org/articles/10.1214/13-AIHP590/
