Bayesian learning with Wasserstein barycenters
ESAIM: Probability and Statistics, Tome 26 (2022), pp. 436-472

We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed the Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework when the Bayesian risk considered is the Wasserstein distance. We give examples illustrating how the BWB extends some classic parametric and non-parametric selection strategies. Furthermore, we provide explicit conditions guaranteeing the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, highlighting its advantages over common choices such as the model-average estimator. Finally, we show how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper, and provide a numerical example for experimental validation of the proposed method.
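For intuition about the central object, the 2-Wasserstein distance and its barycenter admit closed forms within the one-dimensional Gaussian (location-scatter) family, as covered by references [19, 24] below. The following sketch is illustrative only (the function names are ours, not from the paper): the barycenter of 1-D Gaussians is again Gaussian, with mean and standard deviation given by the weighted averages.

```python
def w2_gaussian_1d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).

    For 1-D Gaussians, W2^2 reduces to the squared difference of means
    plus the squared difference of standard deviations.
    """
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def gaussian_w2_barycenter_1d(means, stds, weights):
    """W2 barycenter of 1-D Gaussians N(m_i, s_i^2) with weights w_i.

    Within this location-scatter family the barycenter stays Gaussian;
    its mean and standard deviation are the weighted averages.
    """
    m = sum(w * mu for w, mu in zip(weights, means))
    s = sum(w * sd for w, sd in zip(weights, stds))
    return m, s


# Example: equal-weight barycenter of N(0, 1) and N(2, 9)
m, s = gaussian_w2_barycenter_1d([0.0, 2.0], [1.0, 3.0], [0.5, 0.5])
print(m, s)  # 1.0 2.0
```

In higher dimensions no such closed form exists in general, which is what motivates iterative schemes such as the fixed-point approach of [3] and the Wasserstein-space SGD used in the paper.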

DOI: 10.1051/ps/2022015
Classification: 62G05, 62F15, 62H10, 62G20
Keywords: Bayesian learning, non-parametric estimation, Wasserstein distance and barycenter, consistency, MCMC, stochastic gradient descent in Wasserstein space
@article{PS_2022__26_1_436_0,
     author = {Backhoff-Veraguas, Julio and Fontbona, Joaquin and Rios, Gonzalo and Tobar, Felipe},
     title = {Bayesian learning with {Wasserstein} barycenters},
     journal = {ESAIM: Probability and Statistics},
     pages = {436--472},
     year = {2022},
     publisher = {EDP-Sciences},
     volume = {26},
     doi = {10.1051/ps/2022015},
     mrnumber = {4519227},
     zbl = {07730417},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ps/2022015/}
}
TY  - JOUR
AU  - Backhoff-Veraguas, Julio
AU  - Fontbona, Joaquin
AU  - Rios, Gonzalo
AU  - Tobar, Felipe
TI  - Bayesian learning with Wasserstein barycenters
JO  - ESAIM: Probability and Statistics
PY  - 2022
SP  - 436
EP  - 472
VL  - 26
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ps/2022015/
DO  - 10.1051/ps/2022015
LA  - en
ID  - PS_2022__26_1_436_0
ER  - 
%0 Journal Article
%A Backhoff-Veraguas, Julio
%A Fontbona, Joaquin
%A Rios, Gonzalo
%A Tobar, Felipe
%T Bayesian learning with Wasserstein barycenters
%J ESAIM: Probability and Statistics
%D 2022
%P 436-472
%V 26
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ps/2022015/
%R 10.1051/ps/2022015
%G en
%F PS_2022__26_1_436_0
Backhoff-Veraguas, Julio; Fontbona, Joaquin; Rios, Gonzalo; Tobar, Felipe. Bayesian learning with Wasserstein barycenters. ESAIM: Probability and Statistics, Tome 26 (2022), pp. 436-472. doi: 10.1051/ps/2022015

[1] M. Agueh and G. Carlier, Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43 (2011) 904–924. | MR | Zbl | DOI

[2] J. Altschuler and E. Boix-Adserà, Wasserstein barycenters are NP-hard to compute. Preprint (2021). | arXiv | MR | Zbl

[3] P.C. Álvarez-Esteban, E. Del Barrio, J.A. Cuesta-Albertos and C. Matrán, A fixed-point approach to barycenters in Wasserstein space. J. Math. Anal. Appl. 441 (2016) 744–762. | MR | Zbl | DOI

[4] P.C. Álvarez-Esteban, E. Del Barrio, J.A. Cuesta-Albertos and C. Matrán, Wide Consensus aggregation in the Wasserstein Space. Application to location-scatter families. Bernoulli 24 (2018) 3147–3179. | MR | Zbl

[5] L. Ambrosio, N. Gigli and G. Savaré, Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich, 2nd edn., Birkhäuser Verlag, Basel (2008). | MR | Zbl

[6] C. Andrieu, N. De Freitas, A. Doucet and M.I. Jordan, An introduction to MCMC for machine learning. Mach. Learn. 50 (2003) 5–43. | Zbl | DOI

[7] J. Backhoff-Veraguas, J. Fontbona, G. Rios and F. Tobar, Stochastic gradient descent in Wasserstein space. Preprint (2022). | arXiv

[8] J.O. Berger, Statistical decision theory and Bayesian analysis. Springer Science & Business Media (2013). | MR | Zbl

[9] R.H. Berk et al., Limiting behavior of posterior distributions when the model is incorrect. Ann. Math. Stat. 37 (1966) 51–58. | MR | Zbl | DOI

[10] J. Bigot and T. Klein, Characterization of barycenters in the Wasserstein space by averaging optimal transport maps. ESAIM: Probab. Stat. 22 (2018) 35–57. | MR | Zbl | Numdam | DOI

[11] S. Brooks, A. Gelman, G. Jones and X.-L. Meng, Handbook of Markov chain Monte Carlo. CRC Press (2011). | Zbl | MR | DOI

[12] E. Cazelles, F. Tobar and J. Fontbona, A novel notion of barycenter for probability distributions based on optimal weak mass transport, in 2021 Conference on Neural Information Processing Systems (NeurIPS) (2021). | arXiv

[13] S. Chewi, T. Maunu, P. Rigollet and A.J. Stromme, Gradient descent algorithms for Bures-Wasserstein barycenters, in Conference on Learning Theory, PMLR (2020) 1276–1304.

[14] J. Cuesta-Albertos, L. Ruschendorf and A. Tuero-Diaz, Optimal coupling of multivariate distributions and stochastic processes. J. Multivariate Anal. 46 (1993) 335–361. | MR | Zbl | DOI

[15] M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, in International Conference on Machine Learning (2014) 685–693.

[16] M. Cuturi and G. Peyré, A smoothed dual approach for variational Wasserstein problems. SIAM J. Imag. Sci. 9 (2016) 320–343. | MR | Zbl | DOI

[17] P. Diaconis and D. Freedman, On the consistency of Bayes estimates. Ann. Stat. (1986) 1–26. | MR | Zbl

[18] P. Dognin, I. Melnyk, Y. Mroueh, J. Ross, C.D. Santos and T. Sercu, Wasserstein barycenter model ensembling. Preprint (2019). | arXiv

[19] D.C. Dowson and B.V. Landau, The Frechet distance between multivariate normal distributions. J. Multivariate Anal. 12 (1982) 450–455. | MR | Zbl | DOI

[20] T.A. El Moselhy and Y.M. Marzouk, Bayesian inference with optimal maps. J. Comput. Phys. 231 (2012) 7815–7850. | MR | Zbl | DOI

[21] R. Flamary and N. Courty, POT Python Optimal Transport library (2017).

[22] M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, in Annales de l'institut Henri Poincaré 10 (1948) 215–310. | MR | Zbl | Numdam

[23] S. Ghosal and A. Van Der Vaart, Fundamentals of nonparametric Bayesian inference, vol. 44. Cambridge University Press (2017). | MR | Zbl

[24] C.R. Givens and R.M. Shortt, A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31 (1984) 231–240. | MR | Zbl | DOI

[25] J. Goodman and J. Weare, Ensemble samplers with affine invariance. Commun. Appl. Math. Comput. Sci. 5 (2010) 65–80. | MR | Zbl | DOI

[26] M. Grendár and G. Judge, Asymptotic equivalence of empirical likelihood and Bayesian map. Ann. Statist. 37 (2009) 2445–2457. | MR | Zbl | DOI

[27] S. Kim, D. Mesa, R. Ma and T.P. Coleman, Tractable fully Bayesian inference via convex optimization and optimal transport theory. Preprint (2015). | arXiv

[28] Y.-H. Kim and B. Pass, Wasserstein barycenters over Riemannian manifolds. Adv. Math. 307 (2017) 640–683. | MR | Zbl | DOI

[29] B.J.K. Kleijn, Bayesian asymptotics under misspecification, Ph.D. thesis, Vrije Universiteit Amsterdam (2004).

[30] B.J.K. Kleijn and A.W. Van Der Vaart, The Bernstein-von-Mises theorem under misspecification. Electr. J. Stat. 6 (2012) 354–381. | MR | Zbl

[31] A. Korotin, V. Egiazarian, L. Lingxiao and E. Burnaev, Wasserstein Iterative Networks for Barycenter Estimation. Preprint (2022). | arXiv

[32] J. Lacombe, J. Digne, N. Courty and N. Bonneel, Learning to generate Wasserstein barycenters. Preprint (2021). | arXiv | MR | Zbl

[33] T. Le Gouic and J.-M. Loubes, Existence and consistency of Wasserstein Barycenters. Probab. Theory Related Fields 168 (2017) 901–917. | MR | Zbl | DOI

[34] A. Mallasto, A. Gerolin and H.Q. Minh, Entropy-regularized 2-Wasserstein distance between Gaussian measures. Inf. Geometry (2021) 1–35. | MR | Zbl

[35] Y. Marzouk, T. Moselhy, M. Parno and A. Spantini, Sampling via measure transport: An introduction. Handbook of Uncertainty Quantification (2016) 1–41. | MR

[36] P. Massart, Concentration Inequalities and Model Selection. Springer (2007). | MR | Zbl

[37] K.P. Murphy, Machine learning: a probabilistic perspective. The MIT Press, Cambridge, MA (2012). | Zbl

[38] V.M. Panaretos and Y. Zemel, An invitation to statistics in Wasserstein space. SpringerBriefs in Probability and Mathematical Statistics, Springer, Cham (2020). | MR | Zbl

[39] M. Parno, Transport maps for accelerated Bayesian computation, Ph.D. thesis, Massachusetts Institute of Technology (2015). | MR

[40] B. Pass, Optimal transportation with infinitely many marginals. J. Funct. Anal. 264 (2013) 947–963. | MR | Zbl | DOI

[41] G. Peyré and M. Cuturi, Computational optimal transport: with applications to data science. Found. Trends Mach. Learn. 11 (2019) 355–607. | DOI

[42] H. Robbins and S. Monro, A stochastic approximation method. Ann. Math. Stat. (1951) 400–407. | MR | Zbl | DOI

[43] F. Santambrogio, Optimal transport for applied mathematicians. Birkhäuser, NY (2015) 99–102. | MR | Zbl

[44] L. Schwartz, On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 4 (1965) 10–26. | MR | Zbl | DOI

[45] C. Villani, Topics in optimal transportation. Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI (2003). | MR | Zbl

[46] C. Villani, Optimal transport. Old and new, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338. Springer-Verlag, Berlin (2009). | MR | Zbl

[47] R. Wang, X. Wang and L. Wu, Sanov’s theorem in the Wasserstein distance: a necessary and sufficient condition. Stat. Probab. Lett. 80 (2010) 505–512. | MR | Zbl | DOI

[48] Y. Zemel and V. Panaretos, Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli 25 (2019) 932–976. | MR | Zbl | DOI


Partially funded by ANID-Chile grants: Fondecyt-Regular 1201948 (JF) and 1210606 (FT); Center for Mathematical Modeling ACE210010 and FB210005 (JF, FT); and Advanced Center for Electrical and Electronic Engineering FB0008 (FT).