We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed the Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework when the Bayesian risk considered is the Wasserstein distance. Examples are given, illustrating how the BWB extends some classic parametric and non-parametric selection strategies. We also provide explicit conditions guaranteeing the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, providing insights into its advantages over usual choices such as the model average estimator. Finally, we illustrate how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper, and provide a numerical example for experimental validation of the proposed method.
Keywords: Bayesian learning, non-parametric estimation, Wasserstein distance and barycenter, consistency, MCMC, stochastic gradient descent in Wasserstein space
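To make the contrast between the barycenter and the model average concrete, the following is a minimal sketch restricted to one-dimensional Gaussian models, where the 2-Wasserstein barycenter is known in closed form (it is the Gaussian whose mean and standard deviation are the weighted averages of those of the models). The posterior samples, the function names and the step size 1/(k+1) are illustrative assumptions and are not taken from the paper; the SGD loop only mimics the idea of stepping toward a randomly sampled model along the optimal transport map, which for Gaussians on the line amounts to interpolating means and standard deviations.

```python
import numpy as np


def gaussian_barycenter_1d(means, stds, weights=None):
    """Closed-form 2-Wasserstein barycenter of 1D Gaussians N(m_i, s_i^2).

    In one dimension the barycenter averages quantile functions, so it is
    N(sum_i w_i m_i, (sum_i w_i s_i)^2). Returns (mean, std).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    if weights is None:
        weights = np.full(means.shape, 1.0 / means.size)
    return float(weights @ means), float(weights @ stds)


def sgd_barycenter_1d(means, stds, rng, n_steps=2000):
    """Illustrative SGD sketch for the same barycenter (hypothetical helper).

    At step k a model is sampled and the current Gaussian is pushed forward by
    (1 - gamma_k) * Id + gamma_k * T, where T is the optimal map to the sampled
    model. For 1D Gaussians T is affine, so the update interpolates parameters.
    """
    m, s = float(means[0]), float(stds[0])
    for k in range(1, n_steps + 1):
        i = rng.integers(len(means))   # sample one model from the "posterior"
        gamma = 1.0 / (k + 1)          # assumed step-size schedule
        m = (1.0 - gamma) * m + gamma * means[i]
        s = (1.0 - gamma) * s + gamma * stds[i]
    return m, s


rng = np.random.default_rng(0)
# Hypothetical posterior samples over Gaussian models (mean, std).
post_means = rng.normal(1.0, 0.5, size=500)
post_stds = rng.uniform(0.5, 1.5, size=500)

bar_mean, bar_std = gaussian_barycenter_1d(post_means, post_stds)

# The model average is the mixture of the sampled models; its variance is
# E[s^2] + Var(m), which is never smaller than the barycenter variance (E[s])^2.
mix_var = float(np.mean(post_stds**2) + np.var(post_means))

sgd_mean, sgd_std = sgd_barycenter_1d(post_means, post_stds, rng)
print(f"barycenter: mean={bar_mean:.3f}, var={bar_std**2:.3f}")
print(f"model average variance={mix_var:.3f}")
print(f"SGD estimate: mean={sgd_mean:.3f}, std={sgd_std:.3f}")
```

In this toy setting the barycenter stays Gaussian and is no more dispersed than the model average, which is a mixture and inherits the spread of the posterior over means. Outside the one-dimensional Gaussian case no such closed form is available in general, and the optimal maps have to be computed or approximated numerically.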
@article{PS_2022__26_1_436_0,
author = {Backhoff-Veraguas, Julio and Fontbona, Joaquin and Rios, Gonzalo and Tobar, Felipe},
title = {Bayesian learning with {Wasserstein} barycenters},
journal = {ESAIM: Probability and Statistics},
pages = {436--472},
year = {2022},
publisher = {EDP-Sciences},
volume = {26},
doi = {10.1051/ps/2022015},
mrnumber = {4519227},
zbl = {07730417},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ps/2022015/}
}
TY  - JOUR
AU  - Backhoff-Veraguas, Julio
AU  - Fontbona, Joaquin
AU  - Rios, Gonzalo
AU  - Tobar, Felipe
TI  - Bayesian learning with Wasserstein barycenters
JO  - ESAIM: Probability and Statistics
PY  - 2022
SP  - 436
EP  - 472
VL  - 26
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ps/2022015/
DO  - 10.1051/ps/2022015
LA  - en
ID  - PS_2022__26_1_436_0
ER  -
%0 Journal Article
%A Backhoff-Veraguas, Julio
%A Fontbona, Joaquin
%A Rios, Gonzalo
%A Tobar, Felipe
%T Bayesian learning with Wasserstein barycenters
%J ESAIM: Probability and Statistics
%D 2022
%P 436-472
%V 26
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ps/2022015/
%R 10.1051/ps/2022015
%G en
%F PS_2022__26_1_436_0
Backhoff-Veraguas, Julio; Fontbona, Joaquin; Rios, Gonzalo; Tobar, Felipe. Bayesian learning with Wasserstein barycenters. ESAIM: Probability and Statistics, Tome 26 (2022), pp. 436-472. doi: 10.1051/ps/2022015
Partially funded by ANID-Chile grants: Fondecyt-Regular 1201948 (JF) and 1210606 (FT); Center for Mathematical Modeling ACE210010 and FB210005 (JF, FT); and Advanced Center for Electrical and Electronic Engineering FB0008 (FT).





