In this article, we propose replacing classical multivariate linear regressions with more parsimonious submodels built from Gaussian Bayesian networks. The aim is to improve the prediction of variables from covariates through a substantial reduction in the parametric dimension of the variance-covariance matrix. The approach is implemented using structured DAGs (directed acyclic graphs) for the case where the set of nodes to be modelled is the Cartesian product of two sets. A number of interesting properties of these DAGs and of the associated Bayesian networks follow. A numerical experiment on simulated data is carried out to check that the proposal is feasible from data when the structure of the DAG is not known. Finally, the proposal is applied to the prediction of body composition from easily obtained covariates. The results of a systematic search over this class of Bayesian networks are compared with the predictions of the saturated multivariate multiple regression model.
Linear Gaussian Bayesian networks can dramatically reduce the parametric dimension of the covariance matrices in the framework of multivariate multiple regression models. This idea is developed using structured, crossed directed acyclic graphs (DAGs) for the case where the node set can be interpreted as the Cartesian product of two sets. Some interesting properties of these DAGs and of the probability distributions of the associated Bayesian networks are shown. A numerical experiment on simulated data was performed to check that the idea could be applied in practice. The model is then applied to the prediction of body composition from easily measurable covariates, and the results are compared with those of a saturated regression prediction.
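To make the parameter reduction concrete, the sketch below builds a small DAG on the Cartesian product of two node sets and compares the number of covariance parameters of the resulting linear Gaussian Bayesian network with that of an unstructured covariance matrix. The crossing rule used here (an arc between two product nodes whenever they differ in one coordinate and that coordinate carries an arc in the corresponding base DAG), the function names, and the common regression weight of 0.5 are illustrative assumptions; the abstract does not give the article's exact definition of a crossed DAG.

```python
from itertools import product


def crossed_dag(nodes_a, arcs_a, nodes_b, arcs_b):
    """Build a DAG on nodes_a x nodes_b (assumed crossing rule, see lead-in)."""
    nodes = list(product(nodes_a, nodes_b))
    arcs = []
    for a, b in nodes:
        # Arcs inherited from the first base DAG, second coordinate fixed.
        arcs += [((a, b), (a2, b)) for a1, a2 in arcs_a if a1 == a]
        # Arcs inherited from the second base DAG, first coordinate fixed.
        arcs += [((a, b), (a, b2)) for b1, b2 in arcs_b if b1 == b]
    return nodes, arcs


def topological_order(nodes, arcs):
    """Kahn's algorithm; raises ValueError if the graph has a circuit."""
    indegree = {n: 0 for n in nodes}
    for _, child in arcs:
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for parent, child in arcs:
            if parent == n:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph is not acyclic")
    return order


def implied_covariance(nodes, arcs, weight=0.5, noise_var=1.0):
    """Covariance implied by X_j = sum(weight * X_parent) + independent noise."""
    order = topological_order(nodes, arcs)
    parents = {n: [p for p, c in arcs if c == n] for n in nodes}
    cov = {}
    for j, xj in enumerate(order):
        # Covariances with earlier nodes come from the parents' covariances.
        for xk in order[:j]:
            c = sum(weight * cov[(xi, xk)] for xi in parents[xj])
            cov[(xj, xk)] = cov[(xk, xj)] = c
        # Variance: contribution of the parents plus the residual variance.
        cov[(xj, xj)] = noise_var + sum(weight * cov[(xi, xj)] for xi in parents[xj])
    return cov


# Hypothetical base DAGs: a chain A1 -> A2 -> A3 and a single arc B1 -> B2.
nodes, arcs = crossed_dag(
    ["A1", "A2", "A3"], [("A1", "A2"), ("A2", "A3")],
    ["B1", "B2"], [("B1", "B2")],
)
p = len(nodes)
n_bn = len(arcs) + p          # one coefficient per arc, one residual variance per node
n_sat = p * (p + 1) // 2      # free entries of an unstructured covariance matrix
cov = implied_covariance(nodes, arcs)
print(f"{p} nodes, {len(arcs)} arcs: {n_bn} parameters versus {n_sat} when saturated")
```

With these toy inputs the network uses 13 covariance parameters where a saturated 6 x 6 covariance matrix needs 21; means are left out of both counts. The gap widens as the two base sets grow, because the number of arcs grows roughly linearly with the number of product nodes while the saturated count grows quadratically.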
Keywords: Bayesian network, crossed DAG, multivariate multiple regression, prediction
@article{JSFS_2014__155_3_1_0,
  author = {Tian, Simiao and Scutari, Marco and Denis, Jean-Baptiste},
  title = {Crossed {Linear} {Gaussian} {Bayesian} {Networks,} parsimonious models},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {1--21},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {155},
  number = {3},
  year = {2014},
  mrnumber = {3272707},
  zbl = {1316.62103},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2014__155_3_1_0/}
}
Tian, Simiao; Scutari, Marco; Denis, Jean-Baptiste. Crossed Linear Gaussian Bayesian Networks, parsimonious models. Journal de la société française de statistique, Volume 155 (2014) no. 3, pp. 1-21. http://www.numdam.org/item/JSFS_2014__155_3_1_0/