[Spatio-temporal data with R: everything you always wanted to know but never dared to ask]
We present an overview of (geo)statistical models, methods and techniques for the analysis and prediction of continuous spatio-temporal processes. Many approaches are possible for building statistical models of these processes, estimating their parameters and predicting them. We chose to present the Gaussian process approach, the one most commonly used in spatial statistics and geostatistics, together with its implementation in the R software. The target variable is the daily mean PM concentration across France, predicted using an atmospheric chemistry-transport model and series of observations obtained at air-quality monitoring stations. Following the thread of a real, large-scale application, we compare some of the most widely used R packages. R code for visualizing the data, estimating the parameters of the spatio-temporal covariance function, selecting a model and predicting PM concentrations is also presented to illustrate the sequence of steps. We conclude with a comparison of the packages available today and with development directions that seem promising to us.
We present an overview of (geo)statistical models, methods and techniques for the analysis and prediction of continuous spatio-temporal processes defined over continuous space. Various approaches exist for building statistical models for such processes, estimating their parameters and performing predictions. We cover the Gaussian process approach, which is very common in spatial statistics and geostatistics, and we focus on R-based implementations of the numerical procedures. To illustrate and compare the use of some of the most relevant packages, we treat a real-world application with high-dimensional data. The target variable is the daily mean PM concentration, predicted from the output of a chemistry-transport model and from observation series collected at monitoring stations across France in 2014. We give R code covering the full workflow, from importing the data sets to predicting PM concentrations with a fitted parametric model, including data visualization, estimation of the parameters of the spatio-temporal covariance function and model selection. We conclude with a comparison of the packages available today and a discussion of future developments.
Keywords: Covariance function, Geostatistics, Kriging, Air pollution
@article{JSFS_2017__158_3_124_0, author = {RESSTE Network et al.}, title = {Analyzing spatio-temporal data with {R:} {Everything} you always wanted to know {\textendash} but were afraid to ask}, journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique}, pages = {124--158}, publisher = {Soci\'et\'e fran\c{c}aise de statistique}, volume = {158}, number = {3}, year = {2017}, mrnumber = {3720133}, zbl = {1378.62139}, language = {en}, url = {http://www.numdam.org/item/JSFS_2017__158_3_124_0/} }