Special issue: functional data analysis
Unsupervised Bayesian clustering of functional data (Classification bayésienne non supervisée de données fonctionnelles)
Journal de la société française de statistique, Volume 155 (2014) no. 2, pp. 185-201.

We study unsupervised Bayesian clustering of functional data. We extend a clustering model based on the Dirichlet process to functional data. Other papers work in finite dimension, either by projecting the curves onto a basis of functions or by considering the curves only at their observation times. Here, by contrast, the computations are carried out on the complete curves, in infinite dimension. The framework of reproducing kernel Hilbert spaces lets us express the densities of the curves, in infinite dimension, with respect to a Gaussian measure. We propose an algorithm that generalizes the Gibbs with Auxiliary Parameters algorithm (Neal, 2000) to the case of processes. Its performance is compared with that of an existing method and then discussed.
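The abstract builds on Neal's (2000) Gibbs sampler with auxiliary parameters (Algorithm 8 in that paper), which the authors generalize to curves. For reference, here is a minimal sketch of the original finite-dimensional sampler on scalar data. It assumes a Dirichlet process mixture of normals with known standard deviation `sigma` and a normal base measure G0 = N(`mu0`, `tau`²). The function name `gibbs_auxiliary`, its default values and the simulated data are illustrative assumptions. This is not the authors' functional algorithm, which replaces the normal likelihood with densities of whole curves relative to a Gaussian measure.

```python
import numpy as np


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) at y (vectorised over `mean`)."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def gibbs_auxiliary(y, alpha=1.0, m=3, sigma=1.0, mu0=0.0, tau=5.0,
                    n_iter=100, seed=0):
    """Sketch of Neal (2000) Algorithm 8 for a DP mixture of N(phi_k, sigma^2)
    with base measure G0 = N(mu0, tau^2). Returns labels and cluster means."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    labels = np.zeros(n, dtype=int)        # start: one cluster for all points
    phis = [float(rng.normal(mu0, tau))]   # phis[k] = mean of occupied cluster k

    for _ in range(n_iter):
        # Step 1: reallocate each observation given all the others.
        for i in range(n):
            old = labels[i]
            counts = np.bincount(labels, minlength=len(phis))
            counts[old] -= 1                   # n_{-i,k}: cluster sizes without i
            if counts[old] == 0:
                # i was a singleton: its parameter is reused as the first of the
                # m auxiliary values; the other m - 1 are fresh draws from G0.
                aux = [phis.pop(old)] + list(rng.normal(mu0, tau, size=m - 1))
                counts = np.delete(counts, old)
                labels[labels > old] -= 1      # keep labels contiguous
            else:
                aux = list(rng.normal(mu0, tau, size=m))
            k = len(phis)
            cand = np.array(phis + aux)
            # log P(c_i = c | ...) up to the common factor 1 / (n - 1 + alpha)
            log_w = normal_logpdf(y[i], cand, sigma)
            log_w[:k] += np.log(counts)        # existing clusters: n_{-i,c}
            log_w[k:] += np.log(alpha / m)     # auxiliary clusters: alpha / m
            w = np.exp(log_w - log_w.max())
            j = rng.choice(len(cand), p=w / w.sum())
            if j < k:
                labels[i] = j
            else:                              # open a new cluster
                phis.append(float(cand[j]))
                labels[i] = k
        # Step 2: redraw each cluster mean from its conjugate normal posterior.
        for c in range(len(phis)):
            yc = y[labels == c]
            prec = 1.0 / tau**2 + len(yc) / sigma**2
            mean = (mu0 / tau**2 + yc.sum() / sigma**2) / prec
            phis[c] = float(rng.normal(mean, 1.0 / np.sqrt(prec)))
    return labels, phis


# Usage: two well-separated groups of 50 scalar observations (simulated).
data_rng = np.random.default_rng(1)
y = np.concatenate([data_rng.normal(-6.0, 1.0, 50),
                    data_rng.normal(6.0, 1.0, 50)])
labels, phis = gibbs_auxiliary(y)
print(len(phis), sorted(round(p, 2) for p in phis))
```

The number of auxiliary parameters `m` trades mixing speed against cost per update. Because the base measure is conjugate here, step 2 draws each mean exactly. Neal's algorithm only requires some update that leaves the conditional posterior of each cluster parameter invariant.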

Keywords: Bayesian statistics, functional data, Dirichlet process, curve clustering, MCMC
@article{JSFS_2014__155_2_185_0,
     author = {Juery, Damien and Abraham, Christophe and Fontez, B\'en\'edicte},
     title = {Classification bay\'esienne non supervis\'ee de donn\'ees fonctionnelles},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {185--201},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {155},
     number = {2},
     year = {2014},
     zbl = {1316.62089},
     language = {fr},
     url = {http://www.numdam.org/item/JSFS_2014__155_2_185_0/}
}

[1] Aldous, D Exchangeability and related topics, Springer (1985), pp. 1-198

[2] Antoniak, C E Mixtures of Dirichlet Processes with applications to Bayesian nonparametric problems, Ann. Stat., Volume 2 (1974) no. 6, pp. 1152-1174

[3] Blei, D M; Jordan, M I Variational Inference for Dirichlet Process Mixtures, Bayesian Analysis, Volume 1 (2006) no. 1, pp. 121-144

[4] Bouveyron, C; Jacques, J Model-based clustering of time series in group-specific functional subspaces, Advances in Data Analysis and Classification, Volume 5 (2011), pp. 281-300

[5] Blackwell, D; MacQueen, J B Ferguson Distributions via Pólya urn schemes, Ann. Stat., Volume 1 (1973) no. 2, pp. 353-355

[6] Berlinet, A; Thomas-Agnan, C Reproducing kernel Hilbert spaces in Probability and Statistics, Kluwer Academic Publishers, 2004

[7] Crandell, J L; Dunson, D B Posterior simulation across nonparametric models for functional clustering, Sankhya B, Volume 73 (2011), pp. 42-61

[8] Chiou, J M; Li, P L Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 69 (2007), pp. 679-699

[9] Dahl, D B Sequentially-Allocated Merge-Split Sampler for Conjugate and Nonconjugate Dirichlet Process Mixture Models (2005), pp. 1-26 (Technical report)

[10] Dahl, D B Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model, Cambridge University Press (2006), pp. 201-218 (10)

[11] Escobar, M D Estimating Normal Means With a Dirichlet Process Prior, J. Am. Stat. Assoc., Volume 89 (1994) no. 425, pp. 268-277

[12] Escobar, M D; West, M Bayesian Density Estimation and Inference Using Mixtures, J. Am. Stat. Assoc., Volume 90 (1995) no. 430, pp. 577-588

[13] Ferguson, T S A Bayesian analysis of some nonparametric problems, Ann. Stat., Volume 1 (1973) no. 2, pp. 209-230

[14] Gelfand, A E; Kottas, A; MacEachern, S N Bayesian Nonparametric Spatial Modeling With Dirichlet Process Mixing, J. Am. Stat. Assoc., Volume 100 (2005) no. 471, pp. 1021-1035

[15] Ishwaran, H; Zarepour, M Exact and approximate sum representations for the Dirichlet process, Can. J. Stat., Volume 30 (2002) no. 2, pp. 269-283

[16] Jackson, E; Davy, M; Doucet, A; Fitzgerald, W J Bayesian Unsupervised Signal Classification by Dirichlet Process Mixtures of Gaussian Processes, IEEE Int. Conf. Acoust. Spee., 2007 (2007), pp. 1077-1080

[17] Jain, S; Neal, R M A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model, J. Comput. Graph. Stat., Volume 13 (2004) no. 1, pp. 158-182

[18] Jacques, J; Preda, C Funclust: a curves clustering method using functional random variable density approximation, Neurocomputing, Volume 112 (2013), pp. 164-171

[19] James, G M; Sugar, C A Clustering for sparsely sampled functional data, J. Am. Stat. Assoc., Volume 98 (2003) no. 462, pp. 397-408

[20] Kailath, T A General Likelihood-Ratio Formula for Random Signals in Gaussian Noise, IEEE T. Inform. Theory, Volume 15 (1969) no. 3, pp. 350-361

[21] Kailath, T; Poor, H V Detection of Stochastic Processes, IEEE T. Inform. Theory, Volume 44 (1998) no. 6, pp. 2230-2259

[22] Kimura, T; Tokuda, T; Nakada, Y; Nokajima, T; Matsumoto, T; Doucet, A Expectation-maximization algorithms for inference in Dirichlet processes mixture, Pattern Analysis and Applications, Volume 16 (2013) no. 1, pp. 55-67

[23] Kim, S; Tadesse, M G; Vannucci, M Variable selection in clustering via Dirichlet process mixture models, Biometrika, Volume 93 (2006) no. 4, pp. 877-893

[24] Liu, J S Nonparametric hierarchical Bayes via sequential imputations, Ann. Stat., Volume 24 (1996) no. 3, pp. 911-930

[25] Luan, Y; Li, H Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data, Bioinformatics, Volume 20 (2004) no. 3, pp. 332-339

[26] MacEachern, S N Estimating Normal Means With a Conjugate Style Dirichlet Process Prior, Commun. Stat. Simulat., Volume 23 (1994), pp. 727-741

[27] Ma, P; Castillo-Davis, C I; Zhong, W; Liu, J S A data-driven clustering method for time course gene expression data, Nucleic Acids Res., Volume 34 (2006) no. 4, pp. 1261-1269

[28] Miller, J W; Harrison, M T A simple example of Dirichlet process mixture inconsistency for the number of components (2013), pp. 1-8 (Technical report, arXiv)

[29] Miller, J W; Harrison, M T Inconsistency of Pitman-Yor process mixtures for the number of components (2013), pp. 1-28 (Technical report, arXiv)

[30] MacEachern, S N; Müller, P Estimating Mixture of Dirichlet Process Models, J. Comput. Graph. Stat., Volume 7 (1998) no. 2, pp. 223-238

[31] Neal, R M Markov chain sampling methods for Dirichlet Process mixture models, J. Comput. Graph. Stat., Volume 9 (2000) no. 2, pp. 249-265

[32] Parzen, E Statistical inference on time series by Hilbert space methods I. (1959) (Technical report)

[33] Parzen, E Regression Analysis of Continuous Parameter Time Series, Proc. Fourth Berkeley Symp. on Math. Statist. and Prob., University of California Press, Stanford University (1961), pp. 469-489

[34] Parzen, E Probability Density Functionals and Reproducing Kernel Hilbert Spaces, Time Series Analysis, Wiley, 1963, pp. 155-169 (11)

[35] Ray, S; Mallick, B Functional clustering by Bayesian wavelet methods, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 68 (2006) no. 2, pp. 305-332

[36] Schervish, M J Theory of Statistics, Springer-Verlag, 1995

[37] Shepp, L A Radon-Nikodym Derivatives of Gaussian Measures, Ann. Math. Stat., Volume 37 (1966) no. 2, pp. 321-354

[38] Stratonovich, R L; Sosulin, Y G Optimal Detection of a Markov Process in Noise, Eng. Cybernet, Volume 6 (1964), pp. 7-19

[39] Shi, J Q; Wang, B Curve prediction and clustering with mixtures of Gaussian process functional regression models, Stat. Comput., Volume 18 (2008) no. 3, pp. 267-283

[40] Tuddenham, R D; Snyder, M M Physical Growth of California Boys and Girls from Birth to Eighteen Years, University of California Press (1954), pp. 183-364

[41] Yi, G; Shi, J Q; Choi, T Penalized Gaussian Process Regression and Classification for High-Dimensional Nonlinear Data, Biometrics, Volume 67 (2011) no. 4, pp. 1285-1294