A pruned dynamic programming algorithm to recover the best segmentations with 1 to K m a x change-points
[Un algorithme de programmation dynamique élagué pour trouver les meilleures segmentations avec 1 à K m a x ruptures]
Journal de la société française de statistique, Tome 156 (2015) no. 4, pp. 180-205.

Dans les modèles de ruptures multiples la recherche des segmentations de coût minimal, au sens d’une certaine fonction de perte, avec une à K m a x ruptures est un problème algorithmique courant. Ici, nous présentons un algorithme, basé sur une représentation fonctionnelle des segmentations, qui élague l’ensemble des ruptures candidates. Nous étudions la complexité au pire de cet algorithme quand il y a un paramètre unidimensionnel par segment. Nous démontrons dans ce cas que ça complexité est équivalente à celle de l’algorithme « segment neighborhood » en 𝒪 ( K m a x n 2 ) . Pour une fonction de perte particulière nous démontrons que l’élagage est en moyenne efficace quand il n’y a pas de ruptures dans le signal. Enfin nous étudions empiriquement les performances de l’algorithme dans le cas de la perte quadratique et montrons qu’il est plus rapide que l’algorithme « segment neighborhood »

A common computational problem in multiple change-point models is to recover the segmentations with 1 to K m a x change-points of minimal cost with respect to some loss function. Here we present an algorithm to prune the set of candidate change-points which is based on a functional representation of the cost of segmentations. We study the worst case complexity of the algorithm when there is a unidimensional parameter per segment and demonstrate that it is at worst equivalent to the complexity of the segment neighbourhood algorithm: 𝒪 ( K m a x n 2 ) . For a particular loss function we demonstrate that pruning is on average efficient even if there are no change-points in the signal. Finally, we empirically study the performance of the algorithm in the case of the quadratic loss and show that it is faster than the segment neighbourhood algorithm.

Keywords: multiple-change-point detection, dynamic programming, functional cost, segment neighbourhood
Mot clés : détection de ruptures multiples, programmation dynamique, coût fonctionnel
@article{JSFS_2015__156_4_180_0,
     author = {Rigaill, Guillem},
     title = {A pruned dynamic programming algorithm to recover the best segmentations with $1$ to $K_{max}$ change-points},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {180--205},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {156},
     number = {4},
     year = {2015},
     mrnumber = {3436653},
     zbl = {1381.90094},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2015__156_4_180_0/}
}
TY  - JOUR
AU  - Rigaill, Guillem
TI  - A pruned dynamic programming algorithm to recover the best segmentations with $1$ to $K_{max}$ change-points
JO  - Journal de la société française de statistique
PY  - 2015
SP  - 180
EP  - 205
VL  - 156
IS  - 4
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2015__156_4_180_0/
LA  - en
ID  - JSFS_2015__156_4_180_0
ER  - 
%0 Journal Article
%A Rigaill, Guillem
%T A pruned dynamic programming algorithm to recover the best segmentations with $1$ to $K_{max}$ change-points
%J Journal de la société française de statistique
%D 2015
%P 180-205
%V 156
%N 4
%I Société française de statistique
%U http://www.numdam.org/item/JSFS_2015__156_4_180_0/
%G en
%F JSFS_2015__156_4_180_0
Rigaill, Guillem. A pruned dynamic programming algorithm to recover the best segmentations with $1$ to $K_{max}$ change-points. Journal de la société française de statistique, Tome 156 (2015) no. 4, pp. 180-205. http://www.numdam.org/item/JSFS_2015__156_4_180_0/

[1] Arlot, Sylvain; Celisse, Alain; Harchaoui, Zaid Kernel change-point detection, arXiv preprint arXiv:1202.3878 (2012)

[2] Auger, Ivan E; Lawrence, Charles E Algorithms for the optimal identification of segment neighborhoods, Bulletin of mathematical biology, Volume 51 (1989) no. 1, pp. 39-54 | MR | Zbl

[3] Bellman, Richard On the approximation of curves by line segments using dynamic programming, Communications of the ACM, Volume 4 (1961) no. 6, pp. 284-286 | Zbl

[4] Bai, Jushan; Perron, Pierre Computation and analysis of multiple structural change models, Journal of Applied Econometrics, Volume 18 (2003) no. 1, pp. 1-22 http://econpapers.repec.org/article/jaejapmet/v_3a18_3ay_3a2003_3ai_3a1_3ap_3a1-22.htm | DOI

[5] Cleynen, Alice; Koskas, Michel; Lebarbier, Emilie; Rigaill, Guillem; Robin, Stéphane Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data., Algorithms for Molecular Biology, Volume 9 (2014) | DOI

[6] Cleynen, Alice; Lebarbier, Emilie Segmentation of the Poisson and negative binomial rate models: a penalized estimator, ESAIM, P&S, Volume 18 (2014), pp. 750-769 | MR | Zbl

[7] Fisher, Walter D On grouping for maximum homogeneity, Journal of the American Statistical Association, Volume 53 (1958) no. 284, pp. 789-798 | MR | Zbl

[8] Gey, Servane; Lebarbier, Emile Using CART to Detect Multiple Change Points in the Mean for Large Sample, SSB preprint (2008)

[9] Guédon, Yann Exploring the segmentation space for the assessment of multiple change-point models, hal.inria.fr pre-print inria-00311634 (2008) | Zbl

[10] Harchaoui, Zaıd; Lévy-Leduc, Céline Multiple change-point estimation with a total variation penalty, Journal of the American Statistical Association, Volume 105 (2010) no. 492 | MR | Zbl

[11] Hocking, Toby Dylan Comparing least squares segmentation code, available at http://sugiyama-www.cs.titech.ac.jp/ toby/org/HOCKING-ml-seg-compare.pdf (2013)

[12] Hocking, Toby Dylan; Schleiermacher, Gudrun; Janoueix-Lerosey, Isabelle; Boeva, Valentina; Cappo, Julie; Delattre, Olivier; Bach, Francis; Vert, Jean-Philippe Learning smoothing models of copy number profiles using breakpoint annotations, BMC Bioinformatics, Volume 14 (2013) no. 1 | DOI

[13] Jackson, Brad; Scargle, Jeffrey D; Barnes, David; Arabhi, Sundararajan; Alt, Alina; Gioumousis, Peter; Gwin, Elyus; Sangtrakulcharoen, Paungkaew; Tan, Linda; Tsai, Tun Tao An algorithm for optimal partitioning of data on an interval, Signal Processing Letters, IEEE, Volume 12 (2005) no. 2, pp. 105-108

[14] Kolesnikov, Alexander; Fränti, Pasi Reduced-search dynamic programming for approximation of polygonal curves, Pattern Recognition Letters, Volume 24 (2003) no. 14, pp. 2243-2254 | Zbl

[15] Killick, Rebecca; Fearnhead, Paul; Eckley, IA Optimal detection of changepoints with a linear computational cost, Journal of the American Statistical Association, Volume 107 (2012) no. 500, pp. 1590-1598 | MR | Zbl

[16] Lavielle, Marc Using penalized contrasts for the change-point problem, Signal Processing, Volume 85 (2005) no. 8, pp. 1501-1510 | Zbl

[17] Lai, Weil R; Johnson, Mark D; Kucherlapati, Raju; Park, Peter J Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data, Bioinformatics, Volume 21 (2005) no. 19, pp. 3763-3770

[18] Maidstone, Robert; Hocking, Toby; Rigaill, Guillem; Fearnhead, Paul On optimal multiple changepoint algorithms for large data, arXiv preprint arXiv:1409.1842 (2014) | MR | Zbl

[19] Picard, Franck; Robin, Stephane; Lavielle, Marc; Vaisse, Christian; Daudin, Jean-Jacques A statistical approach for array CGH data analysis, BMC Bioinformatics, Volume 6 (2005) no. 1 | DOI

[20] Rigaill, Guillem Pruned dynamic programming for optimal multiple change-point detection, arXiv preprint arXiv:1004.0887 (2010)

[21] Scott, AJ; Knott, M A cluster analysis method for grouping means in the analysis of variance, Biometrics, Volume 30 (1974) no. 3, pp. 507-512 | Zbl