Peaks detection and alignment for mass spectrometry data
[Détection et alignement de pics en spectrométrie de masse]
Journal de la société française de statistique, Volume 151 (2010) no. 1, pp. 17-37.

The goal of this paper is to review existing methods for protein mass spectrometry data analysis and to present a new methodology for the automatic extraction of significant peaks (biomarkers). For the pre-processing steps required for MALDI-TOF or SELDI-TOF spectra, we use a purely nonparametric approach that combines the translation-invariant (stationary) wavelet transform for noise removal with penalized spline quantile regression for baseline correction. We then present a multi-scale spectra alignment technique based on the identification of statistically significant peaks in a set of spectra. This method finds the peaks common to a set of spectra, which can subsequently be mapped to individual proteins. These peaks may serve as useful biomarkers in medical applications, or as feature vectors for further multidimensional statistical analysis of the individuals. MALDI-TOF spectra obtained from serum samples are used throughout the paper to illustrate the methodology.
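The abstract outlines a three-stage pipeline: translation-invariant wavelet denoising, quantile-regression baseline correction, and detection of the peaks common to a set of spectra. The Python sketch below (NumPy only) illustrates those stages under stated assumptions and is not the authors' implementation. The translation-invariant transform is approximated by cycle spinning of a Haar transform with hard thresholding at the universal threshold. The penalized spline quantile regression is replaced by a quantile fit with a squared second-difference penalty (a Whittaker-type smoother) solved by iteratively reweighted least squares. The multi-scale alignment is replaced by a simple tolerance-based grouping of peak positions across spectra. All function names, parameter values (`tau`, `lam`, `tol`, `min_height`) and the synthetic spectra are hypothetical.

```python
import numpy as np

# Minimal sketch with hypothetical helpers (not the authors' code): Haar cycle-spinning
# denoising, an IRLS quantile baseline with a difference penalty standing in for
# penalized-spline quantile regression, and peak grouping standing in for alignment.

def haar_forward(x, levels):
    """Orthonormal Haar DWT: coarse approximation plus details (finest first)."""
    details, approx = [], x
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return approx, details

def haar_inverse(approx, details):
    """Exact inverse of haar_forward."""
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

def ti_denoise(y, levels=4):
    """Translation-invariant denoising: hard-threshold the Haar details of every
    circular shift 0..2**levels-1 (all distinct cases) and average the results."""
    y = np.asarray(y, dtype=float)
    n, n_shifts = y.size, 2 ** levels
    if n % n_shifts:
        raise ValueError("length must be a multiple of 2**levels")
    finest = haar_forward(y, 1)[1][0]
    sigma = np.median(np.abs(finest)) / 0.6745  # MAD estimate of the noise level
    thr = sigma * np.sqrt(2 * np.log(n))  # universal threshold
    est = np.zeros(n)
    for s in range(n_shifts):
        approx, details = haar_forward(np.roll(y, s), levels)
        details = [np.where(np.abs(d) > thr, d, 0.0) for d in details]
        est += np.roll(haar_inverse(approx, details), -s)
    return est / n_shifts

def quantile_baseline(y, tau=0.05, lam=1e6, n_iter=50, eps=1e-6):
    """Approximately minimise sum rho_tau(y - z) + lam * ||D2 z||^2 by IRLS."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), 2, axis=0)  # second-difference operator D2
    P = lam * D.T @ D
    z = np.full(n, np.quantile(y, tau))
    for _ in range(n_iter):
        r = y - z
        # Majorise rho_tau(r) = c(r)|r| by c(r) r^2 / (2|r_old|) + const.
        w = np.where(r > 0, tau, 1 - tau) / (2 * np.maximum(np.abs(r), eps))
        z = np.linalg.solve(np.diag(w) + P, w * y)
    return z

def detect_peaks(x, min_height):
    """Indices of interior local maxima of x that exceed min_height."""
    mid = x[1:-1]
    return np.flatnonzero((mid > x[:-2]) & (mid >= x[2:]) & (mid > min_height)) + 1

def common_peaks(peak_lists, tol=5, min_frac=1.0):
    """Chain pooled peak positions closer than tol into groups; keep groups that
    contain peaks from at least min_frac of the spectra (mean position each)."""
    pooled = sorted((int(p), k) for k, peaks in enumerate(peak_lists) for p in peaks)
    if not pooled:
        return []
    groups, current = [], [pooled[0]]
    for pos, k in pooled[1:]:
        if pos - current[-1][0] <= tol:
            current.append((pos, k))
        else:
            groups.append(current)
            current = [(pos, k)]
    groups.append(current)
    needed = min_frac * len(peak_lists)
    return [float(np.mean([p for p, _ in g])) for g in groups
            if len({k for _, k in g}) >= needed]

# Usage on synthetic spectra (decaying baseline + Gaussian peaks + noise);
# the second spectrum is shifted by 3 samples.
rng = np.random.default_rng(0)
t = np.arange(512)

def synthetic(centers):
    peaks = sum(10 * np.exp(-((t - c) ** 2) / 32.0) for c in centers)
    return 5 * np.exp(-t / 200) + peaks + rng.normal(0, 0.3, t.size)

peak_lists = []
for y in (synthetic([100, 250, 400]), synthetic([103, 253, 403])):
    smooth = ti_denoise(y)
    corrected = smooth - quantile_baseline(smooth)
    peak_lists.append(detect_peaks(corrected, min_height=3.0))
common = common_peaks(peak_lists, tol=8)
print(common)
```

A lower `tau` pushes the baseline towards the lower envelope of the spectrum and a larger `lam` makes it stiffer; both values were chosen only for this synthetic example. The dense n-by-n solve is adequate for a few hundred points, but full-length MALDI-TOF spectra would call for a sparse banded solver.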

Keywords: nonparametric regression, wavelets, quantile regression, peak detection, curve alignment, biomarker identification
@article{JSFS_2010__151_1_17_0,
     author = {Antoniadis, Anestis and Bigot, J\'er\'emie and Lambert-Lacroix, Sophie},
     title = {Peaks detection and alignment for mass spectrometry data},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {17--37},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {151},
     number = {1},
     year = {2010},
     zbl = {1316.62153},
     mrnumber = {2652788},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2010__151_1_17_0/}
}
TY  - JOUR
AU  - Antoniadis, Anestis
AU  - Bigot, Jérémie
AU  - Lambert-Lacroix, Sophie
TI  - Peaks detection and alignment for mass spectrometry data
JO  - Journal de la société française de statistique
PY  - 2010
DA  - 2010///
SP  - 17
EP  - 37
VL  - 151
IS  - 1
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2010__151_1_17_0/
UR  - https://zbmath.org/?q=an%3A1316.62153
UR  - https://www.ams.org/mathscinet-getitem?mr=2652788
LA  - en
ID  - JSFS_2010__151_1_17_0
ER  - 
Antoniadis, Anestis; Bigot, Jérémie; Lambert-Lacroix, Sophie. Peaks detection and alignment for mass spectrometry data. Journal de la société française de statistique, Volume 151 (2010) no. 1, pp. 17-37. http://www.numdam.org/item/JSFS_2010__151_1_17_0/
