Statistics/Probability Theory
New approximate P-value of gapped local sequence alignments
Comptes Rendus. Mathématique, Volume 346 (2008) no. 1-2, pp. 87-92.

We propose a new method to approximate the statistical significance of gapped local sequence alignments. We focus on short sequences, for which standard methods are known to be less accurate because they rely on asymptotic results. Our approach combines an approximate distribution of the ungapped local score of two sequences with a special scoring scheme that allows the insertion of gaps. For a positive integer h, the scoring scheme is defined on h-tuples of the components of the sequences and corresponds to the gapped global score. The influence of h and the accuracy of the p-value are studied numerically.
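
The abstract gives no formulas, so the following Python sketch illustrates one reading of the h-tuple idea under stated assumptions. Both sequences are cut into consecutive blocks of length h, each pair of blocks is scored by its gapped global (Needleman-Wunsch) score, and the ungapped local score (maximal segment score) is then taken on the resulting tuple sequences. The function names, the scoring parameters (match = 1, mismatch = -1, linear gap = -2) and the choice to drop leftover residues are hypothetical. For the p-value, the sketch swaps in a plain shuffle-based Monte Carlo estimate; it is not the approximate distribution used by the authors.

```python
import random


def global_score(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score of x and y, linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]  # row 0: prefix of y against gaps
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align x[i-1] with y[j-1]
                         prev[j] + gap,     # x[i-1] against a gap
                         cur[j - 1] + gap)  # y[j-1] against a gap
        prev = cur
    return prev[-1]


def h_tuples(seq, h):
    """Cut seq into consecutive, non-overlapping blocks of length h.

    Assumption (hypothetical): trailing residues that do not fill a block are dropped.
    """
    if h < 1:
        raise ValueError("h must be a positive integer")
    usable = len(seq) - len(seq) % h
    return [seq[k:k + h] for k in range(0, usable, h)]


def local_score_h(a, b, h, **scores):
    """Ungapped local score of the h-tuple sequences of a and b.

    Each pair of h-tuples is scored by its gapped global score, so gaps can
    occur inside blocks while the blocks themselves are aligned without gaps.
    """
    ta, tb = h_tuples(a, h), h_tuples(b, h)
    s = [[global_score(u, v, **scores) for v in tb] for u in ta]
    best = 0
    # Maximal segment score (running sum reset at 0) along every
    # diagonal d = j - i of the tuple score matrix.
    for d in range(-(len(ta) - 1), len(tb)):
        run = 0
        for i in range(max(0, -d), min(len(ta), len(tb) - d)):
            run = max(0, run + s[i][i + d])
            best = max(best, run)
    return best


def empirical_pvalue(a, b, h, n_perm=200, seed=0, **scores):
    """Monte Carlo p-value: shuffle b (keeping its composition) and rescore.

    Stand-in technique for illustration only, not the authors' approximation.
    """
    rng = random.Random(seed)
    observed = local_score_h(a, b, h, **scores)
    hits = 0
    for _ in range(n_perm):
        letters = list(b)
        rng.shuffle(letters)
        if local_score_h(a, "".join(letters), h, **scores) >= observed:
            hits += 1
    # Add-one estimator keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


obs, p = empirical_pvalue("ACGTACGTAC", "ACGTTCGTAC", h=2, n_perm=100)
print(obs, p)
```

With h = 1 every tuple is a single residue, the global score of two residues is simply the match or mismatch score (two gaps cost more under these parameters), and local_score_h reduces to the ordinary ungapped local score. Larger h lets gaps appear inside blocks but coarsens the alignment grid. Shuffle-based estimates become expensive and noisy for very small p-values, which is where an analytic approximation is preferable.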

Nous proposons une nouvelle méthode pour estimer la signification statistique des alignements locaux de deux séquences avec gaps. On s'intéresse plus particulièrement aux séquences courtes pour lesquelles les méthodes standards sont moins efficaces étant donné leur aspect asymptotique. Notre approche combine une distribution approchée du score local sans gaps de deux séquences et une fonction de score spécifique qui permet d'introduire les gaps. Soit h un entier positif, la fonction de score est définie sur les h-uplets des composants des séquences et correspond au score global avec gap. L'influence de h et la qualité de la p-valeur sont ensuite étudiées numériquement.

DOI: 10.1016/j.crma.2007.11.022
Fayyaz, Afshin M.¹; Mercier, Sabine¹; Ferré, Louis¹; Hassenforder, Claudie¹

¹ Institut de mathématiques de Toulouse, UMR CNRS 5219, Université Toulouse Le Mirail, 5, allées Antonio-Machado, 31058 Toulouse cedex 9, France

@article{CRMATH_2008__346_1-2_87_0,
     author = {Fayyaz, Afshin M. and Mercier, Sabine and Ferr\'e, Louis and Hassenforder, Claudie},
     title = {New approximate {\protect\emph{P}-value} of gapped local sequence alignments},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {87--92},
     publisher = {Elsevier},
     volume = {346},
     number = {1-2},
     year = {2008},
     doi = {10.1016/j.crma.2007.11.022},
     language = {en},
     url = {http://www.numdam.org/articles/10.1016/j.crma.2007.11.022/}
}