Likelihood computation and inference of demographic and mutational parameters from population genetic data under coalescent approximations
Various likelihood methods have been developed for inferring migration rates and past demographic changes from population genetic data. We survey one such approach, which uses sequential importance sampling techniques derived from coalescent and diffusion models of the evolution of genetic polymorphisms. Applying and assessing this approach consistently has required re-implementing methods usually considered in the context of computer experiments, in particular Kriging, used here as a smoothing technique to infer a likelihood surface from likelihoods estimated at various points of the parameter space, and reconsidering how to sample those points appropriately for such inference. We illustrate the performance and application of the whole tool chain on simulated and real data, and highlight desirable developments in terms of data types and biological scenarios.
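The smoothing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional toy log-likelihood, the squared-exponential covariance, the fixed covariance parameters, and the function names `sq_exp_kernel` and `krige` are all assumptions made for illustration. The idea shown is only that noisy likelihood estimates at sampled parameter points are smoothed by Kriging with a nugget term, and the maximum of the predicted surface is then read off.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D point sets a and b (assumed form)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def krige(x_obs, y_obs, x_new, length_scale=0.5, variance=1.0, nugget=0.01):
    """Kriging predictor with a constant mean; the nugget absorbs estimation noise."""
    mean = y_obs.mean()
    K = sq_exp_kernel(x_obs, x_obs, length_scale, variance)
    K += nugget * np.eye(len(x_obs))            # noise variance on the diagonal
    weights = np.linalg.solve(K, y_obs - mean)  # K^{-1} (y - mean)
    k_new = sq_exp_kernel(x_new, x_obs, length_scale, variance)
    return mean + k_new @ weights

# Toy data: a hypothetical log-likelihood peaking at theta = 1.2,
# observed with Monte Carlo noise at 25 parameter points.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0, 25)
true_loglik = -4.0 * (theta - 1.2) ** 2
noisy_loglik = true_loglik + rng.normal(0.0, 0.1, theta.size)

# Predict the smoothed surface on a fine grid and locate its maximum.
grid = np.linspace(0.0, 2.0, 201)
smoothed = krige(theta, noisy_loglik, grid)
theta_hat = grid[np.argmax(smoothed)]
print(f"maximum of smoothed surface at theta = {theta_hat:.2f}")
```

In practice the covariance parameters would themselves be estimated from the data rather than fixed as here, and the parameter space would have several dimensions.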
Keywords: demographic history, coalescent processes, importance sampling, genetic polymorphism
@article{JSFS_2018__159_3_142_0,
  author = {Rousset, Fran\c{c}ois and Beeravolu, Champak Reddy and Leblois, Rapha\"el},
  title = {Likelihood computation and inference of demographic and mutational parameters from population genetic data under coalescent approximations},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {142--166},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {159},
  number = {3},
  year = {2018},
  zbl = {1406.92404},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2018__159_3_142_0/}
}
Rousset, François; Beeravolu, Champak Reddy; Leblois, Raphaël. Likelihood computation and inference of demographic and mutational parameters from population genetic data under coalescent approximations. Journal de la société française de statistique, Tome 159 (2018) no. 3, pp. 142-166. http://www.numdam.org/item/JSFS_2018__159_3_142_0/