Evaluation of the minimum number of markers for individual ancestry estimation in an Argentinean population sample
DOI:
https://doi.org/10.31048/1852.4826.v9.n1.12579
Keywords:
number of AIMs, individual ancestry, Argentinean population
Abstract
Estimation of individual ancestry is highly relevant when studying population composition in regions such as South America, where intensive admixture has occurred, and it is also important in the biomedical sciences. It is therefore important to assess the factors that may affect the reliability of the results. In this work, we investigate the minimum number of ancestry informative markers (AIMs) needed to obtain acceptable ancestry estimates. As an example, we use individuals from a population sample drawn from different Argentinean regions. Considering a three-component model (Native American, Eurasian and Sub-Saharan), we calculated the ancestry of 441 individuals using 10, 20, 30 and 50 AIMs. The results indicate that the number of markers affects ancestry estimation, and that accuracy increases with the number of AIMs. Compared with previous estimates obtained from 99 AIMs, the results show that at least 30 markers are needed to achieve good correlation values for the minority component (Sub-Saharan in this case). For individual ancestry studies, we suggest taking into account not only the number of markers, but also their informativeness and the background of the studied population.
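The comparison described in the abstract (ancestry estimated from reduced panels, then correlated with the estimates from the full 99-marker panel) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genotypes and reference allele frequencies are simulated, the marker subsets are drawn at random, and the estimator is a simple EM for admixture proportions with known source frequencies, which is not necessarily the software or procedure used in the study. The function names are hypothetical.

```python
import numpy as np


def simulate_panel(n_ind, n_markers, rng):
    """Simulate a three-way admixed sample (all values are hypothetical)."""
    # Reference allele frequencies in the three source populations (markers x 3).
    freqs = rng.uniform(0.05, 0.95, size=(n_markers, 3))
    # True ancestry proportions; the third component is kept small on purpose.
    q_true = rng.dirichlet([2.0, 6.0, 0.5], size=n_ind)
    # Diploid genotypes: number of reference alleles (0, 1 or 2) per marker.
    geno = rng.binomial(2, q_true @ freqs.T)
    return freqs, q_true, geno


def estimate_ancestry(geno, freqs, n_iter=200):
    """EM estimate of individual ancestry with known source frequencies."""
    n_ind, n_markers = geno.shape
    n_pop = freqs.shape[1]
    q = np.full((n_ind, n_pop), 1.0 / n_pop)
    for _ in range(n_iter):
        p1 = q @ freqs.T          # P(reference allele) per individual and marker
        p0 = q @ (1.0 - freqs).T  # P(alternative allele)
        # Expected number of allele copies attributed to each source population.
        ref_part = (geno / p1) @ freqs
        alt_part = ((2 - geno) / p0) @ (1.0 - freqs)
        q = q * (ref_part + alt_part) / (2 * n_markers)
    return q


def compare_panels(geno, freqs, sizes, rng):
    """Correlate reduced-panel estimates with full-panel estimates."""
    q_full = estimate_ancestry(geno, freqs)
    correlations = {}
    for k in sizes:
        # Assumption: markers are chosen at random, not ranked by informativeness.
        cols = rng.choice(geno.shape[1], size=k, replace=False)
        q_sub = estimate_ancestry(geno[:, cols], freqs[cols])
        correlations[k] = [
            float(np.corrcoef(q_sub[:, c], q_full[:, c])[0, 1])
            for c in range(freqs.shape[1])
        ]
    return correlations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs, q_true, geno = simulate_panel(441, 99, rng)
    for k, r in compare_panels(geno, freqs, (10, 20, 30, 50), rng).items():
        print(k, [round(x, 3) for x in r])
```

Each printed row gives, for one panel size, the correlation of each of the three simulated components with the full-panel estimate. Because the subsets here are random rather than selected for informativeness, the numbers only illustrate the procedure and say nothing about the study's actual results.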
License
Authors who publish in this journal accept the following terms:
a. Authors retain their copyright and grant the journal the right of first publication of their work, which is simultaneously subject to the Creative Commons Attribution License, which allows third parties to share the work as long as its author and its first publication in this journal are acknowledged.
b. Authors may enter into other non-exclusive licensing agreements for the distribution of the published version of the work (e.g., depositing it in an institutional electronic archive or publishing it in a monographic volume), provided that its initial publication in this journal is indicated.
c. Authors are permitted and encouraged to disseminate their work on the Internet (e.g., in institutional electronic archives or on their website) before and during the submission process, which can lead to productive exchanges and increase citations of the published work. (See The Effect of Open Access.)