Evaluation of the minimum number of markers for individual ancestry estimation in an Argentinean population sample
DOI: https://doi.org/10.31048/1852.4826.v9.n1.12579
Keywords: number of AIMs, individual ancestry, Argentinean population
Abstract
Estimating individual ancestry is highly relevant to the study of population composition in regions such as South America, where intensive admixture has occurred, and it is also important in the biomedical sciences. It is therefore important to assess the factors that may affect the reliability of the results. In this work, we investigate the minimum number of ancestry informative markers (AIMs) needed to obtain acceptable ancestry estimates. As an example, we use individuals from a population sample drawn from different Argentinean regions. Under a three-component model (Native American, Eurasian and Sub-Saharan), we calculated the ancestry of 441 individuals using 10, 20, 30 and 50 AIMs. The results indicate that the number of markers affects ancestry estimation and that accuracy increases with the number of AIMs. Compared with previous estimates obtained from 99 AIMs, the results show that at least 30 markers are needed to achieve good correlation values for the minority component (Sub-Saharan in this case). For individual ancestry studies, we suggest taking into account not only the number of markers but also their informativeness and the background of the studied population.
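The comparison described in the abstract (ancestry estimated from reduced panels of 10, 20, 30 and 50 AIMs, then correlated with estimates from the full 99-AIM panel) can be illustrated with a minimal sketch on simulated data. Everything below is an assumption made for illustration: the genotypes, the ancestral allele frequencies and the Dirichlet parameters are synthetic, the function names are hypothetical, and the simple EM estimator with known ancestral frequencies is a stand-in rather than the software or procedure used in the study.

```python
import numpy as np


def estimate_ancestry(genotypes, freqs, n_iter=200):
    """EM estimate of individual ancestry with known ancestral frequencies.

    genotypes: (n_individuals, n_markers) reference-allele counts (0, 1, 2)
    freqs:     (n_populations, n_markers) reference-allele frequencies,
               assumed strictly between 0 and 1
    returns:   (n_individuals, n_populations) ancestry proportions
    """
    n_ind, n_mark = genotypes.shape
    k = freqs.shape[0]
    q = np.full((n_ind, k), 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # Allele probabilities implied by the current ancestry proportions
        p_ref = q @ freqs
        p_alt = q @ (1.0 - freqs)
        # Posterior share of each allele copy attributed to each population
        w_ref = q[:, :, None] * freqs[None, :, :] / p_ref[:, None, :]
        w_alt = q[:, :, None] * (1.0 - freqs)[None, :, :] / p_alt[:, None, :]
        counts = (genotypes[:, None, :] * w_ref
                  + (2 - genotypes)[:, None, :] * w_alt)
        # Each marker carries two allele copies per individual
        q = counts.sum(axis=2) / (2.0 * n_mark)
    return q


def simulate_sample(n_ind=200, n_mark=99, seed=0):
    """Simulate a three-way admixed sample (all values are synthetic)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=(3, n_mark))
    # The third component is kept small to mimic a minority ancestry
    true_q = rng.dirichlet([3.0, 6.0, 0.5], size=n_ind)
    genotypes = rng.binomial(2, true_q @ freqs)
    return genotypes, freqs, true_q


def correlation_by_panel_size(genotypes, freqs, sizes=(10, 20, 30, 50), seed=1):
    """Correlate reduced-panel estimates with full-panel estimates."""
    rng = np.random.default_rng(seed)
    full = estimate_ancestry(genotypes, freqs)
    result = {}
    for m in sizes:
        # Random marker subset; a real study could rank markers by informativeness
        cols = rng.choice(genotypes.shape[1], size=m, replace=False)
        sub = estimate_ancestry(genotypes[:, cols], freqs[:, cols])
        result[m] = [float(np.corrcoef(full[:, j], sub[:, j])[0, 1])
                     for j in range(freqs.shape[0])]
    return result


if __name__ == "__main__":
    g, f, _ = simulate_sample()
    for m, r in correlation_by_panel_size(g, f).items():
        print(m, [round(x, 2) for x in r])
```

Each printed row gives, for one panel size, the Pearson correlation between the reduced-panel and full-panel estimates for the three components. Because markers are drawn at random here, the sketch speaks only to marker number, not to the marker informativeness that the abstract also flags as relevant.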
License
Authors who publish in this journal accept the following terms:
a. Authors retain their copyright and grant the journal the right of first publication of their work, which is simultaneously licensed under the Creative Commons Attribution License. This license allows third parties to share the work as long as its author and its first publication in this journal are acknowledged.
b. Authors may enter into other non-exclusive licensing agreements for the distribution of the published version of the work (e.g., depositing it in an institutional repository or publishing it in a monographic volume), provided that its initial publication in this journal is indicated.
c. Authors are permitted and encouraged to disseminate their work on the Internet (e.g., in institutional repositories or on their own website) before and during the submission process, which can lead to productive exchanges and increase citations of the published work. (See The Effect of Open Access.)