Knowledge discovery applying text mining techniques in Psychology

Luciana Mariñelarena-Dondena, Marcelo Luis Errecalde, Alejandro Castro Solano


Knowledge discovery in databases (KDD) is a complex, non-trivial process that ultimately seeks to make sense of data. Data mining is only one step in this process; its goal is to obtain patterns and models by applying statistical methods and machine learning techniques. This review article examines how text mining techniques can be applied in the field of psychology. In this context, it describes the two main purposes of text mining techniques: description and prediction. Finally, it highlights that applying text mining techniques in psychology makes it possible to measure or assess various psychological constructs, as an alternative to traditional questionnaires, scales, or surveys.

Keywords: Text Mining Techniques, Computer Science, Assessment, Psychology
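The distinction between description and prediction can be illustrated with a minimal sketch. It is not the method of the article or of any study it reviews: the toy corpus, the labels, and the function names (`describe`, `train_naive_bayes`, `predict`) are all hypothetical, and a multinomial Naive Bayes classifier with add-one smoothing is assumed only because it is a simple, well-known text classifier.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration (not real data).
TRAIN = [
    ("i feel happy and grateful today", "positive"),
    ("what a happy wonderful day", "positive"),
    ("i feel sad and tired", "negative"),
    ("such a sad lonely day", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def describe(corpus):
    """Descriptive use: summarize which words occur under each label."""
    counts = {}
    for text, label in corpus:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def train_naive_bayes(corpus):
    """Predictive use, step 1: collect the counts the classifier needs."""
    word_counts = describe(corpus)
    doc_counts = Counter(label for _, label in corpus)
    vocab = {word for counter in word_counts.values() for word in counter}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Predictive use, step 2: return the most probable label for new text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood of the word.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


summary = describe(TRAIN)
model = train_naive_bayes(TRAIN)
print(summary["positive"].most_common(3))
print(predict(model, "a happy day"))
```

In this sketch, `describe` only characterizes the texts already collected, whereas `predict` assigns a label to text that was not part of the training corpus, which is the sense in which text mining could serve as an assessment tool.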


DOI: http://dx.doi.org/10.30882/1852.4206.v9.n2.12701

Copyright (c) 2017 Revista Argentina de Ciencias del Comportamiento