Thinking aloud methodological elements to obtain evidence of content validity

Authors

DOI:

https://doi.org/10.35670/1667-4545.v23.n3.43899

Keywords:

thinking aloud, validity evidence, sample selection, expert consistency index, quantitative reasoning skill

Abstract

The think-aloud method has been used in educational research mainly to identify the strategies subjects use to solve different tasks, in particular the strategies they apply to answer items in educational tests. However, the methodological steps that must be followed to obtain validity evidence with think-aloud are not explicit, even though they are important to define. This article therefore sets out those steps from a theoretical perspective so that they can be implemented. Based on a comprehensive literature review, the following steps are proposed: 1) define the purpose, 2) elaborate the solution processes of the items from a theoretical perspective, 3) select the think-aloud sample, 4) carry out the simulation process, 5) collect the data, and 6) transcribe and analyze the think-aloud reports. Following these steps can lead to a successful think-aloud process that produces reliable evidence of validity.

Author Biography

Graciela Ordóñez-Gutiérrez, Universidad de Costa Rica

Instituto de Investigaciones Psicológicas, Lecturer and Researcher.

Published

2023-12-23

How to Cite

Ordóñez-Gutiérrez, G. (2023). Thinking aloud methodological elements to obtain evidence of content validity. Revista Evaluar, 23(3), 45–60. https://doi.org/10.35670/1667-4545.v23.n3.43899

Section

Original research