The Influence of Item Discrimination on Misclassification of Test Takers

Authors

Raúl Emmanuel Trujano, Centro Nacional de Evaluación para la Educación Superior

DOI:

https://doi.org/10.35670/1667-4545.v21.n3.36294

Keywords:

decision accuracy, item discrimination, item response theory, Rudner algorithm, information function

Abstract

It has been suggested that items with low discrimination can be included in a test with a criterion-referenced score interpretation as long as they measure highly relevant content. However, low item discrimination increases the standard error of measurement, which may in turn increase the expected proportion of misclassified test takers. To test this, responses from 2,000 test takers to 100 items were simulated, varying the item discrimination values and the number and location of the cut scores, and classification inaccuracy was estimated. Results show that the expected proportion of misclassified test takers increased as item discrimination decreased and as the cut scores moved closer to the mean of the test-taker distribution. Therefore, a test should include as few items with low discrimination values as possible, or even none, in order to reduce the expected proportion of test takers classified into a wrong performance level.
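
The design described in the abstract can be illustrated with a short simulation. The sketch below is not the author's code: the discrimination values (0.5 vs. 1.5), the single cut score at the mean, and the use of the test information function at the true ability (rather than calibrating simulated responses) are assumptions made only to show the mechanism, namely that lower a-parameters inflate the conditional standard error and, through a Rudner-type normal approximation, the expected proportion of misclassifications.

```r
# Minimal sketch in base R (illustrative parameter values; not the study's code).
set.seed(123)

n_takers <- 2000        # simulated test takers, as in the abstract
n_items  <- 100         # test length, as in the abstract
cut      <- 0           # one illustrative cut score at the mean of theta

theta <- rnorm(n_takers)        # true abilities ~ N(0, 1)
b     <- rnorm(n_items)         # item difficulties ~ N(0, 1)

expected_accuracy <- function(a_value) {
  a <- rep(a_value, n_items)    # constant discrimination in this condition

  # 2PL probabilities of a correct response (logistic metric, no D = 1.7)
  p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))

  # Test information I(theta) = sum_i a_i^2 * P_i * (1 - P_i); SE = 1 / sqrt(I)
  info <- drop((p * (1 - p)) %*% a^2)
  se   <- 1 / sqrt(info)

  # Rudner-type accuracy at one cut: probability that an ability estimate
  # drawn from N(theta, se) falls on the same side of the cut as theta itself
  p_above <- 1 - pnorm(cut, mean = theta, sd = se)
  mean(ifelse(theta >= cut, p_above, 1 - p_above))
}

round(c(low_a = expected_accuracy(0.5), high_a = expected_accuracy(1.5)), 3)
# Lower discrimination -> larger SE -> lower expected classification accuracy,
# i.e., a larger expected proportion of misclassified test takers.
```

In the study itself, item responses were actually generated and the number and location of the cut scores were varied; the sketch only isolates the link between discrimination, standard error, and expected misclassification.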

Author Biography

Raúl Emmanuel Trujano, Centro Nacional de Evaluación para la Educación Superior

He holds bachelor's and doctoral degrees in Psychology from the Universidad Nacional Autónoma de México and heads the Departamento de Asesoría Técnica para el Diseño at the Centro Nacional de Evaluación para la Educación Superior. His research interests are the design of test specifications for educational assessment, item writing, and validation studies of the uses of a test.

References

Baker, F. B., & Kim, S.-H. (2017). The basics of Item Response Theory using R. New York, N.Y: Springer. doi: 10.1007/978-3-319-54205-8

Burton, R. F. (2001). Do item-discrimination indices really help us to improve our tests? Assessment & Evaluation in Higher Education, 26(3), 213-220. doi: 10.1080/02602930120052378

Cheng, Y., Liu, C., & Behrens, J. (2015). Standard error of ability estimates and the classification accuracy and consistency of binary decisions. Psychometrika, 80(3), 645-664. doi: 10.1007/s11336-014-9407-z

Clifford, R. (2016). A rationale for criterion-referenced proficiency testing. Foreign Language Annals, 49(2), 224-234. doi: 10.1111/flan.12201

DeMars, C. (2010). Item Response Theory. Oxford, Oxfordshire: Oxford University Press.

Ercikan, K., & Julian, M. (2002). Classification accuracy of assigning student performance to proficiency levels: Guidelines for assessment design. Applied Measurement in Education, 15(3), 269-294. doi: 10.1207/S15324818AME1503_3

Frisbie, D. A. (2005). Measurement 101: Some fundamentals revisited. Educational Measurement: Issues and Practice, 24(3), 21-28. doi: 10.1111/j.1745-3992.2005.00016.x

Haladyna, T. M. (2016). Item analysis for selected-response test items. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (2nd ed., pp. 392-409). New York, NY: Routledge.

Haladyna, T. M., Rodriguez, M. C., & Stevens, C. (2019). Are multiple-choice items too fat? Applied Measurement in Education, 32(4), 350-364. doi: 10.1080/08957347.2019.1660348

Hambleton, R. K., & Jones, R. W. (1994). Item parameter estimation errors and their influence on test information functions. Applied Measurement in Education, 7(3), 171-186. doi: 10.1207/s15324818ame0703_1

Lathrop, Q. N. (2014). R package cacIRT: Estimation of classification accuracy and consistency under item response theory. Applied Psychological Measurement, 38(7), 581-582. doi: 10.1177/0146621614536465

Lathrop, Q. N. (2015). Practical issues in estimating classification accuracy and consistency with R package cacIRT. Practical Assessment, Research, and Evaluation, 20, Article 18. Retrieved from https://scholarworks.umass.edu/pare/vol20/iss1/18

Lathrop, Q. N., & Cheng, Y. (2013). Two approaches to estimation of classification accuracy rate under item response theory. Applied Psychological Measurement, 37(3), 226-241. doi: 10.1177/0146621612471888

Lathrop, Q. N., & Cheng, Y. (2014). A nonparametric approach to estimate classification accuracy and consistency. Journal of Educational Measurement, 51(3), 318-334. doi: 10.1111/jedm.12048

Lee, W.-C. (2010). Classification consistency and accuracy for complex assessments using item response theory. Journal of Educational Measurement, 47(1), 1-17. doi: 10.1111/j.1745-3984.2009.00096.x

Leydold, J., & H”ormann, W. (2021). Runuran: R interface to the ‘UNU.RAN’ random variate generators (Version 0.34) [R package]. Retrieved from https://cran.r-project.org/web/packages/Runuran/index.html

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Luecht, R. M. (2016). Applications of item response theory: Item and test information functions for designing and building mastery tests. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (2nd ed.). New York, NY: Routledge.

Martineau, J. A. (2007). An expansion and practical evaluation of expected classification accuracy. Applied Psychological Measurement, 31(3), 181-194. doi: 10.1177/0146621606291557

Paek, I., & Han, K. T. (2013). IRTPRO 2.1 for Windows (item response theory for patient-reported outcomes). Applied Psychological Measurement, 37(3), 242-252. doi: 10.1177/0146621612468223

Partchev, I., Maris, G., & Hattori, T. (2017). irtoys: A collection of functions related to item response theory (IRT) (Version 0.2.1) [R package]. Retrieved from https://cran.r-project.org/package=irtoys

Popham, W. J. (2014). Criterion-referenced measurement: Half a century wasted? Educational Leadership, 71(6), 62-66. Retrieved from http://www.ascd.org/publications/educational_leadership/mar14/vol71/num06/Criterion-Referenced_Measurement@_Half_a_Century_Wasted%C2%A2.aspx

Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6(1), 1-9. doi: 10.1111/j.1745-3984.1969.tb00654.x

R Core Team. (2020). R: A language and environment for statistical computing (Version 4.0.2). [Computer software]. Retrieved from https://www.R-project.org

Ramírez-Benítez, Y., Jiménez-Morales, R. M., & Díaz-Bringas, M. (2015). Matrices progresivas de Raven: Punto de corte para preescolares 4 - 6 años. Revista Evaluar, 15(1), 123-133. doi: 10.35670/1667-4545.v15.n1.14911

Richaud de Minzi, M. C. (2008). Nuevas tendencias en psicometría. Revista Evaluar, 8(1), 1-19. doi: 10.35670/1667-4545.v8.n1.501

Rizopoulos, D. (2018). ltm: Latent trait models under IRT (Version 1.1-1) [R package]. Retrieved from https://CRAN.R-project.org/package=ltm

Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research, and Evaluation, 7, Article 14. doi: 10.7275/an9m-2035

Rudner, L. M. (2005). Expected classification accuracy. Practical Assessment, Research, and Evaluation, 10, Article 13. doi: 10.7275/56a5-6b14

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). New York, NY: Springer. doi: 10.1007/978-3-319-24277-4

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., ... & RStudio. (2021). ggplot2: Create elegant data visualisations using the grammar of graphics (Version 3.3.5) [R package]. Retrieved from https://cran.r-project.org/web/packages/ggplot2/index.html

Wyse, A. E., & Hao, S. (2012). An evaluation of item response theory classification accuracy and consistency indices. Applied Psychological Measurement, 36(7), 602-624. doi: 10.1177/0146621612451522

Xing, D., & Hamleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing examinations. Educational and Psychological Measurement, 64(1), 5-21. doi: 10.1177/0013164403258393

Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and LOGIST. Psychometrika, 52(2), 275-291. doi: 10.1007/BF02294241

Published

2021-12-24

How to Cite

Trujano, R. E. (2021). The Influence of Item Discrimination on Misclassification of Test Takers. Revista Evaluar, 21(3), 15–34. https://doi.org/10.35670/1667-4545.v21.n3.36294

Issue

Vol. 21, No. 3 (2021)

Section

Investigaciones originales