Variation in Likert scale of the mathematics usefulness test

Keywords: Psychological tests, psychometrics, mathematics education, university students

Abstract

The aim of this work was to compare the results of validity and reliability studies of a mathematics usefulness scale when the Likert format of the items varies. The test measured beliefs about the importance attributed to mathematics for career progress and future professional development. The 939 Psychology students (81% female) who participated responded to the items using 3, 5, and 6 categories. The order in which individuals were exposed to each format was controlled, and other tests were interleaved to reduce memorization of answers. Likert scales with more categories increased test reliability at the extreme levels of the trait, but at the expense of weakening the evidence of internal structure validity (Confirmatory Factor Analysis and the Partial Credit Model of Item Response Theory). The relative efficiency function revealed that similar information is obtained across all levels of the trait when using 5- or 6-point scales. The number of Likert categories did not substantially affect the relationship between usefulness and other variables.



Published
2018-09-01
How to Cite
Abal, F., Auné, S., & Attorresi, H. (2018). Variation in Likert scale of the mathematics usefulness test. Interacciones, 4(3), 177-189. https://doi.org/10.24016/2018.v4n3.134