Behaviour of product-moment and tetrachoric-polychoric correlations in ordinal scales: a simulation study

Authors

  • Fernando Martínez-Abad, University Institute of Science Education, University of Salamanca
  • María José Rodríguez-Conde, University Institute of Science Education, University of Salamanca

DOI:

https://doi.org/10.7203/relieve.23.2.9476

Keywords:

Simulation, Multivariate analysis, Correlation analysis, Product-moment correlation, Tetrachoric correlation, Polychoric correlation, Measurement, Attitude scale

Abstract

The multivariate statistical analysis of Likert response scales, given their widespread use, is a controversial issue in the scientific community, mainly because of how the measurement problem is specified. This work studies how various conditions of these ordinal scales affect the calculation of the product-moment and tetrachoric-polychoric correlation coefficients. For this purpose, a simulation study was carried out in which 90 databases with 10 items each were generated. In generating the databases, the following variables were controlled: number of response categories, symmetric or asymmetric distribution of the data, sample size, and level of relationship between items. Thus, 90 matrices (10x10) were obtained containing the differences between the product-moment and tetrachoric-polychoric correlations. The graphical analysis and the analysis of variance show that the product-moment correlation coefficient significantly underestimates the relationship between variables, mainly when the number of response categories of the ordinal scale is small and the relationship between the variables is large. In contrast, the two coefficients yield very similar estimates when the underlying relationship between pairs of variables is small and/or when the variables have more than 5 response options. The study concludes with a recommendation to applied researchers on the most appropriate correlation coefficient for the type of data available. Finally, the results are discussed in light of previous studies, which reach partly similar conclusions.
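The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation procedure: it assumes that each pair of items arises from a latent bivariate normal distribution cut at fixed thresholds, which is the usual assumption behind polychoric correlation. The function names (`simulate_ordinal_pair`, `polychoric`), the thresholds, the sample size, and the seed are hypothetical choices made only for illustration, and the polychoric estimate uses a simple two-step maximum-likelihood approach.

```python
import numpy as np
from scipy import optimize, stats


def simulate_ordinal_pair(rho, n, thresholds, rng):
    """Draw n latent bivariate-normal pairs with correlation rho and cut them into ordinal codes."""
    cov = [[1.0, rho], [rho, 1.0]]
    latent = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # np.digitize maps each latent value to a category index 0..len(thresholds)
    x = np.digitize(latent[:, 0], thresholds)
    y = np.digitize(latent[:, 1], thresholds)
    return x, y


def polychoric(x, y):
    """Two-step estimate: thresholds from the margins, then ML for the latent correlation."""
    cats_x, cats_y = np.unique(x), np.unique(y)
    table = np.array([[np.sum((x == a) & (y == b)) for b in cats_y] for a in cats_x])
    n = table.sum()

    def cuts(margin):
        # Step 1: thresholds from cumulative marginal proportions;
        # -8 and +8 stand in for minus and plus infinity on the latent scale.
        inner = stats.norm.ppf(np.cumsum(margin)[:-1] / n)
        return np.concatenate(([-8.0], inner, [8.0]))

    tx, ty = cuts(table.sum(axis=1)), cuts(table.sum(axis=0))

    def neg_loglik(r):
        # Step 2: multinomial log-likelihood of the table given latent correlation r
        bvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])
        cdf = np.array([[bvn.cdf([a, b]) for b in ty] for a in tx])
        # Probability of each cell as a rectangle on the latent plane
        probs = cdf[1:, 1:] - cdf[:-1, 1:] - cdf[1:, :-1] + cdf[:-1, :-1]
        return -np.sum(table * np.log(np.clip(probs, 1e-12, None)))

    result = optimize.minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return result.x


# Illustrative condition: few categories (3) and a strong latent relationship (0.8)
rng = np.random.default_rng(0)
true_rho = 0.8
x, y = simulate_ordinal_pair(true_rho, n=2000, thresholds=[-0.5, 0.5], rng=rng)

pearson_r = np.corrcoef(x, y)[0, 1]   # product-moment correlation on the ordinal codes
poly_r = polychoric(x, y)             # polychoric estimate of the latent correlation

print(f"latent rho = {true_rho}, Pearson = {pearson_r:.3f}, polychoric = {poly_r:.3f}")
```

Under these assumptions, the product-moment coefficient computed on the three-category codes should fall below the latent correlation, while the polychoric estimate should land close to it. Raising the number of thresholds or lowering `true_rho` should narrow the gap, in line with the pattern the abstract reports.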

Author Biographies

Fernando Martínez-Abad, University Institute of Science Education, University of Salamanca

Assistant Professor (Profesor Ayudante Doctor), Area of Research Methods and Diagnosis in Education, Department of Didactics, Organization and Research Methods, University of Salamanca.

María José Rodríguez-Conde, University Institute of Science Education, University of Salamanca

Full Professor (Catedrática de Universidad), Area of Research Methods and Diagnosis in Education, Department of Didactics, Organization and Research Methods, University of Salamanca.

Published

2017-12-19

Section

Research Articles