PISA Science Contextualized Items: The Link between the Cognitive Demands and Context Characteristics of the Items

Authors

  • Maria Araceli Ruiz-Primo, Stanford University
  • Min Li, University of Washington

DOI:

https://doi.org/10.7203/relieve.22.1.8280

Keywords:

PISA, science items, context characteristics of the items, cognitive demands, validity.

Abstract

The ubiquitous use of contexts in test items is based on the premise that contextualizing items is an effective strategy for testing whether students can apply or transfer their knowledge. In this paper, we continue a research agenda focused on testing this premise. We present a study of the context characteristics of a sample of 2006 and 2009 PISA science items and of how these characteristics, as well as student performance, may be related to the cognitive demands of the items. The study was guided by two research questions: (1) What are the cognitive demands of the sampled PISA contextualized items, and how did students perform on these items? (2) Are the items’ cognitive demands associated with certain characteristics of the items’ contexts that have proved to be linked to student performance? Using 52 released and secured PISA science items, we captured information about three context dimensions of the items (level of abstraction, resources, and nature of the context) and about their cognitive demands. A multinomial logistic regression with cognitive demand as the outcome variable, context characteristics as the predictors, and percent of correct responses as the covariate indicated that certain context characteristics are linked to the cognitive demands of items. For example, we found that items whose contexts involve only concrete ideas were associated with low cognitive demands; such items are unlikely to require content knowledge to be answered. We also found that the type of resource (e.g., tables, graphs) was associated with the cognitive demands of the items: schematic representations seem to be linked to items tapping procedural knowledge rather than to items tapping declarative or schematic knowledge. We conclude that further research is needed to better understand the influence that context characteristics have on the cognitive processes in which students are asked to engage and on their performance.
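The analysis design described above can be sketched in code. The following is a minimal, illustrative example of fitting a multinomial logistic regression with a categorical outcome (cognitive demand), categorical predictors (context characteristics), and a continuous covariate (percent correct). All variable names, codings, and data here are hypothetical placeholders, not the authors' actual dataset or coding scheme.

```python
# Illustrative sketch (not the authors' data): a multinomial logistic
# regression with cognitive demand as the outcome, context characteristics
# as predictors, and percent of correct responses as a covariate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 52  # the study analyzed 52 PISA science items

# Hypothetical predictors: level of abstraction (0=concrete, 1=mixed,
# 2=abstract), resource type (0=none, 1=table, 2=graph/schematic),
# and the percent of correct responses for each item.
abstraction = rng.integers(0, 3, n)
resource = rng.integers(0, 3, n)
pct_correct = rng.uniform(20, 90, n)
X = np.column_stack([abstraction, resource, pct_correct])

# Hypothetical outcome: cognitive demand category
# (0=declarative, 1=procedural, 2=schematic).
y = rng.integers(0, 3, n)

# lbfgs with >2 classes fits a multinomial (softmax) model.
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)  # per-item probability of each demand category
```

In a real analysis the categorical predictors would be dummy-coded rather than entered as integer codes, and the coefficients (log-odds) for each demand category would be inspected rather than the fitted probabilities.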


Author Biographies

Maria Araceli Ruiz-Primo, Stanford University

is an associate professor at the Graduate School of Education at Stanford University. Her work, funded mainly by the National Science Foundation and the Institute of Education Sciences, examines the assessment of student learning both in the classroom and in large-scale assessment programs, as well as classroom assessment practices. Her publications address the development and evaluation of diverse learning assessment strategies (e.g., concept maps and students’ science notebooks) and the study of teachers’ informal and formal formative assessment practices (e.g., the use of assessment conversations and embedded assessments). Her recent work focuses on the development and validation of assessments that are instructionally sensitive and of instruments intended to measure teachers’ formative assessment practices. Her address is: Graduate School of Education, 485 Lasuen Mall, Stanford University, Stanford, CA.

Min Li, University of Washington

is an associate professor at the College of Education, University of Washington, Seattle. She is interested in understanding how student learning can be accurately and adequately assessed in both large-scale testing and classroom settings. Her work combines cognitive science and psychometric approaches in various projects, including examining the cognitive demands of large-scale science items, using science notebooks as assessment tools, parameterizing the design of contextualized tasks, analyzing teachers’ classroom assessment practices, and validating complex performance-based tasks for teachers and educational leaders. Her address is: University of Washington, College of Education, Seattle, WA 98195-3600.

References

Ahmed, A., & Pollitt, A. (1999). Curriculum demands and question difficulty. Paper presented at the International Association for Educational Assessment (IAEA) Conference, Slovenia.

Ahmed, A., & Pollitt, A. (2000). Observing context in action. Paper presented at the International Association for Educational Assessment (IAEA) Conference, Jerusalem, Israel.

Ahmed, A., & Pollitt, A. (2001). Improving the validity of contextualized questions. Paper presented at the British Educational Research Association Annual Conference, Leeds, England.

Ahmed, A., & Pollitt, A. (2007). Improving the quality of contextualized questions: An experimental investigation of focus. Assessment in Education, 14(2), 201–232. DOI:10.1080/09695940701478909

Anderson, R. C. (1972). How to construct achievement tests to assess comprehension. Review of Educational Research, 42(2), 145-170.

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R. … & Wittrock, M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York, NY: Longman.

Boaler, J. (1993). The role of contexts in the mathematics classroom: Do they make mathematics more “real”? For the Learning of Mathematics, 13(2), 12–17.

Boaler, J. (1994). When do girls prefer football to fashion? An analysis of female underachievement in relation to ‘realistic’ mathematic contexts. British Educational Research Journal, 20(5), 551–564. DOI: 10.1080/0141192940200504

Bormuth, J. R. (1970). On a theory of achievement test items. Chicago: University of Chicago Press.

Cooper, B., & Dunne, M. (2000). Assessing children’s mathematical knowledge. Social class, sex, and problem solving. Buckingham, UK: Open University Press.

Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds). Handbook of test development (pp. 3-25). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Fisher-Hoch, H., & Hughes, S. (1996). What makes mathematical exam questions difficult? Paper presented at the British Educational Research Association Conference, Lancaster, UK.

Fulkerson, D., Nichols, P., Haynie, K., & Mislevy, R. (2009). Narrative structures in the development of scenario-based science assessments (Large-Scale Assessment Technical Report 3). Menlo Park, CA: SRI International.

Gerofsky, S. (1996). A linguistic and narrative view of word problems in mathematics education. For the Learning of Mathematics, 16(2), 36-45.

Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44(2), 134–141.

Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence Erlbaum Associates.

Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Boston, MA: Allyn and Bacon.

Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50. DOI:10.1207/s15324818ame0201_4

Haladyna, T., & Downing, S. M. (2004). Construct irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27. DOI: 10.1111/j.1745-3992.2004.tb00149.x

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309–334. DOI: 10.1207/S15324818AME1503_5

Hembree, R. (1992). Experiments and relational studies in problem solving: A meta-analysis. Journal for Research in Mathematics Education, 23(3), 242–273. DOI: 10.2307/749120

Kelly, V. L. (2007). Alternative assessment strategies within a context based science teaching and learning approach in secondary schools in Swaziland. (Doctoral dissertation). University of Western Cape. Bellville, South Africa.

Leighton, J. P., & Gokiert, R. J. (2005). The cognitive effects of test item features: Informing item generation by identifying construct irrelevant variance. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME), Montreal, Quebec, Canada.

Li, M. (2002). A framework for science achievement and its link to test items. (Unpublished doctoral dissertation). Stanford University, California.

Li, M., & Shavelson, R. J. (April, 2001). Using TIMSS items to examine the links between science achievement and assessment methods. Paper presented at the annual meeting of the American Educational Research Association. Seattle, WA.

Li, M., Ruiz-Primo, M. A., & Shavelson, R. J. (2006). Towards a science achievement framework: The case of TIMSS 1999. In S. J. Howie & T. Plomp (Eds.), Contexts of learning mathematics and science (pp. 291-311). London, England: Routledge.

McMartin, F., McKenna, A., & Youssefi, K. (2000, May). Scenario assignments as assessment tools for undergraduate engineering education. IEEE Transactions on Education, 43(2), 111–119. DOI:10.1109/13.848061

Mevarech, Z. R., & Stern, E. (1997). Interaction between knowledge and context on understanding abstract mathematical concepts. Journal of Experimental Child Psychology, 65, 68–95. doi:10.1006/jecp.1996.2352

Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design. In S. M. Downing & T. M. Haladyna (Eds). Handbook of test development (pp. 61-90). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Organisation for Economic Cooperation and Development (OECD). (2009). PISA 2009 Assessment Framework. Key competencies in reading, mathematics and science. Retrieved from http://www.oecd.org/pisa/pisaproducts/44455820.pdf.

Organisation for Economic Cooperation and Development (OECD). (2015). PISA 2015 Assessment and Analytical Framework. Science, reading, mathematic, and financial literacy. Retrieved from http://www.oecd-ilibrary.org/docserver/download/9816021e.pdf?expires=1465158103&id=id&accname=guest&checksum=57858C1E7640CA2C6457CC296204529A.

Osterlind, S. J. (1998). Constructing test items: Multiple-choice, constructed response, performance, and other formats. (Evaluation in Education and Other Services, 47). Boston, MA: Kluwer Academic Publishers.

Royer, J. M., Ciscero, C. A., & Carlo, M. S., (1993). Techniques and procedures for assessing cognitive skills. Review of Educational Research, 63(2), 201-243. doi: 10.3102/00346543063002201

Ruiz-Primo, M. A., & Li, M. (2012, July). The role of context in science items and its relation to students’ performance. Paper presented at the International Test Commission Bi-Annual Conference. Amsterdam, The Netherlands.

Ruiz-Primo, M. A., & Li, M. (2015). The relationship between item context characteristics and student performance: The case of the 2006 and 2009 PISA Science items. Teachers College Record, 117(1), 1-36.

Ruiz-Primo, M. A., Li, M., & Minstrell, J. (2014). Building a framework for developing and evaluating contextualized items in science assessment (DECISA). Proposal submitted to the DRL–CORE R&D Program of the National Science Foundation. Washington, DC: National Science Foundation.

Ruiz-Primo, M. A. (2003, April). A framework to examine cognitive validity. Paper presented at the meeting of the American Education Research Association, Chicago, IL.

Ruiz-Primo, M. A. (2007). Assessment in science and mathematics: Lessons learned. In M. Hoepfl & M. Lindstrom (Eds.), Assessment of Technology Education, CTTE 56th Yearbook (pp. 203-232). Woodland Hills, CA: Glencoe-McGraw Hill.

Shavelson, R. J., Ruiz-Primo, M. A., Li, M., & Ayala, C. C. (2002). Evaluating new approaches to assessing learning. CSE Technical Report 604. National Center for Research on Evaluation, Standards, and Student Testing (CRESST) University of California, Los Angeles.

Shoemaker, D. M. (1975). Toward a framework for achievement testing. Review of Educational Research, 45(1), 127-147. doi: 10.3102/00346543045001127

Taber, K. S. (2003). Examining structure and context – questioning the nature and purpose of summative assessment. Presentation at Cambridge International Examinations Seminar. University of Cambridge Local Examinations Syndicate. Cambridge, England.

Terry, T. M. (1980). The narrative exam – an approach to creative organization of multiple-choice tests. Journal of College Science Teaching, 9(3), 156-158.

Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185-201.

Wang, T., & Li, M. (March, 2014). Literature review of characteristics of science item contexts. Paper presented at the Annual Meeting of the National Association for Research in Science Teaching (NARST), Pittsburgh, PA.

Welch, C. (2006). Item and prompt development in performance testing. In S. M. Downing & T. M. Haladyna (Eds). Handbook of test development (pp. 303-327). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Wiggins, G. (1993). Assessing student performance: Exploring the purpose and the limits of testing. San Francisco: Jossey-Bass.

Wiliam, D. (1997). Relevance as MacGuffin in mathematics education. Paper presented at the British Educational Research Association Annual Conference, York, England.

Yin, Y. (2005). The influence of formative assessments on student motivation, achievement, and conceptual change. Unpublished doctoral dissertation, Stanford University, Stanford, CA.

Published

2016-07-03

Issue

Section

Special Section