Educators may approach assessment differently, but according to our K-12 assessment survey, almost all educators agree on the foundation of a good assessment. The key to excellent assessments? Test questions that are aligned to state learning standards and that measure consistently (reliability), so that they support actionable and appropriate inferences (validity).
Validity & Reliability Defined
Validity is an argument, built from accumulated evidence, that the intended interpretations of test scores are justified. Having the right test questions, along with a clear use and purpose for the test, is the first step. Evidence that the assessment measures consistently is another important contribution to the validity argument. This consistency is known as the reliability of the test.
Reliability provides an estimate of how much error there is in a test score. Reliable assessments produce consistent, and therefore dependable, results. Because every assessment contains some error, and conclusions about achievement need to be made with confidence, it is imperative to minimize error and thereby increase reliability. After all, it is challenging at best to support valid inferences without consistent measurement!
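Under classical test theory, one widely used reliability estimate is Cronbach's alpha, and the standard error of measurement (SEM) translates it back into score points, which is one concrete way to express "how much error there is in a test score." The sketch below is illustrative only: it assumes a small, hypothetical matrix of 0/1 item scores (rows are students, columns are items), and the function names are our own rather than part of any assessment product.

```python
import math
from statistics import pvariance


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Estimate reliability from a students-by-items score matrix."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across students.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores: list[list[int]]) -> float:
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    alpha = cronbach_alpha(scores)
    sd_total = math.sqrt(pvariance([sum(row) for row in scores]))
    # Clamp so floating-point rounding cannot push the radicand below zero.
    return sd_total * math.sqrt(max(0.0, 1 - alpha))


if __name__ == "__main__":
    # Hypothetical responses: 5 students x 4 items (1 = correct).
    responses = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(responses):.2f}")
    print(f"SEM   = {standard_error_of_measurement(responses):.2f} points")
```

In this toy data the items order students very consistently, so alpha comes out at 0.80, yet each total score still carries an SEM of about 0.63 points. Real reliability studies use far larger samples and often other estimators.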
The Problem With Conventional Assessment Scoring Practices
The two most well-known psychometric approaches that estimate a student's score or ability are Classical Test Theory (CTT) and Item Response Theory (IRT).
CTT and IRT provide performance information by assigning scores along a continuum, such as a scale from 0 to 100. Students are placed on the continuum and compared to other students or against a performance standard (i.e., a cut score). For an assessment to accurately and reliably determine where a student's ability falls on that continuum, there need to be enough test items to reliably separate students and place them in order. The resulting tests can be long and consume valuable instructional time, while providing only limited actionable information in return. For example, a teacher testing students over the course of a math unit that includes five content standards may need 20 or more items per standard to measure each one with some degree of reliability, for a total of 100 or more items.
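The link between test length and reliability can be made concrete with the Spearman-Brown prophecy formula, a standard classical test theory result: if a test with reliability ρ is lengthened by a factor n using comparable items, its predicted reliability is nρ / (1 + (n − 1)ρ). The sketch below solves that formula for the number of items needed per standard. The single-item reliability of 0.15 and the 0.80 target are hypothetical values chosen for illustration, and the formula assumes the added items are parallel to one another, which real item banks only approximate.

```python
import math


def spearman_brown(rho: float, n: float) -> float:
    """Predicted reliability after lengthening a test by a factor of n."""
    return n * rho / (1 + (n - 1) * rho)


def items_needed(rho_one_item: float, target: float) -> int:
    """Smallest number of parallel items whose combined reliability meets target."""
    if not (0 < rho_one_item < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Spearman-Brown solved for n: n = target(1 - rho) / (rho(1 - target)).
    n = target * (1 - rho_one_item) / (rho_one_item * (1 - target))
    # Tolerate tiny floating-point overshoot before rounding up.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    per_item = 0.15  # hypothetical reliability of a single item
    target = 0.80    # hypothetical reliability goal per standard
    standards = 5    # as in the math-unit example
    per_standard = items_needed(per_item, target)
    print(f"items per standard: {per_standard}")
    print(f"items for {standards} standards: {per_standard * standards}")
    print(f"check: {spearman_brown(per_item, per_standard):.3f}")
```

With these assumed numbers the sketch asks for 23 items per standard, or 115 items across five standards, which shows how quickly test length grows when every standard must be measured reliably on a continuum.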
Additionally, the result is a single score, even though the student is actually being assessed on multiple constructs. CTT and IRT are limited in their ability to measure multiple constructs on a single test, and the results are complicated and often difficult to interpret, even for the most savvy educator. Teachers need to be able to use assessment data to clearly understand what students know and don't know, so they can use that information to fuel instructional practice. While CTT and IRT work well for providing high-level views of student performance and fit within current accountability frameworks, they do not provide actionable data that can directly influence instruction in a timely manner.
The Promising Future of the Diagnostic Classification Model (DCM)
High-quality, reliable assessments have always been a must, but now there is a more actionable way to gauge student mastery. That's where the Diagnostic Classification Model (DCM) comes in. The model is based on research by Jonathan Templin, Ph.D., E.F. Lindquist Chair and Professor of Educational Measurement and Statistics at the University of Iowa, and is designed to address some of the limitations of traditional scoring approaches.
The goal of the DCM is to provide a reliable measure of student understanding based on mastery of a skill or set of skills. DCMs achieve this by estimating, for each student, a profile of mastery probabilities (one per skill) and classifying students into groups, such as masters and non-masters of each skill, rather than placing them on a continuum. Because fewer items are required, more learning standards can be tested at once, and a single standard can be assessed with fewer questions.
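To make "probabilities of mastery" concrete, the sketch below shows the basic calculation behind many DCMs in its simplest form: each skill is treated as a hidden yes/no state (mastered or not), and Bayes' rule turns a student's item responses into a probability of mastery. It is a generic latent-class illustration using the slip-and-guess setup of the DINA model, restricted to items that each measure exactly one skill; it is not the specific model from Dr. Templin's research or any product's implementation. The slip and guess values, the 0.5 prior, the 0.5 classification cutoff, and the skill names are all hypothetical, and skills are scored independently for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    slip: float   # P(wrong answer | skill mastered), hypothetical
    guess: float  # P(right answer | skill not mastered), hypothetical


def posterior_mastery(responses: list[int], items: list[Item], prior: float = 0.5) -> float:
    """P(mastered | responses) for one skill, via Bayes' rule."""
    if len(responses) != len(items):
        raise ValueError("need one response per item")
    like_master, like_nonmaster = prior, 1 - prior
    for correct, item in zip(responses, items):
        # Multiply in the likelihood of this response under each latent class.
        like_master *= (1 - item.slip) if correct else item.slip
        like_nonmaster *= item.guess if correct else (1 - item.guess)
    return like_master / (like_master + like_nonmaster)


def mastery_profile(responses_by_skill: dict[str, list[int]],
                    items_by_skill: dict[str, list[Item]],
                    cutoff: float = 0.5) -> dict[str, tuple[float, str]]:
    """Classify a student as master or non-master on each skill."""
    profile = {}
    for skill, responses in responses_by_skill.items():
        p = posterior_mastery(responses, items_by_skill[skill])
        profile[skill] = (p, "master" if p >= cutoff else "non-master")
    return profile


if __name__ == "__main__":
    # Hypothetical: five items per standard, all with the same slip/guess.
    items = [Item(slip=0.1, guess=0.2)] * 5
    bank = {"fractions": items, "decimals": items, "ratios": items}
    student = {
        "fractions": [1, 1, 1, 1, 0],
        "decimals": [0, 0, 1, 0, 0],
        "ratios": [1, 0, 1, 1, 0],
    }
    for skill, (p, label) in mastery_profile(student, bank).items():
        print(f"{skill:10s} P(mastery) = {p:.2f} -> {label}")
```

Instead of one total score, the output is a per-skill profile (here roughly 0.98, 0.00, and 0.59), which a teacher could use to group students by what they have and have not yet mastered. Operational DCMs estimate item parameters from data and model skills jointly, which this sketch does not attempt.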
Administering classroom assessments that are short, reliable, and fit seamlessly into the instructional cycle has become more important with the challenges of the last two years. Educators need the information provided by assessments to guide future learning, and this information needs to be immediately actionable, which is exactly where conventional scoring practices have fallen short.
With this approach, everybody wins. Teachers have access to actionable data that helps them identify areas of mastery and areas for improvement across groups of students, as well as address individual students' learning needs. Students aren't burdened with lengthy (and sometimes disruptive) assessments and can instead work with, and alongside, peers who are at a similar point in their learning.
As we strive to support educators in embracing a balanced approach to assessment, we have partnered with school districts to help them lead innovative assessment initiatives. To learn more about how Instructure can support your assessment journey, read our Guide to Mastery View Formative Assessments.
Download the Report
To learn more about key assessment trends that are top of mind for educators nationwide, as well as perspectives and considerations for the future, check out The State of Assessment in K-12.