Healthcare Simulation may include the use of clinical simulation-based assessments in which some measurement instrument or tool is used to determine a learner’s performance or knowledge. The accuracy of these instruments depends on the reliability and validity of the tool. During the 2021 IMSH conference, Scerbo, M., Lineberry, M., Sebok-Syer, S., & Calhoun, A. presented a session entitled “Best Practices in Validity: A Primer for Simulation-Based Assessment.” The presenters emphasized the importance of using critical thinking when considering validity. They created a practical primer based on current theory and best practices for determining the validity of clinical simulation-based assessments.
The session began by defining terms such as “reliability,” “validity” and “measurement tool (instrument).” Reliability is the consistency of measurement: the extent to which an assessment tool produces consistent results. This includes consistency over time (test-retest reliability), across items (internal consistency), and across different raters (inter-rater reliability).
Internal consistency measures whether several items that are intended to measure the same general construct produce similar scores. It is typically based on the correlations between different items on the same test (or the same subscale on a larger test). A tool or form is an instrument used to gather assessment data, e.g., the DASH tool.
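The reliability measures named above can be estimated directly from score data. The following Python sketch is illustrative only and was not part of the session: it computes Cronbach's alpha, a widely used index of internal consistency, and Cohen's kappa, a widely used chance-corrected index of agreement between two raters. The function names and all scores are hypothetical.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per learner, one column per checklist item.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two learners")
    # Sum of the variances of each item (column).
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Variance of each learner's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same performances."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set")
    # Proportion of performances on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data: four learners scored on three checklist items.
item_scores = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2]]
print("alpha:", round(cronbach_alpha(item_scores), 3))

# Hypothetical data: two raters judging the same four performances.
rater_1 = ["pass", "pass", "fail", "fail"]
rater_2 = ["pass", "fail", "fail", "fail"]
print("kappa:", cohen_kappa(rater_1, rater_2))  # prints kappa: 0.5
```

Neither coefficient establishes validity on its own. Both describe how consistently a tool behaves, which is only one part of the evidence the presenters discuss.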
Validity tells you how accurately a method measures something. If a method measures something as claimed, and the results closely correspond to real-world values, then this method can be considered valid. For example, a test of intelligence should measure intelligence and not something else (such as memory). Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results (Kane MT. Validation. In Brennan RL, editor. Educational measurement. 4th ed. Westport: Praeger; 2006. p. 17–64).
Healthcare simulation measures specific constructs such as communication or resuscitation skills. These are “more ideas which exist in our minds which we believe have meaning based on other research.” Simulationists should question whether the tools being used really relate to the concepts being measured. The presenters suggest that validity is not a property of a tool. Rather, validity should be considered a relational idea, tied to a specific environment or population and to the decision(s) that will be made based on the tool’s score. A tool that is valid in one situation or with one population may be invalid in others.
For example, a tool that is valid for a formative evaluation may not be valid for a high-stakes evaluation where a learner may pass or fail a course or OSCE. Therefore, the validity of a tool should be reevaluated when populations, situations or decisions are different. Would a tool validated for in-person simulation also be valid for virtual simulation? The presenters also state that validity is only one component of tool evaluation.
The theory behind validity has changed and become more complex over the last century. Several types, including internal, external, construct and statistical validity, have been identified. Scerbo et al. suggest that more recently, an attempt has been made to simplify the validation process. Validation should be approached as research and is therefore hypothesis-driven.
The hypothesis should be written as a statement of the form: tool “X” will show validity in this population, for making this decision, at a specified level of confidence. First, knowing exactly what you are trying to measure, with a clear definition of terms, is important. Without this, determining validity will be difficult. In addition, the educator should determine what is driving the need for this assessment and why the assessment should be conducted. Following the assessment, how will the data be used? In other words, is the tool defensible? This is particularly critical for high-stakes testing.
Calhoun compares the validity argument to arguing a case in court, where a case must be made and a stream of evidence presented. Two commonly used frameworks for healthcare simulation validity are:
Messick’s Framework: Addresses streams of evidence and categories of data. This may be the preferable method for those new to this work, since the framework provides a checklist approach, although you may not need to include every item in the framework.
- Content evidence — converting construct to meaningful questions.
- Response process — how is the tool scored and how are raters trained?
- Internal structure — how reproducible is your tool?
- Relationship to other variables — do scores of a new tool relate to existing instruments?
- Consequences — what will the data be used for and what could be the intended or unintended consequences of the tool? This item should always be included and might be the most difficult to accomplish. Speaking to learners early and often is important since consequences can affect the behavior of the learners.
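As one illustration of the “relationship to other variables” evidence listed above, scores from a new tool can be correlated with scores from an existing instrument given to the same learners. The sketch below is an assumption-laden example rather than the presenters' method: it computes a Pearson correlation coefficient, the function name is hypothetical, and the scores are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five learners on a new tool and an existing one.
new_tool = [12, 15, 9, 18, 14]
existing_tool = [30, 34, 25, 40, 33]
print("r:", round(pearson_r(new_tool, existing_tool), 3))
```

A high correlation would count as only one stream of evidence. It says nothing about the remaining items in the framework, such as consequences.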
Kane’s Framework: Addresses the structure of the case/validity argument and the flow of the argument.
- Decision/Intended Use of Argument — State the decision the tool is intended to facilitate and an outline of the evidence to support the tool’s use.
- Scoring — translating an observation into one or more scores.
- Generalisation — using the score[s] as a reflection of performance in a test setting.
- Extrapolation — using the score[s] as a reflection of real-world performance.
- Implications — applying the score[s] to inform a decision or action.
Evidence should be collected to support each of these inferences and should focus on the most questionable assumptions in the chain of inference.
In summary, validation work is complex. The presentation by Scerbo et al. offered an outline of the validation process as well as a step-by-step process for determining validity. Start by determining the purpose of the tool and the consequences of using the tool. Always consider the learner in this process and how they view the assessment. Ask yourself if the tool is valid for a particular decision in a particular population/context, rather than simply asking if this tool is valid. Use a framework to guide you through the validation process.
There are experts who create and validate tools as their main work all day and every day! If you have access to these experts, use them! Creating tools and determining reliability and validity are time-consuming and expensive activities. If a tool exists that might work, use it. If you find a tool that closely matches what you need but requires changes to one or two items, the validation process will need to be repeated.