COSMIN Checklist: Introduction
Patient-reported outcomes and other subjective assessments can complicate medical practice and statistical analysis. To support research, the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist has become a widely used tool in science and practice. The COSMIN checklist can be used to evaluate the measurement properties of health-related patient-reported outcomes (HR-PROs) and the methodological quality of studies that assess such subjective instruments.
Assessing the measurement properties of an instrument can justify its use across domains, support reliable conclusions, and inform the further development of diagnostic techniques. With a clear set of standards in the form of a checklist, experts can better understand the design requirements of research and improve medical practice. Good criteria are particularly important in studies that use patient-reported outcomes and Item Response Theory (IRT) methods, because the subjective nature of the constructs may challenge statistical analyses and conclusions (Mokkink et al., 2010).
The Importance of the COSMIN Initiative
COSMIN emerged as an initiative to support the development of core outcome sets (COS), the standardization of measures, and the evaluation of outcome measurement instruments. Its initial aim was to create a user-friendly checklist, for experts and novices alike, that would facilitate the selection of outcome measurement instruments. To achieve this goal, a Delphi study was performed to reach consensus among experts. The Delphi study consisted of four rounds, which took place between 2006 and 2007. In total, 91 international experts with at least five PubMed-indexed publications on health measurement were invited. Of these, 57 agreed to participate (about 63%): 25 from North America, 29 from Europe, 2 from Australia, and 1 from Asia. Twenty of the 57 completed all four rounds. This response was broadly in line with the committee’s expectations, since roughly 70% of those invited to a Delphi study typically agree to participate (Mokkink et al., 2010).
As stated above, the Delphi study conducted by Mokkink and colleagues (2010) consisted of four rounds. The first round focused on questions about measurement properties and scientific definitions (e.g., “Which definition do you consider the best for internal consistency?”). The second round included questions and ratings about standards, study designs, and statistical methods (e.g., using Cronbach’s alpha to assess internal consistency). In the third round, the panel was presented with the most frequently chosen methods and asked for feedback (e.g., whether the most frequently chosen method was also the most preferred and appropriate one). Feedback and ideas were collected throughout all four rounds to foster constructive discussion. In the final round, the aspects on which the panel agreed were integrated. Consensus was considered reached when at least 67% of the members “agreed” or “strongly agreed” on a 5-point scale. After the final round, the steering committee created an initial version of the COSMIN checklist, which is now an established part of good research practice.
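The consensus rule described above is simple arithmetic and can be sketched in a few lines of Python. This is only an illustration: the numeric coding of the 5-point scale (4 = “agree”, 5 = “strongly agree”), the function names, and the example ratings are assumptions made here, not details reported by the study.

```python
# Assumed coding of the 5-point scale: 4 = "agree", 5 = "strongly agree".
AGREE, STRONGLY_AGREE = 4, 5
CONSENSUS_THRESHOLD = 0.67  # "at least 67%" of panel members


def agreement_share(ratings):
    """Return the fraction of ratings that are 'agree' or 'strongly agree'."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    agreeing = sum(1 for r in ratings if r in (AGREE, STRONGLY_AGREE))
    return agreeing / len(ratings)


def consensus_reached(ratings, threshold=CONSENSUS_THRESHOLD):
    """True when the agreeing share meets or exceeds the threshold."""
    return agreement_share(ratings) >= threshold


# Hypothetical ratings from 20 panel members for a single item (made-up data).
example_ratings = [5, 4, 4, 5, 3, 4, 2, 5, 4, 4, 5, 4, 3, 4, 5, 4, 1, 4, 5, 3]
print(agreement_share(example_ratings))    # 15 of 20 ratings agree
print(consensus_reached(example_ratings))
```

With these made-up ratings, 15 of 20 members agree or strongly agree, so the item would clear the 67% threshold.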
The COSMIN Checklist Explained
The Delphi study led to the development of the preliminary version of the COSMIN checklist. The checklist contains standards for assessing the methodological quality of studies on the measurement properties of instruments. For patient-reported outcome measures, three domains are fundamental: reliability, validity, and responsiveness. More precisely, the checklist consists of 12 boxes. Ten boxes can be used to assess whether a study meets the standards for good methodological quality: nine of them contain standards for individual measurement properties (internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness), and the tenth covers interpretability. The two remaining boxes cover IRT methods and generalizability.
Internal consistency was defined as the interrelatedness among the items. Content validity was defined as the extent to which the content of an instrument is an adequate reflection of the construct to be measured. Structural validity, hypotheses testing, and cross-cultural validity are aspects of construct validity. Criterion validity was defined as the degree to which the scores of a health-related patient-reported outcome instrument are an adequate reflection of a “gold standard.” For patient-reported outcomes, a “gold standard” can be said to exist only when a shortened version of an instrument is compared with the original, longer version. Finally, responsiveness was defined as the ability of a patient-reported instrument to detect change over time in the construct being measured.
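To make the notion of internal consistency more concrete, the sketch below computes Cronbach’s alpha, which the Delphi study mentions as one example method. It uses the standard textbook formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the example scores are hypothetical, and COSMIN itself prescribes standards for studies, not this particular implementation.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same number of items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents answering three perfectly related items.
perfectly_related = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(perfectly_related))
```

When all items move together, as in this made-up example, alpha reaches its maximum of 1; weaker interrelatedness among items yields lower values.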
To complete the COSMIN checklist, experts follow four steps:
- First, determine which measurement properties are being assessed
- Second, decide whether the statistical analyses used in the study of interest are based on Classical Test Theory (CTT) or on Item Response Theory (IRT)
- Third, complete the boxes of the checklist that contain the standards for the chosen properties
- Last, assess the generalizability
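The four steps above can be sketched as a small planning helper. This is a simplified illustration, not an official COSMIN tool: the names `Appraisal` and `plan_appraisal` are hypothetical, the interpretability box is left out for brevity, and the assumption that the IRT box is completed only for IRT-based studies is made here for the sake of the example.

```python
from dataclasses import dataclass, field

# The nine measurement-property boxes named in the text.
PROPERTY_BOXES = (
    "internal consistency", "reliability", "measurement error",
    "content validity", "structural validity", "hypotheses testing",
    "cross-cultural validity", "criterion validity", "responsiveness",
)


@dataclass
class Appraisal:
    properties: list                      # step 1: properties being assessed
    uses_irt: bool                        # step 2: IRT (True) or CTT (False)
    boxes_to_complete: list = field(default_factory=list)


def plan_appraisal(properties, uses_irt):
    """Return the boxes to complete, in order, for one study."""
    unknown = [p for p in properties if p not in PROPERTY_BOXES]
    if unknown:
        raise ValueError(f"not a measurement-property box: {unknown}")

    boxes = []
    if uses_irt:
        boxes.append("IRT")               # step 2 (assumed: IRT studies only)
    boxes.extend(properties)              # step 3: one box per chosen property
    boxes.append("generalizability")      # step 4
    return Appraisal(list(properties), uses_irt, boxes)


plan = plan_appraisal(["reliability", "responsiveness"], uses_irt=True)
print(plan.boxes_to_complete)
```

Keeping the box names in a single tuple makes it easy to reject anything that is not one of the nine properties listed earlier.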
COSMIN Checklist: Usage
The COSMIN checklist has numerous applications in research and practice. It can be used to select adequate instruments, design a clinical study, report findings (based on good measurement properties), or review a scientific paper. In addition, the checklist can be used for educational purposes or research appraisal. Health status measurements need to be valid and reliable in order to provide precise results that can support healthcare. Health status instruments also differ in their mode of data collection: some are administered by an interviewer, some are delivered by a computer, some are self-administered, and some are performance-based.
For subjective measures such as patient-reported outcomes, standards on measurement properties are essential for choosing the most appropriate instrument. Marshall was one of the first experts to report bias related to the measurement instrument itself: in schizophrenia clinical trials, researchers were more likely to report that a treatment was superior when they used an unpublished measurement instrument rather than a published and validated one (Mokkink et al., 2010). The COSMIN checklist can also support the standardization of cross-cultural adaptations and comparisons. Because such instruments are multidimensional and subjective, evaluating their measurement properties can improve both practice and patient outcomes.