COSMIN Checklist: Introduction

Patient outcomes and subjective assessments can pose challenges for medical practice and statistical analysis. To support research, the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist has become an essential tool in science and practice. The COSMIN checklist can be used to evaluate the measurement properties of health-related patient-reported outcomes (HR-PROs) and the methodological quality of medical studies that employ such subjective instruments.

Assessing the measurement properties of an instrument can support its application across domains, lead to more reliable conclusions, and aid the further development of diagnostic techniques. A clear set of standards in the form of a checklist gives experts a better understanding of the design requirements in research and helps improve medical practice. Good criteria are particularly important in studies that use patient-reported outcomes and Item Response Theory (IRT) methods, because the subjective nature of the constructs may complicate statistical analyses and conclusions (Mokkink et al., 2010).


The Importance of the COSMIN Initiative

To facilitate healthcare practices, COSMIN has emerged as an ambitious initiative in the development of core outcome sets (COS), the standardization of measures, and the evaluation of outcome measurement instruments. By developing tools for selecting research instruments, the initiative aims to make the selection of outcome measures easier. Its initial aim was to create a user-friendly checklist for experts and novices alike. To achieve this goal, a Delphi study was performed to reach consensus among experts. The Delphi study consisted of four rounds, which took place between 2006 and 2007. In total, 91 international experts with at least five publications on health measures indexed in PubMed were invited. Of these, 57 (about 63%) agreed to participate: 25 from North America, 29 from Europe, 2 from Australia, and 1 from Asia. Of those 57 scientists, 20 (about 35%) completed all four rounds. These numbers were in line with the initial expectations of the committee panel; in Delphi studies, roughly 70% of the people invited typically agree to participate (Mokkink et al., 2010).

As stated above, the Delphi study conducted by Mokkink and colleagues (2010) consisted of four rounds. The first round focused on questions about measurement properties and scientific definitions (e.g., “Which definition do you consider the best for internal consistency?”). The second round included questions and ratings about standards, study designs, and statistical methods (e.g., using Cronbach’s alpha to assess internal consistency). In the third round, the panel was presented with the most frequently chosen methods and asked to provide feedback (e.g., whether the most frequently chosen method is actually the most preferred and appropriate one). Feedback and ideas were collected throughout all four rounds to foster constructive discussion. In the last round, the aspects on which the panel agreed were integrated. Consensus was considered reached when at least 67% of the members “agreed” or “strongly agreed” on a 5-point scale. After the final round, the steering committee created an initial version of the COSMIN checklist, which is now an integral part of good research practice.
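The consensus rule described above can be sketched in a few lines of Python. This is only an illustration of the 67% threshold: the function name, the scale labels, and the example ratings are hypothetical and are not taken from the Delphi study itself.

```python
# Hypothetical labels for the two "agreement" points of a 5-point scale.
AGREE_LEVELS = {"agree", "strongly agree"}
CONSENSUS_THRESHOLD = 0.67  # at least 67% of panel members, as described in the text


def consensus_reached(ratings, threshold=CONSENSUS_THRESHOLD):
    """Return (share_agreeing, reached) for the ratings given to one proposal."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    agreeing = sum(1 for rating in ratings if rating in AGREE_LEVELS)
    share = agreeing / len(ratings)
    return share, share >= threshold


# Made-up ratings from an imaginary panel of 10 members (not study data).
example_ratings = (
    ["strongly agree"] * 4 + ["agree"] * 3 + ["neutral"] * 2 + ["disagree"]
)
share, reached = consensus_reached(example_ratings)
print(f"{share:.0%} agreed or strongly agreed; consensus reached: {reached}")
```

Note that, under this reading of the rule, two out of three members agreeing (66.7%) would fall just short of the threshold.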


The COSMIN Checklist Explained

The Delphi study led to the development of the preliminary version of the COSMIN checklist. The checklist contains standards for assessing the methodological quality of studies on the measurement properties of instruments. For patient-reported outcome measures, three domains are fundamental: reliability, validity, and responsiveness. More precisely, the checklist consists of 12 boxes. Ten can be used to assess whether a study meets the criteria for good methodological quality: nine of them contain standards for individual measurement properties (internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness), and one contains standards for interpretability. The remaining two boxes cover IRT methods and generalizability.

Internal consistency was defined as the interrelatedness among the items. Content validity was defined as the extent to which the content of a tool is an adequate reflection of the construct to be measured. Structural validity, hypotheses testing, and cross-cultural validity, listed above, are aspects of construct validity. Criterion validity was defined as the degree to which the scores of a health-related patient-reported outcome instrument are an adequate reflection of a “gold standard.” For patient-reported outcomes, a “gold standard” is available only when a shortened version is compared with the original version of the instrument. Finally, responsiveness was defined as the ability of a patient-reported tool to detect change over time in the construct being measured.
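To make internal consistency more concrete: Cronbach’s alpha, the statistic mentioned in the second Delphi round, is commonly computed as alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below implements that textbook formula with Python’s standard library. The function name and the example scores are hypothetical, and the COSMIN standards themselves are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    n_respondents = len(item_scores)
    n_items = len(item_scores[0]) if n_respondents else 0
    if n_items < 2 or n_respondents < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up answers of 5 respondents to a 3-item questionnaire (illustrative only).
scores = [
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```

When every item ranks respondents identically, the total-score variance is large relative to the item variances and alpha approaches 1; weakly related items push it toward 0.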

To complete the COSMIN checklist, there are four steps experts must follow:

  • The first step is to determine which measurement properties are being assessed
  • Then experts must decide if the statistical analyses used in the study of interest are based on Classical Test Theory (CTT) or on Item Response Theory (IRT)
  • The third step is to complete the boxes of the checklist (with standards that accompany the properties chosen)
  • Last, experts should assess the generalizability
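The four steps above can be sketched as a small helper that lists which boxes a reviewer would fill in for a given study. This is a simplified illustration, not an official COSMIN tool: the function name is hypothetical, and the sketch assumes that the IRT box is added only when the study uses IRT-based analyses and that the generalizability box comes last.

```python
# Box names as listed in the text: nine measurement properties plus interpretability.
PROPERTY_BOXES = (
    "internal consistency", "reliability", "measurement error",
    "content validity", "structural validity", "hypotheses testing",
    "cross-cultural validity", "criterion validity", "responsiveness",
    "interpretability",
)


def boxes_to_complete(properties_assessed, uses_irt):
    """Return the ordered list of checklist boxes for one study (illustrative)."""
    # Step 1: determine which measurement properties the study assesses.
    unknown = [p for p in properties_assessed if p not in PROPERTY_BOXES]
    if unknown:
        raise ValueError(f"not a box named in the checklist: {unknown}")
    plan = []
    # Step 2: CTT or IRT? Assumption: only IRT-based studies need the IRT box.
    if uses_irt:
        plan.append("IRT")
    # Step 3: complete the box for each property that was assessed.
    plan.extend(p for p in PROPERTY_BOXES if p in properties_assessed)
    # Step 4: assess generalizability.
    plan.append("generalizability")
    return plan


# Hypothetical study that assesses two properties using IRT-based analyses.
print(boxes_to_complete({"reliability", "internal consistency"}, uses_irt=True))
```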



COSMIN Checklist: Usage

The COSMIN checklist has numerous applications in research and practice. It can be used to select adequate instruments, design a clinical study, report findings (based on good measurement properties), or review a scientific paper. In addition, the checklist can be used for educational purposes or research appraisal. Health status measurements must be valid and reliable to provide precise results that can support healthcare. Health outcomes are usually collected in different ways: there are interviewer-administered, computer-administered, self-administered, and performance-based instruments.

For subjective measures, such as patient-reported outcomes, standards on measurement properties are essential for choosing the most appropriate instrument. Marshall was one of the first experts to report bias due to the nature of the measurements themselves. He showed that in schizophrenia clinical trials, scientists were more likely to report that a treatment was superior when the research team used an unpublished measurement instrument rather than a published and validated tool (Mokkink et al., 2010). The COSMIN checklist can also support the standardization of cross-cultural adaptations and comparisons. Because such instruments are multidimensional and subjective, evaluating their measurement properties can improve practice and patient outcomes.


COSMIN and PROMs: Future Perspectives

With digital health empowering patients, patient-reported outcome measures (PROMs) have become essential tools in research. Some symptoms are known only to the patients themselves. Due to the subjective nature of these measures, though, reliability and validity may suffer (Prinsen et al., 2018). In addition, experts often conduct systematic reviews to choose a tool that suits their objectives and methodology, a complex process that is prone to ambiguity and bias. Bias, in turn, can lead to wasted resources and unethical decisions.

COSMIN standards can therefore facilitate both practice and literature searches. The research committee that conducted the initial Delphi study and created the COSMIN checklist updates the COSMIN guidance on a regular basis. For example, the research team formulated a COSMIN Risk of Bias Checklist and ten steps to support the process of performing systematic reviews of patient-reported outcome measures. These steps are:

  • Formulate the aim of the review (with a focus on the quality of the tool)
  • Define the eligibility criteria
  • Conduct a literature search
  • Select abstracts and full-text articles
  • Evaluate content validity
  • Evaluate internal structure
  • Assess other measurement qualities, such as reliability, measurement error, etc.
  • Describe interpretability and feasibility
  • Give recommendations for future improvement and standardization
  • Report the findings of the systematic review


COSMIN Checklist: Conclusion

Motivated by the lack of clarity around measurement properties, inconsistent standards, and poor evidence on patient-reported outcomes, the COSMIN initiative developed clear guidance and a checklist to support evidence-based research. As the number of patient-reported outcome measures increases, experts need a sound methodology for conducting systematic reviews and clinical studies. The initial COSMIN checklist, with its boxes (internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, responsiveness, interpretability, IRT methods, and generalizability), provides clear standards for assessing the measurement properties of a subjective tool and the methodological quality of a study that employs such methods.

To sum up, the COSMIN checklist has become an important aid to sound research on measurement instruments. Its standards are updated regularly to reflect the newest advances in research and digital health.



Mokkink, L., Terwee, C., Patrick, D., Alonso, J., Stratford, P., Knol, D., Bouter, L., & de Vet, H. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Quality of Life Research, 19(4), 539-549.

Prinsen, C., Mokkink, L., Bouter, L., Alonso, J., Patrick, D., de Vet, H., & Terwee, C. (2018). COSMIN guideline for systematic reviews of patient-reported outcome measures. Quality of Life Research, 27(5), 1147-1157.