Reporting the Results
Reporting the results is one of the fundamental aspects of clinical research. Accurate results benefit diagnosis, patient outcomes, and drug manufacturing. Nevertheless, measurements and findings are often prone to errors and bias (Bartlett & Frost, 2008).
Various research methods and statistical procedures exist to help experts reduce discrepancies and get as close as possible to “true” values. In the end, the main aim of any clinical trial is accuracy.
Repeatability and Medical Data
Repeatability is a key indicator of the consistency of any clinical method. In other words, repeatability shows whether the same instrument, used on the same subject more than once, leads to the same results (Peat, 2011). Note that the term repeatability was introduced by Bland and Altman, after which similar terminology was coined.
Although terms such as repeatability, reproducibility, reliability, consistency and test-retest variability are often used interchangeably, there are some slight differences. Repeatability, for instance, requires the same location, the same tool, the same observer, and the same subject. Consequently, the repeatability coefficient reveals the precision of a test and the difference between two repeated test findings over a short period of time. To test repeatability in continuous data, statistics such as the intraclass correlation coefficient and Levene’s test of equal variance can be used. For categorical data, kappa and the proportion in agreement can support research. Reproducibility, on the other hand, refers to the ability to replicate medical studies. In other words, it reveals the agreement between results obtained from different subjects, via different tools, and at different locations (“What is the difference between repeatability and reproducibility?”, 2014).
Data types also matter. As explained above, for continuously distributed measures, the measurement error (also called the standard error of the measurement, SEM) and the intraclass correlation coefficient (ICC) are the two most effective indicators of repeatability, that is, of the reliability of a measure. The measurement error captures within-subject test-retest variation. It is an absolute estimate of the range in which the true value can be found. The ICC, on the other hand, is a relative estimate of repeatability. To be more precise, it is the ratio of the between-subject variance to the total variance for continuous measures. When it comes to interpretation, a high ICC means that only a small proportion of the variance is due to within-subject variance. In fact, an ICC close to one means there is almost no within-subject variance.
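To make the two indicators concrete, here is a minimal Python sketch that derives both from a one-way analysis of variance. It is an illustration under stated assumptions, not the procedure from Peat (2011): the function name `one_way_icc` and the sample data are hypothetical, every subject is assumed to have the same number of repeated measurements, and the SEM is taken as the square root of the within-subject mean square.

```python
from statistics import mean


def one_way_icc(measurements):
    """Return (icc, sem) from a one-way ANOVA on repeated measurements.

    `measurements` is a list with one inner list per subject; every inner
    list must hold the same number k >= 2 of repeated measurements
    (an assumption of this sketch).
    """
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")

    grand_mean = mean(x for row in measurements for x in row)
    subject_means = [mean(row) for row in measurements]

    # Between-subject mean square: spread of subject means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)

    # Within-subject mean square: spread of repeats around each subject's own mean.
    ms_within = sum(
        (x - m) ** 2
        for row, m in zip(measurements, subject_means)
        for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")

    icc = (ms_between - ms_within) / denominator
    sem = ms_within ** 0.5  # measurement error, in the units of the measure
    return icc, sem


# Hypothetical data: three subjects, two measurements each.
icc, sem = one_way_icc([[10, 12], [20, 18], [30, 31]])
print(f"ICC = {icc:.3f}, SEM = {sem:.3f}")
```

When the repeats for each subject are identical, the within-subject mean square is zero and the ICC equals one, which matches the interpretation given above.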
For categorical data, there are also various methods, with kappa being one of the most widely used statistics. Basically, kappa is similar to the ICC but applicable to categorical data. Thus, kappa values close to one indicate near-total agreement (Peat, 2011). Note that a lack of repeatability in categorical measurements is also called misclassification error.
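As a rough illustration of how kappa corrects the proportion in agreement for chance, the following Python sketch computes both for two sets of ratings of the same subjects. The text above does not say which kappa variant is meant; this sketch assumes Cohen's kappa for two ratings per subject, and the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Return (proportion_in_agreement, kappa) for two paired lists of labels."""
    if not first or len(first) != len(second):
        raise ValueError("need two non-empty rating lists of equal length")

    n = len(first)

    # Observed proportion in agreement: share of subjects rated identically.
    observed = sum(a == b for a, b in zip(first, second)) / n

    # Agreement expected by chance, from the marginal frequency of each category.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / n ** 2

    if expected == 1:
        raise ValueError("kappa is undefined when both ratings use a single category")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical test and retest classifications for four subjects.
test = ["pos", "pos", "neg", "neg"]
retest = ["pos", "neg", "neg", "neg"]
agreement, kappa = cohen_kappa(test, retest)
print(f"proportion in agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this example three of four ratings agree, but half of that agreement would be expected by chance alone, so kappa is noticeably lower than the raw proportion in agreement.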
Continuous Data and True Values
Medical research is a curious niche full of unexpected outcomes. Variations and errors are quite common, and may occur even when the same subject is tested twice with the same tool. Such discrepancies might result from various factors: within-observer variation (intra-observer error), between-observer variation (inter-observer error), within-subject variation (test-retest error), and actual changes in the subject after a certain intervention (responsiveness). To be more precise, variations may occur due to changes in the observer, the subject or the equipment.
Consequently, it’s hard to determine true values. To support accuracy, any good study design should ensure that more than one measurement is taken from each subject so that repeatability can be estimated (Peat, 2011).
Selection Bias and Sample Size
Selection bias affects the repeatability scores. Therefore, studies that have different subject selection criteria cannot be compared. At the same time, estimates of studies with three or four repeated measures cannot be compared to studies with two repeated measures.
Note that estimates of ICC may be higher, and estimates of measurement error lower, if the inclusion criteria lead to more variation between subjects. For example, ICC will usually be higher when subjects are selected randomly. Researchers should recruit a sample of at least 30 subjects to obtain adequate estimates of variance.
Within-subject Variance and Two Paired Measurements
Paired data is vital in research and should be used to measure within-subject variance. The mean and the standard deviation (SD) of the differences between the paired measurements must also be computed. The measurement error can then be transformed into a 95% range. This range is known as the limits of agreement: the range within which the true value for a subject is expected to lie with 95% certainty (Peat, 2011).
- Paired t-tests are beneficial in assessing systematic bias between observers.
- A test of equal variance (e.g., Levene’s test of equal variance) can be helpful to assess repeatability in two different groups.
- A plot of the mean of the two measurements against their difference, for each subject, is also crucial. This mean-vs-difference plot (or Bland-Altman plot) is usually clearer than a scatter plot.
- Note that Kendall’s correlation can add more valuable insights to the study. Kendall’s tau-b correlation coefficient indicates the strength of association that exists between two variables (“Kendall’s Tau-b using SPSS Statistics”).
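The paired-measurement quantities listed above can be sketched in a few lines of Python. This is a minimal illustration, not the exact procedure in Peat (2011): the function name `bland_altman_summary` and the data are hypothetical, the 1.96 multiplier assumes approximately normally distributed differences, and only the paired t statistic is computed (a p-value would require a t distribution with n - 1 degrees of freedom, which the standard library does not provide).

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman_summary(first, second):
    """Summarise paired measurements taken on the same subjects."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")

    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]

    bias = mean(diffs)     # mean difference: systematic bias
    sd_diff = stdev(diffs)  # sample SD of the differences

    # 95% limits of agreement, assuming roughly normal differences.
    lower = bias - 1.96 * sd_diff
    upper = bias + 1.96 * sd_diff

    # Paired t statistic for the null hypothesis of zero mean difference.
    t_stat = bias / (sd_diff / sqrt(len(diffs))) if sd_diff > 0 else None

    return {
        "means": means,   # x-axis of a mean-vs-difference plot
        "diffs": diffs,   # y-axis of a mean-vs-difference plot
        "bias": bias,
        "sd_diff": sd_diff,
        "limits": (lower, upper),
        "t_stat": t_stat,
    }


# Hypothetical readings from two occasions for four subjects.
summary = bland_altman_summary([10, 20, 30, 40], [12, 19, 33, 38])
print(summary["bias"], summary["limits"], summary["t_stat"])
```

Plotting `means` against `diffs`, with horizontal lines at the bias and at the two limits, gives the mean-vs-difference plot described in the list.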
Measurement Error and Various Measurements per Subject
Nothing is only black and white in medical research. Often, more than two measures are required per subject. When more than two measurements are taken per subject, experts should first calculate the variance for each subject and then the overall within-subject variance. Note that such values can be calculated via ANOVA. A mean-vs-standard deviation plot can visualize the results. In addition, Kendall’s coefficient can indicate whether there’s any systematic error.
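The per-subject step can be sketched as follows. This is a simplified illustration with hypothetical names and data: it assumes every subject has the same number of measurements, in which case the mean of the per-subject variances equals the within-subject mean square from a one-way ANOVA. The 95% range of 1.96 times the measurement error additionally assumes approximately normal errors.

```python
from math import sqrt
from statistics import mean, stdev, variance


def within_subject_summary(measurements):
    """Return per-subject (mean, SD) pairs, the measurement error, and a 95% range.

    `measurements` holds one list per subject, each with the same number
    (>= 2) of repeated measurements (an assumption of this sketch).
    """
    k = len(measurements[0]) if measurements else 0
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("each subject needs the same number (>= 2) of measurements")

    # Points for a mean-vs-standard deviation plot, one pair per subject.
    mean_sd_pairs = [(mean(row), stdev(row)) for row in measurements]

    # Pooled within-subject variance: average of the per-subject variances.
    within_variance = mean(variance(row) for row in measurements)

    measurement_error = sqrt(within_variance)
    range_95 = 1.96 * measurement_error
    return mean_sd_pairs, measurement_error, range_95


# Hypothetical data: three subjects, three measurements each.
pairs, sem, range_95 = within_subject_summary(
    [[10, 12, 11], [20, 18, 19], [30, 31, 32]]
)
print(pairs, sem, range_95)
```

If the SD in `pairs` grows with the mean, the error is not constant across the range of the measure, which is the kind of systematic pattern a rank correlation between the two columns would help detect.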
Note that deciding on a reliable measure out of a selection of different measures is a difficult task. In clinical settings, this is a vital process as it may affect patients’ outcomes. Assessing measurement errors is also fundamental. The measurement error indicates the range of normal and abnormal values from the baseline. In fact, these values can reveal either a positive or a negative effect of a treatment (Peat, 2011). Let’s say that previously abnormal values have come close to normal values. One interpretation of this phenomenon can be that a disease has been affected by a treatment in a positive direction.
ICC and Methods to Calculate ICC
The ICC is an essential indicator of the extent to which multiple measures taken from the same subjects are related to each other. This form of correlation is also known as a reliability coefficient. As explained above, a high ICC value means that most of the variance is due to true differences between subjects, while the rest is due to measurement error (within-subject variance).
Unlike other correlation coefficients (Pearson’s correlation, for example), ICC is relatively easy to calculate. A few methods can help experts calculate ICC, and suitable computer programs are also available. The first method employs a one-way analysis of variance and is used when the difference between observers is fixed. The second method, based on a two-way analysis of variance, applies when there are many observers. There