Dear Editor,
We read with interest the recent paper by Jones et al. [1]
examining the accuracy of clinical factors in distinguishing acute lymphocytic
leukaemia (ALL) from juvenile rheumatoid arthritis (JRA). We believe that
throughout this paper where the authors refer to “sensitivity” and
“specificity” the data they report are in fact positive predictive values (PPV)
and negative predictive values (NPV), respectively. We are also concerned about
the calculation of confidence intervals, as well as the methods of assessment
of both exposure and outcome in this paper.
The sensitivity and specificity of a diagnostic tool refer
to the ability of the tool to return a positive test in patients who truly have
the disease of interest, and to return a negative test in patients who truly do
not have the disease, respectively. Thus sensitivity is
calculated as the number of true positives divided by the sum of the true
positives and false negatives, and specificity is calculated as the number of
true negatives divided by the sum of the true negatives and false positives
(see figure 1).
In contrast, the positive predictive value is the proportion
of patients with a positive test who truly have the disease, and the negative
predictive value is the proportion of patients with a negative test who truly do
not have the disease. These two values are calculated as the number of true
positives divided by the sum of the true positives and false positives, and the
number of true negatives divided by the sum of the true negatives and false
negatives, respectively (see figure 1).
The data presented in Table 1 of the paper show that, for the
indicator ‘low white blood cell count (WBC)’, the total
number of children with leukaemia is 52, and of these 11 have a positive test
(have low WBC). Dividing 11 by 52 gives a sensitivity of 21%. The sensitivity
figure quoted of 85% is instead the PPV, calculated by dividing 11 by 13 (the
number of children with either ALL or JRA who have low WBC). Similarly, 203 of
the 205 children with JRA have a negative test (do not have low WBC), and
dividing 203 by 205 gives a specificity of 99%. The figure quoted of 83% is
instead the NPV, calculated by dividing 203 by 244 (the number of children with
either ALL or JRA who do not have low WBC). See figure 2. This same confusion of
sensitivity with PPV and specificity with NPV has been made every time the
terms are used throughout the paper.
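The arithmetic above can be checked with a short sketch. It assumes the four cell counts for low WBC are as we have read them from Table 1 of the paper (11, 41, 2 and 203); the class name `TwoByTwo` and its field names are illustrative only and follow the lettering of figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 diagnostic table (letters follow figure 1)."""
    a: int  # true positives
    b: int  # false negatives
    c: int  # false positives
    d: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of diseased patients who test positive.
        return self.a / (self.a + self.b)

    def specificity(self) -> float:
        # Proportion of non-diseased patients who test negative.
        return self.d / (self.c + self.d)

    def ppv(self) -> float:
        # Proportion of test-positive patients who have the disease.
        return self.a / (self.a + self.c)

    def npv(self) -> float:
        # Proportion of test-negative patients who do not have the disease.
        return self.d / (self.b + self.d)


# Assumed counts for low WBC, as read from Table 1 of the paper.
low_wbc = TwoByTwo(a=11, b=41, c=2, d=203)

for name, value in [
    ("Sensitivity", low_wbc.sensitivity()),
    ("Specificity", low_wbc.specificity()),
    ("PPV", low_wbc.ppv()),
    ("NPV", low_wbc.npv()),
]:
    print(f"{name}: {value:.0%}")
```

Keeping the four cells as the only inputs makes it hard to confuse row totals with column totals, which is the confusion at issue here.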
The difference between these terms is important not only for
semantic reasons but also because PPV and NPV vary markedly
with the prevalence of the disease, whereas, theoretically at least,
sensitivity and specificity do not. The study did not examine an unselected
cohort representative of the spectrum of children who would normally present
for assessment, but rather followed a case-control methodology, comparing
53 ‘cases’ (children with blast-negative ALL) with 205 ‘controls’ (children
with JRA). This approximately 1:4 ratio is unlikely to reflect the ratio of
prevalence of ALL to JRA seen in clinical practice. For example, while the PPV
of 85% for low WBC is impressive (meaning that 85% of children
with a positive test will have ALL), if the ratio of diseases is more
like 1:10 then the PPV rapidly decreases to 67%, and again to 33% for a 1:50
ratio.
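The dependence of PPV on the ratio of ALL to JRA can be illustrated with a minimal sketch. It assumes that sensitivity and specificity stay fixed at the values derived from the study counts (11/52 and 203/205) and that only the number of children with JRA per child with ALL changes; the function name `ppv_at_ratio` is ours. With unrounded inputs it gives values in the same region as those quoted above (about 68% at 1:10 and about 30% at 1:50); small differences would be expected from rounding or from a different method of calculation.

```python
def ppv_at_ratio(sensitivity: float, specificity: float,
                 controls_per_case: float) -> float:
    """PPV when there are `controls_per_case` non-diseased children
    for every diseased child (i.e. a 1:controls_per_case ratio)."""
    # Expected true positives per diseased child.
    true_pos = sensitivity * 1.0
    # Expected false positives among the matching non-diseased children.
    false_pos = (1.0 - specificity) * controls_per_case
    return true_pos / (true_pos + false_pos)


# Assumed inputs: values derived from the low WBC counts in the study.
sens = 11 / 52
spec = 203 / 205

# The study's own ratio (205 controls to 52 cases), then 1:10 and 1:50.
for ratio in (205 / 52, 10, 50):
    print(f"1:{ratio:.1f} -> PPV {ppv_at_ratio(sens, spec, ratio):.0%}")
```

At the study's own ratio the function returns 11/13, the figure the paper reports as sensitivity, which confirms that the reported value is tied to the sampling ratio.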
We are also concerned about the way in
which confidence intervals have been calculated. It appears that these are based
on the variation in the entire cohort rather than the variation within the relevant
subgroups. Using low WBC as an example, the reported sensitivity (which is
actually the PPV) of 85% is given with a confidence interval of 80% to 89%.
This interval appears to be based on calculations including the entire group of 257
children, rather than on the 13 children whose data have been included in the
calculation. This leads to a substantial underestimation of the variation, and
therefore an overestimation of the precision of the data.
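The effect of the denominator on the width of the interval can be sketched as follows. The paper does not state how its intervals were calculated, so the normal-approximation (Wald) formula used here to approximate the published interval is an assumption, as is our choice of the Wilson score interval for the 13 children actually in the calculation; other exact or score methods would also be reasonable.

```python
import math
from typing import Tuple


def wald_interval(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a proportion p observed in n."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes: int, n: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score 95% interval, better behaved for small n."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Assumption: the published interval used all 257 children as the denominator.
lo_all, hi_all = wald_interval(11 / 13, 257)
# Interval based on the 13 children with low WBC only.
lo_sub, hi_sub = wilson_interval(11, 13)

print(f"n = 257 (Wald):   {lo_all:.0%} to {hi_all:.0%}")
print(f"n = 13  (Wilson): {lo_sub:.0%} to {hi_sub:.0%}")
```

Under these assumptions the first interval is close to the published one, while the second is several times wider.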
The way in which the exposures and outcomes
have been assessed in this paper is also cause for concern. The authors use an
“established diagnosis” of either JRA or ALL as the reference standard against
which the accuracy of the other diagnostic markers is assessed; however, there
does not appear to have been a standard set of criteria for confirming these
diagnoses, and certainly no indication that both case and control children were
assessed using the same set of criteria. As a result, it is possible
that some of the children diagnosed as having JRA may not have had a diagnosis
of ALL ruled out. Perhaps even more worryingly, the assessment of diagnostic
markers was made on the basis of a retrospective assessment of medical records
by physicians who were not blind to the disease state of the children.
These physicians were therefore in a position to alter the exposure data in
light of the outcome, either consciously or unconsciously. Both of these
issues introduce a substantial opportunity for bias. It is difficult to interpret
the data reported on the accuracy of these clinical indicators, given the potential
weakness of the data collection and the comparison with a poorly
defined reference standard which may not have been consistently applied to all
of the children.
While we can understand that authors
without statistical expertise may not be confident about appropriate methods
for calculation of diagnostic accuracy parameters and confidence intervals,
these kinds of issues emphasise the importance of statistical review of papers.
Tari Turner
Senior Project Officer
Associate Professor Damien Jolley
Deputy Director
Monash Institute of Health Services Research
References
1. Jones, O.Y., et al., A multicenter case-control study on predictive factors distinguishing childhood leukemia from juvenile rheumatoid arthritis. Pediatrics, 2006. 117(5): p. e840-4.
Figure 1. Diagnostic Test Parameters
| Result on New Test or Indicator | Gold Standard Reference Test: Positive | Gold Standard Reference Test: Negative | Total |
|---|---|---|---|
| Positive | True positives (A) | False positives (C) | Total positive to new test (A+C) |
| Negative | False negatives (B) | True negatives (D) | Total negative to new test (B+D) |
| Total | Total who have disease (A+B) | Total who do not have disease (C+D) | |
Sensitivity = A/(A+B)
Specificity = D/(C+D)
PPV = A/(C+A)
NPV = D/(B+D)
Figure 2. Diagnostic Parameters for Low WBC
| Result on New Test or Indicator | Gold Standard Reference Test: Positive | Gold Standard Reference Test: Negative | Total |
|---|---|---|---|
| Positive | 11 | 2 | 13 |
| Negative | 41 | 203 | 244 |
| Total | 52 | 205 | |
Sensitivity = 11/52 = 21%
Specificity = 203/205 = 99%
PPV = 11/13 = 85%
NPV = 203/244 = 83%