eLetters is an online forum for ongoing peer review. To submit an eLetter please go to the article you wish to respond to and click on the link that reads "eLetters: Submit a Response." Submission of eLetters is open to all health care professionals and experts in related fields.

eLetters to:

ARTICLES:
Olcay Y. Jones, Charles H. Spencer, Suzanne L. Bowyer, Peter B. Dent, Beth S. Gottlieb, and C. Egla Rabinovich
A Multicenter Case-Control Study on Predictive Factors Distinguishing Childhood Leukemia From Juvenile Rheumatoid Arthritis
Pediatrics 2006; 117: e840-e844 [Abstract] [Full text] [PDF]

eLetters published:

[Read eLetters] Need for statistical review
Tari J Turner, Damien Jolley   (29 April 2007)

Need for statistical review 29 April 2007
Tari J Turner,
Senior Project Officer
Monash Institute of Health Services Research,
Damien Jolley


Dear Editor,

 

We read with interest the recent paper by Jones et al. (1) examining the accuracy of clinical factors in distinguishing acute lymphocytic leukaemia (ALL) from juvenile rheumatoid arthritis (JRA). We believe that throughout this paper, where the authors refer to “sensitivity” and “specificity”, the data they report are in fact positive predictive values (PPV) and negative predictive values (NPV), respectively. We are also concerned about the calculation of confidence intervals, as well as the methods of assessment of both exposure and outcome in this paper.

 

The sensitivity and specificity of a diagnostic tool refer to the ability of the tool to return a positive test in patients who truly have the disease of interest, and to return a negative test in patients who truly do not have the disease, respectively. Thus sensitivity is calculated as the number of true positives divided by the sum of the true positives and false negatives, and specificity is calculated as the number of true negatives divided by the sum of the true negatives and false positives (see figure 1).

 

In contrast, the positive predictive value is the proportion of patients with a positive test who truly have the disease, and the negative predictive value is the proportion of patients with a negative test who truly do not have the disease. These two values are calculated, respectively, as the number of true positives divided by the sum of the true positives and false positives, and the number of true negatives divided by the sum of the true negatives and false negatives (see figure 1).

 

Looking at the data presented in Table 1 of the paper, it can be seen that for the indicator ‘low white blood cell count (WBC)’ the total number of children with leukaemia is 52, and of these 11 have a positive test (have low WBC). Dividing 11 by 52 gives a sensitivity of 21%. The sensitivity figure quoted of 85% is instead the PPV, calculated by dividing 11 by 13 (the number of children with either ALL or JRA who have low WBC). Similarly, 203 of the 205 children with JRA have a negative test (do not have low WBC), and dividing 203 by 205 gives a specificity of 99%. The figure quoted of 83% is instead the NPV, calculated by dividing 203 by 244 (the number of children with either ALL or JRA who do not have low WBC). See figure 2 below. This same confusion of sensitivity with PPV and specificity with NPV has been made every time the terms are used throughout the paper.
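The arithmetic in this worked example can be checked with a few lines of code. What follows is a minimal Python sketch, not the method used by Jones et al.; the function name `diagnostic_parameters` and its argument names are illustrative only, chosen to match the cell labels A to D in figure 1.

```python
def diagnostic_parameters(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table.

    a = true positives, b = false negatives,
    c = false positives, d = true negatives (cell labels as in figure 1).
    """
    return {
        "sensitivity": a / (a + b),
        "specificity": d / (c + d),
        "ppv": a / (a + c),
        "npv": d / (b + d),
    }


# Counts for the 'low WBC' indicator, as quoted in the letter.
low_wbc = diagnostic_parameters(a=11, b=41, c=2, d=203)
for name, value in low_wbc.items():
    print(f"{name}: {value:.0%}")
```

Run on the low WBC counts, this gives 21% and 99% for sensitivity and specificity, and 85% and 83% for PPV and NPV, which are the values discussed above.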

 

The difference between these terms is important not only for semantic reasons but also because PPV and NPV vary markedly with the prevalence of the disease, whereas, theoretically at least, sensitivity and specificity do not. The study did not examine an unselected cohort representative of the spectrum of children who would normally present for assessment, but rather followed a case-control methodology, comparing 53 ‘cases’ (children with blast negative ALL) with 205 ‘controls’ (children with JRA). This approximately 1:4 ratio is unlikely to reflect the ratio of prevalence of ALL to JRA seen in clinical practice. For example, while the PPV of 85% for low WBC is impressive (meaning that 85% of children with a positive test will have ALL), if the ratio of diseases is more like 1:10 then the PPV rapidly decreases to 67%, and again to 33% for a 1:50 ratio.
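The dependence of PPV on the case mix can be illustrated with a short calculation. The Python sketch below is an assumption-laden illustration rather than a reconstruction of the figures quoted above: it holds sensitivity (11/52) and specificity (203/205) fixed at their unrounded values and varies only the number of JRA children per ALL child. The function name `ppv_at_ratio` and its parameters are hypothetical.

```python
def ppv_at_ratio(sensitivity, specificity, controls_per_case):
    """PPV when there are `controls_per_case` children without ALL
    for every child with ALL, with sensitivity and specificity held fixed."""
    true_positives = sensitivity  # expected per single case
    false_positives = (1 - specificity) * controls_per_case
    return true_positives / (true_positives + false_positives)


sens = 11 / 52    # sensitivity of low WBC, from the letter's counts
spec = 203 / 205  # specificity of low WBC, from the letter's counts

# Study ratio (205 controls to 52 cases), then 1:10 and 1:50.
for ratio in (205 / 52, 10, 50):
    print(f"1:{ratio:.1f} -> PPV {ppv_at_ratio(sens, spec, ratio):.0%}")
```

At the study's own ratio this reproduces the PPV of 85%. At 1:10 and 1:50 it gives roughly 68% and 30%, close to but not identical to the 67% and 33% quoted above; the exact values depend on rounding and on how the ratio is converted to a prevalence.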

 

We are also concerned about the way in which confidence intervals have been calculated. It appears that these are based on the variation in the entire cohort rather than the variation within the relevant subgroups. Using low WBC as an example, the reported sensitivity (which is actually PPV) of 85% is given with a confidence interval of 80% to 89%. This appears to be based on calculations including the entire group of 257 children, rather than on the 13 children whose data have been included in the calculation. This leads to a substantial underestimation of the variation, and therefore an overestimation of the precision of the data.
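The effect of the denominator on the interval width can be shown numerically. The sketch below assumes a simple normal-approximation (Wald) interval with z = 1.96; the paper does not state which method was used, so this is an illustration only, and the function name `wald_ci` is hypothetical. A Wald interval is a poor choice for a group as small as 13 (an exact or Wilson interval would be preferable), but it makes the contrast clear.

```python
import math


def wald_ci(p, n, z=1.96):
    """Approximate 95% Wald interval for a proportion p observed in n
    subjects, clipped to the range 0 to 1."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


ppv = 11 / 13

# Assumption: the paper's interval used the whole cohort of 257 children.
whole_cohort = wald_ci(ppv, 257)
# The PPV is actually based on the 13 children with low WBC.
test_positive = wald_ci(ppv, 13)

print(f"n = 257: {whole_cohort[0]:.0%} to {whole_cohort[1]:.0%}")
print(f"n = 13:  {test_positive[0]:.0%} to {test_positive[1]:.0%}")
```

Under these assumptions, a denominator of 257 gives an interval of about 80% to 89%, matching the reported figures, whereas a denominator of 13 gives an interval running from about 65% up to the 100% ceiling.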

 

The way in which the exposures and outcomes have been assessed in this paper is also cause for concern. The authors use an “established diagnosis” of either JRA or ALL as the reference standard against which the accuracy of the other diagnostic markers is assessed. However, there does not appear to have been a standard set of criteria for confirming these diagnoses, and certainly no indication that both case and control children have been assessed using the same set of criteria. As a result, it is possible that some of the children diagnosed as having JRA may not have had a diagnosis of ALL ruled out. Perhaps even more worryingly, the assessment of diagnostic markers was made on the basis of a retrospective assessment of medical records by physicians who were not blind to the disease state of the children. These physicians were therefore in a position to alter the exposure data in light of the outcome, either consciously or unconsciously. Both of these issues introduce a substantial opportunity for bias. It is difficult to interpret the data reported on the accuracy of these clinical indicators given the potential weakness of the data collection and the fact that they are being compared to a poorly defined reference standard which may not have been consistently applied to all of the children.

 

While we can understand that authors without statistical expertise may not be confident about appropriate methods for calculation of diagnostic accuracy parameters and confidence intervals, these kinds of issues emphasise the importance of statistical review of papers.

 

Tari Turner

Senior Project Officer

 

Associate Professor Damien Jolley

Deputy Director

 

Monash Institute of Health Services Research

 

References

1. Jones, O.Y., et al., A multicenter case-control study on predictive factors distinguishing childhood leukemia from juvenile rheumatoid arthritis. Pediatrics, 2006. 117(5): p. e840-4.

 

Figure 1. Diagnostic Test Parameters

 

 

                                   Result on Gold Standard Reference Test
Result on New Test or Indicator    Positive                        Negative
Positive                           True positives (A)              False positives (C)                    Total positive to new test (A+C)
Negative                           False negatives (B)             True negatives (D)                     Total negative to new test (B+D)
                                   Total who have disease (A+B)    Total who do not have disease (C+D)

 

 

Sensitivity = A/(A+B)

Specificity = D/(C+D)

PPV = A/(C+A)

NPV = D/(B+D)

 

Figure 2. Diagnostic Parameters for Low WBC

 

 

                                   Result on Gold Standard Reference Test
Result on New Test or Indicator    Positive    Negative
Positive                           11          2           13
Negative                           41          203         244
                                   52          205

 

 

Sensitivity = 11/52 = 21%

Specificity = 203/205 = 99%

PPV = 11/13 = 85%

NPV = 203/244 = 83%

 

Conflict of Interest:

None declared