To the editor:
We read with interest the article by Baenziger and colleagues entitled “Delayed Cord Clamping and Cerebral Oxygenation”, published in the March issue of Pediatrics.[1] While the conclusion of the study is tantalizing (“delayed cord clamping increases cerebral oxygenation for the first 24 hours after birth”), it also raises a number of questions, mostly related to basic statistics.
It is undeniable that [brain] tissue oxygen saturation (StO2, also known as “TOSc”) was higher in the experimental than in the control group, as demonstrated by p < 0.05 at both 4 hours and 24 hours (Mann-Whitney U test). Nevertheless, we question whether this “clearly demonstrates a [clinically significant] higher cerebral tissue oxygenation in the experimental group.” As pointed out by Kain and MacLaren in their enlightening article about p values in the same issue of Pediatrics, it is very important to determine “if the difference of primary end points between group is meaningful to a patient.”[2] The real question in this case, which relates to biological plausibility, is whether a StO2 difference of 4-6 percentage points (at 4 hours: 69.8% vs. 63.4%; at 24 hours: 71.4% vs. 67.1%) can be expected to promote better outcomes. The authors’ conclusion would obviously be considerably strengthened if they had uncovered differences in short- and/or long-term neurological outcomes between the two groups.
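The between-group differences quoted above can be checked directly against the group means listed in Table 1. A minimal sketch in Python (the dictionary layout and variable names are ours, for illustration only):

```python
# Mean StO2 values (%) reported for each group, taken from Table 1.
means = {
    "4 h": {"experiment": 69.81, "control": 63.37},
    "24 h": {"experiment": 71.36, "control": 67.07},
}

# Absolute difference between groups, in percentage points of StO2.
differences = {
    time: groups["experiment"] - groups["control"]
    for time, groups in means.items()
}

for time, diff in differences.items():
    print(f"{time}: {diff:.2f} percentage points")
```

The differences come out at roughly 6.4 points at 4 hours and 4.3 points at 24 hours, which is the "4-6%" range discussed above.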
We were also particularly intrigued by the fact that the standard deviations (SDs) for StO2 reported in the current article are much smaller than those published in previous reports by the same team, which used similar technology (Critikon 2020 Cerebral RedOx Monitor) and methodology for StO2 measurement (‘Critikon algorithm’).[3] While the mean gestational ages (GA) were similar (29.9 wks vs. 30.5 wks), the GA ranges were slightly different (24-32 wks vs. 25-36 wks). The weighted estimate of the SD in the current study is 1.39% (see Table 1); we will use 1.5% hereafter to simplify calculations.
Table 1

| Article | Group | Time point | StO2/TOSc (%) | SD (%) |
|---|---|---|---|---|
| 2007 (StO2) | Experiment (n = 15) | 4 hours | 69.81 | 1.53 |
| | | 24 hours | 71.36 | 1.34 |
| | Control (n = 24) | 4 hours | 63.37 | 1.46 |
| | | 24 hours | 67.07 | 1.26 |
| | Weighted estimate SD from both groups | | | 1.39 |
| 2000 (TOSc) | Experiment (n = 20) | | 64.7 | 7.2 |
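The letter does not state exactly how the SDs in Table 1 were weighted. One plausible reading, assumed here, is an average of the four SDs weighted by group size; a conventional pooled SD (variances weighted by n - 1) is shown alongside as an alternative. Both are sketches under that assumption, and both reproduce the 1.39% figure:

```python
import math

# Group sizes and SDs (%) from Table 1 of the 2007 study: (n, SD at 4 h, SD at 24 h)
groups = {
    "experiment": (15, 1.53, 1.34),
    "control": (24, 1.46, 1.26),
}

# Assumption 1: average of the SDs, each weighted by the size of its group.
weighted_sum = 0.0
total_weight = 0
# Assumption 2: pooled SD, i.e. variances weighted by their degrees of freedom.
pooled_num = 0.0
pooled_df = 0
for n, sd_4h, sd_24h in groups.values():
    weighted_sum += n * sd_4h + n * sd_24h
    total_weight += 2 * n
    pooled_num += (n - 1) * (sd_4h ** 2 + sd_24h ** 2)
    pooled_df += 2 * (n - 1)

weighted_sd = weighted_sum / total_weight
pooled_sd = math.sqrt(pooled_num / pooled_df)
print(f"n-weighted mean SD: {weighted_sd:.2f}%")
print(f"pooled SD: {pooled_sd:.2f}%")
```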
Specifically, while the SD in the current report is about 1.5% (n = 39), the SD was on the order of 7.2% in a study published in 2000 (n = 20).[3] Since the patient populations in both studies were comparable, it seems unlikely that a mere doubling of the sample size would result in such a dramatic reduction in SD. Although the slightly tighter GA range in the current study may have resulted in less inter-patient variability, it seems implausible that this alone explains the degree of change observed.
The simplest way to approach this interesting statistical conundrum is to use a formula designed to estimate the expected impact of increasing the sample size on the SD: the SD is multiplied by $1/\sqrt{x}$, where $x$ is the ratio of the sample size of the new study to that of the previous one. For our purpose, the approximate doubling of the sample size ($x = 39/20 \approx 2$) should reduce the SD by a factor of $1/\sqrt{2}$. Thus, if only the sample size had changed, and the patient population and methodology had remained unaltered, we would have expected the SD to fall from about 7% to about 5% (hereafter referred to as the ‘expected SD’). This corresponds to the change (in this case, a reduction) in inherent variability that one would predict from modifying the sample size alone.
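A minimal sketch that applies the scaling rule exactly as stated above; the function name `expected_sd` and its parameters are ours, for illustration only. It is run once with the exact sample sizes and once with the rounded inputs used in the text:

```python
import math


def expected_sd(previous_sd: float, previous_n: int, new_n: int) -> float:
    """Scale an SD by 1/sqrt(x), where x = new_n / previous_n (the rule as stated in the letter)."""
    x = new_n / previous_n
    return previous_sd / math.sqrt(x)


# Exact inputs: SD of 7.2% with x = 39/20 = 1.95.
print(f"{expected_sd(7.2, 20, 39):.2f}%")
# Rounded inputs used in the text: SD of 7% with x = 2.
print(f"{expected_sd(7.0, 20, 40):.2f}%")
```

The exact inputs give about 5.2% and the rounded ones about 4.9%, both consistent with an ‘expected SD’ of roughly 5%.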
An alternative way to analyze this situation is to calculate the actual reduction in variability observed between the two studies. Unfortunately, direct comparison of the SDs is inadequate for this purpose, since it does not take into account the relative sample sizes. However, comparing the extent of data spread around the means should closely reflect sampling variability. This is achieved by calculating the sums of squared deviations from the means, $\sum (x - \bar{x})^2$, using the equations for variance and SD as follows:
$$\mathrm{SD} = \sqrt{\mathrm{Variance}}$$

$$\mathrm{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where $\sum (x - \bar{x})^2$ is the sum of the squared differences between each measurement $x$ and the mean $\bar{x}$ (each difference is squared to ensure that there are no negative values), and $n$ is the sample size.
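The variance and SD equations above can be written out in a few lines. The readings below are hypothetical values chosen for illustration only, not data from either study; the check against Python's `statistics` module confirms that it uses the same n - 1 denominator:

```python
import math
import statistics


def sample_variance(values: list[float]) -> float:
    """Variance = SUM (x - xmean)^2 / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / (n - 1)


# Hypothetical StO2 readings (%), for illustration only; not study data.
readings = [60.0, 62.0, 65.0, 67.0, 70.0]

variance = sample_variance(readings)
sd = math.sqrt(variance)
print(f"variance = {variance:.2f}, SD = {sd:.2f}")

# The standard library's sample statistics use the same n - 1 denominator.
assert math.isclose(variance, statistics.variance(readings))
assert math.isclose(sd, statistics.stdev(readings))
```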
To estimate the sum of squared deviations from the mean in each study, we solve the variance equation for $\sum (x - \bar{x})^2$:

$$\sum (x - \bar{x})^2 = \mathrm{SD}^2 \times (n - 1)$$
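Applying this rearranged equation to the rounded SDs and sample sizes used in the letter gives the entries of Table 2. A minimal sketch (the dictionary layout and the function name `sum_sq_dev` are illustrative); note that, computed this way, the 2000 vs. 2007 ratio works out to about 10.9 (931/85.5):

```python
# Back-calculate SUM (x - xmean)^2 = SD^2 x (n - 1) for each row of Table 2.
# SDs (%) are the rounded values used in the letter.
studies = {
    "2000 study": {"n": 20, "sd": 7.0},
    "2007 study": {"n": 39, "sd": 1.5},
    "expected": {"n": 39, "sd": 5.0},
}


def sum_sq_dev(sd: float, n: int) -> float:
    """Sum of squared deviations recovered from an SD and a sample size."""
    return sd ** 2 * (n - 1)


ss = {name: sum_sq_dev(s["sd"], s["n"]) for name, s in studies.items()}

# Ratio > 1 means less spread around the mean than in the 2000 study.
ratio_2007 = ss["2000 study"] / ss["2007 study"]
ratio_expected = ss["2000 study"] / ss["expected"]

print(ss)
print(f"2000 vs. 2007: {ratio_2007:.2f}")
print(f"2000 vs. expected: {ratio_expected:.2f}")
```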
As shown in Table 2, the numbers provided by the authors imply an impressive reduction in sampling variability of roughly 11-fold (931/85.5 ≈ 10.9). By contrast, when the ‘expected SD’ calculated above is fed into the same set of equations, the sampling variability is essentially unchanged despite a 2-percentage-point reduction in SD, since the ratio is close to unity (0.98). This is precisely what one would expect if the same methodology were used: the built-in sources of error have not changed, although they are estimated more consistently owing to the larger number of samples.

Table 2

| Study | n - 1 | SD | SD² | SD² × (n - 1) | Ratio of sums of squared deviations (2000 / other) | Comparison |
|---|---|---|---|---|---|---|
| 2000 study | 19 | 7% | 49 | 931 | | |
| 2007 study | 38 | 1.5% | 2.25 | 85.5 | 10.9 | 2000 vs. 2007 |
| “Expected” | 38 | 5% | 25 | 950 | 0.98 | 2000 vs. expected |
Unless the authors made some unstated technical improvement in measuring StO2, it is unclear why such a remarkable reduction in sampling variability occurred. While we do not question the validity of the results presented, such a substantial reduction of the SD must translate into an increased likelihood of obtaining statistically significant p values.
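To illustrate how the size of the SD alone drives significance, the sketch below holds the 4-hour group difference and the group sizes fixed and varies only the SD. It swaps in a simple two-sample t statistic with a large-sample normal approximation for the p value as a stand-in; the authors used a Mann-Whitney U test, which this does not reproduce, and the function names are ours:

```python
import math


def t_statistic(mean_diff: float, sd: float, n1: int, n2: int) -> float:
    """Two-sample t statistic, assuming both groups share the same SD."""
    standard_error = math.sqrt(sd ** 2 / n1 + sd ** 2 / n2)
    return mean_diff / standard_error


def approx_two_sided_p(t: float) -> float:
    """Large-sample normal approximation to a two-sided p value."""
    return math.erfc(abs(t) / math.sqrt(2))


# Same 4-hour group difference (69.81 - 63.37) and group sizes (15 vs. 24),
# evaluated with the 2007 SD (about 1.5%) and with the 2000 SD (7.2%).
mean_diff = 69.81 - 63.37
for sd in (1.5, 7.2):
    t = t_statistic(mean_diff, sd, 15, 24)
    print(f"SD = {sd}%: t = {t:.2f}, approx. p = {approx_two_sided_p(t):.2g}")
```

With identical means, a 4.8-fold smaller SD (7.2/1.5) makes the t statistic 4.8 times larger and the p value far smaller.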
This reduction is particularly puzzling given that the authors’ own conclusion in the previous study (2000) was that there was “a highly significant unexplained inter-patient variability, which is the major drawback of [the method].”[3] A more recent review (2003) corroborates that the “highly significant inter-patient variability” undermines the “clinical value of TOS measurements.”[4]
We would appreciate it if the authors could comment on these issues.
Mathieu Lemaire, MDCM MSc
Pediatric Nephrology
Fellow
Hospital for Sick Children
Toronto, Canada
References:
[1] Baenziger O et al. (2007) Delayed Cord Clamping and Cerebral Oxygenation. Pediatrics 119(3):455-459.
[2] Kain ZN & MacLaren J (2007) P less than .05. Pediatrics 119(3):608-610.
[3] Wolf M et al. (2000) Tissue oxygen saturation measured by near infrared spectrophotometry correlates with arterial oxygen saturation during induced oxygenation changes in neonates. Physiol Meas 21:481-491.
[4] Nicklin SE et al. (2003) The light still shines, but not that brightly? The current status of perinatal near infrared spectroscopy. Arch Dis Child 88(4):263-268.