eLetters to:

ARTICLES:
Oskar Baenziger, Florian Stolkin, Mathias Keel, Kurt von Siebenthal, Jean-Claude Fauchere, Seema Das Kundu, Vera Dietz, Hans-Ulrich Bucher, and Martin Wolf
The Influence of the Timing of Cord Clamping on Postnatal Cerebral Oxygenation in Preterm Neonates: A Randomized, Controlled Trial
Pediatrics 2007; 119: 455-459 [Abstract] [Full text] [PDF]

eLetters published:

Of clinical significance and varying variability
Mathieu Lemaire   (3 October 2007)

Mathieu Lemaire,
Nephrology Fellow
Hospital for Sick Children

To the editor:

We read with interest the article by Baenziger and colleagues entitled “Delayed Cord Clamping and Cerebral Oxygenation”, published in the March edition of Pediatrics.[1] While the conclusion of the study is tantalizing (“delayed cord clamping increases cerebral oxygenation for the first 24 hours after birth”), the study also raises a number of questions, most of which relate to basic statistics.

 

It is undeniable that [brain] tissue oxygen saturation (StO2, also known as “TOSc”) was higher in the experimental group than in the control group, as shown by p < 0.05 at both 4 hrs and 24 hrs (Mann-Whitney U test). Nevertheless, we question whether this “clearly demonstrates a [clinically significant] higher cerebral tissue oxygenation in the experimental group.” As pointed out by Kain and MacLaren in the same edition of Pediatrics in their enlightening article about p values, it is very important to determine “if the difference of primary end points between group is meaningful to a patient.”[2] The real question in this case, which relates to biological plausibility, is whether a StO2 difference of 4 to 6 percentage points is enough to promote better outcomes (at 4 hrs: 69.8% vs. 63.4%; at 24 hrs: 71.4% vs. 67.1%). The authors’ conclusion would be considerably strengthened if they uncovered differences in short- and/or long-term neurological outcomes between the two groups.

 

On this note, we were particularly intrigued by the fact that the standard deviations (SD) for StO2 reported in the current article are much smaller than those published in previous reports by the same team, which used similar technology (Critikon 2020 Cerebral RedOx Monitor) and methodology for StO2 measurements (‘Critikon algorithm’).[3] While the mean gestational ages (GA) were similar (29.9 wks vs. 30.5 wks), the GA ranges were slightly different (24-32 wks vs. 25-36 wks). The weighted estimate SD in the current study is 1.39% (see Table 1); we will use 1.5% hereafter to simplify calculations.

 

Table 1

Article       Group                 Time       StO2/TOSc (%)   SD (%)
2007 (StO2)   Experiment (n = 15)   4 hours    69.81           1.53
                                    24 hours   71.36           1.34
              Control (n = 24)      4 hours    63.37           1.46
                                    24 hours   67.07           1.26
              Weighted estimate SD from both groups            1.39
2000 (TOSc)   Experiment (n = 20)              64.7            7.2
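The weighted estimate SD in Table 1 can be checked with a short calculation. The letter does not say how the weighting was done, so the sketch below assumes two common options: a sample-size-weighted mean of the four SDs, and a pooled SD based on variances weighted by n - 1. The function names are illustrative only. Both options round to the tabulated 1.39.

```python
from math import sqrt

# (n, SD) pairs for the 2007 study, copied from Table 1.
groups = [
    (15, 1.53),  # experiment, 4 hours
    (15, 1.34),  # experiment, 24 hours
    (24, 1.46),  # control, 4 hours
    (24, 1.26),  # control, 24 hours
]


def weighted_mean_sd(pairs):
    """Sample-size-weighted mean of the SDs (assumed weighting scheme)."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * sd for n, sd in pairs) / total_n


def pooled_sd(pairs):
    """Pooled SD: variances weighted by degrees of freedom (n - 1)."""
    dof = sum(n - 1 for n, _ in pairs)
    return sqrt(sum((n - 1) * sd ** 2 for n, sd in pairs) / dof)


print(round(weighted_mean_sd(groups), 2))  # 1.39
print(round(pooled_sd(groups), 2))         # 1.39
```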

 

For example, while the SD in the current report is about 1.5% (n=39), the SD was on the order of 7.2% in another study published in 2000 (n=20).[3] Since the patient populations in both studies were comparable, it seems unlikely that a mere doubling in sample size would result in such a dramatic reduction in SD. Although the slightly tighter GA range in the current study may have reduced inter-patient variability, this seems insufficient to explain the degree of change observed.

 

The simplest way to approach this interesting statistical conundrum is to use a formula for the expected impact of increasing the sample size on the SD: the SD is multiplied by 1/(sq. root x), where x is the ratio of the sample size of the new study to that of the previous one. For our purpose, this means that a near doubling of the sample size (39/20) should reduce the SD by a factor of 1/(sq. root 2). Thus, if only the sample size changed, and the patient population and methodology remained unaltered, we would have expected a reduction in the SD from 7% to about 5% (hereafter referred to as the ‘expected SD’). This corresponds to the change in inherent variability (in this case, a reduction) that one would predict from modification of the sample size alone.
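The ‘expected SD’ can be reproduced with a minimal sketch that applies the scaling rule exactly as stated in this letter. The function name is hypothetical. Note that 1/(sq. root n) scaling is more usually associated with the standard error of the mean than with the SD itself; the code takes the rule as written and does not validate it.

```python
from math import sqrt


def expected_sd(old_sd, old_n, new_n):
    """Apply the letter's rule: new SD = old SD * 1 / sqrt(new_n / old_n)."""
    ratio = new_n / old_n
    return old_sd / sqrt(ratio)


# 2000 study: SD of about 7% with n = 20; 2007 study: n = 39.
print(round(expected_sd(7.0, 20, 39), 2))  # 5.01

# With the ratio rounded to exactly 2, as in the text:
print(round(expected_sd(7.0, 20, 40), 2))  # 4.95
```

Either way the result rounds to the 5% used in the text.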

 

An alternative way to analyze this situation is to calculate the actual variability reduction observed between the two studies. Unfortunately, direct comparison of the SDs is an inadequate tool for this purpose since it does not take into account the relative sample sizes. However, comparing the extent of data spread around the means should provide a close reflection of sampling variability. This is achieved by calculating the sums of squared deviations from the means [SUM (x - xmean)^2] using the equations for variance and SD as follows:

 

SD = sq. root (Variance)

Variance = SUM (x - xmean)^2 / (n - 1)

 

Where SUM (x - xmean)^2 is the sum of the squared differences between each measurement and the mean (each difference is squared to ensure that there are no negative values), which is then divided by n - 1 (where ‘n’ is the sample size).

 

To get an estimate of the sum of squared deviations from the mean for each study, we solve the equation for SUM (x - xmean)^2:

 

SUM (x - xmean)^2 = SD^2 x (n - 1)

 

As demonstrated in Table 2, an impressive 9-fold reduction in sampling variability is obtained from the numbers provided by the authors. Using the ‘expected SD’ calculated above, and feeding it into the same set of equations, we note that despite a reduction in SD of 2 percentage points, the sampling variability is unchanged since the ratio is near unity (0.98). This is precisely what one would expect if the same methodology is used, since the built-in sources of error have not changed (but are more consistent owing to the larger number of samples).

 

Table 2

             n-1   SD     SD^2   SD^2 x (n-1)   SUM (x - xmean)^2 [2000] / SUM (x - xmean)^2 [new]
2000 study   19    7%     49     931
2007 study   38    1.5%   2.25   85.5           9.25 (2000 vs. 2007)
“Expected”   38    5%     25     950            0.98 (2000 vs. expected)
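The entries in Table 2 can be recomputed from the rounded SDs and sample sizes with the short sketch below (the helper name is illustrative). The sums of squares and the 0.98 ratio are reproduced exactly. For the 2000 vs. 2007 comparison, the rounded inputs give a ratio of about 10.9 rather than the tabulated 9.25. The source of that difference is not clear from the letter, but either figure indicates a reduction of roughly an order of magnitude.

```python
def sum_sq_dev(sd, n):
    """Sum of squared deviations from the mean: SD^2 x (n - 1)."""
    return sd ** 2 * (n - 1)


ss_2000 = sum_sq_dev(7.0, 20)      # 2000 study, SD rounded to 7%
ss_2007 = sum_sq_dev(1.5, 39)      # 2007 study, SD rounded to 1.5%
ss_expected = sum_sq_dev(5.0, 39)  # 'expected SD' of 5% at n = 39

print(ss_2000, ss_2007, ss_expected)    # 931.0 85.5 950.0
print(round(ss_2000 / ss_2007, 2))      # 10.89
print(round(ss_2000 / ss_expected, 2))  # 0.98
```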

 

Unless the authors have made some unstated technical improvement in measuring StO2, it is unclear why such a remarkable reduction in sampling variability occurred. While we do not question the validity of the results presented, it is undeniable that such a significant reduction of the SD must translate into an increased likelihood of obtaining statistically significant p values.
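To illustrate how the SD feeds into significance testing, the sketch below computes a Welch-type t statistic for the 4-hour comparison from the summary values in Table 1, once with the reported SDs and once assuming both groups had the 7.2% SD of the 2000 study. This is an illustration only: the authors used the Mann-Whitney U test, which cannot be rerun from summary statistics, and applying 7.2% to both groups is a hypothetical scenario.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for a difference in means with unequal variances."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# 4-hour StO2: experiment (n = 15) vs. control (n = 24), from Table 1.
t_reported = welch_t(69.81, 1.53, 15, 63.37, 1.46, 24)

# Same means and sample sizes, but with the SD of the 2000 study (assumption).
t_wide = welch_t(69.81, 7.2, 15, 63.37, 7.2, 24)

print(round(t_reported, 1))  # 13.0
print(round(t_wide, 1))      # 2.7
```

The same difference in means gives a much larger test statistic when the SD is small, which is the point made in the paragraph above.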

 

This issue is particularly puzzling given the fact that the authors’ own conclusion in the previous study (2000) was that there was “a highly significant unexplained inter-patient variability, which is the major drawback of [the method].”[3] A recent review from 2003 corroborates that the “highly significant inter-patient variability” undermines the “clinical value of TOS measurements.”[4]

 

We would appreciate it if the authors could comment on these issues.

 

Mathieu Lemaire, MDCM MSc

Pediatric Nephrology Fellow

Hospital for Sick Children

Toronto, Canada

 

References:

[1]  Baenziger O et al. (2007) Delayed Cord Clamping and Cerebral Oxygenation. Pediatrics 119(3):455-459.

[2]  Kain ZN & MacLaren J (2007) P less than .05.  Pediatrics 119(3):608-610.

[3]  Wolf M et al. (2000) Tissue oxygen saturation measured by near infrared spectrophotometry correlates with arterial oxygen saturation during induced oxygenation changes in neonates. Physiol Measur 21:481-491.

[4]  Nicklin SE et al. (2003) The light still shines, but not that brightly? The current status of perinatal near infrared spectroscopy. Arch Dis Child 88(4):263-268.

 

 

Conflict of Interest:

None declared