To the editor,
The article by Spandorfer et al. (1) about rehydration was reviewed
during a residents’ journal club. Following an interesting discussion, it
was felt that a few methodological issues may need to be addressed.
Attention to details of design is crucial for non-inferiority trials.
Two aspects of this study deserve further attention: the “role assignment”
of the different treatments and the sample size calculation in the context
of a non-inferiority trial.
The first issue relates to the role assignment chosen by the authors
for ORT and IVF in their trial. In the context of a non-inferiority trial,
the experimental procedure/treatment must be tested against the most
current gold standard therapy (known as the active-control) (2).
Traditionally, the inherent therapeutic value of the gold standard is
documented with placebo-controlled trials. In this case, it would be
reasonable to assume that both IVF and ORT are superior to placebo, but
which one is the gold standard and therefore deserves to be the active-control in the
trial? Recent evidence from a small randomized controlled trial mentioned
by Spandorfer and colleagues suggests that ORT is superior to IVF (3).
Furthermore, according to the authors, “ORT is recommended by the AAP and
the WHO as first-line therapy for mild to moderate dehydration.” (1) In
this context, it is striking that the role of active-control was
assigned to IVF. Unfortunately, no reference to
placebo-controlled studies is provided to support the choice of IVF as the
active-control. In theory, this not only effectively demotes
ORT within the hierarchy of therapeutic options, a demotion that does
not appear to be supported by a priori evidence, but may also
insidiously promote the primacy of physicians’ pragmatic preferences over
available evidence.
This leads to two important questions. First, does this constitute
evidence of the authors’ occult biases, crystallized for us in the
methodology of their study? Contrary to their hypothesis that “if ORT
was shown to be as effective as IVF, practitioners might be more likely to
adopt [ORT] in their practice”, we contend that it is just as likely that
a demonstration of non-inferiority would be interpreted quite
differently by physicians: why should I change my practice if it makes no
difference? A superiority study with head-to-head comparison of ORT versus
IVF (as in the study by Atherly-John et al. (3)) would have been
much more informative, and may have effectively dismantled the current
status quo. By choosing a non-inferiority trial design and opting for IVF
as the active control for no clear reason, one may argue that the fate of
ORT versus IVF was essentially sealed from inception.
Second, and more importantly, could the conclusions of the study have
been different had the role assignment been inverted? Indeed, from a purely
logical standpoint, since ORT is established as first-line therapy, it
seems rather obvious that it should at least be non-inferior to IVF;
otherwise IVF would be first-line therapy (provided that the available
evidence is solid, which it is not). Note that the opposite is not
necessarily true. Let us illustrate this concept with an example: if
treatment A is established as the gold standard for disease X, asking
whether A is non-inferior to a new, unproven, experimental treatment B may
be seen as a rather fruitless demonstration. Indeed, if treatment B has no
activity whatsoever against disease X, treatment A will inevitably be
found non-inferior to B – non-inferiority trials are not designed to
demonstrate superiority, nor equivalence for that matter (2). What
clinicians want to know is whether B is non-inferior to A, or not; the
inverted proposition is arguably of questionable usefulness from a
clinical standpoint.
The second issue pertains to sample size calculation, which is
particularly important to maximize the reliability of the results obtained
in non-inferiority trials (4). Typically, sample sizes in non-inferiority
trials are much larger than those of placebo-controlled trials, owing to
the stringency of the delta value used. In this case, an a priori expected
sample size of 50 per arm appeared, at first glance, rather small (the
actual total number of patients included in the final analysis is 73) (1).
Unfortunately, the methodological approach used to determine sample size
is neither described nor referenced by the authors. We therefore performed
a sample size calculation according to the guidelines provided by Jones et
al. for determining the sample size of a one-sided equivalence trial (i.e.,
a non-inferiority trial) (4). It is noteworthy that this article is widely
cited to support sample size calculations for such trials (Web of Science
reports more than 280 citations). The formula is as follows:
$$N = \frac{2p(100 - p)}{\delta^{2}} \left[ z_{1-\alpha} + z_{1-\beta} \right]^{2}$$
where N is the sample size, p the overall percentage of successes to
be expected if the treatments are equivalent, delta the margin of non-
inferiority, alpha the type I error probability (significance) and beta
the type II error probability (1-beta = power). Since a failure rate of
20% for ORT is mentioned in the article, we assume that the
expected rate of success is 80% for both arms. The margin of
non-inferiority was set at 5%. The one-sided alpha was 0.05, and the study was
powered at 80% (corresponding to z-scores of 1.65 and 0.84,
respectively). Assuming that all these values were correctly extracted
from the article (which is not certain, since they were not all specifically
and clearly identified), and substituting them into the equation
above, we obtain
$$N = \frac{2 \times 80 \times (100 - 80)}{5^{2}} \times \left[ 1.65 + 0.84 \right]^{2} \approx 793 \text{ patients per arm}$$
$$N_{\text{total}} = 2N = 1586 \text{ patients}$$
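The arithmetic above can be checked with a short script. This is only a minimal sketch, assuming the parameter values extracted above (p = 80, delta = 5, one-sided alpha = 0.05, power = 80%). The function name `sample_size_per_arm` is ours and purely illustrative. The z-scores are taken both as the rounded table values used above and as values computed from the standard normal distribution, so the two results differ by a couple of patients per arm.

```python
from statistics import NormalDist


def sample_size_per_arm(p, delta, z_alpha, z_beta):
    """Per-arm N from the formula quoted above.

    p and delta are percentages (e.g. 80 and 5); z_alpha and z_beta are
    the standard normal quantiles for 1 - alpha (one-sided) and 1 - beta.
    """
    return 2 * p * (100 - p) / delta**2 * (z_alpha + z_beta) ** 2


# Rounded z-scores, as used in the hand calculation above.
n_rounded = sample_size_per_arm(80, 5, 1.65, 0.84)

# Unrounded z-scores for one-sided alpha = 0.05 and power = 80%.
std_normal = NormalDist()
n_exact = sample_size_per_arm(
    80, 5, std_normal.inv_cdf(1 - 0.05), std_normal.inv_cdf(0.80)
)

print(f"per arm, rounded z-scores:   {n_rounded:.1f}")
print(f"per arm, unrounded z-scores: {n_exact:.1f}")
print(f"total, rounded z-scores:     {2 * n_rounded:.0f}")
```

Either way, the required total is on the order of 1600 patients under these assumptions.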
According to this calculation, an optimal sample size to draw
reliable conclusions from a non-inferiority trial using the pre-specified
parameters delineated by the authors would be 1586 patients. Considering
the short time course of this study (4 hours), it would be reasonable to
argue that adding 10-25% more patients to the calculated N, to account for
“dropouts”, would be unnecessary; the reasons underlying this decision
should nonetheless be stated. It is noteworthy that raising the margin of
non-inferiority to 10% would have reduced N by a factor of 4. On the other
hand, raising the power to 90% (z-score of 1.28) would have increased N by
nearly 40%.
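The sensitivity of N to the margin and to the power can be sketched in the same way. This is an illustrative check under the assumptions stated in this letter (p = 80, rounded z-scores of 1.65, 0.84 and 1.28), not a reconstruction of the authors' own method. The helper name `n_per_arm` is hypothetical.

```python
def n_per_arm(p, delta, z_alpha, z_beta):
    """Per-arm N from the formula quoted in this letter (p, delta in %)."""
    return 2 * p * (100 - p) / delta**2 * (z_alpha + z_beta) ** 2


baseline = n_per_arm(80, 5, 1.65, 0.84)       # margin 5%, power 80%
wider_margin = n_per_arm(80, 10, 1.65, 0.84)  # margin 10%, power 80%
higher_power = n_per_arm(80, 5, 1.65, 1.28)   # margin 5%, power 90%

# N scales with 1 / delta^2, so doubling the margin divides N by 4.
reduction_factor = baseline / wider_margin

# Relative increase in N when moving from 80% to 90% power.
power_increase = higher_power / baseline - 1

print(f"margin 10%: N reduced by a factor of {reduction_factor:.1f}")
print(f"power 90%:  N increased by {power_increase:.0%}")
```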
Assuming that the above calculations are sound, it appears as though
the analysis presented was carried out on a sample consisting of a
mere 5% of the minimal number of patients that should have been expected.
Jones et al. warned that “the finding of equivalence [or non-inferiority]
may arise either from true equivalence [or non-inferiority] or from a
trial with poor discriminatory power – a trial which was too small.” (4)
If this study is truly underpowered to reject the null hypothesis, as
demonstrated above, must we conclude that the validity of the conclusions
drawn is questionable? In the event that another validated method for
sample size calculation was used, the equation(s) used as well as the
source reference would be greatly appreciated.
Despite the pragmatic attractiveness of the results presented by
Spandorfer et al. for any pediatric emergency department, it seems
reasonable to expect that the issues raised herein should be addressed by
the authors. It would be particularly important for the authors to clarify
further why they decided to perform a non-inferiority trial, and not a
superiority trial.
Competing interests: none declared.
1. Spandorfer PR et al. (2005) Pediatrics 115:295-301.
2. Gomberg-Maitland M et al. (2003) Am Heart J 146:398-403.
3. Atherly-John YC et al. (2002) Arch Pediatr Adolesc Med 156:1240-1243.
4. Jones B et al. (1996) BMJ 313:36-39.