## Abstract

**Objective.** To create a recommendation for pediatricians and other primary care providers about their role as screeners for detecting developmental dysplasia of the hip (DDH) in children.

**Patients.** Theoretical cohorts of newborns.

**Method.** Model-based approach using decision analysis as the foundation. Components of the approach include the following:

*Perspective:* Primary care provider.

*Outcomes:* DDH, avascular necrosis of the hip (AVN).

*Options:* Newborn screening by pediatric examination; orthopaedic examination; ultrasonographic examination; orthopaedic or ultrasonographic examination by risk factors. Intercurrent health supervision-based screening.

*Preferences:* 0 for bad outcomes, 1 for best outcomes.

*Model:* Influence diagram assessed by the Subcommittee and by the methodology team, with critical feedback from the Subcommittee.

*Evidence Sources:* Medline and EMBASE search of the research literature through June 1996. Hand search of sentinel journals from June 1996 through March 1997. Ancestor search of accepted articles.

*Evidence Quality:* Assessed on a custom subjective scale, based primarily on the fit of the evidence to the decision model.

**Results.** After discussion, explicit modeling, and critique, an influence diagram of 31 nodes was created. The computer-based and the hand literature searches found 534 articles, 101 of which were reviewed by 2 or more readers. Ancestor searches of these yielded a further 17 articles for evidence abstraction. Articles came from around the globe, although primarily Europe, British Isles, Scandinavia, and their descendants. There were 5 controlled trials, each with a sample size less than 40. The remainder were case series. Evidence was available for 17 of the desired 30 probabilities. Evidence quality ranged primarily between one third and two thirds of the maximum attainable score (median: 10–21; interquartile range: 8–14).

Based on the raw evidence and Bayesian hierarchical meta-analyses, our estimate for the incidence of DDH revealed by physical examination performed by pediatricians is 8.6 per 1000; for orthopaedic screening, 11.5; for ultrasonography, 25. The odds ratio for DDH, given breech delivery, is 5.5; for female sex, 4.1; for positive family history, 1.7, although this last factor is not statistically significant. Postneonatal cases of DDH were divided into mid-term (younger than 6 months of age) and late-term (older than 6 months of age). Our estimate of the mid-term rate for screening by pediatricians is 0.34/1000 children screened; for orthopaedists, 0.1; and for ultrasonography, 0.28. Our estimates of the late-term DDH rate are 0.21/1000 newborns screened by pediatricians; 0.08, by orthopaedists; and 0.2 for ultrasonography. The rate of AVN for children referred before 6 months of age is estimated at 2.5/1000 infants referred. For those referred after 6 months of age, our estimate is 109/1000 referred infants.

The decision model (reduced, based on available evidence) suggests that orthopaedic screening is optimal, but because orthopaedists in the published studies and in practice would differ, the supply of orthopaedists is relatively limited, and the difference between orthopaedists and pediatricians is statistically insignificant, we conclude that pediatric screening is to be recommended. The place of ultrasonography in the screening process remains to be defined because there are too few data about postneonatal diagnosis by ultrasonographic screening to permit definitive recommendations. These data could be used by others to refine the conclusions based on costs, parental preferences, or physician style. Areas for research are well defined by our model-based approach.

- DDH = developmental dysplasia of the hip
- PE = physical examination
- AVN = avascular necrosis of the hip
- SD = standard deviation

## I. GUIDELINE METHODS

### A. Decision Model

The steps required to build the model were taken with the Subcommittee as a whole, with individuals in the group, and with members of the methodology team. Agreement on the model was sought from the Subcommittee as a whole during face-to-face meetings.

1. Perspective

Although there are a number of perspectives to take in this problem (parental, child's, societal, and payer's), we opted for the view of the practicing clinician: What are the clinician's obligations, and what is the best strategy for the clinician? This choice of perspective meant that the focus would be on screening for developmental dysplasia of the hip (DDH) and obviated the need to review the evidence for efficacy or effectiveness of specific strategies.

2. Context

The target child is a full-term newborn with no obvious orthopaedic abnormalities. Children with such findings would be referred to an orthopaedist, obviating the need for a practice parameter.

3. Options

We focused on the following options: screening by physical examination (PE) at birth by a pediatrician, orthopaedist, or other care provider; ultrasonographic screening at birth; and episodic screening during health supervision. Treatment options are not included.

We also included in our model a wide range of options for managing the screening process during the first year of life when the newborn screening was negative.

4. Outcomes

Our focus is on dislocated hips at 1 year of age as the major morbidity of the disease and on avascular necrosis of the hip (AVN), as the primary sentinel complication of DDH therapy.

Ideally, we would have a “gold standard” that would define DDH at any point in time, much as cardiac output can be obtained from a pulmonary-artery catheter. However, no gold standard exists. Therefore, we defined our outcomes in terms of the process of care: a pediatrician and an ultrasonographer perform initial or confirmatory examinations and refer the patient, whereas the orthopaedist treats the patient. It is the treatment that has the greatest effect on postneonatal DDH or on complications, so we focus on that intermediate outcome, rather than the orthopaedist's stated diagnosis.

We operationalized the definitions of these outcomes for use in abstracting the data from articles. Table 1 presents our definitions. A statement that a “click” was found on PE was considered to refer to an intermediate result, unless the authors defined their “click” in terms of our definition of a positive examination. *Dynamic* ultrasonographic examinations include those of Harcke et al,^{1} and *static* refers primarily to that of Graf.^{2} The radiologic focus switches from ultrasonography to plain radiographs after 4 months of age, in keeping with the development of the femoral head.

5. Decision Structure

We used an influence diagram^{3–5} to represent the decision model. In this representation, nodes refer to actions to be taken (rectangles) or to states of the world (the patient) about which we are uncertain (ovals). We devoted substantial effort to the construction of a model that balanced the need to represent the rich array of possible screening pathways with the need to be parsimonious. We constructed the master influence diagram (Fig 1) and determined its construct validity through consensus by the Subcommittee before data abstraction. However, the available evidence could specify only a portion of the diagram. The missing components suggest research questions that need to be posed. Figure 2 depicts the master influence diagram. Table 2 gives the node definitions.

6. Probabilities

The purpose of the literature review was to provide the probabilities required by the decision model. The initial list of required probabilities is given in Table 2. The initial number of individual probabilities was 55. (Sensitivity and specificity for a single truth-indicator pair are counted as a single probability because they are garnered from the same table.)

Although this is a large number of parameters, the structure of the model helped the team of readers. As 1 reader said, referring to the influence diagram, “Because we did the picture together, it was easy to find the parameters.”

What follows are some operational rules for matching the data to our parameters. The list is not complete.

If an orthopaedic clinic worked at case finding, we used our judgment to determine whether to accept such reports as representing a population incidence (eg, target article 1).*

Risk factors were included generally only if a true control group was used for comparison (eg, not in target article 1).

For postneonatal diagnoses, no study we reviewed included the examination of all children without DDH at, say, 1 year of age, so there is always the possibility of missed cases (false-negative diagnoses) in the screen, which leads to a falsely elevated estimate of the denominator (eg, target article 2). For studies originating in referral clinics, the data on the reasons for referrals were not usable for our purposes (eg, target article 3).

7. Preferences

Ideally, we would have cost data for the options, as well as patient data on the human burden of therapy and of DDH itself. We have deferred these assessments to later research. Therefore, we assigned a preference score of 0 to DDH at 1 year of age and 1 to its absence; for AVN, we assigned 0 for presence at 1 year of age and 1 for absence at 1 year of age.

### B. Literature Review

For the literature through May 1995, the following sources were searched: Books in Print, CATLINE, Current Contents, EMBASE, Federal Research in Progress, Health Care Standards, Health Devices Alerts, Health Planning and Administration, Health Services/Technology Assessment, International Health Technology Assessment, and Medline. Medline and EMBASE were searched through June 1996. The search terms used in all databases included the following: hip dislocation, congenital; hip dysplasia; congenital hip dislocation; developmental dysplasia; ultrasonography/adverse effects; and osteonecrosis. Hand searches of leading orthopaedic journals were performed for the issues from June 1996 to March 1997. The bibliographies of articles accepted for use in formulating the practice parameter also were perused.

The titles and the abstracts were then reviewed by 2 members of the methodology team to determine whether to accept or reject the articles for use. Decisions were reviewed by the Subcommittee, and conflicts were adjudicated. Similarly, articles were read by pairs of reviewers; conflicts were resolved in discussion.

The focus of the data abstraction process was on data that would provide evidence for the probabilities required by the decision model.

As part of the literature abstraction process, the evidence quality in each article was assessed. The scoring process (Table 3) was based on our decision model and involved traditional epidemiologic concerns, like outcome definition and bias of ascertainment, as well as influence diagram-based concerns, such as how well the data fit into the model.

*Cohort definition*: Does the cohort represented by the denominator in the study match a node in our influence diagram? Does the cohort represented by the numerator match a node in our influence diagram? The closer the match, the more confident we are that the reported data provide good evidence of the conditional probability implied by the arrow between the corresponding nodes in the influence diagram.

*Path*: Does the implied path from denominator to numerator lead through 1 or more nodes of the influence diagram? The longer the path, the more likely that uncontrolled biases entered into the study, making us less confident about accepting the raw data as a conditional probability in our model.

*Assignment and comparison*: Was there a control group? How was assignment made to experimental or control arms? A randomized, controlled study provides the best quality evidence.

*Follow-up*: Were patients with positive and negative initial findings followed up? The best studies should have data on both.

*Outcome definition*: Did the language of the outcome definitions (PE, orthopaedic examination, ultrasonography, and radiography) match ours, and, in particular, were PE findings divided into 3 categories or 2? The closer the definition to ours, the more we could pool the data. Studies with only 2 categories do not help to distinguish clicks from “clunks.”

*Ascertainment*: When the denominator represented more than 1 node, to what degree was the denominator a mix of nodes? The smaller the contamination, the more confident we were that the raw data represented a desired conditional probability.

*Results*: Did the results fill an entire table or were data missing? This is related to the follow-up category but is more general.

### C. Synthesis of Evidence

There are 3 levels of evidence synthesis.

- Listing evidence for individual probabilities
- Summarizing evidence across probabilities
- Integrating the pooled evidence for individual probabilities into the decision model

A list of evidence for an individual probability (or arc) is called an *evidence table* and provides the reader a look at the individual pieces of data.

The probabilities are summarized in 3 ways: by averaging, by averaging weighted by sample size (pooled), and by meta-analysis. We chose Bayesian meta-analytic techniques,^{6} which allow the representation of *prior belief* in the evidence and provide an explicit portrayal of the uncertainty of our conclusions. The framework we used was that of a hierarchical Bayesian model,^{7} similar to the random effects model in traditional meta-analysis.^{8} In this hierarchical model (Fig 3), each study has its own parameter, which, in turn, is sampled from a wider population parameter. Because there are 2 stages (ie, population to sample and sample to observation), and, therefore, the population parameter of interest is more distant from the data, the computed estimates in the population parameters are, in general, less certain (wider confidence interval) than simply pooling the data across studies. This lower certainty is appropriate in the DDH content area because the studies vary so widely in their raw estimates because of the range in time and geography over which they were performed.

In the Bayesian model, the observations were assumed to be Poisson distributed, given the study DDH rates. Those rates, in turn, were assumed to be Gamma distributed, given the population rate. The prior belief on that rate was set as Gamma (α, β), with mean α/β, and variance α/β^{2} (as defined in the BUGS software^{9}). In this parameterization, α has the semantics closest to that of location, and β has the semantics of certainty: the higher its value, the narrower the distribution and the more certain we are of the estimate. The parameter, α, was modeled as Exponential (1), and β, as Gamma (0.01, 1), with a mean of 0.01. Together, these correspond to a prior belief in the rate of a mean of 100 per 1000, and a standard deviation (SD) of 100, representing ignorance of the true rate.

As an example of interpretation, for pediatric newborn screening, the posterior α was 1.46, and the posterior β was 0.17, to give a posterior rate of 8.6/1000, with a variance of 50, or an SD of 7.1. Note that the value of β rose from 0.01 to 0.17, indicating a higher level of certainty (Fig 4; Table 4).
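The arithmetic behind these figures can be checked directly from the Gamma(α, β) parameterization given above (mean α/β, variance α/β²). The following is a minimal sketch: the helper name `gamma_moments` is ours, and the prior values α = 1 and β = 0.01 are the means of the hyperpriors plugged in as point values, which is a simplification of the full hierarchical model rather than a reproduction of it.

```python
import math

def gamma_moments(alpha, beta):
    """Return (mean, variance, sd) of a Gamma(alpha, beta) with rate parameter beta."""
    mean = alpha / beta
    variance = alpha / beta ** 2
    return mean, variance, math.sqrt(variance)

# Prior: hyperprior means used as point values (an assumption for illustration).
prior = gamma_moments(1.0, 0.01)
# Posterior values reported in the text for pediatric newborn screening.
posterior = gamma_moments(1.46, 0.17)

print("prior mean, variance, SD (per 1000):", prior)
print("posterior mean, variance, SD (per 1000):", posterior)
```

Under these assumptions the prior comes out at a mean of 100 and an SD of 100 per 1000, and the posterior at roughly 8.6, 50, and 7.1, in line with the figures quoted in the text.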

The Bayesian confidence interval is the narrowest interval that contains 95% of the area under the posterior-belief curve.^{10} The confidence interval for the prior curve is 2.53 to 370. The confidence interval for the posterior curve is 0.25 to 27.5, a significant shrinking and increase in certainty but still broad.
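As a rough plausibility check on the prior interval, one can collapse the prior to Gamma(1, 0.01), which is an exponential distribution with mean 100 per 1000, and compute its quantiles in closed form. This is only a sketch under that simplifying assumption: the function name is ours, the interval computed is an equal-tailed one (a different construction from the narrowest interval defined above), and the check does not establish how the intervals in the text were actually obtained.

```python
import math

def exponential_quantile(p, rate):
    """Quantile function of an exponential distribution, i.e. Gamma(1, rate)."""
    return -math.log(1.0 - p) / rate

# Assumption: prior collapsed to Gamma(1, 0.01), an exponential with mean 100.
rate = 0.01
lower = exponential_quantile(0.025, rate)
upper = exponential_quantile(0.975, rate)

print(f"equal-tailed 95% prior interval: {lower:.2f} to {upper:.1f} per 1000")
```

Under this simplification the interval runs from about 2.5 to about 369 per 1000, close to the prior interval quoted above.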

The model for the odds ratios is more complicated and is based on the Oxford data set and analysis in the BUGS manual.^{9}

### D. Thresholds

In the course of discussions about results, the Subcommittee was surveyed about the acceptable risks of DDH for different levels of interventions.

### E. Recommendations

Once the evidence and thresholds were obtained, a decision tree was created from the evidence available and was reviewed by the Subcommittee. In parallel, a consensus guideline (flowchart) was created. The Subcommittee evaluated whether evidence was available for links within the guidelines, as well as their strength of consensus. The decision tree was evaluated to check consistency of the evidence with the conclusions.

### F. “Cost”-Effectiveness Ratios

To integrate the results, we defined cost-effectiveness ratios, in which *cost* was excess neonatal referrals or excess cases of AVN, and *effectiveness* was a decrease in the number of later cases. The decision tree from section E (“Recommendations”) was used to calculate the expected outcomes for each of the pediatric, orthopaedic, and ultrasonographic strategies. The pediatric strategy was used as the baseline, because its neonatal screening rate was the lowest. The cost-effectiveness ratios then were calculated as the quotient of the difference in cost and the difference in effect.
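The ratio just described can be written out as a short sketch. The function name and every input number below are hypothetical placeholders chosen for illustration; they are not the estimates used in this report.

```python
def cost_effectiveness_ratio(cost, effect, base_cost, base_effect):
    """Incremental ratio: extra cost per unit of extra effect versus a baseline."""
    delta_cost = cost - base_cost
    delta_effect = effect - base_effect
    if delta_effect == 0:
        raise ValueError("strategies do not differ in effect; ratio is undefined")
    return delta_cost / delta_effect

# Hypothetical values per 1000 newborns (placeholders, not report estimates).
baseline = {"referrals": 10.0, "later_cases": 0.5}
alternative = {"referrals": 15.0, "later_cases": 0.25}

# Effect is expressed as the negative of the later-case rate,
# so a larger value means fewer later cases.
ratio = cost_effectiveness_ratio(
    cost=alternative["referrals"],
    effect=-alternative["later_cases"],
    base_cost=baseline["referrals"],
    base_effect=-baseline["later_cases"],
)
print(f"extra referrals per later case avoided: {ratio:.1f}")
```

With these placeholder inputs, 5 extra referrals buy a reduction of 0.25 later cases, giving 20 extra referrals per later case avoided.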

## RESULTS

### A. Articles

Figure 5 shows the article-winnowing process. The distribution over publication years is shown in Fig 6. The peak number of articles is for 1992, with 10 articles. The articles are from sites all over the world, although the Nordic, Anglo-Saxon, and European communities and their descendants are the most represented (Fig 7).

### B. Evidence

By traditional epidemiologic standards, the quality of evidence in this set of articles is uniformly low. There are few controlled trials and few studies in which infants with negative results on their newborn examinations are followed up. (A number of studies attempted to cover all possible places where an affected child might have been ascertained.)

We found data on all chance nodes, for a total of 298 distinct tables. *Decision* nodes were poorly represented: beyond the neonatal strategy, there were almost no data clarifying the paths for the diagnosis of children after the newborn period. Thus, although communities like those in southeast Norway have a postnewborn screening program, it is unclear what the program was and how many examination results were normal before a child was referred to an orthopaedist (eg, target articles 4 and 5).

The distribution of evidence qualities is shown in Fig 8. The mode is a score of 10, achieved in 16 articles. The median is 9.9, with an interquartile range of 8 to 14, suggesting that articles with scores below 8 are poor sources of evidence. Note that the maximum achievable quality score is 21, so half the articles do not achieve half the maximum quality score.

Graphing evidence quality against publication year suggests an improvement in quality over time, as shown in Fig 9, but the linear fit through the data is statistically indistinguishable from a flat line. (A nonparametric procedure yields the same conclusion.)

The studies include 5 in which a comparative arm was designed into the study. The remainder are divided between prospective and retrospective studies. Surprisingly, the evidence quality is not higher in the former than in the latter (data not shown).

Of the 298 data tables, half the data tables relate to the following:

- probabilities of DDH in different screening strategies
- relative risk of DDH, given risk factors
- the incidence of postneonatal DDH, and
- the incidence of AVN.

The remainder of our discussion will focus on these probabilities.

### C. Evidence Tables

The evidence table details are found in the “Appendix.”

1. Newborn Screening

### a. Pediatric Screening

There were 51 studies, providing 57 arms, for pediatric screening. However, of these, 17 were unclear on how the intermediate examinations were handled, and, unsurprisingly, their observed rates of positivity (clicks) were much higher than the studies that distinguished 3 categories, as we had specified. Therefore, we included only the 34 studies (target articles 3, 6–37) that used 3 categories.

For pediatric screening, the rate is about 8 positive cases per 1000 examinations. Fig 10 shows the distribution of the observed rates. The rates are distributed almost uniformly between 0 and 20 per 1000.

Figure 11 shows the distribution of the sample sizes for these studies; 3 outlier studies were excluded to avoid compression of the histogram. All studies represent a large experience: a total of 2 149 972 subjects. Although their methods may not have been the best, the studies demand attention simply because of their size.

In looking for covariates or confounding variables, we studied the relationship between positivity rate and the independent variables, year of publication (Fig 12), evidence quality, and sample size. Year and evidence quality show a positive effect: the higher the year (slope: 0.2; *P* = .018) or evidence quality (slope: 0.6; *P* = .046), the higher the observed rate. A model with both factors has evidence that suggests that most of the effect is in the factor, year (slope for year: 0.08; *P* = .038; slope for quality of evidence: 0.49; *P* = .09). Note that a regression using evidence quality is improper, because our evidence scale is not properly ratio (eg, the distance between 6 and 7 is not necessarily equivalent to the distance between 14 and 15), but the regression is a useful exploratory device.

### b. Orthopaedic Screening

Evidence was found in 25 studies (target articles 17, 23, 38–60). Three studies (target articles 43, 44, 54) provided 2 arms each.

As shown in Table 4, the positivity rate for orthopaedic screening is between 7 and 11/1000. One outlier study (target article 41), with an observed rate of more than 300/1000, skews the unweighted and meta-analytic averages. The estimate (between 7.1 and 11) is just below that of pediatric screening and is statistically indistinguishable. Note, however, that a fair number of studies have rates near 22/1000 or higher (Fig 13).

Unlike with pediatric screening, there are no correlations with other factors.

### c. Ultrasonographic Screening

Evidence was found in 17 studies (target articles 11, 22, 25, 31, 41, 54, 61–71), each providing a single arm.

The rate for ultrasonographic screening is 20/1000 or more. Although the estimates are sensitive to pooling and to the outlier, the positivity rate is clearly higher than in either PE strategy. There are no correlating factors. In particular, studies that use the Graf method^{2} or those that use the method of Harcke et al^{1} show comparable rates.

2. Postneonatal Cases

We initially were interested in all postneonatal diagnoses of DDH. However, the literature did not provide data within the narrow time frames initially specified for our model. Based on the data that were available, we considered 3 classes of postneonatal DDH: DDH diagnosed after 12 months of age (“late-term”), DDH diagnosed between 6 and 12 months of age (“mid-term”), and DDH diagnosed before 6 months of age. There were few data for the last group, which often was combined with the newborn screening programs. Therefore, we collected data on only the first 2 groups. The results are summarized in Table 5 and Table 6.

### a. After Pediatric Screening

Evidence was found in 24 studies (target articles 1, 4, 7, 9, 12, 14, 15, 23, 25, 27, 30, 38, 40, 44, 72–81). The study by Dunn and O'Riordan (target article 14) provided 2 arms. It is difficult to discern an estimated rate for mid-term DDH, because the study by Czeizel et al (target article 40) is such an outlier, with a rate of 3.73/1000, and because the weighted and unweighted averages also differ greatly. The meta-analytic estimate of 0.55/1000 seems to be an upper limit.

The late-term rate is easier to estimate at ∼0.3/1000. Although it is intuitive that the late-term rate should be lower than the mid-term rate, our data do not allow us to draw that conclusion.

### b. After Orthopaedic Screening

There were only 4 studies (target articles 2, 43, 47, 55). The rates were comparable for mid- and late-term: 0.1/1000 newborns. A meta-analytic estimate was not calculated.

### c. After Ultrasonographic Screening

Only 1 study, by Rosendahl et al (target article 25), is available; it reported rates for infants with and without initial risk factors (eg, family history and breech presentation). The mid-term rate was 0.28/1000 newborns in the non-risk group, and the late-term rate was 0/1000 in the same group.

3. AVN After Treatment

For these estimates, we grouped together all treatments, because from the viewpoint of the referring primary care provider, orthopaedic treatment is a “black box.” A literature synthesis that teased apart the success and complications of particular *therapeutic* strategies is beyond the scope of the present study.

The complication rate should depend only on the age of the patient at time of orthopaedic referral and on the type of treatment received. We report on the complication rates for children treated before and after 12 months of age.

### a. After Early Referral

There were 17 studies providing evidence (target articles 2, 13, 35, 37, 42, 43, 51, 54, 58, 60, 77, 82–87). Infants were referred to orthopaedists during the newborn period in each study except 2. In the study by Pool et al (target article 84), infants were referred during the newborn period and before 2 months of age; in the study by Sochart and Paton (target article 87), infants were referred between 2 weeks and 2 months of age.

The range of AVN rates per 1000 infants referred was huge, from 0 to 123. The largest rate occurred in the study by Pool et al (target article 84), a sample-based study that included later referrals. Its evidence quality was 8, within the 7 to 13 interquartile range of the other studies in this group. As in earlier tables, the meta-analytic estimate lies between the average and weighted (pooled) average of the studies.

### b. After Later Referral

Evidence was obtained from 6 studies (target articles 19, 83 [includes 2 samples], 85, 88–90). Some of the studies included children referred during the newborn period (target article 19) or during the 2-week to 2-month period (target articles 85, 89), but even in these, the majority of infants were referred later during the first year of life.

There were no outlier rates, although the highest rate (216/1000 referred children) occurred in the study with the oldest referred children in the sample (target article 83, in which the referred children were older than 12 months of age). One study (target article 19) contributed 5700 patients to the analysis, more than half of the 9270 total, so its AVN rate of 27/1000 pulled the unweighted rate of 116/1000 down to a pooled rate of 54/1000. Results are summarized in Table 7. A meta-analytic estimate was not computed.
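The pull of one large study on a pooled rate can be made concrete with the rounded figures quoted above. The sketch below is illustrative arithmetic only: it backs out the pooled rate implied for the remaining studies, a derived quantity that is not reported in the source, and the function name is ours.

```python
def pooled_rate(groups):
    """Sample-size-weighted rate per 1000 from (n_patients, rate_per_1000) pairs."""
    total_n = sum(n for n, _ in groups)
    total_cases = sum(n * rate / 1000.0 for n, rate in groups)
    return 1000.0 * total_cases / total_n

total_n, pooled = 9270, 54.0   # total patients and pooled rate quoted in the text
big_n, big_rate = 5700, 27.0   # target article 19

# Back out the pooled rate of the remaining studies (derived, not reported).
rest_n = total_n - big_n
rest_rate = (pooled * total_n - big_rate * big_n) / rest_n

recombined = pooled_rate([(big_n, big_rate), (rest_n, rest_rate)])
print(f"remaining studies: n = {rest_n}, implied rate = {rest_rate:.1f}/1000")
print(f"recombined pooled rate = {recombined:.1f}/1000")
```

On these rounded inputs the remaining 3570 patients carry an implied rate of about 97/1000, which shows how strongly the single large study with 27/1000 dominates the weighted average.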

4. Risk Factors

A number of factors are known to predispose infants to DDH. We sought evidence for 3 of these: sex, obstetrical position at birth, and family history. Studies were included in these analyses only if a control group could be ascertained from the available study data.

The key measure is the odds ratio, an estimate of the relative risk. The meaning of the odds ratio is that if the DDH rate for the control group is known, then the DDH rate for the at-risk group is the product of the control-group DDH rate and the odds ratio for the risk factor. An odds ratio statistically significantly greater than 1 indicates that the factor is a risk factor.
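The multiplication described above treats the odds ratio as a relative risk, an approximation that holds well when the outcome is rare. The sketch below compares that shortcut with the exact conversion through odds. The odds ratios are the estimates given in the abstract; the control-group rate of 5 per 1000 is a hypothetical value chosen only for illustration, and the function names are ours.

```python
def at_risk_rate_approx(control_rate, odds_ratio):
    """Shortcut from the text: at-risk rate = control rate x odds ratio."""
    return control_rate * odds_ratio

def at_risk_rate_exact(control_rate, odds_ratio):
    """Exact conversion: rate -> odds, scale by the odds ratio, odds -> rate."""
    odds = control_rate / (1.0 - control_rate)
    at_risk_odds = odds * odds_ratio
    return at_risk_odds / (1.0 + at_risk_odds)

control_rate = 5 / 1000  # hypothetical control-group rate, not from the report
odds_ratios = {"breech": 5.5, "female": 4.1, "family history": 1.7}

for factor, oratio in odds_ratios.items():
    approx = 1000 * at_risk_rate_approx(control_rate, oratio)
    exact = 1000 * at_risk_rate_exact(control_rate, oratio)
    print(f"{factor}: approx {approx:.1f}/1000, exact {exact:.1f}/1000")
```

At rates of this size the two calculations differ by well under 1 per 1000, which is why the product rule is a reasonable working approximation here; the gap would widen for more common outcomes.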

The Bayesian meta-analysis produces estimates between the average of the odds ratios and the pooled odds ratio and is, therefore, the estimate we used in our later analyses.

The data for all 3 risk factors are summarized in Table 8, and in Evidence Tables 9 through 16 (see “Appendix”).

### a. Female

The studies were uniform in discerning a risk to girls ∼4 times that of boys for being diagnosed with DDH. This risk was seen in all 3 screening environments.

### b. Breech

The studies for breech also were confident in finding a risk for breech presentation, on the order of fivefold. One study (target article 65) found breech presentation to be protective, but the study was relatively small and used ultrasonography rather than PE as its outcome measure.

### c. Family History

Although some studies found family history to be a risk factor, the range was wide. The confidence intervals for the pooled odds ratio and for the Bayesian analysis contained 1.0, suggesting that family history is *not* an independent risk factor for DDH. However, because of traditional concern with this risk factor, we kept it in our further considerations.

### D. Evidence Summary and Risk Implications

To bring all these evidence tables together, we constructed Table 9, which contains the estimates we chose for our recommendations. The intervals are asymmetric, in keeping with the intuition that rates near zero cannot be negative, but certainly can be very positive.

The risk implications are shown in Table 10 for infants with different risk factors. These risks are based on the pediatrician population rate of 8.6 labeled cases of DDH per 1000 infants screened. In the Subcommittee's discussion, 50/1000 was a cutoff for automatic referral during the newborn period. Hence, for newborn strategies, girls born in the breech position are classified in a separate category from infants with other risk factors.

If we use the orthopaedists' rate as our baseline, we obtain the results shown in Table 11. Like Table 10, these numbers suggest that boys without risks or those with a family history have the lowest risk; girls without risks and boys born in the breech presentation have an intermediate risk; and girls with a positive family history, and especially girls born in the breech presentation, have the highest risks. Guidelines that consider risk factors should follow these risk profiles.

### E. Decision Recommendations

With the evidence synthesized, we can estimate the expected results of the target newborn strategies for postneonatal DDH and AVN. Table 12 summarizes Table 9 even further.

We use the numbers in Table 12 to arrive at summary outcomes for each initial strategy. Thus, if a case of DDH is observed in an infant with an initially negative result of screening by an orthopaedist in a newborn screening program, that case is “counted” against the orthopaedist strategy.

The numbers are combined using a simple decision tree (Fig 14), which is *not* the final tree represented by our influence diagram but is a tree that is supported by our evidence. The results are given in Table 13. The results show that pediatricians diagnose fewer newborns with DDH and perhaps have a higher postneonatal DDH rate than orthopaedists but one that is comparable to ultrasonography (acknowledging that our knowledge of postneonatal DDH revealed by ultrasonographic screening is limited). The AVN rates are comparable with pediatrician and ultrasonographic screening and less than with orthopaedist screening.

The algorithm in Fig 15 was generated by the Subcommittee after review of the evidence (Table 14).

### F. Cost-Effectiveness Ratios

In terms of excess neonatal referrals, the ratios suggest that there is a trade-off: for every case that the orthopaedic and ultrasonographic strategies detect beyond the pediatric strategy, they require more than 7000 or 16 000 extra referrals, respectively.

## DISCUSSION

### A. Summary

We derived 298 evidence tables from 118 studies culled from a larger set of 624 articles. Our literature review captured most, if not all, of the past literature on DDH that was usable in our model-based approach. The decision model (reduced based on available evidence) suggests that orthopaedic screening is optimal, but because orthopaedists in the published studies and in practice would differ, the supply of orthopaedists is relatively limited, and the difference between orthopaedists and pediatricians is relatively small, we conclude that pediatric screening is to be recommended. The place of ultrasonography in the screening process remains to be defined because there are too few data about postneonatal diagnosis by ultrasonographic screening to permit definitive recommendations.

Our conclusions are tempered by the uncertainties resulting from the wide range of the evidence. The confidence intervals are wide for the primary parameters. The uncertainties mean that, even with all the evidence collected from the literature, we are left with large doubts about the values of the different parameters.

Our data do not bear directly on the question of the earliest point at which *any* patient destined to have DDH will show signs of the disease. Our use of the terms *mid-term* and *late-term* DDH addresses that ignorance.

Our conclusions about other areas of the full decision model are more tentative because of the paucity of data about the effectiveness of periodicity examinations. Even the studies that gave data on mid-term and late-term case findings by pediatricians were sparse in their details about how the screening was instituted, maintained, or followed up.

Our literature search was weakest in addressing the European literature, where results about ultrasonography are more prevalent. We found, however, that many of the seminal articles were republished in English or in a form that we could assess.

### B. Specific Issues

1. Evidence Quality

Our measure of evidence quality is unique, although it is based on solid principles of study design and decision modeling. In particular, our measure was based on the notion that if the data conform poorly to how we need to use them, we downgrade their value.

However, throughout the analyses, there was never a correlation between the results of a study (in terms of the values of outcomes) and its evidence quality, so we never needed to use the measure for weighting the values of the outcome or for culling articles from our review. Had there been such a correlation, the measure would have needed further scrutiny and validation.

2. Outliers

Perhaps the true surrogates for study quality were the outlying values of outcomes. In general, however, there were few cases in which the outliers were clearly the result of poor-quality studies. One example is that of the outcomes of pediatric screening (1 → 3), in which the DDH rates in studies using only 2 categories were generally higher than those that explicitly specified 3 levels of outcomes.

Our general justification for using estimates that excluded outliers is that the outliers drove the results so strongly that they dominated the conclusions out of proportion to their sample sizes. Even with the outliers excluded, our estimates have wide ranges.

3. Newborn Screening

The set of studies labeled “pediatrician screening” includes studies with a variety of examiners. We could not estimate the sensitivity and specificity of pediatricians' examinations versus those of other primary care providers versus orthopaedists. There are techniques for extracting these measures from agreement studies, but they are beyond the scope of the present study. It is intuitive that the more cases that one examines, the better an examiner one will be, regardless of professional title.

We were surprised that the results did not show a clear difference between the Graf^{2} and Harcke et al^{1} ultrasonographic examinations. Our data make no statement about the relative advantages of these methods for following up children or in addressing treatment.

4. Postneonatal Cases

As mentioned, our data cannot say *when* a postneonatal case is established or, therefore, the best time to screen children. We established our initial age categories for postneonatal cases based on biology, treatment changes, and optimal imaging and examination strategies. It is frustrating that the data in the literature are not organized to match this pathophysiological way of thinking about DDH. Similarly, as mentioned, the lack of details by authors on the methods of intercurrent screening means that we cannot recommend a preferred method for mid-term or late-term screening.

5. AVN

We used AVN as our primary marker for treatment morbidity. We acknowledge that the studies we grouped together may reflect different philosophies and results of orthopaedic practice. The hierarchical meta-analysis treats every study as an individual case, and the wide range in our confidence intervals reflects the uncertainty that results from grouping disparate studies together.

### C. Comments on Methods

This study is unique in its strong use of decision modeling at each step in the process. In the end, our results are couched in traditional terms (estimated rates of disease or morbidity outcomes), although the context is relatively nontraditional: attaching the estimates to *strategies* rather than to treatments. In this, our study is typical of an *effectiveness* study, which studies results in the real world, rather than of an *efficacy* study, which examines the biological effects of a treatment.^{11}

We made strong and recurrent use of the Bayesian hierarchical meta-analysis. A review of the tables will confirm that the Bayesian results were in the same “ballpark” as the average and pooled average estimates and had a more solid grounding.
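To make the two non-Bayesian comparators concrete, the sketch below assumes that "average" means the unweighted mean of the per-study rates and that "pooled average" means total cases divided by total children screened. Both readings, the function names, and the study counts are assumptions for illustration; the counts are hypothetical and are not data from this report.

```python
from statistics import mean

# Hypothetical (cases, children screened) pairs; NOT data from this report.
studies = [(12, 1000), (45, 5000), (3, 200)]


def average_rate_per_1000(data):
    """Unweighted mean of the per-study rates: each study counts equally."""
    return mean(1000 * cases / n for cases, n in data)


def pooled_rate_per_1000(data):
    """Total cases over total screened: larger studies carry more weight."""
    total_cases = sum(cases for cases, _ in data)
    total_n = sum(n for _, n in data)
    return 1000 * total_cases / total_n


print(average_rate_per_1000(studies))
print(pooled_rate_per_1000(studies))
```

The two summaries diverge whenever small studies report rates that differ from large ones, which is one reason a hierarchical model that weights studies by their precision can give a more solid grounding.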

The usual criticism of using Bayesian methods is that they depend on prior belief. The usual response is to show that the final estimates are relatively insensitive to the prior belief. In fact, for the screening strategies, a wide range of prior beliefs had no effect on the estimate. However, the prior belief used for the screening strategies (a mean of 100 cases/1000 with a variance of 100) was too broad for the postneonatal case and AVN analyses; when data were sparse, the prior belief overwhelmed the data. For instance, in late-term DDH revealed by orthopaedic screening (5 → 30), in an analysis not shown, the posterior estimate from the 4 studies was a rate of 0.345 cases per 1000, despite an average and a pooled average on the order of 0.08. Four studies were insufficient to overpower a prior belief of 100.
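The way a prior can overwhelm sparse data can be sketched with a conjugate gamma-Poisson model. This is a simplified stand-in, not the hierarchical model used in this report. It assumes the stated prior can be read as a Gamma distribution on the rate per 1000, and the case counts are hypothetical.

```python
def gamma_prior_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Return (shape, rate) of a Gamma distribution with the given mean and variance."""
    return mean**2 / variance, mean / variance


def posterior_mean(
    shape: float, rate: float, cases: int, thousands_screened: float
) -> float:
    """Posterior mean rate per 1000 under a conjugate gamma-Poisson model."""
    return (shape + cases) / (rate + thousands_screened)


# Hypothetical sparse data: 24 cases among 300 000 children (0.08 per 1000).
cases, thousands = 24, 300.0

# Prior read from the text: mean 100 per 1000, variance 100 (assumed Gamma).
strong = gamma_prior_from_moments(mean=100.0, variance=100.0)
# A hypothetical alternative prior centred much nearer the observed rates.
diffuse = gamma_prior_from_moments(mean=1.0, variance=100.0)

print(posterior_mean(*strong, cases, thousands))   # pulled well above 0.08
print(posterior_mean(*diffuse, cases, thousands))  # close to 0.08
```

In this toy setting the first prior acts like 100 extra cases observed in 1000 extra children, so it dominates an observed count of 24; the second prior contributes almost nothing and the posterior mean stays near the pooled rate.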

### D. Research Issues

The place of ultrasonography in DDH screening needs more attention, as does the issue of intercurrent pediatrician screening. In the latter case, society and health care systems must assess the effectiveness of education and the “return on investment” for educational programs. The place of preferences—of the parents, of the clinician—must be established.

We hope that the framework we have delineated—of a decision model and of data—can be useful in these future research endeavors.

## ACKNOWLEDGMENTS

We thank Robert Sebring, PhD, for helping to manage this process and for substantive input and Bonnie Cosner for helping to manage the workflow.

We also thank Chris Kwiat, MLS, from the American Academy of Pediatrics Library, who performed the literature searches.

## VI. APPENDIX: EVIDENCE TABLES

Starting on next page.

## Footnotes

The recommendations in this statement do not indicate an exclusive course of treatment or serve as a standard of medical care. Variations, taking into account individual circumstances, may be appropriate.

* Target articles are those used for the literature review; they are listed separately following the reference list.

## V. REFERENCES

## TARGET ARTICLES


Copyright © 2000 American Academy of Pediatrics