CONTEXT: Severe neonatal hyperbilirubinemia is associated with chronic bilirubin encephalopathy (kernicterus).
OBJECTIVE: To systematically review the effectiveness of specific screening modalities to prevent neonatal bilirubin encephalopathy.
METHODS: We identified studies through Medline searches, by perusing reference lists, and by consulting with US Preventive Services Task Force lead experts. We included English-language publications evaluating the effects of screening for bilirubin encephalopathy using early total serum bilirubin (TSB), transcutaneous bilirubin (TcB) measurements, or risk scores. Severe hyperbilirubinemia was used as a surrogate for possible chronic bilirubin encephalopathy, because no studies directly evaluated the latter as an outcome. We calculated the sensitivity and specificity of early TSB, TcB measurements, or risk scores in detecting hyperbilirubinemia.
RESULTS: Ten publications (11 studies) were eligible. Seven (2 prospective) studies evaluated the ability of risk factors (n = 3), early TSB (n = 3), TcB (n = 2), or combinations of risk factors and early TSB (n = 1) to predict hyperbilirubinemia (typically TSB > 95th hour-specific percentile 24 hours to 30 days postpartum). Screening had good ability to detect hyperbilirubinemia: reported area-under-the-curve values ranged between 0.69 and 0.84, and reported sensitivities and specificities suggested similar diagnostic ability. Indirect evidence from 3 descriptive uncontrolled studies suggests favorable associations between initiation of screening and decrease in hyperbilirubinemia rates, and rates of treatment or readmissions for hyperbilirubinemia compared with the baseline of no screening. No study assessed harms of screening.
CONCLUSIONS: Effects of screening on the rates of bilirubin encephalopathy are unknown. Although screening can predict hyperbilirubinemia, there is no robust evidence to suggest that screening is associated with favorable clinical outcomes.
Some degree of jaundice or hyperbilirubinemia occurs in most newborns. Severe neonatal hyperbilirubinemia is associated with chronic bilirubin encephalopathy or kernicterus, a rare condition characterized by athetoid spasticity, gaze and visual abnormalities, and sensorineural hearing loss in survivors. It may also be associated with mental retardation. In the literature, the term “kernicterus” has been used interchangeably with both the acute and chronic findings of bilirubin encephalopathy. Herein we adopt the suggestions of the American Academy of Pediatrics Subcommittee on Hyperbilirubinemia and reserve the term “kernicterus” for the chronic form of the condition.1
A 2003 review reported that chronic bilirubin encephalopathy has a mortality rate of at least 10% and morbidity rate of at least 70%.2 The true incidence of chronic bilirubin encephalopathy is unknown, because it is not a mandatory reportable disease. A 2001 Joint Commission Sentinel Event Alert stated that cases of kernicterus have continued to be reported in recent years.3 There are initiatives to prevent and eliminate this rare disease by instituting widespread screening for hyperbilirubinemia and subsequent timely treatment with phototherapy or exchange transfusion to reduce bilirubin levels.1,4,5 Typically, screening pertains to risk stratification on the basis of known risk factors and/or bilirubin-level measurements.
The Tufts Evidence-Based Practice Center (Tufts EPC) completed an evidence report in 2003 examining the effects of bilirubin on neurodevelopmental outcomes in infants of at least 34 weeks' gestation.2,6 In 2007, the Center on Primary Care, Prevention and Clinical Partnerships at the Agency for Healthcare Research and Quality (AHRQ), on behalf of the US Preventive Services Task Force (USPSTF), requested that Tufts EPC supplement the 2003 evidence report to address new questions. The findings are used by the USPSTF to make recommendations concerning screening for bilirubin encephalopathy in neonates.
Herein, we summarize key findings from our systematic review that supplements the previous report. We aimed to examine the effects of screening for hyperbilirubinemia on the incidence of acute and chronic bilirubin encephalopathy by addressing a set of key research questions using evidence-based medicine methodology.
In a systematic review it is often helpful to develop a schematic (analytic framework) that visually maps the specific linkages that associate the considered populations, interventions, modifying factors, and outcomes.7 An analytic framework provides a basis for interpreting and contextualizing relevant studies and establishing which links in the chain of logic have been answered, have inconclusive evidence, or have not yet been addressed.
The Tufts EPC, the Center on Primary Care, Prevention and Clinical Partnerships at the AHRQ, and the USPSTF jointly developed an analytic framework and a set of study inclusion/exclusion criteria that are suitable to meet the USPSTF objectives. The analytic framework and the 4 key questions are depicted in Fig 1. This manuscript presents information on the first 4 key questions. Key question 5 (Does treatment reduce the risk of bilirubin encephalopathy in infants identified by screening?), for which we found no data, and key question 6 (What are the harms of treatment with phototherapy?), for which we found limited data, are presented in the Evidence Report.8
Search Strategy and Identification of Relevant Studies
This review answers additional key questions to supplement our previous report. The previous report identified all eligible studies published up to September 2001.2,6 We performed additional electronic Medline searches for English-language studies published from September 2001 to August 2007 using Medical Subject Headings (MeSH) terms and key words, such as “jaundice,” “bilirubin,” “hyperbilirubinemia,” and “kernicterus.” We complemented our electronic searches by perusing the reference lists of identified relevant studies and with input from USPSTF lead experts.
We included English-language publications on healthy term or preterm infants of at least 35 weeks' gestation. For key questions 1 to 3 (see Fig 1), we considered cohort and (nested) case-control studies on the ability of screening to predict (acute or chronic) bilirubin encephalopathy or clinically significant hyperbilirubinemia. Eligible screening modalities included risk-factor scores, transcutaneous bilirubin (TcB) measurements, early total serum bilirubin (TSB) measurements, or combinations thereof. We also considered reports from US-based cohort studies that described the impact of system-wide implementation of TcB screening and the use of the hour-specific nomogram percentile of jaundice4 and (re)admissions for the treatment of hyperbilirubinemia.
We excluded studies on clinical or maternal assessments of jaundice, cord blood bilirubin, end-tidal carbon monoxide, or umbilical cord α-fetoprotein alone. We also rejected studies that compared the agreement between different bilirubin-measurement methods (eg, using correlation analyses or difference versus average analyses [Bland-Altman analyses9,10]) without assessment of diagnostic accuracy.
For key question 4 (harms of screening), we considered all potentially relevant publications including case reports or case series.
Two experienced reviewers extracted each study in a nonblinded fashion using preconstructed forms. When needed, a third reviewer served as an arbitrator and resolved discrepancies. From each study, we recorded the first author; journal and year of publication; study country and setting; number of infants enrolled and analyzed; study eligibility criteria (including minimum gestational-age and birth-weight cutoffs); study design and setting; how study subjects were recruited (consecutively, by random sampling, or otherwise); and information on the characteristics of the screening test and the reference standard for hyperbilirubinemia. For TcB measurements, we recorded the device used and timing of measurements. For risk instruments, we recorded the components in the composite score. We also extracted sensitivities and specificities for various reported cutoffs of TcB or early TSB measurements (as applicable) and noted area-under-the-curve (AUC) values (see below).
Assessment of Methodologic Quality
Two reviewers assigned each study a quality rating of “good,” “fair,” or “poor” on the basis of USPSTF criteria7 and the presence or absence of overt methodologic errors.11 When needed, a third reviewer acted as an arbitrator. Briefly, we assessed study characteristics commonly associated with less susceptibility to biases and systematic errors, such as clarity of outcome definitions, suitability of statistical methods, and proper accounting for confounders.7 We considered the presence of overt verification bias as an important methodologic error. Verification bias operates when only infants with positive screening test results (eg, high scores in the risk instruments or high TcB measurements) are verified with the reference standard (late TSB measurements), whereas infants with negative screening results (eg, low scores or TcB measurements) are not. This can result in upward-biased sensitivity and downward-biased specificity estimates. We based our qualitative conclusions on studies of good or fair quality.
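The direction of verification bias can be illustrated with a small worked example. The sketch below uses a purely hypothetical 2 × 2 table (the counts are not drawn from any reviewed study) and assumes that every screen-positive infant, but only a fraction of screen-negative infants, receives the reference standard. The function names are ours and serve illustration only.

```python
def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from 2 x 2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def partially_verify(tp, fn, fp, tn, frac_negatives_verified):
    """Keep all screen-positives (tp, fp) but only a fraction of the
    screen-negatives (fn, tn), mimicking partial verification."""
    return (tp,
            fn * frac_negatives_verified,
            fp,
            tn * frac_negatives_verified)


# Hypothetical cohort of 1000 infants, all verified (illustrative counts only).
full = dict(tp=80, fn=20, fp=180, tn=720)
true_sens, true_spec = sens_spec(**full)

# Same cohort if only 10% of screen-negative infants had a late TSB measured.
observed = partially_verify(**full, frac_negatives_verified=0.10)
naive_sens, naive_spec = sens_spec(*observed)

print(f"fully verified:     sensitivity={true_sens:.2f}, specificity={true_spec:.2f}")
print(f"partially verified: sensitivity={naive_sens:.2f}, specificity={naive_spec:.2f}")
```

Under these assumed counts, the partially verified estimates show higher sensitivity and lower specificity than the fully verified ones, because false negatives and true negatives are undercounted.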
Data Analysis and Presentation
Sensitivity, Specificity, and AUC
Because of dissimilarities in the identified studies, no quantitative synthesis (meta-analysis) was performed. However, we calculated from each study the corresponding sensitivity and specificity pairs and depict them in sensitivity/100%-specificity plots. The sensitivity of a screening test is its ability to maximize true-positive diagnoses, and specificity is its ability to minimize false-positive diagnoses. A perfect screening test has both sensitivity and specificity equal to 100%. There is a trade-off between the sensitivity and specificity of a test. As the cutoff for a positive screening test decreases (eg, the value of TcB above which measurements are considered suggestive of a high bilirubin level), the corresponding sensitivity increases and the corresponding specificity decreases. One can capture this trade-off by recording multiple sensitivity/specificity pairs corresponding to different cutoffs. These can be plotted in a square sensitivity versus 100%-specificity plot (ie, constructing a receiver operating characteristic curve [see Fig 2 for examples]). The area under the (receiver operating characteristic) curve can summarize diagnostic ability across all positivity cutoffs. AUC values of 0.5 imply lack of any diagnostic ability, whereas AUC values of 1.0 correspond to a perfect screening test.
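The trade-off and the AUC summary described above can be sketched in a few lines. The example below is a minimal illustration on hypothetical toy data (8 made-up screening values and outcomes, not taken from any reviewed study); it was not part of our analyses, which were run in Stata. It treats a measurement at or above the cutoff as a positive screen and computes the AUC with the trapezoidal rule.

```python
def sens_spec_at_cutoff(values, outcomes, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is a positive screen."""
    tp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y)
    fn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and y)
    fp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and not y)
    tn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(values, outcomes):
    """(1 - specificity, sensitivity) pairs, from the strictest cutoff down."""
    points = [(0.0, 0.0)]  # a cutoff above every value flags no one
    for cutoff in sorted(set(values), reverse=True):
        sens, spec = sens_spec_at_cutoff(values, outcomes, cutoff)
        points.append((1 - spec, sens))
    return points  # the lowest cutoff flags everyone, giving (1, 1)


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical screening values (mg/dL) and later hyperbilirubinemia (1 = yes).
screen_values = [4, 5, 6, 7, 8, 9, 10, 11]
later_high_tsb = [0, 0, 0, 1, 0, 1, 1, 1]

for cutoff in (9, 7):
    sens, spec = sens_spec_at_cutoff(screen_values, later_high_tsb, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")

auc = auc_trapezoid(roc_points(screen_values, later_high_tsb))
print(f"AUC = {auc:.4f}")
```

In this toy data set, lowering the cutoff from 9 to 7 raises sensitivity and lowers specificity, which is the trade-off that the receiver operating characteristic curve traces out.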
Positive and Negative Likelihood Ratios
We also characterized diagnostic abilities by using positive and negative likelihood ratios (LR+ and LR−, respectively). These quantities express the strength of the diagnostic or predictive information conveyed by the screening test results. Briefly, LR+ expresses how much a positive screening test reinforces our prescreening belief that an infant will indeed develop a high bilirubin level. Conversely, LR− quantifies the extent to which a negative screening result reinforces our prescreening belief that an infant will indeed not develop a high bilirubin level. LR+ and LR− values of 1 imply no diagnostic ability (ie, our prescreening belief is multiplied by a factor of 1 and, therefore, remains unaltered). As suggested in the literature, we consider tests with either an LR+ of >10 or an LR− of <0.1 informative and clinically useful.12 Instead of providing tables of LRs per study and cutoff used, we incorporated this information in the graphical analyses.
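For readers who wish to reproduce these quantities, the following minimal sketch computes LR+ and LR− from a sensitivity/specificity pair using the standard definitions, LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity, and applies them to a prescreening probability through odds. The input values and function names are illustrative assumptions, not results from the reviewed studies.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a sensitivity/specificity pair (as proportions)."""
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return lr_pos, lr_neg


def post_test_probability(pretest_probability, lr):
    """Update a prescreening probability: post-test odds = pretest odds x LR."""
    if math.isinf(lr):
        return 1.0
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


def is_informative(lr_pos, lr_neg):
    """Threshold used in this review: LR+ > 10 or LR- < 0.1."""
    return lr_pos > 10 or lr_neg < 0.1


# Hypothetical test with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print("informative:", is_informative(lr_pos, lr_neg))

# Effect on an assumed 5% prescreening probability of a high bilirubin level.
print(f"after a positive screen: {post_test_probability(0.05, lr_pos):.3f}")
print(f"after a negative screen: {post_test_probability(0.05, lr_neg):.3f}")
```

An LR of 1 leaves the prescreening probability unchanged, which is why values near 1 convey little diagnostic information.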
Intercooled Stata 8.2 (Stata Corp, College Station, TX) was used for calculations and graphics.
We screened 742 abstracts and 96 articles qualified for full-text examination, 9 of which were deemed eligible for the current systematic review.4,13–20 We show a schematic of the screening process in Appendix 1. In addition, we included a single study that was identified in our previous evidence report.21
Key Question 1: Does Screening Using Risk-Factor Assessment and/or Bilirubin Testing Reduce the Incidence of Acute or Chronic Bilirubin Encephalopathy?
No study directly addressed this question. Eligible studies evaluated only surrogate outcomes, namely the incidence of hyperbilirubinemia.
Key Question 2: Does Risk-Factor Assessment Accurately Identify Infants Who May Benefit From Bilirubin Testing?
We presumed that infants who would benefit from phototherapy are those with high bilirubin levels. Four studies described in 3 publications, all conducted in the United States, examined the effectiveness of 2 risk instruments assessing infant and family history in screening for the development of significant hyperbilirubinemia (Table 1). Two were retrospective cohorts,13,14 and 2 were nested case-control studies.14,21 All studies enrolled infants with available information on factors included in the risk instruments and also TSB measurements before and after discharge,13 anytime within 30 days from birth (nested case-control study14) or after the first 48 hours of life (retrospective cohort14). The definition of clinically significant hyperbilirubinemia varied across studies (Table 1). All studies were graded as being of fair quality.
The 2 studied risk instruments had only 2 risk factors in common (of 6 factors in one instrument13 and 9 in the other14,21), namely exclusive breastfeeding and gestational age (Appendix 2). However, even these common factors contributed differently to the total risk score/index. Therefore, the same infant can receive different risk scores with different instruments.
Overall, evidence suggests comparable ability of the 2 risk instruments in predicting later significant hyperbilirubinemia (Table 1). Studies reported AUC values ranging from 0.71 to 0.84, with nested case-control studies showing slightly higher AUC values when compared with retrospective cohorts.
Key Question 3: Does Bilirubin Testing Accurately Identify Infants Who May Benefit From Phototherapy?
We discuss separately early measurements of serum bilirubin (early TSB) and TcB measurements. Again, we presumed that infants who would benefit from phototherapy are those with high bilirubin levels.
Early TSB Measurements
Four studies were eligible (Table 2).13–16 One prospective study from India16 included all 220 healthy infants with a gestational age of ≥35 weeks born within a 5-month period. This study verified late TSB values only for infants with “at least 10 mg/dL” (ie, had overt verification bias). The other prospective study from Turkey15 included 366 consecutive infants with a gestational age between 35 and 42 weeks but reported data on the diagnostic ability of early TSB measurements only among 146 infants with a gestational age between 35 and 37 weeks. The 2 retrospective studies from the United States13,14 were described in key question 2. In all studies, the reference standard was a postdischarge measurement above the hour-specific 95th percentile. Three studies received a fair grade13–15 and 1 received a poor grade16 for their methodologic quality.
Fig 2A illustrates the diagnostic ability of early measurements to identify postdischarge TSB levels above the 95th hour-specific percentile. All 4 studies reported comparable diagnostic abilities (Table 2 and Fig 2A). Only the study with overt verification bias16 had an LR− of <0.1 for an early TSB cutoff of 6 mg/dL during the first 24 hours (which is near the 75th hour-specific percentile in the Bhutani et al nomogram4).
Combination of Risk Instruments and Early TSB Levels
In a retrospective cohort,14 the effectiveness of a combination of a risk instrument (see Appendix 2 for details) with early TSB measurements in predicting a TSB level of ≥20 mg/dL at ≥48 hours after birth (ie, levels above the 95th hour-specific percentile) was evaluated. The AUC value improved from 0.69 to 0.86 (P < .05) after incorporating the z scores of TSB measurements in the predictive model (the z scores express how extreme a TSB measurement is, in SD units).
TcB Measurements
One study from Thailand17 and 1 study from China (Hong Kong)19 were eligible; both studies included almost exclusively infants of Asian descent. Both studies analyzed selected infant subgroups on the basis of availability of measurements, and 1 of them had overt verification bias. The studies defined the presence of high bilirubin levels in different ways (Table 3). Both were graded as poor for their methodologic quality. Fig 2B summarizes the diagnostic ability of TcB measurements to identify a TSB level that was indicative of phototherapy in these 2 studies.
Outcomes After the Implementation of Screening Strategies
Three retrospective studies compared rates of phototherapy or readmission for hyperbilirubinemia before and after the implementation of screening (Table 4).4,18,20 All were descriptive studies without concurrent controls and without adjustments for confounders. All received a poor grade for their methodologic quality.
The first study4 described the implementation of a “systems approach” in 3 sequential incremental steps (Table 4). In the first step, nurses were authorized to obtain TSB or TcB measurements for clinical jaundice. In the second step, all infants received predischarge TSB measurements (at routine metabolic screening). During this phase, the Bhutani et al hour-specific bilirubin nomogram was developed.4 Finally, in the third step, the hour-specific nomogram was used to interpret the universal TSB measurements at discharge. Intensive phototherapy rates (in nursery and after readmissions) increased from 4.5% (first step) to 5.4% (second step) and then decreased to 2.5% during the third step. Readmission rates gradually decreased throughout the study.
The report of another study described the implementation of universal predischarge TSB or TcB screening in 18 hospitals.20 After the implementation of screening, the proportion of term and near-term infants with “severe hyperbilirubinemia” decreased significantly compared with the baseline. The proportion of readmissions also decreased significantly (P < .005).
Authors of the third study18 reported that the proportion of newborns treated with phototherapy while in the nursery increased significantly after initiation of TcB screening compared with baseline rates. The mean rate of readmission for hyperbilirubinemia decreased significantly over time (Table 4; P = .044).
Key Question 4: What Are the Harms of Screening?
None of the reviewed studies assessed the harms of screening.
We supplemented our previous evidence report2 by summarizing the evidence on hyperbilirubinemia screening. The evidence presented in this review will be used by the USPSTF to make recommendation statements on screening for neonatal hyperbilirubinemia. No study evaluated the effects of screening on the rates of acute or chronic bilirubin encephalopathy. Instead, studies evaluated the surrogate outcome of hyperbilirubinemia (adopting different definitions). Overall, screening by use of risk factors, TcB or early TSB measurements, or combinations thereof is effective in predicting a high bilirubin level. Descriptive uncontrolled evidence suggests that screening is associated with increased diagnoses of hyperbilirubinemia and fewer readmissions for hyperbilirubinemia. All aforementioned findings are based on studies with methodologic problems and pragmatic limitations.
Study of the analytic framework in Fig 1 can help contextualize findings. We find that screening can predict high bilirubin levels. However, to connect this finding to clinical outcomes, we have to make the assumption that a high bilirubin level is a valid surrogate for bilirubin encephalopathy.22 This means that, foremost, a high bilirubin level must be in the causal pathway between screening (intervention) and bilirubin encephalopathy (clinical outcome). This is generally supported by the literature. Second, we make the assumption that if screening does affect the rates of bilirubin encephalopathy, it would do so by treatments implemented as a result of the screening. This implies that any intervention that lowers high bilirubin levels will also prevent bilirubin encephalopathy. However, we have no explicit data to show that treatments such as phototherapy or exchange transfusion actually decrease the risk of bilirubin encephalopathy. Despite the lack of data, it is generally postulated that hyperbilirubinemia is a valid surrogate outcome for acute and chronic bilirubin encephalopathy. A final caveat is that, even accepting hyperbilirubinemia as a valid surrogate end point, the diagnostic accuracy of the studied screening modalities is based on relatively few studies, some of which have overt methodologic problems.
We also found tangential evidence from cohort studies that screening is associated with increased detection of hyperbilirubinemia and fewer readmissions for hyperbilirubinemia. Notwithstanding reservations on the validity of these intermediate clinical outcomes, we caution that they were observed in studies without concurrent controls and without analyses that can account for confounding and secular trends. Therefore, one cannot be confident that the reported favorable changes in the intermediate clinical outcomes are attributable solely to the initiation of system-wide screening.
It is generally difficult to appreciate the trade-off in the effectiveness, harms, and costs of screening without performing a formal decision analysis.23 This is especially true for rare conditions, such as chronic bilirubin encephalopathy, and whenever key parameters are unknown (such as the effectiveness of screening strategies to prevent the disease). Allowing for these caveats, a recent cost-effectiveness analysis concluded that widespread screening for bilirubin encephalopathy is probably going to increase health costs substantially, with uncertain benefits.24
There are several limitations to this work. First, this is a supplement to our previous evidence report; we used a different literature search strategy and narrower study eligibility criteria. Therefore, one may argue that because we relied on the previous evidence report we may have missed eligible articles that were published before 2001. However, all eligible articles for the current review would also be eligible for the broader key questions of the original evidence report. Second, the number of eligible studies that addressed the key questions was limited, and none were graded as having good methodologic quality. Finally, because we had few studies and they were clinically and methodologically heterogeneous, we did not perform quantitative analyses.
In principle, a definitive answer to the question of whether screening for hyperbilirubinemia can reduce the rate of acute and chronic encephalopathy would be conveyed by an adequately powered, pragmatic (cluster) randomized trial. Because kernicterus is rare (0.9 cases per 100 000 live births in a prospective study in the United Kingdom and Ireland25), such a trial would be challenging to perform. For practical consideration, studies on the effectiveness of different strategies to reduce the incidence of bilirubin encephalopathy could only rely on a surrogate outcome such as hyperbilirubinemia. Because severe hyperbilirubinemia is a rare event25 (and because of the clustered design), tens of thousands of infants per arm would be needed to attain statistical power, posing questions on the feasibility of such a study. Future studies on preventive and screening strategies should also actively monitor the potential harms from implementing such strategies in both infants and their family members.
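The order of magnitude of such a trial can be appreciated with a back-of-the-envelope calculation. The sketch below uses the standard normal-approximation formula for comparing 2 proportions and inflates the result by the usual design effect for cluster randomization, 1 + (m − 1) × ICC. Every input (event rates, cluster size m, intracluster correlation coefficient) is a hypothetical placeholder chosen for illustration, not an estimate from this review or from the cited surveillance study.

```python
import math
from statistics import NormalDist


def n_per_arm_individual(p_control, p_screened, alpha=0.05, power=0.80):
    """Infants per arm for an individually randomized comparison of two
    proportions (two-sided alpha, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_screened * (1 - p_screened)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_screened) ** 2


def design_effect(cluster_size, icc):
    """Variance inflation factor for cluster randomization."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical inputs: the surrogate outcome occurs in 1% of unscreened and
# 0.5% of screened infants; clusters of 500 births; ICC of 0.01.
n_individual = n_per_arm_individual(p_control=0.01, p_screened=0.005)
n_cluster = n_individual * design_effect(cluster_size=500, icc=0.01)

print("per arm, individual randomization:", math.ceil(n_individual))
print("per arm, cluster randomization:   ", math.ceil(n_cluster))
```

Rarer outcomes or smaller assumed effects increase these numbers further, because the required sample size grows with the inverse square of the difference in proportions.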
This article is based on a report conducted by the Tufts EPC under contract to the AHRQ (contract 290-020022).
We thank Ms Audrey Mahoney for administrative support.
- Accepted March 11, 2009.
- Address correspondence to Stanley Ip, MD, Tufts Medical Center, Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, #63, 800 Washington St, Boston, MA 02111.
The findings and conclusions are those of the authors and do not necessarily represent the views of the AHRQ. No statement in this article should be construed as an official position of the AHRQ or US Department of Health and Human Services.
Financial Disclosure: The authors have indicated they have no financial relationships relevant to this article to disclose.
- American Academy of Pediatrics, Subcommittee on Hyperbilirubinemia. Management of hyperbilirubinemia in the newborn infant 35 or more weeks of gestation [published correction appears in Pediatrics. 2004;114(4):1138]. Pediatrics. 2004;114(1):297–316
- Ip S, Glicken S, Kulig J, O'Brien R, Sege R. Management of neonatal hyperbilirubinemia. Evid Rep Technol Assess (Summ). 2003;(65):1–5
- Kernicterus threatens healthy newborns. Sentinel Event Alert. 2001;(18):1–4
- Chou SC, Palmer RH, Ezhuthachan S, et al. Management of hyperbilirubinemia in newborns: measuring performance by using a benchmarking model. Pediatrics. 2003;112(6 pt 1):1264–1273
- Ip S, Chung M, Kulig J, et al; American Academy of Pediatrics, Subcommittee on Hyperbilirubinemia. An evidence-based review of important issues concerning neonatal hyperbilirubinemia. Pediatrics. 2004;114(1). Available at: www.pediatrics.org/cgi/content/full/114/1/e130
- Ip S, Chung M, Trikalinos T, DeVine D, Lau J. Screening for Bilirubin Encephalopathy. Evidence Synthesis No. 72. AHRQ Publication No. 09-05136-EF-1. Rockville, Maryland: Agency for Healthcare Research and Quality, October 2009
- Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135–160
- Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA. 1994;271(9):703–707
- Keren R, Bhutani VK, Luan X, Nihtianova S, Cnaan A, Schwartz JS. Identifying newborns at risk of significant hyperbilirubinaemia: a comparison of two recommended approaches. Arch Dis Child. 2005;90(4):415–421
- Sarici SU, Serdar MA, Korkmaz A, et al. Incidence, course, and prediction of hyperbilirubinemia in near-term and term newborns. Pediatrics. 2004;113(4):775–780
- Petersen JR, Okorodudu AO, Mohammad AA, Fernando A, Shattuck KE. Association of transcutaneous bilirubin testing in hospital with decreased readmission rate for hyperbilirubinemia. Clin Chem. 2005;51(3):540–544
- Ho HT, Ng TK, Tsui KC, Lo YC. Evaluation of a new transcutaneous bilirubinometer in Chinese newborns. Arch Dis Child Fetal Neonatal Ed. 2006;91(6):F434–F438
- Eggert LD, Wiedmeier SE, Wilson J, Christensen RD. The effect of instituting a prehospital-discharge newborn bilirubin screening program in an 18-hospital health system. Pediatrics. 2006;117(5). Available at: www.pediatrics.org/cgi/content/full/117/5/e855
- Bucher HC, Guyatt GH, Cook DJ, Holbrook A, McAlister FA. Users' guides to the medical literature: XIX. Applying clinical trial results. A. How to use an article measuring the effect of an intervention on surrogate end points. Evidence-Based Medicine Working Group. JAMA. 1999;282(8):771–778
- Suresh GK, Clark RE. Cost-effectiveness of strategies that are intended to prevent kernicterus in newborn infants. Pediatrics. 2004;114(4):917–924
- Manning D, Todd P, Maxwell M, Jane PM. Prospective surveillance study of severe hyperbilirubinaemia in the newborn in the UK and Ireland. Arch Dis Child Fetal Neonatal Ed. 2007;92(5):F342–F346
- Copyright © 2009 by the American Academy of Pediatrics