Diagnostic Accuracy of Neonatal Assessment for Gestational Age Determination: A Systematic Review
CONTEXT: An estimated 15 million neonates are born preterm annually. However, in low- and middle-income countries, the dating of pregnancy is frequently unreliable or unknown.
OBJECTIVE: To conduct a systematic literature review and meta-analysis to determine the diagnostic accuracy of neonatal assessments to estimate gestational age (GA).
DATA SOURCES: PubMed, Embase, Cochrane, Web of Science, POPLINE, and World Health Organization library databases.
STUDY SELECTION: Studies of live-born infants in which researchers compared neonatal signs or assessments for GA estimation with a reference standard.
DATA EXTRACTION: Two independent reviewers extracted data on study population, design, bias, reference standard, test methods, accuracy, agreement, validity, correlation, and interrater reliability.
RESULTS: Four thousand nine hundred and fifty-six studies were screened and 78 included. We identified 18 newborn assessments for GA estimation (ranging from 4 to 23 signs). Compared with ultrasound, the Dubowitz score dated 95% of pregnancies within ±2.6 weeks (n = 7 studies), while the Ballard score overestimated GA (0.4 weeks) and dated pregnancies within ±3.8 weeks (n = 9). Compared with last menstrual period, the Dubowitz score dated 95% of pregnancies within ±2.9 weeks (n = 6 studies) and the Ballard score, ±4.2 weeks (n = 5). Assessments with fewer signs tended to be less accurate. A few studies showed a tendency for newborn assessments to overestimate GA in preterm infants and underestimate GA in growth-restricted infants.
LIMITATIONS: Poor study quality and few studies with early ultrasound-based reference.
CONCLUSIONS: Efforts in low- and middle-income countries should focus on improving dating in pregnancy through ultrasound and improving validity in growth-restricted populations. Where ultrasound is not possible, increased efforts are needed to develop simpler yet specific approaches for newborn assessment through new combinations of existing parameters, new signs, or technology.
- AVCL: anterior vascular capsule of the lens
- BOE: best obstetric estimate
- CI: confidence interval
- GA: gestational age
- HIC: high-income country
- LBW: low birth weight
- LMIC: low- and middle-income countries
- LMP: last menstrual period
- QUADAS–2: Quality Assessment of Diagnostic Accuracy Studies–2
- SGA: small for gestational age
Of the estimated 14.9 million annual preterm births, 13.6 million (91%) occur in low- and middle-income countries (LMIC).1,2 Preterm birth is the leading cause of mortality in children less than 5 years of age globally, accounting for 1 million neonatal deaths annually, almost all of which are in LMIC.3 In these settings, early recognition of the preterm infant may facilitate the timely delivery of life-saving interventions, such as continuous positive airway pressure or kangaroo mother care.
Ultrasound dating in early pregnancy is the most accurate method currently available to assess gestational age (GA) and is a standard of care in high-income countries. In LMIC, pregnancy dating is challenging, and GA of the infant is frequently unknown or inaccurate. Maternal recall of last menstrual period (LMP) is often unavailable or unreliable, particularly in populations with high rates of maternal illiteracy.4,5 The shortage of health care providers in LMIC, currently estimated at 7.9 million,6 contributes to poor coverage of antenatal care. In sub-Saharan Africa and Southeast Asia, fewer than one-third of mothers in households in the poorest quintile receive at least 1 antenatal care visit.7 Furthermore, the first antenatal care visit is often delayed, typically until late in the second trimester.8,9 Moreover, access to ultrasonography is low, with <7% of pregnant women having access to ultrasound in rural sub-Saharan Africa.4 Traditional sonography in late pregnancy is notably inaccurate for determining GA (±4 weeks).10,11
Clinical assessment of newborn maturity has long been used as a proxy to estimate GA after birth (Table 1). In 1966, Farr et al12 defined a classification for the development of external physical characteristics in the newborn. In 1968, Amiel-Tison13 described the assessment of neonatal neurologic maturation. Dubowitz et al14 developed a score for GA based on a combination of neurologic and physical signs, which dated pregnancies within 5 days of LMP in their original study. Since then, several simplified clinical assessments have been described in the literature.15–18 The Ballard score19 is one of the most commonly used and was revised to the New Ballard score in 1991 to improve accuracy for early preterm infants.20
Newborn assessment for GA dating has become less relevant in high-income settings, where ultrasound coverage is high and uncertainty of antenatal pregnancy dating is less common than in LMIC. In LMIC settings without widespread access to early ultrasound dating and where accuracy of LMP recall is highly variable, clinical assessment of the newborn remains the commonest available tool to evaluate GA. Accurate GA is necessary to identify preterm and small-for-gestational-age (SGA) babies and provide them with effective interventions.
The Every Newborn Action Plan was launched in 2014 with the aim to end preventable neonatal deaths and stillbirths by 2030.34 GA measurement was identified as a priority area35 for improving (1) the epidemiology of preterm birth and SGA and (2) the comparability of neonatal mortality estimates through stratification by GA and birth weight.
In this systematic review, we aim to (1) identify individual neonatal signs and combined clinical scores or assessments that have been used to ascertain GA of newborns; and (2) assess the diagnostic accuracy and reliability of these methods for estimating GA, compared with dating by a reference standard (ie, ultrasound or LMP).
We conducted a systematic review of the published and gray literature, initially done in March 2015 and updated in June 2016 (Fig 1). Databases we searched included PubMed, Embase, Cochrane, Web of Science, POPLINE, and the World Health Organization Global Health Libraries and regional databases (Latin American and Caribbean Health Sciences, Index Medicus for the Eastern Mediterranean Region, African Index Medicus). The review was registered with the International Prospective Register of Systematic Reviews (PROSPERO registration number: CRD42015020499). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, review protocol, and detailed search terms are available in the Supplemental Information.
There were no language restrictions. Abstracts of non-English articles were translated via Google Translate, and if eligible, the full text was translated to English by fluent speakers. Articles were considered for inclusion if the study met the following criteria: (1) included live-born neonates; (2) compared at least 2 methods of GA estimation, 1 of which was a neonatal clinical assessment, score or individual clinical sign(s); and (3) reported at least 1 statistic assessing correlation, agreement, or validity of GA estimation. Prenatal assessments (eg, symphysis fundal height, ultrasound) and neonatal anthropometrics (eg, foot length) were reviewed separately and will be reported elsewhere.
We excluded studies in which researchers did not provide data describing the correlation, agreement, or validity of neonatal clinical assessment compared with a reference method of pregnancy dating (ie, ultrasound or LMP). We excluded studies from specialized subpopulations (eg, infants of diabetic mothers), editorials or reviews without original data, individual case reports, and duplicate studies.
All articles were reviewed independently by 2 researchers and extracted into a standard Excel file. Differences were resolved by a third independent reviewer. The study characteristics extracted are listed in Supplemental Information 2.
Study Quality Assessment
Two independent reviewers graded the methodological quality of the studies of diagnostic accuracy using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS–2)36 tool, modified for the context of this review (Supplemental Information, “Study Quality Assessment” section). Individual studies were evaluated for limitations and biases in the following domains: patient selection, test method, reference standard, and patient flow and timing. Studies with a reference standard GA of ultrasonography or best obstetric estimate (BOE) (including ultrasound confirmation of dating) were graded as highest quality. Though LMP may be considered gold standard in high-resource settings (where rates of literacy and early antenatal care are high), in LMIC, LMP recall is considered less reliable because of low literacy rates and late presentation to antenatal care.11,37 Additionally, we assessed the generalizability of study results to LMIC.
Stata 13 (StataCorp, College Station, TX) and R (R Foundation for Statistical Computing, Vienna, Austria) were used for analyses. The definition of preterm birth was a live birth <37 weeks’ gestation. Studies were grouped by method of newborn assessment and reference standard. Simple descriptive statistics were used to report ranges and medians. The mean individual-level differences between 2 methods of GA assessment were pooled using the Stata metan command, which provided the pooled mean-difference estimate and 95% confidence interval (CI). The variance and SD around the pooled estimate were calculated using the following formula38:
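The formula itself is not shown here. As a minimal sketch of how a pooled SD could be obtained from study-level SDs, the code below assumes the conventional degrees-of-freedom-weighted pooled variance. That choice is an assumption made for illustration and is not confirmed as the formula from reference 38; the function name `pooled_sd` and the study values are hypothetical.

```python
import math

def pooled_sd(sds, ns):
    """Pooled SD across studies, weighting each study variance by (n - 1).

    Assumed formula: sqrt( sum((n_i - 1) * sd_i^2) / sum(n_i - 1) ).
    """
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

# Hypothetical per-study SDs of (assessment GA - reference GA) in weeks,
# with hypothetical sample sizes. These are not values from the review.
study_sds = [1.0, 1.5, 2.0]
study_ns = [50, 100, 150]

print(f"pooled SD: {pooled_sd(study_sds, study_ns):.2f} weeks")
```

Larger studies dominate the pooled value under this weighting, so a single small, noisy study has limited influence on the result.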
For studies in which researchers reported the percent of test measures within ±1 to 2 weeks of a reference, percentages were logit transformed and SEs were calculated. Meta-analysis was conducted with a random effects model. The Higgins I² statistic was calculated to assess heterogeneity. For reports of diagnostic accuracy, forest plots were generated in R to summarize diagnostic accuracy across studies. Because pooling of sensitivity and specificity separately fails to account for the interrelatedness of the measures, hierarchical bivariate models are recommended for meta-analysis.39 These were analyzed by using MetaDisc 1.4 and RStudio (Mada package). Hierarchical summary receiver operating characteristic curves were generated.
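The logit transformation and random-effects pooling described above can be sketched as follows. This is an illustrative outline only: it assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify, and the study counts and function names are hypothetical.

```python
import math

def logit_with_se(events, total):
    """Logit of the proportion events/total and its approximate SE."""
    p = events / total
    return math.log(p / (1 - p)), math.sqrt(1 / events + 1 / (total - events))

def random_effects_pool(estimates, ses):
    """DerSimonian-Laird pool; returns (pooled estimate, SE, I2 in percent)."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, math.sqrt(1 / sum(w_star)), i2

def inv_logit(x):
    """Back-transform a logit to a proportion."""
    return 1 / (1 + math.exp(-x))

# Hypothetical studies: (infants within 2 weeks of the reference, infants assessed)
studies = [(80, 100), (150, 200), (45, 60)]
pairs = [logit_with_se(e, n) for e, n in studies]
pooled, se, i2 = random_effects_pool([p[0] for p in pairs], [p[1] for p in pairs])
low, high = inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)
print(f"pooled {inv_logit(pooled):.2f} (95% CI: {low:.2f} to {high:.2f}), I2 = {i2:.0f}%")
```

Pooling on the logit scale and back-transforming keeps the confidence limits inside 0% to 100%, which is why pooled intervals for percentages are usually asymmetric around the point estimate.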
Subgroup analyses were conducted by assessment method, reference standard type, and country income level. Correlation coefficients were not pooled, given that in many studies type of coefficient (ie, Spearman or Pearson) was not indicated, and furthermore, methods for pooling Spearman correlation coefficients have not been well described.38
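A small sketch may help show why the type of coefficient matters before pooling. The data below are hypothetical (a maturity score that rises monotonically but not linearly with reference GA), and the rank helper assumes no tied values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no ties (ties would need average ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical maturity scores and reference GA in weeks (not from any study)
score = [5, 10, 20, 35, 38, 40]
ga_weeks = [26, 28, 30, 32, 36, 40]

print(f"Pearson r = {pearson(score, ga_weeks):.2f}")
print(f"Spearman rho = {spearman(score, ga_weeks):.2f}")
```

On the same data the two coefficients differ, so averaging values of unknown type across studies would mix quantities that are not directly comparable.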
Neonatal Clinical Assessments
We identified 3862 titles, and 66 articles were included, some reporting on more than one scoring system (22 articles reported on the Dubowitz score, 31 on the Original and/or New Ballard score, and 25 on other clinical scores) (Fig 1). Basic study characteristics of all included studies are in Supplemental Table 10. The studies were published between 1968 and 2016, with fewer than half from LMIC. Most studies (n = 62) were conducted in health facilities, with 19 conducted in NICUs on preterm and/or low birth weight (LBW) populations. For the reference standard, there were 31 studies in which researchers had ultrasound-based dating, 42 in which they used LMP, and 3 in which researchers used dating based on another neonatal assessment.
The overall QUADAS–2 summary is in Supplemental Fig 6. In general, the quality of the studies was relatively low. In over half of the studies, there was a high risk of bias related to patient selection, test method, or reference standard.
Neonatal Clinical Assessments or Scores
We identified 18 different neonatal assessments or scoring systems (combining >1 individual clinical sign) for GA determination (Table 1). Twelve were developed in high-income countries (HICs) and 7 in LMIC (4 in Africa, 2 in Asia, 1 in Turkey). The reference standard from which the scores were derived was ultrasound/BOE in only 2 studies. The most complex score, Amiel-Tison,21 has 23 criteria, including a large number of neurologic signs. The simplest score, the Parkin,16 includes only 4 external physical criteria. One simplified score was developed in Nigeria (Eregie17) and includes physical anthropometrics (head circumference and midarm circumference).
Individual External Physical Criteria and Signs
Table 2 shows 12 studies in which researchers reported the correlation of individual external physical criteria with GA. Correlation coefficients were generally higher for comparisons with an LMP reference, for which median correlation coefficients ranged from 0.60 to 0.75 for most signs. Three studies used an ultrasound or BOE GA reference, and lower correlations were reported in 2 of these studies, neither of which included early preterm infants.21,40 The physical characteristics with the highest median correlation were breast size, plantar skin creases, ear firmness, and skin texture.
Individual Neuromuscular Signs
In 10 studies, researchers reported the correlation of individual neuromuscular criteria with GA (Table 2). The median correlation coefficients ranged from 0.52 to 0.70 in the studies using an LMP reference standard GA. Of the 3 studies that used an ultrasound-based reference standard GA, correlation coefficients were again lower in the same 2 studies as they were for physical criteria.21,40 The signs with the highest median correlation coefficients were ventral suspension, square window, and posture.
Validity of Neonatal Clinical Scores of GA
Studies in which researchers reported on the validity or agreement of neonatal assessments with a reference standard are shown in Table 3 (Dubowitz), Table 4 (Ballard), and Supplemental Table 12 (other assessments).
There were 26 studies in which researchers validated the Dubowitz score (11 ultrasound/BOE; 19 LMP reference). Ten studies were from LMIC. In most studies, the neonatal assessment was performed by physicians or nurses.
Ultrasound or BOE Reference Standard
In 2 studies, researchers reported the correlation of GA dating by Dubowitz score and BOE (r = 0.73 and 0.90, respectively). In 7 studies, researchers reported a mean difference in GA between Dubowitz and ultrasound-based dating, ranging from −2.2 weeks (underestimation) to +0.7 weeks (overestimation). The pooled mean difference was not statistically different from the null hypothesis (ie, difference = 0), indicating no evidence of overall systematic bias (Table 5, Supplemental Fig 7). The precision of the estimate is reflected in the SD of the mean difference, which, at the individual study level, ranged from 0.52 to 1.94 weeks. The pooled SD across the studies was 1.3 weeks, indicating that 95% of the differences in GA (Dubowitz score–ultrasound dating) fell within ±2.6 weeks (n = 7 studies). In the studies in which researchers reported on the percent agreement within weeks (n = 3), the Dubowitz GA fell within 1 week of ultrasound dates in 53% of infants (pooled estimate, 95% CI: 47% to 71%), and within 2 weeks in 59% of newborns (pooled estimate, 95% CI: 41% to 74%). Researchers in 1 study reported on the diagnostic accuracy of the Dubowitz score to identify preterm infants compared to ultrasound-based dating (sensitivity 61%, specificity 99%).50 Among studies done in LMIC, there was no significant bias compared with ultrasound dating, and the precision of GA dating by the Dubowitz score was similar to HICs (Supplemental Table 11).
In 4 studies, there was evidence of greater bias of Dubowitz scoring among preterm infants (Supplemental Table 12). In 4 studies, researchers reported that the Dubowitz score systematically overestimated GA in preterm infants by up to 2.6 weeks48–50 and more so among early preterm infants.46,48–50
LMP Reference Standard
The correlation of GA determined by Dubowitz scoring and LMP GA was reported in 14 studies and was generally high, ranging from 0.41 to 0.94 (median = 0.89). The pooled mean difference was 0.65 weeks (n = 6, 95% CI: 0.01 to 1.30), indicating a systematic overestimation compared with LMP-based GA (Table 5, Supplemental Fig 7). Ninety-five percent of the differences fell within ±2.9 weeks of the mean. The GA determined by Dubowitz assessment fell within 1 week of LMP dates in 59% of newborns (n = 4, 95% CI: 41% to 74%) and within 2 weeks in 87% (n = 6, 95% CI: 71% to 95%). Researchers in 1 study reported on the diagnostic accuracy of the Dubowitz score to identify preterm infants (sensitivity 81.5%, specificity 98.6%).41 Among LMIC studies (n = 2), there was a tendency of the Dubowitz score to overestimate GA (0.48 weeks), although the precision of the GA estimates was similar to HIC studies (Supplemental Table 11).
Ballard and New Ballard Score
We identified 30 studies in which researchers assessed the validity of the Original Ballard score (n = 20), the New Ballard score (n = 9), or both (n = 1) (Table 4) (17 ultrasound/BOE, 20 LMP reference), with 14 from LMIC. The Original and New Ballard scores assess the same clinical signs, with the New Ballard score20 having additional scoring categories for early preterm infants. Studies in which researchers used the Ballard score (Original or New) were combined for this analysis. Ballard assessments were performed by medically trained health workers (physicians, nurses, or research assistants) in the majority of studies and by community health workers in 2 studies.
Ultrasound or BOE Reference Standard
The correlation coefficients comparing Ballard score GA versus ultrasound or BOE ranged from 0.12 to 0.97 (median = 0.85, n = 7 studies). The mean GA difference ranged from −0.41 weeks (underestimation) to +1.4 weeks (overestimation) in 9 studies. The pooled mean difference was 0.40 weeks (95% CI: 0.00 to 0.81) (Table 5, Supplemental Fig 8), indicating a trend towards overestimation of GA. The pooled SD across the studies was 1.9 weeks, indicating that 95% of the differences in GA by Ballard assessment versus ultrasound dates fell within ±3.8 weeks (n = 9 studies, Table 5) of the mean. For the studies in which researchers reported on agreement in weeks, Ballard score dates fell within 1 week of ultrasound dates in 34% (n = 3; 95% CI: 22% to 44%) of infants and within 2 weeks in 72% (n = 5, 95% CI: 54% to 85%) of newborns. The Ballard score had a pooled sensitivity (n = 4) of 64% (95% CI: 61% to 67%) and specificity of 95% (95% CI: 95% to 96%) for identifying preterm newborns. Among LMIC studies, the trend of GA overestimation was similar to HIC studies. However, the imprecision of GA estimation was greater in LMIC compared with HIC studies (pooled SD of 2.12 vs 1.49 weeks) (Supplemental Table 11).
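The ± ranges quoted for each score appear to equal twice the pooled SD (for example, 1.3 gives ±2.6 and 1.9 gives ±3.8). That multiplier is inferred from the reported numbers rather than stated in the text, so the sketch below treats it as an assumption. The SDs are the pooled values reported above; the function name `range_95` is hypothetical.

```python
# Pooled SDs in weeks, as reported in the text
pooled_sd_weeks = {
    "Dubowitz vs ultrasound": 1.3,
    "Ballard vs ultrasound": 1.9,
    "Ballard vs ultrasound, HIC": 1.49,
    "Ballard vs ultrasound, LMIC": 2.12,
}

def range_95(sd, multiplier=2.0):
    """Half-width of the approximate 95% range around the mean difference.

    multiplier=2.0 is an assumption inferred from the reported figures.
    """
    return multiplier * sd

half_widths = {name: round(range_95(sd), 1) for name, sd in pooled_sd_weeks.items()}
for name, half_width in half_widths.items():
    print(f"{name}: +/-{half_width} weeks")
```

The range is centered on the pooled mean difference, not on zero, so for the Ballard score the interval is shifted by the 0.40-week overestimation.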
In several studies, researchers reported evidence of greater bias in Ballard scoring among smaller babies (Supplemental Table 12). In 3 studies, researchers reported that the Original Ballard systematically overestimated GA by up to 2 to 3 weeks, in particular among preterm infants,46,47,61 and generally, the trend was toward increasing bias in lower GAs. However, in a study in Papua New Guinea, Karl et al66 found the opposite trend. Wariyar et al47 reported that the New Ballard overestimated GA to a lesser degree than the Original Ballard in infants <30 weeks (1.6 vs 3.4 weeks, respectively). Among SGA infants, researchers in 2 studies showed that GA was underestimated by the original Ballard.40,61
LMP Reference Standard
The correlation coefficients of Ballard and LMP GA ranged from 0.66 to 0.96 (median = 0.85; n = 13). The mean difference in GA was reported in 6 studies, ranging from 0.34 to 2.6 weeks (overestimation). The pooled mean difference was 0.70 weeks (95% CI: 0.36 to 1.04), indicating systematic overestimation (Table 5, Supplemental Fig 8). Ninety-five percent of the differences fell within ±4.2 weeks of the mean (n = 5 studies). Ballard GA fell within 1 week of LMP GA in 45% (n = 3, 95% CI: 25% to 66%) of newborns and within 2 weeks of LMP in 76% (n = 9, 95% CI: 71% to 81%) of newborns. The Ballard score had a pooled sensitivity (n = 2) of 84.1% (95% CI: 81.6% to 86.3%) and specificity of 83.5% (95% CI: 79.5% to 87.0%) for identifying preterm newborns (Fig 2). There were too few studies to stratify the analysis by LMIC versus HICs.
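For readers less familiar with how sensitivity and specificity for preterm classification are derived, the following sketch classifies each infant as preterm (<37 weeks, the definition used in this review) by both the clinical assessment and the reference, then tabulates agreement. The paired GA values are hypothetical and the function names are illustrative.

```python
PRETERM_CUTOFF_WEEKS = 37  # preterm birth defined as <37 weeks' gestation

def confusion_counts(pairs, cutoff=PRETERM_CUTOFF_WEEKS):
    """Count (tp, fn, fp, tn) from (assessed GA, reference GA) pairs."""
    tp = fn = fp = tn = 0
    for assessed, reference in pairs:
        test_preterm = assessed < cutoff
        truly_preterm = reference < cutoff
        if truly_preterm and test_preterm:
            tp += 1
        elif truly_preterm:
            fn += 1  # preterm infant classified as term by the assessment
        elif test_preterm:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical (assessed GA, reference GA) pairs in weeks
pairs = [(35.0, 34.0), (38.0, 36.0), (36.5, 36.0), (39.0, 39.5),
         (36.0, 38.0), (40.0, 40.0), (37.5, 35.5), (38.5, 38.0)]

counts = confusion_counts(pairs)
sens, spec = sensitivity_specificity(*counts)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

An assessment that overestimates GA pushes truly preterm infants above the 37-week cutoff, which lowers sensitivity while leaving specificity high.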
In 2 studies, researchers demonstrated overestimation of GA among preterm infants by the Original Ballard exam,73,79 but researchers in 1 study used the External Ballard only (Supplemental Table 12).79 In addition, researchers in 2 studies found that the Original Ballard performed differently among SGA infants: Baumann et al72 reported that the correlation of Ballard with GA was lower among SGA infants compared with those appropriate for gestational age. Constantine et al73 showed that for SGA babies, the bias for GA dating was 1 to 1.5 weeks lower than for non-SGA infants.
Other Clinical Assessments
Eighteen studies were identified in which researchers reported on the validity of other clinical methods of GA assessment (ie, Eregie et al,40,42,80 Capurro et al,15,40,81–84 Parkin et al,16,40,47,52,54,68 Bhagwat et al,18,33,40 Tunçer et al,26,57 Finnström,24 Narayanan et al,30 and Robinson32,47). These findings are reported in Supplemental Information 3 and Supplemental Table 13. In general, the majority of these exams were simplified assessments with fewer signs and were found to be less accurate than the Dubowitz or Ballard scores for GA dating (Supplemental Information 3; Supplemental Table 13; Table 5).
In 10 studies, researchers reported on the interrater agreement of GA estimates (Supplemental Table 14). The κ for the classification of preterm births ranged from 0.73 to 0.93 (good to excellent; n = 3).20,67,85 The GA estimates were also highly correlated (r = 0.71–0.95)20,86 and without significant differences between raters.49,62,64,78
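The κ statistic corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch for a preterm versus term classification is shown below; the counts are hypothetical and the function name is illustrative.

```python
def cohen_kappa(both_preterm, a_only, b_only, both_term):
    """Cohen's kappa for two raters classifying infants as preterm or term.

    a_only: rater A says preterm, rater B says term; b_only is the reverse.
    """
    n = both_preterm + a_only + b_only + both_term
    observed = (both_preterm + both_term) / n
    a_preterm = (both_preterm + a_only) / n
    b_preterm = (both_preterm + b_only) / n
    # Agreement expected if the two raters classified independently
    expected = a_preterm * b_preterm + (1 - a_preterm) * (1 - b_preterm)
    return (observed - expected) / (1 - expected)

# Hypothetical counts for 100 infants assessed by two raters
kappa = cohen_kappa(both_preterm=20, a_only=5, b_only=5, both_term=70)
print(f"kappa = {kappa:.2f}")
```

In this example the raters agree on 90% of infants, yet κ is noticeably lower because most infants are term and much of that agreement would occur by chance.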
Anterior Vascularity of Lens
The literature searches for examination of the anterior vascular capsule of the lens (AVCL) yielded a total of 344 unique manuscripts (Fig 3), of which 10 met inclusion criteria (Table 6). Three were from LMIC (2 from South Asia, 1 from Africa). The studies were generally of smaller sample size (N = 30–356), and the latest was published in 1993. In general, study quality was poor, with a high risk of bias related to patient selection and reference standard. The overall QUADAS–2 assessment is in Supplemental Fig 9.
Assessments were typically performed at <72 hours of life by physicians in tertiary health facilities, with most studies performed in NICU settings and including only preterm and/or LBW infants. An ultrasound/BOE-based date was available in only 2 studies. Pupil dilation was performed before the assessment in 3 studies.
Correlation of AVCL Grading With GA
Hittner et al87,89 reported that as the infant matures in gestation, the AVCL disappears in stages. In Grade 4 (27–28 weeks), the entire anterior surface of the lens is vascularized, reducing to no vasculature in Grade 0 (>34 weeks). Of note, the reference standard in the original Hittner study87 was the Dubowitz score.
In 2 studies, researchers presented data on the average GA determined by Hittner’s AVCL grading system (Table 6).46,91 The correlation of AVCL grade with GA ranged from −0.84 to −0.96 (median: −0.88, n = 7) for preterm and/or LBW populations. For the 2 studies in which researchers analyzed all GA populations, correlation was lower (−0.64 to −0.45).24,30 Among SGA preterm newborns, the median correlation coefficient was −0.77 (range: −0.68 to −0.91, n = 3).72,87,89
The results of searches for intermammillary distance, skin impedance, and palmar creases are in Supplemental Information 4.
Accurate GA determination is a public health priority to target and reduce preterm birth–related morbidity and mortality in LMIC. The Every Newborn Action Plan has prioritized GA measurement as a high-priority area to improve the epidemiology of preterm birth and SGA.34 In our systematic literature review, we identified 18 different newborn assessments that have been used for GA dating. The most commonly reported and validated scores in the literature were the Dubowitz and Ballard scores. The Dubowitz score dated 95% of newborns within ±2.6 weeks of ultrasound dating. The Ballard score tended to overestimate GA by 0.4 weeks compared with ultrasound and dated 95% of infants within ±3.8 weeks of this mean. Newborn clinical assessments tend to overestimate GA among preterm infants and therefore may misclassify preterm infants as term. They also tended to underestimate GA in growth-restricted babies. Simplified assessments were less accurate. Although researchers in several studies showed promise of the anterior vascularity of the lens to classify GA <34 weeks, few compared AVCL with an ultrasound-based reference standard.
Study quality was a major limitation of the studies identified in the review, with half of studies having high risk of bias. Many of the original validation studies were from the 1970s, when LMP was the gold standard for pregnancy dating and ultrasound was not widely available. Many hospital-based studies were performed in NICUs among LBW babies and thus were prone to selection and measurement biases (eg, lack of blinding). Fewer than half of the studies were in LMIC, and studies in HICs may not be generalizable to LMIC settings because of health worker availability and training, and differences in the prevalence of SGA and preterm birth.
The majority of individual physical and neurologic signs that have been used in different scoring systems had fair to moderate correlation with GA. Skin opacity was the most weakly correlated and is perhaps the most affected by the timing of the assessment after birth. Although neurologic signs may be more affected by neonatal morbidity (birth asphyxia, neonatal infection, maternal medications, etc), the correlation coefficients of most signs were in a similar range to the physical criteria. In 2 studies21,40 in which researchers excluded early to moderate preterm infants, the correlation of clinical signs with GA was lower, suggesting that the criteria may be more discriminating at lower GAs.
A critical consideration in LMIC is the validity of neonatal assessments in populations with high rates of SGA. Distinguishing whether a small baby is preterm, SGA, or both is a challenge in these settings. Most neonatal assessments were designed to measure infant maturity as opposed to gestational length. SGA infants may act less mature during a neonatal clinical assessment. Three studies have revealed that among SGA infants, neonatal clinical exams tend to systematically underestimate GA.40,61,73 Improving the validity of the neonatal assessment in growth-restricted populations is a critical research need in LMIC.30,87,92
The disappearance of the AVCL, or pupillary membrane, was found to correlate well with GA, although overall study quality was poor, with few studies with ultrasound-based references. AVCL may show promise in LMIC with high rates of fetal growth restriction because the grading correlated relatively well with GA, even among growth-restricted or SGA infants.87 An important consideration is that the AVCL completely disappears after ∼34 weeks’ GA; thus, it may not help with GA dating >34 weeks. Furthermore, the AVCL exam requires specialized skills with an ophthalmoscope, which may limit the feasibility and scalability in LMIC.
Several factors should be considered in interpreting and generalizing the validity of neonatal GA assessments in different settings. Imprecision of the Ballard score was greater in LMIC studies compared with HIC studies (HICs: ±3.0 weeks; LMIC: ±4.2 weeks). The validity of a clinical assessment may vary with the level of medical training of the assessor.40,70 Most of the LMIC studies used physicians, nurses, or midwives, and there were few studies with frontline health workers. The validity of the newborn assessment has primarily been studied in the facility and/or hospital-based setting, and the few studies in home-based settings had poorer performance.40,70 Certain factors may improve the validity in the hospital setting, including the timing of assessment sooner after birth, being in a more controlled environment, and lighting. The development of some characteristics may vary by ethnicity. For example, plantar creases progress differently in African American populations93 and skin color may vary. Morbidities, such as gestational diabetes, are more common in specific populations94 and may affect the maturity assessment. Finally, the performance may also be affected by the GA ranges in which it is tested. The performance and validity of the assessments may vary in a general population with a larger representation of late preterm and near-term infants compared with a NICU.
Feasibility and scalability are critical factors to consider in LMIC. As shown in this review, there is a positive correlation between the number of parameters and accuracy of a GA assessment. Yet there is likely to be a negative correlation between the number of parameters (especially neurologic) and the feasibility of use. While the Dubowitz score had the best accuracy, the assessment is complex, may take 15 to 20 minutes to complete, and includes more difficult-to-train neurologic criteria. In South Asia and sub-Saharan Africa, approximately half of births occur outside of hospital facilities, and community-based health workers or traditional birth attendants may be the first point of contact for newborns. These health workers may not have the medical training or the time required to perform the assessment. The duration of the assessment as well as the feasibility of training, standardization, and quality control are critical considerations for scalability in LMIC.
Finally, when evaluating methods of GA assessment, the clinical, research, and programmatic objectives should be weighed. For the clinician, the primary objective is to identify preterm infants requiring special care, and individual-level misclassification may result in missed intervention opportunities. A measurement tool with high sensitivity is desired to identify all preterm infants, perhaps at the expense of specificity. A very simple tool based on a single parameter (such as foot size or another anthropometric parameter) may be suitable to meet these needs. On the other hand, for research, a more precise and continuous measurement of GA is desirable and early pregnancy ultrasound should be used. At the population level, inaccuracy and imprecision in GA dating may result in biased estimates of preterm birth rates and epidemiologic associations with preterm birth.95 Determining the optimal precision (ie, a 95% CI of ±1 or 2 vs 3 weeks) and diagnostic accuracy is also critical to choosing an appropriate method of GA measurement for LMIC. Future research priorities for improving GA determination in LMIC are shown in Fig 4.
As part of the Metrics Group of the Every Newborn Action Plan, we have conducted the first systematic review and meta-analysis assessing the diagnostic accuracy of neonatal GA assessments and scores. The most commonly used assessment, the Ballard score, tended to overestimate GA and had wide margins of error. The Dubowitz score had improved accuracy, although feasibility is a critical consideration in LMIC, and the complexity, training, and time to conduct the assessment are challenges to scale up. Additional high-quality studies are needed in LMIC to determine the accuracy of neonatal assessment compared with an early ultrasound reference, particularly in settings with SGA, as well as to explore the feasibility of implementation of complex GA assessments. This work also underlines the importance of future focus on increasing the maternal demand for knowledge of the GA of their pregnancy, improving coverage of early pregnancy ultrasound scans, and innovations to improve GA assessment in late pregnancy, such as novel ultrasound approaches. In settings where early ultrasound is not possible, increased efforts and innovation are urgently needed to develop simpler yet specific approaches for clinical GA assessment of the newborn, either through new combinations of existing parameters, new signs, or technology.
We acknowledge the students who were also part of the GA working group in the Brigham and Women’s Hospital global newborn health laboratory (Chelsea Clark). We also thank the Brigham and Women’s Hospital Department of Newborn Medicine and Dr Terrie Inder for their support of this work. Finally, we thank the following individuals for their assistance in translating foreign articles: Madeline Gilbert, Alison Leschen, Maria Dąbrowska, Susan Throckmorton, Felix Bergmann, and Lina Driouk.
- Accepted July 25, 2017.
- Address correspondence to Anne CC Lee, MD, MPH, Department of Pediatric Newborn Medicine, Brigham and Women’s Hospital, BB502A, 75 Francis St, Boston, MA 02115. E-mail:
This systematic review was registered with the International Prospective Register of Systematic Reviews. PROSPERO registration number: CRD42015020499.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: This work was supported by the Bill & Melinda Gates Foundation through grant OPP1130198.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Blencowe H, Cousens S, Oestergaard MZ, et al
- World Bank
- Liu L, Johnson HL, Cousens S, et al; Child Health Epidemiology Reference Group of WHO and UNICEF
- World Health Organization
- Bucher S, Marete I, Tenge C, et al
- Wang W, Alva S, Wang S, Fort A
- Amiel-Tison C
- Parkin JM, Hey EN, Clowes JS
- Farr V
- Narayanan I, Dua K, Gujral VV, Mehta DK, Mathew M, Prabhakar AK
- Robinson RJ
- Serfontein GL, Jaroszewicz AM
- Bindusha S, Rasalam CS, Sreedevi N
- World Health Organization (WHO)
- United Nations International Children’s Emergency Fund (UNICEF)
- World Health Organization (WHO)
- Rosner B
- Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y
- Lee AC, Mullany LC, Ladhani K, et al; Projahnmo Study Group
- Roberts CJ, Hibbard BM, Evans DR, et al
- Awoust J, Keuwez JJ, Levi S
- Sanders M, Allen M, Alexander GR, et al
- Wariyar U, Tin W, Hey E
- Mitchell D
- Thi HN, Khanh DK, Thu HT, Thomas EG, Lee KJ, Russell FM
- Mackanjee HR, Iliescu BM, Dawson WB
- Sasidharan K, Dutta S, Narang A
- Oliveira S, Kimura AMR
- Laveriano WRV
- Lee Anne CC, Uddin J, Shah R, et al
- Aslan Y, Yildiran A, Sen Y, Erduran E, Kasim S, Gedik Y
- Guillory C, Carsia-Prats JA, Hittner HM, Rudolph J
- Damoulaki-Sfakianski E, Robertson A, Gordero L
- Fujimoto W, Samoa R, Wotring A
- Copyright © 2017 by the American Academy of Pediatrics