September 2017, VOLUME 140 / ISSUE 3

Use of a Smartphone App to Assess Neonatal Jaundice

  1. James A. Taylor, MDa,
  2. James W. Stout, MD, MPHa,
  3. Lilian de Greef, MSa,
  4. Mayank Goel, PhDa,b,
  5. Shwetak Patel, PhDa,
  6. Esther K. Chung, MD, MPHc,d,
  7. Aruna Koduri, MDe,
  8. Shawn McMahon, MDf,
  9. Jane Dickerson, PhDg,
  10. Elizabeth A. Simpson, MDh, and
  11. Eric C. Larson, PhDi
  1. aDepartment of Pediatrics, University of Washington, Seattle, Washington;
  2. bDepartment of Computer Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania;
  3. cDepartment of Pediatrics, Thomas Jefferson University, Philadelphia, Pennsylvania;
  4. dNemours Alfred I. duPont Hospital for Children, Wilmington, Delaware;
  5. eKaiser Permanente Northern California, San Leandro, California;
  6. fMaricopa Integrated Health System, Phoenix, Arizona;
  7. gSeattle Children’s Hospital, Seattle, Washington;
  8. hChildren’s Mercy Hospital, Kansas City, Missouri; and
  9. iDepartment of Computer Science and Engineering, Southern Methodist University, Dallas, Texas
  1. Dr Taylor conceptualized and designed the study, supervised data collection at 1 site, analyzed the study data, and drafted the initial manuscript; Dr Stout assisted in the design of the study, assisted in the interpretation of study data, and critically reviewed the manuscript; Ms de Greef and Drs Goel, Patel, and Larson designed the software used in the technology to assess neonatal jaundice, assisted in the analysis of study data, and critically reviewed the manuscript; Drs Chung, Koduri, McMahon, Dickerson, and Simpson assisted in the design of the study, supervised data collection at 1 site each, and critically reviewed the manuscript; and all authors approved the final manuscript as submitted.

  • Dr Goel’s current affiliation is Department of Computer Science and Engineering, Carnegie Mellon University, Pittsburgh, PA.


BACKGROUND: The assessment of jaundice in outpatient neonates is problematic. Visual assessment is inaccurate, and more exact methodologies are cumbersome and/or expensive. Our goal in this study was to assess the accuracy of a technology based on the analysis of digital images of newborns obtained using a smartphone application called BiliCam.

METHODS: Paired BiliCam images and total serum bilirubin (TSB) levels were obtained in a diverse sample of newborns (<7 days old) at 7 sites across the United States. By using specialized software, data on color values in the images (“features”) were extracted. Machine learning and regression analysis techniques were used to identify features for inclusion in models to predict an estimated bilirubin level for each newborn. The correlation between estimated bilirubin levels and TSB levels was calculated. In addition, the sensitivity and specificity of the estimated bilirubin levels in identifying newborns with high TSB levels were calculated by using 2 recommended decision rules for jaundice screening.

RESULTS: Estimated bilirubin levels were calculated and compared with TSB levels in a diverse sample of 530 newborns (20.8% African American, 26.3% Hispanic, and 21.2% Asian American). The overall correlation was 0.91, and correlations among white, African American, Hispanic, and Asian American newborns were 0.92, 0.90, 0.91, and 0.88, respectively. The sensitivities of BiliCam in identifying newborns with high TSB levels were 84.6% and 100%, respectively, by using 2 decision rules; specificities were 75.1% and 76.4%, respectively.

CONCLUSIONS: BiliCam provided accurate estimates of TSB values, demonstrating that an inexpensive technology that uses commodity smartphones could be used to effectively screen newborns for jaundice.

  • Abbreviations:
    AUC: area under the curve
    BCB: BiliCam-estimated bilirubin
    EHB: extreme hyperbilirubinemia
    KPSL: Kaiser Permanente San Leandro Medical Center
    MIHS: Maricopa Integrated Health System
    MKD: McKay-Dee Hospital
    TcB: transcutaneous bilirubin
    TJUH: Thomas Jefferson University Hospital
    TMC: Truman Medical Center
    TSB: total serum bilirubin
    UWMC: University of Washington Medical Center
  • What’s Known on This Subject:

    Jaundice peaks in many healthy neonates after discharge from their birth hospitalizations. A visual assessment of jaundice in these newborns is inaccurate, with the potential of either missing infants with significant jaundice or unnecessarily obtaining blood for a serum bilirubin level.

    What This Study Adds:

    The results of this study suggest that a technology that uses a smartphone application has the potential to be a useful methodology for effectively screening newborns for jaundice.

    More than 80% of newborns develop visible jaundice in the first few days of life.1,2 Systematic assessment to identify newborns with significant hyperbilirubinemia is a central focus of care during the birth hospitalization in the United States. In a recent study, respondents from 86% of 60 newborn nurseries across the United States reported that they screened virtually all neonates with either a transcutaneous bilirubin (TcB) or total serum bilirubin (TSB) measurement before discharge.3 Unfortunately, bilirubin levels typically peak in neonates at ∼96 hours of life, which is well after most infants are discharged.4–9 Because of this, the American Academy of Pediatrics recommends that newborns discharged before 72 hours of age be seen by a health care provider within the subsequent 48 to 72 hours to assess for jaundice.4,6,10 However, accurate assessment of the severity of jaundice in outpatient neonates is problematic. TSB measurement is more difficult in newborns after discharge than during the birth hospitalization.4 Although TcB measurement could be a viable option, the high cost of TcB meters limits their widespread use in outpatient settings.4,11 Given these obstacles, the outpatient assessment of jaundice in neonates is generally done by a visual inspection of an infant’s skin to assess the degree of yellowness.11 However, there is ample evidence that even experienced health care providers cannot accurately estimate the severity of jaundice.11 In studies in which researchers compared the visual assessment of jaundice with TSB levels, correlation coefficients ranged from 0.36 to 0.75 with poor interobserver agreement.11–17

    In resource-poor countries, kernicterus continues to be a major and underappreciated source of neonatal morbidity and mortality. An estimated 500 000 newborns each year born in low- and middle-income countries develop extreme hyperbilirubinemia (EHB) (defined as a TSB level of ≥25 mg/dL), leading to 114 000 neonatal deaths and 75 000 cases of kernicterus.18 A primary reason for this high morbidity and mortality is the inability to measure bilirubin levels in many locations.19

    There is clearly a global need for an inexpensive and widely available technology that could be used to screen newborns for jaundice. Our group has developed BiliCam, which is a technology based on the analysis of digital images obtained with a smartphone application (“app”) to provide an estimate of TSB.20 For this study, we used BiliCam to collect data on a large and diverse sample of newborns with the goal of finalizing a robust algorithm for converting image data into an estimated TSB. We postulated that BiliCam would have similar utility to that of TcB measurement as a screening device for jaundice.


    Description of the Technology

    The BiliCam app is designed to obtain images of the skin overlying a newborn’s sternum in a standardized manner and transmit the image data via the Internet to a computer server for analysis. For the study, the app was installed on iPhone 5s smartphones. The process of obtaining a set of BiliCam images is initiated by placing a color calibration card (a modified Macbeth Color Checker in the shape of a hollow square) on the newborn’s sternum.21 The calibration cards are ∼6 cm by 6 cm and are held in place with a small amount of neonate-friendly adhesive. The use of the calibration card helps to account for variations in lighting conditions and facilitates image capture and data extraction. The cards are printed on specially coated paper to reduce glare, and color accuracy is checked during the printing process to ensure batch-to-batch stability.

    The user starts the app, and a red square appears on the smartphone screen. When this square is properly aligned with the color calibration card and lighting is adequate, the app automatically captures images using the smartphone camera both with and without flash. Flash and nonflash images are obtained at 3 distances from the newborn (6 images total) and sent to the server. The process of obtaining the photos typically takes <60 seconds.

    On the server, the red, green, and blue values of pixels from multiple regions of the color calibration card and an area of the newborn’s skin in the hollow portion of the card are measured. These measurements are also transformed into additional color representations. These representations, or “features,” are entered into an algorithm that is used to estimate a bilirubin value.
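As a rough illustration of the feature step described above, the Python sketch below averages skin pixels, rescales them by a white patch on the calibration card, and derives HSV and YIQ values from the result. The white-patch normalization, the choice of color spaces, and all function names are assumptions made for illustration; the article does not specify which transformations BiliCam uses.

```python
import colorsys


def mean_rgb(pixels):
    """Average a list of (r, g, b) pixels with channel values in 0..255."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_features(skin_pixels, white_patch_pixels):
    """Build a small, illustrative feature dictionary for one image.

    Assumption: lighting is compensated by dividing each skin channel by
    the same channel of the card's white patch. HSV and YIQ are stand-ins
    for the unspecified "additional color representations".
    """
    skin = mean_rgb(skin_pixels)
    white = mean_rgb(white_patch_pixels)
    # Rescale each channel to 0..1 relative to the white patch.
    norm = tuple(min(s / w, 1.0) for s, w in zip(skin, white))
    hue, saturation, value = colorsys.rgb_to_hsv(*norm)
    y, i, q = colorsys.rgb_to_yiq(*norm)
    return {
        "r": norm[0], "g": norm[1], "b": norm[2],
        "hue": hue, "saturation": saturation, "value": value,
        "y": y, "i": i, "q": q,
    }
```

In this sketch a feature dictionary would be computed for each of the 6 images and the results concatenated before modeling; a white patch that reads 0 in any channel would need separate handling.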

    Study Procedures

    A prospective study was conducted. Participants were healthy, newborn infants <7 days old who were born at ≥35 weeks’ gestation. Neonates who had received phototherapy were ineligible.

    Participants were recruited for the study at 7 sites across the United States, including the University of Washington Medical Center (UWMC) in Seattle, Washington; Thomas Jefferson University Hospital (TJUH) in Philadelphia, Pennsylvania; Seattle Children’s Hospital in Seattle, Washington; Kaiser Permanente San Leandro Medical Center (KPSL) in San Leandro, California; Maricopa Integrated Health System (MIHS) in Phoenix, Arizona; McKay-Dee Hospital (MKD) in Ogden, Utah; and Truman Medical Center (TMC) in Kansas City, Missouri. Study participants were enrolled between October 2014 and July 2016.

    A variety of enrollment procedures were used. At UWMC and TJUH, newborns were enrolled when they were ≤24 hours old, with a follow-up study visit when they were 3 to 5 days old. At the follow-up visit, a set of BiliCam images was obtained, and blood was drawn for a TSB level. At KPSL and MIHS, BiliCam images were obtained from participants at the time of a blood draw for a TSB level that was ordered because a newborn was clinically jaundiced. Finally, newborns at MKD and TMC were enrolled and BiliCam images were obtained when TSB levels were measured as part of routine screening or if a neonate was clinically jaundiced. Attempts were made to obtain BiliCam images within 2 hours of the blood draw for TSB determination. Blood samples were assayed by the clinical laboratory at each site. The TSB assays at each of the participating sites were run by using the following platforms: at UWMC, Beckman AU680 Total Bilirubin; at TJUH, Roche Cobas 501 Bilirubin Total; at Seattle Children’s Hospital, Ortho Vitros 4600 BuBc; at KPSL, Beckman AU680 Total Bilirubin; at MIHS, Ortho Vitros 5600 BuBc; at MKD, Abbott Architects c8000 or c4000 Total Bilirubin; and at TMC, Roche Cobas Bilirubin Total.

    At all sites, parents of participants were asked to provide the race and ethnicity of their infants at the time of enrollment; data on participants’ birth dates and times were abstracted from medical records. At UWMC, MIHS, and TJUH, TcB measurements were done for virtually all participants at the time of the blood draw for TSB; TcB measurements were obtained for selected newborns at TMC. Both BiliChek (Philips Respironics, Monroeville, PA) and the Draeger Jaundice Meter JM-103 (Draeger Inc, Telford, PA) brands of TcB meters were used for TcB measurements.

    The study was approved by the institutional review boards at each participating institution, and written, informed consent was obtained from the parents of study newborns.

    Algorithm Development

    Machine learning and regression analysis techniques were used to identify sets of features that were highly correlated with TSB levels. Models were developed continually during data collection. Features from each of the 6 images obtained of the study participants were used. If any of the 6 images were unusable, a bilirubin level was not estimated. When data collection concluded, a final set of features was selected and included in regression models to determine a BiliCam-estimated bilirubin (BCB) level for each participant by using a 10-fold cross-validation procedure. With this technique, the study sample was randomly divided into 10 equal-sized subgroups (ie, folds) that were stratified by TSB value. One fold was removed from the sample, and data from the other nine folds were used in training a regression model for estimating bilirubin levels. This model was then used to estimate the BCB values for participants in the removed fold. This process was repeated for 10 iterations, successively removing one fold, developing a model by using data from the remaining nine folds, and applying the model to the removed fold. Thus, data from a particular newborn were not used in developing the model that was used to estimate the bilirubin level for that newborn, providing an unbiased evaluation of BiliCam.
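The cross-validation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a single-feature least-squares line stands in for the unspecified regression models, stratification is approximated by sorting on TSB and spreading each run of k neighbouring samples across the k folds, and all function names are hypothetical.

```python
import random


def stratified_folds(tsb, k=10, seed=0):
    """Assign each sample to one of k folds, balanced across the TSB range."""
    rng = random.Random(seed)
    order = sorted(range(len(tsb)), key=lambda i: tsb[i])
    fold_of = [0] * len(tsb)
    # Walk through samples in TSB order, k at a time, and give each block
    # of k neighbours a random permutation of the fold labels.
    for start in range(0, len(order), k):
        block = order[start:start + k]
        labels = list(range(k))
        rng.shuffle(labels)
        for idx, label in zip(block, labels):
            fold_of[idx] = label
    return fold_of


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_estimates(feature, tsb, k=10, seed=0):
    """Estimate each sample with a model trained without that sample's fold."""
    fold_of = stratified_folds(tsb, k, seed)
    estimates = [None] * len(tsb)
    for held_out in range(k):
        train = [i for i in range(len(tsb)) if fold_of[i] != held_out]
        a, b = fit_line([feature[i] for i in train], [tsb[i] for i in train])
        for i in range(len(tsb)):
            if fold_of[i] == held_out:
                estimates[i] = a + b * feature[i]
    return estimates
```

Because every estimate comes from a model that never saw that newborn's data, the resulting values can be compared with TSB without the optimism that in-sample fitting would introduce.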


    The primary outcome was the linear correlation between BCB and TSB values; subgroup analyses were conducted for newborns from various racial and ethnic groups. The mean (± SD) difference between paired BCB and TSB levels was also calculated. A Bland-Altman analysis was performed; mean bias and limits of agreement were calculated. Similar analyses were done comparing paired TcB and TSB measurements.
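The agreement statistics named above can be computed as in the following Python sketch. It assumes the conventional definitions: Pearson's product-moment correlation, and Bland-Altman limits of agreement taken as the mean difference plus or minus 1.96 sample standard deviations. The 1.96 multiplier and the function names are assumptions; the article does not state the exact formula used.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(estimate, reference):
    """Return (mean bias, lower limit, upper limit) of agreement."""
    diffs = [e - r for e, r in zip(estimate, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Here `estimate` would hold the BCB (or TcB) values and `reference` the paired TSB values, so a positive bias means the estimate runs higher than the serum level.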

    We assessed the utility of BiliCam and TcB as screening tools for identifying newborns with significant jaundice using 2 recommended decision rules.4 First, BiliCam and TcB levels were plotted on the Bhutani TSB nomogram.22 A positive test result was a BiliCam or TcB level of ≥75th percentile on the nomogram (ie, in the high-intermediate or high-risk zone), with the target outcome being a TSB level of ≥95th percentile (high-risk zone).4,23 For the second decision rule, a positive test was a BiliCam or TcB level of ≥13 mg/dL to identify a newborn with a TSB value ≥17 mg/dL.4,24 The sensitivity, specificity, positive predictive value, and negative predictive value were calculated for BiliCam and TcB for each decision rule. We compared the utility of BiliCam and TcB as screening tools for identifying neonates with TSB levels in the high-risk zone on the Bhutani nomogram or a TSB level of ≥17 mg/dL by constructing receiver operating characteristic curves and comparing the area under the curve (AUC) for each using the approach described by DeLong et al.25
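The second decision rule lends itself to a short worked sketch in Python: a screen is positive when the estimate is ≥13 mg/dL, and the outcome is present when TSB is ≥17 mg/dL. The function name and the handling of empty cells (returning NaN) are illustrative assumptions. The first rule is not sketched because it depends on age-specific percentile values from the Bhutani nomogram, which are not reproduced in this article.

```python
def screening_metrics(estimates, tsb, test_cutoff=13.0, tsb_cutoff=17.0):
    """Sensitivity, specificity, PPV, and NPV for a threshold screening rule."""
    tp = fp = fn = tn = 0
    for est, ref in zip(estimates, tsb):
        test_positive = est >= test_cutoff
        has_outcome = ref >= tsb_cutoff
        if test_positive and has_outcome:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a row or column of the 2x2 table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Setting the screening cutoff below the TSB cutoff trades specificity for sensitivity, which suits a screening test whose positives are confirmed by a blood draw.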

    One study participant had 2 BCB and 2 TSB measurements at different time points. Analyses were conducted by excluding either set of measurements or including both, and the results were virtually identical. No adjustment was made for the 2 measurements from this single participant, and they were considered as separate events.


    BiliCam images were obtained for 580 enrolled newborns. A matching TSB level was missing for 8 participants; in 2 newborns, there was a laboratory problem, parents of 2 participants declined the blood draw, and no matching TSB level was obtained for 4 infants. In addition, data on 3 newborns were excluded because of issues with the consenting process. From the remaining 569 participants, a complete set of BiliCam images was obtained for 530 newborns (93.1% of those who were eligible).

    The race and ethnicity of the 530 newborns with complete TSB and BiliCam data are summarized in Table 1. The mean age of these participants at the time that BiliCam images were obtained was 75.2 ± 28.9 hours, with a range of 12 to 163 hours. The time between the TSB blood draw and BiliCam image collection was <2 hours in 515 participants (97.2%); the time difference was <3 hours in the remaining 15 newborns. The mean TSB value in study participants was 10.4 ± 4.4 mg/dL, with a range of 0.6 to 24.8 mg/dL. There were 66 newborns (12.5%) with a TSB level in the high-risk zone on the Bhutani nomogram; 80 participants (15.1%) had TSB levels of ≥15.0 mg/dL.

    TABLE 1

    Race and Ethnicity of Newborns With TSB and BCB Data

    Based on estimates calculated by using a 10-fold cross-validation procedure, the correlation between the BCB level and the paired TSB measurement was 0.91 (95% confidence interval 0.89–0.92). The correlation between BCB and TSB is shown graphically in Fig 1. The correlation coefficients in different racial and ethnic groups are summarized in Table 2. As is shown in the table, the correlation was highest among white neonates and lowest among Asian American newborns. The mean difference between BCB and TSB was 0.01 ± 1.8 mg/dL, with a range of −5.1 to +6.4 mg/dL; 91.9% of BCB values were within 3 mg/dL of the paired TSB level, and 73.0% were within 2 mg/dL. A Bland-Altman plot summarizing the differences between BCB and TSB is presented in Fig 2; as shown, the limits of agreement were −3.6 to +3.6 mg/dL.

    FIGURE 1

    The relationship between paired TSB and BCB values in 530 newborns. The linear regression line is shown, along with individual points.

    TABLE 2

    Correlations Between TSB and BCB Among Study Newborns From Different Racial and Ethnic Groups

    FIGURE 2

    A Bland-Altman plot of paired BCB and TSB values. The mean of TSB and BCB values (in mg/dL) is displayed along the x-axis, and the difference of BCB-TSB (mg/dL) is displayed on the y-axis. Horizontal lines denote the mean difference between BCB and TSB values and limits (95% confidence interval) of agreement.

    TcB measurements were made for 331 study newborns. Among these infants, the correlation between TcB and TSB was 0.91 (95% confidence interval 0.89–0.93). The mean TcB-TSB difference was 0.51 ± 1.8 mg/dL, with a range of −4.6 to +5.9 mg/dL; the limits of agreement were −3.2 to +4.2 mg/dL.

    The utility of BiliCam as a screening tool for identifying newborns with significant hyperbilirubinemia is summarized in Table 3; values for TcB are also shown. As is shown in Table 3, BiliCam had a sensitivity of 84.6% for identifying newborns with a TSB level in the high-risk zone on the Bhutani nomogram and a specificity of 75.1%. The sensitivity of identifying a neonate with a TSB level of ≥17.0 mg/dL was 100% with a specificity of 76.4%. A formal comparison of TcB and BiliCam as methods for screening neonates with TSB values in the high-risk zone or ≥17 mg/dL was limited to 312 newborns with both TcB and BCB values. Among these participants, the AUC for a high-risk zone TSB level was 0.95 for BiliCam and 0.92 for TcB (P = .30); for identifying newborns with a TSB level of ≥17.0, AUCs were 0.99 and 0.95, respectively (P = .09).
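For readers less familiar with the AUC values reported above, the Python sketch below computes an AUC using the rank-based (Mann-Whitney) formulation: the proportion of outcome-positive and outcome-negative pairs in which the positive newborn has the higher screening value, with ties counted as one half. This illustrates only the AUC itself; the DeLong comparison used to obtain the P values is not implemented here, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: screening values (for example, BCB or TcB in mg/dL).
    labels: True where the newborn has the outcome (for example, TSB >= 17).
    """
    positives = [s for s, has_outcome in zip(scores, labels) if has_outcome]
    negatives = [s for s, has_outcome in zip(scores, labels) if not has_outcome]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every newborn with the outcome scored higher than every newborn without it, and 0.5 corresponds to chance-level ranking.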

    TABLE 3

    Utility of BiliCam and TcB as Screening Tools to Identify Newborns With High TSB Values, Defined as a TSB Level in the High-risk Zone on the Bhutani TSB Nomogram or as a TSB Level of ≥17.0 mg/dL


    Our results suggest that a technology based on the analysis of images obtained by using an app on a commodity smartphone provided reasonably accurate estimates of TSB values in newborn infants and had accuracy similar to that of TcB measurements in study participants. The correlations between BCB and TSB values are also consistent with results of published studies on the accuracy of TcB in estimating bilirubin levels in newborns. The reported correlations between TcB and TSB range from 0.77 to 0.97.3,11,23,26–36 Most of these studies were conducted on newborns during their birth hospitalizations. Although there are limited data on the accuracy of TcB measurements in outpatients, in 2 studies in which researchers focused on neonates after hospital discharge, correlations between TcB and TSB were found to be 0.77 and 0.78, respectively.35,36 A possible reason for the lower correlations found in outpatient newborns is that TSB levels tend to peak after newborns are typically discharged from their birth hospitalizations.4–9 TcB levels have been found to progressively underestimate serum values in neonates with higher TSB levels, particularly ≥15.0 mg/dL.3,26,37

    Our results suggest that BiliCam does not have adequate accuracy to serve as a standalone methodology to assess jaundice in newborns. Rather, as with TcB meters, BiliCam is best suited as a screening device to aid in determining which neonates require a blood draw for a TSB level, with treatment decisions being based on the TSB level.4 Because of this, perhaps the most clinically relevant comparison between BiliCam and TcB is their utility as screening tools for identifying newborns who require a TSB level while obviating the need for a blood draw in most. In our study, using 2 recommended decision rules, BiliCam and TcB had comparable utility in identifying newborns with high TSB levels. Bhutani et al23 reported that the BiliChek-brand TcB meter had a sensitivity of 100% and a specificity of 88.1% in identifying newborns with a TSB level in the high-risk zone on the Bhutani nomogram during their birth hospitalizations. In a study evaluating both the BiliChek and JM-103 brands of TcB meters in 2 populations of newborns assessed during their birth hospitalizations, the sensitivity of TcB screening was 94.1% and 91.9%, respectively, for this same outcome.38 However, in a study of outpatient newborns with higher TSB levels, the sensitivity and specificity of TcB screening for identifying newborns with high-risk–zone TSB levels were 79% and 84%, respectively.35 In this same study using a cutoff value of ≥13.0 mg/dL to define a positive TcB screen, the sensitivity of TcB screening was 100%, and the specificity was 58% for identifying outpatient neonates with a TSB ≥17.0 mg/dL. Overall, these results suggest that BiliCam may be as effective as TcB in identifying newborns in need of a blood draw for a TSB level.

    There has been limited previous study on using digital images to estimate bilirubin levels in newborns. Rong et al39 used a system similar to the one we assessed and reported an r² of 0.628 (equivalent to r = 0.79) between estimated bilirubin and TSB values in 148 term newborns. Leung et al40 used a digital camera to obtain images of newborns’ sclerae and found a correlation between estimated bilirubin and TSB levels of 0.75.
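The conversion quoted above from a coefficient of determination to a correlation coefficient follows from taking the square root, assuming a simple linear regression with a positive association between estimated bilirubin and TSB:

```latex
r = \sqrt{r^{2}} = \sqrt{0.628} \approx 0.79
```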

    In evaluating the results of the current study, there are several important caveats to consider. We used a 10-fold cross-validation procedure to prevent overfitting the algorithm for converting image data into an estimated bilirubin level and to provide an unbiased assessment of the likely accuracy of the technology. Based on the data collected, a single algorithm is being developed that includes data from each of the 6 images obtained of the newborn. This algorithm is used to generate the BCB value, which is sent back to the smartphone via the Internet; the process of extracting the image data, applying the algorithm, and returning the value to the smartphone requires only a few seconds. The accuracy of the developed algorithm and the process of sending the BCB value to the smartphone will need to be validated in a different population of newborns. BCB values were compared with TSB levels that were measured by using different assays from different laboratories. There is more variability in TSB measurements related to laboratory methodology than is generally appreciated.41 It is possible that the correlations between BCB and TSB would have been higher if only a single laboratory for TSB assay had been used. Although there are many variables to consider when comparing BCB (or TcB) levels and TSB measurements (including infant age, TSB level, and laboratory methodology), the correlation between BCB and TSB tended to be lower in Asian American newborns than in those from other racial and ethnic groups. This will need to be further delineated in future studies. Finally, as with TcB meters, BiliCam was not perfect in identifying newborns with TSB levels in the high-risk zone on the Bhutani nomogram. However, BiliCam and TcB were highly sensitive in identifying newborns with a TSB level of ≥17 mg/dL. Using a threshold value of 13 mg/dL for defining a positive BCB or TcB test would eliminate the need for an unnecessary blood draw in the majority of newborns.35,38

    Because BiliCam requires no extra equipment besides a smartphone and the color calibration card, it has the potential to transform the outpatient management of jaundiced newborns. Health care providers evaluating newborns shortly after hospital discharge could use the technology to efficiently determine which infants require a blood draw for a TSB level. BiliCam could be used both in office settings and by nurses and other health care professionals evaluating newborns during home visits. Perhaps most importantly, in low- and middle-income countries with limited resources, BiliCam could be a low-cost technology that is used by health care workers to screen large numbers of newborns for jaundice and effectively identify the few that are at significant risk for EHB. In combination with low-cost phototherapy devices that have now been developed,42 BiliCam could thus be part of a system of care that could significantly reduce the morbidity and mortality related to EHB in these areas.


    We thank Vickie L. Baer, RN, and Robert D. Christensen, MD, for conducting the study at McKay-Dee Hospital in Ogden, Utah. We also thank Ping-Yu Liu, PhD, for his assistance with the biostatistical analyses of study data.


      • Accepted June 28, 2017.
    • Address correspondence to James A. Taylor, MD, Department of Pediatrics, University of Washington, Box 354920, Seattle, WA 98115. E-mail: uncjat{at}
    • FINANCIAL DISCLOSURE: Other than those already listed under Potential Conflicts of Interest, the other authors have indicated they have no financial relationships relevant to this article to disclose.

    • FUNDING: Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R21HD080768) and the University of Washington Wallace H. Coulter Foundation. Funded by the National Institutes of Health (NIH).

    • POTENTIAL CONFLICT OF INTEREST: Drs Taylor, Stout, and Patel are cofounders of BiliCam, LLC, a company developing the technology described in this study for commercial use; the other authors have indicated they have no potential conflicts of interest to disclose.