OBJECTIVE: For children with cyanotic congenital heart disease or acute hypoxemic respiratory failure, providers frequently make decisions based on pulse oximetry in the absence of an arterial blood gas. The study objective was to measure the accuracy of pulse oximetry when saturations from pulse oximetry (SpO2) range from 65% to 97%.
METHODS: This institutional review board–approved prospective, multicenter observational study in 5 PICUs included 225 mechanically ventilated children with an arterial catheter. With each arterial blood gas sample, SpO2 from pulse oximetry and arterial oxygen saturations from CO-oximetry (SaO2) were simultaneously obtained if the SpO2 was ≤97%.
RESULTS: The lowest SpO2 obtained in the study was 65%. In the range of SpO2 65% to 97%, 1980 simultaneous values for SpO2 and SaO2 were obtained. The bias (SpO2 – SaO2) varied through the range of SpO2 values. The bias was greatest in the SpO2 range 81% to 85% (336 samples, median 6%, mean 6.6%, accuracy root mean squared 9.1%). SpO2 measurements were close to SaO2 in the SpO2 range 91% to 97% (901 samples, median 1%, mean 1.5%, accuracy root mean squared 4.2%).
CONCLUSIONS: Previous studies on pulse oximeter accuracy in children present a single number for bias. This study identified that the accuracy of pulse oximetry varies significantly as a function of the SpO2 range. Saturations measured by pulse oximetry on average overestimate SaO2 from CO-oximetry in the SpO2 range of 76% to 90%. Better pulse oximetry algorithms are needed for accurate assessment of children with saturations in the hypoxemic range.
- ABG: arterial blood gas
- ABG/SpO2 pairs: simultaneous measurement of ABG and SpO2
- AHRF: acute hypoxemic respiratory failure
- Arms: accuracy root mean squared
- CCHD: cyanotic congenital heart disease
- FDA: US Food and Drug Administration
- SaO2: arterial oxygen saturations from CO-oximetry
- SpO2: saturations from pulse oximetry
What’s Known on This Subject:
Saturations from pulse oximetry (SpO2) may overestimate arterial oxygen saturations measured by CO-oximetry (SaO2). The overestimation can be affected by location of measurement, perfusion, and skin color. Previous studies are limited by small numbers of observations in a hypoxemic range.
What This Study Adds:
This large sample of hypoxemic patients identified that SpO2 typically overestimates SaO2. Bias and precision varied throughout the SpO2 range. The SpO2 range of 81% to 85% had the greatest bias: median SpO2 6% higher than SaO2 measured by CO-oximetry.
Practitioners frequently make decisions for children based on oxygen saturations obtained from pulse oximetry. Interventions such as supplemental oxygen, diuretics, or transfer to a higher level of care frequently occur without an arterial blood gas (ABG). Pulse oximeter performance has an impact on clinical decisions.
Pulse oximeters are developed to perform optimally in a range of oxyhemoglobin saturation from 70% to 100%. The US Food and Drug Administration (FDA) requires documentation of pulse oximeter accuracy defined as accuracy root mean squared (Arms) <3% with an equal number of samples in the decile range of 70% to 100% from adult volunteers through the standard ISO 80601-2-61:2011.1 As indicated in the FDA’s document “Pulse Oximeters – Premarket Notification Submissions [510(k)s]”2 for devices intended for use with neonates, the FDA recommends performance reports using adult subjects. Additional convenience arterial samples obtained in neonates are recommended when the sensor is new or significantly changed compared with previous devices. It is unclear how well pulse oximeters perform when the majority of observations are in children in a hypoxemic range. Previous work in children has been limited by small samples of observations in a hypoxemic range. However, these earlier manuscripts indicate SpO2 may systematically overestimate measured arterial oxygen saturation from ABG sampling.3,4 The SpO2 overestimation may also be dependent on location of measurement,4,5 perfusion to the extremity where the pulse oximeter is located, and skin color.6,7
The population of children with lower baseline oxygen saturations is growing. There are many children with cyanotic congenital heart disease (CCHD) undergoing and surviving palliative procedures. In addition, practitioners frequently manage children with acute hypoxemic respiratory failure (AHRF) without arterial catheters.8 In these children, practitioners often increase ventilator support and fraction of inspired oxygen when the oxygen saturation falls out of an acceptable range (<87%). Although some manufacturers have developed pulse oximeters with hypoxemic measurements in mind (Masimo Blue Sensors, Nellcor LoSat Sensors), these sensors are not routine in most hospitals.
The primary objective of this study was to determine the performance of pulse oximetry for children in the range of 65% to 97% compared with arterial oxygen saturation measurements from CO-oximetry (SaO2). To overcome limitations with a single measure of bias or precision over the entire range of pulse oximetry, the mean bias, local bias, precision, and Arms are reported. The secondary objective was to explore clinical scenarios in which pulse oximetry may be less reliable.
A prospective, observational study in 5 US multidisciplinary PICUs was conducted from August 2009 to October 2010. The participating PICUs were Children’s Hospital Los Angeles, Penn State Children’s Hospital, University of Virginia Children’s Hospital, Monroe Carell Children’s Hospital, and Cohen Children’s Medical Center of New York. Patients were eligible if they were intubated, mechanically ventilated, had an arterial catheter, had SpO2 values ≤97%, and were ≥37 weeks’ gestational age and <18 years. Patients were excluded if they were receiving extracorporeal membrane oxygenation or were not on a fully supported mode of ventilation. Continuous pulse oximeter recordings were standard of care in all PICUs. Patient demographics were obtained on enrollment and included age, race/ethnicity, gender, weight, and diagnosis. Patients were identified as to whether they had CCHD or AHRF. All participating PICUs had institutional review board approval with waiver of written consent.9
The decision to obtain an ABG sample with CO-oximetry was left to the primary care team. The bedside provider was trained by a member of the research team before any data collection. Sensor cleanliness and position were verified before obtaining the ABG. The SpO2 value was prospectively documented at the precise time the ABG was obtained (ABG/SpO2 pair). ABG/SpO2 pairs were not recorded if SpO2 was >97% or there was endotracheal tube suctioning in the 10 minutes before or ventilator changes in the 30 minutes before the ABG. The ABG/SpO2 pair was not recorded if the pulse oximeter value was not stable before the ABG or if there was concern about the waveform quality. For inclusion in the analysis, SaO2 was measured using CO-oximetry. The blood gas machines used in the study were the ABL800 (Radiometer Medical Aps, Brønshøj, Denmark), Rapidlab 1265 (Siemens Healthcare, Erlangen, Germany), and Gem 3000 (Instrumentation Laboratory, Lexington, MA).
Additional information recorded with each ABG/SpO2 pair included temperature, capillary refill in the extremity with the sensor, hemoglobin, pulse oximeter type, end tidal carbon dioxide, and degree of ventilator support. If the patient had CCHD with a physiology that was dependent on flow through a patent ductus arteriosus and there was a difference in pre- and postductal oxygen saturation, the protocol required that the pulse oximeter sensor and the arterial line were on the same side of the ductus. Ventilator information included peak inspiratory pressure, positive end expiratory pressure, pressure support, rate, exhaled tidal volume, mean airway pressure, fraction of inspired oxygen, and mode. Hemoglobin values were recorded from laboratory samples, and when unavailable, they were obtained from the ABG. Masimo pulse oximeters with the LCNS line of probes (Masimo Corporation, Irvine, CA) were used by 2 PICUs and Nellcor pulse oximeters with the OxiMax line of probes (Covidien-Nellcor, Boulder, CO) were used by 2 PICUs. One PICU used Masimo oximeters with the Nellcor OxiMax line of disposable probes.
To describe pulse oximeter performance, it is helpful to review a few commonly used statistical terms: bias, precision, accuracy, and Arms. Bias is SpO2 – SaO2, with SaO2 measured via CO-oximetry from an ABG serving as the reference standard. Mean bias is the average SpO2 – SaO2 using all study observations, over the entire range, and is influenced by where the preponderance of pulse oximetry measurements lies. Local bias is mean SpO2 – SaO2 over a specific range of SpO2. Precision is typically 1 SD above and below the mean bias, describing how much random error is in the data. Methods to calculate precision require the bias to be normally distributed, which may not always be true. The limits of agreement are usually taken as 1.96 times the SD unless there is correction for repeated measures. The term “accuracy” is a measure of how far a value is from a reference standard. However, pulse oximeter companies generally present Arms, which is required by agencies that regulate pulse oximeters. The Arms combines the components of bias and precision, controlling for the number of samples obtained. Arms of ≤3% is required by the FDA.
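The terms defined above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code (the authors used Stata and Statistica); the function name `oximeter_agreement` and the sample values are assumptions made up for the example, and the Arms line uses the usual root-mean-square definition of the paired differences.

```python
import math
import statistics


def oximeter_agreement(spo2, sao2):
    """Summarize agreement between paired SpO2 and SaO2 readings (in percent)."""
    if len(spo2) != len(sao2) or len(spo2) < 2:
        raise ValueError("need at least two paired readings")
    # Bias for each pair is SpO2 - SaO2, with SaO2 as the reference standard.
    diffs = [p - a for p, a in zip(spo2, sao2)]
    mean_bias = statistics.fmean(diffs)
    precision = statistics.stdev(diffs)  # sample SD of the paired differences
    # Arms: root mean square of the differences (combines bias and precision).
    arms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_bias": mean_bias,
        "median_bias": statistics.median(diffs),
        "precision_sd": precision,
        "arms": arms,
        # Limits of agreement without any correction for repeated measures.
        "limits_of_agreement": (mean_bias - 1.96 * precision,
                                mean_bias + 1.96 * precision),
    }


# Made-up illustrative pairs, not study data.
summary = oximeter_agreement([85, 90, 95, 82], [80, 88, 95, 75])
```

Because the limits of agreement here ignore repeated measures within a patient, they would be narrower than limits corrected for clustering; the sketch is only meant to make the definitions concrete.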
The primary objective of this study was to evaluate the accuracy of simultaneous samples of SpO2 compared with SaO2 obtained by CO-oximetry throughout a range of SpO2 values. Additional objectives were to identify variables that may affect the bias of SpO2 compared with SaO2 in the hypoxemic range. Statistical analysis was performed by using Stata version 10 (StataCorp, College Station, TX) and Statistica version 9 (Statsoft, Tulsa, OK).
Overall bias was assessed with a scatter plot of SpO2 against SaO2. Local bias was assessed via a box-and-whisker plot examining the difference between SpO2 and SaO2 in 7 SpO2 ranges. To explore inaccuracy, a bar graph was generated of counts of ABG/SpO2 pairs in which the absolute value of (SpO2 – SaO2) is ≤3% and >3% as a function of SpO2 range. Precision was reported with SD. However, because residuals were not normally distributed, median and interquartile range of (SpO2 – SaO2) as a whole and for each SpO2 range were reported. Arms was calculated, as a whole and for each SpO2 range, from the formula $A_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(SpO_{2,i} - SaO_{2,i})^{2}}$, where n is the number of ABG/SpO2 pairs.
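The per-range summaries described above can be sketched as follows. The bin edges in `SPO2_RANGES` are an assumption: the text names 7 ranges but lists only some of them explicitly, so the remaining edges are inferred. The function name, the dictionary keys, and the demo values are hypothetical and are not study data.

```python
import statistics

# Assumed bin edges: seven SpO2 ranges consistent with those named in the text.
SPO2_RANGES = [(65, 70), (71, 75), (76, 80), (81, 85),
               (86, 90), (91, 95), (96, 97)]


def summarize_by_range(pairs, threshold=3):
    """pairs: list of (spo2, sao2) in percent. Returns one dict per non-empty range."""
    rows = []
    for low, high in SPO2_RANGES:
        # Local bias uses only the pairs whose SpO2 falls in this range.
        diffs = [spo2 - sao2 for spo2, sao2 in pairs if low <= spo2 <= high]
        if not diffs:
            continue
        rows.append({
            "range": f"{low}-{high}",
            "n": len(diffs),
            "median_bias": statistics.median(diffs),
            # Counts behind the bar graph: |SpO2 - SaO2| <= 3% versus > 3%.
            "within": sum(abs(d) <= threshold for d in diffs),
            "outside": sum(abs(d) > threshold for d in diffs),
        })
    return rows


# Made-up illustrative pairs, not study data.
demo_pairs = [(83, 77), (84, 80), (85, 84), (94, 93), (96, 96)]
demo_rows = summarize_by_range(demo_pairs)
```

Binning on SpO2 rather than SaO2 mirrors the clinical situation, in which only the pulse oximeter value is known at the bedside.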
To evaluate the potential influence of clinical variables on pulse oximetry bias, the ABG/SpO2 pairs were separated into groups based on absolute value of the bias of ≤3% or >3%. This separation was used to perform a multivariate logistic regression analysis using a mixed model to examine the effect of other potential confounding variables and to control for repeated measures per patient. The confounding variables included disease category, gender, oximeter and sensor type, capillary refill in the extremity with the pulse oximeter, hemoglobin, temperature, and degree of ventilator support. Confounders were considered for model inclusion if they had a univariate relationship with the outcome (P < .2) or if there was strong biologic plausibility to suspect that they may influence the relationship between SpO2 and SaO2 (eg, capillary refill).
Finally, to explore whether multiple measurements per subject biased the results, data were filtered to randomly extract a single ABG/SpO2 pair from each patient that fell within the SpO2 ranges previously identified. Values for bias and precision (SD) using only 1 sample per patient per range were calculated and compared with the entire data set.
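One way to sketch this filtering step is shown below. It is an illustration under stated assumptions, not the authors' procedure: the record layout `(patient_id, spo2, sao2)`, the function names, the fixed seed, and the bin edges (inferred from the ranges named in the text) are all hypothetical.

```python
import random

# Assumed bin edges: seven SpO2 ranges consistent with those named in the text.
SPO2_RANGES = [(65, 70), (71, 75), (76, 80), (81, 85),
               (86, 90), (91, 95), (96, 97)]


def range_label(spo2):
    """Return the label of the SpO2 range containing spo2, or None if outside all."""
    for low, high in SPO2_RANGES:
        if low <= spo2 <= high:
            return f"{low}-{high}"
    return None


def one_pair_per_patient_per_range(records, seed=0):
    """records: list of (patient_id, spo2, sao2).

    Keeps one randomly chosen record for each (patient, SpO2 range) combination.
    """
    rng = random.Random(seed)  # seeded so the extraction is reproducible
    groups = {}
    for record in records:
        label = range_label(record[1])
        if label is not None:
            groups.setdefault((record[0], label), []).append(record)
    return [rng.choice(group) for group in groups.values()]


# Made-up illustrative records, not study data.
demo_records = [("p1", 83, 77), ("p1", 84, 80), ("p1", 94, 93),
                ("p2", 85, 84), ("p2", 99, 98)]
subset = one_pair_per_patient_per_range(demo_records)
```

Bias and SD computed on `subset` can then be compared with the values from the full data set; close agreement suggests that patients contributing many pairs are not driving the result.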
Two hundred twenty-five children were enrolled with 1980 ABG/SpO2 pairs. The median number of ABG/SpO2 pairs per patient was 5, interquartile range (2–10), range (1–110). Demographics separated by CCHD or AHRF are shown in Table 1. The source and results of the ABG/SpO2 pairs including ventilation information are shown in Table 2. The children with CCHD were younger, weighed less, had lower oxygen saturations, had higher hemoglobin values, and required less ventilator support compared with the group with AHRF (all Ps < .001). The 122 children with CCHD accounted for 1175 ABG/SpO2 pairs with a median SpO2 of 82% and a median SaO2 of 77%, with 82% of the ABG/SpO2 pairs having a SpO2 ≤90%. The 103 children with AHRF accounted for 805 ABG/SpO2 pairs with a median SpO2 of 95% and a median SaO2 of 94%, with 14% of the ABG/SpO2 pairs having a SpO2 ≤90%.
Entire Range of SpO2
Values for SpO2 are plotted against SaO2 for all ABG/SpO2 pairs in Fig 1. A total of 1304 of the 1980 samples (66%) had a positive bias (SpO2 –SaO2 > 0). Although the data were not normally distributed, mean and SD are presented to be consistent with other studies. For the entire SpO2 range 65% to 97% the mean bias was 3.3%, median 2%, the precision (SD) 5.6%, interquartile range (0%–6%), and the Arms 6.5% (Table 3).
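As a rough consistency check on these figures, the mean square of the paired differences splits into the squared mean bias plus the variance. The identity is exact when the SD divides by n and a close approximation for a sample SD with 1980 pairs. The symbols $d_i$, $\bar{d}$, and $s$ are notation introduced here for the check, not the article's.

```latex
\[
A_{rms}^{2} = \frac{1}{n}\sum_{i=1}^{n} d_i^{2} = \bar{d}^{\,2} + s^{2},
\qquad d_i = SpO_{2,i} - SaO_{2,i}
\]
\[
A_{rms} \approx \sqrt{3.3^{2} + 5.6^{2}} = \sqrt{10.89 + 31.36} = \sqrt{42.25} = 6.5
\]
```

The result matches the reported Arms of 6.5%, which illustrates how Arms folds bias and precision into a single number.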
Smaller Ranges of SpO2
There were significant differences in bias, precision (SD), and accuracy based on SpO2 range. The local bias (SpO2 – SaO2) was lowest in the SpO2 ranges of 65% to 70% and 96% to 97% with median values of 0% and greatest in the SpO2 range of 81% to 85% with a median value of 6% (Fig 2). Precision (SD) and Arms were worst in the SpO2 range of 81% to 85%, with poor accuracy in all ranges <91% (Table 3).
The data were filtered to randomly extract a single ABG/SpO2 pair from each patient that fell within the SpO2 ranges. Five hundred twenty-six ABG/SpO2 pairs remained, with a mean bias of 2.8%, median 2%, precision (SD) 5.6%, interquartile range (–1% to 6%), and Arms 6.3% (nearly identical to values including repeated measurements). Findings were also similar within each of the SpO2 ranges (analysis not shown).
The ABG/SpO2 pairs were separated into 2 groups based on whether the absolute value of the bias was ≤3% or >3% (Table 4). For ABG/SpO2 pairs in which the bias was >3%, children with CCHD accounted for 78% of the samples (719 of 922) compared with children with AHRF (P < .001). The proportion of instances when the bias is >3% is highest in the midrange of SpO2 (75%–90%; Fig 3).
From multivariate modeling, variables associated with a higher likelihood of bias >3% include CCHD, prolonged capillary refill, and having a SpO2 of 81% to 85%, 86% to 90%, or 91% to 95% (compared with a SpO2 of 96% to 97%; Table 5). Variables associated with a lower likelihood of bias >3% were African American race/ethnicity, male gender, and the combination of Masimo oximeter with a Nellcor sensor. After controlling for these variables, mean airway pressure, hemoglobin, PICU site, temperature, fraction of inspired oxygen, age <2 months, other races/ethnicity, and other oximeter combinations did not contribute to the model.
Because the majority of observations in the lower ranges of SpO2 came from children with CCHD, a second multivariate model was performed restricted to only the subgroup of children with CCHD. Overall results were similar with a higher likelihood of bias associated with prolonged capillary refill and SpO2 ranges of 81% to 85% and 86% to 90%. Furthermore, a lower likelihood of bias was associated with African American race/ethnicity and male gender.
There is significant variability in the bias, precision (SD), and accuracy of pulse oximetry as a function of SpO2. SpO2 on average overestimates SaO2 measured with CO-oximetry. This local bias is most positive in the SpO2 range from 81% to 85%, although it is present throughout the SpO2 range of 76% to 90%. The precision (SD) and accuracy are also poor in this range. Although the median bias is low in the SpO2 range from 65% to 75%, there is poor precision (SD) and accuracy. This variability in bias, precision, and accuracy remains after controlling for capillary refill, diagnosis, pulse oximeter, gender, and race/ethnicity.
These findings may have significant clinical implications. For example, if the SpO2 is measured at 85%, on average the SaO2 would be 79%, and 50% of the time the SaO2 would lie between 75% and 83%. However, the range needed to capture 95% of SaO2 values is 64% to 89%. In contrast, if the SpO2 is measured at 94%, on average the SaO2 would be 93%; 50% of the time the SaO2 would lie between 90% and 95%, and 95% of the time it would lie between 89% and 99%. In the lowest SpO2 range of 65% to 70%, the median bias is close to zero, but the poor precision (SD) and accuracy could affect clinical care. Physicians caring for CCHD patients often make different treatment decisions based on whether the SpO2 is 65% vs 70%. Improved accuracy in this range could help decision-making.
The increased bias and poor precision in the lower SpO2 range may add insight into recent findings from neonatology demonstrating potential harm with lower oxygen saturations.10–12 Three large multicenter randomized trials have targeted ranges of SpO2 to prevent retinopathy of prematurity. Two of the 3 trials demonstrated increased mortality in the oxygen saturation group from 85% to 89%, compared with 91% to 95%.10,12 The increased bias when SpO2 is <90% may result in SaO2 values that are much lower than anticipated.
The multivariate model shows that pulse oximetry performs less well in children with CCHD and prolonged capillary refill. Children with CCHD were the largest proportion in the SpO2 range from 81% to 85%, although this range remained the most likely for bias even when restricting the analysis only to children with CCHD. Therefore, it appears the inaccuracy is most related to the range of SpO2 and not the presence of CCHD. Prolonged capillary refill makes physiologic sense because greater bias is anticipated with poorer perfusion.3 Age <2 months showed a univariate effect, but this was not supported by the regression model. Age was meant as a surrogate for fetal hemoglobin, which may also be partially captured by other elements of the multivariate model. African American race/ethnicity and male gender were associated with lower likelihood of bias of the pulse oximeter. It is unclear what conclusions can be drawn from this. Race/ethnicity but not degree of skin color was recorded. There is no clear biological mechanism in which gender should have an impact on pulse oximetry performance. It is possible these are surrogates for an unmeasured confounding variable.
The findings of this study are consistent with other, smaller studies. Das et al5 in 2010 studied pulse oximeter bias based on sensor location. Unfortunately, only 8 children had oxygen saturations <90%. However, in their small sample size, SpO2 was always greater than SaO2. Sedaghat-Yazdi et al4 in 2008 studied pulse oximeter bias based on sensor location. However, only 24 samples had SaO2 <90%. In 2004, Torres et al3 showed poor pulse oximeter accuracy with 77 SaO2 samples <90%. Our study echoes these findings and has the largest sample size to date by an order of magnitude.
Although pulse oximeters are marketed as meeting standards of accuracy throughout a range of oxygen saturations, this study highlights the need for manufacturers to present more data than the Arms for the entire range. Because mean bias and Arms will be influenced by the large number of samples with SpO2 values >91%, where pulse oximeters perform best, a single number for accuracy does not tell the whole story. Furthermore, accuracy of pulse oximetry algorithms is often demonstrated using healthy adult volunteers breathing a hypoxic gas mixture, which is not the clinical environment in which they are used.
The local bias, precision (SD), and accuracy values presented are much larger than anticipated. The data for this study were acquired in a clinical setting, in the context of a reasonably controlled research study. Accuracy of pulse oximetry algorithms is generally demonstrated in a laboratory setting using healthy adult volunteers breathing a hypoxic gas mixture. This study was unable to meet the rigorous conditions of a research laboratory setting,13 but the findings highlight the real-world clinical application of this scenario. Although it is not reasonable to generate hypoxia in children to produce an improved algorithm for pulse oximetry, there are pediatric cardiac surgery centers with large numbers of patients with CCHD. Perhaps future development could occur with a population such as this to generate algorithms that are more accurate for all children.
There are limitations to this study. First, the multicenter nature of the study increases generalizability, but the results may be confounded by institutional variation. For example, in 1 hospital, a sensor was used that was different from the manufacturer of the oximeter machine, making it difficult to determine the source of inaccuracy. However, this is a real-world situation, and results from that hospital were not large enough to skew the overall trend. Subgroup analysis by pulse oximeter brand was not performed because the majority of samples were measured with 1 oximeter brand. However, there was similar SpO2 overestimation in the second manufacturer, and in that subgroup, this sample size is as large as any previously reported. Moreover, pulse oximetry type (except the combination of Masimo oximeter and Nellcor sensor) was not significant in the multivariate model.
A second limitation is that the data were collected using a waiver of consent. In turn, no patient-identifying information was retained, preventing us from gathering variables not initially included in the study. Such examples include perfusion index, location of pulse oximeter probe, amount of fetal hemoglobin, presence of other hemoglobin species, use of inotropic medications, presence of a patent ductus arteriosus, and pulse pressure. These are potential confounding variables that were not captured. These areas should be investigated in future studies with pulse oximetry.
A third limitation is that fetal hemoglobin was not measured. Pulse oximeter algorithms are generated from adult volunteers with the assumption of 2 types of hemoglobin: oxyhemoglobin and deoxyhemoglobin. They may not perform well with carboxyhemoglobin, methemoglobin, or possibly fetal hemoglobin. There are limited studies showing the performance of pulse oximeters in the presence of fetal hemoglobin,14,15 with conflicting results. Although it is possible that fetal hemoglobin has an impact on pulse oximetry, it is unlikely to explain all of the findings, given that results are similar when restricting to only older patients (Supplemental Information). Furthermore, there was no difference in oxygen dissociation curves based on age ≤2 months or >2 months (Supplemental Information), supporting the idea that the findings are not solely explained by fetal hemoglobin.
Pulse oximeters appear to be accurate against CO-oximetry in a higher SpO2 range such as >91%. However, in the lower SpO2 range (76%–90%), the local bias (SpO2 – SaO2) is ∼5% and the precision (SD) and accuracy are poor when SpO2 is <90%. The manufacturers of these devices should improve algorithms in this range.
The authors thank those people who organized and participated in the research study: “Comparison of SpO2 to PaO2 Based Markers of Lung Disease Severity for Children With Acute Lung Injury,” which provided the data for this article: Neal J. Thomas, MD, MsC, Vani Venkatachalam, MD, Jason P. Sciememe, MD, Ty Berutti, MD, MS, James B. Schneider, MD, Douglas F. Willson, MD, Mark W. Hall, MD, Debbie Spear, Jill Raymond, Christine Traul, Jeff Terry, and Paul Vee.
- Accepted October 3, 2013.
- Address correspondence to Patrick A. Ross, MD, Department of Anesthesiology Critical Care Medicine, 4650 Sunset Blvd Mailstop 12, Children’s Hospital Los Angeles, Los Angeles, CA 90027. E-mail:
Dr Ross participated in data collection locally, performed data analysis, and drafted and revised the manuscript for important intellectual content; Dr Newth conceptualized and designed the study and reviewed and revised the manuscript for important intellectual content; Dr Khemani conceptualized and designed the study, coordinated the data collection at 5 sites, participated in data collection locally, performed data analysis, and drafted and revised the manuscript for important intellectual content; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: No external funding.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- International Organization for Standardization (ISO). Particular requirements for basic safety and essential performance of pulse oximeter equipment (ISO 80601-2-61:2011). Geneva, Switzerland: ISO; 2013
- Center for Devices and Radiological Health. Pulse oximeters: premarket notification submissions [510(k)s]. (document issued March 4, 2013). Food and Drug Administration; 2013. Available at: www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM081352.pdf. Accessed November 1, 2013
- Torres A Jr, Skender KM, Wohrley JD, et al
- Khemani RG, Markovitz BP, Curley MA
- Khemani RG, Thomas NJ, Venkatachalam V, et al., Pediatric Acute Lung Injury and Sepsis Network Investigators (PALISI)
- Schmidt B, Whyte RK, Asztalos EV, et al., Canadian Oxygen Trial (COT) Group
- Copyright © 2014 by the American Academy of Pediatrics