BACKGROUND AND OBJECTIVES: Discharging patients from the NICU may be delayed for nonmedical reasons including the need for medical equipment, parental education, and children’s services. We describe a method to predict which patients will be medically ready for discharge in the next 2 to 10 days, providing lead time to address nonmedical reasons for delayed discharge.
METHODS: A retrospective study examined 26 features (17 extracted, 9 engineered) from daily progress notes of 4693 patients (103 206 patient-days) from the NICU of a large, academic children’s hospital. These data were used to frame a supervised machine learning problem: predicting days to discharge (DTD). Random forest classifiers were trained by using the examined features and International Classification of Diseases, Ninth Revision–based subpopulations to determine the most important features.
RESULTS: Three of the 4 subpopulations (premature, cardiac, gastrointestinal surgery) and all patients combined performed similarly at 2, 4, 7, and 10 DTD with area under the curve (AUC) ranging from 0.854 to 0.865 at 2 DTD and 0.723 to 0.729 at 10 DTD. Patients undergoing neurosurgery performed worse at every DTD measure, scoring 0.749 at 2 DTD and 0.614 at 10 DTD. This model was also able to identify important features and provide “rule-of-thumb” criteria for patients close to discharge. By using DTD equal to 4 and 2 features (oral percentage of feedings and weight), we constructed a model with an AUC of 0.843.
CONCLUSIONS: Using clinical features from daily progress notes provides an accurate method to predict when patients in the NICU are nearing discharge.
- A&B: apnea and bradycardia event
- AUC: area under the curve
- DTD: days to discharge
- GI: gastrointestinal
- ICD-9: International Classification of Diseases, Ninth Revision
- LOS: length of stay
- NS: neurosurgical
- RF: random forest
What’s Known on This Subject:
Discharge from the NICU requires coordination and may be delayed for nonmedical reasons. Predicting when patients will be medically ready for discharge can avoid these delays and result in cost savings for the hospital.
What This Study Adds:
We developed a supervised machine learning approach using real-time patient data from the daily neonatology progress note to predict when patients will be medically ready for discharge.
Approximately 4 million babies are born every year in the United States, and about 11% (∼440 000) of them are born prematurely.1 Caring for infants in the NICU poses a significant financial burden to the health care system, with an estimated total cost of $26 billion.1 The cost per day of NICU care can be several thousand dollars; therefore, discharging these infants as soon as they are medically ready is critical to controlling expenditures.
Delayed discharge of hospitalized patients who are medically ready is a common occurrence often linked to dependency and the need to provide postdischarge services.2 In older adults, difficulties in coordinating postdischarge services, lack of anticipation of discharge, and absence of caregivers at home were associated with delayed discharge of medically ready patients.3 Similarly, discharging a patient from the NICU usually requires a great deal of coordination. Neonates discharged from the NICU are prime examples of patients with dependencies (on parents and caregivers) and significant postdischarge needs such as primary care, specialists, physical and speech therapy, neonatal follow-up appointments, home equipment services, and home nursing. In cases of intrauterine drug exposure, discharge is often dependent on Child Protective Services approval. Parents have to demonstrate their ability to operate medical equipment, administer home medication, and feed and care for their medically fragile infant. In addition, a number of services must be scheduled around the time of discharge, such as hearing screens, car seat tests, immunizations, repeat state screens, and eye examinations. All these requirements can delay the discharge of a patient who is medically ready and, consequently, unnecessarily increase the cost of hospitalization.
The goal of this project is to build a predictive model to identify patients who are close to discharge from a medical perspective so staff can be alerted to impending discharges. Doing so will allow the nonmedical factors to be addressed in advance to ensure that the patient’s discharge is not delayed.
Almost all previous studies attempt to predict length of stay (LOS) using clinical and diagnostic information at or near the time of admission.4–7 Although it is important to pursue LOS prediction to understand total hospitalization costs, these methods lack sufficient clinical context to accurately predict the discharge date. Instead, the focus of this research project is to identify, based on the most recent clinical data, which patients in the NICU are likely to be discharged from the hospital in the next 2 to 10 days. Our method predicts the upcoming discharge date, not the LOS from time of admission.
To prevent delayed discharge, 3 questions will be answered. First, can the discharge date for a patient in the NICU be accurately predicted? Second, what combinations of clinical data improve predictive accuracy? Third, are there simple, “rule-of-thumb” factors that are responsible for a substantial fraction of the prediction accuracy?
Because of the potential impact on cost savings, predicting the LOS for patients in the NICU has been well studied. Most of the following prediction methods were performed at or near the time of admission. Powell et al8 found gestational age, low birth weight, and respiratory difficulties to be most predictive of LOS. Bannwart et al9 developed 2 models to predict the LOS for patients in the NICU. The first model considered only risk factors present in the first 3 days of life, whereas the second model used factors present during the entire hospitalization.
Despite the use of models incorporating multiple diagnostic factors at the time of admission and during the hospitalization, the accuracy of these models varied significantly, making LOS prediction difficult. Studying the Canadian NICU Network, Lee et al10 found that “significant variation in NICU practices and outcomes was observed despite Canada’s universal health insurance system.” Using data from the California Perinatal Quality Care Collaborative, Lee et al11 reported “wide variance in LOS by birth weight, gestational age, and other factors.”
In 2012, Levin et al12 described a real-time model to forecast LOS in a PICU by using physician orders from a provider order entry system. This model used physician orders (not diagnostic data) to provide a cumulative probability of discharge from the PICU over the next 72 hours. Counts of medications by administration route (injected, infused, or enteral) were more significant in predicting discharge from the PICU than the types of medication the patient received. Activity, diet (regular diet vs parenteral nutrition) and mechanical ventilation orders were highly predictive of remaining in the PICU over the next 72 hours.
It was our hypothesis that using a real-time data source that reflects orders, physiologic data, and diagnostic information will allow improved NICU discharge prediction.
In contrast to LOS models that are performed at the time of admission, our model is updated daily with the most recent progress note data. The calculated probability of discharge may, in the future, be displayed in the electronic medical record.
Patients and Setting
We conducted a retrospective study of all patients admitted to the NICU at a large academic medical center from June 2007 to May 2013.
All patients admitted to the NICU were considered for the study. Patients who were back-transferred to another facility or who died during their NICU hospitalization were excluded from the analysis. Also excluded from the analysis were patients with any missing daily neonatology progress notes.
Data Collection and Extraction
A large database containing all daily progress notes written by neonatology attending physicians was made available to the investigators. The data from the progress notes were in a semistructured text format that was extracted through regular expressions in Python version 2.7.3 (Python Software Foundation, Beaverton, OR) and SQL. In addition, these data were cross-referenced with the enterprise data warehouse to obtain basic patient information such as date of birth and International Classification of Diseases, Ninth Revision (ICD-9) codes used for billing during the hospitalization.
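The extraction step can be illustrated with a minimal sketch. The article does not show the layout of the progress notes, so the sample note, field labels, and pattern names here are hypothetical and serve only to show how regular expressions can pull values out of semistructured text.

```python
import re

# Hypothetical note layout: the real progress-note format is not shown in the article.
NOTE = """Weight: 1.62 kg
PO intake: 120 mL
Tube intake: 40 mL
A&B events (24h): 0"""

# One pattern per feature; each captures the numeric value after a field label.
PATTERNS = {
    "weight_kg": re.compile(r"^Weight:\s*([\d.]+)\s*kg", re.MULTILINE),
    "oral_ml": re.compile(r"^PO intake:\s*([\d.]+)\s*mL", re.MULTILINE),
    "tube_ml": re.compile(r"^Tube intake:\s*([\d.]+)\s*mL", re.MULTILINE),
    "ab_events": re.compile(r"^A&B events \(24h\):\s*(\d+)", re.MULTILINE),
}


def extract_features(note_text):
    """Return a dict of numeric features; a field that is not found maps to None."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note_text)
        features[name] = float(match.group(1)) if match else None
    return features


features = extract_features(NOTE)
```

Returning None for a field that is absent keeps missing data explicit, so that incomplete rows can be identified later.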
The clinical features used in our model fell into 4 main categories: quantitative, qualitative, engineered, and derived subpopulations. Thirteen features were obtained directly from data contained in the daily progress notes. These extracted features were classified as quantitative (values fell within a range) and qualitative (assigned a value of 0 or 1). Nine features were engineered from the extracted data. These engineered features do not actually exist as data in the progress note but were derived from the extracted data. For example, progress notes contain information on the number of apnea and bradycardia events (A&Bs) in the last 24 hours. The engineered feature from these data was the number of days since the last A&B.
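As an illustration of the A&B example, the following sketch derives the number of days since the last A&B from a chronological list of daily event counts. The function name and the choice to return None before any event has been recorded are assumptions of this sketch, not details taken from the study.

```python
def days_since_last_ab(daily_ab_counts):
    """For each hospital day, return the number of days since the last A&B.

    daily_ab_counts: A&B events per day, in chronological order.
    Days before any recorded event get None (an assumption of this sketch).
    """
    result = []
    last_event_day = None
    for day, count in enumerate(daily_ab_counts):
        if count > 0:
            last_event_day = day
        result.append(None if last_event_day is None else day - last_event_day)
    return result


example = days_since_last_ab([0, 2, 0, 0, 1, 0])
```

In this toy sequence the counter resets to 0 on each day with an event and increases by 1 on each event-free day that follows.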
Additionally, a neonatologist (C.U.L.) reviewed 138 of the most frequently occurring ICD-9 codes in the NICU patient population to categorize patients into 4 subpopulations: prematurity, cardiac disease, gastrointestinal (GI) surgical disease, and neurosurgical (NS) disease (see the Appendix for a list of ICD-9 codes and categories). A single patient could belong to 1, many, or none of the subpopulations. Table 1 contains a list of all features used in the model.
All extracted data, subpopulation categories, engineered features, and days to discharge (DTD) were inserted into a matrix. Each row represented data for 1 hospital day for a specific patient. If a row contained missing data in any field, the entire row was excluded from the final matrix.
Because the matrix is constructed using historical data, the outcome of interest (discharge date) is known. The DTD column contains the number of hospital days until the patient is discharged. For example, if the patient was discharged on March 15, the row of the matrix containing patient features for March 10 would have a DTD of 5 (Fig 1).
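The DTD calculation in this example amounts to a date subtraction. A minimal sketch follows; the function name is illustrative, and the year is an assumption because the example gives only the month and day.

```python
from datetime import date


def days_to_discharge(note_date, discharge_date):
    """Number of hospital days between a progress note and the known discharge date."""
    return (discharge_date - note_date).days


# Year 2013 is assumed for illustration only.
dtd = days_to_discharge(date(2013, 3, 10), date(2013, 3, 15))
```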
A supervised machine learning approach using a random forest (RF) classifier in Python’s scikit-learn module (version 0.15.2)13 was used to analyze the data, engineer important features, and build a predictive model. An RF constructs many binary decision trees that branch based on randomly chosen features. The RF in scikit-learn uses an optimized Classification and Regression Trees (CART) algorithm for constructing binary trees by using the input features and values that yield the largest information gain at each node. The scikit-learn package allows the selection of either Gini impurity or entropy as the split criterion, which in turn determines feature importance. The 2 criteria performed similarly, and we chose Gini impurity because it is slightly more robust to misclassifications. We ran the models using many different combinations of parameters, and the best-performing models used an RF with 100 trees, a maximum tree depth of 10, and a minimum of 200 samples per split.
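A minimal sketch of this configuration is given here, assuming a current scikit-learn release rather than version 0.15.2 and using synthetic stand-in data, not study data. Only the 3 hyperparameters and the Gini criterion come from the text; the data, variable names, and random seed are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data (NOT study data): column 0 drives the label, column 1 is noise.
rng = np.random.default_rng(0)
X = rng.random((2000, 2))
y = (X[:, 0] + 0.1 * rng.standard_normal(2000) > 0.8).astype(int)
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

# Hyperparameters reported in the text: 100 trees, depth 10, 200 samples per split.
clf = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_depth=10,
    min_samples_split=200,
    random_state=0,
)
clf.fit(X_train, y_train)

# Probability of the positive class, scored by AUC on the held-out half.
test_scores = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_scores)

# Impurity-based importances, one value per input column.
importances = clf.feature_importances_
```

The feature_importances_ attribute is one way to rank features after fitting; whether the study used exactly this attribute is not stated in the text.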
Models were trained with different combinations of subpopulations (all patients, premature, cardiac, GI surgical, and NS), DTD (2, 4, 7, and 10 days), and number of features (any combination of features from 2 to all 26).
To train our model, we converted the DTD variable into a binary outcome variable based on the number of days we were trying to model. For example, if we were training the model to predict when patients were 4 days from discharge, all values in the model where the DTD was not equal to 4 were set to 0. The rows in which the number of DTD was 4 were set to 1 (Fig 1). This same process was followed for 2, 7, and 10 DTD.
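The conversion to a binary outcome can be sketched as follows. The function name is illustrative, and the DTD values are a toy sequence for a single patient, not study data.

```python
def binarize_dtd(dtd_values, target_dtd):
    """Set rows whose DTD equals the target to 1 and all other rows to 0."""
    return [1 if dtd == target_dtd else 0 for dtd in dtd_values]


# Toy DTD column for the last 7 hospital days of one patient.
labels = binarize_dtd([6, 5, 4, 3, 2, 1, 0], target_dtd=4)
```

Because each patient has exactly 1 row at any given DTD, the positive class is rare, which is one reason AUC is a more informative measure here than raw accuracy.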
Each time a model was run, half of the patients (and all their associated daily rows) were randomly assigned to a training set, and the other half were assigned to the testing set. Because each patient provides only a single row at any given DTD, halving the data provided both the testing and training sets with an adequate number of rows at the DTD of interest. To achieve small enough standard deviations, the patients were randomly assigned 5 times for each model, and the area under the curve (AUC) for the receiver operating characteristic curve was obtained for the testing set. The reported AUC is the average of the 5 AUCs obtained after each round of randomization. Additionally, each time a model was run, the features used in the model were ranked in order of importance.
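The patient-level assignment can be sketched as follows. Splitting by patient rather than by row keeps all daily rows of a patient on the same side of the split. The function names, seeds, and toy rows are assumptions made for illustration.

```python
import random


def split_patients(patient_ids, seed):
    """Randomly assign half of the unique patients to training and half to testing."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    half = len(unique_ids) // 2
    return set(unique_ids[:half]), set(unique_ids[half:])


def split_rows(rows, train_ids):
    """Send every daily row to the set that its patient was assigned to."""
    train = [row for row in rows if row["patient_id"] in train_ids]
    test = [row for row in rows if row["patient_id"] not in train_ids]
    return train, test


# Toy matrix: 8 patients with 3 daily rows each.
rows = [{"patient_id": pid, "day": day} for pid in range(1, 9) for day in range(3)]

# Five rounds of random assignment, 1 per seed.
splits = []
for seed in range(5):
    train_ids, test_ids = split_patients([row["patient_id"] for row in rows], seed)
    train_rows, test_rows = split_rows(rows, train_ids)
    splits.append((train_ids, test_ids, train_rows, test_rows))
```

In a full run, a model would be fit on each training set and scored on the matching testing set, and the 5 resulting AUCs would then be averaged.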
We ran the model for all patients and for each subpopulation to determine how well the model performed, to choose the most important features for each group, and to determine whether different features had a greater impact on certain patient populations. Finally, the most important features at 2, 4, 7, and 10 DTD were evaluated to determine whether the most important features changed as a patient was getting closer to discharge.
Institutional Review Board Approval
The Institutional Review Board of Vanderbilt University approved this study.
The initial database consisted of 6302 patients (116 299 hospital days) admitted to the NICU between June 2007 and May 2013. There were 256 (4%) deaths during this time period. A total of 1154 (18%) patients were excluded because the database did not contain physician progress notes for every day of the hospital course. There were 199 (3%) patients back-transferred to other NICUs in the region. The final matrix consisted of 4693 (74%) unique patients, accounting for 103 206 (89%) hospital days with a mean LOS of 30 days. A total of 3689 (79%) patients were categorized into ≥1 subpopulations based on ICD-9 codes; the other 1004 (21%) patients did not have an ICD-9 code that matched our criteria (Fig 2).
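The cohort arithmetic can be checked with a short calculation that uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the text.
initial_patients = 6302
deaths = 256
missing_notes = 1154
back_transferred = 199

final_patients = initial_patients - deaths - missing_notes - back_transferred
pct_patients_kept = round(100 * final_patients / initial_patients)

initial_days = 116299
final_days = 103206
pct_days_kept = round(100 * final_days / initial_days)

# Patients with and without a matching ICD-9 subpopulation.
in_subpopulation = 3689
no_subpopulation = final_patients - in_subpopulation
pct_in_subpopulation = round(100 * in_subpopulation / final_patients)
```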
The average AUC for the model using all 26 features for all patients and each patient subpopulation is shown in Fig 3. Three of the 4 subpopulations (premature, cardiac, GI surgery) and all patients combined performed very similarly at 2, 4, 7, and 10 DTD, with AUCs ranging from 0.854 to 0.865 at 2 DTD and 0.723 to 0.729 at 10 DTD. The NS subpopulation performed worse on every DTD measure, scoring 0.749 at 2 DTD and 0.614 at 10 DTD (Fig 3). Repeating the random assignment 5 times provided a sufficiently narrow SD for the AUCs of ∼0.005 to 0.01.
The 9 most predictive features for each subpopulation were very similar, and their plots are shown in Fig 4. In each subpopulation, the combination of all features performed better than any single feature alone. Once again, the poorest-performing subpopulation included the NS patients.
In addition to analyzing the most important features for each subpopulation, we explored the best-performing features by the DTD. For each DTD (2, 4, 7, 10 days) the top 20 features in order of importance are shown in Table 2. The combination of all features performed best at each DTD, and model performance improved as patients moved closer to discharge.
We were able to use data from daily progress notes to predict impending discharge from the NICU accurately. Our model improved as more clinical information was included, and its prediction improved as the DTD became smaller (closer to discharge date). Three of the 4 subpopulations and all patients combined performed very similarly. The only population on which the model consistently underperformed was the NS population, for 2 possible reasons. First, the NS population was the smallest cohort by far, and therefore the model may not have had enough patients on which to train. Second, the NS population may be very different clinically from the other patients seen in the NICU, and their readiness for discharge may not be captured in the features extracted for this model.
When we broke the most important features down by subpopulation and DTD, the features remained surprisingly consistent across the subpopulations and DTD. This result was unexpected because we thought that different subpopulations of patients with different medical conditions would have different features that were important for discharge prediction. The top features centered on various feeding metrics, gestational age, and weight. Surprisingly, none of the metrics involving infused medications, caffeine use, A&Bs, or oxygen usage had a significant impact on the predictive power of the model.
Two interesting features are worth discussing. First, the percentage of oral feeds (ie, the oral amount divided by the sum of the oral amount and the tube-fed amount) was the best-performing or nearly the best-performing feature across populations and DTD values. For example, using this feature alone gives an AUC of 0.766 at 2 DTD. The second-best feature was the engineered feature of the number of days with oral feedings of >90%. At 10 DTD this feature ranks 20th in importance, but at 2 DTD it has advanced to third place. This indicates that taking most feedings orally instead of by tube is an important predictor of impending discharge.
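The 2 feeding features discussed here can be sketched as follows. The text does not say whether the number of days with oral feedings of >90% is a cumulative or a consecutive count, so this sketch assumes a cumulative count to date. The function names and toy values are also assumptions.

```python
def oral_percentage(oral_ml, tube_ml):
    """Oral amount divided by oral plus tube-fed amount, as a percentage."""
    total = oral_ml + tube_ml
    if total == 0:
        return None  # no enteral intake recorded that day
    return 100.0 * oral_ml / total


def days_with_oral_above(oral_pcts, threshold=90.0):
    """Running count of days on which the oral percentage exceeded the threshold.

    Assumption: the count is cumulative, not a consecutive-day streak.
    """
    counts = []
    running = 0
    for pct in oral_pcts:
        if pct is not None and pct > threshold:
            running += 1
        counts.append(running)
    return counts


pct_today = oral_percentage(oral_ml=120, tube_ml=40)
running_counts = days_with_oral_above([50.0, 92.0, 88.0, 95.0, 100.0])
```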
We used 26 features to predict with a high degree of accuracy which patients will be discharged from the hospital in the next 2 to 10 days. However, it may not always be practical or possible to incorporate all these features into a decision support tool that alerts staff to impending discharges. One beneficial aspect of our approach is the ability to identify and use the most important features to build a scaled-down but still highly predictive model.
A few simple “rule of thumb” models can be created to determine which patients are nearing discharge. For example, a very simple decision tree can be constructed from only 2 features (Fig 5). This tree is based on data from all patients, 2 features (oral percentage of feeds and weight), a DTD of 4 days, and a maximum tree depth of 3. The first branch of the tree splits the patients into 2 groups based on whether their oral percentage of feeds is >80%. On the right, the next differentiator is based on weight. If the patient weighs <1.5 kg, his or her probability of being discharged in the next 4 days is 0.23 (on a scale of 0–1). If the patient weighs between 1.5 and 1.7 kg, his or her probability for discharge in the next 4 days is 0.48. If the patient weighs >1.7 kg and takes >90% of his or her feeds orally, the patient has a 0.81 probability of being discharged in the next 4 days. The probabilities for discharge in 4 days for patients at different weights and taking <80% of their feeds orally are listed in the left-side branch.
This simple decision tree has an AUC of 0.843. Although it is not as accurate as using all features to obtain an AUC of 0.865, it is still an excellent predictor and can be easily calculated at the bedside.
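The right-side branch of this tree can be written as a small function. Only the 3 probabilities stated in the text are encoded. The left-side branch and the case of a weight of >1.7 kg with oral feeds between 80% and 90% are not given in the text, so the function returns None for them. How the exact boundary values (1.5 kg, 1.7 kg) are handled is an assumption of this sketch.

```python
def discharge_probability_4_dtd(oral_pct, weight_kg):
    """Rule-of-thumb probability of discharge at 4 DTD (right-side branch only).

    Returns None where the text does not state a probability.
    """
    if oral_pct <= 80:
        return None  # left-side branch: values appear only in Fig 5
    if weight_kg < 1.5:
        return 0.23
    if weight_kg <= 1.7:
        return 0.48
    if oral_pct > 90:
        return 0.81
    return None  # >1.7 kg with oral feeds of 80% to 90%: not stated in the text


p_small = discharge_probability_4_dtd(oral_pct=85, weight_kg=1.4)
p_medium = discharge_probability_4_dtd(oral_pct=85, weight_kg=1.6)
p_large = discharge_probability_4_dtd(oral_pct=95, weight_kg=2.0)
```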
It is interesting that using all 26 features yields an AUC of 0.865, whereas using only 2 features can yield an AUC of 0.843. This result illustrates just how important feeding and weight gain are to the improving health of a neonate.
One possible way to improve our current model’s performance would be to add more features. The use of trending data (eg, the average amount of feeding increase over a 5-day period) could be beneficial. Another consideration for model improvement would be to predict a range of days until discharge (eg, 3–5 days instead of just 4).
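Both proposed extensions can be sketched briefly. These are illustrations of the ideas only and were not part of the study. The function names, the definition of the trend as the mean day-over-day change, and the toy values are assumptions.

```python
def mean_daily_increase(values, window=5):
    """Average day-over-day change across the most recent `window` days."""
    recent = values[-window:]
    if len(recent) < 2:
        return None  # at least 2 days are needed to compute a change
    return (recent[-1] - recent[0]) / (len(recent) - 1)


def in_dtd_range(dtd, low=3, high=5):
    """Binary outcome for a range of days to discharge instead of a single day."""
    return 1 if low <= dtd <= high else 0


# Toy feeding volumes (mL) over 5 consecutive days.
trend = mean_daily_increase([100, 110, 125, 130, 150])
range_labels = [in_dtd_range(dtd) for dtd in [7, 6, 5, 4, 3, 2, 1, 0]]
```

A range-based outcome gives each patient several positive rows instead of 1, which would increase the number of positive examples available for training.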
There are several limitations to this study. First, some of the features used in the model are more difficult to obtain than others, and extracting certain features from commercial electronic medical record systems can be challenging.14 Second, the data extracted included pediatric- and neonatology-specific data, which were collected using specific pediatric functions built into Vanderbilt’s electronic health record. These functions may not be supported by all electronic health record systems.15,16 Third, categorizing hospitalized patients based on ICD-9 codes would be difficult because these codes are not usually available until after discharge. However, as the analysis showed, diagnosis categories added surprisingly little to the prediction model. Should we need our model to differentiate patients, admitting diagnoses could be used. Fourth, our sample could be biased because we excluded patients who were missing any progress notes. Although an RF does provide techniques to address missing data, we thought excluding these patients was a conservative and appropriate approach.
We trained the model by using actual discharge dates. This limitation worked against us because some of the patients in the data set may have been medically ready for discharge sooner. The model may have performed better if we had been able to determine and adjust for the patients who had delayed discharges for nonmedical reasons. Additionally, once fully implemented our model might predict discharge too early, which could result in premature expectations of parents and possible wasted effort.
Future work will have to include testing the model in different ways. First, we will analyze the model on a new data set, such as patient records obtained from June 2013 to the present. Second, once we finish operationalizing this model, we will collect provider feedback about a patient’s discharge potential during daily rounds. We will then compare those results with the prediction of our model to determine whether the providers or the machine learning model is more accurate.
A supervised machine learning approach using an RF classifier accurately predicts which patients will be discharged from the NICU in the next 2 to 10 days. Running our model daily with the most recent progress note data will identify which patients are close to being medically ready for discharge and may alert the clinical staff through indicators in the electronic medical record. This method would allow more timely discharge planning and has the potential to prevent delayed discharges for nonmedical reasons.
The authors appreciate the Research Derivative team at Vanderbilt University for their assistance in retrieving data. The publication described was supported by CTSA award No. UL1TR000445 from the National Center for Advancing Translational Sciences. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences or the National Institutes of Health.
- Accepted May 19, 2015.
- Address correspondence to Michael W. Temple, MD, Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End, Suite 1475, Nashville, TN 37203-8390.
Dr Temple drafted the manuscript, contributed to the data collection, analysis, and model development, reviewed and revised the manuscript, and prepared it for publication; Dr Lehmann assisted with the data collection, aided in the selection of relevant clinical features for the model, categorized ICD-9 codes for grouping patients into distinct populations, and reviewed and revised the manuscript; Dr Fabbri assisted with the data collection, analysis, and model development, contributed to the machine learning and statistical analysis of the data, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.
FINANCIAL DISCLOSURE: Dr Lehmann serves in a part-time role at the American Academy of Pediatrics. He has received royalties for the textbook Pediatric Informatics and travel funds from the American Medical Informatics Association, the International Medical Informatics Association, and the World Congress on Information Technology. Dr Fabbri has an equity interest in Maize Analytics, LLC. Dr Temple has indicated he has no financial relationships relevant to this article to disclose.
FUNDING: National Library of Medicine training grant 5T15LM007450-13.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- Szubski CR, Tellez A, Klika AK, et al
- Powell PJ, Powell CV, Hollis S, Robinson SJ
- Lee SK, McMillan DD, Ohlsson A, et al
- Sci-Kit Learn. 2014. Available at: http://scikit-learn.org/stable/index.html
- Kim GR, Lehmann CU, Council on Clinical Information Technology
- Copyright © 2015 by the American Academy of Pediatrics