Abstract
PURPOSE. The purpose of this work was to develop and assess the feasibility, reliability, and validity of a brief performance checklist to evaluate skills during a simulated neonatal resuscitation (“megacode”) for the Neonatal Resuscitation Program of the American Academy of Pediatrics.
METHODS. A performance checklist of items was created, validated, and modified in sequential phases involving: review by an expert committee; review and feedback from Neonatal Resuscitation Program instructors regarding feasibility and criticality; and use of the performance checklist by Neonatal Resuscitation Program instructors reviewing videotaped megacodes. The final 20-item performance checklist used a 3-point scale and was assessed by student and instructor volunteers. Megacode scores, Neonatal Resuscitation Program multiple-choice examination scores, student assessments of their own ability and performance, and sociodemographic descriptors for both students and instructors were collected. Data were analyzed descriptively. In addition, we assessed the internal consistency reliability of the megacode scores, the correlations between megacode and multiple-choice examination scores, and the variance in scores attributable to instructor and student characteristics.
RESULTS. A total of 468 students and 148 instructors volunteered for the study. The instrument was reliable and internally consistent. Students' scores were high on most items. There was a significant but low correlation between the megacode score and the written knowledge examination score. Instructor and student characteristics had little effect on the variance in scores.
CONCLUSIONS. This performance checklist provides a feasible assessment tool. There is evidence for its reliability and validity.
- neonatal resuscitation
- neonatology
- megacode
- clinical assessment
- simulation
- competence assessment
- educational outcome
Worldwide, ∼1 million newborn infants die each year because of birth asphyxia. In the United States, 400 000 newborns will require assistance with breathing, and ∼40 000 will need extensive resuscitation to survive. The Neonatal Resuscitation Program (NRP) is an educational program developed by the American Academy of Pediatrics and the American Heart Association to help physicians, nurses, and other health care professionals attain the knowledge and skills necessary for proficiency in neonatal resuscitation. The Textbook of Neonatal Resuscitation1 includes 7 lessons covering temperature management, assisted ventilation, chest compressions, endotracheal intubation, emergency medications, special circumstances, and ethics. Currently, there are 24 000 NRP instructors and 2 million providers in the United States, and at least another 500 000 providers have been trained worldwide through programs in >100 countries.2 Despite the importance of performing neonatal resuscitation skills correctly, investigators have found that content knowledge deteriorates quickly and that resuscitators frequently do not perform skills correctly during real resuscitations.3,4 A reliable and valid method to assess a learner's resuscitation proficiency is needed.
The skills to carry out a resuscitation are critical. Six percent of all newborns and 80% of those weighing <1500 g require resuscitation at birth.3 Despite prenatal care, antepartum and intrapartum history, and fetal monitoring during labor, the need for resuscitation cannot always be predicted.4 Furthermore, performance is subject to many pitfalls, including the use of unskilled resuscitators, incorrect intubations, inadequate suctioning of meconium, and postresuscitation problems of hypoglycemia, hypocarbia, and hypotension.5 There is skill attrition6 even with “booster” courses designed to compensate for this loss.7
The NRP assesses learning by administering a multiple-choice examination (MCQ) to determine knowledge and a megacode to determine performance. The megacode is a simulated scenario that requires students to demonstrate their skills of evaluation, decision-making, and performance while observed by an instructor. Megacode scenarios are available from the NRP textbook1 or may be developed from the instructors' own clinical experiences, and they may vary in difficulty or complexity. The NRP textbook contains a list of 77 items to be used as a template to assess performance on a megacode (see manual after pages 7–32).1 The list does not distinguish between critical decisions or actions that are “lifesaving” and those that may be optional or “nice to do.” The list is also lengthy, requiring instructors to assess a large number of items in a short period of time. Consequently, it is difficult to ensure that each participant is assessed fairly and uniformly.
The need for adaptations of the NRP megacodes and performance checklist for assessment purposes has been recognized.6–8 Modifications described in the literature include the development of 5 testing scenarios marked on the basis of activities that were “life supporting” versus “life saving,”6 the creation of a long, 131-item performance checklist in which items were assessed by computer,7 and a scheme that divided the megacode activities into “epochs” of resuscitation.8
This study describes the development and testing of a new tool to assess neonatal resuscitation skills during a megacode. Our research questions were: (1) is it feasible to develop a “short” (ie, ∼20 items) performance checklist that can be used to assess skills? and (2) what evidence exists that the checklist is valid and reliable?
METHODS
The development of the megacode scoring instrument was guided by parameters established by the NRP Steering Committee (NRPSC) of the American Academy of Pediatrics. The NRPSC wanted an instrument that could be used in conjunction with the written examination to assess performance. The instrument had to be brief (∼20 items), because megacodes are completed in 5 to 10 minutes, and the items on the instrument had to be assessed by a single observer. Each item had to be observable so that the instructor could assess it while standing next to the student without interfering with the progress of the simulated resuscitation. The assessed items needed to be important to the infant's outcome. The items also had to have face validity (“buy-in”) from current instructors and experts in the field.
Development of Checklist
The checklist was developed and validated in several phases, described below.
Stage 1
In 2002, 2 of the authors (Drs Singhal and Aziz) created a performance checklist of 19 discrete items believed to be essential to the conduct of neonatal resuscitation. These items were assessed for initial content validity by the NRPSC. The performance checklist was then placed on the NRP Web site (www.nrp.org) for additional review.
Stage 2
In 2003, the NRP posted a notice on an 8000-person online mailing list recruiting volunteers to review the performance checklist. For each proposed item, participants completed a questionnaire indicating how easily they thought they could assess the item and their perception of its relevance to neonatal resuscitation. A total of 822 respondents participated, including physicians, nurses, and respiratory therapists. Based on their feedback, the wording of the original 19 items was modified.
Stage 3
In 2004, teams from 5 centers in Canada and the United States created 28 video clips in which instructor/student interactions were videotaped during mock megacodes. Seventeen experienced instructors (physicians, nurses, and respiratory therapists) then viewed an average of 6.6 video clips each, using the performance checklist to score student performance and rating the ease of scoring each item. They also provided descriptive information about their perceptions of the performance checklist. Based on these data, the performance checklist was modified to include 20 items with a performance scale ranging from 0 to 2 (0 = not done; 1 = partially done; 2 = done). The decision to use a 0–2 scale was controversial, because some scale developers would advocate a wider scale9 on psychometric grounds. The NRPSC, however, felt that additional intermediate gradations (eg, “done somewhat well” or “done very well”) were not clinically relevant in a critical care situation.
Study
The final performance checklist included 20 items (Table 2). Instructors were invited to use the “new” performance checklist while they evaluated students at the end of actual NRP courses. Instructor participants were recruited at a national NRP instructor conference (October 2004) and from the 822 participants in stage 2. Instructors were eligible to participate if they were teaching a course between October 2004 and January 31, 2005. Informed consent was obtained from both instructors and students.
In addition to performance data collected during the assessment session, student participants were asked about their own perceptions of their ability to carry out neonatal resuscitation and of their performance on the megacode. Perceptions of ability and performance were measured on a self-reported 5-point scale (1 = low and 5 = high). Scores on the MCQ for each lesson were provided and paired with the megacode score reflecting skills from the corresponding lesson. Demographic information, including gender, profession (physician, nurse, or respiratory therapist), nursery type (level 1, 2, or 3 [tertiary care]), type of NRP instructor (hospital-based instructor or regional trainer), and years of nursery experience, was collected.
Megacode scenarios are not standardized, and the clinical situation in each scenario may involve activities related to only the first 4 lessons or to all of the lessons. As a result, the highest possible total score varied for each candidate. To control for this variation, we converted scores to values between 0 and 1. To this end, we established 3 megacode scores using the raw item scores (0, 1, and 2): (1) the megacode summative subscore, the sum of raw scores for each lesson (ie, for lesson 1, items 1 and 2 were summed; for lesson 2, items 3–7 were summed); (2) the converted megacode subscore, the proportion of correct activities, created by dividing the megacode summative subscore by the highest possible score for the particular lesson; and (3) the total converted megacode score, a score across all of the lessons, calculated by summing the raw scores across all of the lessons tested and dividing the sum by the highest possible score that a participant could attain (depending on which lessons were used to test the participant). The converted MCQ subscore and total converted MCQ score were calculated using the same approach.
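To make the arithmetic concrete, the following minimal Python sketch (ours, not software used in the study) implements the 3 scores. Only the lesson 1 and lesson 2 item groupings are stated in the text above; the remainder of the mapping, and all names, are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping of lessons to checklist item numbers; only the
# lesson 1 and lesson 2 groupings are given in the text above.
LESSON_ITEMS: Dict[int, List[int]] = {
    1: [1, 2],
    2: [3, 4, 5, 6, 7],
    # ... lessons 3-7 would continue the pattern through item 20
}
MAX_ITEM_SCORE = 2  # each checklist item is rated 0, 1, or 2


def summative_subscore(raw: Dict[int, int], lesson: int) -> int:
    """Megacode summative subscore: sum of raw item scores for one lesson."""
    return sum(raw[i] for i in LESSON_ITEMS[lesson])


def converted_subscore(raw: Dict[int, int], lesson: int) -> float:
    """Converted megacode subscore: proportion of the highest possible
    score for the lesson, yielding a value between 0 and 1."""
    highest = MAX_ITEM_SCORE * len(LESSON_ITEMS[lesson])
    return summative_subscore(raw, lesson) / highest


def total_converted_score(raw: Dict[int, int], lessons: List[int]) -> float:
    """Total converted megacode score: raw scores summed across the lessons
    actually tested, divided by the highest score attainable on them."""
    total = sum(summative_subscore(raw, lesson) for lesson in lessons)
    highest = sum(MAX_ITEM_SCORE * len(LESSON_ITEMS[lesson]) for lesson in lessons)
    return total / highest


# Example: a student rated 2 on items 1-6 and 1 on item 7, tested on lessons 1-2.
raw = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 1}
print(total_converted_score(raw, [1, 2]))  # 13/14 ≈ 0.93
```

The same division-by-maximum approach yields the converted MCQ subscore and total converted MCQ score.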
Data Analysis
Descriptive analyses were conducted for the total converted megacode scores of the performance checklist items, the total converted MCQ score of the written examination, students' perceptions of ability and performance, and the sociodemographic characteristics. The internal consistency reliability (Cronbach's α) of the converted megacode subscores and the total converted megacode score (items 1–20) was assessed. For the converted megacode subscores, the performance checklist items were divided into subgroups corresponding to the lessons they tested (lessons 1–4 comprised items 1–15, lessons 1–5 comprised items 1–17, lessons 1–6 comprised items 1–19, and lessons 1–7 comprised items 1–20).
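For reference, Cronbach's α for a scale of $k$ items is conventionally computed as

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_{i}}}{\sigma^{2}_{X}}\right),$$

where $\sigma^{2}_{Y_{i}}$ is the variance of participants' scores on item $i$ and $\sigma^{2}_{X}$ is the variance of the summed score. This is the standard definition, stated here for the reader's convenience rather than reproduced from the article.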
Several aspects of construct validity were examined. Pearson product moment correlations were calculated between the converted megacode subscores and the converted MCQ subscores for each of the lessons and for the composite subscores for lessons 1–4, 1–5, 1–6, and 1–7. A high correlation would indicate that the MCQ and the megacode assessment tool were interchangeable (ie, measured the same phenomenon). In this work, it is desirable to obtain both convergence (a positive correlation) and divergence (a correlation that is not too high). The relationship between the students' perceptions of their ability and performance and their total converted megacode score was assessed using the Pearson product moment correlation coefficient; these assessments helped us establish the association between student perceptions and the megacode assessment. A backward multiple linear regression analysis was conducted to identify the sociodemographic factors that explained the total converted megacode score. Independent variables included the type of nursery in which the participant worked, years of neonatal experience, and specialty practice; the dependent variable was the total converted megacode score. A change in the F statistic was used as the criterion for inclusion, and statistical significance was set at P < .05. Linear regression helps to determine whether the scores are affected by appropriate or inappropriate phenomena; for example, the variance in scores should not be affected by gender. The study was approved by the Conjoint Calgary Health Research Ethics Board.
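As a minimal sketch of these 2 analyses, the following Python fragment uses synthetic stand-in data (the study's data set, variable names, and category codings are not reproduced and are hypothetical) and substitutes a common P value rule for the article's F-change criterion in the backward elimination step.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import pearsonr

# Synthetic stand-in data; ranges loosely mirror the converted 0-1 scores.
rng = np.random.default_rng(0)
n = 468
df = pd.DataFrame({
    "megacode_converted": rng.uniform(0.35, 1.0, n),
    "mcq_converted": rng.uniform(0.72, 1.0, n),
    "nursery_level": rng.integers(1, 4, n).astype(float),  # levels 1-3
    "years_experience": rng.uniform(0.0, 30.0, n),
    "specialty": rng.integers(0, 3, n).astype(float),      # coded categories
})

# Convergence/divergence: Pearson correlation between the converted scores.
r, p = pearsonr(df["megacode_converted"], df["mcq_converted"])

# Backward elimination: fit the full model, then repeatedly drop the least
# significant predictor until every remaining term reaches P < .05.
predictors = ["nursery_level", "years_experience", "specialty"]

def fit(cols):
    return sm.OLS(df["megacode_converted"], sm.add_constant(df[cols])).fit()

model = fit(predictors)
while len(predictors) > 1 and model.pvalues.drop("const").max() > 0.05:
    predictors.remove(model.pvalues.drop("const").idxmax())
    model = fit(predictors)

print(f"r = {r:.2f}, R^2 = {model.rsquared:.3f}, kept: {predictors}")
```

As reported under Results, the corresponding models fitted to the real data explained only 2.3% to 3% of the variance in total converted megacode scores.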
RESULTS
A total of 468 students and 148 instructor volunteers participated in the study to assess the performance checklist in courses offered between October 2004 and January 2005. The majority of participants, both students and instructors, were female nurses practicing in level 3 nurseries. The students had slightly more work experience than instructors. Demographic characteristics are further described in Table 1.
Table 1. Demographic Profile of Study Participants and Instructors
Data from the megacode assessment forms were analyzed descriptively. Table 2 provides the range of scores, means, SDs, and the percentage of students who scored 0, 1, or 2 for each item. The majority of students assessed had high scores (2) for each of the items on the performance checklist. The total summed megacode scores ranged from 7 to 40 with a mean of 36.02 (SD: 4.68). The total converted megacode scores ranged from 0.35 to 1.00 with a mean of 0.94 (SD: 0.09). The total summed MCQ lesson scores ranged from 24 to 98 with a mean of 87.69 (SD: 12.38). The total converted MCQ scores ranged from 0.72 to 1.00 with a mean of 0.96 (SD: 0.05).
Table 2. NRP Megacode Item Description
Overall, students had high perceptions of both their ability and their performance. On a 5-point scale, student responses ranged from 1 to 5, and the means for perceived ability and performance were 4.23 (SD: 0.73) and 4.20 (SD: 0.75), respectively.
The internal consistency reliability was tested for each of the megacode subscale scores, as shown in Table 3. For lessons 1–4 (items 1–15), Cronbach's α was .63; for lessons 1–5 (items 1–17), .66; for lessons 1–6 (items 1–19), .70; and for lessons 1–7 (items 1–20), .70. The reliability of the entire megacode performance checklist (ie, items 1–20) was therefore .70.
Table 3. Frequencies of the Megacode Converted Subscale Scores and the MCQ Converted Subscale Scores
When grouped by lessons, the converted subscale scores of the megacode had significant but low correlations with the converted subscale scores of the MCQ (lessons 1–4: r = 0.23, P < .01; lessons 1–5: r = 0.24, P < .01; lessons 1–6: r = 0.24, P < .01; lessons 1–7: r = 0.25, P < .01).
There was a strong correlation between the students' perceptions of their ability and their perceptions of their performance (r = 0.79; P < .01). In contrast, the correlations between students' perceptions of their ability and performance and their total converted megacode scores were small but statistically significant (r = 0.39, P < .01 and r = 0.37, P < .01, respectively).
Backward multiple regression analysis demonstrated that the sociodemographic characteristics of both students and instructors explained very little of the variance in the total converted megacode scores. Three student variables (specialty, years of experience, and type of institution) explained 3% of the variance in total converted megacode scores (F3,400 = 3.98; P < .05). Four instructor variables (type of institution, specialty, years of experience, and NRP instructor level) explained 2.3% of the variance (F4,440 = 2.80; P < .05).
DISCUSSION
This study focused on the development and psychometric testing of a performance checklist to assess performance on a megacode. The performance checklist seems to be a feasible way of assessing competence on the megacode under testing circumstances. The response to our invitation to participate (148 instructors with 468 students) was better than we anticipated, given the short period for data collection.
Our psychometric analysis provides evidence for both the reliability and the validity of the instrument. It must be recognized, however, that reliability and validity are temporal states: one builds a case for both over time, through different types of assessments in different contexts. The internal consistency reliability was .70. Although an internal consistency reliability of .80 to .90 would have been desirable, a somewhat lower value is to be expected given the truncated nature of the 0-to-2 scale.
Several aspects of validity were assessed. First, face and content validity were established through repeated iterations and reevaluations of the instrument at the expert committee level, through input from instructors, and through testing with videotaped megacodes before its use in this study. Criterion validity was assessed through correlations between the megacode scores and the MCQ scores and between the total converted megacode scores and student perception scores. The correlations between the MCQ and megacode scores are particularly important. In this type of assessment, one looks for evidence of convergence (ie, a positive correlation) and divergence (ie, a correlation that is not too high). Our correlations ranged from 0.23 to 0.25, suggesting that the MCQ and the megacode skill assessment measure different phenomena. Had the correlation been high, it would have suggested that the 2 assessments were interchangeable. Collectively, the 2 assessments should be helpful in determining readiness to perform a resuscitation in a live situation. The linear regression analyses, conducted as part of construct validity testing, show that the variance in ratings on the performance checklist was not appreciably affected by the sociodemographic characteristics of the students or instructors. This type of assessment is important in further ensuring instrument stability. An instrument affected by gender, for example, would have limited application across the spectrum of settings in which it is needed.
There are limitations to this study. Student scores were extremely high, and most instructors used a very limited range of the scale. It is possible that testing with experienced volunteers (ie, 9 years of practice) and primarily in level 2 and 3 institutions affected the scores. Additional testing of the instrument in different contexts is therefore warranted. Although there are many ways to further assess the instrument, testing with participants who have a broader range of experience may be useful, as will testing in international settings. Testing with videotape recordings of live resuscitations would be helpful, and testing for interrater reliability (ie, 2–3 instructors assessing the same student) would be a logical next step.
Nonetheless, we conclude that this performance checklist with 20 items along with the MCQ examination offers a feasible way to assess the skill and knowledge required in a high-risk setting. Our analyses provide evidence for the reliability and validity of the instrument. The newly revised Neonatal Resuscitation Instructor Manual (pp 4–17 and 4–18) includes the megacode assessment forms (ie, basic, lessons 1–4; and advanced, lessons 1–6), and these will be used as part of the assessment process.10
Acknowledgments
We thank Wendy Simon, American Academy of Pediatrics, for support and encouragement. We thank Professor Claudio Violato, PhD, Department of Community Health Sciences, University of Calgary, for consulting on the statistical analyses. A very special thank you goes to the American Academy of Pediatrics NRPSC and all of the instructors and students who participated in the various phases of this work.
Footnotes
- Accepted June 19, 2006.
- Address correspondence to Jocelyn Lockyer, PhD, Continuing Medical Education and Professional Development and Department of Community Health Sciences, University of Calgary, 3330 Hospital Dr NW, Calgary, Alberta, Canada T2N 4N1. E-mail: lockyer@ucalgary.ca
The authors have indicated they have no financial relationships relevant to this article to disclose.
REFERENCES
- Copyright © 2006 by the American Academy of Pediatrics