ARTICLE |
a Departments of Community Health Sciences
b Continuing Medical Education and Professional Development
c Pediatrics, University of Calgary, Calgary, Alberta, Canada
d Neonatal Perinatal Section, Department of Pediatrics, St Joseph Mercy Hospital, Ann Arbor, Michigan
e Department of Pediatrics
f Centre for Collaborative Health Professional Education, Memorial University of Newfoundland, St John's, Newfoundland, Canada
| ABSTRACT |
|---|
METHODS. A performance checklist of items was created, validated, and modified in sequential phases involving an expert committee; review and feedback by Neonatal Resuscitation Program instructors for feasibility and criticality; and use of the performance checklist by Neonatal Resuscitation Program instructors reviewing videotaped megacodes. The final 20-item performance checklist used a 3-point scale and was assessed by student and instructor volunteers. Megacode scores, NRP multiple-choice examination scores, student assessments of their ability and performance, and sociodemographic descriptors for both students and instructors were collected. Data were analyzed descriptively. In addition, we assessed the internal consistency reliability of the megacode score, the correlations between megacode and multiple-choice examination scores, and the variance in scores based on instructor and student characteristics.
RESULTS. A total of 468 students and 148 instructors volunteered for the study. The instrument was reliable and internally consistent. Students' scores were high on most items. There was a significant but low correlation between the megacode score and the written knowledge examination. Instructor and student characteristics had little effect on the variance in scores.
CONCLUSIONS. This performance checklist provides a feasible assessment tool. There is evidence for its reliability and validity.
Key Words: neonatal resuscitation, neonatology, megacode, clinical assessment, simulation, competence assessment, educational outcome
Abbreviations: NRP, Neonatal Resuscitation Program; MCQ, multiple-choice examination; NRPSC, Neonatal Resuscitation Program Steering Committee of the American Academy of Pediatrics
Worldwide, 1 million newborn infants die each year because of birth asphyxia. In the United States, 400 000 newborns will require assistance with breathing, and 40 000 will need extensive resuscitation to survive. The Neonatal Resuscitation Program (NRP) is an educational program developed by the American Academy of Pediatrics and the American Heart Association to help physicians, nurses, and other health care professionals attain the knowledge and skills necessary for proficiency in neonatal resuscitation. The Textbook of Neonatal Resuscitation1 includes 7 lessons describing skills including temperature management, assisted ventilation, chest compressions, endotracheal intubation, emergency medications, special circumstances, and ethics. Currently there are 24 000 NRP instructors and 2 million providers in the United States. At least another 500 000 providers have been trained worldwide through programs in >100 countries.2 Despite the importance of performing neonatal resuscitation skills correctly, investigators have found that content knowledge deteriorates quickly and resuscitators frequently do not perform skills correctly during real resuscitations.3,4 A reliable and valid method to assess a learner's resuscitation proficiency is needed.
The skills to carry out a resuscitation are critical. Six percent of all newborns and 80% of those weighing <1500 g require resuscitation at birth.3 Despite prenatal care, antepartum and intrapartum history, and fetal monitoring during labor, the need for resuscitation cannot always be predicted.4 Furthermore, performance is subject to many pitfalls, including the use of unskilled resuscitators, incorrect intubations, inadequate suctioning of meconium, and postresuscitation problems of hypoglycemia, hypocarbia, and hypotension.5 There is skill attrition6 even with "booster" courses designed to compensate for this loss.7
The NRP assesses learning by administering a multiple-choice examination (MCQ) to determine knowledge and a megacode to determine performance. The megacode is a simulated scenario that requires students to demonstrate their skills of evaluation, decision-making, and performance while observed by an instructor. Megacode scenarios are available from the NRP textbook1 or may be developed from the instructors' own clinical experiences. They may vary in difficulty or complexity. The NRP textbook contains a list of 77 items to be used as a template to assess performance on a megacode (see manual after pages 732).1 The list does not distinguish between critical decisions or actions that are "lifesaving," as opposed to those that may be optional or "nice to do." The list is lengthy, requiring instructors to assess a large number of items in a short period of time. Consequently, it is difficult to ensure that each participant is fairly and uniformly assessed.
The need for adaptations of the NRP megacodes and performance checklist for assessment purposes has been recognized.6-8 Modifications that researchers have described in the literature include the development of 5 scenarios for testing using a system in which marking was based on activities that were "life supporting" versus "life saving,"6 the creation of a long 131-item performance checklist in which items were assessed by computer,7 and another that divided the megacode activities into "epochs" of resuscitation.8
This study describes the development and testing of a new tool to assess neonatal resuscitation skills during a megacode. Our research questions were: (1) is it feasible to develop a "short" (ie, 20 items) performance checklist that can be used to assess skills? and (2) what evidence exists that the checklist is valid and reliable?
| METHODS |
|---|
The checklist had to be short (ie, 20 items), because megacodes are completed in 5 to 10 minutes, and the items on the instrument had to be assessed by a single observer. Each item had to be observable so that the instructor could assess it while standing next to the student without interfering with the progress of the simulated resuscitation. The assessed items needed to be important to the infant's outcome. The items also had to have face validity ("buy-in") from current instructors and experts in the field.
Development of Checklist
The checklist was developed and validated in several phases, described below.
Stage 1
In 2002, 2 of the authors (Drs Singhal and Aziz) created a performance checklist of 19 discrete items believed to be essential to the conduct of neonatal resuscitation. These items were assessed for initial content validity by the NRPSC. The performance checklist was then placed on the NRP Web site (www.nrp.org) for additional review.
Stage 2
In 2003, the NRP posted a notice on an 8000-person online mailing list recruiting volunteers to review the performance checklist. Participants provided feedback on a questionnaire for each of the proposed items indicating how easily they thought they could assess the items and their perception of the relevance of the item to neonatal resuscitation. A total of 822 responders participated, including physicians, nurses, and respiratory therapists. Based on their feedback, the wording of the original 19 items was modified.
Stage 3
In 2004, teams from 5 centers in Canada and the United States created 28 video clips in which instructor/student interactions were videotaped during mock megacodes. Seventeen experienced instructors (physicians, nurses, and respiratory therapists) then viewed an average of 6.6 video clips each, using the performance checklist to score student performance, as well as ease of scoring. They also provided descriptive information about their perceptions of the performance checklist. Based on these data, the performance checklist was modified to include 20 items with a performance scale ranging from 0 to 2 (0 = not done; 1 = partially done; 2 = done). The decision to use a 0-2 scale was controversial, because some scale developers would advocate a wider scale9 on psychometric grounds. The NRPSC, however, felt that additional intermediate gradations (eg, "done somewhat well" or "done very well") were not clinically relevant in a critical care situation.
Study
The final performance checklist included 20 items (Table 2). Instructors were invited to use the "new" performance checklist while they evaluated students at the end of actual NRP courses. Instructor participants were recruited at a national NRP instructor conference (October 2004) and from the 822 participants in stage 2. Instructors were eligible to participate if they were teaching a course between October 2004 and January 31, 2005. Informed consent was obtained from both instructors and students.
Megacode scenarios are not standardized, and the clinical situation in each scenario may involve activities related to only the first 4 lessons or all of the lessons. As a result, the total score varied for each candidate. To control for this variation, it is important to convert scores so that they have a value between 0 and 1. To this end, we established 3 megacode scores using the raw scores (0, 1, and 2). They were: (1) megacode summative subscore: the sum of raw scores for each lesson (ie, for lesson 1, items 1 and 2 were summed; for lesson 2, items 3-7 were summed); (2) converted megacode subscore: a score that calculated the proportion of correct activities; this was created by dividing the megacode summative subscore by the highest possible score for the particular lesson; and (3) total converted megacode score: a score across all of the lessons; this was calculated by summing the raw scores across all of the lessons and dividing the sum by the highest possible score that a participant could get (depending on which lessons were used to test the participant). The converted MCQ subscore and total converted MCQ score were calculated as well using the same approach.
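The three scores defined above can be illustrated with a short Python sketch. It is an illustration of the arithmetic only, not the authors' analysis code. The function names and the example raw scores are hypothetical; the item-to-lesson mapping shown covers only lessons 1 and 2, because those are the only per-lesson item ranges stated in the text.

```python
from typing import Dict, List

MAX_ITEM_SCORE = 2  # each checklist item is scored 0, 1, or 2

# Item-to-lesson mapping as stated in the text for lessons 1 and 2 only.
# The split of the remaining items across lessons 3-7 is not given here.
LESSON_ITEMS: Dict[int, List[int]] = {1: [1, 2], 2: [3, 4, 5, 6, 7]}


def summative_subscore(raw: Dict[int, int], items: List[int]) -> int:
    """Sum of raw item scores for one lesson."""
    return sum(raw[i] for i in items)


def converted_subscore(raw: Dict[int, int], items: List[int]) -> float:
    """Summative subscore divided by the highest possible score for the lesson."""
    return summative_subscore(raw, items) / (MAX_ITEM_SCORE * len(items))


def total_converted_score(
    raw: Dict[int, int],
    lesson_items: Dict[int, List[int]],
    lessons_tested: List[int],
) -> float:
    """Sum of raw scores over the lessons actually tested, divided by the
    highest possible score for those lessons (a value between 0 and 1)."""
    items = [i for lesson in lessons_tested for i in lesson_items[lesson]]
    return summative_subscore(raw, items) / (MAX_ITEM_SCORE * len(items))


# Hypothetical raw scores for one student tested on lessons 1 and 2.
raw_scores = {1: 2, 2: 1, 3: 2, 4: 2, 5: 2, 6: 0, 7: 2}

lesson1 = converted_subscore(raw_scores, LESSON_ITEMS[1])
lesson2 = converted_subscore(raw_scores, LESSON_ITEMS[2])
total = total_converted_score(raw_scores, LESSON_ITEMS, [1, 2])
print(lesson1, lesson2, round(total, 3))
```

Because the denominator depends on which lessons were tested, two students assessed on different scenarios receive scores on the same 0-to-1 scale.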
Data Analysis
Descriptive analyses were conducted for the total converted megacode scores of the performance checklist items, the total converted MCQ score of the written examination, students' perceptions of ability and performance, and the sociodemographic characteristics. The internal consistency reliability (Cronbach's α) of the converted megacode subscores and the total converted megacode score (items 1-20) was assessed. For the converted megacode subscores, the performance checklist items were divided into subgroups corresponding with the lessons they tested (lessons 1-4 were made up of items 1-15, lessons 1-5 of items 1-17, lessons 1-6 of items 1-19, and lessons 1-7 of items 1-20).
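For readers unfamiliar with Cronbach's α, the following Python sketch shows the standard formula, α = k/(k - 1) × (1 - Σ item variances / variance of total scores), where k is the number of items. It is a minimal illustration using the sample variance and hypothetical data, and it is not the statistical software used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students-by-items score matrix.

    scores[s][i] is the score of student s on item i.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 students")
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-2 scores for 4 students on 3 items.
example = [
    [2, 2, 1],
    [1, 1, 0],
    [2, 1, 2],
    [0, 0, 0],
]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```

When most students score near the top of a 3-point scale, item and total variances are both small, which tends to limit the α that can be attained.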
Several aspects of construct validity were examined. Pearson's product moment correlations were established between each of the lessons for the converted megacode subscores and converted MCQ subscores and for composite subscores for lessons 1-4, 1-5, 1-6, and 1-7. A high correlation would indicate that the MCQ and megacode assessment tool were interchangeable (ie, measured the same phenomenon). In this work, it is desirable to obtain both convergence (positive correlation) and divergence (not too high a correlation). The relationship between the students' perceptions of their ability and performance and their total converted megacode score was assessed using Pearson's product moment correlation coefficient. These assessments would help us establish the association between student perceptions and the megacode assessment. A backward multiple linear regression analysis was conducted to identify the sociodemographic factors that explained the total converted megacode score. Independent variables included the type of nursery that the participant worked in, years of neonatal experience, and specialty practice. The dependent variable was the total converted megacode score. A change in the F statistic was used as the criterion for inclusion. Statistical significance was set at P < .05. Linear regression helps to determine whether the scores are affected by appropriate and inappropriate phenomena. For example, the variance in scores should not be affected by gender. The study was approved by the Conjoint Calgary Health Research Ethics Board.
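The Pearson product moment correlation used in the validity analyses can be sketched in Python as follows. The function name and the paired scores are hypothetical and serve only to show the calculation; they are not study data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical converted megacode and MCQ scores (0-1) for 5 students.
megacode = [0.90, 0.80, 1.00, 0.70, 0.95]
mcq = [0.85, 0.90, 0.95, 0.80, 0.75]
r = pearson_r(megacode, mcq)
print(round(r, 2))
```

A positive r well below 1 is the pattern sought here: the two assessments agree in direction (convergence) without being interchangeable (divergence).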
| RESULTS |
|---|
Overall, students had high perceptions of both their ability and their performance. On a 5-point scale, the student range of responses was 1-5, and the means for the perceived ability and performance were 4.23 (SD: 0.73) and 4.2 (SD: 0.75), respectively.
The internal consistency reliability was tested for each of the megacode subscale scores, as shown in Table 3. For megacode lessons 1-4 (items 1-15), Cronbach's α was .63; for lessons 1-5 (items 1-17), it was .66; for lessons 1-6 (items 1-19), it was .70; and for lessons 1-7 (items 1-20), it was .70. The reliability of the entire megacode performance checklist (ie, items 1-20) was 0.70.
There was a correlation between the students' perceptions of their ability and their perceptions of their performance (r = 0.79; P < .01). There was, however, a small but statistically significant correlation between students' perceptions of their ability and performance and their total converted megacode scores (r = 0.39, P < .01 and r = 0.37, P < .01, respectively).
Backward multiple regression analysis demonstrated that sociodemographic characteristics of both students and instructors explained very little of the variance in the total converted megacode scores. Three student variables (specialty, years of experience, and type of institution) explained 3% of the variance in total converted megacode scores (F(3,400) = 3.98; P < .05). Four instructor variables (type of institution, specialty, years of experience, and NRP instructor level) explained 2.3% of the variance in total converted megacode scores (F(4,440) = 2.80; P < .05).
| DISCUSSION |
|---|
Our psychometric analysis provides evidence for both reliability and validity of the instrument. However, it must be recognized that both reliability and validity are temporal states. One builds up a case for both over time and through different types of assessments in different contexts. The internal consistency reliability was 0.70. Although an internal consistency reliability of 0.8 to 0.9 would have been desirable, given the truncated nature of the 0-to-2 scale, this reliability is to be expected.
Several aspects of validity were assessed. First, face and content validity were established through repeated iterations and reevaluations of the instrument at the expert committee level, by receiving input from instructors, and by testing with videotaped megacodes before its use in this study. Criterion validity was assessed through correlations between the megacode scores and the MCQ scores and between the total converted megacode scores and student perception scores. The correlations between the MCQ and megacode scores are particularly important. In this type of assessment, one looks for evidence of convergence (ie, a positive correlation) and divergence (ie, r < 1.0). For this analysis, our correlations ranged from an r value of 0.23 to 0.25, suggesting that the MCQ and megacode skill assessment measure different phenomena. Had the correlation been high, it would have suggested that the 2 assessments were interchangeable. Collectively, the 2 assessments should be helpful in determining readiness to do a megacode in a live situation. The linear regression analyses, conducted as part of construct validity, show that the variance in ratings on the performance checklist was not appreciably affected by the sociodemographic characteristics of the students or instructors. This type of assessment is important in further ensuring instrument stability. An instrument that is affected by gender, for example, would have limited applications across the spectrum of settings in which it is needed.
There are limitations to this study. Student scores were extremely high. Most instructors used a very limited range of the scale. It is possible that the nature of testing with experienced volunteers (ie, 9 years of practice) affected the scores, as well as the fact that testing was conducted primarily in level 2 and 3 institutions. Additional testing of the instruments is warranted in different contexts. Although there are many ways to further assess the instrument, testing with participants who have a broader range of experience may be useful. Similarly, there will be a need for testing in international settings. Testing with videotape recordings of live resuscitations will be helpful. Testing for interrater reliability (ie, 2-3 instructors assessing the same student) would also be a logical next step.
Nonetheless, we conclude that this performance checklist with 20 items along with the MCQ examination offers a feasible way to assess the skill and knowledge required in a high-risk setting. Our analyses provide evidence for the reliability and validity of the instrument. The newly revised Neonatal Resuscitation Instructor Manual (pp 417 and 418) includes the megacode assessment forms (ie, basic, lessons 1-4; and advanced, lessons 1-6), and these will be used as part of the assessment process.10
| FOOTNOTES |
|---|
Address correspondence to Jocelyn Lockyer, PhD, Continuing Medical Education and Professional Development and Department of Community Health Sciences, University of Calgary, 3330 Hospital Dr NW, Calgary, Alberta, Canada T2N 4N1. E-mail: lockyer{at}ucalgary.ca
The authors have indicated they have no financial relationships relevant to this article to disclose.