Published online October 2, 2006
PEDIATRICS Vol. 118 No. 4 October 2006, pp. 1380-1387 (doi:10.1542/peds.2006-0326)

ARTICLE

Evaluation of the Clinical Assessment Project: A Computer-Based Multimedia Tool to Assess Problem-Solving Ability in Medical Students

Mitchell J. Feldman, MD (a,b,c); G. Octo Barnett, MD (c); David A. Link, MD (a,d); Margaret A. Coleman (a,d); Janice A. Lowe, MD (a,b); and Edward J. O'Rourke, MD (a,e)

a Department of Pediatrics, Harvard Medical School, Boston, Massachusetts
b Department of Pediatrics
c Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts
d Department of Pediatrics, Cambridge Hospital and Mount Auburn Hospital, Cambridge, Massachusetts
e Department of Pediatrics, Children's Hospital, Boston, Massachusetts


    ABSTRACT
 
OBJECTIVE. The purpose of this work was to describe Clinical Assessment, a computer-based multimedia patient simulation used to assess the problem-solving abilities of medical students and to evaluate its capacity to guide the assignment of course grade.

METHODS. This was a multisite reviewer-blinded comparison of course grades, National Board of Medical Examiners pediatric examination score, and Clinical Assessment scores at 3 pediatric clerkship sites of the Harvard Medical School. Participants included 470 students completing their pediatric clerkships. Each student's performance on ≤4 Clinical Assessment patient case simulations was compared with National Board of Medical Examiners pediatric examination scores and course grades assigned by clerkship directors based on overall ward performance.

RESULTS. Data from both the National Board of Medical Examiners pediatric "shelf" examination and the course grade were available for 411 students who completed ≥1 Clinical Assessment case. There was a strong correlation between Clinical Assessment score and course grade when comparing students receiving honors versus satisfactory category course grades. Students who ordered more expensive or greater numbers of laboratory tests did not achieve greater diagnostic accuracy on Clinical Assessment. Clinical Assessment had a high positive predictive value for course grade: 95% of students scoring ≥90% on Clinical Assessment achieved an honors category course grade.

CONCLUSIONS. Because nearly all of the students who scored very well on Clinical Assessment received honors category course grades, future high scorers on this examination merit consideration for assigning a high course grade. A computer-based multimedia patient simulation assessment tool provides objective information that can complement a student's National Board of Medical Examiners score and course grade and may assist in evaluating clinical problem-solving ability.


Key Words: pediatrics • multimedia • educational measurement • medical students • computers

Abbreviations: CA—Clinical Assessment • PE—physical examination • PPV—positive predictive value • HMS—Harvard Medical School • NBME—National Board of Medical Examiners

We developed, implemented, and evaluated a computer-based patient simulation program, Clinical Assessment (CA), as a technique to complement the assessment of ward performance.1 We designed it to examine the problem-solving abilities of Harvard medical students in their core pediatric clinical rotation. The program is a multimedia simulated clinical encounter, focusing on determining students' abilities to obtain appropriate histories, physical examinations (PEs), and diagnostic tests and to select reasonable diagnoses, treatment plans, and assessments of severity. The CA score, it was hoped, might assist the faculty in determining whether medical students bordering between course grades of satisfactory and honors might deserve the higher grade. Although computers are increasingly being used as assessment tools in medicine,2 we believe that this is the first description of a multimedia program used as an assessment tool for a medical student clerkship in the pediatric domain.

Other investigators found the computer-based simulation to be well accepted by students and determined that a similar format is equivalent as an alternative learning method to lectures and group discussions.3,4 Educators have long known that multiple-choice examinations lead to a "cueing" effect, which can prevent the accurate determination of a student's knowledge. Anbar5 has shown that medical students who "demonstrated blatant ignorance in one or more areas of knowledge performed poorer on open-ended questions than the rest of the class. However these same students performed comparably to the rest of the class on multiple-choice questions covering the same material." Veloski et al6 have reported on the cueing effects of multiple-choice questions and advocate open-ended questions. Friedman7 also comments on the cueing effect of multiple-choice interaction as compared with the higher-fidelity simulation offered when using natural language. CA allows students to enter narrative text diagnostic assessments rather than to choose from a constrained list of diagnoses. This enhancement was a design objective to reduce the cueing problem.

Computer-based simulation tests measure student performance in a different way than do faculty. Schwartz8 used nonmultimedia computer simulations to evaluate medical students at the end of a pediatric clerkship. When interpreting the simulation score data as predicting honors or high-pass performance as graded by the faculty, Schwartz' data show a positive predictive value (PPV) of 0.77. Schwartz8 proposes that the computer-based assessment of student performance, with its more objective measures, may be more accurate than that of the faculty.

Millos et al9 developed a multimedia computer-based tool to measure clinical bedside skills of medical students in the domain of neurology. The authors found that, compared with actual standardized patients, the computer-based test offers the advantages of uniformity of patient findings, increased availability of and access to testing, and streamlined retrieval by faculty of administrative reporting, such as student performance and time spent.

Rogers et al10 argue that a simulation environment is a more effective test of learning than a written examination. The authors feel that simulations more completely measure a student's ability to apply knowledge to problem solving using higher analytic skills.


    METHODS
 
This study was approved by the Harvard Medical School (HMS) Committee on Human Studies.

The CA Examination
The student "works up" a simulated patient by ordering history, PE, and laboratory items from menus and by entering narrative text diagnostic impressions at specific junctures in the exercise. Initially, the student is presented with a picture along with a brief clinical scenario about this patient (Fig 1). The student is asked to enter a preliminary set of diagnostic hypotheses that should be considered based on this introductory information. Each diagnostic entry is processed through the text recognition algorithm of the program. Extensive synonym matching and spelling correction allow for a high proportion of matches of the student's text entry to the program's clinical vocabulary. If the computer program does not recognize the text entry after several attempts, the program presents a list of body systems (eg, cardiac, respiratory, or gastrointestinal) and etiologies (bacterial, viral, or environmental); this allows the student to record diagnostic interpretations in a less specific format, although at the cost of introducing some cueing.


FIGURE 1 Brief clinical scenario.

 
After the initial entry of diagnostic hypotheses, the student is asked to perform a focused workup by selecting ≤10 items each from the history and PE menus. Each menu contains ~30 entries (history questions on the history menu and body systems on the PE menu; see Fig 2). The student is informed that the appropriate selection of items useful in determining the patient's diagnosis or in assessing the severity of illness should be given primary consideration, because such selections will generate a higher score. As the student selects history, PE, and laboratory items, the responses are displayed. Some of these responses are text, and others are either sounds (eg, heart auscultation) or high-resolution images (eg, chest radiographs, photographs, etc). After collecting the history and PE information, the student is required to enter a revised diagnostic impression. The student is then allowed to select laboratory examinations and to collect further history and PE information. Once the student is ready to commit to a final diagnosis, no further information may be collected.


FIGURE 2 History menu.

 
The computer program then challenges the student to justify the clinical assessment by specifying from the set of all of the history, PE, and laboratory information items gathered those selections that were felt to be most important diagnostically and those that best indicate the level of severity of the patient's illness. In the last stage of the interaction, the student selects items from a menu to order therapy and designate a disposition. The student is given 1 practice case, and then the option to take ≤4 challenge simulations derived from several different pediatric problem areas, for example, "cough and fever" and "abdominal pain."

Case Development
The development of the case simulations is a multistep process. Initially, a topic area is chosen that includes a group of 10 to 20 diagnoses on the basis that each of these diagnoses is characterized and differentiated by a similar set of history, PE, and laboratory examinations. The diagnoses for the cases to be constructed are then chosen. Finally, the history, PE, laboratory, and treatment items that are appropriate for the domain are specified and grouped into menus.

For example, the domain "pediatric cough and fever" includes diseases such as croup, bacterial pneumonia, and tuberculosis, among others. Figure 2 shows the history menu for the cough and fever domain. The PE for this domain has an emphasis on the respiratory system, for example, retractions, chest auscultation, and inspiratory-expiratory ratio. The laboratory for this domain includes basic laboratory tests, such as serum chemistries, complete blood cell count, basic radiograph studies, and cultures and also includes purified protein derivative, acid-fast stain, pertussis antibody, and cold agglutinins.

Presenting multimedia responses (eg, a chest radiograph or audio of a cough) whenever appropriate makes the case simulations more interesting to the student. The acquisition of quality multimedia content is one of the more time-consuming tasks involved in case development.

Pediatric faculty members (M.J.F., D.A.L., and M.A.C. all practice general pediatrics; M.J.F. has training in medical informatics; and D.A.L. is also a pediatric nephrologist) construct each case to be a classic presentation of a disease that students could encounter in a pediatric emergency department. Specific case diagnoses are ones that the faculty agrees should be seen during the pediatric clerkship. During case development, the faculty specifies diagnoses to be considered competing alternative diagnoses to the case diagnosis and those that could be considered unlikely but possible diagnoses. For each case, the faculty also categorizes each history, PE, and laboratory item as very important, important, screening, or contraindicated for either establishing the diagnosis or for assessing severity. The faculty also indicates those therapy and disposition items that are suitable for each case. The time required of the faculty to assign and agree on these attributes is another time-consuming task in case development. The development time was ~60 hours per case. One faculty member spent ~40 hours, and 2 other faculty spent ~10 hours each. This excludes the initial time spent planning the overall project including flow of control, scoring algorithm, and so forth and the time spent by programmers. The cost estimate was approximately $50000 in general project development plus $5000 for each case. For this study, students were presented with 4 case simulations of the 5 initial cases developed. Six additional cases were subsequently created.

Scoring
The student receives a score for each of the following categories in the CA exercise: diagnostic assessment, effective collection of information, efficiency in collection of information, identification of items important for diagnosis and for assessing severity (justification), and treatment score. These scores are calculated as described below.

The diagnostic score rewards correct entry of diagnostic assessments at the initial, revised, and final diagnosis stages. The student receives points for competing and possible diagnoses or fewer points for simply identifying the correct system/etiology.

In the effective collection of information subscore, points are added or subtracted for items selected by the student that were designated by faculty developers as important or contraindicated, respectively. In the efficiency in collection of information category, students who order inappropriate PE or laboratory items lose points. The student is awarded points in the justification section for correctly identifying the 4 items felt to be most important for diagnosis and the 4 items most important for establishing the severity of the simulated patient's illness. In the treatment section, the scoring is based on the case diagnosis, which may not be one of the student's diagnoses. A student is rewarded for choosing treatment items designated by the developer as appropriate; points are lost if important items are not ordered or if ordered items were designated as contraindicated.
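
As a rough illustration of the add-and-subtract rule described above, the following Python sketch scores a student's selections against faculty-assigned item categories. The point values, the function name, and the example category assignments are all hypothetical; the article does not publish the actual weights or the faculty's ratings.

```python
# Hypothetical point values per faculty-assigned category (not from the article).
ITEM_POINTS = {
    "very_important": 2,
    "important": 1,
    "screening": 0,
    "contraindicated": -2,
}


def information_collection_points(selected_items, item_categories):
    """Sum points for the items a student selected.

    selected_items: names of the history, PE, or laboratory items ordered.
    item_categories: mapping of item name -> category assigned by faculty.
    """
    return sum(ITEM_POINTS[item_categories[item]] for item in selected_items)


# Invented category assignments, for illustration only.
example_categories = {
    "chest auscultation": "very_important",
    "retractions": "important",
    "complete blood cell count": "screening",
    "placeholder contraindicated test": "contraindicated",
}

selected = ["chest auscultation", "retractions", "placeholder contraindicated test"]
points = information_collection_points(selected, example_categories)
```

Under these assumed weights, a contraindicated selection cancels part of the credit earned by well-chosen items, which is the behavior the scoring description implies.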

The maximum score per case is 100 points derived from the following: diagnosis, 35 points; information collection, 29 points; efficiency, 10 points; justification, 16 points; and treatment, 10 points. The rationale was to emphasize and reward the process of gathering data (history and PE) and justifying choices. Therefore, 45% of the examination score was composed of these areas. An additional 35% was assigned to diagnosis, because an important focus of this test was to assess the student's ability to arrive at the case diagnosis. Average scores (based on the average of all of the cases completed by a student) are calculated both for totals and for each subscore category.
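
The point allocation above can be checked with a short Python sketch. The category maximums come from the text; the function names, the dictionary keys, and the two example cases are assumptions made for illustration, and the sketch takes already-computed subscores as input rather than reproducing the program's scoring algorithm.

```python
# Maximum points per category, as stated in the text.
MAX_POINTS = {
    "diagnosis": 35,
    "information_collection": 29,
    "efficiency": 10,
    "justification": 16,
    "treatment": 10,
}


def case_total(subscores):
    """Return the total score for one case from its category subscores."""
    for name, pts in subscores.items():
        if name not in MAX_POINTS:
            raise KeyError(f"unknown category: {name}")
        if pts > MAX_POINTS[name]:
            raise ValueError(f"{name} exceeds its maximum of {MAX_POINTS[name]}")
    return sum(subscores.get(name, 0) for name in MAX_POINTS)


def student_averages(cases):
    """Average each subscore and the total over all cases a student completed."""
    n = len(cases)
    averages = {
        name: sum(case.get(name, 0) for case in cases) / n for name in MAX_POINTS
    }
    averages["total"] = sum(case_total(case) for case in cases) / n
    return averages


# Hypothetical subscores for a student who completed 2 cases.
example_cases = [
    {"diagnosis": 30, "information_collection": 25, "efficiency": 8,
     "justification": 12, "treatment": 10},
    {"diagnosis": 20, "information_collection": 21, "efficiency": 6,
     "justification": 8, "treatment": 6},
]
averages = student_averages(example_cases)
```

The maximums sum to 100, and information collection plus justification account for the 45 points that the rationale attributes to gathering data and justifying choices.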

Two-sample t tests allowed comparison of mean CA scores between course grade category cohorts (honors and satisfactory). χ² analysis allowed the comparison by course grade of the proportion of students completing all 4 of the CA cases with the proportion that completed <4. Correlation analysis was performed on CA scores with National Board of Medical Examiners (NBME) scores, and a Pearson correlation coefficient was determined. The students' CA scores were analyzed, and sensitivity, specificity, and positive and negative predictive values were calculated at specific cutoff thresholds. The definitions of these terms as applied to our study are: (1) sensitivity: number of students receiving an honors course grade who scored high on the CA examination (true-positive)/number of students receiving honors; (2) specificity: number of students receiving course grade of satisfactory, incomplete, or unsatisfactory who scored low on the CA examination (true-negative)/number of students receiving course grade of satisfactory, incomplete, or unsatisfactory; (3) PPV: number of students receiving an honors course grade who scored high on the CA examination (true-positive)/number of students scoring high on CA; and (4) negative predictive value: number of students receiving course grade of satisfactory, incomplete, or unsatisfactory who scored low on the CA examination (true-negative)/number of students scoring low on CA.
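
The 4 definitions above can be expressed as a minimal Python sketch. Here a "high" score is taken to mean a CA score at or above the chosen cutoff, which is an assumption about how the threshold is applied; the function and class names are hypothetical, and the 8 students in the example are invented to exercise the code, not study data.

```python
import math
from dataclasses import dataclass


@dataclass
class TestMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def count_outcomes(scores, honors, threshold):
    """Count true/false positives and negatives at a CA score cutoff.

    A student is test-positive when score >= threshold (assumed convention)
    and reference-positive when the course grade is in the honors category.
    """
    tp = fp = fn = tn = 0
    for score, is_honors in zip(scores, honors):
        high = score >= threshold
        if high and is_honors:
            tp += 1
        elif high:
            fp += 1
        elif is_honors:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a group is empty.
    return numerator / denominator if denominator else math.nan


def compute_metrics(tp, fp, fn, tn):
    """Apply the 4 definitions given in the text."""
    return TestMetrics(
        sensitivity=_ratio(tp, tp + fn),  # true-positive / all honors
        specificity=_ratio(tn, tn + fp),  # true-negative / all non-honors
        ppv=_ratio(tp, tp + fp),          # true-positive / all high scorers
        npv=_ratio(tn, tn + fn),          # true-negative / all low scorers
    )


# Invented toy data: CA scores and whether each student received honors.
toy_scores = [95, 92, 88, 84, 79, 75, 70, 66]
toy_honors = [True, True, True, False, True, False, False, False]

counts = count_outcomes(toy_scores, toy_honors, threshold=90)
toy_metrics = compute_metrics(*counts)
```

In this toy example the cutoff of 90 misses half of the honors students but flags no one outside the honors category, the same qualitative pattern of low sensitivity with high PPV that the study reports at its highest threshold.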

Data from 470 students were available for analysis. Each student was in the third or fourth year at HMS and took the CA test toward the end of the pediatric clerkship. There were 5 cases used in the testing simulations. Students were arbitrarily assigned 4 of the 5 possible cases, and each student completed ≥1 CA case. A total of 371 students (90%) completed 4 cases, 27 students (7%) completed 3 cases, 8 students (2%) completed 2 cases, and 5 students (1%) completed 1 case. Students were required to take the CA test and were told that it would not influence the course grade. Data regarding the NBME pediatric examination and course grade score were available from HMS for 411 students. HMS students perform their pediatric clerkships at 1 of 3 sites: Massachusetts General Hospital, Cambridge Hospital, or Children's Hospital. None of the sites used the CA score in determining a student's course grade. The Cambridge site (51 students) used only ward performance, whereas the Children's site and Massachusetts General sites (360 students) used mostly ward performance but used the NBME rarely in borderline situations. A student's ward performance grade is not rigidly standardized among all of the sites. Clerkship directors use their own evaluations along with those of housestaff and attendings, regarding the students' clinical and interpersonal skills in assessing ward performance. Student course grades were based on many factors, including case presentation, clinical skills, attendance, and journal club participation; clinical problem solving ability was not specifically assessed.

The data on these 411 students were deemed to be a calibration set. We compared the students' CA performance with their course grades and NBME scores. If any correlation were determined, then the results of the CA test from future cohorts of students could prove useful to faculty in assigning course grade.


    RESULTS
 
CA Performance Versus Course Grade and NBME Score
The average total CA examination score was 79 with a range of 38 to 96. There were significant differences between those who received honors or high honors category course grades and those who received satisfactory for total CA score, diagnostic accuracy, information collection, and therapy, as shown in Table 1. There was a small correlation between the average total score for all of the CA cases and the NBME score (r = 0.29; P = .0001).


TABLE 1 Comparison of CA Scores (Total and Subscores) With Course Grades

 
The honors grades used in the HMS grading system are honors and high honors. There was a small but not statistically significant difference in the performance on CA between those students who received a course grade of high honors and those who received honors (CA scores: high honors: 81.7; honors: 79.7; P = .095). For simplicity, we will hereafter use the term "honors category" to refer to the combination of honors and high honors grades. A total of 186 students (45%) received a satisfactory course grade, and 222 students (54%) received an honors category course grade. The remaining 3 students (1%) received either an incomplete or an unsatisfactory course grade.

Table 2 displays the CA scores and course grades of the top scoring students on CA. CA is not a sensitive test for an honors category course grade. There was considerable scatter among students receiving honors course grades, with a number of honors students achieving only fair performance on CA (false-negatives). However, CA does have a high PPV at a selected threshold, which was achieved by the top 5% of students. A review of our data revealed an obvious breakpoint at a CA score of 90%, with virtually all of the students who achieved a score of ≥90 on CA receiving an honors category course grade. Table 3 presents sensitivity, specificity, and positive and negative predictive values for the data at CA score thresholds of 90, 85, 80, and 70.


TABLE 2 CA Performance of Top-Scoring Students by Course Grade Category

 

TABLE 3 CA Test

 
CA Scores Over Time, by Site and by Number of Cases Completed
Because the same 5 cases were used during the 4-year period of the study, we looked for significant changes in scores during this time. The data showed no change: the mean total scores were 78.4, 78.4, 79.0, and 78.0 for years 1 through 4, respectively, and the 95% confidence intervals overlapped for all of the years. We also looked for differences in CA and NBME performance by site. Differences could be attributable to different case-mix exposure at the sites, different levels of ability among students or teachers, or other factors. The data showed no significant differences between the 3 sites, because the 95% confidence intervals overlap among the 3 sites: the mean total CA scores (95% CI) were 78.0 (75.2–80.8), 79.3 (78.3–80.3), and 77.3 (75.7–78.9) at Cambridge, Children's, and Massachusetts General hospitals, respectively. The mean NBME scores (95% CI) were 471 (443–500), 506 (494–518), and 517 (496–538), respectively. The 40 students who completed fewer than the recommended 4 cases did not perform significantly worse on CA than the 371 students who completed all 4 of the cases (total CA scores: 76 and 79, respectively; P = .19), nor were there any significant differences between these 2 groups on course grades assigned (18 honors category and 22 satisfactory vs 204 honors category and 164 satisfactory, respectively; P = .45).

Comment
The purpose of the CA project was to develop an objective measure to evaluate clinical assessment skills of third-year medical students. That there was only minimal correlation with the NBME examination may suggest that the 2 exercises in part measure different clinical problem-solving abilities. NBME multiple choice written tests assess mostly factual knowledge. CA, by contrast, queries the student at different stages of a simulated patient workup for a differential diagnosis; asks the student to collect history, PE, and laboratory information; and allows the student to order therapy and justify choices made. It thereby assesses thought processes and clinical reasoning skills during the course of a workup. Students must possess a certain fund of knowledge about the simulated case topic to make the decisions assessed by CA. It is probable that the small but significant correlation between the CA score and the NBME scores in part reflects the shared testing of factual knowledge common to both exercises.

It is likely that a student's clinical reasoning ability as demonstrated to housestaff and faculty is only 1 determinant of the course grade evaluation. Undoubtedly there are factors other than fund of knowledge and clinical reasoning ability that play a role in the instructor assigning an honors grade to a particular student. Such factors include extra effort to listen and learn, a warm and generous personality, and providing compassionate and understanding care to a patient. That CA was able to discriminate, to a limited extent (low sensitivity and high specificity at a particular threshold), students who received a satisfactory course grade from those who received honors category course grades could be a reflection that it met its design objective in measuring not just fund of knowledge but also clinical problem-solving ability.

The traditional method of evaluating a student's clinical performance on a ward rotation usually includes observations by, and interactions with, housestaff and faculty. This method is necessarily subjective and prone to interobserver variation, because different students will usually work with different housestaff and/or faculty. Different criteria for evaluation and varying quality standards will likely apply to different evaluators. It is probable that even if these evaluation criteria could be standardized formally to help reduce interobserver variability, there would still remain some variation. Furthermore, no 2 students are evaluated on the same patient case. The nuances of a particular patient case could affect significantly the student's evaluation.

In CA, all of the students are assessed using the same objective evaluator, and the problem-solving skills are analyzed using the same set of cases and scoring algorithms. Therefore, CA could play an important role in reducing the variation in interobserver evaluation and the heterogeneity of patient cases inherent in the traditional methods.

Although CA is not a sensitive test and, therefore, could not supplant the more traditional course grade, it does have a high PPV for course grade among students who perform very well on it. Table 3 shows the high PPV of CA at the highest threshold scores and the high specificity (of all students receiving only a satisfactory course grade in the HMS grading system, few scored very highly on the CA test).

It is reasonable to consider whether course grade assignment might be affected by the variation among rotation sites in the use of NBME scores in determining course grades. The Cambridge site (12% of students) did not use the NBME at all in determining students' course grades, whereas the Massachusetts General and Children's sites (88%) used it rarely and minimally. Therefore, we suspect that the NBME would be unlikely to affect the data significantly, because it is used so rarely in course grade determination. Also, whatever rare use there was of NBME scores was similarly applied across 88% of students.

It may seem circuitous that a test (CA) designed to help guide the assignment of course grades is itself validated based on correlation to the course grade. Because this is the first time the CA test was administered, the primary purpose was a proof of concept. These data represent a calibration set, so it was necessary to compare the performance results on this test with the classical methods of grading to help interpret how future results might be used. Based on our experience with >400 medical students during this 4-year study, we are persuaded that the CA test can be a useful adjunct to guide educators in the evaluation of student performance. Specifically, because of the high PPV of CA at the highest scores, those students who achieve a very high score on CA, but who might have otherwise received only a fair course grade, might merit extra consideration for a higher course grade. Because this study was conducted at only 1 institution (albeit at 3 distinct sites), these data may not generalize to other institutions. It is likely that different cutoffs of CA scores would need to be used to calculate the PPV of the test to calibrate the institutional variability in the prevalence of honors category course grades assigned (54% at HMS). At HMS, selecting a cutoff of the top 10% of CA scores translates into a PPV of 95% (sensitivity: 0.08; specificity: 0.99; Table 3). If the prevalence of honors grades assigned decreased dramatically to, say, 15%, the PPV would decrease accordingly to 63%, assuming the same cutoff. This illustrates the importance of tailoring the cutoffs to the study population and illustrates the high dependence of the PPV of CA on the prevalence of the honors course grades. The HMS grading system does not have a "high pass" category. It is likely that "honors" is used at HMS to describe the level of performance that high pass describes at other institutions. This may explain the seemingly excessive percentage of honors category course grades.
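
The dependence of PPV on prevalence described above follows from Bayes' theorem: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). The Python sketch below applies that formula to the rounded sensitivity (0.08) and specificity (0.99) quoted in the text; the function name is hypothetical. Because those inputs are rounded to 2 decimal places, the output is not expected to reproduce the article's 95% and 63% figures exactly. It is meant only to show the direction and rough size of the effect.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Rounded test characteristics quoted in the text for the top cutoff.
SENSITIVITY = 0.08
SPECIFICITY = 0.99

# Prevalence of honors category grades: 54% at HMS vs the 15% scenario.
ppv_at_hms = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, 0.54)
ppv_at_low_prevalence = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, 0.15)

print(f"PPV at 54% prevalence: {ppv_at_hms:.2f}")
print(f"PPV at 15% prevalence: {ppv_at_low_prevalence:.2f}")
```

With the cutoff held fixed, lowering the share of honors grades lowers the PPV substantially, which is why the cutoff would need to be recalibrated for an institution with a different grade distribution.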

It is heartening that the data showed that the extent of the laboratory workup had no correlation with the degree of diagnostic accuracy. Thus, students who ordered a small, carefully selected set of laboratory tests were able to achieve the same levels of diagnostic accuracy as those who ordered multiple laboratory tests. The old professorial adage that the diagnosis can be primarily made on history and PE was borne out.

In 2 of the 5 case simulations, students did not choose the correct case diagnosis as the final diagnosis with considerable frequency (20% and 9%). In the case with 20% incorrect final diagnoses, both the average diagnostic score and average treatment score (which is based on the case diagnosis regardless of the student's entry) were much lower than these scores from the other cases. In the case with 9% incorrect final diagnoses, however, the average treatment score was not affected. This suggests that, just as can occur in clinical practice, students on this examination may treat the (simulated) patient correctly even when the diagnosis is not clear.

We were reassured to see that the CA scores were very consistent throughout the entire 4-year period of the study. This alleviated the concerns that differences in the didactic material covered during the clerkships, in patients seen during the clerkships, in the abilities of the teachers or students, or other factors might result in changes in performance on CA from year to year. The consistency of scores also suggests that teachers who were familiar with the cases did not consciously or subconsciously cover the material that the CA cases tested more thoroughly in passing years and that there was not leakage of the CA case diagnoses from year to year.

Computer-based simulation teaching programs have been used successfully for many years.11 Others have described development of computer-assisted instructional cases for use in pediatric training.12 These case simulations have gained wide acceptance as a teaching tool in many medical schools but are not designed as a test. Individual cases were highly rated by students, although the series as a whole was rated less well, which was attributed to the time required to complete it. During the 4 years that CA was used, it was received enthusiastically by both the faculty and the students; the students' only objection was that they would have liked immediate feedback and analysis of their test performance. A recurring comment was a request for other self-assessment teaching programs based on the same model. CA was designed specifically as a test instrument to help guide the assignment of course grade.

It was also encouraging to see that the CA score, just as the NBME score, was independent of the rotation site location. This information can be useful to the clerkship directors at the different sites in assessing the completeness of the educational experience offered. In its recommendations about educational programming for the MD degree, the Liaison Committee on Medical Education listed several objectives. One of these (ED-2) requires that clerkships ensure that students have a standardized experience with regard to the numbers and types of real or simulated patient interactions.13 Because CA is designed as a test and not an educational resource, per se, it would not satisfy this ED-2 requirement. However, the approach of using the same clinical simulations at different rotation sites of a medical school clinical clerkship is in the spirit of this LCME objective. If teaching programs were developed on this model, we believe that they would satisfy the requirement.


    CONCLUSIONS
 
We are optimistic that programs similar to CA can serve as a useful supplemental evaluative tool in the assessment of medical students. Our data demonstrate that CA has a high predictive value for course grade for the top-scoring students (PPV = 95% for students scoring ≥90% on CA). Students who score very well on CA may merit a higher course grade than might otherwise have been considered. The development of a student's humanistic qualities and interpersonal skills will continue to be an important focus of medical education. Imparting these qualities and teaching these skills are time-consuming but important roles of medical school faculty. However, with these faculty increasingly busy with their own clinical duties and their research and administrative responsibilities, there is less time available both for this important aspect of teaching and for thorough evaluation of medical students. Using a program such as CA in an adjunct capacity as an objective evaluator may prove both pedagogically sound and cost-effective.


    ACKNOWLEDGMENTS
 
This work was supported in part by an unrestricted grant (computer equipment) from the Hewlett Packard Corporation.

We gratefully acknowledge the help of the following individuals: Kathleen Famiglietti, Richard Kim, and Wayne Raila for programming assistance; Patricia McArdle for administrative assistance and coordination with the medical school; Jeanhee Chung, MD, for statistical assistance; and the 470 Harvard Medical School students for participating in this project.


    FOOTNOTES
 
Accepted Jun 1, 2006.

Address correspondence to Mitchell Feldman, MD, FAAP, Laboratory of Computer Science, MGH, 50 Staniford St, 7th Floor, Boston, MA 02114. E-mail: mfeldman{at}partners.org

The authors have indicated they have no financial relationships relevant to this article to disclose.

Dr Lowe's current address is Division of General Pediatrics, Stanford University, Lucile Packard Children’s Hospital, 725 Welch Rd, Palo Alto, CA 94304.


    REFERENCES
 

  1. Barnett GO, Link DA, Feldman MJ, et al. Clinical assessment – A computer-based aid to assessing the clinical problem solving ability of medical students. Proc Annu Symp Comput Appl Med Care. 1994:1045
  2. Cantillon P, Irish B, Sales D. Using computers for assessment in medicine. BMJ. 2004;329:606–609
  3. Wu AH, LaRocco M, Fath SJ, Simon FA. Evaluation of computer case simulations for teaching clinical pathology to second-year medical students. Ann Clin Lab Sci. 1990;20:154–160
  4. Christenson J, Parrish K, Barabe S, et al. A comparison of multimedia and standard advanced cardiac life support learning. Acad Emerg Med. 1998;5:702–708
  5. Anbar M. Comparing assessments of students' knowledge by computerized open-ended and multiple-choice tests. Acad Med. 1991;66:420–422
  6. Veloski JJ, Rabinowitz HK, Robeson MR. Cueing in multiple choice questions: a reliable, valid and economical solution. Proc Annu Conf Res Med Educ. 1988;27:195–200
  7. Friedman CP. Anatomy of the clinical simulation. Acad Med. 1995;70:205–209
  8. Schwartz W. Documentation of students' clinical reasoning using a computer simulation. Am J Dis Child. 1989;143:575–579
  9. Millos RT, Gordon DL, Issenberg SB, et al. Development of a reliable multimedia, computer-based measure of clinical skills in bedside neurology. Acad Med. 2003;78(suppl 10):S52–S54
  10. Rogers PL, Jacob H, Rashwan AS, Pinsky MR. Quantifying learning in medical students during a critical care medicine elective: a comparison of three evaluation instruments. Crit Care Med. 2001;29:1268–1273
  11. Barnett GO. The use of a computer-based system to teach clinical problem solving. In: Stacy RW, Waxman BD, eds. Computers in Biomedical Research, Vol. IV. New York, NY: Academic Press; 1974:301–319
  12. Fall LH, Berman NB, Smith S, White CB, Woodhead JC, Olson AL. Multi-institutional development and utilization of a computer-assisted learning program for the pediatrics clerkship: the CLIPP Project. Acad Med. 2005;80:847–855
  13. Association of American Medical Colleges. Functions and Structure of a Medical School. Standards for Accreditation of Medical Education Programs Leading to the M.D. Degree. Liaison Committee on Medical Education, Association of American Medical Colleges; 2005

PEDIATRICS (ISSN 1098-4275). ©2006 by the American Academy of Pediatrics



