Chest Roentgenogram in the Evaluation of Heart Defects in Asymptomatic Infants and Children With a Cardiac Murmur: Reproducibility and Accuracy
Objectives. To evaluate the reproducibility and the accuracy of pediatric radiologists' assessments of chest radiographs with respect to the presence or absence of heart defects in children with an asymptomatic heart murmur.
Design. Ninety-eight children, ages 1 month to 15 years (median, 30.1 months), referred for evaluation of a heart murmur were consecutively included. They all had a standard chest radiograph and a color Doppler echocardiographic (CDE) examination performed. Six specialists in pediatric radiology evaluated the chest radiographs independently on two occasions 6 months apart. The radiologists were asked to classify each set of films into one of two categories: heart disease or no heart disease. The outcome of the CDE was considered the definitive diagnosis. κ statistics were used to analyze the reproducibility of the radiologic assessments. Sensitivity, specificity, and the predictive values of a positive and a negative test were used to evaluate the accuracy of the radiologic assessments.
Results. Mean intra- and interobserver κ values were all <0.6, and the majority were <0.4. Mean sensitivity was 0.3 (range: 0.17–0.52), and the mean predictive value of a positive test was 0.4, implying that 60% of the positive assessments were false positives. Mean specificity was 0.86 (range: 0.75–0.93), and the mean predictive value of a negative test was 0.80, implying that 20% of the negative assessments were false negatives.
Conclusion. We found a low reproducibility, as well as a low accuracy, of the radiologic assessments of the chest radiographs of children with an asymptomatic heart murmur with respect to the presence or absence of heart disease. A false-positive radiologic assessment of the chest radiograph with respect to heart defects causes unnecessary anxiety and further examinations, whereas a false-negative assessment might result in omission of relevant investigations and failure to properly identify the heart defect. We cannot recommend the use of chest radiographs in the initial evaluation of the asymptomatic child with a heart murmur. If a heart defect cannot be excluded by clinical examination, a CDE must be performed.
We recently studied the value of various methods in the evaluation of otherwise asymptomatic children with a heart murmur and found that chest radiography did not contribute usefully to the evaluation of these children.1 The chest radiographs were evaluated blindly by various radiologists during the daily routine. These observations made us wonder whether the limited diagnostic contribution of the chest radiograph in the primary evaluation of asymptomatic children with a heart murmur should be ascribed to the method itself or to variations in the diagnostic standards of the radiologists working in the daily routine.
In the present study we examined the reproducibility and accuracy of specialists in pediatric radiology in their evaluation of chest radiographs with respect to the presence or absence of heart defects in asymptomatic children with a heart murmur.
PATIENTS AND METHODS
One hundred children, ages 1 month to 15 years (median, 30.1 months), referred for the first time to the Department of Pediatrics at Odense University Hospital for further evaluation because of a heart murmur were consecutively included in the study. The children had no symptoms or other abnormal findings.
A standard chest radiograph (anteroposterior and lateral projections) and a color Doppler echocardiographic (CDE) examination were routinely obtained in all children. The echocardiographer did not know the result of the clinical examination when performing the CDE.
Six specialists in pediatric radiology, with experience in pediatric cardiologic radiology, evaluated all the chest radiographs independently. The radiologists were informed that the children had a heart murmur, but did not have access to any other data (including the results of the cardiologic investigation). Each radiologist classified each set of films into one of two categories: heart disease or no heart disease.
Six months after the initial evaluation, the radiographs were circulated to the same radiologists for a repeated evaluation. In this way, a total of 12 evaluations were obtained for each set of films. An abnormality score, giving 1 point for heart disease and 0 for no heart disease, was calculated for each set of films, which in this way could obtain a maximum score of 12 points.
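The scoring just described amounts to summing 12 binary assessments per set of films. A trivial sketch (the example assessment values are hypothetical, not the study's data):

```python
def abnormality_score(assessments):
    """Sum 12 binary assessments: 1 = heart disease, 0 = no heart disease."""
    assert len(assessments) == 12, "6 radiologists x 2 readings = 12 assessments"
    return sum(assessments)

# Hypothetical film judged abnormal by 5 of the 12 evaluations:
film = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(abnormality_score(film))  # 5 (maximum possible score is 12)
```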
Two sets of films could not be retrieved, so the analysis comprises 98 sets.
The reproducibility of the assessments of the chest radiographs (intra- and interobserver variation) was evaluated by κ statistics (see Appendix 1).2
The accuracy of the assessments of the chest radiographs was evaluated by comparison with the outcome of the CDE, which was considered the definitive diagnosis. The sensitivity, specificity, and the predictive value of a positive and a negative test were calculated according to standard methods.3
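The standard measures referred to here can be written down in a few lines. A minimal sketch; the counts in the example are hypothetical, chosen only to be consistent with a sample of 23 abnormal and 75 normal CDE diagnoses:

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard accuracy measures for a dichotomous test vs a gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # abnormal correctly called abnormal
        "specificity": tn / (tn + fp),  # normal correctly called normal
        "ppv": tp / (tp + fp),          # predictive value of a positive test
        "npv": tn / (tn + fn),          # predictive value of a negative test
    }

# Hypothetical counts (23 children with abnormal CDE, 75 with normal CDE):
m = accuracy_measures(tp=8, fp=12, fn=15, tn=63)
print(m)
```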
RESULTS
Table 1 shows the results of the CDE. Twenty-three of the 98 children had a heart abnormality.
The κ values comparing each of the six radiologists' duplicate evaluations are shown in Table 2. The values were calculated both for all 98 sets of films and separately for the sets from children with normal and abnormal CDE. When all the films were considered, the κ values ranged from 0.33 to 0.64, indicating a low level of agreement between the first and second evaluation of the individual radiologists. When only films from children with abnormal CDE were considered, the κ values ranged from 0.49 to 0.78, indicating slightly better agreement.
In Table 3 the mean intraobserver (within observer) κ values are shown together with the mean interobserver (between observer) κ values obtained by comparing each of the radiologists' evaluations with those of the others. Again, the calculations have been performed for the whole group of children as well as separately for those with normal and abnormal CDE, respectively. The overall agreement between the radiologists (interkappa) was lower than the average agreement found for the individual radiologist (intrakappa). However, when only films from children with abnormal CDE were considered, the average interobserver agreement was at the same level as the average intraobserver agreement.
Table 4 shows the distribution of the abnormality scores. Eight points or more were obtained in 5 of 23 chest radiographs from children with an abnormal CDE, but only in 1 of 75 of those with a normal CDE. On the other hand, 7 of 23 chest radiographs from children with an abnormal CDE and only 26 of 75 with a normal CDE scored 0 points.
Table 5 shows the radiologists' assessments compared with the CDE diagnosis. Mean number and range of true-positive, false-positive, false-negative, and true-negative answers, respectively, of the 12 evaluations are shown. It should be remembered that 23 of the 98 children had structural heart disease diagnosed by CDE. Taking the outcome of the CDE as the definitive diagnosis, the sensitivity, the specificity, and the predictive value of a positive and a negative evaluation of the chest radiograph were calculated. The mean predictive value of a positive test was 0.4 (range: 0.27–0.86), implying that 14% to 73% of the radiographs classified as abnormal in fact represented normal children, whereas the mean predictive value of a negative test was 0.8 (range: 0.78–0.85), which means that 15% to 22% of the radiographs classified as normal in fact belonged to children with a heart defect.
DISCUSSION
When asymptomatic children with a heart murmur are referred for evaluation, a chest radiograph is routinely obtained in many centers.4–6 Some investigators have questioned the value of chest radiographs in the initial evaluation of these children,7,8 whereas others find the chest radiograph to be a valuable tool for the pediatric cardiologist in the evaluation of patients with a heart murmur.5,6 However, the design of these studies made it difficult to unequivocally answer the question of whether the chest radiograph is useful in distinguishing between an innocent and a noninnocent heart murmur in asymptomatic children. For this purpose, we tested the reproducibility and accuracy of six pediatric radiologists' assessments of the chest radiographs, with respect to the presence or absence of a heart defect, in children with a heart murmur. CDE was considered to be the gold standard.9
The reproducibility was evaluated with κ statistics, and we found a mean intraobserver κ of 0.45 and a mean interobserver κ of 0.28. κ values in this range are considered to reflect poor to moderate agreement between observers.10 The intraobserver κ values were generally higher than the interobserver κ values, indicating closer agreement between two observations made by the same radiologist than between observations made by two different radiologists. Other studies of observer variability have given similar results.11 In our study, the difference between intra- and interobserver variability is probably explained by the individual diagnostic habits of the radiologists, as a radiologist is unlikely to remember a particular chest radiograph for half a year. The low consistency of the assessments of the chest radiograph might not be surprising. Despite existing guidelines for the assessment of chest radiographs, a substantial amount of subjectivity remains in the evaluations. Reproducible assessment requires, among other things, standardized techniques. Chest radiographs are assumed to have been obtained in maximal inspiration. In children this assumption rarely holds true, and the varying levels of inspiration lead to variable image quality. It should also be noted that a high level of agreement (high reproducibility) between observers does not necessarily imply high accuracy.
With respect to accuracy, we found a very low sensitivity of the chest radiograph to heart defects as diagnosed by CDE. This might be explained by the absence of significant hemodynamic disturbances in some of the children with heart abnormalities at the time of investigation (eg, small atrial septal defect and small ventricular septal defect). A false-negative chest radiograph might result in the omission of relevant investigations. However, proper identification of the heart defect is important, as many of the children will require precautions, such as endocarditis prophylaxis12 and/or follow-up examinations. The mean predictive value of a positive test was 0.4. Consequently, 60% of the children radiologically diagnosed as having a heart defect were in fact normal. These results were obtained in children who were referred from general practice for further evaluation, ie, in an already selected population. If chest radiographs were applied in a nonselected population with a lower prevalence of heart defects, the predictive value of a positive test would become even lower, and the number of false-positive outcomes very high.13 Swenson found chest radiographs a valuable tool in the evaluation of new pediatric patients with a heart murmur.4 However, only 38% of the children in that study received a CDE to confirm the clinical and radiologic evaluation. In addition, the physicians might have been biased in their evaluation of the chest radiograph, because they knew the result of the clinical examination when evaluating it.4,14
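The effect of prevalence noted above can be made concrete with Bayes' rule: holding sensitivity and specificity fixed at the study's mean values (0.3 and 0.86), the predictive value of a positive test falls as prevalence falls. A sketch (the lower prevalences are illustrative, not from the study):

```python
def ppv(sensitivity, specificity, prevalence):
    """Predictive value of a positive test, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 0.23 approximates the prevalence in this referred sample (23/98);
# the lower values stand for progressively less selected populations.
for prev in (0.23, 0.10, 0.01):
    print(f"prevalence {prev:.2f}: PPV = {ppv(0.30, 0.86, prev):.2f}")
```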
The mean values of sensitivity and specificity and the predictive value of a positive and a negative test in this study corresponded closely to the values found when a number of radiologists assessed the chest radiographs in the daily routine.1 This indicates that the assessments of the chest radiographs, with respect to the presence or absence of heart defect, were not more correct when a single specialist in pediatric radiology assessed all of the chest radiographs, than when many radiologists assessed them in the daily routine. None of the six specialists in radiology was more correct in their assessments than the others. These results probably reflect poor accuracy of the test (chest radiograph for detection of heart defects) rather than poor accuracy of the radiologists.
Intra- and interobserver variability can be found in most clinical tests.15,16 The usefulness of a test in clinical practice depends on its reproducibility and accuracy. The present results do not support the use of a routine chest radiograph in the initial evaluation of children with asymptomatic heart murmurs. It frequently leads to unnecessary anxiety, and it adds a burden of ionizing radiation to no benefit. If any doubt remains after a thorough clinical evaluation of a child with an asymptomatic heart murmur, a competent CDE should be performed.17
We thank the National Research Council of Health Sciences (M Frydenberg) for statistical advice.
APPENDIX 1
The results of a study comprising two observers independently recording the same dichotomous diagnosis in n patients can be presented in a 2 × 2 contingency table, where a, b, c, and d indicate the number of observations and n is the number of patients:

                              Observer 2: disease    Observer 2: no disease
Observer 1: disease                    a                        b
Observer 1: no disease                 c                        d
The agreement between observers 1 and 2 is Po = (a + d) ÷ n. The agreement expected by chance between observers 1 and 2 is Pc = [(a + b)(a + c) + (c + d)(b + d)] ÷ n². κ is defined as (Po − Pc) ÷ (1 − Pc). κ can vary between −1 and 1. If the agreement between two observers is 100%, κ becomes 1. With high agreement between the two observers, κ approaches 1, whereas if agreement is poor, κ approaches 0. A κ value of 0 indicates chance agreement, whereas negative κ values indicate that the observed agreement is less than chance.6,15
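These definitions can be sketched in a few lines of Python (an illustration only; the counts are hypothetical, and the chance agreement Pc is computed from the marginal totals of the 2 × 2 table in the standard way):

```python
def kappa(a, b, c, d):
    """Cohen's kappa for a 2 x 2 agreement table.

    a = both observers positive, d = both observers negative,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n  # observed agreement
    # Chance agreement from the marginal totals:
    pc = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pc) / (1 - pc)

print(kappa(20, 0, 0, 78))              # perfect agreement -> 1.0
print(round(kappa(10, 10, 13, 65), 2))  # hypothetical partial agreement
```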
In the present study, where 12 sets of assessments are available, it is possible to make 12 × 11 ÷ 2 = 66 pairwise comparisons of assessments. Six of these are comparisons between the first and second assessment by the same radiologist, yielding intraobserver κ values. The remaining 60 are comparisons between different radiologists' assessments, yielding interobserver κ values. One must bear in mind that the interobserver κ values are calculated from 12 sets of assessments from only 6 radiologists and will therefore be correlated to some degree. This is not expected to invalidate our results, because only the mean values are considered.
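This counting can be verified directly by enumerating the pairs; labelling each of the 12 assessment sets by (radiologist, reading) makes the intra/inter split explicit (the radiologist indices are arbitrary labels):

```python
from itertools import combinations

# 6 radiologists, 2 readings each: 12 assessment sets in total.
sets = [(r, reading) for r in range(6) for reading in (1, 2)]
pairs = list(combinations(sets, 2))               # 12 * 11 / 2 = 66 pairs
intra = [p for p in pairs if p[0][0] == p[1][0]]  # same radiologist
inter = [p for p in pairs if p[0][0] != p[1][0]]  # different radiologists
print(len(pairs), len(intra), len(inter))         # 66 6 60
```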
- CDE = color Doppler echocardiography
- Birkebæk NH, Hansen LK, Oxhøj H
- Brennan P, Silman A
- Foldspang A, Juul S, Olsen J, Sabroe S. Vurdering af screening programmer [Assessment of screening programs]. In: Foldspang A, Juul S, Olsen J, Sabroe S, eds. Epidemiologi. Munksgård, Denmark; 1986:171–190
- Danford DA, Nasir A, Gumbiner C
- Swenson JM, Fischer DR, Miller SA, Boyle GJ, Ettedgui JA, Beerman LB
- Ades AE
- Kramer MS, Roberts-Brauer R, Williams RL
- Copyright © 1999 American Academy of Pediatrics