ARTICLE |
a Department of Pediatrics
b Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia
| ABSTRACT |
|---|
METHODS. We reviewed the print research articles published in Pediatrics, volume 115, 2005, and recorded the statistical measures and procedures reported in each article to determine how many articles used statistics or statistical procedures and what statistical procedures were encountered most commonly.
RESULTS. The proportion of articles that used any inferential statistics increased from 48% in 1982 to 89% in 2005. The mean number of inferential procedures per article increased from 2.5 in 1982 to 3.9 in 2005. The most commonly encountered statistical procedures or measures were descriptive statistics, tests of proportions, measures of risk, logistic regression, t tests, nonparametric tests, analysis of variance, multiple linear regression, sample size and power calculation, and tests of correlation. However, a reader who is familiar with only these concepts can understand the analyses used in only 47% of articles.
CONCLUSIONS. Our results confirm a trend toward the use of new and increasingly complex statistical techniques in Pediatrics. Educational efforts might most profitably focus on the principles underlying statistical analysis rather than on specific statistical tests. Authors, reviewers, and journal editors have a greater responsibility for ensuring that statistical procedures are used appropriately, as it may be increasingly unrealistic to expect readers to fully understand the statistical analyses used in journal articles.
Key Words: medical education, statistics, publishing
Lifelong learning as a physician demands facility in the assessment and application of clinical evidence from the medical literature. Appraisal of an article's methodologic rigor depends on an understanding of the study design and analysis used by the authors. The importance of these skills in medical training is broadly recognized. The Accreditation Council for Graduate Medical Education and the American Board of Pediatrics mandate that, to attain competency in practice-based learning and improvement, residents are expected to "apply knowledge of study designs and statistical methods to the appraisal of clinical studies" and to "appraise and assimilate evidence from scientific studies related to their patients' health problems."1,2 Optimally, graduates of pediatric residency programs will have a strong enough working knowledge of statistics to be able to evaluate the most important analyses in most of the medical literature they read. Unfortunately, graduates of pediatric residency programs nationwide often report that they receive little to no formal training in epidemiology and biostatistics, and they give only "fair to poor" marks for their knowledge of research design and statistical analysis.3
It is not clear how best to teach critical appraisal skills during pediatric residency training. Study design and biostatistics are often taught in the context of a journal club or an evidence-based medicine conference,4 but the success of such efforts in imparting these concepts has not been well evaluated.5 Time is limited, especially with new resident duty-hour restrictions, and the needs of the learners may vary widely. Medical students frequently have poor skills in basic mathematics, and they often have difficulty interpreting medical data.6 Residents, researchers, and practicing physicians may perform no better.7,8
It is also not clear which statistical concepts are most necessary and useful for readers to become familiar with. There has been a well-documented trend toward the use of new and increasingly complex statistical techniques in published articles.9 Use of these more sophisticated techniques can potentially allow more thorough analysis of study data by, for example, enabling complex modeling with multiple comparisons or multiple variables. However, such advances have made it increasingly difficult for readers to understand the study analyses. An earlier study demonstrated that a reader of Pediatrics who understood descriptive statistics (for example, means and standard deviations) and 3 inferential statistical procedures (Student's t test, χ², and Pearson's r) could understand the statistical analysis in 97% of research articles published in 1952, but only 65% of articles in 1982.10 The goals of this study were to determine (1) whether this trend has continued and the proportion of articles that a reader can understand with only these few basic concepts has declined further, and (2) the statistical measures and procedures most commonly encountered in Pediatrics. These concepts could then potentially be used in planning a curriculum for pediatric residents and other readers wishing to improve their skills in critical appraisal of published research.
| METHODS |
|---|
χ², and Pearson's r), and (3) which statistical procedures were encountered most frequently. The 3 inferential procedures chosen for the second analysis do not necessarily represent the most important statistics but were selected to allow comparisons with the previous study.
| RESULTS |
|---|
χ², and/or Pearson's r, compared with 65% in 1982. If Review Articles and Special Articles are excluded from consideration, 9% of articles use only descriptive statistics, Student's t test, χ², and/or Pearson's r. The proportion of articles that used any inferential statistical procedure, with or without descriptive statistics, increased from 48% in 1982 to 89% in 2005. The mean number of inferential procedures per article increased from 2.5 in 1982 to 3.9 in 2005.
≥10% of the articles reviewed. A reader who understands all of these "top 10" topics can potentially understand the analyses used in only 47% of the 171 articles. Table 3 lists the procedures or measures that were encountered in 5% to 9% of articles. A reader who understands the concepts in Tables 2 and 3 can potentially understand the analyses used in only 70% of the 171 articles. Many other statistical techniques were encountered (Table 4).
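The coverage figures above amount to a subset check: an article counts as understandable when every procedure it uses is among those the reader knows. The following is a minimal sketch in Python, assuming each article has been reduced to a set of procedure labels. The function name `coverage`, the three sample articles, and the reader's list of known topics are hypothetical illustrations, not data from the review.

```python
def coverage(articles, known):
    """Return the fraction of articles whose procedures are all in `known`."""
    if not articles:
        raise ValueError("no articles to assess")
    known = set(known)
    # An article is "understood" only if its procedures are a subset of `known`.
    understood = sum(1 for procedures in articles if set(procedures) <= known)
    return understood / len(articles)


# Hypothetical tabulation: one set of procedure labels per article.
sample_articles = [
    {"descriptive statistics", "t test"},
    {"descriptive statistics", "logistic regression", "Cox regression"},
    {"descriptive statistics"},
]

# Hypothetical reader who knows only these two topics.
reader_knows = {"descriptive statistics", "t test"}

print(f"{coverage(sample_articles, reader_knows):.0%}")
```

Under this definition a single unfamiliar procedure is enough to exclude an article, which may help explain why coverage falls quickly as the range of techniques in use widens.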
Nonparametric techniques were used in 24% of articles, most commonly the Mann-Whitney U test and the Kruskal-Wallis test. To add to potential confusion, multiple names were used for the Mann-Whitney U test, including Wilcoxon rank sum, Mann-Whitney U, Mann-Whitney rank sum, and Wilcoxon Mann-Whitney test for ordered categories.
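The names overlap because the Wilcoxon rank-sum statistic W and the Mann-Whitney U statistic are computed from the same ranks and differ only by a constant: U = W − n(n + 1)/2, where n is the size of the sample whose ranks are summed. The sketch below illustrates this in plain Python. The data and function names are made up for illustration, and the code stops short of computing a P value.

```python
def midranks(values):
    """Rank values starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_and_u(x, y):
    """Return (W, U) for sample x: Wilcoxon rank sum and Mann-Whitney U."""
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[: len(x)])            # rank sum of the first sample
    u = w - len(x) * (len(x) + 1) / 2   # same information, shifted by a constant
    return w, u


# Made-up measurements for two small groups.
group_a = [1.1, 2.4, 3.0]
group_b = [2.0, 4.5, 5.2, 6.1]
w, u = rank_sum_and_u(group_a, group_b)
print(w, u)
```

Because U is a fixed shift of W for given sample sizes, a test based on either statistic leads to the same conclusion; the differing names reflect history rather than different procedures.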
Statistical methods were not always explained or even mentioned in the methods section of the articles, but were often buried in the text of the results section or listed only as footnotes to tables. In several instances, no statistical procedure was specified, but the presence of a P value indicated that a test had been performed. In most of these cases, it was possible to make an educated guess about what sort of procedure had been performed (eg, a test to compare proportions), but it was not possible to determine which specific test had been used (eg, χ² or Fisher's exact test). In these cases, only the more general classification was tabulated.
| DISCUSSION |
|---|
Reasons for this increase in statistical complexity may include the development of new study designs and statistical techniques, and also the broad availability of expanded computing power.12 Perhaps this increased complexity of statistical analysis should be expected given the increasing complexity of the world in general, and of scientific domains in particular. Taken in this larger context, the statistical complexity is perhaps better understood, but it nevertheless may remain troublesome and baffling to readers.
A reader may understand a research article at several levels. He or she may understand the statistical tests and procedures well enough to assess whether they were appropriate to the study and conducted correctly, or he or she may be able to interpret results reported as descriptive statistics, measures of effect size, or P values without understanding the statistical procedures used. The latter reader may still find an article to be valuable.
If, however, one assumes that a general reader should be able to understand the statistical procedures and measures in most published articles, there are a few possible courses of action. One option might be for journals to require that statistical methods be kept relatively simple and that any unusual or complex procedures be explained thoroughly. In this context, "unusual" could be defined mathematically, for example, as a test appearing in <5% of articles. Such a requirement might, however, "dumb down" the techniques used, result in suboptimal analysis of study data, and increase the length of methods sections that few would ever read, let alone comprehend.
The optimal way to report statistical methods no doubt depends on the article's anticipated audience. Unfortunately, there may be many audiences (or a continuum of audiences) based on readers' levels of interest in the clinical topic and expertise in research design and analysis. For example, clinicians may have better understood the results of 1 reviewed study because it included helpful background information about the statistical model, as follows:
The Cox regression technique takes account of variable length of follow-up monitoring, including the possibility of "censoring" (no event when last observed but future events are not ruled out), and produces an estimate of the relative likelihood of the event during any small time interval ("hazard ratio"), as affected by specified risk factors. Like the conventional techniques of multiple linear and logistic regression, Cox regression can assess the independent effect of each risk factor while controlling simultaneously for other factors.13
This same information, however, may have been boring and superfluous for a reader with substantial statistical expertise. In contrast, statistically savvy readers may appreciate having substantial detail of a mathematical model, whereas most clinicians are unlikely to delve into a discussion of 6 different methods used to impute missing study data included in another reviewed study.14 Perhaps ideally, articles will include a brief overview of the statistical methods used, as well as significant detail (perhaps in an appendix) for statistical reviewers and any interested readers. In the instance of printed articles, additional information can be made available on request. For articles published electronically, readers who desire more information about the statistical technique or model could perhaps click on a link to access that material.
A second option would be to provide readers with more intensive training in statistical methods. Given current duty-hour restrictions for residents, however, finding more time to teach this material during residency will be difficult. Likewise, educational sessions on biostatistics at continuing medical education meetings are not likely to attract large audiences if they are competing with clinical updates or sessions on such practical issues as new vaccines or office management. Placing greater emphasis on teaching biostatistics to medical students is a possibility, but the practical value of this information may be less clear and, therefore, less interesting to students at this earlier stage of training.
A third option is to concede that many readers will never be motivated and/or able to understand the statistical analysis of most published articles. In past years when the variety of statistical techniques encountered was narrower than today, motivated physician readers could develop a rudimentary understanding of the techniques they were likely to encounter in published articles. Now that the range of techniques encountered has broadened so widely, the expectations may need to change. The purely statistical aspects of biomedical research are perhaps not as important and as crucial to good science as is sound research design with attention to potential sources of bias, choice of appropriate controls, and types of outcomes chosen. Educational efforts focusing on principles of study design and potential biases might aid the clinician reader regardless of complexity of statistical analysis. A complementary approach is for clinicians to become "information masters" who efficiently use the medical literature, including secondary sources such as the Cochrane Database of Systematic Reviews, as well as assessments of the strength of research evidence, such as the strength of recommendation taxonomy.15,16 For most readers, understanding the "what" and "why" of the research is more important than understanding the "how" of the analysis.
Readers who do not understand the statistical measures and analysis used in an article have several options. Because ignorance often breeds mistrust, readers may tend to reject an unfamiliar analysis and discount an article's results, but this might well result in dismissing an important research finding. Consulting a statistician for assistance may be helpful, but this is impractical for most readers. Reading an expert review of the article may be helpful, if one has been written. Trusting a study's authors and the journal's peer-review process to assure that the statistical analysis is appropriate and correct is another possibility, but journal editors may not conduct statistical reviews of submitted manuscripts,17 and statistical errors have been detected commonly in published articles.18–20
Including a biostatistician among the authors of an article probably increases the possibility that an "unfamiliar" statistical test is used, but may well also increase the likelihood that the analysis is thoughtful and appropriate.21 Including a statistician on editorial boards and having articles refereed by a statistician may make it "safer" for statistically naïve readers to believe what they read.
This study has several limitations. First, only 1 volume of 1 journal was reviewed, and we excluded the electronic pages; thus, the findings may not be generalizable to other journals. For example, a review published in 2003 of 6 journals in 3 nonpediatric subspecialties revealed that a reader could understand 70% of articles with 3 basic concepts: descriptive statistics, χ²/Fisher's exact test, and Student's t test.22 Pediatrics was reviewed for this study to allow comparisons with the earlier article.10 The study results may still be broadly applicable because, as the official journal of the American Academy of Pediatrics, Pediatrics has a large circulation and high impact factor, and publishes many articles of interest to both clinicians and researchers. A second limitation is that some statistical procedures actually used in the reviewed articles may have been missed in our review. In that case, our findings can only underestimate the frequency and complexity of statistical procedures that a reader might encounter. Third, no attempt was made to assess the appropriateness or accuracy of the statistical measures and techniques used in each article. Finally, our classification of the statistical measures and procedures represents just 1 possible categorization. The concepts might be grouped in different ways.
| CONCLUSIONS |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Address correspondence to Martha A. Hellems, MD, MS, Department of Pediatrics, University of Virginia, Box 800386, Charlottesville, VA 22908-0386. E-mail: mab4c{at}virginia.edu
The authors have indicated they have no financial relationships relevant to this article to disclose.
| REFERENCES |
|---|