pediatrics
October 2014, VOLUME 134 / ISSUE 4

# Rapid Diagnostic Tests for Group A Streptococcal Pharyngitis: A Meta-analysis

1. Wei Ling Lean, MBBS, BMedSc<sup>a</sup>,
2. Sarah Arnup, BSc(Hons), MPhil, MBiostat<sup>b</sup>,
3. Margie Danchin, MBBS, FRACP, PhD<sup>a,c,d</sup>, and
4. Andrew C. Steer, MBBS, BMedSc, MPH, FRACP, PhD<sup>a,c,e</sup>
1. <sup>a</sup>Department of General Medicine, Royal Children’s Hospital, Melbourne, Australia;
2. <sup>b</sup>Clinical Epidemiology and Biostatistics Unit,
3. <sup>c</sup>Group A Streptococcal Research Group, and
4. <sup>d</sup>Vaccine and Immunisation Research Group, Murdoch Children’s Research Institute, Melbourne, Australia; and
5. <sup>e</sup>Centre for International Child Health, Department of Paediatrics, University of Melbourne, Melbourne, Australia

## Abstract

BACKGROUND AND OBJECTIVE: Effective management of group A streptococcal (GAS) pharyngitis is hindered by impracticality of the gold standard diagnostic test: throat culture. Rapid antigen diagnostic tests (RADTs) are a promising alternative, although concerns about their sensitivity and specificity, and variation between test methodologies, have limited their clinical use. The objective of this study was to perform a systematic review with meta-analysis of the diagnostic accuracy of RADTs for GAS pharyngitis.

METHODS: Medline and Embase from 1996 to 2013 were used as data sources. Of 159 identified studies, 48 studies of diagnostic accuracy of GAS RADTs using throat culture on blood agar as a reference standard were selected. Bivariate random-effects regression was used to estimate sensitivity and specificity with 95% confidence intervals (CIs). Additional meta-analyses were performed for pediatric data.

RESULTS: A total of 60 pairs of sensitivity and specificity from 48 studies were included. Overall summary estimates for sensitivity and specificity of RADTs were 0.86 (95% CI 0.83 to 0.88) and 0.96 (95% CI 0.94 to 0.97), respectively, and estimates for pediatric data were similar. Molecular-based RADTs had the best diagnostic accuracy. Considerable variability exists in methodology between studies. There were insufficient studies to allow meta-regression/subgroup analysis within each test type.

CONCLUSIONS: RADTs can be used for accurate diagnosis of GAS pharyngitis to streamline management of sore throat in primary care. RADTs may not require culture backup for negative tests in most low-incidence rheumatic fever settings. Newer molecular tests have the highest sensitivity, but are not true point-of-care tests.

• group A streptococcus
• pharyngitis
• rapid test
• sensitivity
• specificity
• Abbreviations:
CI: confidence interval
ELISA: enzyme-linked immunosorbent assay
FISH: fluorescence in situ hybridization
GAS: group A β-hemolytic streptococcus
OIA: optical immunoassay
PCR: polymerase chain reaction
QUADAS: Quality Assessment of Diagnostic Accuracy Studies
RADT: rapid antigen diagnostic test
ROC: receiver operating characteristic
S-ROC: summary receiver operating characteristic

Sore throat is a common presentation to primary health care and emergency departments, especially in the pediatric population. The most common bacterial cause of acute sore throat is the group A β-hemolytic Streptococcus (GAS). In a cohort study done in Australia, the incidence of pharyngitis caused by GAS in children aged 5 to 12 years was 13 cases per 100 person-years.1 GAS pharyngitis imposes a considerable cost on society; in the United States it is estimated that GAS pharyngitis in children alone costs between $224 and $539 million per year.2 In addition to the acute symptoms of sore throat, GAS can lead to suppurative sequelae, including peri-tonsillar abscess, and nonsuppurative sequelae, including rheumatic fever, although this complication is rare today in most industrialized countries.

However, there are challenges in the diagnosis of GAS pharyngitis. First, the signs and symptoms of GAS pharyngitis are often indistinguishable from viral and other causes of sore throat. No symptom or sign in isolation has been shown to have a sufficiently high likelihood ratio to permit an accurate diagnosis of GAS pharyngitis.3 Combinations of symptoms and signs have been developed into clinical prediction rules to help identify patients who have a higher likelihood of GAS infection. Among the most commonly used prediction rules validated in both adults and children are the Centor criteria, which use up to 4 clinical features (tonsillar exudates, swollen tender anterior cervical nodes, fever, and the lack of cough). However, this rule identifies only 53% of patients with GAS culture–positive sore throat even when all 4 criteria are present.4,5 Therefore, if the clinician intends to treat GAS pharyngitis, it is generally recommended that laboratory confirmation of the presence of GAS be sought to limit unnecessary antibiotic prescription.

The gold standard laboratory investigation of GAS pharyngitis is bacterial culture of a throat swab. However, effective management is hindered by the impracticality of throat culture because of the relatively long lag time between the collection of the specimen and final microbiological diagnosis.5 This delay is especially problematic in low-resource settings, as it may not be feasible for patients to return for further follow-up visits and appropriate treatment.6

Rapid antigen diagnostic tests (RADTs) are a potentially more feasible alternative because of their quick turnaround time, so that the clinician can make a decision regarding treatment at the point of care.7 Since their inception in the early 1980s, there have been several generations of RADTs that have used different methodologies.8 The first-generation tests used latex agglutination, followed by enzyme-linked immunosorbent assays (ELISAs), lateral flow and immunochromatographic assays, and optical immunoassays (OIAs). More recently, molecular-based techniques, such as DNA probes, polymerase chain reaction (PCR), and fluorescence in situ hybridization (FISH) methods, have been developed.8 RADTs have been incorporated into both the Infectious Diseases Society of America and the European Society for Clinical Microbiology and Infectious Diseases clinical practice guidelines,4,9 but are not used routinely in all countries, including Australia.10

Widespread use of RADTs has been hindered by the low sensitivity of the most commonly used RADTs (immunoassays). Previous reviews of RADT performance have identified considerable variability in the diagnostic accuracy, especially sensitivity, between different test methodologies.4,11 The American guidelines recommend that negative RADTs in children and adolescents should be backed up by a throat culture to reduce the number of missed GAS pharyngitis cases.9 These guidelines, along with European guidelines,4 suggest that a backup culture in adults is not necessary because the incidence of GAS pharyngitis is generally lower than in children and because the risk of rheumatic fever is low. However, most RADTs have high specificity, meaning that a positive RADT result does not require a backup culture and that the rate of overdiagnosis is low.9

We conducted a systematic review with meta-analysis to determine the diagnostic accuracy of each class of RADT for GAS pharyngitis, both in children and adults combined and in children only, and to explore heterogeneity among studies by analyzing subgroups defined by test type in each of these populations.

## Methods

### Data Collection

We systematically searched Medline and Embase via OvidSP for articles published between 1996 and 2013. We used the following search terms: Streptococcus pyogenes, streptococcal infections, group A streptococcal infection, pharyngitis, rapid test, diagnostic reagent kits, immunoassay, immunoenzyme technique, enzyme immunoassay, latex fixation test, latex agglutination test, diagnostic test, molecular biology. The search was supplemented by a manual review of bibliographies of articles meeting inclusion criteria and the bibliographies of previous reviews. The search was limited to English-language articles only.

The abstracts of all identified articles were reviewed. We included articles in our review if they contained data on the accuracy of GAS RADTs. Review articles, letters, comments, and study protocols with incomplete data were excluded (Fig 1). After this, full articles were retrieved and reviewed.

FIGURE 1

Study flow diagram. This flow diagram follows the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA)76 with modifications.

Each study was assessed for quality and risk of bias by 2 investigators (WLL, ACS) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool for inclusion within a meta-analysis of studies.12 The Cochrane version of 11 QUADAS criteria was used in the quality assessment of each study (Supplemental Table 2).12 All the analyzed studies used culture on a blood agar plate as a minimum reference standard; data within individual studies that were not compared with blood agar culture were excluded from analysis. Studies that used only throat culture as a backup for negative RADTs were excluded from the meta-analysis because this methodology assumes that all test-positives are true-positives, and there are no false-positives; as a result, specificity is assumed to be 100% and sensitivity can be overestimated. Only studies that used throat swabs, not mouth swabs, were included.
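The direction of this bias can be seen with a small worked example. The sketch below uses hypothetical counts (not taken from any included study) and two illustrative helper functions; it compares accuracy under full verification with the apparent accuracy obtained when only RADT-negative patients are cultured.

```python
def accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts against culture."""
    return tp / (tp + fn), tn / (tn + fp)


def accuracy_backup_culture_only(tp, fp, fn, tn):
    """Apparent accuracy when only RADT-negative patients are cultured.

    RADT-positive patients are never checked against culture, so every
    false-positive is counted as a true-positive.
    """
    return accuracy(tp + fp, 0, fn, tn)


# Hypothetical counts for illustration only (not from any included study).
counts = dict(tp=86, fp=16, fn=14, tn=384)

true_sens, true_spec = accuracy(**counts)
apparent_sens, apparent_spec = accuracy_backup_culture_only(**counts)

print(f"Full verification:   sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Backup culture only: sensitivity {apparent_sens:.2f}, specificity {apparent_spec:.2f}")
```

With these assumed counts, sensitivity rises from 0.86 to about 0.88 and specificity from 0.96 to 1.00, purely because the false-positives were never verified.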

### Data Extraction and Categorization

Multiple variables were extracted from the studies, including sample size, prevalence of GAS culture positivity, sensitivity, specificity, and sample characteristics. Where sensitivity and specificity were not presented in the article, we independently calculated sensitivity and specificity from published raw data or from data submitted by authors at our request. Studies were categorized on the basis of the type of test, setting (emergency department, outpatient clinic, inpatient), and a subgroup of studies performed in children (aged <18 years) was defined. For type of test, we included studies that reported on lateral flow assay and immunochromatographic assay in a single category, and DNA probe, PCR assay, and FISH in a single category (molecular technique), in addition to 4 other categories: latex agglutination, liposomal technology, ELISA, and OIA.
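As an illustration of how sensitivity and specificity can be recomputed from published raw counts, the following is a minimal sketch. The function names and the example counts are hypothetical, and the Wilson score interval is an assumption chosen for illustration; the article does not state which interval method was used for individual studies.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity of the RADT against throat culture."""
    return {
        # Proportion of culture-positive patients with a positive RADT.
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        # Proportion of culture-negative patients with a negative RADT.
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not from any included study).
result = accuracy_from_counts(tp=86, fp=16, fn=14, tn=384)
for name, value in result.items():
    print(name, value)
```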

### Statistical Analysis

A bivariate random-effects model was used to estimate summary values of sensitivity, specificity, and their 95% confidence intervals (CIs), for each RADT category with more than 3 pairs of sensitivity and specificity and for all categories combined.13–15 Because a correlation may exist between sensitivity and specificity across studies, each study measurement of sensitivity and specificity was analyzed together as a pair.
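For readers unfamiliar with the approach, a bivariate random-effects model of this kind is commonly written as follows. This is a sketch of one standard formulation (binomial within-study variation, bivariate normal between-study variation on the logit scale) and is assumed here rather than quoted from the article; all symbols are introduced for illustration. For study $i$, $y_{Ai}$ is the number of true-positives among $n_{Ai}$ culture-positive patients and $y_{Bi}$ is the number of true-negatives among $n_{Bi}$ culture-negative patients.

$$
\begin{aligned}
y_{Ai} &\sim \mathrm{Binomial}(n_{Ai}, \mathrm{Se}_i), \qquad
y_{Bi} \sim \mathrm{Binomial}(n_{Bi}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \rho\,\sigma_A\sigma_B \\ \rho\,\sigma_A\sigma_B & \sigma_B^2 \end{pmatrix}
\right)
\end{aligned}
$$

Under this formulation, the summary sensitivity and specificity are $\operatorname{logit}^{-1}(\mu_A)$ and $\operatorname{logit}^{-1}(\mu_B)$, and $\rho$ is the between-study correlation between logit sensitivity and logit specificity.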

To explore heterogeneity between studies, we prepared forest plots of the individual pairs of sensitivity and specificity with 95% confidence intervals; and plotted each pair in receiver operating characteristic (ROC) space, along with a summary ROC (S-ROC) curve, summary estimates of sensitivity and specificity, and a 95% confidence ellipse around the summary estimates. The S-ROC curve illustrates the estimated relationship between sensitivity and specificity across studies; where there is a correlation between sensitivity and specificity across studies, the individual pairs of sensitivity and specificity are expected to lie along the S-ROC curve.16 Heterogeneity was further investigated by performing separate analyses in the following clinical subgroups: RADT types with more than 3 studies (lateral flow/immunochromatographic assay, ELISA, OIA, molecular technique), and RADT types with more than 3 studies including children only.

Many (19/48) studies reported more than 1 pair of sensitivity and specificity. Where the multiple sensitivity and specificity pairs were estimated from different samples of patients, each pair is treated in our analysis as if it came from a separate study.6,17–20 Where a different RADT was tested in the same sample of patients, we included each sensitivity and specificity pair in the subgroup analysis for the respective RADT.21–25

Where multiple sensitivity and specificity pairs were estimated from the same patients in a study, only 1 pair of sensitivity and specificity was included in our analysis.22,23,25–34 We selected the pairs that were the focus of primary analysis of the selected studies.

Statistical analysis was performed in Stata 12 (Stata Corp, College Station, TX) using the Metandi package.35

## Results

A total of 60 pairs of sensitivity and specificity, comprising 23 934 patients from 48 studies, were included in the final analysis (Table 1). Of note, 7 studies were excluded after application of the modified QUADAS tool; 6 of these studies were excluded because of an inadequate reference standard (Fig 1 and Supplemental Table 2).36–41 All studies included used culture as the reference standard to compare the RADT performance. Thirty-six of the 48 studies were carried out in a developed country and 12 in a developing country. Eight types of RADTs were found among these articles. The range of values for each RADT type is summarized in Supplemental Table 3.

TABLE 1

Characteristics of Studies With Data on Performance of RADTs

### Sensitivity and Specificity Analysis: Summary Estimates

The summary estimate of sensitivity of RADTs among all studies included was 0.86 (95% CI 0.83 to 0.88), whereas the summary estimate for specificity was 0.96 (95% CI 0.94 to 0.97, Supplemental Fig 4). We observed considerable variability across studies in sensitivity, but little variability in specificity. Despite this variability, we consider it appropriate to estimate diagnostic accuracy with a summary measure for sensitivity and specificity, rather than with an S-ROC curve, because there was no evidence of a correlation between sensitivity and specificity when the RADTs were pooled (correlation coefficient 0.06, 95% CI –0.26 to 0.37). Furthermore, the forest plot for all studies showed no systematic decrease in specificity with increasing sensitivity, illustrating that a threshold effect, such as variation in cutoff value for a positive test result between studies, does not account for the observed variability in diagnostic accuracy between studies.

### Sensitivity and Specificity Analysis: Test Types

Results from 4 of the 6 categories of RADT were pooled (Fig 2). Overall, specificity was higher than sensitivity for all 4 test categories. There was no evidence for a correlation between sensitivity and specificity within the test categories. Test performance appeared best for the molecular technique category with a pooled sensitivity and specificity of 0.92 (95% CI 0.89 to 0.95) and 0.99 (95% CI 0.97 to 0.99), respectively. The sensitivity and specificity of the other 3 test categories were comparable with pooled sensitivity ranging from 0.84 to 0.86, and pooled specificity ranging from 0.94 to 0.96. The S-ROC curves for each RADT type compared with the other test categories are shown in Fig 3.

FIGURE 2

Forest plots of summary estimates of sensitivity and specificity stratified by RADT category. The black boxes indicate the sensitivity and specificity, and the horizontal black lines indicate the corresponding 95% CIs for each result in each RADT category. For each RADT category with more than 3 results, a diamond is centered on the summary estimate for sensitivity and specificity, with points on the corresponding 95% CI, as estimated jointly by bivariate random-effects regression. FN, false-negative; FP, false-positive; IC, immunochromatographic assay; TN, true-negative; TP, true-positive.

FIGURE 3

S-ROC curve and sensitivity and specificity stratified by RADT category. In each panel, all pairs of sensitivity and specificity from RADT categories with more than 3 results are represented as a cross (+). A black cross indicates the pair is from the indicated RADT category, whereas a gray cross shows pairs from all other RADT categories with more than 3 results. Each panel also shows for the indicated RADT category the summary estimate (black closed circle) and corresponding 95% confidence ellipse (thick black line), and the S-ROC curve (thin black line), which were derived from bivariate random-effects regression. In addition, each panel shows in gray the summary estimate and 95% confidence ellipse, and S-ROC curve, for all other RADT categories with more than 3 results. IC, immunochromatographic assay.

We continued to observe a large amount of variability in the sensitivity found in studies within each category, particularly in the lateral-flow/immunochromatographic assay category where sensitivity ranged from 0.59 to 0.96. Of the test types not included in the meta-analyses described previously, both test types (latex agglutination and liposomal technology) had relatively poor sensitivity (Supplemental Fig 4, Supplemental Table 3).

There was less variability in specificity, although there were 2 clear outliers in the OIA category in our meta-analysis.42,43 The study by Hart et al42 compared BioStar Strep A OIA RADT with a Selective Strep Agar with 5% sheep blood that was incubated anaerobically. The prevalence of GAS pharyngitis in this study was 12%, which is much lower than the other studies of OIA included. This study found that weakly positive test results were frequently associated with false-positive results; reclassification of these weakly positive test results as negative results would increase the specificity of the OIA. Possible cross-reactivity with groups B and C streptococci was also observed in some of the false-positive cases.42 Similarly, false-positive results were frequent in the study by Filho et al43 in Brazil, contributing to low specificity. This was a small study with a sample size of 81, comparing the Strep A OIA Max RADT to the reference standard of 5% goat blood agar culture medium in an aerobic environment. The high rate of false-positives (32.6%) was attributed to failure of the RADT method, detecting nonspecific bacterial antigens or cross-reaction with other non-group A streptococci.43

### Sensitivity and Specificity Analysis: Pediatric Population

Thirty-three paired sensitivity and specificity results from 25 studies evaluated RADTs in children only, and meta-analysis was performed for 3 categories of test type (lateral flow/immunochromatographic assay, OIA, and molecular technique). We did not find evidence for a correlation between sensitivity and specificity within the test categories. The summary estimates of sensitivity and specificity among studies in children were 0.87 (95% CI 0.84 to 0.89) and 0.96 (95% CI 0.95 to 0.97), respectively, which is similar to the overall summary estimates (Supplemental Figs 5 and 6). Molecular techniques performed better than OIA and lateral flow/immunochromatographic assay in the pediatric population, with a pooled sensitivity of 0.93 (95% CI 0.89 to 0.96) and a pooled specificity of 0.99 (95% CI 0.98 to 1.0). The performance of OIA and lateral flow/immunochromatographic assay was similar: both had a sensitivity of 0.85 (95% CI 0.80 to 0.89), and the specificity of lateral flow/immunochromatographic assay was slightly higher (0.97, 95% CI 0.95 to 0.98) than that of OIA (0.95, 95% CI 0.93 to 0.97).

### Best-Performing Tests

There were 6 studies that had a sensitivity of 0.95 or above. These included 2 studies from the lateral flow/immunochromatographic assay category, 1 of OIA, 1 of ELISA, and 2 from the molecular technique group.19,21,31,44–46 Of these, 4 also had a specificity over 0.95.19,21,44,46 In the lateral flow/immunochromatographic assay group, the study by Al-Najjar et al44 in the United Arab Emirates collected paired throat swabs from 505 children with predefined symptoms for testing with Diaquick Strep A test (Dialab GmbH, Vienna, Austria) and culture. With a GAS prevalence of 14%, the positive predictive value was very high (0.96) with a negative predictive value of more than 0.99.44 In the OIA group, the study by Ezike et al,19 which used the OIA MAX test, found the highest sensitivity and specificity. This was achieved by using a single throat swab for both the OIA and for culture in children aged 5 to 18 years who presented with symptoms of acute pharyngitis.19 It is noteworthy that when investigators in this study collected throat swab specimens by rubbing 2 swabs simultaneously on the posterior pharynx and both tonsils, rather than a single swab, they observed a lower sensitivity (0.92) and specificity (0.96).19 In the molecular technique group, the GAS direct probe test (Gen-Probe, San Diego, CA) was one of the RADTs evaluated in 520 patients in the study by Chapin et al21; its performance parameters were reported to be comparable to those of culture when both were compared with Todd-Hewitt broth culture. Similarly, a retrospective clinical study on a laboratory-developed and internally controlled rapid GAS PCR assay using the dnaseB gene as the target gene reported a sensitivity and specificity of 0.96 and 0.99, respectively. An equally high sensitivity and specificity was observed when the test was carried out using either flocked swabs or conventional swabs.46 These 2 more recently developed RADTs have a turnaround time of 1 to 2 hours and require special laboratory setups, which may necessitate follow-up with patients for relaying results, as compared with other techniques that are true point-of-care tests.8
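Predictive values such as those quoted above depend on prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' theorem to show the relationship. It combines the pooled summary estimates from this meta-analysis (0.86 and 0.96) with the 14% prevalence reported by Al-Najjar et al purely as an illustration; it does not reproduce that study's own predictive values, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Pooled summary estimates from this meta-analysis, at an assumed 14% prevalence.
ppv, npv = predictive_values(sensitivity=0.86, specificity=0.96, prevalence=0.14)
print(f"PPV {ppv:.3f}, NPV {npv:.3f}")
```

Under these assumptions the positive predictive value is about 0.78 and the negative predictive value about 0.98, which shows how a test with high specificity can still yield a modest positive predictive value when prevalence is low.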

## Discussion

Our study is the most comprehensive meta-analysis of RADTs for GAS pharyngitis to date. We made an objective assessment of study quality by using the modified QUADAS tool and were able to evaluate 4 categories of test type with pooled results. Overall, the sensitivity of included RADTs in our study was 0.86 (95% CI 0.83 to 0.88) and specificity 0.96 (95% CI 0.94 to 0.97), although with noticeable variability among individual tests. These results indicate that RADTs in general have high diagnostic accuracy. The sensitivity and specificity of these tests when analyzed in pediatric studies alone were similar to the overall estimates. Overall, the newer molecular techniques were the best-performing tests, particularly in terms of their sensitivity, although a minority of nonmolecular tests also performed extremely well. There was less variability in sensitivity observed for the more recently developed RADTs compared with the older tests.

Although rheumatic fever is uncommon in Europe and the United States, with an incidence of <1 per 100 000, the disease remains an important cause of cardiac morbidity and mortality in many tropical developing countries where the incidence is frequently >50 per 100 000.48 In these countries, there is a clear indication for treatment of GAS pharyngitis to prevent rheumatic fever and its chronic and disabling sequela, rheumatic heart disease.49 A highly sensitive (≥95%) and inexpensive RADT with a very rapid turnaround time could make a major contribution to control efforts for rheumatic fever. Based on our data, however, no single test currently fulfills all 3 of these criteria.

When cost is considered in the management of pharyngitis, RADTs have been shown to be the more cost-effective option when compared directly with culture (treating all patients and treating none both carry unacceptable costs).50,51 In terms of direct costs, in Australia, RADTs cost approximately AUD$5 to AUD$10 per test compared with AUD$30 per test for culture, whereas costs in the United States and Europe are more difficult to compare because of the wide range of pricing by individual commercial companies. The practice of using confirmatory cultures to back up RADTs has been shown to cost >$8 million per additional case of rheumatic heart disease prevented,52 and should be questioned as a cost-effective approach to management.

There are several limitations to our study. Despite our best efforts to exclude low-quality studies, particularly those with an inadequate reference standard, there was considerable variability in methodology among studies. This included number and type of throat swabs used, as well as techniques used to obtain these throat swabs.19,23,29,46 Methods of sample collection were not clearly reported in all studies and there is no way to control the quality of the swab samples. Studies included also differed in their settings and the clinical severity of included patients. For example, we included studies that assessed diagnostic accuracy of RADTs among patients both before receiving, and after receiving, antibiotic treatment.18,28 These factors are potential confounders in the estimation of diagnostic accuracy and may explain some of the observed heterogeneity across each type of RADT. However, because of insufficient numbers of studies within each test type, we were unable to perform either a meta-regression or a subgroup analysis to determine the importance of these factors. In terms of quality of included studies, blinding of reference standard results was not well reported in most of the included studies. Information on uninterpretable results was also poorly reported. It was not possible to determine if uninterpretable results occurred in 24 of the included studies. In addition, withdrawals were unclear or not explained in 16 of the included studies (Supplemental Table 2). Finally, we included studies published in the English language only, which may have reduced the numbers of studies included in our meta-analysis.

The diagnostic accuracy of the more recently developed RADTs (molecular techniques) is encouraging. However, further research could focus on improving the practicality of these tests, especially when they are used in primary care settings. A considerable drawback of these tests is that none are truly “point-of-care” tests: their turnaround time is between 1 and 3 hours, whereas the immune-based tests have a turnaround time as fast as 30 seconds. Other factors that may have an impact on sensitivity and specificity of the RADTs, such as the type of throat swab and sampling techniques, also need to be further investigated in well-designed studies, to further improve the diagnostic accuracy of RADTs. Finally, cost-effectiveness analyses of each class of RADTs compared with culture and antibiotic treatment, especially in the pediatric population, would be beneficial for policy makers and clinicians with regard to choice of RADT and treatment decisions.

Our meta-analysis shows that RADTs can be used as accurate, rapid tests for the diagnosis of GAS pharyngitis and that backup cultures for negative tests are generally not necessary in most low-incidence rheumatic fever settings, particularly if tests with a high sensitivity are used, including the newer molecular tests.

## Footnotes

• Accepted July 21, 2014.
• Address correspondence to A/Prof Andrew Steer, Centre for International Child Health, Department of Paediatrics, University of Melbourne, Royal Children’s Hospital, Flemington Road, Parkville, Vic 3052, Australia. E-mail: Andrew.steer{at}rch.org.au
• Dr Lean collected data for the study and drafted the initial manuscript; Dr Arnup carried out the statistical analyses and reviewed and revised the manuscript; Drs Danchin and Steer conceptualized the study, supervised data collection, and critically reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.

• FINANCIAL DISCLOSURE: Drs Danchin and Steer received funding for a clinical study conducted in 2012 of a Quidel Corporation rapid antigen diagnostic test product; the other authors have no financial relationships relevant to this article to disclose.

• FUNDING: No external funding.

• POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.