Conventional diagnostic tests for tuberculosis have several limitations and are often unhelpful in establishing the diagnosis of extrapulmonary tuberculosis. Although commercial serological antibody based tests are available, their usefulness in the diagnosis of extrapulmonary tuberculosis is unknown. A systematic review was conducted to assess the accuracy of commercial serological antibody detection tests for the diagnosis of extrapulmonary tuberculosis. In a comprehensive search, 21 studies that reported data on sensitivity and specificity for extrapulmonary tuberculosis were identified. These studies evaluated seven different commercial tests, with Anda-TB IgG accounting for 48% of the studies. The results showed that (1) all commercial tests provided highly variable estimates of sensitivity (range 0.00–1.00) and specificity (range 0.59–1.00) for all extrapulmonary sites combined; (2) the Anda-TB IgG kit showed highly variable sensitivity (range 0.26–1.00) and specificity (range 0.59–1.00) for all extrapulmonary sites combined; (3) for all tests combined, sensitivity estimates for both lymph node tuberculosis (range 0.23–1.00) and pleural tuberculosis (range 0.26–0.59) were poor and inconsistent; and (4) there were no data to determine the accuracy of the tests in children or in patients with HIV infection, the two groups for which the test would be most useful. At present, commercial antibody detection tests for extrapulmonary tuberculosis have no role in clinical care or case detection.
Although tuberculosis most commonly affects the lungs, any organ or tissue may be involved. In the USA about 20% of incident cases in 2005 had only extrapulmonary sites of disease and an additional 9% had both pulmonary and extrapulmonary involvement.1 Globally, the proportion of extrapulmonary cases reported by countries ranges from 15% to 25%, with greater proportions occurring in countries with a high prevalence of HIV infection.2 In addition to being proportionately greater in persons with HIV infection,3–5 extrapulmonary involvement occurs with greater relative frequency in children than in adults.6 In children and in persons with HIV infection, extrapulmonary tuberculosis compounds the diagnostic difficulty imposed by their having a lower frequency of sputum smear positivity, even when the lungs are involved.7–9
The diagnosis of extrapulmonary tuberculosis is often difficult to establish, especially for patients in resource limited areas. Signs and symptoms are non-specific and microscopic examination for acid-fast bacilli, the cornerstone of diagnosis for pulmonary tuberculosis in most parts of the world, lacks sensitivity for extrapulmonary disease.10 11 Mycobacterial culture and histological examination for caseating granulomas are more sensitive but not commonly available. Invasive procedures that are complex and costly may be required to obtain the necessary diagnostic specimens.11 12 In a retrospective study of patients in Tanzania with extrapulmonary tuberculosis, bacteriological or histological confirmation of diagnosis was found in only 18%.13 Because of these difficulties, misdiagnosis of extrapulmonary tuberculosis is common in all countries and may result in unnecessary treatment if falsely diagnosed, or greater morbidity and mortality if the diagnosis is missed, especially in persons with HIV infection.11 14–16
Immune based tests would seem to offer the potential to improve the diagnosis of extrapulmonary tuberculosis as some of the test formats (eg, immunochromatographic test) are practical for resource limited areas. Blood or urine based assays avoid the problems of obtaining a specimen of the affected organ for microbiological or histological assay, are simpler to perform than smear microscopy and the results can be available within hours.8 17 Efforts to develop immune based tests for the detection of antibodies, antigens and immune complexes have been underway for decades and their performance has been described in several reviews and textbook chapters.18–27 The most common of these tests concentrate on the detection of the humoral (serological) antibody immune response to Mycobacterium tuberculosis (the subject of this review), as opposed to the T cell based cellular immune response (eg, interferon-gamma release assays) or direct detection of antigens in specimens other than serum (eg, lipoarabinomannan detection in urine28 29). It is tempting to speculate that a combination of both humoral and T cell based diagnostic tests could provide the highest diagnostic efficacy, although this has not been evaluated to date.
A number of in-house antibody detection tests have been developed but are not marketed. These tests use different antigens and distinct protocols and techniques.
Currently, dozens of commercial serological antibody detection tests (hereafter referred to as commercial tests) are marketed in low income countries where diagnostic tests are rarely subjected to regulatory review or approval.30 31 The extent of their use is unknown; however, companies report sales volumes between 3000 and 300000 tests per year.32 These tests differ in several respects, including antigen composition and source (eg, native or recombinant), chemical composition (eg, protein, carbohydrate or lipid), extent and manner of antigen purification, class of immunoglobulin (eg, IgG, IgA or IgM), and test format (eg, enzyme-linked immunosorbent assay (ELISA) and immunochromatographic test). Most of the studies investigating the use of antibody detection tests have focused on pulmonary tuberculosis; only a subset has also included patients with extrapulmonary tuberculosis.
To our knowledge, the body of literature evaluating commercial tests for the diagnosis of extrapulmonary tuberculosis has not been synthesised. We therefore conducted a systematic review to summarise the evidence on accuracy (sensitivity and specificity) of commercial tests according to the guidelines and methods proposed for diagnostic systematic reviews and meta-analyses.33 We specifically addressed the following questions:
Overall, how accurate are commercial tests for the diagnosis of extrapulmonary tuberculosis?
How accurate are commercial tests for the diagnosis of a specific form of extrapulmonary tuberculosis?
The following electronic databases were searched for primary studies in the English language: PubMed (1990 to December 2006), BIOSIS (1990 to December 2005), Embase (1990 to December 2005) and Web of Science (1990 to December 2005). The search terms used included tuberculosis, Mycobacterium tuberculosis, immunological tests, serological tests, antibody detection, antigen detection, ELISA, western blot and sensitivity and specificity. Additional studies were identified by contacting experts in the field and by searching reference lists from primary studies, review articles and textbook chapters.
Extrapulmonary tuberculosis was defined as tuberculosis in which the major site of involvement was outside the lungs. Thereafter, nine types of extrapulmonary disease were classified: lymph node, pleural, meninges and/or central nervous system (CNS), bone and/or joint, disseminated/miliary, genitourinary, abdominal, skin and other sites. We included only those studies that based the diagnosis of extrapulmonary tuberculosis on (1) isolation of M tuberculosis on culture or, for studies conducted without culture in tuberculosis endemic countries, the presence of acid-fast bacilli detected by smear microscopy; or (2) the presence of caseating granulomas in histopathological specimens.
Studies that relied solely on clinical and/or radiological features or improvement while on antituberculosis treatment as the diagnostic criteria were excluded. We further excluded: (1) studies published before 1990 for the reason that many studies used crude extracts or obsolete immunological methods; (2) studies with <50 subjects (at least 25 cases and 25 participants without tuberculosis for inclusion); (3) studies in which data were only provided for pulmonary and extrapulmonary cases combined; (4) studies of fluids other than serum (eg, cerebrospinal fluid); (5) studies of latent tuberculosis infection; (6) studies focused on non-tuberculous mycobacteria; (7) studies of antibody responses during or after tuberculosis treatment; (8) investigations using non-immunological methods for detection of antibodies; (9) basic science literature that focused on cloning of new antigens or their immunological properties (eg, epitope mapping) or other new methods of antibody detection; and (10) case reports and reviews.
Initially, two reviewers screened citations retrieved from all sources. To identify eligible studies, a second screen was done of full texts from citations found relevant in the first screen. A list of excluded studies and reasons for their exclusion is available from the authors.
One reviewer extracted data on the following characteristics: study design, methodological quality, study population, reference standard, site of involvement, antigen and antibody characteristics, laboratory technique, and sensitivity and specificity. To verify the reproducibility of data extraction, a second reviewer independently extracted data from 24% of the included studies. The inter-rater agreement for sensitivity and specificity estimates was 100%. Data not clearly reported were coded as not reported. When necessary, we attempted to contact authors for additional information.
Although some authors evaluated test performance using several different types of comparison groups, we preferentially selected only one type of comparison group for each study in the following order: (1) patients in whom extrapulmonary tuberculosis was initially suspected but who were later found to have a disease other than tuberculosis; (2) patients in whom pulmonary tuberculosis was initially suspected but who were later found to have non-tuberculous respiratory disease; (3) patients with a variety of diseases other than tuberculosis (mixed disease); (4) healthy persons from tuberculosis endemic countries; (5) contacts of patients with tuberculosis; (6) participants from categories 1–5 combined; and (7) healthy persons from non-endemic countries. We felt this hierarchy prioritised the groups in which the test would be used and would provide more clinically relevant results.
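The selection rule above can be sketched in a few lines of Python. This is only an illustration: the category labels and the function name are hypothetical shorthand for the seven groups listed, not identifiers used by the authors.

```python
# Hypothetical labels for the seven comparison groups, in order of preference
COMPARISON_GROUP_PRIORITY = [
    "suspected_eptb_other_disease",    # (1)
    "suspected_ptb_nontb_respiratory", # (2)
    "mixed_disease",                   # (3)
    "healthy_endemic",                 # (4)
    "tb_contacts",                     # (5)
    "categories_1_to_5_combined",      # (6)
    "healthy_nonendemic",              # (7)
]

def select_comparison_group(available):
    """Return the most preferred comparison group reported by a study.

    Raises ValueError if a label is not one of the seven categories.
    """
    return min(available, key=COMPARISON_GROUP_PRIORITY.index)

# A study reporting both healthy controls and mixed-disease controls
chosen = select_comparison_group({"healthy_nonendemic", "mixed_disease"})
```

Encoding the order as a list keeps the ranking explicit and makes it easy to check that every study contributes exactly one comparison group.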
Assessment of study quality
Was the commercial test result performed and recorded by technicians who were unaware (blinded) of the results of the reference standard?
Did the whole sample or a randomly selected subset of the sample receive verification using the reference standard?
Did the study prospectively recruit consecutive patients suspected of having extrapulmonary tuberculosis?
Data collation and meta-analysis
We used standard methods recommended for meta-analyses of diagnostic test evaluations.33 35 As studies were heterogeneous, particularly with respect to the site of involvement, antigen composition of the tests, antibody class (IgG, IgM, or IgA) and control groups, we first grouped studies by type of commercial test and then further stratified by immunoglobulin class and location of disease. If insufficient data (ie, <25 cases) were provided for a specific disease site, we combined data from several sites into a multiple site category with at least 25 cases. To calculate sensitivity and specificity of the commercial tests, we cross-tabulated each result against the reference standard. Sensitivity refers to the proportion of extrapulmonary tuberculosis cases with a positive result on a specific commercial test; specificity refers to the proportion of tuberculosis negative participants that had negative results on a specific commercial test. In calculations of sensitivity we included studies that used smear positivity or histological characteristics as the reference standard together with studies that used culture.
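As a minimal sketch of the cross-tabulation described above, the following Python computes sensitivity and specificity from the four cells of a 2×2 table. The counts are hypothetical and are not taken from any study in the review, and the Wilson score interval is an assumed choice; the source does not state which interval method was used.

```python
import math

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # test positive among reference-standard cases
    specificity = tn / (tn + fp)  # test negative among participants without TB
    return sensitivity, specificity

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, z=1.96 for 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical counts: 50 cases and 50 participants without tuberculosis
tp, fn, tn, fp = 30, 20, 45, 5
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
```

With samples of this size the intervals are wide, which is one reason estimates from small studies overlap poorly in a forest plot.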
Data were analysed using SPSS (Version 188.8.131.526)36 and Meta-DiSc software (Version 1.4).37 Sensitivity and specificity estimates were calculated for the commercial tests along with their 95% confidence intervals. In addition, true positive rates (TPR = sensitivity) and false positive rates (FPR = 1 − specificity) were summarised using a summary receiver operating characteristic (SROC) curve. Each data point in the SROC space represents an individual study. The SROC curve is obtained by fitting a regression curve to pairs of TPR and FPR.35
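One common way to fit such a curve is the linear regression of D = logit(TPR) − logit(FPR) on S = logit(TPR) + logit(FPR), often called the Moses-Littenberg method. Whether this exact model was used here is an assumption; the source states only that a regression curve was fitted to pairs of TPR and FPR. The sketch below uses hypothetical study counts and a 0.5 continuity correction (also an assumption) so that zero cells give finite logits.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def fit_sroc(studies):
    """Fit D = a + b*S by ordinary least squares.

    studies: list of (tp, fn, fp, tn) tuples, one per study.
    0.5 is added to every cell so zero cells still give finite logits.
    """
    d_vals, s_vals = [], []
    for tp, fn, fp, tn in studies:
        tpr = (tp + 0.5) / (tp + fn + 1.0)
        fpr = (fp + 0.5) / (fp + tn + 1.0)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(studies)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b

def sroc_tpr(fpr, a, b):
    """Expected TPR on the fitted curve at a given FPR (0 < fpr < 1)."""
    x = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-x))

# Hypothetical (tp, fn, fp, tn) counts, not data from the review
studies = [(30, 20, 5, 45), (12, 28, 2, 58), (40, 10, 15, 35), (22, 18, 8, 52)]
a, b = fit_sroc(studies)
```

A slope b close to zero corresponds to a symmetrical SROC curve, in which the diagnostic odds ratio does not change with the positivity threshold.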
The SROC curve and the area under the curve (AUC) present an overall summary of test performance and display the trade off between sensitivity and specificity. An AUC of 1.00 indicates perfect discriminatory ability of the diagnostic test. The Q index is another useful global summary of the SROC curve and test performance. It is the point on the SROC curve where sensitivity equals specificity, that is, the point where the curve is intersected by the antidiagonal running from the top left corner of the SROC space. A Q value of 1.00 indicates 100% accuracy.35 38 39
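Both summaries can be computed from a fitted curve. The sketch below assumes the linear model D = a + bS, where D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR); the parameter values are hypothetical, and integrating over the full FPR range with a midpoint rule is a simplification chosen for illustration.

```python
import math

def q_index(a):
    """Sensitivity (= specificity) where the curve meets the antidiagonal.

    On the antidiagonal TPR = 1 - FPR, so S = 0 and D = a,
    giving logit(TPR) = a / 2 whatever the slope b.
    """
    return 1 / (1 + math.exp(-a / 2))

def sroc_auc(a, b, steps=10000):
    """Midpoint-rule area under the SROC curve for 0 < FPR < 1."""
    slope = (1 + b) / (1 - b)
    total = 0.0
    for i in range(steps):
        fpr = (i + 0.5) / steps
        x = a / (1 - b) + slope * math.log(fpr / (1 - fpr))
        total += 1 / (1 + math.exp(-x))
    return total / steps

# Hypothetical fitted parameters for a symmetrical curve (b = 0)
q = q_index(2.2)
auc = sroc_auc(2.2, 0.0)
```

A test with no discriminatory ability has a = 0, for which both summaries equal 0.5; values approach 1.00 as a grows.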
In meta-analyses of studies of diagnostic tests, heterogeneity refers to a high level of variability in study results.40 Such heterogeneity could be a result of variability in thresholds, laboratory technique, disease spectrum, study design and/or quality between studies.40 In the presence of significant heterogeneity, pooled or summary estimates from meta-analyses are difficult to interpret. Given the anticipated variability in accuracy, we decided a priori to avoid the pooling of sensitivity and specificity. Also, as described previously, we addressed heterogeneity by using subgroup (stratified) analyses.
Description of included studies
Of the 3720 citations identified in the literature search, 9 publications describing the results of 21 independent studies met our eligibility criteria (fig 1).41–49 None of the studies reported the method (eg, consecutive or random) of subject selection. Only one study48 reported blinded interpretation. No studies involved children younger than 15 years or patients with documented HIV infection. Six studies (29%)41 46 48 were performed in HIV negative patients and 15 (71%) in patients in whom HIV status was unknown or not reported.42–45 47 49 The median sample size was 35 cases (interquartile range (IQR) 30–56) and 48 participants without tuberculosis (IQR 37–194).
ELISA was used in 20 studies and immunochromatography in one study.48 All investigators adhered to standard laboratory methods (eg, mean + 2SD measured in a healthy population and receiver operating characteristic (ROC) curves) for determining the cut off values, as recommended by the manufacturers. Table 1 shows the commercial tests and their respective antigens. Anda-TB IgG was the most frequently studied test (n=10 studies (48%)). Tables 2 and 3 show design, quality, performance characteristics and disease site.
Overall, how accurate are commercial tests for the diagnosis of extrapulmonary tuberculosis?
For all 21 studies the sensitivity estimates ranged from 0.00 to 1.00 and specificity estimates from 0.59 to 1.00 (fig 2). Both sensitivity and specificity varied widely among studies using the same commercial test and among studies using different commercial tests. Confidence intervals for the sensitivity and specificity values of individual studies, depicted graphically by horizontal lines in the forest plots, show poor overlap, suggesting the presence of significant heterogeneity. As shown in fig 3, the accuracy of commercial tests was modest, the symmetrical SROC curve showing a trade off between sensitivity and specificity, with much greater variability in sensitivity estimates.
We identified 10 studies that assessed the accuracy of Anda-TB IgG.41–47 As seen in table 2 (and in online supplementary fig S1A available at http://thorax.bmj.com/supplemental), sensitivity estimates ranged from 0.26 to 1.00 and specificity estimates from 0.59 to 1.00. The specificity forest plot (online supplementary fig S1B available at http://thorax.bmj.com/supplemental) includes 7 unique studies as 4 of the total 10 studies were conducted with the same comparison population.45 The results for both sensitivity and specificity among studies by different investigators were highly variable. Individual study results for the other commercial tests in the review are shown in table 3.
How accurate are commercial tests for the diagnosis of a specific form of extrapulmonary tuberculosis?
Four studies determined the accuracy of commercial tests for the diagnosis of lymph node tuberculosis.42 45 As seen in online supplementary figs S2A and B (available at http://thorax.bmj.com/supplemental), sensitivity estimates ranged from 0.23 to 1.00 and specificity estimates from 0.59 to 0.95. Four studies determined the accuracy of commercial tests for the diagnosis of pleural tuberculosis.43 46 Sensitivity estimates ranged from 0.26 to 0.59 and specificity estimates from 0.81 to 1.00 (see online supplementary figs S3A and B available at http://thorax.bmj.com/supplemental). For both lymph node and pleural tuberculosis, commercial tests showed inconsistent results.
Only one study assessed the accuracy of a commercial test in patients with meningitis.48 In this prospective blinded study (56 culture confirmed patients and 74 controls), the sensitivity of the AMRAD ICT TB test was 0.48 (95% CI 0.35 to 0.62) and the specificity was 0.82 (95% CI 0.72 to 0.90).
Our systematic review of 21 studies examining the performance of commercial tests for the diagnosis of extrapulmonary tuberculosis showed that (1) all commercial tests provided highly variable estimates of sensitivity (range 0.00–1.00) and specificity (range 0.59–1.00) for all extrapulmonary sites combined; (2) Anda-TB IgG showed highly variable sensitivity (range 0.26–1.00) and specificity (range 0.59–1.00) for all extrapulmonary sites combined; (3) for all commercial tests combined, sensitivity estimates for both lymph node tuberculosis (range 0.23–1.00) and pleural tuberculosis (range 0.26–0.59) were poor and inconsistent; and (4) there were no data to determine the accuracy of commercial tests for the diagnosis of extrapulmonary tuberculosis in children or patients with HIV infection. Although commercial serological tests, by virtue of being rapid, simple to use and non-invasive, are appealing, this review did not find sufficient evidence to justify their use for the diagnosis of extrapulmonary tuberculosis.
Our systematic review had several strengths. First, the comprehensive search strategy with various overlapping approaches enabled us to retrieve relevant studies published since 1990. Moreover, two reviewers independently completed screening and study selection. To verify reproducibility of data extraction, a second reviewer independently extracted data on five (24%) of the included studies. Whenever possible we selected a control population with disease, in lieu of healthy subjects, to evaluate how well commercial tests performed in patients suspected of having tuberculosis. Authors were contacted for missing data. Finally, we analysed data within specific subgroups to lessen the effect of heterogeneity.
This review also had limitations. There was an insufficient number of studies for most of the specific commercial tests or the specific disease sites to provide meaningful summary measures of performance. Our use of stringent criteria for eligibility is probably the main reason that we identified only one study on tuberculous meningitis.48 Fifty-six studies of tuberculous meningitis identified by our search strategy were considered ineligible because they involved fewer than 25 cases; investigated fluids other than serum (eg, cerebrospinal fluid); involved antigen detection; or relied on clinical features and/or treatment response for case confirmation. Our choice of a bacteriological and/or histopathological reference standard may have limited the inclusion of studies involving children. Paediatric tuberculosis is difficult to diagnose on a bacteriological basis because of the paucibacillary nature of the disease.6 In addition, the number of specific antigens included in the commercial tests in this review was limited (table 1) compared with the number of potential antigens for serodiagnosis.22 50 51
Another set of problems involved shortcomings in study design and quality. No studies reported the method for recruitment of subjects, so it was not possible to ascertain whether studies used the sound probabilistic sampling framework found in consecutive or random sampling designs. Only one study48 reported a blinded interpretation of the results of the commercial test and the reference standard. Lack of blinding may have resulted in an overestimation of the accuracy of the commercial test.52 Variability in study design and study quality might account for some of the observed heterogeneity evident in the results. Although statistical tests and graphical methods are available to detect potential publication bias in meta-analyses of randomised controlled trials, such techniques have not been adequately evaluated for diagnostic data.53 It is therefore difficult to rule out publication bias in our review. In addition, our search strategy may have missed some relevant studies by excluding non-English language publications.
Developing an immunological diagnostic test for tuberculosis presents a formidable challenge in part because both the stage of tuberculosis infection and the tissues involved may alter the profile of genes expressed by the organism and, thus, the antibody responses to these gene products may differ. Studies during the last decade have provided ample evidence that M tuberculosis adapts to its environment by altering the profile of genes that it expresses, that these profiles are modulated as infection progresses and the in vivo environment changes,54–57 and that some genes of M tuberculosis are differentially expressed in different host tissues.58 Consequently, antibodies developed in response to pulmonary tuberculosis may not be the appropriate targets for diagnosing extrapulmonary involvement. The choice of reagents for all current assays was most likely determined in patients with pulmonary tuberculosis. Although a systematic investigation of the humoral immune responses of pulmonary tuberculosis has been performed and several antigenic proteins that are recognised by antibodies at different stages of pulmonary disease have been identified,55 59 no similar analysis of antigens expressed during extrapulmonary replication of M tuberculosis has been attempted so far. Thus, identification and study of M tuberculosis genes expressed in the different environments that characterise different sites of involvement may be able to provide the optimal reagents for devising a diagnostic test for extrapulmonary forms of the disease.
It will also be important to examine proteins expressed by M tuberculosis in HIV infected patients as smear negative pulmonary and extrapulmonary disease are disproportionately higher in HIV positive than in HIV negative individuals.3–5 60 61 Given that memory B cells are relatively independent of T cell help, antibody detection based diagnostic tests would be significant assets for identification of forms of paucibacillary disease. Indeed, despite the dysfunctional cellular immune responses and the presence of HIV induced hypergammaglobulinaemia,62–64 the presence of antibodies to M tuberculosis antigens has been reported by several investigators.51 55 65–67 It is also possible that antibodies to M tuberculosis antigens are elicited in patients who become infected before their immune system is significantly compromised and while their CD4 numbers are still high, but we are unaware of studies that have addressed this possibility.
CONCLUSIONS AND POLICY IMPLICATIONS
The evidence presented in this systematic review shows that, at present, commercial tests produce highly variable sensitivity and specificity results and therefore cannot be recommended as a sole test for the diagnosis of extrapulmonary tuberculosis. It is particularly disappointing that there are no studies of commercial tests that are of sufficient quality to enable their evaluation in patients with HIV infection or in children, as it is in these groups that the tests could be most useful. Our findings should be interpreted in the context of the variability in design and the quality of the studies in this review. Recent articles have attested to the mediocre quality of diagnostic studies for tuberculosis.30 68 The use of guidelines such as the Standards for Reporting of Diagnostic Accuracy (STARD)69 and the tool for Quality Assessment of Diagnostic Accuracy Studies (QUADAS)34 may lead to improvements in the quality of future studies. Guidelines specifically for the evaluation of diagnostic tests for infectious diseases have recently been published.70
Further data are given in the online figures available at http://thorax.bmj.com/supplemental.
The authors thank Nandini Dendukuri, Maya Bhat-Gregerson and Anna Meddaugh for technical assistance, Madelyn Hall for help in acquiring the publications in the review, and Izabela Suder-Dayao and Melissa Anthony of WHO/TDR for administrative assistance.
This work was supported by the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR). AR and JC, both with TDR, contributed to the conception and design of the systematic review, and critical revision and decision to publish the manuscript. AR also participated in data interpretation.
Competing interests: KW is co-inventor on a number of patents relating to mycobacterial antigens which may be used for serological assays. All rights have been assigned to Statens Serum Institut.
This is a reprint of a paper that appeared in Thorax, October 2007, Volume 62, pages 911–8. Reprinted with kind permission of the authors and publisher.
- AUC: area under the curve
- FPR: false positive rate
- TPR: true positive rate
- ROC: receiver operating characteristic
- SROC: summary receiver operating characteristic