
Information provided by diagnostic and screening tests: improving probabilities
Mark Weatherall
Department of Medicine, University of Otago Wellington, Wellington, New Zealand
Correspondence to Professor Mark Weatherall, Department of Medicine, University of Otago Wellington, PB 7343, Wellington 6242, New Zealand; mark.weatherall{at}otago.ac.nz

Abstract

Uncertainty in clinical encounters is inevitable, and despite this uncertainty clinicians must still work with patients to make diagnostic and treatment decisions. Explicit diagnostic reasoning based on probabilities makes the best use of the available information in the face of this uncertainty. In clinical diagnostic encounters, there is often pre-existing information that reflects the probability that any particular patient has a disease. Diagnostic testing provides extra information that refines diagnostic probabilities. However, in general, diagnostic tests will be positive in most, but not all, cases of disease (sensitivity) and may not be negative in all cases of disease absence (specificity). Bayes rule is an arithmetic method of using diagnostic testing information to refine diagnostic probabilities. In this method, probabilities are converted to odds, and the odds of disease before diagnostic testing are multiplied by the positive likelihood ratio (LR+), the sensitivity of a test divided by 1 minus the specificity, to refine the probability of a particular diagnosis. Similar arithmetic applies to the probability of not having a disease, where the negative likelihood ratio is the specificity divided by 1 minus the sensitivity. A useful diagnostic test is one where the LR+ is greater than 5–10. The effect of testing can also be shown by creating a contingency table for a hypothetical group of patients, based on the true disease prevalence and on the test performance predicted by sensitivity and specificity. Most screening tests in populations with a low prevalence of disease have a very high ratio of false positive to true positive results, which can also be illustrated by contingency tables.

• diagnosis
• screening
• bayes rule


Introduction

Not in entire forgetfulness, and not in utter nakedness, but trailing clouds of glory do we come.

(William Wordsworth. Ode: Intimations of Immortality from Recollections of Early Childhood)

Uncertainty in clinical practice

Clinicians deal with uncertainty in everyday practice and must still work, together with patients, to make diagnostic and treatment decisions. Clinical decisions are typically binary: for example, whether a diagnosis is present or not, or whether treatment should be started or not.

There are several sources of uncertainty. First, human beings are variable. This variability reflects differences in genetic make-up (the genotype), in how that genetic make-up manifests in any individual (the phenotype) and in individual environmental exposures. Variability also arises from the nature of disease, which can vary in onset, severity and effects, and this in turn can depend on individual and environmental factors. Finally, the information that is used to clarify uncertainty is itself subject to variability. This information could be the history given by the patient about their symptoms, the physical findings on examination or the assessment of environmental exposures; it could also be the result of diagnostic testing. Uncertainty is inevitable and inescapable, and decision-making that takes variability into account is likely to lead to better clinical decisions. Embracing variability means using probability to make the best use of the available information in the face of uncertainty.

Probability, proportions and odds

Probabilistic reasoning is a tool that uses the legacy of past research to express clinical uncertainty as probabilities and then, through appropriate and explicit reasoning, to optimise decisions in clinical practice.

A probability is defined by an observation which can have one of two outcomes.1 The probability of an outcome is the long-term ratio of the number of times the outcome is observed to the total number of observations. Probabilities range in value from 0 (the outcome never occurs) to 1 (the outcome occurs in every observation). A proportion is an observed number of events divided by the number of observations. Proportions and probabilities can be multiplied by 100 to express them as percentages, with limits of 0 and 100. In a technical sense, observed proportions are used to estimate the probabilities of an underlying random (also called stochastic) process. Another feature of probabilities is that the probability of an observation not having an event is the complement of the probability of having the event, defined as 1 minus the probability of the event. Probabilities can also be rescaled in another way, as odds.

The odds of an event are defined as the probability of the event divided by the probability of not having the event. Odds have a distinguished history in gambling, where they are the ratio of the stake that is bet to the pay-out that will be received if the event happens. For example, odds of ‘10 to 1’, which correspond to odds of 0.1, reflect a one-dollar bet that will pay out 10 dollars if the event happens. Odds and probabilities can be converted one to the other according to the formulae:

$$\text{odds} = \frac{p}{1-p}$$

and

$$p = \frac{\text{odds}}{1+\text{odds}}$$

where p is the probability of the event. In the example, odds of 10 to 1 (0.1) correspond to a probability of winning the bet of 0.1/1.1, which is approximately 0.091, or 9.1%.
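The two conversion formulae can be sketched in a few lines of Python. This is a minimal illustration only; the function names `prob_to_odds` and `odds_to_prob` are hypothetical and do not come from the article.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        # p == 1 would require division by zero, so the odds are undefined.
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability, odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds cannot be negative")
    return odds / (1.0 + odds)


# Betting example: '10 to 1' corresponds to odds for the event of 0.1.
p_win = odds_to_prob(0.1)
print(f"{p_win:.3f}")                # 0.091
print(f"{prob_to_odds(p_win):.3f}")  # 0.100 (round trip)
```

The guard in `prob_to_odds` reflects the one case where the correspondence between probability and odds breaks down.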

There is a one-to-one correspondence between any particular probability and its odds, except when the probability is 1, in which case the odds are not defined because of division by zero.2 3 A probability of 0 corresponds to odds of 0.

Clinical encounters are more complex than simple probabilities. Usually, clinical encounters occur where there is pre-existing (prior) knowledge about the patient. In the setting of screening for disease in asymptomatic people, this can be at the basic level of knowledge about the distribution of disease in the population as a whole. In clinical encounters, the prior knowledge may be more sophisticated. For example, the distribution of a disease may be known in particular parts of a population, defined by age, sex or ethnicity. Early in clinical encounters needing diagnostic reasoning, patients may present with symptoms which have a well-understood distribution of underlying possible diagnoses. Integrating clinical information to reach a diagnosis then often requires diagnostic testing. Any particular diagnostic test usually has a biological rationale. This means that the symptoms and signs of disease, together with other clinical characteristics of the particular patient, are interpreted together with the test result to give greater certainty about the underlying pathological process. This can also be framed as a clinician first coming to an explicit or implicit (often intuitive) estimate of the probability that a patient has a particular disease process and then using diagnostic testing to refine this probability. However, clinical intuition can be misleading for diagnostic and related reasoning. Typically, clinicians and patients overvalue the results of diagnostic testing or, in a population-based context, overestimate the balance of benefits over risks of screening for disease.

Diagnostic testing as part of clinical decision making

Clinical (also known as medical) decision making can be formalised to take account of all possible diagnostic and therapeutic actions for a particular clinical encounter, including the probabilities of clinical outcomes and adverse effects, and the costs of different actions and inactions, where the cost can be measured on a number of scales.4–7 One scale is that of health utility, which in turn can be translated into quality-adjusted life years. In clinical decision making, the relationship between the probability of disease and costs means that high-stakes consequences, such as death or severe disability, may offset a lower probability that a disease is present after diagnostic testing. A high risk of adverse effects from a particular treatment must be taken into account in a formal decision-making process and may lead to a decision to start treatment only when the probability of the disease is high. Similarly, a treatment with a low financial cost or a low risk of adverse effects may be more likely to be used even when the probability that the disease is present is lower.
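One common way to formalise this trade-off, a treatment threshold, is not described in the article and is added here only as an illustration. Assuming just two actions (treat or do not treat) and hypothetical net benefit and net harm values on a single utility scale, treating has the higher expected utility when p × benefit > (1 − p) × harm, that is, when p exceeds harm / (harm + benefit). The function name and the numbers below are hypothetical.

```python
def treatment_threshold(net_benefit: float, net_harm: float) -> float:
    """Probability of disease above which treating has the higher expected utility.

    net_benefit: gain from treating a patient who has the disease (hypothetical units).
    net_harm: loss from treating a patient who does not have the disease.
    Treat if p * net_benefit > (1 - p) * net_harm, i.e. p > harm / (harm + benefit).
    """
    if net_benefit <= 0 or net_harm < 0:
        raise ValueError("benefit must be positive and harm non-negative")
    return net_harm / (net_harm + net_benefit)


# Hypothetical: a low-harm treatment for a serious disease...
print(treatment_threshold(net_benefit=10.0, net_harm=1.0))  # about 0.09
# ...versus a treatment whose harms approach its benefits.
print(treatment_threshold(net_benefit=10.0, net_harm=8.0))  # about 0.44
```

Under these assumptions, the lower the harm relative to the benefit, the lower the probability of disease at which treatment becomes worthwhile, which matches the qualitative argument above.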

In diagnostic testing, and in the related context of disease screening in asymptomatic populations, an important issue is that it is uncommon for diagnostic or screening tests to be positive in all cases of disease (the sensitivity of the test) and negative in all cases where the disease is absent (the specificity of the test). When possible, explicit estimation of the probability of disease, in relation to diagnostic testing or screening, is a useful first step in formal decision making.

Information and the likelihood ratio

The information provided by a diagnostic or screening test is the degree to which the test increases or decreases the estimated probability that a disease is present or not present, over and above what is known before the test is applied.8–10 A natural way to combine two pieces of information, how likely a particular test result is when the disease is present and how likely it is when the disease is absent, is as a ratio. In diagnostic situations, this is typically called the likelihood ratio. When the ratio reflects the likelihood that a disease is present, it is the so-called positive likelihood ratio (LR+); when it reflects the likelihood that a disease is not present, it is the so-called negative likelihood ratio (LR–). A larger value of a likelihood ratio means that the corresponding situation is more likely and reflects more information provided by the diagnostic test.

Two factors lead to a larger value for the LR+. The first is when the diagnostic test is more likely to be positive when the disease is present, that is, when it is a sensitive test. This is the numerator of the ratio. The LR+ will also have a larger value when the denominator is small, which occurs when a test is less likely to be positive when the disease is in fact absent, that is, when it is specific. The complement of the specificity (1 minus the specificity) is the probability that the test is positive when the disease is not present. A test has high information content for making a diagnosis when the numerator (the sensitivity) is large in relation to the denominator (1 minus the specificity).

The LR– is defined in a similar way. Here, too, two factors lead to a larger value for the LR–. The first is the numerator, in this case the probability that the test is negative when the disease is absent, which is the specificity. The LR– will also have a larger value when its denominator is smaller. This is the probability that the test is negative when in fact the disease is present, which is the complement of the sensitivity (1 minus the sensitivity). A test has high information content for ruling out a diagnosis when the numerator (the specificity) is large in relation to the denominator (1 minus the sensitivity). Defined in this way, the LR– is the reciprocal of the more commonly quoted form, (1 minus the sensitivity) divided by the specificity, and is applied to the odds of disease absence rather than disease presence.
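Both ratios can be computed directly from sensitivity and specificity. The sketch below follows the article's definitions; the function names are hypothetical, and `negative_lr_article` is deliberately named to signal that it is the specificity / (1 − sensitivity) form, not the conventional (1 − sensitivity) / specificity form.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); multiplies the odds of disease."""
    return sensitivity / (1.0 - specificity)


def negative_lr_article(sensitivity: float, specificity: float) -> float:
    """LR- as defined in this article: specificity / (1 - sensitivity).

    It multiplies the odds of disease ABSENCE. The conventional form,
    (1 - sensitivity) / specificity, is its reciprocal and multiplies
    the odds of disease presence.
    """
    return specificity / (1.0 - sensitivity)


# Screening example used later in the article: sensitivity 0.99, specificity 0.95.
print(round(positive_lr(0.99, 0.95), 1))          # 19.8
print(round(negative_lr_article(0.99, 0.95), 1))  # 95.0
```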

Table 1 shows the contingency table of all possible outcomes of test results compared with reality for the presence or absence of a disease.

Table 1

Contingency table of diagnostic testing outcomes in relation to presence or absence of disease

Table 2 defines various terms used to describe diagnostic and screening testing based on the individual cells of the contingency table of possible outcomes.

Table 2

Definition of terms used in diagnostic and screening testing in relation to a contingency table of outcomes

A feature of these ratios is that some are naturally better described as ratios of odds rather than ratios of probabilities. In the first-line formulation in the table, the LR+ is the ratio of two probabilities. In the second-line formulation, it is the odds of disease presence to disease absence in those who are test-positive divided by the odds of disease presence to disease absence in all patients. The LR– is the odds of disease absence to disease presence in those who are test-negative divided by the odds of disease absence to disease presence in all patients.

Bayes rule

The concept of information is useful because knowledge about the probability that a patient has a disease before application of a diagnostic test itself represents some information. Technically, this is called prior information, in the sense that the information is available prior to any further actions. Extra information is then provided by a positive diagnostic test result; for the probability of a patient having a disease, this is the LR+. The prior probability can be rescaled to odds, giving the prior odds. Multiplying the prior odds by the LR+ adds the information from the test and increases the odds that the patient has the disease after the fact of measurement, giving the posterior odds of the disease, where posterior refers to after the fact of diagnostic testing. This formulation is Bayes rule: the posterior odds are the prior odds multiplied by the likelihood ratio. In another formal formulation of Bayes theorem, the relationships are expressed in terms of conditional probabilities, but the two formulations are equivalent (see online supplementary appendix 1).11

Supplementary file 1
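Bayes rule in odds form takes only a few lines. The sketch below (function names hypothetical) also implements the conditional-probability form, P(disease | positive) = sens × p / (sens × p + (1 − spec) × (1 − p)), so the two can be compared numerically; the test values are illustrative.

```python
def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Bayes rule in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def posterior_conditional(prior: float, sens: float, spec: float) -> float:
    """Conditional-probability form of Bayes theorem for a positive result."""
    return sens * prior / (sens * prior + (1.0 - spec) * (1.0 - prior))


# Illustrative test with sensitivity 0.9 and specificity 0.91, so LR+ = 10.
lr = 0.9 / (1.0 - 0.91)
print(posterior_probability(0.5, lr))        # about 0.909
print(posterior_conditional(0.5, 0.9, 0.91))  # the same value
```

Algebraically, substituting the odds form into p = odds / (1 + odds) gives exactly the conditional-probability expression, which is why both calls agree.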

In general, an LR+ greater than 10 will strongly modify posterior odds in relation to prior odds, and an LR+ greater than 5 will moderately modify them. Table 3 shows how the LR+ converts prior probabilities, via the associated prior odds, to posterior odds and their associated posterior probabilities for a range of prior probabilities that a diagnosis is present. A general feature of this table is that at low prior probabilities, even after application of a useful test, there can still be moderate diagnostic uncertainty. At high prior probabilities, the resolution of uncertainty with a useful test is not particularly marked. Diagnostic testing is most useful for modifying the certainty of a diagnosis across a middle range of prior probabilities.

Table 3

Examples of the modification of prior probabilities by diagnostic testing
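A table in the spirit of table 3 can be generated for any set of prior probabilities. The priors below are illustrative and are not claimed to be the values in table 3; the helper name is hypothetical.

```python
def posterior_probability(prior: float, lr: float) -> float:
    """Posterior probability from a prior probability and a likelihood ratio."""
    odds = prior / (1.0 - prior) * lr  # posterior odds
    return odds / (1.0 + odds)


priors = [0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90]  # illustrative priors
print("prior   LR+=5   LR+=10")
for p in priors:
    print(f"{p:5.2f}   {posterior_probability(p, 5):5.2f}   {posterior_probability(p, 10):5.2f}")
```

For example, a prior of 0.25 (odds 1/3) becomes posterior odds of 5/3 with an LR+ of 5, a probability of 0.625, and 10/3 with an LR+ of 10, a probability of 10/13 (about 0.77). The absolute change is largest in this middle range.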

It is also useful to consider the range of sensitivities and specificities that give rise to useful diagnostic tests, defined by an LR+ of between 5 and 10. This is shown for a range of sensitivities and specificities in figure 1, with the underlying values in table 4.

Figure 1

Sensitivity and specificity to achieve a positive likelihood ratio of 5 (grey) and 10 (black).

Table 4

Sensitivities and specificities to achieve positive likelihood ratios of 5 and 10

From this figure and table, it can be seen that if the specificity of a test is less than 0.80, no sensitivity can achieve an LR+ of 5, because sensitivity is bounded above by 1 and so the LR+ cannot exceed 1/(1 minus the specificity). Similarly, if the specificity of a test is less than 0.90, it is not possible for a test to achieve an LR+ of 10. Although less obvious from the plot, table 4 shows that as the sensitivity of a test decreases, its specificity has to become very large for the LR+ to be useful; for example, at a sensitivity of 0.50, the specificity needs to be 0.90 for an LR+ of 5 and 0.95 for an LR+ of 10.
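The relationship behind figure 1 follows from rearranging LR+ = sensitivity / (1 − specificity) to specificity = 1 − sensitivity / LR+. The sketch below (function names hypothetical) computes the specificity needed for a target LR+ and the largest LR+ achievable at a given specificity.

```python
def specificity_needed(sensitivity: float, target_lr: float) -> float:
    """Specificity giving LR+ = target_lr at this sensitivity.

    Solves sensitivity / (1 - specificity) = target_lr for specificity.
    """
    return 1.0 - sensitivity / target_lr


def max_lr_positive(specificity: float) -> float:
    """Largest LR+ achievable at this specificity (sensitivity capped at 1)."""
    return 1.0 / (1.0 - specificity)


for sens in (1.0, 0.9, 0.7, 0.5):
    print(sens, round(specificity_needed(sens, 5), 2), round(specificity_needed(sens, 10), 2))
print(round(max_lr_positive(0.80), 2), round(max_lr_positive(0.90), 2))
```

At a sensitivity of 0.5 this gives the specificities of 0.90 and 0.95 quoted above, and the maximum achievable LR+ is 5 at a specificity of 0.80 and 10 at 0.90.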

Application of Bayes rule to diagnosis and screening

An example of how these calculations might be used is illustrated by two papers on the diagnosis of congestive heart failure using B-type natriuretic peptide. A study of nearly 600 emergency department (ED) presentations in adults aged over 21 years at a US hospital identified that 209/599 (35%) were diagnosed with congestive heart failure. In those patients with dyspnoea who also had orthopnoea, congestive heart failure was present in 67/102 (66%).12 A systematic review and meta-analysis of the diagnostic performance of B-type natriuretic peptide estimated its sensitivity and specificity for heart failure, at a threshold value of 100 pg/mL, to be 0.95 and 0.67, respectively.13 Using Bayes rule, the posterior probability that an adult presenting to ED with dyspnoea, who also has a B-type natriuretic peptide level above 100 pg/mL, has congestive heart failure is 61%. This is based on prior odds of 209/390 (0.54), multiplied by the LR+ of 0.95/0.33 (2.88), to give posterior odds of approximately 1.55, representing a probability of 61%. Extra prior information, in the form of an additional history of orthopnoea, increases the probability to 85%. This is based on prior odds of 67/35 (1.9), the odds that an adult presenting to ED with dyspnoea and orthopnoea has congestive heart failure, multiplied by the same LR+ to give posterior odds of 5.47, representing a probability of 85%. These calculations can also be shown by creating a contingency table for a notional population of similar patients and filling in its cells using the prevalence of the disease, the sensitivity and the specificity. For this example, if a notional population of 10 000 people presenting to ED with dyspnoea is chosen, then based on the prevalence of congestive heart failure of 35%, 3500 will actually have heart failure and 6500 will not. The sensitivity of the test is 0.95, so of the 3500 with heart failure, 3325 will be detected. Of the 6500 without heart failure, the specificity of 0.67 means that 4355 will be test-negative. By subtraction, the other cells of the contingency table can be filled in (table 5), and the probability that a patient who is test-positive in this clinical population actually has heart failure is 3325/5470 (61%).

Table 5

Contingency table of heart failure probabilities after diagnostic testing with test sensitivity of 95% and specificity of 67%, where disease prevalence is 35%
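The heart failure calculations can be reproduced both ways in a short sketch. The helper name and variable names are hypothetical; the inputs (209/599, 67/102, sensitivity 0.95, specificity 0.67) are the figures quoted in the text.

```python
def posterior_from_odds(prior_odds: float, lr: float) -> float:
    """Posterior probability from prior odds and a likelihood ratio."""
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)


SENS, SPEC = 0.95, 0.67       # pooled estimates for BNP at the article's threshold
lr_pos = SENS / (1.0 - SPEC)  # about 2.88

# Prior odds from the ED study: 209 of 599 overall; 67 of 102 with orthopnoea.
print(round(posterior_from_odds(209 / 390, lr_pos), 2))  # dyspnoea alone
print(round(posterior_from_odds(67 / 35, lr_pos), 2))    # plus orthopnoea

# The same result via a notional population of 10 000 (prevalence 35%).
n, prev = 10_000, 0.35
diseased = n * prev
healthy = n - diseased
tp = SENS * diseased  # true positives
tn = SPEC * healthy   # true negatives
fp = healthy - tn     # false positives, by subtraction
fn = diseased - tp    # false negatives, by subtraction
print(round(tp), round(fp), round(fn), round(tn))
print(round(tp / (tp + fp), 2))  # probability of heart failure if test-positive
```

Using the unrounded prior odds gives posterior probabilities of about 0.61 and 0.85, and the contingency table gives 3325 true positives, 2145 false positives, 175 false negatives and 4355 true negatives.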

For screening tests, the proportion of those tested who in reality have a disease is typically very low. This means that, almost regardless of how good a test is in terms of sensitivity and specificity, the vast majority of those who are test-positive will not in fact have the disease. For example, consider a disease in an otherwise asymptomatic population with a prevalence of 1/1000 (0.1%). This translates to odds of having the disease of 1/999. In passing, it is useful to see that very low proportions translate to odds that are numerically very similar. If a screening test is 99% sensitive and 95% specific, then the LR+ is 19.8 (0.99/0.05). The posterior odds of disease are 1/50.45 (1/999 multiplied by 19.8), which translates to a probability of approximately 1.94%. In this example, only about 2% of those who are test-positive will actually have the disease and 98% will not. A specific example of a screening test is faecal immunochemical testing for bowel cancer. A systematic review and meta-analysis of the diagnostic performance of this type of test gave point estimates of 79% for sensitivity, 94% for specificity and 13.1 for the LR+, the latter with a 95% CI of 10.5 to 16.4.14 In a New Zealand pilot study of colonoscopic screening following a positive immunochemical faecal occult blood test, the prevalence of bowel cancer in the screened population was about 0.3%.15 Using these figures, the post-test probability of bowel cancer in a screening population moves from 0.3%, before the outcome of testing is known, to 3.8%. The same screening study also identified possible harms of screening: about 0.7% of those who underwent colonoscopy after a positive faecal occult blood test had significant bleeding or perforation as a result. In a hypothetical population of 10 000, about 24 of the 30 people with bowel cancer would be detected, but about 600 people without bowel cancer would receive a colonoscopy, of whom about 4 would have bleeding or a perforation as a result of a test for a diagnosis that they do not have.
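The bowel cancer screening figures can be reproduced as follows. This sketch assumes that the 0.7% complication rate applies per colonoscopy, which is the reading that reproduces the figure of about 4 harmed per 10 000 screened; the constant and helper names are hypothetical.

```python
def posterior_probability(prior: float, lr: float) -> float:
    """Posterior probability from a prior probability and a likelihood ratio."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


LR_POS = 13.1              # pooled LR+ reported by the meta-analysis
SENS, SPEC = 0.79, 0.94    # pooled sensitivity and specificity
PREVALENCE = 0.003         # about 0.3% in the New Zealand pilot
COMPLICATION_RATE = 0.007  # assumed: bleeding or perforation per colonoscopy

print(round(100 * posterior_probability(PREVALENCE, LR_POS), 1))  # post-test %

n = 10_000
with_cancer = n * PREVALENCE                      # about 30
detected = SENS * with_cancer                     # about 24
false_positives = (1 - SPEC) * (n - with_cancer)  # about 600 unnecessary colonoscopies
harmed = COMPLICATION_RATE * false_positives      # about 4
print(round(detected), round(false_positives), round(harmed))
```

The post-test probability comes out at about 3.8%, and the unrounded count of false positives is 598, which the text rounds to about 600.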

Table 6 shows the earlier screening example (prevalence 1/1000, sensitivity 99%, specificity 95%) more intuitively by completing a contingency table of test results compared with reality for a hypothetical population of 100 000 people who are screened. The table is formed by first calculating the number in the population who have the disease: 0.1% of 100 000, in this case 100. Sensitivity depends only on those with the disease, so with a sensitivity of 99%, 99 of those with the disease are picked up and 1 is missed, filling in the left-most cells of the contingency table. Those who do not have the disease number 99 900, which is 100 000 minus the 100 who do have the disease. Specificity depends only on those without the disease, so the bottom right cell (true negatives) is 95% of 99 900, or 94 905, and the top right cell (false positives), by subtraction, is 4995. After application of the test, the proportion of those who are screen-positive who have the disease is 99/5094 (1.94%). This is the same as the result of applying Bayes rule to the prior odds of 1/999 and an LR+ of 0.99/0.05.

Table 6

Contingency table of screening test results with test sensitivity of 99% and specificity of 95%, where disease prevalence is 1/1000 (0.1%)
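The cells of this screening table can be filled in with exact arithmetic, which also confirms that the contingency-table answer and the Bayes rule answer coincide exactly. The function name `contingency` is hypothetical; rows are assumed to be test results and columns disease status, as described in the text.

```python
from fractions import Fraction


def contingency(n: int, prevalence: Fraction, sens: Fraction, spec: Fraction) -> dict:
    """Fill a 2x2 table (rows: test result; columns: disease status) for n people."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased  # true positives
    tn = spec * healthy   # true negatives
    return {"tp": tp, "fn": diseased - tp, "fp": healthy - tn, "tn": tn}


t = contingency(100_000, Fraction(1, 1000), Fraction(99, 100), Fraction(95, 100))
print({k: int(v) for k, v in t.items()})  # {'tp': 99, 'fn': 1, 'fp': 4995, 'tn': 94905}
ppv = t["tp"] / (t["tp"] + t["fp"])
print(ppv, f"{float(ppv):.2%}")  # 11/566 1.94%

# Bayes rule on the odds scale gives the same answer.
post_odds = Fraction(1, 999) * (Fraction(99, 100) / Fraction(5, 100))
print(post_odds / (1 + post_odds) == ppv)  # True
```

The fraction 11/566 is 99/5094 in lowest terms.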

Discussion

Clinicians cannot escape uncertainty, and in order to best manage decision making with patients, clinicians need a reasonable understanding of uncertainty, expressed as probabilities, and of how diagnostic information can be used to modify probabilities. A systematic review of how well health professionals interpret diagnostic information reported that commonly used measures of test accuracy are poorly understood by health professionals.16 Bayes rule is a method of modifying beliefs in the light of existing and new information and can be expressed mathematically in terms of the sensitivity and specificity of diagnostic testing. Multiplying current beliefs, expressed as odds, by the likelihood ratio, which is defined by the metrics of diagnostic testing, gives rise to more diagnostic certainty. The likelihood ratio carries information that can change prior beliefs: the larger its value, the more it will potentially change them. A large likelihood ratio for making a diagnosis occurs when the test is sensitive, with a high probability of detecting disease if it is present, and specific, with a low probability of a positive result when the disease is absent. Simple contingency tables of test results against reality, incorporating sensitivity and specificity as well as prior beliefs about the probability of the presence or absence of a disease, give the same results as calculations based on odds and likelihood ratios, but may be more intuitive, particularly for the outcomes of screening for relatively uncommon clinical events. Clinicians may still need an approach to use when diagnostic testing has not been particularly helpful. One approach is to gather more clinical information to refine the pretest probability in relation to a particular diagnostic test. This was illustrated in the discussion of B-type natriuretic peptide testing for congestive heart failure, where extra clinical information, in this case a history of orthopnoea, improved diagnostic performance. Another approach is to perform a sequence of diagnostic tests. In essence, this approach uses one diagnostic test as a screening test to modify pretest probabilities, so that a possibly more expensive test with better diagnostic performance need only be applied to the smaller number of patients for whom the diagnosis is still being considered. Bayes rule might then be applied serially, although this assumes that the results of the serial tests are conditionally independent given disease status. Finally, clinicians may wish to distinguish between a threshold for diagnosis and a threshold for treatment. In this situation, a treatment that is relatively free of adverse events, or for a disease that is particularly harmful if left untreated, might usefully be given to a patient whose post-test probability of the disease is below the threshold for diagnosis. The patient's response to treatment might also provide additional diagnostic information.
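Serial application of Bayes rule can be sketched by feeding each posterior back in as the next prior. The function names and the two likelihood ratios below are hypothetical, and the calculation is only valid if the tests are conditionally independent given disease status.

```python
def posterior_probability(prior: float, lr: float) -> float:
    """Posterior probability from a prior probability and a likelihood ratio."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


def serial_posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Apply Bayes rule test by test, using each posterior as the next prior.

    Valid only if the test results are conditionally independent given
    disease status; correlated tests will typically overstate certainty.
    """
    p = prior
    for lr in likelihood_ratios:
        p = posterior_probability(p, lr)
    return p


# Hypothetical sequence: a cheap triage test (LR+ 3) then a better test (LR+ 10).
print(round(serial_posterior(0.10, [3.0, 10.0]), 3))
```

Under the independence assumption, two positive results are equivalent to a single test whose LR+ is the product of the two (here 30), so a prior of 0.10 becomes 10/13, about 0.77.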

Main messages

• Uncertainty is inevitable in diagnostic clinical encounters.

• Expressing uncertainty as probabilities is a prerequisite for refining those probabilities with diagnostic testing. Probabilities and odds are interchangeable ways of expressing the belief that a patient might have a particular disease.

• Bayes rule is an arithmetic way of taking knowledge about the probability that a patient has a disease and modifying this knowledge using characteristics of diagnostic test performance, which provide extra information about the patient.

• For the presence of disease, multiplying prior odds of a disease by the positive likelihood ratio, defined as the sensitivity divided by 1 minus the specificity, allows estimation of the probability of disease given a diagnostic test is positive.

• Bayes rule can also be intuitively understood by creating a contingency table of true disease prevalence and diagnostic test performance to estimate the probability a patient with a positive diagnostic test actually has a disease.

Current research questions

1. What is the best way to make explicit estimates of the probability of disease available across a wide range of diagnostic clinical encounters?

2. What is the best way to make explicit estimates of diagnostic performance, relevant to common and uncommon clinical presentations, widely available to clinicians performing diagnostic tests?

3. Does incorporating formal probabilistic reasoning in diagnostic encounters improve patient outcomes?

Key references

1. Naglie G, Krahn MD, Naimark D, et al. Primer on medical decision analysis: Part 3, estimating probabilities and utilities. Med Decis Making 1997;17:136–41.

2. Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 1994;271:703–7.

3. Greenhalgh T. How to read a paper: papers that report diagnostic or screening tests. BMJ 1997;315:540–3.

4. Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet 2002;359:881–4.

5. Westbury CF. Bayes' rule for clinicians: an introduction. Front Psychol 2010;1:192.

Self assessment questions

1. Which is the best description for the sensitivity of a diagnostic test?

1. The number of patients who have a positive test result.

2. The proportion of patients with a disease who have a positive test result.

3. The proportion of patients with a disease whose disease status agrees with the test result.

4. The proportion of patients with a positive test result who have a disease.

5. The proportion of times a positive test result leads to a change in management.

2. Regarding proportions, probabilities and odds, which one of the following is most correct?

1. A proportion is the number of times things occur out of one hundred observations.

2. All proportions can be redefined as odds.

3. All proportions can be redefined as a percentage.

4. The value of odds can be less than zero.

5. The complement of a probability is 1 minus the odds.

3. Regarding the positive likelihood ratio, which one of the following is most correct?

1. Multiplying the prior odds by the positive likelihood ratio gives an estimate of the posterior odds.

2. A positive likelihood ratio means that a diagnosis is more likely.

3. A negative likelihood ratio means that a diagnosis is more likely.

4. A positive likelihood ratio of 2 greatly increases the probability a disease is present.

5. The positive likelihood ratio is defined as the sensitivity divided by the specificity.

4. If the probability that a patient has a disease before diagnostic testing is 35%, and a diagnostic test with a sensitivity of 95% and a specificity of 99% is positive, what is the probability that the patient has the disease?

1. 58%

2. 68%

3. 78%

4. 88%

5. 98%

5. The prevalence of a disease in an otherwise asymptomatic population is 1/1000 (0.1%). If a test is used that is 99% sensitive and 95% specific for the disease what proportion of those who are test positive have the disease?

1. 1%

2. 2%

3. 4%

4. 18%

5. 16%