Paul Manson. Liaison/Clinical Librarian, NHS Grampian, Scotland.
“Medicine is a science of uncertainty and an art of probability”
- William Osler
The renowned Canadian physician William Osler might just as well have had medical statistics in mind with this quote. This short article will show you how to interpret statistical uncertainties in clinical studies and how to extract useful conclusions from them. The explanations below are non-mathematical and are intended to give you a passable understanding of the concepts. This is not an abbreviated statistics course – perhaps real statisticians should look away now!
Why do we need to apply statistics to trials? Why can’t we just count how many people get better? Unfortunately, uncontrolled external influences can affect this simple tally:
- Any measurements are subject to some uncertainty
- Some conditions will improve spontaneously
- People react differently to medication
- Participants in a trial can behave differently
Statistics are used to judge whether the results of the trial are due to the intervention rather than to these external influences. To do so, an approach called the null hypothesis is used. The null hypothesis assumes that any differences observed between two treatments are solely due to the external influences mentioned. The statisticians work out how likely it is that differences as large as those observed would arise from these influences alone. If that is unlikely, the differences are attributed to the intervention.
The Null Hypothesis
1. Assume that there is no real difference between the treatments and that any differences seen are due to external influences.
2. Work out how likely it is that the differences are just due to external influences.
3. If it is unlikely that the differences are just due to external influences, attribute them to the intervention.
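How do we measure how likely or unlikely an outcome is? We use a P-value. A P-value is a probability, so it ranges from zero (the outcome will never happen) to one (the outcome is certain to happen). In a trial, the outcome in question is a difference at least as large as the one observed arising from external influences alone.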
How unlikely does it have to be for us to decide the difference is not due to external influences? Conventionally, a figure of less than 1 in 20 is chosen as being ‘statistically significant’, that is P<0.05.
P<0.05 means that, if there were no real difference between the treatments, external influences alone would produce results like those seen in the trial less than one time in 20. A result this unlikely is conventionally taken as evidence that the difference is due to the intervention.
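For readers who like to see the reasoning in action, here is a minimal sketch of the null hypothesis idea using a permutation test. This is only one of several ways to obtain a P-value and is not necessarily the method used in any given trial. The function name and the improvement scores are invented for illustration. The code assumes there is no real difference, shuffles the group labels many times, and counts how often chance alone produces a difference at least as large as the one observed.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate a two-sided P-value for the difference in group means.

    Hypothetical helper: under the null hypothesis the group labels are
    interchangeable, so we shuffle them and see how often chance alone
    gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        group_a = pooled[:n_treated]
        group_b = pooled[n_treated:]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        # Small tolerance guards against floating-point rounding
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Invented improvement scores, for illustration only
treated = [12, 15, 14, 10, 13, 16, 11, 14]
control = [9, 11, 8, 12, 10, 9, 13, 10]

p = permutation_p_value(treated, control)
print(f"P = {p:.3f}")
print("statistically significant" if p < 0.05 else "no difference identified")
```

The P-value here is simply the proportion of shuffles in which chance matched or beat the observed difference.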
If P ≥0.05 it means that no real (statistically significant) difference has been identified even if the values are numerically different. Of course, sometimes this can be the result you are looking for. If the trial is trying to show that one treatment is as effective as another (perhaps the new treatment is cheaper) then P ≥0.05 means the trial hasn’t been able to identify a difference.
“The trial hasn’t been able to identify a difference” may seem an odd way of saying, “There was no difference”. However, it is possible there is a real difference between the treatments but the trial can’t recognise it. This is where the power calculation comes in.
Typical power calculation statement
To identify a difference of 3.5 points in the Barthel score at a power of 80% and P<0.05, it was calculated that 130 participants were required in each arm of the trial.
The power calculation is used to determine how many participants the trial needs to give statistically meaningful results. If a study is underpowered (it doesn’t have enough participants) it can have one of two effects on the results. First, even if there is a real difference between the treatments it might not show up in the results, the P-value will be greater than 0.05, and we will wrongly conclude there is no difference. Or, if a difference is found, the low number of participants will often manifest itself as results with wide confidence intervals (CI).
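As a rough illustration of where a figure like the one in the statement above might come from, the sketch below uses a common textbook approximation for comparing two means. The function name is hypothetical, and the standard deviation of 10 points is an assumption made purely for this example; the statement quoted above does not give one, and a real trial may well use a different method.

```python
import math
from statistics import NormalDist

def participants_per_arm(difference, sd, power=0.80, alpha=0.05):
    """Approximate participants needed in each arm to detect `difference`.

    Hypothetical helper using the normal approximation for two equal-sized
    groups with a common standard deviation `sd` (two-sided test).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)

# 3.5 points, 80% power and P<0.05 come from the statement above;
# sd=10.0 is an ASSUMED value, not taken from any trial.
n = participants_per_arm(difference=3.5, sd=10.0)
print(f"About {n} participants per arm")

# Asking for more power, or a smaller difference, needs more participants
print(participants_per_arm(difference=3.5, sd=10.0, power=0.90))
print(participants_per_arm(difference=1.75, sd=10.0))
```

Under these assumptions the answer lands in the same region as the 130 quoted. Note how quickly the number grows when the difference to be detected is halved.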
In any measurement there is a degree of uncertainty (not just physical measurements, but things like survey responses, too). The researcher can go to her box of statistical tricks to produce a Confidence Interval (CI), normally a 95% CI. This says the researcher is 95% sure the actual value lies between two limits either side of a nominal value.
Note that if the 95% CI for a difference between two treatments includes zero then you can’t say the intervention makes a difference. Narrow confidence intervals indicate a more precise estimate and so, usually, more reliable results.
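The box of statistical tricks need not stay closed. Below is a minimal sketch of a 95% CI for the difference between two group means, using a normal approximation. The function name and the scores are invented for illustration, and with samples this small a statistician would normally use a slightly wider interval based on the t distribution.

```python
import math
from statistics import NormalDist, mean, stdev

def difference_ci(treated, control, level=0.95):
    """Return (difference, lower limit, upper limit) for the mean difference.

    Hypothetical helper using a normal approximation; small samples
    would normally call for the t distribution instead.
    """
    diff = mean(treated) - mean(control)
    # Standard error of the difference between two independent means
    se = math.sqrt(stdev(treated) ** 2 / len(treated)
                   + stdev(control) ** 2 / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

# Invented improvement scores, for illustration only
treated = [12, 15, 14, 10, 13, 16, 11, 14]
control = [9, 11, 8, 12, 10, 9, 13, 10]

diff, low, high = difference_ci(treated, control)
print(f"Difference {diff:.2f} (95% CI {low:.2f} to {high:.2f})")
if low <= 0 <= high:
    print("The CI includes zero: no difference has been shown")
else:
    print("The CI excludes zero")
```

Asking for a higher confidence level, or using fewer participants, widens the interval.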
There is a move towards using confidence intervals when reporting trials rather than just P-values. While a P-value will tell you if a result is ‘real’, confidence intervals can give you more information about the effectiveness of an intervention. If you would make a different decision at the two limits of a CI then the results of that one trial are probably not good enough for you to come to a conclusion about a treatment.
The comments above (that results with P<0.05 are real, and confidence intervals give an indication of the upper and lower limits of the effect of the intervention) also apply when the outcome is given as a ratio (that is, the rate of events in one group compared with another), with one important proviso. For a ratio, if the 95% confidence interval includes one then you can’t say the intervention makes a difference (a ratio of one means the rate of events in the two groups is the same).
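To make the proviso for ratios concrete, here is a minimal sketch that calculates a risk ratio and its 95% CI using the usual log-scale approximation. The function name and the event counts are invented for illustration and do not come from any real trial.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(events_treated, n_treated, events_control, n_control,
                  level=0.95):
    """Return (risk ratio, lower limit, upper limit).

    Hypothetical helper: the CI is built on the log scale, the standard
    large-sample approximation for a ratio of two proportions.
    """
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log = math.sqrt(1 / events_treated - 1 / n_treated
                       + 1 / events_control - 1 / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high

# Invented counts: events / participants in each arm
for label, counts in [("Trial A", (15, 200, 30, 200)),
                      ("Trial B", (25, 200, 30, 200))]:
    rr, low, high = risk_ratio_ci(*counts)
    verdict = "includes one" if low <= 1 <= high else "excludes one"
    print(f"{label}: ratio {rr:.2f} (95% CI {low:.2f} to {high:.2f}), {verdict}")
```

Both invented trials show fewer events in the treated arm, but only in the first does the interval stay clear of one.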
Finally, the outcomes of a trial can be reported as unadjusted versus adjusted results. Unadjusted results are the simple figures from the trial; they do not take into account any known risk factors that might influence the outcome. For example, in a trial of an anti-hypertensive, there may be more heart attacks in the control group not receiving the drug. But perhaps the control group includes more smokers than the treatment group. Knowing smoking is a risk factor for heart attacks, the researchers would take this into account and produce adjusted results.
In a trial that uses both unadjusted and adjusted results it is important that the conclusions only consider the adjusted results – if the researchers need to account for some risk factors then they can’t ignore them by selectively quoting the unadjusted results.
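The smoking example can be sketched with numbers. The code below uses the Mantel-Haenszel method, one simple way of adjusting for a single risk factor; researchers often use regression models instead, so treat this as an illustration of the idea rather than the method of any particular trial. The function names and all the counts are invented, and the counts are deliberately arranged so that the drug does nothing within either smoking group.

```python
def crude_risk_ratio(strata):
    """Unadjusted risk ratio: pool everyone and ignore the risk factor."""
    events_t = sum(s["events_treated"] for s in strata)
    n_t = sum(s["n_treated"] for s in strata)
    events_c = sum(s["events_control"] for s in strata)
    n_c = sum(s["n_control"] for s in strata)
    return (events_t / n_t) / (events_c / n_c)

def mantel_haenszel_risk_ratio(strata):
    """Adjusted risk ratio: compare like with like within each stratum."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s["n_treated"] + s["n_control"]
        numerator += s["events_treated"] * s["n_control"] / total
        denominator += s["events_control"] * s["n_treated"] / total
    return numerator / denominator

# Invented heart attack counts; the control arm has far more smokers
strata = [
    {"name": "smokers", "events_treated": 5, "n_treated": 50,
     "events_control": 15, "n_control": 150},
    {"name": "non-smokers", "events_treated": 3, "n_treated": 150,
     "events_control": 1, "n_control": 50},
]

unadjusted = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"Unadjusted ratio: {unadjusted:.2f}")
print(f"Adjusted ratio:   {adjusted:.2f}")
```

In this invented data the unadjusted figure makes the drug look as though it halves heart attacks, while the adjusted figure shows the whole effect was down to the imbalance in smokers.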
The statistical discussion of a trial can be extremely complicated and unless you are a statistician you will never be able to say if it is right or wrong. However, it is not difficult to decide if the results are useful to you. Are the differences between the treatments real (P-value <0.05) or just due to random effects (P-value ≥0.05)? Is there a power calculation and did the trial have enough participants? This is essential if the trial is trying to show there is no difference between treatments. Looking at the confidence interval, does it include zero (or one for a ratio)? If so, there is no difference between the interventions (assuming the trial has enough participants). Consider also if you would make a different decision at the low limit of the confidence interval compared with the high limit. Lastly, if the authors have decided to take other effects into consideration and produced adjusted results, have they referred to them consistently, or called upon unadjusted results where they support their argument?