Establishing causation is not an easy task, and a number of scientific methods have been developed specifically for this purpose. Randomised controlled trials (RCTs) are considered by many, though not all, to be the gold standard. This means that RCTs are thought to provide the highest form of evidence of causation, and the results of such studies are frequently used to guide expert advice on what to eat, how to teach, which medical treatment to choose, whether to worry about pesticides, and so on. But can we trust RCTs to tell us the full causal story? Not really.
A number of causally relevant factors must be excluded from an RCT because of either ethical or methodological constraints:
Severe effects cannot ethically be tested for: One cannot, for instance, perform an RCT to prove that drinking and driving increases traffic accidents or that being hit in the head with a rock causes fatal injury. If one did, people could suffer severe harm or even death as a direct result of the trial. This point is discussed in the satirical research paper ‘Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials’.
Risk groups must be excluded: If we suspect that an individual would be put at risk in the trial, then for ethical reasons at least, they should be excluded. Some might have allergies, for instance, that make them particularly vulnerable to a certain intervention. Some groups are generally more vulnerable to adverse effects than others, and are considered to be risk groups. In medicine these would typically be children, old people, and very sick people with complex symptoms and medical needs.
Variations and heterogeneity must be ignored: Since RCTs typically test for the same cause and the same effect across a group, individual variations and heterogeneity are necessarily disregarded. Alternatively, a particular type of variation can be studied by singling out a sub-group, for instance boys between 8 and 12, or pregnant women under 30. But within each sub-group, homogeneity is what one is looking for, which means that any variation must be treated as noise.
Individual propensities cannot be studied: RCTs are performed to find statistical frequencies in a given population. We might find that a drug increases the chance of recovery, but it’s still possible that an individual patient gets worse. If we apply the results of an RCT directly to an individual, we must assume that they represent the statistical average. But even though we know that half of all smokers die from smoking, we cannot conclude from this that every smoker has a 50:50 chance of doing so. This is a well-known problem, also referred to as the ecological fallacy.
Negative results are excluded: In order to document that an intervention works, we might have to perform many RCTs, some of which show no effect or are inconclusive. Such negative results are rarely published, leaving the results of a number of studies unknown. Any systematic review of RCTs is thus in danger of showing an overly optimistic trend. This is why there is increasing pressure on scientists to publish negative results.
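The last point, about unpublished negative results, can be illustrated with a small simulation. What follows is only a sketch under invented assumptions: the intervention has no true effect at all, each trial's estimate is that true effect plus random noise, and only clearly positive estimates get published. The names and numbers (TRUE_EFFECT, STANDARD_ERROR, N_TRIALS, the 1.96 cut-off) are illustrative choices, not taken from any real study, and the publication rule is a deliberate caricature.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.0      # assumption: the intervention does nothing
STANDARD_ERROR = 0.2   # assumption: sampling noise in each trial's estimate
N_TRIALS = 1000        # assumption: number of trials that were run

# Each trial's estimated effect is the true effect plus sampling noise.
estimates = [rng.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(N_TRIALS)]

# Caricatured publication filter: only clearly positive estimates are published.
threshold = 1.96 * STANDARD_ERROR
published = [e for e in estimates if e > threshold]

mean_all = statistics.mean(estimates)
mean_published = statistics.mean(published)

print(f"trials run: {len(estimates)}, published: {len(published)}")
print(f"mean effect, all trials:       {mean_all:.3f}")
print(f"mean effect, published trials: {mean_published:.3f}")
```

A review that only sees the published list would report a positive average effect, even though the average over all trials stays close to zero. Real publication practices are far less mechanical than this filter, so the sketch shows the direction of the distortion rather than its size.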
These are not necessarily weaknesses of RCTs as such. All scientific methods have their limitations. The problem lies in how we interpret the results of these studies. If we fail to take into account what is excluded at the outset of an RCT, we risk drawing conclusions that are not scientifically supported. One such conclusion is that what happened to the people in the study also applies to individuals who were not part of it. That is, we assume external validity. If a study is performed on women over 60, for instance, the results might not transfer directly to men under 30. Randomisation is supposed to guarantee that the groups in the trial represent the general population to which the results are meant to apply.
But say a medical treatment is tested on a relatively young, healthy and homogeneous group. Can we conclude that the results also apply to the old, the sick and the very young? No. And if no studies show that the treatment has severe adverse effects, can we conclude that those who take the medication won’t have any? No. And if the medication has a documented effect in 9 out of 10 patients, can we infer that an individual patient will have a 90 percent propensity to experience that effect? No.
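The last of these questions can be made concrete with a little arithmetic. The sketch below uses two invented populations of ten patients each; the propensity values are hypothetical and chosen only to show that the same 9-in-10 frequency is compatible with very different individual propensities.

```python
def expected_response_rate(propensities):
    """Average of individual propensities, i.e. the expected group frequency."""
    return sum(propensities) / len(propensities)

# Hypothetical population A: every patient has the same 0.9 propensity to respond.
uniform = [0.9] * 10

# Hypothetical population B: nine patients always respond, one never does.
mixed = [1.0] * 9 + [0.0]

rate_uniform = expected_response_rate(uniform)
rate_mixed = expected_response_rate(mixed)

print(f"population A: group rate {rate_uniform:.2f}, lowest individual propensity {min(uniform):.1f}")
print(f"population B: group rate {rate_mixed:.2f}, lowest individual propensity {min(mixed):.1f}")
```

Both populations yield the same group rate, so a trial that records only how many patients responded cannot tell them apart. For the tenth patient in population B, the 90 percent figure says nothing true about their own prospects.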
Being open and explicit from the outset about what is excluded from RCTs allows a more realistic interpretation of the results. We might also be more cautious about applying these results universally and unconditionally.