The causal value of Big Data. Two views on causation


Not everything should be approached quantitatively. Is causation perhaps one such thing?

Scientific research relies on data, and preferably lots of it. Population studies and statistical models are used to find and establish causal knowledge. The idea is that the more data we have, the better justification we have for our causal hypotheses.

Philosophically, however, there are at least two ways to approach causation. One is quantitative and frequentist: the more data we have, the closer we get to causal knowledge. The other is qualitative: it looks at the concrete causal situation, the properties involved and their interactions.

In the quantitative approach to causation, emphasis is on repeatedly confirming our causal hypotheses by reproducing studies. A motivation behind such repetition is to separate accidental from causal correlations, thus seeking to avoid the well-known problem of induction.

The problem of induction can be illustrated by an example. We have a bag of 1000 marbles, the colours of which are unknown to us. The more marbles we check, the closer we get to an accurate theory about their colours. Until we have checked every marble, any hypothesis will be fallible, even if we have a large and representative sample. The very last marble might be the one that falsifies our hypothesis.
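The bag of marbles can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the bag contents (999 red marbles and one blue marble that happens to be drawn last) and the helper name `hypothesis_survives` are hypothetical, not part of the original example.

```python
def hypothesis_survives(bag, n_checked, colour="red"):
    """Return True if the first n_checked marbles all have the given colour."""
    return all(marble == colour for marble in bag[:n_checked])


# Hypothetical bag: 999 red marbles, and the single blue one is drawn last.
bag = ["red"] * 999 + ["blue"]

# The hypothesis "all marbles are red" is checked against growing samples.
for n in (10, 500, 999, 1000):
    print(n, hypothesis_survives(bag, n))
```

With this bag, the hypothesis survives every sample up to and including 999 marbles and fails only when the last marble is checked.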

Causation on the quantitative view is treated like a bag of marbles, but with infinitely many instances of the cause C and the effect E. If we could, in principle, check all past, present and future instances of C and E, we would know for certain whether (or with what frequency) C causes E. Of course, David Hume pointed out that this was impossible, so scientists have settled for the second-best option: large-scale correlation data.

This view of causation is closely related to Humeanism, the idea that the world consists of separate and discrete events where anything could follow anything. Causation is then nothing but the perfect correlation between two types of events. From this it follows that if we knew all facts about the world, we would also know all relations between them, causal or otherwise. A problem with this view is that perfect correlations are neither necessary nor sufficient to generate causal truths. The birth control pill causes thrombosis, but only in 1 in 1000 cases, while the per capita consumption of margarine has a 0.99 correlation with the divorce rate in Maine. The strength of a correlation therefore cannot tell us whether a relation is accidental or causal.
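The point that correlation strength does not track causation can be made concrete with toy numbers. The sketch below uses invented data throughout: a population of 2000 in which half take a pill and exactly one pill taker develops thrombosis (mirroring the 1 in 1000 figure, and assuming for simplicity that nobody else does), and two hypothetical, causally unrelated series that merely share a downward trend. The `pearson` helper is written out by hand for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy causal case: 1000 people take the pill, 1000 do not,
# and one pill taker (the first entry) develops thrombosis.
pill = [1] * 1000 + [0] * 1000
thrombosis = [1] + [0] * 1999
r_causal = pearson(pill, thrombosis)

# Toy accidental case: two made-up series that both decline over ten years.
series_a = [10 - 0.5 * year for year in range(10)]
series_b = [5 - 0.1 * year + 0.02 * (-1) ** year for year in range(10)]
r_accidental = pearson(series_a, series_b)

print(round(r_causal, 3), round(r_accidental, 3))
```

In this toy setup the causal relation yields a correlation close to zero, while the two unrelated series yield a correlation close to one.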

On the qualitative view of causation, causation happens in the concrete. Each causal setup will be unique, with different properties involved, and therefore different propensities. So instead of expecting that a causal factor should produce the same effect over a variety of contexts, one looks at the properties, or real causal powers, of things and their interactions.

This interaction is best studied by placing a property in different contexts to see what interferes with or amplifies the effect. In this way one could manipulate and tease out the hidden causal powers of things and learn more about how properties affect each other. When our expectations of the effect are not met, we could seek to understand the causal mechanisms that were involved.

We see that the two approaches represent different ways to arrive at causal hypotheses. In one we count the number of instances that confirm our causal hypothesis. In the other we try to find out what happens when the expected effect does not follow. In the first, we seek robustness. In the second we expect a high degree of complexity and context-sensitivity. Perhaps the best theory of causation would be the one that could explain both these seemingly contrasting features.

