by Maritza Ilich Mauseth
This is a discussion of the article by J.C. Crabbe, D. Wahlsten and B.C. Dudek, ‘Genetics of Mouse Behavior: Interactions with Laboratory Environment’, Science, 284, 1670-1672, 1999. It was written for the course PHI302 Causation in Science at the Norwegian University of Life Sciences. Mauseth is a Master’s student in Ecology at the Faculty of Environmental Sciences and Natural Resource Management, with 170 credits in biology and an additional 15 (soon 20) credits in Philosophy. I asked her for permission to include the discussion in my blog as an example of how philosophical reflections about causation in science can be done in practice.
Despite the title, the article by Crabbe et al. (1999) says less about the genetics of mouse behaviour than about the difficulties associated with controlling all variables in experiments, in particular those involving vertebrates. Nonetheless, the point that genetics alone may not be the determining factor in behavioural traits is neatly demonstrated.
It is an interesting and instructive article for examining the issue of reproducibility, which is considered to be a necessary element of good scientific practice. If one conducts an experiment on butterfly larvae, farmed salmon, watermelon seedlings, or boron isotopes, it is not only desirable, but expected, that another scientist can reproduce the experimental conditions, and thus the results (within a reasonable margin). The same applies with experiments on mice, genetically modified or otherwise.
However, in the case of mice, certain behavioural traits are associated with particular genotypes – either inbred, or genetically modified. Mice are used as model organisms, a surrogate for humans when testing the physiological and behavioural effects of foods, medicine, or isolated chemicals. In an anthropocentric world, the stakes are high, and correctly ascertaining the causal relationship is critical. Is it exposure to the chemical being tested that is linked to a particular behavioural/physical outcome, or is it merely a reflection of a behavioural/physical trait which is considered to be inherent to a particular genetic strain? Or is it something else entirely?
Indeed, it is the uncomfortable issue that it might be “something else entirely” that makes this article so instructive. The use of the word “uncomfortable” is not accidental. In science, we want to predict with certainty, and we want to be sure of our outcomes. The very nature of the experimental set-up underpins a deterministic imagining of the world – if X is done to Y, then the result will be Z. The results, presented in a statistical format, then provide us with empirical support, which we choose to interpret as “evidence” (while carefully refraining from using that word) for a supposed causal relationship. Yet, although we have an outcome, are we really sure that what we think caused it was indeed the cause?
My reading of this article is that these tests, even in highly controlled environments, do not serve to demonstrate the causal relationships that the scientists wish to identify and, because mice are employed as surrogates for humans, are under pressure to identify. However, I suggest that the results are still perceived by the authors as doing just that – acting as “proof” of a satisfactory method.
The authors set out to examine whether they can reproduce the results of six behavioural experiments on mice in each of the test laboratories located in Albany, New York; Edmonton, Alberta; and Portland, Oregon. The set-up is intended to validate previous experimental data regarding behavioural traits associated with some strains of mice. More importantly, it is intended to demonstrate that experiments can be repeated, and that this repeatability extends to different locations, i.e. they are also “robust”. Crucially, the experimenters “went to extraordinary lengths to equate test apparatus, testing protocols, and all possible features of animal husbandry”. In other words, they genuinely tried to control as many variables as possible, to ensure repeatability.
They used the same six inbred strains, a hybrid cross of two of these strains, and a null mutant in each lab (8 groups). Furthermore, to examine whether shipped animals perform differently under test than locally bred animals of the same strains, each lab tested both locally bred and shipped specimens of each group, in both cases at 77 days old. Shipped specimens were given 5 weeks to acclimatise. (In total, 128 mice were tested in each lab, in two batches, one week apart. In practice this means that every test had 4 shipped and 4 locally bred samples from each group.)
It should already be evident that, with such small numbers, the results must be viewed with some scepticism. Indeed, the authors note in the text of the article that, “The numbers of mice we tested made formal statistical assessment of reliability infeasible, but it would be important to know whether each laboratory would obtain essentially the same strain-specific results if this experiment were repeated.” That use of the word “essentially” is disturbing. It suggests an underlying view that “if the results are close enough, they are good enough”, and implicit in this, that we can be satisfied with the robustness of the methodology we are testing. It is contradictory, to say the least.
It is important to observe that the article is published in Science: a noteworthy, career-boosting achievement for any scientist. As a prestigious, peer-reviewed journal, Science plays a significant role in shaping and upholding the research paradigm in general. It also contributes to the framework of acceptable practice within particular scientific disciplines, such as molecular genetics and behavioural science, and to the application of novel technologies across the fields associated with human health. So, while it might appear injudicious to criticise an article published in such a journal, it is also appropriate to submit such articles to scrutiny, and to understand what opinions and underlying biases in science they are shaping.
The key findings were that “significant and, in some cases, large effects of site were found for nearly all variables”. This indicates a lack of robustness. Furthermore, the variables were not as highly controlled as the experimenters had initially intended, e.g. mice in Edmonton were injected with cocaine from a source different to that of the cocaine provided to mice in Albany and Portland. The mice in Edmonton were also handled by an experimenter wearing a respirator, due to a mouse allergy.
Furthermore, behavioural differences that could supposedly be attributed to mouse strain varied from site to site, to a degree that the authors considered “substantial” for some tests. Despite this, they conclude that “genotype was highly significant for all behaviors studied, accounting for 30 to 80% of the total variability”. They also conclude that where there are large differences between strains (in terms of performance), the different environmental factors extant in each laboratory are unlikely to affect these differences “in a major way”. Curiously, they attribute smaller differences to the impact of environmental conditions.
Interestingly, as this was a hypothesis they wished to test, there appeared to be no significant difference (after 5 weeks acclimatisation) in the performance of shipped vs locally bred mice.
INTERPRETING THE STUDY AND RESULTS IN TERMS OF CAUSATION
The variation in results, between laboratories and indeed within the same strains of mice, indicates that a strictly deterministic view of causation is impossible to uphold. Indeed, Popper (1982), in the opening chapter of his book arguing for indeterminism, defines “scientific” determinism as being within a closed system. No matter how closely variables are controlled, a laboratory is still not a closed system.
In fact, having a gene that is associated with behaviour X does not necessitate the production of that behaviour. The behaviour may or may not be exhibited. Yet this type of experiment seems to equate causality with necessity. Anscombe (1971) describes the underlying assumption thus: “If an effect occurs in one case and a similar effect does not occur in an apparently similar case, there must be a relevant further difference”. She describes it as “neo-Humeian” and points out that it not only has origins dating back to Aristotle, but seems to be dominant within academia (primarily in Anglophone nations), both in philosophy schools and in other disciplines. One might argue that it is this assumption that underpins the need to control variables, and indeed, to explain results that do not align with expected predictions. In the case of this study, the authors are trying to identify the “relevant further difference”.
Instead, we might consider how the potential of a gene to impact on behavioural traits is a “dispositional property” or “causal power”, where we do not need to see its effects for it to exist (Mumford and Anjum, 2013). The difficulty for scientists, and indeed in this study, is how to provide empirical evidence about a power which is not apparent.
If we continue with the idea of “dispositional properties”, we could envision environmental variables at each laboratory site (or the tests) as “stimulus conditions” (Mumford and Anjum, 2013) that enable the gene trait to be revealed. In the example of the preference test for consumption of ethanol or water, the provision of ethanol could theoretically lead to a case of a “mutual manifestation partnership”. Mumford and Anjum (2013) explain this view of Charles Martin’s, in which each causal power (dispositional property) unites with the others “as more or less equal partners in causation”. So in this situation, we have a mouse with a particular genotype, the availability of ethanol to drink, and perhaps the development of a particular disease, or the demonstration of a type of behaviour, e.g. sleeping 16 hours a day.
Essentially, from a scientific viewpoint, the study indicates an absence of robustness in the methodology. The question of why the methodology is not robust has different answers, depending on whether one takes a scientific or a philosophical viewpoint. In the case of the latter, the answer is not about adding variables and finding better ways to control them. The tension lies in the observation that small changes in the environment (variables) impact significantly on outcomes, and the implication that genotypes that predispose to certain behaviours are just that – predispositions, or tendencies, not necessitating agents of causation. Any methodology based on a deterministic concept of the world is bound to be exposed as flawed when imposed upon a situation that is itself grounded in indeterminism.
In reflecting on causation, it has been useful to develop a framework for my own understanding of how the world works, and whether this is rational, or not. One of the motivations for studying biology and ecology has clearly been a dismissal of a deterministic concept of the world – something I sensed, but could not, until now, define as such. However, after some years of study, and having read many papers in different fields, I observe that small changes can have large effects, and many variables can interact to produce surprising outcomes, or “results”. In terms of studying ecosystems, particularly in urban contexts, the concept of “emergentism” is highly relevant.
Conversely, it seems that we insist, at some level, even in ecology, on wanting to predict and control. I think we also cultivate the myth that this is what we do. We want determinism as scientists, as politicians, as policy makers. Yet, if we espouse this view, or desire, then we should perhaps recognise that in so doing, we are also relinquishing hope of being able to make the necessary societal changes for a world that will remain habitable for humans, and other species, as well as one where economic and social justice exist to a degree that humans will want to live there.
We are ourselves objects in a causal chain. Our thoughts, words, actions can (and will) have consequences far beyond what we imagine. We speak at times of the wisdom of hindsight. What if we realised, instead, that in one sense we stand, now, in the past – the past of the future ahead of us? This is not some abstract, or even banal word play, but a concrete and vital challenge to accept responsibility for all our actions. An examination of causation should enlighten us that in life, it is not a case of “nothing matters”, but the opposite – everything matters. We may not have control of outcomes, and we may make many incorrect predictions, but with imagination and awareness, the forces our conscious actions exert upon objects and situations can change the very fabric of the unfolding universe.
The questions then are what kind of universe do we wish to imagine? And do we have the courage to perform the minute actions necessary to skew the causal chains in that direction?
ANSCOMBE, G. E. M. 1971. Causality and Determinism, Cambridge, Cambridge University Press.
CRABBE, J. C., WAHLSTEN, D. & DUDEK, B. C. 1999. Genetics of mouse behavior: Interactions with laboratory environment. Science, 284, 1670-1672.
MUMFORD, S. & ANJUM, R. L. 2013. Causation: A Very Short Introduction, Oxford, Oxford University Press.
POPPER, K. R. 1982. The Open Universe: An Argument for Indeterminism, New York, Routledge.
 This identifies one of the issues with the interpretation of empirical results; we still need to decide what might be a “reasonable margin”. We tend to approach this by reducing it to a statistical calculation – what is the probability that we have incorrectly rejected our null hypothesis? There is nothing wrong, per se, with finding a solution to this problem of uncertainty, but it is indeed problematic when we interpret the results as a kind of “truth” without remembering that they are statistics, not the world/context in which the experiment was conducted, with all its messy variables.
 In the notes, they claim that “This sample size gave us statistical power of 90% to detect modest interactions of genotype X laboratory when Type 1 error probability was set at 0.01.” In other words, the probability of incorrectly concluding that there was an interaction between genotype and laboratory when none existed was set at 0.01, and there was a 10% probability of failing to detect a modest interaction that was really there.
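As an illustrative aside, the two quantities named in this note (Type 1 error probability and statistical power) can be told apart with a small calculation. The sketch below is not the authors' analysis, which concerned a genotype by laboratory interaction. It assumes a much simpler two-sided z-test comparing two equal groups with a known standard deviation, and the function name, the group size of 16 and the effect size of 1.5 standard deviations are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def rejection_probability(effect_size, n_per_group, alpha):
    """Probability that a two-sided z-test on two equal groups rejects the null.

    effect_size is the true difference between the group means, expressed in
    units of the (assumed known) standard deviation. A value of 0.0 means the
    null hypothesis is actually true.
    """
    # Critical value for a two-sided test at significance level alpha.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true effect is effect_size.
    shift = effect_size * sqrt(n_per_group / 2)
    # Chance of landing in either rejection tail.
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)


alpha = 0.01  # Type 1 error probability, as quoted in the note

# No real effect: any rejection is a false positive (Type 1 error).
type_1_error = rejection_probability(0.0, n_per_group=16, alpha=alpha)

# A real (hypothetical) effect: the rejection probability is the power.
power = rejection_probability(1.5, n_per_group=16, alpha=alpha)

# Missing a real effect is a Type 2 error.
type_2_error = 1 - power

print(f"Type 1 error: {type_1_error:.4f}")
print(f"Power:        {power:.4f}")
print(f"Type 2 error: {type_2_error:.4f}")
```

Under these assumptions, setting the effect size to zero simply returns the chosen Type 1 error probability, whereas power is the same rejection probability evaluated when a real effect exists. The two numbers answer different questions.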
 “The doctrine of ‘scientific’ determinism is the doctrine that the state of any closed physical system at any given future instant of time can be predicted, even from within the system, with any specified degree of precision, by deducing the prediction from theories, in conjunction with initial conditions whose required degree of precision can always be calculated (in accordance with the principle of accountability) if the prediction task is given.” (Popper, 1982, p. 36)
 Scientific viewpoint within the existing paradigm! This is not to suggest that this is how science should be.