Abstract
With the advent of ‘Big Data’ came an abundance of computational and statistical techniques to analyze it, somewhat vaguely grouped under the label of ‘Data Science’. This invites philosophical reflection and systematization. In this paper we will focus on exploratory data analysis (EDA), a widespread practice in which researchers briefly explore a dataset in order to summarize it and formulate hypotheses about it. Viewed through Reichenbach's (1938) distinction between the context of discovery and the context of justification, EDA seems to sit in between: exploring the data in order to discover new hypotheses, and exploiting it to justify confirmatory work. We will present different conceptualizations of EDA, shed light on its importance, and suggest success conditions under which it functions well.

The distinction between the context of discovery and the context of justification is well known and heavily discussed in the literature, although different authors interpret it differently. We take the distinction to concern two aspects of scientific practice: the process of arriving at hypotheses, and the defense or validation of those hypotheses, that is, the assessment of their evidential support (confirmatory work).

One playful way of conceptualizing the difference, and the role that EDA plays in between these two contexts, is to model it as a trade-off between exploration and exploitation (sketched in code at the end of this section). Exploration allows for the discovery of new hypotheses; exploitation allows for assessing the evidential support of hypotheses, obtaining a reward in justification. This yields a test for when EDA has been done successfully: the trade-off must be well balanced.

Yet it might be objected that EDA has no place in confirmatory work, as Wagenmakers et al. (2012) emphasize. In a nutshell, EDA would amount to using the same data both to formulate and to test hypotheses. We sympathize with this take, but it assumes a deflationary notion of justification.

In the literature on epistemic justification, there are two broad camps. On the one hand, foundationalist theories hold that the chain of justification terminates in propositions that are self-evident (Descartes), axiomatic (Aristotle, Euclid), known by acquaintance (Russell), or given as (sense) data (empiricism). In short, there is a set of propositions whose justification does not depend on other propositions. On the other hand, coherentist theories defend the holistic idea that propositions can be justified by how coherent they are with one another (see Bovens and Hartmann (2003) for a Bayesian formulation of this notion). If justification is understood in a foundationalist vein, then Wagenmakers et al. are correct in arguing that EDA is flawed methodology. But if justification is understood in a coherentist way, or a foundherentist one (a mixture of the two developed by Haack (1995)), then there is a role that EDA can play in the context of justification.
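To make the exploration-exploitation framing concrete, here is a minimal computational sketch of the trade-off, modelled as an epsilon-greedy multi-armed bandit (a standard formalization of exploration versus exploitation, not the paper's own apparatus). All names and parameters below, such as epsilon and the Gaussian reward model, are illustrative assumptions.

    import random

    def epsilon_greedy(true_means, epsilon=0.1, n_rounds=1000, seed=0):
        """With probability epsilon explore a random arm (discovery);
        otherwise exploit the arm with the best estimated mean (justification)."""
        rng = random.Random(seed)
        n_arms = len(true_means)
        counts = [0] * n_arms        # how often each arm/hypothesis was tried
        estimates = [0.0] * n_arms   # running estimate of each arm's reward
        total_reward = 0.0
        for _ in range(n_rounds):
            if rng.random() < epsilon:                  # explore: try something new
                arm = rng.randrange(n_arms)
            else:                                       # exploit: best-supported arm
                arm = max(range(n_arms), key=lambda a: estimates[a])
            reward = rng.gauss(true_means[arm], 1.0)    # noisy evidence
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            total_reward += reward
        return estimates, total_reward

    estimates, reward = epsilon_greedy([0.2, 0.5, 0.8])

On this analogy, setting epsilon too low corresponds to exploiting the data prematurely (settling on hypotheses too early), while setting it too high means that evidential reward never accumulates for any hypothesis; successful EDA, on our proposal, balances the two.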