Data, Capta and Constructa: Exploring, confirming, or manufacturing data

This abstract has open access
Abstract
Machine Learning (ML) is arguably one of the most significant general-purpose technologies of our age. The appealing promise of machine learning is that it can take a large corpus of “raw” data packaged into a “dataset”, learn and discover patterns in it, and derive putatively objective and reliable conclusions about the world from this data-based learning. This technology, and its implicit mindset, is increasingly the subject of attention in the sciences [Hey 2020], which suggests there is value in examining this mindset through the philosophy of science. Although there is now work arguing that data is not given (as the etymology of “data” suggests) but should be viewed as capta (i.e. taken by deliberate choice) [Kitchin 2014], we go further and argue that we should view it as constructa: something constructed as part of the entire rhetorical chain of building reliable knowledge. We start by arguing that the ML-centric conception of the raw dataset as a given is a large part of the problem in using ML in scientific endeavors. We build on the philosophical literature on values in science [Douglas 2000, 2016; Longino 1996] to show that two essential and ever-present aspects, context and values, are intrinsic to the manufacturing of datasets. Ironically, obscuring and ignoring these aspects thwarts the very ostensible goal of using data-driven inferences for rational, reliable, and sound knowledge and action in the world. We argue that rather than conceiving the goal of data-science reasoning as merely providing warrants for the calculations made upon the given data (construed as an accurate representation of the world), we are better served if we consider the entire process, including the acquisition (construction) of the data itself, and seek to legitimate that entire process.
Our analysis allows us to identify and attack three pervasive, but flawed, assumptions that underpin the default conception of data in ML: (1) data is a thing, not a process; (2) data is raw and aimlessly given; (3) data is reliable. These assumptions not only result in epistemic harms but, more consequentially, can lead to social and moral harms as well. We argue that value-ladenness and theory-ladenness coincide for data, and are readily understood by conceiving of data-based claims as rhetorical claims, where “facts” and “values” are equated in the sense that both are taken as incontrovertible assumptions. The justification for data-based inferences then amounts to a rhetorical warrant for the whole process. Hence, instead of presuming that your data was given (or taken) and that your job (as a scientist) is to explore it or confirm hypotheses tied to it, it is better to construe data as constructa: something manufactured, just like scientific knowledge, the solidity and reliability of which is within the control of the scientist rather than being an intrinsic property of the world.
Abstract ID: PSA2022768
Submission Type
Topic 1
University of Edinburgh
University of Tübingen
