On Clustering Algorithms and Natural Kinds

Abstract
How and to what end can we “derive” natural kinds and categories from large data sets? This question interests philosophers of science, natural scientists, and data scientists, who each offer rich but disciplinarily siloed insights. Cluster analysis refers to a variety of algorithmic processes aimed at identifying “clusters” in data sets: subsets of data points that are relevantly more “similar” to one another than to the rest of the data set. These algorithms are concrete, explicit artefacts that encode and apply a range of theories and intuitions about the purpose and nature of classification. Their computational specifications and theoretical justifications mirror the extensive philosophical literature on classification and natural kinds and the roles they play in scientific understanding. Yet the synergies between these two literatures remain largely unexplored (excepting some insightful theoretical work by data scientists, including von Luxburg et al. 2012 and Hennig 2015). This paper aims to bridge this gap (especially on the philosophical side) by providing a comparative bird's-eye view of both disciplinary conceptions of clustering and classification, drawing out areas of particular promise for future interdisciplinary research. I begin with a brief summary of the roles of classification in science and of existing philosophical discussions on the nature, promise, and limitations of classificatory practices. I discuss the general role of classification in inductive inference and note some considerations that arise from its roles in particular disciplines. I then survey the most common types of clustering algorithms employed by data scientists (drawing largely on Xu and Wunsch 2005). I tease out their core theoretical assumptions and connect them to the conception of classification in philosophy of science.
I proceed to consider the contexts in which such algorithms are implemented, where scientists’ discretion and contextual peculiarities give a richer picture of how these algorithms are understood and used. I pay particular attention to the philosophy of biology, where the role of data analysis has been discussed by philosophers (Leonelli 2016), and where scientists already engage with philosophical work on natural kinds (Boyd 1999). I conclude with a discussion of the ways in which data scientists and philosophers of science can each benefit from the lessons the other has to offer on the nature and purpose of classification.
Abstract ID: PSA2022765
Submission Type
Topic 1
Postdoctoral Research Fellow, Australian National University
