Abstract
Scientists have started to use algorithms to manufacture a consensus from divergent scientific judgments. One area in which this has been done is the interpretation of MRI images. This paper offers a normative epistemic analysis of this new practice. It examines a case study from medical imaging, in which a consensus about the segmentation of the left ventricle on cardiac MRI images was algorithmically generated. Algorithms in this case performed a dual role. First, algorithms automatically delineated the left ventricle alongside expert human delineators. Second, algorithms amalgamated the different human-generated and algorithm-generated delineations into a single segmentation, which constituted the consensus outcome. I analyze the strengths and weaknesses of the process and of the algorithms used in this case, draw general lessons from it, and argue that the amalgamation of different human and non-human judgments contributes to the robustness of the final consensus outcome. Yet in recent years, there has been a move away from relying on multiple algorithms for analyzing the same data in favor of sole reliance on machine learning algorithms. I argue that despite the superior performance of machine learning algorithms compared to other types of algorithms, the move toward sole reliance on them in cases such as this ultimately damages the robustness and validity of the final outcome reached. This is because machine learning algorithms are prone to certain kinds of errors that other types of algorithms are not prone to (and vice versa). A central apparent motivation for this project and others like it is anxiety regarding the existence of disagreements over the segmentation of the same image by different human experts.
At the same time, the consensus-generating method in this case and others like it faces difficulties handling, in an epistemically satisfying way, cases in which the experts’ judgments significantly diverge from one another. I argue that this difficulty stems from a drive to always reach a consensus, which follows from an unjustified tacit assumption that there should be just one correct segmentation. I further argue that different legitimate delineations of the same data may be possible in some cases due to different weighings of inductive risks or different contextually appropriate theoretical background assumptions. Consensus-generating algorithms should recognize this possibility and incorporate an option to trade off values against each other for the sake of reaching a contextually appropriate outcome.