Abstract
Data science, and related data infrastructures and analytic tools, are frequently invoked as a major factor underpinning contemporary transformations in medical research, diagnosis and treatment. This paper discusses whether and how this is happening, and what the implications may be for philosophical understandings of the production, assessment and use of medical evidence. To this end, I consider the role of data science in tackling COVID-related illness and hospitalisation, focusing on four areas that have proved critical to the medical response to the pandemic: (1) the development of data technologies and infrastructures to monitor COVID patients, for instance by checking levels of oxygen saturation in the blood, and related efforts to determine the extent to which frequent patient assessments help prevent hospitalisation and death; (2) the collection, linkage and analysis of patient data by doctors and other health professionals (both inside and outside hospitals) to ensure effective and prompt insights into the emerging symptoms and long-term effects of infections caused by different COVID variants; (3) the use of data extracted from social services and other non-medical sources to support predictive models of COVID transmission, thus informing public health and treatment guidelines; and (4) the significance of data availability for the development and testing of COVID vaccines, and particularly the ways in which existing data sharing mechanisms (such as genomic databases) were redeployed and greatly expanded to inform small-scale, non-clinical studies in several locations around the world, while at the same time underpinning the set-up of large-scale clinical trials. From consideration of these areas, I argue that data science had a transformative effect on medical research on COVID-19, leading to an acceleration of knowledge production and significant changes in the evaluation of what counts as reliable evidence.
This transformation originated not solely from the deployment of novel methods and instruments for computational data mining and modelling, but also – and perhaps most fundamentally – from the diversity and scope of the data sources considered as potential evidence for medical knowledge and interventions, and the related challenges to existing standards for how evidence is produced, circulated and validated. The evidential power accrued by data produced by medical doctors and frontline hospital staff became incontrovertible, providing ammunition to already existing critiques of the hierarchy of evidence entrenched within the evidence-based medicine (EBM) movement. The need to recognise and value data coming from patients and doctors, compounded by the imperative to act swiftly to tackle the pandemic emergency, provided a strong incentive to review the structure and temporalities of randomised controlled trials (RCTs), the relation between RCT results and other data, and the ways in which data circulation and exchange are regulated and fostered. The resulting shifts in evidential standards are ongoing. What remains unchallenged – and if anything has been strengthened by reliance on data analytics – is the dependence of publicly funded medical research and services on pharmaceutical companies and other private enterprises focused on the health sector.