Abstract
Much recent philosophical attention has been given to the concept of validity in psychometrics (Alexandrova 2017; Angner 2013; McClimans 2010). By contrast, the question of whether and when a psychometric instrument is fit for its intended purpose has been largely neglected. Here we argue that fitness-for-purpose is a distinct feature of a psychometric measure that does not automatically follow from its validity, and is established by distinct sources of evidence. We focus on applications of psychometrics in healthcare, and specifically on the use of patient-reported outcome measures (PROMs) in mental healthcare. PROMs such as the Patient Health Questionnaire (PHQ-9) and the Kessler Psychological Distress Scale (K-10) are routinely used by mental health service providers for various purposes, including screening patients, assisting with diagnosis, recommending treatment plans, tracking patient progress, and assessing overall quality of care. Health outcomes researchers acknowledge that a PROM designed and validated for one purpose and population, such as screening in adults, may not be fit to serve another, such as tracking patient progress in youth. This context-sensitivity is partially due to differences in patient characteristics, and to the fact that different clinical decisions can require different kinds of evidence. Health outcomes researchers typically deal with this context-sensitivity by ‘re-validating’ PROMs against ‘gold standards’ of evidence, e.g., by adjusting the severity thresholds of a screening tool against the outcomes of clinical interviews in new settings. This paper argues that ‘re-validation’ techniques are inadequate for establishing fitness-for-purpose across contexts, because they are based on an overly narrow concept of fitness-for-purpose.
Fitness-for-purpose in psychometrics is not only an epistemic criterion, but also an ethical criterion, namely, the condition of fit between the meanings and uses of a measure and the values and aims of stakeholders. Consequently, evaluating fitness-for-purpose requires a thorough examination of the ethics of measurement. We substantiate our claims with the results of a recent project in which we collaborated with psychometricians, clinicians, and young people. As part of this collaboration, philosophers of science helped develop a training program in measurement for clinicians working at Foundry, a network of integrated mental health clinics for people aged 12–24 in British Columbia. Our research revealed a gap between psychometric evaluation techniques, which focus on statistical properties, and the need of clinicians and patients to identify measures that promote ethical and social values, such as inclusiveness, empowerment, and collaboration. Our analysis highlights the need for a normative theory of measurement as a foundation for measure evaluation in psychometrics. Although some validation theorists have paid close attention to the ethics of measurement, they have overemphasized the importance of avoiding negative social consequences. Building on McClimans (2010), we show that fitness-for-purpose is a stronger requirement than Messick’s ‘consequential validity’, and involves using measurement as a tool for genuine dialogue between clinician and patient.