Abstract
Equilibrium climate sensitivity is a measure of how strongly Earth’s near-surface temperature responds to increasing greenhouse gas concentrations. When numerous state-of-the-art climate models participating in the sixth phase of the Coupled Model Intercomparison Project (CMIP6) recently indicated values for climate sensitivity outside of a range that had been stable for decades, climate scientists faced a dilemma. On the one hand, these high-sensitivity models had excellent pedigrees, incorporated sophisticated representations of physical processes, and had been shown to perform more than acceptably well across a range of performance metrics; their developers considered them at least as good as, or even a significant improvement upon, previous generations of models. The common practice of “model democracy” would suggest giving their results equal weight alongside those of other state-of-the-art models. On the other hand, doing so would generate estimates of climate sensitivity and future warming substantially different from, and more alarming than, estimates developed over decades of previous investigation. Faced with this situation, climate scientists sought to evaluate the quality of the CMIP6 models further. I will show how their efforts, and their subsequent decisions to downweight or exclude some models when estimating future warming but not when estimating some other variables, illustrate an adequacy-for-purpose approach to model evaluation. I will also critically examine some of the particular evaluation strategies and tests employed, with the aim of extracting some general insights regarding the evaluation of model inadequacy.