In a rational world, people would believe or not believe in predictions of human-induced climate change on the basis of how well the climate models used to make those predictions account for past observations. The details of what climate models do and don’t do well generally don’t get much public discussion though. In the interest of getting that information into wider circulation, the Executive Summary of Chapter 9 of the IPCC Working Group I Assessment Report 5 (WG1AR5), Evaluation of Climate Models, follows below. (The full report is here.) It’s a self-assessment by climate modelers of what they do and don’t do well. Note that statements are phrased to address advances in capability since the IPCC’s Fourth Assessment Report (AR4) was published in 2007. Take 10 minutes. It’s worth the read. If you’re in a hurry or just aren’t interested in the details (no harm in that) then just read the text in bold at the start of each paragraph. If you’re unfamiliar with climate models then RealClimate.org’s “FAQ on climate models” might be worthwhile background reading. It covers terminology and general characteristics of climate models. Without further ado, the Executive Summary of Chapter 9 in its entirety:
Climate models have continued to be developed and improved since the AR4, and many models have been extended into Earth System models by including the representation of biogeochemical cycles important to climate change. These models allow for policy-relevant calculations such as the carbon dioxide (CO2) emissions compatible with a specified climate stabilization target. In addition, the range of climate variables and processes that have been evaluated has greatly expanded, and differences between models and observations are increasingly quantified using ‘performance metrics’. In this chapter, model evaluation covers simulation of the mean climate, of historical climate change, of variability on multiple time scales and of regional modes of variability. This evaluation is based on recent internationally coordinated model experiments, including simulations of historic and paleo climate, specialized experiments designed to provide insight into key climate processes and feedbacks and regional climate downscaling. Figure 9.44 provides an overview of model capabilities as assessed in this chapter, including improvements, or lack thereof, relative to models assessed in the AR4. The chapter concludes with an assessment of recent work connecting model performance to the detection and attribution of climate change as well as to future projections. {9.1.2, 9.8.1, Table 9.1, Figure 9.44}
The ability of climate models to simulate surface temperature has improved in many, though not all, important aspects relative to the generation of models assessed in the AR4. There continues to be very high confidence that models reproduce observed large-scale mean surface temperature patterns (pattern correlation of ~0.99), though systematic errors of several degrees are found in some regions, particularly over high topography, near the ice edge in the North Atlantic, and over regions of ocean upwelling near the equator. On regional scales (sub-continental and smaller), the confidence in model capability to simulate surface temperature is less than for the larger scales; however, regional biases are near zero on average, with intermodel spread of roughly ±3°C. There is high confidence that regional-scale surface temperature is better simulated than at the time of the AR4. Current models are also able to reproduce the large-scale patterns of temperature during the Last Glacial Maximum (LGM), indicating an ability to simulate a climate state much different from the present. {9.4.1, 9.6.1, Figures 9.2, 9.6, 9.39, 9.40}
There is very high confidence that models reproduce the general features of the global-scale annual mean surface temperature increase over the historical period, including the more rapid warming in the second half of the 20th century, and the cooling immediately following large volcanic eruptions. Most simulations of the historical period do not reproduce the observed reduction in global mean surface warming trend over the last 10 to 15 years. There is medium confidence that the trend difference between models and observations during 1998–2012 is to a substantial degree caused by internal variability, with possible contributions from forcing error and some models overestimating the response to increasing greenhouse gas (GHG) forcing. Most, though not all, models overestimate the observed warming trend in the tropical troposphere over the last 30 years, and tend to underestimate the long-term lower stratospheric cooling trend. {9.4.1, Box 9.2, Figure 9.8}
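A comparison of this kind, between an observed trend and the spread of simulated trends over the same years, can be sketched in a few lines. The sketch below assumes an ordinary least-squares trend and uses made-up numbers; the function names and every value in it are illustrative and are not taken from the chapter.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    year_mean = sum(years) / n
    value_mean = sum(values) / n
    num = sum((y - year_mean) * (v - value_mean) for y, v in zip(years, values))
    den = sum((y - year_mean) ** 2 for y in years)
    return num / den


def fraction_above(simulated_trends, observed_trend):
    """Share of simulated trends that exceed the observed trend."""
    above = sum(1 for t in simulated_trends if t > observed_trend)
    return above / len(simulated_trends)


# Made-up anomalies (deg C) for 1998-2012; NOT real observations or model output.
years = list(range(1998, 2013))
observed = [0.40 + 0.004 * (y - 1998) for y in years]
simulations = [
    [0.40 + slope * (y - 1998) for y in years]
    for slope in (0.002, 0.015, 0.020, 0.025)  # hypothetical ensemble members
]

observed_trend = linear_trend(years, observed)
simulated_trends = [linear_trend(years, run) for run in simulations]
print("observed trend (deg C/decade):", round(10 * observed_trend, 3))
print("fraction of runs warming faster:", fraction_above(simulated_trends, observed_trend))
```

With real data, each simulated series would be one model run and the spread of trends across runs would give a rough sense of how unusual the observed trend is.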
The simulation of large-scale patterns of precipitation has improved somewhat since the AR4, although models continue to perform less well for precipitation than for surface temperature. The spatial pattern correlation between modelled and observed annual mean precipitation has increased from 0.77 for models available at the time of the AR4 to 0.82 for current models. At regional scales, precipitation is not simulated as well, and the assessment remains difficult owing to observational uncertainties. {9.4.1, 9.6.1, Figure 9.6}
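For readers unfamiliar with the term, a spatial pattern correlation measures how closely the map of a modelled quantity matches the observed map. A minimal sketch follows, assuming a centred Pearson correlation with cos(latitude) area weights; the chapter's exact procedure may differ, and the toy grids and the function name are hypothetical.

```python
import math


def pattern_correlation(model, obs, lats_deg):
    """Centred, area-weighted correlation between two latitude-longitude grids.

    model, obs: lists of rows (one row per latitude), both the same shape.
    lats_deg:   latitude of each row in degrees; cos(lat) approximates cell area.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    weight_sum = sum(w * len(row) for w, row in zip(weights, model))

    def weighted_mean(field):
        return sum(w * v for w, row in zip(weights, field) for v in row) / weight_sum

    model_mean, obs_mean = weighted_mean(model), weighted_mean(obs)
    cov = var_model = var_obs = 0.0
    for w, model_row, obs_row in zip(weights, model, obs):
        for m, o in zip(model_row, obs_row):
            dm, do = m - model_mean, o - obs_mean
            cov += w * dm * do
            var_model += w * dm * dm
            var_obs += w * do * do
    return cov / math.sqrt(var_model * var_obs)


# Toy 3x3 grids with made-up values (not real precipitation data).
lats = [-60.0, 0.0, 60.0]
obs_grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, 1.0, 0.0]]
model_grid = [[1.2, 1.8, 3.5], [3.6, 5.5, 5.9], [2.4, 0.7, 0.3]]
print("pattern correlation:", pattern_correlation(model_grid, obs_grid, lats))
```

A value of 1 means the two maps have identical spatial structure up to a constant offset and scale factor, so this statistic says nothing about a uniform bias.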
The simulation of clouds in climate models remains challenging. There is very high confidence that uncertainties in cloud processes explain much of the spread in modelled climate sensitivity. However, the simulation of clouds in climate models has shown modest improvement relative to models available at the time of the AR4, and this has been aided by new evaluation techniques and new observations for clouds. Nevertheless, biases in cloud simulation lead to regional errors on cloud radiative effect of several tens of watts per square meter. {9.2.1, 9.4.1, 9.7.2, Figures 9.5, 9.43}
Models are able to capture the general characteristics of storm tracks and extratropical cyclones, and there is some evidence of improvement since the AR4. Storm track biases in the North Atlantic have improved slightly, but models still produce a storm track that is too zonal and underestimate cyclone intensity. {9.4.1}
Many models are able to reproduce the observed changes in upper ocean heat content from 1961 to 2005 with the multi-model mean time series falling within the range of the available observational estimates for most of the period. The ability of models to simulate ocean heat uptake, including variations imposed by large volcanic eruptions, adds confidence to their use in assessing the global energy budget and simulating the thermal component of sea level rise. {9.4.2, Figure 9.17}
The simulation of the tropical Pacific Ocean mean state has improved since the AR4, with a 30% reduction in the spurious westward extension of the cold tongue near the equator, a pervasive bias of coupled models. The simulation of the tropical Atlantic remains deficient with many models unable to reproduce the basic east–west temperature gradient. {9.4.2, Figure 9.14}
Current climate models reproduce the seasonal cycle of Arctic sea ice extent with a multi-model mean error of less than about 10% for any given month. There is robust evidence that the downward trend in Arctic summer sea ice extent is better simulated than at the time of the AR4, with about one quarter of the simulations showing a trend as strong as, or stronger than, in observations over the satellite era (since 1979). There is a tendency for models to slightly overestimate sea ice extent in the Arctic (by about 10%) in winter and spring. In the Antarctic, the multi-model mean seasonal cycle agrees well with observations, but inter-model spread is roughly double that for the Arctic. Most models simulate a small decreasing trend in Antarctic sea ice extent, albeit with large inter-model spread, in contrast to the small increasing trend in observations. {9.4.3, Figures 9.22, 9.24}
Models are able to reproduce many features of the observed global and Northern Hemisphere (NH) mean temperature variance on interannual to centennial time scales (high confidence), and most models are now able to reproduce the observed peak in variability associated with the El Niño (2- to 7-year period) in the Tropical Pacific. The ability to assess variability from millennial simulations is new since the AR4 and allows quantitative evaluation of model estimates of low-frequency climate variability. This is important when using climate models to separate signal and noise in detection and attribution studies (Chapter 10). {9.5.3, Figures 9.33, 9.35}
Many important modes of climate variability and intraseasonal to seasonal phenomena are reproduced by models, with some improvements evident since the AR4. The statistics of the global monsoon, the North Atlantic Oscillation, the El Niño-Southern Oscillation (ENSO), the Indian Ocean Dipole and the Quasi-Biennial Oscillation are simulated well by several models, although this assessment is tempered by the limited scope of analysis published so far, or by limited observations. There are also modes of variability that are not simulated well. These include modes of Atlantic Ocean variability of relevance to near term projections in Chapter 11 and ENSO teleconnections outside the tropical Pacific, of relevance to Chapter 14. There is high confidence that the multi-model statistics of monsoon and ENSO have improved since the AR4. However, this improvement does not occur in all models, and process-based analysis shows that biases remain in the background state and in the strength of associated feedbacks. {9.5.3, Figures 9.32, 9.35, 9.36}
There has been substantial progress since the AR4 in the assessment of model simulations of extreme events. Based on assessment of a suite of indices, the inter-model range of simulated climate extremes is similar to the spread amongst observationally based estimates in most regions. In addition, changes in the frequency of extreme warm and cold days and nights over the second half of the 20th century are consistent between models and observations, with the ensemble global mean time series generally falling within the range of observational estimates. The majority of models underestimate the sensitivity of extreme precipitation to temperature variability or trends, especially in the tropics, which implies that models may underestimate the projected increase in extreme precipitation in the future. Some high-resolution atmospheric models have been shown to reproduce observed year-to-year variability of Atlantic hurricane counts when forced with observed sea surface temperatures, though so far only a few studies of this kind are available. {9.5.4, Figure 9.37}
An important development since the AR4 is the more widespread use of Earth System models, which include an interactive carbon cycle. In the majority of these models, the simulated global land and ocean carbon sinks over the latter part of the 20th century fall within the range of observational estimates. However, the regional patterns of carbon uptake and release are less well reproduced, especially for NH land where models systematically underestimate the sink implied by atmospheric inversion techniques. The ability of models to simulate carbon fluxes is important because these models are used to estimate ‘compatible emissions’ (carbon dioxide emission pathways compatible with a particular climate change target; see Chapter 6). {9.4.5, Figure 9.27}
The majority of Earth System models now include an interactive representation of aerosols, and make use of a consistent specification of anthropogenic sulphur dioxide emissions. However, uncertainties in sulphur cycle processes and natural sources and sinks remain and so, for example, the simulated aerosol optical depth over oceans ranges from 0.08 to 0.22 with roughly equal numbers of models over- and under-estimating the satellite-estimated value of 0.12. {9.1.2, 9.4.6, Table 9.1, Figure 9.29}
Time-varying ozone is now included in the latest suite of models, either prescribed or calculated interactively. Although in some models there is only medium agreement with observed changes in total column ozone, the inclusion of time-varying stratospheric ozone constitutes a substantial improvement since the AR4 where half of the models prescribed a constant climatology. As a result, there is robust evidence that the representation of climate forcing by stratospheric ozone has improved since the AR4. {9.4.1, Figure 9.10}
Regional downscaling methods are used to provide climate information at the smaller scales needed for many climate impact studies, and there is high confidence that downscaling adds value both in regions with highly variable topography and for various small-scale phenomena. Regional models necessarily inherit biases from the global models used to provide boundary conditions. Furthermore, the ability to systematically evaluate regional climate models, and statistical downscaling schemes, is hampered because coordinated intercomparison studies are still emerging. However, several studies have demonstrated that added value arises from higher resolution of stationary features like topography and coastlines, and from improved representation of small-scale processes like convective precipitation. {9.6.4}
Earth system Models of Intermediate Complexity (EMICs) provide simulations of millennial time-scale climate change, and are used as tools to interpret and expand upon the results of more comprehensive models. Although they are limited in the scope and resolution of information provided, EMIC simulations of global mean surface temperature, ocean heat content and carbon cycle response over the 20th century are consistent with the historical records and with more comprehensive models, suggesting that they can be used to provide calibrated projections of long-term transient climate response and stabilization, as well as large ensembles and alternative, policy-relevant, scenarios. {9.4.1, 9.4.2, 9.4.5, Figures 9.8, 9.17, 9.27}
The Coupled Model Intercomparison Project Phase 5 (CMIP5) model spread in equilibrium climate sensitivity ranges from 2.1°C to 4.7°C and is very similar to the assessment in the AR4. No correlation is found between biases in global mean surface temperature and equilibrium climate sensitivity, and so mean temperature biases do not obviously affect the modelled response to GHG forcing. There is very high confidence that the primary factor contributing to the spread in equilibrium climate sensitivity continues to be the cloud feedback. This applies to both the modern climate and the LGM. There is likewise very high confidence that, consistent with observations, models show a strong positive correlation between tropospheric temperature and water vapour on regional to global scales, implying a positive water vapour feedback in both models and observations. {9.4.1, 9.7.2, Figures 9.9, 9.42, 9.43}
Climate and Earth System models are based on physical principles, and they reproduce many important aspects of observed climate. Both aspects contribute to our confidence in the models’ suitability for their application in detection and attribution studies (Chapter 10) and for quantitative future predictions and projections (Chapters 11 to 14). In general, there is no direct means of translating quantitative measures of past performance into confident statements about fidelity of future climate projections. However, there is increasing evidence that some aspects of observed variability or trends are well correlated with inter-model differences in model projections for quantities such as Arctic summertime sea ice trends, snow albedo feedback, and the carbon loss from tropical land. These relationships provide a way, in principle, to transform an observable quantity into a constraint on future projections, but the application of such constraints remains an area of emerging research.
There has been substantial progress since the AR4 in the methodology to assess the reliability of a multi-model ensemble, and various approaches to improve the precision of multi-model projections are being explored. However, there is still no universal strategy for weighting the projections from different models based on their historical performance. {9.8.3, Figure 9.45}
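The idea of turning an observable quantity into a constraint on projections can be illustrated with a toy calculation. The sketch below assumes the simplest possible form: a straight-line fit, across models, of each model's projected change against its simulated value of some observable, which is then evaluated at the observed value. All numbers and names are hypothetical; this is an illustration of the principle and not the method used in the chapter.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return slope, y_mean - slope * x_mean


def constrained_projection(model_observable, model_projection, observed_value):
    """Read the projection off the inter-model regression line at the observed value."""
    slope, intercept = fit_line(model_observable, model_projection)
    return intercept + slope * observed_value


# Hypothetical ensemble: one (observable, projection) pair per model.
observable_by_model = [0.6, 0.8, 1.0, 1.2, 1.4]   # e.g. a simulated past trend
projection_by_model = [1.9, 2.6, 3.1, 3.4, 4.2]   # e.g. a projected future change
observed = 0.9                                     # made-up observed value

unweighted = sum(projection_by_model) / len(projection_by_model)
constrained = constrained_projection(observable_by_model, projection_by_model, observed)
print("plain multi-model mean:", unweighted)
print("constrained estimate:  ", constrained)
```

A usable constraint would also need to carry the observational uncertainty and the scatter of models about the fitted line through to the result, which this sketch leaves out.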