Sec. 10 Lab exercise 4: Choosing our models

Last update: Wed Feb 15 10:22:51 2023 -0500 (76511fd)

10.1 Introduction and setup

Now that we have reviewed some methods for selecting models, let’s see them in action. We’ll start by importing the relevant libraries, as usual.

Python

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

Next, we need to find our downloaded anomaly data. We will be using the same .csv file that we used at the end of Lab 3. If you have renamed your file, or saved it somewhere strange, adjust this line as necessary.

Python

file = "data/ssp_change_factors.csv"

Now we read it using pandas.

Python

dat = pd.read_csv(file)
dat.head()
##   Variable          Model Scenario  1981-2010  2011-2040  2041-2070  2071-2100
## 0      tas     ACCESS-CM2   ssp126   5.975702   1.795567   3.468750   4.010926
## 1      tas     ACCESS-CM2   ssp245   5.975702   1.603762   3.545075   4.664551
## 2      tas     ACCESS-CM2   ssp370   5.975702   1.916411   3.938202   5.793183
## 3      tas     ACCESS-CM2   ssp585   5.975702   1.916542   4.229950   7.124695
## 4      tas  ACCESS-ESM1-5   ssp126   9.203638   1.450132   2.687561   2.943726

Let’s see what we’re working with.

Python

dat.shape
## (118, 7)

It looks like we have 118 rows and 7 columns of data. Let’s drop the 'Variable' column, which we won’t use in this analysis. Note that we are only analyzing the r1i1p1f1 ensemble member. If you want to analyze additional ensemble members, you should include a column identifying the ensemble member.

Python

dat = dat.drop(['Variable'], axis = 1)
dat.head()
##            Model Scenario  1981-2010  2011-2040  2041-2070  2071-2100
## 0     ACCESS-CM2   ssp126   5.975702   1.795567   3.468750   4.010926
## 1     ACCESS-CM2   ssp245   5.975702   1.603762   3.545075   4.664551
## 2     ACCESS-CM2   ssp370   5.975702   1.916411   3.938202   5.793183
## 3     ACCESS-CM2   ssp585   5.975702   1.916542   4.229950   7.124695
## 4  ACCESS-ESM1-5   ssp126   9.203638   1.450132   2.687561   2.943726

10.2 Selecting our models

10.2.1 Using the extremes

Choosing our models based on the highest and lowest projected changes allows us to get an idea of the “range” of possibilities for the future. Since we are using change factors, we are only considering the changes that each model projects over its own baseline period. So these changes are relative, i.e. the model with the largest projected change is not necessarily the model with the highest projected temperature.

For this and all of our selection methods, we will make a copy of dat to work with using the copy() method. This is the safer option, as something like dat_ext = dat creates a new reference to the same dat table. This isn’t an issue if we are just sub-setting and filtering data, but it becomes troublesome if we want to change any of the data points; any modification to the values in dat_ext would also appear in dat! This will become more relevant later in this lab when we create the tor_win data frame.

Python

dat_ext = dat.copy()
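
To see the difference in action, here is a minimal sketch using a throwaway data frame (the names df, alias, and copied are just for illustration):

Python

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
alias = df           # a second reference to the same table
copied = df.copy()   # an independent copy

alias.loc[0, "a"] = 99   # this modifies df as well!
copied.loc[1, "a"] = -1  # this leaves df untouched

df.a.tolist()
## [99, 2, 3]
```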

We will assess highest and lowest projected changes based on the average over the three tridecadal time periods. We will add a new column, called 'MeanAnom', which will contain the average tridecadal anomaly, or change factor, produced by each model and then sort this new column from smallest change to largest.

Python

dat_ext['MeanAnom'] = dat_ext.iloc[:,-3:].mean(axis = 1)
dat_ext = dat_ext.sort_values('MeanAnom')
dat_ext.head() # Smallest
##           Model Scenario  1981-2010  2011-2040  2041-2070  2071-2100  MeanAnom
## 11  CAMS-CSM1-0   ssp126   7.211725   0.560417   0.895661   1.103638  0.853239
## 72     IITM-ESM   ssp126   4.398706   0.752211   1.071686   1.180725  1.001541
## 94    KIOST-ESM   ssp126   2.966669   0.802513   1.428680   1.379089  1.203427
## 12  CAMS-CSM1-0   ssp245   7.211725   0.472422   1.578430   1.860260  1.303704
## 59    FGOALS-g3   ssp126   6.822473   1.410681   1.378022   1.440735  1.409813
dat_ext.tail() # Largest
##           Model Scenario  1981-2010  2011-2040  2041-2070  2071-2100  MeanAnom
## 93   KACE-1-0-G   ssp585   8.671564   2.172386   4.747009   7.160980  4.693458
## 117     TaiESM1   ssp585   6.847009   2.149199   4.891968   7.551697  4.864288
## 43    EC-Earth3   ssp585   6.003870   2.653685   4.608856   7.428070  4.896871
## 37      CanESM5   ssp585   8.782312   2.131543   4.874146   8.249115  5.084935
## 39     E3SM-1-1   ssp585   7.465784   2.758083   5.961243   9.758056  6.159128

The next code block is a little complicated, so let’s walk through it: first, we group dat_ext by 'Scenario' and get the smallest value for each using first(). Make sure you sorted the values in the previous code block or you won’t get the expected output! We label these first values as “Lower”, to represent the lower limit of the model projections, and restore the index, which was changed by groupby. We then do the same for the last() value, labelling these as “Upper”. Finally, we append these two tables together, and drop the 'MeanAnom' column.

Python

lower = dat_ext.groupby('Scenario').first().assign(Limit = "Lower").reset_index()
upper = dat_ext.groupby("Scenario").last().assign(Limit = "Upper").reset_index()
dat_ext = pd.concat([lower,upper]).drop("MeanAnom", axis = 1)
dat_ext
##   Scenario        Model  1981-2010  2011-2040  2041-2070  2071-2100  Limit
## 0   ssp126  CAMS-CSM1-0   7.211725   0.560417   0.895661   1.103638  Lower
## 1   ssp245  CAMS-CSM1-0   7.211725   0.472422   1.578430   1.860260  Lower
## 2   ssp370  CAMS-CSM1-0   7.211725   0.709878   1.681183   2.707093  Lower
## 3   ssp585  CAMS-CSM1-0   7.211725   0.588055   1.840119   3.313080  Lower
## 0   ssp126   KACE-1-0-G   8.671564   2.406111   3.670105   3.714386  Upper
## 1   ssp245      CanESM5   8.782312   2.065475   3.647126   4.935028  Upper
## 2   ssp370      CanESM5   8.782312   2.133130   4.579010   6.976593  Upper
## 3   ssp585     E3SM-1-1   7.465784   2.758083   5.961243   9.758056  Upper

Now we have a table of the two extreme models for each scenario that identifies each model as providing either an upper or lower extreme. Great! This table suits our needs for this analysis. We will revisit this table later.
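
As an aside, the same result can be obtained without pre-sorting by using the idxmin() and idxmax() methods, which return the row index of the smallest and largest value in each group. Here is a sketch on a toy table (the values are made up):

Python

```python
import pandas as pd

toy = pd.DataFrame({"Scenario": ["ssp126", "ssp126", "ssp585", "ssp585"],
                    "Model": ["A", "B", "A", "B"],
                    "MeanAnom": [1.0, 2.0, 3.0, 4.0]})
lower = toy.loc[toy.groupby("Scenario").MeanAnom.idxmin()]  # smallest per scenario
upper = toy.loc[toy.groupby("Scenario").MeanAnom.idxmax()]  # largest per scenario
```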

10.2.2 Validating model baselines

Another model selection method is to validate a model against observed baseline values. Before we can validate, we need data to validate against. Here we are using the very same .csv file that we generated in Lab 1. If you no longer have that file, give the code in Lab 1 a quick run and you should be able to re-create it in no time. Note that when we saved the file using tor.to_csv(), an index column was included in the file. Use the index_col argument to let pandas know about this, or you will end up with an extra unnamed column of numbers from 0 to the number of rows in the file.

Python

tor = pd.read_csv("tor.csv", index_col = 0)
tor.head()
##          Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 0  1981-01-01     -5.7    -10.2      -8.0          5.0
## 1  1981-01-02     -9.2    -15.8     -12.5          0.0
## 2  1981-01-03    -16.2    -20.7     -18.5          0.0
## 3  1981-01-04    -12.3    -26.9     -19.6          0.0
## 4  1981-01-05     -4.7    -14.0      -9.4          0.0

There are many different approaches to model validation. Here, we will use the Gough–Fenech Confidence Index (GFCI) to help us choose our models. The GFCI quantifies the difference in the model-projected baseline as a proportion of the standard deviation of the aggregate observed values using the following equation:

\[\textrm{GFCI} = \frac{ \left | T_{\textrm{mod}} - T_{\textrm{obs}} \right |}{\sigma_{\textrm{obs}}}\]

\(\left | T_{\textrm{mod}} - T_{\textrm{obs}} \right |\) is the absolute difference between the modelled baseline and the observed baseline, and \(\sigma_{\textrm{obs}}\) is the standard deviation of the aggregate observed values. In this case, we need to use the mean and standard deviation of the annual aggregate values of our observed Toronto station data. If we want to test for \(T_{\textrm{mean}}\), for instance, we would first calculate the annual average \(T_{\textrm{mean}}\) for each of the 30-years in our baseline, and then take the mean and standard deviation of those 30 annual values.

If you are performing a monthly or seasonal analysis, you should use the respective aggregate values. As an example, if you are performing your analysis on winter values, \(T_{\textrm{obs}}\) is the average yearly winter temperature and \(\sigma_{\textrm{obs}}\) is the standard deviation of those average yearly winter temperatures.
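
As a quick sanity check, here is the GFCI worked through by hand with round numbers (the figures below are illustrative, not actual results):

Python

```python
T_mod = 9.20      # model-projected baseline mean (illustrative)
T_obs = 9.53      # observed baseline mean (illustrative)
sigma_obs = 0.76  # std. dev. of the annual observed means (illustrative)

GFCI = abs(T_mod - T_obs) / sigma_obs  # roughly 0.43
```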

pandas doesn’t parse dates automatically when reading a .csv file, so we need to convert the 'Date' column to a proper datetime type.

Python

tor.Date = pd.to_datetime(tor.Date)
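
Alternatively, pandas can parse the dates at read time via the parse_dates argument to read_csv(). A small sketch, with an inline stand-in for the file:

Python

```python
import io
import pandas as pd

# Two rows standing in for tor.csv (values copied from the head() above)
buf = io.StringIO("Date,MeanTemp\n1981-01-01,-8.0\n1981-01-02,-12.5\n")
demo = pd.read_csv(buf, parse_dates=["Date"])
demo.Date.dt.year.tolist()
## [1981, 1981]
```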

Now we can generate our annual average \(T_{\textrm{mean}}\).

Python

tor_mean = tor[['Date', 'MeanTemp']].groupby(tor.Date.dt.year).mean(numeric_only=True)
tor_mean.head()
##       MeanTemp
## Date          
## 1981  8.824658
## 1982  8.562192
## 1983  9.380000
## 1984  9.067486
## 1985  8.866575

Now we calculate the mean and the standard deviation of those values.

Python

tor_tas_mean = tor_mean.MeanTemp.values.mean()
tor_tas_std = tor_mean.MeanTemp.values.std()

print("Mean:", tor_tas_mean, "... Standard Deviation:", tor_tas_std)
## Mean: 9.526999642630084 ... Standard Deviation: 0.7626351751355993
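
One subtlety worth knowing: calling .std() on the underlying NumPy array (as above) uses the population formula (ddof=0), whereas pandas’ Series.std() defaults to the sample formula (ddof=1). The sketch below, on made-up numbers, shows the difference:

Python

```python
import pandas as pd

vals = pd.Series([8.8, 8.6, 9.4, 9.1, 8.9])
pop_std = vals.values.std()  # NumPy default: ddof=0 (divide by n)
smp_std = vals.std()         # pandas default: ddof=1 (divide by n - 1)
# smp_std is always slightly larger than pop_std for n > 1
```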

Since each model has the same baseline across all SSPs, let’s copy the original data frame and keep only the unique baseline values for each model.

Python

dat_gf = dat.copy()[['Model', '1981-2010']].drop_duplicates()

Now we calculate the GFCI.

Python

dat_gf = dat_gf.assign(GFCI = abs(dat_gf[['1981-2010']] - tor_tas_mean) / tor_tas_std)
dat_gf.sort_values('GFCI').head()
##             Model  1981-2010      GFCI
## 4   ACCESS-ESM1-5   9.203638  0.424006
## 19    CESM2-WACCM   8.903986  0.816922
## 26   CMCC-CM2-SR5   8.875727  0.853977
## 15     CAS-ESM2-0   8.821039  0.925686
## 34        CanESM5   8.782312  0.976466

The GFCI defines confidence as follows:

  • Values below 1.0 indicate that the bias in the model projections is within one standard deviation of the observed aggregate values. These values are further subdivided:
      • Values below 0.5 are considered to merit high confidence
      • Values between 0.5 and 1.0 are considered to merit moderate confidence
  • Values above 1.0 indicate that the model bias exceeds one standard deviation of the aggregate values. These projections are considered to be suspect and should not be used.

We can whip this into a simple function so that we can easily classify the confidence that we have in our models.

Python

def confidence(GFCI):
    if (GFCI < 0.5):
        return "high"
    elif (GFCI > 1.0):
        return "low"
    else:
        return "moderate"
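
For larger tables, the same classification can be done without apply() using pd.cut(), which bins values into labelled intervals. A sketch on illustrative GFCI values:

Python

```python
import pandas as pd

gfci = pd.Series([0.42, 0.82, 1.01])
conf = pd.cut(gfci, bins=[0, 0.5, 1.0, float("inf")],
              labels=["high", "moderate", "low"])
list(conf)
## ['high', 'moderate', 'low']
```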

Python

dat_gf['Confidence'] = dat_gf.GFCI.apply(confidence)
dat_gf.sort_values('GFCI').head(10)
##             Model  1981-2010      GFCI Confidence
## 4   ACCESS-ESM1-5   9.203638  0.424006       high
## 19    CESM2-WACCM   8.903986  0.816922   moderate
## 26   CMCC-CM2-SR5   8.875727  0.853977   moderate
## 15     CAS-ESM2-0   8.821039  0.925686   moderate
## 34        CanESM5   8.782312  0.976466   moderate
## 97         MIROC6  10.297052  1.009726        low
## 90     KACE-1-0-G   8.671564  1.121684        low
## 86   IPSL-CM6A-LR   8.605860  1.207838        low
## 6   AWI-CM-1-1-MR   8.574091  1.249495        low
## 30      CMCC-ESM2   8.500208  1.346373        low

Let’s see this information in a figure!

Python

plt.figure(figsize=(10,6))
fg = sns.scatterplot(x="Model", y="GFCI", hue="Confidence", data=dat_gf.sort_values('GFCI'))
plt.title(r'GFCI Values for Model-Projected $T_\mathrm{mean}$ at Toronto (1981‒2010)')
for label in fg.axes.get_xticklabels():
    label.set_rotation(90)
plt.show()

Figure 10.1: GFCI confidence classes for GCM-projected \(T_\mathrm{mean}\) at Toronto (1981‒2010). Lower values indicate less bias in the model projections compared to baseline annual observed average \(T_\mathrm{mean}\)

Let’s get just the top three models.

Python

dat_gf = dat_gf.sort_values('GFCI').iloc[0:3,:]
dat_gf
##             Model  1981-2010      GFCI Confidence
## 4   ACCESS-ESM1-5   9.203638  0.424006       high
## 19    CESM2-WACCM   8.903986  0.816922   moderate
## 26   CMCC-CM2-SR5   8.875727  0.853977   moderate

Now that we have our models, we need to recover the change factors for these models! We can do this with a left merge of dat into dat_gf.

Python

dat_gf = pd.merge(dat_gf, dat, how = 'left', on = ['Model', '1981-2010'])
dat_gf
##            Model  1981-2010      GFCI Confidence Scenario  2011-2040  \
## 0  ACCESS-ESM1-5   9.203638  0.424006       high   ssp126   1.450132   
## 1  ACCESS-ESM1-5   9.203638  0.424006       high   ssp585   1.988043   
## 2    CESM2-WACCM   8.903986  0.816922   moderate   ssp126   1.701160   
## 3    CESM2-WACCM   8.903986  0.816922   moderate   ssp245   1.588462   
## 4    CESM2-WACCM   8.903986  0.816922   moderate   ssp370   1.462937   
## 5    CESM2-WACCM   8.903986  0.816922   moderate   ssp585   1.875615   
## 6   CMCC-CM2-SR5   8.875727  0.853977   moderate   ssp126   1.272100   
## 7   CMCC-CM2-SR5   8.875727  0.853977   moderate   ssp245   0.998358   
## 8   CMCC-CM2-SR5   8.875727  0.853977   moderate   ssp370   0.934538   
## 9   CMCC-CM2-SR5   8.875727  0.853977   moderate   ssp585   1.214020   
## 
##    2041-2070  2071-2100  
## 0   2.687561   2.943726  
## 1   3.748352   6.009125  
## 2   3.129578   3.389801  
## 3   3.355164   4.185547  
## 4   3.879700   5.488342  
## 5   4.506897   7.373016  
## 6   2.882965   3.622558  
## 7   2.982116   4.386383  
## 8   2.791137   4.430694  
## 9   3.660003   5.996887

You may notice in the plot above that some of the models with similar rankings have similar names. This is a good reminder of our caveat about model independence (see Sec. 9.2). We have ranked our models under the assumption that the models are independent, but in reality our models are not independent. Therefore, depending on the results of the GFCI ranking, it is possible that one of the top models will contribute little in the way of new knowledge to the analysis. In this sense, ranking models by their output value may in fact provide a better indicator of model interdependence than of model skill. For more on model independence, see this blog post from RealClimate.

10.2.3 Using the multi-model ensemble

Using the ensemble is as simple as taking the average across all models for each scenario.

Python

dat_ens = dat.copy().groupby(['Scenario']).mean(numeric_only=True).reset_index()
dat_ens
##   Scenario  1981-2010  2011-2040  2041-2070  2071-2100
## 0   ssp126   7.440728   1.437420   2.333416   2.488095
## 1   ssp245   7.379808   1.423465   2.716448   3.626614
## 2   ssp370   7.377921   1.417459   3.020552   4.830899
## 3   ssp585   7.487832   1.598217   3.677373   6.128601

You may be concerned that the baseline values differ across scenarios. What we’re seeing here is an inconsistency based on the number of models that provide projections for each scenario. Since we aren’t doing any validation here, it doesn’t really matter, but it is nice to know how many models are providing projections for each scenario and period.

Python

dat.groupby(['Scenario']).count().reset_index()
##   Scenario  Model  1981-2010  2011-2040  2041-2070  2071-2100
## 0   ssp126     28         28         28         28         28
## 1   ssp245     30         30         30         29         29
## 2   ssp370     27         27         27         27         26
## 3   ssp585     33         33         33         33         33

From the above table, we see that SSP3-7.0 is the least commonly modelled. For the remainder of the lab, we’ll use the SSP1-2.6 and SSP5-8.5 scenarios when we apply our change factors. Using these two scenarios highlights the stark differences that these two plausible futures present.

The methods that we have seen above are quick and useful for pedagogy, but there are lots of other, more advanced ways to choose models for CCIA studies. A great deal of recent work has gone into the identification of optimal sub-ensembles, which use a handful of the most relevant models. You should do some independent reading on some of these methods (e.g. constrained projections) to ensure that your toolbox is up-to-date!

10.3 Applying our change factors

Now that we have chosen our models, let’s see how we can apply these change factors to our observed data. For each period, we apply the given change factor to the daily values from our observed data. If we were to graph this, it would look a little bit like a jagged staircase, with each tridecade shifting by some fixed amount. It is not, however, the daily values that are of interest. We cannot reliably forecast daily weather far into the future. Instead, we use the past history, combined with the average tridecadal change to estimate changes in our climate index. Since we are looking at \(T_{\textrm{mean}}\) here, we won’t examine tropical nights as we did in Lab 1. Here we will use summer cooling degree days and winter heating degree days for the SSP1-2.6 and SSP5-8.5 scenarios.

In these examples, we are using annual change factors to analyze seasonal climate indices. This is lazy at best and downright wrong at worst! The sections below are only meant as code examples. In your research, make sure to use the appropriate change factors for your period of interest.

We will begin this section by showing how we can manually apply the change factors, and then we will define a function that will make this easier.

10.3.1 Using the extremes

Let’s begin by filtering our extreme table to only use the SSP1-2.6 and SSP5-8.5 scenarios.

Python

dat_ext = dat_ext[dat_ext.Scenario.isin(['ssp126', 'ssp585'])]
dat_ext
##   Scenario        Model  1981-2010  2011-2040  2041-2070  2071-2100  Limit
## 0   ssp126  CAMS-CSM1-0   7.211725   0.560417   0.895661   1.103638  Lower
## 3   ssp585  CAMS-CSM1-0   7.211725   0.588055   1.840119   3.313080  Lower
## 0   ssp126   KACE-1-0-G   8.671564   2.406111   3.670105   3.714386  Upper
## 3   ssp585     E3SM-1-1   7.465784   2.758083   5.961243   9.758056  Upper

Let’s apply the high extreme for SSP1-2.6 for the 2020s. There are a number of ways of sub-setting the table above; use whichever is easiest for you.

Python

anom = dat_ext.loc[dat_ext.Model == 'KACE-1-0-G', '2011-2040'].values
anom
## array([2.4061114])
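
For reference, query() would retrieve the same value; which sub-setting style you use is a matter of taste. A sketch on a two-row stand-in for dat_ext:

Python

```python
import pandas as pd

mini = pd.DataFrame({"Model": ["KACE-1-0-G", "E3SM-1-1"],
                     "Scenario": ["ssp126", "ssp585"],
                     "2011-2040": [2.406, 2.758]})
a = mini.loc[mini.Model == "KACE-1-0-G", "2011-2040"].values
b = mini.query("Scenario == 'ssp126'")["2011-2040"].values
# a and b both contain the same single anomaly
```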

Now we add that anomaly to the daily 'MeanTemp' values in a copy of the tor data frame.

Python

tor_ext = tor.copy()[['Date', 'MeanTemp']]
tor_ext = tor_ext.assign(hex2020s = tor.MeanTemp + anom)
tor_ext.head()
##         Date  MeanTemp   hex2020s
## 0 1981-01-01      -8.0  -5.593889
## 1 1981-01-02     -12.5 -10.093889
## 2 1981-01-03     -18.5 -16.093889
## 3 1981-01-04     -19.6 -17.193889
## 4 1981-01-05      -9.4  -6.993889

Now we can (re)calculate our climate index, cooling degree-days (CDDs), using a lambda expression just as we did in Lab 1.

Python

tor_ext['CDDObs'] = tor_ext.MeanTemp.apply(lambda x: 0 if x <= 18 else x - 18)
tor_ext['CDDhex2020s'] = tor_ext.hex2020s.apply(lambda x: 0 if x <= 18 else x - 18)
tor_ext.head()
##         Date  MeanTemp   hex2020s  CDDObs  CDDhex2020s
## 0 1981-01-01      -8.0  -5.593889     0.0          0.0
## 1 1981-01-02     -12.5 -10.093889     0.0          0.0
## 2 1981-01-03     -18.5 -16.093889     0.0          0.0
## 3 1981-01-04     -19.6 -17.193889     0.0          0.0
## 4 1981-01-05      -9.4  -6.993889     0.0          0.0
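
Note that the same index can be computed without a lambda by vectorized arithmetic with Series.clip(), which is typically faster on long records. A sketch on three made-up temperatures:

Python

```python
import pandas as pd

temps = pd.Series([-8.0, 20.5, 25.0])
cdd = (temps - 18).clip(lower=0)  # max(0, T - 18), element-wise
cdd.tolist()
## [0.0, 2.5, 7.0]
```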

So, what is the projected change to summertime CDDs? Let’s take a look.

Python

tor_ext[tor_ext.Date.dt.month.isin([6,7,8])].filter(regex = '^CDD').sum()
## CDDObs          9654.500000
## CDDhex2020s    15602.942425
## dtype: float64

It looks like we can expect an increase from 9654.5 total CDDs in the baseline period to 15602.9 in the 2020s. This could have serious implications for energy supply for air conditioning!

Great! That’s one projection down; 11 more to go to fill in our first table. This looks like it could turn out to be a monotonous process. Let’s whip up a function to do the work for us! We will create a function that takes five arguments: obs is a two-column data frame containing the 'Date' column and a variable column; anoms is the respective dat_* table; stat is the statistic that we want to calculate for the climate index, e.g. sum; expr is a lambda expression that we evaluate to calculate the climate index (some examples are in Appendix A); finally, month is a list of one or more months to filter on, defaulting to None. If in doubt, the function has a complete docstring to help you to choose values for each parameter.

Python

from math import isnan

def recalc_exp(obs, anoms, stat, expr, month = None):
    """Recalculate and aggregate a climate index using observed data
    and Conjuntool output.

    Parameters
    ----------
    obs   : pandas.core.frame.DataFrame
        A two-column data frame with a 'Date' column and a variable column
    anoms : pandas.core.frame.DataFrame
        The data frame produced by Conjuntool, filtered as necessary
    stat  : builtin_function_or_method or function
        The function to call on the calculated climate index, e.g. sum
    expr  : str
        A lambda expression to be applied to the variable column to
        calculate the climate index
    month : list
        A list of the months to filter on (optional)
    """
    row_list = []
    if month is not None:
        obs = obs[obs.Date.dt.month.isin(month)]
    baseline = stat(obs.iloc[:,1].apply(eval(expr)))
    for model in set(anoms.Model):
        for scen in set(anoms.Scenario):
            periods = {}
            for period in [col for col in anoms.columns if '-' in col][1:]:
                anom = anoms.loc[(anoms.Scenario == scen) & (anoms.Model == model)][[period]].values
                if len(anom) == 0:
                    continue
                else:
                    anom = anom[0]
                tmp = obs.assign(new = obs.iloc[:,1].values + anom)
                tmp = stat(tmp.new.apply(eval(expr)))
                periods[period] = tmp
            if all(isnan(value) for value in periods.values()):
                continue
            else:
                row = {'Model': model,
                       'Scenario': scen,
                       'Baseline': baseline}
                row = {**row, **periods}
                row_list.append(row)
    return pd.DataFrame(row_list, columns=row.keys())
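
The one unusual trick in recalc_exp() is eval(): because expr arrives as a string, eval() is what turns it into a callable function. A minimal sketch:

Python

```python
expr = "lambda x: 0 if x <= 18 else x - 18"
f = eval(expr)  # turns the string into a real function
f(20.5)
## 2.5
```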

Let’s take a look at CDDs, based on our extreme models.

Python

recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_ext,
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
##          Model Scenario  Baseline     2011-2040     2041-2070     2071-2100
## 0   KACE-1-0-G   ssp126    9654.5  15602.942425  18958.131925  19077.083954
## 1     E3SM-1-1   ssp585    9654.5  16529.507491  25185.106916  35648.635664
## 2  CAMS-CSM1-0   ssp126    9654.5  10957.468592  11763.282749  12274.430012
## 3  CAMS-CSM1-0   ssp585    9654.5  11023.055417  14133.785986  18003.024134

As we might have expected, there is a larger increase in CDDs for the SSP5-8.5 scenario than for the SSP1-2.6 scenario, but the overall smallest and largest increases in CDDs depend on whether we consider the lower or upper model extremes.

The above table is missing an important piece of information. In our dat_ext table, we had labelled each model as providing an upper or lower extreme, but this information is not preserved by the recalc_exp() function. How can we recover it? The simplest option might be to replace the values in dat_ext.Model with the values of dat_ext.Limit, but this, again, removes some key information from the results, specifically, the model that provided each projection. Another option would be to re-write recalc_exp() so that it preserves both pieces of information. The third option, however, is the one that we will use: we already know whether each model in our recalc_exp() output represents the upper or lower limit of our projections, since that information is present in dat_ext. We can use the same mechanism of merging two tables that we used in the GFCI section to create a table that includes both pieces of data.

Python

projected_cdds = recalc_exp(obs=tor[['Date', 'MeanTemp']],
                            anoms=dat_ext,
                            stat=sum,
                            expr="lambda x: 0 if x <= 18 else x - 18",
                            month=[6,7,8])

pd.merge(projected_cdds, dat_ext[['Scenario', 'Model', 'Limit']])
##          Model Scenario  Baseline     2011-2040     2041-2070     2071-2100  \
## 0   KACE-1-0-G   ssp126    9654.5  15602.942425  18958.131925  19077.083954   
## 1     E3SM-1-1   ssp585    9654.5  16529.507491  25185.106916  35648.635664   
## 2  CAMS-CSM1-0   ssp126    9654.5  10957.468592  11763.282749  12274.430012   
## 3  CAMS-CSM1-0   ssp585    9654.5  11023.055417  14133.785986  18003.024134   
## 
##    Limit  
## 0  Upper  
## 1  Upper  
## 2  Lower  
## 3  Lower

We can also use our recalc_exp() function to give us the 30-year daily average CDD, instead of the 30-year total.

Python

from numpy import mean

pd.merge(recalc_exp(obs=tor[['Date', 'MeanTemp']],
                    anoms=dat_ext,
                    stat=mean,
                    expr="lambda x: 0 if x <= 18 else x - 18",
                    month=[6,7,8]),
         dat_ext[['Scenario', 'Model', 'Limit']])
##          Model Scenario  Baseline  2011-2040  2041-2070  2071-2100  Limit
## 0   KACE-1-0-G   ssp126  3.498007   5.653240   6.868888   6.911987  Upper
## 1     E3SM-1-1   ssp585  3.498007   5.988952   9.125039  12.916172  Upper
## 2  CAMS-CSM1-0   ssp126  3.498007   3.970097   4.262059   4.447257  Lower
## 3  CAMS-CSM1-0   ssp585  3.498007   3.993861   5.120937   6.522835  Lower

Bonus exercise

The recalc_exp() function works on the daily values of each 30-year tridecade. Thus, it returns either the daily average or the tridecadal total for a given climate index. These values might not be the most useful metrics for many analyses; it is often more useful to know the annual average or sum for a given metric. In our example, the seasonal or annual total of CDDs might be more useful to policy-makers.

For bonus marks: modify the recalc_exp() function above so that it first aggregates the daily values by year (obs.Date.dt.year), and then applies the calculation provided by the stat argument. For full marks, your function should also retain the original functionality of recalc_exp(). (Hint: you may need to add an extra parameter to the function).

Once you have finished changing the code, generate a patch for the original function using software such as Pretty Diff. This will highlight the changes that you made. Submit this patch along with your regular “What to submit” exercises.

What if we want to calculate winter heating degree days? We’ll need to shift our Decembers up a year like we did in Lab 1. This isn’t strictly necessary at the moment, since we are looking at 30-year totals, but it is always a good habit to maintain and allows us to be more precise about the seasons under consideration. It will become especially relevant for those of you who undertake the bonus assignment in the box above.

Python

tor_win = tor.copy()
tor_win['Date'] = tor_win.Date.apply(lambda x: x + pd.DateOffset(years=1) if x.month == 12 else x)
tor_win.head() # Not shifted
##         Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 0 1981-01-01     -5.7    -10.2      -8.0          5.0
## 1 1981-01-02     -9.2    -15.8     -12.5          0.0
## 2 1981-01-03    -16.2    -20.7     -18.5          0.0
## 3 1981-01-04    -12.3    -26.9     -19.6          0.0
## 4 1981-01-05     -4.7    -14.0      -9.4          0.0
tor_win.tail() # Shifted
##            Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 4013 2011-12-27     -0.9    -10.7      -5.8          0.0
## 4014 2011-12-28      0.7     -3.9      -1.6          0.0
## 4015 2011-12-29     -0.2     -3.1      -1.7          0.0
## 4016 2011-12-30      5.7     -3.5       1.1          0.0
## 4017 2011-12-31      9.9      5.3       7.6          0.0
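
If the shift logic seems opaque, this two-date sketch shows exactly what the lambda does: December dates move forward one year, while everything else is untouched.

Python

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1981-12-31", "1982-01-01"]))
shifted = dates.apply(lambda x: x + pd.DateOffset(years=1) if x.month == 12 else x)
[d.strftime("%Y-%m-%d") for d in shifted]
## ['1982-12-31', '1982-01-01']
```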

Let’s drop the years that we know to be incomplete: winter 1980/81 (labelled 1981 after the shift), which is missing December 1980, and winter 2010/11 (labelled 2011), which contains only December 2010. Remember, a better solution for the former would be to download data for December 1980 so that we can include all 30 winters.

Python

tor_win = tor_win[tor_win.Date.dt.year.isin(range(1982, 2011))]

Python

pd.merge(recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
                    anoms=dat_ext,
                    stat=sum,
                    expr="lambda x: 0 if x >= 18 else 18 - x",
                    month=[12,1,2]),
         dat_ext[['Scenario', 'Model', 'Limit']])
##          Model Scenario  Baseline     2011-2040     2041-2070     2071-2100  \
## 0   KACE-1-0-G   ssp126   53028.5  46731.706466  43424.605320  43308.766224   
## 1     E3SM-1-1   ssp585   53028.5  45810.596091  37433.732239  27568.678939   
## 2  CAMS-CSM1-0   ssp126   53028.5  51561.889758  50684.554640  50140.278831   
## 3  CAMS-CSM1-0   ssp585   53028.5  51489.559071  48212.908054  44358.582197   
## 
##    Limit  
## 0  Upper  
## 1  Upper  
## 2  Lower  
## 3  Lower

As winter is projected to warm, we see fewer heating degree days. This change is more rapid in the scenario with more radiative forcing.

10.3.2 Using the multi-model ensemble

Since the recalc_exp() function expects our anoms data frame to include a 'Model' column, we should add one.

Python

dat_ens['Model'] = "Ensemble"
dat_ens
##   Scenario  1981-2010  2011-2040  2041-2070  2071-2100     Model
## 0   ssp126   7.440728   1.437420   2.333416   2.488095  Ensemble
## 1   ssp245   7.379808   1.423465   2.716448   3.626614  Ensemble
## 2   ssp370   7.377921   1.417459   3.020552   4.830899  Ensemble
## 3   ssp585   7.487832   1.598217   3.677373   6.128601  Ensemble

Now we can take a look at the projected changes in our climate indices.

For CDDs:

Python

recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_ens[dat_ens.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
##       Model Scenario  Baseline     2011-2040     2041-2070     2071-2100
## 0  Ensemble   ssp126    9654.5  13108.585616  15412.316097  15818.150097
## 1  Ensemble   ssp585    9654.5  13515.777228  18977.646700  25644.053599

For HDDs:

Python

recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
           anoms=dat_ens[dat_ens.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x >= 18 else 18 - x",
           month=[12,1,2])
##       Model Scenario  Baseline    2011-2040     2041-2070     2071-2100
## 0  Ensemble   ssp126   53028.5  49266.77174  46921.950982  46517.154665
## 1  Ensemble   ssp585   53028.5  48845.96539  43405.592042  36996.721784

10.3.3 Using our top-ranked models

How about our top ranked models? Our function works as expected for CDDs:

Python

recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_gf[dat_gf.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
##            Model Scenario  Baseline     2011-2040     2041-2070     2071-2100
## 0    CESM2-WACCM   ssp126    9654.5  13777.868099  17514.166214  18207.868670
## 1    CESM2-WACCM   ssp585    9654.5  14224.902849  21217.811561  29068.505112
## 2   CMCC-CM2-SR5   ssp126    9654.5  12693.562099  16859.193530  18830.468230
## 3   CMCC-CM2-SR5   ssp585    9654.5  12548.593088  18931.008055  25282.770380
## 4  ACCESS-ESM1-5   ssp126    9654.5  13140.683132  16343.610796  17020.117626
## 5  ACCESS-ESM1-5   ssp585    9654.5  14514.299627  19168.418528  25316.329875

OK, enough tables; let's plot some of these changes! Here we will visualize the changes in wintertime HDDs. First, create a table, which we will call out, to store our projected changes.

Python

out = recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
                 anoms=dat_gf[dat_gf.Scenario.isin(['ssp126', 'ssp585'])],
                 stat=sum,
                 expr="lambda x: 0 if x >= 18 else 18 - x",
                 month=[12,1,2])
out
##            Model Scenario  Baseline     2011-2040     2041-2070     2071-2100
## 0    CESM2-WACCM   ssp126   53028.5  48576.564629  44838.623952  44157.880584
## 1    CESM2-WACCM   ssp585   53028.5  48120.016417  41235.864345  33750.766336
## 2   CMCC-CM2-SR5   ssp126   53028.5  49699.413777  45483.780595  43548.988272
## 3   CMCC-CM2-SR5   ssp585   53028.5  49851.410532  43451.032152  37340.631156
## 4  ACCESS-ESM1-5   ssp126   53028.5  49233.504730  45995.152863  45324.812784
## 5  ACCESS-ESM1-5   ssp585   53028.5  47825.790597  43219.911168  37308.674625

In Lab 3, we used pd.pivot() to reshape our table from a narrow layout to a wide layout. We stressed, however, that the narrow layout is more convenient for plotting. This time around, we have a wide data frame that we want to convert back to a narrow one. The method we need, the inverse of pd.pivot(), is pd.melt().

Python

out_long = out.melt(id_vars = ['Model', 'Scenario'], var_name = "Period", value_name = "Heating Degree Days")
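To see that pd.melt() really does undo pd.pivot(), here is a toy round trip, independent of the lab data, using a small wide frame with made-up values:

```python
import pandas as pd

# Toy wide frame, analogous in shape to `out` (made-up numbers).
wide = pd.DataFrame({
    "Model": ["A", "B"],
    "Scenario": ["ssp126", "ssp585"],
    "2011-2040": [1.0, 2.0],
    "2041-2070": [3.0, 4.0],
})

# Wide -> long: one row per (Model, Scenario, Period) combination.
long = wide.melt(id_vars=["Model", "Scenario"],
                 var_name="Period", value_name="Value")
print(long.shape)  # (4, 4): 2 rows x 2 value columns melted into 4 rows

# Long -> wide again with pivot, recovering the original layout.
back = long.pivot(index=["Model", "Scenario"],
                  columns="Period", values="Value").reset_index()
print(back.shape)  # (2, 4)
```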

Now we are ready to plot! Play around with the options in the seaborn (sns) plot below until you’ve come up with a figure you are happy with. Make sure to add complete labels (like a title, for instance).

Python

sns.catplot(x = "Period", y = "Heating Degree Days", hue = "Scenario", col = "Model",
            data = out_long, hue_order=["ssp126", "ssp585"], saturation = 0.5,
            palette = "Reds", kind = "bar", ci = None, height = 7, aspect = 0.66)

Figure 10.2: Baseline and future wintertime heating degree days (HDDs) at Toronto. Projections are based on changes to mean temperature forecast by three GCMs under the SSP1-2.6 and SSP5-8.5 scenarios.

Great! We have now seen how we can apply GCM-projected change factors to our observed data to project changes in a climate index. In our next lab, we will look at more advanced downscaling methods.

10.4 Exercises (what to submit)

  • In Lab 1 (Sec. 4) we examined the “baseline” summertime Tropical Nights. Use the change factor method to project Tropical Nights into the future. Edit the following Jupyter notebook and select the relevant variable and both the annual and seasonal change factor data. Describe the differences in the baselines and change factors when you use the annual data versus the seasonal data. What accounts for the differences? Provide a table (or tables) to support your answer. [2 marks]
  • Are the models that best reproduced the annual baseline the same models that best reproduce the seasonal baseline? If not, can we trust our validation results? Provide a table (or tables) to support your answer. [2 marks]
  • Calculate the changes to the Tropical Nights climate index using the annual and the seasonal data. What differences do you see? As in the lab, focus on SSP1-2.6 and SSP5-8.5. Provide a table (or tables) to support your answer. [2 marks]
  • Is it better to use the annual data or the seasonal data for a seasonal study? Why? [2 marks]
  • Produce two multi-panel plots similar to Figure 10.2. One figure should plot the changes projected by the top-three “validated” models. The other should plot the changes projected by the two extremes and the ensemble average. Make sure that the figures are properly labelled and properly captioned. Hint: you may find it useful to assign a dummy “Limit” variable to the ensemble projections, just like we assigned a dummy “Model” variable to dat_ens. You should also adjust the id_vars in pd.melt(), and adjust the col facet in sns.catplot(). If you are totally stuck, you can see an example here. [2 marks]
  • Compare, contrast, and comment on the three model selection methods explored in this lab. What are their strengths and weaknesses? [5 marks]