Sec. 10 Lab exercise 4: Choosing our models
Last update: Wed Feb 15 10:22:51 2023 -0500 (76511fd)
10.1 Introduction and setup
Now that we have reviewed some methods for selecting models, let’s see them in action. We’ll start by importing the relevant libraries, as usual.
Python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
Next, we need to find our downloaded anomaly data. We will be using the same .csv file that we used at the end of Lab 3 (Sec. @ref{lab3}). If you have renamed your file, or saved it somewhere strange, adjust this line as necessary.
Python
file = "data/ssp_change_factors.csv"
Now we read it using pandas.
Python
dat = pd.read_csv(file)
dat.head()
## Variable Model Scenario 1981-2010 2011-2040 2041-2070 2071-2100
## 0 tas ACCESS-CM2 ssp126 5.975702 1.795567 3.468750 4.010926
## 1 tas ACCESS-CM2 ssp245 5.975702 1.603762 3.545075 4.664551
## 2 tas ACCESS-CM2 ssp370 5.975702 1.916411 3.938202 5.793183
## 3 tas ACCESS-CM2 ssp585 5.975702 1.916542 4.229950 7.124695
## 4 tas ACCESS-ESM1-5 ssp126 9.203638 1.450132 2.687561 2.943726
Let’s see what we’re working with.
Python
dat.shape
## (118, 7)
It looks like we have 118 rows and 7 columns of data. Let’s drop the 'Variable'
column, which we won’t use in this analysis. Note that we are only analyzing the r1i1p1f1 ensemble member. If you want to analyze additional ensemble members, you should include a column identifying the ensemble member.
Python
dat = dat.drop(['Variable'], axis = 1)
dat.head()
## Model Scenario 1981-2010 2011-2040 2041-2070 2071-2100
## 0 ACCESS-CM2 ssp126 5.975702 1.795567 3.468750 4.010926
## 1 ACCESS-CM2 ssp245 5.975702 1.603762 3.545075 4.664551
## 2 ACCESS-CM2 ssp370 5.975702 1.916411 3.938202 5.793183
## 3 ACCESS-CM2 ssp585 5.975702 1.916542 4.229950 7.124695
## 4 ACCESS-ESM1-5 ssp126 9.203638 1.450132 2.687561 2.943726
10.2 Selecting our models
10.2.1 Using the extremes
Choosing our models based on the highest and lowest projected changes allows us to get an idea of the “range” of possibilities for the future. Since we are using change factors, we are only considering the changes that each model projects over its own baseline period. So these changes are relative, i.e. the model with the largest projected change is not necessarily the model with the highest projected temperature.
For this and all of our selection methods, we will make a copy of dat to work with using the copy() method. This is the safer option, as something like dat_ext = dat creates a new reference to the same dat table. This isn’t an issue if we are just sub-setting and filtering data, but it becomes troublesome if we want to change any of the data points; any modification to the values in dat_ext would also appear in dat! This will become more relevant later in this lab when we create the tor_win data frame.
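The difference between a reference and a copy is easy to demonstrate with a throwaway data frame (the names below are purely illustrative and not part of the lab data):

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3]})

alias = demo        # a second reference to the *same* table
deep = demo.copy()  # an independent copy

alias.loc[0, 'x'] = 99

print(demo.loc[0, 'x'])  # 99: the change made through 'alias' shows up in 'demo'
print(deep.loc[0, 'x'])  # 1: the copy is unaffected
```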
Python
dat_ext = dat.copy()
We will assess the highest and lowest projected changes based on the average over the three tridecadal time periods. We will add a new column, called 'MeanAnom', which will contain the average tridecadal anomaly, or change factor, produced by each model, and then sort this new column from smallest change to largest.
Python
dat_ext['MeanAnom'] = dat_ext.iloc[:,-3:].mean(axis = 1)
dat_ext = dat_ext.sort_values('MeanAnom')
# Smallest
dat_ext.head()
## Model Scenario 1981-2010 2011-2040 2041-2070 2071-2100 MeanAnom
## 11 CAMS-CSM1-0 ssp126 7.211725 0.560417 0.895661 1.103638 0.853239
## 72 IITM-ESM ssp126 4.398706 0.752211 1.071686 1.180725 1.001541
## 94 KIOST-ESM ssp126 2.966669 0.802513 1.428680 1.379089 1.203427
## 12 CAMS-CSM1-0 ssp245 7.211725 0.472422 1.578430 1.860260 1.303704
## 59 FGOALS-g3 ssp126 6.822473 1.410681 1.378022 1.440735 1.409813
# Largest
dat_ext.tail()
## Model Scenario 1981-2010 2011-2040 2041-2070 2071-2100 MeanAnom
## 93 KACE-1-0-G ssp585 8.671564 2.172386 4.747009 7.160980 4.693458
## 117 TaiESM1 ssp585 6.847009 2.149199 4.891968 7.551697 4.864288
## 43 EC-Earth3 ssp585 6.003870 2.653685 4.608856 7.428070 4.896871
## 37 CanESM5 ssp585 8.782312 2.131543 4.874146 8.249115 5.084935
## 39 E3SM-1-1 ssp585 7.465784 2.758083 5.961243 9.758056 6.159128
The next code block is a little complicated, so let’s walk through it: first, we group dat_ext by 'Scenario' and get the smallest value for each using first(). Make sure you sorted the values in the previous code block or you won’t get the expected output! We label these first values as “Lower”, to represent the lower limit of the model projections, and restore the index, which was changed by groupby. We then do the same for the last() value, labelling these as “Upper”. Finally, we append these two tables together, and drop the 'MeanAnom' column.
Python
lower = dat_ext.groupby('Scenario').first().assign(Limit = "Lower").reset_index()
upper = dat_ext.groupby("Scenario").last().assign(Limit = "Upper").reset_index()
dat_ext = pd.concat([lower, upper]).drop("MeanAnom", axis = 1)
dat_ext
## Scenario Model 1981-2010 2011-2040 2041-2070 2071-2100 Limit
## 0 ssp126 CAMS-CSM1-0 7.211725 0.560417 0.895661 1.103638 Lower
## 1 ssp245 CAMS-CSM1-0 7.211725 0.472422 1.578430 1.860260 Lower
## 2 ssp370 CAMS-CSM1-0 7.211725 0.709878 1.681183 2.707093 Lower
## 3 ssp585 CAMS-CSM1-0 7.211725 0.588055 1.840119 3.313080 Lower
## 0 ssp126 KACE-1-0-G 8.671564 2.406111 3.670105 3.714386 Upper
## 1 ssp245 CanESM5 8.782312 2.065475 3.647126 4.935028 Upper
## 2 ssp370 CanESM5 8.782312 2.133130 4.579010 6.976593 Upper
## 3 ssp585 E3SM-1-1 7.465784 2.758083 5.961243 9.758056 Upper
Now we have a table of the two extreme models for each scenario that identifies each model as providing either an upper or lower extreme. Great! This table suits our needs for this analysis. We will revisit this table later.
10.2.2 Validating model baselines
Another model selection method is to validate a model against observed baseline values. Before we can validate, we need data to validate against. Here we are using the very same .csv file that we generated in Lab 1. If you no longer have that file, give the code in Lab 1 a quick run and you should be able to re-create it in no time. Note that when we saved the file using tor.to_csv(), an index column was included in the file. Use the index_col argument to let pandas know about this, or you will end up with an extra unnamed column of numbers from 0 to the number of rows in the file.
Python
tor = pd.read_csv("tor.csv", index_col = 0)
tor.head()
## Date MaxTemp MinTemp MeanTemp TotalPrecip
## 0 1981-01-01 -5.7 -10.2 -8.0 5.0
## 1 1981-01-02 -9.2 -15.8 -12.5 0.0
## 2 1981-01-03 -16.2 -20.7 -18.5 0.0
## 3 1981-01-04 -12.3 -26.9 -19.6 0.0
## 4 1981-01-05 -4.7 -14.0 -9.4 0.0
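The index_col caveat above is easy to demonstrate with an in-memory buffer rather than a file on disk (this toy example is not part of the lab data):

```python
import pandas as pd
from io import StringIO

# Simulate a file saved with DataFrame.to_csv(), which writes the index too
buf = StringIO()
pd.DataFrame({'MeanTemp': [-8.0, -12.5]}).to_csv(buf)

buf.seek(0)
naive = pd.read_csv(buf)            # the old index arrives as 'Unnamed: 0'
buf.seek(0)
proper = pd.read_csv(buf, index_col=0)

print(list(naive.columns))   # ['Unnamed: 0', 'MeanTemp']
print(list(proper.columns))  # ['MeanTemp']
```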
There are many different approaches to model validation. Here, we will use the Gough–Fenech Confidence Index (GFCI) to help us choose our models. The GFCI quantifies the difference in the model-projected baseline as a proportion of the standard deviation of the aggregate observed values using the following equation:
\[\textrm{GFCI} = \frac{ \left | T_{\textrm{mod}} - T_{\textrm{obs}} \right |}{\sigma_{\textrm{obs}}}\]
\(\left | T_{\textrm{mod}} - T_{\textrm{obs}} \right |\) is the absolute difference between the modelled baseline and the observed baseline, and \(\sigma_{\textrm{obs}}\) is the standard deviation of the aggregate observed values. In this case, we need to use the mean and standard deviation of the annual aggregate values of our observed Toronto station data. If we want to test for \(T_{\textrm{mean}}\), for instance, we would first calculate the annual average \(T_{\textrm{mean}}\) for each of the 30 years in our baseline, and then take the mean and standard deviation of those 30 annual values.
If you are performing a monthly or seasonal analysis, you should use the respective aggregate values. As an example, if you are performing your analysis on winter values, \(T_{\textrm{obs}}\) is the average yearly winter temperature and \(\sigma_{\textrm{obs}}\) is the standard deviation of those average yearly winter temperatures.
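Before we compute the GFCI on the real data, here is the formula applied to made-up numbers (every value below is invented purely for illustration):

```python
import numpy as np

# Toy annual mean temperatures for a short "baseline" (invented values)
obs_annual = np.array([8.8, 8.6, 9.4, 9.1, 8.9])
t_obs = obs_annual.mean()      # aggregate observed mean
sigma_obs = obs_annual.std()   # standard deviation of the annual aggregates

t_mod = 9.5                    # a hypothetical model baseline
gfci = abs(t_mod - t_obs) / sigma_obs
print(round(gfci, 2))          # ≈ 1.98: more than one sigma, so this toy model would be suspect
```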
pandas doesn’t reliably process date information in a .csv file, so we need to fix the 'Date' column.
Python
tor.Date = pd.to_datetime(tor.Date)
Now we can generate our annual average \(T_{\textrm{mean}}\).
Python
tor_mean = tor[['Date', 'MeanTemp']].groupby(tor.Date.dt.year).mean(numeric_only=True)
tor_mean.head()
## MeanTemp
## Date
## 1981 8.824658
## 1982 8.562192
## 1983 9.380000
## 1984 9.067486
## 1985 8.866575
Now we calculate the mean and the standard deviation of those values.
Python
tor_tas_mean = tor_mean.MeanTemp.values.mean()
tor_tas_std = tor_mean.MeanTemp.values.std()

print("Mean:", tor_tas_mean, "... Standard Deviation:", tor_tas_std)
## Mean: 9.526999642630084 ... Standard Deviation: 0.7626351751355993
Since each model has the same baseline regardless of SSP, let’s copy the original data frame to get the unique baseline values for each model.
Python
dat_gf = dat.copy()[['Model', '1981-2010']].drop_duplicates()
Now we calculate the GFCI.
Python
dat_gf = dat_gf.assign(GFCI = abs(dat_gf[['1981-2010']] - tor_tas_mean) / tor_tas_std)
dat_gf.sort_values('GFCI').head()
## Model 1981-2010 GFCI
## 4 ACCESS-ESM1-5 9.203638 0.424006
## 19 CESM2-WACCM 8.903986 0.816922
## 26 CMCC-CM2-SR5 8.875727 0.853977
## 15 CAS-ESM2-0 8.821039 0.925686
## 34 CanESM5 8.782312 0.976466
The GFCI defines confidence as follows:
- Values below 1.0 indicate that the bias in the model projections was within one standard deviation of the observed aggregate values. These values are further subdivided:
- Values between 0.5 and 1.0 are considered to merit moderate confidence
- Values below 0.5 are considered to merit high confidence
- Values above 1.0 indicate that the model bias exceeds one standard deviation of the aggregate values. These projections are considered to be suspect and should not be used.
We can whip this into a simple function so that we can easily classify the confidence that we have in our models.
Python
def confidence(GFCI):
    if (GFCI < 0.5):
        return "high"
    elif (GFCI > 1.0):
        return "low"
    else:
        return "moderate"
Python
dat_gf['Confidence'] = dat_gf.GFCI.apply(confidence)
dat_gf.sort_values('GFCI').head(10)
## Model 1981-2010 GFCI Confidence
## 4 ACCESS-ESM1-5 9.203638 0.424006 high
## 19 CESM2-WACCM 8.903986 0.816922 moderate
## 26 CMCC-CM2-SR5 8.875727 0.853977 moderate
## 15 CAS-ESM2-0 8.821039 0.925686 moderate
## 34 CanESM5 8.782312 0.976466 moderate
## 97 MIROC6 10.297052 1.009726 low
## 90 KACE-1-0-G 8.671564 1.121684 low
## 86 IPSL-CM6A-LR 8.605860 1.207838 low
## 6 AWI-CM-1-1-MR 8.574091 1.249495 low
## 30 CMCC-ESM2 8.500208 1.346373 low
Let’s see this information in a figure!
Python
plt.figure(figsize=(10,6))
fg = sns.scatterplot(x="Model", y="GFCI", hue="Confidence", data=dat_gf.sort_values('GFCI'))
plt.title(r'GFCI Values for Model-Projected $T_\mathrm{mean}$ at Toronto (1981‒2010)')
for label in fg.axes.get_xticklabels():
    label.set_rotation(90)
plt.show()
Let’s get just the top three models.
Python
dat_gf = dat_gf.sort_values('GFCI').iloc[0:3,:]
dat_gf
## Model 1981-2010 GFCI Confidence
## 4 ACCESS-ESM1-5 9.203638 0.424006 high
## 19 CESM2-WACCM 8.903986 0.816922 moderate
## 26 CMCC-CM2-SR5 8.875727 0.853977 moderate
Now that we have our models, we need to recover the change factors for these models! We can do this with a left merge of dat into dat_gf.
Python
dat_gf = pd.merge(dat_gf, dat, how = 'left', on = ['Model', '1981-2010'])
dat_gf
## Model 1981-2010 GFCI Confidence Scenario 2011-2040 \
## 0 ACCESS-ESM1-5 9.203638 0.424006 high ssp126 1.450132
## 1 ACCESS-ESM1-5 9.203638 0.424006 high ssp585 1.988043
## 2 CESM2-WACCM 8.903986 0.816922 moderate ssp126 1.701160
## 3 CESM2-WACCM 8.903986 0.816922 moderate ssp245 1.588462
## 4 CESM2-WACCM 8.903986 0.816922 moderate ssp370 1.462937
## 5 CESM2-WACCM 8.903986 0.816922 moderate ssp585 1.875615
## 6 CMCC-CM2-SR5 8.875727 0.853977 moderate ssp126 1.272100
## 7 CMCC-CM2-SR5 8.875727 0.853977 moderate ssp245 0.998358
## 8 CMCC-CM2-SR5 8.875727 0.853977 moderate ssp370 0.934538
## 9 CMCC-CM2-SR5 8.875727 0.853977 moderate ssp585 1.214020
##
## 2041-2070 2071-2100
## 0 2.687561 2.943726
## 1 3.748352 6.009125
## 2 3.129578 3.389801
## 3 3.355164 4.185547
## 4 3.879700 5.488342
## 5 4.506897 7.373016
## 6 2.882965 3.622558
## 7 2.982116 4.386383
## 8 2.791137 4.430694
## 9 3.660003 5.996887
You may notice in the plot above that some of the models with similar rankings have similar names. This is a good reminder of our caveat about model independence (see Sec. 9.2). We have ranked our models under the assumption that the models are independent, but in reality our models are not independent. Therefore, depending on the results of the GFCI ranking, it is possible that one of the top models will contribute little by way of new knowledge to the analysis. In this sense, ranking models by their output value may in fact provide a better indicator of model interdependence than of model skill. For more on model independence, see this blog post from RealClimate.
10.2.3 Using the multi-model ensemble
Using the ensemble is as simple as taking the average across all models for each scenario and period.
Python
dat_ens = dat.copy().groupby(['Scenario']).mean(numeric_only=True).reset_index()
dat_ens
## Scenario 1981-2010 2011-2040 2041-2070 2071-2100
## 0 ssp126 7.440728 1.437420 2.333416 2.488095
## 1 ssp245 7.379808 1.423465 2.716448 3.626614
## 2 ssp370 7.377921 1.417459 3.020552 4.830899
## 3 ssp585 7.487832 1.598217 3.677373 6.128601
You may be concerned that the baseline values differ across scenarios. What we’re seeing here is an inconsistency based on the number of models that provide projections for each scenario. Since we aren’t doing any validation here, it doesn’t really matter, but it is nice to know how many models are providing projections for each scenario and period.
Python
dat.groupby(['Scenario']).count().reset_index()
## Scenario Model 1981-2010 2011-2040 2041-2070 2071-2100
## 0 ssp126 28 28 28 28 28
## 1 ssp245 30 30 30 29 29
## 2 ssp370 27 27 27 27 26
## 3 ssp585 33 33 33 33 33
From the above table, we see that SSP3-7.0 is the least commonly modelled. For the remainder of the lab, we’ll use the SSP1-2.6 and SSP5-8.5 scenarios when we apply our change factors. Using these two scenarios highlights the stark differences that these two plausible futures present.
10.3 Applying our change factors
Now that we have chosen our models, let’s see how we can apply these change factors to our observed data. For each period, we apply the given change factor to the daily values from our observed data. If we were to graph this, it would look a little bit like a jagged staircase, with each tridecade shifting by some fixed amount. It is not, however, the daily values that are of interest. We cannot reliably forecast daily weather far into the future. Instead, we use the past history, combined with the average tridecadal change to estimate changes in our climate index. Since we are looking at \(T_{\textrm{mean}}\) here, we won’t examine tropical nights as we did in Lab 1. Here we will use summer cooling degree days and winter heating degree days for the SSP1-2.6 and SSP5-8.5 scenarios.
In these examples, we are using annual change factors to analyze seasonal climate indices. This is lazy at best and downright wrong at worst! The sections below are only meant as code examples. In your research, make sure to use the appropriate change factors for your period of interest.
We will begin this section by showing how we can manually apply the change factors, and then we will define a function that will make this easier.
10.3.1 Using the extremes
Let’s begin by filtering our extreme table to only use the SSP1-2.6 and SSP5-8.5 scenarios.
Python
dat_ext = dat_ext[dat_ext.Scenario.isin(['ssp126', 'ssp585'])]
dat_ext
## Scenario Model 1981-2010 2011-2040 2041-2070 2071-2100 Limit
## 0 ssp126 CAMS-CSM1-0 7.211725 0.560417 0.895661 1.103638 Lower
## 3 ssp585 CAMS-CSM1-0 7.211725 0.588055 1.840119 3.313080 Lower
## 0 ssp126 KACE-1-0-G 8.671564 2.406111 3.670105 3.714386 Upper
## 3 ssp585 E3SM-1-1 7.465784 2.758083 5.961243 9.758056 Upper
Let’s apply the high extreme for SSP1-2.6 for the 2020s. There are a number of ways of sub-setting the table above; use whichever is easiest for you.
Python
anom = dat_ext.loc[dat_ext.Model == 'KACE-1-0-G', '2011-2040'].values
anom
## array([2.4061114])
Now we add that anomaly to the daily 'MeanTemp' values in a copy of the tor data frame.
Python
tor_ext = tor.copy()[['Date', 'MeanTemp']]
tor_ext = tor_ext.assign(hex2020s = tor.MeanTemp + anom)
tor_ext.head()
## Date MeanTemp hex2020s
## 0 1981-01-01 -8.0 -5.593889
## 1 1981-01-02 -12.5 -10.093889
## 2 1981-01-03 -18.5 -16.093889
## 3 1981-01-04 -19.6 -17.193889
## 4 1981-01-05 -9.4 -6.993889
Now we can (re)calculate our climate index, cooling degree-days (CDDs) using a lambda expression just as we did in Lab 1.
Python
tor_ext['CDDObs'] = tor_ext.MeanTemp.apply(lambda x: 0 if x <= 18 else x - 18)
tor_ext['CDDhex2020s'] = tor_ext.hex2020s.apply(lambda x: 0 if x <= 18 else x - 18)
tor_ext.head()
## Date MeanTemp hex2020s CDDObs CDDhex2020s
## 0 1981-01-01 -8.0 -5.593889 0.0 0.0
## 1 1981-01-02 -12.5 -10.093889 0.0 0.0
## 2 1981-01-03 -18.5 -16.093889 0.0 0.0
## 3 1981-01-04 -19.6 -17.193889 0.0 0.0
## 4 1981-01-05 -9.4 -6.993889 0.0 0.0
So, what is the projected change to summertime CDDs? Let’s take a look.
Python
tor_ext[tor_ext.Date.dt.month.isin([6,7,8])].filter(regex = '^CDD').sum()
## CDDObs 9654.500000
## CDDhex2020s 15602.942425
## dtype: float64
It looks like we can expect an increase from 9654.5 total CDDs in the baseline to 15602.9 in the 2020s. This could have serious implications for energy supply for air conditioning!
Great! That’s one projection down; 11 more to go to fill in our first table. This looks like it could turn out to be a monotonous process. Let’s whip up a function to do the work for us! We will create a function that takes five arguments: obs is a two-column data frame containing the 'Date' and variable columns; anoms is the respective dat_* table; stat is the statistic that we want to calculate for the climate index, e.g. sum; expr is a lambda expression that we evaluate to calculate the climate index (some examples are in Appendix A); finally, month is a list of one or more months to filter on, defaulting to None. If in doubt, the function has a complete docstring to help you to choose values for each parameter.
Python
from math import isnan

def recalc_exp(obs, anoms, stat, expr, month = None):
    """Recalculate and aggregate a climate index using observed data
    and Conjuntool output.

    Parameters
    ----------
    obs : pandas.core.frame.DataFrame
        A two-column data frame with 'Date' and some variable column
    anoms : pandas.core.frame.DataFrame
        The data frame produced by Conjuntool, filtered as necessary
    stat : builtin_function_or_method or function
        The function to call on the calculated climate index, e.g. sum
    expr : str
        A lambda expression to be applied to the variable column to
        calculate the climate index
    month : list
        A list of the months to filter on (optional)
    """
    row_list = []
    if month is not None:
        obs = obs[obs.Date.dt.month.isin(month)]
    baseline = stat(obs.iloc[:,1].apply(eval(expr)))
    for model in set(anoms.Model):
        for scen in set(anoms.Scenario):
            periods = {}
            for period in [col for col in anoms.columns if '-' in col][1:]:
                anom = anoms.loc[(anoms.Scenario == scen) & (anoms.Model == model)][[period]].values
                if len(anom) == 0:
                    continue
                else:
                    anom = anom[0]
                tmp = obs.assign(new = obs.iloc[:,1].values + anom)
                tmp = stat(tmp.new.apply(eval(expr)))
                periods[period] = tmp
            if all(isnan(value) for value in periods.values()):
                continue
            else:
                row = {'Model': model,
                       'Scenario': scen,
                       'Baseline': baseline}
                row = {**row, **periods}
                row_list.append(row)
    return pd.DataFrame(row_list, columns=row.keys())
Let’s take a look at CDDs, based on our extreme models.
Python
recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_ext,
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100
## 0 KACE-1-0-G ssp126 9654.5 15602.942425 18958.131925 19077.083954
## 1 E3SM-1-1 ssp585 9654.5 16529.507491 25185.106916 35648.635664
## 2 CAMS-CSM1-0 ssp126 9654.5 10957.468592 11763.282749 12274.430012
## 3 CAMS-CSM1-0 ssp585 9654.5 11023.055417 14133.785986 18003.024134
As we might have expected, there is a larger increase in CDDs for the SSP5-8.5 scenario than for the SSP1-2.6 scenario, but the overall smallest and largest increases in CDDs depend on whether we consider the lower or upper model extremes.
The above table is missing an important piece of information. In our dat_ext table, we had labelled each model as providing an upper or lower extreme, but this information is not preserved by the recalc_exp() function. How can we recover the information? The simplest option might be to replace the values in dat_ext.Model with the values of dat_ext.Limit, but this, again, removes some key information from the results, specifically, the model that provided each projection. Another option would be to re-write recalc_exp() so that it preserves both pieces of information. The third option, however, is the one that we will use: we already know whether each model in our recalc_exp() output represents the upper or lower limit of our projections, since that information is present in dat_ext. We can use the same mechanism of merging two tables that we used in the GFCI section to create a table that includes both pieces of data.
Python
projected_cdds = recalc_exp(obs=tor[['Date', 'MeanTemp']],
                            anoms=dat_ext,
                            stat=sum,
                            expr="lambda x: 0 if x <= 18 else x - 18",
                            month=[6,7,8])

pd.merge(projected_cdds, dat_ext[['Scenario', 'Model', 'Limit']])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100 \
## 0 KACE-1-0-G ssp126 9654.5 15602.942425 18958.131925 19077.083954
## 1 E3SM-1-1 ssp585 9654.5 16529.507491 25185.106916 35648.635664
## 2 CAMS-CSM1-0 ssp126 9654.5 10957.468592 11763.282749 12274.430012
## 3 CAMS-CSM1-0 ssp585 9654.5 11023.055417 14133.785986 18003.024134
##
## Limit
## 0 Upper
## 1 Upper
## 2 Lower
## 3 Lower
We can also use our recalc_exp() function to give us the 30-year daily average CDD, instead of the 30-year total.
Python
from numpy import mean

pd.merge(recalc_exp(obs=tor[['Date', 'MeanTemp']],
                    anoms=dat_ext,
                    stat=mean,
                    expr="lambda x: 0 if x <= 18 else x - 18",
                    month=[6,7,8]),
         dat_ext[['Scenario', 'Model', 'Limit']])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100 Limit
## 0 KACE-1-0-G ssp126 3.498007 5.653240 6.868888 6.911987 Upper
## 1 E3SM-1-1 ssp585 3.498007 5.988952 9.125039 12.916172 Upper
## 2 CAMS-CSM1-0 ssp126 3.498007 3.970097 4.262059 4.447257 Lower
## 3 CAMS-CSM1-0 ssp585 3.498007 3.993861 5.120937 6.522835 Lower
Bonus exercise
The recalc_exp() function works on the daily values of each 30-year tridecade. Thus, it returns either the daily average, or the tridecadal total for a given climate index. These values might not be useful metrics for most analyses. It might be more useful to know the annual average or sum for a given metric. In our example the seasonal or annual total CDDs might be more useful to policy-makers.
For bonus marks: modify the recalc_exp() function above so that it first aggregates the daily values by year (obs.Date.dt.year), and then applies the calculation provided by the stat argument. For full marks, your function should also retain the original functionality of recalc_exp(). (Hint: you may need to add an extra parameter to the function.)
Once you have finished changing the code, generate a patch for the original function using software such as Pretty Diff. This will highlight the changes that you made. Submit this patch along with your regular “What to submit” exercises.
What if we want to calculate winter heating degree days? We’ll need to shift our Decembers up a year like we did in Lab 1. This isn’t strictly necessary at the moment, since we are looking at 30-year totals, but it is always a good habit to maintain and allows us to be more precise about the seasons in consideration. It will become especially relevant for those of you who undertake the bonus assignment in the box above.
Python
tor_win = tor.copy()
tor_win['Date'] = tor_win.Date.apply(lambda x: x + pd.DateOffset(years=1) if x.month == 12 else x)
# Not shifted
tor_win.head()
## Date MaxTemp MinTemp MeanTemp TotalPrecip
## 0 1981-01-01 -5.7 -10.2 -8.0 5.0
## 1 1981-01-02 -9.2 -15.8 -12.5 0.0
## 2 1981-01-03 -16.2 -20.7 -18.5 0.0
## 3 1981-01-04 -12.3 -26.9 -19.6 0.0
## 4 1981-01-05 -4.7 -14.0 -9.4 0.0
# Shifted
tor_win.tail()
## Date MaxTemp MinTemp MeanTemp TotalPrecip
## 4013 2011-12-27 -0.9 -10.7 -5.8 0.0
## 4014 2011-12-28 0.7 -3.9 -1.6 0.0
## 4015 2011-12-29 -0.2 -3.1 -1.7 0.0
## 4016 2011-12-30 5.7 -3.5 1.1 0.0
## 4017 2011-12-31 9.9 5.3 7.6 0.0
Let’s drop the years that we know to be incomplete, winter 1981/82, which is missing December 1980, and winter 2010/11, which is just December 2010. Remember, a better solution for the former would be to download data for December 1980 so that we can include all 30 winters.
Python
tor_win = tor_win[tor_win.Date.dt.year.isin(range(1982, 2011))]
Python
pd.merge(recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
                    anoms=dat_ext,
                    stat=sum,
                    expr="lambda x: 0 if x >= 18 else 18 - x",
                    month=[12,1,2]),
         dat_ext[['Scenario', 'Model', 'Limit']])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100 \
## 0 KACE-1-0-G ssp126 53028.5 46731.706466 43424.605320 43308.766224
## 1 E3SM-1-1 ssp585 53028.5 45810.596091 37433.732239 27568.678939
## 2 CAMS-CSM1-0 ssp126 53028.5 51561.889758 50684.554640 50140.278831
## 3 CAMS-CSM1-0 ssp585 53028.5 51489.559071 48212.908054 44358.582197
##
## Limit
## 0 Upper
## 1 Upper
## 2 Lower
## 3 Lower
As winter is projected to warm, we see fewer heating degree days. This change is more rapid in the scenario with more radiative forcing.
10.3.2 Using the multi-model ensemble
Since the recalc_exp() function expects our anoms data frame to include a 'Model' column, we should add one.
Python
dat_ens['Model'] = "Ensemble"
dat_ens
## Scenario 1981-2010 2011-2040 2041-2070 2071-2100 Model
## 0 ssp126 7.440728 1.437420 2.333416 2.488095 Ensemble
## 1 ssp245 7.379808 1.423465 2.716448 3.626614 Ensemble
## 2 ssp370 7.377921 1.417459 3.020552 4.830899 Ensemble
## 3 ssp585 7.487832 1.598217 3.677373 6.128601 Ensemble
Now we can take a look at the projected changes in our climate indices.
For CDDs:
Python
recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_ens[dat_ens.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100
## 0 Ensemble ssp126 9654.5 13108.585616 15412.316097 15818.150097
## 1 Ensemble ssp585 9654.5 13515.777228 18977.646700 25644.053599
For HDDs:
Python
recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
           anoms=dat_ens[dat_ens.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x >= 18 else 18 - x",
           month=[12,1,2])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100
## 0 Ensemble ssp126 53028.5 49266.77174 46921.950982 46517.154665
## 1 Ensemble ssp585 53028.5 48845.96539 43405.592042 36996.721784
10.3.3 Using our top-ranked models
How about our top-ranked models? Our function works as expected for CDDs:
Python
recalc_exp(obs=tor[['Date', 'MeanTemp']],
           anoms=dat_gf[dat_gf.Scenario.isin(['ssp126', 'ssp585'])],
           stat=sum,
           expr="lambda x: 0 if x <= 18 else x - 18",
           month=[6,7,8])
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100
## 0 CESM2-WACCM ssp126 9654.5 13777.868099 17514.166214 18207.868670
## 1 CESM2-WACCM ssp585 9654.5 14224.902849 21217.811561 29068.505112
## 2 CMCC-CM2-SR5 ssp126 9654.5 12693.562099 16859.193530 18830.468230
## 3 CMCC-CM2-SR5 ssp585 9654.5 12548.593088 18931.008055 25282.770380
## 4 ACCESS-ESM1-5 ssp126 9654.5 13140.683132 16343.610796 17020.117626
## 5 ACCESS-ESM1-5 ssp585 9654.5 14514.299627 19168.418528 25316.329875
OK, enough tables, let’s plot some of these changes! Here we will visualize changes in wintertime HDDs. Create a table, which we will call out, to save our projected changes.
Python
out = recalc_exp(obs=tor_win[['Date', 'MeanTemp']],
                 anoms=dat_gf[dat_gf.Scenario.isin(['ssp126', 'ssp585'])],
                 stat=sum,
                 expr="lambda x: 0 if x >= 18 else 18 - x",
                 month=[12,1,2])
out
## Model Scenario Baseline 2011-2040 2041-2070 2071-2100
## 0 CESM2-WACCM ssp126 53028.5 48576.564629 44838.623952 44157.880584
## 1 CESM2-WACCM ssp585 53028.5 48120.016417 41235.864345 33750.766336
## 2 CMCC-CM2-SR5 ssp126 53028.5 49699.413777 45483.780595 43548.988272
## 3 CMCC-CM2-SR5 ssp585 53028.5 49851.410532 43451.032152 37340.631156
## 4 ACCESS-ESM1-5 ssp126 53028.5 49233.504730 45995.152863 45324.812784
## 5 ACCESS-ESM1-5 ssp585 53028.5 47825.790597 43219.911168 37308.674625
In Lab 3, we used pd.pivot() to turn our table from a narrow layout to a wide layout. We stressed, however, that the narrow layout was more efficient for plotting. This time around, we have a wide data frame, which we want to convert to a narrow one. The method that we need to call, the antithesis to pd.pivot(), is pd.melt().
Python
out_long = out.melt(id_vars = ['Model', 'Scenario'], var_name = "Period", value_name = "Heating Degree Days")
Now we are ready to plot! Play around with the options in the seaborn (sns) plot below until you’ve come up with a figure you are happy with. Make sure to add complete labels (like a title, for instance).
Python
sns.catplot(x = "Period", y = "Heating Degree Days", hue = "Scenario", col = "Model",
            data = out_long, hue_order = ["ssp126", "ssp585"], saturation = 0.5,
            palette = "Reds", kind = "bar", ci = None, height = 7, aspect = 0.66)
Great! We have now seen how we can apply GCM-projected change factors to our observed data to project changes in a climate index. In our next lab, we will look at more advanced downscaling methods.
10.4 Exercises (what to submit)
- In Lab 1 (Sec. 4) we examined the “baseline” summertime Tropical Nights. Use the change factor method to project Tropical Nights into the future. Edit the following jupyter notebook and select the relevant variable and both the annual and seasonal change factor data. Describe differences in the baselines and change factors when you use the annual data versus seasonal data. What accounts for the differences? Provide a table(s) to support your answer [2 marks]
- Are the models that best reproduced the annual baseline the same models that best reproduce the seasonal baseline? If not, can we trust our validation results? Provide a table(s) to support your answer [2 marks]
- Calculate the changes to the Tropical Nights climate index using the annual and the seasonal data. What differences do you see? As in the lab, focus on SSP1-2.6 and SSP5-8.5. Provide a table(s) to support your answer [2 marks]
- Is it better to use the annual data or the seasonal data for a seasonal study? Why? [2 marks]
- Produce two multi-panel plots similar to Figure 10.2. One figure should plot the changes projected by the top-three “validated” models. The other should plot the changes projected by the two extremes and the ensemble average. Make sure that the figures are properly labelled and properly captioned. Hint: you may find it useful to assign a dummy “Limit” variable to the ensemble projections just like we assigned a dummy “Model” variable to dat_ens. You should also adjust the id_vars in pd.melt(), and adjust the col facet in sns.catplot(). If you are totally stuck, you can see an example here. [2 marks]
- Compare, contrast, and comment on the three model selection methods explored in this lab. What are their strengths and weaknesses? [5 marks]
The methods that we have seen above are quick and useful for pedagogy, but there are lots of other, more advanced ways to choose models for CCIA studies. A lot of recent work has gone toward the identification of optimal sub-ensembles, using a handful of the most relevant models. You should do some independent reading on some of these methods (e.g. constrained projections) to ensure that your toolbox is up-to-date!