Sec. 4 Lab exercise 1: Understanding the recent past

Last update: Fri Jun 9 12:12:10 2023 +0000 (444a736)

The first step of a climate change impact assessment (CCIA) is to determine the metric or metrics that we will use to understand impacts. You could choose any element or variable that is directly or indirectly affected by changes in the climate. For example, in the newest IPCC report, the term “climatic impact-driver” (CID) is used to describe “physical climate system conditions (e.g., means, events, extremes) that affect an element of society or ecosystems” (IPCC 2021). The IPCC report groups CIDs into seven categories: heat & cold (e.g. extreme heat, frost); wet & dry (e.g. heavy precipitation, fire weather); wind (e.g. tropical cyclones); snow & ice (e.g. permafrost, hail); “other” (e.g. air pollution weather); coastal (e.g. relative sea level); and open ocean (e.g. ocean salinity). Other terms are common too, such as “indices”, as used by Climdex, or simply “variables”, as used by ClimateData.ca.

The examples that we listed above are all physical variables, but your variable of interest could be a derived variable. For instance, perhaps you are interested in understanding whether there is a link between climate change and school continuity in a “Snowbelt” community such as Barrie, Ontario. Such a study may use an index such as annual school closures, or “snow days”, to determine whether changes to physical variables such as winter temperatures and snowfall amounts have social effects.

In this lab you will acquire Canadian climate data for Toronto, Ontario, and perform a baseline analysis of multiple climatic indices. For further examples of climate indices or variables, and examples of Python functions to calculate them, refer to Appendix A.

What do we mean by baseline? A baseline period is needed to define the observed climate. This observed climate information is combined with GCM output of climate change to determine the climate change impact for a given future climate scenario (e.g. Representative Concentration Pathway (RCP)). When using climate model output, the baseline also serves as the reference period from which the modelled future change in climate is calculated. Future climate change is defined by comparing climate statistics for the baseline period with similar statistics for a future period of the same length, for a given climate scenario.
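
To make the baseline idea concrete, here is a minimal sketch of the “delta” approach; the numbers below are placeholders, not real observations or model output.

Python

# Illustrative only: placeholder numbers, not real data.
# The climate change signal ("delta") for a scenario is the difference between
# a statistic computed over a future period and the same statistic computed
# over a baseline period of the same length.
baseline_mean_temp = 8.9   # e.g. a 1981-2010 baseline mean temperature (°C)
future_mean_temp = 11.2    # e.g. a 2041-2070 GCM mean temperature for one RCP (°C)
delta = future_mean_temp - baseline_mean_temp
print(f"Projected change for this scenario: {delta:+.1f} °C")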

Environment and Climate Change Canada (ECCC) makes data available through an online portal. You can visit this page and download data manually, but that is a very tedious process. In this lab, we will use a custom Python module to make it easier to download these data in bulk.

4.1 Getting the data

ECCC makes data available via a web server and gives instructions for downloading data manually via Bash and Wget. You can find instructions for using the command line to download data on Google Drive.

For convenience, we have written a small Python module, called ec3 to search for and download data from the ECCC archive. You can review the source code for this module on GitLab. ec3 is also available as a stand-alone command-line program, so if you have a colleague who is afraid to learn how to code, they can still use ec3 to speed up the process of downloading data from ECCC.

If you followed the instructions in Section 3, then you already have ec3 installed. If you didn’t, you can install it via Conda. Note that ec3 is not available in the official Anaconda channels and has some dependencies from the “conda-forge” channel. First, prepend conda-forge to your channels list so that packages are installed from there first, then append the claut channel (Conor’s personal channel) to the end of the list so that it is only used for packages that are not available in any other channel.

Shell

conda config --prepend channels conda-forge
conda config --append channels claut
conda install ec3

Once you have the ec3 module installed, import it.

Python

import ec3

The ec3 module contains two functions: ec3.find_station(), and ec3.get_data(). The functions have complete docstrings, so you can check the function documentation for syntax, e.g.

Python

print(ec3.get_data.__doc__)
## Download data from the Environment and Climate Change Canada
##     historical data archive
## 
##     Optional Parameters
##     ----------
##     stations : int or list
##         One or more station codes to download.
##     type : int or str
##         The type of data to search for (required if period is not None)
##         Options are: 1, hourly; 2, daily; 3, monthly
##     years : int or range
##         Range of years for which to download data (does not apply to monthly)
##     months : int or range
##         Range of months for which to download data (only applies to hourly)
##     progress : Boolean
##         Whether to show the progress bar.
## 

Let’s download the daily data for the “Toronto” station (No. 5051) from 1981 to 2010. We are starting with this 30-year time period to generate some baseline statistics; however, you will be asked to examine the 1991-2020 time period as well for your lab submission.

We can request the dates that we want using a Python range. Remember that Python ranges are “left inclusive, right exclusive”. That means that range(1981, 2011) will give us data from 1981 up to (but not including) 2011. Note that we use progress=False to turn off the progress bar here because it interferes with the book rendering. You can omit the progress argument.

Python

tor = ec3.get_data(stations=5051, years=range(1981, 2011), progress=False)
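
If you want to double-check those endpoints, a quick sanity check (illustrative only):

Python

yrs = range(1981, 2011)
print(min(yrs), max(yrs))  # prints 1981 2010 (the right endpoint, 2011, is excluded)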

The ec3 module returns historical climate data in a data structure called a DataFrame (or data frame), which comes from the pandas library. A data frame is a tabular data object, and includes features that users of spreadsheet software might be used to, like labelled columns. pandas provides us with a lot of power and flexibility to organize, modify, and plot our data. It is particularly useful for time series, like our data. For more information on pandas, refer to the documentation.
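
Before cleaning the data, it is worth getting oriented. The sketch below uses a few standard pandas inspection methods; the exact output will depend on the data you downloaded.

Python

# Standard ways to take a first look at a freshly downloaded DataFrame.
print(tor.shape)       # (number of rows, number of columns)
print(tor.dtypes)      # the data type of each column
print(tor.describe())  # summary statistics for the numeric columns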

4.2 Cleaning the data

Now that we have the data that we need, we can start working with it. First, let’s import the libraries we are going to need to manipulate our data.

Python

import numpy as np
import pandas as pd

Let’s take a look at what our columns are named.

Python

tor.columns
## Index(['Station', 'Longitude (x)', 'Latitude (y)', 'Station Name',
##        'Climate ID', 'Date/Time', 'Year', 'Month', 'Day', 'Data Quality',
##        'Max Temp (°C)', 'Max Temp Flag', 'Min Temp (°C)', 'Min Temp Flag',
##        'Mean Temp (°C)', 'Mean Temp Flag', 'Heat Deg Days (°C)',
##        'Heat Deg Days Flag', 'Cool Deg Days (°C)', 'Cool Deg Days Flag',
##        'Total Rain (mm)', 'Total Rain Flag', 'Total Snow (cm)',
##        'Total Snow Flag', 'Total Precip (mm)', 'Total Precip Flag',
##        'Snow on Grnd (cm)', 'Snow on Grnd Flag', 'Dir of Max Gust (10s deg)',
##        'Dir of Max Gust Flag', 'Spd of Max Gust (km/h)',
##        'Spd of Max Gust Flag'],
##       dtype='object')

Hmm, some of those are a little cumbersome. Let’s rename the columns we are going to use and then select only those columns.

Python

tor = tor.rename(columns={'Date/Time': 'Date', 'Max Temp (°C)': 'MaxTemp', 'Min Temp (°C)': 'MinTemp', 'Mean Temp (°C)': 'MeanTemp', 'Total Precip (mm)': 'TotalPrecip'})
tor = tor[['Date', 'MaxTemp', 'MinTemp', 'MeanTemp', 'TotalPrecip']]

Great. Let’s take a look at what we’ve got.

Python

tor.head()
##          Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 0  1981-01-01     -5.7    -10.2      -8.0          5.0
## 1  1981-01-02     -9.2    -15.8     -12.5          0.0
## 2  1981-01-03    -16.2    -20.7     -18.5          0.0
## 3  1981-01-04    -12.3    -26.9     -19.6          0.0
## 4  1981-01-05     -4.7    -14.0      -9.4          0.0

Python

tor.tail()
##              Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 10952  2010-12-27      NaN      NaN       NaN          0.0
## 10953  2010-12-28      NaN      NaN       NaN          0.0
## 10954  2010-12-29      NaN      NaN       NaN          0.0
## 10955  2010-12-30      NaN      NaN       NaN          0.0
## 10956  2010-12-31      NaN      NaN       NaN          1.2

Uh oh! It looks like we are missing all of our temperature data at the end of our baseline period. We can test this theory with a generator expression. The following will give us the index of the row immediately before the first row on which all three of our temperature variables of interest are missing.

Python

next(i for i, x in enumerate(tor.loc[:, 'MaxTemp':'MeanTemp'].isnull().all(axis=1)) if x) - 1
## 8215

Let’s take a look at the context around that index.

Python

tor[8211:8221]
##             Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 8211  2003-06-26     31.1     21.0      26.1          1.0
## 8212  2003-06-27     25.9     18.5      22.2          0.0
## 8213  2003-06-28     27.5     15.4      21.5          0.0
## 8214  2003-06-29     25.1     17.4      21.3          6.2
## 8215  2003-06-30     24.8     15.3      20.1          0.0
## 8216  2003-07-01      NaN      NaN       NaN          NaN
## 8217  2003-07-02      NaN      NaN       NaN          NaN
## 8218  2003-07-03      NaN      NaN       NaN          NaN
## 8219  2003-07-04      NaN      NaN       NaN          NaN
## 8220  2003-07-05      NaN      NaN       NaN          NaN

It looks like we’re missing data as of July 2003! Something odd has happened. Let’s look for a nearby station (within 2 km) that we can use!

Python

ec3.find_station(target=5051, dist=range(3))
##                             Name Province Climate ID  Station ID   WMO ID  \
## 6563                     TORONTO  ONTARIO    6158350        5051  71266.0   
## 6564     TORONTO SOLAR RADIATION  ONTARIO    6158352       41863  71626.0   
## 6565                TORONTO CITY  ONTARIO    6158355       31688  71508.0   
## 6583           TORONTO DEER PARK  ONTARIO    6158417        5065      NaN   
## 6503  PA MATTAMY ATHLETIC CENTRE  ONTARIO    6156175       52723      NaN   
## 
##       ... DLY First Year DLY Last Year MLY First Year  MLY Last Year  \
## 6563  ...         1840.0        2017.0         1840.0         2006.0   
## 6564  ...         2018.0        2018.0            NaN            NaN   
## 6565  ...         2002.0        2023.0         2003.0         2006.0   
## 6583  ...         1890.0        1933.0         1890.0         1933.0   
## 6503  ...            NaN           NaN            NaN            NaN   
## 
##                        Dist  
## 6563                 0.0 km  
## 6564                 0.0 km  
## 6565                 0.0 km  
## 6583  1.9585072748345775 km  
## 6503   1.958726710148121 km  
## 
## [5 rows x 20 columns]

By the looks of it, “Toronto” and “Toronto City” are co-located! Indeed, in 2003 the “Toronto” station was renamed “Toronto City” and re-coded. Since then, the daily data have been available under station code 31688.

Let’s grab that data set as of 2000 and compare with station 5051.

Python

tor2 = ec3.get_data(stations=31688, years=range(2000, 2011), progress=False)

The ECCC data has a standard format, so our column names for tor2 are going to be the same nasty column names that we had in tor. Let’s rename them and select the columns we are interested in.

Python

tor2 = tor2.rename(columns={'Date/Time': 'Date', 'Max Temp (°C)': 'MaxTemp', 'Min Temp (°C)': 'MinTemp', 'Mean Temp (°C)': 'MeanTemp', 'Total Precip (mm)': 'TotalPrecip'})
tor2 = tor2[['Date', 'MaxTemp', 'MinTemp', 'MeanTemp', 'TotalPrecip']]

We can check the head and the tail of this data too.

Python

tor2.head()
##          Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 0  2000-01-01      NaN      NaN       NaN          NaN
## 1  2000-01-02      NaN      NaN       NaN          NaN
## 2  2000-01-03      NaN      NaN       NaN          NaN
## 3  2000-01-04      NaN      NaN       NaN          NaN
## 4  2000-01-05      NaN      NaN       NaN          NaN

Python

tor2.tail()
##             Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 4013  2010-12-27     -0.9    -10.7      -5.8          0.0
## 4014  2010-12-28      0.7     -3.9      -1.6          0.0
## 4015  2010-12-29     -0.2     -3.1      -1.7          0.0
## 4016  2010-12-30      5.7     -3.5       1.1          0.0
## 4017  2010-12-31      9.9      5.3       7.6          0.0

As expected, we’re missing data at the beginning of the file (before 2003). We can use a slightly modified version of our generator expression from earlier. This time, it will give us the index of the row immediately before the first row for which any of the temperature data is present.

Python

next(i for i, x in enumerate(tor2.loc[:, 'MaxTemp':'MeanTemp'].notnull().any(axis=1)) if x) - 1
## 884

We’ll take a look at that in context again.

Python

tor2[880:890]
##            Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 880  2002-05-30      NaN      NaN       NaN          NaN
## 881  2002-05-31      NaN      NaN       NaN          NaN
## 882  2002-06-01      NaN      NaN       NaN          NaN
## 883  2002-06-02      NaN      NaN       NaN          NaN
## 884  2002-06-03      NaN      NaN       NaN          NaN
## 885  2002-06-04     14.1      9.1      11.6          4.8
## 886  2002-06-05     23.5     11.3      17.4          3.4
## 887  2002-06-06     17.8     12.8      15.3          0.0
## 888  2002-06-07     21.4     11.0      16.2          0.0
## 889  2002-06-08     24.9     11.7      18.3          0.0

It looks like our data starts as of June 04, 2002.

OK, so we know that we have data at the old station until June 30, 2003, and data at the new station from June 4, 2002, so the two records overlap by about a year. Let’s take a look at the overlap and see if we can merge these files without issue.

Python

tor[(tor.Date >= "2003-06-25") & (tor.Date <= "2003-07-05")]
##             Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 8210  2003-06-25     33.9     20.3      27.1          0.0
## 8211  2003-06-26     31.1     21.0      26.1          1.0
## 8212  2003-06-27     25.9     18.5      22.2          0.0
## 8213  2003-06-28     27.5     15.4      21.5          0.0
## 8214  2003-06-29     25.1     17.4      21.3          6.2
## 8215  2003-06-30     24.8     15.3      20.1          0.0
## 8216  2003-07-01      NaN      NaN       NaN          NaN
## 8217  2003-07-02      NaN      NaN       NaN          NaN
## 8218  2003-07-03      NaN      NaN       NaN          NaN
## 8219  2003-07-04      NaN      NaN       NaN          NaN
## 8220  2003-07-05      NaN      NaN       NaN          NaN

Python

tor2[(tor2.Date >= "2003-06-25") & (tor2.Date <= "2003-07-05")]
##             Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 1271  2003-06-25     33.6     20.8      27.2          0.0
## 1272  2003-06-26     31.1     19.7      25.4          1.0
## 1273  2003-06-27     26.0     16.2      21.1          0.2
## 1274  2003-06-28     27.5     15.7      21.6          0.0
## 1275  2003-06-29     25.1     17.4      21.3          7.0
## 1276  2003-06-30     24.8     15.2      20.0          0.0
## 1277  2003-07-01     28.4     17.0      22.7          0.0
## 1278  2003-07-02     29.4     16.9      23.2          0.0
## 1279  2003-07-03     31.3     18.2      24.8          0.0
## 1280  2003-07-04     31.0     21.3      26.2          0.0
## 1281  2003-07-05     31.6     19.7      25.7          0.0
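
Before merging, it can help to quantify how close the two records are over their common dates. Here is a minimal sketch (an extra sanity check, not part of the original workflow) that compares the overlapping days:

Python

# Sketch: compare the two stations over every date that appears in both records.
overlap = tor.merge(tor2, on="Date", suffixes=("_5051", "_31688"))
for var in ["MaxTemp", "MinTemp", "MeanTemp", "TotalPrecip"]:
    diffs = (overlap[var + "_5051"] - overlap[var + "_31688"]).abs()
    print(var, "mean absolute difference:", round(diffs.mean(), 2))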

The data on the overlapping days is virtually identical between the two station codes, so we can be confident that merging these two data sets won’t lead to a sudden temperature bump. Let’s concatenate the relevant sections of tor and tor2.

Python

tor = pd.concat([tor[tor.Date < "2003-07-01"], tor2[tor2.Date >= "2003-07-01"]])

Now we can make sure that we have a full data set from January 1981 to December 2010.

Python

tor.head()
##          Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 0  1981-01-01     -5.7    -10.2      -8.0          5.0
## 1  1981-01-02     -9.2    -15.8     -12.5          0.0
## 2  1981-01-03    -16.2    -20.7     -18.5          0.0
## 3  1981-01-04    -12.3    -26.9     -19.6          0.0
## 4  1981-01-05     -4.7    -14.0      -9.4          0.0

Python

tor.tail()
##             Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip
## 4013  2010-12-27     -0.9    -10.7      -5.8          0.0
## 4014  2010-12-28      0.7     -3.9      -1.6          0.0
## 4015  2010-12-29     -0.2     -3.1      -1.7          0.0
## 4016  2010-12-30      5.7     -3.5       1.1          0.0
## 4017  2010-12-31      9.9      5.3       7.6          0.0
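
The head and tail only show the endpoints. A fuller check compares the number of rows against the number of calendar days in the period and tallies any values that are still missing; a quick sketch:

Python

# Sketch: is there one row per calendar day, and how many values remain missing?
expected_days = pd.date_range("1981-01-01", "2010-12-31", freq="D").size
print("Rows:", len(tor), "Expected days:", expected_days)
print(tor[['MaxTemp', 'MinTemp', 'MeanTemp', 'TotalPrecip']].isnull().sum())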

Save your new data set to a CSV file:

Python

tor.to_csv("tor.csv")

4.3 Analyzing the data

Now let’s take a look at our data. We will require some additional libraries.

Python

import matplotlib as mpl
mpl.rc('font', size = 12)
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import scipy.stats as stats

As we mentioned in the introduction, the term “climate index” is often used to describe the variable upon which we will focus our analysis of climate change impacts. In this section, we’ll calculate two climate indices, starting with a rather simple Boolean (True/False) variable.

At the time of writing, the outside temperature was a sweltering 32 °C (36 with the humidex), so what better climate index to consider than “tropical nights”: nights with a minimum temperature above 20 °C? We will limit our analysis to the summer months.

This code requires that your “Date” column is recognized as such by Python. This isn’t always automatic, so you will likely need to ask pandas to convert the column to a datetime object.

Python

tor.Date = pd.to_datetime(tor.Date)

4.3.1 Example: Tropical nights

Now let’s start by adding a new column to our table called “TropNight”, which will be a True/False value.

Python

tor['TropNight'] = ((tor.MinTemp > 20) & (tor.Date.dt.month.isin([6, 7, 8])))
tor.head()
##         Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip  TropNight
## 0 1981-01-01     -5.7    -10.2      -8.0          5.0      False
## 1 1981-01-02     -9.2    -15.8     -12.5          0.0      False
## 2 1981-01-03    -16.2    -20.7     -18.5          0.0      False
## 3 1981-01-04    -12.3    -26.9     -19.6          0.0      False
## 4 1981-01-05     -4.7    -14.0      -9.4          0.0      False

Since our value is Boolean, we can use that column as an index. Let’s double-check to make sure that we got the results that we were expecting.

Python

tor[tor.TropNight].head()
##           Date  MaxTemp  MinTemp  MeanTemp  TotalPrecip  TropNight
## 166 1981-06-16     30.1     21.5      25.8          1.2       True
## 180 1981-06-30     25.6     20.6      23.1          4.6       True
## 184 1981-07-04     26.2     20.1      23.2          0.4       True
## 188 1981-07-08     33.2     22.5      27.9          0.0       True
## 189 1981-07-09     32.0     24.5      28.3          0.0       True

In Python, True is equal to 1, and False is equal to 0. So, how do we turn this into annual data? The year is stored within the datetime object in the 'Date' column, so we can get the total annual tropical nights by grouping our data frame by year with the resample() method and taking the sum of the 'TropNight' column for each group.

Python

tor_trop = tor[['Date', 'TropNight']].resample('Y', on='Date').sum()
tor_trop.head()
##             TropNight
## Date                 
## 1981-12-31          8
## 1982-12-31         11
## 1983-12-31         25
## 1984-12-31         21
## 1985-12-31          6

Let’s see that on a plot!

Python

tor_trop.plot()

Figure 4.1: Total summertime tropical nights at Toronto (1981‒2010).

If you don’t see a figure after running the code in the previous block, you may need to make an explicit call to plt.show(). If you are using Jupyter Notebook, you can add %matplotlib inline somewhere after your import statement for matplotlib, and plots will then appear automatically in your notebook when they are created. Seeing your plots as you build them is useful for making sure that they look the way you expect. Once you have the perfect plot, you can save it to disk using plt.savefig().
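
For example, a minimal save-to-disk sketch (the file name and resolution are arbitrary choices):

Python

# Re-draw the figure and save the current figure to disk.
tor_trop.plot()
plt.savefig("tor_tropical_nights.png", dpi=150, bbox_inches="tight")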

The resampling operation has changed our table index to the December 31st timestamp of each year from 1981 to 2010. Let’s pull the year out so that we can use it for plotting:

Python

tor_trop['Date'] = tor_trop.index.year
tor_trop.head()
##             TropNight  Date
## Date                       
## 1981-12-31          8  1981
## 1982-12-31         11  1982
## 1983-12-31         25  1983
## 1984-12-31         21  1984
## 1985-12-31          6  1985

Now let’s make another plot, this time with a trendline.

Python

plt.plot(tor_trop.Date, tor_trop.TropNight)
plt.plot(tor_trop.Date, np.polyval(np.polyfit(tor_trop.Date, tor_trop.TropNight, 1), tor_trop.Date))
plt.title("Total summertime tropical nights at Toronto (1981‒2010)")
plt.xlabel("Year")
plt.ylabel("No. of Tropical Nights")

Figure 4.2: Total summertime tropical nights at Toronto (1981‒2010), with linear trend line.

4.3.2 Example: Heating degree days

Let’s try a slightly more complicated climate index: winter heating degree days (HDDs). HDDs are used as an indirect measure of energy demand from building heating systems in winter. In summer, cooling degree days are used as a similar measure of energy demand from building cooling systems. HDDs represent the difference between a base temperature of 18 °C and the mean temperature. Since HDDs can’t be negative, they are only accumulated when the mean temperature is below 18 °C; a day with a mean temperature of 10 °C, for example, contributes 8 HDDs.

We can apply an anonymous (lambda) function to our MeanTemp column to capture the following pseudocode expression:

return 0 if the mean temperature was at or above 18 °C; otherwise, return the difference between 18 °C and the mean temperature

Python

tor['HDD'] = tor.MeanTemp.apply(lambda x: 0 if x >= 18 else 18 - x)
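
An equivalent vectorized alternative (a sketch, not a replacement for the line above) uses Series.clip to floor negative values at zero:

Python

# Vectorized equivalent: subtract from the 18 °C base and floor at 0.
# Days with a missing MeanTemp remain missing (NaN) in both versions.
tor['HDD'] = (18 - tor.MeanTemp).clip(lower=0)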

We have one more challenge to overcome. A common mistake when performing a seasonal analysis of winter is to simply group December, January, and February by calendar year. This is an easy logical jump to make, but when we refer to winter, we mean the continuous winter season that stretches from December of the previous year to February of the current year. The easiest way to account for this is to add 1 to the year for any December. Winter 1981/82, for example, will then appear under 'Year' 1982 in the data frame.

Python

tor['Year'] = tor.Date.apply(lambda x: x.year + 1 if x.month == 12 else x.year)
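
To confirm that the adjustment did what we expect, a quick illustrative check:

Python

# December rows should now carry the following year in the 'Year' column.
print(tor.loc[tor.Date.dt.month == 12, ['Date', 'Year']].head())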

Now we can aggregate our heating degree days. Pay close attention to the code below, as it “chains” many operations together into a single line. First, we use two conditional filters in the square brackets. The first condition (tor.Year.isin(range(1982, 2011))) selects the years 1982 through 2010. We use this filter to drop the two winters that we know are incomplete: winter 1980/81 (which is missing December 1980), and winter 2010/11, which is really just December 2010 here. (A better solution for this issue would be to download data including December 1980.) The second condition (tor.Date.dt.month.isin([12, 1, 2])) filters out all months except for the winter months: 12, December; 1, January; and 2, February. After filtering, we select the two columns that are of interest ('Year' and 'HDD'). This time around, we don’t need to use resample() to group our data by year, since we already created a brand new 'Year' column. Instead, we use groupby() and then aggregate to annual total HDDs with sum(). Finally, we reset the index.

Python

tor_hdd = tor[tor.Year.isin(range(1982, 2011)) & tor.Date.dt.month.isin([12, 1, 2])][['Year', 'HDD']].groupby('Year').sum().reset_index()
tor_hdd.head()
##    Year     HDD
## 0  1982  2027.9
## 1  1983  1660.8
## 2  1984  1957.8
## 3  1985  1856.6
## 4  1986  1964.8

If the line of code above is too complicated, you can break it down into little pieces. See the comments in the code block below.

Python

# First, filter the Toronto data to only include years from 1982 to 2010
# Remember that we just adjusted the year for all Decembers by + 1
tor_hdd = tor[tor.Year.isin(range(1982, 2011))]

# Now filter to include only December, January and February
tor_hdd = tor_hdd[tor_hdd.Date.dt.month.isin([12, 1, 2])]

# Next, select only the 'Year' and 'HDD' columns:
tor_hdd = tor_hdd[['Year', 'HDD']]

# Group the data by year, and take the sum of the 'HDD' column
tor_hdd = tor_hdd.groupby('Year').sum()

# Reset the index of the data frame
tor_hdd = tor_hdd.reset_index()

Python

plt.plot(tor_hdd.Year, tor_hdd.HDD)
plt.plot(tor_hdd.Year, np.polyval(np.polyfit(tor_hdd.Year, tor_hdd.HDD, 1), tor_hdd.Year))
plt.title("Winter HDD at Toronto (1981/82‒2009/10)")
plt.xlabel("Year")
plt.ylabel("Heating Degree Days  (HDD)")

Figure 4.3: Total wintertime heating degree days (HDD) at Toronto (1981‒2010), with linear trend line.

Let’s see if we can detect any trends in our baseline values. For convenience, we are going to create a function that will print the relevant values. You can run these commands directly, but this might save you time when you get to the exercises (Hint!).

Python

def test_trends(years, values):
    # Perform linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(years, values)
    # Print the regression statistics
    print("The regression coefficients are", np.round(slope, 3), "for the slope and",
          np.round(intercept, 1), "for the intercept\n")

    # Calculate confidence intervals
    t_crit = stats.t.ppf(0.975, len(years) - 1)
    confidence_interval = t_crit * std_err
    print("The true value of the slope is then", np.round(slope, 3), "+/-",
          np.round(confidence_interval, 3),"\n")

    # Get Pearson's coefficient
    pearsons_corrcoef, p_corr = stats.pearsonr(years,  values)

    # Describe the significance level
    levels = [0.001, 0.01, 0.05, 0.1]
    lvl = [i for i in levels if p_corr < i]

    # Print the correlation statistics
    print("The correlation is", np.round(pearsons_corrcoef, 3), "with a p-value of",
          np.round(p_corr, 5), "(not significant)\n" if lvl == [] else
          "(significant at the " + str(min(lvl)) + " level)\n")

    # Calculate and print the R²
    print("The variance in", values.name, "explained by the linear trend is",
          "quantified by the R²: R² =",
          np.round(100 * pearsons_corrcoef**2, 3), '%.\n')

Let’s use this new function to see some key features of the relationship between the year and the annual summertime tropical nights.

Python

test_trends(tor_trop.Date, tor_trop.TropNight)
## The regression coefficients are 0.037 for the slope and -57.5 for the intercept
## 
## The true value of the slope is then 0.037 +/- 0.373 
## 
## The correlation is 0.038 with a p-value of 0.84 (not significant)
## 
## The variance in TropNight explained by the linear trend is quantified by the R²: R² = 0.148 %.

At first glance, it appears that there is a positive correlation between year and tropical nights (i.e. tropical nights are increasing over time). However, the slope of the trend is only 0.037 tropical nights per year, and the confidence intervals are very wide (±0.373). The fact that the confidence intervals cross zero and the large \(p\)-value tell us that there doesn’t appear to be a trend after all. In fact, the \(R^2\) tells us that only about 0.15% of the variance in tropical nights can be explained by the linear trend.

What about heating degree days?

Python

test_trends(tor_hdd.Year, tor_hdd.HDD)
## The regression coefficients are -2.163 for the slope and 6145.1 for the intercept
## 
## The true value of the slope is then -2.163 +/- 6.165 
## 
## The correlation is -0.137 with a p-value of 0.47862 (not significant)
## 
## The variance in HDD explained by the linear trend is quantified by the R²: R² = 1.876 %.

For heating degree days, we see the opposite: there is an apparent negative trend (i.e. fewer heating degree days), but we again have confidence intervals that cross zero, a large \(p\)-value, and a small \(R^2\).

You may be asking yourself how it is that we didn’t see any trends in tropical nights or in HDD. If the overwhelming scientific consensus is that the world is getting warmer, why didn’t we see trends in our climatic indices? Remember that the climate system is very noisy, and we focused our analysis on a single place over a single period of time. The general rule is that climate analysis requires at least 30 years of data, but even 30 years may be insufficient to detect long-term trends. We also examined two very different climate indices. Tropical nights, for instance, are a relatively extreme index at Toronto, so it may be unrealistic to expect a statistically significant trend. For a look at how the length of our period affects trends in winter temperatures, see Anderson and Gough (2017). For another paper on long-term trend analysis at Toronto, see Mohsin and Gough (2010).

4.4 Exercises (what to submit)

Write up your lab exercise in short report format.

  • You have been assigned a climate index and are asked to assess this climate index over two different 30-year baseline time periods. Write a brief description of the climate index, whether there were any missing or suspicious data, and how these data were treated. [2 marks]
  • Include the time series plots with fitted trend lines for your climate index and for the corresponding temperature variable for both baseline time periods, e.g. MeanTemp for freezing degree days. [4 marks]
  • Include a table with the summary statistics for your climate index and the corresponding temperature variable for both baseline time periods (i.e., the climatological mean and standard deviation, the slope and the 95% confidence intervals, the correlation and the corresponding p-value and the \(R^2\)). Describe the results in a short paragraph. [4 marks]
  • Compare your plots and statistics with your classmates who were assigned the same climate index, but for a different season. Consider how the different season may influence the results. Write up your comparison in a short paragraph. [2 marks]
  • Be sure to clearly label your plots and table and include concise figure and table captions. [2 marks]

References

Anderson, Conor I., and William A. Gough. 2017. “Evolution of Winter Temperature in Toronto, Ontario, Canada: A Case Study of Winters 2013/14 and 2014/15.” Journal of Climate 30 (14): 5361–76. https://doi.org/10.1175/JCLI-D-16-0562.1.
IPCC. 2021. “Summary for Policymakers.” In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, edited by V. Masson-Delmotte, P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, et al. Cambridge University Press.
Mohsin, Tanzina, and William A. Gough. 2010. “Trend Analysis of Long-Term Temperature Time Series in the Greater Toronto Area (GTA).” Theoretical and Applied Climatology 101 (3–4): 311–27. https://doi.org/10.1007/s00704-009-0214-x.