Exercise 2 - Rainfall data analysis

The variation in rainfall, between seasons, over decades and around the continent is an important driver of ecosystems, land-cover, agriculture and water availability for development. This exercise is focused on i) accessing rainfall data (in Australia), ii) undertaking statistical analysis of the data, and then iii) seeking to understand WHY we see variability in rainfall amounts.

Note that this exercise forms the first out of three sessions associated with your Assignment 1A. You will be required to share a summary of your data to a compiled version of the class data, which will be needed in subsequent exercises to complete your assignment.

Objective

By the end of this activity, you should:

  • Be able to source rainfall, understand the nature of that data, perform tasks to summarise the data, and describe it using simple statistics in Excel.
  • Understand how and why rainfall may be changing over time and use statistics and regression to test this hypothesis.
  • Know how different modes of climate variability such as IOD and ENSO can affect annual rainfall.

Sourcing rainfall data

The previous exercise focused on estimating rainfall and runoff at catchment scales during a single (daily) storm event. In this exercise we will learn how to access long time-series of rain data in Australia, and examine the nature of rainfall data and use some simple statistics in Excel to describe these data.

1) Start by downloading the daily rainfall data for a chosen BOM site from the Bureau of Meteorology web site. Choose a site of interest in Western Australia. Use the site ID to get the data as below. Download and extract the zipped file to get a CSV file that you can then open in Excel.

Before moving forward with our analysis of this site, just open the dataset and make sure the site has a suitable period of data. Ideally, we need to be choosing a site with >30 years of data - the more the better. Also, refer to the “Data available for the selected station” small infographic on the site-selection panel to check what (complete) years of data you have.

Whilst this exercise is focused around using the raw data from BOM, other useful resoruces for accessing rainfall data include SILO, and openmeteo.

Statistical properties of rainfall data

2) Now, let’s plot the time series of DAILY rainfall. Lets do a ‘quick-and-dirty’ plot by selecting the date and rainfall amount columns, and then inserting a chart. Describe the main features of this data set. Consider: what is the most appropriate type of graph for rainfall data (bar, scatter, line)?

3) We’ll now make a more ‘human-readable’ plot by now focusing on annual rain amounts. SO in this step we will calculate the annual SUM of rainfall, for each year. But note, we want to ensure that there are only whole years to calculate the sums or else the incomplete years will be in error (the sum will be artificially low due to missing data). We can do this without having to delete rows by making a PIVOT table and counting number of good data days in each year.

3a) To explore each year’s data, select the required data across multiple columns (year column C to rainfall amount column F). Then click insert PIVOT table (select “Create a new tab” when prompted). Select Year and Rainfall amount as the FIELDS. Drag the Year FIELD into the ROWS box. (If it appears, left click on the “Sum of Year” in the VALUES box and remove it). You can select “COUNT” as the operation on the rainfall values to see how many entries are given for each year. Years with more than 10% of days missing (e.g <330 days counted per year), are unreliable to include in our analysis so see if your station has any incomplete years. Now add the Rainfall to the VALUES box again, but this time make the operation on the data the a SUM.

3b) Once you are satisfied which years have good data, plot the annual sums as a timeseries. Clean up the plot of the annual sum of rainfall as required (and labels etc). What are the main features of this annual data set? What variability can you see?

4) Now we have a “clean” dataset, we want to explore some statistics. Use the Data Analysis add-in to calculate descriptive statistics (mean annual rainfall, max annual rainfall, min annual rainfall, Standard deviation of annual rainfall, Skewness) for the whole data (you can install by going to Tools->Add-ins->Analysis ToolPack).

5) Next, let’s look at the distribution of the rainfall data. Do this by plotting a histogram of the rainfall data (you will need to decide on a reasonable bin size) and plotting the cumulative probability distribution (you can ask Excel to generate this automatically when it makes the histogram, or generate it manually – each data point represents a probability of 1/#data points). What do these plots tell you about the rainfall data? Are the data normally distributed?

6a) Plot the seasonal (monthly average) rainfall. Use a PIVOT table again. Insert PIVOT table (select OK to create a new tab). Select Month and Rainfall amount as the FIELDS; drag the Month FIELD into the ROWS box. Left click on the “Sum of Month” in the VALUES box and remove it. Plot the annual sums as a timeseries. Clean up the plot as required.

6b) Copy the results to the class shared EXCEL sheet, where you will be completing a table that looks something like Table 4. for your site (and others when they are done). Describe the values you got and what they mean. How “useful” are these statistics for your understanding of the system? What if any assumptions are there about these statistics?

6c) Looking at your site and others from around Western Australia, how would you describe the seasonal pattern of rainfall (hint consider the climate type)? What factors may influence it (hint, think about earth orbit)?

Table 4: Example Bureau of Meteorology Automatic Weather Station data for various station IDs across WA. You will need to complete a row in the online table for your site [refer to file on Teams].
Station Station ID Years Mean SD Min / Max
Swanbourne 9215
Perth 9225
Perth Airport 9021
Bickley 9240
Mundaring 9030
Chidlow 9007
Northam 10111
Kellerberrin 10073
Merredin 10092
Kalgoorlie-Boulder 12038
Leonora 12241
Cape Leeuwin 9518
Busselton 9515
Geraldton Town 8050
Carnarvon 6011
Exmouth Gulf 5004
Broome 3003
Wyndham 1006
Bridgetown Comparison 9510
Wagin 10647
Hyden 10568
Albany 9500
Esperance Downs 9631
Manjimup 9573
Narrogin 10614

Temporal trends in rainfall

Prior to starting this work, make sure you have completed the Steps 1 to 6 above!

Now we have explored the basic statistics of the rainfall time-series (including describing mean, variance, and seasonality), lets investigate if there are any long-term trends.

7a) For your chosen site, calculate the trend in rainfall for the last 5 years. Either copy and paste the annual totals from the PIVOT table (include the year and amount columns) to a new location for plotting or calculate manually. Select the last 5 years of data and in DATA ANALYSIS start a REGRESSION. Input the data range for x (the independant variable) and y (the dependant variable). Select “Line Fit Plots”. What did you find and what were you expecting for your site? Is this long enough for a reliable trend? Is the relationship really linear? (You can do this same analysis by adding a trend-line to the data, as described below).

7b) OK, so 5 years is not really useful fro trend assessment. Let’s now calculate the trend in rainfall for all years available at your chosen site. You can calculate that annual total rainfall using a PIVOT table (include the year and Rainfall amount columns) or by calculating them manually using SUM. Select the all years of data and INSERT a SCATTER plot. Add a linear trend line and “display the equation” on the chart. The slope of the equation of the line tells you the trend. Add the Pearson correlation coefficient (\(r^2\)) value as well to see how well the line fits the data (1 is a perfect fit, 0 is no fit).

7c) As before, add the information you have computed for your site into shared spreadsheet, which will have headings as per Table 5 and.

Table 5: Bureau of Meteorology Automatic Weather Station data and station ID across WA and SWWA. Complete the table for each site.
Station Station ID Years of data Sign of trend Trend rate (mm/decade) T-value Significance (p value)
Swanbourne 9215
Perth 9225
Perth Airport 9021
Bickley 9240
Mundaring 9030
Chidlow 9007
Northam 10111
Kellerberrin 10073
Merredin 10092
Kalgoorlie-Boulder 12038
Leonora 12241
Cape Leeuwin 9518
Busselton 9515
Geraldton Town 8050
Carnarvon 6011
Exmouth Gulf 5004
Broome 3003
Wyndham 1006
Bridgetown Comparison 9510
Wagin 10647
Hyden 10568
Albany 9500
Esperance Downs 9631
Manjimup 9573
Narrogin 10614


8) Looking at your plot of rainfall over time at your site, what hypothesis can we make about changes in rainfall amount over time? Using the full temporal dataset (all years), perform a REGRESSION analysis using the ToolPak regeression option. These results inclue a t-statistic (t Stat) and p-value, which tell us if the trend is statistically significant. Please also enter them also into the shared spreasheet, such as shown in Table 5. The null hypothesis is the the rainfall did NOT change over time. If the p-value is smaller than alpha = 0.05 then we can REJECT the null hypothesis (which means that the rainfall amount has changed significantly over time). You can also reject the null hypothesis if the t-statistic is less than -2.101 or greater than 2.101 (based on a normal distribution). Describe the results and conclusions of this analysis.

Explaining the interannual variability

OK, so we have looked at the long term trend, but we do see year-to-year variability in annual totals. We can think about WHY do we see this variation - is it just random, or is it related to specific features of the climate system.

So, here we determine how rainfall correlates with known modes of climate variability - for Western Australia, these will include the Indian Ocean Dipole (IOD) and El Nino Southern Oscillation (ENSO). We use climatic indicators of these phenomena to explore their temporal variability (and look for cyclic patterns). In particular, we want to know if rainfall totals at our site of interest correlate with these known modes of climate variability.

For ENSO we use the Southern Oscillation Index (SOI), which is calculated as follows.

\[\begin{equation} \text{SOI} = 10 \frac{P_{\text{diff}} - P_{\text{diffav}}}{SD(P_{\text{diff}})} \tag{3} \end{equation}\]

where:

  • \(P_{\text{diff}}\) = (average Tahiti MSLP for the month) - (average Darwin MSLP for the month)
  • \(P_{\text{diffav}}\) = long term average of \(P_{\text{diff}}\) for the month in question, and
  • \(SD(P_{\text{diff}})\) = long term standard deciation of \(P_{\text{diff}}\) for the month in question.

For IOD, this is calculated by quantifying the anomalous Sea Surface Temperature (SST) gradient between the western equatorial Indian Ocean (50E-70E and 10S-10N) and the south eastern equatorial Indian Ocean (90E-110E and 10S-0N) in units of degrees Kelvin - this index is denoted as the DMI.

We will use the extremely useful Climate Explorer for the next section. First you will need to register so you can save data series. Access the rainfall from the previous site that you used in the previous section. Click on “Monthly Station Data” (right hand menu), select “precipitation” from the “GHCN-M (all)” column, search for your station (e.g. type BROOME in the “Select stations” section), and then press “Get stations”. If the selection works you will see relevant stations listed, and then click “get data”. You will then see graphs and options for further integrating the data. Make a name to save the data (the default should be fine). Click “Add to list”.

9a) Now get the first climate index for the IOD called the DMI. Go to “Monthly climate indices” and then select DMI. Note any trend and variability in the index.

9b) To correlate the index and rainfall select “Correlate with other time series”. And tick your previously saved rainfall timeseries and then press “Correlate” and copy the results into a local spreadsheet with data similar to Table 6


Table 6: Rainfall and climate indices (DMI for IOD and SOI for ENSO) correlations.
Month DMI Corr DMI p-value SOI Corr SOI p-value
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec


9c) As before, goto the shared class spreadsheet, and add your results to your row - the row for your site.

10) Highlight any months that have a significant p-value (<0.05). What relationships do you see between rainfall and climate mode. Why are some months having a stronger correlation that others? Do they correlate in specific seasons? How strong are these correlations really? When comparing to other stations in the class has reported, are the trends you see consistent or different across WA?

Note that analysis we did above has its own rainfall data and it is not exactly using your BOM station data. In fact the tool has all rain data for any location in the globe. So we can do the same exercise except using gridded data (climate data divided into say 0.5 degree spacing using climate model “reanalysis” information), and investigate these correlations for the whole of Australia! Lets do this, by clicking on “Monthly observations” and go to Precipitation CRU TS 0.5.

We then want to correlate each grid cell on the earth with our climate index (SOI and DMI) to see how the correlations very spatially. Click on “Correlate with a time series” and choose either SOI or DMI. Then click correlate at the bottom and wait (the calculations (i.e. regression of index and rain) are being done online and then a graph is returned for viewing). Do the same for both SOI and DMI.

11) For DMI which month had the highest (negative) correlation (copy that plot to your results). What areas of Australia have significant correlations and what time of year? For SOI which month had the highest (positive) correlation (copy that plot to your results). What areas of Australia have significant correlations and what time of year? Does this match with your own rainfall station correlations?