Welcome to unseen_open’s documentation!

An open, reproducible and transferable workflow to assess and anticipate climate extremes beyond the observed record.

UNSEEN-open is an open-source project using the global SEAS5 and ERA5 datasets. It makes the evaluation of model simulations and the extreme value analysis straightforward, in order to anticipate climate extremes beyond the observed record. The project was developed as part of the ECMWF Summer of Weather Code 2020 (ESoWC), which is funded by Copernicus.

UNSEEN-open relies on xarray for data preprocessing and uses ggplot2 and extRemes for the extreme value analysis. The extreme value utilities are being developed into an UNSEEN R package.

Read all about UNSEEN-open in our preprint!

Applications

In our recent npj Climate and Atmospheric Science paper, we outline four potential applications where we believe UNSEEN might prove useful:

  1. Help estimate design values, which is especially relevant for data-scarce regions

  2. Improve risk estimation of natural hazards by coupling UNSEEN to impact models

  3. Detect trends in rare climate extremes

  4. Increase our physical understanding of the drivers of (non-stationarity of) climate extremes

We hope this approach may see many applications across a range of scientific fields!

What is UNSEEN?

The UNprecedented Simulated Extremes using ENsembles (UNSEEN, Thompson et al., 2017) approach uses forecast ensemble members to compute robust statistics for rare events, which are challenging to compute from historical records alone. UNSEEN may therefore help to identify plausible – yet unseen – weather extremes and to stress-test adaptation measures against maximum credible events. For more info about UNSEEN, see our preprint, Box A in particular.

We believe UNSEEN has large potential as a tool to inform decision-making about unforeseen hydro-climatic risks. In order to apply UNSEEN: 1. model ensemble members must be suitable for generating large samples of weather events (see Box B in the paper); and 2. large volumes of data must be handled.

Our paper presents a 6-step protocol (see below) and, as part of the protocol, the UNSEEN-open workflow, to guide users in applying UNSEEN more generally. The paper discusses the protocol in detail, including the practicalities of the workflow and its potential application to other datasets. The technical steps and relevant code are documented here. The protocol is applicable to any prediction system, whilst the code and guidance for UNSEEN-open are developed to work with the Copernicus Climate Data Store (CDS, https://cds.climate.copernicus.eu/).

UNSEEN-open

In this project, the aim is to build an open, reproducible, and transferable workflow for UNSEEN.

This means that we aim for anyone to be able to assess any climate extreme event anywhere in the world!

UNSEEN-open was therefore developed around the Copernicus SEAS5 forecasts, because SEAS5 is an openly available, stable, homogeneous, global, high-resolution, large-ensemble dataset with continuous evaluation at ECMWF. We refer to section 4.2 of our paper for a discussion of other relevant datasets.

All code showing how UNSEEN data can be handled is documented in Jupyter notebooks, so some familiarity with Python and R is (currently) required. Further development of tools and applications that do not require coding by the user would be very interesting, should time and funding allow!


Overview

Here we provide an overview of steps 3-5 for UNSEEN-open.

Retrieve

We use global, open Copernicus C3S data: the seasonal prediction system SEAS5 and the reanalysis ERA5.

The functions to retrieve all forecasts (SEAS5) and reanalysis (ERA5) are retrieve_SEAS5 and retrieve_ERA5, imported from src.cdsretrieve in the examples. You can select the climate variable, the target month(s) and the area - for more explanation see retrieve.

[2]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[3, 4, 5],
    area=[70, -11, 30, 120],
    years=np.arange(1981, 2021),
    folder='../Siberia_example/SEAS5/')
[3]:
retrieve.retrieve_ERA5(variables=['2m_temperature', '2m_dewpoint_temperature'],
                       target_months=[3, 4, 5],
                       area=[70, -11, 30, 120],
                       folder='../Siberia_example/ERA5/')
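These helpers wrap the CDS API. For orientation, here is a minimal, hedged sketch of the kind of cdsapi request they presumably build for SEAS5 monthly data; the exact dataset and field names used in src/cdsretrieve may differ.

import cdsapi  # sketch only: field names follow the CDS web form, not necessarily src/cdsretrieve

c = cdsapi.Client()  # requires a ~/.cdsapirc file with your CDS credentials
c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'netcdf',
        'originating_centre': 'ecmwf',
        'system': '5',                      # SEAS5
        'variable': '2m_temperature',
        'product_type': 'monthly_mean',
        'year': '2000',
        'month': '02',                      # initialization month
        'leadtime_month': ['2', '3', '4'],
        'area': [70, -11, 30, 120],         # North, West, South, East
    },
    'SEAS5_2000_02.nc')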
Preprocess

In the preprocessing step, we first merge all downloaded files into one netcdf file. The rest of the preprocessing depends on the definition of the extreme event. For example, for the UK case study we extract the UK average precipitation, while for the Siberian heatwave we spatially average over the defined area. For the Siberian MAM season we also still need to take the seasonal average, whereas for the UK we already have the average February precipitation.

Read the docs on preprocessing for more info.

Evaluate

The evaluation step is important to assess whether the forecasts are realistic and consistent with the observations. Three statistical tests are available through the UNSEEN R package. See the evaluation section for more info.

Case studies

So what can we learn from UNSEEN-open?

Have a look at the examples!

Installation

Anaconda is used as package manager.

Python

For the retrieval and pre-processing of the data, the Copernicus Climate Data Store (CDS) Python API cdsapi and the xarray Python package are used. These can be installed using the environment provided in this directory.

conda env create -f environment.yml

This creates a conda environment called ‘basic_analysis’. The environment can be activated using:

conda activate basic_analysis

To make this environment available as a kernel in Jupyter, we need to install ipykernel in the activated environment:

conda install -c anaconda ipykernel

And then register the environment as a Jupyter kernel:

python -m ipykernel install --user --name=basic_analysis

Hopefully, you will see the environment now as an available kernel!

R

For the evaluation, extreme value analysis and visualization, we use the R packages ggplot2 and extRemes. The evaluation tests have been developed into the 'UNSEEN' R package. These packages can be installed as follows:

[ ]:
### install regular packages
install.packages("extRemes") # for extreme value statistics
install.packages("ggplot2") # for plotting

### install GitHub packages (tag = commit, branch or release tag)
install.packages("devtools")
devtools::install_github("timokelder/UNSEEN") # for evaluation

Examples

In this project, UNSEEN-open is applied to assess two extreme events in 2020: February 2020 UK precipitation and the 2020 Siberian heatwave.

Launch in Binder

Siberian Heatwave

Prolonged heat events with an average temperature above 0 degrees over Siberia can have enormous impacts on the local environment, such as wildfires, pest invasions and infrastructure failure, and on the global environment, through the release of greenhouse gases during permafrost thawing.

The 2020 Siberian heatwave was a prolonged event that consistently broke monthly temperature records. We show a gif of the monthly 2020 temperature rank within the 1979-2020 observations (see this section for details): rank 1 means highest on record, rank 2 means second highest, etc.

Siberian Temperature records 2020
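
For orientation, a minimal sketch of how such rank maps could be computed, assuming a monthly ERA5 2m-temperature DataArray (era5_monthly is a hypothetical name); xarray's rank requires the bottleneck package:

# Rank each year's value within its calendar month, per grid cell.
# Negating the data makes rank 1 correspond to the highest value on record.
rank_highest = (-era5_monthly['t2m']).groupby('time.month').map(
    lambda da: da.rank(dim='time'))
ranks_2020 = rank_highest.sel(time=slice('2020-01', '2020-12'))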

This attribution study by World Weather Attribution (WWA) has shown that the event was made much more likely (600x) by human-induced climate change, but also that the event was still very rare within our present climate.

Could such a thawing event be anticipated with UNSEEN?

With UNSEEN-open, we can assess whether extreme events like the Siberian heatwave have already been forecast, i.e. whether we can anticipate such an event by exploiting all forecasts over the domain.

Retrieve data

The main functions to retrieve all forecasts (SEAS5) and reanalysis (ERA5) are retrieve_SEAS5 and retrieve_ERA5. We want to download 2m temperature, for the March-May target months over the Siberian domain. By default, the hindcast years of 1981-2016 are downloaded for SEAS5. We include the years 1981-2020. The folder indicates where the files will be stored, in this case outside of the UNSEEN-open repository, in a ‘Siberia_example’ directory. For more explanation, see retrieve.

[1]:
import os
import sys
sys.path.insert(0, os.path.abspath('../../../'))
os.chdir(os.path.abspath('../../../'))

import src.cdsretrieve as retrieve
import src.preprocess as preprocess

import numpy as np
import xarray as xr
[2]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[3, 4, 5],
    area=[70, -11, 30, 120],
    years=np.arange(1981, 2021),
    folder='../Siberia_example/SEAS5/')
[3]:
retrieve.retrieve_ERA5(variables=['2m_temperature', '2m_dewpoint_temperature'],
                       target_months=[3, 4, 5],
                       area=[70, -11, 30, 120],
                       folder='../Siberia_example/ERA5/')
Preprocess

In the preprocessing step, we first merge all downloaded files into one xarray dataset, then take the spatial average over the domain and a temporal average over the MAM season. Read the docs on preprocessing for more info.

[4]:
SEAS5_Siberia = preprocess.merge_SEAS5(folder = '../Siberia_example/SEAS5/', target_months = [3,4,5])
Lead time: 02
1
12

And for ERA5:

[5]:
ERA5_Siberia = xr.open_mfdataset('../Siberia_example/ERA5/ERA5_????.nc',combine='by_coords')

Then we calculate the day-in-month weighted seasonal average:

[6]:
SEAS5_Siberia_weighted = preprocess.season_mean(SEAS5_Siberia, years = 39)
ERA5_Siberia_weighted = preprocess.season_mean(ERA5_Siberia, years = 42)
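
For orientation, here is a minimal sketch of a day-in-month weighted seasonal mean, assuming the dataset only holds the target months (MAM) on a monthly time axis; the actual preprocess.season_mean may differ in detail.

def season_mean_sketch(ds):
    # Weight each month by its number of days; the weights sum to 1 per year.
    days = ds.time.dt.days_in_month
    weights = days.groupby('time.year') / days.groupby('time.year').sum()
    return (ds * weights).groupby('time.year').sum(dim='time')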

And we select the 2m temperature and take the average over a further specified domain. This is an area-weighted average, since grid cell area decreases with latitude, see preprocess.

[8]:
area_weights = np.cos(np.deg2rad(SEAS5_Siberia_weighted.latitude))

SEAS5_Siberia_events_zoomed = (
    SEAS5_Siberia_weighted['t2m'].sel(   # Select 2 metre temperature
        latitude=slice(70, 50),          # Select the latitudes
        longitude=slice(65, 120)).       # Select the longitude
    weighted(area_weights).              # Apply the weights
    mean(['longitude', 'latitude']))     # and take the spatial average

SEAS5_Siberia_events_zoomed_df = SEAS5_Siberia_events_zoomed.rename('t2m').to_dataframe() # applying the weights drops the DataArray name, so we rename the DataArray afterwards

In this workflow, ERA5 and SEAS5 are on the same grid and hence have the same weights:

[9]:
area_weights_ERA = np.cos(np.deg2rad(ERA5_Siberia_weighted.latitude))
area_weights_ERA == area_weights
[9]:
(Condensed output.) A boolean DataArray over the 41 shared latitudes (70.0 down to 30.0), with every value True: the weights are identical.

And here we take the spatial average 2m temperature for ERA5.

[42]:
ERA5_Siberia_events_zoomed = (
    ERA5_Siberia_weighted['t2m'].sel(  # Select 2 metre temperature
        latitude=slice(70, 50),        # Select the latitudes
        longitude=slice(65, 120)).    # Select the longitude
    weighted(area_weights).           # weights
    mean(['longitude', 'latitude']))  # Take the average

ERA5_Siberia_events_zoomed_df = ERA5_Siberia_events_zoomed.rename('t2m').to_dataframe()
Evaluate

Note

From here onward we use R and not python!

We switch to R because we believe R offers better functionality for extreme value statistics.

Is the UNSEEN ensemble realistic?

To answer this question, we perform three statistical tests: independence, model stability and model fidelity tests.
These statistical tests are available through the UNSEEN R package. See evaluation for more info.
[3]:
require(UNSEEN)
require(ggplot2)
require(ggpubr)
Loading required package: UNSEEN

Loading required package: ggplot2

Warning message:
“replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’”
Loading required package: ggpubr

Timeseries

We plot the timeseries of SEAS5 (UNSEEN) and ERA5 (OBS) for the Siberian heatwave.

[4]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed_df,
    obs = ERA5_Siberia_events_zoomed[ERA5_Siberia_events_zoomed$year > 1981,],
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature (C)") +
theme(text = element_text(size = 14)) #This is to increase the figure font
Warning message:
“Removed 2756 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_Siberian_Heatwave_28_1.png

The timeseries consist of hindcasts (years 1982-2016) and archived forecasts (years 2017-2020). The two datasets are slightly different: the hindcasts contain 25 members whereas the operational forecasts contain 51, the native resolution differs, and the dataset from which the forecasts are initialized differs.

For the evaluation of the UNSEEN ensemble we only want to use the SEAS5 hindcasts, for a consistent dataset. Note that 2017 is used in neither the hindcast nor the operational dataset, since it contains forecasts initialized both in 2016 (hindcast) and in 2017 (forecast), see retrieve. We split SEAS5 into hindcasts and operational forecasts:

[5]:
SEAS5_Siberia_events_zoomed_hindcast <- SEAS5_Siberia_events_zoomed_df[
    SEAS5_Siberia_events_zoomed_df$year < 2017 &
    SEAS5_Siberia_events_zoomed_df$number < 25,]

SEAS5_Siberia_events_zoomed_forecasts <- SEAS5_Siberia_events_zoomed_df[
    SEAS5_Siberia_events_zoomed_df$year > 2017,]

And we select the same years for ERA5.

[6]:
ERA5_Siberia_events_zoomed_hindcast <- ERA5_Siberia_events_zoomed[
    ERA5_Siberia_events_zoomed$year < 2017 &
    ERA5_Siberia_events_zoomed$year > 1981,]

Which results in the following timeseries:

[7]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed_hindcast,
    obs = ERA5_Siberia_events_zoomed_hindcast,
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature (C)") +
    theme(text = element_text(size = 14))
_images/Notebooks_examples_Siberian_Heatwave_34_0.png
Evaluation tests

With the hindcast dataset we evaluate the independence, stability and fidelity. Below we plot the results of all three tests; see the evaluation section for more detail on each test.

The fidelity test shows us how consistent the model simulations of UNSEEN (SEAS5) are with the observed (ERA5). The UNSEEN dataset is much larger than the observed – hence they cannot simply be compared. For example, what if we had faced a few more or a few less heatwaves purely by chance?

This would influence the observed mean, but not so much influence the UNSEEN ensemble because of the large data sample. Therefore we express the UNSEEN ensemble as a range of plausible means, for data samples of the same length as the observed. We do the same for higher order statistical moments.
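
To make this idea concrete, here is an illustrative bootstrap sketch, written in Python with synthetic numbers for brevity (the actual test is implemented in the UNSEEN R package): draw observation-length subsamples from the large ensemble and check whether the observed mean falls within the resulting range of plausible means.

import numpy as np

rng = np.random.default_rng(0)
ensemble = rng.normal(loc=-4.0, scale=1.5, size=3500)  # hypothetical UNSEEN pool
obs = rng.normal(loc=-3.5, scale=1.5, size=35)         # hypothetical observations

boot_means = np.array([
    rng.choice(ensemble, size=obs.size, replace=False).mean()
    for _ in range(10000)])
lo, hi = np.quantile(boot_means, [0.025, 0.975])  # 95% range of plausible means
print(obs.mean(), (lo, hi))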

[7]:
independence_test(
    ensemble = SEAS5_Siberia_events_zoomed_hindcast,
    n_lds = 3,
    var_name = "t2m",
    detrend = TRUE
) +
    theme(text = element_text(size = 14))

Warning message:
“Removed 975 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 975 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_Siberian_Heatwave_36_1.png

Is the model stable over leadtimes?

[8]:
stability_test(
    ensemble = SEAS5_Siberia_events_zoomed_hindcast,
    lab = 'MAM Siberian temperature',
    var_name = 't2m'
)
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
_images/Notebooks_examples_Siberian_Heatwave_38_1.png

Is the model consistent with ERA5?

[9]:
fidelity_test(
    obs = ERA5_Siberia_events_zoomed_hindcast$t2m,
    ensemble = SEAS5_Siberia_events_zoomed_hindcast$t2m,
    units = 'C',
    biascor = FALSE,
    fontsize = 14
)
_images/Notebooks_examples_Siberian_Heatwave_40_0.png

The fidelity test shows that the mean of the UNSEEN ensemble is too low compared to the observed – the blue line falls outside of the model range in panel a. To correct for this low bias, we can apply an additive bias correction, which only corrects the mean of the simulations.

Let's apply the additive bias correction:

[11]:
obs <- ERA5_Siberia_events_zoomed_hindcast$t2m
ensemble <- SEAS5_Siberia_events_zoomed_hindcast$t2m
ensemble_biascor <- ensemble + (mean(obs) - mean(ensemble))

fidelity_test_biascor <- fidelity_test(
    obs = obs,
    ensemble = ensemble_biascor,
    units = 'C',
    ylab = '',
    yticks = FALSE,
    biascor = FALSE,
    fontsize = 14
)

fidelity_test_biascor
# ggsave(fidelity_test_biascor,height = 90, width = 90, units = 'mm', filename = "graphs/Siberia_biascor.pdf")
_images/Notebooks_examples_Siberian_Heatwave_42_0.png

This shows what we expected: the mean bias is corrected because the model simulations are shifted upwards (the blue line is still the same; the axis has just shifted along with the histogram), while the other statistical moments are unchanged.
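
A quick numeric check of this point, in Python with synthetic data: an additive shift changes the mean but leaves the higher-order moments untouched.

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.5, size=10000)  # skewed synthetic sample
shifted = x + 3.0                                # additive 'bias correction'
print(np.isclose(x.var(), shifted.var()))        # variance unchanged: True
print(np.isclose(skew(x), skew(shifted)))        # skewness unchanged: True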

Illustrate

So could thawing events (with an average temperature above 0 degrees) have been anticipated?

Here we create a bias adjusted dataframe:

[13]:
SEAS5_Siberia_events_zoomed_df_bc <- SEAS5_Siberia_events_zoomed_df
SEAS5_Siberia_events_zoomed_df_bc['t2m'] <- SEAS5_Siberia_events_zoomed_df_bc['t2m'] + (mean(obs) - mean(ensemble))

str(SEAS5_Siberia_events_zoomed_df_bc)
'data.frame':   5967 obs. of  4 variables:
 $ year    : int  1982 1982 1982 1982 1982 1982 1982 1982 1982 1982 ...
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ number  : int  0 1 2 3 4 5 6 7 8 9 ...
 $ t2m     : num  -3.16 -5.11 -3.64 -5.81 -2.93 ...
[14]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed_df_bc,
    obs = ERA5_Siberia_events_zoomed[ERA5_Siberia_events_zoomed$year > 1981,],
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature (C)") +
theme(text = element_text(size = 14)) + #This is to increase the figure font
geom_hline(yintercept = 0) #+
# ggsave(height = 90, width = 90, units = 'mm', filename = "graphs/Siberia_timeseries_biascor.pdf")
Warning message:
“Removed 2756 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_Siberian_Heatwave_46_1.png

Applications:

With UNSEEN-open, we can:

  1. Assess the drivers of the most severe events. The 2020 event seemed to be caused by a very anomalous Indian Ocean Dipole (IOD). What can we learn from the thawing events within UNSEEN? To what extent are these also driven by an anomalous IOD, or are there other drivers of such severe heat events?

  2. Perform nonstationary analysis of rare extremes, such as the 100-year event. There seems to be a trend in the severe heatwaves over the hindcast period. We can perform nonstationary analysis to estimate the change in the magnitude and frequency of the heatwaves and, if we find a change, explore the drivers of this change.

  3. Evaluate forecasts. Since we are using seasonal forecasts in this setup, we could explore the forecast skill in simulating heatwaves over Siberia.

Launch in Binder

California fires

In August 2020, wildfires burned more than a million acres of land in California. The wildfires coincided with record-high temperature anomalies, see the August 2020 temperature anomaly below.

California Temperature August 2020

In this example, we evaluate the UNSEEN ensemble and show that there is a clear trend in temperature extremes over the last decades.

Retrieve data

The main functions to retrieve all forecasts (SEAS5) and reanalysis (ERA5) are retrieve_SEAS5 and retrieve_ERA5. We want to download 2m temperature for August over California. By default, the hindcast years of 1981-2016 are downloaded for SEAS5. We include the years 1981-2020. The folder indicates where the files will be stored, in this case outside of the UNSEEN-open repository, in a ‘California_example’ directory. For more explanation, see retrieve.

[1]:
import os
import sys
sys.path.insert(0, os.path.abspath('../../../'))
os.chdir(os.path.abspath('../../../'))

import src.cdsretrieve as retrieve
import src.preprocess as preprocess

import numpy as np
import xarray as xr
[2]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[8],
    area=[70, -130, 20, -70],
    years=np.arange(1981, 2021),
    folder='../California_example/SEAS5/')
[3]:
retrieve.retrieve_ERA5(variables=['2m_temperature', '2m_dewpoint_temperature'],
                       target_months=[8],
                       area=[70, -130, 20, -70],
                       folder='../California_example/ERA5/')
Preprocess

In the preprocessing step, we first merge all downloaded files into one xarray dataset and then take the spatial average over the domain. Since we target a single month (August), no seasonal averaging is needed. Read the docs on preprocessing for more info.

[4]:
SEAS5_California = preprocess.merge_SEAS5(folder ='../California_example/SEAS5/', target_months = [8])
Lead time: 07
6
5
4
3

And for ERA5:

[5]:
ERA5_California = xr.open_mfdataset('../California_example/ERA5/ERA5_????.nc',combine='by_coords')

We calculate the standardized anomaly of the 2020 event and select the 2m temperature over the region where the anomaly exceeded 2 standard deviations from the 1979-2010 average, see this page. This is an area-weighted average, since grid cell area decreases with latitude, see preprocess.

[6]:
ERA5_anomaly = ERA5_California['t2m'] - ERA5_California['t2m'].sel(time=slice('1979','2010')).mean('time')
ERA5_sd_anomaly = ERA5_anomaly / ERA5_California['t2m'].sel(time=slice('1979','2010')).std('time')

We use a land-sea mask to select land-only gridcells:

[7]:
LSMask = xr.open_dataset('../California_example/ERA_landsea_mask.nc')
# convert the longitude from 0:360 to -180:180
LSMask['longitude'] = (((LSMask['longitude'] + 180) % 360) - 180)
[8]:
area_weights = np.cos(np.deg2rad(ERA5_sd_anomaly.latitude))

ERA5_California_events = (
    ERA5_California['t2m'].sel(  # Select 2 metre temperature
        longitude = slice(-125,-100),    # Select the longitude
        latitude = slice(45,20)).        # And the latitude
    where(ERA5_sd_anomaly.sel(time = '2020').squeeze('time') > 2). ##Mask the region where 2020 sd >2.
    where(LSMask['lsm'].sel(time = '1979').squeeze('time') > 0.5). #Select land-only gridcells
    weighted(area_weights).
    mean(['longitude', 'latitude']) #And take the mean
)

Plot the August temperatures over the defined California domain:

[9]:
ERA5_California_events.plot()
[9]:
[<matplotlib.lines.Line2D at 0x7f99b808f8b0>]
_images/Notebooks_examples_California_Fires_19_1.png

Select the same domain for SEAS5 and extract the events.

[10]:
SEAS5_California_events = (
    SEAS5_California['t2m'].sel(
        longitude = slice(-125,-100),    # Select the longitude
        latitude = slice(45,20)).        # And the latitude
    where(ERA5_sd_anomaly.sel(time = '2020').squeeze('time') > 2). #Mask the region where 2020 sd >2.
    where(LSMask['lsm'].sel(time = '1979').squeeze('time') > 0.5). #Select land-only gridcells
    weighted(area_weights).
    mean(['longitude', 'latitude']))

And here we store the data in the Data section so the rest of the analysis in R can be reproduced.

[11]:
SEAS5_California_events.rename('t2m').to_dataframe().to_csv('Data/SEAS5_California_events.csv')
ERA5_California_events.rename('t2m').to_dataframe().to_csv('Data/ERA5_California_events.csv')
Evaluate

Note

From here onward we use R and not python!

We switch to R because we believe R offers better functionality for extreme value statistics.

Is the UNSEEN ensemble realistic?

To answer this question, we perform three statistical tests: independence, model stability and model fidelity tests.
These statistical tests are available through the UNSEEN R package. See evaluation for more info.
[4]:
require(UNSEEN)
require(ggplot2)
Loading required package: UNSEEN

Loading required package: ggplot2

Warning message:
“replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’”
Timeseries

We plot the timeseries of SEAS5 (UNSEEN) and ERA5 (OBS) for August California temperatures.

[5]:
timeseries = unseen_timeseries(
    ensemble = SEAS5_California_events,
    obs = ERA5_California_events,
    ensemble_yname = "t2m",
    ensemble_xname = "time",
    obs_yname = "t2m",
    obs_xname = "time",
    ylab = "August California temperature (C)")

timeseries + theme(text = element_text(size = 14))
Warning message:
“Removed 4680 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_California_Fires_33_1.png

The timeseries consist of hindcasts (years 1982-2016) and archived forecasts (years 2017-2020). The two datasets are slightly different: the hindcasts contain 25 members whereas the operational forecasts contain 51, the native resolution differs, and the dataset from which the forecasts are initialized differs.

For the evaluation of the UNSEEN ensemble we only want to use the SEAS5 hindcasts, for a consistent dataset. Note that 2017 is used in neither the hindcast nor the operational dataset, since it contains forecasts initialized both in 2016 (hindcast) and in 2017 (forecast), see retrieve. We split SEAS5 into hindcasts and operational forecasts:

[6]:
SEAS5_California_events_hindcast <- SEAS5_California_events[
    SEAS5_California_events$time < '2017-02-01' &
    SEAS5_California_events$number < 25,]

SEAS5_California_events_forecasts <- SEAS5_California_events[
    SEAS5_California_events$time > '2017-02-01',]

And we select the same years for ERA5.

[7]:
ERA5_California_events_hindcast <- ERA5_California_events[
    ERA5_California_events$time > '1981-02-01' &
    ERA5_California_events$time < '2017-02-01',]

Which results in the following timeseries:

[8]:
unseen_timeseries(
    ensemble = SEAS5_California_events_hindcast,
    obs = ERA5_California_events_hindcast,
    ensemble_yname = "t2m",
    ensemble_xname = "time",
    obs_yname = "t2m",
    obs_xname = "time",
    ylab = "August California temperature (C)") +
theme(text = element_text(size = 14))
_images/Notebooks_examples_California_Fires_39_0.png
Evaluation tests

With the hindcast dataset we evaluate the independence, stability and fidelity. Below we plot the results of all three tests; see the evaluation section for more detail on each test.

[9]:
Independence_California = independence_test(
    ensemble = SEAS5_California_events_hindcast,
    var_name = "t2m"
    )

Independence_California +
    theme(text = element_text(size = 14))
Warning message:
“Removed 1625 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 1625 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_California_Fires_41_1.png
[10]:
Stability_California = stability_test(
        ensemble = SEAS5_California_events_hindcast,
        lab = 'August California temperature (C)',
        var_name = 't2m'
    )
Stability_California

Warning message:
“Removed 4 row(s) containing missing values (geom_path).”
_images/Notebooks_examples_California_Fires_42_1.png
[11]:
Stability_California = stability_test(
        ensemble = SEAS5_California_events_hindcast,
        lab = 'August temperature (C)',
        var_name = 't2m',
        fontsize = 10

    )
# ggsave(Stability_California,height = 120, width = 120, units = 'mm', filename = "graphs/California_stability.pdf")
Warning message:
“Removed 4 row(s) containing missing values (geom_path).”

The fidelity test shows us how consistent the model simulations of UNSEEN (SEAS5) are with the observed (ERA5). The UNSEEN dataset is much larger than the observed – hence they cannot simply be compared. For example, what if we had faced a few more or a few less heatwaves purely by chance?

This would influence the observed mean, but not so much influence the UNSEEN ensemble because of the large data sample. Therefore we express the UNSEEN ensemble as a range of plausible means, for data samples of the same length as the observed. We do the same for higher order statistical moments.

[12]:
Fidelity_California = fidelity_test(
    obs = ERA5_California_events_hindcast$t2m,
    ensemble = SEAS5_California_events_hindcast$t2m,
    units = 'C',
    biascor = FALSE,
    fontsize = 14
)
Fidelity_California
_images/Notebooks_examples_California_Fires_45_0.png

The fidelity test shows that the mean of the UNSEEN ensemble is too low compared to the observed – the blue line falls outside of the model range in panel a. To correct for this low bias, we can apply an additive bias correction, which only corrects the mean of the simulations.

Let's apply the additive bias correction:

[10]:
obs = ERA5_California_events_hindcast$t2m
ensemble = SEAS5_California_events_hindcast$t2m
ensemble_biascor = ensemble + (mean(obs) - mean(ensemble))

fidelity_test(
    obs = obs,
    ensemble = ensemble_biascor,
    units = 'C',
    biascor = FALSE,
    fontsize = 14
)
_images/Notebooks_examples_California_Fires_47_0.png

This shows what we expected: the mean bias is corrected because the model simulations are shifted upwards (the blue line is still the same; the axis has just shifted along with the histogram), while the other statistical moments are unchanged.

Publication-ready plots

We combine the timeseries and the three evaluation plots into one figure for the manuscript. We want the font size to be 10 in all plots, and we need to adjust the panel labels for the stability and fidelity plots. For the fidelity plot we also remove the redundant y-labels and y-ticks.

[14]:
timeseries_font10 = timeseries + theme(text = element_text(size = 10))
Independence_font10 = Independence_California + theme(text = element_text(size = 10))
Stability_font10 = stability_test(
    ensemble = SEAS5_California_events_hindcast,
    lab = 'August temperature (C)',
    var_name = 't2m',
    fontsize = 10,
    panel_labels = c("c", "d")
)
Fidelity_font10 = fidelity_test(
    obs = ERA5_California_events_hindcast$t2m,
    ensemble = SEAS5_California_events_hindcast$t2m,
    ylab = '',
    yticks = FALSE,
    units = 'C',
    biascor = FALSE,
    fontsize = 10,
    panel_labels = c("e", "f", "g", "h")
)
Warning message:
“Removed 4 row(s) containing missing values (geom_path).”
[15]:
Evaluations = ggpubr::ggarrange(timeseries_font10,
                                Independence_font10,
                                Stability_font10,
                                Fidelity_font10,
                                labels = c("a","b", "", ""),
                                font.label = list(size = 10,
                                                  color = "black",
                                                  face = "bold",
                                                  family = NULL),
                                ncol = 2,
                                nrow = 2)
Evaluations
# ggsave(Evaluations,height = 180, width = 180, units = 'mm', filename = "graphs/California_evaluation_test2.pdf")
Warning message:
“Removed 4680 rows containing non-finite values (stat_boxplot).”
Warning message:
“Removed 1625 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 1625 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_California_Fires_51_1.png
Illustrate

Was there a trend in the temperature extremes over the last decades? Let’s investigate!

First, we are loading the required extRemes package:

[9]:
require('extRemes')
Loading required package: extRemes

Loading required package: Lmoments

Loading required package: distillery


Attaching package: ‘extRemes’


The following objects are masked from ‘package:stats’:

    qqnorm, qqplot


We also source some R code to make the unseen-trends plots. These functions were written for this case study and we cannot ensure robustness to other case studies.

[24]:
source('src/evt_plot.r')

We use ERA5 events from 1981 onward, matching the start date of SEAS5, and call these 'obs'. In addition, we use the bias-corrected UNSEEN ensemble with lead time 6 removed. So we remove the first two years from ERA5 and remove lead time 6 from the SEAS5 ensemble:

[10]:
obs <- ERA5_California_events[
    ERA5_California_events$time > '1981-02-01',]

UNSEEN_bc <- SEAS5_California_events[SEAS5_California_events$leadtime < 6 &
                                     SEAS5_California_events$number < 25,]

And then we correct the SEAS5 temperature bias using a mean adjustment calculated over the hindcast period.

[11]:
UNSEEN_bc$t2m <- (UNSEEN_bc$t2m +
                  mean(ERA5_California_events_hindcast$t2m) - mean(SEAS5_California_events_hindcast$t2m)
                               )
str(UNSEEN_bc)
'data.frame':   4000 obs. of  4 variables:
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ time    : Date, format: "1981-08-01" "1981-08-01" ...
 $ number  : int  0 1 2 3 4 5 6 7 8 9 ...
 $ t2m     : num  23 24.8 23.2 23.9 24.6 ...

Let's plot the data to see what's going on:

[12]:
timeseries = unseen_timeseries(
    ensemble = UNSEEN_bc,
    obs = obs,
    ensemble_yname = "t2m",
    ensemble_xname = "time",
    obs_yname = "t2m",
    obs_xname = "time",
    ylab = "August California temperature (C)")

timeseries + theme(text = element_text(size = 14))
_images/Notebooks_examples_California_Fires_62_0.png

We apply extreme value theory to analyze the likelihood and trend of the temperature extremes. Different extreme value distributions can be fitted to the data. We fit a stationary Gumbel distribution, a stationary GEV distribution (which adds a shape parameter), and a nonstationary GEV distribution (with location and scale depending linearly on the year) to the observed temperatures. Likelihood-ratio tests show that the nonstationary GEV best describes the data: it is preferred over the stationary Gumbel (p = 0.0001) and over the stationary GEV (p = 3.633e-05), well below the 5% significance level.

[13]:
## Fit stationary distributions
fit_obs_Gumbel <- fevd(x = obs$t2m,
                    type = "Gumbel"
                   )
fit_obs_GEV <- fevd(x = obs$t2m,
                    type = "GEV"
                   )
## And the nonstationary distribution
fit_obs_GEV_nonstat <- fevd(x = obs$t2m,
                            type = "GEV",
                            location.fun = ~ c(1:length(obs$time)), ##Fitting the gev with a location and scale parameter linearly correlated to the covariate (years)
                            scale.fun = ~ c(1:length(obs$time)),
                            use.phi = TRUE
                           )
#And test the fit
##1. Stationary Gumbel vs nonstationary GEV
lr.test(fit_obs_Gumbel, fit_obs_GEV_nonstat)
##2. Stationary GEV vs nonstationary GEV
lr.test(fit_obs_GEV, fit_obs_GEV_nonstat)

        Likelihood-ratio Test

data:  obs$t2mobs$t2m
Likelihood-ratio = 20.446, chi-square critical value = 7.8147, alpha =
0.0500, Degrees of Freedom = 3.0000, p-value = 0.0001372
alternative hypothesis: greater


        Likelihood-ratio Test

data:  obs$t2mobs$t2m
Likelihood-ratio = 20.446, chi-square critical value = 5.9915, alpha =
0.0500, Degrees of Freedom = 2.0000, p-value = 3.633e-05
alternative hypothesis: greater

For the UNSEEN ensemble this analysis is slightly more complicated, since we need a covariate with the same length as the ensemble:

[14]:
#Create the ensemble covariate
year_vector = as.integer(format(UNSEEN_bc$time, format="%Y"))
covariate_ens = year_vector - 1980

# Fit the stationary distribution
fit_unseen_GEV <- fevd(x = UNSEEN_bc$t2m,
                       type = 'GEV',
                       use.phi = TRUE)

fit_unseen_Gumbel <- fevd(x = UNSEEN_bc$t2m,
                          type = 'Gumbel',
                          use.phi = TRUE)

# Fit the nonstationary distribution
fit_unseen_GEV_nonstat <- fevd(x = UNSEEN_bc$t2m,
                               type = 'GEV',
                               location.fun = ~ covariate_ens, ##Fitting the gev with a location and scale parameter linearly correlated to the covariate (years)
                               scale.fun = ~ covariate_ens,
                               use.phi = TRUE)

And the likelihood-ratio tests tell us that the nonstationary GEV distribution is the best fit, with both p-values < 2.2e-16:

[15]:
#And test the fit
##1. Stationary Gumbel vs stationary GEV
lr.test(fit_unseen_Gumbel,fit_unseen_GEV)
##2. Stationary GEV vs Nonstationary GEV
lr.test(fit_unseen_GEV, fit_unseen_GEV_nonstat)

        Likelihood-ratio Test

data:  UNSEEN_bc$t2mUNSEEN_bc$t2m
Likelihood-ratio = 568.39, chi-square critical value = 3.8415, alpha =
0.0500, Degrees of Freedom = 1.0000, p-value < 2.2e-16
alternative hypothesis: greater


        Likelihood-ratio Test

data:  UNSEEN_bc$t2mUNSEEN_bc$t2m
Likelihood-ratio = 945.52, chi-square critical value = 5.9915, alpha =
0.0500, Degrees of Freedom = 2.0000, p-value < 2.2e-16
alternative hypothesis: greater

We plot UNSEEN trends in the 2-year and 100-year extremes. The function unseen_trends1 fits the trend for a selected return period (rp) for both the observed and ensemble datasets. For the observed dataset, the year 2020 was not used in the fit. For more info on the UNSEEN-trends method see this paper, and for more details on the results, see section 3.2 of the open workflow paper. The function was written specifically for this case study, and we cannot ensure robustness for other case studies.

[19]:
year_vector = as.integer(format(UNSEEN_bc$time, format="%Y"))
covariate_ens = year_vector - 1980

Trend_2year <- unseen_trends1(ensemble = UNSEEN_bc$t2m,
                     x_ens = year_vector,
                     x_obs = 1981:2020,
                     rp = 2,
                     obs = obs$t2m,
                     covariate_ens = covariate_ens,
                     covariate_obs = c(1:(length(obs$time)-1)),
                     covariate_values = c(1:length(obs$time)),
                     GEV_type = 'GEV',
                     ylab = 'August temperature (C)',
                     title = '2-year') +
ylim(c(20,28.5))

Trend_100year <- unseen_trends1(ensemble = UNSEEN_bc$t2m,
                     x_ens = year_vector,
                     x_obs = 1981:2020,
                     rp = 100,
                     obs = obs$t2m,
                     covariate_ens = covariate_ens,
                     covariate_obs = c(1:(length(obs$time)-1)),
                     covariate_values = c(1:length(obs$time)),
                     GEV_type = 'GEV',
                     ylab = '',
                     title = '100-year') +
ylim(c(20,28.5))

We combine the two plots:

[21]:
ggpubr::ggarrange(Trend_2year,Trend_100year,
                  labels = c("a","b"),
                  common.legend = TRUE,
                  font.label = list(size = 14,
                                    color = "black",
                                    face = "bold",
                                    family = NULL),
                  ncol = 2,
                  nrow = 1) #+
# ggsave(height = 100, width = 180, units = 'mm', filename = "graphs/California_trends.png")
_images/Notebooks_examples_California_Fires_72_0.png

There is a clear trend in the temperature extremes over the last 40 years. How has this trend influenced the likelihood of occurrence of the 2020 event? The function unseen_trends2 plots the extreme value distributions for the 'year' covariate values 1981 and 2020. There is a clear difference – the distribution for 1981 does not even reach the 2020 event. See section 3.2 of the open workflow paper for more details on this exciting but scary result! Note that this function, too, was written specifically for this case study, and we cannot ensure robustness for other case studies.

[25]:
p2 <- unseen_trends2(ensemble = UNSEEN_bc$t2m,
                    obs = obs[1:(length(obs$time)-1),]$t2m,
                    covariate_ens = covariate_ens,
                    covariate_obs = c(1:(length(obs$time)-1)),
                    GEV_type = 'GEV',
                    ylab = 'August temperature (C)')

Distributions = p2 + geom_hline(yintercept = obs[obs$time == '2020-08-01',]$t2m) #+
Distributions
_images/Notebooks_examples_California_Fires_74_0.png

Let’s make a publication-ready plot by combining the above figures.

[135]:
Trends = ggpubr::ggarrange(Trend_2year,Trend_100year,
                  labels = c("a","b"),
                  common.legend = TRUE,
                  font.label = list(size = 10,
                                    color = "black",
                                    face = "bold",
                                    family = NULL),
                  ncol = 2,
                  nrow = 1)

[136]:
ggpubr::ggarrange(Trends,Distributions,
                  labels = c("","c"),
                  font.label = list(size = 10,
                                    color = "black",
                                    face = "bold",
                                    family = NULL),
                  ncol = 1,
                  nrow = 2) +
ggsave(height = 180, width = 180, units = 'mm', filename = "graphs/California_trends2.pdf")
_images/Notebooks_examples_California_Fires_77_0.png

Applications:

In 2020, California saw its worst fire season yet. Such fires are likely part of a chain of impacts, from droughts to heatwaves to fires, with feedbacks between them. Here we assess August temperatures and show that the 2020 August average temperature was very anomalous. We furthermore use SEAS5 forecasts to analyze the trend in rare extremes. The evaluation shows that the model simulations have a low (cold) bias, which we correct for using an additive bias correction. UNSEEN trend analysis shows a clear trend over time, both in the model and in the observed temperatures. Based on this analysis, temperature extremes that you would expect to occur once in 1000 years in 1981 might occur once in <10 years at present (2020).
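
As a back-of-envelope illustration of such a return-period shift (with made-up GEV parameters, not the fitted values from this analysis): the return period of a fixed magnitude under a GEV fit is 1/(1 - F(x)). Note that scipy's shape parameter c equals -ξ in the usual climate convention.

from scipy.stats import genextreme

event = 27.0                                       # hypothetical event magnitude (C)
gev_1981 = genextreme(c=0.2, loc=23.0, scale=1.0)  # illustrative 1981 fit
gev_2020 = genextreme(c=0.2, loc=25.5, scale=1.0)  # illustrative 2020 fit

for year, gev in [(1981, gev_1981), (2020, gev_2020)]:
    rp = 1.0 / gev.sf(event)                       # sf(x) = 1 - cdf(x)
    print(year, round(rp))                         # ~3000 years vs ~7 years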

Note

Our analysis shows the results of a linear trend analysis of August temperature averages over 1981-2020. Other time windows, different trends than linear, and spatial domains could (should?) be investigated, as well as drought estimates in addition to temperature extremes.

Launch in Binder

UK Precipitation

February 2020 case study

February 2020 was the wettest February on record in the UK (since 1862), according to the Met Office. The UK faced three official storms during February, and this exceptional event attracted media attention, such as an article from the BBC on increased climate concerns among the population.

Here, we will test the applicability and potential of using SEAS5 for estimating the likelihood of the 2020 UK February precipitation event.

Retrieve data

The main function to retrieve the SEAS5 forecasts is retrieve_SEAS5. We want to download February average precipitation over the UK. By default, the hindcast years of 1981-2016 are downloaded for SEAS5. The folder indicates where the files will be stored, in this case outside of the UNSEEN-open repository, in a ‘UK_example’ directory. For more explanation, see retrieve.

[2]:
import os
import sys
sys.path.insert(0, os.path.abspath('../../../'))
os.chdir(os.path.abspath('../../../'))

import src.cdsretrieve as retrieve
import src.preprocess as preprocess
[29]:
import numpy as np
import xarray as xr
[4]:
retrieve.retrieve_SEAS5(variables = 'total_precipitation',
                        target_months = [2],
                        area = [60, -11, 50, 2],
                        years=np.arange(1981, 2021),
                        folder = '../UK_example/SEAS5/')

We use the EOBS observational dataset to evaluate the UNSEEN ensemble. I tried to download EOBS through the Copernicus Climate Data Store, but the product is temporarily disabled for maintenance. As a workaround, I downloaded EOBS (1950-2019) and the most recent EOBS data (2020) here. Note that you have to register as an E-OBS user.

Preprocess

In the preprocessing step, we first merge all downloaded files into one xarray dataset, see preprocessing.

[6]:
SEAS5_UK = preprocess.merge_SEAS5(folder = '../UK_example/SEAS5/', target_months = [2])
Lead time: 01
12
11
10
9

The SEAS5 total precipitation rate is in m/s. We convert this to mm/day (multiplying by 1000 mm/m × 86,400 s/day) and assign the corresponding attributes.

[22]:
SEAS5_UK['tprate'] = SEAS5_UK['tprate'] * 1000 * 3600 * 24 ## From m/s to mm/d
SEAS5_UK['tprate'].attrs = {'long_name': 'rainfall',
 'units': 'mm/day',
 'standard_name': 'thickness_of_rainfall_amount'}
SEAS5_UK
[22]:
(Condensed dataset output.) An xarray.Dataset with dimensions leadtime: 5 (lead times 2-6), time: 39 (February 1982-2020), number: 51 (members 0-50), latitude: 11 (60-50N) and longitude: 14 (11W-2E). The data variable tprate is a chunked dask array with the newly assigned attributes long_name 'rainfall', units 'mm/day' and standard_name 'thickness_of_rainfall_amount'.

Then I open the EOBS dataset and extract the February monthly mean precipitation. I take the average mm/day over the month, which is fairer than the monthly total because of leap days.

[30]:
EOBS = xr.open_dataset('../UK_example/EOBS/rr_ens_mean_0.25deg_reg_v20.0e.nc') ## open the data
EOBS = EOBS.resample(time='1m').mean() ## Monthly averages
EOBS = EOBS.sel(time=EOBS['time.month'] == 2) ## Select only February
EOBS
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
(Condensed dataset output.) An xarray.Dataset with dimensions time: 70 (February means 1950-2019), latitude: 201 and longitude: 464 (the 0.25-degree EOBS grid, 25.375-75.375N, 40.375W-75.375E). The data variable rr is NaN outside the EOBS land domain.

Here I define the attributes that xarray uses when plotting:

[31]:
EOBS['rr'].attrs = {'long_name': 'rainfall',  ##Define the name
 'units': 'mm/day', ## unit
 'standard_name': 'thickness_of_rainfall_amount'} ## original name, not used
EOBS['rr'].mean('time').plot() ## and show the 1950-2019 average February precipitation
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[31]:
<matplotlib.collections.QuadMesh at 0x7f93884292b0>
_images/Notebooks_examples_UK_Precipitation_18_2.png

The 2020 data file is separate and needs the same preprocessing:

[32]:
EOBS2020 = xr.open_dataset('../UK_example/EOBS/rr_0.25deg_day_2020_grid_ensmean.nc.1') #open
EOBS2020 = EOBS2020.resample(time='1m').mean() #Monthly mean
EOBS2020['rr'].sel(time='2020-04').plot() #show map
EOBS2020 ## display dataset
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[32]:
(Condensed dataset output.) An xarray.Dataset with dimensions time: 12 (monthly means for 2020), latitude: 201 and longitude: 464 on the same 0.25-degree EOBS grid, with data variable rr.
_images/Notebooks_examples_UK_Precipitation_20_2.png

We then extract the UK-averaged precipitation for SEAS5 and EOBS. We upscale EOBS to the SEAS5 grid and apply the same UK mask to both datasets to extract the UK average. Using EOBS plus upscaling demonstrates how to regrid and how to extract a country-averaged timeseries.
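For reference, here is a minimal sketch of this upscale-and-mask step (illustrative only: `UK_mask` is a hypothetical boolean DataArray on the SEAS5 grid, and bilinear interpolation stands in for the regridding used in the actual workflow):

[ ]:
## Illustrative sketch, not the notebook's actual implementation.
## Upscale EOBS to the SEAS5 grid, then average over a UK mask.
## `UK_mask` is a hypothetical boolean DataArray on the SEAS5 grid.
EOBS_coarse = EOBS['rr'].interp(latitude=SEAS5_UK.latitude,
                                longitude=SEAS5_UK.longitude)  ## EOBS -> SEAS5 grid
EOBS_UK_avg = EOBS_coarse.where(UK_mask).mean(['latitude', 'longitude'])  ## UK average per time step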

Here, we export the SEAS5 and EOBS datasets as NetCDF files to be imported in the other notebook. Note that for EOBS we had to download two separate files, which we concatenate below before exporting to NetCDF.

[33]:
SEAS5_UK.to_netcdf('../UK_example/SEAS5/SEAS5_UK.nc') ## Store SEAS5 as NetCDF for future import

EOBS_concat = xr.concat([EOBS, EOBS2020.sel(time='2020-02')], dim='time') ## Concatenate the 1950-2019 and 2020 datasets.
EOBS_concat.to_netcdf('../UK_example/EOBS/EOBS_UK.nc') ## And store the 1950-2020 February precipitation into one nc for future import
Evaluate

Note

From here onward we use R and not Python!

We switch to R since we believe R offers better functionality for extreme value statistics.

[1]:
setwd('../../..')
# getwd()
EOBS_UK_weighted_df <- read.csv("Data/EOBS_UK_weighted_upscaled.csv", stringsAsFactors=FALSE)
SEAS5_UK_weighted_df <- read.csv("Data/SEAS5_UK_weighted_masked.csv", stringsAsFactors=FALSE)

## Convert the time class to Date format
EOBS_UK_weighted_df$time <- lubridate::ymd(EOBS_UK_weighted_df$time)
str(EOBS_UK_weighted_df)

EOBS_UK_weighted_df_hindcast <- EOBS_UK_weighted_df[
    EOBS_UK_weighted_df$time > '1982-02-01' &
    EOBS_UK_weighted_df$time < '2017-02-01',
    ]


SEAS5_UK_weighted_df$time <- lubridate::ymd(SEAS5_UK_weighted_df$time)
str(SEAS5_UK_weighted_df)
'data.frame':   71 obs. of  2 variables:
 $ time: Date, format: "1950-02-28" "1951-02-28" ...
 $ rr  : num  4.13 3.25 1.07 1.59 2.59 ...
'data.frame':   9945 obs. of  4 variables:
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ number  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ time    : Date, format: "1982-02-01" "1983-02-01" ...
 $ tprate  : num  1.62 2.93 3.27 2 3.31 ...

Is the UNSEEN ensemble realistic?

To answer this question, we perform three statistical tests: an independence test, a model stability test and a model fidelity test.
These statistical tests are available through the UNSEEN R package. See evaluation for more info.
[2]:
require(UNSEEN)
require(ggplot2)
require(ggpubr)

Loading required package: UNSEEN

Loading required package: ggplot2

Warning message:
“replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’”
Loading required package: ggpubr

Timeseries

We plot the timeseries of SEAS5 (UNSEEN) and EOBS (OBS) for UK February precipitation.

[3]:
unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                               obs = EOBS_UK_weighted_df[EOBS_UK_weighted_df$time > '1982-02-01',],
                               ylab = 'UK February precipitation (mm/d)') +
theme(text = element_text(size = 14)) #This is just to increase the figure font
Warning message:
“Removed 4654 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_UK_Precipitation_30_1.png

We select the timeseries for the hindcast years 1981-2016.

[4]:
SEAS5_UK_hindcast <- SEAS5_UK_weighted_df[
    SEAS5_UK_weighted_df$time < '2017-02-01' &
    SEAS5_UK_weighted_df$number < 25,] ## keep the 25 hindcast members (members 25-50 are only filled from 2017 onward)
[5]:
unseen_timeseries(ensemble = SEAS5_UK_hindcast,
                  obs = EOBS_UK_weighted_df_hindcast,
                  ylab = 'UK February precipitation (mm/d)')#  %>%
# ggsave(height = 5, width = 6,   filename = "graphs/UK_timeseries.png")
_images/Notebooks_examples_UK_Precipitation_33_0.png
Evaluation tests

With the hindcast dataset we evaluate the independence, stability and fidelity.

First the independence test. This test checks whether the forecasts are independent; if they are not, the events are not unique and care should be taken in the extreme value analysis. Because of the chaotic behaviour of the atmosphere, independence of precipitation events is expected beyond a lead time of two weeks. Here we use lead times 2-6 months and find that the boxplots are within the expected range (perhaps a very small dependence at lead time 2). More info in our paper: https://doi.org/10.1038/s41612-020-00149-4.

[6]:
Independence_UK = independence_test(ensemble = SEAS5_UK_hindcast,
                                   detrend = TRUE) +
    theme(text = element_text(size = 14))

Independence_UK
Warning message:
“Removed 1625 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 1625 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_examples_UK_Precipitation_35_1.png

The test for model stability: Is there a drift in the simulated precipitation over lead times?

We find that the model is stable for UK February precipitation.

[7]:
Stability_UK = stability_test(ensemble = SEAS5_UK_hindcast,
                              lab = 'UK February precipitation (mm/d)')

Stability_UK
Warning message:
“Removed 4 row(s) containing missing values (geom_path).”
_images/Notebooks_examples_UK_Precipitation_37_1.png

The fidelity test shows us how consistent the UNSEEN simulations (SEAS5) are with the observed (EOBS). With this test we can assess systematic biases. The UNSEEN dataset is much larger than the observed record, so the two cannot simply be compared directly. For example, what if we had faced a few more or a few fewer precipitation extremes purely by chance?

This would influence the observed mean, but hardly influence the UNSEEN ensemble, because of its large sample size. Therefore we express the UNSEEN ensemble as a range of plausible means, computed from data samples of the same length as the observed record. We do the same for the higher-order statistical moments.
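In other words, the test is a resampling exercise: with $n$ observed years and ensemble values $x$, draw many samples of size $n$ from the ensemble,

$$\bar{x}^{(b)} = \frac{1}{n}\sum_{i=1}^{n} x_i^{(b)}, \qquad b = 1, \dots, B,$$

and check whether the observed mean $\bar{x}_{\mathrm{obs}}$ falls within, say, the 2.5-97.5% range of the $\bar{x}^{(b)}$ (a sketch of the idea; see the UNSEEN package for the exact procedure).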

[8]:
Fidelity_UK = fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
                            ensemble = SEAS5_UK_hindcast$tprate,
                            fontsize = 14
                            )
Fidelity_UK
_images/Notebooks_examples_UK_Precipitation_39_0.png

We find that the standard deviation within the model (the grey histograms and lines) is too low compared to the observed.

We can include a simple mean-bias correction (ratio) in this plot by setting biascor = TRUE. However, in this case it won’t help:

[16]:
fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
              ensemble = SEAS5_UK_hindcast$tprate,
              biascor = TRUE
             )
_images/Notebooks_examples_UK_Precipitation_41_0.png
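For reference, the ratio correction presumably rescales each ensemble value by the ratio of the means,

$$x_i^{\mathrm{corrected}} = x_i \cdot \frac{\bar{x}_{\mathrm{obs}}}{\bar{x}_{\mathrm{ensemble}}},$$

which adjusts a mean bias but, with a ratio close to one, leaves the low variability largely unchanged (a hedged reading of "mean-bias correction (ratio)"; the package documentation gives the exact definition).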

Check the documentation of the test with ?fidelity_test.

Illustrate

First, we fit a Gumbel and a GEV distribution (the GEV additionally includes a shape parameter) to the observed extremes. The likelihood-ratio test shows that the shape parameter does not significantly improve the fit (p-value of 0.9, well above 0.05), so the simpler Gumbel distribution is preferred.

[9]:
require(extRemes) ## provides fevd and lr.test
fit_obs_Gumbel <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "Gumbel"
                   )
fit_obs_GEV <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "GEV"
                   )
lr.test(fit_obs_Gumbel, fit_obs_GEV)

        Likelihood-ratio Test

data:  EOBS_UK_weighted_df_hindcast$rrEOBS_UK_weighted_df_hindcast$rr
Likelihood-ratio = 0.014629, chi-square critical value = 3.8415, alpha
= 0.0500, Degrees of Freedom = 1.0000, p-value = 0.9037
alternative hypothesis: greater
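For reference, the statistic reported above is the standard likelihood-ratio test,

$$D = 2\left[\ell(\mathrm{GEV}) - \ell(\mathrm{Gumbel})\right] \sim \chi^2_1,$$

with $\ell$ the maximised log-likelihood; here $D = 0.015$ is far below the critical value of 3.84, so the extra shape parameter is not justified.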

We show the Gumbel plot for the observed (EOBS) and UNSEEN (SEAS5 hindcast) data. It shows that the UNSEEN simulations are not within the uncertainty range of the observations, consistent with the model variability being too low, as indicated in the evaluation section. The EVT_plot function was written for this case study; we cannot ensure robustness for other case studies.
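As an aside, Gumbel plots of this kind place each ranked value at an empirical return period derived from a plotting position; a common choice (not necessarily the one used in EVT_plot) is the Weibull formula

$$T_j = \frac{n + 1}{j}$$

for the $j$-th largest of $n$ values.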

[14]:
source('src/evt_plot.r')
options(repr.plot.width = 12)
EVT_plot(ensemble = SEAS5_UK_hindcast$tprate,
                     obs = EOBS_UK_weighted_df_hindcast$rr,
                     main = "1981-2016",
                     GEV_type = "Gumbel",
#                          ylim = 3,
                     y_lab = 'UK February precipitation (mm/d)'
         )
_images/Notebooks_examples_UK_Precipitation_47_0.png
Potential

We find that there is too little variability within the SEAS5 hindcasts of UK February precipitation. This might be resolution dependent, or related to the signal-to-noise problem known for this region. These results can be fed back to model developers to help improve the models.

The use of other observational datasets and other model simulations can be explored further. For example, the UK Met Office studied UK monthly precipitation extremes using the UNSEEN method (Thompson et al., 2017). They showed that the monthly precipitation record for south east England has a 7% chance of being exceeded in at least one month in any given winter. Their work was taken up in the UK National Flood Resilience Review (2016), demonstrating the high relevance of the method.

Retrieve

We want to download monthly precipitation for February. I use the request that is generated automatically by the CDS web interface. There are two datasets we can use: Seasonal forecast daily data on single levels and Seasonal forecast monthly statistics on single levels. We use the latter, since it makes downloading the monthly values easy. For higher temporal resolution, such as daily extremes, we would have to use the former.

To get started with the CDS, register at https://cds.climate.copernicus.eu/ and copy your UID and API key from https://cds.climate.copernicus.eu/user into the ~/.cdsapirc file in your home directory. See the ml-flood project for more details.

[7]:
UID = 'UID'
API_key = 'API_key'
[8]:
import os
#Uncomment the following lines to write the UID and API key in the .cdsapirc file
# with open(os.path.join(os.path.expanduser('~'), '.cdsapirc'), 'w') as f:
#     f.write('url: https://cds.climate.copernicus.eu/api/v2\n')
#     f.write(f'key: {UID}:{API_key}')

Import packages

[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import cdsapi
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
[3]:
##We want the working directory to be the UNSEEN-open directory
pwd = os.getcwd() ##current working directory is UNSEEN-open/Notebooks/1.Download
pwd #print the present working directory
os.chdir(pwd+'/../../../') # Change the working directory to UNSEEN-open
os.getcwd() #print the working directory
[3]:
'C:\\Users\\Timo\\OneDrive - Loughborough University\\GitHub\\UNSEEN-open\\doc\\Notebooks\\1.Download'
[3]:
'C:\\Users\\Timo\\OneDrive - Loughborough University\\GitHub\\UNSEEN-open\\doc'

First download

In our request, we use the monthly mean. Interestingly, there is also the option to download the monthly maximum! We previously downloaded the data at daily resolution and extracted the monthly (or seasonal) maximum from it. Downloading the monthly maximum directly could save a lot of processing. However, that restricts you to daily extremes; for multi-day extremes (5-day events are often used), you would still need the original workflow. We select the UK domain to reduce the size of the download.

Here I download the monthly mean total precipitation (convective plus large-scale precipitation) forecast for February 1993. This downloads all 25 ensemble members of the forecast initialized in January.

[4]:
##Our first download:

c = cdsapi.Client()

c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'netcdf',
        'originating_centre': 'ecmwf',
        'system': '5',
        'variable': 'total_precipitation',
        'product_type': [
            'monthly_mean', # also available: 'monthly_maximum', 'monthly_standard_deviation'
        ],
        'year': '1993', # data before 1993 is also available
        'month': '01', # initialization month; the target month is February (2), usable initialization months are September-January (9-12, 1)
        'leadtime_month': [ ##Use of single months is much faster. Leadtime 0 does not exist. The first lead time is 1.
            '1', '2',
        ],
        'area': [##Select UK domain to reduce the size of the download
            60, -11, 50,
            2,
        ],
    },
    'Data/First_download.nc') ## the target file (NetCDF format works fine)
2020-05-13 10:08:56,140 INFO Welcome to the CDS
2020-05-13 10:08:56,142 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-13 10:08:56,983 INFO Request is completed
2020-05-13 10:08:56,984 INFO Downloading http://136.156.132.110/cache-compute-0001/cache/data0/adaptor.mars.external-1589266964.5635436-26283-29-a38e8975-b0ec-49ee-8f9b-7dea389f59cf.nc to Data/First_download.nc (16.4K)
2020-05-13 10:08:57,131 INFO Download rate 112.7K/s
[4]:
Result(content_length=16800,content_type=application/x-netcdf,location=http://136.156.132.110/cache-compute-0001/cache/data0/adaptor.mars.external-1589266964.5635436-26283-29-a38e8975-b0ec-49ee-8f9b-7dea389f59cf.nc)
Use xarray to visualize the netcdf file

I open the downloaded file and plot February 1993 precipitation over the UK.

[5]:
pr_1993_ds=xr.open_dataset('Data/First_download.nc')
pr_1993_ds

[5]:
xarray.Dataset
    Dimensions:   latitude: 11, longitude: 14, number: 25, time: 2
    Coordinates:
        longitude  (longitude) float32 -11.0 -10.0 ... 1.0 2.0 (degrees_east)
        latitude   (latitude) float32 60.0 59.0 ... 51.0 50.0 (degrees_north)
        number     (number) int32 0 1 2 ... 23 24 (ensemble_member)
        time       (time) datetime64[ns] 1993-01-01 1993-02-01
    Data variables:
        tprate     (time, number, latitude, longitude) float32 (units: m s**-1, long_name: Mean total precipitation rate)
    Attributes:
        Conventions: CF-1.6; converted from GRIB by grib_to_netcdf-2.16.0

I select ensemble member 0 and February precipitation (the variable is apparently called 'tprate') and use cartopy to make the map.

[6]:
## Use cartopy for nicer maps
ax = plt.axes(projection= ccrs.OSGB())
pr_1993_ds['tprate'].sel(number=0, time='1993-02').plot(transform=ccrs.PlateCarree(), cmap=plt.cm.Blues, ax=ax)

# ax.set_extent(extent)
ax.coastlines(resolution='50m')
plt.draw()
[6]:
<matplotlib.collections.QuadMesh at 0x7f5435adfa60>
[6]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f5435b6ff10>
_images/Notebooks_1.Download_1.Retrieve_12_2.png
Download all data

We will be using the SEAS5 hindcast, a dataset running from 1981-2016. The hindcast is initialized every month with 25 ensemble members, and the forecasts run for 6 months, indicated by the blue horizontal bars below. February is forecast by 6 initialization months (September-February). We discard the first month of the forecast because of dependence between the forecasts, as explained in the evaluation section, and are left with 5 initialization months (Sep-Jan) and 25 ensemble members forecasting February precipitation each year, i.e. 5 x 25 = 125 times the observed record length.

For a summary of all available C3S seasonal hindcasts, their initialization months and other specifics, please see the ECMWF page and the SEAS5 paper.

title

The first download example above retrieved all 25 ensemble members of the forecast initialized in January (the bottom bar). We now repeat this over the other initialization months and over all years (1981-2016).

[58]:
init_months = np.append(np.arange(9,13),1) ## Initialization months 9-12,1 (Sep-Jan)
init_months
years = np.arange(1982,2017)
years

[58]:
array([ 9, 10, 11, 12,  1])
[58]:
array([1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992,
       1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
       2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
       2015, 2016])

For our download, we loop over initialization months and years. Because we only want February precipitation, the leadtime month (forecast length) changes with the initialization month. For example, for the September-initialized forecasts we only want leadtime month 6 (February); for October-initialized forecasts this is leadtime 5, and so on. Furthermore, the year in which the forecast is initialized is required for the download: for September-December initializations this is the target year minus 1, while for January it is the target year itself. For the first two target years this looks like the following:

[101]:

for j in range(2):  ## first two target years only; use range(len(years)) for all
    for i in range(len(init_months)):
        init_month = init_months[i]
        leadtime_month = 6-i
        if init_month == 1:
            year = years[j]
        else:
            year = years[j]-1
        print ('year = ' + str(year) +' init_month = ' + str(init_month) + ' leadtime_month = ' + str(leadtime_month))
year = 1981 init_month = 9 leadtime_month = 6
year = 1981 init_month = 10 leadtime_month = 5
year = 1981 init_month = 11 leadtime_month = 4
year = 1981 init_month = 12 leadtime_month = 3
year = 1982 init_month = 1 leadtime_month = 2
year = 1982 init_month = 9 leadtime_month = 6
year = 1982 init_month = 10 leadtime_month = 5
year = 1982 init_month = 11 leadtime_month = 4
year = 1982 init_month = 12 leadtime_month = 3
year = 1983 init_month = 1 leadtime_month = 2

We write a function to perform the download:

[72]:
def retrieve(variable, originating_centre, year, init_month, leadtime_month):

    c.retrieve(
        'seasonal-monthly-single-levels',
        {
            'format': 'netcdf',
            'originating_centre': originating_centre,
            'system': '5',
            'variable': variable,
            'product_type': [
                'monthly_mean', # also available: 'monthly_maximum', 'monthly_standard_deviation'
            ],
            'year': str(year), # initialization year; data before 1993 is also available
            'month': "%.2i" % init_month, # initialization month; for a February target these are September-January (9-12, 1)
            'leadtime_month': [ ##The lead times you want. Use of single months is much faster. Leadtime 0 does not exist. The first lead time is 1.
                #For initialization month 1 (January), the leadtime months is 2 (February). For initialization month 12 (december), the lead time month is 3 (February).
                str(leadtime_month),
            ],
            'area': [## Select the UK domain to reduce the size of the download
                60, -11, 50, 2,
            ],
        },
        '../UK_example/'+ str(year) + "%.2i" % init_month + '.nc')

# retrieve(variable='total_precipitation', originating_centre='ecmwf', year=years[0], init_month=init_months[0], leadtime_month=6)

And start the download! In total, we send 35 years x 5 initialization dates = 175 requests. Alternatively, one could send just 5 requests, one per initialization date covering all years; see the sketch after the log output below.

[ ]:
for j in range(len(years)):  ## TODO: wrap in try/except so the loop continues after a failed request
    for i in range(len(init_months)):
        init_month = init_months[i]
        leadtime_month = 6 - i
        if init_month == 1:
            year = years[j]
        else:
            year = years[j] - 1
        retrieve(variable='total_precipitation',
                 originating_centre='ecmwf',
                 year=year,
                 init_month=init_month,
                 leadtime_month=leadtime_month)
2020-05-18 10:14:48,767 INFO Welcome to the CDS
2020-05-18 10:14:48,768 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-18 10:14:49,485 INFO Downloading http://136.156.132.235/cache-compute-0006/cache/data5/adaptor.mars.external-1589380912.7108843-4209-7-1add31ae-a0cd-44ce-83ac-9ff7c97f1b01.nc to ../UK_example/198109.nc (8.9K)
2020-05-18 10:14:49,575 INFO Download rate 101.5K/s
2020-05-18 10:14:49,803 INFO Welcome to the CDS
2020-05-18 10:14:49,804 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-18 10:14:50,498 INFO Downloading http://136.156.132.153/cache-compute-0002/cache/data4/adaptor.mars.external-1589381056.172494-12462-1-c9714216-87ac-49bc-be19-260627a9077d.nc to ../UK_example/198110.nc (8.9K)
2020-05-18 10:14:50,571 INFO Download rate 124.6K/s
2020-05-18 10:14:51,070 INFO Welcome to the CDS
2020-05-18 10:14:51,071 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-18 10:14:51,213 INFO Downloading http://136.156.132.235/cache-compute-0006/cache/data9/adaptor.mars.external-1589381301.6300867-8112-3-49ba0ab2-34fe-4364-9dec-700bf911b079.nc to ../UK_example/198111.nc (8.9K)
2020-05-18 10:14:51,254 INFO Download rate 219.7K/s
2020-05-18 10:14:51,415 INFO Welcome to the CDS
2020-05-18 10:14:51,416 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-18 10:14:51,548 INFO Request is queued
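As suggested above, the CDS API also accepts a list of years in a single request, so the 175 requests could in principle be collapsed into 5, one per initialization month. A hedged sketch (the year-by-year loop above is what was actually used; the output filename is hypothetical):

[ ]:
## Sketch: one request per initialization month, covering all years at once.
for i, init_month in enumerate(init_months):
    c.retrieve(
        'seasonal-monthly-single-levels',
        {
            'format': 'netcdf',
            'originating_centre': 'ecmwf',
            'system': '5',
            'variable': 'total_precipitation',
            'product_type': ['monthly_mean'],
            'year': [str(y if init_month == 1 else y - 1) for y in years],  ## initialization years
            'month': "%.2i" % init_month,
            'leadtime_month': [str(6 - i)],
            'area': [60, -11, 50, 2],
        },
        '../UK_example/all_years_' + "%.2i" % init_month + '.nc')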

The download sometimes fails. When redoing the request, it does download; I do not know what causes the failure. Below I download the file that failed.

[97]:
#201501 missing

year = 2015
init_month = 1
leadtime_month = 2
retrieve(variable = 'total_precipitation',originating_centre = 'ecmwf', year = year,
                 init_month = init_month, leadtime_month = leadtime_month)


2020-05-15 11:51:16,127 INFO Welcome to the CDS
2020-05-15 11:51:16,129 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/seasonal-monthly-single-levels
2020-05-15 11:51:16,327 INFO Downloading http://136.156.133.46/cache-compute-0015/cache/data7/adaptor.mars.external-1589527607.2123153-8094-37-3b786f72-2e2a-462f-bbb8-9c8d89c05102.nc to ../UK_example/201501.nc (8.9K)
2020-05-15 11:51:16,485 INFO Download rate 56.7K/s
Retrieve function

We have written a module in which the above procedure is automated. Here we load the retrieve module and retrieve SEAS5 and ERA5 data for the examples by selecting the variable, the target month(s), the area and the folder to download the files into.

The main function to download the data is retrieve.retrieve_SEAS5. The function only downloads the target months, for each year and each initialization month. To do this, it obtains the initialization months and lead times from the selected target month(s). For the UK example, we select February as our target month, hence Sep-Jan are the initialization months with lead times 2-6, see Download all.

[7]:
import src.cdsretrieve as retrieve ## assumes the working directory is the UNSEEN-open repository root
retrieve.print_arguments([2])
year = 1982 init_month = 1 leadtime_month = [2]
year = 1981 init_month = 12 leadtime_month = [3]
year = 1981 init_month = 11 leadtime_month = [4]
year = 1981 init_month = 10 leadtime_month = [5]
year = 1981 init_month = 9 leadtime_month = [6]

For the Siberia example this will be different, since the target months are March-May:

[8]:
retrieve.print_arguments([3,4,5])
year = 1982 init_month = 2 leadtime_month = [2 3 4]
year = 1982 init_month = 1 leadtime_month = [3 4 5]
year = 1981 init_month = 12 leadtime_month = [4 5 6]
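The mapping itself is straightforward. Below is an illustrative re-implementation of what the helper retrieve._get_init_months computes (the function name is real, this body is a sketch): the first lead time is discarded, so the earliest usable initialization is the month before the first target month.

[ ]:
## Illustrative sketch of the init-month/lead-time mapping
## (the packaged helper is retrieve._get_init_months; this body is a guess).
import numpy as np

def get_init_months(target_months, max_leadtime=6):
    n_inits = max_leadtime - len(target_months)  ## e.g. 5 init months for a single target month
    first = target_months[0]
    init_months = [(first - 2 - i) % 12 + 1 for i in range(n_inits)]  ## walk back from the month before the first target
    leadtimes = [np.arange(len(target_months)) + 2 + i for i in range(n_inits)]
    return init_months, leadtimes

get_init_months([2])        ## ([1, 12, 11, 10, 9], [[2], [3], [4], [5], [6]])
get_init_months([3, 4, 5])  ## ([2, 1, 12], [[2, 3, 4], [3, 4, 5], [4, 5, 6]])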

Call ?retrieve.retrieve_SEAS5 to see the documentation.

For the California example, we use:

[ ]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[8],
    area=[70, -130, 20, -70],
    years=np.arange(1981, 2021),
    folder='E:/PhD/California_example/SEAS5/')
[ ]:
retrieve.retrieve_ERA5(variables=['2m_temperature', '2m_dewpoint_temperature'],
                       target_months=[8],
                       area=[70, -130, 20, -70],
                       folder='E:/PhD/California_example/ERA5/')

For the Siberia example:

[ ]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[3, 4, 5],
    area=[70, -11, 30, 120],
    years=np.arange(1981, 2021),
    folder='../Siberia_example/SEAS5/')
[ ]:
retrieve.retrieve_ERA5(variables = ['2m_temperature','2m_dewpoint_temperature'],
                       target_months = [3,4,5],
                       area = [70, -11, 30, 120],
                       folder = '../Siberia_example/ERA5/')

And for the UK example:

[ ]:
retrieve.retrieve_SEAS5(variables = 'total_precipitation',
                        target_months = [2],
                        area = [60, -11, 50, 2],
                        folder = '../UK_example/SEAS5/')
[ ]:
retrieve.retrieve_ERA5(variables = 'total_precipitation',
                       target_months = [2],
                       area = [60, -11, 50, 2],
                       folder = '../UK_example/ERA5/')
EOBS data download

I tried to download EOBS through the CDS, but the product was temporarily disabled for maintenance (see the error below). As a workaround, I downloaded EOBS (1950-2019) and the most recent EOBS data (2020) here. Note that you have to register as an E-OBS user.

[99]:
c.retrieve(
    'insitu-gridded-observations-europe',
    {
        'version': 'v20.0e',
        'format': 'zip',
        'product_type': 'ensemble_mean',
        'variable': 'precipitation_amount',
        'grid_resolution': '0_25',
        'period': 'full_period',
    },
    '../UK_example/EOBS/EOBS.zip')
2020-05-15 14:06:44,721 INFO Welcome to the CDS
2020-05-15 14:06:44,722 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/insitu-gridded-observations-europe
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/cdsapi/api.py in _api(self, url, request, method)
    388         try:
--> 389             result.raise_for_status()
    390             reply = result.json()

~/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/requests/models.py in raise_for_status(self)
    940         if http_error_msg:
--> 941             raise HTTPError(http_error_msg, response=self)
    942

HTTPError: 403 Client Error:  for url: https://cds.climate.copernicus.eu/api/v2/resources/insitu-gridded-observations-europe

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-99-d12768b41b79> in <module>
----> 1 c.retrieve(
      2     'insitu-gridded-observations-europe',
      3     {
      4         'version': 'v20.0e',
      5         'format': 'zip',

~/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/cdsapi/api.py in retrieve(self, name, request, target)
    315
    316     def retrieve(self, name, request, target=None):
--> 317         result = self._api('%s/resources/%s' % (self.url, name), request, 'POST')
    318         if target is not None:
    319             result.download(target)

~/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/cdsapi/api.py in _api(self, url, request, method)
    408                                  "of '%s' at %s" % (t['title'], t['url']))
    409                     error = '. '.join(e)
--> 410                 raise Exception(error)
    411             else:
    412                 raise

Exception: Product temporally disabled for maintenance purposes. Sorry for the inconvenience, please try again later.

Preprocess

The preprocessing steps consist of merging all retrieved files into one xarray dataset and extracting the spatial and temporal average of the event of interest.

Merge

Here it is shown how all retrieved files are loaded into one xarray dataset, for both SEAS5 and for ERA5.

SEAS5

All retrieved seasonal forecasts are loaded into one xarray dataset. The number of files depends on the temporal extent of the extreme event being analyzed (i.e., are you looking at a monthly or a seasonal average?). For the Siberian heatwave, we retrieved 105 files: one for each of the 35 years and each of the three lead times (see Retrieve). For the UK we can use more forecasts, because the target period is shorter: one month compared to three months for the Siberia example. We retrieved 5 lead times x 35 years = 175 files.

Each NetCDF file contains 25 ensemble members, and hence has the dimensions lat, lon, number (25 members). Here we create an xarray dataset that also contains the dimensions time (35 years) and leadtime (5 initialization months). To generate this, we loop over the lead times, open all 35 years for each lead time, and concatenate the lead times.

[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
import os
import sys
sys.path.insert(0, os.path.abspath('../../../'))
import src.cdsretrieve as retrieve
[3]:
os.chdir(os.path.abspath('../../../'))
os.getcwd() #print the working directory
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'
[4]:
import xarray as xr
import numpy as np

def merge_SEAS5(folder, target_months):
    init_months, leadtimes = retrieve._get_init_months(target_months)
    print('Initialization month: ' + "%.2i" % init_months[0])
    SEAS5_ld1 = xr.open_mfdataset(
        folder + '*' + "%.2i" % init_months[0] + '.nc',
        combine='by_coords')  # Load the first lead time
    SEAS5 = SEAS5_ld1  # Create the xarray dataset to concatenate over
    for init_month in init_months[1:len(init_months)]:  ## Remove the first that we already have
        print(init_month)
        SEAS5_ld = xr.open_mfdataset(
            folder + '*' + "%.2i" % init_month + '.nc',
            combine='by_coords')
        SEAS5 = xr.concat([SEAS5, SEAS5_ld], dim='leadtime')
    SEAS5 = SEAS5.assign_coords(leadtime = np.arange(len(init_months)) + 2) # assign leadtime coordinates
    return(SEAS5)
[5]:
SEAS5_Siberia = merge_SEAS5(folder='../Siberia_example/SEAS5/',
                            target_months=[3, 4, 5])
Initialization month: 02
1
12
[6]:
SEAS5_Siberia
[6]:
xarray.Dataset
    Dimensions:   latitude: 41, leadtime: 3, longitude: 132, number: 51, time: 117
    Coordinates:
        longitude  (longitude) float32 -11.0 -10.0 ... 119.0 120.0 (degrees_east)
        latitude   (latitude) float32 70.0 69.0 ... 31.0 30.0 (degrees_north)
        number     (number) int64 0 1 2 ... 49 50 (ensemble_member)
        time       (time) datetime64[ns] 1982-03-01 ... 2020-05-01 (March, April and May of each year)
        leadtime   (leadtime) int64 2 3 4
    Data variables:
        t2m        (leadtime, time, number, latitude, longitude) float32 dask.array<chunksize=(1, 3, 51, 41, 132)> (units: K, 2 metre temperature)
        d2m        (leadtime, time, number, latitude, longitude) float32 dask.array<chunksize=(1, 3, 51, 41, 132)> (units: K, 2 metre dewpoint temperature)
    Attributes:
        Conventions: CF-1.6; converted from GRIB by grib_to_netcdf-2.16.0

You can, for example, select the lat, lon, time, ensemble member and lead time as follows (add .load() to see the values):

[ ]:
SEAS5_Siberia.sel(latitude=60,
                  longitude=-10,
                  time='2000-03',
                  number=26,
                  leadtime=3).load()

We can repeat this for the UK example, where just February is the target month:

[10]:
SEAS5_UK = merge_SEAS5(folder = '../UK_example/SEAS5/', target_months = [2])
Initialization month: 01
12
11
10
9

The SEAS5 total precipitation rate is in m/s. You can easily convert this and change the attributes accordingly; the assigned attributes are shown in the dataset summary below.

[11]:
SEAS5_UK['tprate'] = SEAS5_UK['tprate'] * 1000 * 3600 * 24 ## From m/s to mm/d
SEAS5_UK['tprate'].attrs = {'long_name': 'rainfall',
 'units': 'mm/day',
 'standard_name': 'thickness_of_rainfall_amount'}
SEAS5_UK
[11]:
xarray.Dataset
    Dimensions:   latitude: 11, leadtime: 5, longitude: 14, number: 25, time: 35
    Coordinates:
        time       (time) datetime64[ns] 1982-02-01 ... 2016-02-01 (February of each year)
        latitude   (latitude) float32 60.0 59.0 ... 51.0 50.0 (degrees_north)
        number     (number) int32 0 1 2 ... 23 24 (ensemble_member)
        longitude  (longitude) float32 -11.0 -10.0 ... 1.0 2.0 (degrees_east)
        leadtime   (leadtime) int64 2 3 4 5 6
    Data variables:
        tprate     (leadtime, time, number, latitude, longitude) float32 dask.array<chunksize=(1, 1, 25, 11, 14)> (long_name: rainfall, units: mm/day, standard_name: thickness_of_rainfall_amount)
    Attributes:
        Conventions: CF-1.6; converted from GRIB by grib_to_netcdf-2.16.0
ERA5

For each year a NetCDF file is downloaded, named ERA5_yyyy (for example ERA5_1981). Therefore, we can load ERA5 by combining all downloaded years:

[9]:
ERA5_Siberia = xr.open_mfdataset('../Siberia_example/ERA5/ERA5_????.nc',combine='by_coords') ## open the data
ERA5_Siberia
[9]:
xarray.Dataset
    Dimensions:   latitude: 41, longitude: 132, time: 126
    Coordinates:
        longitude  (longitude) float32 -11.0 -10.0 ... 119.0 120.0 (degrees_east)
        latitude   (latitude) float32 70.0 69.0 ... 31.0 30.0 (degrees_north)
        time       (time) datetime64[ns] 1979-03-01 ... 2020-05-01 (March, April and May of each year)
    Data variables:
        t2m        (time, latitude, longitude) float32 dask.array<chunksize=(3, 41, 132)> (units: K, 2 metre temperature)
        d2m        (time, latitude, longitude) float32 dask.array<chunksize=(3, 41, 132)> (units: K, 2 metre dewpoint temperature)
    Attributes:
        Conventions: CF-1.6; converted from GRIB by grib_to_netcdf-2.16.0
[13]:
ERA5_UK = xr.open_mfdataset('../UK_example/ERA5/ERA5_????.nc',combine='by_coords') ## open the data
ERA5_UK
[13]:
xarray.Dataset
    Dimensions:   latitude: 11, longitude: 14, time: 42
    Coordinates:
        latitude   (latitude) float32 60.0 59.0 ... 51.0 50.0 (degrees_north)
        longitude  (longitude) float32 -11.0 -10.0 ... 1.0 2.0 (degrees_east)
        time       (time) datetime64[ns] 1979-02-01 ... 2020-02-01 (February of each year)
    Data variables:
        tp         (time, latitude, longitude) float32 dask.array<chunksize=(1, 11, 14)> (units: m, Total precipitation)
    Attributes:
        Conventions: CF-1.6; converted from GRIB by grib_to_netcdf-2.16.0

Event definition

Time selection

For the UK, the event of interest is UK February average precipitation. Since we download monthly averages, no preprocessing along the time dimension is needed. For the Siberian heatwave, we are interested in the March-May (MAM) average, so we need to take the seasonal average of the monthly timeseries. We cannot take a simple mean of the three months, because the months contain different numbers of days, see this example. We therefore take a weighted average:

[11]:
month_length = SEAS5_Siberia.time.dt.days_in_month
month_length
[11]:
xarray.DataArray 'days_in_month' (time: 117)
array([31, 30, 31, 31, 30, 31, ..., 31, 30, 31])
Coordinates:
    time  (time)  datetime64[ns]  1982-03-01 ... 2020-05-01 (March, April and May of each year)
[12]:
# Calculate the weights by grouping by 'time.year'.
weights = month_length.groupby('time.year') / month_length.groupby('time.year').sum()
weights
[12]:
xarray.DataArray 'days_in_month' (time: 117)
array([0.33695652, 0.32608696, 0.33695652, ..., 0.33695652, 0.32608696, 0.33695652])
Coordinates:
    time  (time)  datetime64[ns]  1982-03-01 ... 2020-05-01
    year  (time)  int64  1982 1982 1982 ... 2020 2020 2020
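
As a quick sanity check (a minimal illustration, not part of the notebook), the MAM weights are simply the month lengths divided by the 92-day season length:

days = [31, 30, 31]                      # days in March, April and May
weights_by_hand = [d / sum(days) for d in days]
print(weights_by_hand)                   # [0.33695652..., 0.32608695..., 0.33695652...]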
[13]:
# Test that the weights sum to 1.0 for each year
np.testing.assert_allclose(weights.groupby('time.year').sum().values, np.ones(39)) ## one value per year (1982-2020)
[14]:
# Calculate the weighted average; min_count=3 returns NaN where fewer than 3 months are present
SEAS5_Siberia_weighted = (SEAS5_Siberia * weights).groupby('time.year').sum(dim='time', min_count = 3)
SEAS5_Siberia_weighted
[14]:
xarray.Dataset
Dimensions:  year: 39, leadtime: 3, number: 51, latitude: 41, longitude: 132
Coordinates:
    longitude  (longitude)  float32  -11.0 -10.0 -9.0 ... 119.0 120.0
    latitude   (latitude)   float32  70.0 69.0 68.0 ... 31.0 30.0
    leadtime   (leadtime)   int64    2 3 4
    number     (number)     int64    0 1 2 ... 49 50 (ensemble_member)
    year       (year)       int64    1982 1983 1984 ... 2019 2020
Data variables:
    t2m  (year, leadtime, number, latitude, longitude)  float64  dask.array<chunksize=(1, 1, 51, 41, 132)>
    d2m  (year, leadtime, number, latitude, longitude)  float64  dask.array<chunksize=(1, 1, 51, 41, 132)>

Or, as a function:

[15]:
def season_mean(ds, years, calendar='standard'):
    # Make a DataArray with the number of days in each month, size = len(time)
    month_length = ds.time.dt.days_in_month

    # Calculate the weights by grouping by 'time.year'
    weights = month_length.groupby('time.year') / month_length.groupby('time.year').sum()

    # Test that the weights for each year sum to 1.0
    np.testing.assert_allclose(weights.groupby('time.year').sum().values, np.ones(years))

    # Calculate the weighted average; min_count=3 returns NaN for incomplete years
    return (ds * weights).groupby('time.year').sum(dim='time', min_count = 3)
[16]:
ERA5_Siberia_weighted = season_mean(ERA5_Siberia, years = 42)
ERA5_Siberia_weighted
[16]:
xarray.Dataset
Dimensions:  year: 42, latitude: 41, longitude: 132
Coordinates:
    longitude  (longitude)  float32  -11.0 -10.0 -9.0 ... 119.0 120.0
    latitude   (latitude)   float32  70.0 69.0 68.0 ... 31.0 30.0
    year       (year)       int64    1979 1980 1981 ... 2019 2020
Data variables:
    t2m  (year, latitude, longitude)  float64  dask.array<chunksize=(1, 41, 132)>
    d2m  (year, latitude, longitude)  float64  dask.array<chunksize=(1, 41, 132)>

What is the difference between the simple mean and the weighted mean?

The difference is barely visible:

[17]:
ERA5_Siberia_weighted['t2m'].mean(['longitude', 'latitude']).plot()
ERA5_Siberia['t2m'].groupby('time.year').mean().mean(['longitude','latitude']).plot()
[17]:
[<matplotlib.lines.Line2D at 0x7fa13055f970>]
[17]:
[<matplotlib.lines.Line2D at 0x7fa130531be0>]
_images/Notebooks_2.Preprocess_2.Preprocess_25_2.png
Spatial selection

What spatial extent defines the event you are analyzing? The easiest option is to select a lat-lon box, as we did for the Siberian heatwave example (i.e. we average the temperature over 50-70N, 65-120E, also used here).

In case you want to specify another domain than a lat-lon box, you could mask the datasets. For the California Fires example, we select the domain with high temperature anomalies (>2 standard deviation), see California_august_temperature_anomaly. For the UK example, we want a country-averaged timeseries instead of a box. In this case, we use another observational product: the EOBS dataset that covers Europe. We upscale this dataset to the same resolution as SEAS5 and create a mask to take the spatial average over the UK, see Using EOBS + upscaling.
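
As a minimal illustration of masking (a sketch, not the notebook's own code; the mask construction depends on the case study), a boolean mask can be applied with xarray's where. Here, hypothetically, we keep only grid cells where the 2020 anomaly exceeds two standard deviations:

# Hypothetical example: mask grid cells where the 2020 anomaly exceeds 2 standard deviations
climatology = ERA5_Siberia_weighted['t2m'].mean('year')
anomaly_2020 = ERA5_Siberia_weighted['t2m'].sel(year=2020) - climatology
mask = anomaly_2020 > 2 * ERA5_Siberia_weighted['t2m'].std('year')
masked_mean = ERA5_Siberia_weighted['t2m'].where(mask).mean(['longitude', 'latitude'])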

We have to take the latitude-weighted average, since grid cell area decreases with latitude. We first take the 'normal' average:

[36]:
ERA5_Siberia_events_zoomed = (
    ERA5_Siberia_weighted['t2m'].sel(  # Select 2 metre temperature
        latitude=slice(70, 50),        # Select the latitudes
        longitude=slice(65, 120)).    # Select the longitude
    mean(['longitude', 'latitude']))

And we repeat this for the SEAS5 events

[37]:
SEAS5_Siberia_events = (
    SEAS5_Siberia_weighted['t2m'].sel(
        latitude=slice(70, 30),
        longitude=slice(-11, 120)).
    mean(['longitude', 'latitude']))
SEAS5_Siberia_events.load()
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[37]:
xarray.DataArray 't2m' (year: 39, leadtime: 3, number: 51)
array of MAM-mean temperatures (K), e.g. 277.70, 277.08, 277.06, ..., 278.80, 278.24
Coordinates:
    leadtime  (leadtime)  int64  2 3 4
    number    (number)    int64  0 1 2 ... 49 50 (ensemble_member)
    year      (year)      int64  1982 1983 1984 ... 2019 2020

The 'Mean of empty slice' warning is expected: hindcast years (before 2017) contain only 25 ensemble members, so members 25-50 are NaN for those years.
[39]:
SEAS5_Siberia_events_zoomed = (
    SEAS5_Siberia_weighted['t2m'].sel(
        latitude=slice(70, 50),
        longitude=slice(65, 120)).
    mean(['longitude', 'latitude']))
SEAS5_Siberia_events_zoomed.load()
[39]:
xarray.DataArray 't2m' (year: 39, leadtime: 3, number: 51)
array of MAM-mean temperatures (K), e.g. 269.41, 267.47, 268.93, ..., 269.82, 267.24
Coordinates:
    leadtime  (leadtime)  int64  2 3 4
    number    (number)    int64  0 1 2 ... 49 50 (ensemble_member)
    year      (year)      int64  1982 1983 1984 ... 2019 2020
[40]:
SEAS5_Siberia_events.to_dataframe().to_csv('Data/SEAS5_Siberia_events.csv')
ERA5_Siberia_events.to_dataframe().to_csv('Data/ERA5_Siberia_events.csv')
[41]:
SEAS5_Siberia_events_zoomed.to_dataframe().to_csv('Data/SEAS5_Siberia_events_zoomed.csv')
ERA5_Siberia_events_zoomed.to_dataframe().to_csv('Data/ERA5_Siberia_events_zoomed.csv')
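
Note that the spatial averages above weight all grid cells equally. A latitude-weighted version is a small change using xarray's weighted method (a minimal sketch under the datasets defined above, assuming a recent xarray version (0.15.1+); not the notebook's own code):

import numpy as np

box = SEAS5_Siberia_weighted['t2m'].sel(latitude=slice(70, 50),
                                        longitude=slice(65, 120))
# Weight each grid cell by the cosine of its latitude (proportional to grid-cell area)
weights_lat = np.cos(np.deg2rad(box.latitude))
SEAS5_Siberia_events_latweighted = box.weighted(weights_lat).mean(['longitude', 'latitude'])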

Evaluate

Can seasonal forecasts be used as ‘alternate’ realities? Here we show how a set of evaluation metrics can be used to answer this question. The evaluation metrics are available through an R package for easy evaluation of the UNSEEN ensemble. Here, we illustrate how this package can be used in the UNSEEN workflow. We will evaluate the generated UNSEEN ensemble of UK February precipitation and of MAM Siberian heatwaves.

The framework to evaluate the UNSEEN ensemble presented here consists of testing the ensemble member independence, model stability and model fidelity, see also the NPJ paper.

Note

This is R code and not python!

We switch to R since we believe R offers better functionality for extreme value statistics.

We load the UNSEEN package and read in the data.

[2]:
library(UNSEEN)

The data that is imported here are the files stored at the end of the preprocessing step.

[3]:
SEAS5_Siberia_events <- read.csv("Data/SEAS5_Siberia_events.csv", stringsAsFactors=FALSE)
ERA5_Siberia_events <- read.csv("Data/ERA5_Siberia_events.csv", stringsAsFactors=FALSE)
[4]:
SEAS5_Siberia_events_zoomed <- read.csv("Data/SEAS5_Siberia_events_zoomed.csv", stringsAsFactors=FALSE)
ERA5_Siberia_events_zoomed <- read.csv("Data/ERA5_Siberia_events_zoomed.csv", stringsAsFactors=FALSE)
[5]:
# Convert temperatures from Kelvin to degrees Celsius
SEAS5_Siberia_events$t2m <- SEAS5_Siberia_events$t2m - 273.15
ERA5_Siberia_events$t2m <- ERA5_Siberia_events$t2m - 273.15
SEAS5_Siberia_events_zoomed$t2m <- SEAS5_Siberia_events_zoomed$t2m - 273.15
ERA5_Siberia_events_zoomed$t2m <- ERA5_Siberia_events_zoomed$t2m - 273.15

[6]:
head(SEAS5_Siberia_events_zoomed,n = 3)
head(ERA5_Siberia_events, n = 3)
A data.frame: 3 × 4
   year  leadtime  number  t2m
  <int>     <int>   <int>  <dbl>
1  1982         2       0  -3.736505
2  1982         2       1  -5.682759
3  1982         2       2  -4.221411
A data.frame: 3 × 2
   year  t2m
  <int>  <dbl>
1  1979  4.010750
2  1980  3.880965
3  1981  4.822891
[7]:
EOBS_UK_weighted_df <- read.csv("Data/EOBS_UK_weighted_upscaled.csv", stringsAsFactors=FALSE)
SEAS5_UK_weighted_df <- read.csv("Data/SEAS5_UK_weighted_masked.csv", stringsAsFactors=FALSE)

And then convert the time class to Date format, with the ymd function in lubridate:

[8]:
EOBS_UK_weighted_df$time <- lubridate::ymd(EOBS_UK_weighted_df$time)
str(EOBS_UK_weighted_df)

EOBS_UK_weighted_df_hindcast <- EOBS_UK_weighted_df[
    EOBS_UK_weighted_df$time > '1982-02-01' &
    EOBS_UK_weighted_df$time < '2017-02-01',
    ]


SEAS5_UK_weighted_df$time <- lubridate::ymd(SEAS5_UK_weighted_df$time)
str(SEAS5_UK_weighted_df)
'data.frame':   71 obs. of  2 variables:
 $ time: Date, format: "1950-02-28" "1951-02-28" ...
 $ rr  : num  4.13 3.25 1.07 1.59 2.59 ...
'data.frame':   9945 obs. of  4 variables:
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ number  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ time    : Date, format: "1982-02-01" "1983-02-01" ...
 $ tprate  : num  1.62 2.93 3.27 2 3.31 ...

Timeseries

Here we plot the timeseries of SEAS5 (UNSEEN) and ERA5 (OBS) for the Siberian Heatwave.

[10]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed,
    obs = ERA5_Siberia_events_zoomed,
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature (C)")
Warning message:
“Removed 2756 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_14_1.png

The timeseries consist of hindcasts (years 1982-2016) and archived operational forecasts (years 2017-2020). The two datasets differ slightly: the hindcasts contain 25 members whereas the operational forecasts contain 51, the native resolution differs, and the forecasts are initialized from different datasets.

For the evaluation of the UNSEEN ensemble we only want to use the SEAS5 hindcasts, for a consistent dataset. Note that 2017 is used in neither the hindcast nor the operational dataset in this example, since it contains forecasts initialized both in 2016 (hindcast) and in 2017 (forecast), see retrieve. We split SEAS5 into hindcasts and operational forecasts:

[11]:
SEAS5_Siberia_events_zoomed_hindcast <- SEAS5_Siberia_events_zoomed[
    SEAS5_Siberia_events_zoomed$year < 2017 &
    SEAS5_Siberia_events_zoomed$number < 25,]

SEAS5_Siberia_events_zoomed_forecasts <- SEAS5_Siberia_events_zoomed[
    SEAS5_Siberia_events_zoomed$year > 2017,]

And we select the same years for ERA5.

[12]:
ERA5_Siberia_events_zoomed_hindcast <- ERA5_Siberia_events_zoomed[
    ERA5_Siberia_events_zoomed$year < 2017 &
    ERA5_Siberia_events_zoomed$year > 1981,]
[13]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed_hindcast,
    obs = ERA5_Siberia_events_zoomed_hindcast,
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature")
_images/Notebooks_3.Evaluate_3.Evaluate_19_0.png
[14]:
unseen_timeseries(
    ensemble = SEAS5_Siberia_events_zoomed_forecasts,
    obs = ERA5_Siberia_events_zoomed[ERA5_Siberia_events_zoomed$year > 2017,],
    ensemble_yname = "t2m",
    ensemble_xname = "year",
    obs_yname = "t2m",
    obs_xname = "year",
    ylab = "MAM Siberian temperature")
_images/Notebooks_3.Evaluate_3.Evaluate_20_0.png

For the UK we have a longer historical record available from EOBS:

[15]:
unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df,
                  ylab = 'UK February precipitation (mm/d)')
Warning message:
“Removed 4654 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_22_1.png
[16]:
unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df_hindcast,
                  ylab = 'UK February precipitation (mm/d)')
Warning message:
“Removed 4654 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_23_1.png

Call the documentation of the function with ?unseen_timeseries

Independence

Note: the significance ranges and the detrend method of this test are still being developed.

[17]:
independence_test(
    ensemble = SEAS5_Siberia_events,
    n_lds = 3,
    var_name = "t2m",
)
Warning message:
“Removed 975 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 975 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_27_1.png
[18]:
independence_test(
    ensemble = SEAS5_Siberia_events_zoomed,
    n_lds = 3,
    var_name = "t2m",
)
Warning message:
“Removed 975 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 975 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_28_1.png
[19]:
independence_test(ensemble = SEAS5_UK_weighted_df)
Warning message:
“Removed 1625 rows containing non-finite values (stat_ydensity).”
Warning message:
“Removed 1625 rows containing non-finite values (stat_boxplot).”
_images/Notebooks_3.Evaluate_3.Evaluate_29_1.png

Stability

For the stability test we assess whether the events become more severe with lead time, which would indicate a potential 'drift' in the model. We need to use the consistent hindcast dataset for this.

[20]:
stability_test(
    ensemble = SEAS5_Siberia_events_zoomed_hindcast,
    lab = 'MAM Siberian temperature',
    var_name = 't2m'
)
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
_images/Notebooks_3.Evaluate_3.Evaluate_31_1.png
[21]:
stability_test(ensemble = SEAS5_UK_weighted_df, lab = 'UK February precipitation (mm/d)')
Warning message:
“Removed 4 row(s) containing missing values (geom_path).”
_images/Notebooks_3.Evaluate_3.Evaluate_32_1.png

Fidelity

[22]:
fidelity_test(
    obs = ERA5_Siberia_events_zoomed_hindcast$t2m,
    ensemble = SEAS5_Siberia_events_zoomed_hindcast$t2m,
    units = 'C',
    biascor = FALSE
)
_images/Notebooks_3.Evaluate_3.Evaluate_34_0.png

Let's apply an additive bias correction:

[23]:
# Apply an additive bias correction: shift the ensemble by the mean bias
obs = ERA5_Siberia_events_zoomed_hindcast$t2m
ensemble = SEAS5_Siberia_events_zoomed_hindcast$t2m
ensemble_biascor = ensemble + (mean(obs) - mean(ensemble))

fidelity_test(
    obs = obs,
    ensemble = ensemble_biascor,
    units = 'C',
    biascor = FALSE
)
_images/Notebooks_3.Evaluate_3.Evaluate_36_0.png

Global monthly temperature records in ERA5

Where have monthly average temperatures broken records across the world in 2020?

Global Temperature records 2020

In this first section, we load required packages and modules

[29]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker

#for rank calculation
import bottleneck
[3]:
## this is to load our own function to retrieve ERA5,
## which is located in ../src/CDSretrieve.py
import sys
sys.path.append('../')
[4]:
##And here we load the module
import src.CDSretrieve as retrieve
[5]:
##We want the working directory to be the UNSEEN-open directory
pwd = os.getcwd() ##current working directory is UNSEEN-open/Notebooks/1.Download
pwd #print the present working directory
os.chdir(pwd+'/../') # Change the working directory to UNSEEN-open
os.getcwd() #print the working directory
[5]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open/Notebooks'
[5]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'

Download ERA5

This section describes the retrieval of ERA5. We retrieve netcdf files of global monthly 2m temperature and 2m dewpoint temperature for each year over 1979-2020.

[39]:
retrieve.retrieve_ERA5(variables = ['2m_temperature','2m_dewpoint_temperature'], folder = '../Siberia_example/')
[39]:
''

We load all files with xarray open_mfdataset. The latest 3 months in this dataset are made available through ERA5T, which might differ slightly from ERA5. In the downloaded file, an extra dimension 'expver' indicates which data is ERA5 (expver = 1) and which is ERA5T (expver = 5). After retrieving and loading, we combine ERA5 and ERA5T to create a dataset that runs until August 2020.

[10]:
ERA5 = xr.open_mfdataset('../Siberia_example/ERA5_????.nc',combine='by_coords') ## open the data
ERA5
[10]:
xarray.Dataset
Dimensions:  expver: 2, latitude: 181, longitude: 360, time: 500
Coordinates:
    latitude   (latitude)   float32  90.0 89.0 88.0 ... -89.0 -90.0
    longitude  (longitude)  float32  -180.0 -179.0 ... 178.0 179.0
    expver     (expver)     int32    1 5
    time       (time)       datetime64[ns]  1979-01-01 ... 2020-08-01
Data variables:
    t2m  (time, latitude, longitude, expver)  float32  dask.array<chunksize=(12, 181, 360, 2)>  2 metre temperature [K]
    d2m  (time, latitude, longitude, expver)  float32  dask.array<chunksize=(12, 181, 360, 2)>  2 metre dewpoint temperature [K]
Attributes:
    Conventions: CF-1.6
    history: 2020-09-07 10:14:42 GMT by grib_to_netcdf-2.16.0
[14]:
ERA5_combine = ERA5.sel(expver=1).combine_first(ERA5.sel(expver=5))  ## fill ERA5 (expver=1) gaps with ERA5T (expver=5)
ERA5_combine.load()
[14]:
xarray.Dataset
Dimensions:  latitude: 181, longitude: 360, time: 500
Coordinates:
    latitude   (latitude)   float32  90.0 89.0 88.0 ... -89.0 -90.0
    longitude  (longitude)  float32  -180.0 -179.0 ... 178.0 179.0
    time       (time)       datetime64[ns]  1979-01-01 ... 2020-08-01
Data variables:
    t2m  (time, latitude, longitude)  float32  244.7074 244.7074 ... 214.79857  2 metre temperature [K]
    d2m  (time, latitude, longitude)  float32  241.76836 241.76836 ... 211.0198  2 metre dewpoint temperature [K]
Attributes:
    Conventions: CF-1.6
    history: 2020-09-07 10:14:42 GMT by grib_to_netcdf-2.16.0

Calculating the rank

We want to show for each month whether the recorded monthly average temperature for 2020 is the highest since 1979 (or second highest, etc.).

We first select only January months.

[15]:
ERA5_jan = ERA5_combine.sel(time=ERA5_combine['time.month'] == 1) ## Select only January

Then we calculate the rank of January average temperatures over the years. We rename the variable ‘t2m’ into ‘Temperature rank’.

[16]:
ERA5_jan_rank = ERA5_jan['t2m'].rank(dim = 'time')
ERA5_jan_rank = ERA5_jan_rank.rename('Temperature rank')

We have now calculated the rank in increasing order, i.e. the highest value has the highest rank. However, we want the highest value to have rank 1 and the second highest rank 2. Therefore we invert the ranks and select the inverted rank of the January 2020 average temperature within the January average temperatures of all years. If the January 2020 average temperature is the highest on record, the inverted rank is 1; the second highest is 2.

[17]:
ERA5_jan_rank_inverted = (len(ERA5_jan_rank.time) - ERA5_jan_rank + 1).sel(time='2020')
ERA5_jan_rank_inverted
[17]:
xarray.DataArray 'Temperature rank' (time: 1, latitude: 181, longitude: 360)
array([[[25., 25., 25., ..., 24., 24., 24.]]])
Coordinates:
    latitude   (latitude)   float32  90.0 89.0 88.0 ... -89.0 -90.0
    longitude  (longitude)  float32  -180.0 -179.0 ... 178.0 179.0
    time       (time)       datetime64[ns]  2020-01-01

Plotting

We define a function to plot the data on a global map:

[18]:
def Global_plot(ERA5_i_rank_inverted):
    fig = plt.figure(figsize=(9, 4.5))  ## plt.subplots would leave an empty axes behind the map axes
    ax = plt.axes(projection=ccrs.Robinson())
    ERA5_i_rank_inverted.plot(
        ax=ax,
        transform=ccrs.PlateCarree(),
        levels=[1, 2, 3, 4, 5],
        extend='both',
        colors=plt.cm.Reds_r)

    ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
    ax.coastlines(
        resolution='110m')  #Currently can be one of “110m”, “50m”, and “10m”.
    gl = ax.gridlines(crs=ccrs.PlateCarree(),
                      draw_labels=True,
                      linewidth=1,
                      color='gray',
                      alpha=0.5,
                      linestyle='--')
#     gl.top_labels = False
#     gl.right_labels = False

And plot!

[19]:
Global_plot(ERA5_jan_rank_inverted)
_images/Notebooks_Global_monthly_temperature_records_ERA5_22_0.png

And zoom in on Siberia. We define a new plot:

[55]:
def Siberia_plot(ERA5_i_rank_inverted):
    fig = plt.figure(figsize=(9, 4.5))  ## plt.subplots would leave an empty axes behind the map axes
    ax = plt.axes(projection=ccrs.PlateCarree(central_longitude=50.0))
    ERA5_i_rank_inverted.plot(
        ax=ax,
        transform=ccrs.PlateCarree(),
        levels=[1, 2, 3, 4, 5],
        extend='both',
        colors=plt.cm.Reds_r)

    ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
    ax.coastlines(resolution='50m')
    gl = ax.gridlines(crs=ccrs.PlateCarree(),
                      draw_labels=True,
                      linewidth=1,
                      color='gray',
                      alpha=0.5,
                      linestyle='--')
    gl.top_labels = False
    gl.right_labels = False
[56]:
Siberia_plot(ERA5_jan_rank_inverted.sel(longitude = slice(-11,140), latitude = slice(80,40)))
_images/Notebooks_Global_monthly_temperature_records_ERA5_25_0.png

Loop over Jan-Aug
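The plots for the gifs below are created by repeating the rank calculation for each month January-August and saving one map per month. A minimal sketch of that loop, assuming the ERA5_combine dataset and the plotting functions defined above (the output filenames are illustrative):

for i in range(1, 9):  ## months January (1) through August (8)
    ERA5_i = ERA5_combine.sel(time=ERA5_combine['time.month'] == i)  ## select month i
    ERA5_i_rank = ERA5_i['t2m'].rank(dim='time').rename('Temperature rank')
    ERA5_i_rank_inverted = (len(ERA5_i_rank.time) - ERA5_i_rank + 1).sel(time='2020')
    Global_plot(ERA5_i_rank_inverted)
    plt.savefig('../Siberia_example/plots/Global_%02d.png' % i)  ## illustrative filename
    plt.close()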

Create the gif

We use ImageMagick and run it from the command line. See this CMS notebook for more info on creating gifs.

[58]:
!convert -delay 60 ../Siberia_example/plots/Global*png graphs/Global_Animation_01.gif
[60]:
!convert -delay 60 ../Siberia_example/plots/Siberia*png graphs/Siberia_Animation_01.gif

And show the gif in the Jupyter notebook with:

![Global Temperature records 2020](../graphs/Global_Animation_01.gif "Records2020")

Global Temperature records 2020

Same for the Siberian temperature records: ![Siberian Temperature records 2020](../graphs/Siberia_Animation_01.gif "Records2020")

Siberian Temperature records 2020

California August temperature anomaly

How anomalous was the August 2020 average temperature?

California Temperature August 2020

In this first section, we load required packages and modules

[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker
import cdsapi


#for rank calculation
# import bottleneck
[3]:
os.chdir(os.path.abspath('../../'))

Load ERA5

We have retrieved netcdf files of monthly 2m temperature and 2m dewpoint temperature for each year over 1979-2020.

We load all files with xarray open_mfdataset.

[4]:
ERA5 = xr.open_mfdataset('../California_example/ERA5/ERA5_????.nc',combine='by_coords') ## open the data
ERA5
[4]:
xarray.Dataset (latitude: 51, longitude: 61, time: 42)
Coordinates: longitude -130.0 ... -70.0, latitude 70.0 ... 20.0 (1° grid), time 1979-08-01 ... 2020-08-01 (one August per year)
Data variables: t2m (2 metre temperature, K) and d2m (2 metre dewpoint temperature, K), dask arrays of shape (42, 51, 61)
Attributes: Conventions CF-1.6; history: created 2021-02-09 by grib_to_netcdf-2.16.0
(full data repr truncated)

Retrieve the land-sea mask

We retrieve the land-sea mask for ERA5 from CDS.

[5]:
c = cdsapi.Client()

c.retrieve(
    'reanalysis-era5-single-levels-monthly-means',
    {
        'format': 'netcdf',
        'product_type': 'monthly_averaged_reanalysis',
        'variable': 'land_sea_mask',
        'grid': [1.0, 1.0],
        'year': '1979',
        'month': '01',
        'time': '00:00',
    },
    '../California_example/ERA_landsea_mask.nc')
2021-08-12 10:14:27,929 INFO Welcome to the CDS
2021-08-12 10:14:27,931 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels-monthly-means
2021-08-12 10:14:28,432 INFO Request is completed
2021-08-12 10:14:28,433 INFO Downloading https://download-0011.copernicus-climate.eu/cache-compute-0011/cache/data5/adaptor.mars.internal-1628605177.1251855-440-12-4553cc0b-4a6e-4405-85fa-b9d5396ac1f1.nc to ../California_example/ERA_landsea_mask.nc (130.5K)
2021-08-12 10:14:28,662 INFO Download rate 574.8K/s
[5]:
Result(content_length=133596,content_type=application/x-netcdf,location=https://download-0011.copernicus-climate.eu/cache-compute-0011/cache/data5/adaptor.mars.internal-1628605177.1251855-440-12-4553cc0b-4a6e-4405-85fa-b9d5396ac1f1.nc)

And here we open the dataset. It contains dimensionless values from 0 to 1. From CDS: “Grid boxes where this parameter has a value above 0.5 can be comprised of a mixture of land and inland water but not ocean. Grid boxes with a value of 0.5 and below can only be comprised of a water surface.”

[5]:
LSMask = xr.open_dataset('../California_example/ERA_landsea_mask.nc')
LSMask.load()
[5]:
xarray.Dataset (latitude: 181, longitude: 360, time: 1)
Coordinates: longitude 0.0 ... 359.0, latitude 90.0 ... -90.0, time 1979-01-01
Data variable: lsm (Land-sea mask, 0 - 1, standard_name land_binary_mask), float32 of shape (1, 181, 360)
Attributes: Conventions CF-1.6; history: created 2021-08-10 by grib_to_netcdf-2.20.0
(full data repr truncated)
[6]:
LSMask['lsm'].plot()
[6]:
<matplotlib.collections.QuadMesh at 0x7fbcb9f389a0>
_images/Notebooks_California_august_temperature_anomaly_12_1.png

We can select all grid cells where the land-sea mask value is > 0.5 in order to remove ocean grid cells:

[7]:
LSMask['lsm'].where(LSMask['lsm'] > 0.5).plot()
[7]:
<matplotlib.collections.QuadMesh at 0x7fbcb9d95d00>
_images/Notebooks_California_august_temperature_anomaly_14_1.png

The longitude values in this dataset run from 0:360. We need to convert this to -180:180:

[8]:
# convert the longitude from 0:360 to -180:180
LSMask['longitude'] = (((LSMask['longitude'] + 180) % 360) - 180)
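Note that after this conversion the longitude coordinate is no longer monotonically increasing. The coordinate-aligned where() used below does not care, but if you wanted to slice the mask by longitude label you would first have to sort it, e.g.:

LSMask = LSMask.sortby('longitude')  ## optional: restore monotonic longitudes for label-based slicing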

Calculating the anomaly

We want to show how anomalous the recorded monthly average temperature for 2020 is compared to the 1979-2010 average. We first calculate the temperature anomaly from the 1979-2010 mean and then calculate the standardized anomaly by dividing the anomaly by the standard deviation:

[9]:
ERA5_anomaly = ERA5['t2m'] - ERA5['t2m'].sel(time=slice('1979','2010')).mean('time')
ERA5_anomaly.attrs = {
    'long_name': 'August temperature anomaly',
    'units': 'C'
}
ERA5_sd_anomaly = ERA5_anomaly / ERA5['t2m'].sel(time=slice('1979','2010')).std('time')
ERA5_sd_anomaly.attrs = {
    'long_name': 'August temperature standardized anomaly',
    'units': '-'
}

Plotting the 2020 temperature anomaly

We define a function to plot the data on a global map:

[10]:
def plot_California(ERA5_input):

    extent = [-120, -80, 20, 50]
    central_lon = np.mean(extent[:2])
    central_lat = np.mean(extent[2:])

    plt.figure(figsize=(12, 6))
    ax = plt.axes(projection=ccrs.AlbersEqualArea(central_lon, central_lat))
    ax.set_extent(extent)

    ERA5_input.plot(
        ax=ax,
        transform=ccrs.PlateCarree(),
        extend='both')

    ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
    ax.coastlines(
        resolution='110m')  #Currently can be one of “110m”, “50m”, and “10m”.
    ax.set_title('')
    gl = ax.gridlines(crs=ccrs.PlateCarree(),
                      draw_labels=True,
                      linewidth=1,
                      color='gray',
                      alpha=0.5,
                      linestyle='--')
    gl.top_labels = False
    gl.right_labels = False

And plot the ERA5 2020 temperature anomaly:

[11]:
plot_California(ERA5_anomaly.sel(time = '2020'))
# plt.savefig('graphs/California_anomaly.png')
_images/Notebooks_California_august_temperature_anomaly_22_0.png

Plot the standardized anomaly

[12]:
plot_California(ERA5_sd_anomaly.sel(time = '2020'))
_images/Notebooks_California_august_temperature_anomaly_24_0.png

Selecting a domain for further analysis

We define the domain as a contiguous, land-only region within the box 125-100W, 20-45N with temperature anomalies above 2 standard deviations.

[17]:
ERA5_sd_anomaly_masked = (ERA5_sd_anomaly.
               sel(longitude = slice(-125,-100), #select the domain
                  latitude = slice(45,20)).      #select the domain
               where(ERA5_sd_anomaly.sel(time = '2020').squeeze('time')>2).  #Select the region where sd>2
               where(LSMask['lsm'].sel(time = '1979').squeeze('time') > 0.5) #Select land-only gridcells
              )
ERA5_sd_anomaly_masked.load()
[17]:
xarray.DataArray 't2m' (time: 42, latitude: 26, longitude: 26)
Standardized anomalies with NaN outside the selected domain; coordinates: latitude 45.0 ... 20.0, longitude -125.0 ... -100.0, time 1979-08-01 ... 2020-08-01
Attributes: long_name August temperature standardized anomaly; units -
(full data repr truncated)

The resulting domain looks as follows:

[18]:
plot_California(ERA5_sd_anomaly_masked.sel(time = '2020'))
_images/Notebooks_California_august_temperature_anomaly_29_0.png

Let’s make a nicer plot, using the previously defined function but adding an outline of the domain:

[21]:
def plot_California(ERA5_input, ERA5_masked):

    extent = [-120, -80, 20, 50]
    central_lon = np.mean(extent[:2])
    central_lat = np.mean(extent[2:])

    plt.figure(frameon=False, figsize=(90 / 25.4, 60 / 25.4))
    ax = plt.axes(projection=ccrs.AlbersEqualArea(central_lon, central_lat))
    ax.set_extent(extent)

    ERA5_input.plot(
        ax=ax,
        transform=ccrs.PlateCarree(),
        extend='both')

    (ERA5_masked.fillna(0).sel(time = '2020').
     squeeze('time').
        plot.contour(levels = [0],
                     colors = 'black',
                      transform=ccrs.PlateCarree(),
                      ax = ax)
    )

    ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
    ax.coastlines(
        resolution='110m')  #Currently can be one of “110m”, “50m”, and “10m”.
    ax.set_title('')
    gl = ax.gridlines(crs=ccrs.PlateCarree(),
                      draw_labels=True,
                      linewidth=1,
                      color='gray',
                      alpha=0.5,
                      linestyle='--')
    gl.top_labels = False
    gl.right_labels = False
[20]:
plot_California(ERA5_sd_anomaly.sel(time = '2020'),ERA5_sd_anomaly_masked)
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/cartopy/mpl/geoaxes.py:1478: UserWarning: No contour levels were found within the data range.
  result = matplotlib.axes.Axes.contour(self, *args, **kwargs)
_images/Notebooks_California_august_temperature_anomaly_32_1.png

Some functions that were used to create the figure for publication:

- Set the figure size to 90 by 60 mm and remove the frame: plt.figure(frameon=False, figsize=(90 / 25.4, 60 / 25.4))
- Set the font: plt.rcParams["font.family"] = "sans-serif"
- Set the font size: plt.rcParams['font.size'] = 10
- Set the font type so editing software (e.g. Inkscape) can recognize text in svg graphics: plt.rcParams['svg.fonttype'] = 'none'
- Same but for pdf files: plt.rcParams['pdf.fonttype'] = 42

[22]:
plt.rcParams["font.family"] = "sans-serif" ##change font
plt.rcParams['font.size'] = 10  ## change font size
# plt.rcParams['svg.fonttype'] = 'none' ## so inkscape recognized texts in svg file
plt.rcParams['pdf.fonttype'] = 42 ## so illustrator can recognize text
plot_California(ERA5_sd_anomaly.sel(time = '2020'),ERA5_sd_anomaly_masked)
plt.savefig('graphs/California_sd_anomaly_contour.pdf')
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/cartopy/mpl/geoaxes.py:1478: UserWarning: No contour levels were found within the data range.
  result = matplotlib.axes.Axes.contour(self, *args, **kwargs)
_images/Notebooks_California_august_temperature_anomaly_34_1.png
Timeseries for the selected domain

Here we take the areal average over the selected domain and plot the resulting timeseries. Grid cells need to be weighted by their cell area when taking the spatial mean, so we first calculate the area weights and then use them to average.

[23]:
area_weights = np.cos(np.deg2rad(ERA5_sd_anomaly.latitude))

ERA5_std_anomaly_timeseries = ERA5_sd_anomaly_masked.weighted(area_weights).mean(['longitude','latitude'])
[28]:
plt.figure(frameon=False, figsize=(90 / 25.4, 60 / 25.4))
ERA5_std_anomaly_timeseries.plot()
plt.ylabel('August standardized temperature anomaly')
plt.savefig('graphs/California_anomaly_timeseries.pdf')
[28]:
<Figure size 255.118x170.079 with 0 Axes>
[28]:
[<matplotlib.lines.Line2D at 0x7fbcb0643f40>]
[28]:
Text(0, 0.5, 'August standardized temperature anomaly')
_images/Notebooks_California_august_temperature_anomaly_37_3.png

Another option would be to select the Californian domain using regionmask, which has predefined US states from Natural Earth: states = regionmask.defined_regions.natural_earth.us_states_50. A sketch of this alternative follows below.
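A minimal sketch, assuming the same regionmask API used later in these docs (mask with lon_name/lat_name and the abbrevs lookup); CA_mask and ERA5_CA are illustrative names:

import regionmask  ## not imported at the top of this notebook

states = regionmask.defined_regions.natural_earth.us_states_50
CA_mask = states.mask(ERA5, lon_name='longitude', lat_name='latitude')  ## state index per grid cell
CA_index = states.abbrevs.index('CA')  ## look up California
ERA5_CA = (ERA5_sd_anomaly
           .where(CA_mask == CA_index)        ## keep only Californian grid cells
           .weighted(area_weights)            ## area weights as defined above
           .mean(['longitude', 'latitude']))  ## state-averaged timeseries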

February and April 2020 precipitation anomalies

In this notebook, we will analyze the precipitation anomalies of February and April 2020, two months with strongly contrasting weather. We use the EOBS dataset.

Import packages

[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker

[3]:
os.chdir(os.path.abspath('../../')) # Change the working directory to UNSEEN-open
os.getcwd() #print the working directory
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'
[4]:
### Set plot font size
plt.rcParams['font.size'] = 10  ## change font size

Load EOBS

I downloaded EOBS (from 1950 - 2019) and the most recent EOBS data (2020) here. Note that you have to register as an E-OBS user.

The data has a daily timestep. I resample it to monthly averages in mm/day; I chose not to use monthly precipitation totals because of leap days.

[5]:
EOBS = xr.open_dataset('../UK_example/EOBS/rr_ens_mean_0.25deg_reg_v20.0e.nc') ## open the data
EOBS = EOBS.resample(time='1m').mean() ## Monthly averages
# EOBS = EOBS.sel(time=EOBS['time.month'] == 2) ## Select only February
EOBS
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[5]:
xarray.Dataset (latitude: 201, longitude: 464, time: 835)
Coordinates: time 1950-01-31 ... 2019-07-31 (monthly), longitude -40.375 ... 75.375, latitude 25.375 ... 75.375 (0.25° grid)
Data variable: rr (time, latitude, longitude), float32, NaN outside the E-OBS land domain
(full data repr truncated)

Here I define the attributes that xarray uses when plotting:

[6]:
EOBS['rr'].attrs = {'long_name': 'rainfall',  ##Define the name
 'units': 'mm/day', ## unit
 'standard_name': 'thickness_of_rainfall_amount'} ## original name, not used
EOBS['rr'].mean('time').plot() ## and show the 1950-2019 average precipitation (over all months)

[6]:
<matplotlib.collections.QuadMesh at 0x7f5f44952610>
_images/Notebooks_2020_contrasting_weather_9_1.png

The 2020 data file is separate and needs the same preprocessing:

[7]:
EOBS2020 = xr.open_dataset('../UK_example/EOBS/rr_0.25deg_day_2020_grid_ensmean.nc.1') #open
EOBS2020 = EOBS2020.resample(time='1m').mean() #Monthly mean
EOBS2020['rr'].sel(time='2020-04').plot() #show map
EOBS2020 ## display dataset
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[7]:
<matplotlib.collections.QuadMesh at 0x7f5f448a0e50>
[7]:
xarray.Dataset (latitude: 201, longitude: 464, time: 12)
Coordinates: time 2020-01-31 ... 2020-12-31 (monthly), on the same 0.25° European grid as above
Data variable: rr (time, latitude, longitude), float32, NaN outside the E-OBS land domain
(full data repr truncated)
_images/Notebooks_2020_contrasting_weather_11_3.png

Plot the 2020 event

I calculate the anomaly (deviation from the mean in mm/d) and divide this by the standard deviation to obtain the standardized anomalies.

[8]:
EOBS2020_anomaly = EOBS2020['rr'].groupby('time.month') - EOBS['rr'].groupby('time.month').mean('time')
EOBS2020_anomaly

EOBS2020_sd_anomaly = EOBS2020_anomaly.groupby('time.month') / EOBS['rr'].groupby('time.month').std('time')

EOBS2020_sd_anomaly.attrs = {
    'long_name': 'Monthly precipitation standardized anomaly',
    'units': '-'
}

EOBS2020_sd_anomaly
[8]:
xarray.DataArray 'rr' (time: 12, latitude: 201, longitude: 464) — the 2020 monthly anomalies, with an added month coordinate (1-12)
(full data repr truncated)
[8]:
xarray.DataArray 'rr' (time: 12, latitude: 201, longitude: 464) — the standardized anomalies; attributes: long_name Monthly precipitation standardized anomaly, units -
(full data repr truncated)

I select February and April (tips on how to select this by label are appreciated; one option is sketched below the output).

[9]:
EOBS2020_sd_anomaly
# EOBS2020_sd_anomaly.sel(time = ['2020-02','2020-04']) ## Don't know how to select this by label?
EOBS2020_sd_anomaly[[1,3],:,:] ## positional selection of February and April

[9]:
xarray.DataArray 'rr' (time: 12, latitude: 201, longitude: 464) — the full standardized-anomaly array
(full data repr truncated)
[9]:
xarray.DataArray 'rr' (time: 2, latitude: 201, longitude: 464) — the positional selection, time 2020-02-29 and 2020-04-30 (month 2 and 4)
(full data repr truncated)
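As one answer to the question above: a label-based option (not part of the original notebook) is to filter on the month component of the time coordinate, using the same boolean-sel idiom as earlier in these docs:

EOBS2020_sd_anomaly.sel(time=EOBS2020_sd_anomaly['time.month'].isin([2, 4]))  ## select February and April by month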

And plot using cartopy!

[11]:
EOBS_plots = EOBS2020_sd_anomaly[[1, 3], :, :].plot(
    transform=ccrs.PlateCarree(),
    robust=True,
    extend = 'both',
    col='time',
    cmap=plt.cm.twilight_shifted_r,
    subplot_kws={'projection': ccrs.EuroPP()})

for ax in EOBS_plots.axes.flat:
    ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
    ax.coastlines(resolution='50m')
    gl = ax.gridlines(crs=ccrs.PlateCarree(),
                      draw_labels=False,
                      linewidth=1,
                      color='gray',
                      alpha=0.5,
                      linestyle='--')

# plt.savefig('graphs/February_April_2020_precipAnomaly.png', dpi=300)
[11]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f5f4483cb20>
[11]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f5f44847850>
[11]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f5f449a6610>
[11]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f5f403af2b0>
_images/Notebooks_2020_contrasting_weather_17_4.png

Using EOBS + upscaling

Here we explore how best to extract areal-averaged precipitation, and test this for UK precipitation in SEAS5 and EOBS. The code is inspired by Matteo De Felice’s blog – credits to him!

We create a mask for all 241 countries using regionmask, which has predefined countries from Natural Earth datasets (shapefiles). We use the mask to go from gridded precipitation to country-averaged timeseries. We regrid EOBS onto the SEAS5 grid so that we select the same grid cells when calculating the UK average for both datasets. The country outline will not be perfect, but the masks are the same, so the comparison is fair.

I use the xesmf package for upscaling; a good example can be found in this notebook.
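For reference, the regridding itself boils down to building a regridder from the source and target grids and applying it. A minimal sketch, assuming the SEAS5 and EOBS datasets loaded below and using bilinear interpolation purely for illustration (conservative regridding may be preferable for precipitation):

import xesmf as xe

## xesmf expects coordinates named 'lon'/'lat', so rename before regridding
EOBS_in = EOBS.rename({'longitude': 'lon', 'latitude': 'lat'})
SEAS5_grid = SEAS5.rename({'longitude': 'lon', 'latitude': 'lat'})
regridder = xe.Regridder(EOBS_in, SEAS5_grid, 'bilinear')
EOBS_upscaled = regridder(EOBS_in['rr'])  ## EOBS rainfall on the SEAS5 grid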

Import packages

We need the packages regionmask for masking and xesmf for regridding. I cannot install xesmf into the UNSEEN-open environment without breaking it, so in this notebook I use a separate ‘upscale’ environment, as suggested in this issue. I use the packages esmpy=7.1.0 xesmf=0.2.1 regionmask cartopy matplotlib xarray numpy netcdf4.

[2]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[3]:
##import packages
import os
import sys
sys.path.insert(0, os.path.abspath('../../../'))
os.chdir(os.path.abspath('../../../'))
[4]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker

import regionmask       # Masking
import xesmf as xe      # Regridding

Load SEAS5 and EOBS

From CDS, we retrieve SEAS5, and here we merge the retrieved files (see more in preprocessing); a merge sketch follows below. The resulting netcdf file contains the dimensions lat, lon, time (39 years), number (up to 51 ensemble members) and leadtime (5 initialization months).
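A minimal sketch of such a merge, assuming one retrieved netcdf file per initialization month that can be concatenated along a new leadtime dimension (the file pattern is illustrative; see the preprocessing docs for the actual workflow):

SEAS5 = xr.open_mfdataset('../UK_example/SEAS5/SEAS5_*.nc',
                          combine='nested', concat_dim='leadtime')  ## one file per initialization month
SEAS5.to_netcdf('../UK_example/SEAS5/SEAS5_UK.nc')  ## save the merged dataset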

[5]:
SEAS5 = xr.open_dataset('../UK_example/SEAS5/SEAS5_UK.nc')

And load the EOBS netcdf with only February precipitation: 71 values, one for each year within 1950 - 2020, over the European domain (25N-75N x 40W-75E).

[6]:
EOBS = xr.open_dataset('../UK_example/EOBS/EOBS_UK.nc')
EOBS
[6]:
xarray.Dataset (latitude: 201, longitude: 464, time: 71)
Coordinates: time 1950-02-28 ... 2020-02-29 (one February per year), on the 0.25° European grid
Data variable: rr (rainfall, mm/day, standard_name thickness_of_rainfall_amount)
(full data repr truncated)

Masking

Here we load the countries and create a mask for SEAS5 and for EOBS.

Regionmask has predefined countries from Natural Earth datasets (shapefiles).

[7]:
countries = regionmask.defined_regions.natural_earth.countries_50
countries
[7]:
241 'Natural Earth Countries: 50m' Regions (http://www.naturalearthdata.com)
(list of 241 country abbreviations truncated)

Now we create the mask for the SEAS5 grid. Only one timestep is needed to create the mask, which will later on be used to mask all the timesteps.

[8]:
SEAS5_mask = countries.mask(SEAS5.sel(leadtime=2, number=0, time='1982'),
                            lon_name='longitude',
                            lat_name='latitude')

And create a plot to illustrate what the mask looks like. The mask indicates for each grid cell which country it belongs to.

[9]:
SEAS5_mask
SEAS5_mask.plot()
[9]:
Show/Hide data repr Show/Hide attributes
xarray.DataArray
'region'
  • latitude: 11
  • longitude: 14
  • nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan 160.0
    array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  31.,  nan,  31.,  31.,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  31.,  31.,  31.,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan, 140.,  31.,  31.,  31.,  31.,  31.,  31.,  nan,
             nan,  nan,  nan],
           [ nan, 140., 140., 140., 140.,  nan,  nan,  nan,  nan,  31.,  31.,
             nan,  nan,  nan],
           [ nan,  nan, 140., 140., 140.,  nan,  nan,  31.,  31.,  31.,  31.,
             31.,  nan,  nan],
           [ nan, 140., 140., 140.,  nan,  nan,  31.,  31.,  31.,  31.,  31.,
             31.,  31.,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  31.,  31.,  31.,
             31.,  31., 160.],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
             nan,  nan, 160.]])
    Coordinates:
      • latitude   (latitude)  float32  60.0 59.0 58.0 ... 52.0 51.0 50.0
      • longitude  (longitude)  float32  -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
[9]:
<matplotlib.collections.QuadMesh at 0x2b58d909bd50>
_images/Notebooks_2.Preprocess_2.3Upscale_16_2.png

And now we can extract the UK-averaged precipitation from SEAS5 by using the mask index of the UK: where(SEAS5_mask == UK_index). We therefore need to find the UK's index among the 241 abbreviations, which for the UK is ‘GB’. If you can’t find a country, use countries.regions to get the full country names.

[10]:
countries.abbrevs.index('GB')
[10]:
31
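The index can also be looked up by full country name; a minimal sketch, assuming regionmask's names attribute (Natural Earth names the UK 'United Kingdom'):

UK_index = countries.names.index('United Kingdom')  # should also give 31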

To select the UK average, we take SEAS5 precipitation (tprate), select the gridcells that are within the UK and take the mean over those gridcells. This results in a dataset of February precipitation for 39 years (1982-2020), with 5 leadtimes and up to 51 ensemble members (the hindcast years contain 25 members; the operational forecasts from 2017 onwards contain 51).

[11]:
SEAS5_UK = (SEAS5['tprate']
            .where(SEAS5_mask == 31)
            .mean(dim=['latitude', 'longitude']))
SEAS5_UK
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[11]:
xarray.DataArray 'tprate' (leadtime: 5, time: 39, number: 51)
    array([[[1.7730116 , 1.9548205 , 3.7803986 , ...,        nan,        nan,        nan],
            ...,
            [3.836647  , 2.7460904 , 4.5292573 , ..., 2.7118914 , 2.7603571 , 4.0256233 ]]],
          dtype=float32)
    Coordinates:
      • number    (number)  int64  0 1 2 3 4 5 6 ... 45 46 47 48 49 50  (long_name : ensemble_member)
      • time      (time)  datetime64[ns]  1982-02-01 ... 2020-02-01
      • leadtime  (leadtime)  int64  2 3 4 5 6

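The “Mean of empty slice” warning above is therefore harmless: members 25-50 are all-NaN slices for the hindcast years. If the warning is distracting, it can be silenced; a minimal sketch:

import warnings

with warnings.catch_warnings():
    warnings.simplefilter('ignore', category=RuntimeWarning)
    SEAS5_UK = (SEAS5['tprate']
                .where(SEAS5_mask == 31)
                .mean(dim=['latitude', 'longitude']))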
However, xarray does not take the area of the gridcells into account when averaging, so we have to calculate the area-weighted mean over the gridcells instead. To calculate the area of each gridcell, I use CDO: cdo gridarea infile outfile.

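The grid-area file can be generated from any netCDF file on the SEAS5 grid. A minimal sketch, calling CDO from Python (the input filename is illustrative):

import subprocess

# 'cdo gridarea' writes the area (in m^2) of every gridcell to a new netCDF file
subprocess.run(['cdo', 'gridarea', 'SEAS5_UK.nc', 'Gridarea_SEAS5.nc'], check=True)

Below, I load the generated file: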
[12]:
Gridarea_SEAS5 = xr.open_dataset('../UK_example/Gridarea_SEAS5.nc')
Gridarea_SEAS5['cell_area'].plot()
[12]:
<matplotlib.collections.QuadMesh at 0x2b58d91655d0>
_images/Notebooks_2.Preprocess_2.3Upscale_22_1.png
[13]:
SEAS5_UK_weighted = (SEAS5['tprate']
                  .where(SEAS5_mask == 31)
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
SEAS5_UK_weighted
[13]:
xarray.DataArray (leadtime: 5, time: 39, number: 51)
    array([[[1.74715784, 1.91625164, 3.74246331, ...,        nan,        nan,        nan],
            ...,
            [3.81460036, 2.70650167, 4.54162104, ..., 2.69608025, 2.73558576, 4.04264194]]])
    Coordinates:
      • number    (number)  int64  0 1 2 3 4 5 6 ... 45 46 47 48 49 50  (long_name : ensemble_member)
      • time      (time)  datetime64[ns]  1982-02-01 ... 2020-02-01
      • leadtime  (leadtime)  int64  2 3 4 5 6

Another solution is to weight by the cosine of the latitude, which is proportional to the grid-cell area on a regular latitude/longitude grid (see the xarray example). This should give the same result as the previous method, but is easier to reproduce.

[14]:
area_weights = np.cos(np.deg2rad(SEAS5.latitude))
SEAS5_UK_weighted_latcos = (SEAS5['tprate']
                  .where(SEAS5_mask == 31)
                  .weighted(area_weights)
                  .mean(dim=['latitude', 'longitude'])
                 )

I plot the UK average for ensemble member 0 and leadtime 2 to show that the two weighting methods give the same result. Furthermore, the difference between the weighted and non-weighted averages is very small in this case. The difference would be greater for larger domains and further towards the poles.

[15]:
SEAS5_UK.sel(leadtime=2,number=0).plot()
SEAS5_UK_weighted.sel(leadtime=2,number=0).plot()
SEAS5_UK_weighted_latcos.sel(leadtime=2,number=0).plot()

[15]:
[<matplotlib.lines.Line2D at 0x2b58d902ce90>]
[15]:
[<matplotlib.lines.Line2D at 0x2b58d9037c10>]
[15]:
[<matplotlib.lines.Line2D at 0x2b58d9014c10>]
_images/Notebooks_2.Preprocess_2.3Upscale_27_3.png
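Beyond the visual check, the agreement between the two weighting methods can be verified numerically; a minimal sketch:

# maximum absolute difference between CDO grid-area and cos(latitude) weighting;
# for a regular latitude/longitude grid this should be very close to zero
print(float(abs(SEAS5_UK_weighted - SEAS5_UK_weighted_latcos).max()))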

Upscale

For EOBS, we want to upscale the dataset to the SEAS5 grid. We use xESMF's regridder: xe.Regridder(ds_in, ds_out, method), see the docs. We have to rename the lat/lon dimensions so the function can read them.

We first use bilinear interpolation (method = ‘bilinear’) because it is easy to apply. However, conservative areal averaging (method = ‘conservative’) is preferred for upscaling (Kopparla, 2013).

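Conservative regridding additionally requires the cell boundaries (lon_b / lat_b) of both grids, which our datasets do not carry. A minimal sketch of how they could be constructed, under the assumption of a regular grid with constant spacing (an illustration, not part of the notebook):

import numpy as np
import xesmf as xe

def add_bounds(ds):
    """Add lon_b/lat_b cell boundaries, required by method='conservative'."""
    ds = ds.rename({'longitude': 'lon', 'latitude': 'lat'})
    for dim in ('lon', 'lat'):
        vals = ds[dim].values
        step = vals[1] - vals[0]  # assumes constant grid spacing
        ds = ds.assign_coords({dim + '_b': np.append(vals - step / 2,
                                                     vals[-1] + step / 2)})
    return ds

regridder_conservative = xe.Regridder(add_bounds(EOBS), add_bounds(SEAS5), 'conservative')

In this notebook we proceed with the bilinear regridder: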
[16]:
regridder = xe.Regridder(EOBS.rename({'longitude': 'lon', 'latitude': 'lat'}),
                         SEAS5.rename({'longitude': 'lon', 'latitude': 'lat'}),
                         'bilinear')
Overwrite existing file: bilinear_201x464_11x14.nc
 You can set reuse_weights=True to save computing time.

Now that we have the regridder, we can apply it to our EOBS DataArray:

[17]:
EOBS_upscaled = regridder(EOBS)
EOBS_upscaled
using dimensions ('latitude', 'longitude') from data variable rr as the horizontal dimensions for this dataset.
[17]:
xarray.Dataset (lat: 11, lon: 14, time: 71)
    Coordinates:
      • time  (time)  datetime64[ns]  1950-02-28 ... 2020-02-29
      • lon   (lon)  float32  -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
      • lat   (lat)  float32  60.0 59.0 58.0 ... 52.0 51.0 50.0
    Data variables:
      • rr    (time, lat, lon)  float64  nan nan nan nan ... nan nan 4.243
    Attributes:
      • regrid_method : bilinear

And we set the lat/lon dimension names back to their long names, so that SEAS5 and EOBS share the same dimension names, which is necessary for applying the same mask to both.

[18]:
EOBS_upscaled = EOBS_upscaled.rename({'lon' : 'longitude', 'lat' : 'latitude'})

Illustrate the SEAS5 and EOBS masks for the UK

Here I plot the masked mean SEAS5 and upscaled EOBS precipitation. This shows that upscaled EOBS does not contain data for all gridcells within the UK mask (compare the SEAS5 gridcells with the EOBS gridcells that have data). We can therefore apply an additional mask to SEAS5 that removes the gridcells without data in EOBS.

[19]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})

SEAS5['tprate'].where(SEAS5_mask == 31).mean(
    dim=['time', 'leadtime', 'number']).plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[0])

EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])

for ax in axs.flat:
    ax.coastlines(resolution='10m')

axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[19]:
<matplotlib.collections.QuadMesh at 0x2b58d91fa710>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[19]:
<matplotlib.collections.QuadMesh at 0x2b58d9204f50>
[19]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x2b58d9214fd0>
[19]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x2b58d920a050>
[19]:
Text(0.5, 1.0, 'SEAS5')
[19]:
Text(0.5, 1.0, 'EOBS')
_images/Notebooks_2.Preprocess_2.3Upscale_35_8.png

The additional mask of SEAS5 is where EOBS is not null:

[20]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})

(SEAS5['tprate']
 .where(SEAS5_mask == 31)
 .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull()) ## mask values that are nan in EOBS
 .mean(dim=['time', 'leadtime', 'number'])
 .plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[0])
)

EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])

for ax in axs.flat:
    ax.coastlines(resolution='10m')

axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')


/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[20]:
<matplotlib.collections.QuadMesh at 0x2b58d9381a10>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[20]:
<matplotlib.collections.QuadMesh at 0x2b58d9229f90>
[20]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x2b58d9242390>
[20]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x2b58d9242fd0>
[20]:
Text(0.5, 1.0, 'SEAS5')
[20]:
Text(0.5, 1.0, 'EOBS')
_images/Notebooks_2.Preprocess_2.3Upscale_37_8.png

Let’s include the 2020 event. We compute the standardized anomaly of February 2020: the 2020 value minus the mean over all years, divided by the standard deviation over all years.

[21]:
EOBS2020_sd_anomaly = ((EOBS_upscaled['rr'].sel(time='2020') - EOBS_upscaled['rr'].mean('time'))
                       / EOBS_upscaled['rr'].std('time'))  # standardized anomaly: (x - mean) / std
EOBS2020_sd_anomaly.attrs = {
    'long_name': 'Precipitation anomaly',
    'units': '-'
}
EOBS2020_sd_anomaly
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1667: RuntimeWarning: Degrees of freedom <= 0 for slice.
  keepdims=keepdims)
[21]:
xarray.DataArray 'rr' (time: 1, latitude: 11, longitude: 14)
    array([[[nan, nan, nan, ...]]])
    Coordinates:
      • time       (time)  datetime64[ns]  2020-02-29
      • longitude  (longitude)  float32  -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
      • latitude   (latitude)  float32  60.0 59.0 58.0 ... 52.0 51.0 50.0
    Attributes:
      • long_name : Precipitation anomaly
      • units : -
[50]:
plt.figure(figsize=(3.3, 4))
plt.rc('font', size=7)  # controls default text size

ax = plt.axes(projection=ccrs.OSGB())

EOBS2020_sd_anomaly.where(SEAS5_mask == 31).plot(
    transform=ccrs.PlateCarree(),
    vmin=-6,
    vmax=6,
    extend='both',
    cmap=plt.cm.RdBu,
    ax=ax)

ax.coastlines(resolution='10m')
# Note: gridline labels are not supported for the OSGB projection.

ax.set_title('February 2020')
plt.tight_layout()
plt.savefig('graphs/UK_event_selection2.png', dpi=300)
[50]:
<Figure size 237.6x288 with 0 Axes>
[50]:
<matplotlib.collections.QuadMesh at 0x2b58f1ae4ed0>
[50]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x2b58f1afbd50>
[50]:
Text(0.5, 1.0, 'February 2020')
_images/Notebooks_2.Preprocess_2.3Upscale_40_4.png
[101]:
EOBS2020_sd_anomaly = ((EOBS['rr'].sel(time='2020') - EOBS['rr'].mean('time'))
                       / EOBS['rr'].std('time'))  # standardized anomaly on the native EOBS grid
EOBS2020_sd_anomaly.attrs = {
    'long_name': 'Precipitation anomaly',
    'units': '-'
}
EOBS2020_sd_anomaly


fig, axs = plt.subplots(1, 3, figsize=(6.7, 2.5), subplot_kw={'projection': ccrs.OSGB()})

EOBS2020_sd_anomaly.plot(
    transform=ccrs.PlateCarree(),
    robust=True,
    extend = 'both',
    cmap=plt.cm.twilight_shifted_r,
    ax=axs[0])

(SEAS5['tprate']
 .where(SEAS5_mask == 31)
 .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull()) ## mask values that are nan in EOBS
 .mean(dim=['time', 'leadtime', 'number'])
 .plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1],
    cbar_kwargs={'label': 'February precipitation [mm/d]'}
 )
)

EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[2],
    cbar_kwargs={'label': 'February precipitation [mm/d]'}
)


for ax in axs.flat:
    ax.coastlines(resolution='10m')

axs[0].set_title('February 2020')
axs[1].set_title('SEAS5 average')
axs[2].set_title('EOBS average')

# plt.savefig('graphs/UK_event_selection.pdf', dpi=300)
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1667: RuntimeWarning: Degrees of freedom <= 0 for slice.
  keepdims=keepdims)
[101]:
xarray.DataArray 'rr' (time: 1, latitude: 201, longitude: 464)
    array([[[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
    Coordinates:
      • latitude   (latitude)  float64  25.38 25.62 25.88 ... 75.12 75.38  (degrees_north)
      • longitude  (longitude)  float64  -40.38 -40.12 ... 75.12 75.38  (degrees_east)
      • time       (time)  datetime64[ns]  2020-02-29
    Attributes:
      • long_name : Precipitation anomaly
      • units : -
[101]:
<matplotlib.collections.QuadMesh at 0x7f16beef9790>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[101]:
<matplotlib.collections.QuadMesh at 0x7f16beeb7e10>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[101]:
<matplotlib.collections.QuadMesh at 0x7f16bee05ed0>
[101]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f16bedd64d0>
[101]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f16bee162d0>
[101]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f16bedd6910>
[101]:
Text(0.5, 1.0, 'February 2020')
[101]:
Text(0.5, 1.0, 'SEAS5 average')
[101]:
Text(0.5, 1.0, 'EOBS average')
_images/Notebooks_2.Preprocess_2.3Upscale_41_13.png

Extract the spatial average

To select the UK average, we take SEAS5 precipitation (tprate), select the gridcells that are within the UK and take the area-weighted mean over those gridcells. This results in a dataset of February precipitation for 39 years (1982-2020), with 5 leadtimes and up to 51 ensemble members.

[31]:
SEAS5_UK_weighted = (SEAS5
                  .where(SEAS5_mask == 31)
                  .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull())
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
SEAS5_UK_weighted
[31]:
xarray.Dataset (leadtime: 5, number: 51, time: 39)
    Coordinates:
      • number    (number)  int64  0 1 2 3 4 5 6 ... 45 46 47 48 49 50  (long_name : ensemble_member)
      • time      (time)  datetime64[ns]  1982-02-01 ... 2020-02-01
      • leadtime  (leadtime)  int64  2 3 4 5 6
    Data variables:
      • tprate    (leadtime, time, number)  float64  1.62 1.803 3.715 ... 2.564 4.138
[32]:
EOBS_UK_weighted = (EOBS_upscaled
                  .where(SEAS5_mask == 31) ## EOBS is now on the SEAS5 grid, so use the SEAS5 mask and gridcell area
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
EOBS_UK_weighted
EOBS_UK_weighted['rr'].plot()
[32]:
xarray.Dataset (time: 71)
    Coordinates:
      • time  (time)  datetime64[ns]  1950-02-28 ... 2020-02-29
    Data variables:
      • rr    (time)  float64  4.127 3.251 1.072 ... 1.782 4.92
[32]:
[<matplotlib.lines.Line2D at 0x7f16cc470b90>]
_images/Notebooks_2.Preprocess_2.3Upscale_45_2.png

Illustrate the SEAS5 and EOBS UK average

Here I plot the area-weighted average UK precipitation for SEAS5 and EOBS. For SEAS5, I plot the range over all ensemble members and leadtimes for each year: both the min/max and the 2.5/97.5 percentiles.

[33]:
ax = plt.axes()

Quantiles = (SEAS5_UK_weighted['tprate']
             .quantile([0, 2.5/100, 0.5, 97.5/100, 1],
                       dim=['number', 'leadtime'])
            )
ax.plot(Quantiles.time, Quantiles.sel(quantile=0.5),
        color='orange',
        label = 'SEAS5 median')
ax.fill_between(Quantiles.time.values, Quantiles.sel(quantile=0.025), Quantiles.sel(quantile=0.975),
                color='orange',
                alpha=0.2,
                label = '95% / min max')
ax.fill_between(Quantiles.time.values, Quantiles.sel(quantile=0), Quantiles.sel(quantile=1),
                color='orange',
                alpha=0.2)

EOBS_UK_weighted['rr'].plot(ax=ax,
                            x='time',
                            label = 'E-OBS')
plt.legend(loc='lower left', ncol=2)
[33]:
[<matplotlib.lines.Line2D at 0x7f16cc417410>]
[33]:
<matplotlib.collections.PolyCollection at 0x7f16d5497610>
[33]:
<matplotlib.collections.PolyCollection at 0x7f16cc2ad7d0>
[33]:
[<matplotlib.lines.Line2D at 0x7f16cc50cb10>]
[33]:
<matplotlib.legend.Legend at 0x7f16cc3d18d0>
_images/Notebooks_2.Preprocess_2.3Upscale_47_5.png

And we save the UK weighted-average datasets:

[34]:
SEAS5_UK_weighted.to_netcdf('Data/SEAS5_UK_weighted_masked.nc')
SEAS5_UK_weighted.to_dataframe().to_csv('Data/SEAS5_UK_weighted_masked.csv')
EOBS_UK_weighted.to_netcdf('Data/EOBS_UK_weighted_upscaled.nc') ## save as netcdf
EOBS_UK_weighted.to_dataframe().to_csv('Data/EOBS_UK_weighted_upscaled.csv') ## and save as csv.
[35]:
SEAS5_UK_weighted.close()
EOBS_UK_weighted.close()

Other methods

There are many different sources and methods available for extracting areal averages from shapefiles. Here I have used shapely / masking in xarray. What this method lacks is area-weighted extraction along the shapefile boundaries: in R, raster::extract can weight each grid cell by the fraction of its area that falls within the country. For more information on that method, see the EGU 2018 course. For SEAS5, with its coarse resolution, this might make a difference. However, for its speed and reproducibility, we have chosen to stick with xarray.

We have used xarray, where you can apply weights to a dataset yourself and then calculate the weighted mean. Sources I have used:

- xarray weighted reductions
- Matteo’s blog
- regionmask package
- Arctic weighted average example
- area weighted temperature example

And this pretty awesome Colab notebook on seasonal forecasting regrids seasonal forecasts and reanalysis onto the same grid before calculating skill scores.

License

All code and example data are available under the open source MIT License.

Citation

When using the code or example data, please cite this project. If any questions arise, please don’t hesitate to get in touch: t.kelder@lboro.ac.uk.