Launch in Binder Binder

UK Precipitation

February 2020 case study

February 2020 was the wettest February on record in the UK (since 1862), according to the Met Office. The UK faced three official storms during February, and this exceptional phenomena attracted media attention, such as an article from the BBC on increased climate concerns among the population. A Carbon Brief post explained why the UK saw such record-breaking rainfall and put this rare event into perspective, citing, amongst other approaches, the UNSEEN method. The UNSEEN study by Thompson et al., 2017 assessed monthly precipitation over the UK. They showed that the monthly precipitation records for south east England have a 7% chance of being exceeded in at least one month in any given winter. They did not use SEAS5 but the Met Office model ensemble. This work was taken up in the National Flood Resilience Review (2016), showing the high relevance and applicability of the method.

Here, the aim is to build an open, reproducible and transferable workflow, that will be tested for this well-studied region of the world and can be transferred to other regions and climate variables of interest, such as the 2020 Siberian heat and California fires.

Retrieve data

The main functions to retrieve all forecasts (SEAS5) is retrieve_SEAS5. We want to download February average precipitation over the UK. By default, the hindcast years of 1981-2016 are downloaded for SEAS5. The folder indicates where the files will be stored, in this case outside of the UNSEEN-open repository, in a ‘UK_example’ directory. For more explanation, see retrieve.

[ ]:
retrieve.retrieve_SEAS5(variables = 'total_precipitation',
                        target_months = [2],
                        area = [60, -11, 50, 2],
                        folder = '../UK_example/SEAS5/')

We use the EOBS observational dataset to evaluate the UNSEEN ensemble. I tried to download EOBS through the Copernicus Climate Data Store, but the Product is temporally disabled for maintenance purposes. As workaround I downloaded EOBS (from 1950 - 2019) and the most recent EOBS data (2020) here. Note, you have to register as E-OBS user.

Preprocess

In the preprocessing step, we first merge all downloaded files into one xarray dataset, see preprocessing.

[ ]:
SEAS5 = xr.open_dataset('../UK_example/SEAS5/SEAS5.nc')

I tried to download EOBS through CDS, but the Product was temporally disabled for maintenance purposes (1.2 Retrieve). As workaround, here, I downloaded EOBS (from 1950 - 2019) and the most recent EOBS data (2020) here. Note, you have to register as E-OBS user.

I will select February monthly mean precipitation to compare to SEAS5. I have taken the average mm/day over the month, which I think is more fair than the total monthly precipitation because of leap days.

[5]:
EOBS = xr.open_dataset('../UK_example/EOBS/rr_ens_mean_0.25deg_reg_v20.0e.nc') ## open the data
EOBS = EOBS.resample(time='1m').mean() ## Monthly averages
EOBS = EOBS.sel(time=EOBS['time.month'] == 2) ## Select only February
EOBS
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[5]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • latitude: 201
    • longitude: 464
    • time: 70
    • time
      (time)
      datetime64[ns]
      1950-02-28 ... 2019-02-28
      array(['1950-02-28T00:00:00.000000000', '1951-02-28T00:00:00.000000000',
             '1952-02-29T00:00:00.000000000', '1953-02-28T00:00:00.000000000',
             '1954-02-28T00:00:00.000000000', '1955-02-28T00:00:00.000000000',
             '1956-02-29T00:00:00.000000000', '1957-02-28T00:00:00.000000000',
             '1958-02-28T00:00:00.000000000', '1959-02-28T00:00:00.000000000',
             '1960-02-29T00:00:00.000000000', '1961-02-28T00:00:00.000000000',
             '1962-02-28T00:00:00.000000000', '1963-02-28T00:00:00.000000000',
             '1964-02-29T00:00:00.000000000', '1965-02-28T00:00:00.000000000',
             '1966-02-28T00:00:00.000000000', '1967-02-28T00:00:00.000000000',
             '1968-02-29T00:00:00.000000000', '1969-02-28T00:00:00.000000000',
             '1970-02-28T00:00:00.000000000', '1971-02-28T00:00:00.000000000',
             '1972-02-29T00:00:00.000000000', '1973-02-28T00:00:00.000000000',
             '1974-02-28T00:00:00.000000000', '1975-02-28T00:00:00.000000000',
             '1976-02-29T00:00:00.000000000', '1977-02-28T00:00:00.000000000',
             '1978-02-28T00:00:00.000000000', '1979-02-28T00:00:00.000000000',
             '1980-02-29T00:00:00.000000000', '1981-02-28T00:00:00.000000000',
             '1982-02-28T00:00:00.000000000', '1983-02-28T00:00:00.000000000',
             '1984-02-29T00:00:00.000000000', '1985-02-28T00:00:00.000000000',
             '1986-02-28T00:00:00.000000000', '1987-02-28T00:00:00.000000000',
             '1988-02-29T00:00:00.000000000', '1989-02-28T00:00:00.000000000',
             '1990-02-28T00:00:00.000000000', '1991-02-28T00:00:00.000000000',
             '1992-02-29T00:00:00.000000000', '1993-02-28T00:00:00.000000000',
             '1994-02-28T00:00:00.000000000', '1995-02-28T00:00:00.000000000',
             '1996-02-29T00:00:00.000000000', '1997-02-28T00:00:00.000000000',
             '1998-02-28T00:00:00.000000000', '1999-02-28T00:00:00.000000000',
             '2000-02-29T00:00:00.000000000', '2001-02-28T00:00:00.000000000',
             '2002-02-28T00:00:00.000000000', '2003-02-28T00:00:00.000000000',
             '2004-02-29T00:00:00.000000000', '2005-02-28T00:00:00.000000000',
             '2006-02-28T00:00:00.000000000', '2007-02-28T00:00:00.000000000',
             '2008-02-29T00:00:00.000000000', '2009-02-28T00:00:00.000000000',
             '2010-02-28T00:00:00.000000000', '2011-02-28T00:00:00.000000000',
             '2012-02-29T00:00:00.000000000', '2013-02-28T00:00:00.000000000',
             '2014-02-28T00:00:00.000000000', '2015-02-28T00:00:00.000000000',
             '2016-02-29T00:00:00.000000000', '2017-02-28T00:00:00.000000000',
             '2018-02-28T00:00:00.000000000', '2019-02-28T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • longitude
      (longitude)
      float64
      -40.38 -40.12 ... 75.12 75.38
      units :
      degrees_east
      long_name :
      Longitude values
      axis :
      X
      standard_name :
      longitude
      array([-40.375, -40.125, -39.875, ...,  74.875,  75.125,  75.375])
    • latitude
      (latitude)
      float64
      25.38 25.62 25.88 ... 75.12 75.38
      units :
      degrees_north
      long_name :
      Latitude values
      axis :
      Y
      standard_name :
      latitude
      array([25.375, 25.625, 25.875, ..., 74.875, 75.125, 75.375])
    • rr
      (time, latitude, longitude)
      float32
      nan nan nan nan ... nan nan nan nan
      array([[[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             ...,
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)

Here I define the attributes, that xarray uses when plotting

[6]:
EOBS['rr'].attrs = {'long_name': 'rainfall',  ##Define the name
 'units': 'mm/day', ## unit
 'standard_name': 'thickness_of_rainfall_amount'} ## original name, not used
EOBS['rr'].mean('time').plot() ## and show the 1950-2019 average February precipitation

/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[6]:
<matplotlib.collections.QuadMesh at 0x7f9dd1819f10>
../../_images/Notebooks_examples_UK_Precipitation_14_2.png

The 2020 data file is separate and needs the same preprocessing:

[8]:
EOBS2020 = xr.open_dataset('../UK_example/EOBS/rr_0.25deg_day_2020_grid_ensmean.nc.1') #open
EOBS2020 = EOBS2020.resample(time='1m').mean() #Monthly mean
EOBS2020['rr'].sel(time='2020-04').plot() #show map
EOBS2020 ## display dataset
/soge-home/users/cenv0732/.conda/envs/UNSEEN-open/lib/python3.8/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[8]:
<matplotlib.collections.QuadMesh at 0x7f9dd15e5820>
[8]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • latitude: 201
    • longitude: 464
    • time: 12
    • time
      (time)
      datetime64[ns]
      2020-01-31 ... 2020-12-31
      array(['2020-01-31T00:00:00.000000000', '2020-02-29T00:00:00.000000000',
             '2020-03-31T00:00:00.000000000', '2020-04-30T00:00:00.000000000',
             '2020-05-31T00:00:00.000000000', '2020-06-30T00:00:00.000000000',
             '2020-07-31T00:00:00.000000000', '2020-08-31T00:00:00.000000000',
             '2020-09-30T00:00:00.000000000', '2020-10-31T00:00:00.000000000',
             '2020-11-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • longitude
      (longitude)
      float64
      -40.38 -40.12 ... 75.12 75.38
      standard_name :
      longitude
      long_name :
      Longitude values
      units :
      degrees_east
      axis :
      X
      array([-40.375, -40.125, -39.875, ...,  74.875,  75.125,  75.375])
    • latitude
      (latitude)
      float64
      25.38 25.62 25.88 ... 75.12 75.38
      standard_name :
      latitude
      long_name :
      Latitude values
      units :
      degrees_north
      axis :
      Y
      array([25.375, 25.625, 25.875, ..., 74.875, 75.125, 75.375])
    • rr
      (time, latitude, longitude)
      float32
      nan nan nan nan ... nan nan nan nan
      array([[[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             ...,
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
../../_images/Notebooks_examples_UK_Precipitation_16_3.png

We had to download EOBS in two separate files to also obtain the 2020 data. Below, we concatenate the two files and store the processed data for easy import in the future

[10]:
EOBS_concat = xr.concat([EOBS,EOBS2020.sel(time='2020-02')],dim='time') ## Concatenate the 1950-2019 and 2020 datasets.
EOBS_concat.to_netcdf('../UK_example/EOBS/EOBS.nc') ## And store the 1950-2010 February precipitation into one nc for future import

We then extract UK averaged precipitation SEAS5 and EOBS. We upscale EOBS to the SEAS5 grid and apply the same UK mask to extract the UK average for both datasets. See Using EOBS + upscaling for an example how to regrid and how to extract a country average timeseries.

Evaluate

Note

From here onward we use R and not python!

We switch to R since we believe R has a better functionality in extreme value statistics.

[1]:
setwd('../../..')
# getwd()
EOBS_UK_weighted_df <- read.csv("Data/EOBS_UK_weighted_upscaled.csv", stringsAsFactors=FALSE)
SEAS5_UK_weighted_df <- read.csv("Data/SEAS5_UK_weighted_masked.csv", stringsAsFactors=FALSE)

## Convert the time class to Date format
EOBS_UK_weighted_df$time <- lubridate::ymd(EOBS_UK_weighted_df$time)
str(EOBS_UK_weighted_df)

EOBS_UK_weighted_df_hindcast <- EOBS_UK_weighted_df[
    EOBS_UK_weighted_df$time > '1982-02-01' &
    EOBS_UK_weighted_df$time < '2017-02-01',
    ]


SEAS5_UK_weighted_df$time <- lubridate::ymd(SEAS5_UK_weighted_df$time)
str(SEAS5_UK_weighted_df)
'data.frame':   71 obs. of  2 variables:
 $ time: Date, format: "1950-02-28" "1951-02-28" ...
 $ rr  : num  4.13 3.25 1.07 1.59 2.59 ...
'data.frame':   4375 obs. of  4 variables:
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ number  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ time    : Date, format: "1982-02-01" "1983-02-01" ...
 $ tprate  : num  1.62 2.93 3.27 2 3.31 ...

Is the UNSEEN ensemble realistic?

To answer this question, we perform three statistical tests: independence, model stability and model fidelity tests.
These statistical tests are available through the UNSEEN R package. See evaluation for more info.
[2]:
require(UNSEEN)
Loading required package: UNSEEN

Timeseries

We plot the timeseries of SEAS5 (UNSEEN) and EOBS (OBS) for UK February precipitation.

[3]:
unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df,
                  ylab = 'UK February precipitation (mm/d)')
../../_images/Notebooks_examples_UK_Precipitation_27_0.png

We select the timeseries for the hindcast years 1981-2016.

[4]:
timeseries <- unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df_hindcast,
                  ylab = 'UK February precipitation (mm/d)')
ggsave(timeseries, height = 5, width = 6,   filename = "graphs/UK_timeseries.png")
Error in ggsave(timeseries, height = 5, width = 6, filename = "graphs/UK_timeseries.png"): could not find function "ggsave"
Traceback:

Evaluation tests

With the hindcast dataset we evaluate the independence, stability and fidelity.

First the independence test. This test checks if the forecasts are independent. If they are not, the event are not unique and care should be taken in the extreme value analysis. Because of the chaotic behaviour of the atmosphere, independence of precipitation events is expected beyond a lead time of two weeks. Here we use lead times 2-6 months and find that the boxplots are within the expected range (perhaps very small dependence in lead time 2). More info in our paper: https://doi.org/10.31223/osf.io/hyxeq.

[5]:
independence_test(ensemble = SEAS5_UK)
Warning message:
"Removed 1625 rows containing non-finite values (stat_ydensity)."
Warning message:
"Removed 1625 rows containing non-finite values (stat_boxplot)."
../../_images/Notebooks_examples_UK_Precipitation_31_1.png

The test for model stability: Is there a drift in the simulated precipitation over lead times?

We find that the model is stable for UK February precipitation.

[8]:
stability_test(ensemble = SEAS5_UK, lab = 'UK February precipitation (mm/d)')
Warning message:
“Removed 4 row(s) containing missing values (geom_path).”
../../_images/Notebooks_examples_UK_Precipitation_33_1.png

The fidelity test shows us how consistent the model simulations of UNSEEN (SEAS5) are with the observed (EOBS). With this test we can asses systematic biases. The UNSEEN dataset is much larger than the observed – hence they cannot simply be compared. For example, what if we had faced a few more or a few less precipitation extremes purely by chance?

This would influence the observed mean, but not so much influence the UNSEEN ensemble because of the large data sample. Therefore we express the UNSEEN ensemble as a range of plausible means, for data samples of the same length as the observed. We do the same for higher order statistical moments.

[6]:
fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
              ensemble = SEAS5_UK_weighted_df$tprate
             )
../../_images/Notebooks_examples_UK_Precipitation_35_0.png

We find that the standard deviation within the model (the grey histograms and lines) are too low compared to the observed.

We can include a simple mean-bias correction (ratio) in this plot by setting biascor = TRUE. However, in this case it won’t help:

[7]:
fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
              ensemble = SEAS5_UK_weighted_df$tprate,
              biascor = TRUE
             )
../../_images/Notebooks_examples_UK_Precipitation_37_0.png

Check the documentation of the test ?fidelity_test

Illustrate

[8]:
source('src/evt_plot.r')
Loading required package: Lmoments

Loading required package: distillery


Attaching package: 'extRemes'


The following objects are masked from 'package:stats':

    qqnorm, qqplot


First, we fit a Gumbel and a GEV distribution (including shape parameter) to the observed extremes. The Gumbel distribution best describes the data because the p-value of 0.9 is much above 0.05 (based on the likelihood ratio test).

[9]:
fit_obs_Gumbel <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "Gumbel"
                   )
fit_obs_GEV <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "GEV"
                   )
lr.test(fit_obs_Gumbel, fit_obs_GEV)

        Likelihood-ratio Test

data:  EOBS_UK_weighted_df_hindcast$rrEOBS_UK_weighted_df_hindcast$rr
Likelihood-ratio = 0.014629, chi-square critical value = 3.8415, alpha
= 0.0500, Degrees of Freedom = 1.0000, p-value = 0.9037
alternative hypothesis: greater

We show the gumbel plot for the observed (EOBS) and UNSEEN (SEAS5 hindcast data). This shows that the UNSEEN simulations are not within the uncertainty range of the observations. This has to do with the variability of the model that is too low, as indicated in the evaluation section.

[13]:
options(repr.plot.width = 12)
Gumbel_hindcast <- EVT_plot(ensemble = SEAS5_UK_weighted_df$tprate,
                         obs = EOBS_UK_weighted_df_hindcast$rr,
                         main = "1981-2016",
                         GEV_type = "Gumbel",
#                          ylim = 3,
                         y_lab = 'UK February precipitation (mm/d)'
                        )
GEV_hindcast <- EVT_plot(ensemble = SEAS5_UK_weighted_df$tprate,
                                   obs = EOBS_UK_weighted_df$rr,
                                   main = "Entire EOBS",
                                   GEV_type = "Gumbel",
#                                    ylim = 3,
                                   y_lab = 'UK February precipitation (mm/d)'
                                  )

ggarrange(Gumbel_hindcast, GEV_hindcast,
  labels = c("a", "b"), # ,"c","d"),
  common.legend = T,
  font.label = list(size = 10, color = "black", face = "bold", family = NULL),
  ncol = 2, nrow = 1
)
../../_images/Notebooks_examples_UK_Precipitation_44_0.png

Why is there too little variability within UK february simulations?

This can be fed back to model developers to help improve the models.

We could further explore the use of other observational datasets and other model simulations.