UK Precipitation¶

February 2020 case study¶

February 2020 was the wettest February on record in the UK (since 1862), according to the Met Office. The UK faced three official storms during February, and these exceptional events attracted media attention, such as a BBC article on increased climate concerns among the population. A Carbon Brief post explained why the UK saw such record-breaking rainfall and put this rare event into perspective, citing, amongst other approaches, the UNSEEN method. The UNSEEN study by Thompson et al. (2017) assessed monthly precipitation over the UK. They showed that the monthly precipitation records for south-east England have a 7% chance of being exceeded in at least one month in any given winter. They used the Met Office model ensemble rather than SEAS5. This work was taken up in the National Flood Resilience Review (2016), showing the high relevance and applicability of the method.

Here, the aim is to build an open, reproducible and transferable workflow that is tested for this well-studied region of the world and can be transferred to other regions and climate variables of interest, such as the 2020 Siberian heat and California fires.

Retrieve data¶

The main function to retrieve all forecasts (SEAS5) is retrieve_SEAS5. We want to download February average precipitation over the UK. By default, the hindcast years of 1981-2016 are downloaded for SEAS5. The folder argument indicates where the files will be stored, in this case outside of the UNSEEN-open repository, in a ‘UK_example’ directory. For more explanation, see retrieve.

[ ]:

retrieve.retrieve_SEAS5(variables = 'total_precipitation',
                        target_months = [2],
                        area = [60, -11, 50, 2],
                        folder = '../UK_example/SEAS5/')
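The `area` argument is read here in the Climate Data Store order of [North, West, South, East], in degrees. The following is a minimal sketch of how such a box can be interpreted; the helper `in_area` and the test coordinates are hypothetical and are not part of the retrieve module.

```python
# Bounding box passed to retrieve_SEAS5, assumed to be in CDS order:
# [North, West, South, East]
area = [60, -11, 50, 2]


def in_area(lat, lon, area):
    """Return True if (lat, lon) lies inside the [N, W, S, E] box (hypothetical helper)."""
    north, west, south, east = area
    return south <= lat <= north and west <= lon <= east


london = (51.5, -0.1)  # approximate coordinates, for illustration only
paris = (48.9, 2.4)

print(in_area(*london, area))  # True
print(in_area(*paris, area))   # False
```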

We use the EOBS observational dataset to evaluate the UNSEEN ensemble. I tried to download EOBS through the Copernicus Climate Data Store, but the product is temporarily disabled for maintenance purposes. As a workaround, I downloaded EOBS (1950-2019) and the most recent EOBS data (2020) here. Note that you have to register as an E-OBS user.

Evaluate¶

Note

From here onward we use R and not Python!

We switch to R since we believe R has better functionality for extreme value statistics.

[1]:

setwd('../../..')
# getwd()
EOBS_UK_weighted_df <- read.csv("Data/EOBS_UK_weighted_upscaled.csv", stringsAsFactors=FALSE)
SEAS5_UK_weighted_df <- read.csv("Data/SEAS5_UK_weighted_masked.csv", stringsAsFactors=FALSE)

## Convert the time class to Date format
EOBS_UK_weighted_df$time <- lubridate::ymd(EOBS_UK_weighted_df$time)
str(EOBS_UK_weighted_df)

## Keep the observed Februaries that overlap the SEAS5 hindcast (1982-2016)
EOBS_UK_weighted_df_hindcast <- EOBS_UK_weighted_df[
    EOBS_UK_weighted_df$time > as.Date('1982-02-01') &
    EOBS_UK_weighted_df$time < as.Date('2017-02-01'),
    ]


SEAS5_UK_weighted_df$time <- lubridate::ymd(SEAS5_UK_weighted_df$time)
str(SEAS5_UK_weighted_df)

'data.frame':   71 obs. of  2 variables:
 $ time: Date, format: "1950-02-28" "1951-02-28" ...
 $ rr  : num  4.13 3.25 1.07 1.59 2.59 ...
'data.frame':   4375 obs. of  4 variables:
 $ leadtime: int  2 2 2 2 2 2 2 2 2 2 ...
 $ number  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ time    : Date, format: "1982-02-01" "1983-02-01" ...
 $ tprate  : num  1.62 2.93 3.27 2 3.31 ...
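The row counts in this output can be checked with simple arithmetic. The sketch below (in Python, for illustration) assumes 25 ensemble members in the SEAS5 hindcast, which is not stated in this notebook, together with the lead times of 2-6 months used in the evaluation.

```python
obs_years = range(1950, 2021)    # EOBS Februaries, 1950-2020
target_years = range(1982, 2017) # February target years in the SEAS5 data frame
lead_times = range(2, 7)         # lead times 2-6 months
members = range(25)              # assumed hindcast ensemble size (numbered from 0)

n_obs = len(obs_years)
n_rows = len(target_years) * len(lead_times) * len(members)
n_per_year = len(lead_times) * len(members)

print(n_obs)       # 71
print(n_rows)      # 4375
print(n_per_year)  # 125
```

Under these assumptions, each observed February is matched by 125 simulated Februaries, which is what makes the pooled ensemble so much larger than the observed record.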

Is the UNSEEN ensemble realistic?

To answer this question, we perform three statistical tests: independence, model stability and model fidelity tests.

These statistical tests are available through the UNSEEN R package. See evaluation for more info.

[2]:

require(UNSEEN)

Loading required package: UNSEEN

Timeseries¶

We plot the timeseries of SEAS5 (UNSEEN) and EOBS (OBS) for UK February precipitation.

[3]:

unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df,
                  ylab = 'UK February precipitation (mm/d)')

../../_images/Notebooks_examples_UK_Precipitation_27_0.png

We select the timeseries for the hindcast years 1981-2016.

[4]:

timeseries <- unseen_timeseries(ensemble = SEAS5_UK_weighted_df,
                  obs = EOBS_UK_weighted_df_hindcast,
                  ylab = 'UK February precipitation (mm/d)')
# ggplot2 is not attached in this session, so call ggsave through its namespace
ggplot2::ggsave(filename = "graphs/UK_timeseries.png", plot = timeseries,
                height = 5, width = 6)

Evaluation tests¶

With the hindcast dataset we evaluate the independence, stability and fidelity.

First the independence test. This test checks whether the forecasts are independent. If they are not, the events are not unique and care should be taken in the extreme value analysis. Because of the chaotic behaviour of the atmosphere, independence of precipitation events is expected beyond a lead time of two weeks. Here we use lead times of 2-6 months and find that the boxplots are within the expected range (with perhaps a very small dependence at lead time 2). More info in our paper: https://doi.org/10.31223/osf.io/hyxeq.

[5]:

independence_test(ensemble = SEAS5_UK_weighted_df)

Warning message:
"Removed 1625 rows containing non-finite values (stat_ydensity)."
Warning message:
"Removed 1625 rows containing non-finite values (stat_boxplot)."

../../_images/Notebooks_examples_UK_Precipitation_31_1.png
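To make the idea behind an independence check concrete, here is a minimal Python sketch that computes rank correlations between all pairs of ensemble members for a single lead time. It uses synthetic, independent data, and it is an illustration of the general idea only: the exact procedure inside `independence_test` may differ, and all names below are hypothetical.

```python
import itertools
import random
import statistics


def ranks(values):
    """Rank values from 1..n (ties are not handled; fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


random.seed(1)
n_years, n_members = 35, 25
# Synthetic members drawn independently, standing in for one lead time
members = [[random.gauss(0, 1) for _ in range(n_years)] for _ in range(n_members)]

pair_corrs = [spearman(a, b) for a, b in itertools.combinations(members, 2)]
print(len(pair_corrs))  # 300 member pairs
print(round(statistics.median(pair_corrs), 2))
```

For independent members the pairwise correlations scatter around zero; a median clearly above zero at a given lead time would hint at dependence between members.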

The test for model stability: Is there a drift in the simulated precipitation over lead times?

We find that the model is stable for UK February precipitation.

[8]:

stability_test(ensemble = SEAS5_UK_weighted_df, lab = 'UK February precipitation (mm/d)')

Warning message:
“Removed 4 row(s) containing missing values (geom_path).”

../../_images/Notebooks_examples_UK_Precipitation_33_1.png
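A drift over lead times would show up as a systematic change in the distribution from one lead time to the next. The Python sketch below illustrates that idea by comparing the mean and the 95th percentile per lead time on synthetic data; it is not the implementation of `stability_test`, and the gamma-distributed values are made up.

```python
import random
import statistics
from collections import defaultdict

random.seed(2)
# Synthetic (leadtime, value) rows standing in for the leadtime and tprate columns;
# 875 values per lead time (35 years x 25 assumed members)
rows = [(ld, random.gammavariate(4.0, 0.7)) for ld in range(2, 7) for _ in range(875)]

by_lead = defaultdict(list)
for ld, value in rows:
    by_lead[ld].append(value)

summary = {}
for ld in sorted(by_lead):
    values = by_lead[ld]
    q95 = statistics.quantiles(values, n=20)[-1]  # 95th percentile
    summary[ld] = (statistics.fmean(values), q95)
    print(ld, round(summary[ld][0], 2), round(q95, 2))
```

If the model is stable, these summaries should differ between lead times only by sampling noise.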

The fidelity test shows how consistent the UNSEEN (SEAS5) simulations are with the observations (EOBS). With this test we can assess systematic biases. The UNSEEN dataset is much larger than the observed record, so the two cannot simply be compared. For example, what if we had faced a few more or a few fewer precipitation extremes purely by chance?

This would influence the observed mean, but it would hardly influence the UNSEEN ensemble mean because of the large data sample. Therefore we express the UNSEEN ensemble as a range of plausible means for data samples of the same length as the observed record. We do the same for higher-order statistical moments.
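The resampling idea described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, assuming resampling with replacement and a 95% range; the function `plausible_range` is hypothetical and the details of `fidelity_test` may differ.

```python
import random
import statistics


def plausible_range(ensemble, n_obs, stat, n_boot=2000, seed=0):
    """Bootstrap `stat` over resampled series of length n_obs.

    Returns the approximate 2.5% and 97.5% bounds of the bootstrapped statistic.
    """
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(ensemble, k=n_obs)) for _ in range(n_boot))
    return boot[int(0.025 * n_boot)], boot[int(0.975 * n_boot) - 1]


rng = random.Random(42)
ensemble = [rng.gammavariate(4.0, 0.7) for _ in range(4375)]  # synthetic "UNSEEN" sample
obs = [rng.gammavariate(4.0, 0.7) for _ in range(35)]         # synthetic "observed" sample

for name, stat in [("mean", statistics.fmean), ("sd", statistics.stdev)]:
    low, high = plausible_range(ensemble, len(obs), stat)
    print(name, round(low, 2), round(stat(obs), 2), round(high, 2))
```

An observed statistic that falls outside the bootstrapped range suggests a systematic difference between model and observations rather than sampling noise.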

[6]:

fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
              ensemble = SEAS5_UK_weighted_df$tprate
             )

../../_images/Notebooks_examples_UK_Precipitation_35_0.png

We find that the standard deviation within the model (the grey histograms and lines) is too low compared to the observed.

We can include a simple mean-bias correction (ratio) in this plot by setting biascor = TRUE. However, in this case it won’t help:

[7]:

fidelity_test(obs = EOBS_UK_weighted_df_hindcast$rr,
              ensemble = SEAS5_UK_weighted_df$tprate,
              biascor = TRUE
             )

../../_images/Notebooks_examples_UK_Precipitation_37_0.png
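One reason a ratio-based mean correction may not repair a spread problem is that multiplying every value by the same factor rescales the mean and the standard deviation together, so their ratio (the coefficient of variation) is unchanged. The Python sketch below shows this with made-up numbers; it assumes the correction is a single multiplicative factor, mean(obs) / mean(ensemble).

```python
import statistics

obs = [1.0, 2.0, 3.0, 4.0, 6.0]       # made-up observed values
ensemble = [2.0, 2.5, 3.0, 3.5, 4.0]  # made-up model values with less spread

# Assumed form of a ratio mean-bias correction
ratio = statistics.fmean(obs) / statistics.fmean(ensemble)
corrected = [x * ratio for x in ensemble]


def cv(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


print(round(statistics.fmean(corrected), 2), round(statistics.fmean(obs), 2))
print(round(cv(ensemble), 3), round(cv(corrected), 3), round(cv(obs), 3))
```

The corrected mean matches the observed mean, but the relative spread of the corrected ensemble is the same as before the correction and still smaller than that of the observations.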

Check the documentation of the test with ?fidelity_test.

Illustrate¶

[8]:

source('src/evt_plot.r')

Loading required package: Lmoments

Loading required package: distillery


Attaching package: 'extRemes'


The following objects are masked from 'package:stats':

    qqnorm, qqplot

First, we fit a Gumbel and a GEV distribution (including shape parameter) to the observed extremes. The likelihood ratio test gives a p-value of 0.9, far above 0.05, so adding the shape parameter does not significantly improve the fit and the simpler Gumbel distribution is an adequate description of the data.

[9]:

fit_obs_Gumbel <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "Gumbel"
                   )
fit_obs_GEV <- fevd(x = EOBS_UK_weighted_df_hindcast$rr,
                    type = "GEV"
                   )
lr.test(fit_obs_Gumbel, fit_obs_GEV)


        Likelihood-ratio Test

data:  EOBS_UK_weighted_df_hindcast$rrEOBS_UK_weighted_df_hindcast$rr
Likelihood-ratio = 0.014629, chi-square critical value = 3.8415, alpha
= 0.0500, Degrees of Freedom = 1.0000, p-value = 0.9037
alternative hypothesis: greater
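The p-value in this output can be reproduced by hand. With one degree of freedom, the upper tail of the chi-square distribution has the closed form P(X > x) = erfc(sqrt(x / 2)), so a short Python check needs only the standard library.

```python
import math

lr_statistic = 0.014629   # likelihood-ratio statistic from the output above
critical_value = 3.8415   # chi-square critical value from the output above


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi2_sf_df1(lr_statistic)
print(round(p_value, 4))                      # 0.9037
print(round(chi2_sf_df1(critical_value), 3))  # 0.05
```

Because the statistic is far below the critical value, the extra shape parameter of the GEV is not supported by these data.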

We show the Gumbel plot for the observed (EOBS) and UNSEEN (SEAS5 hindcast) data. The UNSEEN simulations are not within the uncertainty range of the observations. This is related to the variability of the model being too low, as indicated in the evaluation section.

[13]:

options(repr.plot.width = 12)
# Panel a: observations restricted to the hindcast period
Gumbel_hindcast <- EVT_plot(ensemble = SEAS5_UK_weighted_df$tprate,
                         obs = EOBS_UK_weighted_df_hindcast$rr,
                         main = "1981-2016",
                         GEV_type = "Gumbel",
#                          ylim = 3,
                         y_lab = 'UK February precipitation (mm/d)'
                        )
# Panel b: the entire EOBS record (also a Gumbel fit, despite the object name)
GEV_hindcast <- EVT_plot(ensemble = SEAS5_UK_weighted_df$tprate,
                                   obs = EOBS_UK_weighted_df$rr,
                                   main = "Entire EOBS",
                                   GEV_type = "Gumbel",
#                                    ylim = 3,
                                   y_lab = 'UK February precipitation (mm/d)'
                                  )

# Show both panels side by side with a shared legend
ggarrange(Gumbel_hindcast, GEV_hindcast,
  labels = c("a", "b"),
  common.legend = TRUE,
  font.label = list(size = 10, color = "black", face = "bold", family = NULL),
  ncol = 2, nrow = 1
)

../../_images/Notebooks_examples_UK_Precipitation_44_0.png
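A Gumbel plot relates a value to its return period through the Gumbel quantile function, x_T = location - scale * ln(-ln(1 - 1/T)). The Python sketch below illustrates this with a method-of-moments fit, which is a simpler stand-in for the maximum likelihood fit that `fevd` performs by default. The sample is only the first five `rr` values printed earlier, so the numbers are illustrative and not a result.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def gumbel_fit_moments(values):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    scale = statistics.stdev(values) * math.sqrt(6) / math.pi
    location = statistics.fmean(values) - EULER_GAMMA * scale
    return location, scale


def gumbel_return_level(location, scale, return_period):
    """Value exceeded on average once every `return_period` years."""
    return location - scale * math.log(-math.log(1 - 1 / return_period))


sample = [4.13, 3.25, 1.07, 1.59, 2.59]  # first five 'rr' values from the str() output
location, scale = gumbel_fit_moments(sample)

levels = {T: gumbel_return_level(location, scale, T) for T in (2, 10, 100)}
for T, level in levels.items():
    print(T, round(level, 2))
```

Return levels grow with the logarithm of the return period, which is why a Gumbel fit appears as a straight line when the return period axis is drawn on a double-logarithmic scale.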

Why is there too little variability within the UK February simulations?

This can be fed back to model developers to help improve the models.

We could further explore the use of other observational datasets and other model simulations.