Using EOBS + upscaling

Here we explore how to best extract areal averaged precipitation and test this for UK precipitation within SEAS5 and EOBS. The code is inspired on Matteo De Felice’s blog – credits to him!

We create a mask for all 241 countries within Regionmask, that has predefined countries from Natural Earth datasets (shapefiles). We use the mask to go from gridded precipitation to country-averaged timeseries. We regrid EOBS to the SEAS5 grid so we can select the same grid cells in calculating the UK average for both datasets. The country outline would not be perfect, but the masks would be the same so the comparison would be fair.

I use the xesmf package for upscaling, a good example can be found in this notebook.

Import packages

We need the packages regionmask for masking and xesmf for regridding. I cannot install xesmf into the UNSEEN-open environment without breaking my environment, so in this notebook I use a separate ‘upscale’ environment, as suggested by this issue. I use the packages esmpy=7.1.0 xesmf=0.2.1 regionmask cartopy matplotlib xarray numpy netcdf4.

[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker

import regionmask       # Masking
import xesmf as xe      # Regridding
[3]:
##We want the working directory to be the UNSEEN-open directory
pwd = os.getcwd() ##current working directory is UNSEEN-open/Notebooks/1.Download
pwd #print the present working directory
os.chdir(pwd+'/../../') # Change the working directory to UNSEEN-open
os.getcwd() #print the working directory
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open/Notebooks/1.Download'
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'

Load SEAS5 and EOBS

From CDS, we retrieve SEAS5 and merge the retrieved files in preprocessing. We create a netcdf file containing the dimensions lat, lon, time (35 years), number (25 ensembles) and leadtime (5 initialization months).

[4]:
SEAS5 = xr.open_dataset('../UK_example/SEAS5/SEAS5.nc')
SEAS5
[4]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • latitude: 11
    • leadtime: 5
    • longitude: 14
    • number: 25
    • time: 35
    • longitude
      (longitude)
      float32
      -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
      units :
      degrees_east
      long_name :
      longitude
      array([-11., -10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,
               1.,   2.], dtype=float32)
    • time
      (time)
      datetime64[ns]
      1982-02-01 ... 2016-02-01
      long_name :
      time
      array(['1982-02-01T00:00:00.000000000', '1983-02-01T00:00:00.000000000',
             '1984-02-01T00:00:00.000000000', '1985-02-01T00:00:00.000000000',
             '1986-02-01T00:00:00.000000000', '1987-02-01T00:00:00.000000000',
             '1988-02-01T00:00:00.000000000', '1989-02-01T00:00:00.000000000',
             '1990-02-01T00:00:00.000000000', '1991-02-01T00:00:00.000000000',
             '1992-02-01T00:00:00.000000000', '1993-02-01T00:00:00.000000000',
             '1994-02-01T00:00:00.000000000', '1995-02-01T00:00:00.000000000',
             '1996-02-01T00:00:00.000000000', '1997-02-01T00:00:00.000000000',
             '1998-02-01T00:00:00.000000000', '1999-02-01T00:00:00.000000000',
             '2000-02-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000',
             '2002-02-01T00:00:00.000000000', '2003-02-01T00:00:00.000000000',
             '2004-02-01T00:00:00.000000000', '2005-02-01T00:00:00.000000000',
             '2006-02-01T00:00:00.000000000', '2007-02-01T00:00:00.000000000',
             '2008-02-01T00:00:00.000000000', '2009-02-01T00:00:00.000000000',
             '2010-02-01T00:00:00.000000000', '2011-02-01T00:00:00.000000000',
             '2012-02-01T00:00:00.000000000', '2013-02-01T00:00:00.000000000',
             '2014-02-01T00:00:00.000000000', '2015-02-01T00:00:00.000000000',
             '2016-02-01T00:00:00.000000000'], dtype='datetime64[ns]')
    • latitude
      (latitude)
      float32
      60.0 59.0 58.0 ... 52.0 51.0 50.0
      units :
      degrees_north
      long_name :
      latitude
      array([60., 59., 58., 57., 56., 55., 54., 53., 52., 51., 50.], dtype=float32)
    • number
      (number)
      int32
      0 1 2 3 4 5 6 ... 19 20 21 22 23 24
      long_name :
      ensemble_member
      array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24], dtype=int32)
    • leadtime
      (leadtime)
      int64
      2 3 4 5 6
      array([2, 3, 4, 5, 6])
    • tprate
      (leadtime, time, number, latitude, longitude)
      float32
      ...
      long_name :
      rainfall
      units :
      mm/day
      standard_name :
      thickness_of_rainfall_amount
      [673750 values with dtype=float32]
  • Conventions :
    CF-1.6
    history :
    2020-05-13 14:49:43 GMT by grib_to_netcdf-2.16.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -S param -o /cache/data7/adaptor.mars.external-1589381366.1540039-11561-3-ad31a097-72e2-45ce-a565-55c62502f358.nc /cache/tmp/ad31a097-72e2-45ce-a565-55c62502f358-adaptor.mars.external-1589381366.1545565-11561-1-tmp.grib

And load EOBS netcdf with only February precipitation, resulting in 71 values, one for each year within 1950 - 2020 over the European domain (25N-75N x 40W-75E).

[5]:
EOBS = xr.open_dataset('../UK_example/EOBS/EOBS.nc')
EOBS
[5]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • latitude: 201
    • longitude: 464
    • time: 71
    • longitude
      (longitude)
      float64
      -40.38 -40.12 ... 75.12 75.38
      units :
      degrees_east
      long_name :
      Longitude values
      axis :
      X
      standard_name :
      longitude
      array([-40.375, -40.125, -39.875, ...,  74.875,  75.125,  75.375])
    • latitude
      (latitude)
      float64
      25.38 25.62 25.88 ... 75.12 75.38
      units :
      degrees_north
      long_name :
      Latitude values
      axis :
      Y
      standard_name :
      latitude
      array([25.375, 25.625, 25.875, ..., 74.875, 75.125, 75.375])
    • time
      (time)
      datetime64[ns]
      1950-02-28 ... 2020-02-29
      array(['1950-02-28T00:00:00.000000000', '1951-02-28T00:00:00.000000000',
             '1952-02-29T00:00:00.000000000', '1953-02-28T00:00:00.000000000',
             '1954-02-28T00:00:00.000000000', '1955-02-28T00:00:00.000000000',
             '1956-02-29T00:00:00.000000000', '1957-02-28T00:00:00.000000000',
             '1958-02-28T00:00:00.000000000', '1959-02-28T00:00:00.000000000',
             '1960-02-29T00:00:00.000000000', '1961-02-28T00:00:00.000000000',
             '1962-02-28T00:00:00.000000000', '1963-02-28T00:00:00.000000000',
             '1964-02-29T00:00:00.000000000', '1965-02-28T00:00:00.000000000',
             '1966-02-28T00:00:00.000000000', '1967-02-28T00:00:00.000000000',
             '1968-02-29T00:00:00.000000000', '1969-02-28T00:00:00.000000000',
             '1970-02-28T00:00:00.000000000', '1971-02-28T00:00:00.000000000',
             '1972-02-29T00:00:00.000000000', '1973-02-28T00:00:00.000000000',
             '1974-02-28T00:00:00.000000000', '1975-02-28T00:00:00.000000000',
             '1976-02-29T00:00:00.000000000', '1977-02-28T00:00:00.000000000',
             '1978-02-28T00:00:00.000000000', '1979-02-28T00:00:00.000000000',
             '1980-02-29T00:00:00.000000000', '1981-02-28T00:00:00.000000000',
             '1982-02-28T00:00:00.000000000', '1983-02-28T00:00:00.000000000',
             '1984-02-29T00:00:00.000000000', '1985-02-28T00:00:00.000000000',
             '1986-02-28T00:00:00.000000000', '1987-02-28T00:00:00.000000000',
             '1988-02-29T00:00:00.000000000', '1989-02-28T00:00:00.000000000',
             '1990-02-28T00:00:00.000000000', '1991-02-28T00:00:00.000000000',
             '1992-02-29T00:00:00.000000000', '1993-02-28T00:00:00.000000000',
             '1994-02-28T00:00:00.000000000', '1995-02-28T00:00:00.000000000',
             '1996-02-29T00:00:00.000000000', '1997-02-28T00:00:00.000000000',
             '1998-02-28T00:00:00.000000000', '1999-02-28T00:00:00.000000000',
             '2000-02-29T00:00:00.000000000', '2001-02-28T00:00:00.000000000',
             '2002-02-28T00:00:00.000000000', '2003-02-28T00:00:00.000000000',
             '2004-02-29T00:00:00.000000000', '2005-02-28T00:00:00.000000000',
             '2006-02-28T00:00:00.000000000', '2007-02-28T00:00:00.000000000',
             '2008-02-29T00:00:00.000000000', '2009-02-28T00:00:00.000000000',
             '2010-02-28T00:00:00.000000000', '2011-02-28T00:00:00.000000000',
             '2012-02-29T00:00:00.000000000', '2013-02-28T00:00:00.000000000',
             '2014-02-28T00:00:00.000000000', '2015-02-28T00:00:00.000000000',
             '2016-02-29T00:00:00.000000000', '2017-02-28T00:00:00.000000000',
             '2018-02-28T00:00:00.000000000', '2019-02-28T00:00:00.000000000',
             '2020-02-29T00:00:00.000000000'], dtype='datetime64[ns]')
    • rr
      (time, latitude, longitude)
      float32
      ...
      long_name :
      rainfall
      units :
      mm/day
      standard_name :
      thickness_of_rainfall_amount
      [6621744 values with dtype=float32]

Masking

Here we load the countries and create a mask for SEAS5 and for EOBS.

Regionmask has predefined countries from Natural Earth datasets (shapefiles).

[7]:
countries = regionmask.defined_regions.natural_earth.countries_50
countries
[7]:
241 'Natural Earth Countries: 50m' Regions (http://www.naturalearthdata.com)
ZW ZM YE VN VE V VU UZ UY FSM MH MP VI GU AS PR US GS IO SH PN AI FK KY BM VG TC MS JE GG IM GB AE UA UG TM TR TN TT TO TG TL TH TZ TJ TW SYR CH S SW SR SS SD LK E KR ZA SO SL SB SK SLO SG SL SC RS SN SA ST RSM WS VC LC KN RW RUS RO QA P PL PH PE PY PG PA PW PK OM N KP NG NE NI NZ NU CK NL AW CW NP NR NA MZ MA WS ME MN MD MC MX MU MR M ML MV MY MW MG MK L LT FL LY LR LS LB LV LA KG KW KO KI KE KZ J J J I IS PAL IRL IRQ IRN INDO IND IS HU HN HT GY GW GN GT GD GR GH D GE GM GA F PM WF MF BL PF NC TF AI FIN FJ ET EST ER GQ SV EG EC DO DM DJ GL FO DK CZ CN CY CU HR CI CR DRC CG KM CO CN MO HK CL TD CF CV CA CM KH MM BI BF BG BN BR BW BiH BO BT BJ BZ B BY BB BD BH BS AZ A AU IOT HM NF AU ARM AR AG AO AND DZ AL AF SG AQ SX

Now we create the mask for the SEAS5 grid. Only one timestep is needed to create the mask. This mask will lateron be used to mask all the timesteps.

[8]:
SEAS5_mask = countries.mask(SEAS5.sel(leadtime=2, number=0, time='1982'),
                            lon_name='longitude',
                            lat_name='latitude')

And create a plot to illustrate what the mask looks like. The mask just indicates for each gridcell what country the gridcell belongs to.

[9]:
SEAS5_mask
SEAS5_mask.plot()
[9]:
Show/Hide data repr Show/Hide attributes
xarray.DataArray
'region'
  • latitude: 11
  • longitude: 14
  • nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan 160.0
    array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  31.,  nan,  31.,  31.,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  31.,  31.,  31.,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  nan,  nan,  nan,
             nan,  nan,  nan],
           [ nan,  nan,  nan, 140.,  31.,  31.,  31.,  31.,  31.,  31.,  nan,
             nan,  nan,  nan],
           [ nan, 140., 140., 140., 140.,  nan,  nan,  nan,  nan,  31.,  31.,
             nan,  nan,  nan],
           [ nan,  nan, 140., 140., 140.,  nan,  nan,  31.,  31.,  31.,  31.,
             31.,  nan,  nan],
           [ nan, 140., 140., 140.,  nan,  nan,  31.,  31.,  31.,  31.,  31.,
             31.,  31.,  nan],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  31.,  31.,  31.,  31.,
             31.,  31., 160.],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
             nan,  nan, 160.]])
    • latitude
      (latitude)
      float32
      60.0 59.0 58.0 ... 52.0 51.0 50.0
      array([60., 59., 58., 57., 56., 55., 54., 53., 52., 51., 50.], dtype=float32)
    • longitude
      (longitude)
      float32
      -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
      array([-11., -10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,
               1.,   2.], dtype=float32)
[9]:
<matplotlib.collections.QuadMesh at 0x7f877194e8d0>
../../_images/Notebooks_2.Preprocess_2.3Upscale_16_2.png

And now we can extract the UK averaged precipitation within SEAS5 by using the mask index of the UK: where(SEAS5_mask == UK_index). So we need to find the index of one of the 241 abbreviations. In this case for the UK use ‘GB’. Additionally, if you can’t find a country, use countries.regions to get the full names of the countries.

[10]:
countries.abbrevs.index('GB')
[10]:
31

To select the UK average, we select SEAS5 precipitation (tprate), select the gridcells that are within the UK and take the mean over those gridcells. This results in a dataset of February precipitation for 35 years (1981-2016), with 5 leadtimes and 25 ensemble members.

[11]:
SEAS5_UK = (SEAS5['tprate']
            .where(SEAS5_mask == 31)
            .mean(dim=['latitude', 'longitude']))
SEAS5_UK
[11]:
Show/Hide data repr Show/Hide attributes
xarray.DataArray
'tprate'
  • leadtime: 5
  • time: 35
  • number: 25
  • 1.7730116 1.9548205 3.7803986 ... 3.662669 1.6688719 1.9709551
    array([[[1.7730116 , 1.9548205 , 3.7803986 , ..., 2.9277864 ,
             1.8929143 , 2.446378  ],
            [3.040877  , 1.8855734 , 4.2009687 , ..., 2.8347836 ,
             4.1132317 , 2.4391398 ],
            [3.556001  , 3.6879914 , 3.184576  , ..., 5.116828  ,
             3.8872097 , 2.6074293 ],
            ...,
            [2.9086895 , 3.7759151 , 4.660708  , ..., 1.6525848 ,
             3.1059399 , 1.6731865 ],
            [2.8330274 , 3.6866314 , 2.634609  , ..., 3.3976977 ,
             4.993076  , 3.6399577 ],
            [2.2500532 , 2.510962  , 3.7602315 , ..., 3.8013346 ,
             1.2315732 , 3.6035924 ]],
    
           [[1.0868028 , 1.5332695 , 3.2461395 , ..., 1.5274822 ,
             2.7315547 , 1.0240313 ],
            [0.99898285, 2.9119303 , 2.1601522 , ..., 3.3752463 ,
             2.8715765 , 4.393921  ],
            [3.6581304 , 3.5088263 , 2.0143754 , ..., 3.9588785 ,
             3.795881  , 3.179954  ],
            ...,
            [2.5120296 , 3.568533  , 3.4690757 , ..., 2.6206532 ,
             3.953479  , 3.789593  ],
            [1.9866585 , 3.670435  , 2.7552466 , ..., 3.5482445 ,
             1.9896345 , 3.9373026 ],
            [1.9513754 , 2.0166924 , 2.9413762 , ..., 3.3549297 ,
             3.0631557 , 1.915903  ]],
    
           [[2.0119116 , 2.4435556 , 1.3927166 , ..., 1.8064059 ,
             1.8338976 , 2.4649143 ],
            [2.62523   , 3.6218376 , 3.003645  , ..., 3.3206403 ,
             2.5519    , 3.110555  ],
            [4.071685  , 2.6880858 , 3.8181992 , ..., 1.9033648 ,
             1.8840973 , 4.449085  ],
            ...,
            [2.8379328 , 2.6923704 , 1.82504   , ..., 2.8684394 ,
             3.1954165 , 3.445909  ],
            [2.072118  , 2.0628428 , 2.8790975 , ..., 2.697285  ,
             3.3291562 , 2.9145727 ],
            [3.535856  , 3.6515677 , 2.1052096 , ..., 3.2938933 ,
             3.1968002 , 2.2115657 ]],
    
           [[2.9105136 , 3.6938024 , 1.1343408 , ..., 1.1984391 ,
             2.9722617 , 2.1267796 ],
            [4.02007   , 1.8249133 , 3.099     , ..., 3.4654658 ,
             2.7237208 , 3.5907636 ],
            [3.1248841 , 2.219241  , 3.6903172 , ..., 0.8401702 ,
             4.017805  , 3.476175  ],
            ...,
            [2.8703861 , 5.5576715 , 1.7174771 , ..., 2.1288645 ,
             1.8581603 , 2.2952495 ],
            [4.179039  , 3.737323  , 2.880745  , ..., 1.9207952 ,
             0.6607291 , 2.8571692 ],
            [1.5515772 , 3.0461147 , 2.0624359 , ..., 2.5376263 ,
             4.744201  , 2.0716472 ]],
    
           [[3.1285267 , 3.269652  , 2.5995293 , ..., 4.347241  ,
             1.3022097 , 1.4634563 ],
            [2.262867  , 3.3503478 , 2.4287066 , ..., 2.593644  ,
             4.251352  , 3.1974325 ],
            [4.0569496 , 2.156282  , 1.781804  , ..., 4.4861846 ,
             1.8805121 , 4.268968  ],
            ...,
            [1.843329  , 3.1374288 , 2.516779  , ..., 1.3068637 ,
             2.4419744 , 2.2696178 ],
            [4.635685  , 4.429196  , 2.4575038 , ..., 4.441483  ,
             1.3038206 , 2.6521633 ],
            [3.1435733 , 2.614458  , 2.1784708 , ..., 3.662669  ,
             1.6688719 , 1.9709551 ]]], dtype=float32)
    • time
      (time)
      datetime64[ns]
      1982-02-01 ... 2016-02-01
      long_name :
      time
      array(['1982-02-01T00:00:00.000000000', '1983-02-01T00:00:00.000000000',
             '1984-02-01T00:00:00.000000000', '1985-02-01T00:00:00.000000000',
             '1986-02-01T00:00:00.000000000', '1987-02-01T00:00:00.000000000',
             '1988-02-01T00:00:00.000000000', '1989-02-01T00:00:00.000000000',
             '1990-02-01T00:00:00.000000000', '1991-02-01T00:00:00.000000000',
             '1992-02-01T00:00:00.000000000', '1993-02-01T00:00:00.000000000',
             '1994-02-01T00:00:00.000000000', '1995-02-01T00:00:00.000000000',
             '1996-02-01T00:00:00.000000000', '1997-02-01T00:00:00.000000000',
             '1998-02-01T00:00:00.000000000', '1999-02-01T00:00:00.000000000',
             '2000-02-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000',
             '2002-02-01T00:00:00.000000000', '2003-02-01T00:00:00.000000000',
             '2004-02-01T00:00:00.000000000', '2005-02-01T00:00:00.000000000',
             '2006-02-01T00:00:00.000000000', '2007-02-01T00:00:00.000000000',
             '2008-02-01T00:00:00.000000000', '2009-02-01T00:00:00.000000000',
             '2010-02-01T00:00:00.000000000', '2011-02-01T00:00:00.000000000',
             '2012-02-01T00:00:00.000000000', '2013-02-01T00:00:00.000000000',
             '2014-02-01T00:00:00.000000000', '2015-02-01T00:00:00.000000000',
             '2016-02-01T00:00:00.000000000'], dtype='datetime64[ns]')
    • number
      (number)
      int32
      0 1 2 3 4 5 6 ... 19 20 21 22 23 24
      long_name :
      ensemble_member
      array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24], dtype=int32)
    • leadtime
      (leadtime)
      int64
      2 3 4 5 6
      array([2, 3, 4, 5, 6])

However, xarray does not take into account the area of the gridcells in taking the average. Therefore, we have to calculate the area-weighted mean of the gridcells. To calculate the area of each gridcell, I use cdo cdo gridarea infile outfile. Here I load the generated file:

[12]:
Gridarea_SEAS5 = xr.open_dataset('../UK_example/Gridarea_SEAS5.nc')
Gridarea_SEAS5['cell_area'].plot()
[12]:
<matplotlib.collections.QuadMesh at 0x7f8771a14bd0>
../../_images/Notebooks_2.Preprocess_2.3Upscale_22_1.png
[13]:
SEAS5_UK_weighted = (SEAS5['tprate']
                  .where(SEAS5_mask == 31)
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
SEAS5_UK_weighted
[13]:
Show/Hide data repr Show/Hide attributes
xarray.DataArray
  • leadtime: 5
  • time: 35
  • number: 25
  • 1.747 1.916 3.742 2.909 4.562 3.113 ... 4.034 1.957 3.664 1.684 1.929
    array([[[1.74715784, 1.91625164, 3.74246331, ..., 2.9025497 ,
             1.87161458, 2.42337871],
            [3.01557164, 1.86355946, 4.23964218, ..., 2.82348107,
             4.08311346, 2.45320866],
            [3.45037457, 3.67373672, 3.19124952, ..., 5.10772697,
             3.83928173, 2.59254324],
            ...,
            [2.91576568, 3.76689869, 4.64815469, ..., 1.61583385,
             3.07634248, 1.633837  ],
            [2.8217756 , 3.61588493, 2.61442489, ..., 3.36894025,
             5.01010622, 3.58215465],
            [2.21954162, 2.46237273, 3.71758684, ..., 3.72850729,
             1.21044478, 3.5506569 ]],
    
           [[1.08925258, 1.502868  , 3.23383862, ..., 1.49745391,
             2.74434639, 0.98857514],
            [0.96385251, 2.9144073 , 2.14176199, ..., 3.39404417,
             2.837949  , 4.39105086],
            [3.64189637, 3.44186084, 1.96817031, ..., 3.90560029,
             3.78419096, 3.13070979],
            ...,
            [2.46681904, 3.55272741, 3.41184678, ..., 2.58295751,
             3.84194729, 3.71967185],
            [1.96088291, 3.64719353, 2.73561159, ..., 3.56311814,
             2.00379361, 3.89548019],
            [1.89263296, 1.99179138, 2.91457733, ..., 3.37011609,
             3.05092884, 1.89736692]],
    
           [[1.94362083, 2.4160058 , 1.36431312, ..., 1.78378666,
             1.82209387, 2.42526171],
            [2.57294707, 3.55756557, 2.96458594, ..., 3.26071746,
             2.60226357, 3.13573254],
            [4.13926899, 2.61380816, 3.76440713, ..., 1.87333769,
             1.82042494, 4.46818308],
            ...,
            [2.81221309, 2.69631474, 1.80062933, ..., 2.86429728,
             3.13303958, 3.44787769],
            [2.07369663, 2.03958281, 2.81885547, ..., 2.65322161,
             3.32297921, 2.83923239],
            [3.49884241, 3.63866711, 2.03510114, ..., 3.24500072,
             3.15147085, 2.16834782]],
    
           [[2.83876303, 3.61651907, 1.0950032 , ..., 1.17176117,
             2.91180608, 2.1047135 ],
            [3.95351432, 1.78778573, 3.08959013, ..., 3.43481603,
             2.72829325, 3.6158294 ],
            [3.13152664, 2.19419128, 3.64975772, ..., 0.83938823,
             4.01702257, 3.46619512],
            ...,
            [2.90135673, 5.59148292, 1.70016201, ..., 2.08532097,
             1.84797942, 2.23519466],
            [4.19777172, 3.74702108, 2.8748067 , ..., 1.88071816,
             0.62889528, 2.79973163],
            [1.51194718, 3.07317605, 2.05956574, ..., 2.53950207,
             4.70096128, 2.04542173]],
    
           [[3.15063505, 3.23490175, 2.60923731, ..., 4.27225421,
             1.29816875, 1.42799228],
            [2.21017692, 3.32458317, 2.3819878 , ..., 2.53619218,
             4.24655687, 3.21280586],
            [4.07655988, 2.07606666, 1.75961194, ..., 4.47653645,
             1.81676988, 4.21655002],
            ...,
            [1.82563235, 3.10218576, 2.46347745, ..., 1.28711195,
             2.36333743, 2.26540939],
            [4.63821163, 4.43540009, 2.37741808, ..., 4.44346938,
             1.27125796, 2.63467098],
            [3.12914205, 2.58186504, 2.11029128, ..., 3.6635387 ,
             1.68374372, 1.9288108 ]]])
    • time
      (time)
      datetime64[ns]
      1982-02-01 ... 2016-02-01
      long_name :
      time
      array(['1982-02-01T00:00:00.000000000', '1983-02-01T00:00:00.000000000',
             '1984-02-01T00:00:00.000000000', '1985-02-01T00:00:00.000000000',
             '1986-02-01T00:00:00.000000000', '1987-02-01T00:00:00.000000000',
             '1988-02-01T00:00:00.000000000', '1989-02-01T00:00:00.000000000',
             '1990-02-01T00:00:00.000000000', '1991-02-01T00:00:00.000000000',
             '1992-02-01T00:00:00.000000000', '1993-02-01T00:00:00.000000000',
             '1994-02-01T00:00:00.000000000', '1995-02-01T00:00:00.000000000',
             '1996-02-01T00:00:00.000000000', '1997-02-01T00:00:00.000000000',
             '1998-02-01T00:00:00.000000000', '1999-02-01T00:00:00.000000000',
             '2000-02-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000',
             '2002-02-01T00:00:00.000000000', '2003-02-01T00:00:00.000000000',
             '2004-02-01T00:00:00.000000000', '2005-02-01T00:00:00.000000000',
             '2006-02-01T00:00:00.000000000', '2007-02-01T00:00:00.000000000',
             '2008-02-01T00:00:00.000000000', '2009-02-01T00:00:00.000000000',
             '2010-02-01T00:00:00.000000000', '2011-02-01T00:00:00.000000000',
             '2012-02-01T00:00:00.000000000', '2013-02-01T00:00:00.000000000',
             '2014-02-01T00:00:00.000000000', '2015-02-01T00:00:00.000000000',
             '2016-02-01T00:00:00.000000000'], dtype='datetime64[ns]')
    • number
      (number)
      int32
      0 1 2 3 4 5 6 ... 19 20 21 22 23 24
      long_name :
      ensemble_member
      array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24], dtype=int32)
    • leadtime
      (leadtime)
      int64
      2 3 4 5 6
      array([2, 3, 4, 5, 6])

What is the difference between the weighted and non-weighted average?

I plot the UK average for ensemble member 0 and leadtime 2

[14]:
SEAS5_UK.sel(leadtime=2,number=0).plot()
SEAS5_UK_weighted.sel(leadtime=2,number=0).plot()

[14]:
[<matplotlib.lines.Line2D at 0x7f87718f8b50>]
[14]:
[<matplotlib.lines.Line2D at 0x7f87718e3ad0>]
../../_images/Notebooks_2.Preprocess_2.3Upscale_25_2.png

Upscale

For EOBS, we want to upscale the dataset to the SEAS5 grid. We use the function regridder(ds_in,ds_out,function), see the docs. We have to rename the lat lon dimensions so the function can read them.

We use bilinear interpolation first (i.e. function = ‘bilinear’), because of its ease in implementation. However, the use of conservative areal average (function = ‘conservative’) for upscaling is preferred (Kopparla, 2013).

[8]:
regridder = xe.Regridder(EOBS.rename({'longitude': 'lon', 'latitude': 'lat'}), SEAS5.rename({'longitude': 'lon', 'latitude': 'lat'}),'bilinear')
Overwrite existing file: bilinear_201x464_11x14.nc
 You can set reuse_weights=True to save computing time.

Now that we have the regridder, we can apply the regridder to our EOBS dataarray:

[27]:
EOBS_upscaled = regridder(EOBS)
EOBS_upscaled
using dimensions ('latitude', 'longitude') from data variable rr as the horizontal dimensions for this dataset.
[27]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • lat: 11
    • lon: 14
    • time: 71
    • time
      (time)
      datetime64[ns]
      1950-02-28 ... 2020-02-29
      array(['1950-02-28T00:00:00.000000000', '1951-02-28T00:00:00.000000000',
             '1952-02-29T00:00:00.000000000', '1953-02-28T00:00:00.000000000',
             '1954-02-28T00:00:00.000000000', '1955-02-28T00:00:00.000000000',
             '1956-02-29T00:00:00.000000000', '1957-02-28T00:00:00.000000000',
             '1958-02-28T00:00:00.000000000', '1959-02-28T00:00:00.000000000',
             '1960-02-29T00:00:00.000000000', '1961-02-28T00:00:00.000000000',
             '1962-02-28T00:00:00.000000000', '1963-02-28T00:00:00.000000000',
             '1964-02-29T00:00:00.000000000', '1965-02-28T00:00:00.000000000',
             '1966-02-28T00:00:00.000000000', '1967-02-28T00:00:00.000000000',
             '1968-02-29T00:00:00.000000000', '1969-02-28T00:00:00.000000000',
             '1970-02-28T00:00:00.000000000', '1971-02-28T00:00:00.000000000',
             '1972-02-29T00:00:00.000000000', '1973-02-28T00:00:00.000000000',
             '1974-02-28T00:00:00.000000000', '1975-02-28T00:00:00.000000000',
             '1976-02-29T00:00:00.000000000', '1977-02-28T00:00:00.000000000',
             '1978-02-28T00:00:00.000000000', '1979-02-28T00:00:00.000000000',
             '1980-02-29T00:00:00.000000000', '1981-02-28T00:00:00.000000000',
             '1982-02-28T00:00:00.000000000', '1983-02-28T00:00:00.000000000',
             '1984-02-29T00:00:00.000000000', '1985-02-28T00:00:00.000000000',
             '1986-02-28T00:00:00.000000000', '1987-02-28T00:00:00.000000000',
             '1988-02-29T00:00:00.000000000', '1989-02-28T00:00:00.000000000',
             '1990-02-28T00:00:00.000000000', '1991-02-28T00:00:00.000000000',
             '1992-02-29T00:00:00.000000000', '1993-02-28T00:00:00.000000000',
             '1994-02-28T00:00:00.000000000', '1995-02-28T00:00:00.000000000',
             '1996-02-29T00:00:00.000000000', '1997-02-28T00:00:00.000000000',
             '1998-02-28T00:00:00.000000000', '1999-02-28T00:00:00.000000000',
             '2000-02-29T00:00:00.000000000', '2001-02-28T00:00:00.000000000',
             '2002-02-28T00:00:00.000000000', '2003-02-28T00:00:00.000000000',
             '2004-02-29T00:00:00.000000000', '2005-02-28T00:00:00.000000000',
             '2006-02-28T00:00:00.000000000', '2007-02-28T00:00:00.000000000',
             '2008-02-29T00:00:00.000000000', '2009-02-28T00:00:00.000000000',
             '2010-02-28T00:00:00.000000000', '2011-02-28T00:00:00.000000000',
             '2012-02-29T00:00:00.000000000', '2013-02-28T00:00:00.000000000',
             '2014-02-28T00:00:00.000000000', '2015-02-28T00:00:00.000000000',
             '2016-02-29T00:00:00.000000000', '2017-02-28T00:00:00.000000000',
             '2018-02-28T00:00:00.000000000', '2019-02-28T00:00:00.000000000',
             '2020-02-29T00:00:00.000000000'], dtype='datetime64[ns]')
    • lon
      (lon)
      float32
      -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
      array([-11., -10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,
               1.,   2.], dtype=float32)
    • lat
      (lat)
      float32
      60.0 59.0 58.0 ... 52.0 51.0 50.0
      array([60., 59., 58., 57., 56., 55., 54., 53., 52., 51., 50.], dtype=float32)
    • rr
      (time, lat, lon)
      float64
      nan nan nan nan ... nan nan 4.243
      array([[[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 9.83356658, 8.21107723, ..., 3.16702519,
               2.5759045 ,        nan],
              [       nan,        nan,        nan, ..., 3.64821065,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 2.88222815]],
      
             [[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 6.84924265, 5.05712206, ..., 2.98497796,
               2.8831402 ,        nan],
              [       nan,        nan,        nan, ..., 4.582142  ,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 2.52327854]],
      
             [[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 1.43787911, 0.64045753, ..., 0.50001099,
               0.47414158,        nan],
              [       nan,        nan,        nan, ..., 0.91728464,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 1.65432386]],
      
             ...,
      
             [[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 6.07403735, 2.99021503, ..., 1.05803059,
               1.26427248,        nan],
              [       nan,        nan,        nan, ..., 1.05892861,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 1.13395446]],
      
             [[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 8.53733901, 4.27626159, ..., 1.10982856,
               0.72946197,        nan],
              [       nan,        nan,        nan, ..., 1.46519147,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 1.3330352 ]],
      
             [[       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan,        nan],
              ...,
              [       nan, 7.82318034, 7.0790547 , ..., 2.82500975,
               1.98278063,        nan],
              [       nan,        nan,        nan, ..., 3.50348602,
                      nan,        nan],
              [       nan,        nan,        nan, ...,        nan,
                      nan, 4.24307919]]])
  • regrid_method :
    bilinear

And set the latlon dimension names back to their long name. This is so both SEAS5 and EOBS have the same latlon dimension names which is necessary when using the same mask.

[28]:
EOBS_upscaled = EOBS_upscaled.rename({'lon' : 'longitude', 'lat' : 'latitude'})

Illustrate the SEAS5 and EOBS masks for the UK

Here I plot the masked mean SEAS5 and upscaled EOBS precipitation. This shows that upscaled EOBS does not contain data for all gridcells within the UK mask (the difference between SEAS5 gridcells and EOBS gridcells with data). We can apply an additional mask for SEAS5 that masks the grid cells that do not contain data in EOBS.

[30]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})

SEAS5['tprate'].where(SEAS5_mask == 31).mean(
    dim=['time', 'leadtime', 'number']).plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[0])

EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])

for ax in axs.flat:
    ax.coastlines(resolution='10m')

axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
<matplotlib.collections.QuadMesh at 0x7fc1ea756c50>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
<matplotlib.collections.QuadMesh at 0x7fc1ea71e650>
[30]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea6d8f10>
[30]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea7b9590>
[30]:
Text(0.5, 1.0, 'SEAS5')
[30]:
Text(0.5, 1.0, 'EOBS')
../../_images/Notebooks_2.Preprocess_2.3Upscale_33_8.png

The additional mask of SEAS5 is where EOBS is not null:

[32]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})

(SEAS5['tprate']
 .where(SEAS5_mask == 31)
 .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull()) ## mask values that are nan in EOBS
 .mean(dim=['time', 'leadtime', 'number'])
 .plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[0])
)

EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])

for ax in axs.flat:
    ax.coastlines(resolution='10m')

axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[32]:
<matplotlib.collections.QuadMesh at 0x7fc1ea579590>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)
[32]:
<matplotlib.collections.QuadMesh at 0x7fc1ea56f750>
[32]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea4fe450>
[32]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea56ffd0>
[32]:
Text(0.5, 1.0, 'SEAS5')
[32]:
Text(0.5, 1.0, 'EOBS')
../../_images/Notebooks_2.Preprocess_2.3Upscale_35_8.png

Extract the spatial average

To select the UK average, we select SEAS5 precipitation (tprate), select the gridcells that are within the UK and take the area-weighted mean over those gridcells. This results in a dataset of February precipitation for 35 years (1981-2016), with 5 leadtimes and 25 ensemble members.

[38]:
SEAS5_UK_weighted = (SEAS5
                  .where(SEAS5_mask == 31)
                  .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull())
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
SEAS5_UK_weighted
[38]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • leadtime: 5
    • number: 25
    • time: 35
    • time
      (time)
      datetime64[ns]
      1982-02-01 ... 2016-02-01
      long_name :
      time
      array(['1982-02-01T00:00:00.000000000', '1983-02-01T00:00:00.000000000',
             '1984-02-01T00:00:00.000000000', '1985-02-01T00:00:00.000000000',
             '1986-02-01T00:00:00.000000000', '1987-02-01T00:00:00.000000000',
             '1988-02-01T00:00:00.000000000', '1989-02-01T00:00:00.000000000',
             '1990-02-01T00:00:00.000000000', '1991-02-01T00:00:00.000000000',
             '1992-02-01T00:00:00.000000000', '1993-02-01T00:00:00.000000000',
             '1994-02-01T00:00:00.000000000', '1995-02-01T00:00:00.000000000',
             '1996-02-01T00:00:00.000000000', '1997-02-01T00:00:00.000000000',
             '1998-02-01T00:00:00.000000000', '1999-02-01T00:00:00.000000000',
             '2000-02-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000',
             '2002-02-01T00:00:00.000000000', '2003-02-01T00:00:00.000000000',
             '2004-02-01T00:00:00.000000000', '2005-02-01T00:00:00.000000000',
             '2006-02-01T00:00:00.000000000', '2007-02-01T00:00:00.000000000',
             '2008-02-01T00:00:00.000000000', '2009-02-01T00:00:00.000000000',
             '2010-02-01T00:00:00.000000000', '2011-02-01T00:00:00.000000000',
             '2012-02-01T00:00:00.000000000', '2013-02-01T00:00:00.000000000',
             '2014-02-01T00:00:00.000000000', '2015-02-01T00:00:00.000000000',
             '2016-02-01T00:00:00.000000000'], dtype='datetime64[ns]')
    • number
      (number)
      int32
      0 1 2 3 4 5 6 ... 19 20 21 22 23 24
      long_name :
      ensemble_member
      array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24], dtype=int32)
    • leadtime
      (leadtime)
      int64
      2 3 4 5 6
      array([2, 3, 4, 5, 6])
    • tprate
      (leadtime, time, number)
      float64
      1.62 1.803 3.715 ... 1.681 1.769
      array([[[1.61983929, 1.80270562, 3.71546987, ..., 2.88294029,
               1.739468  , 2.38352979],
              [2.92980026, 1.76955459, 4.32214303, ..., 2.82505383,
               4.08514105, 2.35270195],
              [3.27281587, 3.64590937, 3.30555726, ..., 5.14189287,
               3.80868977, 2.52809885],
              ...,
              [2.989394  , 3.77111344, 4.6468335 , ..., 1.54306509,
               2.88290349, 1.51127271],
              [2.77336544, 3.56587255, 2.67943505, ..., 3.31771558,
               5.10737252, 3.54107268],
              [2.1260184 , 2.32783242, 3.5086297 , ..., 3.58126581,
               1.10084914, 3.4377672 ]],
      
             [[1.09491934, 1.53100393, 3.26492016, ..., 1.38953026,
               2.79466853, 0.85712489],
              [0.82039565, 2.81008751, 2.15275396, ..., 3.3407505 ,
               2.78649799, 4.40346567],
              [3.66650518, 3.36296082, 1.86208923, ..., 3.88302357,
               3.73840395, 3.05970022],
              ...,
              [2.44442749, 3.57619699, 3.30370749, ..., 2.36393416,
               3.78797196, 3.61051521],
              [1.85189896, 3.64994576, 2.75797531, ..., 3.54899347,
               1.96163795, 3.92756756],
              [1.69035158, 1.90886006, 2.81342074, ..., 3.43975463,
               3.08070923, 1.71887629]],
      
             [[1.82732135, 2.35700255, 1.33907771, ..., 1.75420451,
               1.73882812, 2.28699077],
              [2.40939677, 3.5025749 , 2.91052599, ..., 3.20174607,
               2.67736291, 3.16030153],
              [4.27967074, 2.39810642, 3.68449437, ..., 1.75735736,
               1.74492558, 4.47295371],
              ...,
              [2.73664435, 2.66972478, 1.7240791 , ..., 2.90310921,
               3.05444721, 3.45214616],
              [2.05969104, 1.85866855, 2.63843   , ..., 2.55877674,
               3.28343286, 2.6359866 ],
              [3.36070822, 3.69223251, 1.79020433, ..., 3.07561624,
               3.00743179, 2.03883137]],
      
             [[2.77891452, 3.58542727, 1.01360512, ..., 1.12565481,
               2.77999248, 1.98110613],
              [3.83915299, 1.66513433, 3.15175385, ..., 3.43047301,
               2.81870343, 3.69259162],
              [3.09151276, 2.05154796, 3.72162802, ..., 0.82186029,
               3.91416119, 3.56813021],
              ...,
              [2.96839925, 5.80239685, 1.63705973, ..., 1.95204278,
               1.81608721, 2.1571341 ],
              [4.20770424, 3.65632397, 2.84265034, ..., 1.68172334,
               0.55870573, 2.63901647],
              [1.29483006, 3.08373202, 2.0073062 , ..., 2.43205439,
               4.68513471, 1.89181443]],
      
             [[3.10352949, 3.18728323, 2.56303604, ..., 4.17817963,
               1.23774859, 1.28303082],
              [1.98921652, 3.28544606, 2.2746998 , ..., 2.39227487,
               4.26844875, 3.25052759],
              [4.10408859, 1.99422737, 1.67128341, ..., 4.60826197,
               1.73396222, 4.24098449],
              ...,
              [1.74079919, 3.03078117, 2.43933043, ..., 1.16495896,
               2.24788933, 2.25296764],
              [4.72132559, 4.58284629, 2.2242827 , ..., 4.47087103,
               1.12834648, 2.50279851],
              [3.01664848, 2.44682909, 2.00389092, ..., 3.64223307,
               1.68083814, 1.76908221]]])
[40]:
EOBS_UK_weighted = (EOBS_upscaled
                  .where(SEAS5_mask == 31) ## EOBS is now on the SEAS5 grid, so use the SEAS5 mask and gridcell area
                  .weighted(Gridarea_SEAS5['cell_area'])
                  .mean(dim=['latitude', 'longitude'])
                 )
EOBS_UK_weighted
EOBS_UK_weighted['rr'].plot()
[40]:
Show/Hide data repr Show/Hide attributes
xarray.Dataset
    • time: 71
    • time
      (time)
      datetime64[ns]
      1950-02-28 ... 2020-02-29
      array(['1950-02-28T00:00:00.000000000', '1951-02-28T00:00:00.000000000',
             '1952-02-29T00:00:00.000000000', '1953-02-28T00:00:00.000000000',
             '1954-02-28T00:00:00.000000000', '1955-02-28T00:00:00.000000000',
             '1956-02-29T00:00:00.000000000', '1957-02-28T00:00:00.000000000',
             '1958-02-28T00:00:00.000000000', '1959-02-28T00:00:00.000000000',
             '1960-02-29T00:00:00.000000000', '1961-02-28T00:00:00.000000000',
             '1962-02-28T00:00:00.000000000', '1963-02-28T00:00:00.000000000',
             '1964-02-29T00:00:00.000000000', '1965-02-28T00:00:00.000000000',
             '1966-02-28T00:00:00.000000000', '1967-02-28T00:00:00.000000000',
             '1968-02-29T00:00:00.000000000', '1969-02-28T00:00:00.000000000',
             '1970-02-28T00:00:00.000000000', '1971-02-28T00:00:00.000000000',
             '1972-02-29T00:00:00.000000000', '1973-02-28T00:00:00.000000000',
             '1974-02-28T00:00:00.000000000', '1975-02-28T00:00:00.000000000',
             '1976-02-29T00:00:00.000000000', '1977-02-28T00:00:00.000000000',
             '1978-02-28T00:00:00.000000000', '1979-02-28T00:00:00.000000000',
             '1980-02-29T00:00:00.000000000', '1981-02-28T00:00:00.000000000',
             '1982-02-28T00:00:00.000000000', '1983-02-28T00:00:00.000000000',
             '1984-02-29T00:00:00.000000000', '1985-02-28T00:00:00.000000000',
             '1986-02-28T00:00:00.000000000', '1987-02-28T00:00:00.000000000',
             '1988-02-29T00:00:00.000000000', '1989-02-28T00:00:00.000000000',
             '1990-02-28T00:00:00.000000000', '1991-02-28T00:00:00.000000000',
             '1992-02-29T00:00:00.000000000', '1993-02-28T00:00:00.000000000',
             '1994-02-28T00:00:00.000000000', '1995-02-28T00:00:00.000000000',
             '1996-02-29T00:00:00.000000000', '1997-02-28T00:00:00.000000000',
             '1998-02-28T00:00:00.000000000', '1999-02-28T00:00:00.000000000',
             '2000-02-29T00:00:00.000000000', '2001-02-28T00:00:00.000000000',
             '2002-02-28T00:00:00.000000000', '2003-02-28T00:00:00.000000000',
             '2004-02-29T00:00:00.000000000', '2005-02-28T00:00:00.000000000',
             '2006-02-28T00:00:00.000000000', '2007-02-28T00:00:00.000000000',
             '2008-02-29T00:00:00.000000000', '2009-02-28T00:00:00.000000000',
             '2010-02-28T00:00:00.000000000', '2011-02-28T00:00:00.000000000',
             '2012-02-29T00:00:00.000000000', '2013-02-28T00:00:00.000000000',
             '2014-02-28T00:00:00.000000000', '2015-02-28T00:00:00.000000000',
             '2016-02-29T00:00:00.000000000', '2017-02-28T00:00:00.000000000',
             '2018-02-28T00:00:00.000000000', '2019-02-28T00:00:00.000000000',
             '2020-02-29T00:00:00.000000000'], dtype='datetime64[ns]')
    • rr
      (time)
      float64
      4.127 3.251 1.072 ... 1.782 4.92
      array([4.12725776, 3.25073545, 1.07154934, 1.59250362, 2.59011679,
             2.19460832, 0.86553777, 2.71508494, 3.63875078, 0.43748921,
             2.44475198, 2.85323568, 1.90375894, 0.93073633, 0.96014483,
             0.54936363, 3.93806659, 3.41126225, 1.52403826, 2.14680588,
             3.24367008, 1.61173947, 2.49205547, 1.86097057, 3.44753495,
             1.09562791, 1.72599619, 4.25387794, 2.67016706, 1.75307168,
             2.88443313, 2.00213592, 2.06658247, 1.35684105, 2.11976754,
             1.09154394, 0.42823908, 1.98726177, 2.62084822, 3.95611062,
             6.18364117, 2.09273353, 2.32127359, 0.71821086, 2.72255429,
             4.36020144, 2.92272651, 5.12799133, 1.95292637, 2.49697346,
             4.02321657, 3.13585905, 5.53924289, 1.39434155, 1.90494815,
             1.91951746, 1.85577634, 3.60145376, 2.21509453, 1.73046396,
             2.21306353, 3.38992427, 1.39408258, 1.73443926, 5.42245947,
             2.37410915, 3.24500002, 2.77368044, 1.49827506, 1.78169717,
             4.91976886])
[40]:
[<matplotlib.lines.Line2D at 0x7fc1ea2f4850>]
../../_images/Notebooks_2.Preprocess_2.3Upscale_39_2.png

Illustrate the SEAS5 and EOBS UK average

And the area-weighted average UK precipitation for SEAS5 and EOBS I plot here. For SEAS5 I plot the range, both min/max and the 2.5/97.5 % percentile of all ensemble members and leadtimes for each year.

[43]:
ax = plt.axes()

Quantiles = (SEAS5_UK_weighted['tprate']
             .quantile([0,2.5/100, 0.5, 97.5/100,1],
                       dim=['number','leadtime']
                      )
            )
ax.plot(Quantiles.time, Quantiles.sel(quantile=0.5),
        color='orange',
        label = 'SEAS5 median')
ax.fill_between(Quantiles.time.values, Quantiles.sel(quantile=0.025), Quantiles.sel(quantile=0.975),
                color='orange',
                alpha=0.2,
                label = '95% / min max')
ax.fill_between(Quantiles.time.values, Quantiles.sel(quantile=0), Quantiles.sel(quantile=1),
                color='orange',
                alpha=0.2)

EOBS_UK_weighted['rr'].plot(ax=ax,
                            x='time',
                            label = 'E-OBS')
plt.legend(loc = 'lower left',
           ncol=2 ) #loc = (0.1, 0) upper left
[43]:
[<matplotlib.lines.Line2D at 0x7fc1ea2cef10>]
[43]:
<matplotlib.collections.PolyCollection at 0x7fc1ea39f250>
[43]:
<matplotlib.collections.PolyCollection at 0x7fc1ea77fed0>
[43]:
[<matplotlib.lines.Line2D at 0x7fc1ea449310>]
[43]:
<matplotlib.legend.Legend at 0x7fc1ea56fc90>
../../_images/Notebooks_2.Preprocess_2.3Upscale_41_5.png

And save the UK weighted average datasets

[45]:
SEAS5_UK_weighted.to_netcdf('Data/SEAS5_UK_weighted_masked.nc')
SEAS5_UK_weighted.to_dataframe().to_csv('Data/SEAS5_UK_weighted_masked.csv')
EOBS_UK_weighted.to_netcdf('Data/EOBS_UK_weighted_upscaled.nc') ## save as netcdf
EOBS_UK_weighted.to_dataframe().to_csv('Data/EOBS_UK_weighted_upscaled.csv') ## and save as csv.
[46]:
SEAS5_UK_weighted.close()
EOBS_UK_weighted.close()

Other methods

There are many different sources and methods available for extracting areal-averages from shapefiles. Here I have used shapely / masking in xarray. Something that lacks with this method is the weighted extraction from a shapefile, that is more precise on the boundaries. In R, raster:extract can use the percentage of the area that falls within the country for each grid cell to use as weight in averaging. For more information on this method, see the EGU 2018 course. For SEAS5, with its coarse resolution, this might make a difference. However, for it’s speed and reproducibility, we have chosen to stick to xarray.

We have used xarray where you can apply weights yourself to a dataset and then calculate the weighted mean. Sources I have used: * xarray weighted reductions * Matteo’s blog * regionmask package * Arctic weighted average example * area weighted temperature example.

And this pretty awesome colab notebook on seasonal forecasting regrids seasonal forecasts and reanalysis on the same grid before calculating skill scores.