Using EOBS + upscaling¶
Here we explore how best to extract area-averaged precipitation, tested on UK precipitation in SEAS5 and EOBS. The code is inspired by Matteo De Felice’s blog – credits to him!
We create a mask for all 241 countries with regionmask, which provides predefined countries from Natural Earth datasets (shapefiles). We use the mask to go from gridded precipitation to country-averaged time series. We regrid EOBS to the SEAS5 grid so that we select the same grid cells when calculating the UK average for both datasets. The country outline is not perfect, but the masks are identical for both datasets, so the comparison is fair.
I use the xesmf package for upscaling; a good example can be found in this notebook.
Import packages¶
We need the package regionmask for masking and xesmf for regridding. I cannot install xesmf into the UNSEEN-open environment without breaking it, so in this notebook I use a separate ‘upscale’ environment, as suggested by this issue. It contains the packages esmpy=7.1.0 xesmf=0.2.1 regionmask cartopy matplotlib xarray numpy netcdf4.
[1]:
##This is so variables get printed within jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
[2]:
##import packages
import os
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
import matplotlib.ticker as mticker
import regionmask # Masking
import xesmf as xe # Regridding
[3]:
##We want the working directory to be the UNSEEN-open directory
pwd = os.getcwd() ##current working directory is UNSEEN-open/Notebooks/1.Download
pwd #print the present working directory
os.chdir(pwd+'/../../') # Change the working directory to UNSEEN-open
os.getcwd() #print the working directory
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open/Notebooks/1.Download'
[3]:
'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'
Load SEAS5 and EOBS¶
From CDS, we retrieve SEAS5 and merge the retrieved files in preprocessing. We create a netcdf file containing the dimensions lat, lon, time (35 years), number (25 ensemble members) and leadtime (5 initialization months).
[4]:
SEAS5 = xr.open_dataset('../UK_example/SEAS5/SEAS5.nc')
SEAS5
[4]:
&lt;xarray.Dataset&gt;
Dimensions:    (latitude: 11, leadtime: 5, longitude: 14, number: 25, time: 35)
Coordinates:
  * longitude  (longitude) float32 -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
  * time       (time) datetime64[ns] 1982-02-01 ... 2016-02-01
  * latitude   (latitude) float32 60.0 59.0 58.0 ... 52.0 51.0 50.0
  * number     (number) int32 0 1 2 3 4 5 6 ... 19 20 21 22 23 24
  * leadtime   (leadtime) int64 2 3 4 5 6
Data variables:
    tprate     (leadtime, time, number, latitude, longitude) float32 ...
                 long_name: rainfall, units: mm/day
Attributes:
    Conventions:  CF-1.6
    history:      2020-05-13 14:49:43 GMT by grib_to_netcdf-2.16.0
Then we load the EOBS netCDF containing only February precipitation: 71 values, one for each year within 1950 - 2020, over the European domain (25N-75N x 40W-75E).
[5]:
EOBS = xr.open_dataset('../UK_example/EOBS/EOBS.nc')
EOBS
[5]:
&lt;xarray.Dataset&gt;
Dimensions:    (latitude: 201, longitude: 464, time: 71)
Coordinates:
  * longitude  (longitude) float64 -40.38 -40.12 ... 75.12 75.38
  * latitude   (latitude) float64 25.38 25.62 25.88 ... 75.12 75.38
  * time       (time) datetime64[ns] 1950-02-28 ... 2020-02-29
Data variables:
    rr         (time, latitude, longitude) float32 ...
                 long_name: rainfall, units: mm/day
Masking¶
Here we load the countries and create a mask for SEAS5 and for EOBS.
Regionmask has predefined countries from Natural Earth datasets (shapefiles).
[7]:
countries = regionmask.defined_regions.natural_earth.countries_50
countries
[7]:
241 'Natural Earth Countries: 50m' Regions (http://www.naturalearthdata.com)
ZW ZM YE VN VE V VU UZ UY ... (241 country abbreviations, including GB for the United Kingdom) ... AND DZ AL AF SG AQ SX
Now we create the mask for the SEAS5 grid. Only one timestep is needed to create the mask. The same mask will later be applied to all timesteps.
[8]:
SEAS5_mask = countries.mask(SEAS5.sel(leadtime=2, number=0, time='1982'),
lon_name='longitude',
lat_name='latitude')
And we create a plot to illustrate what the mask looks like. The mask simply indicates, for each grid cell, which country that grid cell belongs to.
[9]:
SEAS5_mask
SEAS5_mask.plot()
[9]:
&lt;xarray.DataArray (latitude: 11, longitude: 14)&gt;
array of country indices: NaN over sea, 31.0 over UK grid cells, and the
indices of neighbouring countries (e.g. 140.0, 160.0) elsewhere
Coordinates:
  * latitude   (latitude) float32 60.0 59.0 58.0 ... 52.0 51.0 50.0
  * longitude  (longitude) float32 -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
[9]:
<matplotlib.collections.QuadMesh at 0x7f877194e8d0>
And now we can extract the UK-averaged precipitation within SEAS5 by using the mask index of the UK: where(SEAS5_mask == UK_index). So we need to find the index of the UK among the 241 abbreviations; for the UK this is ‘GB’. If you can’t find a country by its abbreviation, use countries.regions to look up the full country names.
[10]:
countries.abbrevs.index('GB')
[10]:
31
To select the UK average, we select SEAS5 precipitation (tprate), select the grid cells that are within the UK and take the mean over those grid cells. This results in a dataset of February precipitation for 35 years (1982-2016), with 5 lead times and 25 ensemble members.
[11]:
SEAS5_UK = (SEAS5['tprate']
.where(SEAS5_mask == 31)
.mean(dim=['latitude', 'longitude']))
SEAS5_UK
[11]:
&lt;xarray.DataArray 'tprate' (leadtime: 5, time: 35, number: 25)&gt;
array([1.7730116, 1.9548205, 3.7803986, ..., 3.662669, 1.6688719, 1.9709551],
      dtype=float32)
Coordinates:
  * time      (time) datetime64[ns] 1982-02-01 ... 2016-02-01
  * number    (number) int32 0 1 2 3 4 5 6 ... 19 20 21 22 23 24
  * leadtime  (leadtime) int64 2 3 4 5 6
However, xarray does not take the area of the grid cells into account when computing the average. Therefore, we have to calculate the area-weighted mean of the grid cells. To calculate the area of each grid cell, I use CDO: cdo gridarea infile outfile. Here I load the generated file:
[12]:
Gridarea_SEAS5 = xr.open_dataset('../UK_example/Gridarea_SEAS5.nc')
Gridarea_SEAS5['cell_area'].plot()
[12]:
<matplotlib.collections.QuadMesh at 0x7f8771a14bd0>
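If CDO is not at hand, the cell areas of a regular lat-lon grid can also be approximated directly in Python, since the area of a cell scales with the cosine of its latitude. A minimal sketch; the helper `approx_cell_area` and the ~111.2 km-per-degree factor are illustrative assumptions, not part of the notebook's workflow:

```python
import numpy as np

def approx_cell_area(lats, lons, dlat=1.0, dlon=1.0):
    """Approximate cell areas (km^2) of a regular lat-lon grid.

    A cell spans dlat x dlon degrees; one degree is roughly 111.2 km
    along a meridian and 111.2 * cos(lat) km along a parallel.
    """
    km_per_deg = 111.2  # rough spherical-Earth value
    coslat = np.cos(np.radians(np.asarray(lats, dtype=float)))[:, np.newaxis]
    area = (dlat * km_per_deg) * (dlon * km_per_deg) * coslat  # shape (n_lat, 1)
    return np.broadcast_to(area, (len(lats), len(lons))).copy()

# The SEAS5 UK domain used above: 60N-50N, 11W-2E at 1 degree resolution
areas = approx_cell_area(np.arange(60, 49, -1), np.arange(-11, 3))
areas.shape  # (11, 14); the northern rows get smaller areas
```

For real analyses the CDO-generated `cell_area` is preferable, since it accounts for the exact grid description.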
[13]:
SEAS5_UK_weighted = (SEAS5['tprate']
.where(SEAS5_mask == 31)
.weighted(Gridarea_SEAS5['cell_area'])
.mean(dim=['latitude', 'longitude'])
)
SEAS5_UK_weighted
[13]:
&lt;xarray.DataArray 'tprate' (leadtime: 5, time: 35, number: 25)&gt;
array([1.747, 1.916, 3.742, 2.909, 4.562, 3.113, ..., 4.034, 1.957, 3.664, 1.684, 1.929])
Coordinates:
  * time      (time) datetime64[ns] 1982-02-01 ... 2016-02-01
  * number    (number) int32 0 1 2 3 4 5 6 ... 19 20 21 22 23 24
  * leadtime  (leadtime) int64 2 3 4 5 6
What is the difference between the weighted and non-weighted average? I plot the UK average for ensemble member 0 and lead time 2:
[14]:
SEAS5_UK.sel(leadtime=2,number=0).plot()
SEAS5_UK_weighted.sel(leadtime=2,number=0).plot()
[14]:
[<matplotlib.lines.Line2D at 0x7f87718f8b50>]
[14]:
[<matplotlib.lines.Line2D at 0x7f87718e3ad0>]
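The gap between the two curves comes from latitude weighting: northern grid cells cover less area and should contribute less to the domain mean. A toy numpy illustration of the principle (the grid rows and precipitation values are made up):

```python
import numpy as np

lats = np.array([60., 55., 50.])      # three grid rows, northernmost first
precip = np.array([4.0, 2.0, 1.0])    # hypothetical row-mean precipitation (mm/day)

plain = precip.mean()                           # every row counts equally
weights = np.cos(np.radians(lats))              # cell area scales with cos(latitude)
weighted = np.average(precip, weights=weights)  # area-weighted mean

# The wet northern row is down-weighted, so the weighted mean is lower
print(plain, weighted)
```

This is exactly what `.weighted(Gridarea_SEAS5['cell_area']).mean(...)` does over the two spatial dimensions.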
Upscale¶
For EOBS, we want to upscale the dataset to the SEAS5 grid. We use xe.Regridder(ds_in, ds_out, method), see the docs. We have to rename the lat/lon dimensions so the function can read them.
We use bilinear interpolation first (i.e. method = ‘bilinear’) because of its ease of implementation. However, the conservative areal average (method = ‘conservative’) is preferred for upscaling (Kopparla, 2013).
[8]:
regridder = xe.Regridder(EOBS.rename({'longitude': 'lon', 'latitude': 'lat'}),
                         SEAS5.rename({'longitude': 'lon', 'latitude': 'lat'}),
                         'bilinear')
Overwrite existing file: bilinear_201x464_11x14.nc
You can set reuse_weights=True to save computing time.
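For the preferred conservative method, xesmf additionally needs the cell corner coordinates (lat_b and lon_b) on both grids. A sketch of how such bounds could be constructed for these regular grids; `cell_bounds` is a hypothetical helper, and the Regridder call is commented out because it needs the full datasets loaded above:

```python
import numpy as np

def cell_bounds(centers):
    """Corner coordinates of a regular 1-D grid, given its cell centers."""
    centers = np.asarray(centers, dtype=float)
    step = centers[1] - centers[0]
    # n centers -> n + 1 corners, shifted half a grid step either way
    return np.append(centers - step / 2, centers[-1] + step / 2)

lon_b = cell_bounds(np.arange(-11., 3.))       # 15 corners for 14 SEAS5 longitudes
lat_b = cell_bounds(np.arange(60., 49., -1.))  # 12 corners for 11 SEAS5 latitudes

# With 'lat_b'/'lon_b' attached to both input datasets, the conservative
# method is requested like the bilinear one:
# regridder = xe.Regridder(EOBS_with_bounds, SEAS5_with_bounds, 'conservative')
```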
Now that we have the regridder, we can apply the regridder to our EOBS dataarray:
[27]:
EOBS_upscaled = regridder(EOBS)
EOBS_upscaled
using dimensions ('latitude', 'longitude') from data variable rr as the horizontal dimensions for this dataset.
[27]:
&lt;xarray.Dataset&gt;
Dimensions:  (lat: 11, lon: 14, time: 71)
Coordinates:
  * time     (time) datetime64[ns] 1950-02-28 ... 2020-02-29
  * lon      (lon) float32 -11.0 -10.0 -9.0 ... 0.0 1.0 2.0
  * lat      (lat) float32 60.0 59.0 58.0 ... 52.0 51.0 50.0
Data variables:
    rr       (time, lat, lon) float64 nan nan nan nan ... nan nan 4.243
Attributes:
    regrid_method:  bilinear
And we set the lat/lon dimension names back to their long names, so that SEAS5 and EOBS share the same dimension names, which is necessary for applying the same mask to both.
[28]:
EOBS_upscaled = EOBS_upscaled.rename({'lon' : 'longitude', 'lat' : 'latitude'})
Illustrate the SEAS5 and EOBS masks for the UK¶
Here I plot the masked mean SEAS5 and upscaled EOBS precipitation. This shows that upscaled EOBS does not contain data for all grid cells within the UK mask (compare the SEAS5 grid cells with the EOBS grid cells that have data). We can therefore apply an additional mask to SEAS5 that removes the grid cells without data in EOBS.
[30]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})
SEAS5['tprate'].where(SEAS5_mask == 31).mean(
    dim=['time', 'leadtime', 'number']).plot(
        transform=ccrs.PlateCarree(),
        vmin=0,
        vmax=8,
        cmap=plt.cm.Blues,
        ax=axs[0])
EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])
for ax in axs.flat:
    ax.coastlines(resolution='10m')
axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
<matplotlib.collections.QuadMesh at 0x7fc1ea756c50>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
[30]:
<matplotlib.collections.QuadMesh at 0x7fc1ea71e650>
[30]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea6d8f10>
[30]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea7b9590>
[30]:
Text(0.5, 1.0, 'SEAS5')
[30]:
Text(0.5, 1.0, 'EOBS')
The additional SEAS5 mask keeps only the grid cells where EOBS is not null:
[32]:
fig, axs = plt.subplots(1, 2, subplot_kw={'projection': ccrs.OSGB()})
(SEAS5['tprate']
 .where(SEAS5_mask == 31)
 .where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull())  ## mask values that are nan in EOBS
 .mean(dim=['time', 'leadtime', 'number'])
 .plot(
     transform=ccrs.PlateCarree(),
     vmin=0,
     vmax=8,
     cmap=plt.cm.Blues,
     ax=axs[0])
)
EOBS_upscaled['rr'].where(SEAS5_mask == 31).mean(dim='time').plot(
    transform=ccrs.PlateCarree(),
    vmin=0,
    vmax=8,
    cmap=plt.cm.Blues,
    ax=axs[1])
for ax in axs.flat:
    ax.coastlines(resolution='10m')
axs[0].set_title('SEAS5')
axs[1].set_title('EOBS')
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
[32]:
<matplotlib.collections.QuadMesh at 0x7fc1ea579590>
/soge-home/users/cenv0732/.conda/envs/upscale/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
[32]:
<matplotlib.collections.QuadMesh at 0x7fc1ea56f750>
[32]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea4fe450>
[32]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fc1ea56ffd0>
[32]:
Text(0.5, 1.0, 'SEAS5')
[32]:
Text(0.5, 1.0, 'EOBS')
Extract the spatial average¶
To select the UK average, we select SEAS5 precipitation (tprate), select the grid cells that are within the UK and take the area-weighted mean over those grid cells. This results in a dataset of February precipitation for 35 years (1982-2016), with 5 lead times and 25 ensemble members.
[38]:
SEAS5_UK_weighted = (SEAS5
.where(SEAS5_mask == 31)
.where(EOBS_upscaled['rr'].sel(time='1950').squeeze('time').notnull())
.weighted(Gridarea_SEAS5['cell_area'])
.mean(dim=['latitude', 'longitude'])
)
SEAS5_UK_weighted
[38]:
&lt;xarray.Dataset&gt;
Dimensions:   (leadtime: 5, number: 25, time: 35)
Coordinates:
  * time      (time) datetime64[ns] 1982-02-01 ... 2016-02-01
  * number    (number) int32 0 1 2 3 4 5 6 ... 19 20 21 22 23 24
  * leadtime  (leadtime) int64 2 3 4 5 6
Data variables:
    tprate    (leadtime, time, number) float64 1.62 1.803 3.715 ... 1.681 1.769
[40]:
EOBS_UK_weighted = (EOBS_upscaled
    .where(SEAS5_mask == 31)  ## EOBS is now on the SEAS5 grid, so use the SEAS5 mask and gridcell area
    .weighted(Gridarea_SEAS5['cell_area'])
    .mean(dim=['latitude', 'longitude'])
)
EOBS_UK_weighted
EOBS_UK_weighted['rr'].plot()
[40]:
- time: 71
- time (time) datetime64[ns] 1950-02-28 ... 2020-02-29
- rr (time) float64 4.127 3.251 1.072 ... 1.782 4.92
(full array output truncated)
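The masked, area-weighted mean computed above reduces to a simple weighted sum over the cells inside the mask. A minimal NumPy sketch with made-up data (the mask value 31 mimics the UK's regionmask index, which is an assumption here):

```python
import numpy as np

# Toy 3x3 grid: precipitation, a country mask and grid-cell areas (all made up)
precip = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
mask = np.array([[31, 31, 0],
                 [31,  0, 0],
                 [ 0,  0, 0]])       # 31 = cells assigned to the country
area = np.array([[1.0, 1.0, 1.0],
                 [2.0, 2.0, 2.0],
                 [3.0, 3.0, 3.0]])   # cell area varies with latitude

# Equivalent of .where(mask == 31).weighted(area).mean(...)
inside = mask == 31
weighted_mean = (precip[inside] * area[inside]).sum() / area[inside].sum()
print(weighted_mean)  # (1*1 + 2*1 + 4*2) / (1 + 1 + 2) = 2.75
```

Because E-OBS has been regridded to the SEAS5 grid, the same `inside` mask and `area` weights apply to both datasets, which is what makes the comparison fair.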
Illustrate the SEAS5 and EOBS UK average¶
Here I plot the area-weighted average UK precipitation for SEAS5 and E-OBS. For SEAS5, I plot both the min/max range and the 2.5/97.5 % percentile range across all ensemble members and lead times for each year.
[43]:
ax = plt.axes()
Quantiles = (SEAS5_UK_weighted['tprate']
    .quantile([0, 2.5/100, 0.5, 97.5/100, 1],
              dim=['number', 'leadtime'])
)
ax.plot(Quantiles.time, Quantiles.sel(quantile=0.5),
        color='orange',
        label='SEAS5 median')
ax.fill_between(Quantiles.time.values,
                Quantiles.sel(quantile=0.025), Quantiles.sel(quantile=0.975),
                color='orange',
                alpha=0.2,
                label='95% / min max')
ax.fill_between(Quantiles.time.values,
                Quantiles.sel(quantile=0), Quantiles.sel(quantile=1),
                color='orange',
                alpha=0.2)
EOBS_UK_weighted['rr'].plot(ax=ax,
                            x='time',
                            label='E-OBS')
plt.legend(loc='lower left',
           ncol=2)  # loc = (0.1, 0) upper left
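The quantile bands plotted above can be reproduced on plain arrays. A minimal NumPy sketch with random data standing in for the ensemble (the dimension sizes match the SEAS5 setup, but the values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake ensemble: 35 years x 25 members x 5 lead times
tprate = rng.gamma(shape=2.0, scale=1.5, size=(35, 25, 5))

# Collapse the member and lead-time dimensions, like
# .quantile([...], dim=['number', 'leadtime']) in xarray
quantiles = np.quantile(tprate, [0, 0.025, 0.5, 0.975, 1], axis=(1, 2))
lower, median, upper = quantiles[1], quantiles[2], quantiles[3]
print(quantiles.shape)  # (5, 35): one value per quantile per year
```

Pooling members and lead times gives 125 samples per year, which is what makes the 2.5/97.5 % band meaningful.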
Save the UK weighted average datasets¶
[45]:
SEAS5_UK_weighted.to_netcdf('Data/SEAS5_UK_weighted_masked.nc')
SEAS5_UK_weighted.to_dataframe().to_csv('Data/SEAS5_UK_weighted_masked.csv')
EOBS_UK_weighted.to_netcdf('Data/EOBS_UK_weighted_upscaled.nc') ## save as netcdf
EOBS_UK_weighted.to_dataframe().to_csv('Data/EOBS_UK_weighted_upscaled.csv') ## and save as csv.
[46]:
SEAS5_UK_weighted.close()
EOBS_UK_weighted.close()
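The csv copies can be read back without xarray. A minimal sketch using only the standard library (the column names mirror the export above, but the values here are made up):

```python
import csv
import io

# Stand-in for the exported EOBS_UK_weighted csv (hypothetical content)
csv_text = "time,rr\n1950-02-28,4.127\n1951-02-28,3.251\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
values = [float(row['rr']) for row in rows]
print(len(values), values[0])  # 2 4.127
```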
Other methods¶
There are many different sources and methods available for extracting areal averages from shapefiles. Here I have used shapely-based masking in xarray. One limitation of this method is that it does not support weighted extraction from a shapefile, which is more precise at the boundaries. In R, raster::extract can weight each grid cell by the percentage of its area that falls within the country when averaging. For more information on this method, see the EGU 2018 course. For SEAS5, with its coarse resolution, this might make a difference. However, for its speed and reproducibility, we have chosen to stick with xarray.
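The fractional weighting that raster::extract performs can be illustrated on a toy one-dimensional "country". This is a deliberate simplification (real country outlines require polygon clipping, e.g. with shapely), and all numbers are made up:

```python
import numpy as np

# 1-degree grid cells with edges at integer longitudes: [0,1), [1,2), [2,3), [3,4)
lon_edges = np.arange(0.0, 5.0)
precip = np.array([2.0, 4.0, 6.0, 8.0])

# Toy "country": the interval [0.5, 2.5) in longitude
lo, hi = 0.5, 2.5

# Fraction of each unit-width cell covered by the country
left = np.maximum(lon_edges[:-1], lo)
right = np.minimum(lon_edges[1:], hi)
frac = np.clip(right - left, 0.0, None)
print(frac)  # [0.5 1.  0.5 0. ]

weighted_mean = (precip * frac).sum() / frac.sum()
print(weighted_mean)  # (1 + 4 + 3) / 2 = 4.0
```

A plain mask would either include or exclude each boundary cell entirely; the fractional weights instead down-weight partially covered cells, which matters most on a coarse grid like SEAS5.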
We have used xarray, where you can apply weights to a dataset yourself and then calculate the weighted mean. Sources I have used:

- xarray weighted reductions
- Matteo's blog
- regionmask package
- Arctic weighted average example
- area weighted temperature example
And this pretty awesome colab notebook on seasonal forecasting regrids seasonal forecasts and reanalysis onto the same grid before calculating skill scores.