API Reference

climpact is both a freva plugin and a stand-alone library that can be used to process climate data, selecting regions defined by coordinates stored in geo-reference data such as shapefiles or GeoJSON files.

Installation

The best way to install the library is to clone the repository and create a conda environment:

git clone https://gitlab.dkrz.de/ch1187/plugins4freva/climpact.git
cd climpact
conda env create -f conda-env.yml -n climpact

These commands create a fresh conda environment named climpact. After activating the environment

conda activate climpact

you can make use of the stand-alone library.
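For example, a minimal session could look like this (a sketch; the input file ~/tas.nc, the variable name tas, and the shape file are placeholders):

from climpact import RunDirectory
# Read the variable "tas" from a single file and mask it with a shape file
rd = RunDirectory(["~/tas.nc"], ["tas"], shape_file="Germany.shp")
print(rd.variables)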

class climpact.Averager(shapefile: str, region: str, mask: Tuple[str, str] | None = None)[source]

Bases: object

Collection of methods to create averages.

Attributes:
georefdata

Read the geo reference data.

Methods

fldmean(data[, dims, land_frac, skipna])

CDO equivalent of field mean.

get_extent(dset[, shape_data])

Calculate the lon/lat bounds of the valid data within a dataset.

get_mask(dset[, method, shape_data])

Create a mask region from the shape file information.

get_native_shape_data(region)

Read the given shape-file.

seasonal_mean

static fldmean(data, dims=('lat', 'lon'), land_frac=None, skipna=True)[source]

CDO equivalent of field mean.

Parameters:
data: xr.DataArray, xr.Dataset

input data

dims: tuple, default: (lat, lon)

names of the geographical dimensions

land_frac: xr.DataArray, dask.array, numpy.array, default: None

mask that is applied to the data

skipna: bool, default: True

skip NaN values

Returns:
xr.DataArray, xr.Dataset: field mean
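A short usage sketch (the input file is a placeholder; since fldmean is a static method it can be called on the class directly):

import xarray as xr
from climpact import Averager
dset = xr.open_dataset("~/tas.nc")  # placeholder input file
# Field mean over the geographical dimensions, skipping NaN values
ts = Averager.fldmean(dset["tas"], dims=("lat", "lon"), skipna=True)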

property georefdata

Read the geo reference data.

get_extent(dset: DataArray, shape_data: GeoDataFrame | None = None) Dict[str, slice][source]

Calculate the lon/lat bounds of the valid data within a dataset.
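For example (a sketch; the shape file is a placeholder, and it is assumed that the returned keys match the dataset's dimension names):

avg = Averager("Germany.shp", "")     # placeholder shape file, whole geometry
extent = avg.get_extent(dset["tas"])  # dict of lon/lat slices
cropped = dset.sel(**extent)          # crop to the valid-data bounding box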

get_mask(dset: DataArray, method: str = 'centres', shape_data: GeoDataFrame | None = None) DataArray[source]

Create a mask region from the shape file information.
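A short sketch (avg as above; applying the mask via where is an assumption):

mask = avg.get_mask(dset["tas"], method="corners")  # or the default "centres"
masked = dset["tas"].where(mask)                    # keep only grid boxes inside the region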

get_native_shape_data(region)[source]

Read the given shape-file.

static seasonal_mean(ds: Dataset, skipna: bool = False) Dataset[source]

class climpact.DataContainer(*datasets: tuple[Dataset, ...])[source]

Bases: object

Class that holds information about processed data.

Attributes:
container

Overview of the content of this data container.

variables

Returns a list of all variables within the data container.

Methods

convert(variable, unit)

Convert all variables in the data container to a desired unit.

from_zipfile(*zipfile)

Create a DataContainer from a zipfile that contains the time series data.

replace(dset[, index, name])

Replace a dataset in the container.

save([filetype, username, filename])

Save the dataset to a zip file and upload the zip file to the swift cloud store.

select(*[, by_index, by_name])

Select a dataset from the container.

property container

Overview of the content of this data container.

convert(variable: str, unit: str) None[source]

Convert all variables in the data container to a desired unit.

dc = DataContainer.from_zipfile("output.zip")
dc.convert("tas", "degC")
Parameters:
variable:

the variable name that needs to be converted

unit:

the desired unit the variable is converted to

classmethod from_zipfile(*zipfile: str | Path)[source]

Create a DataContainer from a zipfile that contains the time series data.

dc1 = DataContainer.from_zipfile("output.zip")
dataset1 = dc1.select(by_index=0)
dc2 = DataContainer.from_zipfile("*")
dataset2 = dc2.select(by_index=0)
Parameters:
zipfile:

Path to the zip file that contains the saved data; glob patterns like '*' or '*.zip' are also accepted.

Returns:
The DataContainer object holding all the information necessary to post-process the time-series data.

replace(dset: Dataset, index: int | None = None, name: str | None = None) None[source]

Replace a dataset in the container.

Parameters:
dset:

The new dataset that replaces an existing entry in the container

index:

The index of the dataset in the container that is replaced

name:

The name of the model member of the corresponding dataset that is replaced
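For example (a sketch; the member name foo_bar and the post-processing step are placeholders):

updated = dc.select(by_name="foo_bar") * 2.0  # placeholder post-processing
dc.replace(updated, name="foo_bar")           # put the modified dataset back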

save(filetype: str = 'nc', username: str | None = None, filename: str | Path | None = None) None[source]

Save the dataset to a zip file and upload the zip file to the swift cloud store.

Parameters:
filetype:

The file extension the data is saved to, should be one of [h5, hdf5, nc, csv]

username:

The username that is used to log in to the swift cloud store. If None is given (default), the current system user is taken.

filename:

The desired file name of the saved zip file, if None is given (default) an informative filename will be constructed from the metadata.
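For example (a sketch; the file name is a placeholder):

dc.save(filetype="csv", filename="time_series.zip")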

select(*, by_index: int | None = None, by_name: str | None = None) Dataset[source]

Select a dataset from the container.

dc = DataContainer.from_zipfile("output.zip")
dc.container
dataset1 = dc.select(by_index=0)
dataset2 = dc.select(by_name="foo_bar")
Parameters:
by_index:

select the dataset based on the index in the container.

by_name:

select the dataset based on the model name in the container.

Returns:
The selected xarray dataset.

property variables: List[str]

Returns a list of all variables within the data container.

class climpact.RunDirectory(files: List[str], variables: List[str], abort_on_error: bool = False, mask: Tuple[str, str] | None = None, reindex: bool = True, parallel: bool = True, **kwargs)[source]

Bases: Averager

The RunDirectory class is used for reading data.

It offers easy control over selecting complex regions defined in geo-reference datasets like shapefiles or GeoJSON files. See also the API Reference section for more details.

Parameters:
files: list[str]

List of files that should be opened by the reader

variables: list[str]

List of variables that should be considered

shape_file: str, default: None

Path to the shape file that is used to define the mask region. If None is given (default), no masking is applied.

region: str, default: “”

Select the name of a sub-region within the shape file. If no region is given (default), the whole geometry defined in the shape file is taken.

mask_method: str, default: centres

String representing how the mask region (if given) should be applied. It can be either centres or corners: centres selects only those grid boxes whose centres lie within the coordinates of the mask region, while corners selects grid boxes that have at least one of their four corners within the coordinates of the mask region. This means that corners, compared to centres, slightly increases the selected area.

mask: tuple[str, int], default: None

If, in addition, a land or sea region should be masked, this parameter can be used to set the path to a land-sea mask file (first entry of the tuple) and the type of the mask (second entry; 0: land, 1: sea).

abort_on_error: bool, default: False

Exit if something goes wrong while loading the dataset.

reindex: bool, default: True

Apply nearest neighbor re-indexing to a larger grid in order to achieve a more precise region selection.

kwargs:

Additional information about the run

Examples

First we read a dataset without applying a mask at all

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"])
print(rd.variables)
['orog']
print(type(rd.dataset))
<class 'xarray.core.dataset.Dataset'>
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 412, rlon: 424)>
dask.array<mul, shape=(412, 424), dtype=float64, chunksize=(412, 424), chunktype=numpy.ndarray>
Coordinates:
    lon      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
  * rlon     (rlon) float64 -28.38 -28.26 -28.16 -28.05 ... 17.93 18.05 18.16
  * rlat     (rlat) float64 -23.38 -23.26 -23.16 -23.05 ... 21.61 21.73 21.83
    Y        (rlat, rlon) float64 21.99 22.03 22.07 22.11 ... 66.81 66.75 66.69
    X        (rlat, rlon) float64 -10.06 -9.964 -9.864 ... 64.55 64.76 64.96
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole

Now let’s apply a mask defined in a shape file

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 365, rlon: 275)>
dask.array<mul, shape=(365, 275), dtype=float64, chunksize=(365, 275), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -7.695 -7.673 -7.652 -7.63 ... -1.798 -1.777 -1.755
  * rlat     (rlat) float64 -3.245 -3.223 -3.201 -3.18 ... 4.631 4.653 4.675
    lon      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    Y        (rlat, rlon) float64 46.92 46.92 46.92 46.93 ... 55.39 55.39 55.39
    X        (rlat, rlon) float64 6.713 6.745 6.776 6.807 ... 14.84 14.88 14.92
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole

Finally, we can select only a sub-region of the shape file by giving the key or index of the sub-region.

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"],
                   ["orog"],
                   shape_file="Germany.shp",
                   region="Schaumburg",
                   )
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 25, rlon: 25)>
dask.array<mul, shape=(25, 25), dtype=float64, chunksize=(25, 25), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -5.605 -5.587 -5.568 -5.55 ... -5.202 -5.183 -5.165
  * rlat     (rlat) float64 1.595 1.613 1.632 1.65 ... 1.98 1.998 2.017 2.035
    lon      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    Y        (rlat, rlon) float64 52.0 52.0 52.0 52.01 ... 52.48 52.49 52.49
    X        (rlat, rlon) float64 8.876 8.905 8.935 8.964 ... 9.444 9.474 9.504
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole
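
The land-sea mask can be combined with the shape-file selection (a sketch; the mask file sftlf.nc is a placeholder, and the mask type follows the parameter description above, 0: land):

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"],
                  ["orog"],
                  shape_file="Germany.shp",
                  mask=("~/sftlf.nc", 0),  # placeholder land-sea mask file
                  )
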
Attributes:
bounding_box

Get the maximum bounding box of non-NaN values.

dataset

The selected sub-region multiplied by the land-sea mask (if chosen).

georefdata

Read the geo reference data.

mask

Create a gridded mask that is calculated from the shape region.

regionborder

The region border polygon as a cartopy feature.

shape_data

Shape data in the projection of the netcdf data.

variables

The variables as they are stored in the dataset.

Methods

fldmean(data[, dims, land_frac, skipna])

CDO equivalent of field mean.

get_extent(dset[, shape_data])

Calculate the lon/lat bounds of the valid data within a dataset.

get_mask(dset[, method, shape_data])

Create a mask region from the shape file information.

get_native_shape_data(region)

Read the given shape-file.

plot_maps(dset, fig_prefix[, shape_data])

Plot map data.

save_data(file, dset[, filetype])

Save the time series data to disk.

save_table(dset, filename)

Convert an xr.Dataset to a pandas DataFrame and save it to disk.

set_shape_projection_coords(dataset)

Add coordinates in the projection of the given input shapefile.

seasonal_mean

property bounding_box: Dict[str, slice]

Get the maximum bounding box of non-NaN values.

property dataset: Dataset | None

The selected sub-region multiplied by the land-sea mask (if chosen).

earth_radius = 6370000

property mask: DataArray

Create a gridded mask that is calculated from the shape region.

plot_maps(dset: Dataset, fig_prefix: str, shape_data: GeoDataFrame | None = None) None[source]

Plot map data.

Parameters:
dset:

The input dataset whose map fields are plotted.

max_index:

The number of timesteps to be considered when plotting the map.

fig_prefix:

Prefix of the file name under which the plot is saved.

shape_data:

The polygon borders that should be plotted.

property regionborder: ShapelyFeature

The region border polygon as a cartopy feature.

classmethod save_data(file: ZipFile | str, dset: Dataset, filetype: str = 'nc') Path[source]

Save the time series data to disk.

Parameters:
file:

The zip file object or path the data is saved to

dset:

the data that should be saved to file

filetype:

file type of the output data

static save_table(dset: Dataset, filename: Path) None[source]

Convert an xr.Dataset to a pandas DataFrame and save it to disk.

Parameters:
dset:

xarray input dataset

filename:

output file path

set_shape_projection_coords(dataset: Dataset) Dataset[source]

Add coordinates in the projection of the given input shapefile.

Parameters:
dataset:

The input dataset the new coordinates are added to

Returns:
xarray.Dataset:

Xarray dataset where coordinates in the eastward (X) and northward (Y) directions have been added as coordinates

property shape_data

Shape data in the projection of the netcdf data.

property variables: list[str]

The variables as they are stored in the dataset.

class climpact.Swift(account: str, username: str | None = None, password: str | None = None)[source]

Bases: object

Establish a connection to a DKRZ swift cloud store.

Parameters:
account: str

Account name that is used as swift login

username: str, optional

Username used to log on. If not given (default), it is set to the account name, which is equivalent to logging on to the personal swift store.

password: str, optional

Password used to log on to the swift store. If not given (default) and a password is needed, a prompt will ask for it.

Note

A password will only be needed if no login authorisation token exists or the token has expired.

Attributes:
auth_token: str

Swift authentication token.

Methods

get_passwd([msg, wait_sec])

Get a password from the getpass prompt.

upload(inp_dir, container)

Upload a folder to a given swift-container.

static get_passwd(msg: str = 'User password for cloud storage', wait_sec: int = 60) str[source]

Get a password from the getpass prompt.

Parameters:
msg:

user defined message to be displayed when asking for the password.

wait_sec:

number of seconds to wait before terminating the password request.

Returns:
the password, or None if terminated.
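
A usage sketch:

passwd = Swift.get_passwd("Password for the cloud store", wait_sec=30)
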
property swift_file: Path

File where the information to the swift connection is stored.

upload(inp_dir: str | Path, container: str) str[source]

Upload a folder to a given swift-container.
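For example (a sketch; the account name, folder, and container are placeholders):

from climpact import Swift
swift = Swift("ab1234")                        # placeholder account name
target = swift.upload("~/output", "climpact")  # upload a folder to the container
print(target)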

class climpact.UnitConverter[source]

Bases: object

Methods

DimensionalityError(units1, units2[, dim1, ...])

Raised when trying to convert between incompatible units.

UndefinedUnitError(unit_names)

Raised when the units are not defined in the unit registry.

convert

units

exception DimensionalityError(units1: Any, units2: Any, dim1: str = '', dim2: str = '', extra_msg: str = '')

Bases: PintTypeError

Raised when trying to convert between incompatible units.

dim1: str = ''
dim2: str = ''
extra_msg: str = ''
units1: Any
units2: Any

exception UndefinedUnitError(unit_names: str | tuple[str, ...])

Bases: AttributeError, PintError

Raised when the units are not defined in the unit registry.

unit_names: str | tuple[str, ...]

classmethod convert(dset, outputunit)[source]

units = <pint.registry.UnitRegistry object>
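
convert carries no docstring; a usage sketch based on the signature (the input object and the output unit are placeholders):

from climpact import UnitConverter
tas_degc = UnitConverter.convert(dset["tas"], "degC")  # convert to the desired unit
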
climpact.get_datetime(da: array | DataArray, time: List[datetime] | array, **kwargs: str | bool) List[Timestamp | datetime][source]

Match a list of timestamps to the time objects in a given data array.

Parameters:
da: numpy.array, xarray.DataArray

time array that contains the target time objects

time: collection

collection of datetime objects that will be matched

kwargs:

Additional keyword arguments that can be used to overwrite certain timestamp attributes, e.g. year=2020

Returns:
list: list of timestamps of the same type as da
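
A usage sketch (the data array and the target date are placeholders; the keyword argument overrides the year of the matched timestamps):

from datetime import datetime
from climpact import get_datetime
stamps = get_datetime(dset["time"], [datetime(2000, 1, 1)], year=2020)
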
climpact.open_mfdataset(inp_files: List[str | Path], thresh: float = 0.1, parallel: bool = True, variables: List[str] | None = None) Dataset[source]

Open multiple datasets.

This is a re-implementation of the xarray.open_mfdataset method. The only difference is that it assumes a list of file paths; glob patterns won't work. Additionally, a threshold for the fraction of files that may be corrupted when opening the datasets can be passed.

Parameters:
inp_files: list[str]

List of filepaths to be opened.

thresh: float, default: 0.1

Fraction of the number of files that can be corrupted; if this fraction is surpassed, an error is raised.

parallel: bool, default: True

Open all files in parallel.

variables: list, default: None

A list of variable names that have to be present in the combined dataset; if any variables are missing, an error is raised. By default (None) no checks for the presence of variables are performed.

Raises:
ValueError: if the max fraction of corrupted files is surpassed or not all variables are present in the merged dataset.
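
A usage sketch (the file paths and the variable name are placeholders):

from climpact import open_mfdataset
dset = open_mfdataset(
    ["/data/tas_2019.nc", "/data/tas_2020.nc"],  # explicit list; glob patterns won't work
    thresh=0.1,          # tolerate up to 10% corrupted files
    parallel=True,
    variables=["tas"],   # raise if "tas" is missing from the merged dataset
)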