API Reference

climpact is both a freva plugin and a stand-alone library that can be used to process climate data, selecting regions defined by coordinates stored in geo-reference data such as shapefiles or GeoJSON files.

Installation

The best way to install the library is to clone the repository and create a conda environment:

git clone https://gitlab.dkrz.de/ch1187/plugins4freva/climpact.git
cd climpact
conda env create -f conda-env.yml -n climpact

These commands create a fresh conda environment named climpact. After activating the environment

conda activate climpact

you can make use of the stand-alone library.
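For example, a minimal session could look like this (a sketch; the input file ~/tas.nc, the variable name tas, and the shape file are placeholders):

from climpact import RunDirectory
# Read the variable "tas" from a single file and mask it with a shape file
rd = RunDirectory(["~/tas.nc"], ["tas"], shape_file="Germany.shp")
print(rd.variables)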

class climpact.Averager(shapefile: str, region: str, mask: Tuple[str, str] | None = None)[source]

Bases: object

Collection of methods to create averages.

Attributes:
georefdata

Read the geo reference data.

Methods

fldmean(data[, dims, land_frac, skipna])

CDO equivalent of field mean.

get_extent(dset[, shape_data])

Calculate the lon/lat bounds of the valid data within a dataset.

get_mask(dset[, method, shape_data])

Create a mask region from the shape file information.

get_native_shape_data(region)

Read the given shape-file.

seasonal_mean

static fldmean(data, dims=('lat', 'lon'), land_frac=None, skipna=True)[source]

CDO equivalent of field mean.

Parameters:
data: xr.DataArray, xr.Dataset

input data

dims: tuple, default: (lat, lon)

names of the geographical dimensions

land_frac: xr.DataArray, dask.array, numpy.array, default: None

mask that is applied to the data

skipna: bool, default: True

skip NaN values

Returns:
xr.DataArray, xr.Dataset: field mean
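A short usage sketch (the input file is a placeholder; since fldmean is a static method it can be called on the class directly):

import xarray as xr
from climpact import Averager
dset = xr.open_dataset("~/tas.nc")  # placeholder input file
# Field mean over the geographical dimensions, skipping NaN values
ts = Averager.fldmean(dset["tas"], dims=("lat", "lon"), skipna=True)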

property georefdata

Read the geo reference data.

get_extent(dset: DataArray, shape_data: GeoDataFrame | None = None) Dict[str, slice][source]

Calculate the lon/lat bounds of the valid data within a dataset.
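For example (a sketch; the shape file is a placeholder, and it is assumed that the returned keys match the dataset's dimension names):

avg = Averager("Germany.shp", "")     # placeholder shape file, whole geometry
extent = avg.get_extent(dset["tas"])  # dict of lon/lat slices
cropped = dset.sel(**extent)          # crop to the valid-data bounding box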

get_mask(dset: DataArray, method: str = 'centres', shape_data: GeoDataFrame | None = None) DataArray[source]

Create a mask region from the shape file information.
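A short sketch (avg as above; applying the mask via where is an assumption):

mask = avg.get_mask(dset["tas"], method="corners")  # or the default "centres"
masked = dset["tas"].where(mask)                    # keep only grid boxes inside the region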

get_native_shape_data(region)[source]

Read the given shape-file.

static seasonal_mean(ds: Dataset, skipna: bool = False) Dataset[source]

class climpact.DataContainer(*datasets: tuple[Dataset, ...])[source]

Bases: object

Class that holds information about processed data.

Attributes:
container

Overview of the content of this data container.

variables

Returns a list of all variables within the data container.

Methods

convert(variable, unit)

Convert all variables in the data container to a desired unit.

from_zipfile(*zipfile)

Create a DataContainer from a zipfile that contains the time series data.

replace(dset[, index, name])

Replace a dataset in the container.

save([filetype, username, filename])

Save the dataset to a zip file and upload the zip file to the swift cloud store.

select(*[, by_index, by_name])

Select a dataset from the container.

property container

Overview of the content of this data container.

convert(variable: str, unit: str) None[source]

Convert all variables in the data container to a desired unit.

dc = DataContainer.from_zipfile("output.zip")
dc.convert("tas", "degC")
Parameters:
variable:

the variable name that needs to be converted

unit:

the desired unit the variable is converted to

classmethod from_zipfile(*zipfile: str | Path)[source]

Create a DataContainer from a zipfile that contains the time series data.

dc1 = DataContainer.from_zipfile("output.zip")
dataset1 = dc1.select(by_index=0)
dc2 = DataContainer.from_zipfile("*")
dataset2 = dc2.select(by_index=0)
Parameters:
zipfile:

Path to the zip file that contains the saved data; glob patterns like '*' or '*.zip' are also accepted.

Returns:
The DataContainer object holding all the information necessary to post-process the time-series data.

replace(dset: Dataset, index: int | None = None, name: str | None = None) None[source]

Replace a dataset in the container.

Parameters:
dset:

The new dataset that replaces an existing entry in the container

index:

The index of the dataset in the container that is replaced

name:

The name of the model member of the corresponding dataset that is replaced
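For example (a sketch; the member name foo_bar and the post-processing step are placeholders):

updated = dc.select(by_name="foo_bar") * 2.0  # placeholder post-processing
dc.replace(updated, name="foo_bar")           # put the modified dataset back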

save(filetype: str = 'nc', username: str | None = None, filename: str | Path | None = None) None[source]

Save the dataset to a zip file and upload the zip file to the swift cloud store.

Parameters:
filetype:

The file extension the data is saved to, should be one of [h5, hdf5, nc, csv]

username:

The username that is used to log in to the swift cloud store. If None is given (default), the current system user is taken.

filename:

The desired file name of the saved zip file, if None is given (default) an informative filename will be constructed from the metadata.
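For example (a sketch; the file name is a placeholder):

dc.save(filetype="csv", filename="time_series.zip")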

select(*, by_index: int | None = None, by_name: str | None = None) Dataset[source]

Select a dataset from the container.

dc = DataContainer.from_zipfile("output.zip")
dc.container
dataset1 = dc.select(by_index=0)
dataset2 = dc.select(by_name="foo_bar")
Parameters:
by_index:

select the dataset based on the index in the container.

by_name:

select the dataset based on the model name in the container.

Returns:
The selected xarray dataset.

property variables: List[str]

Returns a list of all variables within the data container.

class climpact.RunDirectory(files: List[str], variables: List[str], abort_on_error: bool = False, mask: Tuple[str, str] | None = None, reindex: bool = True, parallel: bool = True, **kwargs)[source]

Bases: Averager

The RunDirectory class is used for reading data.

It offers easy control over selecting complex regions defined in geo-reference datasets like shapefiles or GeoJSON files. See also the API Reference section for more details.

Parameters:
files: list[str]

List of files that should be opened by the reader

variables: list[str]

List of variables that should be considered

shape_file: str, default: None

Path to the shape file that is used to define the mask region. If None is given (default), no masking is applied.

region: str, default: “”

Select the name of a sub-region within the shape file. If no region is given (default), the whole geometry defined in the shape file is taken.

mask_method: str, default: centres

String representing how the mask region (if given) should be applied. It can be either centres or corners: centres selects only those grid boxes whose centres lie within the coordinates of the mask region, while corners selects grid boxes that have at least one of their four corners within the coordinates of the mask region. This means that corners, compared to centres, slightly increases the selected area.

mask: tuple[str, int], default: None

If, in addition, a land or sea region should be masked, this parameter can be used to set the path to a land-sea mask file (first entry of the tuple) and the type of the mask (second entry; 0: land, 1: sea).

abort_on_error: bool, default: False

Exit if something goes wrong while loading the dataset.

reindex: bool, default: True

Apply nearest neighbor re-indexing to a larger grid in order to achieve a more precise region selection.

kwargs:

Additional information about the run

Examples

First we read a dataset without applying a mask at all

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"])
print(rd.variables)
['orog']
print(type(rd.dataset))
<class 'xarray.core.dataset.Dataset'>
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 412, rlon: 424)>
dask.array<mul, shape=(412, 424), dtype=float64, chunksize=(412, 424), chunktype=numpy.ndarray>
Coordinates:
    lon      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
  * rlon     (rlon) float64 -28.38 -28.26 -28.16 -28.05 ... 17.93 18.05 18.16
  * rlat     (rlat) float64 -23.38 -23.26 -23.16 -23.05 ... 21.61 21.73 21.83
    Y        (rlat, rlon) float64 21.99 22.03 22.07 22.11 ... 66.81 66.75 66.69
    X        (rlat, rlon) float64 -10.06 -9.964 -9.864 ... 64.55 64.76 64.96
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole

Now let’s apply a mask defined in a shape file

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 365, rlon: 275)>
dask.array<mul, shape=(365, 275), dtype=float64, chunksize=(365, 275), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -7.695 -7.673 -7.652 -7.63 ... -1.798 -1.777 -1.755
  * rlat     (rlat) float64 -3.245 -3.223 -3.201 -3.18 ... 4.631 4.653 4.675
    lon      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    Y        (rlat, rlon) float64 46.92 46.92 46.92 46.93 ... 55.39 55.39 55.39
    X        (rlat, rlon) float64 6.713 6.745 6.776 6.807 ... 14.84 14.88 14.92
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole

Finally, we can select only a sub-region of the shape file by giving the key or index of the sub-region.

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"],
                   ["orog"],
                   shape_file="Germany.shp",
                   region="Schaumburg",
                   )
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 25, rlon: 25)>
dask.array<mul, shape=(25, 25), dtype=float64, chunksize=(25, 25), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -5.605 -5.587 -5.568 -5.55 ... -5.202 -5.183 -5.165
  * rlat     (rlat) float64 1.595 1.613 1.632 1.65 ... 1.98 1.998 2.017 2.035
    lon      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    Y        (rlat, rlon) float64 52.0 52.0 52.0 52.01 ... 52.48 52.49 52.49
    X        (rlat, rlon) float64 8.876 8.905 8.935 8.964 ... 9.444 9.474 9.504
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole
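
The land-sea mask can be combined with the shape-file selection (a sketch; the mask file sftlf.nc is a placeholder, and the mask type follows the parameter description above, 0: land):

from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"],
                  ["orog"],
                  shape_file="Germany.shp",
                  mask=("~/sftlf.nc", 0),  # placeholder land-sea mask file
                  )
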
Attributes:
bounding_box

Get the maximum bounding box of non-NaN values.

dataset

The selected sub-region multiplied by the land-sea mask (if chosen).

georefdata

Read the geo reference data.

mask

Create a gridded mask that is calculated from the shape region.

regionborder

The region border polygon as a cartopy feature.

shape_data

Shape data in the projection of the netcdf data.

variables

The variables as they are stored in the dataset.

Methods

fldmean(data[, dims, land_frac, skipna])

CDO equivalent of field mean.

get_extent(dset[, shape_data])

Calculate the lon/lat bounds of the valid data within a dataset.

get_mask(dset[, method, shape_data])

Create a mask region from the shape file information.

get_native_shape_data(region)

Read the given shape-file.

plot_maps(dset, fig_prefix[, shape_data])

Plot map data.

save_data(file, dset[, filetype])

Save the time series data to disk.

save_table(dset, filename)

Convert an xr.Dataset to a pandas DataFrame and save it to disk.

set_shape_projection_coords(dataset)

Add coordinates in the projection of the given input shapefile.

seasonal_mean

property bounding_box: Dict[str, slice]

Get the maximum bounding box of non-NaN values.

property dataset: Dataset | None

The selected sub-region multiplied by the land-sea mask (if chosen).

earth_radius = 6370000

property mask: DataArray

Create a gridded mask that is calculated from the shape region.

plot_maps(dset: Dataset, fig_prefix: str, shape_data: GeoDataFrame | None = None) None[source]

Plot map data.

Parameters:
dset:

The input dataset whose map fields are plotted.

max_index:

The number of timesteps to be considered when plotting the map.

fig_prefix:

Prefix of the file name under which the plot is saved.

shape_data:

The polygon borders that should be plotted.

property regionborder: ShapelyFeature

The region border polygon as a cartopy feature.

classmethod save_data(file: ZipFile | str, dset: Dataset, filetype: str = 'nc') Path[source]

Save the time series data to disk.

Parameters:
file:

The zip file object or path the data is saved to

dset:

the data that should be saved to file

filetype:

file type of the output data

static save_table(dset: Dataset, filename: Path) None[source]

Convert an xr.Dataset to a pandas DataFrame and save it to disk.

Parameters:
dset:

xarray input dataset

filename:

output file path

set_shape_projection_coords(dataset: Dataset) Dataset[source]

Add coordinates in the projection of the given input shapefile.

Parameters:
dataset:

The input dataset the new coordinates are added to

Returns:
xarray.Dataset:

Xarray dataset where coordinates in the eastward (X) and northward (Y) directions have been added as coordinates

property shape_data

Shape data in the projection of the netcdf data.

property variables: list[str]

The variables as they are stored in the dataset.

class climpact.Swift(account: str, username: str | None = None, password: str | None = None)[source]

Bases: object

Establish a connection to a DKRZ swift cloud store.

Parameters:
account: str

Account name that is used as swift login

username: str, optional

Username used to log on. If not given (default), it is set to the account name, which is equivalent to logging on to the personal swift store.

password: str, optional

Password used to log on to the swift store. If not given (default) and a password is needed, a prompt will ask for it.

Note

A password will only be needed if no login authorisation token exists or the token has expired.

Attributes:
auth_token: str

Swift authentication token.

Methods

get_passwd([msg, wait_sec])

Get a password from the getpass prompt.

upload(inp_dir, container)

Upload a folder to a given swift-container.

static get_passwd(msg: str = 'User password for cloud storage', wait_sec: int = 60) str[source]

Get a password from the getpass prompt.

Parameters:
msg:

user defined message to be displayed when asking for the password.

wait_sec:

number of seconds to wait before terminating the password request.

Returns:
the password, or None if terminated.
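
A usage sketch:

passwd = Swift.get_passwd("Password for the cloud store", wait_sec=30)
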
property swift_file: Path

File where the information to the swift connection is stored.

upload(inp_dir: str | Path, container: str) str[source]

Upload a folder to a given swift-container.
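For example (a sketch; the account name, folder, and container are placeholders):

from climpact import Swift
swift = Swift("ab1234")                        # placeholder account name
target = swift.upload("~/output", "climpact")  # upload a folder to the container
print(target)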

class climpact.UnitConverter[source]

Bases: object

Methods

DimensionalityError(units1, units2[, dim1, ...])

Raised when trying to convert between incompatible units.

UndefinedUnitError(unit_names)

Raised when the units are not defined in the unit registry.

convert

units

exception DimensionalityError(units1: Any, units2: Any, dim1: str = '', dim2: str = '', extra_msg: str = '')

Bases: PintTypeError

Raised when trying to convert between incompatible units.

dim1: str = ''
dim2: str = ''
extra_msg: str = ''
units1: Any
units2: Any

exception UndefinedUnitError(unit_names: str | tuple[str, ...])

Bases: AttributeError, PintError

Raised when the units are not defined in the unit registry.

unit_names: str | tuple[str, ...]

classmethod convert(dset, outputunit)[source]

units = <pint.registry.UnitRegistry object>
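
convert carries no docstring; a usage sketch based on the signature (the input object and the output unit are placeholders):

from climpact import UnitConverter
tas_degc = UnitConverter.convert(dset["tas"], "degC")  # convert to the desired unit
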
climpact.get_datetime(da: array | DataArray, time: List[datetime] | array, **kwargs: str | bool) List[Timestamp | datetime][source]

Match a list of timestamps to the time objects in a given data array.

Parameters:
da: numpy.array, xarray.DataArray

time array that contains the target time objects

time: collection

collection of datetime objects that will be matched

kwargs:

Additional keyword arguments that can be used to overwrite certain timestamp attributes, e.g. year=2020

Returns:
list: list of timestamps of the same type as da
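
A usage sketch (the data array and the target date are placeholders; the keyword argument overrides the year of the matched timestamps):

from datetime import datetime
from climpact import get_datetime
stamps = get_datetime(dset["time"], [datetime(2000, 1, 1)], year=2020)
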
climpact.open_mfdataset(inp_files: List[str | Path], thresh: float = 0.1, parallel: bool = True, variables: List[str] | None = None) Dataset[source]

Open multiple datasets.

This is a re-implementation of the xarray.open_mfdataset method. The only difference is that it assumes a list of file paths; glob patterns won't work. Additionally, a threshold for the fraction of files that may be corrupted when opening the datasets can be passed.

Parameters:
inp_files: list[str]

List of filepaths to be opened.

thresh: float, default: 0.1

Fraction of the number of files that can be corrupted; if this fraction is surpassed, an error is raised.

parallel: bool, default: True

Open all files in parallel.

variables: list, default: None

A list of variable names that have to be present in the combined dataset; if any variables are missing, an error is raised. By default (None) no checks for the presence of variables are performed.

Raises:
ValueError: if the max fraction of corrupted files is surpassed or not all variables are present in the merged dataset.
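
A usage sketch (the file paths and the variable name are placeholders):

from climpact import open_mfdataset
dset = open_mfdataset(
    ["/data/tas_2019.nc", "/data/tas_2020.nc"],  # explicit list; glob patterns won't work
    thresh=0.1,          # tolerate up to 10% corrupted files
    parallel=True,
    variables=["tas"],   # raise if "tas" is missing from the merged dataset
)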