API Reference
climpact is both a freva plugin and a standalone library that can be used to process climate data in order to select regions defined by coordinates stored in geo-reference data, such as shape or geojson files.
Installation
The best way to install the library is to clone the repository and create a conda environment:
git clone https://gitlab.dkrz.de/ch1187/plugins4freva/climpact.git
cd climpact
conda env create -f conda-env.yml -n climpact
These commands create a fresh conda environment named climpact. After activating the environment
conda activate climpact
you can make use of the standalone library.
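As a minimal check (assuming the environment is activated), the entry points documented below can be imported directly:
# Minimal sketch: import the main classes documented in this reference.
from climpact import Averager, DataContainer, RunDirectory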
- class climpact.Averager(shapefile: str, region: str, mask: Tuple[str, str] | None = None)[source]
Bases:
object
Collection of methods to create averages.
- Attributes:
- georefdata
Read the geo reference data.
Methods
- fldmean(data[, dims, land_frac, skipna]): CDO equivalent of field mean.
- get_extent(dset[, shape_data]): Calculate the lon/lat bounds of the valid data within a dataset.
- get_mask(dset[, method, shape_data]): Create a mask region from the shape file information.
- get_native_shape_data(region): Read the given shape-file.
- seasonal_mean
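A minimal sketch of constructing an Averager and building a region mask; the shape file "Germany.shp", the sub-region "Schaumburg" and the input file "tas.nc" are hypothetical examples:
import xarray as xr
from climpact import Averager

# Hypothetical shape file, sub-region name and input file.
avg = Averager("Germany.shp", "Schaumburg")
dset = xr.open_dataset("tas.nc")
mask = avg.get_mask(dset["tas"])   # gridded mask derived from the shape region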
- static fldmean(data, dims=('lat', 'lon'), land_frac=None, skipna=True)[source]
CDO equivalent of field mean.
- Parameters:
- data: xr.DataArray, xr.Dataset
input data
- dims: tuple, default: (lat, lon)
names of the geographical dimensions
- land_frac: xr.DataArray, dask.array, numpy.array, default: None
mask that is applied to the data
- skipna: bool, default: True
drop nan values
- Returns:
- xr.DataArray, xr.Dataset: field mean
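A minimal usage sketch; the input file name "tas.nc" is a hypothetical example:
import xarray as xr
from climpact import Averager

dset = xr.open_dataset("tas.nc")   # hypothetical input file
# Collapse the geographical dimensions into a time series of spatial means.
field_mean = Averager.fldmean(dset, dims=("lat", "lon"), skipna=True)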
- property georefdata
Read the geo reference data.
- get_extent(dset: DataArray, shape_data: GeoDataFrame | None = None) Dict[str, slice] [source]
Calculate the lon/lat bounds of the valid data within a dataset.
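A minimal sketch; the shape file, sub-region and input file are hypothetical, and the keys of the returned dictionary are an assumption:
import xarray as xr
from climpact import Averager

avg = Averager("Germany.shp", "Schaumburg")   # hypothetical shape file and sub-region
dset = xr.open_dataset("tas.nc")              # hypothetical input file
extent = avg.get_extent(dset["tas"])          # dict mapping dimension names to lon/lat slices (keys assumed)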
- class climpact.DataContainer(*datasets: tuple[Dataset, ...])[source]
Bases:
object
Class that holds information about processed data.
- Attributes:
Methods
- convert(variable, unit): Convert all variables in the data container to a desired unit.
- from_zipfile(*zipfile): Create a DataContainer from a zipfile that contains the time series data.
- replace(dset[, index, name]): Replace a dataset in the container.
- save([filetype, username, filename]): Save the dataset to a zip file and upload the zip file to the swift cloud store.
- select(*[, by_index, by_name]): Select a dataset from the container.
- property container
Overview of the content of this data container.
- convert(variable: str, unit: str) None [source]
Convert all variables in the data container to a desired unit.
dc = DataContainer.from_zipfile("output.zip")
dc.convert("tas", "degC")
- Parameters:
- variable:
the variable name that needs to be converted
- unit:
the desired unit the variable is converted to
- classmethod from_zipfile(*zipfile: str | Path)[source]
Create a DataContainer from a zipfile that contains the time series data.
dc1 = DataContainer.from_zipfile("output.zip")
dataset1 = dc1.select(by_index=0)
dc2 = DataContainer.from_zipfile("*")
dataset2 = dc2.select(by_index=0)
- Parameters:
- zipfile:
Path to the zip file that contains the saved data; glob patterns like ‘*’ or ‘*.zip’ are also accepted.
- Returns:
- The DataContainer object holding all necessary information to post-process the time-series data
- replace(dset: Dataset, index: int | None = None, name: str | None = None) None [source]
Replace a dataset in a container
- Parameters:
- dset:
The dataset that needs to be replaced
- index:
The index of the dataset in the container that is replaced
- name:
The name of the model member of the corresponding dataset that is replaced
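A minimal sketch; the post-processing step applied before replacing is a hypothetical example:
from climpact import DataContainer

dc = DataContainer.from_zipfile("output.zip")
dset = dc.select(by_index=0)
# Replace the first dataset with a post-processed version of itself
# (hypothetical step; assumes the dataset has a "time" dimension).
dc.replace(dset.mean("time"), index=0)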
- save(filetype: str = 'nc', username: str | None = None, filename: str | Path | None = None) None [source]
Save the dataset to a zip file and upload the zip file to swift cloud store.
- Parameters:
- filetype:
The file extension the data is saved to, should be one of [h5, hdf5, nc, csv]
- username:
The username that is used to log in to the swift cloud store; if None is given (default), the current system user will be taken.
- filename:
The desired file name of the saved zip file, if None is given (default) an informative filename will be constructed from the metadata.
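A minimal sketch; the output file name is a hypothetical example:
from climpact import DataContainer

dc = DataContainer.from_zipfile("output.zip")
dc.convert("tas", "degC")
# Save as netCDF under a hypothetical name; the zip file is also uploaded to the swift store.
dc.save(filetype="nc", filename="tas_degC_timeseries")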
- select(*, by_index: int | None = None, by_name: str | None = None) Dataset [source]
Select a dataset from the container.
dc = DataContainer.from_zipfile("output.zip")
dc.container
dataset1 = dc.select(by_index=0)
dataset2 = dc.select(by_name="foo_bar")
- Parameters:
- by_index:
select the dataset based on the index in the container.
- by_name:
select the dataset based on the model name in the container.
- Returns:
- The selected xarray dataset
- property variables: List[str]
Returns a list of all variables within the data container.
- class climpact.RunDirectory(files: List[str], variables: List[str], abort_on_error: bool = False, mask: Tuple[str, str] | None = None, reindex: bool = True, parallel: bool = True, **kwargs)[source]
Bases:
Averager
The RunDirectory class reads data and offers easy control over selecting complex regions defined in geo-reference datasets like shape or geojson files. See also the API Reference section for more details.
- Parameters:
- files: list[str]
List of files that should be opened by the reader
- variables: list[str]
list of variables that should be considered
- shape_file: str, default: None
Path to the shape file that is used to define the mask region. If None is given (default), no masking will be applied.
- region: str, default: “”
Select the name of a sub-region within the shape file. If None is given (default), the whole geometry defined in the shape file is taken.
- mask_method: str, default: centres
String representing the method for applying the mask region (if given). The string can either be centres or corners. centres selects only those grid boxes whose centres lie within the coordinates of the mask region; corners selects grid boxes that have at least one of their four corners within the coordinates of the mask region. This means that corners, compared to centres, will slightly increase the selected area.
- mask: tuple[str, int], default: None
If, additionally, a land or sea region should be masked, this variable can be used to set the path to a land-sea mask file (first entry in the tuple) and the type of the mask (second entry - 0: land, 1: sea).
- abort_on_error: bool, default: False
Exit if something goes wrong while loading the dataset
- reindex: bool, default: True
Apply nearest neighbor re-indexing to a larger grid in order to achieve a more precise region selection.
- kwargs:
Additional information about the run
Examples
First we read a dataset without applying a mask at all
from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"])
print(rd.variables)
['orog']
print(type(rd.dataset))
<class 'xarray.core.dataset.Dataset'>
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 412, rlon: 424)>
dask.array<mul, shape=(412, 424), dtype=float64, chunksize=(412, 424), chunktype=numpy.ndarray>
Coordinates:
    lon      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
  * rlon     (rlon) float64 -28.38 -28.26 -28.16 -28.05 ... 17.93 18.05 18.16
  * rlat     (rlat) float64 -23.38 -23.26 -23.16 -23.05 ... 21.61 21.73 21.83
    Y        (rlat, rlon) float64 21.99 22.03 22.07 22.11 ... 66.81 66.75 66.69
    X        (rlat, rlon) float64 -10.06 -9.964 -9.864 ... 64.55 64.76 64.96
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole
Now let’s apply a mask defined in a shape file
from climpact import RunDirectory
rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 365, rlon: 275)>
dask.array<mul, shape=(365, 275), dtype=float64, chunksize=(365, 275), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -7.695 -7.673 -7.652 -7.63 ... -1.798 -1.777 -1.755
  * rlat     (rlat) float64 -3.245 -3.223 -3.201 -3.18 ... 4.631 4.653 4.675
    lon      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(365, 275), meta=np.ndarray>
    Y        (rlat, rlon) float64 46.92 46.92 46.92 46.93 ... 55.39 55.39 55.39
    X        (rlat, rlon) float64 6.713 6.745 6.776 6.807 ... 14.84 14.88 14.92
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole
Finally, we can select only a sub-region of the shape file by giving the key or index of the sub-region.
from climpact import RunDirectory
rd = RunDirectory(
    ["~/orog.nc"],
    ["orog"],
    shape_file="Germany.shp",
    region="Schaumburg",
)
print(rd.dataset["orog"])
<xarray.DataArray 'orog' (rlat: 25, rlon: 25)>
dask.array<mul, shape=(25, 25), dtype=float64, chunksize=(25, 25), chunktype=numpy.ndarray>
Coordinates:
  * rlon     (rlon) float64 -5.605 -5.587 -5.568 -5.55 ... -5.202 -5.183 -5.165
  * rlat     (rlat) float64 1.595 1.613 1.632 1.65 ... 1.98 1.998 2.017 2.035
    lon      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    lat      (rlat, rlon) float64 dask.array<chunksize=(25, 25), meta=np.ndarray>
    Y        (rlat, rlon) float64 52.0 52.0 52.0 52.01 ... 52.48 52.49 52.49
    X        (rlat, rlon) float64 8.876 8.905 8.935 8.964 ... 9.444 9.474 9.504
Attributes:
    standard_name:  surface_altitude
    long_name:      Surface Altitude
    units:          m
    grid_mapping:   rotated_pole
- Attributes:
bounding_box
Get the maximum bounding box of non-NaN values.
- dataset
The selected sub-region multiplied by a land-sea mask (if chosen).
- georefdata
Read the geo reference data.
- mask
Create a gridded mask that is calculated from the shape region.
- regionborder
The region border polygon as a cartopy feature.
- shape_data
Shape data in the projection of the netcdf data.
- variables
The variables as they are stored in the dataset.
Methods
- fldmean(data[, dims, land_frac, skipna]): CDO equivalent of field mean.
- get_extent(dset[, shape_data]): Calculate the lon/lat bounds of the valid data within a dataset.
- get_mask(dset[, method, shape_data]): Create a mask region from the shape file information.
- get_native_shape_data(region): Read the given shape-file.
- plot_maps(dset, fig_prefix[, shape_data]): Plot map data.
- save_data(file, dset[, filetype]): Save the time series data to disk.
- save_table(dset, filename): Convert xr.Dataset to pandas dataframe.
- set_shape_projection_coords(dataset): Add coordinates in the projection of the given input shapefile.
- seasonal_mean
- property bounding_box: Dict[str, slice]
Get the maximum bounding box of non-NaN values.
- property dataset: Dataset | None
The selected sub-region multiplied by a land-sea mask (if chosen).
- earth_radius = 6370000
- property mask: DataArray
Create a gridded mask that is calculated from the shape region.
- plot_maps(dset: Dataset, fig_prefix: str, shape_data: GeoDataFrame | None = None) None [source]
Plot map data.
- Parameters:
- max_index:
The number of timesteps to be considered when plotting the map.
- fig_prefix:
Prefix of the file name that is plotted
- shape_data:
the polygon borders that should be plotted.
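A minimal sketch; the input file, shape file and figure prefix are hypothetical examples:
from climpact import RunDirectory

rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")   # as in the examples above
# Plot the masked fields and save the figures with the given prefix.
rd.plot_maps(rd.dataset, "orog_map")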
- property regionborder: ShapelyFeature
The region border polygon as a cartopy feature.
- classmethod save_data(file: ZipFile | str, dset: Dataset, filetype: str = 'nc') Path [source]
Save the time series data to disk
- Parameters:
- file:
The zipfile object (or path) to which the data is saved
- dset:
the data that should be saved to file
- filetype:
file type of the output data
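A minimal sketch; the archive name and input data are hypothetical examples:
from climpact import RunDirectory

rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")   # as in the examples above
# Write the masked time series into a zip archive as a netCDF file.
saved_path = RunDirectory.save_data("output.zip", rd.dataset, filetype="nc")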
- static save_table(dset: Dataset, filename: Path) None [source]
Convert xr.Dataset to pandas dataframe.
- Parameters:
- dset:
xarray input dataset
- filename:
output file path
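A minimal sketch; the output path is a hypothetical example and the table file format is an assumption:
from pathlib import Path
from climpact import RunDirectory

rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")   # as in the examples above
# Convert the dataset to a table and write it to the given path (.csv target is an assumption).
RunDirectory.save_table(rd.dataset, Path("orog_table.csv"))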
- set_shape_projection_coords(dataset: Dataset) Dataset [source]
Add coordinates in the projection of the given input shapefile.
- Parameters:
- dataset:
The input dataset the new coordinates are added to
- Returns:
- xarray.Dataset:
Xarray dataset where coordinates in the eastward direction (X) and northward direction (Y) have been added as coordinates
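A minimal sketch, continuing the RunDirectory examples above:
from climpact import RunDirectory

rd = RunDirectory(["~/orog.nc"], ["orog"], shape_file="Germany.shp")   # as in the examples above
dset_xy = rd.set_shape_projection_coords(rd.dataset)
print(dset_xy.coords["X"], dset_xy.coords["Y"])   # coordinates in the shapefile projection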
- property shape_data
Shape data in the projection of the netcdf data.
- property variables: list[str]
The variables as they are stored in the dataset.
- class climpact.Swift(account: str, username: str | None = None, password: str | None = None)[source]
Bases:
object
Establish a connection to a DKRZ swift cloud store.
- Parameters:
- account: str
Account name that is used as swift login
- username: str, optional
Username to log on with; if not given (default), it is set to the account name, which is equivalent to logging on to the personal swift account.
- password: str, optional
Password used to log on to the swift store. If not given (default) and a password is needed, a password prompt will ask for it.
Note
A password will only be needed if no login authorisation token exists or the token has expired.
- Attributes:
- auth_token: str
Swift authentication token.
Methods
- get_passwd([msg, wait_sec]): Get a password from the getpass prompt.
- upload(inp_dir, container): Upload a folder to a given swift-container.
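A minimal sketch; the account name, local folder and swift container are hypothetical examples:
from climpact import Swift

sw = Swift("ab1234")                       # hypothetical DKRZ account name
sw.upload("results", "climpact-results")   # hypothetical local folder and swift container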
- static get_passwd(msg: str = 'User password for cloud storage', wait_sec: int = 60) str [source]
Get a password from the getpass prompt
- Parameters:
- msg:
user defined message to be displayed when asking for the password.
- wait_sec:
amount of seconds to wait before terminating the password request.
- Returns:
- the password, or None if terminated.
- property swift_file: Path
File where the information about the swift connection is stored.
- class climpact.UnitConverter[source]
Bases:
object
Methods
- DimensionalityError(units1, units2[, dim1, ...]): Raised when trying to convert between incompatible units.
- UndefinedUnitError(unit_names): Raised when the units are not defined in the unit registry.
- convert
- units
- exception DimensionalityError(units1: Any, units2: Any, dim1: str = '', dim2: str = '', extra_msg: str = '')
Bases:
PintTypeError
Raised when trying to convert between incompatible units.
- dim1: str = ''
- dim2: str = ''
- extra_msg: str = ''
- units1: Any
- units2: Any
- exception UndefinedUnitError(unit_names: str | tuple[str, ...])
Bases:
AttributeError, PintError
Raised when the units are not defined in the unit registry.
- unit_names: str | tuple[str, ...]
- units = <pint.registry.UnitRegistry object>
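A minimal sketch using the pint unit registry exposed as UnitConverter.units; the exact signature of the convert method is not shown here:
from climpact import UnitConverter

# Build a quantity with the exposed pint registry and convert it to another unit.
temperature = UnitConverter.units.Quantity(290.0, "K")
print(temperature.to("degC"))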
- climpact.get_datetime(da: array | DataArray, time: List[datetime] | array, **kwargs: str | bool) List[Timestamp | datetime] [source]
Match a list of timestamps to the time object in a given data array.
- Parameters:
- da: numpy.array, xr.DataArray
time array as numpy.array or xarray.DataArray that contains the target time objects
- time: collection
collection of datetime objects that will be matched
- kwargs:
Additional keyword arguments that can be used to overwrite certain timestamp attributes, e.g. year=2020
- Returns:
- list: list of timestamps of the same type as da
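A minimal sketch; the input file name is a hypothetical example:
from datetime import datetime
import xarray as xr
from climpact import get_datetime

dset = xr.open_dataset("tas.nc")   # hypothetical input file
# Match a Python datetime to the type/calendar of the dataset's time axis.
matched = get_datetime(dset["time"], [datetime(2020, 1, 1)])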
- climpact.open_mfdataset(inp_files: List[str | Path], thresh: float = 0.1, parallel: bool = True, variables: List[str] | None = None) Dataset [source]
Open multiple datasets.
This is a re-implementation of the xarray.open_mfdataset method. The only difference is that it assumes a list of file paths; glob patterns will not work. Additionally, a threshold for the fraction of files that may be corrupted when trying to open the datasets can be passed.
- Parameters:
- inp_files: list[str]
List of filepaths to be opened.
- thresh: float, default: 0.1
Fraction of the number of files that can be corrupted; if the fraction is surpassed, an error is raised.
- parallel: bool, default: True
Open all files in parallel.
- variables: list, default: None
A list of variable names that have to be present in the combined dataset; if any variables are missing an error is raised. By default (None) no checks for the presence of variables are performed.
- Raises:
- ValueError: if the max fraction of corrupted files is surpassed or not all variables are present in the merged dataset.
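A minimal sketch; the file list is a hypothetical example:
from climpact import open_mfdataset

# Hypothetical list of yearly files; glob patterns are not expanded by this function.
files = [f"data/tas_{year}.nc" for year in range(1980, 2011)]
# Allow up to 10% corrupted files and require "tas" to be present in the merged dataset.
dset = open_mfdataset(files, thresh=0.1, parallel=True, variables=["tas"])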