API

Subset operation

Subset operation.

daops.ops.subset.subset(collection, time=None, area=None, level=None, time_components=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)

Subset input dataset according to parameters.

Can be subsetted by level, area, and time.

Parameters:
  • collection – Collection of datasets to process: a sequence of dataset identifiers, or a string of comma-separated dataset identifiers.

  • time – Time interval (defined by start/end) or time series (a sequence of datetime values) to subset over. Datetimes are typically provided as strings.

  • area – Area to subset over: a sequence, or a string of comma-separated lat and lon bounds. Must contain 4 values.

  • level – Level interval (defined by start/end) or level series (a sequence of values) to subset over. Levels are typically provided as integers or floats.

  • time_components – Time components to filter on: year, month, day, hour, minute, second.

  • output_dir (str or path-like) – Directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

List of outputs in the selected type (a list of xarray Datasets or file paths).

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
time: (“1999-01-01T00:00:00”, “2100-12-30T00:00:00”)
area: (-5.,49.,10.,65)
level: (1000.,)
time_components: {“month”: [“dec”, “jan”, “feb”]}
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True
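The example values above map directly onto the keyword arguments of subset. A minimal sketch, assuming daops is installed and the dataset identifier resolves in your data catalogue; SUBSET_KWARGS and run_example_subset are hypothetical names introduced here for illustration.

```python
# Keyword arguments mirroring the documented example for daops.ops.subset.subset.
SUBSET_KWARGS = {
    "collection": ("cmip6.ukesm1.r1.gn.tasmax.v20200101",),
    "time": ("1999-01-01T00:00:00", "2100-12-30T00:00:00"),
    "area": (-5.0, 49.0, 10.0, 65.0),
    "level": (1000.0,),
    "time_components": {"month": ["dec", "jan", "feb"]},
    "output_type": "netcdf",
    "output_dir": "/cache/wps/procs/req0111",
    "split_method": "time:auto",
    "file_namer": "standard",
    "apply_fixes": True,
}


def run_example_subset(**overrides):
    """Hypothetical helper: run the documented example, optionally overriding arguments."""
    # Imported lazily so this module can be loaded without daops installed.
    from daops.ops.subset import subset

    kwargs = {**SUBSET_KWARGS, **overrides}
    # Returns a list of file paths, or a list of xarray Datasets if output_type="xarray".
    return subset(**kwargs)
```

Passing an override such as run_example_subset(output_type="xarray") returns in-memory xarray Datasets rather than writing files to output_dir.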

Average operation

Operations for averaging data over dimensions, shape or time.

daops.ops.average.average_over_dims(collection, dims=None, ignore_undetected_dims=False, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)

Average input dataset according to indicated dimensions.

Can be averaged over multiple dimensions.

Parameters:
  • collection – Collection of datasets to process: a sequence of dataset identifiers, or a string of comma-separated dataset identifiers.

  • dims (list or None) – Dimensions to average over.

  • ignore_undetected_dims (bool) – If False, an exception will be raised if requested dims do not exist in the dataset. If True, missing dims will be ignored.

  • output_dir (str or path-like) – Directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

List of outputs in the selected type (a list of xarray Datasets or file paths).

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
dims: [“time”, “lat”]
ignore_undetected_dims: False
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True
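The ignore_undetected_dims behaviour described above can be sketched in plain Python. This is an illustration under assumptions, not daops code: resolve_dims is a hypothetical helper, and ValueError stands in for whatever exception daops actually raises.

```python
from typing import Iterable, List, Sequence


def resolve_dims(requested: Sequence[str], available: Iterable[str],
                 ignore_undetected_dims: bool = False) -> List[str]:
    """Return the requested dims that exist in the dataset.

    Missing dims raise an error unless ignore_undetected_dims is True,
    in which case they are silently dropped.
    """
    available_set = set(available)
    missing = [d for d in requested if d not in available_set]
    if missing and not ignore_undetected_dims:
        # Placeholder exception type; daops may raise a different one.
        raise ValueError(f"Requested dims not found in dataset: {missing}")
    return [d for d in requested if d in available_set]
```

For example, resolve_dims(["time", "level"], ["time", "lat", "lon"], ignore_undetected_dims=True) keeps only "time", whereas the same call with the default False raises.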

daops.ops.average.average_shape(collection, shape, variable=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)

Average input dataset over indicated shape.

Parameters:
  • collection – Collection of datasets to process: a sequence of dataset identifiers, or a string of comma-separated dataset identifiers.

  • shape – Path to a shape file, or a GeoDataFrame, defining the region to average within.

  • variable – Variables to average. If None, average over all data variables.

  • output_dir (str or path-like) – Directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

List of outputs in the selected type (a list of xarray Datasets or file paths).

Examples

collection: (“cmip6.cmip.cas.fgoals-g3.historical.r1i1p1f1.Amon.tas.gn.v20190818”,)
shape: “path_to_shape”
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True

daops.ops.average.average_time(collection, freq='year', output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)

Average input dataset according to indicated frequency.

Parameters:
  • collection – Collection of datasets to process: a sequence of dataset identifiers, or a string of comma-separated dataset identifiers.

  • freq ({“day”, “month”, “year”}) – Frequency to average over.

  • output_dir (str or path-like) – Directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

List of outputs in the selected type (a list of xarray Datasets or file paths).

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
freq: “month”
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True
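To make freq concrete: averaging with freq “month” is assumed to produce one mean per calendar month present in the data. The sketch below illustrates that grouping on plain (ISO 8601 timestamp, value) pairs. It is a conceptual illustration only; daops operates on xarray Datasets, and average_by_freq is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Leading characters of a zero-padded ISO 8601 timestamp that identify each period.
_PERIOD_PREFIX = {"year": 4, "month": 7, "day": 10}  # "YYYY", "YYYY-MM", "YYYY-MM-DD"


def average_by_freq(samples: Iterable[Tuple[str, float]], freq: str = "year") -> Dict[str, float]:
    """Average (timestamp, value) pairs within each year, month or day."""
    if freq not in _PERIOD_PREFIX:
        raise ValueError(f"freq must be one of {sorted(_PERIOD_PREFIX)}, got {freq!r}")
    n = _PERIOD_PREFIX[freq]
    groups = defaultdict(list)
    for timestamp, value in samples:
        groups[timestamp[:n]].append(value)  # group by the truncated timestamp
    # One mean per period, keyed by the period label and ordered chronologically.
    return {period: mean(values) for period, values in sorted(groups.items())}
```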

Utilities

Consolidate file paths for each dataset in a collection.

daops.utils.consolidate.consolidate(collection, **kwargs)

Find the file paths relating to each input dataset.

If a time range has been supplied then only the files relating to this time range are recorded.

Parameters:
  • collection (clisops.parameter.CollectionParameter) – The collection of datasets to process.

  • kwargs – Arguments of the operation taking place, e.g. subset, average, or regrid.

Returns:

An ordered dictionary of each dataset from the collection argument and the file paths relating to it.

daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)

Examine each file to see if it contains years that are in the requested range.

Uses the settings in time_param.

The time_param can have three types:
  1. type “interval”: defined with “start_time” and “end_time”

  2. type “series”: defined with a list of “time_values”

  3. type “none”: undefined

It attempts to filter out files that do not contain any of the requested years. Any file for which this cannot be determined is read with xarray instead.

Parameters:
  • time_param (TimeParameter) – time parameter of requested date/times

  • file_paths (list) – list of file paths

Returns:

file_paths (list) – filtered list of file paths
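A rough sketch of the “interval” case, assuming file names end in a CMIP-style _YYYYMM-YYYYMM.nc date range. The filename pattern and the helper names are assumptions for illustration. Files whose years cannot be parsed from the name are simply kept here, whereas daops falls back to reading them with xarray.

```python
import re
from typing import List, Optional, Set

# Assumed filename convention: ..._<start>-<end>.nc, e.g. tas_..._185001-201412.nc
_DATE_RANGE = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def years_from_filename(fpath: str) -> Optional[Set[int]]:
    """Return the set of years spanned by a file, or None if the name has no date range."""
    match = _DATE_RANGE.search(fpath)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return set(range(start, end + 1))


def filter_files_by_interval(file_paths: List[str], start_year: int, end_year: int) -> List[str]:
    """Keep files whose years overlap [start_year, end_year].

    Files whose years cannot be parsed are kept; a fuller implementation
    would open them and inspect their time axis instead.
    """
    requested = set(range(start_year, end_year + 1))
    kept = []
    for fpath in file_paths:
        years = years_from_filename(fpath)
        if years is None or years & requested:
            kept.append(fpath)
    return kept
```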

daops.utils.consolidate.get_year(value, default)

Get a year from a datetime string.

Defaults to the value of default if not defined.

daops.utils.consolidate.get_years_from_file(fpath)

Attempt to extract years from a file.

It first examines the file name. If that doesn’t work, it reads the file contents and looks at the time axis.

Returns a set of years.

daops.utils.consolidate.to_year(time_string)

Return the year in a time string as an integer.
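The behaviour of get_year and to_year can be sketched as follows, assuming time strings begin with a four-digit year as in the ISO 8601 examples above (e.g. “1999-01-01T00:00:00”). The _sketch names are hypothetical, and the real daops helpers may parse strings differently.

```python
from typing import Optional


def to_year_sketch(time_string: str) -> int:
    """Return the year at the start of an ISO 8601 time string as an integer."""
    return int(time_string.split("-")[0])


def get_year_sketch(value: Optional[str], default: int) -> int:
    """Return the year from value, or default if value is not defined."""
    return to_year_sketch(value) if value else default
```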

Utility functions for the DAOPS package.

class daops.utils.core.Characterised(dset)

Bases: Lookup

Characterisation lookup class to look up whether a dataset has been characterised.

lookup_characterisation()

Attempt to find datasets in the characterisation store.

Returns True if they exist in the store, returns False if not.

daops.utils.core.is_characterised(collection, require_all=False)

Take a collection (an individual data reference or a sequence of them).

Returns an ordered dictionary of the collection’s ids with a Boolean value for each, stating whether that dataset has been characterised.

If require_all is True, a single Boolean value is returned instead.

Parameters:
  • collection – one or more data references

  • require_all – Boolean. If True, require that all datasets are characterised.

Returns:

Ordered Dictionary OR Boolean (if require_all is True)
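The return-type rule above can be sketched as follows. The lookup callable is a hypothetical stand-in for the characterisation-store query, and is_characterised_sketch is not part of daops.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Union


def is_characterised_sketch(collection: Union[str, Iterable[str]],
                            lookup: Callable[[str], bool],
                            require_all: bool = False):
    """Map each dataset ID to whether it is characterised, or return one bool if require_all."""
    if isinstance(collection, str):
        collection = [collection]  # a single data reference is treated as a one-item collection
    results = OrderedDict((ds_id, bool(lookup(ds_id))) for ds_id in collection)
    if require_all:
        return all(results.values())
    return results
```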

daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)

Open an xarray Dataset and apply fixes if requested.

Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.

Parameters:
  • ds_id – Dataset identifier in the form of a DRS ID, e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga

  • file_paths (list) – The file paths corresponding to the ds_id.

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

xarray Dataset with fixes applied to the data.

Base class used for looking up datasets in the elasticsearch indexes.

class daops.utils.base_lookup.Lookup(dset)

Bases: object

Base class used for looking up datasets in the elasticsearch indexes.

convert_to_ds_id()

Convert the input dataset to DRS ID form for use with the Elasticsearch index.

Apply fixes to the input dataset from the Elasticsearch index.

class daops.utils.fixer.Fixer(dset)

Bases: Lookup

Fixer class to look up fixes to apply to the input dataset from the Elasticsearch index.

Gathers fixes into pre- and post-processors. Pre-process fixes are chained together to allow them to be executed with one call.

class daops.utils.fixer.FuncChainer(funcs)

Bases: object

Chains functions together to allow them to be executed in one call.
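The chaining idea can be sketched as a callable object that applies functions in order, feeding each result into the next. This is an illustration of the technique, not the FuncChainer source; ChainSketch is a hypothetical name.

```python
from typing import Any, Callable, Sequence


class ChainSketch:
    """Apply a sequence of single-argument functions in order, as one callable."""

    def __init__(self, funcs: Sequence[Callable[[Any], Any]]):
        self.funcs = list(funcs)

    def __call__(self, inputs: Any) -> Any:
        result = inputs
        for func in self.funcs:
            result = func(result)  # output of one step becomes the input of the next
        return result
```

A chained callable is useful wherever an API accepts a single function, for example the preprocess argument of xarray.open_mfdataset, which takes one function applied to each dataset as it is opened.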

Normalise datasets.

class daops.utils.normalise.ResultSet(inputs=None)

Bases: object

A class to hold the results from an operation, e.g. subset.

add(dset, result)

Add outputs to an ordered dictionary with the ds id as the key.

If the output is a file path this is also added to the file_paths variable so a list of file paths can be accessed independently.
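A minimal sketch of the bookkeeping described for add, assuming each result is a list of outputs (as the operations above return) and treating str or os.PathLike items as file paths. ResultSetSketch and its results attribute are hypothetical names, not the ResultSet internals.

```python
import os
from collections import OrderedDict
from typing import Any, Iterable, Optional


class ResultSetSketch:
    """Hold per-dataset outputs and, separately, every output that is a file path."""

    def __init__(self, inputs: Optional[Any] = None):
        self.inputs = inputs
        self.results = OrderedDict()  # ds id -> list of outputs, in insertion order
        self.file_paths = []          # flat list of all file-path outputs

    def add(self, dset: str, result: Iterable[Any]) -> None:
        result = list(result)
        self.results[dset] = result
        for item in result:
            if isinstance(item, (str, os.PathLike)):
                self.file_paths.append(item)
```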

daops.utils.normalise.normalise(collection, apply_fixes=True)

Take file paths, then open and fix the datasets they make up.

Parameters:
  • collection – Ordered dictionary of ds ids and their related file paths.

  • apply_fixes (bool) – If True, fixes will be applied to datasets if needed. Default is True.

Returns:

An ordered dictionary of ds ids and their fixed xarray Dataset.

Data Utilities

Module for editing the attributes of a dataset.

daops.data_utils.attr_utils.add_global_attrs_if_needed(ds_id, ds, **operands)

Add the global attrs, if needed.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new attributes for the dataset.

Returns:

xarray.Dataset

daops.data_utils.attr_utils.edit_global_attrs(ds_id, ds, **operands)

Edit the global attrs.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new attributes for the dataset.

Returns:

xarray.Dataset

daops.data_utils.attr_utils.edit_var_attrs(ds_id, ds, **operands)

Edit the variable attrs.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new attributes for the variable.

Returns:

xarray.Dataset

daops.data_utils.attr_utils.remove_coord_attr(ds_id, ds, **operands)

Remove the coordinate attr from the dataset.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new attributes for the dataset.

Returns:

xarray.Dataset

Common utility functions for data operations.

daops.data_utils.common_utils.handle_derive_str(value, ds_id, ds)

Handle the derive string.

Coordinate operations.

daops.data_utils.coord_utils.add_coord(ds_id, ds, **operands)

Add a coordinate.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new coordinate.

Returns:

xarray.Dataset

daops.data_utils.coord_utils.add_scalar_coord(ds_id, ds, **operands)

Add a scalar coordinate.

Parameters:
  • ds_id (str) – Dataset ID.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new coordinate.

Returns:

xarray.Dataset

daops.data_utils.coord_utils.squeeze_dims(ds_id, ds, **operands)

Squeeze dimensions from dataset.

Parameters:
  • ds_id (str) – Dataset ID. Unused in this function.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the dimensions to remove.

Returns:

xarray.Dataset

Module to add a data variable to a dataset.

daops.data_utils.var_utils.add_data_var(ds_id, ds, **operands)

Add a data variable.

Parameters:
  • ds_id (str) – Dataset ID. Unused in this function.

  • ds (xarray.Dataset) – A Dataset.

  • operands (dict) – Dictionary containing the new data variable.

Returns:

xarray.Dataset

Processor

Module to dispatch the processing operation to the correct mode (serial or parallel).

daops.processor.dispatch(operation, dset, **kwargs)

Dispatch the operation to the correct mode (serial or parallel).

daops.processor.process(operation, dset, mode='serial', **kwargs)

Run the processing operation on the dataset in the correct mode (in series or parallel).
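The serial/parallel distinction can be sketched with concurrent.futures from the standard library. The per-dataset loop, the use of threads, and the name process_sketch are assumptions for illustration; daops’s own parallel mode may be implemented differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def process_sketch(operation: Callable[..., Any], dsets: Iterable[str],
                   mode: str = "serial", **kwargs) -> Dict[str, Any]:
    """Run operation(dset, **kwargs) for each dataset, serially or in parallel."""
    dsets = list(dsets)
    if mode == "serial":
        return {dset: operation(dset, **kwargs) for dset in dsets}
    if mode == "parallel":
        with ThreadPoolExecutor() as pool:
            futures = {dset: pool.submit(operation, dset, **kwargs) for dset in dsets}
            # Collect results in input order; result() blocks until each task finishes.
            return {dset: future.result() for dset, future in futures.items()}
    raise ValueError(f"Unknown mode: {mode!r}")
```

Threads suit I/O-bound work such as reading files; CPU-bound operations would more likely use a process pool.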