API

Subset operation

daops.ops.subset.subset(collection, time=None, area=None, level=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard')[source]

Subset the input dataset according to the given parameters. Data can be subsetted by time, area and level.

Parameters
  • collection – Collection of datasets to process: a sequence of dataset identifiers, or a string of comma-separated identifiers.

  • time – Time range to subset over: a sequence of two time values, or a string of two “/”-separated time values.

  • area – Area to subset over: a sequence, or a string, of comma-separated lat and lon bounds. Must contain 4 values.

  • level – Level range to subset over: a sequence of two level values, or a string of two “/”-separated level values.

  • output_dir – str or path-like object describing the output directory for output files.

  • output_type – {“netcdf”, “nc”, “zarr”, “xarray”}

  • split_method – {“time:auto”}

  • file_namer – {“standard”, “simple”}

Returns

List of outputs of the selected type: a list of xarray Datasets or a list of file paths.

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
time: (“1999-01-01T00:00:00”, “2100-12-30T00:00:00”)
area: (-5.,49.,10.,65)
level: (1000.,)
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “simple”
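The sequence-or-string parameter forms accepted by subset can be illustrated with a small normalisation sketch. The helper names below are hypothetical (they are not part of daops); they only show how the two accepted input shapes map to the same values.

```python
# Hypothetical helpers (not part of daops) illustrating the two accepted
# input forms: a sequence of values, or a single delimited string.

def parse_area(area):
    """Normalise an area (sequence or comma-separated string) into a
    4-tuple of floats giving the lat/lon bounds."""
    if isinstance(area, str):
        area = area.split(",")
    values = tuple(float(v) for v in area)
    if len(values) != 4:
        raise ValueError("area must contain 4 values")
    return values

def parse_time(time):
    """Normalise a time range (sequence of two values, or a string of
    two '/'-separated values) into a (start, end) tuple."""
    if isinstance(time, str):
        time = time.split("/")
    start, end = time
    return (start, end)
```

For example, parse_area("-5.,49.,10.,65") and parse_area((-5., 49., 10., 65.)) both yield the same bounds.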

Utilities

daops.utils.consolidate.consolidate(collection, **kwargs)[source]

Finds the file paths relating to each input dataset. If a time range has been supplied then only the files relating to this time range are recorded.

Parameters
  • collection – (roocs_utils.CollectionParameter) The collection of datasets to process.

  • kwargs – Arguments of the operation taking place e.g. subset, average, or re-grid.

Returns

An ordered dictionary of each dataset from the collection argument and the file paths relating to it.
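The shape of the returned mapping, and the time filtering, can be sketched as follows. This is not the daops implementation: the file time spans are assumed to be known up front (in practice they are derived from the file names or metadata), and all names here are illustrative.

```python
from collections import OrderedDict

# Sketch of consolidate's output: an ordered dictionary mapping each
# dataset id to the file paths whose time span overlaps the requested
# time range. `collection` maps ds id -> [(path, (start, end)), ...].

def consolidate_sketch(collection, time=None):
    results = OrderedDict()
    for ds_id, files in collection.items():
        kept = []
        for path, (f_start, f_end) in files:
            # Keep a file if no time range was given, or its span
            # overlaps the requested range.
            if time is None or (f_end >= time[0] and f_start <= time[1]):
                kept.append(path)
        results[ds_id] = kept
    return results
```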

daops.utils.consolidate.convert_to_ds_id(dset)[source]

Converts the input dataset to DRS id form for use with the Elasticsearch index.

Parameters

dset – Dataset to process. Formats currently accepted are file paths and paths to directories.

Returns

The ds id for the input dataset.
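The core of the path-to-id conversion can be sketched as below. The real function consults project configuration to locate the base directory; the helper name and base path here are assumptions for illustration only.

```python
import os

# Hypothetical sketch: a directory layout such as
#   /data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga
# maps to the dot-separated DRS id
#   cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga

def path_to_ds_id(path, base="/data"):
    rel = os.path.relpath(path, base)
    return rel.replace(os.sep, ".")
```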

daops.utils.core.is_characterised(collection, require_all=False)[source]

Takes a collection (an individual data reference or a sequence of them) and returns an ordered dictionary mapping each dataset id to a boolean stating whether that dataset has been characterised.

If require_all is True, a single boolean value is returned instead.

Parameters
  • collection – one or more data references

  • require_all – Boolean to require that all must be characterised

Returns

Ordered Dictionary OR Boolean (if require_all is True)
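The two return shapes can be sketched as follows. The real function queries the characterisation store; here a stand-in set of “known good” ids replaces that lookup, and all names are illustrative.

```python
from collections import OrderedDict

# Stand-in for the real characterisation store.
CHARACTERISED = {"cmip5.a", "cmip5.b"}

def is_characterised_sketch(collection, require_all=False):
    if isinstance(collection, str):
        collection = [collection]
    # One boolean per dataset id, in input order.
    results = OrderedDict((ds_id, ds_id in CHARACTERISED) for ds_id in collection)
    if require_all:
        # Collapse to a single boolean: are ALL datasets characterised?
        return all(results.values())
    return results
```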

daops.utils.core.is_dataref_characterised(dset)[source]
daops.utils.core.open_dataset(ds_id, file_paths)[source]

Opens an xarray Dataset and applies fixes if required. Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.

Parameters
  • ds_id – Dataset identifier in the form of a drs id e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga

  • file_paths – (list) The file paths corresponding to the ds id.

Returns

xarray Dataset with fixes applied to the data.
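The pre/post fix flow described above can be sketched as follows. This is not the daops implementation: a plain dict stands in for the xarray Dataset, and the function and fix names are assumptions for illustration.

```python
# Sketch of the fix flow: "pre" fixes transform the inputs before the
# dataset is opened; "post" fixes transform the opened dataset.
# `fixes` is a list of (callable, stage) pairs.

def open_dataset_sketch(file_paths, fixes):
    pre = [f for f, stage in fixes if stage == "pre"]
    post = [f for f, stage in fixes if stage == "post"]
    for fix in pre:
        file_paths = fix(file_paths)
    ds = {"files": list(file_paths)}  # stand-in for xr.open_mfdataset(...)
    for fix in post:
        ds = fix(ds)
    return ds
```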

class daops.utils.fixer.Fixer(ds_id)[source]

Bases: object

Fixer class to look up fixes to apply to the input dataset from the Elasticsearch index. Gathers fixes into pre- and post-processors. Pre-process fixes are chained together so that they can be executed with one call.

class daops.utils.fixer.FuncChainer(funcs)[source]

Bases: object

Chains functions together to allow them to be executed in one call.
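The chaining idea can be shown with a minimal sketch (the class name and internals here are illustrative, not the daops implementation):

```python
# Minimal sketch of function chaining: compose a list of functions so
# they run left-to-right in a single call.

class FuncChainerSketch:
    def __init__(self, funcs):
        self.funcs = funcs

    def __call__(self, value):
        # Feed each function's output into the next.
        for func in self.funcs:
            value = func(value)
        return value
```

For example, FuncChainerSketch([f, g])(x) computes g(f(x)) in one call.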

class daops.utils.normalise.ResultSet(inputs=None)[source]

Bases: object

A class to hold the results from an operation, e.g. subset.

add(dset, result)[source]

Adds outputs to an ordered dictionary with the ds id as the key. If an output is a file path, it is also appended to the file_paths attribute so that the list of file paths can be accessed independently.
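The behaviour of add() can be sketched as follows. This is not the daops implementation; in particular, treating any string result as a file path is a simplifying assumption for illustration.

```python
from collections import OrderedDict

# Minimal sketch of ResultSet: results keyed by ds id, with file-path
# outputs additionally collected in a flat file_paths list.

class ResultSetSketch:
    def __init__(self, inputs=None):
        self.inputs = inputs
        self.results = OrderedDict()
        self.file_paths = []

    def add(self, dset, result):
        self.results[dset] = result
        if isinstance(result, str):  # treat string results as file paths
            self.file_paths.append(result)
```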

daops.utils.normalise.normalise(collection)[source]

Opens the datasets given by the file paths and applies any required fixes.

Parameters

collection – Ordered dictionary of ds ids and their related file paths.

Returns

An ordered dictionary of ds ids and their fixed xarray Dataset.
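The shape of normalise() can be sketched as a mapping step over the consolidated collection. The opener argument here stands in for daops.utils.core.open_dataset; the sketch itself is illustrative, not the daops code.

```python
from collections import OrderedDict

# Sketch: for each ds id in the consolidated collection, open its files
# (applying fixes) and keep the ordered-dict shape of the input.

def normalise_sketch(collection, opener):
    return OrderedDict(
        (ds_id, opener(ds_id, file_paths))
        for ds_id, file_paths in collection.items()
    )
```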

Data Utilities

daops.data_utils.coord_utils.add_scalar_coord(ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – (dict) Arguments for fix. Id, value and data type of scalar coordinate to add.

Returns

Xarray Dataset

daops.data_utils.coord_utils.squeeze_dims(ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – (dict) Arguments for fix. Dims (list) to remove.

Returns

Xarray Dataset
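The calling convention shared by these data_utils fixes, a dataset plus keyword operands describing the change, can be sketched as below. A dict of dim name to size stands in for the xarray Dataset, and the function name is illustrative.

```python
# Sketch of the data_utils fix signature: dataset in, **operands
# describing the change. This mirrors squeeze_dims by dropping the
# listed dims from a stand-in dataset mapping dim -> size.

def squeeze_dims_sketch(ds, **operands):
    dims_to_drop = operands.get("dims", [])
    return {d: s for d, s in ds.items() if d not in dims_to_drop}
```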

Processor

daops.processor.dispatch(operation, dset, **kwargs)[source]
daops.processor.process(operation, dset, mode='serial', **kwargs)[source]

Runs the processing operation on the dataset in the requested mode (serial or parallel).
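The mode switch can be sketched as follows; the function names and the serial-only behaviour are assumptions for illustration, not the daops internals (which also support a parallel mode).

```python
# Sketch of the dispatch/process pair: "serial" applies the operation
# directly; a parallel scheduler would plug in for other modes.

def dispatch_sketch(operation, dset, **kwargs):
    return operation(dset, **kwargs)

def process_sketch(operation, dset, mode="serial", **kwargs):
    if mode == "serial":
        return dispatch_sketch(operation, dset, **kwargs)
    raise NotImplementedError(f"mode {mode!r} not supported in this sketch")
```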