API

Subset operation

daops.ops.subset.subset(collection, time=None, area=None, level=None, time_components=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]

Subset the input dataset according to the given parameters. Data can be subsetted by level, area and time.

Parameters
  • collection – Collection of datasets to process: a sequence, or a string of comma-separated dataset identifiers.

  • time – Time interval (defined by start/end) or time series (a sequence of datetime values) to subset over. Datetimes are typically provided as strings.

  • area – Area to subset over: a sequence, or a string of comma-separated lat and lon bounds. Must contain 4 values.

  • level – Level interval (defined by start/end) or level series (a sequence of values) to subset over. Levels are typically provided as integers or floats.

  • time_components – Time components to filter on: year, month, day, hour, minute, second.

  • output_dir – str or path-like object describing the output directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.

Returns

List of outputs in the selected type: a list of xarray Datasets or file paths.

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
time: (“1999-01-01T00:00:00”, “2100-12-30T00:00:00”)
area: (-5., 49., 10., 65.)
level: (1000.,)
time_components: {“month”: [“dec”, “jan”, “feb”]}
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True
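The `collection` and `area` parameters above accept either a sequence or a comma-separated string. The actual parsing lives in the roocs_utils parameter classes; the helper names below are hypothetical, a sketch of the accepted input shapes only.

```python
# Illustrative sketch only: daops delegates this parsing to roocs_utils.
# `normalise_collection` and `normalise_area` are hypothetical helpers.

def normalise_collection(collection):
    """Accept a sequence of dataset ids or a comma-separated string."""
    if isinstance(collection, str):
        return [item.strip() for item in collection.split(",")]
    return list(collection)


def normalise_area(area):
    """Accept a 4-value sequence, or a comma-separated string, of lat/lon bounds."""
    if isinstance(area, str):
        area = area.split(",")
    values = [float(v) for v in area]
    if len(values) != 4:
        raise ValueError("area must contain exactly 4 values")
    return tuple(values)


print(normalise_collection("cmip6.ukesm1.r1.gn.tasmax.v20200101"))
print(normalise_area("-5.,49.,10.,65"))
```

Both forms shown in the example parameters above (tuples and strings) normalise to the same internal values.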

Average operation

daops.ops.average.average_over_dims(collection, dims=None, ignore_undetected_dims=False, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]

Average the input dataset over the indicated dimensions. Multiple dimensions can be averaged over.

Parameters
  • collection – Collection of datasets to process: a sequence, or a string of comma-separated dataset identifiers.

  • dims – List of dimensions to average over, or None.

  • ignore_undetected_dims – Boolean. If False, an exception will be raised if requested dims do not exist in the dataset. If True, missing dims will be ignored.

  • output_dir – str or path-like object describing the output directory for output files.

  • output_type ({“netcdf”, “nc”, “zarr”, “xarray”})

  • split_method ({“time:auto”})

  • file_namer ({“standard”, “simple”})

  • apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.

Returns

List of outputs in the selected type: a list of xarray Datasets or file paths.

Examples

collection: (“cmip6.ukesm1.r1.gn.tasmax.v20200101”,)
dims: [“time”, “lat”]
ignore_undetected_dims: False
output_type: “netcdf”
output_dir: “/cache/wps/procs/req0111”
split_method: “time:auto”
file_namer: “standard”
apply_fixes: True
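The effect of `ignore_undetected_dims` can be sketched in isolation. This is not the daops source, just an illustrative stand-in showing how the flag plausibly switches between raising and filtering; `resolve_dims` is a hypothetical name.

```python
# Illustrative sketch (not daops source): how `ignore_undetected_dims`
# affects which requested dimensions are averaged over.

def resolve_dims(requested, dataset_dims, ignore_undetected_dims=False):
    """Return the requested dims present in the dataset.

    If any requested dim is missing and the flag is False, raise;
    if the flag is True, silently drop the missing dims.
    """
    missing = [d for d in requested if d not in dataset_dims]
    if missing and not ignore_undetected_dims:
        raise KeyError(f"Dimensions not found in dataset: {missing}")
    return [d for d in requested if d in dataset_dims]


print(resolve_dims(["time", "lat"], {"time", "lat", "lon"}))
print(resolve_dims(["time", "level"], {"time", "lat"}, ignore_undetected_dims=True))
```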

Utilities

daops.utils.consolidate.consolidate(collection, **kwargs)[source]

Finds the file paths relating to each input dataset. If a time range has been supplied then only the files relating to this time range are recorded.

Parameters
  • collection – (roocs_utils.CollectionParameter) The collection of datasets to process.

  • kwargs – Arguments of the operation taking place e.g. subset, average, or re-grid.

Returns

An ordered dictionary of each dataset from the collection argument and the file paths relating to it.
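The shape of that return value can be sketched with plain Python. The real function discovers paths from the datasets themselves; here a `path_index` lookup table and the function name stand in as illustrative assumptions.

```python
# Illustrative sketch of consolidate's output shape: an ordered dict
# mapping each dataset id to the file paths relating to it.
# `path_index` is a hypothetical pre-built lookup, not part of daops.
from collections import OrderedDict


def consolidate_sketch(ds_ids, path_index):
    consolidated = OrderedDict()
    for ds_id in ds_ids:
        consolidated[ds_id] = sorted(path_index.get(ds_id, []))
    return consolidated


index = {"ds1": ["/data/ds1_b.nc", "/data/ds1_a.nc"]}
print(consolidate_sketch(["ds1", "ds2"], index))
```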

daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)[source]

Using the settings in time_param, examine each file to see if it contains years that are in the requested range.

The time_param can have three types:
  1. type: “interval” – defined by “start_time” and “end_time”

  2. type: “series” – defined by a list of “time_values”

  3. type: “none” – undefined

It attempts to filter out files that do not match the selected years. Any file for which this cannot be determined from the name will be read by xarray.

Parameters
  • time_param (TimeParameter) – time parameter of requested date/times

  • file_paths (list) – list of file paths

Returns

file_paths (list) – filtered list of file paths
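The "interval" case can be illustrated with a filename-based filter. The CMIP-style `YYYYMM-YYYYMM.nc` suffix and the function name below are assumptions for the sketch; the real function uses the `TimeParameter` settings and falls back to reading the file with xarray.

```python
# Illustrative sketch: keep files whose filename year range overlaps
# a requested interval. Filenames are assumed to end 'YYYYMM-YYYYMM.nc'.
import re


def files_matching_interval(file_paths, start_year, end_year):
    kept = []
    for path in file_paths:
        m = re.search(r"(\d{4})\d{2}-(\d{4})\d{2}\.nc$", path)
        if m is None:
            # Cannot tell from the name: keep the file so xarray can check it.
            kept.append(path)
            continue
        file_start, file_end = int(m.group(1)), int(m.group(2))
        if file_end >= start_year and file_start <= end_year:
            kept.append(path)
    return kept


paths = ["tasmax_185001-190012.nc", "tasmax_190101-195012.nc"]
print(files_matching_interval(paths, 1950, 1960))
```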

daops.utils.consolidate.get_year(value, default)[source]

Gets the year from a datetime string. Returns the value of default if the year is not defined.

daops.utils.consolidate.get_years_from_file(fpath)[source]

Attempts to extract years from a file: first by examining the file name; if that fails, it reads the file contents and looks at the time axis.

Returns a set of years.

daops.utils.consolidate.to_year(time_string)[source]

Returns the year in a time string as an integer.
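For ISO-style time strings like those in the subset example, the behaviour of these two helpers can be sketched minimally. This is an assumed implementation, not the daops source.

```python
# Illustrative sketches of to_year and get_year, assuming ISO-style
# time strings such as "1999-01-01T00:00:00".

def to_year(time_string):
    """Return the year in a time string as an integer."""
    return int(time_string.split("-")[0])


def get_year(value, default):
    """Get a year from a datetime string, falling back to `default`."""
    try:
        return to_year(str(value))
    except ValueError:
        return default


print(to_year("1999-01-01T00:00:00"))
print(get_year(None, 1850))
```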

class daops.utils.core.Characterised(dset)[source]

Bases: daops.utils.base_lookup.Lookup

Characterisation lookup class to look up whether a dataset has been characterised.

lookup_characterisation()[source]

Attempts to find datasets in the characterisation store. Returns True if they exist in the store, returns False if not.

daops.utils.core.is_characterised(collection, require_all=False)[source]

Takes in a collection (an individual data reference or a sequence of them). Returns an ordered dictionary mapping each dataset id in the collection to a boolean stating whether that dataset has been characterised.

If require_all is True: return a single Boolean value.

Parameters
  • collection – one or more data references

  • require_all – Boolean to require that all must be characterised

Returns

Ordered Dictionary OR Boolean (if require_all is True)
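The `require_all` switch amounts to a reduction over the per-dataset booleans. The sketch below assumes the lookup results are already in hand; `reduce_characterised` is a hypothetical name for the final step only.

```python
# Illustrative sketch of the require_all behaviour: either return the
# per-dataset booleans, or collapse them to a single Boolean.
from collections import OrderedDict


def reduce_characterised(results, require_all=False):
    if require_all:
        return all(results.values())
    return results


lookups = OrderedDict([("ds1", True), ("ds2", False)])
print(reduce_characterised(lookups))
print(reduce_characterised(lookups, require_all=True))
```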

daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)[source]

Opens an xarray Dataset and applies fixes if requested. Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.

Parameters
  • ds_id – Dataset identifier in the form of a drs id e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga

  • file_paths – (list) The file paths corresponding to the ds id.

  • apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns

xarray Dataset with fixes applied to the data.
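The pre/post ordering described above can be sketched generically: pre-processor fixes act on the inputs before opening, post-processors act on the opened dataset. The function name and the dict standing in for an xarray Dataset are illustrative assumptions, not the daops implementation.

```python
# Illustrative sketch of pre/post fix ordering in open_dataset.
# A plain dict stands in for the xarray Dataset.

def open_with_fixes(file_paths, pre_processors, post_processors):
    paths = list(file_paths)
    for fix in pre_processors:        # applied before the dataset is opened
        paths = fix(paths)
    dataset = {"source": paths}       # stand-in for xr.open_mfdataset(paths)
    for fix in post_processors:       # applied to the opened dataset
        dataset = fix(dataset)
    return dataset


pre = [sorted]
post = [lambda ds: {**ds, "fixed": True}]
print(open_with_fixes(["b.nc", "a.nc"], pre, post))
```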

class daops.utils.base_lookup.Lookup(dset)[source]

Bases: object

Base class used for looking up datasets in the elasticsearch indexes.

convert_to_ds_id()[source]

Converts the input dataset to a drs id form to use with the elasticsearch index.

class daops.utils.fixer.Fixer(dset)[source]

Bases: daops.utils.base_lookup.Lookup

Fixer class to look up fixes to apply to input dataset from the elastic search index. Gathers fixes into pre and post processors. Pre-process fixes are chained together to allow them to be executed with one call.

class daops.utils.fixer.FuncChainer(funcs)[source]

Bases: object

Chains functions together to allow them to be executed in one call.
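The chaining idea is simple enough to re-create in a few lines. This is an illustrative re-creation under an assumed single-argument calling convention, not the daops class itself.

```python
# Illustrative re-creation of the FuncChainer idea: wrap a list of
# functions so they execute in sequence with one call.

class FuncChainerSketch:
    def __init__(self, funcs):
        self.funcs = funcs

    def __call__(self, value):
        for func in self.funcs:
            value = func(value)
        return value


chain = FuncChainerSketch([lambda x: x + 1, lambda x: x * 2])
print(chain(3))  # (3 + 1) * 2 = 8
```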

class daops.utils.normalise.ResultSet(inputs=None)[source]

Bases: object

A class to hold the results from an operation, e.g. subset.

add(dset, result)[source]

Adds outputs to an ordered dictionary with the ds id as the key. If the output is a file path, it is also added to the file_paths attribute so that the list of file paths can be accessed independently.
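The dual bookkeeping in add can be sketched like this. Treating any string result as a file path is an assumption for the sketch; the class name is hypothetical.

```python
# Illustrative sketch of the ResultSet.add bookkeeping: keyed outputs
# plus a flat file_paths list for outputs that are file paths.
from collections import OrderedDict


class ResultSetSketch:
    def __init__(self):
        self.results = OrderedDict()
        self.file_paths = []

    def add(self, ds_id, result):
        self.results.setdefault(ds_id, []).append(result)
        if isinstance(result, str):   # sketch: strings stand in for file paths
            self.file_paths.append(result)


rs = ResultSetSketch()
rs.add("ds1", "/cache/out1.nc")
rs.add("ds1", {"in_memory": True})
print(rs.file_paths)
```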

daops.utils.normalise.normalise(collection, apply_fixes=True)[source]

Takes file paths, then opens and fixes the datasets they make up.

Parameters
  • collection – Ordered dictionary of ds ids and their related file paths.

  • apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns

An ordered dictionary of ds ids and their fixed xarray Dataset.

Data Utilities

daops.data_utils.attr_utils.add_global_attrs_if_needed(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Add a global attribute if it doesn’t already exist.

daops.data_utils.attr_utils.edit_global_attrs(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Change the global attributes.

daops.data_utils.attr_utils.edit_var_attrs(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Change the attributes of a variable.

daops.data_utils.attr_utils.remove_coord_attr(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Remove the coordinate attribute that is added by xarray, for specified variables.

daops.data_utils.coord_utils.add_coord(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Add a coordinate.

daops.data_utils.coord_utils.add_scalar_coord(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Add a scalar coordinate.

daops.data_utils.coord_utils.squeeze_dims(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – (dict) Arguments for fix. Dims (list) to remove.

Returns

Xarray Dataset

Squeeze out the dimensions specified in operands.

daops.data_utils.var_utils.add_data_var(ds_id, ds, **operands)[source]
Parameters
  • ds – Xarray Dataset

  • operands – sequence of arguments

Returns

Xarray Dataset

Add a data variable.

Processor

daops.processor.dispatch(operation, dset, **kwargs)[source]
daops.processor.process(operation, dset, mode='serial', **kwargs)[source]

Runs the processing operation on the dataset in the requested mode (serial or parallel).
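The serial/parallel split can be sketched generically. The name below is hypothetical and the parallel branch uses a thread pool purely for illustration; it is not a claim about how daops parallelises work.

```python
# Illustrative sketch of a serial-vs-parallel dispatch over datasets.
from concurrent.futures import ThreadPoolExecutor


def process_sketch(operation, dsets, mode="serial"):
    if mode == "serial":
        return [operation(d) for d in dsets]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(operation, dsets))


print(process_sketch(lambda x: x * 2, [1, 2, 3]))
print(process_sketch(lambda x: x + 1, [1, 2], mode="parallel"))
```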