API
Subset operation
Subset operation.
- daops.ops.subset.subset(collection, time=None, area=None, level=None, time_components=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Subset input dataset according to parameters.
Can be subsetted by level, area, and time.
- Parameters:
collection – Collection of datasets to process: a sequence or a string of comma-separated dataset identifiers.
time – Time interval (defined by start/end) or time series (a sequence of datetime values) to subset over. Datetimes are typically provided as strings.
area – Area to subset over: a sequence or a string of comma-separated lat and lon bounds. Must contain 4 values.
level – Level interval (defined by start/end) or level series (a sequence of values) to subset over. Levels are typically provided as integers or floats.
time_components – Time components to filter on: year, month, day, hour, minute, second.
output_dir – str or path-like object describing the output directory for output files.
output_type – {"netcdf", "nc", "zarr", "xarray"}
split_method – {"time:auto"}
file_namer – {"standard", "simple"}
apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.
- Returns:
List of outputs in the selected type (a list of xarray Datasets or file paths.)
Examples
collection: ("cmip6.ukesm1.r1.gn.tasmax.v20200101",)
time: ("1999-01-01T00:00:00", "2100-12-30T00:00:00")
area: (-5.,49.,10.,65)
level: (1000.,)
time_components: {"month": ["dec", "jan", "feb"]}
output_type: "netcdf"
output_dir: "/cache/wps/procs/req0111"
split_method: "time:auto"
file_namer: "standard"
apply_fixes: True
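The example values above can be passed as keyword arguments. The following is a minimal sketch, assuming daops is installed and the dataset identifier resolves to files on the local system; `run_subset` is a hypothetical wrapper (not part of daops) that defers the import so the arguments can be inspected without the package.

```python
# Keyword arguments taken from the documented example values.
subset_kwargs = {
    "collection": ("cmip6.ukesm1.r1.gn.tasmax.v20200101",),
    "time": ("1999-01-01T00:00:00", "2100-12-30T00:00:00"),
    "area": (-5.0, 49.0, 10.0, 65.0),
    "level": (1000.0,),
    "time_components": {"month": ["dec", "jan", "feb"]},
    "output_type": "netcdf",
    "output_dir": "/cache/wps/procs/req0111",
    "split_method": "time:auto",
    "file_namer": "standard",
    "apply_fixes": True,
}


def run_subset(**kwargs):
    """Hypothetical wrapper: import daops lazily and call subset."""
    from daops.ops.subset import subset  # requires daops to be installed

    return subset(**kwargs)


# The area bounds must contain exactly four values.
assert len(subset_kwargs["area"]) == 4
```

Calling `run_subset(**subset_kwargs)` would then return the list of outputs described under Returns, provided the data is available.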
Average operation
Operations for averaging data over dimensions, shape or time.
- daops.ops.average.average_over_dims(collection, dims=None, ignore_undetected_dims=False, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Average input dataset according to indicated dimensions.
Can be averaged over multiple dimensions.
- Parameters:
collection – Collection of datasets to process: a sequence or a string of comma-separated dataset identifiers.
dims – List of dims to average over, or None.
ignore_undetected_dims – Boolean. If False, an exception will be raised if the requested dims do not exist in the dataset. If True, missing dims will be ignored.
output_dir – str or path-like object describing the output directory for output files.
output_type – {"netcdf", "nc", "zarr", "xarray"}
split_method – {"time:auto"}
file_namer – {"standard", "simple"}
apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.
- Returns:
List of outputs in the selected type (a list of xarray Datasets or file paths.)
Examples
collection: ("cmip6.ukesm1.r1.gn.tasmax.v20200101")
dims: ["time", "lat"]
ignore_undetected_dims: False
output_type: "netcdf"
output_dir: "/cache/wps/procs/req0111"
split_method: "time:auto"
file_namer: "standard"
apply_fixes: True
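The effect of `ignore_undetected_dims` can be illustrated without any data. This is a sketch of the documented flag semantics only: `resolve_dims` is a hypothetical helper, and `ValueError` is a placeholder for whatever exception daops actually raises.

```python
def resolve_dims(requested, available, ignore_undetected_dims=False):
    """Return the requested dims that exist in `available`.

    Hypothetical helper: mirrors the documented flag semantics only.
    """
    missing = [dim for dim in requested if dim not in available]
    if missing and not ignore_undetected_dims:
        # Placeholder exception type; the real one is not documented here.
        raise ValueError(f"Requested dimensions not found: {missing}")
    return [dim for dim in requested if dim in available]


available = ["time", "lat", "lon"]
print(resolve_dims(["time", "lat"], available))          # ['time', 'lat']
print(resolve_dims(["time", "level"], available, True))  # ['time']
```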
- daops.ops.average.average_shape(collection, shape, variable=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Average input dataset over indicated shape.
- Parameters:
collection – Collection of datasets to process: a sequence or a string of comma-separated dataset identifiers.
shape – Path to a shape file, or directly a geodataframe to perform the average within.
variable – Variables to average. If None, average over all data variables.
output_dir – str or path-like object describing the output directory for output files.
output_type – {"netcdf", "nc", "zarr", "xarray"}
split_method – {"time:auto"}
file_namer – {"standard", "simple"}
apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.
- Returns:
List of outputs in the selected type (a list of xarray Datasets or file paths.)
Examples
collection: ("cmip6.cmip.cas.fgoals-g3.historical.r1i1p1fi.Amon.tas.gn.v20190818")
shape: "path_to_shape"
output_type: "netcdf"
output_dir: "/cache/wps/procs/req0111"
split_method: "time:auto"
file_namer: "standard"
apply_fixes: True
- daops.ops.average.average_time(collection, freq='year', output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Average input dataset according to indicated frequency.
- Parameters:
collection – Collection of datasets to process: a sequence or a string of comma-separated dataset identifiers.
freq – Frequency to average over: {"day", "month", "year"}
output_dir – str or path-like object describing the output directory for output files.
output_type – {"netcdf", "nc", "zarr", "xarray"}
split_method – {"time:auto"}
file_namer – {"standard", "simple"}
apply_fixes – Boolean. If True, fixes will be applied to datasets if needed. Default is True.
- Returns:
List of outputs in the selected type (a list of xarray Datasets or file paths.)
Examples
collection: ("cmip6.ukesm1.r1.gn.tasmax.v20200101",)
freq: "month"
output_type: "netcdf"
output_dir: "/cache/wps/procs/req0111"
split_method: "time:auto"
file_namer: "standard"
apply_fixes: True
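What averaging at a given frequency means can be shown on plain values. The sketch below is an illustration of the grouping idea only, not how daops computes the result (the real function works on xarray datasets); `average_by_freq` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# How much of the timestamp identifies a group at each frequency.
_KEYS = {
    "day": lambda t: (t.year, t.month, t.day),
    "month": lambda t: (t.year, t.month),
    "year": lambda t: (t.year,),
}


def average_by_freq(samples, freq="year"):
    """Average (datetime, value) pairs within each period of `freq`."""
    if freq not in _KEYS:
        raise ValueError(f"freq must be one of {sorted(_KEYS)}")
    groups = defaultdict(list)
    for when, value in samples:
        groups[_KEYS[freq](when)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


samples = [
    (datetime(2000, 1, 1), 1.0),
    (datetime(2000, 1, 2), 3.0),
    (datetime(2000, 2, 1), 5.0),
]
print(average_by_freq(samples, "month"))  # {(2000, 1): 2.0, (2000, 2): 5.0}
print(average_by_freq(samples, "year"))   # {(2000,): 3.0}
```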
Utilities
Consolidate file paths for each dataset in a collection.
- daops.utils.consolidate.consolidate(collection, **kwargs)[source]
Find the file paths relating to each input dataset.
If a time range has been supplied then only the files relating to this time range are recorded.
- Parameters:
collection – (clisops.parameter.CollectionParameter) The collection of datasets to process.
kwargs – Arguments of the operation taking place e.g. subset, average, or re-grid.
- Returns:
An ordered dictionary of each dataset from the collection argument and the file paths relating to it.
- daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)[source]
Examine each file to see if it contains years that are in the requested range.
Uses the settings in time_param.
- The time_param can have three types:
type: “interval”: - defined with “start_time” and “end_time”
type: “series”: - defined with a list of “time_values”
type: “none”: - undefined
It attempts to filter out files that do not match the selected year. For any file that we cannot do this with, the file will be read by xarray.
- Parameters:
time_param (TimeParameter) – time parameter of requested date/times
file_paths (list) – list of file paths
- Returns:
file_paths (list) – filtered list of file paths
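The filtering step can be sketched as follows. This is a simplified illustration under stated assumptions: `filter_by_years` and `years_for` are hypothetical names, and files whose years cannot be determined are simply kept here rather than opened with xarray.

```python
def filter_by_years(file_paths, requested_years, years_for):
    """Keep files whose years overlap `requested_years`.

    `years_for` maps a path to a set of years, or None when the years
    cannot be determined; such files are kept so they can be opened later.
    """
    kept = []
    for path in file_paths:
        years = years_for(path)
        if years is None or years & requested_years:
            kept.append(path)
    return kept


# Hypothetical lookup table standing in for real file inspection.
known = {
    "tas_1850-1899.nc": set(range(1850, 1900)),
    "tas_1900-1949.nc": set(range(1900, 1950)),
}
paths = ["tas_1850-1899.nc", "tas_1900-1949.nc", "tas_unknown.nc"]
print(filter_by_years(paths, {1901, 1902}, known.get))
# ['tas_1900-1949.nc', 'tas_unknown.nc']
```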
- daops.utils.consolidate.get_year(value, default)[source]
Get a year from a datetime string.
Defaults to the value of default if not defined.
- daops.utils.consolidate.get_years_from_file(fpath)[source]
Attempt to extract years from a file.
First by examining the file name. If that doesn’t work then it reads the file contents and looks at the time axis.
Returns a set of years.
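The file-name step might look like the sketch below. It assumes file names end in a `<start>-<end>` date range whose first four digits are the year (as in CMIP-style names); the pattern and `years_from_name` are illustrative assumptions, and the fallback of reading the time axis is left to the caller.

```python
import os
import re

# Assumed pattern: a trailing "<start>-<end>" date range whose first four
# digits are the year, e.g. "..._185001-201412.nc".
_RANGE = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def years_from_name(fpath):
    """Return the set of years spanned by the file name, or None."""
    match = _RANGE.search(os.path.basename(fpath))
    if match is None:
        return None  # caller would fall back to reading the time axis
    start, end = int(match.group(1)), int(match.group(2))
    return set(range(start, end + 1))


name = "tas_Amon_model_historical_r1i1p1f1_gn_200001-200212.nc"
print(sorted(years_from_name(name)))    # [2000, 2001, 2002]
print(years_from_name("tas_fixed.nc"))  # None
```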
- daops.utils.consolidate.to_year(time_string)[source]
Return the year in a time string as an integer.
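A minimal sketch of the two year helpers, assuming time strings start with a four-digit year followed by a hyphen (for example "1999-01-01T00:00:00"). The function names and bodies below are illustrative, not the library source.

```python
def to_year_sketch(time_string):
    """Return the year in a time string as an integer."""
    # Assumes the string starts with the year, e.g. "1999-01-01T00:00:00".
    return int(time_string.split("-")[0])


def get_year_sketch(value, default):
    """Return the year from `value`, or `default` when `value` is unset."""
    if value:
        return to_year_sketch(value)
    return default


print(to_year_sketch("1999-01-01T00:00:00"))  # 1999
print(get_year_sketch(None, 1850))            # 1850
```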
Utility functions for the DAOPS package.
- class daops.utils.core.Characterised(dset)[source]
Bases: Lookup
Characterisation lookup class to look up whether a dataset has been characterised.
- lookup_characterisation()[source]
Attempt to find datasets in the characterisation store.
Returns True if they exist in the store, returns False if not.
- daops.utils.core.is_characterised(collection, require_all=False)[source]
Take a collection (an individual data reference or a sequence of them).
Returns an ordered dictionary mapping each id in the collection to a Boolean stating whether that dataset has been characterised.
If require_all is True, returns a single Boolean value instead.
- Parameters:
collection – one or more data references
require_all – Boolean to require that all must be characterised
- Returns:
Ordered Dictionary OR Boolean (if require_all is True)
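The two return shapes can be illustrated with a stand-in for the characterisation store. In this sketch, `is_characterised_sketch` and the `store` set are hypothetical; the real function queries the store itself.

```python
from collections import OrderedDict


def is_characterised_sketch(collection, store, require_all=False):
    """Report which ids are in `store` (a stand-in for the real lookup)."""
    if isinstance(collection, str):
        collection = [collection]  # a single data reference
    result = OrderedDict((ds_id, ds_id in store) for ds_id in collection)
    if require_all:
        return all(result.values())
    return result


store = {"ds.a"}  # hypothetical set of characterised ids
print(dict(is_characterised_sketch(["ds.a", "ds.b"], store)))
# {'ds.a': True, 'ds.b': False}
print(is_characterised_sketch(["ds.a", "ds.b"], store, require_all=True))
# False
```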
- daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)[source]
Open an xarray Dataset and apply fixes if requested.
Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.
- Parameters:
ds_id – Dataset identifier in the form of a drs id e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga
file_paths – (list) The file paths corresponding to the ds id.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.
- Returns:
xarray Dataset with fixes applied to the data.
Base class used for looking up datasets in the elasticsearch indexes.
- class daops.utils.base_lookup.Lookup(dset)[source]
Bases: object
Base class used for looking up datasets in the elasticsearch indexes.
- convert_to_ds_id()[source]
Convert the input dataset to a drs id form to use with the elasticsearch index.
Apply fixes to input dataset from the elastic search index.
- class daops.utils.fixer.Fixer(dset)[source]
Bases: Lookup
Fixer class to look up fixes to apply to input dataset from the elastic search index.
Gathers fixes into pre- and post-processors. Pre-process fixes are chained together to allow them to be executed with one call.
- class daops.utils.fixer.FuncChainer(funcs)[source]
Bases: object
Chains functions together to allow them to be executed in one call.
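Function chaining of this kind can be sketched in a few lines. `ChainSketch` is a hypothetical stand-in, and the assumption that functions are applied in list order, each receiving the previous result, is not confirmed by the text above.

```python
class ChainSketch:
    """Call a list of single-argument functions as if they were one."""

    def __init__(self, funcs):
        self.funcs = list(funcs)

    def __call__(self, value):
        # Assumption: apply in list order, feeding each result forward.
        for func in self.funcs:
            value = func(value)
        return value


chain = ChainSketch([str.strip, str.upper])
print(chain("  tas "))  # TAS
```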
Normalise datasets.
- class daops.utils.normalise.ResultSet(inputs=None)[source]
Bases: object
A class to hold the results from an operation e.g. subset.
- add(dset, result)[source]
Add outputs to an ordered dictionary with the ds id as the key.
If the output is a file path this is also added to the file_paths variable so a list of file paths can be accessed independently.
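The behaviour of `add` can be sketched with a small container. `ResultSetSketch` is hypothetical, and treating any string or path-like item as a file path is an assumption made for illustration; the real class may check this differently.

```python
import os
from collections import OrderedDict


class ResultSetSketch:
    """Hold per-dataset outputs plus a flat list of output file paths."""

    def __init__(self):
        self.results = OrderedDict()
        self.file_paths = []

    def add(self, dset, result):
        """Store `result` (a list of outputs) under the dataset id."""
        self.results[dset] = result
        for item in result:
            # Assumption: strings and path-like objects are file paths.
            if isinstance(item, (str, os.PathLike)):
                self.file_paths.append(item)


rs = ResultSetSketch()
rs.add("ds.a", ["/tmp/out_a.nc"])
rs.add("ds.b", [object()])  # e.g. an in-memory dataset, not a path
print(list(rs.results))  # ['ds.a', 'ds.b']
print(rs.file_paths)     # ['/tmp/out_a.nc']
```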
- daops.utils.normalise.normalise(collection, apply_fixes=True)[source]
Take file paths, then open and fix the datasets they make up.
- Parameters:
collection – Ordered dictionary of ds ids and their related file paths.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.
- Returns:
An ordered dictionary of ds ids and their fixed xarray Dataset.
Data Utilities
Module for editing the attributes of a dataset.
- daops.data_utils.attr_utils.add_global_attrs_if_needed(ds_id, ds, **operands)[source]
Add the global attrs, if needed.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new attributes for the dataset.
- Returns:
xarray.Dataset
- daops.data_utils.attr_utils.edit_global_attrs(ds_id, ds, **operands)[source]
Edit the global attrs.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new attributes for the dataset.
- Returns:
xarray.Dataset
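The difference between editing attributes and adding them only if needed can be sketched on a stand-in object. Everything here is an assumption for illustration: the `attrs` operand key, the function names, and the reading of "if needed" as "only when the attribute is absent". A `SimpleNamespace` with an `attrs` dict stands in for an xarray Dataset.

```python
from types import SimpleNamespace


def edit_attrs_sketch(ds, **operands):
    """Overwrite global attributes with the values in operands["attrs"]."""
    ds.attrs.update(operands.get("attrs", {}))
    return ds


def add_attrs_if_needed_sketch(ds, **operands):
    """Set only those attributes that the dataset does not already have."""
    for name, value in operands.get("attrs", {}).items():
        ds.attrs.setdefault(name, value)
    return ds


ds = SimpleNamespace(attrs={"institution": "old"})
add_attrs_if_needed_sketch(ds, attrs={"institution": "new", "source": "model"})
print(ds.attrs)  # {'institution': 'old', 'source': 'model'}
edit_attrs_sketch(ds, attrs={"institution": "new"})
print(ds.attrs)  # {'institution': 'new', 'source': 'model'}
```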
- daops.data_utils.attr_utils.edit_var_attrs(ds_id, ds, **operands)[source]
Edit the variable attrs.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new attributes for the variable.
- Returns:
xarray.Dataset
- daops.data_utils.attr_utils.remove_coord_attr(ds_id, ds, **operands)[source]
Remove the coordinate attr from the dataset.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new attributes for the dataset.
- Returns:
xarray.Dataset
Common utility functions for data operations.
- daops.data_utils.common_utils.handle_derive_str(value, ds_id, ds)[source]
Handle the derive string.
Coordinate operations.
- daops.data_utils.coord_utils.add_coord(ds_id, ds, **operands)[source]
Add a coordinate.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new coordinate.
- Returns:
xarray.Dataset
- daops.data_utils.coord_utils.add_scalar_coord(ds_id, ds, **operands)[source]
Add a scalar coordinate.
- Parameters:
ds_id (str) – Dataset ID.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new coordinate.
- Returns:
xarray.Dataset
- daops.data_utils.coord_utils.squeeze_dims(ds_id, ds, **operands)[source]
Squeeze dimensions from dataset.
- Parameters:
ds_id (str) – Dataset ID. Unused in this function.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the dimensions to remove.
- Returns:
xarray.Dataset
Module to add a data variable to a dataset.
- daops.data_utils.var_utils.add_data_var(ds_id, ds, **operands)[source]
Add a data variable.
- Parameters:
ds_id (str) – Dataset ID. Unused in this function.
ds (xarray.Dataset) – A Dataset.
operands (dict) – Dictionary containing the new data variable.
- Returns:
xarray.Dataset
Processor
Module to dispatch the processing operation to the correct mode (serial or parallel).
- daops.processor.dispatch(operation, dset, **kwargs)[source]
Dispatch the operation to the correct mode (serial or parallel).
- daops.processor.process(operation, dset, mode='serial', **kwargs)[source]
Run the processing operation on the dataset in the correct mode (in series or parallel).
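The dispatch pattern can be sketched as below. This is an illustration only: the function names are hypothetical, the sketch always selects serial mode, and the parallel branch is deliberately left out because its behaviour is not described above.

```python
def process_sketch(operation, dset, mode="serial", **kwargs):
    """Run `operation` on `dset` in the requested mode."""
    if mode == "serial":
        return operation(dset, **kwargs)
    # Parallel execution is not covered by this sketch.
    raise NotImplementedError(f"mode {mode!r} is not covered by this sketch")


def dispatch_sketch(operation, dset, **kwargs):
    """Choose a mode and hand over to process_sketch."""
    # Assumption: always choose serial mode here.
    return process_sketch(operation, dset, mode="serial", **kwargs)


def scale(values, factor=1):
    """Toy operation standing in for subset or average."""
    return [value * factor for value in values]


print(dispatch_sketch(scale, [1, 2, 3], factor=2))  # [2, 4, 6]
```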