API
Subset operation
- daops.ops.subset.subset(collection, time=None, area=None, level=None, time_components=None, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Subset the input dataset according to the given parameters. Data can be subsetted by level, area and time.
- Parameters:
collection (Collection of datasets to process; a sequence or comma-separated string of dataset identifiers.)
time (Time interval (defined by start/end) or time series (a sequence of datetime values) to subset over. Datetimes are typically provided as strings.)
area (Area to subset over; a sequence or comma-separated string of lat and lon bounds. Must contain 4 values.)
level (Level interval (defined by start/end) or level series (a sequence of values) to subset over. Levels are typically provided as integers or floats.)
time_components (Time components to filter on: year, month, day, hour, minute, second)
output_dir (str or path like object describing output directory for output files.)
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
apply_fixes (Boolean. If True fixes will be applied to datasets if needed. Default is True.)
- Returns:
List of outputs in the selected type: a list of xarray Datasets or file paths.
Examples
- collection: ("cmip6.ukesm1.r1.gn.tasmax.v20200101",)
- time: ("1999-01-01T00:00:00", "2100-12-30T00:00:00")
- area: (-5., 49., 10., 65.)
- level: (1000.,)
- time_components: {"month": ["dec", "jan", "feb"]}
- output_type: "netcdf"
- output_dir: "/cache/wps/procs/req0111"
- split_method: "time:auto"
- file_namer: "standard"
- apply_fixes: True
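Taken together, these example values correspond to an invocation like the following sketch. It assumes a configured daops environment with access to the named dataset; the dataset identifier and output directory are illustrative, so this will not run as-is:

```python
from daops.ops.subset import subset

result = subset(
    collection="cmip6.ukesm1.r1.gn.tasmax.v20200101",
    time=("1999-01-01T00:00:00", "2100-12-30T00:00:00"),
    area=(-5.0, 49.0, 10.0, 65.0),
    level=(1000.0,),
    time_components={"month": ["dec", "jan", "feb"]},
    output_type="netcdf",
    output_dir="/cache/wps/procs/req0111",
    split_method="time:auto",
    file_namer="standard",
    apply_fixes=True,
)
# with output_type="netcdf", the returned outputs are file paths
```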
Average operation
- daops.ops.average.average_over_dims(collection, dims=None, ignore_undetected_dims=False, output_dir=None, output_type='netcdf', split_method='time:auto', file_namer='standard', apply_fixes=True)[source]
Average the input dataset over the indicated dimensions. Multiple dimensions can be averaged over at once.
- Parameters:
collection (Collection of datasets to process, sequence or string of comma separated dataset identifiers.)
dims (list of dims to average over or None.)
ignore_undetected_dims (Boolean. If False, an exception will be raised if requested dims do not exist in the dataset. If True, missing dims will be ignored.)
output_dir (str or path like object describing output directory for output files.)
output_type ({“netcdf”, “nc”, “zarr”, “xarray”})
split_method ({“time:auto”})
file_namer ({“standard”, “simple”})
apply_fixes (Boolean. If True fixes will be applied to datasets if needed. Default is True.)
- Returns:
List of outputs in the selected type: a list of xarray Datasets or file paths.
Examples
- collection: ("cmip6.ukesm1.r1.gn.tasmax.v20200101",)
- dims: ["time", "lat"]
- ignore_undetected_dims: False
- output_type: "netcdf"
- output_dir: "/cache/wps/procs/req0111"
- split_method: "time:auto"
- file_namer: "standard"
- apply_fixes: True
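As an invocation, the example looks like this sketch (note that ignore_undetected_dims expects a Boolean; the dataset identifier and paths are illustrative, and a configured daops environment is assumed):

```python
from daops.ops.average import average_over_dims

result = average_over_dims(
    collection="cmip6.ukesm1.r1.gn.tasmax.v20200101",
    dims=["time", "lat"],
    ignore_undetected_dims=False,
    output_type="netcdf",
    output_dir="/cache/wps/procs/req0111",
    split_method="time:auto",
    file_namer="standard",
    apply_fixes=True,
)
```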
Utilities
- daops.utils.consolidate.consolidate(collection, **kwargs)[source]
Finds the file paths relating to each input dataset. If a time range has been supplied then only the files relating to this time range are recorded.
- Parameters:
collection – (roocs_utils.CollectionParameter) The collection of datasets to process.
kwargs – Arguments of the operation taking place e.g. subset, average, or re-grid.
- Returns:
An ordered dictionary of each dataset from the collection argument and the file paths relating to it.
- daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)[source]
Using the settings in time_param, examine each file to see if it contains years that are in the requested range.
- The time_param can have one of three types:
“interval” – defined with “start_time” and “end_time”
“series” – defined with a list of “time_values”
“none” – undefined
It attempts to filter out files whose years fall outside the requested range. Any file for which this cannot be determined from the file name is opened and inspected with xarray.
- Parameters:
time_param (TimeParameter) – time parameter of requested date/times
file_paths (list) – list of file paths
- Returns:
file_paths (list) – filtered list of file paths
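The filename-based filtering can be sketched as follows. This is a simplified stand-in that assumes CMIP-style monthly filenames ending in `_YYYYMM-YYYYMM.nc`; the real function also handles the interval/series/none time parameter types and falls back to reading the time axis with xarray when the name cannot be parsed:

```python
import re

def years_from_filename(fpath):
    """Extract the set of years covered by a CMIP-style filename, or None."""
    match = re.search(r"_(\d{4})\d{2}-(\d{4})\d{2}\.nc$", fpath)
    if match:
        return set(range(int(match.group(1)), int(match.group(2)) + 1))
    return None  # caller must open the file and inspect the time axis

def filter_by_year_range(file_paths, start_year, end_year):
    """Keep files whose filename years overlap the requested range."""
    wanted = set(range(start_year, end_year + 1))
    kept = []
    for fpath in file_paths:
        years = years_from_filename(fpath)
        # keep the file if it overlaps, or if we could not tell from the name
        if years is None or years & wanted:
            kept.append(fpath)
    return kept

paths = [
    "tasmax_Amon_UKESM1_195001-199912.nc",
    "tasmax_Amon_UKESM1_200001-204912.nc",
]
print(filter_by_year_range(paths, 2000, 2010))
# ['tasmax_Amon_UKESM1_200001-204912.nc']
```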
- daops.utils.consolidate.get_year(value, default)[source]
Gets a year from a datetime string. Defaults to the value of default if not defined.
- daops.utils.consolidate.get_years_from_file(fpath)[source]
Attempts to extract years from a file. First by examining the file name. If that doesn’t work then it reads the file contents and looks at the time axis.
Returns a set of years.
- daops.utils.consolidate.to_year(time_string)[source]
Returns the year in a time string as an integer.
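A minimal sketch of these two helpers, with behaviour inferred from the descriptions above (the real implementations may differ in detail):

```python
def to_year(time_string):
    """Return the year in an ISO-like time string as an integer."""
    return int(time_string.split("-")[0])

def get_year(value, default):
    """Return the year of a datetime string, or `default` if value is unset."""
    if value is None:
        return default
    return to_year(value)

print(to_year("2085-12-30T00:00:00"))  # 2085
print(get_year(None, 1850))            # 1850
```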
- class daops.utils.core.Characterised(dset)[source]
Bases:
Lookup
Characterisation lookup class to look up whether a dataset has been characterised.
- lookup_characterisation()[source]
Attempts to find datasets in the characterisation store. Returns True if they exist in the store, returns False if not.
- daops.utils.core.is_characterised(collection, require_all=False)[source]
Takes in a collection (an individual data reference or a sequence of them). Returns an ordered dictionary mapping each dataset id to a Boolean stating whether that dataset has been characterised.
If require_all is True, a single Boolean value is returned instead.
- Parameters:
collection – one or more data references
require_all – Boolean to require that all must be characterised
- Returns:
Ordered Dictionary OR Boolean (if require_all is True)
- daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)[source]
Opens an xarray Dataset and applies fixes if requested. Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.
- Parameters:
ds_id – Dataset identifier in the form of a drs id e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga
file_paths – (list) The file paths corresponding to the ds id.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.
- Returns:
xarray Dataset with fixes applied to the data.
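The pre-/post-processor split described above can be sketched as follows. The names here are illustrative, not the actual daops internals: `fixes` stands in for what the Fixer looks up, and `opener` for the xarray open call:

```python
def open_dataset_sketch(ds_id, file_paths, fixes, opener, apply_fixes=True):
    """Open a dataset, running 'pre' fixes on the inputs before opening
    and 'post' fixes on the opened object afterwards."""
    if apply_fixes:
        for fix in fixes.get("pre", []):
            file_paths = fix(file_paths)
    ds = opener(file_paths)  # in daops this would open the files with xarray
    if apply_fixes:
        for fix in fixes.get("post", []):
            ds = fix(ds)
    return ds

fixes = {"pre": [sorted], "post": [lambda ds: {**ds, "fixed": True}]}
ds = open_dataset_sketch("ds1", ["b.nc", "a.nc"], fixes, lambda p: {"files": p})
print(ds)  # {'files': ['a.nc', 'b.nc'], 'fixed': True}
```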
- class daops.utils.base_lookup.Lookup(dset)[source]
Bases:
object
Base class used for looking up datasets in the elasticsearch indexes.
- convert_to_ds_id()[source]
Converts the input dataset to a drs id form to use with the elasticsearch index.
- class daops.utils.fixer.Fixer(dset)[source]
Bases:
Lookup
Fixer class to look up fixes to apply to the input dataset from the Elasticsearch index. It gathers fixes into pre- and post-processors; pre-process fixes are chained together so that they can be executed with one call.
- class daops.utils.fixer.FuncChainer(funcs)[source]
Bases:
object
Chains functions together to allow them to be executed in one call.
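The chaining behaviour can be sketched as a minimal stand-in (not the actual implementation):

```python
class FuncChainer:
    """Chain single-argument functions so they run as one call."""

    def __init__(self, funcs):
        self.funcs = funcs

    def __call__(self, value):
        # each function's output feeds the next function's input
        for func in self.funcs:
            value = func(value)
        return value

chained = FuncChainer([lambda x: x + 1, lambda x: x * 2])
print(chained(3))  # (3 + 1) * 2 = 8
```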
- class daops.utils.normalise.ResultSet(inputs=None)[source]
Bases:
object
A class to hold the results from an operation e.g. subset
- add(dset, result)[source]
Adds an output to an ordered dictionary with the ds id as the key. If the output is a file path, it is also added to the file_paths attribute so that a list of file paths can be accessed independently.
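A minimal sketch of this behaviour (attribute names other than `file_paths` are illustrative, not the real internals):

```python
from collections import OrderedDict

class ResultSet:
    """Collect per-dataset outputs, tracking file-path outputs separately."""

    def __init__(self, inputs=None):
        self.inputs = inputs
        self._results = OrderedDict()  # illustrative name for the store
        self.file_paths = []

    def add(self, dset, result):
        self._results[dset] = result
        if isinstance(result, str):  # treat string outputs as file paths
            self.file_paths.append(result)

results = ResultSet()
results.add("cmip6.ukesm1.r1.gn.tasmax.v20200101", "/cache/out_0001.nc")
print(results.file_paths)  # ['/cache/out_0001.nc']
```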
- daops.utils.normalise.normalise(collection, apply_fixes=True)[source]
Opens the files that make up each dataset in the collection and applies fixes to the resulting datasets if requested.
- Parameters:
collection – Ordered dictionary of ds ids and their related file paths.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.
- Returns:
An ordered dictionary of ds ids and their fixed xarray Dataset.
Data Utilities
- daops.data_utils.attr_utils.add_global_attrs_if_needed(ds_id, ds, **operands)[source]
Add a global attribute if it doesn’t already exist.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.attr_utils.edit_global_attrs(ds_id, ds, **operands)[source]
Change the global attributes.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.attr_utils.edit_var_attrs(ds_id, ds, **operands)[source]
Change the attributes of a variable.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.attr_utils.remove_coord_attr(ds_id, ds, **operands)[source]
Remove a coordinate attribute that is added by xarray, for specified variables.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.coord_utils.add_coord(ds_id, ds, **operands)[source]
Add a coordinate.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.coord_utils.add_scalar_coord(ds_id, ds, **operands)[source]
Add a scalar coordinate.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
- daops.data_utils.coord_utils.squeeze_dims(ds_id, ds, **operands)[source]
- Parameters:
ds – xarray Dataset
operands – (dict) Arguments for the fix: dims (list) to remove.
- Returns:
xarray Dataset
- daops.data_utils.var_utils.add_data_var(ds_id, ds, **operands)[source]
Add a data variable.
- Parameters:
ds – xarray Dataset
operands – sequence of arguments
- Returns:
xarray Dataset
Processor
- daops.processor.dispatch(operation, dset, **kwargs)[source]
- daops.processor.process(operation, dset, mode='serial', **kwargs)[source]
Runs the processing operation on the dataset in the requested mode (serially or in parallel).
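A minimal sketch of the dispatch logic, assuming the real function hands parallel work off to a scheduler (not shown here):

```python
def process(operation, dset, mode="serial", **kwargs):
    """Run `operation` on `dset` in the requested mode (sketch)."""
    if mode == "serial":
        return operation(dset, **kwargs)
    # a real implementation would submit the work to a parallel scheduler
    raise NotImplementedError(f"mode {mode!r} not supported in this sketch")

# Example with a dummy operation standing in for subset/average:
result = process(lambda d, suffix="": d.upper() + suffix, "tasmax", suffix="!")
print(result)  # TASMAX!
```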