daops.utils package

DAOPS utility functions.

Submodules

daops.utils.base_lookup module

Base class used for looking up datasets in the Elasticsearch indexes.

class daops.utils.base_lookup.Lookup(dset)

Bases: object

Base class used for looking up datasets in the Elasticsearch indexes.

_convert_id(_id)

Convert the dataset id to an MD5 checksum used to retrieve the fixes for the dataset.

Converts to DRS id format first if necessary.

convert_to_ds_id()

Convert the input dataset to DRS id form for use with the Elasticsearch index.
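The two steps above (normalise to a DRS id, then hash it) can be sketched as follows. This is a minimal illustration, not the daops source: it assumes that a path-like identifier maps to a DRS id by joining its non-empty path parts with dots, and that the checksum is the hex MD5 digest of the UTF-8 encoded id. The helper names to_ds_id and convert_id are hypothetical.

```python
import hashlib


def to_ds_id(dset: str) -> str:
    """Hypothetical conversion of a path-like identifier to dotted DRS form."""
    # Assumption: a path-like id becomes a DRS id by joining its parts with dots.
    if "/" in dset:
        return ".".join(part for part in dset.split("/") if part)
    return dset


def convert_id(dset: str) -> str:
    """Return the MD5 hex digest of the DRS form of ``dset``."""
    ds_id = to_ds_id(dset)
    return hashlib.md5(ds_id.encode("utf-8")).hexdigest()


drs = "cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga"
print(convert_id(drs))
# A path-like form of the same id hashes to the same checksum.
print(convert_id("/" + drs.replace(".", "/")) == convert_id(drs))  # True
```

Normalising before hashing means that different spellings of the same dataset id resolve to one index key.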

daops.utils.common module

Common utilities for the daops package.

daops.utils.common._logging_examples() → None

Enable testing module.

daops.utils.common.enable_logging() → list[int]

Enable logging for the daops package.

daops.utils.consolidate module

Consolidate file paths for each dataset in a collection.

daops.utils.consolidate.consolidate(collection, **kwargs)

Find the file paths relating to each input dataset.

If a time range has been supplied, only the files relating to this time range are recorded.

Parameters:
  • collection – (clisops.parameter.CollectionParameter) The collection of datasets to process.

  • kwargs – Arguments of the operation taking place, e.g. subset, average or regrid.

Returns:

An ordered dictionary of each dataset from the collection argument and the file paths relating to it.
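The shape of the returned mapping can be illustrated with a small sketch. It is an assumption-laden stand-in: the real function takes a clisops CollectionParameter and the operation's keyword arguments, whereas here the hypothetical callables find_files and time_filter take the place of the catalogue lookup and the time-range filtering.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Optional


def consolidate(
    collection: Iterable[str],
    find_files: Callable[[str], list[str]],
    time_filter: Optional[Callable[[list[str]], list[str]]] = None,
) -> "OrderedDict[str, list[str]]":
    """Map each dataset id to its file paths, optionally filtered by time."""
    result: "OrderedDict[str, list[str]]" = OrderedDict()
    for dset in collection:
        file_paths = sorted(find_files(dset))
        if time_filter is not None:
            # Only keep the files relevant to the requested time range.
            file_paths = time_filter(file_paths)
        result[dset] = file_paths
    return result


# Hypothetical catalogue standing in for a real file lookup.
catalogue = {
    "ds.a": ["a_2001-2010.nc", "a_1991-2000.nc"],
    "ds.b": ["b_2001-2010.nc"],
}
out = consolidate(["ds.a", "ds.b"], catalogue.__getitem__)
print(list(out.items()))
```

Using an ordered dictionary keeps the datasets in the same order as the input collection.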

daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)

Examine each file to see if it contains years that are in the requested range.

Uses the settings in time_param.

The time_param can have one of three types:
  1. type “interval”: defined by “start_time” and “end_time”

  2. type “series”: defined by a list of “time_values”

  3. type “none”: no time range is defined

It attempts to filter out files that do not match the selected years. Any file that cannot be filtered this way is read with xarray instead.

Parameters:
  • time_param (TimeParameter) – time parameter of requested date/times

  • file_paths (list) – list of file paths

Returns:

file_paths (list) – filtered list of file paths
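The filtering logic for the three time_param types can be sketched as below. Under stated assumptions: time_param is modelled as a plain dict using the keys named above (type, start_time, end_time, time_values) rather than the real TimeParameter class, and years_of is a hypothetical callback standing in for get_years_from_file. Files whose years are unknown are simply kept here, whereas the documented function reads them with xarray.

```python
from typing import Callable, Optional


def requested_years(time_param: dict) -> Optional[set[int]]:
    """Return the requested years, or None when no time filter applies."""
    kind = time_param.get("type", "none")
    if kind == "interval":
        # Assumption: both ends are ISO-8601-like strings starting with the year.
        start = int(time_param["start_time"][:4])
        end = int(time_param["end_time"][:4])
        return set(range(start, end + 1))
    if kind == "series":
        return {int(value[:4]) for value in time_param["time_values"]}
    return None  # type "none": keep every file


def get_files_matching_time_range(
    time_param: dict,
    file_paths: list[str],
    years_of: Callable[[str], Optional[set[int]]],
) -> list[str]:
    """Keep the files whose years overlap the requested years."""
    wanted = requested_years(time_param)
    if wanted is None:
        return list(file_paths)
    matching = []
    for fpath in file_paths:
        years = years_of(fpath)
        # Unknown years (None) are kept rather than wrongly discarded.
        if years is None or years & wanted:
            matching.append(fpath)
    return matching


files = ["x_1990-1999.nc", "x_2000-2009.nc", "x_2010-2019.nc"]
spans = {
    "x_1990-1999.nc": set(range(1990, 2000)),
    "x_2000-2009.nc": set(range(2000, 2010)),
    "x_2010-2019.nc": set(range(2010, 2020)),
}
tp = {"type": "interval", "start_time": "2005-01-01", "end_time": "2012-12-31"}
print(get_files_matching_time_range(tp, files, spans.get))  # ['x_2000-2009.nc', 'x_2010-2019.nc']
```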

daops.utils.consolidate.get_year(value, default)

Get a year from a datetime string.

Returns default if value is not defined.
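A minimal sketch of this fallback behaviour, assuming an ISO-8601-like string in which the year precedes the first "-" and treating both None and an empty string as "not defined". This is an illustration, not the daops implementation.

```python
from typing import Optional


def get_year(value: Optional[str], default: int) -> int:
    """Return the year from a datetime string, or ``default`` if none is given."""
    if not value:
        # Covers both None and an empty string.
        return default
    # Assumption: the year is the text before the first "-".
    return int(value.split("-")[0])


print(get_year("2005-01-16T12:00:00", 1850))  # 2005
print(get_year(None, 1850))  # 1850
```

A default like this lets an open-ended interval (for example, one with no end_time) still be turned into a concrete year range.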

daops.utils.consolidate.get_years_from_file(fpath)

Attempt to extract years from a file.

The file name is examined first. If no years can be found there, the file contents are read and the time axis is inspected.

Returns a set of years.
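The file-name step can be sketched with a regular expression. This assumes CMIP-style names ending in "_<start>-<end>.nc", where each date begins with a four-digit year (for example "_200601-210012.nc"). The function name is hypothetical, and the fallback of reading the time axis is deliberately left out: the sketch returns None where the documented function would open the file.

```python
import os
import re
from typing import Optional

# Assumption: the date range is the last "_"-separated token before ".nc".
_DATE_RANGE = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def years_from_file_name(fpath: str) -> Optional[set[int]]:
    """Return the set of years covered by a file, judged by its name only."""
    match = _DATE_RANGE.search(os.path.basename(fpath))
    if match is None:
        # No date range in the name; the real function would read the file.
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return set(range(start, end + 1))


years = years_from_file_name("/data/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc")
print(min(years), max(years))  # 2006 2100
```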

daops.utils.consolidate.to_year(time_string)

Return the year in a time string as an integer.

daops.utils.core module

Utility functions for the DAOPS package.

class daops.utils.core.Characterised(dset)

Bases: Lookup

Characterisation lookup class used to check whether a dataset has been characterised.

lookup_characterisation()

Attempt to find datasets in the characterisation store.

Returns True if they exist in the store, False otherwise.

daops.utils.core._wrap_sequence(obj)

daops.utils.core.is_characterised(collection, require_all=False)

Take a collection (a single data reference or a sequence of them).

Returns an ordered dictionary mapping each id in the collection to a Boolean stating whether that dataset has been characterised.

If require_all is True, a single Boolean value is returned instead, stating whether all datasets have been characterised.

Parameters:
  • collection – one or more data references

  • require_all – Boolean to require that all must be characterised

Returns:

An ordered dictionary, or a Boolean if require_all is True.
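The two return modes can be sketched as follows. The lookup callable is a hypothetical stand-in for Characterised(dset).lookup_characterisation(), and wrapping a single string into a list is an assumption in the spirit of the private _wrap_sequence helper, whose exact behaviour is not documented here.

```python
from collections import OrderedDict
from typing import Callable, Sequence, Union


def is_characterised(
    collection: Union[str, Sequence[str]],
    lookup: Callable[[str], bool],
    require_all: bool = False,
) -> Union["OrderedDict[str, bool]", bool]:
    """Report, per dataset or overall, whether datasets are characterised."""
    # Wrap a single data reference so it can be iterated like a sequence.
    if isinstance(collection, str):
        collection = [collection]
    results: "OrderedDict[str, bool]" = OrderedDict()
    for dset in collection:
        results[dset] = lookup(dset)
    if require_all:
        # One Boolean: True only if every dataset was found.
        return all(results.values())
    return results


store = {"ds.a"}  # hypothetical characterisation store
print(dict(is_characterised(["ds.a", "ds.b"], store.__contains__)))  # {'ds.a': True, 'ds.b': False}
print(is_characterised(["ds.a", "ds.b"], store.__contains__, require_all=True))  # False
```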

daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)

Open an xarray Dataset and apply fixes if requested.

Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.

Parameters:
  • ds_id – Dataset identifier in DRS id form, e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga

  • file_paths – (list) The file paths corresponding to the ds id.

  • apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns:

xarray Dataset, with any required fixes applied if apply_fixes is True.
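The ordering of pre- and post-processor fixes can be outlined without real data files. In xarray, open_mfdataset accepts a preprocess callable that is applied to each file's dataset before the files are combined, which is one natural home for pre-processor fixes; whether daops wires its fixes in exactly this way is an assumption. In the sketch below, open_one, combine and the fix lists are hypothetical placeholders, and plain lists stand in for datasets.

```python
from typing import Any, Callable, Iterable, Sequence

# A "fix" here is any callable that takes a dataset-like object and returns one.
Fix = Callable[[Any], Any]


def open_with_fixes(
    file_paths: Sequence[str],
    open_one: Callable[[str], Any],
    combine: Callable[[list], Any],
    pre_fixes: Iterable[Fix] = (),
    post_fixes: Iterable[Fix] = (),
    apply_fixes: bool = True,
) -> Any:
    """Hypothetical outline of the pre-/post-processor ordering."""
    pre_fixes = list(pre_fixes)  # reused for every file
    parts = []
    for path in file_paths:
        ds = open_one(path)
        if apply_fixes:
            # Pre-processors run on each file's data before the files are combined.
            for fix in pre_fixes:
                ds = fix(ds)
        parts.append(ds)
    ds = combine(parts)
    if apply_fixes:
        # Post-processors run once, on the combined dataset.
        for fix in post_fixes:
            ds = fix(ds)
    return ds


result = open_with_fixes(
    ["a.nc", "b.nc"],
    open_one=lambda path: [path],
    combine=lambda parts: sum(parts, []),
    pre_fixes=[lambda ds: [name.upper() for name in ds]],
    post_fixes=[lambda ds: ds + ["fixed"]],
)
print(result)  # ['A.NC', 'B.NC', 'fixed']
```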

daops.utils.fixer module

Apply fixes from the Elasticsearch index to an input dataset.

class daops.utils.fixer.Fixer(dset)

Bases: Lookup

Fixer class that looks up the fixes to apply to an input dataset in the Elasticsearch index.

Gathers fixes into pre- and post-processors. Pre-process fixes are chained together to allow them to be executed with one call.

_gather_fixes(content)

Gather pre- and post-processing fixes together.
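A sketch of the split into pre- and post-processing fixes is shown below. The record layout is entirely hypothetical: each fix is assumed to be a dict with a "func" callable and a "process_type" of "pre" or "post". The actual structure of fix records in the Elasticsearch index is not described in this reference.

```python
from typing import Any, Callable


def gather_fixes(fixes: list[dict[str, Any]]) -> tuple[list[Callable], list[Callable]]:
    """Split hypothetical fix records into pre- and post-processing lists."""
    pre: list[Callable] = []
    post: list[Callable] = []
    for fix in fixes:
        # Hypothetical record layout: {"func": callable, "process_type": "pre" | "post"}
        kind = fix["process_type"]
        if kind == "pre":
            pre.append(fix["func"])
        elif kind == "post":
            post.append(fix["func"])
        else:
            raise ValueError(f"Unknown process_type: {kind!r}")
    return pre, post
```

Rejecting unknown types, rather than silently treating them as post-processors, makes a malformed fix record fail loudly.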

_lookup_fix()

Look up fixes in the Elasticsearch index.

class daops.utils.fixer.FuncChainer(funcs)

Bases: object

Chains functions together to allow them to be executed in one call.
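A minimal sketch of such a chainer, assuming each fix takes one dataset-like argument and returns one. It illustrates the technique (function composition behind a single callable) and is not the daops source.

```python
from typing import Any, Callable, Sequence


class FuncChainer:
    """Call a sequence of single-argument functions as one function."""

    def __init__(self, funcs: Sequence[Callable[[Any], Any]]):
        self.funcs = list(funcs)

    def __call__(self, inputs: Any) -> Any:
        result = inputs
        # Feed each function's output into the next, in order.
        for func in self.funcs:
            result = func(result)
        return result


chain = FuncChainer([lambda x: x + 1, lambda x: x * 2])
print(chain(3))  # 8
```

Because the chained object is itself a single callable, several pre-process fixes can be passed anywhere one function is expected, for example as xarray's preprocess argument.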

daops.utils.normalise module

Normalise datasets.

class daops.utils.normalise.ResultSet(inputs=None)

Bases: object

A class to hold the results from an operation, e.g. subset.

add(dset, result)

Add outputs to an ordered dictionary with the ds id as the key.

If an output is a file path, it is also added to the file_paths attribute so that a list of file paths can be accessed independently.
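The bookkeeping described for add can be sketched as below. The attribute names results and inputs are hypothetical (only file_paths is named above), and any string output is treated as a file path without checking that it exists on disk, which a fuller implementation might do.

```python
from collections import OrderedDict
from typing import Any, Optional


class ResultSet:
    """Sketch of a container for operation outputs, keyed by ds id."""

    def __init__(self, inputs: Optional[Any] = None):
        self.inputs = inputs
        self.results: "OrderedDict[str, Any]" = OrderedDict()
        self.file_paths: list[str] = []

    def add(self, dset: str, result: Any) -> None:
        """Store ``result`` under ``dset`` and record any file paths it contains."""
        self.results[dset] = result
        # Accept a single output or a list/tuple of outputs.
        outputs = result if isinstance(result, (list, tuple)) else [result]
        for item in outputs:
            # Assumption: any string output is a file path (no existence check).
            if isinstance(item, str):
                self.file_paths.append(item)


rs = ResultSet()
rs.add("ds.a", ["out/a_1.nc", "out/a_2.nc"])
rs.add("ds.b", 42)
print(rs.file_paths)  # ['out/a_1.nc', 'out/a_2.nc']
```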

daops.utils.normalise.normalise(collection, apply_fixes=True)

Take file paths, then open and fix the datasets they make up.

Parameters:
  • collection – Ordered dictionary of ds ids and their related file paths.

  • apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns:

An ordered dictionary mapping each ds id to its fixed xarray Dataset.
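The loop behind this can be sketched as follows. So that the sketch runs without data files, the opener is passed in as a parameter; it is a hypothetical stand-in for daops.utils.core.open_dataset, which in daops would return an xarray Dataset rather than the tuple used here.

```python
from collections import OrderedDict
from typing import Any, Callable, Mapping, Sequence


def normalise(
    collection: Mapping[str, Sequence[str]],
    open_dataset: Callable[[str, Sequence[str], bool], Any],
    apply_fixes: bool = True,
) -> "OrderedDict[str, Any]":
    """Open (and optionally fix) one dataset per ds id, keeping input order."""
    norm_collection: "OrderedDict[str, Any]" = OrderedDict()
    for ds_id, file_paths in collection.items():
        # One dataset object per ds id, built from all of its files.
        norm_collection[ds_id] = open_dataset(ds_id, file_paths, apply_fixes)
    return norm_collection


def fake_open(ds_id, paths, fix):
    # Hypothetical opener that just records its arguments.
    return (ds_id, tuple(paths), fix)


col = OrderedDict([("ds.a", ["a1.nc", "a2.nc"])])
print(normalise(col, fake_open)["ds.a"])  # ('ds.a', ('a1.nc', 'a2.nc'), True)
```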

daops.utils.testing module

class daops.utils.testing.ContextLogger(caplog: _pytest.logging.LogCaptureFixture | None = None)

Bases: object

Helper class for safe logging management in pytest tests.

daops.utils.testing.get_esgf_file_paths(esgf_cache_dir: str | PathLike[str])

daops.utils.testing.write_roocs_cfg(cache_dir: str | Path)