daops.utils package

DAOPS utility functions.

Submodules

daops.utils.base_lookup module

Base class used for looking up datasets in the elasticsearch indexes.

class daops.utils.base_lookup.Lookup(dset)[source]

Bases: object

Base class used for looking up datasets in the elasticsearch indexes.

_convert_id(_id)[source]

Convert the dataset id to an md5 checksum used to retrieve the fixes for the dataset.

Converts to drs id format first if necessary.
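The checksum step can be illustrated with the standard library. This is a minimal sketch rather than the daops implementation: the helper name drs_id_to_md5 and the UTF-8 encoding are assumptions, and the conversion to drs id format is left out.

```python
import hashlib


def drs_id_to_md5(ds_id: str) -> str:
    """Return the hex md5 digest of a dataset id (hypothetical helper)."""
    # Assumption: the id is hashed as UTF-8 text; daops may encode it differently.
    return hashlib.md5(ds_id.encode("utf-8")).hexdigest()


ds_id = "cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga"
digest = drs_id_to_md5(ds_id)
print(digest)
```

The same id always produces the same 32-character hexadecimal digest, which is what makes it usable as a lookup key.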

convert_to_ds_id()[source]: Convert the input dataset to a drs id form to use with the elasticsearch index.

daops.utils.common module

Common utilities for the daops package.

daops.utils.common._logging_examples() → None[source]

Enable testing module.

daops.utils.common.enable_logging() → list[int][source]

Enable logging for the daops package.

daops.utils.consolidate module

Consolidate file paths for each dataset in a collection.

daops.utils.consolidate.consolidate(collection, **kwargs)[source]

Find the file paths relating to each input dataset.

If a time range has been supplied then only the files relating to this time range are recorded.

Parameters:

collection – (clisops.parameter.CollectionParameter) The collection of datasets to process.
kwargs – Arguments of the operation taking place e.g. subset, average, or re-grid.

Returns:

An ordered dictionary of each dataset from the collection argument and the file paths relating to it.

daops.utils.consolidate.get_files_matching_time_range(time_param, file_paths)[source]

Examine each file to see if it contains years that are in the requested range.

Uses the settings in time_param.

The time_param can be one of three types:

“interval” – defined with “start_time” and “end_time”
“series” – defined with a list of “time_values”
“none” – undefined

It attempts to filter out files that do not match the selected years. Any file that cannot be checked this way is read by xarray instead.

Parameters:

time_param (TimeParameter) – time parameter of requested date/times
file_paths (list) – list of file paths

Returns:

file_paths (list) – filtered list of file paths
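The three time parameter types can be sketched as follows. This is an illustration under stated assumptions, not the daops code: a plain dict stands in for TimeParameter, the years held by each file are supplied up front in a hypothetical years_by_file mapping, both interval bounds are present, and time values are ISO-like strings that start with a four-digit year.

```python
def select_files(time_param, years_by_file):
    """Return the paths whose years intersect the requested selection.

    time_param is a plain dict standing in for TimeParameter (assumption).
    years_by_file maps each file path to the set of years it contains.
    """
    kind = time_param["type"]
    if kind == "none":
        # No time selection was made, so every file is kept.
        return list(years_by_file)
    if kind == "interval":
        start = int(time_param["start_time"][:4])
        end = int(time_param["end_time"][:4])
        requested = set(range(start, end + 1))
    elif kind == "series":
        requested = {int(value[:4]) for value in time_param["time_values"]}
    else:
        raise ValueError(f"unknown time parameter type: {kind}")
    # Keep a file when at least one of its years was requested.
    return [path for path, years in years_by_file.items() if years & requested]


years_by_file = {
    "a_2006-2010.nc": set(range(2006, 2011)),
    "b_2011-2015.nc": set(range(2011, 2016)),
}
interval = {"type": "interval", "start_time": "2009-01-01", "end_time": "2009-12-31"}
series = {"type": "series", "time_values": ["2012-06-01", "2013-06-01"]}

by_interval = select_files(interval, years_by_file)
by_series = select_files(series, years_by_file)
unfiltered = select_files({"type": "none"}, years_by_file)
```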

daops.utils.consolidate.get_year(value, default)[source]

Get a year from a datetime string.

Defaults to the value of default if not defined.

daops.utils.consolidate.get_years_from_file(fpath)[source]

Attempt to extract years from a file.

The file name is examined first. If that doesn’t yield any years, the file contents are read and the time axis is inspected.

Returns a set of years.
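The file name step might look like the sketch below. It assumes file names end with a "_YYYYMM-YYYYMM.nc" style time stamp, which is an assumption about the naming scheme, and years_from_file_name is a hypothetical helper. It returns None where the real function would fall back to reading the time axis.

```python
import os
import re

# Assumption: the time stamp is the last "_"-separated part of the file name,
# e.g. "..._200601-200812.nc". Only the leading four digits (the year) are used.
TIME_STAMP = re.compile(r"_(\d{4})\d*-(\d{4})\d*\.nc$")


def years_from_file_name(fpath):
    """Return the set of years spanned by the file name, or None if unknown."""
    match = TIME_STAMP.search(os.path.basename(fpath))
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    return set(range(first, last + 1))


years = years_from_file_name("/data/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-200812.nc")
unknown = years_from_file_name("/data/no_time_stamp.nc")
```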

daops.utils.consolidate.to_year(time_string)[source]

Return the year in a time string as an integer.
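The behaviour of get_year and to_year can be approximated as below. This is a sketch, not the daops source: the helper names are hypothetical, time strings are assumed to be ISO-like with the year before the first hyphen, and "not defined" is taken to mean None.

```python
def year_of(time_string):
    """Return the leading year of an ISO-like time string as an integer."""
    # "2085-01-01T12:00:00" -> "2085" -> 2085
    return int(time_string.split("-")[0])


def year_or_default(value, default):
    """Return the year of value, or default when value is None (assumption)."""
    if value is None:
        return default
    return year_of(value)


start_year = year_or_default("2085-01-01T12:00:00", 1900)
fallback_year = year_or_default(None, 1900)
```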

daops.utils.core module

Utility functions for the DAOPS package.

class daops.utils.core.Characterised(dset)[source]

Bases: Lookup

Characterisation lookup class to look up whether a dataset has been characterised.

lookup_characterisation()[source]

Attempt to find datasets in the characterisation store.

Returns True if they exist in the store, False otherwise.

daops.utils.core._wrap_sequence(obj)[source]

daops.utils.core.is_characterised(collection, require_all=False)[source]

Check whether each dataset in a collection (an individual data reference or a sequence of them) has been characterised.

Returns an ordered dictionary mapping each id in the collection to a boolean stating whether that dataset has been characterised.

If require_all is True, returns a single boolean value instead.

Parameters:

collection – one or more data references
require_all – Boolean to require that all must be characterised

Returns:

Ordered Dictionary OR Boolean (if require_all is True)
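The two return shapes can be shown with a small stand-in. In this sketch the characterisation store is replaced by a plain set of known ids, and check_characterised is a hypothetical name; the real function queries the store through the Characterised class.

```python
from collections import OrderedDict


def check_characterised(collection, known_ids, require_all=False):
    """Report which dataset ids appear in known_ids (a stand-in for the store)."""
    # Accept a single data reference as well as a sequence of them.
    if isinstance(collection, str):
        collection = [collection]
    results = OrderedDict((ds_id, ds_id in known_ids) for ds_id in collection)
    if require_all:
        # Collapse the per-dataset answers into one boolean.
        return all(results.values())
    return results


known = {"proj.model.var_a"}
per_dataset = check_characterised(["proj.model.var_a", "proj.model.var_b"], known)
all_known = check_characterised(["proj.model.var_a", "proj.model.var_b"], known, require_all=True)
single = check_characterised("proj.model.var_a", known, require_all=True)
```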

daops.utils.core.open_dataset(ds_id, file_paths, apply_fixes=True)[source]

Open an xarray Dataset and apply fixes if requested.

Fixes are applied to the data either before or after the dataset is opened. Whether a fix is a ‘pre-processor’ or ‘post-processor’ is defined in the fix itself.

Parameters:

ds_id – Dataset identifier in the form of a drs id e.g. cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga
file_paths – (list) The file paths corresponding to the ds id.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns:

xarray Dataset with fixes applied to the data.

daops.utils.fixer module

Apply fixes to the input dataset from the elasticsearch index.

class daops.utils.fixer.Fixer(dset)[source]

Bases: Lookup

Fixer class to look up fixes to apply to the input dataset from the elasticsearch index.

Gathers fixes into pre- and post-processors. Pre-process fixes are chained together to allow them to be executed with one call.

_gather_fixes(content)[source]: Gather pre- and post-processing fixes together.

_lookup_fix()[source]: Look up fixes on the elasticsearch index.

class daops.utils.fixer.FuncChainer(funcs)[source]

Bases: object

Chains functions together to allow them to be executed in one call.
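The chaining idea can be sketched with a small callable class. ChainedCalls is an illustrative name, and the assumption here is that every function takes one argument and returns the value passed to the next; the actual FuncChainer may differ in detail.

```python
class ChainedCalls:
    """Call a list of single-argument functions in order (illustrative only)."""

    def __init__(self, funcs):
        self.funcs = list(funcs)

    def __call__(self, value):
        # Feed the output of each function into the next one.
        for func in self.funcs:
            value = func(value)
        return value


def add_one(x):
    return x + 1


def double(x):
    return x * 2


chain = ChainedCalls([add_one, double])
result = chain(3)  # add_one runs first, then double
```

Wrapping the functions this way means code that expects a single callable can still run several fixes in a fixed order.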

daops.utils.normalise module

Normalise datasets.

class daops.utils.normalise.ResultSet(inputs=None)[source]

Bases: object

A class to hold the results from an operation e.g. subset.

add(dset, result)[source]

Add outputs to an ordered dictionary with the ds id as the key.

If the output is a file path this is also added to the file_paths variable so a list of file paths can be accessed independently.
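A minimal sketch of this bookkeeping is shown below. SimpleResultSet is a hypothetical stand-in, and it assumes that each result is a list of outputs and that any string or path-like entry is a file path; the real class may apply a stricter check.

```python
import os
from collections import OrderedDict


class SimpleResultSet:
    """Collect operation outputs per dataset id (illustrative only)."""

    def __init__(self):
        self.results = OrderedDict()
        self.file_paths = []

    def add(self, ds_id, result):
        """Store result under ds_id and record any file paths it contains."""
        self.results[ds_id] = result
        for item in result:
            # Assumption: strings and path-like objects are file paths.
            if isinstance(item, (str, os.PathLike)):
                self.file_paths.append(os.fspath(item))


rs = SimpleResultSet()
rs.add("ds_1", ["/tmp/output_1.nc"])
rs.add("ds_2", [object()])  # a non-path output, such as an in-memory dataset
```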

daops.utils.normalise.normalise(collection, apply_fixes=True)[source]

Take file paths, then open and fix the datasets they make up.

Parameters:

collection – Ordered dictionary of ds ids and their related file paths.
apply_fixes – Boolean. If True fixes will be applied to datasets if needed. Default is True.

Returns:

An ordered dictionary of ds ids and their fixed xarray Dataset.

daops.utils.testing module

class daops.utils.testing.ContextLogger(caplog: _pytest.logging.LogCaptureFixture | None = None)[source]

Bases: object

Helper class for safe logging management in pytest tests.

daops.utils.testing.get_esgf_file_paths(esgf_cache_dir: str | PathLike[str])[source]

daops.utils.testing.write_roocs_cfg(cache_dir: str | Path)[source]