[1]:
from daops.ops.subset import subset

# remove previously created example file
import os
if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")

Subset

Daops has a subsetting operation that calls clisops.ops.subset.subset from the clisops library.

Before calling the subset operation, daops looks up the requested dataset in a database of known fixes. If any fixes exist for that dataset, the data is loaded and fixed using the xarray library, and clisops then carries out the subsetting operation.
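
The order of operations described above (look up fixes, apply them, then subset) can be pictured with a small, library-free sketch. Everything in it is hypothetical: `KNOWN_FIXES`, `squeeze_lev`, `toy_subset`, `subset_with_fixes` and the dictionary-based "dataset" are stand-ins invented for illustration, not daops or clisops APIs.

```python
# Hypothetical in-memory stand-in for the fix database (NOT the daops API).
def squeeze_lev(dataset):
    """Illustrative fix: drop a 'lev' dimension from a toy dataset."""
    fixed = dict(dataset)
    fixed["dims"] = [d for d in dataset["dims"] if d != "lev"]
    return fixed


DATASET_ID = "cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga"

# Maps a dataset id to the list of fixes registered for it (assumed layout).
KNOWN_FIXES = {DATASET_ID: [squeeze_lev]}


def toy_subset(dataset, time):
    """Keep only the time steps inside the inclusive (start, end) range."""
    start, end = time
    kept = [t for t in dataset["times"] if start <= t <= end]
    return {**dataset, "times": kept}


def subset_with_fixes(dataset_id, dataset, time):
    """Apply any registered fixes first, then run the subset."""
    for fix in KNOWN_FIXES.get(dataset_id, []):
        dataset = fix(dataset)
    return toy_subset(dataset, time)


# "YYYY-MM" strings compare correctly as plain strings.
toy = {"dims": ["time", "lev"], "times": ["2006-01", "2006-02", "2014-01"]}
fixed_subset = subset_with_fixes(DATASET_ID, toy, time=("1955-01", "2013-12"))
print(fixed_subset)
```

A dataset id with no entry in the registry simply skips the fix step and goes straight to the subset.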

Results of subset and applying a fix

The results of the subsetting operation in daops are returned as an ordered dictionary that maps each input dataset id to a list of outputs in the chosen format (xarray datasets, NetCDF file paths or Zarr file paths).
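
As a rough picture of that return structure, the sketch below builds an ordered dictionary by hand and walks it. The helper `dataset_id_from_path` and its `base_dir` default are assumptions: the dotted id convention is inferred from the dataset id printed in this notebook's output, and daops' own mapping may differ. The output value is a placeholder string, not a real result.

```python
from collections import OrderedDict


def dataset_id_from_path(path, base_dir="badc/cmip5/data"):
    """Turn a directory-style path into a dotted dataset id (assumed convention)."""
    if not path.startswith(base_dir):
        raise ValueError(f"{path!r} is not under {base_dir!r}")
    relative = path[len(base_dir):].strip("/")
    directory = relative.rsplit("/", 1)[0]  # drop the trailing "*.nc" pattern
    return directory.replace("/", ".")


path = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"

# Hand-built stand-in for a results mapping: dataset id -> list of outputs.
results = OrderedDict()
results[dataset_id_from_path(path)] = ["<placeholder output>"]

for dataset_id, outputs in results.items():
    print(dataset_id, "->", outputs)

# Flatten every output into a single list, whatever the number of datasets.
all_outputs = [item for outputs in results.values() for item in outputs]
```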

The example below subsets a dataset that requires a fix, so the Elasticsearch index is consulted.

It also demonstrates the results of the operation.

[2]:
# An example of subsetting a dataset that requires a fix - the elasticsearch index is consulted.

ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=None,
    output_type="xarray",
)

result._results
2020-11-19 11:59:35,653 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-11-19 11:59:35,681 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-11-19 11:59:36,081 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-11-19 11:59:36,085 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-11-19 11:59:36,537 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.uk:443/roocs-fix/_doc/f34d45e4f7f5e187f64021b685adc447 [status:200 request:0.449s]
2020-11-19 11:59:36,561 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/core.py - INFO - Running post-processing function: squeeze_dims
2020-11-19 11:59:36,569 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/processor.py - INFO - Running subset [serial]: on Dataset with args: {'time': Time period to subset over
 start time: 1955-01-01T00:00:00
 end time: 2013-12-30T00:00:00, 'area': Area to subset over:
 None, 'level': Level range to subset over
 first_level: None
 last_level: None, 'output_type': 'xarray', 'output_dir': None, 'split_method': 'time:auto', 'file_namer': 'standard'}
2020-11-19 11:59:36,597 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py - INFO - Processing subset for times: ('2006-01-16', '2013-12-16')
2020-11-19 11:59:36,600 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - fmt_method=None, output_type=xarray
2020-11-19 11:59:36,603 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - Returning output as <class 'xarray.core.dataset.Dataset'>
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "start_date" not found within input date time range. Defaulting to minimum time step in xarray object.
  result = subset_time(ds, **kwargs)
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "end_date" has been nudged to nearest valid time step in xarray object.
  result = subset_time(ds, **kwargs)
[2]:
OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga',
              [<xarray.Dataset>
               Dimensions:    (bnds: 2, time: 96)
               Coordinates:
                   lev        float64 0.0
                 * time       (time) object 2006-01-16 12:00:00 ... 2013-12-16 12:00:00
               Dimensions without coordinates: bnds
               Data variables:
                   lev_bnds   (bnds) float64 dask.array<chunksize=(2,), meta=np.ndarray>
                   time_bnds  (time, bnds) object dask.array<chunksize=(96, 2), meta=np.ndarray>
                   zostoga    (time) float32 dask.array<chunksize=(96,), meta=np.ndarray>
               Attributes:
                   institution:            INM (Institute for Numerical Mathematics,  Moscow...
                   institute_id:           INM
                   experiment_id:          rcp45
                   source:                 inmcm4 (2009)
                   model_id:               inmcm4
                   forcing:                N/A
                   parent_experiment_id:   historical
                   branch_time:            56940.0
                   contact:                Evgeny Volodin, volodin@inm.ras.ru,INM RAS, Gubki...
                   history:                Mon Mar  9 11:49:38 2020: ncks -d lev,,,8 -v zost...
                   comment:                no comments
                   references:             Volodin, Diansky, Gusev 2010. Climate model INMCM...
                   initialization_method:  1
                   physics_version:        1
                   tracking_id:            e16ae391-db18-4e82-b2b8-46ff24aeec77
                   product:                output
                   experiment:             RCP4.5
                   frequency:              mon
                   creation_date:          2010-11-19T08:18:56Z
                   Conventions:            CF-1.4
                   project_id:             CMIP5
                   table_id:               Table Omon (12 May 2010) f2afe576fb73a3a11aaa3cc8...
                   title:                  inmcm4 model output prepared for CMIP5 RCP4.5
                   parent_experiment:      Historical
                   modeling_realm:         ocean
                   realization:            1
                   cmor_version:           2.0.0
                   NCO:                    4.7.3])])
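
The two warnings and the `time: 96` dimension above follow from the requested range (1955 to 2013) only partly overlapping the file's range (200601 to 210012). The sketch below reproduces that arithmetic with plain `datetime` objects standing in for the dataset's time coordinate. The helpers `monthly_steps` and `clamp_to_available` are hypothetical, and placing every step on day 16 at 12:00 is a simplification; this is not how clisops implements time subsetting.

```python
from datetime import datetime


def monthly_steps(first_year, last_year):
    """Mid-month timestamps (day 16, 12:00) for every month in the span."""
    return [
        datetime(year, month, 16, 12)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


def clamp_to_available(steps, start, end):
    """Return the time steps that fall inside the inclusive requested range."""
    return [t for t in steps if start <= t <= end]


steps = monthly_steps(2006, 2100)  # the file covers 200601-210012
selected = clamp_to_available(
    steps,
    start=datetime(1955, 1, 1),   # earlier than any available step
    end=datetime(2013, 12, 30),   # falls between two monthly steps
)
print(len(selected), selected[0], selected[-1])
```

Eight full years of monthly data (2006 to 2013) give the 96 time steps, starting at the first available step and ending at the last step on or before the requested end.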

File paths of output

When the output is written to files, the output file paths alone can be accessed from the results object through its file_paths attribute. This is demonstrated below.

[3]:
# Subset the same dataset (which requires a fix), this time writing NetCDF output to the current directory.

ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=".",
    output_type="netcdf",
    file_namer="simple",
)

print("output file paths = ", result.file_paths)
2020-11-19 11:59:36,640 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-11-19 11:59:36,776 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-11-19 11:59:37,155 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-11-19 11:59:37,159 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-11-19 11:59:37,726 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.uk:443/roocs-fix/_doc/f34d45e4f7f5e187f64021b685adc447 [status:200 request:0.564s]
2020-11-19 11:59:37,749 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/core.py - INFO - Running post-processing function: squeeze_dims
2020-11-19 11:59:37,755 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/processor.py - INFO - Running subset [serial]: on Dataset with args: {'time': Time period to subset over
 start time: 1955-01-01T00:00:00
 end time: 2013-12-30T00:00:00, 'area': Area to subset over:
 None, 'level': Level range to subset over
 first_level: None
 last_level: None, 'output_type': 'netcdf', 'output_dir': '.', 'split_method': 'time:auto', 'file_namer': 'simple'}
2020-11-19 11:59:37,780 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py - INFO - Processing subset for times: ('2006-01-16', '2013-12-16')
2020-11-19 11:59:37,783 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - fmt_method=to_netcdf, output_type=netcdf
2020-11-19 11:59:37,829 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - Wrote output file: ./output_001.nc
output file paths =  ['./output_001.nc']
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "start_date" not found within input date time range. Defaulting to minimum time step in xarray object.
  result = subset_time(ds, **kwargs)
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "end_date" has been nudged to nearest valid time step in xarray object.
  result = subset_time(ds, **kwargs)
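
The file name `./output_001.nc` above comes from the `file_namer="simple"` setting. As a guess at what such a namer might look like, the sketch below numbers output files sequentially. `SimpleNamer` is a hypothetical class written for illustration; only the first name, `output_001.nc`, is confirmed by the output above, and the continuation to `output_002.nc` and beyond is an assumption.

```python
class SimpleNamer:
    """Hypothetical sequential namer: output_001.nc, output_002.nc, ..."""

    def __init__(self, output_dir=".", prefix="output", extension="nc"):
        self.output_dir = output_dir
        self.prefix = prefix
        self.extension = extension
        self._count = 0

    def next_path(self):
        """Return the next numbered path inside the output directory."""
        self._count += 1
        name = f"{self.prefix}_{self._count:03d}.{self.extension}"
        return f"{self.output_dir}/{name}"


namer = SimpleNamer(output_dir=".")
paths = [namer.next_path() for _ in range(3)]
print(paths)
```

Because the names depend only on a counter, a second run into the same directory produces the same paths, which is why the first cell of this notebook removes `./output_001.nc` before starting.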

Checks implemented by daops

Daops checks that files exist in the requested time range. If none are found, the operation raises an exception, as the example below shows.

[4]:
ds = "/badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"

try:
    result = subset(
        ds,
        time=("1955-01-01T00:00:00", "1990-12-30T00:00:00"),
        output_dir=None,
        output_type="xarray",
    )

except Exception as exc:
    print(exc)
2020-11-19 11:59:37,854 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 0 files in time range: ...
no files to open
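
One way to picture a check of this kind is to compare the requested range with the `YYYYMM-YYYYMM` block at the end of each file name. The sketch below does that with (year, month) tuples. The functions `file_time_range`, `overlaps` and `files_in_range` are hypothetical and the error message is invented; daops may perform its check differently, for example by opening the files.

```python
import re


def file_time_range(filename):
    """Parse a trailing _YYYYMM-YYYYMM.nc block into two (year, month) tuples."""
    match = re.search(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$", filename)
    if match is None:
        raise ValueError(f"no time range found in {filename!r}")
    y1, m1, y2, m2 = map(int, match.groups())
    return (y1, m1), (y2, m2)


def overlaps(filename, start, end):
    """True if the file's range intersects the inclusive [start, end] range."""
    first, last = file_time_range(filename)
    return first <= end and start <= last


def files_in_range(filenames, start, end):
    """Keep overlapping files; raise if nothing is left (illustrative message)."""
    kept = [f for f in filenames if overlaps(f, start, end)]
    if not kept:
        raise ValueError("no files in the requested time range")
    return kept


name = "zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc"

print(files_in_range([name], start=(1955, 1), end=(2013, 12)))  # file is kept

try:
    files_in_range([name], start=(1955, 1), end=(1990, 12))
except ValueError as exc:
    print(exc)
```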