[1]:
from daops.ops.subset import subset

# Remove the previously created example file
import os

if os.path.exists("./output_001.nc"):
    os.remove("./output_001.nc")
Subset
Daops has a subsetting operation that calls clisops.ops.subset.subset from the clisops library.
Before calling the subset operation, daops looks up a database of known fixes. If any fixes exist for the requested dataset, the data is loaded and fixed using the xarray library, and the subsetting operation is then carried out by clisops.
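The order of operations described above (look up fixes, apply them, then subset) can be sketched in plain Python. This is a minimal illustration, not the daops implementation: the fix store `KNOWN_FIXES`, the fix `drop_singleton_level`, the helper `subset_with_fixes` and the dictionary-based "dataset" are all hypothetical stand-ins, whereas daops consults a remote index and works on xarray objects.

```python
from collections import OrderedDict


def drop_singleton_level(data):
    """Hypothetical fix: remove a length-1 level axis from each value."""
    fixed = dict(data)
    fixed["values"] = [row[0] for row in data["values"]]
    return fixed


# Hypothetical in-memory fix store keyed by dataset id.
# (Assumption: daops queries a remote database instead.)
KNOWN_FIXES = {"demo.dataset.needs_fix": [drop_singleton_level]}


def subset_time(data, start, end):
    """Stand-in for the subsetting step: keep entries with start <= time <= end."""
    keep = [i for i, t in enumerate(data["time"]) if start <= t <= end]
    return {
        "time": [data["time"][i] for i in keep],
        "values": [data["values"][i] for i in keep],
    }


def subset_with_fixes(ds_id, data, start, end):
    """Apply any known fixes for ds_id, then subset the fixed data."""
    for fix in KNOWN_FIXES.get(ds_id, []):
        data = fix(data)
    return OrderedDict([(ds_id, subset_time(data, start, end))])


raw = {"time": [2006, 2007, 2008], "values": [[1.0], [2.0], [3.0]]}
result = subset_with_fixes("demo.dataset.needs_fix", raw, 2007, 2008)
print(result)
```

A dataset id with no entry in the store simply skips the fix step and goes straight to subsetting.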
Results of subset and applying a fix
The results of the subsetting operation in daops are returned as an ordered dictionary that maps the input dataset id to the output in the chosen format (xarray dataset, netCDF file paths or Zarr file paths).
The example below requires a fix, so the elasticsearch index is consulted.
It also demonstrates the results of the operation.
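The dataset id used as the dictionary key appears, judging by the log output of the example on this page, to be the dataset's directory path with the base directory removed and the separators replaced by dots. The sketch below reproduces that mapping; the helper `dataset_id_from_glob` and the `base_dir` default are assumptions made for illustration, not part of the daops API.

```python
from pathlib import PurePosixPath


def dataset_id_from_glob(path_glob, base_dir="badc/cmip5/data"):
    """Turn a 'base/dir/a/b/c/*.nc' pattern into the dotted id 'a.b.c'.

    Hypothetical helper: the base directory is assumed, not looked up.
    """
    directory = PurePosixPath(path_glob).parent  # drop the '*.nc' part
    relative = directory.relative_to(base_dir)   # drop the base directory
    return ".".join(relative.parts)


ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
print(dataset_id_from_glob(ds))
```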
[2]:
# An example of subsetting a dataset that requires a fix - the elasticsearch index is consulted.
ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=None,
    output_type="xarray",
)
result._results
2020-11-19 11:59:35,653 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-11-19 11:59:35,681 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-11-19 11:59:36,081 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-11-19 11:59:36,085 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-11-19 11:59:36,537 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.uk:443/roocs-fix/_doc/f34d45e4f7f5e187f64021b685adc447 [status:200 request:0.449s]
2020-11-19 11:59:36,561 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/core.py - INFO - Running post-processing function: squeeze_dims
2020-11-19 11:59:36,569 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/processor.py - INFO - Running subset [serial]: on Dataset with args: {'time': Time period to subset over
start time: 1955-01-01T00:00:00
end time: 2013-12-30T00:00:00, 'area': Area to subset over:
None, 'level': Level range to subset over
first_level: None
last_level: None, 'output_type': 'xarray', 'output_dir': None, 'split_method': 'time:auto', 'file_namer': 'standard'}
2020-11-19 11:59:36,597 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py - INFO - Processing subset for times: ('2006-01-16', '2013-12-16')
2020-11-19 11:59:36,600 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - fmt_method=None, output_type=xarray
2020-11-19 11:59:36,603 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - Returning output as <class 'xarray.core.dataset.Dataset'>
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "start_date" not found within input date time range. Defaulting to minimum time step in xarray object.
result = subset_time(ds, **kwargs)
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "end_date" has been nudged to nearest valid time step in xarray object.
result = subset_time(ds, **kwargs)
[2]:
OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga',
[<xarray.Dataset>
Dimensions: (bnds: 2, time: 96)
Coordinates:
lev float64 0.0
* time (time) object 2006-01-16 12:00:00 ... 2013-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
lev_bnds (bnds) float64 dask.array<chunksize=(2,), meta=np.ndarray>
time_bnds (time, bnds) object dask.array<chunksize=(96, 2), meta=np.ndarray>
zostoga (time) float32 dask.array<chunksize=(96,), meta=np.ndarray>
Attributes:
institution: INM (Institute for Numerical Mathematics, Moscow...
institute_id: INM
experiment_id: rcp45
source: inmcm4 (2009)
model_id: inmcm4
forcing: N/A
parent_experiment_id: historical
branch_time: 56940.0
contact: Evgeny Volodin, volodin@inm.ras.ru,INM RAS, Gubki...
history: Mon Mar 9 11:49:38 2020: ncks -d lev,,,8 -v zost...
comment: no comments
references: Volodin, Diansky, Gusev 2010. Climate model INMCM...
initialization_method: 1
physics_version: 1
tracking_id: e16ae391-db18-4e82-b2b8-46ff24aeec77
product: output
experiment: RCP4.5
frequency: mon
creation_date: 2010-11-19T08:18:56Z
Conventions: CF-1.4
project_id: CMIP5
table_id: Table Omon (12 May 2010) f2afe576fb73a3a11aaa3cc8...
title: inmcm4 model output prepared for CMIP5 RCP4.5
parent_experiment: Historical
modeling_realm: ocean
realization: 1
cmor_version: 2.0.0
NCO: 4.7.3])])
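The two warnings in the output above explain why a request for 1955 to 2013-12-30 produced data from 2006-01-16 to 2013-12-16: the start was moved to the first available time step and the end was nudged to a valid one. A minimal sketch of that adjustment follows. It assumes a simplified calendar with one time step at 12:00 on the 16th of each month, and it takes "nudged" to mean the last time step at or before the requested end; the library may resolve this differently. The function name `clamp_time_range` is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def clamp_time_range(time_steps, start, end):
    """Adjust (start, end) to time steps that exist in a sorted list."""
    if start > time_steps[-1]:
        raise ValueError("requested range starts after the data ends")
    if start < time_steps[0]:
        # Start precedes the data: default to the first time step.
        start = time_steps[0]
    # Index of the first step strictly after the requested end.
    idx = bisect_right(time_steps, end)
    if idx == 0:
        raise ValueError("requested range ends before the data starts")
    return start, time_steps[idx - 1]


# Simplified monthly time axis covering 2006-01 to 2100-12 (assumption).
steps = [datetime(y, m, 16, 12) for y in range(2006, 2101) for m in range(1, 13)]

start, end = clamp_time_range(steps, datetime(1955, 1, 1), datetime(2013, 12, 30))
selected = [t for t in steps if start <= t <= end]
print(start, end, len(selected))
```

Under these assumptions the selection contains 96 monthly steps, matching the `time: 96` dimension in the dataset shown above.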
File paths of output
When the output is written to files, the output file paths can be accessed directly from the results object through its file_paths attribute. This is demonstrated below.
[3]:
# An example of subsetting a dataset that requires a fix - the elasticsearch index is consulted.
ds = "badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
result = subset(
    ds,
    time=("1955-01-01T00:00:00", "2013-12-30T00:00:00"),
    output_dir=".",
    output_type="netcdf",
    file_namer="simple"
)
print("ouptut file paths = ", result.file_paths)
2020-11-19 11:59:36,640 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 1 files in time range: ...
2020-11-19 11:59:36,776 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - File 0: badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc
2020-11-19 11:59:37,155 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Kept 1 files
2020-11-19 11:59:37,159 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/normalise.py - INFO - Working on datasets: OrderedDict([('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga', ['badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc'])])
2020-11-19 11:59:37,726 - elasticsearch - INFO - GET https://elasticsearch.ceda.ac.uk:443/roocs-fix/_doc/f34d45e4f7f5e187f64021b685adc447 [status:200 request:0.564s]
2020-11-19 11:59:37,749 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/core.py - INFO - Running post-processing function: squeeze_dims
2020-11-19 11:59:37,755 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/processor.py - INFO - Running subset [serial]: on Dataset with args: {'time': Time period to subset over
start time: 1955-01-01T00:00:00
end time: 2013-12-30T00:00:00, 'area': Area to subset over:
None, 'level': Level range to subset over
first_level: None
last_level: None, 'output_type': 'netcdf', 'output_dir': '.', 'split_method': 'time:auto', 'file_namer': 'simple'}
2020-11-19 11:59:37,780 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py - INFO - Processing subset for times: ('2006-01-16', '2013-12-16')
2020-11-19 11:59:37,783 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - fmt_method=to_netcdf, output_type=netcdf
2020-11-19 11:59:37,829 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/utils/output_utils.py - INFO - Wrote output file: ./output_001.nc
ouptut file paths = ['./output_001.nc']
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "start_date" not found within input date time range. Defaulting to minimum time step in xarray object.
result = subset_time(ds, **kwargs)
/home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/clisops/ops/subset.py:34: UserWarning: "end_date" has been nudged to nearest valid time step in xarray object.
result = subset_time(ds, **kwargs)
Checks implemented by daops
Daops checks that files exist in the requested time range. If none are found, an exception is raised, as in the example below.
[4]:
ds = "/badc/cmip5/data/cmip5/output1/INM/inmcm4/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/*.nc"
try:
    result = subset(
        ds,
        time=("1955-01-01T00:00:00", "1990-12-30T00:00:00"),
        output_dir=None,
        output_type="xarray",
    )
except Exception as exc:
    print(exc)
2020-11-19 11:59:37,854 - /home/docs/checkouts/readthedocs.org/user_builds/daops/conda/release-v0.3.0/lib/python3.9/site-packages/daops/utils/consolidate.py - INFO - Testing 0 files in time range: ...
no files to open
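One way a time-range check of this kind could work is to compare the requested period with the period encoded in each file name, such as the `200601-210012` suffix seen in the logs on this page. The sketch below is an illustration under that assumption only; daops may instead inspect file contents, and the helpers `file_year_month_range` and `files_in_range` are hypothetical names.

```python
import re


def file_year_month_range(filename):
    """Parse the trailing '_YYYYMM-YYYYMM.nc' part of a file name."""
    match = re.search(r"_(\d{6})-(\d{6})\.nc$", filename)
    if match is None:
        raise ValueError(f"no time range found in file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def files_in_range(filenames, start_yyyymm, end_yyyymm):
    """Keep the files whose period overlaps the requested period."""
    kept = []
    for name in filenames:
        first, last = file_year_month_range(name)
        if first <= end_yyyymm and last >= start_yyyymm:
            kept.append(name)
    return kept


files = ["zostoga_Omon_inmcm4_rcp45_r1i1p1_200601-210012.nc"]

overlapping = files_in_range(files, 195501, 201312)  # 1955-01 to 2013-12
outside = files_in_range(files, 195501, 199012)      # 1955-01 to 1990-12
print(len(overlapping), len(outside))
```

With no files left after filtering, a caller would have nothing to open and could raise an error at that point.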