
COWS Web Processing Service (COWS WPS)


The Subsetter data extraction Process

Introduction

The Subsetter process facilitates the extraction of a variable subset from a range of datasets. The user can select a dataset, a single variable, a time range, a vertical level range and a bounding box. The output format can also be selected (NetCDF or CSV), along with instructions on how the process should divide output files into sensible time chunks.

The tool uses the CDMS (Climate Data Management System) libraries (developed as part of PCMDI’s Climate Data Analysis Tools (CDAT)) to interact with the datasets in the archives. The extraction jobs run on the batch processing servers and the user is e-mailed when the job has completed.

This page provides an overview of the Subsetter process but, sorry, it has to start with a disclaimer...

Disclaimer

The Subsetter process is built on third-party tools developed by a variety of organisations. Whilst these tools are widely used in the atmospheric and climate research communities, there may be applications, working with certain files or grids, in which the output will not be valid. The use of the COWS WPS Subsetter carries the same risk that certain applications of the tool will not produce valid outputs. The authors hold no liability for any scientific results produced using this WPS process. Individuals accessing these WPS processes do so at their own risk. You are strongly advised to validate your outputs using other tools to ensure that the appropriate calculations have been performed.

Using the Subsetter through the COWS WPS User Interface

The Subsetter process can be accessed programmatically or via the COWS WPS User Interface (CWUI). The real power of this process is demonstrated when using the CWUI, because it relies heavily on dynamic and dependent inputs (as described in the WPS processes section). The following screenshots show the sequence of events that the user would see when making a request to the Subsetter process via the CWUI submission form.
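For programmatic access, a request can be expressed as an OGC WPS 1.0.0 key-value-pair (KVP) Execute URL, in which DataInputs is a semicolon-separated list of name=value pairs. The sketch below builds such a URL. The endpoint, the process identifier "Subsetter" and all input identifiers and values are hypothetical placeholders; the real identifiers should be taken from the server's DescribeProcess response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL of your COWS WPS deployment.
WPS_ENDPOINT = "https://wps.example.org/wps"


def build_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 KVP Execute URL.

    `inputs` maps input identifiers to values; they are joined into the
    semicolon-separated name=value list used by the DataInputs parameter.
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    }
    # Leave the DataInputs separators and the ISO 8601 colons unescaped.
    return endpoint + "?" + urlencode(query, safe="=;:")


# Hypothetical process identifier, input identifiers and values.
url = build_execute_url(
    WPS_ENDPOINT,
    "Subsetter",
    {
        "Dataset": "example_dataset",
        "Variable": "tas",
        "StartDateTime": "2009-01-01T00:00:00",
        "EndDateTime": "2009-12-31T18:00:00",
        "OutputFormat": "CSV",
    },
)
print(url)
```

Because the Subsetter is typically secured, a real request would also need credentials; how these are supplied depends on the deployment.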

1. Loading the Subsetter form

From the list of all WPS Processes, find the “Subsetter” Process by entering “subsetter” in the keyword search or filter facility. This will return the Subsetter Process entry. Simply follow the link to “Submit a request” and this will load the Subsetter form.


Figure 1. Selecting the “Subsetter” process.

This process includes fields whose possible values depend on other selections you have made (these are known as “dynamic” fields). The form works from top to bottom, so please start by selecting an option for the first field. Once you have made your selection, the form will automatically update to show the available options in the fields below. You can manually instruct the form to update by clicking the “Update form” button, and you can reset it to the initial values using the “Reset form” button.
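The top-to-bottom dependency between dynamic fields can be pictured as a walk down a catalogue: the chosen dataset determines the variable options, and the chosen variable determines the valid time range shown as a hint. This is a minimal sketch under assumptions; the catalogue contents, field names and function are hypothetical and do not describe how the CWUI is actually implemented.

```python
# Hypothetical catalogue describing what each dataset offers. A real
# deployment would derive this from the dataset metadata on the server.
CATALOGUE = {
    "dataset_a": {
        "variables": {
            "tas": {"time": ("1979-01-01T00:00:00", "2009-12-31T18:00:00")},
            "pr": {"time": ("1979-01-01T00:00:00", "2008-12-31T18:00:00")},
        }
    },
    "dataset_b": {
        "variables": {
            "ua": {"time": ("2000-01-01T00:00:00", "2010-12-31T00:00:00")},
        }
    },
}


def field_options(selections):
    """Return (field name, options) for the next unfilled field, top to bottom."""
    if "Dataset" not in selections:
        return "Dataset", sorted(CATALOGUE)
    variables = CATALOGUE[selections["Dataset"]]["variables"]
    if "Variable" not in selections:
        return "Variable", sorted(variables)
    # Later fields only show an indication of the valid range for the variable.
    return "TimeRange", variables[selections["Variable"]]["time"]


print(field_options({"Dataset": "dataset_a"}))
```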

2. Selecting a Dataset

The first field is “Dataset”. Select the dataset that contains the variable you wish to subset. The number of datasets made available through the Subsetter depends on the local configuration of the server.

See the “Which datasets are available?” section below for details.

The “Subsetter” process will typically be secured, so you will need to make sure that you are registered for any datasets that you request via this process.

IMPORTANT: If the dataset that you wish to subset happens to be the first on the drop-down list, then you should click the “Update form” button to load the associated variables to display in the “Variable” field below.

3. Selecting a Variable

This is a dynamic field that will only update once a dataset has been successfully selected (in the “Dataset” field). Simply select a value from the options provided. Only one variable can be selected.


Figure 2. Selecting the “Variable” option using the CWUI web form.

4. Temporal and spatial selections

Once a variable has been selected, the temporal and spatial selection fields will be displayed.

Start Date Time/End Date Time
Please enter a date/time value in the format “YYYY-MM-DDThh:mm:ss”, such as “2009-01-01T00:00:00”. Once again, this field is dynamic and the available options depend upon selections you have made for some, or all, of the fields above. An indication of valid date/time values is given depending on the dataset and variable selected.
Bounding Box
This input is optional. This field is dynamic and the available geographical extent depends upon selections you have made for some, or all, of the fields above. You may select a bounding box or single point of interest by using the interactive map or by setting the North, South, East and West input boxes to the required values. An indication of the geographical extent within which a bounding box may be defined is given depending on the dataset and variable selected.
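The checks implied by these fields can be sketched as follows: parse the “YYYY-MM-DDThh:mm:ss” strings, require the requested time range to fall inside the range indicated for the variable, and require the bounding box (or single point, where north equals south and east equals west) to fall inside the indicated geographical extent. The function names are hypothetical, and the sketch assumes west <= east, so it does not handle boxes that wrap across the dateline.

```python
from datetime import datetime

# Format required by the form, e.g. "2009-01-01T00:00:00".
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_datetime(text):
    """Parse a "YYYY-MM-DDThh:mm:ss" string; raises ValueError if malformed."""
    return datetime.strptime(text, DATETIME_FORMAT)


def check_time_range(start, end, valid_start, valid_end):
    """Check that start <= end and both lie inside the variable's valid range."""
    s, e = parse_datetime(start), parse_datetime(end)
    lo, hi = parse_datetime(valid_start), parse_datetime(valid_end)
    if s > e:
        raise ValueError("start date/time is after end date/time")
    if s < lo or e > hi:
        raise ValueError(f"time range must lie within {valid_start} to {valid_end}")
    return s, e


def check_bbox(north, south, east, west, extent):
    """Check a box against an extent given as (north, south, east, west).

    Assumes west <= east (no wrapping across the dateline).
    """
    max_north, min_south, max_east, min_west = extent
    if south > north or west > east:
        raise ValueError("expected south <= north and west <= east")
    if south < min_south or north > max_north or west < min_west or east > max_east:
        raise ValueError("bounding box lies outside the dataset extent")
    return north, south, east, west
```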

5. Selecting the Output Format

The output format of your subsetted dataset may be either NetCDF or CSV (comma-separated values).
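A CSV output flattens the gridded subset into one row per data point, with the coordinates repeated on every row. The sketch below shows that idea with Python's standard csv module; the column layout (time, latitude, longitude, value) is an assumption for illustration and is not the Subsetter's documented CSV format.

```python
import csv
import io


def write_csv(records, variable, stream):
    """Write (time, latitude, longitude, value) tuples as CSV with a header row.

    The column layout is hypothetical, chosen only to illustrate flattening.
    """
    writer = csv.writer(stream)
    writer.writerow(["time", "latitude", "longitude", variable])
    for time, lat, lon, value in records:
        writer.writerow([time, lat, lon, value])


buf = io.StringIO()
write_csv([("2009-01-01T00:00:00", 51.5, -1.25, 278.4)], "tas", buf)
print(buf.getvalue())
```

Because every value carries its own coordinates, a CSV file for a large extraction will usually be considerably bigger than the equivalent NetCDF file.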

6. Selecting time-chunking in the output files

This instructs the process to split the output files into chunks of a fixed and constant duration (e.g. month, year). If the default value (AUTOMATIC) is selected, then the process will try to determine a sensible chunking of the outputs based on the volume of the extraction.
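Fixed-duration chunking can be sketched as splitting the requested time range at calendar boundaries, with each chunk corresponding to one output file. In this hypothetical sketch each chunk covers chunk_start <= t < chunk_stop, except the final chunk, which also includes the requested end time. How the AUTOMATIC option chooses a chunk size is not documented here, so it is not reproduced.

```python
from datetime import datetime


def _next_boundary(t, unit):
    """Return the first instant of the month or year following t."""
    if unit == "year":
        return datetime(t.year + 1, 1, 1)
    if unit == "month":
        if t.month == 12:
            return datetime(t.year + 1, 1, 1)
        return datetime(t.year, t.month + 1, 1)
    raise ValueError(f"unsupported chunk unit: {unit}")


def time_chunks(start, end, unit):
    """Split [start, end] into chunks aligned to calendar months or years."""
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    current = start
    while True:
        boundary = _next_boundary(current, unit)
        if boundary > end:
            chunks.append((current, end))  # final chunk includes `end`
            return chunks
        chunks.append((current, boundary))  # `boundary` starts the next chunk
        current = boundary


for chunk in time_chunks(datetime(2009, 11, 15), datetime(2010, 2, 10), "month"):
    print(chunk)
```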

7. Submit your request

Once all options have been selected, you can submit your subsetter job request. The “Job Confirmation” page will display an estimated duration and volume of the outputs that will be generated by your request. Please note that these are estimates and should only be interpreted as such. If the servers are very busy, the duration may be significantly under-estimated. If you wish to submit the request, click the “Submit” button and the job will start running in the background. You will be e-mailed when the job has completed and the outputs are ready for you to download.
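The estimation method used by the “Job Confirmation” page is not described here. As a hypothetical back-of-the-envelope check, the uncompressed size of a gridded extraction is roughly the number of grid points in the subset multiplied by the storage size of each value (4 bytes for 32-bit floats). The function name and the example grid are assumptions for illustration.

```python
def estimate_volume_bytes(n_times, n_levels, n_lats, n_lons, bytes_per_value=4):
    """Rough uncompressed size of a gridded subset, in bytes.

    One value per grid point per time step, each taking `bytes_per_value`
    bytes (4 for 32-bit floats, 8 for 64-bit floats).
    """
    return n_times * n_levels * n_lats * n_lons * bytes_per_value


# One year of 6-hourly data (1460 steps) on a single level of a 73 x 144 grid.
size = estimate_volume_bytes(1460, 1, 73, 144)
print(size / 1e9, "GB")
```

For this example the estimate comes to about 0.06 GB; file headers and any compression would change the real figure.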

8. Feedback on your job

The browser will be forwarded to the “Job Viewer” page at this point. This will include notification that the job is running, with an estimate of the percentage progress. Note that large jobs may take a long time, so you may wish to navigate away from this page. The WPS will continue to run your job regardless.

If you choose to stay on the page, it will continue to refresh until the job has completed. Note that the estimated percentage progress does not include any time the job spends waiting in a queue; while the job is queued, the progress will remain at zero percent.

Once the job has completed, or if for some reason it failed during processing, the WPS will e-mail you with details.
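If you poll a job programmatically rather than through the “Job Viewer”, the job state can be read from the status document. The sketch below assumes the deployment returns a standard OGC WPS 1.0.0 ExecuteResponse, whose wps:Status element contains ProcessAccepted, ProcessStarted (with an optional percentCompleted attribute), ProcessPaused, ProcessSucceeded or ProcessFailed; the sample document and function name are hypothetical. A queued job reports no percentage, which the sketch treats as zero.

```python
import xml.etree.ElementTree as ET

WPS_NS = "http://www.opengis.net/wps/1.0.0"


def parse_status(xml_text):
    """Return (state, percent) from a WPS 1.0.0 ExecuteResponse document."""
    root = ET.fromstring(xml_text)
    status = root.find(f"{{{WPS_NS}}}Status")
    if status is None or len(status) == 0:
        raise ValueError("no wps:Status element found")
    child = status[0]
    state = child.tag.split("}", 1)[-1]  # strip the namespace prefix
    if state == "ProcessSucceeded":
        return state, 100
    # percentCompleted is optional; a missing value (e.g. queued job) means 0.
    return state, int(child.get("percentCompleted", 0))


# Hypothetical status document for a running job.
SAMPLE = """<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
    service="WPS" version="1.0.0">
  <wps:Status creationTime="2009-06-01T12:00:00Z">
    <wps:ProcessStarted percentCompleted="45">Extracting data</wps:ProcessStarted>
  </wps:Status>
</wps:ExecuteResponse>"""

print(parse_status(SAMPLE))
```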

9. Downloading the output files

A “completed job” e-mail message will include a URL that links to the “Job Viewer” page, from which you can download your job's output files.

Various text files are included with your outputs, including request metadata, a disclaimer and various log files.

Data Input Parameters

Hiding unused inputs in the CWUI web form

Since the Subsetter attempts to be a generic tool that can provide subsetting capabilities for many datasets, it can handle some input parameters that might not be relevant to a particular dataset. For example, the Vertical Level input parameter may not be relevant to all datasets, so it does not necessarily need to be provided as part of the request. The CWUI web form will hide any parameters that are found to be superfluous to a specific request.
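The hiding rule can be sketched as filtering out any input that depends on an axis the selected dataset does not have. The input names, the axis names and the mapping below are hypothetical; only the Vertical Level case is taken from the text above.

```python
# Hypothetical mapping from an input parameter to the axis it requires.
REQUIRES_AXIS = {"VerticalLevel": "level"}


def visible_inputs(all_inputs, dataset_axes):
    """Return the inputs to show, hiding those whose required axis is absent."""
    return [
        name
        for name in all_inputs
        if name not in REQUIRES_AXIS or REQUIRES_AXIS[name] in dataset_axes
    ]


ALL_INPUTS = ["Dataset", "Variable", "StartDateTime", "EndDateTime",
              "VerticalLevel", "BoundingBox", "OutputFormat"]
print(visible_inputs(ALL_INPUTS, {"time", "latitude", "longitude"}))
```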

Which datasets are available?

The datasets available under the Subsetter depend on the particular deployment of the COWS WPS service. Since the documentation is usually distributed alongside a working WPS, this link (subsetter_datasets_link) will normally point to a table of the datasets that are available. If this is not available, please contact the manager of the service for details.

NOTE: The above link will not work if this documentation is viewed outside a deployed WPS.

Configuring the Subsetter process

As a WPS administrator, there are various settings that can be configured for the Subsetter. They are:

SizeLimitPerRequestGB
This setting is set by the administrator in the process configuration file. It provides an upper limit for the total volume of data produced by a request to the “Subsetter” process. The default is: 50 Gbytes.
SizeLimitPerFileGB
This setting is set by the administrator in the process configuration file. It provides an upper limit for the size of each single output file generated by the “Subsetter” process. The default is: 0.5 Gbyte.
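How the Subsetter applies these two limits internally is not described here. A minimal sketch, assuming they are compared against the estimated output volume: reject a request whose total exceeds SizeLimitPerRequestGB, otherwise work out the smallest number of output files that keeps each file within SizeLimitPerFileGB. The constant and function names are hypothetical; the default values are the ones quoted above, and real values come from the process configuration file.

```python
import math

# Defaults quoted above; a deployment sets the real values in the
# process configuration file.
SIZE_LIMIT_PER_REQUEST_GB = 50
SIZE_LIMIT_PER_FILE_GB = 0.5


def check_request_size(estimated_gb,
                       per_request_gb=SIZE_LIMIT_PER_REQUEST_GB,
                       per_file_gb=SIZE_LIMIT_PER_FILE_GB):
    """Reject oversized requests; return the minimum number of output files."""
    if estimated_gb > per_request_gb:
        raise ValueError(
            f"request of {estimated_gb} GB exceeds the {per_request_gb} GB limit")
    return max(1, math.ceil(estimated_gb / per_file_gb))


print(check_request_size(12.3))
```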
