Steps¶
Writing a step¶
Writing a new step involves writing a class that has a process
method to perform work and a spec
member to define its configuration
parameters. (Optionally, the spec
member may be defined in a
separate spec
file).
Inputs and outputs¶
A Step
provides a full framework for handling I/O. Below is a short
description. A more detailed discussion can be found in
Step I/O Design.
Steps get their inputs from two sources:
Configuration parameters come from the parameter file or commandline and are set as member variables on the Step object by the stpipe framework.
Arguments are passed to the Step’s
process
function as regular function arguments.
Parameters should be used to specify things that must be determined outside of the code by a user using the class. Arguments should be used to pass data that needs to go from one step to another as part of a larger pipeline. Another way to think about this is: if the user would want to examine or change the value, use a parameter.
The parameters are defined by the Step.spec member.
Input Files, Associations, and Directories¶
All input files must be in the same directory. This directory is whichever directory the first input file is found in. This is particularly important for associations. It is assumed that all files referenced by an association are in the same directory as the association file itself.
Output Files and Directories¶
The step will generally return its output as a data model. Every step has
implicitly created parameters output_dir
and output_file
which the user can
use to specify the directory and file to save this model to. Since the stpipe
architecture generally creates output file names, in general, it is expected
that output_file
be rarely specified, and that different sets of outputs be
separated using output_dir
.
Output Suffix¶
There are three ways a step’s results can be written to a file:
Implicitly when a step is run from the command line or with
Step.from_cmdline
Explicitly by specifying the parameter
save_results
Explicitly by specifying a file name with the parameter
output_file
In all cases, the file, or files, is/are created with an added suffix
at the end of the base file name. By default this suffix is the class
name of the step that produced the results. Use the suffix
parameter
to explicitly change the suffix.
For pipelines, this can be done either in a parameter file, or within the code itself. See calwebb_dark for an example of specifying in the parameter file.
For an example where the suffix can only be determined at runtime, see calwebb_detector1. For an example of a pipeline that returns many results, see calwebb_spec2.
The Python class¶
At a minimum, the Python Step class should inherit from stpipe.Step
, implement
a process
method to do the actual work of the step and have a spec
member
to describe its parameters.
Objects from other Steps in a pipeline are passed as arguments to the
process
method.The parameters described in Configuring a Step are available as member variables on
self
.To support the caching suspend/resume feature of pipelines, images must be passed between steps as model objects. To ensure you’re always getting a model object, call the model constructor on the parameter passed in. It is good idea to use a
with
statement here to ensure that if the input is a file path that the file will be appropriately closed.Use
get_reference_file_model
method to load any CRDS reference files used by the Step. This will make a cached network request to CRDS. If the user of the step has specified an override for the reference file in either the parameter file or at the command line, the override file will be used instead. (See Interfacing with CRDS).Objects to pass to other Steps in the pipeline are simply returned from the function. To return multiple objects, return a tuple.
The parameters for the step are described in the
spec
member in theconfigspec
format.Declare any CRDS reference files used by the Step. (See Interfacing with CRDS).
from jwst.stpipe import Step
from stdatamodels.jwst.datamodels import ImageModel
from my_awesome_astronomy_library import combine
class ExampleStep(Step):
"""
Every step should include a docstring. This docstring will be
displayed by the `strun --help`.
"""
# 1.
def process(self, image1, image2):
self.log.info("Inside ExampleStep")
# 2.
threshold = self.threshold
# 3.
with ImageModel(image1) as image1, ImageModel(image2) as image2:
# 4.
with self.get_reference_file_model(image1, "flat_field") as flat:
new_image = combine(image1, image2, flat, threshold)
# 5.
return new_image
# 6.
spec = """
# This is the configspec file for ExampleStep
threshold = float(default=1.0) # maximum flux
"""
# 7.
reference_file_types = ['flat_field']
The Python Step subclass may be installed anywhere that your Python
installation can find it. It does not need to be installed in the
stpipe
package.
The spec member¶
The spec
member variable is a string containing information about
the parameters. It is in the configspec
format
defined in the ConfigObj
library that stpipe uses.
The configspec
format defines the types of the parameters, as well as allowing
an optional tree structure.
The types of parameters are declared like this:
n_iterations = integer(1, 100) # The number of iterations to run
factor = float() # A multiplication factor
author = string() # The author of the file
Note that each parameter may have a comment. This comment is extracted and displayed in help messages and docstrings etc.
Parameters can be grouped into categories using ini-file-like syntax:
[red]
offset = float()
scale = float()
[green]
offset = float()
scale = float()
[blue]
offset = float()
scale = float()
Default values may be specified on any parameter using the default
keyword argument:
name = string(default="John Doe")
While the most commonly useful parts of the configspec format are discussed here, greater detail can be found in the configspec documentation.
Configspec types¶
The following is a list of the commonly useful configspec types.
integer
: matches integer values. Takes optionalmin
andmax
arguments:integer() integer(3, 9) # any value from 3 to 9 integer(min=0) # any positive value integer(max=9)
float
: matches float values Has the same parameters as the integer check.
boolean
: matches boolean values: True or False.
string
: matches any string. Takes optional keyword argsmin
andmax
to specify min and max length of string.
list
: matches any list. Takes optional keyword argsmin
, andmax
to specify min and max sizes of the list. The list checks always return a list.
force_list
: matches any list, but if a single value is passed in will coerce it into a list containing that value.
int_list
: Matches a list of integers. Takes the same arguments as list.
float_list
: Matches a list of floats. Takes the same arguments as list.
bool_list
: Matches a list of boolean values. Takes the same arguments as list.
string_list
: Matches a list of strings. Takes the same arguments as list.
option
: matches any from a list of options. You specify this test with:option('option 1', 'option 2', 'option 3')Normally, steps will receive input files as parameters and return output files from their process methods. However, in cases where paths to files should be specified in the parameter file, there are some extra parameter types that stpipe provides that aren’t part of the core configobj library.
input_file
: Specifies an input file. Relative paths are resolved against the location of the parameter file. The file must also exist.
output_file
: Specifies an output file. Identical toinput_file
, except the file doesn’t have to already exist.
Interfacing with CRDS¶
If a Step uses CRDS to retrieve reference files, there are two things to do:
Within the
process
method, callself.get_reference_file
orself.get_reference_file_model
to get a reference file from CRDS. These methods take as input a) a model for the input file, whose metadata is used to do a CRDS bestref lookup, and b) a reference file type, which is just a string to identify the kind of reference file.Declare the reference file types used by the Step in the
reference_file_types
member. This information is used by the stpipe framework for two purposes: a) to pre-cache the reference files needed by a Pipeline before any of the pipeline processing actually runs, and b) to add override parameters to the Step’s configspec.
For each reference file type that the Step declares, an override_*
parameter
is added to the Step’s configspec. For example, if a step declares the
following:
reference_file_types = ['flat_field']
then the user can override the flat field reference file using the parameter file:
override_flat_field = /path/to/my_reference_file.fits
or at the command line:
--override_flat_field=/path/to/my_reference_file.fits
Making a simple commandline script for a step¶
Any step can be run from the commandline using Running a Step from the commandline. However,
to make a step even easier to run from the commandline, a custom
script can be created. stpipe provides a function
stpipe.cmdline.step_script
to make those scripts easier to write.
For example, to make a script for the step mypackage.ExampleStep
:
#!/usr/bin/python
# ExampleStep
# Import the custom step
from mypackage import ExampleStep
# Import stpipe.cmdline
from jwst.stpipe import cmdline
if __name__ == '__main__':
# Pass the step class to cmdline.step_script
cmdline.step_script(ExampleStep)
Running this script is similar to invoking the step with Running a Step from the commandline,
with one difference. Since the Step class is known (it is hard-coded
in the script), it does not need to be specified on the commandline.
To specify a config file on the commandline, use the --config-file
option.
For example:
> ExampleStep
> ExampleStep --config-file=example_step.asdf
> ExampleStep --parameter1=42.0 input_file.fits