Steps

Configuring a Step

This section describes how to instantiate a Step and set configuration parameters on it.

Steps can be configured by either:

  • Writing a parameter file

  • Instantiating the Step directly from Python

Running a Step from a parameter file

A parameter file contains one or more of a Step’s parameters. Any parameter not specified in the file will take its value from the CRDS-retrieved parameter reference file or the defaults coded directly into the Step. Note that any parameter specified on the command line overrides all other values.

The preferred format of parameter files is the ASDF Parameter Files format. Refer to the minimal example for a complete description of the contents. The rest of this document will focus on the step parameters themselves.

Every parameter file must contain the key class, followed by the optional key name, followed by any parameters that are specific to the step being run.

class specifies the Python class to run. It should be a fully-qualified Python path to the class. Step classes can ship with stpipe itself, be part of other Python packages, or exist in freestanding modules alongside the configuration file. For example, to use the SystemCall step included with stpipe, set class to stpipe.subprocess.SystemCall. To use a class called Custom defined in a file mysteps.py in the same directory as the configuration file, set class to mysteps.Custom.
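As a rough illustration of what a fully-qualified Python path means, the sketch below resolves a dotted string to a class using only the standard library. The helper name resolve_class is hypothetical and this is not stpipe's own loader. It only shows the idea that everything before the last dot names a module and the last component names the class. A standard-library class stands in for a Step class so the snippet runs without stpipe installed.

```python
import importlib


def resolve_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object (illustrative helper)."""
    # Split on the last dot: module path on the left, class name on the right.
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"{dotted_path!r} is not a fully-qualified path")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A standard-library class stands in for a Step class here.
ordered_dict_cls = resolve_class("collections.OrderedDict")
print(ordered_dict_cls)
```

A path such as mysteps.Custom would be resolved the same way, provided the directory containing mysteps.py is importable.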

name defines the name of the step. This is distinct from the class of the step, since the same class of Step may be configured in different ways, and it is useful to have a way of distinguishing between them. For example, when Steps are combined into Pipelines, a Pipeline may use the same Step class multiple times, each with different configuration parameters.

The parameters specific to the Step all reside under the key parameters. The set of accepted parameters is defined in the Step’s spec member. You can print out a Step’s configspec using the stspec commandline utility. For example, to print the configspec for an imaginary step called stpipe.cleanup:

$ stspec stpipe.cleanup
# The threshold below which to apply cleanup
threshold = float()

# A scale factor
scale = float()

# The output file to save to
output_file = output_file(default = None)

Note

Configspec information can also be displayed from Python by calling print_configspec on any Step class.

>>> from jwst.stpipe import cleanup
>>> cleanup.print_configspec()
# The threshold below which to apply cleanup
threshold = float()
# A scale factor
scale = float()

Using this information, one can write a parameter file to use this step. For example, here is a parameter file (do_cleanup.asdf) that runs the stpipe.cleanup step to clean up an image.

#ASDF 1.0.0
#ASDF_STANDARD 1.3.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
class: stpipe.cleanup
name: MyCleanup
parameters:
  threshold: 42.0
  scale: 0.01
...

Running a Step from the commandline

The strun command can be used to run Steps from the commandline.

The first argument may be either:

  • The path to a parameter file

  • A Python class

Additional parameters may be passed on the commandline. These parameters override any that are present in the parameter file. Any extra positional parameters on the commandline are passed to the step’s process method. This will often be input filenames.

For example, to use an existing parameter file from above, but override it so the threshold parameter is different:

$ strun do_cleanup.asdf input.fits --threshold=86

To display a list of the parameters that are accepted for a given Step class, pass the -h parameter, and the name of a Step class or parameter file:

$ strun -h do_cleanup.asdf
usage: strun [--logcfg LOGCFG] cfg_file_or_class [-h] [--pre_hooks]
             [--post_hooks] [--skip] [--scale] [--extname]

optional arguments:
  -h, --help       show this help message and exit
  --logcfg LOGCFG  The logging configuration file to load
  --verbose, -v    Turn on all logging messages
  --debug          When an exception occurs, invoke the Python debugger, pdb
  --pre_hooks
  --post_hooks
  --skip           Skip this step
  --scale          A scale factor
  --threshold      The threshold below which to apply cleanup
  --output_file    File to save the output to

Every step has an --output_file parameter. If one is not provided, the output filename is determined based on the input file by appending the name of the step. For example, in this case, foo.fits is output to foo_cleanup.fits.
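The default naming rule described above can be sketched in a few lines. This is only an illustration of the convention (input stem, underscore, step name, original extension). The function name default_output_name is hypothetical, and the real logic in stpipe may handle existing suffixes or other cases differently.

```python
from pathlib import Path


def default_output_name(input_path, step_name):
    """Append '_<step_name>' to the input file's stem, keeping its extension."""
    path = Path(input_path)
    return str(path.with_name(f"{path.stem}_{step_name}{path.suffix}"))


print(default_output_name("foo.fits", "cleanup"))  # foo_cleanup.fits
```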

Finally, the parameters a Step actually ran with can be saved to a new parameter file using the --save-parameters option. This file contains all of the step’s parameters along with the final values used.

Parameter Precedence

There are a number of places where the value of a parameter can be specified. The order of precedence, from most to least significant, for parameter value assignment is as follows:

  1. Value specified on the command-line: strun step.asdf --par=value_that_will_be_used

  2. Value found in the user-specified parameter file

  3. CRDS-retrieved parameter reference

  4. Step-coded default, determined by the parameter definition Step.spec
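The ordering above behaves like a layered lookup: each source is consulted in turn and the first one that defines a parameter wins. A minimal sketch using collections.ChainMap follows. The threshold and scale values for the command line and parameter file mirror the cleanup example. The CRDS and default values are invented for illustration, and this is not how stpipe itself merges parameters internally.

```python
from collections import ChainMap

# Hypothetical values for one step, one dict per source.
command_line = {"threshold": 86.0}
parameter_file = {"threshold": 42.0, "scale": 0.01}
crds_reference = {"scale": 0.5}
step_defaults = {"threshold": 1.0, "scale": 1.0, "output_file": None}

# ChainMap looks keys up left to right, so earlier maps take precedence.
resolved = dict(
    ChainMap(command_line, parameter_file, crds_reference, step_defaults)
)
print(resolved)
```

Here threshold comes from the command line, scale from the parameter file, and output_file falls through to the step-coded default.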

For pipelines, if a pipeline parameter file specifies a value for a step in the pipeline, that value takes precedence over any step-specific value found, whether from a step-specific parameter file or from a CRDS-retrieved step-specific parameter file. The full order of precedence for a pipeline and its sub steps is as follows:

  1. Value specified on the command-line: strun pipeline.asdf --steps.step.par=value_that_will_be_used

  2. Value found in the user-specified pipeline parameter file: strun pipeline.asdf

  3. Value found in the parameter file specified in a pipeline parameter file

  4. CRDS-retrieved parameter reference for the pipeline

  5. CRDS-retrieved parameter reference for each sub-step

  6. Pipeline-coded default for itself and all sub-steps

  7. Step-coded default for each sub-step
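The dotted form --steps.step.par=value addresses a parameter nested under a particular sub-step. One way to picture this is as a path into a nested dictionary. The sketch below assumes such a nested-dict representation purely for illustration. The helper apply_override and the step name cleanup are hypothetical, and strun's actual argument handling may differ.

```python
def apply_override(tree, dotted_key, value):
    """Set tree['a']['b']['c'] = value for dotted_key 'a.b.c', creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = tree
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return tree


# Hypothetical pipeline parameters as they might look after reading a parameter file.
pipeline_params = {"steps": {"cleanup": {"threshold": 42.0, "scale": 0.01}}}

# Equivalent in spirit to: strun pipeline.asdf --steps.cleanup.threshold=86
apply_override(pipeline_params, "steps.cleanup.threshold", 86.0)
print(pipeline_params["steps"]["cleanup"])
```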

Debugging

To display all logging output from the step, add the --verbose option to the commandline. (If more fine-grained control over logging is required, see Logging.)

To start the Python debugger if the step itself raises an exception, pass the --debug option to the commandline.

CRDS Retrieval of Step Parameters

In general, CRDS uses the input to a Step to determine which reference files to use. Nearly all JWST-related steps take only a single input file. However, that input file is often an association. Since step parameters are configured only once per execution of a step or pipeline, only the first qualifying member, usually of type science, is used.
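To make "first qualifying member" concrete, the sketch below walks a trimmed, made-up association and returns the first member of type science. The products/members layout and the expname and exptype keys follow the usual JWST association convention as an assumption. The function first_science_member is hypothetical and is not the lookup stpipe performs.

```python
def first_science_member(association):
    """Return the expname of the first member whose exptype is 'science', or None."""
    for product in association.get("products", []):
        for member in product.get("members", []):
            if member.get("exptype") == "science":
                return member.get("expname")
    return None


# Hypothetical, heavily trimmed association.
asn = {
    "products": [
        {
            "name": "example_product",
            "members": [
                {"expname": "background_01.fits", "exptype": "background"},
                {"expname": "science_01.fits", "exptype": "science"},
                {"expname": "science_02.fits", "exptype": "science"},
            ],
        }
    ]
}

print(first_science_member(asn))
```

Only science_01.fits would inform the parameter lookup in this picture, even though the association holds a second science member.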

Retrieval of Step parameters from CRDS can be completely disabled by using the --disable-crds-steppars command-line switch, or by setting the environment variable STPIPE_DISABLE_CRDS_STEPPARS to true.
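When working from a Python session rather than a shell, the same environment variable can be set with the standard library. This is a small sketch. It assumes the variable is read when a step or pipeline runs, so it should be set beforehand in the same process.

```python
import os

# Set before running any step or pipeline in this process.
os.environ["STPIPE_DISABLE_CRDS_STEPPARS"] = "true"
```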

Running a Step in Python

Running a step can also be done inside the Python interpreter, and is as simple as calling its run() method or its call() classmethod.

run()

The run() method runs a previously instantiated step. This is useful when one wants to set up the step’s attributes first and then run it:

from jwst.flatfield import FlatFieldStep

mystep = FlatFieldStep()
mystep.override_sflat = 'sflat.fits'
output = mystep.run(input)

input in this case can be a FITS file containing the appropriate data, or the output of a previously run step or pipeline, which is an instance of a particular datamodel.

Unlike call(), run() does not apply a parameter file: any parameter file supplied when using run() is ignored.

Calling the instance directly is equivalent to using its run() method:

output = mystep(input)

call()

If one has all the parameters in a parameter file, or can pass the arguments directly to the step, one can use the call() method, which creates a new instance of the class every time it is called. So:

from jwst.flatfield import FlatFieldStep
output = FlatFieldStep.call(input)

makes a new instance of FlatFieldStep and then runs it. Because it is a new instance, it ignores any attributes of mystep that one may have set earlier, such as overriding the sflat.

The nice thing about call() is that it can take a parameter file, so:

output = FlatFieldStep.call(input, config_file='my_flatfield.asdf')

and it will take all the parameters from the config file.

Step parameters may be passed to the step by setting the config_file kwarg in call() (which takes a path to a parameter file) or as keyword arguments. Any remaining positional arguments are passed along to the step’s process() method:

from jwst.stpipe import cleanup

cleanup.call('image.fits', config_file='do_cleanup.asdf', threshold=42.0)

So use call() if you’re passing a config file or passing along args or kwargs. Otherwise use run().
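The difference between the two entry points can be seen with a toy class that needs no JWST software installed. ToyStep below is a hypothetical stand-in, not the stpipe Step implementation. It only mimics the pattern described here: run() is an instance method that uses attributes already set on the object, while call() is a classmethod that builds a fresh instance from its keyword arguments.

```python
class ToyStep:
    """Toy stand-in for a Step; not the stpipe implementation."""

    scale = 1.0  # class-level default

    def __init__(self, **params):
        for key, value in params.items():
            setattr(self, key, value)

    def process(self, value):
        return value * self.scale

    def run(self, *args):
        # Instance method: uses whatever attributes this instance carries.
        return self.process(*args)

    @classmethod
    def call(cls, *args, **params):
        # Class method: builds a fresh instance from the keyword arguments.
        return cls(**params).run(*args)


mystep = ToyStep()
mystep.scale = 3.0

print(mystep.run(2.0))               # uses the modified instance
print(ToyStep.call(2.0))             # fresh instance, class default
print(ToyStep.call(2.0, scale=5.0))  # fresh instance, keyword override
```

In this toy, even mystep.call(2.0) ignores the scale set on mystep, because a classmethod receives the class rather than the instance.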