This document provides instructions on running the JWST Science Calibration Pipeline (referred to as “the pipeline”) and individual pipeline steps.
Multiple pipeline modules are used for different stages of processing and for different JWST observing modes. The modules are broken into 3 stages:
Stage 1: Detector-level corrections and ramp fitting for individual exposures
Stage 2: Instrument-mode calibrations for individual exposures
Stage 3: Combining data from multiple exposures within an observation
Stage 1 corrections are applied nearly universally for all instruments and modes. Stage 2 is divided into separate modules for imaging and spectroscopic modes. Stage 3 is divided into five separate modules for imaging, spectroscopic, coronagraphic, Aperture Masking Interferometry (AMI), and Time Series Observation (TSO) modes.
Details of all the pipeline modules can be found at Pipeline Modules. The remainder of this document discusses pipeline configuration files and gives examples of running pipelines as a whole or in individual steps.
Many pipeline steps rely on the use of reference files that contain different types of calibration data or information necessary for processing the data. The reference files are instrument-specific and are periodically updated as the data processing evolves and the understanding of the instruments improves. They are created, tested, and validated by the JWST Instrument Teams, who ensure that all the files are in the correct format and have all required header keywords. The files are then delivered to the Reference Data for Calibration and Tools (ReDCaT) Management Team. The result of this process is that the files are ingested into the JWST Calibration Reference Data System (CRDS) and made available to the pipeline team and any other ground subsystem that needs access to them.
Information about all the reference files used by the Calibration Pipeline can be found at Reference File Information, as well as in the documentation for each Calibration Step that uses a reference file.
CRDS reference file mappings are usually set by default to always give access
to the most recent reference file deliveries and selection rules. On
occasion it might be necessary or desirable to use one of the non-default
mappings in order to, for example, run different versions of the pipeline
software or use older versions of the reference files. This can be
accomplished by setting the environment variable
CRDS_CONTEXT to the
desired project mapping version, e.g.
$ export CRDS_CONTEXT='jwst_0421.pmap'
Within STScI, the current storage location for all JWST CRDS reference files is:
Each pipeline step records the reference file that it used in the value of
a header keyword in the output data file. The keyword names use the syntax
“R_<ref>”, where <ref> corresponds to a 6-character version of the reference
file type, such as “R_DARK” for the dark reference file or “R_LINEAR” for the linearity reference file.
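For example, you can check which reference files were applied to a given product by inspecting these keywords. A minimal sketch using astropy, assuming the product file exists and the keywords are recorded in its primary header (the file name shown is the rate product from the examples below):

from astropy.io import fits

# Read the primary header of a calibrated product and list every
# reference-file keyword (R_DARK, R_LINEAR, etc.) recorded by the pipeline.
header = fits.getheader('jw00017001001_01101_00001_nrca1_rate.fits')
for keyword, value in header.items():
    if keyword.startswith('R_'):
        print(f'{keyword} = {value}')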
Running From the Command Line¶
Individual steps and pipelines (consisting of a series of steps) can be run
from the command line using the strun command:
$ strun <class_name or configuration_file> <input_file>
The first argument to
strun must be either the python class name of the
step or pipeline to be run, or the name of a configuration (.asdf or .cfg) file for the
desired step or pipeline (see Configuration Files below for more details).
The second argument to
strun is the name of the input data file to be processed.
For example, running the full stage 1 pipeline or an individual step by referencing their class names is done as follows:
$ strun jwst.pipeline.Detector1Pipeline jw00017001001_01101_00001_nrca1_uncal.fits
$ strun jwst.dq_init.DQInitStep jw00017001001_01101_00001_nrca1_uncal.fits
When a pipeline or step is executed in this manner (i.e. by referencing the class name), it will be run using a CRDS-supplied configuration, if one exists, merged with the coded default values.
If you want to use non-default parameter values, you can specify them as keyword arguments on the command line or set them in the appropriate configuration file.
To specify parameter values for an individual step when running a pipeline
use the syntax --steps.<step_name>.<parameter>=<value>.
For example, to override the default selection of a dark current reference
file from CRDS when running a pipeline:
$ strun jwst.pipeline.Detector1Pipeline jw00017001001_01101_00001_nrca1_uncal.fits --steps.dark_current.override_dark='my_dark.fits'
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --steps.dark_current.override_dark='my_dark.fits'
You can get a list of all the available arguments for a given pipeline or step by using the ‘-h’ (help) argument to strun:
$ strun dq_init.cfg -h
$ strun jwst.pipeline.Detector1Pipeline -h
JWST automatic processing uses configuration files to determine the
pipeline/step parameters to use. To retrieve these files, use the
collect_pipeline_cfgs command. The general form of the command is:
$ collect_pipeline_cfgs <dir>
where <dir> is the destination directory. If the directory does not exist,
it will be created. For example, to place the configuration files in the current
working directory, use:
$ collect_pipeline_cfgs .
To use a configuration file, specify the desired file in place of the class name
specification in the
strun command. For example, to run the
Detector1Pipeline using the
calwebb_detector1.cfg configuration file, use
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits
These configuration files can be edited as needed, or created completely from scratch. For more information, see the Configuration Files section below.
strun produces the following exit status codes:
0: Successful completion of the step/pipeline
1: General error occurred
64: No science data found
The “No science data found” condition is returned by the
assign_wcs step of
the calwebb_spec2 pipeline when, after successfully determining the WCS
solution for a file, the WCS indicates that no science data will be found. This
condition most often occurs with NIRSpec’s NRS2 detector: there are certain
optical and MSA configurations in which dispersion will not cross to the NRS2
detector.
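If you invoke strun from a script, these exit status codes can be checked programmatically. A minimal sketch using Python's subprocess module (the pipeline and file names are the same ones used in the examples in this document):

import subprocess

# Run a pipeline with strun and map its exit status to the conditions above.
result = subprocess.run(['strun', 'calwebb_detector1.cfg',
                         'jw00017001001_01101_00001_nrca1_uncal.fits'])
if result.returncode == 0:
    print('Pipeline completed successfully')
elif result.returncode == 64:
    print('No science data found')
else:
    print('Pipeline failed with exit status', result.returncode)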
Running From Within Python¶
You can execute a pipeline or a step from within python by using the
call or run methods of the class, or by calling an instance of the pipeline or
step directly.
The call method creates a new instance of the class and runs the pipeline or
step. Optional parameter settings can be specified by supplying a configuration file,
or via keyword arguments. Examples are shown on the Execute via call() page.
from jwst.pipeline import Detector1Pipeline
result = Detector1Pipeline.call('jw00017001001_01101_00001_nrca1_uncal.fits')

from jwst.linearity import LinearityStep
result = LinearityStep.call('jw00001001001_01101_00001_mirimage_uncal.fits')
Another way to call the pipeline is by calling the instance of the pipeline directly. First create an instance, then set any desired parameter values, and finally execute it. In this case, do not instantiate the pipeline with a configuration file. Examples are shown on the Execute via run() page.
pipe = Detector1Pipeline()
pipe.jump.rejection_threshold = 5
result = pipe('jw00017001001_01101_00001_nrca1_uncal.fits')
A functionally identical way to execute the pipeline or step is to use the
run method. Examples are shown on the Execute via run() page.
pipe = Detector1Pipeline()
pipe.jump.rejection_threshold = 5
result = pipe.run('jw00017001001_01101_00001_nrca1_uncal.fits')
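The output-related parameters described in the sections below (output_dir, output_file, save_results) can also be set as attributes on the instance before calling run. A short sketch, assuming the attribute names mirror the command-line arguments shown later in this document:

from jwst.pipeline import Detector1Pipeline

pipe = Detector1Pipeline()
pipe.output_dir = 'calibrated'                     # directory for output products
pipe.save_results = True                           # save the end result to disk
pipe.dark_current.output_file = 'dark_sub.fits'    # also save this step's result
result = pipe.run('jw00017001001_01101_00001_nrca1_uncal.fits')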
For more details on the different ways to run a pipeline step, see the Configuring a Step page.
Input and Output File Conventions¶
There are two general types of input to any step or pipeline: reference files and data files. The reference files, unless explicitly overridden, are provided through CRDS.
Data files are the science input, such as exposure FITS files and association files. All files are assumed to be co-resident in the directory where the primary input file is located. This is particularly important for associations: JWST associations contain file names only. All files referred to by an association are expected to be located in the directory in which the association file is located.
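For example, before running a Stage 3 pipeline you can confirm that every member listed in an association is actually present alongside it. A minimal sketch, assuming a JSON association whose exposures are listed under products/members with an expname field (the file name my_asn.json is hypothetical):

import json
import os

asn_file = 'my_asn.json'   # hypothetical association file
asn_dir = os.path.dirname(os.path.abspath(asn_file))

with open(asn_file) as f:
    asn = json.load(f)

# Member file names are expected to live in the association's directory.
for product in asn['products']:
    for member in product['members']:
        path = os.path.join(asn_dir, member['expname'])
        if not os.path.exists(path):
            print('Missing member:', member['expname'])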
Output files will be created either in the current working directory, or where specified by the output_dir configuration parameter.
File names for the outputs from pipelines and steps come from three different sources:
The name of the input file
The product name defined in an association
As specified by the output_file argument
Regardless of the source, each pipeline/step uses the name as a “base name”, onto which several different suffixes are appended, which indicate the type of data in that particular file. A list of the main suffixes can be found below.
The pipelines do not manage versions. When re-running a pipeline, previous files will be overwritten.
Output File and Associations¶
Stage 2 pipelines can take an individual file or an association as input. Nearly all Stage 3 pipelines require an association as input. Normally, the output file name is based on each association’s “product name”, which defines the basename that will be used for output file naming.
Often, one may reprocess the same set of data multiple times, such as to change
reference files or parameter values.
When doing so, it is highly suggested to use
output_dir to place
the results in a different directory instead of using
output_file to rename the output files. Most pipelines and steps create a set of output files.
Separating runs by directory may be much easier to manage.
Individual Step Outputs¶
If individual steps are executed without an output file name specified via
the output_file argument, the stpipe infrastructure
automatically uses the input file name as the root of the output file name
and appends the name of the step as an additional suffix to the input file
name. If the input file name already has a known suffix, that suffix
will be replaced. For example:
$ strun dq_init.cfg jw00017001001_01101_00001_nrca1_uncal.fits
produces an output file named jw00017001001_01101_00001_nrca1_dq_init.fits.
See Pipeline/Step Suffix Definitions for a list of the more common suffixes used.
By default, all pipeline and step outputs will drop into the current
working directory, i.e., the directory in which the process is
running. To change this, use the
output_dir argument. For example, to
have all output from
calwebb_detector1, including any saved
intermediate steps, appear in the sub-directory calibrated, use:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --output_dir=calibrated
output_dir can be specified at the step level, overriding what was
specified for the pipeline. From the example above, to change the name
and location of the
dark_current step output, use the following:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --output_dir=calibrated --steps.dark_current.output_file='dark_sub.fits' --steps.dark_current.output_dir='dark_calibrated'
When running a pipeline, the
stpipe infrastructure automatically passes the
output data model from one step to the input of the next step, without
saving any intermediate results to disk. If you want to save the results from
individual steps, you have two options:
Specify the --save_results argument. This option will save the results of the step, using a filename created by the step.
Specify a file name using the --output_file argument.
This option will save the step results using the name specified.
For example, to save the result from the dark current step of
calwebb_detector1 in a file whose name is based on the string “intermediate”, use:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --steps.dark_current.output_file='intermediate'
A file named intermediate_dark_current.fits will then be created. Note that the
suffix of the step is always appended to any given name.
You can also specify a particular file name for saving the end result of
the entire pipeline using the
--output_file argument:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --output_file='stage1_processed'
In this situation, using the default configuration, three files are created, all sharing the basename “stage1_processed” with the appropriate product suffixes appended.
Override Reference File¶
For any step that uses a calibration reference file you always have the
option to override the automatic selection of a reference file from CRDS and
specify your own file to use. Arguments for this are of the form
--steps.<step_name>.override_<ref_type>, where
ref_type is the name of the reference file
type, such as dark or
linearity. When in doubt as to
the correct name, just use the
-h argument to
strun to show you the list
of available override arguments.
To override the use of the default linearity file selection, for example, you would use:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --steps.linearity.override_linearity='my_lin.fits'
Another argument available to all steps in a pipeline is skip. If
skip=True is set for any step, that step will be skipped, with the
output of the previous step being automatically passed directly to the input
of the step following the one that was skipped. For example, if you want to
skip the linearity correction step, edit the calwebb_detector1.cfg file to contain:
[steps]
  [[linearity]]
    skip = True
...
Alternatively you can specify the
skip argument on the command line:
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --steps.linearity.skip=True
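The equivalent when running from within Python is to set the step's skip parameter on the pipeline instance before executing it, consistent with the run() examples shown earlier:

from jwst.pipeline import Detector1Pipeline

pipe = Detector1Pipeline()
pipe.linearity.skip = True   # skip the linearity correction step
result = pipe.run('jw00017001001_01101_00001_nrca1_uncal.fits')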
The name of a file in which to save log information, as well as the desired level of logging messages, can be specified in an optional configuration file “stpipe-log.cfg”. This file must be in the same directory in which you run the pipeline in order for it to be used. If this file does not exist, the default logging mechanism is STDOUT, with a level of INFO. An example of the contents of the stpipe-log.cfg file is:
[*]
handler = file:pipeline.log
level = INFO
If there’s no
stpipe-log.cfg file in the working directory, which specifies
how to handle process log information, the default is to display log messages
to stdout. If you want log information saved to a file, you can specify the
name of a logging configuration file either on the command line or in the
pipeline cfg file.
$ strun calwebb_detector1.cfg jw00017001001_01101_00001_nrca1_uncal.fits --logcfg=pipeline-log.cfg
and the file pipeline-log.cfg contains:
[*]
handler = file:pipeline.log
level = INFO
In this example, log information is written to a file called pipeline.log. The
level argument in the log cfg file can be set to one of the standard
logging level designations of DEBUG, INFO, WARNING, ERROR, and
CRITICAL. Only messages at or above the specified level
will be displayed.
Note that using the default file name stpipe-log.cfg can lead to confusion, especially if it is
forgotten about. If one has not run a pipeline in a while, and then sees no
logging information, most likely it is because a stpipe-log.cfg file is
present. Consider using a different name and specifying it explicitly on the
command line.
Configuration files can be used to specify parameter values when running a pipeline or individual steps. For JWST, configuration files are retrieved from CRDS, just as with other reference files. If there is no match between a step, the input data, and CRDS, the coded defaults are used. These values can be overridden either by the command-line options, as previously described, or by a local configuration file. See Parameter Precedence for a full description of how a parameter gets its final value.
Step parameters from CRDS can be completely disabled by using the
--disable-crds-steppars command-line switch, or by setting the
STPIPE_DISABLE_CRDS_STEPPARS environment variable to true.
A configuration file should be used when there are parameters a user wishes to
change from the default/CRDS version for a custom run of the step. To create a
configuration file, add
--save-parameters <filename.asdf> to the command:
$ strun <step.class> <required-input-files> --save-parameters <filename.asdf>
For example, to save the parameters used for a run of the
calwebb_image2.cfg pipeline, use:
$ collect_pipeline_cfgs .
$ strun calwebb_image2.cfg jw82500001003_02101_00001_NRCALONG_rate.fits --save-parameters my_image2.asdf
Once saved, the file can be edited, removing parameters that should be left at their default/CRDS values, and setting the remaining parameters to the desired values. Once modified, the new configuration file can be used:
$ strun my_image2.asdf jw82500001003_02101_00001_NRCALONG_rate.fits
Note that the parameter values will reflect whatever was set on the command-line, through a specified local configuration file, and what was retrieved from CRDS. In short, the values will be those actually used in the running of the step.
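If you prefer to inspect a saved parameter file programmatically rather than opening it in an editor, the asdf package can be used. A minimal sketch, assuming the file written by --save-parameters keeps its settings under a top-level 'parameters' key (the layout used by current jwst parameter files):

import asdf

# Open the file saved with --save-parameters and list the stored settings.
with asdf.open('my_image2.asdf') as af:
    for name, value in af.tree.get('parameters', {}).items():
        print(f'{name} = {value}')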
For more information about and editing of configuration files, see ASDF Configuration Files. Note that the older Configuration (CFG) Files format is still an option, understanding that this format will be deprecated.
More information on configuration files can be found in the stpipe User’s
Guide at For Users.
There are many pre-defined pipeline modules for processing data from different instrument observing modes through each of the 3 stages of calibration. For all of the details see Pipeline Stages.
Pipeline/Step Suffix Definitions¶
However the output file name is determined (see above), the various stage 1, 2, and 3 pipeline modules will use that file name, along with a set of predetermined suffixes, to compose output file names. The output file name suffix will always replace any known suffix of the input file name. Each pipeline module uses the appropriate suffix for the product(s) it is creating. The list of suffixes is shown in the following table. Replacement occurs only if the suffix is one known to the calibration code. Otherwise, the new suffix will simply be appended to the basename of the file.
uncal: Uncalibrated raw input
ramp: Corrected ramp data
rate: Corrected countrate image
rateints: Corrected countrate per integration
fitopt: Optional fitting results from ramp_fit step
bsubints: Per integration background-subtracted image
calints: Calibrated per integration images
crfints: CR-flagged per integration images
i2d: Resampled 2D image
s2d: Resampled 2D spectrum
s3d: Resampled 3D IFU cube
x1d: 1D extracted spectrum
x1dints: 1D extracted spectra per integration
c1d: 1D combined spectrum
phot: Time Series photometric catalog
whtlt: Time Series white-light catalog
psfstack: Coronagraphic PSF image stack
psfalign: Coronagraphic PSF-aligned images
psfsub: Coronagraphic PSF-subtracted images
ami: AMI fringe and closure phases
amiavg: AMI averaged fringe and closure phases
aminorm: AMI normalized fringe and closure phases
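As a simplified illustration of the suffix-replacement behavior described above (not the actual pipeline code, and with the list of known suffixes abbreviated):

# Known product suffixes (abbreviated); see the table above for the full set.
KNOWN_SUFFIXES = {'uncal', 'ramp', 'rate', 'rateints', 'calints'}

def apply_suffix(filename, new_suffix):
    # Replace a recognized trailing suffix; otherwise append the new one.
    base, ext = filename.rsplit('.', 1)
    stem, _, last = base.rpartition('_')
    if last in KNOWN_SUFFIXES:
        base = stem
    return f'{base}_{new_suffix}.{ext}'

print(apply_suffix('jw00017001001_01101_00001_nrca1_uncal.fits', 'rate'))
# jw00017001001_01101_00001_nrca1_rate.fits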
For More Information¶
More information on logging and running pipelines can be found in the
stpipe User’s Guide at For Users.
More detailed information on writing pipelines can be found in the
stpipe Developer’s Guide at For Developers.