Introduction to the JWST Pipeline
Introduction
The JWST Science Calibration Pipeline processes data from all JWST instruments
and observing modes by applying various science corrections sequentially,
producing both fully-calibrated individual exposures and high-level data
products (mosaics, extracted spectra, etc.). The pipeline is written in Python,
hosted open-source on GitHub, and can be run either via the
command line interface (strun) or via the Python interface.
The full end-to-end ‘pipeline’ (from raw data to high-level data products) consists of three separate pipeline stages that are run individually to produce output products at different calibration levels:
- Stage 1: Detector-level corrections and ramp fitting for individual exposures.
- Stage 2: Instrument-mode calibrations for individual exposures.
- Stage 3: Combining data from multiple exposures within an observation.
As such, the term ‘pipeline’ may refer to a single pipeline stage or to the full three-stage series.
Because JWST has many different instruments and observing modes, there are several different pipeline modules available for each stage. There is a single pipeline for Stage 1, because its corrections are applied nearly universally across all instruments and modes. There are two pipeline modules for Stage 2: one for imaging and one for spectroscopic modes. Stage 3 is divided into five separate modules for imaging, spectroscopic, coronagraphic, Aperture Masking Interferometry (AMI), and Time Series Observation (TSO) modes. Details of all the available pipeline modules can be found at Pipeline Modules.
Each pipeline stage consists of a series of sequential steps (e.g., saturation correction, ramp fitting). Each full pipeline stage and every individual step has a unique module name (e.g., Detector1Pipeline or DarkCurrentStep). Steps can also be run individually on data to apply a single correction. The output of each pipeline stage is the input to the next, and within a pipeline stage the output of each step is the input to the next.
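The chaining described above can be sketched in a few lines of plain Python. This is only a toy model of the idea, not the pipeline's implementation: the two step functions, their numeric values, and run_pipeline are hypothetical stand-ins invented for illustration.

```python
from typing import Callable, List

# A "step" here is any callable that takes data and returns corrected data.
Step = Callable[[List[float]], List[float]]


def flag_saturation(ramp: List[float], limit: float = 60000.0) -> List[float]:
    """Hypothetical step: clip values above a made-up saturation limit."""
    return [min(value, limit) for value in ramp]


def subtract_dark(ramp: List[float], dark: float = 5.0) -> List[float]:
    """Hypothetical step: subtract a made-up constant dark level."""
    return [value - dark for value in ramp]


def run_pipeline(steps: List[Step], data: List[float]) -> List[float]:
    """Run steps in order, feeding each step's output to the next."""
    for step in steps:
        data = step(data)
    return data


result = run_pipeline([flag_saturation, subtract_dark], [100.0, 70000.0])
print(result)
```

Because each step has the same call signature as the whole chain, a single step can also be applied on its own, which mirrors the statement that steps can be run individually.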
The pipeline relies on three components to direct processing: input data,
step parameters, and reference files. The inputs to the pipeline modules are
individual exposures (fits files) or associations of multiple exposures
(asn.json files). The parameters for each pipeline step are determined
hierarchically from the parameter defaults, parameter reference files, and any
overrides specified at run time. Finally, reference files provide data for each
calibration step that is specific to the dataset being processed. These files
may depend on things like instrument, observing mode, and date. In both the
command line and Python interfaces, a pipeline or step module may be configured
before running. Reference files can be overridden from those chosen by CRDS
(the Calibration Reference Data System), steps in a pipeline can be skipped,
step parameters can be changed, and the final and intermediate output products
can be controlled.
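As a rough illustration of hierarchical parameter resolution, the sketch below layers three dictionaries so that run-time overrides take precedence over values from a parameter reference file, which in turn take precedence over defaults. The function resolve_parameters and all parameter values are assumptions made for this example; the real resolution logic lives in the pipeline's own machinery.

```python
from collections import ChainMap
from typing import Any, Dict, Optional


def resolve_parameters(
    defaults: Dict[str, Any],
    reference_file: Optional[Dict[str, Any]] = None,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge parameter sources; earlier mappings in the ChainMap win."""
    # Lookup order: run-time overrides, then reference file, then defaults.
    return dict(ChainMap(overrides or {}, reference_file or {}, defaults))


# Illustrative values only; they are not taken from any real reference file.
defaults = {"rejection_threshold": 4.0, "skip": False}
from_reference_file = {"rejection_threshold": 5.0}
at_run_time = {"skip": True}

params = resolve_parameters(defaults, from_reference_file, at_run_time)
print(params)
```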
A pipeline (or individual step) outputs corrected data either by writing an output
file on disk or by returning an in-memory datamodel object. The output file suffix
(e.g., cal.fits, rate.fits) depends on the level of calibration: each full pipeline
stage and each individual step has a unique file suffix, so that outputs
may be obtained at any level of calibration. Other pipeline outputs include
photometry catalogs and alignment catalogs (at Stage 3).
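The suffix convention can be pictured as swapping the last underscore-separated token of a file name. The helper below is a hypothetical sketch (replace_suffix is not a function from the jwst package, and the file name is a made-up placeholder); it only shows how a name ending in one suffix maps to the name for another calibration level.

```python
from pathlib import Path


def replace_suffix(filename: str, new_suffix: str) -> str:
    """Swap the trailing '_<suffix>' of a product name, keeping the extension."""
    path = Path(filename)
    # Split 'name_uncal' into ('name', '_', 'uncal'); only the root is reused.
    root, _, _old_suffix = path.stem.rpartition("_")
    if not root:
        raise ValueError(f"no '_<suffix>' found in {filename!r}")
    return str(path.with_name(f"{root}_{new_suffix}{path.suffix}"))


# Placeholder exposure name, used only to show the renaming.
rate_name = replace_suffix("example_exposure_uncal.fits", "rate")
cal_name = replace_suffix(rate_name, "cal")
print(rate_name, cal_name)
```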
Overview of Pipeline Code
The following is a brief overview of how the pipeline code in jwst is
organized.
Pipeline and Step Classes
The JWST pipeline is organized around two main kinds of classes: pipeline
classes and step classes. Pipelines are made up of sequential step classes
chained together, the output of one step being piped to the next, but both
pipelines and steps are represented as objects that can be configured and run
on input data.
Detector1Pipeline # an example of a pipeline class
DarkCurrentStep # an example of a step class
Each pipeline or step has a unique module name, which is the identifier used to invoke the correct pipeline/step when using either the Python or the Command Line Interface.
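As a minimal sketch of invoking a pipeline class from Python, the function below wraps a call to Detector1Pipeline and skips the dark current step. It assumes the jwst package is installed; the wrapper name run_detector1, the STEP_OVERRIDES dictionary, and the file names are placeholders chosen for this example, and the import is deferred so the snippet can be loaded without jwst present.

```python
# Roughly equivalent command line form (file name is a placeholder):
#   strun calwebb_detector1 example_uncal.fits --steps.dark_current.skip=True

# Per-step settings, keyed by step name within the pipeline.
STEP_OVERRIDES = {"dark_current": {"skip": True}}


def run_detector1(uncal_path: str, output_dir: str = "."):
    """Run Stage 1 on one exposure and return the in-memory result."""
    # Deferred import: requires the jwst package to be installed.
    from jwst.pipeline import Detector1Pipeline

    return Detector1Pipeline.call(
        uncal_path,
        steps=STEP_OVERRIDES,   # configure individual steps
        save_results=True,      # also write the output file to disk
        output_dir=output_dir,
    )
```

Calling run_detector1("example_uncal.fits") would then return the result object while also writing the Stage 1 output product to the chosen directory.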
Package Structure
Within the jwst repository, there are separate modules for each pipeline step.
There is also a pipeline module, where the pipeline classes, consisting of
step classes called in sequence, are defined.
jwst/
    assign_wcs/
        assign_wcs_step.py # contains AssignWcsStep
        ...
    dark_current/
        dark_current_step.py # contains DarkCurrentStep
        ...
    pipeline/
        calwebb_detector1.py # contains Detector1Pipeline
        calwebb_image2.py # contains Image2Pipeline
        ...
Dependencies
The jwst package has several dependencies (see the pyproject.toml file
in the top-level directory of jwst for a full list). Some notable
dependencies include:
asdf
ASDF, the Advanced Scientific Data Format, is the file format that JWST uses to encode world coordinate system (WCS) information.
gwcs
GWCS, the Generalized World Coordinate System, is a generalized alternative to FITS WCS that uses astropy models to describe the transformation between detector and sky coordinates. In JWST data, WCS information is encoded in an ASDF extension in the FITS file that contains a GWCS object. FITS WCS, in contrast, is limited because it stores the WCS transformation as header keywords, which is not sufficient to describe many of the transformations JWST uses.
stpipe
STPIPE contains base classes for pipeline and step, and command line tools
that are shared between the JWST and Nancy Grace Roman Space Telescope
(Roman) pipelines.
stcal
The stcal package contains step code that is common to both JWST and the Roman
telescope, to avoid redundancy. All step classes for the JWST
pipeline are still defined in jwst, but some of the underlying code for these
steps lives in stcal if the algorithm is shared by Roman (for example, ramp
fitting, saturation).
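To make the asdf and gwcs entries above more concrete, the sketch below shows one way the GWCS object stored in a calibrated product might be used to convert a detector pixel to world coordinates. It assumes that the stdatamodels package (installed alongside jwst) is available and that the file has already had its WCS assigned; the function name pixel_to_world and its arguments are hypothetical, and the import is deferred so the snippet loads without those packages.

```python
def pixel_to_world(cal_path: str, x: float, y: float):
    """Evaluate the GWCS transform of a calibrated product at pixel (x, y)."""
    # Deferred import: requires stdatamodels (a dependency of jwst).
    from stdatamodels.jwst import datamodels

    with datamodels.open(cal_path) as model:
        wcs = model.meta.wcs  # GWCS object read from the ASDF extension
        # Calling the WCS runs the full detector-to-world transform.
        # For imaging data this is expected to give (RA, Dec); other modes
        # may return additional axes such as wavelength.
        return wcs(x, y)
```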