Reference Files, Parameter Files and CRDS

The JWST pipeline uses version-controlled reference files and parameter files to supply pipeline steps with necessary data and to set pipeline and step parameters, respectively. Both kinds of file use the ASDF format and are managed by the Calibration References Data System (CRDS).

Reference Files

Most pipeline steps rely on reference files that contain calibration data or other information needed to process the data. The reference files are instrument-specific and are updated periodically as data processing evolves and understanding of the instruments improves. They are created, tested, and validated by the JWST Instrument Teams, who ensure that all files are in the correct format and contain all required header keywords. The files are then delivered to the Reference Data for Calibration and Tools (ReDCaT) Management Team. At the end of this process the files are ingested into the JWST Calibration References Data System (CRDS) and made available to users, the pipeline team, and any other ground subsystem that needs access to them.

Information about all the reference files used by the Calibration Pipeline can be found at Reference File Information, as well as in the documentation for each Calibration Step that uses a reference file. The table at Reference File Types lists the reference file types and the calibration steps that use them.

Parameter Files

Parameter files, which like reference files are encoded in ASDF and version-controlled by CRDS, define the ‘best’ set of parameters for pipeline steps as determined by the JWST instrument teams, based on instrument, observing mode, filter, etc. They may also evolve over time as understanding of calibration improves.

By default, when the pipeline is run via strun or through the pipeline/step.call() method of the Python interface, the appropriate parameter file is determined and retrieved by CRDS and used to set step parameters.
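As a minimal sketch of the Python interface mentioned above, the function below wraps a single call() invocation. It assumes the jwst package is installed and uses Detector1Pipeline only as an illustrative choice of pipeline. The wrapper name run_detector1 and its argument name are hypothetical. The import is deferred into the function body so that any CRDS environment variables can be set before jwst is loaded.

```python
def run_detector1(uncal_path):
    """Run the stage 1 detector pipeline on one uncalibrated exposure.

    Because call() is used, the parameter file selected by CRDS is
    retrieved and applied before the steps run.
    """
    # Deferred import: the CRDS environment variables must already be set.
    from jwst.pipeline import Detector1Pipeline

    return Detector1Pipeline.call(uncal_path)
```

The function would be called with the path to an uncalibrated exposure on disk. Defining it triggers no download and no import of jwst.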

CRDS

The Calibration References Data System (CRDS) manages the reference files that the pipeline uses. For the JWST pipeline, CRDS manages both data reference files and parameter reference files, which contain step parameters.

CRDS consists of external servers that hold all available reference files, and the machinery to map the correct reference files to datasets and download them to a local cache directory.

When the pipeline is run, CRDS uses the metadata in the input file to determine the correct reference files for that dataset. Any files that have not already been downloaded are fetched to the local cache directory, so that they are available on your filesystem for the pipeline to use.

Note

The environment variable CRDS_SERVER_URL, and outside the STScI network also CRDS_PATH, must be set before running the pipeline (see CRDS Servers below).

Reference Files Mappings (CRDS Context)

One of the main functions of CRDS is to associate a dataset with its best reference files. This mapping is referred to as the ‘CRDS context’ and is defined in a .pmap file. The .pmap file is itself version-controlled, which makes it possible to access the reference file mapping as it stood at any point in time and to revert to any previous set of reference files if desired.

By default, the CRDS context is set to give the ‘best’ reference files associated with a given pipeline version. To use a CRDS context other than the one automatically associated with a given pipeline version (see https://jwst-docs.stsci.edu/jwst-science-calibration-pipeline/crds-migration-to-quarterly-calibration-updates), set the environment variable CRDS_CONTEXT, e.g.:

export CRDS_CONTEXT='jwst_1293.pmap'
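The same setting can be made from Python. The sketch below adds a basic sanity check on the context name before storing it. The helper set_crds_context is hypothetical, and the pattern it enforces is an assumption inferred from the example name 'jwst_1293.pmap'. It covers only numbered contexts of that form and would reject any other naming scheme the server may accept.

```python
import os
import re

# Assumed pattern, inferred from the example name 'jwst_1293.pmap'.
_PMAP_PATTERN = re.compile(r"jwst_\d+\.pmap")


def set_crds_context(context, environ=os.environ):
    """Store CRDS_CONTEXT after checking that the name looks like a .pmap."""
    if not _PMAP_PATTERN.fullmatch(context):
        raise ValueError(f"unexpected CRDS context name: {context!r}")
    environ["CRDS_CONTEXT"] = context
    return context


# Demonstrate on a plain dict so the real environment is left untouched.
demo_env = {}
set_crds_context("jwst_1293.pmap", demo_env)
print(demo_env)  # {'CRDS_CONTEXT': 'jwst_1293.pmap'}
```

When the second argument is omitted the helper writes to os.environ. As with the other CRDS variables, this should happen before anything is imported from jwst or crds.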

For all information about CRDS, including context lists, see the JWST CRDS website: https://jwst-crds.stsci.edu/

CRDS Servers

The CRDS server [1] can be found at https://jwst-crds.stsci.edu

To run the pipeline inside the STScI network, CRDS must be configured to find the CRDS server by setting the environment variable:

export CRDS_SERVER_URL=https://jwst-crds.stsci.edu

This server will be used to determine the appropriate CRDS context for a given pipeline version, and the pipeline will obtain individual reference files within this context from a local shared disk.

To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:

export CRDS_PATH=$HOME/crds_cache/
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu

This server will be used to determine the appropriate CRDS context for a given pipeline version, and the pipeline will automatically download individual reference files within this context to the local cache specified by CRDS_PATH.
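The two configurations described above differ only in whether CRDS_PATH is required. A script can check this before starting a run, as in the sketch below. The helper missing_crds_variables and its inside_stsci flag are hypothetical names used for illustration and are not part of jwst or crds.

```python
import os


def missing_crds_variables(environ=os.environ, inside_stsci=False):
    """Return the CRDS variables that are required but not set."""
    required = ["CRDS_SERVER_URL"]
    if not inside_stsci:
        # Outside the STScI network a local cache location is also needed.
        required.append("CRDS_PATH")
    return [name for name in required if not environ.get(name)]


# A plain dict stands in for the environment in this demonstration.
env = {"CRDS_SERVER_URL": "https://jwst-crds.stsci.edu"}
print(missing_crds_variables(env, inside_stsci=True))   # []
print(missing_crds_variables(env, inside_stsci=False))  # ['CRDS_PATH']
```

An empty list means the configuration matches what this section asks for. The check does not verify that the server is reachable or that the cache directory is writable.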

CRDS Cache Configuration for Developers

For most pipeline users, the above settings will suffice for establishing a consistent local cache. For pipeline developers or testers, however, it is important to be aware that if you need to switch between CRDS servers (e.g. the ops and test servers), you will need to establish a separate cache for each server. Using the same cache for more than one server will lead to a corrupted local cache.

For example, the recommended configuration for developers while using the ops server is:

export CRDS_PATH=$HOME/crds_cache/jwst_ops
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu

and while using the test server:

export CRDS_PATH=$HOME/crds_cache/jwst_test
export CRDS_SERVER_URL=https://jwst-crds-test.stsci.edu

If your cache does become corrupted, the best way to fix it is simply to remove the local cache and allow subsequent pipeline runs to repopulate it as needed. For example:

rm -r "$CRDS_PATH"
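The same cleanup can be done from Python with a guard against an unset or empty CRDS_PATH. The sketch below is one possible approach. The helper clear_crds_cache is hypothetical, and the demonstration removes a throwaway temporary directory rather than a real cache.

```python
import os
import shutil
import tempfile
from pathlib import Path


def clear_crds_cache(environ=os.environ):
    """Delete the directory named by CRDS_PATH; return True if removed."""
    cache = environ.get("CRDS_PATH", "").strip()
    if not cache:
        # Refuse to guess a location when the variable is unset or empty.
        raise RuntimeError("CRDS_PATH is not set; nothing was removed")
    path = Path(cache).expanduser()
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demonstration on a throwaway directory standing in for a cache.
demo_dir = tempfile.mkdtemp()
print(clear_crds_cache({"CRDS_PATH": demo_dir}))  # True
```

After the directory is removed, later pipeline runs repopulate the cache as needed.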

For more information on CRDS configuration, see the CRDS user guide posted to the JWST CRDS server.

Setting CRDS Environment Variables in Python

The CRDS environment variables need to be defined before importing anything from jwst or crds. The examples above show how to set an environment variable in the shell, but this can also be done within a Python session by using os.environ. In general, scripts should assume that the environment variables have already been set before the script runs. If the CRDS environment variables do need to be defined within a script, the following code snippet is the suggested method. These lines should be the first executable lines:

import os
os.environ['CRDS_PATH'] = 'path_to_local_cache'
os.environ['CRDS_SERVER_URL'] = 'url-of-server-to-use'

# Now import anything else needed
import jwst
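The ordering requirement above can be checked rather than assumed. The sketch below sets the two variables only if neither jwst nor crds has been imported yet, using sys.modules to detect earlier imports. The function configure_crds is hypothetical and is not part of either package.

```python
import os
import sys


def configure_crds(cache_path, server_url):
    """Set the CRDS variables, refusing if jwst or crds is already loaded."""
    already = [name for name in ("jwst", "crds") if name in sys.modules]
    if already:
        raise RuntimeError(
            "set CRDS variables before importing: " + ", ".join(already)
        )
    os.environ["CRDS_PATH"] = cache_path
    os.environ["CRDS_SERVER_URL"] = server_url
```

Calling this at the top of a script, before any jwst import, turns a misordered import into an immediate and explicit error.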