Reference Files, Parameter Files and CRDS

The JWST pipeline uses version-controlled reference files and parameter files to supply pipeline steps with necessary data and to set pipeline/step parameters, respectively. Both kinds of file use the ASDF format and are managed by the Calibration References Data System (CRDS).

Reference Files

Most pipeline steps rely on reference files that contain calibration data or other information necessary for processing the data. The reference files are instrument-specific and are periodically updated as the data processing evolves and the understanding of the instruments improves. They are created, tested, and validated by the JWST Instrument Teams, who ensure that all the files are in the correct format and have all required header keywords. The files are then delivered to the Reference Data for Calibration and Tools (ReDCaT) Management Team. Finally, the files are ingested into the JWST Calibration Reference Data System (CRDS) and made available to users, the pipeline team, and any other ground subsystem that needs access to them.

Information about all the reference files used by the Calibration Pipeline can be found at Reference File Information, as well as in the documentation for each Calibration Step that uses a reference file. The correspondence between reference file types and calibration steps is given in the table at Reference File Types.

Parameter Files

Parameter files, which like reference files are encoded in ASDF and version-controlled by CRDS, define the ‘best’ set of parameters for pipeline steps as determined by the JWST instrument teams, based on instrument, observing mode, filter, etc. They may also evolve over time as understanding of calibration improves.

By default, when the pipeline is run via strun or via the pipeline/step.call() method of the Python interface, CRDS determines and retrieves the appropriate parameter file and uses it to set the step parameters.
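In the Python interface, this default behaviour corresponds to the call() class method. The following is a minimal sketch rather than a recommended workflow: it assumes the jwst package is installed and CRDS is already configured, Detector1Pipeline is used only as a representative pipeline, and both the wrapper name run_with_crds_parameters and the input file name are placeholders.

```python
def run_with_crds_parameters(input_file):
    """Run a pipeline through call() so CRDS-supplied parameters are applied.

    Hypothetical wrapper, shown for illustration only.
    """
    # Imported inside the function so that the CRDS environment variables
    # can be set before jwst is imported for the first time.
    from jwst.pipeline import Detector1Pipeline

    # call() creates a new pipeline instance, lets CRDS select the
    # matching parameter file, and then processes the input.
    return Detector1Pipeline.call(input_file)


# Example use (placeholder file name):
# result = run_with_crds_parameters("example_uncal.fits")
```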

CRDS

The Calibration References Data System (CRDS) manages the reference files that the pipeline uses. For the JWST pipeline, CRDS manages both data reference files and parameter reference files, which contain step parameters.

CRDS consists of external servers that hold all available reference files, and the machinery to map the correct reference files to datasets and download them to a local cache directory.

When the pipeline is run, CRDS uses the metadata in the input file to determine the correct reference files for that dataset. Any of those files that have not already been downloaded are fetched into the local cache directory, so that they are available on your filesystem for the pipeline to use.

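The environment variables CRDS_PATH and CRDS_SERVER_URL must be set before running the pipeline outside the STScI network. CRDS_CONTEXT can optionally be set to select a specific reference file mapping.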

Reference Files Mappings (CRDS Context)

One of the main functions of CRDS is to associate a dataset with its best reference files. This mapping is referred to as the ‘CRDS context’ and is defined in a pmap file. The pmap file is itself version-controlled, which makes it possible to access the reference file mapping as it stood at any point in time and to revert to any previous set of reference files if desired.

By default, the CRDS context gives access to the most recent reference file deliveries and selection rules, i.e., the ‘best’, most up-to-date set of reference files. On occasion it might be necessary or desirable to use a non-default mapping, for example to run a different version of the pipeline software or to use older versions of the reference files. This can be done by setting the environment variable CRDS_CONTEXT to the desired project mapping version, for example:

$ export CRDS_CONTEXT='jwst_0421.pmap'
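When the context is chosen inside a script, it can help to check the name before storing it. The sketch below is only an illustration and is not part of the jwst or crds packages: the helper name set_crds_context is hypothetical, and the jwst_NNNN.pmap pattern is an assumption inferred from the example above. As with the other CRDS variables, it would need to run before jwst or crds is imported.

```python
import os
import re

# Assumed naming pattern, inferred from the example 'jwst_0421.pmap'.
_PMAP_PATTERN = re.compile(r"jwst_\d{4}\.pmap")


def set_crds_context(context, environ=os.environ):
    """Validate a context name and store it in CRDS_CONTEXT (hypothetical helper)."""
    if not _PMAP_PATTERN.fullmatch(context):
        raise ValueError(f"unexpected CRDS context name: {context!r}")
    environ["CRDS_CONTEXT"] = context
    return context
```

Passing a plain dictionary as environ makes it possible to try the helper without touching the real environment.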

For more information about CRDS, including context lists, see the JWST CRDS website: https://jwst-crds.stsci.edu

CRDS Servers

The CRDS server can be found at

https://jwst-crds.stsci.edu

Inside the STScI network, the pipeline defaults are sufficient and no further action is necessary.

To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:

  • CRDS_PATH: Local folder where CRDS content will be cached.

  • CRDS_SERVER_URL: The server from which to pull reference information.

To set up access to the server, use the following settings:

export CRDS_PATH=$HOME/crds_cache/
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu
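Before a long pipeline run it can be useful to confirm that both variables are actually present. The following is a minimal sketch using only the Python standard library; the function name missing_crds_vars is hypothetical and is not provided by jwst or crds.

```python
import os

# The two variables named in the list above.
REQUIRED_VARS = ("CRDS_PATH", "CRDS_SERVER_URL")


def missing_crds_vars(environ=os.environ):
    """Return the names of required CRDS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_crds_vars()
    if missing:
        print("Missing CRDS settings:", ", ".join(missing))
    else:
        print("CRDS_PATH and CRDS_SERVER_URL are both set.")
```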

CRDS Cache Configuration for Developers

For most pipeline users, the above settings will suffice for establishing a consistent local cache. For pipeline developers or testers, however, it is important to be aware that if you need to switch between CRDS servers (e.g. the ops and test servers), you will need to establish a separate cache for each server. Using the same cache for more than one server will lead to a corrupted local cache.

For example, the recommended configuration for developers while using the ops server is:

export CRDS_PATH=$HOME/crds_cache/jwst_ops
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu

and while using the test server:

export CRDS_PATH=$HOME/crds_cache/jwst_test
export CRDS_SERVER_URL=https://jwst-test-crds.stsci.edu
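One way to avoid mixing caches is to derive the cache directory from the server URL, so the two settings can never be changed independently. The sketch below assumes the server URLs and sub-directory names from the examples above; the helper name configure_crds is hypothetical, and it would need to be called before jwst or crds is imported.

```python
import os
from pathlib import Path

# Server URLs and cache sub-directory names taken from the examples above.
CACHE_SUBDIR_BY_SERVER = {
    "https://jwst-crds.stsci.edu": "jwst_ops",
    "https://jwst-test-crds.stsci.edu": "jwst_test",
}


def configure_crds(server_url, cache_root, environ=os.environ):
    """Point CRDS at one server and at that server's own cache directory."""
    try:
        subdir = CACHE_SUBDIR_BY_SERVER[server_url]
    except KeyError:
        raise ValueError(f"no cache directory defined for {server_url!r}") from None
    cache_path = Path(cache_root) / subdir
    environ["CRDS_SERVER_URL"] = server_url
    environ["CRDS_PATH"] = str(cache_path)
    return cache_path
```

Rejecting unknown servers outright is a deliberate choice in this sketch: falling back to a shared directory would reintroduce the problem described above.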

If your cache does become corrupted, the best way to fix it is simply to remove the local cache and allow subsequent pipeline runs to repopulate it as needed. For example:

rm -r "$CRDS_PATH"
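The same clean-up can be done from Python with a guard against an unset variable. This is a minimal sketch using only the standard library; the function name clear_crds_cache is hypothetical and is not part of jwst or crds.

```python
import os
import shutil
from pathlib import Path


def clear_crds_cache(environ=os.environ):
    """Delete the directory named by CRDS_PATH; return True if it was removed."""
    cache = environ.get("CRDS_PATH", "").strip()
    if not cache:
        # Refuse to guess a location when the variable is unset or empty.
        raise RuntimeError("CRDS_PATH is not set; refusing to delete anything")
    cache_dir = Path(cache).expanduser()
    if not cache_dir.is_dir():
        return False  # nothing to remove
    shutil.rmtree(cache_dir)
    return True
```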

For more information on CRDS configuration, see the CRDS user guide posted to the JWST CRDS server.

Setting CRDS Environment Variables in Python

The CRDS environment variables need to be defined before importing anything from jwst or crds. The examples above show how to set an environment variable in the shell, but this can also be done within a Python session by using os.environ. In general, scripts should assume that the environment variables have been set before the script is run. If the CRDS environment variables do need to be defined within a script, the following code snippet is the suggested method. These lines should be the first executable lines:

import os
os.environ['CRDS_PATH'] = 'path_to_local_cache'
os.environ['CRDS_SERVER_URL'] = 'url-of-server-to-use'

# Now import anything else needed
import jwst
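Because the ordering matters, a script can also check that neither package has been imported yet before it sets the variables. The sketch below is one possible guard, not an official recipe; the function name set_crds_env is hypothetical, and the check relies only on the standard sys.modules registry.

```python
import os
import sys


def set_crds_env(path, server_url):
    """Set the CRDS variables, refusing if jwst or crds was already imported."""
    already = [name for name in ("jwst", "crds") if name in sys.modules]
    if already:
        raise RuntimeError(
            "set CRDS variables before importing: " + ", ".join(already)
        )
    os.environ["CRDS_PATH"] = path
    os.environ["CRDS_SERVER_URL"] = server_url
```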