Reference Files, Parameter Files and CRDS
The JWST pipeline uses version-controlled reference files and parameter files to supply pipeline steps with necessary data and to set pipeline/step parameters, respectively. Both kinds of file use the ASDF format and are managed by the Calibration Reference Data System (CRDS).
Reference Files
Most pipeline steps rely on reference files that contain calibration data or other information needed to process the data. The reference files are instrument-specific and are updated periodically as the data processing evolves and the understanding of the instruments improves. They are created, tested, and validated by the JWST Instrument Teams, who ensure that all files are in the correct format and have all required header keywords. The files are then delivered to the Reference Data for Calibration and Tools (ReDCaT) Management Team, ingested into the JWST Calibration Reference Data System (CRDS), and made available to users, the pipeline team, and any other ground subsystem that needs access to them.
Information about all the reference files used by the Calibration Pipeline can be found at Reference File Information, as well as in the documentation for each Calibration Step that uses a reference file. Reference file types and their correspondence to calibration steps are listed in the table at Reference File Types.
Parameter Files
Parameter files, which like reference files are encoded in ASDF and version-controlled by CRDS, define the ‘best’ set of parameters for pipeline steps as determined by the JWST instrument teams, based on instrument, observing mode, filter, etc. They may also evolve over time as understanding of calibration improves.
By default, when running the pipeline via strun, or via the pipeline/step.call() method
in the Python interface, the appropriate parameter file is determined and retrieved by
CRDS to set step parameters.
CRDS
The Calibration Reference Data System (CRDS) manages the reference files that the pipeline uses. For the JWST pipeline, CRDS manages both data reference files and parameter reference files, which contain step parameters.
CRDS consists of external servers that hold all available reference files, and the machinery to map the correct reference files to datasets and download them to a local cache directory.
When the pipeline is run, CRDS uses the metadata in the input file to determine the correct reference files for that dataset. Any of those files not already present in the local cache directory are downloaded to it, so that they are available on your filesystem for the pipeline to use.
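Outside the STScI network, the environment variables `CRDS_PATH` and `CRDS_SERVER_URL` must be set before running the pipeline. `CRDS_CONTEXT` is optional and selects a non-default reference file mapping.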
Reference Files Mappings (CRDS Context)
One of the main functions of CRDS is to associate a dataset with its best
reference files. This mapping is referred to as the ‘CRDS context’ and is
defined in a pmap file, which itself is version-controlled to allow access to
the reference file mapping at any point in time, and to revert to any previous
set of reference files if desired.
By default, the CRDS context gives access to the most recent reference file
deliveries and selection rules, i.e. the ‘best’, most up-to-date set of
reference files. On occasion it might be necessary or desirable to use one of
the non-default mappings in order to, for example, run different versions of
the pipeline software or use older versions of the reference files. This can be
accomplished by setting the environment variable CRDS_CONTEXT to the desired
project mapping version, e.g.
$ export CRDS_CONTEXT='jwst_0421.pmap'
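The same selection can be made from Python, provided it happens before jwst or crds is imported. The following is a minimal sketch rather than part of the pipeline: set_crds_context is a hypothetical helper, and the name pattern it accepts (jwst_ followed by digits and .pmap) is an assumption inferred from the example above, so it deliberately rejects anything else.

```python
import os
import re

# Assumed naming pattern, inferred from the example 'jwst_0421.pmap'.
_PMAP_PATTERN = re.compile(r"jwst_\d+\.pmap")


def set_crds_context(context, environ=os.environ):
    """Hypothetical helper: validate a context name, then export CRDS_CONTEXT."""
    if not _PMAP_PATTERN.fullmatch(context):
        raise ValueError(f"unexpected CRDS context name: {context!r}")
    environ["CRDS_CONTEXT"] = context
    return context


# Run this before importing jwst or crds.
set_crds_context("jwst_0421.pmap")
```

Validating the name first means a typo fails at the point where the variable is set instead of later, during a pipeline run.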
For all information about CRDS, including context lists, see the JWST CRDS website: https://jwst-crds.stsci.edu
CRDS Servers
The CRDS server can be found at
https://jwst-crds.stsci.edu
Inside the STScI network, the pipeline defaults are sufficient and no further action is necessary.
To run the pipeline outside the STScI network, CRDS must be configured by setting two environment variables:
CRDS_PATH: Local folder where CRDS content will be cached.
CRDS_SERVER_URL: The server from which to pull reference information.
To use the server, apply the following settings:
export CRDS_PATH=$HOME/crds_cache/
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu
CRDS Cache Configuration for Developers
For most pipeline users, the above settings will suffice for establishing a consistent
local cache. Pipeline developers and testers, however, should be aware that switching
between CRDS servers (e.g. the ops and test servers) requires a separate cache for each
server. Using the same cache for more than one server will lead to a corrupted local cache.
For example, the recommended configuration for developers while using the ops server is:
export CRDS_PATH=$HOME/crds_cache/jwst_ops
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu
and while using the test server:
export CRDS_PATH=$HOME/crds_cache/jwst_test
export CRDS_SERVER_URL=https://jwst-test-crds.stsci.edu
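One way to keep the two configurations from being mixed up is to derive both variables from a single server name. The sketch below assumes the cache layout shown above (a jwst_ops and a jwst_test directory under a common root); crds_settings and CRDS_SERVERS are hypothetical names, not part of jwst or crds.

```python
import os
from pathlib import Path

# Server URLs taken from the settings above.
CRDS_SERVERS = {
    "ops": "https://jwst-crds.stsci.edu",
    "test": "https://jwst-test-crds.stsci.edu",
}


def crds_settings(server, cache_root=None):
    """Hypothetical helper: return CRDS_PATH and CRDS_SERVER_URL for one server."""
    if server not in CRDS_SERVERS:
        raise KeyError(f"unknown CRDS server: {server!r}")
    if cache_root is None:
        cache_root = os.path.expanduser("~/crds_cache")
    return {
        # One cache directory per server, so the caches never overlap.
        "CRDS_PATH": str(Path(cache_root) / f"jwst_{server}"),
        "CRDS_SERVER_URL": CRDS_SERVERS[server],
    }


# Apply the settings before importing jwst or crds.
os.environ.update(crds_settings("ops"))
```

Because the cache path is computed from the server name, switching servers always switches caches as well.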
If your cache does become corrupted, the best way to fix it is simply to remove the local cache and allow subsequent pipeline runs to repopulate it as needed. For example:
rm -r $CRDS_PATH
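The removal can also be done from Python with a few safety checks. This is only a sketch under stated assumptions: clear_crds_cache is a hypothetical helper, and the guards against an unset CRDS_PATH or a path pointing at the home or root directory are precautions added here, not behaviour of CRDS itself.

```python
import os
import shutil
from pathlib import Path


def clear_crds_cache(environ=os.environ):
    """Hypothetical helper: delete the directory named by CRDS_PATH, if it exists."""
    raw = environ.get("CRDS_PATH", "").strip()
    if not raw:
        raise RuntimeError("CRDS_PATH is not set; refusing to delete anything")
    cache = Path(raw).expanduser().resolve()
    home = Path(os.path.expanduser("~")).resolve()
    # Guard against wiping the home or root directory by mistake.
    if cache in (home, Path(cache.anchor)):
        raise RuntimeError(f"refusing to delete {cache}")
    if cache.is_dir():
        shutil.rmtree(cache)
        return True   # cache removed
    return False      # nothing to remove
```

Subsequent pipeline runs then repopulate the cache as needed.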
For more information on CRDS configuration, see the CRDS user guide posted to the JWST CRDS server.
Setting CRDS Environment Variables in Python
The CRDS environment variables need to be defined before importing anything
from jwst or crds. The examples above show how to set an environment variable in
the shell, but this can also be done within a Python session by using os.environ.
In general, scripts should assume the environment variables have been set before they
are run. If the CRDS environment variables must be defined within a script,
the following code snippet is the suggested method. These lines should be the first
executable lines:
import os
os.environ['CRDS_PATH'] = 'path_to_local_cache'
os.environ['CRDS_SERVER_URL'] = 'url-of-server-to-use'
# Now import anything else needed
import jwst
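Because the ordering matters, a script can check that it is not already too late before setting the variables. The following sketch illustrates one way to do that; configure_crds is a hypothetical helper, and inspecting sys.modules for jwst and crds is an assumption about how to detect an earlier import, not a documented CRDS mechanism.

```python
import os
import sys


def configure_crds(cache_path, server_url, environ=os.environ, loaded=sys.modules):
    """Hypothetical helper: set the CRDS variables unless jwst/crds is already imported."""
    already = [name for name in ("jwst", "crds") if name in loaded]
    if already:
        raise RuntimeError(
            f"{', '.join(already)} already imported; set the CRDS variables first"
        )
    environ["CRDS_PATH"] = cache_path
    environ["CRDS_SERVER_URL"] = server_url
```

Calling configure_crds('path_to_local_cache', 'url-of-server-to-use') as the first statement of a script, followed by the jwst imports, reproduces the snippet above with an explicit error if the order is ever reversed.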