Default Outlier Detection Algorithm

This module serves as the interface for applying outlier_detection to direct image observations, like those taken with MIRI, NIRCam and NIRISS. The code implements the basic outlier detection algorithm used with HST data, as adapted to JWST.

Specifically, this routine performs the following operations:

  1. Extract parameter settings from input model and merge them with any user-provided values. See outlier detection arguments for the full list of parameters.

  2. Convert input data, as needed, to make sure it is in a format that can be processed.

    • A ModelContainer serves as the basic format for all processing performed by this step, as each entry will be treated as an element of a stack of images to be processed to identify bad-pixels/cosmic-rays and other artifacts.

    • If the input data is a CubeModel, convert it into a ModelContainer. This allows each plane of the cube to be treated as a separate 2D image for resampling (if done) and for combining into a median image.

  3. By default, resample all input images.

    • The resampling step starts by computing an output WCS that is large enoug to encompass all the input images.

    • All images from the same exposure will get resampled onto this output WCS to create a mosaic of all the chips for that exposure. This product is referred to as a “grouped mosaic” since it groups all the chips from the same exposure into a single image.

    • Each dither position will result in a separate grouped mosaic, so only a single exposure ever contributes to each pixel in these mosaics.

    • An explanation of how all NIRCam multiple detector group mosaics are defined from a single exposure or from a dithered set of exposures can be found here.

    • The fillval parameter specifies what value to use in the ouptut resampled image for any pixel which has no valid contribution from any input exposure. The default value of INDEF indicates that the value from the last exposure will be used, while a value of 0 would result in holes.

    • The resampling can be controlled with the pixfrac, kernel and weight_type parameters.

    • The pixfrac indicates the fraction by which input pixels are “shrunk” before being drizzled onto the output image grid, given as a real number between 0 and 1. This specifies the size of the footprint, or “dropsize”, of a pixel in units of the input pixel size.

    • The kernel specifies the form of the kernel function used to distribute flux onto the separate output images.

    • The weight_type indicates the type of weighting image to apply with the bad pixel mask. Available options are ivm (default) for computing and using an inverse-variance map and exptime for weighting by the exposure time.

    • The good_bits parameter specifies what DQ values from the input exposure should be used when resampling to create the output mosaic. Any pixel with a DQ value not included in this value (or list of values) will be ignored when resampling.

    • Resampled images will be written out to disk as _outlier_i2d.fits by default.

    • If resampling is turned off through the use of the resample_data parameter, a copy of the unrectified input images (as a ModelContainer) will be used for subsequent processing.

  4. Create a median image from all grouped observation mosaics.

    • The median image is created by combining all grouped mosaic images or non-resampled input data (as planes in a ModelContainer) pixel-by-pixel.

    • The nlow and nhigh parameters specify how many low and high values to ignore when computing the median for any given pixel.

    • The maskpt parameter sets the percentage of the weight image values to use, and any pixel with a weight below this value gets flagged as “bad” and ignored when resampled.

    • The median image is written out to disk as _<asn_id>_median.fits by default.

  5. By default, the median image is blotted back (inverse of resampling) to match each original input image.

    • Blotted images are written out to disk as _<asn_id>_blot.fits by default.

    • If resampling is turned off, the median image is compared directly to each input image.

  6. Perform statistical comparison between blotted image and original image to identify outliers.

    • This comparison uses the original input images, the blotted median image, and the derivative of the blotted image to create a cosmic ray mask for each input image.

    • The derivative of the blotted image gets created using the blotted median image to compute the absolute value of the difference between each pixel and its four surrounding neighbors with the largest value being the recorded derivative.

    • These derivative images are used to flag cosmic rays and other blemishes, such as satellite trails. Where the difference is larger than can be explained by noise statistics, the flattening effect of taking the median, or an error in the shift (the latter two effects are estimated using the image derivative), the suspect pixel is masked.

    • The backg parameter specifies a user-provided value to be used as the background estimate. This gets added to the background-subtracted blotted image to attempt to match the original background levels of the original input mosaic so that cosmic-rays (bad pixels) from the input mosaic can be identified more easily as outliers compared to the blotted mosaic.

    • Cosmic rays are flagged using the following rule:

      \[| image\_input - image\_blotted | > scale*image\_deriv + SNR*noise\]
    • The scale is defined as the multiplicative factor applied to the derivative which is used to determine if the difference between the data image and the blotted image is large enough to require masking.

    • The noise is calculated using a combination of the detector read noise and the poisson noise of the blotted median image plus the sky background.

    • The user must specify two cut-off signal-to-noise values using the snr parameter for determining whether a pixel should be masked: the first for detecting the primary cosmic ray, and the second for masking lower-level bad pixels adjacent to those found in the first pass. Since cosmic rays often extend across several pixels, the adjacent pixels make use of a slightly lower SNR threshold.

  7. Update input data model DQ arrays with mask of detected outliers.

Memory Model for Outlier Detection Algorithm

The outlier detection algorithm can end up using massive amounts of memory depending on the number of inputs, the size of each input, and the size of the final output product. Specifically,

  1. The input ModelContainer or CubeModel for IFU data, by default, all input exposures would have been kept open in memory to make processing more efficient.

  2. The initial resample step creates an output product for EACH input that is the same size as the final output product, which for imaging modes can span all chips in the detector while also accounting for all dithers. For some Level 3 products, each resampled image can be on the order of 2Gb or more.

  3. The median combination step then needs to have all pixels at the same position on the sky in memory in order to perform the median computation. The simplest implementation for this step requires keeping all resampled outputs fully in memory at the same time.

Many Level 3 products only include a modest number of input exposures that can be processed using less than 32Gb of memory at a time. However, there are a number of ways this memory limit can be exceeded. This has been addressed by implementing an overall memory model for the outlier detection that includes options to minimize the memory usage at the expense of file I/O. The control over this memory model happens with the use of the in_memory parameter. The full impact of this parameter during processing includes:

  1. The save_open parameter gets set to False when opening the input ModelContainer object. This forces all input models in the input ModelContainer or CubeModel to get written out to disk. The ModelContainer then uses the filename of the input model during subsequent processing.

  2. The in_memory parameter gets passed to the ResampleStep to set whether or not to keep the resampled images in memory or not. By default, the outlier detection processing sets this parameter to False so that each resampled image gets written out to disk.

  3. Computing the median image works section-by-section by only keeping 1Mb of each input in memory at a time. As a result, only the final output product array for the final median image along with a stack of 1Mb image sections are kept in memory.

  4. The final resampling step also avoids keeping all inputs in memory by only reading each input into memory 1 at a time as it gets resampled onto the final output product.

These changes result in a minimum amount of memory usage during processing at the obvious expense of reading and writing the products from disk.

Outlier Detection for TSO data

Time-series observations (TSO) result in input data stored as a 3D CubeModel where each plane in the cube represents a separate integration without changing the pointing. Normal imaging data benefit from combining all integrations into a single image. TSO data’s value, however, comes from looking for variations from one integration to the next. The outlier detection algorithm, therefore, gets run with a few variations to accomodate the nature of these 3D data.

  1. Input data is converted from a CubeModel (3D data array) to a ModelContainer

    • Each plane in the original input CubeModel gets copied to a separate model in the ModelContainer

  2. The median image is created without resampling the input data

    • All integrations are aligned already, so no resampling or shifting needs to be performed

  3. A matched median gets created by combining the single median frame with the noise model for each input integration.

  4. Perform statistical comparison between the matched median with each input integration.

  5. Update input data model DQ arrays with the mask of detected outliers.

Note

This same set of steps also gets used to perform outlier detection on coronographic data, because it too is processed as 3D (per-integration) cubes.

Outlier Detection for IFU data

Integral Field Unit (IFU) data is handled as 2D images, similar to direct imaging modes. The nature of the detection algorithm, however, is quite different and involves measuring the differences between neighboring pixels in the spatial (cross-dispersion) direction within the IFU slice images. See the IFU outlier detection documentation for all the details.

jwst.outlier_detection.outlier_detection Module

Primary code for performing outlier detection on JWST observations.

Functions

flag_cr(sci_image, blot_image[, snr, scale, ...])

Masks outliers in science image by updating DQ in-place

abs_deriv(array)

Take the absolute derivate of a numpy array.

Classes

OutlierDetection(input_models[, reffiles])

Main class for performing outlier detection.

Class Inheritance Diagram

Inheritance diagram of jwst.outlier_detection.outlier_detection.OutlierDetection