3.2 Pipeline Processing Overview

The calibration pipeline, calcos, has been developed by STScI to support the calibration of HST/COS data. Although the COS pipeline benefits from the design heritage of previous HST instruments and of the Far Ultraviolet Spectroscopic Explorer (FUSE), the calcos modules are tailored specifically to the COS instrument and based on data reduction algorithms defined by the COS Investigation Definition Team (IDT) and the COS team at STScI. As with other HST pipelines, calcos uses an association table (the _asn files) to specify the data files to be included when calibrating, and employs header keywords to specify the calibration steps to be performed and the reference files to be used. Calcos is written in Python, an open-source, easy-to-read scripting language with many libraries for data reduction and analysis. Calcos can be found in the stenv Python distribution, which is available for download from STScI.

Calcos is designed with a common underlying structure for processing the FUV and NUV channels, which use a cross delay line (XDL) detector and a Multi-Anode Microchannel Array (MAMA) detector, respectively. The calcos calibration pipeline includes pulse-height filtering and geometric correction for the FUV channel, and flat-field, deadtime, and Doppler correction for both channels. It includes methods for obtaining an accurate wavelength calibration by using the onboard spectral line lamps. A background-subtracted spectrum is produced and the instrument sensitivity is applied to create the final flux-calibrated spectrum.

There are two basic types of raw data files: TIME-TAG photon lists and ACCUM images of the detector. Calcos must convert both into one-dimensional calibrated flux and wavelength arrays, so it must support different calibration processes for the different input types.

The level of calibration performed depends upon the data type.

  • Acquisition-mode exposures (ACQ/SEARCH, ACQ/PEAKXD, and ACQ/PEAKD) are not calibrated by calcos, with the exception of ACQ/IMAGE. Only the raw data from the uncalibrated modes are provided.
  • All other science data, including NUV imaging data (ACQ/IMAGE), are completely calibrated. This includes pulse-height filtering, geometric and thermal correction for the FUV data, flat fielding, and linearity corrections. The spectroscopic data are also flux calibrated and corrected for time dependence in the instrumental sensitivity. The data flow and calibration modules for processing the data are described in detail in Sections 3.3 and 3.4.

The treatment of TIME-TAG and ACCUM mode data differs:

  • Raw data taken in TIME-TAG mode are event lists (rawtag binary tables). The basic calibration is done on the tabular data, producing a calibrated (corrtag) events table. The events are then accumulated into a calibrated image (flt) by calcos.
  • Raw data taken in ACCUM mode (_rawaccum) are binned into an image array onboard the spacecraft.
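
As a rough illustration of the TIME-TAG path described above, the sketch below accumulates a short event list into a two-dimensional counts image. The function name accumulate_events, the (time, x, y) tuple layout, and the tiny image size are assumptions chosen for illustration; they are not the calcos implementation or the actual corrtag column layout.

```python
import math


def accumulate_events(events, nx, ny):
    """Bin (time, x, y) events into an ny-by-nx counts image (hypothetical helper)."""
    image = [[0] * nx for _ in range(ny)]
    for _time, x, y in events:
        # Map detector coordinates to integer pixel indices.
        ix, iy = math.floor(x), math.floor(y)
        # Events that fall outside the image are dropped.
        if 0 <= ix < nx and 0 <= iy < ny:
            image[iy][ix] += 1  # one count per photon event
    return image


# Illustrative event list; the last event lies outside the 4 x 2 image.
events = [(0.03, 1.2, 0.7), (0.5, 1.9, 0.1), (1.1, 3.0, 1.4), (2.0, 9.0, 0.0)]
image = accumulate_events(events, nx=4, ny=2)
```

ACCUM data arrive already binned in this form, so only the TIME-TAG path needs an accumulation step of this kind.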

For spectral data, calcos extracts a spectrum from the flat-fielded image, computes associated wavelengths, and converts the count rates to flux densities, yielding a one-dimensional, background subtracted spectrum. For FUV data there will normally be two spectra, one from segment A and one from segment B. The two FUV segments are processed independently. For NUV data there will normally be three spectra, one for each spectral "stripe." When multiple exposures with the same setting (grating and central wavelength) but different FP-POS are contained within a single visit, these are combined into a single, summed spectrum.
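
The conversion from count rates to flux densities can be pictured with the minimal sketch below. It assumes, for illustration only, that the net count rate is the gross rate minus a background rate and that the flux is the net rate divided by a per-bin sensitivity; the function name flux_calibrate and all numerical values are hypothetical, and the actual calcos steps are described in Sections 3.3 and 3.4.

```python
def flux_calibrate(gross_rate, background_rate, sensitivity):
    """Return (net_rate, flux) per wavelength bin for equal-length input lists.

    Assumption: sensitivity is in (counts/s) per unit flux density, so
    dividing the net count rate by it yields a flux density.
    """
    net = [g - b for g, b in zip(gross_rate, background_rate)]
    flux = [n / s for n, s in zip(net, sensitivity)]
    return net, flux


# Made-up values for three wavelength bins.
gross = [10.0, 12.0, 8.0]              # counts/s
background = [1.0, 2.0, 0.0]           # counts/s
sensitivity = [3.0e12, 5.0e12, 4.0e12]
net, flux = flux_calibrate(gross, background, sensitivity)
```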

See Chapter 2 for the naming conventions of the various input, temporary, and output calibrated files.

3.2.1 Overview of TWOZONE extraction

With the move to Lifetime Position 3 in February 2015, it became increasingly difficult to find science and background regions on the FUV detector that are free from overlap with the gain-sagged regions from Lifetime Position 1. To allow reliable spectral extraction close to these gain-sagged regions, a new method of spectral extraction was developed and implemented in calcos starting with version 3.0. Under the older "BOXCAR" algorithm, a rather large extraction region is used to ensure that all of the flux is collected, even for slightly miscentered targets. If any pixel in the BOXCAR extraction region is identified as bad (i.e., has a data quality flag within the Bad Pixel File (BPIXTAB) that matches those included in SDQFLAGS), the entire wavelength bin is rejected as bad and excluded from the summed files.
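
The BOXCAR rejection rule can be sketched as a bitwise test of each pixel's data quality value against SDQFLAGS. In the sketch below, the function name, the bit values, and the small data quality image are hypothetical; only the rule itself (one flagged pixel rejects the whole wavelength bin) comes from the description above.

```python
def boxcar_bin_is_rejected(dq_column, sdqflags):
    """True if any pixel in the extraction column carries a flag in sdqflags."""
    return any(dq & sdqflags for dq in dq_column)


SDQFLAGS = 8 | 128  # hypothetical set of "serious" flag bits

# Rows are cross-dispersion pixels in the extraction region,
# columns are wavelength bins (made-up data quality values).
dq_image = [
    [0, 0, 8, 0],
    [0, 0, 0, 0],
    [0, 16, 0, 0],
]

columns = list(zip(*dq_image))  # one tuple of DQ values per wavelength bin
rejected = [boxcar_bin_is_rejected(col, SDQFLAGS) for col in columns]
```

The flag value 16 in the second bin is not part of the assumed SDQFLAGS, so that bin is kept; the third bin is rejected because of a single flagged pixel.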

The newer "TWOZONE" algorithm is based on the assumption that bad pixels and gain-sagged regions that are in the outer wings of the point-source profile do not have a large enough impact on the extracted flux to force rejection of the wavelength bin; instead wavelength bins should only be rejected if a bad pixel occurs in the core of the profile. This allows for spectrum extraction with only a small error even when the far wings of the profile may overlap with gain-sagged regions near LP1. Note that the locations of LPs 3 through 6 were carefully chosen so that previous gain-sagged regions would not significantly impact the spectral quality and flux accuracy of the science spectra.

To implement this concept, the TWOZONE method divides the spectral extraction region into two parts: an INNER zone that defines the core of the profile, and an OUTER zone that includes the entire region used for the spectral extraction (note that the OUTER zone as defined here includes the INNER zone). The upper and lower boundaries for each of these zones are wavelength-dependent and are defined in terms of the fraction of enclosed energy expected for the cross-dispersion profile of a point source. These enclosed energy fractions are set for each CENWAVE setting in the new TWOZXTAB reference file. For all settings, the reference files are currently set by default to define the central 80% of the profile’s enclosed energy as the INNER zone and 99% as the OUTER zone, but these boundaries can be adjusted to tailor the extraction (see Section 3.6, Customizing COS Data Calibration). The wavelength-dependent point-source spatial profiles for each setting are contained in the PROFTAB reference file.
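
The following sketch illustrates the two-zone idea for a single wavelength bin. It assumes that each zone excludes an equal share of the enclosed energy from the two tails of the profile; that choice, the function names, the seven-pixel profile, and the flag value are all assumptions made for illustration and are not taken from calcos or from the TWOZXTAB and PROFTAB files.

```python
def zone_bounds(profile, fraction):
    """Return inclusive (lo, hi) row indices enclosing the central `fraction`
    of the profile, assuming equal energy is excluded from each tail.
    The profile must have a positive sum."""
    total = sum(profile)
    tail = (1.0 - fraction) / 2.0
    cumulative = 0.0
    lo = None
    hi = len(profile) - 1
    for row, value in enumerate(profile):
        cumulative += value
        enclosed = cumulative / total
        if lo is None and enclosed > tail:
            lo = row  # first row that extends past the lower tail
        if enclosed >= 1.0 - tail:
            hi = row  # first row that reaches the upper limit
            break
    return lo, hi


def twozone_bin_is_rejected(dq_column, inner, sdqflags):
    """Reject the bin only if a flagged pixel lies inside the INNER zone."""
    lo, hi = inner
    return any(dq & sdqflags for dq in dq_column[lo:hi + 1])


profile = [1, 4, 20, 50, 20, 4, 1]  # made-up cross-dispersion profile
inner = zone_bounds(profile, 0.80)  # core of the profile
outer = zone_bounds(profile, 0.99)  # full extraction region

SDQFLAGS = 8  # hypothetical flag bit
wing_bad = [8, 0, 0, 0, 0, 0, 0]    # flagged pixel in the far wing
core_bad = [0, 0, 0, 8, 0, 0, 0]    # flagged pixel in the core
keep_wing = not twozone_bin_is_rejected(wing_bad, inner, SDQFLAGS)
keep_core = not twozone_bin_is_rejected(core_bad, inner, SDQFLAGS)
```

With this profile the INNER zone spans the three central rows and the OUTER zone spans all seven, so the bin with a flagged wing pixel is kept while the bin with a flagged core pixel is rejected.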

This approach has a number of additional consequences. In order to tabulate reference profiles that are sufficiently smooth as a function of wavelength, it proved necessary to first straighten the spectral image to correct the small-scale distortions in the cross-dispersion direction. This resulted in the addition of a new TRCECORR step, which uses corrections tabulated in the TRACETAB reference file. In addition, precise alignment of the observed spectrum with the reference profile is needed to ensure accurate flux extraction, and to do this the new ALGNCORR step was added. We recommend that the TRCECORR and ALGNCORR steps always be used with the new TWOZONE algorithm, and be omitted with the older BOXCAR algorithm. While it is possible to turn these steps on and off separately, doing so may adversely affect the reliability of the extracted spectra.
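
A simple consistency check of the calibration switches against the extraction algorithm might look like the sketch below. The header is modeled as a plain dictionary; the keyword name XTRCTALG for the algorithm selection and the switch values PERFORM and OMIT are assumptions not stated in this section, and the function name is hypothetical.

```python
def check_extraction_switches(header):
    """Return warnings for switch settings that the text advises against.

    Assumptions: header["XTRCTALG"] holds "TWOZONE" or "BOXCAR", and the
    TRCECORR and ALGNCORR switches take the values "PERFORM" or "OMIT".
    """
    algorithm = header.get("XTRCTALG")
    warnings = []
    for key in ("TRCECORR", "ALGNCORR"):
        value = header.get(key)
        if algorithm == "TWOZONE" and value != "PERFORM":
            warnings.append(f"{key} should be PERFORM with TWOZONE extraction")
        if algorithm == "BOXCAR" and value == "PERFORM":
            warnings.append(f"{key} should be OMIT with BOXCAR extraction")
    return warnings


# A header-like dictionary with one switch left off by mistake.
mixed = {"XTRCTALG": "TWOZONE", "TRCECORR": "PERFORM", "ALGNCORR": "OMIT"}
problems = check_extraction_switches(mixed)
```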

The TWOZONE algorithm is only used for FUV data taken at the third and subsequent COS FUV Lifetime Positions (LP3 and up). All NUV data and FUV data taken at LP1 and LP2 continue to be calibrated using the older BOXCAR algorithm. Note that when most FUV settings were moved to LP3 and subsequent lifetime positions, the 1055 and 1096 CENWAVE settings of the G130M grating were left at LP2 because of their large cross-dispersion widths, and they therefore continue to be calibrated with the BOXCAR algorithm.
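
The selection rule in the preceding paragraph can be summarized in a few lines. The function name and argument conventions below are hypothetical; the sketch only restates the rule given in the text and does not reproduce how calcos itself chooses the algorithm.

```python
def extraction_algorithm(detector, lifetime_position):
    """Return the extraction algorithm implied by the rule in the text.

    detector is "FUV" or "NUV"; lifetime_position is an integer (1, 2, 3, ...).
    """
    if detector == "FUV" and lifetime_position >= 3:
        return "TWOZONE"
    return "BOXCAR"  # all NUV data, and FUV data at LP1 and LP2


# The G130M 1055 and 1096 settings were left at LP2, so they map to BOXCAR.
g130m_1055 = extraction_algorithm("FUV", 2)
fuv_lp4 = extraction_algorithm("FUV", 4)
nuv = extraction_algorithm("NUV", 1)
```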

3.2.2 Extended sources

Since the new TWOZONE algorithm shrinks the final region used for the spectral extraction to enclose only 99% of the expected point-source profile enclosed energy, the flux accuracy for extended sources will be more easily affected than was the case for the BOXCAR algorithm, which uses a larger fixed extraction height. In addition, the more extended spatial profile for these sources may increase the overlap with the gain-sagged regions near LP1, leading to significant loss of flux. Observations of extended sources therefore likely require customized extractions to produce optimum results, and close examination of the spectral images and extractions to identify artifacts in the reduced products is recommended.
