5.3 Working with Extracted Spectra

Here we discuss how to customize the extraction of spectra and modify reference files that control the extraction process with calcos. Please note that STScI supports the FITS module located in the Python astropy.io package. This module provides the same core functionality as the earlier module PyFITS. Any script that is uses PyFITS can be modified by importing FITS into Python as follows:

from astropy.io import fits as pyfits

Information on the FITS module can be found on the astropy webpage.

5.3.1 Working with x1d Files in Python

When calibrating a single spectroscopic exposure, the calcos pipeline creates a one-dimensional extracted spectrum file, with suffix x1d and a filename such as "l9v220eqq_x1d.fits."

COS x1d files are MEF format files and their data contents and extension formats are discussed in Section 2.4.3. As with other COS data files, the primary [0] header will contain only header information and no data. The extracted spectra are stored in a single [SCI] extension as a multi-dimensional binary table. A standard FITS table consists of columns and rows forming a two-dimensional grid of cells; however, each of these cells can contain a data array, effectively creating a table of higher dimensionality.

Using the "Selectors Syntax" to work with tables

In order to analyze COS tabular spectral data with most Python modules, one can use the selectors syntax to specify the desired row and column (e.g., the wavelength or flux). The general syntax for selecting a particular cell is to first select the desired segment, and then select the column:

> import matplotlib.pyplot as plt
> from astropy.io import fits
> x1ddata = fits.getdata('ldel02cxq_x1d.fits')
> # select segment A or segment B data
> x1d_fuva = x1ddata['segment'] == 'FUVA'
> x1d_fuvb = x1ddata['segment'] == 'FUVB'
> # select the wavelength and flux columns for each segment
> wave_fuva = x1ddata[x1d_fuva]['wavelength'].flatten()
> flux_fuva = x1ddata[x1d_fuva]['flux'].flatten()
> wave_fuvb = x1ddata[x1d_fuvb]['wavelength'].flatten()
> flux_fuvb = x1ddata[x1d_fuvb]['flux'].flatten()
> # plot segment A
> plt.plot(wave_fuva, flux_fuva)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()

Similarly, one could use the where() module in numpy to select the segment:

> from astropy.io import fits
> from astropy.table import Table
> hdulist = fits.open('s7g1700hl_disp.fits')
> tbdata = hdulist[1].data
> print(Table(tbdata))
SEGMENT OPT_ELEM APERTURE CENWAVE NELEM       COEFF [4]      
------- -------- -------- ------- ----- ---------------------
   FUVA G130M        PSA     1291     2 1272.64958352 .. -0.0
   FUVA G130M        WCA     1291     2 1272.21660012 .. -0.0
   FUVA G130M        BOA     1291     2 1272.64958352 .. -0.0
   FUVB G130M        PSA     1291     2 1119.38291962 .. -0.0
   FUVB G130M        WCA     1291     2 1118.98191865 .. -0.0
   FUVB G130M        BOA     1291     2 1119.38291962 .. -0.0
   FUVA G130M        PSA     1300     2 1282.84550637 .. -0.0
   FUVA G130M        WCA     1300     2 1282.41252297 .. -0.0
   FUVA G130M        BOA     1300     2 1282.84550637 .. -0.0
   FUVB G130M        PSA     1300     2 1129.58656362 .. -0.0
   FUVB G130M        WCA     1300     2 1129.18055776 .. -0.0
   FUVB G130M        BOA     1300     2 1129.58656362 .. -0.0
   ...
> desired_row = ((tbdata['SEGMENT'] == 'FUVB') & \
                        (tbdata['OPT_ELEM'] == 'G130M') & \
                        (tbdata['APERTURE'] == 'BOA') & \
                        (tbdata['CENWAVE'] == 1300))
> coeffs = tbdata[desired_row]['COEFF']
> print(coeffs)
 
array([[  1.12958656e+03,   9.96449962e-03, 0.00000000e+00, -0.00000000e+00]])
 
> hdulist.close()

Dumping x1d data to an ASCII File

It is possible to dump the arrays of an x1d file to an ASCII file. For example, to extract the WAVELENGTH, FLUX, ERROR, and DQ_WGT columns of FUV x1d file ldel02cxq_x1d.fits:

> from astropy.io import fits                                       
> import numpy as np
> x1ddata = fits.getdata('ldel02cxq_x1d.fits')
> wavelength = np.concatenate((x1ddata["wavelength"][1],\
>              x1ddata["wavelength"][0]))
> flux = np.concatenate((x1ddata["flux"][1], x1ddata["flux"][0]))
> err = np.concatenate((x1ddata["error"][1], x1ddata["error"][0]))
> dq_wgt = np.concatenate((x1ddata["dq_wgt"][1], \
>         x1ddata["dq_wgt"][0]))
>with open('data.txt','w') as f:
>    for row in zip(wavelength, flux, err, dq_wgt):
>        f.write('{}\t{}\t{}\t{}\n'.format(row[0], row[1], row[2], row[3]))

This will create a new 2-D ASCII text file, data.txt.

Plotting COS x1d Data

Each row of the science extensions in an x1d file will contain the columns listed in Table 2.5; a similar table, including array dimensions, can be displayed by using the Table module in astropy.table.

When using many Python routines with x1d files as input, it will be necessary to specify the extension number of the file. For example, to plot flux vs. wavelength for both segments in an x1d file using the plot module in matplotlib.pyplot:

> import matplotlib.pyplot as plt
> from astropy.io import fits
> import numpy as np
> x1ddata = fits.getdata('ldel02cxq_x1d.fits')
> # Print segments associated with different indexes
> print(x1ddata["segment"][0], x1ddata["segment"][1])
> wavelength = np.concatenate((x1ddata["wavelength"][1],\
              x1ddata["wavelength"][0]))
> flux = np.concatenate((x1ddata["flux"][1], \
         x1ddata["flux"][0])) 
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()

In this example, we explicitly reference extension 0 and 1 to obtain segment FUVB and FUVA respectively. One can verify this by printing the segment names:

> # Print segments associated with different indexes
> print(x1ddata["segment"][0], x1ddata["segment"][1])

For FUV data, the x1d files contain both1 segments A and B. To plot flux vs. wavelength in an FUV x1d file for segment A or segment B, using the plot module:

> import matplotlib.pyplot as plt
> from astropy.io import fits
> import numpy as np
> x1ddata = fits.getdata('ldel02cxq_x1d.fits')
# select only segment A
> x1d_fuva = x1ddata[x1ddata['segment'] == 'FUVA']
> wavelength = x1d_fuva['wavelength'].flatten()
> flux = x1d_fuva['flux'].flatten()
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()
# select only segment B
> x1d_fuvb = x1ddata[x1ddata['segment'] == 'FUVB']
> wavelength = x1d_fuvb['wavelength'].flatten()
> flux = x1d_fuvb['flux'].flatten()
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()

For NUV data, the x1d files contain the three stripes A, B, and C. To plot flux vs. wavelength in an NUV x1d file for all three stripes sequentially:

> import matplotlib.pyplot as plt
> from astropy.io import fits
> x1dfile = fits.getdata('lcwj10qtq_x1d.fits')
# select only stripe A
> x1d_nuva = x1ddata[x1ddata['segment'] == 'NUVA']
> wavelength = x1d_nuva['wavelength'].flatten()
> flux = x1d_nuva['flux'].flatten()
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()
 
# select only stripe B
> x1d_nuvb = x1ddata[x1ddata['segment'] == 'NUVB']
> wavelength = x1d_nuvb['wavelength'].flatten()
> flux = x1d_nuvb['flux'].flatten()
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()
 
# select only stripe C
> x1d_nuvc = x1ddata[x1ddata['segment'] == 'NUVC']
> wavelength = x1d_nuvc['wavelength'].flatten()
> flux = x1d_nuvc['flux'].flatten()
> plt.plot(wavelength, flux)
> plt.xlabel('Wavelength')
> plt.ylabel('Flux')
> plt.show()

5.3.2 Redoing Spectral Extraction

The x1dcorr module in calcos is designed to extract flux calibrated 1-D spectra from corrected COS event lists (corrtag files). This module is called by calcos as part of standard pipeline processing; its functioning in that role is described in Section 3.4.18.

Correcting for Shifts Along the Dispersion Direction

Properly aligning the spectrum along the dispersion direction is important not only for obtaining the correct wavelength solution, but also for properly applying the flux calibration. Incorrect registration of the spectrum will result in the wrong sensitivity being applied at each wavelength. This is especially important for low resolution spectra, since at some wavelengths the sensitivity changes rapidly with wavelength. The throughput also changes rapidly for the blue modes.

For rawtag exposures the wavecal lamp exposures are taken either concurrently with the science rawtag spectra (TAGFLASH) or they are acquired as separate rawtag spectra (AUTO or GO wavecals). For all science rawaccum exposures the wavecals are acquired as separate rawtag exposures.

The wavecal exposures are used by calcos to determine the location of both the wavecal image and the corresponding science image on the detector. The locations may vary in a non-repeatable manner due to non-repeatability of the COS grating positions, but are always at a fixed position from one another. When auto-wavecals are acquired as separate exposures they are taken close in time to the science exposures, with the grating in the same position as during the science exposure.

After processing data through calcos, you may decide that you need to shift the spectrum along the dispersion direction to correct offsets in the wavelength calibration. For example, wavelength calibration offsets may occur due to offsets of the target from the center of the PSA aperture (which can occur if the target acquisition was imperfect), or from drift of the grating due to thermal flexure. Assuming that calcos has been run on the data and a residual wavelength offset has been found in the calibrated spectrum, the offset can be corrected by first calculating the number of pixels corresponding to the offset, then subtracting it from the CORR or FULL position of coordinates along the dispersion direction. In the example below, the shift is applied to the XFULL column of the corrtag or corrtag_(a,b) file. In Python, the corrtag can be modified to apply the shift as follows:

> from astropy.io import fits
> hdulist = fits.open('ldel05ivq_corrtag_a.fits', mode='update')
> corrdata = hdulist[1].data
> corrdata['XFULL'] = corrdata['XFULL'] + SHIFT
> hdr = hdulist[0].header
> hdr['X1DCORR'] = 'PERFORM'
> hdulist.close()

In the example above, the XFULL positions of the science spectrum in an FUV A corrtag file have been moved by "SHIFT" pixels. Also, the header was updated such that X1DCORR will be performed again. The corrtag can then be run through calcos again if desired.

There is also a terminal command line option to apply a SHIFT1 value that is different than the one calcos finds internally. The values need to be in a separate file with the following columns included in the file: Dataset rootname, fpoffset, flash number, stripe or segment, desired SHIFT1 value, and desired SHIFT2 value. Calcos will recognize "any" as a value for fpoffset if the rootname is unique (not the association rootname). It will also recognize "any" for the flash number and stripe and segment. While SHIFT1 values (dispersion direction shift) must be specified in the file, specifying SHIFT2 values (cross-dispersion direction shift) is optional. This option can be called in the terminal as:

calcos –shift shift_filename.txt rawtag.fits

or in Python after importing calcos as:

calcos.calcos("rawtag.fits",shift_file="shift_filename.txt").

Adjusting the Background Subtraction

For spectra, background regions offset from the extraction region are used to determine the background. You can adjust the default parameters for this background region by first copying the _1dx reference file listed under the XTRACTAB keyword in the primary header (or _2zx reference file if performing a two-zone extraction) to a local directory, then adjusting the background parameters within the local version of the _1dx or _2zx reference file. Once you have adjusted the parameters to your satisfaction, edit the primary header of the rawtag file to indicate the path to the local version of the _1dx or _2zx file. You can then run calcos with the updated background subtraction parameters.

Examples of customizing extraction box heights are provided in Appendix A of COS ISR 2017-03. The background parameters available for editing in the _1dx and _2dx files are described in Section 3.4.19.

The header keywords in the data files to be processed need to point to the updated _1dx and _2zx reference files, and the headers scan be edited with the setval() function, part of the astropy.io.fits module in stenv. Below is an example modifying the header to point to a custom _1dx file:

> from astropy.io import fits
> fits.setval('filename_rawtag_a.fits', 'XTRACTAB', \
      value='custom_file_1dx.fits', ext=0)

5.3.3 Splicing Extracted Spectra

The IRAF task splice can be applied to combine overlapping extracted COS spectra (e.g., spectra taken with different central wavelengths). Information about using IRAF and the STSDAS analysis software can be found in Chapter 3 of the Introduction to HST Data Handbooks. Users should be aware of differences in spectral resolution at a given wavelength between different cenwaves. Splice takes into account the error (ERR) array as well as the data quality (DQ) array. Handling of the DQ array is important as it helps splice perform the combination properly and avoid bad or noisy data in the output file arising from the large changes in throughput at the edges of the detector.

cl> splice obs1_x1d.fits,obs2_x1d.fits output_splice.fits

Please refer to the splice task help file for more useful information. If a multispec format spectrum is preferred for further analysis, the task tomultispec can be run on the output file of the splice task.

Running splice as mentioned above (rather than transforming individual x1d fits tables into multispec format before combining them) has important advantages: it keeps the science data, error, and DQ arrays intact allowing for easier error analysis, and it does not have a limitation on the number of segments or wavelengths to include, which is the case with the multispec format. This limitation is caused by the size limit of the FITS header, which requires the wavelength scale to be fit with a function. We note that a Python replacement for the IRAF task splice is under development, and IRAF is no longer actively maintained.



1 For FUV x1d files, both segments A and B will be present as long as the individual raw data from both segments were available at the time of processing. If only one segment was present during processing, then a row selector of row=0 will point to the data from that segment. Similarly, a row selector of row=1 will result in an error.