HST Cycle 26 Primer: Data Processing and the HST Data Archive

An overview of the HST data processing pipeline and archiving in MAST.

Routine Science Data Processing

Science data obtained with HST are relayed through the Tracking and Data Relay Satellite System (TDRSS) to the TDRSS ground station at White Sands, New Mexico, then to the Sensor Data Processing Facility at the Goddard Space Flight Center in Greenbelt, Maryland, and finally to STScI.

At STScI, the production pipeline provides standard processing for data editing, calibration, and product generation. These functions, performed automatically, include the following:

  • Reformatting and editing of data from the spacecraft packet format to images or spectra.
  • Performing standard calibrations (flat fields, wavelength calibrations, background subtraction, etc.) with best available calibration files.
  • Producing standard data output products (FITS format files of raw and calibrated images, OMS [jitter and performance flags] files). 

Standard calibrations performed on HST data, and the resulting output data products, are described in detail in the HST Data Handbook.

In 2015, major upgrades to the Science Data Processing Pipelines (DP) and Data Distribution System (DADS) were installed. Archive users will now find HST data stored on disk for immediate download. Data are reprocessed as needed, when updated calibration reference files or improved calibration algorithms become available, to ensure that the freshest data are available to users. Data from all non-active instruments (HSP, GHRS, FOC, FOS, WF/PC, WFPC2, and NICMOS) continue to be available for direct download. If a dataset is requested while those data are queued for reprocessing, the user will experience a delay while the data are processed through the system at a higher priority before being delivered.

Space Telescope Science Data Analysis System (STSDAS)

The Space Telescope Science Data Analysis System (STSDAS) and its accompanying package, TABLES, provide access to applications for the analysis of HST data as well as various utilities for manipulating and plotting data. The TABLES package facilitates the manipulation of FITS table data. STSDAS and TABLES were originally layered onto the Image Reduction and Analysis Facility (IRAF) software from the National Optical Astronomy Observatories (NOAO). Both packages run from within IRAF and are supported on a variety of platforms, although not all of the platforms that IRAF supports.

All the calibration pipeline programs used by STScI to process HST data can be found in the HSTCAL package, which allows users to compile and run the calibration software, originally written in C, without using or running IRAF. The latest versions of the active instruments' calibration software no longer rely on IRAF and have been installed as the HSTCAL package for Archive and pipeline operations. HST observers can use the programs in HSTCAL to recalibrate their data, to examine intermediate calibration steps, and to re-run the pipeline using different calibration switch settings and reference data as appropriate.

These HSTCAL software versions replace the calibration software previously available under STSDAS; specifically, CALACS for ACS, CALWF3 for WFC3, and CALSTIS for STIS.
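As an illustration, the HSTCAL pipelines can also be invoked from Python through the instrument teams' wrapper packages. The sketch below is a minimal example, assuming the acstools package is installed, the calacs.e executable is on the user's path, and a raw ACS exposure with the hypothetical name j8bt06nyq_raw.fits is present; the jref path is a placeholder for the location of the ACS reference files.

```python
# Minimal sketch: recalibrating a raw ACS exposure with the HSTCAL-based
# CALACS pipeline via its Python wrapper (assumes acstools is installed and
# calacs.e is available on the path).
import os
from acstools import calacs

# Hypothetical raw file name; substitute your own *_raw.fits exposure.
raw_file = "j8bt06nyq_raw.fits"

# Reference files named in the image header are resolved through jref$.
os.environ.setdefault("jref", "/path/to/acs/reference/files/")

# Run CALACS; calibration switches in the primary header (e.g., FLATCORR)
# control which steps are performed, so edit them first to rerun selected steps.
calacs.calacs(raw_file)
```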

Much of the newer calibration and analysis software is written in Python, including CALCOS for COS data. This software does not require IRAF and is available as part of the STScI Python libraries installed with AstroConda releases.
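For example, a COS observer could rerun CALCOS on an association table in a few lines. This is a minimal sketch, assuming the calcos package is installed; the filename and output directory are hypothetical, and lref must point at the COS reference files.

```python
# Minimal sketch: recalibrating COS data with the Python-based CALCOS pipeline
# (assumes the calcos package is installed and lref points at the reference files).
import os
import calcos

os.environ.setdefault("lref", "/path/to/cos/reference/files/")

# Hypothetical association table; a single rawtag file can be given instead.
calcos.calcos("lbgu17qnq_asn.fits", outdir="recalibrated/")
```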

PyRAF is an alternative Python-based command-line environment that enables users to run IRAF tasks, and it allows the newer Python-based software to be used alongside IRAF tasks with IRAF command-line syntax. AstroConda includes astropy, a set of packages that give Python programs the ability to read and write FITS files and to read and manipulate the WCS information in image headers. The Python environment allows users to manipulate and display data in ways not possible with IRAF, more akin to how data can be manipulated and displayed with IDL (Interactive Data Language). The STScI Python environment described above is contained within the stsci_python package. Detailed information on STSDAS, TABLES, PyRAF, AstroConda, and other Python-based software, including the software itself, is available from the STScI Software webpage. Information about IRAF is available from the IRAF webpage.
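As a simple illustration of the astropy capabilities mentioned above, the sketch below opens a calibrated HST image, reads its WCS, and converts a pixel position to sky coordinates; the filename is a placeholder.

```python
# Minimal sketch: reading an HST image and its WCS with astropy
# (the filename is a placeholder for any calibrated *_flt.fits image).
from astropy.io import fits
from astropy.wcs import WCS

with fits.open("jxxxxxxxx_flt.fits") as hdul:
    hdul.info()                          # list the extensions in the file
    sci = hdul["SCI", 1].data            # science array of the first imaging extension
    wcs = WCS(hdul["SCI", 1].header, hdul)  # pass the HDUList so distortion tables are found

# Convert a pixel position to sky coordinates (RA, Dec in degrees).
ra, dec = wcs.all_pix2world(512.0, 512.0, 0)
print(ra, dec)
```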

The HST Data Archive

All science and calibration data are placed in the HST Data Archive. Science data become available immediately after pipeline processing to a program's Principal Investigator (PI) and to anyone the PI has authorized to retrieve the data. These data may be retrieved once the PI has registered for an STScI single sign-on (SSO) account, and they are normally proprietary for a period of six months from the date of observation (see HST Cycle 26 Data Rights and Duplications).

On average, science data from HST flow through the production pipeline and into the Archive within one day after observation on the telescope. Some data may take as long as three to five days. Observers are notified by e-mail when the first datasets reach the Archive. They are also provided with Web tools to track a visit’s completeness and to retrieve the data generated by the pipeline.

The time required to retrieve data from the Archive depends on whether the data are available for direct download or need to be reprocessed, the speed of the user's connection, the size of the requested dataset, and the load on the Archive. As of October 1, 2016, the HST Data Archive contained over 126 TB of data. During the first quarter of 2017, all of the HST data, including proprietary data, should be online and available for direct download through the Data Discovery Portal. To retrieve proprietary data, users must have a MyST account and be authorized to do so.

If there are strict scientific requirements for data receipt within days after execution, such as to provide pointing corrections for tightly scheduled visits, there are resource-intensive methods to expedite delivery of data. Such requirements must be stated in the proposal so that the resource needs can be determined and reviewed.

Web Access to the HST Data Archive

Most of the data in the HST Data Archive are public and may be retrieved by any user.

The Archive may be accessed through a number of web-based interfaces. The dedicated HST search page, also available under the “Mission Search” menu tab of the MAST webpage, allows detailed searches for HST data and download of those data and their reference files. The MAST Portal provides a number of capabilities to search for, inspect, and download data. Its interface includes integrated access to the Digitized Sky Survey (DSS) and a coordinate resolver that looks up the coordinates of an object by name. In both interfaces it is possible to request specific data products instead of the entire complement of files. Requesters of proprietary data are required to log in to their MyST Single Sign-On (SSO) account. StarView allows searches of instrument-specific databases only and requires users to sign in with their SSO account. All of the MAST search interfaces provide previews of most publicly available images and spectra.
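Archive searches and downloads can also be scripted rather than performed through the web pages. The sketch below is a minimal example, assuming the astroquery package (not part of the toolset described above); the proposal ID is hypothetical, and proprietary data would additionally require Observations.login().

```python
# Minimal sketch: searching the HST archive and downloading calibrated products
# programmatically (assumes the astroquery package; the proposal ID is hypothetical).
from astroquery.mast import Observations

# Find HST observations from one program.
obs = Observations.query_criteria(obs_collection="HST", proposal_id="12345")

# List the associated data products and keep only the calibrated science files.
products = Observations.get_product_list(obs)
calibrated = Observations.filter_products(products,
                                          productType="SCIENCE",
                                          productSubGroupDescription="FLT")

# Download to the current directory; proprietary data require Observations.login().
Observations.download_products(calibrated)
```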

The Archive Helpdesk can be reached at http://masthelp.stsci.edu.

Amazon Web Services (AWS) Public Dataset Program

All non-proprietary data for current Hubble instruments (ACS, COS, STIS, WFC3, FGS) have been made available as part of the Amazon Web Services (AWS) public dataset program. Proposers may request to make use of this dataset under the archival legacy category. 
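The sketch below illustrates one way the AWS copies might be used; it is an assumption, not an interface described in this Primer, and it requires the astroquery package with cloud support. The target, instrument, and product type are arbitrary examples.

```python
# Minimal sketch: fetching public HST files from the AWS copy of the archive
# rather than from STScI (assumes astroquery with cloud support; target is arbitrary).
from astroquery.mast import Observations

# Prefer the AWS public dataset when a file is available there.
Observations.enable_cloud_dataset(provider="AWS")

obs = Observations.query_criteria(obs_collection="HST",
                                  instrument_name="ACS/WFC",
                                  objectname="M101")
products = Observations.get_product_list(obs[:1])
Observations.download_products(products,
                               productSubGroupDescription="DRZ",
                               cloud_only=True)
```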

The Hubble Legacy Archive (HLA)

The Hubble Legacy Archive (HLA) is a project designed to enhance science from the Hubble Space Telescope by augmenting the HST Data Archive and by providing advanced browsing capabilities. It is a joint project of the Space Telescope Science Institute, the Canadian Astronomy Data Centre (CADC), and the European Space Astronomy Centre (ESAC). The primary enhancements are:

  • Advanced data products for several HST instruments, produced from public data.
  • Data products that are immediately available online for viewing, searching and downloading.
  • A footprint service that makes it easier to browse and download images.
  • Availability of combined images, prototype mosaics, and spectral data.
  • For many images, the absolute astrometry has been improved from the typical 1-2 arcsec to ~0.3 arcsec.
  • Source lists are available for ACS, WFPC2, and WFC3 observations.
  • NICMOS and ACS grism extractions have been produced by the Space Telescope European Coordinating Facility (ST-ECF).
  • An interface is provided for many user-provided High-Level Science Products.

At the time of this writing (November 2016), the HLA has completed its ninth major Data Release, including enhanced data products for almost all public science data for WFC3, ACS, WFPC2 (produced by CADC) and NICMOS. Also available are source lists for ACS, WFPC2, and WFC3. Some STIS spectra are also available as high-level science products and can be searched, viewed, and retrieved through the same interface as enhanced image data. Multi-visit ACS mosaics have been produced for 67 pointings. The footprint service shows the outline of data from all instruments, including data that are still proprietary, and an interface to the main HST Archive is provided to access proprietary data (for PIs and their designated collaborators) and recent data for which enhanced HLA products are not yet available. 

Some of the more general goals of the HLA are to make HST data compatible with the Virtual Observatory (VO), to move toward a sky-atlas user view rather than a collection of datasets, and to develop an “all-HST-sky” source list. The Hubble Source Catalog (HSC) represents a major milestone toward the attainment of these goals for the HLA.

The HLA can be accessed at http://hla.stsci.edu.

The Hubble Source Catalog (HSC)

The HLA produces source lists for tens of thousands of HST images. The Hubble Source Catalog (HSC) combines these individual, visit-based WFC3, ACS, and WFPC2 source lists into a single Master Catalog, thereby providing an entry into the field of database astronomy. Searches that in the past would have required months or years to perform can in many cases be done in seconds with the HSC. This resource may be used to support a wide range of new archival proposals, a few potential examples of which are listed below.

Version 2 of the Hubble Source Catalog was released in the fall of 2016. The primary improvements over version 1 were the addition of four years of ACS data and the cross-matching of the HSC with spectral observations from COS, FOS, and GHRS. A journal-level publication describing the HSC, the quality of the data, and the potential for doing science with it is available in Whitmore et al. (2016). A FAQ describing most aspects of the HSC is available. The matching algorithms used by the HSC are described in Budavari & Lubow (2012).

The HSC can be accessed in a variety of ways. For most cases, the best method will be the MAST Discovery Portal. This Virtual Observatory (VO)-based tool provides easy yet powerful access for searches involving 50,000 sources or fewer. It provides footprints, object selection and filtering, and interactive displays. For larger queries, a powerful CasJobs (CAS = Catalog Archive Server) capability is available (see note 1 below). This resource is based on the Sloan Digital Sky Survey (SDSS) tool (see note 2), providing similar support for database astronomy.
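Smaller HSC searches can also be made programmatically. The sketch below is a minimal example, assuming the astroquery package; the target and search radius are arbitrary.

```python
# Minimal sketch: querying the Hubble Source Catalog around a position
# (assumes astroquery; the target and search radius are arbitrary examples).
import astropy.units as u
from astroquery.mast import Catalogs

# Cone search of the HSC around M31 with a 10 arcsec radius.
matches = Catalogs.query_object("M31", radius=10 * u.arcsec, catalog="HSC")

# Each row is a cross-matched source; magnitudes are grouped by instrument/filter.
print(matches.colnames[:10])
print(matches[:5])
```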

Below is a list of some of the types of projects that might make particularly good use of this new resource.

  • Variable stars, galaxies, calibrations, etc., identified using HST’s ~ 25-year baseline. Note: a project to construct a Hubble Variable Catalog is underway, headed by the National Observatory of Athens.
  • Astrometric properties: proper motions, cluster kinematics, KBOs, etc.
  • Extremely large datasets: e.g., creating color-magnitude diagrams (CMDs) based on the ~ 4000 HST observations of the LMC.
  • Cross-matching with other catalogs: SDSS, 2MASS, spectral catalogs, etc.
  • Object properties: star clusters and associations, colors, elongations, etc.
  • Photo-Zs.
  • Compilation of spectroscopic properties based on COS, FOS, and GHRS observations, cross-matched with their HSC counterparts.

Note 1: A third method of access is via the HSC homepage. This provides the most flexibility for very detailed queries.

Note 2: A word of caution is in order, however. Unlike SDSS, with its uniform set of filters and contiguous coverage of a substantial part of the sky, the Hubble database consists of tiny pieces of the sky observed with three different cameras and hundreds of filters. Potential users should pay special attention to the Five Things You Should Know About the HSC webpage. Detailed use cases are available to guide users in common ways to make use of the HSC and to avoid common pitfalls.