HST Primer: Data Processing and the HST Data Archive
An overview of the HST data processing pipeline and archiving in MAST.
Routine Science Data Processing
Science data obtained with HST are sent to the TDRSS satellite system, from there to the TDRSS ground station at White Sands, New Mexico, then to the Sensor Data Processing Facility at Goddard Space Flight Center in Greenbelt, Maryland, and then finally to STScI.
At STScI, the production pipeline provides standard processing for data editing, calibration, and product generation. These functions, performed automatically, include the following:
- Reformatting and editing of data from the spacecraft packet format to images or spectra.
- Performing standard calibrations (flat fields, wavelength calibrations, background subtraction, etc.) with best available calibration files.
- Producing standard data output products (FITS format files of raw and calibrated images, OMS [jitter and performance flags] files).
Standard calibrations performed on HST data, and the resulting output data products, are described in detail in the HST Data Handbook.
In 2015, major upgrades to the Science Data Processing Pipelines (DP) and Data Distribution System (DADS) were installed. Archive users will now find HST data stored on disk for immediate download. Data are reprocessed as needed when updated calibration reference files or improved calibration algorithms are received, so that the most up-to-date data products are available to users. Data from all non-active instruments (HSP, GHRS, FOC, FOS, WF/PC, WFPC2, and NICMOS) continue to be available for direct download. If a user requests a dataset that is queued for reprocessing, they will be presented with the option to download the existing data or wait until the reprocessing is complete. Requested datasets are prioritized in the processing queue and are generally available within a couple of hours. The user can set up a subscription to receive a notification when the reprocessing has completed.
Software for Data Reduction and Analysis
The Space Telescope Science Institute provides a number of software packages for the reduction and analysis of HST data. For data reduction, the HSTCal package is a C package for reducing observations from ACS, STIS, and WFC3, and the CalCOS package provides software for reducing observations from COS. The reference files needed to reduce the observations may be accessed through the Calibration Reference Data System (CRDS). In addition, the ACSTools, COSTools, STISTools, and WFC3Tools software packages provide additional functionality for analyzing HST data. Further information about reducing and analyzing observations is available in the HST Data Handbook. These tools, along with other support software, can be downloaded and installed as part of the 'stenv' environment.
The HST Data Archive
All science and calibration data are placed in the HST Data Archive. Science data become available immediately after pipeline processing to a program’s Principal Investigator (PI), as well as to those designated by the PI, provided they are authorized to retrieve the data. These data may be retrieved after the PI has registered for an STScI single sign-on (SSO) account, and they normally remain under exclusive access for a period of six months from the date of observation (see HST Data Rights and Duplications).
On average, science data from HST flow through the production pipeline and into the Archive within 9 hours of observation on the telescope. Some data may take as long as three to five days. Observers are notified by e-mail when the first datasets reach the Archive. They are also provided with Web tools to track a visit’s completeness and to retrieve the data generated by the pipeline.
The time required for retrieving data from the Archive depends on whether the data are available for direct download or need to be reprocessed, the speed of the user’s connection, the size of the requested dataset, and the load on the Archive. As of November 1, 2018, the HST Data Archive contains over 161 TB of data. All of the HST data, including exclusive access data, should be online and available for direct download through the Data Discovery Portal. To retrieve exclusive access data, users must have a MyST account and be authorized to do so.
If there are strict scientific requirements for data receipt within days after execution, such as to provide pointing corrections for tightly scheduled visits, there are resource-intensive methods to expedite delivery of data. Such requirements must be stated in the proposal so that the resource needs can be determined and reviewed.
Web Access to the HST Data Archive
Most of the data in the HST Data Archive are public and may be retrieved by any user. HST data may be accessed through a number of web-based and programmatic (API) interfaces.
The HST search page allows searches for HST data using a form-based interface, where users can build both simple and complex queries to retrieve data. The MAST Data Discovery Portal provides a more visual/graphical interface for searching, inspecting, and retrieving data. In addition to providing HST data, the Data Discovery Portal provides access to the entire multi-mission archive at STScI. With the Portal, users can easily find complementary data sets from the more than 20 active and legacy missions at MAST, including Kepler/K2, TESS, GALEX, and Pan-STARRS.
MAST provides a suite of Application Programming Interfaces (APIs) for searching for and downloading data. Python users can use the MAST Astroquery module, which integrates easily with Astropy and other Python analysis tools. MAST also provides more generic web service API access via HTTPS and HTTP GET requests.
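As a rough illustration of the Astroquery route, the sketch below searches MAST for HST observations and downloads calibrated products. It assumes the `astroquery` package is installed. The target name, instrument, search radius, and the `FLT` product filter are placeholder choices for illustration and are not taken from this primer.

```python
def build_hst_criteria(target, instrument, radius_deg=0.02):
    """Assemble keyword criteria for an HST-only MAST search (illustrative helper)."""
    return {
        "objectname": target,           # resolved to coordinates by MAST
        "radius": f"{radius_deg} deg",  # cone-search radius around the target
        "obs_collection": "HST",        # restrict the multi-mission archive to HST
        "instrument_name": instrument,  # e.g. "WFC3/UVIS" (assumed example value)
    }


def search_and_download(target, instrument, download_dir="."):
    """Query MAST and download matching calibrated files (needs network access)."""
    # Imported here so the helper above works without astroquery installed.
    from astroquery.mast import Observations

    obs = Observations.query_criteria(**build_hst_criteria(target, instrument))
    products = Observations.get_product_list(obs)
    # Keep only calibrated exposures; "FLT" is an assumed choice of product type.
    calibrated = Observations.filter_products(
        products, productSubGroupDescription=["FLT"]
    )
    return Observations.download_products(calibrated, download_dir=download_dir)


if __name__ == "__main__":
    # Print the search criteria without contacting the archive.
    print(build_hst_criteria("M83", "WFC3/UVIS"))
```

Calling `search_and_download("M83", "WFC3/UVIS")` would contact the archive, so it is left out of the default run. Narrowing the criteria before requesting the product list keeps the download small.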
Requesters of exclusive access (i.e., proprietary) data will be required to log in using their MyST Single Sign-On (SSO) System account. MAST provides a token-based authentication system to retrieve exclusive access data when downloading via cURL scripts or using Astroquery.
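The token-based flow can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the download endpoint URL, the `Authorization: token ...` header format, the `MAST_API_TOKEN` environment variable name, and the example data URI are all assumptions that should be checked against the current MAST documentation.

```python
import os
import urllib.parse
import urllib.request

# Assumed MAST download endpoint; confirm against the current MAST API docs.
MAST_DOWNLOAD_URL = "https://mast.stsci.edu/api/v0.1/Download/file"


def build_download_request(data_uri, token):
    """Build an authenticated GET request for one MAST data product."""
    query = urllib.parse.urlencode({"uri": data_uri})
    request = urllib.request.Request(f"{MAST_DOWNLOAD_URL}?{query}")
    # The token travels in the Authorization header, mirroring
    # `curl -H "Authorization: token <TOKEN>"` (assumed header format).
    request.add_header("Authorization", f"token {token}")
    return request


def download(data_uri, out_path, token=None):
    """Fetch one file to disk (needs network access and a valid token)."""
    # Fall back to an environment variable; the variable name is an assumption.
    token = token or os.environ["MAST_API_TOKEN"]
    with urllib.request.urlopen(build_download_request(data_uri, token)) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
```

With Astroquery, the corresponding step would be a call to `Observations.login(token=...)` before downloading. Keeping the token in an environment variable rather than in a script avoids committing it to version control by accident.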
The Archive Helpdesk can be reached at http://masthelp.stsci.edu or by sending email to the address provided there.
Amazon Web Services (AWS) Public Dataset Program
All non-exclusive access data for current Hubble instruments (ACS, COS, STIS, WFC3, FGS) have been made available as part of the Amazon Web Services (AWS) public dataset program. Proposers may request to make use of this dataset under the archival legacy category.
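The rule stated above (current instruments only, non-exclusive data only) can be encoded as a small check, followed by a hedged sketch of how cloud locations might be looked up with Astroquery. The helper names are hypothetical, and the Astroquery portion assumes the `astroquery` package is installed and needs network access.

```python
# Instruments listed above as part of the AWS public dataset.
AWS_INSTRUMENTS = {"ACS", "COS", "STIS", "WFC3", "FGS"}


def in_aws_public_dataset(instrument_name, exclusive_access):
    """Return True if data should be in the AWS dataset under the stated rule."""
    # MAST instrument names often carry a detector, e.g. "WFC3/UVIS";
    # only the part before the slash identifies the instrument.
    instrument = instrument_name.split("/")[0].upper()
    return instrument in AWS_INSTRUMENTS and not exclusive_access


def cloud_uris(target, instrument_name):
    """Look up cloud URIs for public products (needs astroquery and network)."""
    from astroquery.mast import Observations

    Observations.enable_cloud_dataset(provider="AWS")
    obs = Observations.query_criteria(
        objectname=target, obs_collection="HST", instrument_name=instrument_name
    )
    products = Observations.get_product_list(obs)
    return Observations.get_cloud_uris(products)
```

Checking eligibility first avoids requesting cloud locations for legacy instruments or exclusive access data, which the description above places outside the AWS dataset.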
The Hubble Legacy Archive (HLA)
The Hubble Legacy Archive (HLA) is a project designed to enhance science from the Hubble Space Telescope by augmenting the HST Data Archive and by providing advanced browsing capabilities. It is a joint project of the Space Telescope Science Institute, the Canadian Astronomy Data Centre (CADC), and the European Space Astronomy Centre (ESAC). The primary enhancements are:
- Advanced data products with several HST instruments, produced for public data.
- Data products that are immediately available online for viewing, searching and downloading.
- A footprint service that makes it easier to browse and download images.
- Availability of combined images, deep/wide multi-visit mosaics, and spectral data.
- For many images, the absolute astrometry has been improved from 1-2 arcsec to ~0.3 arcsec.
- Source lists are available for ACS, WFPC2, and WFC3 observations.
- NICMOS and ACS grism extractions have been produced by the Space Telescope European Coordinating Facility (ST-ECF).
- An interface is provided for many user-provided High-Level Science Products.
The HLA regularly has major Data Releases, including enhanced data products for almost all science data for WFC3, ACS, WFPC2 (produced by CADC) and NICMOS. Also available are source lists for ACS, WFPC2, and WFC3. Some STIS spectra are also available as high-level science products and can be searched, viewed, and retrieved through the same interface as enhanced image data. Among the new additions in the HLA DR10 release are deep, wide-field ACS and WFC3 multi-visit mosaic data products for 1348 fields. The mosaic images were astrometrically corrected and aligned using Hubble Source Catalog version 2. The ACS and WFC3 images are drizzled onto a common pixel grid, which makes them very easy to use. The HLA DR10 also includes enhanced data products for ACS/SBC and for moving targets. It has many improvements in the data processing, including a robust alignment algorithm for misaligned exposures, greatly improved source lists, and the ability to handle almost all the rare observing modes utilized for HST observations.
Some of the more general goals of the HLA are to make HST data VO-compatible (Virtual Observatory), to move toward a sky atlas user view rather than a collection of datasets, and to develop an “all-HST-sky” source list. The Hubble Source Catalog (HSC) represents a major milestone toward the attainment of these goals for the HLA.
The HLA can be accessed at http://hla.stsci.edu. HLA data products are also accessible through the MAST Discovery Portal.
The Hubble Source Catalog (HSC)
The HLA produces source lists for tens of thousands of HST images. The Hubble Source Catalog (HSC) combines these individual, visit-based WFC3, ACS, and WFPC2 source lists into a single Master Catalog, hence providing entry into the field of database astronomy. Searches that would have required months or years in the past can in many cases be done in seconds with the HSC. This resource may be used to support a wide range of new archival proposals, a few potential examples of which are listed below.
Version 3 of the Hubble Source Catalog was released in July 2018. The primary improvements from version 2 were: (1) the addition of 25% more ACS images and twice as many WFC3 images; (2) improved photometric quality in the source lists due both to the alignment algorithm used to match exposures and filters in the HLA image processing and to improved algorithms for Source Extractor photometry (particularly near the edges of images); and (3) improved astrometric calibration based on the Gaia DR1 catalog. A journal-level publication describing the HSC, the quality of the data (in HSC version 1), and the potential for doing science is available in Whitmore et al. (2016). An extensive FAQ describing most aspects of the HSC is available. The matching algorithms used by the HSC are described in Budavari & Lubow (2012).
The HSC can be accessed in a variety of ways. For most cases, the easiest method will be the MAST Discovery Portal, which provides simple yet powerful access for searches returning up to 50,000 objects. It provides footprints, object selection filtering, cross-matching, and interactive displays. The MAST API provides programmatic access to the MAST portal search and cross-match capabilities through a flexible Python-accessible interface. For larger queries, a CasJobs database query interface is available. This resource is based on the Sloan Digital Sky Survey (SDSS) CasJobs tool [1], and supports complex database searches that may run for hours and generate large result tables with millions of rows. The new Virtual Observatory Table Access Protocol (TAP) interface also allows for direct database queries. The HSC TAP service can be used from popular tools such as TOPCAT as well as through Python and other high-level languages (see the article in the May 2018 MAST Newsletter for more details). Finally, the HSC homepage is a source of documentation and examples, and it also includes simple form interfaces (plus a queryable API).
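The choice between these routes depends mainly on the expected result size. The sketch below turns the 50,000-object Portal limit quoted above into a rule of thumb and shows what a small programmatic cone search might look like with Astroquery. The function names and the default radius are hypothetical, and the `version=3` argument assumes an `astroquery` release that supports HSC version 3.

```python
PORTAL_MAX_OBJECTS = 50_000  # Discovery Portal limit quoted above


def suggest_hsc_interface(expected_rows):
    """Pick an access route from the expected result size (rule of thumb only)."""
    if expected_rows <= PORTAL_MAX_OBJECTS:
        return "portal-or-api"
    return "casjobs"


def hsc_cone_search(target, radius_deg=0.02):
    """Small HSC cone search via Astroquery (needs astroquery and network)."""
    from astroquery.mast import Catalogs

    # A bare float radius is interpreted as degrees by astroquery.
    return Catalogs.query_object(
        target, radius=radius_deg, catalog="HSC", version=3
    )
```

For example, a query expected to return a few thousand sources around a single galaxy fits the Portal or API route, while a search across a large region such as the LMC is better suited to CasJobs.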
Below is a list of some of the types of projects that might make particularly good use of this new resource.
- Variable stars, galaxies, calibrations, etc., identified using HST’s 25+ year baseline. This is enabled by the Hubble Catalog of Variables (see below).
- Astrometric properties: proper motions, cluster kinematics, KBOs, etc.
- Extremely large data sets: e.g., creating CMDs based on the ~4000 HST observations of the LMC.
- Cross-matching with other catalogs: SDSS, 2MASS, spectral catalogs, etc.
- Object properties: star clusters and associations, colors, elongations, etc.
- Compilation of spectroscopic properties based on COS, FOS, and GHRS observations, cross-matched with their HSC counterparts.
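Several of the project types above rely on positional cross-matching between catalogs. The following is a minimal, standard-library sketch of a nearest-neighbor cross-match using the haversine formula. It is not the matching algorithm used by the HSC; the brute-force loop is only practical for small lists, and the match radius in the usage example is an illustrative assumption. In practice, `SkyCoord.match_to_catalog_sky` in Astropy handles the same task far more efficiently.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec of two positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600


def cross_match(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Catalogs are lists of (ra_deg, dec_deg). Returns a list of
    (index_a, index_b, separation_arcsec) tuples. Brute force: O(len(a) * len(b)).
    """
    matches = []
    if not cat_b:
        return matches
    for i, (ra, dec) in enumerate(cat_a):
        seps = [separation_arcsec(ra, dec, rb, db) for rb, db in cat_b]
        j = min(range(len(seps)), key=seps.__getitem__)
        if seps[j] <= radius_arcsec:
            matches.append((i, j, seps[j]))
    return matches


if __name__ == "__main__":
    hst = [(10.0, -5.0), (20.0, 30.0)]
    other = [(20.0, 30.0 + 0.2 / 3600), (10.0 + 5.0 / 3600, -5.0)]
    print(cross_match(hst, other, radius_arcsec=0.3))
```

In the usage example, only the second source has a counterpart within 0.3 arcsec; the first lies roughly 5 arcsec from its nearest neighbor and is left unmatched.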
[1] A word of caution is in order, however. Unlike SDSS, with a uniform set of filters and all-sky coverage over a substantial part of the sphere, the Hubble database consists of tiny pieces of the sky using three different cameras and hundreds of filters. Potential users should pay special attention to the Five Things You Should Know About the HSC webpage. Detailed use cases are available to guide users in common ways to make use of the HSC and to avoid common pitfalls.
The Hubble Catalog of Variables (HCV)
The HCV is the result of a four-year collaboration between ESA and STScI. HCV is based on the HSC, and, with a baseline of 26+ years, is a valuable resource for studying variability in a variety of astronomical sources. More information may be found at the archive HCV page.