HST Primer: Data Processing and the HST Data Archive

An overview of the HST data processing pipeline and archiving in MAST.



Routine Science Data Processing

Science data obtained with HST are sent to the Tracking and Data Relay Satellite System (TDRSS), from there to the TDRSS ground station at White Sands, New Mexico, then to the Sensor Data Processing Facility at Goddard Space Flight Center in Greenbelt, Maryland, and finally to STScI.

At STScI, the production pipeline provides standard processing for data editing, calibration, and product generation. These functions, performed automatically, include the following:

  • Reformatting and editing of data from the spacecraft packet format to images or spectra.
  • Performing standard calibrations (flat fields, wavelength calibrations, background subtraction, etc.) with best available calibration files.
  • Producing standard data output products (FITS format files of raw and calibrated images, OMS [jitter and performance flags] files). 

Standard calibrations performed on HST data, and the resulting output data products, are described in detail in the HST Data Handbook.

In 2015, major upgrades to the Science Data Processing Pipelines (SDP) and the Data Distribution System (DADS) were installed. Archive users will now find HST data stored on disks for immediate download. Data are reprocessed as needed when updated calibration reference files or improved calibration algorithms become available, to ensure that the best quality data are available for users. Data from all non-active instruments (HSP, GHRS, FOC, FOS, WF/PC, WFPC2, and NICMOS) continue to be available for direct download. WFPC2 data have recently been reprocessed to take advantage of an updated distortion correction table and improvements in drizzlepac, including the direct alignment of images to an external catalog (generally Gaia DR3) when possible. If a user requests a dataset that is queued for reprocessing, they will be presented with the option to download the existing data or wait until the reprocessing is complete. Requested datasets are prioritized in the processing queue and are generally available within a couple of hours. The user can set up a subscription to receive a notification when the reprocessing has completed.

 

Software for data reduction and analysis

The Space Telescope Science Institute provides a number of software packages for the data reduction and analysis of HST data (HST Notebooks GitHub Repository). For data reduction, the HSTCal package is a C package for reducing observations from ACS, STIS, and WFC3. The CalCOS package provides software for reducing observations from COS. Reference files used in the reductions may be accessed through the Calibration Data Reference System. In addition, the ACSTools, COSTools, STISTools, and WFC3Tools software packages provide additional functionality for analyzing HST data. Further information about reducing and analyzing observations is available in the HST Data Handbook. These tools, along with other support software, can be downloaded and installed as part of the 'stenv' environment.
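As a concrete illustration of how these packages are typically driven, the sketch below maps each active instrument to the calibration package named above and shows (in comments) how a CalCOS reduction is invoked. The mapping helper is hypothetical, and the `calcos.calcos` call requires the package and appropriate reference files to be installed.

```python
# Sketch: which calibration package reduces data from each active HST
# instrument, per the packages described above.
CALIBRATION_PACKAGE = {
    "ACS": "hstcal (calacs)",
    "STIS": "hstcal (calstis)",
    "WFC3": "hstcal (calwf3)",
    "COS": "calcos",
}

def calibration_package(instrument: str) -> str:
    """Return the name of the package that calibrates a given instrument."""
    return CALIBRATION_PACKAGE[instrument.upper()]

# Running CalCOS on an association file (requires calcos installed and
# CRDS reference files configured; not executed here):
#   import calcos
#   calcos.calcos("lbgu17qnq_asn.fits", outdir="calibrated/")
```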

The HST Data Archive

All science and calibration data are placed in the HST Data Archive. Science data become available immediately after pipeline processing to a program’s Principal Investigator (PI), as well as to others designated by the PI, provided they are authorized to retrieve the data. These data may be retrieved after the PI has registered for an STScI single sign-on (SSO) account, and they normally carry exclusive access for a period of six months from the date of observation (see HST Data Rights and Duplications).

On average, science data from HST flow through the production pipeline and into the Archive within 9 hours of observation on the telescope. Some data may take as long as three to five days. Observers are notified by e-mail when the first datasets reach the Archive. They are also provided with Web tools to track a visit’s completeness and to retrieve the data generated by the pipeline.

The time required for retrieving data from the Archive depends on whether the data are available for direct download or need to be reprocessed, the speed of the user’s connection, the size of the requested dataset, and the load on the Archive. As of November 1, 2025, the HST Data Archive contains over 464 TB of data. All of the HST data, including exclusive access data, should be online and available for direct download through the Data Discovery Portal. To retrieve exclusive access data users must have a MyST account and be authorized to do so.

If there are strict scientific requirements for data receipt within days after execution, such as to provide pointing corrections for tightly scheduled visits, there are resource-intensive methods to expedite delivery of data. Such requirements must be stated in the proposal so that the resource needs can be determined and reviewed.

Web Access to the HST Data Archive

Most of the data in the HST Data Archive are public and may be retrieved by any user.  HST data may be accessed through a number of web-based and programmatic (API) interfaces.

The HST search page allows searches for HST data using a form-based interface, where users can build both simple and complex queries to retrieve data.  The MAST Data Discovery Portal  provides a more visual/graphical interface for searching, inspecting, and retrieving data.  In addition to providing HST data, the Data Discovery Portal provides access to the entire multi-mission archive at STScI.  With the Portal, users can easily find complementary data sets from the more than 20 active and legacy missions at MAST, including Kepler/K2, TESS, GALEX, and Pan-STARRS.

MAST provides a suite of Application Programming Interfaces (APIs) for searching for and downloading data.  Python users can use the MAST Astroquery module, which integrates easily with Astropy and other Python analysis tools.  MAST also provides more generic webservice API access via HTTPS and HTTP GET requests.
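As a minimal sketch of the two access routes, the example below builds the URL-encoded payload for a generic MAST webservice cone search and notes the equivalent Astroquery one-liner. The service name and endpoint follow the publicly documented MAST API, but treat the details (endpoint URL, parameter names) as assumptions to verify against the current API documentation.

```python
import json
from urllib.parse import urlencode

# Assumed MAST webservice endpoint; queries are JSON "request" objects
# sent as HTTP GET/POST parameters.
MAST_INVOKE_URL = "https://mast.stsci.edu/api/v0/invoke"

def build_cone_search_request(ra: float, dec: float, radius_deg: float) -> str:
    """Build the URL-encoded payload for a CAOM cone search around (ra, dec)."""
    request = {
        "service": "Mast.Caom.Cone",
        "params": {"ra": ra, "dec": dec, "radius": radius_deg},
        "format": "json",
    }
    return urlencode({"request": json.dumps(request)})

payload = build_cone_search_request(210.8, 54.3, 0.2)
# The payload is appended to MAST_INVOKE_URL as a query string:
#   https://mast.stsci.edu/api/v0/invoke?<payload>

# With Astroquery, the same search is a single call (requires network):
#   from astroquery.mast import Observations
#   results = Observations.query_region("210.8 54.3", radius="0.2 deg")
```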

Requesters of exclusive access (i.e., proprietary) data will be required to log in using their MyST Single Sign-On (SSO) System account.  MAST provides a token-based authentication system to retrieve exclusive access data when downloading via curl scripts or using Astroquery.
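A brief sketch of the token flow, assuming the `Authorization: token <TOKEN>` header form used in MAST-generated download scripts; confirm the exact header against a current MAST curl script before relying on it.

```python
import os

def mast_auth_header(token: str) -> dict:
    """Authorization header used when downloading exclusive access data
    outside of Astroquery (e.g. in a curl- or requests-based script)."""
    return {"Authorization": f"token {token}"}

# In a curl download script, the same header appears as:
#   curl --header "Authorization: token $MAST_API_TOKEN" <data URL>

# With Astroquery, the token is supplied once per session (requires network):
#   from astroquery.mast import Observations
#   Observations.login(token=os.environ["MAST_API_TOKEN"])
```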

The Archive Helpdesk can be reached at http://masthelp.stsci.edu or by sending email to the address provided there.

Amazon Web Services (AWS) Public Dataset Program

All non-exclusive access data for current Hubble instruments (ACS, COS, STIS, WFC3, FGS) have been made available as part of the Amazon Web Services (AWS) public dataset program. Proposers may request to make use of this dataset under the archival legacy category. 

The Hubble Legacy Archive (HLA)

The Hubble Legacy Archive (HLA) is a project designed to enhance science from the Hubble Space Telescope by augmenting the HST Data Archive and by providing advanced browsing capabilities. It is a joint project of the Space Telescope Science Institute, the Canadian Astronomy Data Centre (CADC), and the European Space Astronomy Centre (ESAC). The primary enhancements are:

  • Advanced data products for several HST instruments, produced for public data.
  • Data products that are immediately available online for viewing, searching and downloading.
  • A footprint service that makes it easier to browse and download images.
  • Availability of combined images, deep/wide multi-visit mosaics, and spectral data.
  • For many images, the absolute astrometry has been improved from 1–2 arcsec to ~0.3 arcsec.
  • Source lists are available for ACS, WFPC2, and WFC3 observations.
  • NICMOS and ACS grism extractions produced by the Space Telescope European Coordinating Facility (ST-ECF).
  • An interface is provided for many user-provided High-Level Science Products.

The HLA regularly has major Data Releases, including enhanced data products for almost all science data for WFC3, ACS, WFPC2 (produced by CADC) and NICMOS. Also available are source lists for ACS, WFPC2, and WFC3. Some STIS spectra are also available as high-level science products and can be searched, viewed, and retrieved through the same interface as enhanced image data. Among the new additions in the HLA DR10 release are deep, wide-field ACS and WFC3 multi-visit mosaic data products for 1348 fields. The mosaic images were astrometrically corrected and aligned using Hubble Source Catalog version 2. The ACS and WFC3 images are drizzled onto a common pixel grid, which makes them very easy to use. The HLA DR10 also includes enhanced data products for ACS/SBC and for moving targets. It has many improvements in the data processing, including a robust alignment algorithm for misaligned exposures, greatly improved source lists, and the ability to handle almost all the rare observing modes utilized for HST observations.

Some of the more general goals of the HLA are to make HST data compatible with the Virtual Observatory (VO), to move toward a sky-atlas user view rather than a collection of datasets, and to develop an “all-HST-sky” source list. The Hubble Source Catalog (HSC) represents a major milestone toward the attainment of these goals.

The HLA can be accessed at http://hla.stsci.edu. HLA data products are also accessible through the MAST Discovery Portal.

Hubble Advanced Products (HAP)

The HLA has been replaced by the Hubble Advanced Products (HAP) project, which has developed a new data processing pipeline that runs as part of the routine instrument calibration processing and reprocessing campaigns. The new pipeline incorporates algorithms and lessons learned from the HLA project to generate single-visit and multi-visit mosaic images and source catalogs similar to those produced by the HLA. It improves astrometry and exposure alignment by using the Gaia DR3 catalog. The HAP pipeline immediately generates products for newly acquired HST data, including proprietary data, thereby reducing the time lag between data acquisition and the availability of advanced products.

Single-visit mosaic (SVM) HAP ACS and WFC3 images were released in December 2020, and multi-visit mosaic (MVM) images were released in April 2022. Point-source and segmentation-map source catalogs based on the SVM products began production in November 2021. The HLA was used extensively to test the quality of the HAP products via comparisons and cross-matches. In December 2023, a new pipeline was deployed to operations to generate new drizzle products for WFPC2, aligned directly to external source catalogs (generally Gaia DR3), as well as single-visit mosaics like those generated for ACS and WFC3.

All HAP products are recreated following updates to the external catalogs or to the alignment and catalog generation algorithms contained in drizzlepac. Individual products are also reprocessed when the underlying products are reprocessed due to reference file updates or improvements in the calibration code. HAP source catalogs have undergone extensive testing as part of the Hubble Source Catalog project, which has led to the identification of several areas for improvement in the source catalogs. For more information about the HAP, see the Instrument Science Report.

Hubble Advanced Spectral Products (HASP)

The Hubble Advanced Spectral Products (HASP) initiative transforms the accessibility and utility of archival Hubble Space Telescope (HST) data by automating the coaddition and abutment of one-dimensional spectra from the Cosmic Origins Spectrograph (COS) and the Space Telescope Imaging Spectrograph (STIS). Building on the success of the UV Legacy Library of Young Stars as Essential Standards (ULLYSES) program, HASP extends automated coaddition to nearly every COS and STIS spectrum in the archive. The service is regularly updated with the latest calibrations and new data, covering over 3300 programs and 65,000 datasets as of Cycle 33. Additionally, HASP enables users to perform custom coadditions through a publicly available Python script and interactive Jupyter Notebooks, which assist users in setting up and running the coadd script for specific use cases not covered by the automatic HASP coadds.

For more details, visit the HASP webpage or the Instrument Science Report.

Hubble Spectroscopic Legacy Archive (HSLA)

The newly updated Hubble Spectroscopic Legacy Archive (HSLA) provides scientifically validated coadded spectra of individual targets that have been observed with the Cosmic Origins Spectrograph (COS) and the Space Telescope Imaging Spectrograph (STIS) over their operating lifetimes. The new HSLA improves data quality and supersedes the earlier version that was released in 2017 and last updated in 2018. The HSLA uses data available in the Mikulski Archive for Space Telescopes (MAST) and automatically produces coadds whenever new data become publicly available or existing data are recalibrated.

A key feature of the new HSLA is that it automatically defines individual targets, groups multiple observations of a single target into associations, and produces a classification for each target. Target associations make use of the dataset coordinates, accounting for proper motions, and use SIMBAD, NED, and the Phase II observing proposals to determine which datasets should be associated with each unique target. Then, using the SIMBAD, NED, or Phase II keywords, a detailed classification is determined for each object to aid in the spectroscopic study of classes of astrophysical objects. The classifications consist of three tiers of detail, mapped to the Unified Astronomy Thesaurus (UAT). For example, the HSLA target Markarian 817 is classified at Tier 1 as a Galaxy, at Tier 2 as an Active Galaxy, and at Tier 3 as a Seyfert. It corresponds to the UAT object Seyfert Galaxies (1447).

For each individual target, the HSLA also provides a human-readable metadata file with key information that can be used in searches or for further exploration of the data. The metadata file includes the target name and unique identifiers, the target coordinates in J2000, classification information, and a summary of the programs and instrument modes that are included in the target association.

HSLA data products, including quicklook coadds (_aspec files), coadded single-grating products (_cspec files), metadata files (_metadata files), and code output logs (.trl files), are available at the MAST Portal, the HST Mission Search Form, or via astroquery. Since the HSLA is fully automated, it will be updated routinely as new HST spectroscopic data are taken or when data are reprocessed with improved calibrations. Data access may change with time, and users are encouraged to visit the HSLA Webpage for the latest details on how to access data.
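The product-file suffixes listed above can be summarized in a small lookup; the classifier helper here is hypothetical, but the suffix-to-product mapping comes directly from the list of HSLA products in the text.

```python
# HSLA product-type suffixes as described above.
HSLA_SUFFIXES = {
    "_aspec": "quicklook coadd",
    "_cspec": "coadded single-grating product",
    "_metadata": "metadata file",
    ".trl": "code output log",
}

def product_type(filename: str) -> str:
    """Classify an HSLA product file by its suffix (hypothetical helper)."""
    for suffix, kind in HSLA_SUFFIXES.items():
        if suffix in filename:
            return kind
    return "unknown"
```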

For more details, visit the HSLA webpage or the Instrument Science Report.

The Hubble Source Catalog (HSC)

The HLA produces source lists for tens of thousands of HST images. The Hubble Source Catalog (HSC) combines these single-visit WFC3, ACS, and WFPC2 source lists into a single Master Catalog, providing an entry into the field of database astronomy. Searches that in the past would have required months or years to perform can in many cases be done in seconds with the HSC. This resource may be used to support a wide range of new archival proposals, a few potential examples of which are listed below.

Version 3 of the Hubble Source Catalog was released in July 2018. The primary improvements from version 2 were: (1) the addition of 25% more ACS images and twice as many WFC3 images; (2) improved photometric quality in the source lists due both to the alignment algorithm used to match exposures and filters in the HLA image processing and to improved algorithms for Source Extractor photometry (particularly near the edges of images); and (3) improved astrometric calibration based on the Gaia DR1 catalog. A journal-level publication describing the HSC, quality of the data (in HSC version 1), and potential for doing science is available at Whitmore et al. (2016). An extensive FAQ describing most aspects of the HSC is available. The matching algorithms used by the HSC are described in Budavari & Lubow, 2012.

The HSC can be accessed in a variety of ways. For most cases, the easiest method will be the MAST Discovery Portal, which provides simple yet powerful access for searches returning up to 50,000 objects. It provides footprints, object selection filtering, cross-matching and interactive displays. The MAST API provides programmatic access to the MAST portal search and cross-match capabilities through a flexible Python-accessible interface. For larger queries, a CasJobs database query interface is available. This resource is based on the Sloan Digital Sky Survey (SDSS) CasJobs tool1, and supports complex database searches that may run for hours and generate large result tables with millions of rows. The new Virtual Observatory Table Access Protocol (TAP) interface also allows for direct database queries.  The HSC TAP service can be used from popular tools such as TopCat as well as through Python and other high level languages (see the article in the May 2018 MAST Newsletter for more details). Finally, the HSC homepage is both a source of documentation and examples and also includes simple form interfaces (plus a queryable API).
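As a sketch of direct database access through the TAP interface, the helper below builds a standard ADQL cone-search query. The table and column names (patterned on the CasJobs-style HSC summary views) and the TAP endpoint in the comment are assumptions to verify against the HSC documentation.

```python
def hsc_cone_adql(ra: float, dec: float, radius_deg: float, max_rows: int = 100) -> str:
    """Build an ADQL cone search for the HSC TAP service.
    Table and column names here are illustrative, not guaranteed."""
    return (
        f"SELECT TOP {max_rows} MatchID, MatchRA, MatchDec, NumImages "
        f"FROM dbo.SumMagAper2CatView "
        f"WHERE CONTAINS(POINT('ICRS', MatchRA, MatchDec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg})) = 1"
    )

query = hsc_cone_adql(210.802, 54.349, 0.05)

# Submitting the query via pyvo (requires network; endpoint assumed):
#   import pyvo
#   tap = pyvo.dal.TAPService("https://vao.stsci.edu/HSCTAP/tapservice.aspx")
#   table = tap.search(query).to_table()
```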

Below is a list of some of the types of projects that might make particularly good use of this new resource.

  • Variable stars, galaxies, calibrations, etc., identified using HST’s 25+ year baseline. This is enabled by the Hubble Catalog of Variables (see below).
  • Astrometric properties: proper motions, cluster kinematics, KBOs, etc.
  • Extremely large data sets: e.g., creating CMDs based on the ~ 4000 HST observations of the LMC.
  • Cross-matching with other catalogs: SDSS, 2MASS, spectral catalogs, etc.
  • Object properties: star clusters and associations, colors, elongations, etc.
  • Photo-Zs.
  • Compilation of spectroscopic properties based on COS, FOS, and GHRS observations, cross-matched with their HSC counterparts.

1A word of caution is in order, however. Unlike SDSS, with a uniform set of filters and all-sky coverage over a substantial part of the sphere, the Hubble database consists of tiny pieces of the sky using three different cameras and hundreds of filters. Potential users should pay special attention to the Five Things You Should Know About the HSC webpage. Detailed use cases are available to guide users in common ways to make use of the HSC and to avoid common pitfalls.

The Hubble Catalog of Variables (HCV)

The HCV is the result of a four-year collaboration between ESA and STScI.  HCV is based on the HSC, and, with a baseline of 26+ years, is a valuable resource for studying variability in a variety of astronomical sources.  More information may be found at the archive HCV page.