archive

archive.base module

class daxa.archive.base.Archive(archive_name, missions=None, clobber=False, download_products=True, use_preprocessed=False)[source]

Bases: object

The Archive class, which is used to consolidate and provide an interface to the data of a set of missions. Archives can be passed to processing and cleaning functions in DAXA, and also contain convenience functions for accessing summaries of the available data.

Parameters
  • archive_name (str) – The name to be given to this archive - it will be used for storage and identification. If an archive with this name already exists it will be read in, unless clobber=True.

  • missions (List[BaseMission]/BaseMission) – The mission, or missions, which are to be included in this archive - any setup processes (e.g. the filtering of data to be acquired) should be performed prior to creating an archive. The default value is None, but this should be set for any new archive; it can only be left as None if an existing archive is being read back in.

  • clobber (bool) – If an archive named ‘archive_name’ already exists, then setting clobber to True will cause it to be deleted and overwritten.

  • download_products (bool/dict) – Controls whether pre-processed products should be downloaded for missions that offer them (assuming downloading was not triggered when the missions were declared). Default is True, but False may also be passed, as may a dictionary with DAXA mission names as keys and True/False as values.

  • use_preprocessed (bool/dict) – Whether pre-processed data products should be used rather than re-processing locally with DAXA. If True then any available pre-processed data products will be automatically re-organised into the DAXA processed data structure during the setup of this archive. If False (the default) then this re-organisation will not be applied automatically. Just as with ‘download_products’, a dictionary may be passed for more nuanced control, with mission names as keys and True/False as values.
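
The bool-or-dictionary form accepted by ‘download_products’ and ‘use_preprocessed’ can be pictured with the sketch below. It is not DAXA's implementation: the helper expand_per_mission, the mission names, and the behaviour for missions missing from the dictionary (falling back to a default) are all assumptions made for illustration.

```python
from typing import Dict, List, Union


def expand_per_mission(option: Union[bool, Dict[str, bool]],
                       mission_names: List[str],
                       default: bool) -> Dict[str, bool]:
    """Expand a bool-or-dict option into one flag per mission (hypothetical helper)."""
    if isinstance(option, bool):
        # A single boolean applies to every mission
        return {name: option for name in mission_names}
    # Assumption: missions absent from the dictionary fall back to the default
    return {name: option.get(name, default) for name in mission_names}


# Placeholder mission names, used only for illustration
names = ["xmm_pointed", "chandra"]
download_flags = expand_per_mission(True, names, default=True)
preprocessed_flags = expand_per_mission({"chandra": True}, names, default=False)
```

Here a single boolean is applied to every mission, while the dictionary form switches the behaviour on for one mission only.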

property archive_name

Property getter for the name assigned to this archive by the user.

Returns

The archive name.

Return type

str

property top_level_path

The property getter for the absolute path to the top-level DAXA storage directory.

Returns

Absolute top-level storage path.

Return type

str

property archive_path

The property getter for the absolute path to the output archive directory.

Returns

Absolute path to the archive.

Return type

str

property mission_names

Property getter for the names of the missions associated with this Archive.

Returns

Names of the missions associated with this archive.

Return type

List[str]

property missions

Property getter that returns a list of missions associated with this Archive.

Returns

Missions associated with this archive.

Return type

List[BaseMission]

property preprocessed_missions

Gets a list of missions that have pre-processed data downloaded. If there are none, an error will be raised.

Returns

A list of the mission instances in this archive which have pre-processed data downloaded.

Return type

List[BaseMission]

property process_success

Property getter for a nested dictionary containing boolean flags describing whether different processing steps applied to observations from various missions are considered to have completed successfully.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are boolean, with True indicating that the processing function was successful.

Return type

dict
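
As a rough illustration of the nested structure described above, the sketch below walks a mock dictionary of the same shape and collects every entry flagged as unsuccessful. The mission, process, and identifier names are placeholders, and the helper list_failures is hypothetical rather than part of DAXA.

```python
from typing import Dict, List, Tuple

# Mock data in the documented shape: mission -> process -> identifier -> success flag.
# All names below are placeholders, not output from a real archive.
process_success: Dict[str, Dict[str, Dict[str, bool]]] = {
    "xmm_pointed": {
        "step_one": {"0201903501": True, "0404910601": False},
        "step_two": {"0201903501PN": True, "0201903501M1": False},
    }
}


def list_failures(success: Dict[str, Dict[str, Dict[str, bool]]]) -> List[Tuple[str, str, str]]:
    """Return (mission, process, identifier) for every False flag."""
    return [(mission, process, ident)
            for mission, processes in success.items()
            for process, flags in processes.items()
            for ident, ok in flags.items()
            if not ok]


failures = list_failures(process_success)
```

With a real archive, the mock dictionary would be replaced by the value of the process_success property.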

property process_errors

Property getter for a nested dictionary containing error information from processing applied to mission data.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are error outputs (e.g. parsed from stderr from command line tools).

Return type

dict

property process_warnings

Property getter for a nested dictionary containing warning information from processing applied to mission data.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are warning outputs (e.g. parsed from stderr from command line tools).

Return type

dict

property raw_process_errors

Property getter for a nested dictionary containing unparsed error information (e.g. the entire stderr output from an XMM SAS process) from processing applied to mission data.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are error outputs (e.g. stderr from command line tools).

Return type

dict

property process_logs

Property getter for a nested dictionary containing log information from processing applied to mission data.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are logs (e.g. stdout from command line tools).

Return type

dict

property process_extra_info

Property getter for a nested dictionary containing extra information from processing applied to mission data. This can be things like paths to event lists, or configuration information. It is unlikely to be necessary for users to directly access this property.

Returns

A nested dictionary where top level keys are mission names, next level keys are processing function names, and lowest level keys are either ObsID or ObsID+instrument names. The values attributed with the lowest level keys are dictionaries of extra information (e.g. config info).

Return type

dict

property process_names

Property that returns a dictionary containing the names of all processing steps that have been run on this archive. Top-level keys are mission names, and the values are lists of process names.

Returns

The dictionary containing mission name and process name information. Top-level keys are mission names, and the values are lists of process names.

Return type

dict

property observation_summaries

This property returns information on the different observations available to each mission. This information will vary from mission to mission, and is primarily intended for use by DAXA processing methods, but could include things such as whether an instrument was active for a particular observation, what sub-exposures there were (relevant for XMM for instance), what filter was active, etc.

Returns

A dictionary of information with missions as the top level keys, then ObsIDs, then instruments. Keys on levels below that will be determined by the information available for specific instruments of specific missions.

Return type

dict

property process_observation

This property returns the dictionary of mission-ObsID-Instrument(-subexposure) boolean flags that indicate whether the data for that observation-instrument(-subexposure) should be processed for science. There is a companion get method that returns only the data identifiers that should be processed.

Returns

The dictionary containing information on whether particular data should be processed.

Return type

dict

property final_process_success

This property returns the dictionary which stores the final judgement (at the ObsID level) of whether there are any useful data (True) or whether no aspect of that observation reached the end of the final processing step successfully. The ObsIDs marked as False will be moved from the archive processed data directory to a separate failed data directory.

The flags are only added once the final processing step for a particular mission has been run.

Returns

The dictionary of final processing success flags.

Return type

dict
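
A minimal sketch of how these flags might be used to separate useful ObsIDs from failed ones. The exact layout of the dictionary is not spelled out above, so the mission -> ObsID -> flag shape used here is an assumption, as are the example names and the helper split_by_outcome.

```python
from typing import Dict, List, Tuple

# Assumed shape: mission name -> ObsID -> final success flag (placeholder values)
final_process_success: Dict[str, Dict[str, bool]] = {
    "xmm_pointed": {"0201903501": True, "0404910601": False}
}


def split_by_outcome(flags: Dict[str, Dict[str, bool]]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split ObsIDs into those with useful data and those without (hypothetical helper)."""
    useful: Dict[str, List[str]] = {}
    failed: Dict[str, List[str]] = {}
    for mission, obs_flags in flags.items():
        useful[mission] = [obs_id for obs_id, ok in obs_flags.items() if ok]
        failed[mission] = [obs_id for obs_id, ok in obs_flags.items() if not ok]
    return useful, failed


useful_obs, failed_obs = split_by_outcome(final_process_success)
```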

property source_regions

This property returns all source regions which have been associated with missions in this archive. The top level keys of the dictionary are mission names, the bottom level keys are observation identifiers, and the values are lists of region objects.

If an observation in this archive has had regions added for it, then those regions will also have been written to permanent storage in the archive directory structure. The path can be identified using the get_region_file_path method of this archive.

Returns

Dictionary containing regions on a mission-observation basis.

Return type

dict

get_current_data_path(mission, obs_id)[source]

A method which returns the current location of the archive data for a particular ObsID of a particular mission. The two location options are in the ‘processed’ directory, which is the default and will be the home of all ObsIDs that haven’t made it to the final process for a particular mission, or the ‘failed’ directory, where any ObsID that has no use (per the final checks) will be stored.

Parameters
  • mission (BaseMission/str) – The mission for which to retrieve the current data path.

  • obs_id (str) – The ObsID for which to retrieve the current data path.

Returns

The current path to the requested ObsID of the specified mission.

Return type

str

construct_processed_data_path(mission=None, obs_id=None)[source]

This method is to construct paths to directories where processed data for a particular mission + observation ID combination will be stored. That functionality is added here so that any change to how those directories are named will take place in only one part of DAXA, and will propagate to other parts of the module. It is unlikely that a user will need to directly use this method.

If no mission is passed, then no observation ID may be passed. In the case of ‘mission’ and ‘obs_id’ being None, the returned string will be constructed ready to format; {mn} should be replaced by the DAXA mission name, and {oi} by the relevant ObsID.

Retrieving a data path from this method DOES NOT guarantee that it has been created.

Parameters
  • mission (BaseMission/str) – The mission for which to retrieve the processed data path. Default is None in which case a path ready to be formatted with a mission name will be provided.

  • obs_id (str) – The ObsID for which to retrieve the processed data path, cannot be set if ‘mission’ is set to None. Default is None, in which case a path ready to be formatted with an observation ID will be provided.

Returns

The requested path.

Return type

str
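
The ready-to-format behaviour described above can be sketched with ordinary Python string formatting. The template below is a made-up placeholder (the real string returned by the method may be laid out differently), and only the {mn} and {oi} field names are taken from the description.

```python
import os

# Placeholder template; only the {mn} and {oi} field names come from the documentation
template = "/path/to/archive/processed_data/{mn}/{oi}"

# Fill in both fields at once
obs_path = template.format(mn="xmm_pointed", oi="0201903501")

# Fill in the mission only, keeping the ObsID field open for later formatting
mission_template = template.format(mn="xmm_pointed", oi="{oi}")
later_path = mission_template.format(oi="0404910601")

# Constructing a path does not create it, so check before relying on it
path_exists = os.path.isdir(obs_path)
```

The same pattern applies to the failed-data and region-file templates, which use the same {mn} and {oi} fields.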

construct_failed_data_path(mission=None, obs_id=None)[source]

This method is to construct paths to directories where data for a particular mission + observation ID combination which failed to process will be stored. That functionality is added here so that any change to how those directories are named will take place in only one part of DAXA, and will propagate to other parts of the module. It is unlikely that a user will need to directly use this method.

If no mission is passed, then no observation ID may be passed. In the case of ‘mission’ and ‘obs_id’ being None, the returned string will be constructed ready to format; {mn} should be replaced by the DAXA mission name, and {oi} by the relevant ObsID.

Retrieving a data path from this method DOES NOT guarantee that it has been created.

Parameters
  • mission (BaseMission/str) – The mission for which to retrieve the failed data path. Default is None in which case a path ready to be formatted with a mission name will be provided.

  • obs_id (str) – The ObsID for which to retrieve the failed data path, cannot be set if ‘mission’ is set to None. Default is None, in which case a path ready to be formatted with an observation ID will be provided.

Returns

The requested path.

Return type

str

get_region_file_path(mission=None, obs_id=None)[source]

This method is to construct paths to files where the regions associated with a particular observation of a particular mission are stored after being added to the archive. If a mission and ObsID are specified then this method will check whether region information for that particular ObsID of that particular mission exists in this archive, and raise an error if it does not.

If no mission is passed, then no observation ID may be passed. In the case of ‘mission’ and ‘obs_id’ being None, the returned string will be constructed ready to format; {mn} should be replaced by the DAXA mission name, and {oi} by the relevant ObsID.

Retrieving a region file path from this method without passing mission and ObsID DOES NOT guarantee that one has been created for whatever mission and ObsID are added to the string later.

Parameters
  • mission (BaseMission/str) – The mission for which to retrieve the region file path. Default is None in which case a path ready to be formatted with a mission name will be provided.

  • obs_id (str) – The ObsID for which to retrieve the region file path, cannot be set if ‘mission’ is set to None. Default is None, in which case a path ready to be formatted with an observation ID will be provided.

Returns

The requested path.

Return type

str

get_obs_to_process(mission_name, search_ident=None)[source]

This method will provide a list of lists of [ObsID, Instrument, SubExposure (depending on mission)] that should be processed for scientific use for a specific mission. The idea is that this method can be called, and just by iterating through the result you will get the identifiers of all valid data that match your input.

It shouldn’t really need to be used directly by users, but instead will be very useful for the processing functions - it will tell them which data need to be processed.

Parameters
  • mission_name (str) – The internal DAXA name of the mission to retrieve information for.

  • search_ident (str) – Either an ObsID or an instrument name to retrieve matching information for. Passing an ObsID will search through all of its instruments and sub-exposures; passing an instrument will search all ObsIDs and sub-exposures. The default is None, in which case all ObsIDs, instruments, and sub-exposures will be searched.

Returns

List of lists of [ObsID, Instrument, SubExposure (depending on mission)].

Return type

List[List]
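
The sketch below mimics the shape of the return value with mock data and shows how it might be iterated. The function matching is a hypothetical stand-in for the search behaviour, not DAXA's code, and joining the parts into a single identifier string is an assumption based on the example identifier 0201903501PNS003 used elsewhere in this module.

```python
from typing import List, Optional

# Mock result in the documented shape: [ObsID, Instrument, SubExposure]
to_process: List[List[str]] = [
    ["0201903501", "PN", "S003"],
    ["0201903501", "M1", "S001"],
    ["0404910601", "PN", "S003"],
]


def matching(idents: List[List[str]], search_ident: Optional[str] = None) -> List[List[str]]:
    """Keep entries whose ObsID or instrument equals search_ident (hypothetical stand-in)."""
    if search_ident is None:
        return [list(ident) for ident in idents]
    return [list(ident) for ident in idents if search_ident in ident[:2]]


pn_only = matching(to_process, "PN")
# Assumption: a full identifier is the concatenation of the parts
full_idents = ["".join(ident) for ident in pn_only]
```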

check_dependence_success(mission_name, obs_ident, dep_proc, no_success_error=True)[source]

This method should be used by processing functions, rather than the user, to determine whether previous processing steps (specified in the input to this function) ran successfully for the specified data.

Each processing function should be set up to call this method with appropriate previous steps and identifiers, and will know from the returned boolean array which data can be processed safely. If no data have successfully run through a previous step, or no attempt to run a previous step occurred, then an error will be raised.

Parameters
  • mission_name (str) – The name of the mission for which we wish to check the success of previous processing steps.

  • obs_ident (str/List[str]/List[List[str]]) – An individual observation identifier, or a set of them. This should be in the style output by get_obs_to_process (i.e. [ObsID, Inst, SubExp (depending on mission)]), though a bare ObsID is also supported.

  • dep_proc (str/List[str]) – The name(s) of the process(es) that have to have been run for further processing steps to be successful.

  • no_success_error (bool) – Whether a NoDependencyProcessError should be raised if none of the specified previous processing steps ran successfully. Default is True; if set to False the error will not be raised and the return will be an all-False array. This will NOT override the error raised if a previous process hasn’t been run at all.

Returns

A boolean array that defines whether the process(es) specified in the input were successful. Each set of identifying information provided in obs_ident has a corresponding entry in the return.

Return type

np.ndarray
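
One way a processing function might apply the returned flags is sketched below. A plain list of booleans stands in for the NumPy array so that the example has no dependencies, and the identifiers, the flag values, and the helper select_safe are illustrative assumptions.

```python
from itertools import compress
from typing import List, Sequence

# Identifiers in the style output by get_obs_to_process (placeholder values)
idents: List[List[str]] = [
    ["0201903501", "PN", "S003"],
    ["0201903501", "M1", "S001"],
    ["0404910601", "PN", "S003"],
]

# Stand-in for the boolean array returned by the dependency check
dependency_ok: List[bool] = [True, False, True]


def select_safe(identifiers: List[List[str]], flags: Sequence[bool]) -> List[List[str]]:
    """Keep only the identifiers whose dependency flag is True (hypothetical helper)."""
    if len(identifiers) != len(flags):
        raise ValueError("Expected one flag per identifier.")
    return list(compress(identifiers, flags))


safe_to_process = select_safe(idents, dependency_ok)
```

Each flag lines up with the identifier at the same position, which is why the lengths are checked before filtering.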

get_process_logs(process_name, mission_name=None, obs_id=None, inst=None, full_ident=None)[source]

This method allows for targeted retrieval of processing logs (stdout), for a specific processing step. The particular logs retrieved can be narrowed down by mission, ObsID, or instrument. Multiple missions, ObsIDs, and instruments may be specified, but only one process at a time. The names of processes that have been run can be found in the ‘process_names’ property of an Archive.

Parameters
  • process_name (str) – The process for which logs are to be retrieved (see ‘process_names’ property for the names of processes run on this archive).

  • mission_name (str/List[str]) – The mission name(s) for which logs are to be retrieved. Default is None, in which case all missions will be searched, and either a single name or a list of names can be passed. See ‘mission_names’ for a list of associated mission names.

  • obs_id (str/List[str]) – The ObsID(s) for which logs are to be retrieved. Default is None, in which case all ObsIDs will be searched. Either a single or a set of ObsIDs can be passed.

  • inst (str/List[str]) – The instrument(s) for which logs are to be retrieved. Default is None, in which case all instruments will be searched. Either a single or a set of instruments can be passed.

  • full_ident (str/List[str]) – A full unique identifier (or a set of them) to match against. This will override any ObsIDs or instruments that are specified - for instance one could pass 0201903501PNS003. Default is None.

Returns

A dictionary containing the requested logs - top level keys are mission names, lower level keys are unique identifiers, and the values are string logs which match the provided information.

Return type

dict
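
Once logs have been retrieved, the returned dictionary can be searched like any other nested dictionary. The sketch below uses mock log text in the documented shape (mission -> unique identifier -> log string); the log contents and the helper find_lines are invented for illustration.

```python
from typing import Dict, List

# Mock return value: mission -> unique identifier -> log text (placeholder content)
logs: Dict[str, Dict[str, str]] = {
    "xmm_pointed": {
        "0201903501PNS003": "task started\n** warning: low exposure\ntask finished",
        "0201903501M1S001": "task started\ntask finished",
    }
}


def find_lines(all_logs: Dict[str, Dict[str, str]], keyword: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect log lines containing keyword (case-insensitive), keyed like the input."""
    hits: Dict[str, Dict[str, List[str]]] = {}
    for mission, per_ident in all_logs.items():
        for ident, text in per_ident.items():
            matched = [line for line in text.splitlines() if keyword.lower() in line.lower()]
            if matched:
                hits.setdefault(mission, {})[ident] = matched
    return hits


warning_lines = find_lines(logs, "warning")
```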

get_process_raw_error_logs(process_name, mission_name=None, obs_id=None, inst=None, full_ident=None)[source]

This method allows for targeted retrieval of processing raw-error logs (stderr), for a specific processing step. The particular logs retrieved can be narrowed down by mission, ObsID, or instrument. Multiple missions, ObsIDs, and instruments may be specified, but only one process at a time. The names of processes that have been run can be found in the ‘process_names’ property of an Archive.

Parameters
  • process_name (str) – The process for which logs are to be retrieved (see ‘process_names’ property for the names of processes run on this archive).

  • mission_name (str/List[str]) – The mission name(s) for which logs are to be retrieved. Default is None, in which case all missions will be searched, and either a single name or a list of names can be passed. See ‘mission_names’ for a list of associated mission names.

  • obs_id (str/List[str]) – The ObsID(s) for which logs are to be retrieved. Default is None, in which case all ObsIDs will be searched. Either a single or a set of ObsIDs can be passed.

  • inst (str/List[str]) – The instrument(s) for which logs are to be retrieved. Default is None, in which case all instruments will be searched. Either a single or a set of instruments can be passed.

  • full_ident (str/List[str]) – A full unique identifier (or a set of them) to match against. This will override any ObsIDs or instruments that are specified - for instance one could pass 0201903501PNS003. Default is None.

Returns

A dictionary containing the requested logs - top level keys are mission names, lower level keys are unique identifiers, and the values are string logs which match the provided information.

Return type

dict

get_failed_processes(process_name)[source]

A simple method to retrieve all unique identifiers of data that failed a particular processing step. The names of processes that have been run can be found in the ‘process_names’ property of an Archive.

Parameters

process_name (str) – The process for which unique identifiers of data that failed the processing step are to be retrieved (see ‘process_names’ property for the names of processes run on this archive).

Returns

A dictionary, with mission names as top level keys, and values being lists of failed unique identifiers.

Return type

dict

get_failed_logs(process_name)[source]

A convenience method that retrieves the logs (stdout and stderr) for processing of particular data (be it a whole ObsID, a particular instrument of an ObsID, or a particular sub-exposure of a particular instrument of an ObsID) which FAILED.

Parameters

process_name (str) – The process for which logs (stdout and stderr) are to be retrieved if the data of a particular unique identifier failed.

Returns

A tuple of two dictionaries, the first containing stdout logs, and the second containing stderr logs - the structure of the dictionaries has mission names as top level keys, unique identifiers as lower level keys, and string logs as values.

Return type

Tuple[dict, dict]
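
Because the two dictionaries share the same structure, the stdout and stderr text for each failed identifier can be paired up. The sketch below assumes mock contents and a hypothetical helper, pair_logs; it does not call DAXA.

```python
from typing import Dict, Tuple

LogDict = Dict[str, Dict[str, str]]

# Mock return value: (stdout logs, stderr logs), each mission -> identifier -> text
failed_logs: Tuple[LogDict, LogDict] = (
    {"xmm_pointed": {"0404910601PNS003": "task started"}},
    {"xmm_pointed": {"0404910601PNS003": "fatal: input file missing"}},
)


def pair_logs(failed: Tuple[LogDict, LogDict]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (mission, identifier) to (stdout, stderr), using '' where stderr is absent."""
    stdout_logs, stderr_logs = failed
    return {
        (mission, ident): (text, stderr_logs.get(mission, {}).get(ident, ""))
        for mission, per_ident in stdout_logs.items()
        for ident, text in per_ident.items()
    }


paired = pair_logs(failed_logs)
```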

delete_raw_data(force_del=False, all_raw_data=False)[source]

This method will delete raw data downloaded for the missions in this archive; by default only directories corresponding to ObsIDs currently accepted through a mission’s filter will be deleted, but if all_raw_data is set to True then the WHOLE raw data directory corresponding to a particular mission will be removed.

Confirmation from the user will be sought that they wish to delete the data, unless force_del is set to True - in which case the removal will be performed straight away.

Parameters
  • force_del (bool) – This argument can be used to ensure that the delete option can be performed entirely programmatically, without requiring a user input. Default is False, but if set to True then the delete operation will be performed immediately.

  • all_raw_data (bool) – This controls whether only the data selected by the current instance of each mission are deleted (when False, the default behaviour) or if the whole directory associated with each mission is removed.

save()[source]

A simple method that saves the information necessary to reload this archive from disk at a later time. This largely consists of the various pieces of information regarding the success (or not) of various processing steps.

NOTE that the mission states are not saved here, because this method may be triggered repeatedly and saving mission states can be slow for missions with many possible ObsIDs (e.g. Swift and INTEGRAL). Instead, mission-state saves are triggered when the archive is created (in the init) and whenever the data in the archive are updated (as this necessitates a change in the mission states).

info()[source]

A simple method to present summary information about this archive.

archive.assemble module