By Joel Wilf
PDS Central Node
Review Draft 2.5
March 17, 2000
The Planetary Data System (PDS) ensures the quality of its product in two ways. The first is peer review, where the focus is on the scientific value of the data and documentation. Peer review answers the question, "How well does the product serve its scientific users?" The second is validation, where the emphasis is on overall quality and conformance to PDS standards. Validation answers the question, "How well does the product fit the PDS model?"
Validation improves the quality and consistency of each volume. But more important, it ties together the entire collection of PDS volumes. It creates a framework for building software, searchable catalogs, and complete information systems around the set of PDS volumes. It makes the difference between a collection of CDs and an archive.
Validation is central to PDS. But it can be difficult in practice. Historically, validation has depended on the judgment of the data engineer examining the volume. There are good reasons for this. Some of the checks are qualitative by nature. Also, PDS standards are complex, having evolved over nearly a decade of use.
Perhaps the biggest problem with validation has been ensuring consistency. One data engineer may object to what another has passed. Validation may also be confusing to the data producer, who doesn't know what to expect. For these reasons, a checklist was developed, which shows all the tests that are part of the validation process.
This document presents the validation checklist. The checklist should:
The validation checklist shows how to apply the standards. It is not a substitute for the standards themselves. The PDS Standards Reference is still the canonical source of standards information. When there is a conflict between the checklist and the PDS Standards Reference, it is the checklist that should be modified.
The validation checklist describes what to test. For the most part, it does not discuss the tools to use for testing. Think of it as providing the requirements for such tools.
There are two parts to the checklist. The first part summarizes the checklist as a table. This gives a high-level view of the validation process. It also makes it easy to "check off" each step as it is performed in an actual validation. The second part describes each step in detail.
The following steps are taken to ensure a valid PDS volume. Whenever possible, the steps are linked to an appropriate section of the PDS Standards Reference. Some steps fall under the category of general quality assurance, and these are labeled "General QA" in the list:
Validation Checklist Step-by-Step

Step | Action | Standards |
---|---|---|
Part I: Get an Overview of the Volume | ||
1 | Read the top level files for a general understanding of the volume. | General QA |
Part II: Validate the Volume, Directory, and File Structure | ||
2 | Check that the volume is ISO9660 level-1 or level-2 compliant. | Section 10 |
3 | Ensure that the directory tree drawn in AAREADME.TXT matches the actual volume. | Appendix D 4.4 |
4 | Make sure all required files and directories are present, with no extraneous files or directories. | Section 19 |
5 | Make sure that every file has an attached or detached label, and there are no zero-length files. | Section 4.1, Section 5 |
Part III: Validate the Metadata | ||
6 | Check that all keywords that use formation rules have properly formed values. | Section 19.4.1, Section 19.5.1 |
7 | Check all labels on the volume for syntax and pointer errors. | Section 12 |
8 | Check that all keywords and standard values on the volume are in the Data Dictionary. | Section 12 |
9 | For attached labels, ensure that RECORD_TYPE, RECORD_BYTES, and related keywords are correct. | Section 5.1.2 |
10 | Check references. | General QA |
Part IV: Validate the Data Index | ||
11 | Validate the structure of INDEX.TAB and ensure that each entry points to a data file. | Section 19.3 |
12 | Make sure that every data file is pointed to in INDEX.TAB. | Section 19.3 |
13 | Make sure that CUMINDEX.TAB contains all the INDEX.TAB files for the volume set. | Section 19.3 |
Part V: Validate Files by Content | ||
14 | Make sure that all filenames have the proper extension, according to the data in the file. | Section 10.2 |
15 | Check ASCII files for bad line terminators, line lengths, and embedded illegal characters. | Section 5.1.1, Section 5.1.2 |
16 | Check HTML files for valid syntax and working links. | Section 22 |
17 | Ensure that tabular data displays correctly with NASAView. | Section 4 |
18 | Ensure that images display correctly with NASAView. | Section 4 |
19 | Ensure that non-ASCII documents are readable. | Section 9.2.2 |
20 | Check spelling in text files. | General QA |
21 | Verify Software. | General QA |
Part VI: Referential Integrity | ||
22 | Check keyword/value pairs for consistency. | General QA |
23 | Check catalog referential integrity through a test ingestion. | General QA |
The following describes each step of the validation checklist in more detail:
Step 1 - Read the top level files for a general understanding of the volume:
Read the AAREADME.TXT, VOLDESC.CAT, ERRATA.TXT, and any included memos. It is also important to read the DATASET.CAT, INST.CAT, INSTHOST.CAT, and MISSION.CAT, if they are new or have been modified. Check that the descriptions are written clearly. If done correctly, these files provide a good overview of the volume.
Step 2 - Check that the volume is ISO9660 level-1 or level-2 compliant:
All PDS volumes must be ISO9660 level-1 or level-2 compliant. These standards require the following:
When checking that filenames are uppercase, make sure that you are looking at the filenames as they actually appear on the CD-ROM. For example, when Windows NT Explorer reads ISO9660 filenames, it capitalizes the first letter and lowercases the rest. But the older Windows File Manager (winfile.exe) shows the filenames correctly as uppercase.
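The uppercase check can also be scripted, which avoids relying on how a file browser chooses to display names. The following is a minimal sketch, not a PDS tool: the function name `find_non_uppercase_names` is hypothetical, and the result still depends on how the operating system presents the ISO9660 names when the disc is mounted.

```python
import os


def find_non_uppercase_names(root):
    """Return paths under root whose file or directory name is not all uppercase."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Digits, underscores, and dots are unchanged by upper(),
            # so only names containing lowercase letters are reported.
            if name != name.upper():
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)
```

Calling `find_non_uppercase_names` on the mount point of the volume returns an empty list when every name is uppercase as seen by the operating system.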
Step 3 - Ensure that the directory tree drawn in AAREADME.TXT matches the volume:
The AAREADME.TXT shows the volume's file structure in the form of a tree, with the root directory on top. This must match the actual file structure on the volume.
If there is an HTML version of the AAREADME.TXT file, it should be named AAREADME.HTM, and its directory tree display should link to the files on the volume. Verify that the links work correctly.
Step 4 - Make sure all required directories and files are present:
Section 19 of the Standards describes the acceptable types of volume organization. Each type requires a certain directory structure. You should determine which type of organization is used by the current volume, then verify that the required directories and files are present. You should also make sure that there are no extraneous root-level files or directories.
In general, this means looking for:
Step 5 - Make sure that every file has an attached or detached label, with no zero-length files:
Every file on the volume should either contain a label itself or be pointed to by a label. It is important to ensure that there are no zero-length files. These often indicate an error writing the CD-ROM.
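The zero-length check is easy to automate. The sketch below walks the volume and reports every empty file; the function name `find_zero_length_files` is hypothetical, and the sketch does not attempt the label check, which requires parsing the labels themselves.

```python
import os


def find_zero_length_files(root):
    """Return the paths of all files under root that contain no bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Any path returned is worth investigating as a possible write error before the rest of the validation continues.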
Step 6 - Check that all keywords that use formation rules have properly formed values:
A formation rule is a method for constructing a value, usually a name or ID string. Adhering to formation rules keeps names and IDs consistent. In PDS, keywords that use formation rules include DATA_SET_ID, VOLUME_SET_ID, VOLUME_ID, and DATA_OBJECT_TYPE.
Make sure that the DATA_SET_ID has the form:
mission-target(s)-instrument-CODMAC level-version id

For example: DATA_SET_ID = "ESO-J/S/N/U-SPECTROPHOTOMETER-4-V2.0"
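A first-pass check of this form can be sketched with a regular expression. The pattern below is an illustration under simplifying assumptions, not the formation rule from the Standards Reference: it assumes exactly five hyphen-separated components, no hyphens inside a component, a single-digit CODMAC level, and a version id of the form V followed by digits, a dot, and digits. The names `DATA_SET_ID_PATTERN` and `parse_data_set_id` are hypothetical.

```python
import re

# Assumed, simplified shape: mission-target(s)-instrument-level-version
DATA_SET_ID_PATTERN = re.compile(
    r"^(?P<mission>[^-]+)-(?P<targets>[^-]+)-(?P<instrument>[^-]+)"
    r"-(?P<codmac_level>[0-9])-(?P<version>V[0-9]+\.[0-9]+)$"
)


def parse_data_set_id(data_set_id):
    """Split a DATA_SET_ID into its components, or return None if it does not fit."""
    match = DATA_SET_ID_PATTERN.match(data_set_id)
    if match is None:
        return None
    parts = match.groupdict()
    parts["targets"] = parts["targets"].split("/")  # multiple targets are slash-separated
    parts["codmac_level"] = int(parts["codmac_level"])
    return parts
```

For the example above, `parse_data_set_id` yields mission `ESO`, targets `J`, `S`, `N`, and `U`, instrument `SPECTROPHOTOMETER`, CODMAC level 4, and version `V2.0`. A result of `None` means only that the value does not fit this simplified pattern, so check it by hand against the Standards Reference before rejecting it.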
Further checking is necessary to ensure that each component of the DATA_SET_ID makes sense. This means checking that:
The VOLUME_SET_ID should conform to PDS naming standards. For projects funded by PDS, the designation should contain the PDS string. For example, VOLUME_SET_ID = "USA_NASA_PDS_GBA _0001" and not "USA_NASA_JPL_GBAT_0001". If the volume is not the first in the set, the VOLUME_SET_ID should already exist.
The VOLUME_ID should conform to PDS naming standards. It should also be unique within the PDS.
DATA_OBJECT_TYPE should correspond to the primary data type being delivered. For example, if DATA_OBJECT_TYPE = TABLE, then the main data files should be tables, with a .TAB extension. In the case of images, DATA_OBJECT_TYPE = IMAGE implies that the data files should have .IMG, .IBG, or .IMQ extensions.
Step 7 - Check all labels on the volume for syntax and pointer errors:
All labels on the volume must be validated against the current Data Dictionary. All pointers in a label must be checked to ensure that they point to a valid object.
Historically, these tests have been performed by LVTOOL, the PDS label validation software. Note that the output of LVTOOL must be properly interpreted.
Step 8 - Check that all keywords and standard values on the volume are in the Data Dictionary:
In order for labels to be valid, all their keywords and standard values must be in the data dictionary.
Historically, this test has been performed by LVTOOL. Note that the output of LVTOOL must be properly interpreted. When a keyword or standard value is missing from the data dictionary, it could mean any one of the following:
Step 9 - For attached labels, ensure that RECORD_TYPE, RECORD_BYTES, and related keywords are correct:
These tests are performed to ensure consistency between the label and the data. There are several errors that commonly occur when attached labels are used with RECORD_TYPE = FIXED. Check the following:
For .TXT and .CAT files, also note:
Step 10 - Check references:
The reference file (REF.CAT) should be properly formed. It should contain citations made in other CATALOG/*.CAT files and be consistent with the reference database.
Check for the following consistency errors:
Note: Sometimes, when citations in REF.CAT differ from those in the database, it is the database citation that needs to be changed.
Also check the syntax of REFKEYID within text descriptions. It should be uppercase, enclosed in square brackets, and take the form:
In the case of multiple citations, the REFKEYIDs should be separated by a semicolon, not a comma, e.g., [SMITHETAL1993; SMITHETAL1994].
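The citation syntax rules can be checked mechanically. The following sketch scans a text for bracketed groups and reports those that break the rules above. It assumes that a REFKEYID consists only of uppercase letters, digits, and ampersands, which is an assumption made for illustration, and it treats every bracketed group as a citation, so bracketed text that is not a citation will also be reported. The names are hypothetical.

```python
import re

BRACKETED = re.compile(r"\[([^\[\]]*)\]")
REFKEYID = re.compile(r"^[A-Z][A-Z0-9&]*$")  # assumed character set


def find_malformed_citations(text):
    """Return bracketed groups whose contents are not semicolon-separated REFKEYIDs."""
    bad = []
    for match in BRACKETED.finditer(text):
        keys = [key.strip() for key in match.group(1).split(";")]
        if not all(REFKEYID.match(key) for key in keys):
            bad.append(match.group(0))
    return bad
```

A comma-separated group such as `[SMITHETAL1993, SMITHETAL1994]` is reported because the comma and space are not valid inside a single REFKEYID.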
Step 11 - Validate the structure of INDEX.TAB and ensure that each entry points to a data file:
Check that INDEX.TAB columns are present for unique values PRODUCT_ID, VOLUME_ID, DATA_SET_ID, and PRODUCT_CREATION_TIME.
Whenever possible, check that INDEX.TAB values fall within their stated or implied minimum and maximum values. For example, check time values against the START_TIME and STOP_TIME given in DATASET.CAT.
Check the syntax of INDEX.TAB:
Check that every entry in INDEX.TAB points to the label for a data file. Note: the entries point to the label, not to the data directly.
Step 12 - Make sure that every data file is pointed to in INDEX.TAB:
Every data file must have an attached or detached label. Make sure that every one of these labels is pointed to by an entry in the INDEX.TAB.
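Steps 11 and 12 are two directions of the same comparison, so they can be sketched together as a set difference. The function below assumes the caller has already extracted the label paths from INDEX.TAB (the column that holds them depends on the index label) and has collected the label paths found on the volume. The function name and the path normalization are illustrative assumptions.

```python
def compare_index_to_labels(index_paths, label_paths):
    """Return (index entries with no label on the volume, labels missing from the index)."""

    def normalize(path):
        # Compare paths without regard to separator style, padding, or case.
        return path.replace("\\", "/").strip().strip("/").upper()

    indexed = {normalize(path) for path in index_paths}
    on_volume = {normalize(path) for path in label_paths}
    dangling = sorted(indexed - on_volume)    # Step 11 failures
    unindexed = sorted(on_volume - indexed)   # Step 12 failures
    return dangling, unindexed
```

Both returned lists should be empty for a valid volume.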
Step 13 - Make sure that CUMINDEX.TAB contains all the INDEX.TAB files for the volume set:
Check whether a CUMINDEX.TAB is required. If VOLUMES = 1, it is not required. Otherwise, it must be present.
As a volume set is assembled, make sure each CUMINDEX.TAB file contains the INDEX.TAB files for all previously published volumes.
Step 14 - Make sure all filenames have the proper extension, according to the data in the file:
File extensions should be used to identify the data type of a file. Section 10.2.3 lists the required file extensions for key PDS data types. Section 10.2.4 lists the file extensions reserved for other types of data. A filename has the proper extension when:
For example, a file with an .LBL extension should be a PDS label file. Conversely, a PDS label file should not have the extension .LAB.
Step 15 - Check ASCII files for bad line terminators, line lengths, and characters:
Check that the following is true for all labels, .TXT, and .ASC files:
With the exception of the line length limit, all these tests should also hold true for HTML files.
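Checks of this kind are well suited to a script. The sketch below tests each line for its terminator, its length, and its characters. The specific rules are assumptions made for illustration and should be confirmed against the Standards Reference: a carriage return plus line feed terminator, a limit of 80 bytes per line including the terminator, and only printable ASCII characters (codes 32 through 126). The function name `check_ascii_lines` is hypothetical.

```python
def check_ascii_lines(data, max_line_bytes=80):
    """Return (line number, problem) pairs for a file's raw bytes.

    Pass max_line_bytes=None to skip the length test, for example for HTML files.
    """
    problems = []
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        if not line.endswith(b"\r\n"):
            problems.append((number, "terminator"))
        if max_line_bytes is not None and len(line) > max_line_bytes:
            problems.append((number, "length"))
        body = line.rstrip(b"\r\n")
        if any(byte < 32 or byte > 126 for byte in body):
            problems.append((number, "character"))
    return problems
```

Read each file in binary mode before calling the function, so that the platform does not translate line terminators on the way in.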
Step 16 - Check HTML files for valid syntax and working links:
Validate HTML files against the HTML 3.2 DTD and make sure all links are working properly.
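The link half of this step can be sketched with the Python standard library. The example below collects `href` and `src` attributes from an HTML file and reports relative links whose target does not exist on the volume. It does not validate the file against the DTD, it skips external URLs and pure fragment links, and whether a link's letter case must match the filename depends on the file system on which the check is run. The names `LinkCollector` and `find_broken_local_links` are hypothetical.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_local_links(html_path):
    """Return relative links in html_path whose target file does not exist."""
    collector = LinkCollector()
    with open(html_path, encoding="ascii", errors="replace") as handle:
        collector.feed(handle.read())
    base = os.path.dirname(html_path)
    broken = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue  # external URL or fragment-only link
        if not os.path.exists(os.path.join(base, unquote(parsed.path))):
            broken.append(link)
    return broken
```

Running `find_broken_local_links` on AAREADME.HTM is a quick way to confirm that its directory tree links resolve to files on the volume.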
Step 17 - Ensure that tabular data displays correctly with NASAView:
NASAView provides a good visual interface for browsing tables. First it parses the label and reports any syntax errors. Then it displays the table. It is worth checking the columns against the data type prescribed in the label. Examine as large a sampling of tables as feasible.
Step 18 - Ensure that images display correctly with NASAView:
Use NASAView to view image files. Check the quality of the image. It is also worth checking the image against its description in the label. Examine as large a sampling of images as feasible.
Step 19 - Ensure that non-ASCII documents are readable:
If possible, look at the PDF, MS-WORD, and other non-ASCII documents in the DOCUMENT directory. Make sure the text is readable and the illustrations legible. This is especially important for legacy documents that were scanned into digital form.
Step 20 - Check spelling in text files:
Ideally, you should check the spelling in all files that contain descriptive text. At the very least, key files such as AAREADME.TXT, VOLDESC.CAT, ERRATA.TXT, and the CATALOG files should be checked. If time allows, also look at the xxxxINFO.TXT files and key files in the DOCUMENT directory.
Step 21 - Verify software:
Software included on a volume should compile and run as described in the /SOFTWARE/SOFTINFO.TXT file. If source code is included on the volume, try to build and execute it on the specified platforms. Likewise, if binary code is included, try to run it on the specified platforms.
Step 22 - Check keyword/value pairs for consistency:
Examine the "KEYWORD = VALUE" pairs on the volume to find those that are technically legal but "make no sense." For example, INSTRUMENT_HOST_ID = VG2 is a legal value, but is out of place on a Voyager 1 volume.
In other cases, a value must be checked against "outside information." For example, VOLUMES = 8 looks valid, but it is incorrect on the ninth volume of a set. Experience has shown that the VOLUMES keyword is worth checking, since errors like the one in this example do occur.
Step 23 - Test referential integrity with a test ingestion:
PDS volumes provide information, mostly through the .CAT files, for the high-level PDS catalog. The catalog is a relational database, and it is important that it maintain referential integrity (RI) as new information is added.
Currently, there is only one way to make certain of this. That is: ingest the new information on a copy of the database, then run the test for referential integrity.
Once the referential integrity test has succeeded, information from the volume can be ingested into the real catalog. At this point, the volume should be consistent with PDS standards and the PDS data model, and should be accepted as valid.