This namespace enables the specification of metadata that
describes products generated by the use of a machine learning
model. It captures information about how the model was trained
and evaluated.
## CHANGE LOG ##
1.1.0.0
- Change data_set_size to data_set_count.
- Expand attribute definitions.
- Update steward name.
1.0.1.0
- Address Oxygen-flagged errors.
- Make data_set_size an ASCII_NonNegative_Integer.
- Update some attribute and class definitions.
- Update README file per latest template.
- Use PDS Information Model 1.18 (I).
1.0.0.0
- Initial release.
The Data_Set class is the container for classes
and attributes describing the size and version of data sets used
by the machine learning model.
The Machine_Learning class is a container for
all machine learning information in the label.
The Machine_Learning_Algorithm class is a
container for classes and attributes describing the
algorithm type and learning style used. An external reference to
a citation for the algorithm is required.
The Test_Performance class contains information
about a trained model's performance on the test
set.
The Test_Set class belongs to the Data_Set class
family and contains attributes describing the size and version
of the data set used to test the machine learning model (i.e.,
to assess how well it generalizes to previously unseen
data).
The Trained_Machine_Learning_Model class is a
container for information about how a given model was trained
and evaluated. A Machine_Learning_Algorithm and Training_Set are
required, while Validation_Set and Test_Set (and
Test_Performance) are optional.
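The required and optional children described above can be checked with a short script. The following is a minimal sketch, not part of the dictionary itself: the namespace URI is a placeholder, and the helper names build_model and missing_required are hypothetical. It tests only for the presence of child classes and does not stand in for schema validation.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI for illustration only; substitute the
# URI declared by the actual dictionary schema.
ML_NS = "http://example.org/ml"

# Child classes of Trained_Machine_Learning_Model, per the class description.
REQUIRED = ("Machine_Learning_Algorithm", "Training_Set")
OPTIONAL = ("Validation_Set", "Test_Set", "Test_Performance")


def build_model(children):
    """Build a bare Trained_Machine_Learning_Model element (hypothetical helper)."""
    model = ET.Element(f"{{{ML_NS}}}Trained_Machine_Learning_Model")
    for name in children:
        ET.SubElement(model, f"{{{ML_NS}}}{name}")
    return model


def missing_required(model):
    """Return the names of required child classes absent from the element."""
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    present = {child.tag.split("}")[-1] for child in model}
    return [name for name in REQUIRED if name not in present]


complete = build_model(["Machine_Learning_Algorithm", "Training_Set", "Test_Set"])
incomplete = build_model(["Training_Set", "Validation_Set"])

print(missing_required(complete))    # []
print(missing_required(incomplete))  # ['Machine_Learning_Algorithm']
```

A label that omits Validation_Set, Test_Set, or Test_Performance passes this check, since those classes are optional.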
The Training_Set class belongs to the Data_Set
class family and contains attributes that describe the size and
version of the data set used to train the machine learning
model.
The Validation_Set class belongs to the Data_Set
class family and contains attributes that describe the size and
version of the data set used to validate the machine learning
model (e.g., to choose the best
hyperparameters).
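To illustrate how the sizes of the three Data_Set family classes relate to one another, the sketch below splits a collection of items into training, validation, and test subsets and reports the number of items in each. The function name split_counts, the split fractions, and the seed are assumptions made for this example; the dictionary does not prescribe how a data set is partitioned.

```python
import random


def split_counts(items, validation_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items, split them three ways, and return each subset's size.

    The fractions and seed are illustrative defaults, not values taken
    from the dictionary.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_validation = round(len(shuffled) * validation_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]

    # Each value is the number of items in the subset, i.e. what would be
    # recorded as the size of the corresponding class.
    return {
        "Training_Set": len(training),
        "Validation_Set": len(validation),
        "Test_Set": len(test),
    }


counts = split_counts(range(1000))
print(counts)  # {'Training_Set': 800, 'Validation_Set': 100, 'Test_Set': 100}
```

The three counts always sum to the number of input items, because every item lands in exactly one subset.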
This section contains the simpleTypes that provide more constraints
than those at the base data type level. The simpleTypes defined here build on the base data
types. This is another component of the common dictionary and therefore falls within the
common namespace.
The algorithm_learning_style attribute describes
the type of learning style employed by the algorithm to solve a
problem. Specifically, the learning style depends on whether
labeled or unlabeled data was employed to train the model.
Labeled data includes observations that are associated with a
desired output such as a class or numeric
value.
The algorithm_name attribute specifies the name
of the algorithm used.
The algorithm_type attribute describes the kind
of algorithm used, such as a regression model, neural network,
or tree.
The data_set_count attribute provides the number
of items in the data set.
The data_set_version_id attribute specifies the
data set version number.
The performance_measure attribute specifies the
name of the measure (or metric) used to report performance of
the model on the test set.
The performance_score attribute reports the
numeric score the model achieved using performance_measure on
the test set. Values are not constrained since the measure may
not be a strict metric. Example measures include accuracy, loss,
runtime, and memory consumption.
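As an illustration of how performance_measure and performance_score pair up, the sketch below computes accuracy on a small test set and stores the result under those two attribute names. The accuracy function, the class labels, and the dictionary layout are all hypothetical; any other measure could be substituted, since the score values are unconstrained.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical model outputs and true labels for a four-item test set.
predicted = ["crater", "dune", "crater", "rock"]
actual = ["crater", "dune", "rock", "rock"]

# One measure name paired with the numeric score the model achieved.
test_performance = {
    "performance_measure": "accuracy",
    "performance_score": accuracy(predicted, actual),
}

print(test_performance)
# {'performance_measure': 'accuracy', 'performance_score': 0.75}
```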
The trained_model_name attribute specifies the
name of the model used.
The trained_model_version_id attribute specifies
the trained model version number.