Training

paystone.training is a package providing the key interface for training: the Experiment.

Experiment is a computation graph. Its nodes represent "tasks", which are functions with a particular signature. The behaviour of an Experiment when it is "run" can be controlled by command line options. Below we explain how tasks are written, how the Experiment runs, and how command line arguments work.

This package also contains tools for model evaluation which are to be used in tests and certain tasks.

Creating and Running Experiments

The constructor of the Experiment object has one parameter: the hyperparameter model. This is a Pydantic data model -- the model itself, not an instance of it -- which has default values for all of its attributes.
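For instance, here is a minimal sketch of a hyperparameter model and the constructor call; the field names are illustrative and the import path of Experiment is an assumption:

```python
from pydantic import BaseModel

from paystone.training import Experiment  # assumed import location


class Hyperparameters(BaseModel):
    # In practice this model lives in the training module's models.py.
    # Every attribute must have a default value.
    learning_rate: float = 0.01
    n_estimators: int = 100


# Pass the class itself, not an instance of it.
experiment = Experiment(Hyperparameters)
```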

The interface of experiments has three main methods: add_task(), run(), and save().

When adding tasks, there is one required parameter and two optional parameters: the task itself, any inbound tasks, and any upstream tasks. Inbound tasks are tasks whose outputs are passed as inputs to the given task, as described further below. Upstream tasks are tasks which must have run before the given task can run, but which do not pass their output data on as inputs.
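Continuing the sketch above, adding tasks might look like the following; the keyword names used here for the inbound and upstream arguments are assumptions based on the description of add_task():

```python
def load_data() -> str:
    """Return a path to the prepared dataset."""
    return "data/train.parquet"


def check_schema() -> None:
    """Fail loudly if the raw data violates the expected schema."""


def count_rows(dataset_path: str) -> int:
    """Count rows in the prepared dataset."""
    return 1_000


experiment.add_task(load_data)
experiment.add_task(check_schema)
# count_rows receives load_data's output as its input, and must wait for
# check_schema to finish, but receives nothing from check_schema.
experiment.add_task(count_rows, inbound=[load_data], upstream=[check_schema])
```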

Running an experiment topologically orders the tasks, applies any task overrides, executes each task, and stores each result in its corresponding node. Tasks are run serially according to the topological order; there is no parallel or concurrent execution.

Saving an experiment writes the outputs of all nodes in the graph to either local or cloud storage according to the rules of Artifacts. Calling save is a crucial component of a training module; it should be the last line of virtually all main.py files in training modules.
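Continuing the sketch, the tail of a training module's main.py would then be:

```python
experiment.run()   # execute each task serially in topological order
experiment.save()  # persist every node's output according to the Artifacts rules
```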

Writing Tasks

A key feature of the Experiment class is static validation of tasks. Task signatures must follow these rules:

  • The positional parameters must correspond to the outputs of the inbound tasks in order. This is validated according to their type annotations.
    • For example, if a task has two inbound tasks which have return types str and int respectively, the first two positional parameters of the task must be annotated str and int respectively. The names of the parameters do not matter.
  • There are either zero keyword arguments or one. The only permitted keyword argument is hyperparameters; it must have exactly this name, and its type annotation must be the hyperparameter data model from models.py.
    • Hyperparameters are used by accessing them from the hyperparameters argument.
    • We take this approach, rather than passing the hyperparameters individually, because when passing the entire Pydantic model, its type annotations are retained.
    • For example, if a hyperparameters data model has a learning_rate: float then hyperparameters.learning_rate is sure to be a float. Otherwise, we would be relying on the developer to match up the annotation between the signature and the data model, which could go wrong.

If these rules are not followed, Pyright will complain.
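For illustration, here is a sketch of a signature that satisfies these rules for a task with two inbound tasks returning str and int. The function and parameter names are made up, and making hyperparameters keyword-only is a stylistic assumption:

```python
from models import Hyperparameters  # the hyperparameter data model from models.py


def train_model(
    dataset_path: str,  # output of the first inbound task
    n_rows: int,        # output of the second inbound task
    *,
    hyperparameters: Hyperparameters,
) -> float:
    # hyperparameters.learning_rate is statically known to be a float.
    return hyperparameters.learning_rate * n_rows
```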

Command Line Options

Two command line options are available when running training with psml experiments train.

  1. --hyperparameter-overrides: values for hyperparameters that replace the default values from the data model.
    • For details on formatting of this argument, see psml-cli experiments train --help docs.
  2. --task-overrides: the name of a file in cli/psml/psml/task_overrides/ containing functions that replace tasks in the Experiment.
    • For a file named overrides.py, the correct argument is overrides.
    • The names of the functions in the overrides file must exactly match the names of the functions in the Experiment.
    • The signature of the override should exactly match the function it is overriding, although this is not validated.

The overrides directory is git-ignored, so override files like the sketch below can be freely added on a personal basis as needed.
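For illustration, an overrides file might look like the following; the task name and signature mirror the made-up train_model task from the earlier sketch rather than any real module:

```python
# cli/psml/psml/task_overrides/overrides.py
from models import Hyperparameters


def train_model(
    dataset_path: str,
    n_rows: int,
    *,
    hyperparameters: Hyperparameters,
) -> float:
    # Same name and signature as the task it replaces; here it short-circuits
    # training so the experiment can be exercised quickly during development.
    return 0.0
```

With this file saved as overrides.py, it would be selected with --task-overrides overrides.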

Evaluation

As noted in the article on the experiment package structure, a requirement of training is that one task calls paystone.training.evaluation.evaluate_models().

This function takes a dictionary mapping artifact names to artifacts, the serving application, lists of regression and data tests, and "X" and "y" data objects, which are the features and labels of a hold-out test dataset. It does what one would expect with these: it injects the given artifacts as dependencies into the serving application to get predictions on the test dataset, then runs the given tests against those predictions and the test labels.
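A hedged sketch of such a call is below; the keyword argument names of evaluate_models() are assumptions based on the description above, and the surrounding variables are placeholders:

```python
from paystone.training.evaluation import evaluate_models

results = evaluate_models(
    artifacts={"model": trained_model},  # artifact name -> artifact
    serving_app=serving_app,             # the serving application
    regression_tests=regression_tests,   # each produces a boolean result
    data_tests=data_tests,               # each produces a float result
    X=X_test,                            # hold-out features
    y=y_test,                            # hold-out labels
)
```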

The output of evaluate_models() is a data model containing the results of the regression and data tests. The regression test results are booleans, and the data test results are floats. This is saved as an artifact.