Experiment Epic Template

When beginning a new epic to create a new machine learning service or improve an existing one, use the template below. We use the same template for improvement epics because it gives us a chance to revisit, and potentially alter, the problem definition and success criteria, which may have drifted since the previous epic. There is nothing wrong with copying these from the previous version if nothing has changed, but they are worth re-evaluating each time.

Some phases are only applicable to first revisions, and others are only applicable to subsequent revisions. Because of this, the numbers on phases are left as "X". Consult the experiment lifecycle to see which phases apply to your given epic, remove the rest, and fill in the phase numbers accordingly.

Begin from the first sub-heading below. The sub-headings are fixed and should be directly copied. Any content underneath a sub-heading in italics is placeholder content that should be removed. Anything not in italics should be copied verbatim.

Problem

User Story: As a [stakeholder] I want [capability] so that [customer benefit].

Success Criteria

- Measurement X increases by Y%.
  - Here is the definition of Measurement X.

Phases

Phase X: Service Specification

[ ] A product leader and I have collaborated on a service specification that fully details the expectations for the new service.

Phase X: Service Schema

[ ] I have established a schema for Instances.
[ ] I have established a schema for Parameters.
[ ] I have established a schema for Predictions.
[ ] I have established the Routes enum for the service.
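
As guidance for this phase (not part of the template text to copy), the four schema items could be sketched as below. This is a minimal illustration using only the Python standard library. The field names, the route names, and the choice of dataclasses are all assumptions made for the example, not the conventions of the actual service framework.

```python
from dataclasses import dataclass
from enum import Enum


class Routes(str, Enum):
    """Hypothetical routes exposed by the service."""

    PREDICT = "predict"
    EXPLAIN = "explain"


@dataclass(frozen=True)
class Instance:
    """One input record to be scored (hypothetical fields)."""

    customer_id: str
    text: str


@dataclass(frozen=True)
class Parameters:
    """Request-level options applied to every instance (hypothetical)."""

    threshold: float = 0.5


@dataclass(frozen=True)
class Prediction:
    """One output record, aligned with one Instance (hypothetical fields)."""

    customer_id: str
    score: float
    label: bool
```

Keeping Instances, Parameters, and Predictions as separate types makes it explicit which values vary per record and which apply to the whole request.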

Phase X: Business Metrics

[ ] We have at least one metric per direct stakeholder (i.e. client, customer, engineer) that measures the performance of the service from the perspective of that stakeholder.

Phase X: Data and Regression Tests

[ ] I have written at least one test that uses a hold-out test dataset to produce a scalar metric evaluating the performance of the service.
[ ] I have written at least one test that uses a hand-crafted set of instances to produce a boolean outcome determining whether the service offers acceptable predictions in key circumstances.
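
As guidance for this phase (not part of the template text to copy), the two kinds of test differ in what they return: the hold-out test yields a scalar metric, while the hand-crafted test yields a pass/fail boolean. The sketch below illustrates that distinction. The `predict` function, the accuracy metric, and the tiny datasets are hypothetical stand-ins, not the service's real model or data.

```python
from typing import Callable, Sequence


def predict(text: str) -> bool:
    """Hypothetical stand-in model: flags text containing a negative word."""
    negative_words = {"terrible", "awful", "refund"}
    return any(word in text.lower() for word in negative_words)


def accuracy(
    model: Callable[[str], bool],
    holdout: Sequence[tuple[str, bool]],
) -> float:
    """Scalar metric: fraction of hold-out examples predicted correctly."""
    correct = sum(model(text) == label for text, label in holdout)
    return correct / len(holdout)


def passes_key_cases(
    model: Callable[[str], bool],
    cases: Sequence[tuple[str, bool]],
) -> bool:
    """Boolean outcome: True only if every hand-crafted case is correct."""
    return all(model(text) == expected for text, expected in cases)


# Tiny illustrative datasets; real ones would be loaded from storage.
HOLDOUT = [
    ("The food was terrible", True),
    ("Great service", False),
    ("I want a refund", True),
    ("Not good at all", True),  # the stand-in model misses this one
]
KEY_CASES = [
    ("AWFUL experience", True),
    ("Lovely staff", False),
]

holdout_accuracy = accuracy(predict, HOLDOUT)        # 0.75
key_cases_pass = passes_key_cases(predict, KEY_CASES)  # True
```

The scalar can be tracked across revisions to detect regressions, while the boolean acts as a gate for behaviour that must never break.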

Phase X: Load Testing

[ ] I have filled out the load testing data model with parameters that accurately reflect the current service specification.
[ ] I have created a request generation function for use in load testing.
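
As guidance for this phase (not part of the template text to copy), a request generation function might look like the sketch below. The request shape (`instances` plus `parameters`), the field names, and the function signature are assumptions for illustration only. The actual load testing tooling may expect a different interface.

```python
import random
from typing import Any


def generate_request(rng: random.Random, max_instances: int = 5) -> dict[str, Any]:
    """Build one randomized request body for a load test (hypothetical shape)."""
    n_instances = rng.randint(1, max_instances)
    vocabulary = ["good", "bad", "slow", "fast"]
    instances = [
        {
            "customer_id": f"customer-{rng.randint(1, 1000)}",
            "text": " ".join(rng.choices(vocabulary, k=8)),
        }
        for _ in range(n_instances)
    ]
    return {"instances": instances, "parameters": {"threshold": 0.5}}
```

Passing in a seeded `random.Random` keeps the generated traffic reproducible, so two load test runs can be compared on identical requests.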

Phase X: Experimentation

[ ] I have found an idea for an experiment module that I feel at least 60% confident will lead to a successful promotion.

Phase X: Experiment Documentation

[ ] I have filled out a README.md at the minor version level of the experiments package explaining the approach that the experiment will take.

Phase X: Serving Endpoint(s)

[ ] I have created all necessary handlers in the serve.app submodule of my experiment.
[ ] I have combined them into a FastAPI application using create_serving_app, and that application is available as serve.app.app in my package.

Phase X: Serving Tests

[ ] I have written a complete suite of unit tests for my serving application that covers as many edge cases as is reasonable.
[ ] All unit tests have mocked out all relevant artifacts.
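
As guidance for this phase (not part of the template text to copy), mocking out an artifact can be done with `unittest.mock` from the standard library. In the sketch below, `handle_predict` and the model's `predict` method are hypothetical. A real handler would live in the experiment's serve.app submodule and may have a different signature.

```python
from unittest.mock import MagicMock


def handle_predict(model, texts):
    """Hypothetical handler: scores each text with a loaded model artifact."""
    if not texts:
        return []
    scores = model.predict(texts)
    return [{"text": t, "score": float(s)} for t, s in zip(texts, scores)]


def test_handle_predict_uses_mocked_model():
    # The real artifact (e.g. trained weights) is replaced by a mock, so the
    # test needs no files on disk.
    model = MagicMock()
    model.predict.return_value = [0.9, 0.1]

    result = handle_predict(model, ["great", "awful"])

    assert result == [
        {"text": "great", "score": 0.9},
        {"text": "awful", "score": 0.1},
    ]
    model.predict.assert_called_once_with(["great", "awful"])


def test_handle_predict_empty_input_skips_model():
    # Edge case: an empty request should not touch the model at all.
    model = MagicMock()

    assert handle_predict(model, []) == []
    model.predict.assert_not_called()
```

Because the mock records its calls, the tests can assert both on the handler's output and on how the artifact was used.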

Phase X: Training Module

[ ] I have written a training module that creates a paystone.training.runner.Experiment complete with tasks that combine to produce all necessary artifacts for serving.
[ ] One of the tasks in the Experiment produces a paystone.serving.types.testing.TrainingTestsResults using paystone.training.evaluation.evaluate_models().
[ ] I have called both .run() and .save() on my Experiment within the if __name__ == "__main__" block.

Phase X: Infrastructure

[ ] I have filled out the experiment infrastructure data model in its entirety.

Uncertainties and Constraints

- This is a potential cost concern.
- This is something about this experiment that is unique and does not fit our current processes.
- This is an infrastructure change that is required for this work to be fully completed.