Serving

paystone.serving forms the base of the FastAPI applications for machine learning services. It contains a large amount of functionality related to the deployment and invocation of machine learning services:

  1. A create_serving_app function, the main export of the package which is used to scaffold a FastAPI application around provided endpoint handlers.
  2. The Artifact class, a FastAPI dependency which loads artifacts from storage according to the rules of artifact storage.
  3. LoggingMiddleware which logs predictions made by machine learning services both to the console and to BigQuery for later analysis.
  4. The MLServingTestClient class, which is used by serving tests to call the service with mocked artifacts.
  5. The ServingClient class, to make requests to deployed services.
  6. PSServiceManager, a wrapper of google.cloud.servicemanagement_v1 which manages the OpenAPI deployment for the MLServing API.
  7. The get_openapi_schema_for_experiment function, used to create the component of the OpenAPI schema related to a given machine learning service.

Creating a Serving App

The create_serving_app function takes in functions as keyword arguments and produces a FastAPI application object which is executed via Gunicorn (in production) or Uvicorn (in development).

The keyword argument names are the names of the paths resolved by their associated functions; these names must match the values of the Route enum for the schema. For example, if the Route enum contains "a", "b", and "c" as values, then create_serving_app must be called like so:

app = create_serving_app(a=a_resolver, b=b_resolver, c=c_resolver)
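For concreteness, here is a minimal sketch of where the keyword names come from; the Route values and the resolver body below are hypothetical, not the real schema:

from enum import Enum

class Route(str, Enum):
    a = "a"
    b = "b"
    c = "c"

def a_resolver(body: dict) -> dict:
    # Placeholder handler; a real resolver would declare Artifact dependencies
    # and return predictions for its route.
    return {"predictions": []}

The keyword a= in the call above corresponds to Route.a, and the same applies to b and c.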

This method then does the following with that input:

  • Exposes the given handlers at two paths: the path given as the keyword argument name, and a longer path that begins with the service name, the major version, and the minor version (both illustrated in the sketch after this list).
    • This is because when testing, we want to invoke the application using the simple path provided as the argument name, but when deployed, the backend service will invoke the application using the entire path at which the MLServing API received a request.
  • Adds a health check path with a simple handler that returns {"status": "alive"} on success.
    • This endpoint is given the same dependencies that were provided for the handlers. This means that the dependencies -- which are always Artifacts, or outputs from training tasks -- will be loaded into memory by each instance running the application at the first invocation of the /health endpoint, which happens on a regular and frequent interval. This reduces the likelihood of a slow, cold-start-like first request to the instance.
  • Adds logging middleware, described below.
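As an illustration, and continuing from the create_serving_app example above: assuming a housing service at major version v1 and minor revision m0, and an assumed payload shape, the same handler would answer at both the short and the fully qualified path, while the health check returns its fixed payload.

from fastapi.testclient import TestClient

client = TestClient(app)
payload = {"instances": [[1.0, 2.0]]}  # assumed payload shape, for illustration only

assert client.get("/health").json() == {"status": "alive"}

# Short path used in tests:
short = client.post("/predict", json=payload)
# Fully qualified path used by the deployed backend service:
qualified = client.post("/housing/v1/m0/predict", json=payload)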

Artifacts

As described in the serving section of the article on experiment packages, Artifacts are dependencies (represented by FastAPI Depends defaults) of handlers. Each takes in a string which is the path to an artifact in storage. More details are provided in the linked article.
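As a hedged sketch only -- the artifact path, handler signature, and exact Depends wiring here are illustrative, not the real serving schema -- a handler might declare an Artifact like so:

from fastapi import Depends

def predict(body: dict, model=Depends(Artifact("model.joblib"))):
    # `model` is the object loaded from artifact storage at the given path.
    return {"predictions": model.predict(body["instances"]).tolist()}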

Logging Middleware

The LoggingMiddleware class is injected into the FastAPI application in order to log simultaneously to "standard output" and to BigQuery, specifically the machine_learning.logs table.

To do so, a method transforms the information logged to the console into a data model which represents the schema of the target BigQuery table; one instance of that data model is created for each line of logging and sent to the BigQuery table as a row.
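A minimal sketch of the dual-logging idea follows; the row fields and class internals are assumptions, not the real LoggingMiddleware or the machine_learning.logs schema.

import json
import logging

from google.cloud import bigquery
from starlette.middleware.base import BaseHTTPMiddleware


class SketchLoggingMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, table="machine_learning.logs"):
        super().__init__(app)
        self.client = bigquery.Client()
        self.table = table

    async def dispatch(self, request, call_next):
        response = await call_next(request)
        row = {"path": request.url.path, "status_code": response.status_code}
        logging.info(json.dumps(row))                    # console / standard output
        self.client.insert_rows_json(self.table, [row])  # one row per log line
        return response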

Test Client

Using dependency overrides, the MLServingTestClient injects a mapping of ArtifactMockSpec data model instances into a FastAPI TestClient object to be used for serving tests. The test client object itself is a context manager.

The ArtifactMockSpec data model contains three things:

  1. The mocked return values of methods for the artifact, e.g. methods={"predict": np.array([1,2,3])}.
  2. The mocked properties of the artifact, e.g. properties={"_feature_importances": [0.5, 0.5]}.
  3. The mocked return value of the artifact, e.g. return_value=1.0.

The return value mock is often useful for transformers pipelines, which are essentially a function that takes inputs and produces a prediction.

An example instantiation of an MLServingTestClient can be found in the housing template serving tests. Here the MLServingTestClient context manager is instantiated with an application and a mapping from artifact names to ArtifactMockSpec objects which mock the aforementioned components of each artifact.
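A hedged reconstruction of that pattern is shown below; the constructor signature, artifact names, and mocked values are illustrative rather than copied from the template.

import numpy as np

artifact_mocks = {
    "model": ArtifactMockSpec(methods={"predict": np.array([1, 2, 3])}),
    "pipeline": ArtifactMockSpec(return_value=1.0),
}

with MLServingTestClient(app, artifact_mocks) as client:
    # `client` behaves like a FastAPI TestClient whose Artifact dependencies
    # resolve to the mocks above.
    ...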

Helper Methods

prediction_request, validate_response, and parse_response are simple helper functions which do the following:

  • prediction_request: make a request using the given client and request parameters.
  • validate_response: ensure that the response has a 200 status code and return a 2-tuple of a boolean and a string message, which can be asserted to give descriptive error messages.
  • parse_response: parse a requests Response object to retrieve the relevant data from the response body, namely the predictions under the "predictions" key.

These should be used by tests in test_app.py.
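An illustrative test using the helpers; the route name, the payload shape, and any helper arguments beyond what is described above are assumptions.

def test_predict(client):
    # `client` is the test client yielded by the MLServingTestClient context manager.
    response = prediction_request(client, "/predict", instances=[[1.0, 2.0]])
    ok, message = validate_response(response)
    assert ok, message
    predictions = parse_response(response)
    assert len(predictions) == 1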

Serving Client

Much of the functionality here is inherited from paystone.api.PSAPIClient, which was discussed here. The main purpose of this further wrapper is to present a more convenient interface for the specific endpoints available in a machine learning service: the prediction routes, and the health check route.

The health check route is exposed via a .health_check() method. This facilitates a .poll_for_ready() method which is used when deploying services to ensure that they are healthy before concluding the deployment phase.

Prediction routes are accessed via .predict(), with an argument for which path is being invoked. Since all paths are required to have the same schema, the rest of the signature is static: it takes a list of instances and a parameter data model instance, returning the predictions from the service for that path.
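A hedged sketch of calling a deployed service; the construction arguments, the service URL, and the parameter names beyond those described above are assumptions.

client = ServingClient("https://mlserving.example.com")  # constructor arguments are an assumption
client.poll_for_ready()                                  # waits until .health_check() reports healthy
predictions = client.predict(
    "predict",                 # which prediction path to invoke
    instances=[[1.0, 2.0]],    # list of instances
    params=params,             # an instance of the path's parameter data model, assumed defined elsewhere
)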

Service Manager

The service manager is a component used in the MLServing API's deployment which manages its OpenAPI configuration. The openapi.yaml config file discussed there is deployed via this service manager.

The configuration can be built incrementally. For example, if a service is to be added to the configuration via .add(), the manager first loads the current openapi.yaml configuration, then adds the needed config for the new service. The same process occurs for removing services with .remove(). Multiple updates can be made at once via .update_many(). If a service is to be edited, that is a .remove() followed by an .add(), as part of an .update_many().
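Illustrative only; the service names and the method arguments beyond the method names described above are assumptions.

manager = PSServiceManager()
manager.add("housing")    # loads the current openapi.yaml, then adds the config for the housing service
manager.remove("churn")   # loads the current config, then removes the churn config
# Editing a service is a remove followed by an add, batched into one update:
manager.update_many(remove=["housing"], add=["housing"])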

Before deploying a new schema, it is validated according to our own rules. This validation follows Pydantic conventions: for any rule that is broken, we first attempt to "fix" the data, and return the fixed data if possible. These rules currently are:

  • Major version level paths must exist for services where there exists at least one minor version revision.
    • For example, if there is a /housing/v1/m0/predict path, there must be a matching /housing/v1/predict path.

The contents of the YAML file are represented as a Pydantic data model, which, in addition to validating the data it is provided, has added functionality to merge instances of the model. For example, two partial schemas can be "merged" such that the resulting schema contains a union of the paths and definitions of each model, while retaining the otherwise common configuration between the two. The inverse operation, a "subtraction" of one model from the other, works in a similar fashion. This is essentially how "adding" and "removing" paths works.
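A conceptual sketch of that merge and subtraction behaviour follows; this is not the real data model, and the field and method names are assumptions.

from pydantic import BaseModel


class OpenAPISchema(BaseModel):
    paths: dict = {}
    definitions: dict = {}

    def merge(self, other: "OpenAPISchema") -> "OpenAPISchema":
        # Union of paths and definitions; the otherwise common top-level
        # configuration would be carried through unchanged in the real model.
        return OpenAPISchema(
            paths={**self.paths, **other.paths},
            definitions={**self.definitions, **other.definitions},
        )

    def subtract(self, other: "OpenAPISchema") -> "OpenAPISchema":
        # The inverse operation: drop the paths and definitions present in `other`.
        return OpenAPISchema(
            paths={k: v for k, v in self.paths.items() if k not in other.paths},
            definitions={k: v for k, v in self.definitions.items() if k not in other.definitions},
        )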