Purpose of MLOps¶
MLOps, short for machine learning operations, is a relatively new term. It is a sub-discipline of software engineering whose goal is to support the development of machine learning services.
Over the last decade, machine learning has grown from a branch of statistics into the lifeblood of many software products, yet we are still in the early stages of integrating it into live applications as a first-class citizen. There are many competing philosophies on how this should be done and relatively few universally accepted standards. It therefore falls to each organization hoping to employ machine learning meaningfully in its products to work these things out for itself.
So: what does MLOps mean at Paystone?
A World Without MLOps¶
Given that MLOps plays a supporting role to the more front-facing work of developing machine learning services, perhaps the best way to establish its purpose is to consider what life would be like if we had ML without Ops. The remainder of this document imagines a world where machine learning engineers, ones who can produce trained models but have no time for much else, are left without any supporting cast, and exposes the gaps in this arrangement. The gaps we identify become the gaps that MLOps should fill, and the commonalities between these problems form the principles that justify its existence.
We'll start with the problems most immediately relevant to the model development work of these hypothetical engineers and expand our newfound capabilities from there, while stopping short of the superfluous; we won't burden the MLOps engineers with creating an IDE that designs model architectures automatically.
Experiments and Models Cannot Be Easily Tracked¶
Scientific experimentation is, by its nature, full of failures. More than in any other setting, failures are a form of success, because they successfully resolve hypotheses by confirming that they are false.
However, resolved hypotheses that go undocumented are bound to be proposed again. It is therefore critical to the scientific process that we document as much as we possibly can about our experiments, to avoid repeating ourselves and becoming trapped in an endlessly fruitless cycle. Of course, if we had to determine what it is we need to document every single time we conducted an experiment, we would end up with inconsistent notes that have little practical value to our future selves.
So: one problem that can be solved to greatly increase the velocity of machine learning engineers is establishing an automated system that exhaustively documents experiments and produces an accessible catalogue of them.
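As a concrete illustration, a tracking library such as MLflow can record the parameters, metrics, and artifacts of every run in a searchable catalogue. The sketch below is a minimal example of what this looks like in practice; the choice of MLflow, along with the experiment name, parameter values, and metric, are illustrative assumptions rather than a description of our actual setup.

```python
import mlflow

# Hypothetical tracking example: the experiment name, parameters, metric,
# and artifact path are placeholders, not real values.
mlflow.set_experiment("churn-model")

with mlflow.start_run(run_name="baseline-gbm"):
    mlflow.log_params({"learning_rate": 0.01, "n_estimators": 200})
    mlflow.log_metric("validation_auc", 0.87)
    mlflow.log_artifact("model.pkl")  # assumes the trained artifact was written locally
```

Every run logged this way lands in the same catalogue, so a resolved hypothesis can be looked up instead of re-run.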
Data Cannot Be Easily Accessed¶
Data comes from many sources and exists in many formats. If left with no framework or tools to access this data in some sort of unified fashion, machine learning engineers will have no choice but to write complicated, difficult-to-maintain data ingestion code that slows down their development and makes iterating on experiments a significant challenge.
Providing this unified and simplified interface, without hindering the engineer’s ability to extract and transform the data, is a key role of MLOps.
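One possible shape for such an interface is a single helper that hides source-specific details behind dataset names. The sketch below is purely hypothetical: the `load_dataset` function, the dataset names, the storage URIs, and the use of pandas are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical unified data-access layer: callers ask for a dataset by name and
# receive a DataFrame without knowing where or how the data is stored.
# The dataset names and URIs below are placeholders.
_SOURCES = {
    "transactions": "gs://example-bucket/transactions.parquet",
    "customers": "gs://example-bucket/customers.parquet",
}

def load_dataset(name: str) -> pd.DataFrame:
    """Load a named dataset without exposing its storage backend to the caller."""
    uri = _SOURCES[name]
    if uri.endswith(".parquet"):
        return pd.read_parquet(uri)
    # Other backends (warehouses, APIs, ...) would be dispatched here.
    raise NotImplementedError(f"No reader implemented for {uri}")
```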
Models Cannot Be Easily Served¶
There have been many tutorials and blog posts written about “putting machine learning models into production” that represent the process as being as simple as writing a Flask app. The reality is that serving models raises far more questions than this:
- What infrastructure do we deploy models on?
- How do artifacts from training become accessible in a serving environment?
- How do we ensure that the data used to build training datasets is also available to the deployed service?
- How do we ensure that trained transformations are applied equivalently to all requests?
- How do we handle streaming prediction requests and batch prediction requests?
- How do we structure our API?
- How do we expose our API securely?
- How do we handle model versioning?
- How do we enforce and broadcast the schema of our model prediction endpoints?
Placing all of these concerns on the shoulders of the people expected to solve the mathematical problems that produce the model artifacts in the first place is too much for any one role. A machine learning engineer should be able to focus on implementing the logic that wraps their trained models without having to answer any of these questions from scratch.
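To make just two of these concerns concrete, schema enforcement and model versioning, the sketch below shows how a serving endpoint might declare a typed request and response. The use of FastAPI and pydantic, the endpoint path, and the field names are illustrative assumptions, not a description of our actual serving stack.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical request/response schemas; the field names are placeholders.
class PredictionRequest(BaseModel):
    customer_id: str
    features: list[float]

class PredictionResponse(BaseModel):
    version: str
    score: float

@app.post("/v1/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest) -> PredictionResponse:
    # In a real service, the versioned model artifact would be loaded at startup
    # and the trained transformations applied here before scoring.
    score = sum(request.features) / max(len(request.features), 1)  # stand-in for inference
    return PredictionResponse(version="v1", score=score)
```

Because the schemas are declared in code, a framework like this can also publish them automatically (for example, as an OpenAPI document), which is one way to enforce and broadcast the contract of a prediction endpoint.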
Models Cannot Be Deployed Safely¶
Once an API has been established and models can be deployed, the work is far from complete. The machine learning engineer needs to be able to build confidence that, having answered all of the preceding questions, they have produced a reliable API that its consumers will trust. They need to know that:
- The SLAs they have set can be met by the infrastructure they’ve deployed on.
- Edge cases are accounted for and do not break the API.
- The API is secure and exposed only to the intended audience.
- The model has been rigorously evaluated against all the tests they wrote before it is exposed through the API.
All of these checks can and should be carried out by automated processes. If any of them fail, the model should not be deployed.
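One way to express such a gate is as a set of automated checks that run before a model is promoted, for example in CI. The sketch below uses pytest-style tests; the helper functions, quality threshold, and latency SLA are hypothetical stand-ins.

```python
# Hypothetical pre-deployment gate, run automatically before a model is promoted.
# The helpers and thresholds below are illustrative stand-ins, not real values.

def evaluate_candidate() -> dict:
    """Stand-in for re-evaluating the candidate model on a held-out dataset."""
    return {"auc": 0.85}

def load_latency_samples() -> list:
    """Stand-in for latency measurements (ms) collected from a staging load test."""
    return [120.0, 140.0, 150.0, 180.0, 190.0]

def test_model_meets_quality_bar():
    assert evaluate_candidate()["auc"] >= 0.80, "candidate is below the agreed quality bar"

def test_latency_sla_is_met():
    samples = sorted(load_latency_samples())
    p95 = samples[int(0.95 * (len(samples) - 1))]
    assert p95 <= 200, "p95 latency exceeds the 200 ms SLA"
```

If any check fails, the pipeline stops and the model never reaches consumers.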
Models Cannot Be Easily Monitored in Production¶
As is stated many times throughout this documentation, the production of quality ML services does not end with deployment. The service must be monitored for reliability and performance continually while in production, with processes in place to rectify any of these problems before they impact consumers of the API.
This includes not only the infrastructure required for continuous monitoring, but also the data needed to measure the right things. It is one thing to monitor the accuracy of a model’s predictions, and quite another to measure its downstream impact on clients and customers. To produce optimal services, we need both. While it is the responsibility of the machine learning engineer to decide exactly what to monitor in order to achieve these goals, it is the responsibility of the MLOps engineer to provide the tools and data access needed to implement that decision.
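As a sketch of what the tooling side might look like, a thin logging layer around the prediction path can emit structured events that dashboards aggregate and that can later be joined against business outcomes. The function and field names below are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("prediction_monitor")

def log_prediction(model_version: str, features: dict, score: float) -> None:
    """Emit a structured prediction event for downstream monitoring.

    Hypothetical sketch: the field names are placeholders. Joining these events
    against later business outcomes is what allows measuring downstream impact
    rather than only raw model accuracy.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "score": score,
    }
    logger.info(json.dumps(event))
```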
The Developer Experience is Poor¶
Finally, while it is not a “problem to be solved” in the same sense as the above, the notion of MLOps as a supporting role implies a responsibility to improve the development experience for ML.
We do, though, have to be careful with this bucket of work. MLOps Engineers are developers too, so the developer experience is one they have deep empathy for, and relevant experience with. This can lead to an over-emphasis on this problem, which is especially problematic when we consider the principle from our Purpose of ML document that the customer comes first. Developer experience tasks have a larger degree of separation from customers than most other types of task, and because of that they are inherently of lower priority.
With that caveat in place: there is definitely work to be done here. This could include:
- Wrapping common functionality into an easy-to-use interface, such as a CLI.
- Making it easy to launch development environments on arbitrarily large compute resources, when these are not available locally.
- Automatically handling the mundane and difficult-to-catch problems in software development, such as code linting.
- Creating an environment where it is as difficult as possible to make logical mistakes, such as by making testing fast and easy, or incorporating static type checking into processes.
In general, developer experience is something of a catch-all bucket for enhancements that do not solve any of the preceding problems, but that increase the quality of life and quality of work for machine learning engineers.
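As a small example of the first item in the list above, common workflows could be wrapped in a command-line interface. The sketch below uses Python's argparse; the `mlops` command and its subcommands are hypothetical.

```python
import argparse

# Hypothetical CLI wrapping common ML workflows; command names are placeholders.
def main() -> None:
    parser = argparse.ArgumentParser(prog="mlops", description="Common ML workflows")
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Launch a training run")
    train.add_argument("--experiment", required=True, help="Experiment name to log under")

    deploy = subparsers.add_parser("deploy", help="Deploy a trained model version")
    deploy.add_argument("--model-version", required=True, help="Model version to deploy")

    args = parser.parse_args()
    print(f"Would run '{args.command}' with {vars(args)}")  # stand-in for the real workflow

if __name__ == "__main__":
    main()
```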
Summary¶
The purpose of MLOps is to increase the velocity of development and reliability of performance for machine learning services.