The recent excitement around data science and big data has enabled the development of an extremely rich and dynamic ecosystem around data analysis. Open source tools, which are increasingly easy to use, are enabling many organizations to start analyzing their data. However, the multiplication of data projects and algorithms has also brought to light new problems specific to this type of project, which every data scientist must solve and for which the tools and methods are not yet very mature. For a machine learning model, these issues include deployment, continuous monitoring, integration and test procedures. To address them, new methods, grouped together under the name MLOps, have been developed.
A definition of MLOps
Companies still commonly apply to a data project the same working and collaboration methods as to a classic IT application project. These methods are commonly known as DevOps, a contraction of the words development (Dev) and operations (Ops). DevOps aims to provide a framework with key steps for the development, deployment and monitoring of an application.
However, more and more experts are pointing out the shortcomings of this approach when applied to a data project. This is why the terms DataOps and MLOps, the latter a contraction of Machine Learning and Operations, have recently appeared. MLOps is intended to be an adaptation of DevOps to the specific problems of Machine Learning.
The development of these MLOps methods responds to the growing needs of companies to conduct data projects, by adopting efficient methods for the development, deployment and control of a Machine Learning system.
Adding a layer of complexity to which DevOps is not suited
Let’s consider a machine learning system (a recommendation engine, for example) as a whole, that is, with all the elements necessary for it to work properly. The mathematical model used in this system is a key element: it is one of the two main value-creating elements, along with the data itself.
It also justifies a team of data scientists working on it for several weeks, months or even years. Consequently, during the prototyping phase, the model is the center of attention (because, very often, there is less room for maneuver on the data).
However, putting this model into production requires integrating many other elements that are found in DevOps but must be adapted to a machine learning system. These are the elements the system needs to function properly, including:
- An infrastructure (server + database),
- A dedicated application to run the model (often in the form of an API),
- Automated data pipelines,
- Monitoring and alert mechanisms,
- Etc…
These elements are generally present in DevOps. However, there are several reasons why a simple copy and paste from DevOps to MLOps is not enough.
First of all, several teams must be involved. The first challenge is therefore human and managerial: to succeed in getting teams that are not necessarily used to working together to collaborate, hence the need to develop dedicated working methods.
Secondly, and more importantly, the supervision of a machine learning system is particularly complex. Among the elements to monitor are:
- The state of the system: this point is relatively close to classic DevOps, and consists of monitoring the load and availability of the system.
- Incoming data: if you reuse data that you retrieve online every day, it is essential to ensure that it remains consistent over time. Indeed, it is common for an update to one element of the system to change the way the data is retrieved, or for the data used in your model to evolve over time, as is the case with demographic data for example. In either case, your system may begin to malfunction.
- Predictions made: if your data changes, your predictions will change as well, and the results can become very different from those obtained during the tests. In addition, the predictions may themselves influence the data collected, as is the case with a recommendation system that is based on user behaviors while influencing these same behaviors. In this case, monitoring the predictions is essential to avoid a chain reaction.
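The checks on incoming data and on predictions described above can be sketched with a simple drift measure. What follows is a minimal illustration, not a reference implementation: it uses the Population Stability Index (PSI), one common way to compare the binned distribution of a variable at training time with the one observed in production. The function names, the number of bins, the sample data and the 0.2 alert threshold (a frequently quoted rule of thumb) are assumptions made for this example and do not come from any particular MLOps tool.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples of one numeric variable.

    0.0 means the two binned distributions are identical; larger
    values mean the current sample has moved away from the reference.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped
            # into the first or the last bin.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum(
        (c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares)
    )


def has_drifted(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb alert level
) -> bool:
    """Flag a variable whose PSI exceeds the chosen threshold."""
    return population_stability_index(reference, current) > threshold


# Hypothetical data: ages seen at training time, then the same
# population shifted by 20 years in production.
reference_ages = [float(i % 50) for i in range(1000)]
production_ages = [age + 20.0 for age in reference_ages]

print(has_drifted(reference_ages, reference_ages))   # identical sample
print(has_drifted(reference_ages, production_ages))  # shifted sample
```

The same function can be applied to a feature of the incoming data or to the model's predictions. In practice the threshold would be tuned per variable, and an alert would be raised through whatever monitoring mechanism the system already uses.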
The tools to answer these specific needs
Tools are being developed specifically to meet the needs of MLOps. Once again, MLOps is new: there is not yet a well-defined standard, and the tools change quickly.
For example, there are tools like Metaflow, developed internally at Netflix and open source since 2019. Its goal is to give data scientists a fixed framework that facilitates the integration of their work, without limiting their ability to create complex models. Reading this article may give you some ideas of tools to test if you are currently facing MLOps problems.