This article invites you on a short tour of how to go from exploration to production when working with machine learning models. What are the major stages of an ML model's life cycle? In the last part of the article, we show an example of an architecture based on Docker Compose and hosted in the cloud for deploying your machine learning model.
Go From Exploration to Production
Every use case in data science is a small journey that often requires moving from an unclear operational problem to a formal and precise response. This response will have real impacts on the daily lives of employees or customers. From the beginning, this journey will challenge you to get off the beaten track, because this is a new and fast-evolving field with few standards available. It is quite rare that a successful practice or technology implemented by one company will produce the same benefits for another simply by purchasing the same technologies.
Machine learning projects face uncertainty at every step, from use case identification to delivery, so stay open to new approaches with a view to progressive scale-up. If you are reading this article, it probably means you are looking for ways to push your model one step further. The roadbook in this article is composed of three main stages, with resting points in between to decide whether or not to pursue the journey.
Step 1 - The Proof of Concept (POC)
Your starting point is to clearly define the output and the outcome of your ML project. The output is what your model will predict. The outcome is the purpose behind why you do what you do. During this stage, you frame the business need and develop a first machine learning model to qualify the data. Ideally, you also evaluate the expected business gain and the associated return on investment. Easy to say but hard to do.
Note: Identify the area of benefits that use cases will target (revenue growth or cost reduction) and estimate a range of potential EBITDA.
At the end of this stage, you come up with a freshly created and validated Proof Of Concept.
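Even at the POC stage, it helps to measure a first model against a naive benchmark before claiming any business gain. The sketch below, in plain Python with made-up churn labels, computes the majority-class baseline that any candidate model must beat:

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return the most frequent label: the naive benchmark any model must beat."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy churn labels (1 = churn, 0 = stay) -- illustrative data only.
train = [0, 0, 0, 1, 0, 1, 0, 0]
test = [0, 1, 0, 0]

baseline = majority_baseline(train)
baseline_acc = accuracy([baseline] * len(test), test)
print(f"baseline accuracy: {baseline_acc:.2f}")  # the bar a real model must clear
```

If a first model cannot clearly beat this bar on held-out data, the expected return on investment is probably not there yet.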
Step 2 - Prototyping
If your models are conclusive and business people are confident about the results, you can go ahead and frame a functional pilot in collaboration with IT teams. The purpose now is to qualify your use case functionally and technically. To do so, you start with:
- Defining your target architecture;
- Connecting to data sources;
- Automating data treatment and execution;
- Deploying a monitoring system.
The idea behind this stage is to develop a real application rapidly, collect user feedback and prepare for industrialisation. At the end of this stage, you have an operational and partially automated prototype of your future ML application. From then on, you are ready to hand over the lead responsibility to your IT teams. But before that, you need to clean, package and prepare the code for industrialisation.
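As a rough illustration of the "automating data treatment and execution" and "monitoring" points above, here is a minimal, hypothetical pipeline sketch in Python. The data source, scoring rule and log messages are all stand-ins for real components:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def extract():
    # Stand-in for a real data-source connection (database, API, files).
    return [{"amount": 120.0}, {"amount": None}, {"amount": 80.0}]

def transform(rows):
    # Data treatment: drop incomplete rows so downstream steps stay clean.
    clean = [r for r in rows if r["amount"] is not None]
    log.info("transform kept %d of %d rows", len(clean), len(rows))
    return clean

def predict(rows):
    # Hypothetical scoring step; a real trained model would be called here.
    return [r["amount"] * 0.5 for r in rows]

def run_pipeline():
    # One end-to-end run; a scheduler (e.g. Airflow) would trigger this.
    rows = extract()
    clean = transform(rows)
    scores = predict(clean)
    log.info("pipeline produced %d predictions", len(scores))
    return scores
```

The logging calls are the embryo of the monitoring system: in a prototype, counting rows kept and predictions produced per run is often enough to spot broken data feeds early.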
Step 3 - Packaging & Preparing
To package the code, you need to address two main questions:
Question 1 – Which technologies should you use to deploy your functional pilot?
In terms of technological environment, there are different points to address:
- Programming language: for application and functional development. For example, what language to use for automating data acquisition, exploration and prediction?
- Infrastructure: Cloud or not, private, hybrid or public?
- Open source technology: accepted or forbidden? For example, for dashboarding, system monitoring, etc.
Question 2 – Which methods should you use to deliver the code?
- Code versioning,
- Model deployment options,
- Continuous integration / Continuous delivery (CI/CD),
- Tests strategy: unit, acceptance, integration.
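To make the test strategy concrete, here is a small sketch of unit tests for a preprocessing function, written in the style of pytest. The function and its edge case are illustrative, not taken from a real code base:

```python
def normalise(values):
    """Min-max scale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero, return all zeros.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Unit tests -- a runner such as pytest would discover the test_* functions.
def test_normalise_range():
    assert normalise([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]

def test_normalise_constant_input():
    assert normalise([3.0, 3.0]) == [0.0, 0.0]

if __name__ == "__main__":
    test_normalise_range()
    test_normalise_constant_input()
    print("all tests passed")
```

Unit tests like these are the cheapest layer of the strategy; acceptance and integration tests then exercise the pipeline and the deployed services end to end.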
Once done, it will be time to deal with the automation of processing and monitoring of the whole data pipeline, from extraction to exposure. Usually, this part is handled by IT teams in large organisations.
Example of an Architecture Based on Docker Compose and Hosted in the Cloud
Before showing an example of an architecture to deploy your machine learning model, let us clarify what Docker Compose is and what its characteristics are. In this article, we covered the basics of Docker.
Just a short reminder:
Docker is an application container tool. A container is a lightweight, isolated environment, somewhat like a virtual machine, that executes an application together with its dependencies.
Quite often, executing your ML application means running several different applications together. This is where Docker Compose plays an important role.
Docker Compose is a container orchestration tool. It allows us to launch an architecture made of services, each running inside its own container.
Now, we come back to our case study and set out the following framework:
- Python for application and functional development;
- Infrastructure in the cloud (e.g. AWS);
- Some open-source technologies (e.g. Grafana, Superset, Airflow);
- Git for code versioning;
- Docker for deployment;
- Jenkins for CI/CD.
As we can see above, the infrastructure hosted in the cloud includes storage and compute capabilities. Raw and processed data, along with predictions, are stored on a storage instance. All processing is executed on a compute instance, which also exposes ports and endpoints so that other applications can consume the stored data.
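To illustrate what exposing an endpoint on the compute instance can look like, here is a minimal sketch using only the Python standard library. In a real project you would more likely use a framework such as Flask or FastAPI, and the `predict` function below is a hypothetical stand-in for a trained model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Hypothetical model: averages the inputs, for illustration only.
    return sum(features) / len(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload, score it, and return the prediction as JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        score = predict(payload["features"])
        body = json.dumps({"prediction": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging for this sketch.
        pass

if __name__ == "__main__":
    # Bind on all interfaces so other containers/services can reach the endpoint.
    HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```

Other applications (dashboards, monitoring jobs) can then POST feature payloads to this endpoint and consume the predictions over HTTP.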
Behind the “other applications”, you find dashboarding, monitoring, JupyterLab for interactive computing in Python, and so on.
To facilitate deployment, it is beneficial to use Docker containers to package each application with its dependencies, and Docker Compose to orchestrate all the containers.
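To make this concrete, here is an illustrative `docker-compose.yml` along those lines. The service names, images, ports and volume are placeholders, not a reference configuration:

```yaml
# Illustrative docker-compose.yml -- services, images and ports are placeholders.
version: "3.8"
services:
  ml-app:
    build: .              # Dockerfile packaging the Python application
    ports:
      - "8000:8000"       # prediction endpoint exposed to other services
    volumes:
      - data:/data        # shared storage for raw and processed data
  superset:
    image: apache/superset
    ports:
      - "8088:8088"       # dashboarding
  jupyter:
    image: jupyter/base-notebook
    ports:
      - "8888:8888"       # interactive computing
volumes:
  data:
```

A single `docker compose up` then starts the whole stack, which is exactly the kind of repeatable deployment step that a CI/CD tool such as Jenkins can automate.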
There are tons of other possible architectures and technological environments you might choose to deploy your ML models. Remember, one size does not fit all. Be creative. We hope you enjoyed this article and feel inspired to put your machine learning model into production, making it useful for clients or internal employees!