To help you succeed in your Big Data/Artificial Intelligence project, here is our fourth article on the subject.
If you have been following us since the beginning, you now know how to set up a Data Lab (sometimes spelled Datalab), you know the pitfalls of the proof of concept (POC), and you understand the importance of putting the business vision at the heart of your project.
It’s time to look at the mistakes that prevent a Data Lab project from going into production in a company.
Mistake #1 - Wanting to do everything yourself in your Data Lab projects
To launch its Uber Eats offering, Uber took nearly 18 months to develop its Data & Analytics platform. This shows that, even for a large Silicon Valley company, building its own data platform end to end remains long and complex.
In data science, the necessary technologies are disparate and therefore difficult to assemble, so implementation and maintenance require significant work.
The major risk for a small company is therefore a very late ROI (no use case can be addressed until the platform is in place), which can jeopardize the data project.
Mistake #2 - Operating in Shadow IT mode in a Data Lab project
This is one of the most common problems. “Shadow IT” is the practice of setting up a big data/artificial intelligence project without consulting the IT department first. The IT department is therefore neither informed nor involved, and often ends up blocking the production launch.
The chosen solutions do not always meet the IT department’s criteria for infrastructure quality or IT security, so projects stall just before the production launch.
It is therefore necessary to involve your company’s IT department in the management of your data projects from the beginning.
Mistake #3 - "Bunkerizing" your Data Lake in your Data Lab project
With new regulations such as the GDPR, companies are now very cautious with digital data, especially personal data (on customers, for example). This is reassuring for consumers, but it can hold back a data science project.
Data lakes are increasingly closed (access restrictions, constraints related to the protection of personal data, etc.) to ensure better security, and they do not let much through. Less incoming data means less outgoing data, and therefore fewer use cases implemented. And obviously, no data science initiative is possible without data.
Mistake #4 - Lacking collaboration in your Data Lab projects
As mentioned earlier with the specific case of “shadow IT”, teams often lack coordination in their work. This is regularly observed between the IT team and the teams in charge of the Data Lab.
It is important to realize that the profiles hired come from different cultures, do not use the same tools and do not work in the same way.
Their strategy, management and approach can even be opposed: the Data Science team will favor Agile project management with the business or marketing teams, new digital applications and a test & learn approach, while the IT team relies on stricter standards and processes for issues such as IT security in the production environment.
In some cases, this can even lead to developers completely rewriting the data scientists’ code, which again is a huge waste of time.
Mistake #5 - Following artisanal approaches in your Data Lab projects
From experimentation to actual deployment in the production environment, techniques and solutions differ. We can distinguish between the technologies typically used for data science and those widely used in production.
For example, Python has very advanced modeling libraries (such as scikit-learn) that have no direct equivalent in a technology like Java.
This complicates:
- scaling up,
- the reproducibility of the work for the rest of the company.
These are five of the main mistakes in managing data projects and setting up a Data Lab that prevent the majority of big data/artificial intelligence applications from reaching production.
If you want to find out how our customers have avoided these pitfalls with our DataOps platform, feel free to test it for free!