Best R libraries for data science
R is a language created to manipulate data. This means that R offers many built-in tools and functions, such as data frames, vectors, matrices and decision trees, out of the box to cover your basic data science and even machine learning needs. Considering this, it might be tempting to […]
How to industrialize a project with the GitHub CI / CD?
This week, Julien Fricou, Data Engineer at Saagie, was able to ask Alain Hélaïli, Principal Solutions Engineer at GitHub, about the use of the platform. The GitHub CI / CD will hold no secrets for you! How to create a workflow (YAML file, graphical editor)? With GitHub Actions, we decided that the philosophy would be […]
How to Easily Schedule Jobs with Apache Airflow?
This article is intended for both Airflow beginners and veterans and aims to present the fundamental objects of this technology as well as its interfacing with Saagie’s DataOps platform. We are not going to explain again how to create Directed Acyclic Graphs (commonly called DAGs) or how to schedule them. Indeed, there […]
How to Manage Machine Learning Deployment?
In this article, you will learn how to deploy Machine Learning in an Agile way to support your data projects. Here are 5 steps to keep in mind when addressing this kind of project. Machine Learning Deployment Should be Managed as a Project When we think about Machine Learning deployment, we often think just about the […]
What is Natural Language Processing?
Have you ever wondered how your phone could possibly be able to understand what you are saying? Has this brainless pile of metal and plastic acquired the ability to talk with humans? If you have already spent time playing with Siri, OK Google or Cortana, trying to fool them with some convoluted questions, you got an […]
How to Deploy a Machine Learning Model?
This article invites you on a short tour of how to go from exploration to production when working with Machine Learning models. What are the major stages of an ML model's life cycle? In the last part of the article, we will show an example of architecture based on Docker Compose and hosted in the cloud to deploy your […]
Put Open Source in your Data Projects
It has become impossible to talk about Data without mentioning open source. Just take a look at the different platforms that offer Big Data solutions, the vast majority of which are open-source oriented. For good reason, technologies such as Cassandra, Hadoop, Apache Spark, Talend and many others now offer high-quality services for building […]
How to Grow a Culture of Code Quality?
Code is everywhere: in IT and operational technology, and nowadays in all digital interactions. Code powers all the routines of our lives, from the tiniest to the biggest. Trusting our lives to code brings forward new challenges, in particular reliability and transparency. Producing code that is robust and easy to maintain is first and […]
What is Data Visualization? (Including 6 of the Best Tools)
This article will explain what Data Visualization is and present a selection of 6 tools we trust to make your own data rock! Humankind has produced as much information in two days as it had in two million years. This phenomenon, resulting from the convergence of digital technologies and telecommunication networks, […]
How to Extract Data From a Document?
Truth is, we’re not good enough yet. Not all documents are standardized, and we lose a great amount of time reading them, finding the information we need and putting it into our databases: bills, legal papers, ID cards… Extracting data from a document isn’t as simple as it seems. If the documents, from one […]