Modulos v1.0.0: Introducing Data-Centric AI

Written by:
Claudio Bruderer (Head of Product at Modulos)

We have been busy over the past few weeks since the last release of the Modulos platform, and it is finally time to unveil what we have been working on: Modulos v1.0.0 is now available! With this version, we are entering the era of Data-Centric Artificial Intelligence (Data-Centric AI). Instead of focusing solely on the training of Machine Learning (ML) models, we are putting the data and its quality at the center of attention. We are convinced that good data yields great ML Solutions.

We are celebrating the launch of our latest version on Wednesday, June 8th, 2022. Book your spot via the following link (launch video)!


The Seven Simple Data-Centric AI Steps

User journey when applying the Data-Centric AI approach.

In just seven steps you can go from having dirty datasets to iteratively improving the data quality for ML and deploying top-performing ML Solutions (an ML Solution is the combination of the data pipeline, the trained feature extractors, and the ML models). These seven steps are:

  1. Upload Data: Upload your dirty training and validation datasets onto the platform. The platform automatically performs checks on the datasets and infers helpful statistics.
  2. Find and Train ML Solutions: Define your ML task and launch it. The Modulos Automated Machine Learning (AutoML) feature takes care of the rest. It looks for and finds the best ML Solutions and trains them.
  3. Select ML Solution: Out of the range of trained ML Solutions, pick the Solution that best addresses your requirements (scores, speed, size etc.). Use it to further improve the quality of your training dataset and thus the performance of your ML Solution.
  4. Assess Performance: Before improving your data quality, benchmark the performance of your Solution and compute additional scores.
  5. Improve Data Quality: Define the objective you want to reach by improving the data quality (e.g. boost the accuracy or the fairness). The Data Quality Management feature then identifies the samples in your dataset that negatively affect reaching the objective. It yields a list of prioritized cleaning recommendations to efficiently improve the data quality.
  6. Retrain ML Solution: Use the cleaned version of your dataset to retrain and update your Solution. Repeat steps 4 to 6 until you either meet your requirements or have unlocked the maximum potential of your data.
  7. Deploy ML Solution: Once you are happy with your Solution, download it and easily deploy and integrate it into your services.
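
Step 5 above can be pictured with a small leave-one-out experiment. This is a minimal sketch and not the platform's implementation: it assumes a toy 1-nearest-neighbour classifier and made-up one-dimensional data, and the names predict_1nn, accuracy, and rank_harmful_samples are hypothetical. Each training sample is scored by how much the validation accuracy changes when that sample is removed, and the samples whose removal helps the most are listed first.

```python
def predict_1nn(train, x):
    """Predict the label of x from its nearest training sample (1-NN)."""
    nearest = min(train, key=lambda sample: abs(sample[0] - x))
    return nearest[1]


def accuracy(train, validation):
    """Fraction of validation samples that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in validation)
    return hits / len(validation)


def rank_harmful_samples(train, validation):
    """Return (index, gain) pairs, largest gain first.

    gain = validation accuracy without the sample minus accuracy with it,
    so a positive gain marks a sample that hurts the objective.
    """
    baseline = accuracy(train, validation)
    gains = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        gains.append((i, accuracy(reduced, validation) - baseline))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data as (feature, label) pairs. Sample 2 is deliberately mislabeled.
train = [(0.0, 0), (1.0, 0), (1.5, 1), (4.0, 1), (5.0, 1)]
validation = [(0.5, 0), (1.4, 0), (1.6, 0), (4.5, 1)]

# The first entry is the top cleaning recommendation.
ranking = rank_harmful_samples(train, validation)
```

In this toy setup the mislabeled sample ends up at the top of the ranking, so only that one sample needs to be reviewed. Retraining once per sample does not scale to real datasets; it is used here only because it is the simplest way to show the idea of prioritized cleaning recommendations.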

Screenshot of a dashboard for an ML Solution. In this example, we are using the platform recommendations to improve the fairness of our dataset with regard to “gender”. The curves show how the “Accuracy” and the “Equalized Odds” (fairness metric) evolve.
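
To make the “Equalized Odds” metric from the caption above more concrete: a classifier satisfies equalized odds when its true positive rate and its false positive rate are the same for every group. The sketch below measures the violation as the largest between-group gap in either rate. How the platform computes and scales its own score is not stated here, so this definition, the function names rate and equalized_odds_gap, and the toy data are assumptions for illustration only.

```python
def rate(y_true, y_pred, actual):
    """Share of samples with true label `actual` that were predicted positive.

    actual=1 gives the true positive rate, actual=0 the false positive rate.
    """
    preds = [p for t, p in zip(y_true, y_pred) if t == actual]
    return sum(preds) / len(preds)


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group gap in true positive or false positive rate.

    Assumes binary labels (0/1) and that every group contains at least
    one positive and one negative sample.
    """
    tprs, fprs = [], []
    for g in sorted(set(groups)):
        t = [yt for yt, gr in zip(y_true, groups) if gr == g]
        p = [yp for yp, gr in zip(y_pred, groups) if gr == g]
        tprs.append(rate(t, p, 1))
        fprs.append(rate(t, p, 0))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy example with two groups, "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = equalized_odds_gap(y_true, y_pred, groups)  # 0.0 would mean no gap
```

Here the model is perfect on group "a" but makes one false negative and one false positive on group "b", so both rates differ by 0.5 between the groups.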

Unlocked Use Cases

The Modulos Data-Centric AI platform with its three core features (AutoML, Data Quality Management, retraining Solutions) unlocks the following use cases for you:

  • Find and train state-of-the-art ML Solutions by using AutoML without requiring any ML domain expertise.
  • Assess, control, and improve the data quality of your ML datasets with the Data Quality Management feature. It guides you to figure out what “good data” is for your specific application to ensure you get the best possible Solutions.
  • Efficiently identify and clean wrongly labeled samples (e.g. if a bad but cheap method was used to assign training labels). There is no need to check all the samples; just review the ones that have a negative impact.
  • Retrain your ML Solutions whenever you have new or improved data (e.g. when fresh training data becomes available).

Other Improvements and Fixes

  • Added seven new Data Science modules (ML models: Logistic Regression & Multilayer Neural Network models; Objectives: Equalized Odds (fairness); Data Quality Management modules: Shapley Values, Influence Function, Missing Label Detection).
  • Enabled the upload of datasets with missing entries. The same imputation is also applied automatically to the data when generating predictions.
  • Allowed the use of custom validation datasets instead of the default on-platform training-validation dataset split.
  • Redesigned all charts on the platform and refined the look and feel of the navigation sidebar.
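
The item on missing entries in the list above rests on one idea: whatever fills the gaps in the training data must fill them in the same way at prediction time. The following is a minimal sketch of that idea, assuming per-column mean imputation. The imputation strategy the platform actually uses is not stated here, and the names fit_imputer and apply_imputer are hypothetical.

```python
def fit_imputer(rows):
    """Learn one fill value per column: the mean of the observed entries."""
    fill = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        fill.append(sum(observed) / len(observed))
    return fill


def apply_imputer(rows, fill):
    """Replace missing entries (None) with the learned fill values."""
    return [[fill[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Training data with missing entries marked as None.
train_rows = [[1.0, 10.0], [3.0, None], [None, 30.0]]
fill_values = fit_imputer(train_rows)  # learned once, from training data only
train_clean = apply_imputer(train_rows, fill_values)

# Data arriving at prediction time reuses the stored fill values.
new_rows = [[None, 5.0]]
new_clean = apply_imputer(new_rows, fill_values)
```

Keeping the fill values separate from the step that applies them is what allows the same imputation to travel with the Solution: the values are computed from the training data and then reused unchanged for every prediction request.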

All our Modulos customers receive our regular releases and updates. If you want to become a customer and use our platform too, sign up via the following link!