May 23, 2022

Modulos v1.0.0: Introducing Data-Centric AI

Written by:
Claudio Bruderer (Head of Product at Modulos)

We have been busy in the weeks since the last release of the Modulos platform, and it is finally time to unveil what we have been working on: Modulos v1.0.0 is available now! With this version, we are entering the era of Data-Centric Artificial Intelligence (Data-Centric AI). Instead of focusing solely on training Machine Learning (ML) models, we are putting the data and its quality at the center of attention. We are convinced that good data yields great ML Solutions.

We are celebrating the launch of our latest version on Wednesday, June 8th, 2022. Book your spot here (launch video)!

The Seven Simple Data-Centric AI Steps

User journey when applying the Data-Centric AI approach.

In just seven steps, you can go from dirty datasets to iteratively cleaned data and deployed, top-performing ML Solutions (an ML Solution combines the data pipeline with the trained feature extractors and ML models). These seven steps are:

1. Upload Data: Upload your dirty training and validation datasets to the platform. The platform automatically runs checks on the datasets and computes helpful statistics.
2. Find and Train ML Solutions: Define your ML task and launch it. The Modulos Automated Machine Learning (AutoML) feature takes care of the rest: it searches for the best ML Solutions and trains them.
3. Select ML Solution: From the trained ML Solutions, pick the one that best meets your requirements (scores, speed, size, etc.). You will use it to improve the quality of your training dataset and, in turn, the performance of your ML Solution.
4. Assess Performance: Before improving your data quality, benchmark the performance of your Solution and compute additional scores.
5. Improve Data Quality: Define the objective you want to reach by improving the data quality (e.g. boosting accuracy or fairness). The Data Quality Management feature then identifies the samples in your dataset that hinder reaching this objective and produces a prioritized list of cleaning recommendations, so you can improve the data quality efficiently.
6. Retrain ML Solution: Use the cleaned version of your dataset to retrain and update your Solution. Repeat steps 4 to 6 until you either meet your requirements or have unlocked the maximum potential of your data.
7. Deploy ML Solution: Once you are happy with your Solution, download it, then deploy and integrate it into your services.
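The post does not describe how Data Quality Management computes its cleaning recommendations internally. As a hedged illustration of the Improve Data Quality step, the sketch below estimates Monte Carlo data Shapley values (Shapley Values are one of the data-valuation modules named in this release) for a toy 1-nearest-neighbour classifier scored on validation accuracy, then lists the samples with a negative value, most harmful first. The classifier, the accuracy objective, and all function names are assumptions made for this example, not the Modulos API.

```python
import random


def one_nn_accuracy(train, valid):
    """Validation accuracy of a 1-nearest-neighbour classifier fitted on `train`.

    Each sample is a (features, label) pair; an empty training set scores 0.0.
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        # Predict the label of the closest training point (squared Euclidean distance).
        _, label = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        correct += label == y
    return correct / len(valid)


def monte_carlo_data_shapley(train, valid, n_permutations=200, seed=0):
    """Estimate each training sample's Shapley value for validation accuracy
    by averaging its marginal contribution over random orderings."""
    rng = random.Random(seed)
    values = [0.0] * len(train)
    order = list(range(len(train)))
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset = []
        prev_score = 0.0  # score of the empty subset
        for i in order:
            subset.append(train[i])
            score = one_nn_accuracy(subset, valid)
            values[i] += score - prev_score
            prev_score = score
    return [v / n_permutations for v in values]


def cleaning_recommendations(train, valid, **kwargs):
    """Indices of training samples with a negative estimated value, most harmful first."""
    values = monte_carlo_data_shapley(train, valid, **kwargs)
    harmful = [i for i, v in enumerate(values) if v < 0]
    return sorted(harmful, key=lambda i: values[i])


if __name__ == "__main__":
    # Two 1-D clusters; sample 2 sits in the label-0 cluster but is labeled 1.
    train = [((0.0,), 0), ((0.3,), 0), ((0.15,), 1), ((1.0,), 1), ((1.3,), 1)]
    valid = [((0.1,), 0), ((0.25,), 0), ((1.1,), 1), ((1.25,), 1)]
    print(cleaning_recommendations(train, valid))
```

Within each permutation the marginal contributions add up to the accuracy of the full training set, which makes the estimates easy to sanity-check. Recomputing the model for every prefix is only affordable for toy data; real datasets and models need much cheaper approximations.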

Screenshot of a dashboard for an ML Solution. In this example, we are using the platform recommendations to improve the fairness of our dataset with regard to “gender”. The curves show how the “Accuracy” and the “Equalized Odds” (fairness metric) evolve.
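Equalized Odds asks that a classifier's true-positive rate and false-positive rate be the same for every group of a sensitive attribute such as “gender”. The post does not state how the platform turns this into a single score, so the sketch below assumes one common convention: the largest gap in either rate between any two groups, where 0 means the criterion holds exactly. The function names and the binary 0/1 label encoding are illustrative assumptions.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group (true-positive rate, false-positive rate) for binary 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        positives, negatives = c["tp"] + c["fn"], c["fp"] + c["tn"]
        if positives == 0 or negatives == 0:
            raise ValueError(f"group {g!r} needs both positive and negative samples")
        rates[g] = (c["tp"] / positives, c["fp"] / negatives)
    return rates


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest TPR or FPR gap between any two groups (0.0 means equalized odds holds exactly)."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Group A: TPR 1.0, FPR 0.0. Group B: TPR 0.5, FPR 0.5.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A"] * 4 + ["B"] * 4
print(equalized_odds_difference(y_true, y_pred, groups))  # → 0.5
```

Tracking this value next to accuracy while cleaning data is what lets a dashboard show the trade-off between the two curves.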

Unlocked Use Cases

The Modulos Data-Centric AI platform with its three core features (AutoML, Data Quality Management, retraining Solutions) unlocks the following use cases for you:

Find and train state-of-the-art ML Solutions by using AutoML without requiring any ML domain expertise.
Assess, control, and improve the quality of your ML datasets with the Data Quality Management feature. It helps you work out what “good data” means for your specific application, so that you get the best possible Solutions.
Efficiently identify and clean wrongly labeled samples (e.g. when a cheap but unreliable method was used to assign training labels). Instead of checking every sample, you only review the ones that have a negative impact.
Retrain your ML Solutions whenever you have new or improved data (e.g. when you have fresh training data).
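The post does not say how wrongly labeled samples are detected. One simple, widely used heuristic, shown here purely as a hypothetical stand-in and not as the platform's Missing Label Detection module, flags samples whose label disagrees with most of their nearest neighbours. A reviewer then inspects only a short, prioritized queue instead of the whole dataset. The function names, `k`, and `threshold` are assumptions for illustration.

```python
def knn_label_disagreement(samples, k=3):
    """For each (features, label) sample, the fraction of its k nearest
    neighbours (itself excluded) that carry a different label."""
    if len(samples) <= k:
        raise ValueError("need more than k samples")
    scores = []
    for i, (x, y) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        # Sort the remaining samples by squared Euclidean distance to x.
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        neighbours = others[:k]
        scores.append(sum(label != y for _, label in neighbours) / len(neighbours))
    return scores


def review_queue(samples, k=3, threshold=0.5):
    """Indices whose disagreement exceeds `threshold`, most suspicious first."""
    scores = knn_label_disagreement(samples, k)
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return sorted(flagged, key=lambda i: scores[i], reverse=True)


# Sample 3 lies inside the label-0 cluster but carries label 1.
samples = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((0.15,), 1),
           ((1.0,), 1), ((1.1,), 1), ((1.2,), 1), ((1.3,), 1)]
print(review_queue(samples))  # → [3]
```

Unlike the impact-based ranking the post describes, this heuristic ignores the model and the objective; it only looks at label consistency in feature space, which is why it is a cheap first filter rather than a replacement.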

Other Improvements and Fixes

Added seven new Data Science modules (ML models: Logistic Regression and Multilayer Neural Network; objectives: Equalized Odds (fairness); Data Quality Management modules: Shapley Values, Influence Function, Missing Label Detection).
Enabled the upload of datasets with missing entries. The same imputation is automatically applied to the data when generating predictions.
Added support for custom validation datasets as an alternative to the default on-platform training/validation split.
Redesigned all charts on the platform and refined the look and feel of the navigation sidebar.
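The post only states that the imputation used on the training data is reused when generating predictions. The sketch below shows why that matters, assuming simple per-column mean imputation (the strategy the platform actually uses is not stated): fill values are learned once from the training rows and then applied unchanged to new rows, rather than being recomputed from each prediction batch. The row format and function names are hypothetical.

```python
from typing import Dict, List, Optional

# A row maps column name -> numeric value, with None marking a missing entry.
Row = Dict[str, Optional[float]]


def fit_mean_imputer(rows: List[Row]) -> Dict[str, float]:
    """Learn one fill value per column: the mean of its observed training entries."""
    columns = sorted({c for row in rows for c in row})
    means = {}
    for c in columns:
        observed = [row[c] for row in rows if row.get(c) is not None]
        if not observed:
            raise ValueError(f"column {c!r} has no observed values to impute from")
        means[c] = sum(observed) / len(observed)
    return means


def apply_imputer(rows: List[Row], means: Dict[str, float]) -> List[Dict[str, float]]:
    """Fill missing entries with the stored training means; nothing is refitted."""
    filled = []
    for row in rows:
        filled.append({c: row[c] if row.get(c) is not None else m for c, m in means.items()})
    return filled


train = [{"age": 30.0, "income": None},
         {"age": None, "income": 50.0},
         {"age": 40.0, "income": 70.0}]
means = fit_mean_imputer(train)                               # {"age": 35.0, "income": 60.0}
print(apply_imputer([{"age": None, "income": 10.0}], means))  # → [{'age': 35.0, 'income': 10.0}]
```

Reusing the stored means keeps training and prediction inputs consistent, and it still works when a prediction batch contains a single row.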

All Modulos customers receive our regular releases and updates. If you want to become a customer and use our platform too, sign up here!
