Dec 19, 2022

Modulos v1.1.2: Data Insights & Cloud Integrations

Written by:
Claudio Bruderer (Head of Product at Modulos)

The latest version of the Modulos Platform, Modulos v1.1.2, is out! This software release adds several exciting new features. They focus on giving you more insights into your datasets and trained Machine Learning Solutions. This version also includes new tools to review and edit data records recommended for cleaning to further strengthen our Data-Centric Artificial Intelligence approach. Lastly, amongst other improvements, we are adding integrations to different cloud platforms for a more seamless dataset import.

Dataset Analyses and Insights

*Screenshot of the distributions of individual feature values in a dataset. They allow you to easily explore your data and quickly draw preliminary insights.*

Data is the key ingredient for Machine Learning (ML). This idea lies at the core of our Data-Centric AI (DCAI) approach, first introduced with Modulos v1.0.0. DCAI aims to yield good and fair ML Solutions by putting the focus on the quality of your data. It gives you the tools to assess your datasets and to identify the flaws that keep your Solutions from reaching a desired outcome (e.g., decreasing the discrimination of an ML Solution).

To improve data quality effectively, it is crucial to understand your datasets. This is why the platform now enables you to analyze your datasets by computing various statistics and plots, which allow you to visually and quantitatively assess the distributions of feature values. It also alerts you to potential data issues (e.g., a significant number of empty values or large skewness). Furthermore, the Modulos Platform computes the correlation matrix of all the numerical features and highlights the pairs showing strong (anti-)correlations. These pairs can also be investigated more carefully by inspecting the scatter plots.
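To make these checks concrete, here is a minimal sketch of how the fraction of empty values, the skewness, and strongly (anti-)correlated feature pairs could be computed by hand. It is not the platform's implementation: the function names, the use of `None` for empty values, the population skewness formula, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math

def missing_fraction(values):
    """Fraction of entries that are None (treated here as 'empty')."""
    return sum(v is None for v in values) / len(values)

def skewness(values):
    """Population skewness (third standardized moment), ignoring None."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(vx * vy)

def strong_pairs(columns, threshold=0.8):
    """List (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                found.append((a, b, r))
    return found

# Hypothetical toy dataset: one list of values per numerical feature.
columns = {
    "age": [20, 30, 40, 50, None],
    "income": [2.0, 3.0, 4.0, 5.0, 6.0],
    "debt": [5.0, 4.0, 3.0, 2.0, 1.0],
}
for a, b, r in strong_pairs(columns):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

In this toy data every pair is flagged: `income` rises with `age`, while `debt` falls as either of them grows. In a real dataset, the flagged pairs are the ones worth inspecting in a scatter plot.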

Analyze Data Quality Flaws

*Interactively investigate data records with a negative and a positive impact on the fairness of the trained ML Solution.*

Another important ingredient in the DCAI journey is the data-model feedback loop that allows for iterative data quality improvements. Once you have defined an improvement goal (e.g., improving the accuracy metric of your Solution), the Modulos Platform provides tools to identify which data records have a negative impact and which have a positive impact on reaching that objective. By then addressing these flaws and/or acquiring more good data, the data quality is improved and the ML models can be retrained. These steps are repeated until either the objective is satisfied or the performance plateaus.
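The control flow of such a feedback loop can be sketched in a few lines. This is only an illustration of the stopping logic described above, not the platform's code: `train_and_score` and `clean` are hypothetical callables you would supply, and `min_gain` and `max_rounds` are assumed parameters that define what counts as a plateau.

```python
def improve_until_plateau(dataset, train_and_score, clean, target,
                          min_gain=1e-3, max_rounds=10):
    """Alternate training and data cleaning until the target score is
    reached or the score stops improving by at least min_gain."""
    best = train_and_score(dataset)
    history = [best]
    for _ in range(max_rounds):
        if best >= target:
            break  # objective satisfied
        candidate = clean(dataset)
        score = train_and_score(candidate)
        if score - best < min_gain:
            break  # performance plateau: keep the previous dataset
        dataset, best = candidate, score
        history.append(best)
    return dataset, history

# Toy stand-ins: each record is True if its label is correct. The "score"
# is the fraction of correct labels; "cleaning" drops one wrong record.
def toy_score(records):
    return sum(records) / len(records)

def toy_clean(records):
    cleaned = list(records)
    if False in cleaned:
        cleaned.remove(False)
    return cleaned

final, history = improve_until_plateau(
    [True, True, False, False], toy_score, toy_clean, target=0.9
)
print(final, history)
```

Here the loop stops after two cleaning rounds because the toy score reaches the target. With a target that cannot be reached, it would stop as soon as a round fails to improve the score.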

Beyond assessing the impact of data records, the latest version of the Modulos Platform now also allows you to investigate these samples and understand what sets them apart. As shown in the animation above, simply select a subset of the data in a relevant portion of the curve, which shows the samples ranked by their impact, and study shifts in the distributions of feature values. This is useful for gaining insights into systematic data quality issues. It also allows you to characterize data records with a positive impact, which you could use as an input for synthetic data generation pipelines.
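As a rough illustration of this kind of analysis, the sketch below ranks records by an impact score, selects the most harmful ones, and measures how far the mean of a feature in that subset is shifted from the full dataset. The impact scores are assumed to be given (how the platform computes them is not covered here), and the function names and the use of a simple mean shift are illustrative choices.

```python
from statistics import mean

def rank_by_impact(records, impacts):
    """Return records ordered from most negative to most positive impact."""
    order = sorted(range(len(records)), key=lambda i: impacts[i])
    return [records[i] for i in order]

def mean_shift(subset, full, feature):
    """Subset mean minus full-dataset mean for one numerical feature."""
    return mean(r[feature] for r in subset) - mean(r[feature] for r in full)

# Hypothetical records and impact scores (negative = harmful to the goal).
records = [{"age": 20}, {"age": 60}, {"age": 30}, {"age": 70}]
impacts = [0.2, -0.5, 0.1, -0.3]

ranked = rank_by_impact(records, impacts)
worst = ranked[:2]  # the two records with the most negative impact
print(mean_shift(worst, records, "age"))  # prints 20
```

A large shift like this one hints at a systematic issue: in the toy data, the harmful records are concentrated among older individuals.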

Lastly, for quick experimentation, this release provides an edit functionality to correct wrong label values. Simply review the prioritized list of data records with a negative impact and address potential sources of noise, error, and bias by amending the label values. Then, save the new dataset and automatically trigger the retraining of your ML Solution.

Dataset Import: Cloud Integrations

For this release of the Modulos Platform, we have also significantly extended the dataset import options. We have added integrations with cloud data storage sources such as Azure Blob Storage, AWS S3, and Git LFS. You can now directly import a dataset stored on one of those systems using a presigned or SAS URL, which streamlines the dataset import.
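For context, a presigned URL (AWS S3) or SAS URL (Azure Blob Storage) carries its authorization in the URL itself, so anyone holding it can fetch the object with a plain HTTP GET until it expires. The sketch below shows what fetching such a URL looks like with the Python standard library. It does not show how the platform performs the import, and the function name `download_dataset` and the chunk size are assumptions made for the example.

```python
import shutil
import urllib.request
from pathlib import Path

def download_dataset(url, destination, chunk_size=1024 * 1024):
    """Stream the object behind `url` to `destination` in chunks and
    return the number of bytes written. No extra credentials are sent:
    a presigned or SAS URL already contains the signed access token."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return destination.stat().st_size
```

Streaming in chunks keeps memory usage flat even for large datasets. You would call it as `download_dataset(presigned_url, "dataset.tar")`, where `presigned_url` is a URL generated by your storage provider.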

Other Improvements and Fixes

In addition to the data insights, we have also added various plots on the performance of trained ML Solutions to the platform.
We have changed the handling and encoding of text features with a large fraction of unique values in tabular datasets for more robustness.
For the latest version of the Modulos Platform, we have included additional validity checks of the software license when performing various actions.
We have fixed a display issue on the Solution Dashboards that caused empty scores to be interpolated instead of being properly denoted as empty values.

Are you excited by all these new features? Are you ready to extract the full value out of your data and ML use case by using Data-Centric AI? Contact us and request a demo today!
