AutoML v.0.4.1: Interactive Insights Plots and About Pages

Written by:
Claudio Bruderer (Product Manager at Modulos)

Modulos AutoML version 0.4.1 is out. With this version, we are adding several new features and enhancing existing ones. The focus of the latest AutoML version is to give you more insight into the platform and the trained Solutions. The newest additions include interactive confusion matrix plots and feature importance analyses in the Solutions. There are also enhancements to the dataset upload and new information pages about the software.

Data Science Updates

Interactive Confusion Matrix Plots

Interactive confusion matrix for a classification model trained on the MNIST dataset.

The Solution is the downloadable output of the Modulos AutoML platform. Besides the trained machine learning model, it contains documentation on its deployment as well as various plots. These plots give you more insight into the performance of the model.

One of these plots is the confusion matrix. It is an essential plot for classification tasks, as it summarizes the correctly and incorrectly classified samples by the model. The newest version of Modulos AutoML now also contains an interactive form of this plot. It is helpful for cases where a label has a large number of categories. Furthermore, it looks nice!
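To illustrate what a confusion matrix summarizes, here is a minimal sketch (not the platform's code) that builds one for a small three-class example using plain NumPy:

```python
import numpy as np

# True labels and model predictions for a small three-class example
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Entry [i, j] counts samples of true class i predicted as class j
n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

print(cm)
# The diagonal counts correctly classified samples; off-diagonal
# entries show which classes the model confuses with each other.
```

With many categories, such a matrix quickly becomes hard to read as a static image, which is exactly where an interactive version pays off.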

Permutation Feature Importance

Example Permutation Feature Importance plot for a classification task predicting the occurrence of diabetes in women.

While model performance is crucial, it is also important to understand which input parameters most influence a prediction. Answering such questions is part of the subfield of machine learning interpretability.

Striving for fully interpretable ML models, we introduce the Permutation Feature Importance plot in the latest version of Modulos AutoML. For a limited set of datasets and ML workflows, the plot shows the importance of each input feature. We randomly shuffle the values of an individual feature and assess the impact of the shuffling on the prediction performance. The higher the resulting prediction error, the more important the corresponding feature. This plot makes the ML model more interpretable and allows you to focus on the crucial features during data preparation.
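The shuffling procedure described above can be sketched in a few lines. This is an illustrative toy example, not the platform's implementation: the "model" is a fixed threshold rule standing in for a trained classifier, and the data is synthetic with only the first feature carrying signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only feature 0 determines the binary label
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

def predict(X):
    """Stand-in for a trained model: thresholds feature 0."""
    return (X[:, 0] > 0).astype(int)

baseline = np.mean(predict(X) == y)  # accuracy on unperturbed data

# Permutation feature importance: shuffle one column at a time and
# measure how much the prediction accuracy drops.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - np.mean(predict(X_perm) == y))

print(importances)
```

Shuffling the informative feature 0 destroys the model's accuracy, yielding a large importance score, while shuffling the noise features leaves predictions unchanged and their importance near zero.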

Dataset Structure File Upgrade

To upload your data and train ML models using the latest version of AutoML, the data needs to include the dataset structure file (DSSF). This file describes the structure of your dataset and allows you to easily upload a collection of different tables and/or images. The dataset structure file also allows you to specify the exact properties of your features. These are otherwise inferred by the platform.

We have restructured and refined the DSSF to make it easier for you to define all the feature types. These changes are fully backwards compatible, so the platform continues to work seamlessly with datasets using the previous DSSF version.

Platform Updates

About Pages

About Pages as displayed in the latest AutoML version.

Another new feature of the latest AutoML version is the About Pages. They describe the legal aspects of the platform and the downloadable Solutions, the current state of your AutoML license, and our company. The About Pages now give you more insight into Modulos and our products in addition to the existing documentation materials.

Other Improvements and Bug Fixes

We have additionally made a range of other improvements and fixes to AutoML, including:

  • Fully redesigned the schema matching pipeline. This pipeline is the core component of our platform: it infers the applicable combinations of feature extractors and models (ML modules) for any of your ML problem statements.
  • Refined a few minor aspects of the workflow creation, such as the text on the navigation buttons and the sorting in the applicable ML modules popup.
  • Adapted all search fields on the platform to be case-insensitive, leading to a more intuitive user experience.
  • Improved the database commands, which no longer require the backend Docker container to be running.
  • Increased the robustness of the ML model training by preventing rare memory errors. These occurred for specific Student's t-test settings used in combination with XGBoost or random forest models.