AutoML v.0.4.3: Resource Optimization and More Time Series

Written by:
Claudio Bruderer (Product Manager at Modulos)

Modulos AutoML v.0.4.3 is out! With this version, you can configure how long your Machine Learning workflows run to suit your needs. We are also significantly extending the capabilities of the Time Series workflow. Furthermore, we include a range of other enhancements and refinements.


Configurable Resource Optimization Strategy

Screenshot of the step in the workflow creation process, where the autopausing settings can be configured for a workflow.

To find the best ML models, we need to search the space of different ML model types, model architectures, and hyperparameter combinations. This space of possible Solutions is typically high-dimensional and can even be infinite. Hence, finding and training Solutions is best done systematically and in an automated fashion.

Modulos AutoML efficiently searches this space of possible combinations. Due to the nature of this task though, there can generally be no guarantee of finding the best Solution. This is why, in AutoML v.0.3.4, we introduced automatic pausing of ML workflows whose scores have not improved for a while.

We take it one step further with this release. You are now able to configure workflows with your own autopausing preferences. Should the workflow pause after not improving for X Solutions? Do you want it to pause after reaching a certain score? Do you have a fixed budget, so it should pause after having computed X Solutions? Or do you want it to never pause automatically? You can now make this choice yourself when creating a workflow!
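To make the three pausing criteria concrete, here is a toy sketch of such a search loop. The function name, signature, and return values are illustrative assumptions, not the Modulos API: it pauses after `patience` Solutions without improvement, on reaching `target_score`, or after a fixed budget of `max_solutions` (where `None` disables a criterion).

```python
# Toy sketch of configurable autopausing for a model search loop.
# All names here are hypothetical; this is not Modulos AutoML code.
def search_solutions(evaluate, patience=10, target_score=None, max_solutions=None):
    best, since_improvement, evaluated = float("-inf"), 0, 0
    while True:
        score = evaluate()          # train and score the next candidate Solution
        evaluated += 1
        if score > best:
            best, since_improvement = score, 0
        else:
            since_improvement += 1
        if target_score is not None and best >= target_score:
            return best, evaluated, "target score reached"
        if max_solutions is not None and evaluated >= max_solutions:
            return best, evaluated, "budget exhausted"
        if since_improvement >= patience:
            return best, evaluated, "no improvement"

# Simulated search: scores stop improving after the second Solution.
scores = iter([0.51, 0.63, 0.60, 0.62, 0.61])
best, n, reason = search_solutions(scores.__next__, patience=3)
print(best, n, reason)  # 0.63 5 no improvement
```

Disabling every criterion corresponds to the "never pause automatically" setting; the loop then runs until stopped manually.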


Multi-Step Time Series Forecasting

With the last release, Modulos AutoML v.0.4.2, we introduced Time Series ML workflows. This workflow type allows you to leverage the time dependency of your data and build ML models to tackle all sorts of forecasting challenges (e.g. forecasting growth, demand, or supply).

In the previous release, the forecasting model was limited to predicting the immediate next time step. Now you can forecast further into the future and predict multiple steps. The platform builds and trains the ML models (Solutions) accordingly and optimizes them to perform well across all forecasted steps.

In the figure above, we illustrate this new feature using the bike sharing use case we presented here. In this example, a company renting out bicycles wants to forecast its demand for bikes one week from now. As you can see, the model does quite well predicting the demand for the next day (left plot), but could do better when forecasting the demand for bikes seven days in advance (right plot). These figures show for which forecast time scales you are meeting your requirements, and how far into the future the model can forecast reliably.
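One common way to build such multi-step forecasts, sketched below, is the "direct" strategy: train one regressor per horizon step on lagged values of the series, so each step can be scored separately. This illustrates the general idea of optimizing for every forecasted step; it is not the platform's internal implementation, and all variable names are illustrative.

```python
# Direct multi-step forecasting sketch: one linear model per horizon step.
import numpy as np
from sklearn.linear_model import LinearRegression

series = np.sin(np.arange(200) / 5.0)   # toy daily demand signal
n_lags, horizon = 7, 3                   # last 7 steps predict steps t .. t+2

# Build a supervised dataset: each row holds 7 lags; each target column is
# one future step of the series.
X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series) - horizon)])
Y = np.array([series[t:t + horizon] for t in range(n_lags, len(series) - horizon)])

# Fit one model per forecast step, then predict from the latest window.
models = [LinearRegression().fit(X, Y[:, h]) for h in range(horizon)]
last_window = series[-n_lags:].reshape(1, -1)
forecast = [m.predict(last_window)[0] for m in models]
print(forecast)  # predictions for the next 3 steps
```

Scoring each horizon separately is what makes plots like the one above possible: you can see directly at which forecast distance the model starts to degrade.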

If you want to use Modulos AutoML with Time Series, we are happy to talk to you about your use case and enable this feature for you.


Other New Features

New Objective: Mean Absolute Percentage Error (MAPE)

Objectives are the key metrics used to assess the performance of an ML model and determine what you are optimizing for. The choice of objective depends on your use case and business requirements.

To broaden the selection, we are adding a new objective with this AutoML release: the Mean Absolute Percentage Error (MAPE). This metric computes the mean relative deviation between your true and predicted values. It is available for all regression tasks (predicting a number), such as Time Series forecasting.
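As a sketch of the metric itself, MAPE averages the absolute errors relative to the true values. A minimal NumPy version (scikit-learn ≥ 0.24 also ships this as `sklearn.metrics.mean_absolute_percentage_error`):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: mean of |y - y_hat| / |y|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

# Example: true demand vs. forecast; errors of 10%, 10%, and 0%.
print(mape([100, 200, 400], [110, 180, 400]))  # 0.0666...
```

Because the error is measured relative to the true value, MAPE treats an error of 10 on a true value of 100 the same as an error of 40 on a true value of 400, which is often what business metrics like demand forecasts call for.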

Interactive Feature Importance Graphic

Example interactive Permutation Feature Importance plot for a classification task for predicting customer churn for a company in the telecommunications sector.

When training and building ML Solutions, Modulos AutoML optimizes the model to reach a good score. Besides raw performance, it is also important for models to be interpretable, so you can understand how predictions are made and what influences them.

In Modulos AutoML v.0.4.1, we introduced the permutation feature importance plot in the Solution. It is available for a select set of datasets and ML workflows. To compute this plot, we randomly shuffle input parameters and then apply the model. By assessing how much this shuffling affects the model performance, we can identify the crucial input parameters: shuffling an important input parameter leads to large prediction errors.
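The general technique can be sketched with scikit-learn's `permutation_importance` on a toy classification task. This illustrates the shuffle-and-rescore idea described above, not the platform's own plot or code:

```python
# Permutation feature importance on a synthetic classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column in turn and measure the drop in accuracy;
# a large drop marks that feature as important for the predictions.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature {i}: mean accuracy drop {mean_drop:.3f}")
```

Evaluating on held-out test data, as here, measures importance for generalization rather than for fitting the training set.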

With this release, we have made this plot interactive, which lets you explore it and improves its readability.

AutoML Checker & Diagnosis

While we strive to add more features with every release, enhancing the user experience of platform maintenance is just as important. To this end, we are adding the new “automl diagnosis” command. It gives the administrator of the platform the tools to diagnose its state. We are furthermore adding additional checks, which run in the background whenever a maintenance command is executed. For instance, these will test the software versions of the Modulos AutoML prerequisites.


Other Improvements and Bug Fixes

We additionally have made a range of other improvements and fixes to AutoML, which include:

  • Split the dependencies of the downloadable Modulos AutoML Solutions according to the corresponding ML models, so you avoid installing unused dependencies.
  • Added support for the latest version of Docker v.20.10, which is a prerequisite for installing and running our platform.
  • Replaced the summary of the workflow configuration on the “Configuration” tab for each workflow with the summary of the workflow creation step to show all configured fields.
  • Fixed a bug in the Inputs & Label Selection step within the creation process of workflows. For data with numerical input feature names (e.g. “1”, “2”, …), the displayed example values did not match the actual values.
  • Increased the number of displayed elements on overview tables, which list all the workflows or datasets.
  • Fixed a bug where the evaluation of models on validation data failed if a validation dataset contained exactly 101 samples or a multiple thereof.
  • Updated various Modulos AutoML dependencies, including Node (v.14.17), sklearn (v.0.24.2), and SQLAlchemy (v.1.4.0).