Claudio Bruderer (Product Manager at Modulos)
Modulos AutoML version 0.4.2 is out. With this version, we are adding Time Series forecasting as an alpha feature. This new Machine Learning workflow type allows you to tackle a range of new use cases, e.g. supply and demand forecasting and capacity planning. In addition to this major new feature, we now display more workflow status details and are shipping several other refinements and fixes.
Time Series Forecasting (Alpha Feature)
Creating and Training Time Series Workflows
The Modulos AutoML platform allows you to easily find top performing ML Solutions for classification (predicting a category) and regression (predicting a number) tasks. The platform can handle tables (including date & time information) and images as well as any combination of them.
With this AutoML version, we are introducing a new category of ML workflows. This workflow type will allow you to tackle Time Series use cases. In contrast to currently available classification and regression tasks, for Time Series workflows individual samples are not treated as independent data points. Instead, the time dependency of the samples is taken into account and exploited to build better ML models for applicable use cases.
The support for Time Series ML workflows significantly expands the range of use cases you can solve with Modulos AutoML. You can, for instance, generate ML Solutions to forecast growth or demand, predict the future state of your supply, or detect early if your system will require maintenance any time soon. See the next section for an example of how well a Time Series workflow compares to a standard regression workflow.
We are making this Time Series capability available to you as an alpha feature enabled upon request. If you’re interested in using it, we are glad to talk to you about how to apply this to your use case.
Time Series Forecasting in Action
To illustrate how a Time Series workflow helps with use cases that have a time dependence, let’s revisit the bike sharing use case presented here. Say you work at a company renting out bicycles and you are responsible for making sure that there are always enough bikes for people to use. The demand depends on the weather, the weekday, and whether it’s a weekend or a holiday. In short, the number of rented bikes depends on many factors and can be difficult to predict. Modeling this demand is a job for Machine Learning and you would, of course, use our platform to do that.
Prior to this AutoML version, you could have trained a normal regression model. This would have yielded a well performing model that predicts the next day’s demand using the weather forecast for tomorrow. It would, however, have neglected the time dependence and treated each day in isolation: yesterday’s demand would not have influenced the prediction. In contrast, a Time Series workflow takes the number of rented bikes from previous days into account, yielding better models.
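To make the idea of "taking previous days into account" concrete, here is a minimal sketch of how a daily series can be turned into samples whose inputs are the preceding days. It is an illustration only, not how the Modulos platform builds its Time Series models; the function name `make_lagged_samples` and the rental counts are hypothetical.

```python
from typing import List, Tuple


def make_lagged_samples(
    demand: List[float], n_lags: int = 7
) -> Tuple[List[List[float]], List[float]]:
    """Turn a daily series into (features, target) pairs.

    Each feature row holds the n_lags values that precede the target day.
    """
    features, targets = [], []
    for t in range(n_lags, len(demand)):
        features.append(demand[t - n_lags:t])  # the n_lags days before day t
        targets.append(demand[t])              # the value to forecast for day t
    return features, targets


# Hypothetical daily rental counts, for illustration only.
daily_rentals = [100, 120, 130, 90, 80, 150, 170, 110, 125, 135]
X, y = make_lagged_samples(daily_rentals, n_lags=7)
print(X[0], "->", y[0])  # [100, 120, 130, 90, 80, 150, 170] -> 110
```

In a real workflow, these lagged values would sit next to the other inputs (weather, weekday, holiday), so the model sees both the recent history and tomorrow's conditions.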
In our example, we have two years’ worth of data for the number of rented bikes per day. During the second year, the demand grew significantly with respect to the first year. This makes it more difficult for a normal regression model to generate good predictions. We use the first 1.5 years to train and validate our model and the last six months to independently test the performance of our ML models. We have created workflows with and without Time Series for this example.
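The split described above is chronological rather than random: everything before a cutoff date is used for training and validation, everything after it for testing. A minimal sketch, assuming each row starts with its date; the calendar dates and the helper `chronological_split` are hypothetical and only chosen to give roughly 1.5 years followed by six months.

```python
from datetime import date, timedelta


def chronological_split(rows, cutoff):
    """Rows dated before the cutoff go to train/validation, the rest to test."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test


# Hypothetical two-year daily series; the second tuple entry stands in for the data.
start = date(2011, 1, 1)
rows = [(start + timedelta(days=i), None) for i in range(731)]

# Assumed cutoff: 1.5 years after the start of the series.
train, test = chronological_split(rows, cutoff=date(2012, 7, 1))
print(len(train), len(test))  # 547 184
```

Splitting by date keeps the test period strictly in the future of the training period, which is what makes the test an honest check of forecasting performance.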
The result is shown in the figure above. By using a Time Series ML model, which takes the past 7 days into account, we can reduce the median deviation of the predicted values from the true values from 923 to 672 bikes. In relative terms, the deviation drops from 14.6% to 10.3%, a significant boost in the performance of the ML Solution. Our bike sharing company could therefore plan its bike allocation better with this Solution, helping it increase its revenue.
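For readers who want to compute a comparable number on their own predictions, the sketch below shows one common reading of "median deviation": the median of the absolute differences between true and predicted values, plus a relative variant. Whether the figures quoted above were computed exactly this way is an assumption, and the sample values are made up.

```python
from statistics import median


def median_absolute_error(y_true, y_pred):
    """Median of |true - predicted| over all samples."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def median_relative_error(y_true, y_pred):
    """Median of |true - predicted| / |true|; assumes no true value is zero."""
    return median(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


# Made-up daily rental counts and predictions, for illustration only.
y_true = [6000, 7000, 6500]
y_pred = [5400, 7700, 6500]
print(median_absolute_error(y_true, y_pred))
print(median_relative_error(y_true, y_pred))

# Using the two figures quoted in the text: relative reduction of the deviation.
reduction = (923 - 672) / 923
print(f"{reduction:.1%}")  # 27.2%
```

The median is less sensitive to a few days with unusually large errors than the mean, which makes it a reasonable summary for demand data with occasional outliers.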
More Detailed Visualization of the Workflow Status
In order to find the top performing ML Solutions, a large (potentially even infinite) number of models, feature parameters, and hyperparameter configurations has to be considered. Instead of doing this search by hand, the Modulos AutoML platform does it for you. It efficiently tries out many combinations and finds the best ones. Due to the large number of possibilities, there is, however, no guarantee that the top performing model found is indeed the best possible Solution.
As a top performing model is often good enough, we introduced the autopausing feature in AutoML version 0.3.4. Workflows pause automatically if the score hasn’t improved for the last 200 Solutions. For this release, we have adapted our user interface: it now tells the user whether a workflow has been autopaused.
Other Improvements and Bug Fixes
We have also made a range of other improvements and fixes to AutoML, which include:
- Expanded the dataset validation during the upload of a dataset onto the platform. It now also detects infinite values as generated by MS Excel. Furthermore, there are additional validations in place for the dataset structure file which describes the uploaded dataset and the relationships within it.
- Extended the coverage of validations for workflow properties, further increasing the stability of the platform.
- Improved the computational performance of the feature importance computation, which gives you more insight into the trained ML Solution. Image data is now excluded from this analysis.
- Refined the README delivered with the software to streamline the installation further. It now also provides more information on how to embed the Modulos AutoML platform into complex environments.
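As an illustration of the infinite-value check mentioned in the first item of this list, here is a minimal sketch of how such a validation could look for a CSV table. It is not the platform's validation code: the function `find_infinite_cells` is hypothetical, and the assumption is that infinite values arrive as text that Python's `float()` parses to infinity (such as `inf` or `-Infinity`).

```python
import csv
import io
import math


def find_infinite_cells(csv_text):
    """Return (row_number, column_name) for every cell that parses to +/- infinity.

    Row numbers count data rows, starting at 1 (the header is not counted).
    """
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, cell in row.items():
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # non-numeric or missing cells are outside this check
            if math.isinf(value):
                hits.append((row_number, column))
    return hits


# Made-up sample table, for illustration only.
sample = "day,rentals\n1,120\n2,inf\n3,-Infinity\n"
print(find_infinite_cells(sample))  # [(2, 'rentals'), (3, 'rentals')]
```

Reporting the row and column of each offending cell, rather than just rejecting the file, makes it easier to fix the dataset before uploading it again.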