AutoML v.0.4.6: Direct CSV Table Upload & Workflow Drafts

Written by: Claudio Bruderer (Head of Product at Modulos)

Modulos AutoML version 0.4.6 is available now! With this version, we are simplifying and streamlining the user experience by allowing tables to be uploaded directly. You can now interrupt the configuration of Machine Learning (ML) workflows and save them as a draft. And, among many other enhancements and refinements, classification with a probabilistic outcome has been promoted to a full-fledged feature.


Direct Upload of CSV Tables

Screenshot of the Modulos AutoML upload window.

Our motto at Modulos is that we want to make it quick and easy to build state-of-the-art Artificial Intelligence (AI) solutions. We strive for simplicity in every aspect of the platform. In this release, we have focused on the upload of data onto the platform.

Until now, every dataset (whether it consisted of images, tables, or any combination of these) had to be described by a metadata file (the dataset structure file, or DSSF) and packaged as a tarball. With this release, you can also upload single CSV tables directly. For most use cases, you no longer need to include a DSSF or package the table as an archive; the DSSF remains available for additional configuration options. Simply drag and drop your tables onto the platform and start configuring your ML tasks right away.

Besides simplifying the upload of tables, we are also adding more configuration settings to the DSSF. Date and time information can be written in many different formats, some of which are difficult to infer automatically (e.g. are your dates day-first or month-first?). Rather than relying on the platform to guess your dataset’s custom datetime format correctly, you can now specify it in the DSSF.
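To illustrate why an explicit format matters, here is a minimal sketch in plain Python. It uses the standard library's `strptime` format directives purely as an example; the syntax the DSSF expects may differ, and the sample date string is made up.

```python
from datetime import datetime

# Hypothetical raw value from a CSV column: 3 April or 4 March?
raw = "03/04/2021"

# The same string parses successfully under both conventions,
# so the data alone cannot tell you which one is correct.
day_first = datetime.strptime(raw, "%d/%m/%Y")
month_first = datetime.strptime(raw, "%m/%d/%Y")

print(day_first.date())    # 2021-04-03
print(month_first.date())  # 2021-03-04
```

Both calls succeed without any error, which is exactly why a guessed format can go wrong silently and why stating the format explicitly is the safer option.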


Introducing Workflow Drafts

List of fully configured and drafted ML workflows with different statuses.

Once datasets have been uploaded onto the platform, you can create and launch an ML workflow to find and train the best ML Solutions. The workflow configuration step is crucial and requires domain expertise. For instance, you need to decide which input features are applicable when generating your ML Solution. You also have to select the appropriate optimization objective for your use case.

While creating workflows is usually quick and easy, you may not always have all the information needed to complete the configuration in one go. Or maybe you simply want to finish it later. You can now save workflows as drafts and resume their configuration at any time. The workflow creation continues right where you left off.


Upgraded Classification with a Probabilistic Outcome

ROC curve for a binary classification use case. The dots and numbers mark the probability thresholds that yield the corresponding true and false positive rates. The dashed diagonal line shows the performance of a random classifier.

With the last release, Modulos AutoML v.0.4.5, we introduced “classification with a probabilistic outcome” as an alpha feature. This opened up a new dimension for classification tasks: not only can you predict a target category directly, you can also infer the probability of a sample belonging to a certain category. Using the predicted probabilities allows you to make risk-based decisions, triggering actions only if the probabilities are above a certain threshold. It also allows you to prioritize samples based on their probabilities (e.g. approach the customers with the highest probability of churning before contacting others).
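The two uses described above, thresholding and prioritization, can be sketched in a few lines of Python. This is an illustration only: the function name, customer IDs, probabilities, and threshold are all hypothetical, and the predicted probabilities are assumed to be available as a simple mapping from sample ID to probability.

```python
def select_for_action(probabilities, threshold):
    """Return (sample_id, probability) pairs at or above the threshold,
    sorted so that the highest probabilities come first."""
    selected = [(sid, p) for sid, p in probabilities.items() if p >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Made-up churn probabilities for four customers.
churn_probabilities = {
    "customer_a": 0.91,
    "customer_b": 0.35,
    "customer_c": 0.72,
    "customer_d": 0.58,
}

to_contact = select_for_action(churn_probabilities, threshold=0.6)
print(to_contact)  # [('customer_a', 0.91), ('customer_c', 0.72)]
```

Raising the threshold triggers fewer actions with higher confidence, while lowering it reaches more samples at the cost of more false alarms.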

We have refined this feature further and promoted it from an alpha to a full-fledged feature. It is now available for any classification task, no matter the number of label categories. We have also added new insight plots, which are provided with every trained ML Solution. One powerful figure for assessing the performance of these models is the receiver operating characteristic curve (ROC; see above). It plots the true positive rate against the false positive rate of your predictions at different probability thresholds and helps you choose the most suitable one. These ROC curves are now available for both binary and multiclass classification cases.
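For readers who want to see how the points on such a curve come about, here is a minimal sketch for the binary case in plain Python. It is not the platform's implementation; the function name, labels, scores, and thresholds are made up, and a sample is assumed to be predicted positive when its probability is at or above the threshold.

```python
def roc_points(labels, scores, thresholds):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class per sample.
    Assumes at least one positive and one negative sample.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
        points.append((t, fp / negatives, tp / positives))
    return points


# Made-up ground truth and predicted probabilities for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(labels, scores, [0.2, 0.5, 0.85])
for t, fpr, tpr in points:
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

In this toy data, lowering the threshold from 0.85 to 0.2 raises the true positive rate from one third to 1.0, but the false positive rate climbs from 0 to two thirds. That trade-off is what the ROC curve makes visible.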


Other Improvements and Bug Fixes

  • Introduced a new functionality to export, archive, and view trained ML Solutions you want to keep long-term.
  • Restructured and refined the navigation of the ML Solution report (README) provided with every trained Solution. Furthermore, added a tutorial on how to build a prediction client using the REST API in JavaScript (execution of the Solution client is fully programming language-agnostic).
  • Extended the Neural Network ML models to also be available for tabular classification cases.
  • Sped up the background operations that run while you configure ML workflows, for a smoother user experience.
  • Included a cleanup routine in the installation & update scripts to cleanly remove the old automl packages and Docker images.
  • Refined and added interactivity to the CLI tools when interacting directly with the database.
  • Replaced the WSGI server in the platform backend with gunicorn.
  • Updated our dependencies for future-proofing and to apply all available and applicable security patches.