Claudio Bruderer (Head of Product at Modulos)
The latest version of the Modulos Platform, Modulos v1.1.0, is available now! With this version, we are introducing the ability to compute various performance metrics for your Machine Learning (ML) Solutions, letting you benchmark and better understand them. We are also adding a range of new fairness metrics. Finally, among many other improvements, we are adding several heuristics to our data cleaning recommendations: They now tell you not only which data records to review and clean but also what may be wrong with them.
Assess and Benchmark Solution Performances
More often than not, the score of the optimization objective (e.g., “Accuracy”) reached on validation or test data is quoted as the performance of a Solution. However, a single number cannot reflect the complexity of the feature extractor and the ML model. Before deployment, it is crucial to assess a Solution's performance from different angles to understand how it will behave under various circumstances.
One way to make Solutions less of a black box is to evaluate interpretability measures (e.g., Permutation Feature Importance, introduced in Modulos v0.4.1). Another is to compute other metrics besides the main optimization metric. This is especially important with our Data-Centric Artificial Intelligence (DCAI) approach (introduced in Modulos v1.0.0): With DCAI, you iteratively clean and improve the quality of your dataset to obtain better Solutions, so you need to assess how each iteration affects their performance.
By evaluating various metrics and how they evolve, you can better understand the performance of different Solutions. This allows you to identify tradeoffs between metrics; for example, it can help you answer a question like this one: “Does improving the fairness metric Equal Opportunity lead to a decrease in Accuracy?” It also helps you spot issues with a Solution early on (e.g., “Why is my Accuracy score so much better than my F1 Score? Could my dataset be unbalanced?”).
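To make the Accuracy-versus-F1 question concrete, here is a minimal, self-contained Python sketch (not part of the Modulos Platform) that scores a hypothetical classifier on a made-up, imbalanced validation set. The data and helper names are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of records whose prediction matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, heavily imbalanced validation set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A classifier that almost always predicts the majority class and
# catches only one of the five positives.
y_pred = [0] * 95 + [1] + [0] * 4

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"F1 Score: {f1_score(y_true, y_pred):.2f}")
```

Because 95% of the records are negative, predicting the majority class almost everywhere still yields a high Accuracy, while the F1 Score, which ignores true negatives, exposes that most positives are missed.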
Additional Fairness Metrics
In the past, there have been various cases of AI models discriminating against different groups. In one such case, AI models trained on historical data inherited its biases, assigning different credit limits depending on gender. Such biases can be mitigated by first evaluating fairness metrics and then improving the data so that the resulting Solutions are fairer. This is one of the core aspects of DCAI. We show and describe how this works in the example use case Fairness in Credit Risk.
With this release, we are introducing a range of new fairness metrics that you can evaluate and, in some cases, directly improve with our DCAI approach. Modulos v1.1.0 features the following existing and new fairness metrics:
- Equalized Odds
- Equal Opportunity
- Disparate Impact
- Predictive Parity
- Statistical Parity
The differences between these metrics are nuanced, yet they matter. In general, the choice of the right fairness objective depends on your data, the type of ML problem you’re solving, your company values, and, lastly, the regulatory environment. By offering a range of metrics, we ensure that you can pick the ones best suited to your use case.
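The metrics listed above can all be computed from a Solution's predictions, the ground truth, and a protected attribute. The following Python sketch uses common textbook definitions for a binary classifier and two groups. It is an assumption that the platform uses the same conventions (differences taken as unprivileged minus privileged, Disparate Impact as the unprivileged-to-privileged ratio of selection rates, and Equalized Odds reduced to the larger of the true-positive-rate and false-positive-rate gaps); the data and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    selection_rate: float  # P(y_hat = 1)
    tpr: float             # true positive rate, P(y_hat = 1 | y = 1)
    fpr: float             # false positive rate, P(y_hat = 1 | y = 0)
    ppv: float             # precision, P(y = 1 | y_hat = 1)


def _safe_div(num, den):
    # Undefined rates (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def group_rates(y_true, y_pred, groups, group):
    """Confusion-matrix rates restricted to the records of one group."""
    pairs = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return GroupRates(
        selection_rate=_safe_div(tp + fp, len(pairs)),
        tpr=_safe_div(tp, tp + fn),
        fpr=_safe_div(fp, fp + tn),
        ppv=_safe_div(tp, tp + fp),
    )


def fairness_report(y_true, y_pred, groups, privileged, unprivileged):
    """Compare the unprivileged group (u) against the privileged group (p)."""
    u = group_rates(y_true, y_pred, groups, unprivileged)
    p = group_rates(y_true, y_pred, groups, privileged)
    return {
        "statistical_parity_difference": u.selection_rate - p.selection_rate,
        "disparate_impact": _safe_div(u.selection_rate, p.selection_rate),
        "equal_opportunity_difference": u.tpr - p.tpr,
        "equalized_odds_difference": max(abs(u.tpr - p.tpr), abs(u.fpr - p.fpr)),
        "predictive_parity_difference": u.ppv - p.ppv,
    }


# Hypothetical toy data: six records per group, identical ground truth.
groups = ["A"] * 6 + ["B"] * 6
y_true = [1, 1, 1, 0, 0, 0] * 2
y_pred = [1, 1, 0, 1, 0, 0] + [1, 0, 0, 0, 0, 0]

report = fairness_report(y_true, y_pred, groups, privileged="A", unprivileged="B")
for name, value in report.items():
    print(f"{name}: {value:+.3f}")
```

A difference of 0 (or a Disparate Impact ratio of 1) means both groups are treated alike under that metric. In the toy data, group “B” is selected less often and has a lower true positive rate, yet its positive predictions are more precise, which shows why the metrics can disagree and why choosing among them matters.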
More Heuristics for Data Cleaning Recommendations
The unique value proposition of the DCAI approach is that it identifies flaws in your data. It finds the data records that negatively affect the performance of your Solutions with respect to whichever objective you want to improve and prioritizes them. The highest-priority records are then recommended for review and cleaning. This DCAI approach is not only effective but also efficient (e.g., see this research paper).
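These release notes do not describe how DCAI scores individual records. As a stand-in that illustrates the general idea of finding harmful records, the sketch below uses plain leave-one-out data valuation: each training record is removed in turn, a simple 1-nearest-neighbor classifier is re-evaluated on validation data, and records whose removal improves the validation score are queued for review. This is a swapped-in technique, not Modulos' method; the data, including the deliberately mislabeled record, are made up.

```python
import math


def nearest_neighbor_predict(train, x):
    """1-NN: return the label of the training record closest to x."""
    _, label = min(train, key=lambda rec: math.dist(rec[0], x))
    return label


def validation_accuracy(train, validation):
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in validation)
    return hits / len(validation)


def leave_one_out_values(train, validation):
    """Value of each record = baseline accuracy minus accuracy without it."""
    baseline = validation_accuracy(train, validation)
    values = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        # Negative value: removing the record improves accuracy,
        # so the record is a candidate for review and cleaning.
        values.append(baseline - validation_accuracy(reduced, validation))
    return values


def review_queue(train, validation, top_k=3):
    """Indices of the most harmful records, most harmful first."""
    values = leave_one_out_values(train, validation)
    ranked = sorted(range(len(train)), key=lambda i: values[i])
    return [i for i in ranked[:top_k] if values[i] < 0]


# Hypothetical training data: two clusters; record 2 sits in the
# class-0 cluster but carries label 1 (a simulated labeling error).
train = [
    ((0.0, 0.0), 0),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 1),
    ((1.0, 1.0), 1),
    ((0.9, 1.1), 1),
    ((1.1, 0.8), 1),
]
validation = [((0.1, 0.2), 0), ((0.15, 0.25), 0), ((1.0, 0.9), 1)]

print("Review queue:", review_queue(train, validation))
```

Leave-one-out needs one retraining per record, which quickly becomes expensive for large datasets and complex models, so it serves here only to show the principle of ranking records by their effect on an objective.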
When cleaning data with DCAI, there are two crucial questions: Which samples should I review and clean to yield the largest gains, and what may be wrong with these samples? We tackled the first question with the introduction of DCAI in Modulos v1.0.0. In this release, we are addressing the second by adding more heuristics (e.g., to detect whether a sample may be an outlier) and by extending the implemented functionality to more models and objectives. Additionally, we have added functionality to combine the results from different modules: Together, these indicators give you a more comprehensive view and help you answer what may be wrong with the flagged records.
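To illustrate how several heuristics can be combined into hints for flagged records, here is a minimal Python sketch. It assumes a hypothetical list of flagged record indices (standing in for the DCAI prioritization) and applies two common heuristics: an interquartile-range (IQR) outlier check and an exact-duplicate check. Apart from outlier detection, the source does not say which heuristics Modulos implements, so treat these as illustrative.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}


def find_duplicates(rows):
    """Indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.add(i)
        else:
            seen.add(key)
    return dupes


def explain_flagged(rows, flagged, numeric_columns):
    """Attach human-readable hints from each heuristic to flagged records."""
    hints = {i: [] for i in flagged}
    for col in numeric_columns:
        outliers = iqr_outliers([row[col] for row in rows])
        for i in flagged:
            if i in outliers:
                hints[i].append(f"possible outlier in '{col}'")
    for i in find_duplicates(rows) & set(flagged):
        hints[i].append("duplicate of an earlier record")
    return hints


# Hypothetical dataset: record 4 has an implausible age, record 5 repeats record 1.
rows = [
    {"age": 34, "income": 52000}, {"age": 29, "income": 48000},
    {"age": 41, "income": 61000}, {"age": 38, "income": 55000},
    {"age": 230, "income": 50000}, {"age": 29, "income": 48000},
    {"age": 45, "income": 58000}, {"age": 33, "income": 51000},
]
flagged = [4, 5, 2]  # hypothetical output of a record-prioritization step

for index, record_hints in explain_flagged(rows, flagged, ["age", "income"]).items():
    print(index, record_hints or ["no heuristic triggered"])
```

Record 2 is flagged but triggers no heuristic, so it still needs a manual look: The hints narrow down what to check rather than replace the review.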
Other Improvements and Fixes
- Reworked the platform update procedure and the migration of data from previous software versions.
- Extended the Data Quality Management feature, which identifies data records with a negative impact, to allow for the queuing of tasks.
- Made it possible to disable, in the Dataset Structure File (metadata describing the uploaded dataset), the automatic removal of features that contain only a single value, and increased the file's version number to 0.5.
- Redesigned the look and behavior of success and error messages across the platform.
- Workflows that retrain a Solution (e.g., on new training data) are now archived after they run.
- Fixed an issue that caused receiver operating characteristic (ROC) plots to fail for probabilistic classification when some categories were missing from the validation dataset.
- Extended the information shown for each dataset to also list similar datasets that are available as alternative training and/or validation datasets.
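Regarding the Dataset Structure File change listed above, the sketch below shows what removing single-valued features with an opt-out might look like in Python. The enabled parameter is a hypothetical stand-in for the setting in the Dataset Structure File, whose actual key name and format are not given here.

```python
def drop_constant_features(rows, enabled=True):
    """Remove columns that hold a single, identical value in every row.

    `enabled` is a hypothetical switch mirroring the opt-out; when it is
    False, the rows are returned unchanged. Returns (rows, dropped_columns).
    """
    if not rows or not enabled:
        return rows, []
    columns = list(rows[0])
    dropped = [c for c in columns if len({row[c] for row in rows}) == 1]
    kept = [{c: v for c, v in row.items() if c not in dropped} for row in rows]
    return kept, dropped


# Hypothetical dataset in which "country" never varies.
rows = [
    {"country": "CH", "age": 34, "label": 1},
    {"country": "CH", "age": 29, "label": 0},
    {"country": "CH", "age": 41, "label": 1},
]

kept, dropped = drop_constant_features(rows)
print("Dropped:", dropped)
print("Unchanged when disabled:", drop_constant_features(rows, enabled=False)[0] == rows)
```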