Permutation Feature Importance: Deep Dive


Written by:
Dennis Turp (Data Scientist at Modulos)

When we work with Machine Learning models, we often report the model’s score; e.g. “my model reached an accuracy of 0.9” or “my R2 score is 0.85”. These performance estimators are easy to understand and practical when benchmarking models against each other. However, they reduce the complexity of the model to a single number. When a company then uses these models to build real applications, new questions arise which cannot be answered with a single number. For example: “Which of my input features is the model relying on to make predictions?”, “Are those predictions trustworthy even for unseen data instances?” or “My model is performing exceptionally well/poorly. Did we make mistakes when preparing the input data?”.

These are all valid questions that we should answer before using a model in a real-life setting. This article will show how permutation feature importance can be used to address some of these issues. 

What is permutation feature importance, and how do we calculate it?

For each feature, permutation feature importance measures the effect that shuffling its values has on the model’s prediction error. If shuffling a feature increases the model error, the feature is deemed important by this measure. This makes intuitive sense: if the model relies heavily on a feature, corrupting that feature by permutation should noticeably degrade the predictions. Conversely, permuting a feature the model does not rely on should leave the prediction error largely unchanged. Figure 1 gives a visual explanation of how permutation feature importance is computed:

Figure 1: A visual explanation of how to calculate the feature importance value for one input feature. The upper row shows the table with the original data and the predictions made by the model. The bottom row shows the same data, but with permuted values for Feature_2, and the corresponding predictions made by the model on the permuted data. If the model relies heavily on Feature_2 for its predictions, the feature importance value will be large. If, on the other hand, the model does not rely on Feature_2, permuting it will barely impact the predictions and the feature importance value will be small.

This pseudo-code illustrates the computation:

  • Input: Trained model $M$, Feature Matrix $X$, labels $y$, error function $E(y, M)$
  • Calculate the original model error $E_{orig} = E(y, M(X)) $
  • For each feature $j$ in $(1, …, P)$ do:
    • For each repetition $r$ in $(1,…,R)$ do:
      • Randomly shuffle column $j$ of the feature matrix $X$ to create a permuted data set $X^{jr}_{perm}$.
      • Estimate error $E^{jr}_{perm} = E(y,M(X^{jr}_{perm}))$ based on the predictions of the permuted data.
    • Compute the feature importance value $FI_{j}=\frac{1}{R}\sum_r(|E_{orig} -E_{perm}^{jr}|)$
  • Sort all features by descending $FI_j$
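
The pseudo-code translates almost line by line into Python. The following is a minimal sketch, assuming a fitted model that exposes a scikit-learn-style predict method and an error function such as mean squared error; the function name and its parameters are our own choices, not part of any library:

    import numpy as np

    def permutation_feature_importance(model, X, y, error_fn, n_repeats=10, seed=0):
        """Permutation feature importance, following the pseudo-code above.

        model    -- fitted estimator exposing .predict(X)
        X        -- feature matrix of shape (n_samples, n_features)
        y        -- true labels or targets
        error_fn -- callable error_fn(y_true, y_pred) returning a scalar error
        """
        rng = np.random.default_rng(seed)
        e_orig = error_fn(y, model.predict(X))                 # E_orig
        importances = np.zeros(X.shape[1])
        for j in range(X.shape[1]):                            # features 1..P
            diffs = []
            for _ in range(n_repeats):                         # repetitions 1..R
                X_perm = X.copy()
                X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle column j only
                e_perm = error_fn(y, model.predict(X_perm))    # E_perm^{jr}
                diffs.append(abs(e_orig - e_perm))             # |E_orig - E_perm^{jr}|
            importances[j] = np.mean(diffs)                    # FI_j
        return importances

Note that scikit-learn ships a ready-made variant of this computation, sklearn.inspection.permutation_importance [4]; it reports the mean drop in score rather than the mean absolute error difference used above.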


Now that we have illustrated how feature importance is calculated, let’s look at how it can help us understand our Machine Learning models.

Build trust in the computation and analysis

As a first analysis, let us look at how feature importance can be used to build trust in the predictions of our Machine Learning models. For that, we will use the “Diabetes” dataset. Kaggle describes this dataset in the following way: “This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.” [1]

We use the Modulos AutoML platform to search for the best model and hyperparameter combination for the diabetes dataset. We pick the model with the highest score; in this case, the model reaches an accuracy of 0.779. Whether this level of accuracy is sufficient for the task in question is up to medical professionals to decide. However, to build trust in our system, we should be able to explain which features our model relies on to make predictions. After calculating the feature importance for the diabetes dataset, we get the following result.

One can see that the most important feature for predicting whether a patient has diabetes is the glucose level. This result makes intuitive sense and helps to build confidence in the system. If, for example, the model heavily relied on the SkinThickness feature and ignored the Glucose feature altogether, a medical professional would likely deem the model unreliable, even though the accuracy might seem sufficient.
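
The importance values shown in this post were computed with the Modulos AutoML platform. Purely as an illustration, a comparable analysis can be run with scikit-learn’s permutation_importance [4]; the sketch below assumes the Kaggle CSV [1] has been downloaded locally as diabetes.csv with its usual Outcome target column:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # assumed local copy of the Kaggle dataset [1]
    df = pd.read_csv("diabetes.csv")
    X, y = df.drop(columns="Outcome"), df["Outcome"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # n_repeats plays the role of R in the pseudo-code above
    result = permutation_importance(model, X_test, y_test,
                                    scoring="accuracy", n_repeats=30,
                                    random_state=0)
    ranking = sorted(zip(X.columns, result.importances_mean),
                     key=lambda item: item[1], reverse=True)
    for name, importance in ranking:
        print(f"{name}: {importance:.3f}")

On this dataset, Glucose should again come out on top, mirroring the ranking discussed above.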

Debug and audit input data

For the following example, we use the bike-sharing dataset from the UCI Machine Learning Repository [2]. Using this dataset, one can forecast the demand for rental bikes based on temperature, weekday features, etc. We pick the model that reaches an R2 score of 0.98, which is almost perfect. Looking at the feature importance graphic, we can see that the only essential features for the model’s decision are the number of bikes rented by registered users and the number of casual bike rentals.

Taking a closer look at those features, we realize that the quantity we want to predict, the total number of bike rentals, is simply the sum of the registered and casual rentals. Since both features were present during training, creating a model with an almost perfect score was easy. In a real-world scenario, however, the registered and casual rental numbers are unknown to the rental service in advance. Since those two numbers are not available at inference time, we made a mistake in our data preparation.

Thus, the feature importance graphic revealed that we made a mistake in our data processing.
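
This failure mode is easy to reproduce on synthetic data: if the target is the sum of two input columns, permutation feature importance concentrates on exactly those columns and flags the leakage. A small sketch, with synthetic data standing in for the bike-sharing columns (not the actual UCI dataset):

    import numpy as np
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 1_000
    temperature = rng.normal(20.0, 8.0, n)
    registered = rng.poisson(150, n).astype(float)
    casual = rng.poisson(40, n).astype(float)

    X = np.column_stack([temperature, registered, casual])
    y = registered + casual   # leaked target: total rentals = registered + casual

    model = LinearRegression().fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    for name, importance in zip(["temperature", "registered", "casual"],
                                result.importances_mean):
        print(f"{name}: {importance:.2f}")
    # temperature contributes ~0; the two leaked columns dominate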

Caveats

So far, we have seen that feature importance can be a helpful tool to analyze and understand how Machine Learning models generate predictions. But there are certain pitfalls to be aware of and conclusions to avoid when looking at feature importance plots:

1. Permutation feature importance values are always model-specific. For different models, different features can be important, so an importance ranking should not be transferred from one model to another.

2. Performing too few permutations when computing the feature importance can lead to false or inaccurate results. With only a few repetitions, the feature that appears most important can change from run to run; the ranking only stabilizes as the number of repetitions $R$ grows.

3. Strong correlations between features can reduce the measured importance of the correlated features. The Machine Learning model learns to rely on the information present in all of them instead of depending on only one, so permuting a single feature leaves much of the shared information intact. Adding features that are strongly correlated with feature_0, for example, decreases the importance attributed to feature_0, as demonstrated in the sketch below.
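
This dilution effect can be reproduced on synthetic data: duplicating an informative feature, with a little noise so that the copies are strongly but not perfectly correlated, visibly shrinks its measured importance. A minimal sketch, with all names and parameters chosen purely for illustration:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    n = 2_000
    feature_0 = rng.normal(size=n)
    noise_feature = rng.normal(size=n)
    y = 3.0 * feature_0 + rng.normal(scale=0.1, size=n)

    for n_copies in (0, 1, 3):
        # near-duplicates of feature_0 (correlation close to 1)
        copies = [feature_0 + rng.normal(scale=0.05, size=n)
                  for _ in range(n_copies)]
        X = np.column_stack([feature_0, noise_feature, *copies])
        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
        print(f"{n_copies} correlated copies -> "
              f"importance of feature_0: {result.importances_mean[0]:.2f}")

Because the copies carry the same information as feature_0, permuting feature_0 alone costs the model less and less as more copies are added.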

Permutation feature importance with Modulos

In the Modulos AutoML release 0.4.1, we introduced permutation feature importance for a limited set of datasets and ML workflows. For these workflows, the Modulos AutoML platform computes the permutation feature importance for all solutions. The static plots and feature importance data shown in this blog post were automatically created using the Modulos AutoML software. If you are interested in knowing more or trying out the platform, don’t hesitate to contact us.

If you found this explanation insightful, feel free to share it!

References

[1] https://www.kaggle.com/uciml/pima-indians-diabetes-database
[2] https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
[3] https://christophm.github.io/interpretable-ml-book/feature-importance.html
[4] https://scikit-learn.org/stable/modules/permutation_importance.html