## Permutation Feature Importance: Deep Dive

Written by: Dennis Turp (Data Scientist at Modulos)

When we work with Machine Learning models, we often report the model’s score, e.g. “my model reached an accuracy of 0.9” or “my $R^2$ score is 0.85”. These performance metrics are easy to understand and practical for benchmarking models against each other. Unfortunately, however, they reduce the complexity of the model to a single number. When a company then uses these models to build real applications, new questions arise that cannot be answered with a single number. For example: “Which of my input features is the model relying on to make predictions?”, “Are those predictions trustworthy even for unseen data instances?” or “My model is performing exceptionally well/poorly. Did we make mistakes when preparing the input data?”.

These are all valid questions that we should answer before using a model in a real-life setting. This article will show how permutation feature importance can be used to address some of these issues.

### What is permutation feature importance, and how do we calculate it?

For each feature, permutation feature importance measures the effect that shuffling its values has on the model’s prediction error. If shuffling a feature increases the model error, that feature is deemed important by this measure. This makes intuitive sense: if a model relies heavily on the permuted feature, we would expect a significant change in its predictions and therefore in the error. In contrast, permuting a feature that the model does not rely on should leave the predictions, and hence the error, essentially unchanged. Figure 1 shows a visual explanation of how permutation feature importance can be computed.

This pseudo-code illustrates the computation:

• Input: trained model $M$, feature matrix $X$, labels $y$, error function $E(y, \hat{y})$
• Calculate the original model error $E_{orig} = E(y, M(X))$
• For each feature $j$ in $(1, \ldots, P)$ do:
  • For each repetition $r$ in $(1, \ldots, R)$ do:
    • Randomly shuffle column $j$ of the feature matrix $X$ to create a permuted data set $X^{jr}_{perm}$.
    • Estimate the error $E^{jr}_{perm} = E(y, M(X^{jr}_{perm}))$ based on the predictions for the permuted data.
  • Compute the feature importance value $FI_{j}=\frac{1}{R}\sum_{r=1}^{R}|E_{orig} - E^{jr}_{perm}|$
• Sort all features by descending $FI_j$
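
The steps above can be sketched in plain Python as follows. This is a minimal illustration and not the Modulos implementation: the names `permutation_importance`, `mean_squared_error` and `toy_model` are assumptions made for this example, and the "model" is a fixed function standing in for a trained one.

```python
import random
from statistics import mean


def mean_squared_error(y_true, y_pred):
    """Error function E(y, y_hat): mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, error_fn, n_repeats=10, seed=0):
    """Return (importances, ranking) for a model given as a `predict` callable.

    X is a list of rows (lists of floats); `predict` maps rows to predictions.
    """
    rng = random.Random(seed)
    e_orig = error_fn(y, predict(X))  # original model error
    n_features = len(X[0])
    importances = []
    for j in range(n_features):
        diffs = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            e_perm = error_fn(y, predict(X_perm))
            diffs.append(abs(e_orig - e_perm))
        importances.append(mean(diffs))  # average over the R repetitions
    # Feature indices sorted by descending importance.
    ranking = sorted(range(n_features), key=lambda j: importances[j], reverse=True)
    return importances, ranking


def toy_model(rows):
    """Hypothetical stand-in for a trained model: uses feature 0, ignores feature 1."""
    return [3.0 * row[0] for row in rows]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = toy_model(X)  # labels that the toy model predicts perfectly

importances, ranking = permutation_importance(toy_model, X, y, mean_squared_error)
print(importances, ranking)
```

Because the toy model ignores feature 1, shuffling that column leaves the error unchanged and its importance is zero, while feature 0 receives a positive importance and is ranked first.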

Now that we have illustrated how feature importance is calculated, let’s look at how it can help us understand our Machine Learning models.

### Build trust in the computation and analysis

In a first analysis, let us have a look at how feature importance can be used to build trust in the predictions of our Machine Learning models. For that, we will use the “Diabetes” dataset. Kaggle describes this dataset in the following way: “This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.” [1]

We use the Modulos AutoML platform to search for the best model and hyperparameter combination for the diabetes dataset. We pick the model with the highest score. In this case, the model yields an accuracy of 0.779. Whether this level of accuracy is sufficient for the task in question is up to medical professionals to decide. However, to build trust in our system, we should be able to explain which features our model relies on to make predictions. After calculating the feature importance for the diabetes dataset, we get the following result.

One can see that the most important feature for predicting whether a patient has diabetes is the glucose level. This result makes intuitive sense and helps to build confidence in the system. If, for example, the model relied heavily on the SkinThickness feature and ignored the Glucose levels altogether, a medical professional would likely deem the model unreliable even though the accuracy might seem sufficient.

### Debug and audit input data

For the following example, we use the bike-sharing dataset from the UCI Machine Learning Repository [2]. Using this dataset, one can forecast the demand for rental bikes based on temperature, weekday features, etc. We pick the model that reaches an $R^2$ score of 0.98, which is almost perfect. Looking at the feature importance graphic, we can see that the only essential features for the model’s decision are the number of bikes rented by registered users and the number of casual bike rentals.

Taking a closer look at those features, we realize that the quantity that we want to predict, the total number of bike rentals, corresponds to the sum of the registered and casual rentals. Since both features are present during training, creating a model with an almost perfect score was easy. In a real-world scenario, however, the registered and casual bike rental numbers are unknown to the rental service in advance. Since those two numbers are not available during inference, we made a mistake in our data preparation.

Thus, the feature importance graphic revealed that we made a mistake in our data processing.
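
Once the importance plot raises this suspicion, the leak can be confirmed with a simple check on the data. The sketch below is only an illustration: the rows are made-up values, the helper `find_sum_leaks` is hypothetical, and the column names `casual`, `registered` and `cnt` are assumed to match the dataset.

```python
from itertools import combinations


def find_sum_leaks(rows, target):
    """Return pairs of feature columns whose sum equals the target in every row."""
    features = [name for name in rows[0] if name != target]
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if all(row[a] + row[b] == row[target] for row in rows)
    ]


# Made-up sample rows; `cnt` is the total number of rentals we want to predict.
rows = [
    {"temp": 0.34, "weekday": 6, "casual": 12, "registered": 85, "cnt": 97},
    {"temp": 0.36, "weekday": 0, "casual": 30, "registered": 120, "cnt": 150},
    {"temp": 0.20, "weekday": 1, "casual": 5, "registered": 60, "cnt": 65},
]

leaks = find_sum_leaks(rows, target="cnt")
print(leaks)
```

Any pair reported here reconstructs the target exactly and should be dropped from the input features before training.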

### Caveats

So far, we have seen that feature importance can be a helpful tool to analyze and understand how Machine Learning models generate predictions. However, there are certain pitfalls to be aware of and conclusions to avoid when looking at feature importance plots:

###### 1. Permutation feature importance calculations are always model-specific. For different models, different features can be important.

[Interactive figure: feature importance for a selectable model. The ranking changes with the selected model, and the most important feature of each model is highlighted.]

###### 2. Using too few permutations when computing the feature importance can lead to inaccurate results.

[Interactive figure: feature importance as a function of the number of permutations. The most important feature changes for small numbers of permutations and only stabilizes as the number of permutations increases.]

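
To make this caveat concrete, the following sketch repeats the importance estimate for one feature with different random seeds and compares how much the result scatters for 1 versus 20 permutations. It is an illustration under assumptions: `model` is a hypothetical fixed function standing in for a trained model, and the data is synthetic.

```python
import random
from statistics import mean, stdev


def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def model(rows):
    """Hypothetical fixed model standing in for a trained one."""
    return [2.0 * row[0] + 1.0 * row[1] for row in rows]


def importance(X, y, j, n_repeats, rng):
    """Permutation importance of feature j, averaged over n_repeats shuffles."""
    e_orig = mse(y, model(X))
    diffs = []
    for _ in range(n_repeats):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        diffs.append(abs(e_orig - mse(y, model(X_perm))))
    return mean(diffs)


# Small synthetic data set: noisy labels around the model's predictions.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
y = [p + data_rng.gauss(0, 0.5) for p in model(X)]


def spread(n_repeats, n_runs=30):
    """Standard deviation of the importance of feature 0 across differently seeded runs."""
    estimates = [
        importance(X, y, 0, n_repeats, random.Random(seed)) for seed in range(n_runs)
    ]
    return stdev(estimates)


spread_few = spread(n_repeats=1)
spread_many = spread(n_repeats=20)
print(spread_few, spread_many)
```

With a single permutation the estimate depends strongly on the particular shuffle, so two features of similar importance can easily swap ranks; averaging over more permutations reduces that scatter.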
###### 3. Strong correlations between features can reduce the overall importance of the correlated features. The Machine Learning model learns to rely on the information present in both features instead of only depending on one.

[Interactive figure: importance of feature_0 as a function of the number of additional features that are strongly correlated with it. Adding such features decreases the importance of feature_0.]
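
The effect can be reproduced with a small experiment. The sketch below is an assumption-laden illustration, not the setup behind the original figure: all features are noisy copies of one synthetic signal, the model is a linear model fitted by plain gradient descent (`fit_linear` is a hypothetical helper), and only the importance of feature_0 is measured.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def predict(w, X):
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def fit_linear(X, y, lr=0.05, steps=300):
    """Fit a linear model without intercept by gradient descent on the MSE."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(steps):
        residuals = [p - t for p, t in zip(predict(w, X), y)]
        grad = [
            2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
            for j in range(n_features)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def importance_of_feature_0(n_extra, n_samples=200, n_repeats=5, seed=0):
    """Importance of feature_0 when n_extra strongly correlated features are added."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n_samples)]
    # feature_0 and the extra features are all noisy copies of the same signal.
    X = [[s + 0.1 * rng.gauss(0, 1) for _ in range(1 + n_extra)] for s in signal]
    y = signal
    w = fit_linear(X, y)
    e_orig = mse(y, predict(w, X))
    diffs = []
    for _ in range(n_repeats):
        column = [row[0] for row in X]
        rng.shuffle(column)
        X_perm = [[v] + row[1:] for row, v in zip(X, column)]
        diffs.append(abs(e_orig - mse(y, predict(w, X_perm))))
    return mean(diffs)


results = {n_extra: importance_of_feature_0(n_extra) for n_extra in (0, 1, 3)}
print(results)
```

The fitted model spreads its weight across the correlated features, so shuffling feature_0 alone damages the predictions less and less as more correlated features are added.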

### Permutation feature importance with Modulos

In the Modulos AutoML release 0.4.1, we introduced permutation feature importance for a limited set of datasets and ML workflows. For these workflows, the Modulos AutoML platform computes the permutation feature importance for all solutions. The static plots and feature importance data shown in this blog post were automatically created using the Modulos AutoML software. If you are interested in knowing more or trying out the platform, don’t hesitate to contact us.

