Apply the AutoML process to tackle your AI Use Case Idea

By Claudio Bruderer (Product Manager at Modulos).

Every activity in a company’s day-to-day operations produces a wealth of data. This ranges from data on customers’ preferences and individual interactions, to the contents and usage of the company’s services, to the IT operations needed to provide them. Managing this amount of data is difficult in itself; analyzing it and deriving data-driven insights is even more challenging. This need can, however, be met with Analytics and Artificial Intelligence (AI). In this blog post, I focus on the latter and on how a company can use AI – and in particular Machine Learning (ML) – to address its needs.

Our AutoML process

Illustration of the Modulos AutoML process.

The AutoML process is how we tackle a use case with AutoML and how ML thereby adds value to a company’s service. This process consists broadly of five steps, which are illustrated above, starting from shaping an initial idea for a use case, to implementing it with AutoML, and finally deploying it to production. Each of these individual steps, as well as the entire process itself, is iterative: the use case idea, the data, the model, and the live performance are constantly refined. The steps are:

  • Ideate: An idea for a use case is developed (e.g. by using a Design Thinking approach) and the task to be tackled with ML is defined.
  • Select the Data: The corresponding datasets containing the insights are selected (and, if needed, enriched with external data), combined, cleaned, and prepared to be analyzed with ML.
  • Create the ML Model: AutoML is used to easily select and systematically train applicable ML models on the data, yielding the best model as a prediction script for new data.
  • Test the Model: The trained ML model is used to predict properties on test data in order to benchmark and compare it with previous models and approaches.
  • Deploy to Production: The tested prediction script is integrated into existing services and constantly applied to live data, thus completing the use case.

To illustrate the AutoML process, let’s consider a fictional bike-sharing company and how they can make full use of their data using the Modulos AutoML platform.

An example customer for our fictitious bike-sharing company.

Ideate

In this first step, the bike-sharing company needs to develop its use case idea. There are various creative methods to generate ideas that either address a pain point or add value to a service.

Our bike-sharing company could, for instance, try to optimize the supply of bikes at different stations. This would improve the customer experience and increase revenue. In order to do so, the company needs to be able to predict the number of bikes that are likely to be rented during the next few days. For a known scenario, the company may already have methods to do that (e.g. heuristics or extrapolation based on past records). In this case, ML can lead to better performance, since the models are trained systematically on the data and do not rely on hand-made rules. For an unknown scenario, on the other hand, heuristics may not even exist yet, but ML is still applicable.

Select the Data

The next step is to select the data that may contain information leading to more accurate predictions. This is done by domain experts, who have a deeper understanding of which factors may play a role. The data then needs to be combined, cleaned, and prepared for upload to the AutoML platform.

For our use case, the domain experts may, for instance, combine past records of bike usage by registered and casual users. They may also conclude that the weather and the knowledge of whether a given day is a workday, part of the weekend, or a local holiday are important, and enrich their dataset with this information.
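
As a rough illustration of this kind of enrichment, the sketch below adds a calendar column and a combined total to a single usage record. The column names (`registered`, `casual`, `temp_c`), the holiday dates, and the helper functions are hypothetical and are not taken from the dataset or from the Modulos platform.

```python
from datetime import date

# Hypothetical holiday calendar; a real one would depend on the company's region.
LOCAL_HOLIDAYS = {date(2012, 1, 2), date(2012, 12, 25)}


def day_type(day: date, holidays: set[date] = LOCAL_HOLIDAYS) -> str:
    """Classify a date as 'holiday', 'weekend', or 'workday'."""
    if day in holidays:
        return "holiday"
    # date.weekday(): Monday is 0, Sunday is 6.
    if day.weekday() >= 5:
        return "weekend"
    return "workday"


def enrich(record: dict) -> dict:
    """Return a copy of a usage record with calendar and total columns added."""
    enriched = dict(record)
    enriched["day_type"] = day_type(record["date"])
    # Combine registered and casual users into one target column.
    enriched["total"] = record["registered"] + record["casual"]
    return enriched


# Made-up example record (assumed column names).
row = {"date": date(2012, 12, 25), "registered": 120, "casual": 30, "temp_c": 3.5}
print(enrich(row))
```

Checking for a holiday before checking the weekday means that a holiday falling on a weekend is still labeled as a holiday, which is one possible convention among several.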

In this example a publicly available dataset is used, which contains the usage of bikes for one rental station for a two-year period. This data is split into input data for creating the ML models (see “Create the ML Model”) and test data to evaluate the performance of the trained model (see “Test the Model”), for which the last 6 months of data are used.
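
A chronological split of this kind could look roughly as follows. This is a minimal sketch on synthetic data: the years 2011 and 2012, the cut-off date, and the field names are assumptions made for illustration only.

```python
from datetime import date, timedelta


def split_by_date(rows, test_start):
    """Split records chronologically.

    Rows dated before test_start become input data for model creation;
    rows on or after test_start are held out as test data.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    input_rows = [r for r in ordered if r["date"] < test_start]
    test_rows = [r for r in ordered if r["date"] >= test_start]
    return input_rows, test_rows


# Synthetic two-year daily record; the counts are placeholders, not real data.
start = date(2011, 1, 1)
rows = [{"date": start + timedelta(days=i), "total": 1000 + i} for i in range(731)]

# Hold out the last six months (assumed here to start on 1 July 2012).
input_rows, test_rows = split_by_date(rows, test_start=date(2012, 7, 1))
print(len(input_rows), len(test_rows))
```

Splitting by date rather than at random keeps the test period strictly after the input period, which mirrors how the model would later be used to predict future days.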

Create the ML Model (with AutoML)

Next, ML models that are well suited to this use case need to be selected and trained on the cleaned input data. Since selecting and training the ML models by hand can be unsystematic and time-consuming, it is better left to a machine. Modulos AutoML was built to address this need and can train ML models for tabular and image data. For this use case, we make use of the table regression functionality, as we’d like to predict a number (the total number of rented bikes on a given day).

After importing the dataset onto the platform, you are guided through the creation of the ML models. You are asked to decide what should be predicted, which columns (also called features) should be used for training, and what objective needs to be optimized. Furthermore, you have the opportunity to select the ML models and feature engineering methods to be tested and how the search for the best models is performed. You can also simply use the default settings, as one of the key philosophies behind AutoML is that prior knowledge of ML is not required.

For our bike-sharing company, the following few choices are made:

  • What do we want to predict? Number of bikes rented on a given day.
  • Based on which features are predictions made? All the other columns in the tabular dataset (weather data and calendar information).
  • What objective is to be optimized? Minimization of the median absolute difference between the predicted and true number of rented bikes.
  • What ML models and feature engineering methods are explored? Default choices by the platform (all applicable ML models).
  • How is the space of allowed ML models sampled? Default choices by the platform (random search).

Once these few configuration choices are set, the AutoML platform takes over. It splits the uploaded dataset into training and validation data, systematically trains and tests the selected ML models, and displays their performance scores. As soon as you are happy with the performance of the trained models (also called solutions), you can stop the process, download the best model, and apply this script to new data.
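
To give a feeling for what a random search with this objective does, here is a deliberately small sketch. It is not the platform’s implementation: the single model (a nearest-neighbour average), its hyperparameter `k`, the search range, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import median


def median_absolute_error(y_true, y_pred):
    """Median of the absolute differences between true and predicted values."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def knn_predict(train, x, k):
    """Predict the mean target of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def random_search(train, valid, n_trials, rng):
    """Sample candidates at random, score each on validation data, keep the best."""
    best_k, best_score = None, float("inf")
    for _ in range(n_trials):
        k = rng.randint(1, 25)  # randomly sampled hyperparameter (assumed range)
        preds = [knn_predict(train, x, k) for x, _ in valid]
        score = median_absolute_error([y for _, y in valid], preds)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic (temperature, rentals) pairs; purely illustrative.
data = []
for _ in range(200):
    temp = rng.uniform(0, 30)
    data.append((temp, 200 * temp + rng.gauss(0, 300)))

# Split into training and validation data, as the platform does internally.
train, valid = data[:150], data[150:]

best_k, best_score = random_search(train, valid, n_trials=10, rng=rng)
print(best_k, best_score)
```

A real AutoML search samples over many model families and feature engineering methods rather than one hyperparameter, but the loop has the same shape: sample, train, score on validation data, keep the best.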

Test the Model

Illustration of the number of rented bikes over time (light green: training data, dark green: predicted numbers).

For the example dataset here, the last 6 months of past records were retained as test data and not uploaded to the platform (see “Select the Data”). For this timeframe, the exact number of rented bikes is known and is used for comparison to the predicted number of bikes.

The model trained with AutoML predicts the general trend of the total number of rented bikes well. In absolute numbers, the median absolute difference (see “Create the ML Model”) between the predicted and the known values is 754 bikes for the test data. This corresponds to a relative deviation of about 13% (median number of rented bikes in the second year: 5,927).
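
The relation between these two figures can be checked with a few lines of Python. The helper function and the toy values are illustrative; only the numbers 754 and 5,927 come from the text.

```python
from statistics import median


def median_absolute_difference(y_true, y_pred):
    """Median of |true - predicted| over all test days."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy example: the absolute differences are 10, 20 and 30, so the median is 20.
toy_score = median_absolute_difference([100, 200, 300], [110, 180, 330])

# Figures quoted in the text.
median_abs_diff = 754
median_rentals_second_year = 5927

# Relative deviation: the median error as a fraction of the median daily rentals.
relative_deviation = median_abs_diff / median_rentals_second_year
print(toy_score, f"{relative_deviation:.1%}")
```

The ratio comes out slightly below 13%, which matches the rounded value quoted above.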

Deploy to Production

Lastly, the ML model can be deployed to production and used for prediction on live data. The platform provides solutions in the form of Python scripts. All solutions have the same API, which makes them easy to integrate into existing environments and straightforward to replace with even better models trained by AutoML.
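
The benefit of a shared API can be sketched as follows. The actual interface of the downloaded solutions is not shown in this post, so the `Solution` protocol, its `predict` method, and both example models are hypothetical stand-ins.

```python
from typing import Protocol, Sequence


class Solution(Protocol):
    """Hypothetical common interface; not the platform's documented API."""

    def predict(self, rows: Sequence[dict]) -> list[float]: ...


class MeanBaseline:
    """Always predicts the same number of rentals."""

    def __init__(self, mean_rentals: float):
        self.mean_rentals = mean_rentals

    def predict(self, rows):
        return [self.mean_rentals for _ in rows]


class TemperatureModel:
    """Toy linear model on the forecast temperature (assumed column 'temp_c')."""

    def __init__(self, intercept: float, slope: float):
        self.intercept = intercept
        self.slope = slope

    def predict(self, rows):
        return [self.intercept + self.slope * r["temp_c"] for r in rows]


def forecast_demand(solution: Solution, forecast_rows):
    """The service only calls predict(), so solutions are interchangeable."""
    return [round(v) for v in solution.predict(forecast_rows)]


forecast = [{"temp_c": 10.0}, {"temp_c": 20.0}]
print(forecast_demand(MeanBaseline(4500.0), forecast))
print(forecast_demand(TemperatureModel(1000.0, 150.0), forecast))
```

Because the calling code depends only on the shared method, replacing one solution with a better one does not require changes to the surrounding service.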

Our bike-sharing company is now able to predict the number of rented bikes a few days ahead, given the weather forecast and the holiday schedule. Predicting the demand more accurately with ML allows the supply of bikes to be optimized, which completes the use case.

This predictive capability also unlocks other use cases, as the ML solution could, for instance, be used to improve the customer experience by including the predicted demand in the company’s bike-sharing app. It could also be used to first predict the use of bikes by casual users and then target them specifically to make them loyal customers (e.g. with special, tailored offers).
