Artificial Neural Networks: Deep Dive

By Anna Weigel (CTO at Modulos).

Have you ever wondered how features such as facial recognition on your phone or autocomplete in texts and emails actually work? The answer lies within artificial neural networks.

Today, artificial neural networks are used in many areas of life – everywhere from machine translation, chatbots, YouTube’s recommendations, and speech recognition, all the way to autonomous driving. Interestingly, artificial neural networks also play an important role in the perfume industry, where they are used to combine ingredients and develop new fragrances in ways unattainable for humans. Beyond these commercial applications, neural networks have helped biologists solve a 50-year-old protein folding challenge: in a major scientific advance, an artificial intelligence system was recognized as a solution to determining the shapes that proteins fold into. This demonstrates the ability of AI to significantly accelerate scientific discovery.

Although central to the latest technologies and innovations, artificial neural networks are not a new invention. The first artificial neural networks are often credited to Warren McCulloch and Walter Pitts. In 1943, McCulloch and Pitts used an electrical circuit to recreate the function of neurons in the human brain, thereby creating a computational model for neural networks based on algorithms known as threshold logic. In the 2000s, increased available computing power, the use of GPUs, and distributed computing helped to overcome the challenges of training neural networks. Artificial neural networks were then deployed on a large scale, particularly for image and visual recognition problems – what we now know as deep learning.

The Underlying Principle of Neural Networks

An artificial neural network consists of dozens to millions of artificial neurons (called processing units or nodes) arranged in a series of layers. Each node connects to other nodes via links, and each of those links has a specific weight that determines the strength of one node’s influence on the other. A node computes the weighted sum (weights w) of its inputs (X), adds a bias (b), applies an activation function (φ), and passes the result on to the next node.

The purpose of the activation function is to introduce non-linearity. Commonly used activation functions include the sigmoid (which scales the sum to a range between 0 and 1) and ReLU (Rectified Linear Unit, max(0, x)).

The other term besides the weighted sum is the bias, which is similar to the constant in a linear function (y = ax + b). It allows an additional shift of the weighted sum. 

Schematic depiction of a single neuron in an artificial neural network. It shows how a neuron combines the inputs X to generate the output Y.
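Putting these pieces together, here is a minimal sketch of a single neuron in Python (the function names and toy numbers are invented for illustration; only NumPy is assumed):

```python
import numpy as np

def sigmoid(z):
    # Scales the weighted sum to the range between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Rectified Linear Unit: max(0, z), applied element-wise.
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=sigmoid):
    # Weighted sum of the inputs X plus the bias b, passed through φ.
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.2, 3.0])   # inputs X
w = np.array([0.4, 0.1, -0.6])   # weights w
b = 0.2                          # bias b
print(neuron(x, w, b))           # output Y with a sigmoid activation
print(neuron(x, w, b, relu))     # output Y with a ReLU activation
```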

Network Layers

Within an artificial neural network, nodes are arranged in multiple layers. We distinguish between the input layer, hidden layers, and the output layer.

Nodes in the input layer receive input but don’t perform any computation. Hidden nodes, on the other hand, receive information from previous layers, perform computations, and pass their outputs on to the next layer. Lastly, the output layer nodes perform the final computation and determine the network’s output. In deep neural networks, the “deep” therefore refers to the number of layers in the network.

Schematic illustration of an artificial neural network with three layers (input, hidden, and output layer). It generates an output Y based on input X.

The output layer’s activation function determines the model type (e.g. linear for a regression model, softmax for classification). In a fully connected network, each neuron connects to all neurons from the previous layer. That’s not the case with a convolutional neural network (see below).
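To make the layer structure concrete, here is a hedged sketch of a forward pass through a small fully connected network with one hidden layer (the layer sizes and weight values are invented; only NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Invented sizes: 3 input nodes, 4 hidden nodes, 1 output node.
W1 = rng.normal(size=(4, 3))   # weights from input to hidden layer
b1 = np.zeros(4)               # hidden-layer biases
W2 = rng.normal(size=(1, 4))   # weights from hidden to output layer
b2 = np.zeros(1)               # output bias

def forward(x):
    # Fully connected: every hidden node sees every input.
    h = relu(W1 @ x + b1)
    # Linear output activation, as for a regression model.
    return W2 @ h + b2

print(forward(np.array([0.2, -0.7, 1.5])))  # output Y for input X
```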

What about building a neural network?

Even the simple example above has 7 free parameters (4 weights and 3 biases), while realistic networks contain many more layers and neurons. To determine all of these parameters, we need to train the network.

Training a Neural Network

Backpropagation

A common method used to train artificial neural networks is backpropagation. It works as follows: to start, we initialize all weights and biases – one option is to assign the weights randomly and set the biases to 0. To generate a prediction, we then feed the input forward through the network; this is referred to as forward propagation. After comparing the prediction to the true label, we calculate the error using a loss function (e.g. mean squared error (MSE) for regression, cross entropy for classification).

Our aim is to minimize the error of the output layer, i.e. to reduce the loss. The prediction depends on the parameters (weights and biases) of the nodes in previous layers. So, to reduce the loss and produce a more accurate prediction, the nodes’ parameters have to be updated. We therefore propagate the errors back through the network (backpropagation) and use an optimization method to choose new parameters.

Finally, we propagate the input through the network again and repeat the process.
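As a minimal sketch of this loop – a single sigmoid neuron trained with a squared-error loss on invented toy data (all names and hyperparameters below are illustrative assumptions) – the whole procedure fits in a few lines:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented toy data: two inputs per sample, targets of a logical OR.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(1)
w = rng.normal(size=2)  # randomly assigned initial weights
b = 0.0                 # bias initialized to 0
lr = 0.5                # learning rate (see the next section)

for epoch in range(2000):
    for x_i, y_i in zip(X, y):
        # Forward propagation: generate a prediction.
        y_hat = sigmoid(w @ x_i + b)
        # Loss L = (y_hat - y)^2; backpropagate its gradient via the chain rule.
        dz = 2 * (y_hat - y_i) * y_hat * (1 - y_hat)  # dL/dz
        # Update the parameters to reduce the loss.
        w -= lr * dz * x_i   # dL/dw = dL/dz * x
        b -= lr * dz         # dL/db = dL/dz

print(sigmoid(X @ w + b))  # predictions move towards [0, 1, 1, 1]
```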

Optimization strategy

The optimization method – the strategy employed to update the parameters – is key here. Gradient descent is a popular optimization algorithm: to minimize the loss function, we compute its gradient. In general, the gradient tells us how a slight variation of a function’s input changes its output. When trying to find the minimum of the loss function, i.e. a network’s optimal parameters, the negative gradient points us in the direction of the minimum. As we move backwards through the network, we compute the gradient (i.e. the partial derivatives) of the loss function with respect to the parameters in each layer. This allows us to estimate how a variation of a layer’s weights and biases impacts all successive layers and the final loss.

Once we have computed the gradient, we update our network’s parameters – for each weight, we subtract the corresponding gradient multiplied by the learning rate.
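In symbols, each update takes the form w_new = w - η · ∂L/∂w, where L is the loss and η is the learning rate.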

The learning rate corresponds to the step size we take to approach the minimum – choose a learning rate that is too high and you might miss the minimum; choose a learning rate that is too low and you will need more iterations to find the minimum, and might get stuck in a local minimum.

Gradient descent comes in different versions, e.g. batch gradient descent (inject all data at once), stochastic gradient descent (use a single random sample for each iteration), and mini-batch gradient descent (feed the network N random samples at a time).
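The variants differ only in how much data each parameter update sees. Here is a hedged sketch of the mini-batch case (the helper name minibatches and the toy shapes are invented):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    # Shuffle once per epoch, then yield N random samples at a time.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(10, 2)   # 10 toy samples with 2 features
y = np.arange(10.0)
for X_batch, y_batch in minibatches(X, y, batch_size=4, rng=rng):
    # One gradient computation and parameter update per batch goes here.
    print(X_batch.shape)  # (4, 2), (4, 2), then (2, 2) for the remainder
```

Batch gradient descent corresponds to batch_size = len(X), and stochastic gradient descent to batch_size = 1.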

Now that we’ve covered the training of neural networks, it’s time to get to know some of the different types that exist, each serving different use cases and tasks.

Illustration of how a convolutional neural network analyzes an input image.

Examples of Specialized Neural Networks

Convolutional Neural Networks

In contrast to fully connected networks, convolutional neural networks contain convolutional layers. Nodes in one layer only connect to a local set of nodes in the previous layer. This corresponds to convolving the input with a filter. Compared to a fully connected network, there are significantly fewer weights.

Convolutional neural networks have proven especially effective for image data.
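To illustrate the idea of local connections, here is a minimal sketch of a 2D convolution (a naive loop version for clarity; the kernel values are invented, and real frameworks use much faster implementations):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image: each output value depends only
    # on a small local patch, unlike in a fully connected layer.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.default_rng(0).random((5, 5))  # toy 5x5 "image"
kernel = np.array([[1.0, -1.0]])                 # toy filter for horizontal edges
print(conv2d(image, kernel).shape)               # (5, 4)
```

However large the image, the layer only needs the kernel’s few weights – this is why convolutional layers get by with significantly fewer parameters.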

Recurrent Neural Networks

Feedforward neural networks, like the ones described above, assume that the input and output data are independent of each other – they map one input to one output. Recurrent neural networks (RNNs) have a “memory”: they retain information from previous inputs, so input and output are not independent of each other – the current output depends not only on the current input, but also on the inputs that came before it.

RNNs are commonly used for sequential data, e.g. in natural language processing where the order of words impacts the meaning.
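A minimal sketch of this “memory” mechanism – a single recurrent step, with invented sizes and random weights – might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented sizes: 3-dimensional inputs, 4-dimensional hidden state.
Wx = rng.normal(size=(4, 3))  # weights applied to the current input
Wh = rng.normal(size=(4, 4))  # weights applied to the previous state
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input AND the previous
    # state, so earlier inputs influence later outputs.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(4)                      # empty memory at the start
sequence = rng.normal(size=(5, 3))   # 5 time steps of toy input data
for x_t in sequence:
    h = rnn_step(x_t, h)
print(h)  # the final state summarizes the whole sequence
```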

Autoencoders

The goal of autoencoders is not to return classifications or numerical regression values, but rather to learn a representation of the data. They can compress as well as reconstruct data. Since they aim to reproduce their own input, they are trained in an unsupervised way. In its most basic form, an autoencoder consists of an input layer, at least one hidden layer, and an output layer. The mapping from the input to the lower-dimensional hidden layer forms the encoder; the mapping back to the original dimension in the output layer forms the decoder.

Autoencoders are useful for image denoising, anomaly detection, or general dimensionality reduction.
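A minimal structural sketch of an autoencoder (untrained, with invented dimensions: 8 inputs compressed to a 3-dimensional representation):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(3, 8)) * 0.1  # encoder: 8 -> 3 dimensions
b_enc = np.zeros(3)
W_dec = rng.normal(size=(8, 3)) * 0.1  # decoder: 3 -> 8 dimensions
b_dec = np.zeros(8)

def encode(x):
    # Compress the input into the lower-dimensional representation.
    return np.tanh(W_enc @ x + b_enc)

def decode(h):
    # Map the representation back to the original dimension.
    return W_dec @ h + b_dec

x = rng.normal(size=8)
x_rec = decode(encode(x))
# Training would minimize the reconstruction error between x and x_rec,
# using the input itself as the target (unsupervised).
print(np.mean((x - x_rec) ** 2))
```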

Finding the Best Model

During hyperparameter tuning, we try to find the best solution within a search space that can be infinite. It therefore makes sense to automate the process.

As a rough rule, more complex data sets require more complex models, such as neural networks. Keep in mind that training neural networks is time-consuming and computationally expensive. But you might not always need a neural network: depending on the task at hand, simpler models (e.g. random forests or ridge regression) can perform similarly.

However, this is only a guideline. It’s difficult to tell ahead of time which type of model will work best for your use case. That’s why it’s important not to discard simpler models right away, and to approach model selection and hyperparameter tuning in an unbiased and systematic way. This is exactly what we strive to do with the Modulos AutoML platform.

If you’d like to learn more about the concepts behind machine learning, please head over to our Resources Page where we have a series of videos on the topic.
