Accelerating Astrophysics Discovery at Yale with AutoML

NGC 6240 captured by the Hubble Space Telescope. Credit: NASA/Hubble

Key Points

  • Using Modulos AutoML, you can significantly reduce the time from idea to deployable machine learning model. Rather than spending time and attention on model selection and tuning, you can focus on the problem you are trying to solve.
  • Yale astronomy researcher Aritra Ghosh spent more than three months manually building a deep learning classifier for galaxy images.
  • When experimenting with Modulos AutoML, he was able to automatically build a deep learning classifier with similar performance in two weeks of compute time.

Studying Supermassive Black Holes in the Early Universe

Photo of Aritra Ghosh
Credit: Aritra Ghosh

By Kevin Schawinski (CEO at Modulos).

Aritra Ghosh of Yale University studies the evolution of distant galaxies and supermassive black holes. He wants to understand how growing supermassive black holes (known as “active galactic nuclei” or “quasars”) affect their host galaxies. A basic observational question is therefore: “What are the general properties of the host galaxies of active galactic nuclei?” Ghosh focuses on the morphology (the shape) of galaxies with growing black holes. The first step in doing that for a very large sample is to find a way to automatically classify the morphologies of different kinds of galaxies.

It is thought that major galaxy collisions have the ability to light up black holes by funneling tremendous amounts of gas to them so that they start shining as quasars. But galaxy collisions also tend to destroy the thin and fragile disk of a galaxy like our own Milky Way. So by studying how many galaxies with and without active black holes exist at a given epoch of the universe, astrophysicists like Ghosh can decode the formation history of the galaxy and the black hole with it.

The Power of Deep Learning

Classifying images is a core machine learning task. To tackle his research questions, Ghosh built a convolutional neural network called Galaxy Morphology Network, or GaMorNet (https://arxiv.org/abs/2006.14639). But starting a deep learning project and developing a model that performs well can take considerable time and effort.

The Antennae galaxies, captured by the Hubble Space Telescope. Credit: NASA/Hubble


“I started working on the GaMorNet project at the very beginning of 2018. And at that point I had very little knowledge of machine learning and deep learning. So I had to teach myself first the statistical concepts of the things that I was dealing with, and second, the different programming libraries, which allowed me to code, to build up the network in a sense,” says Ghosh. “So from when I started teaching myself about machine learning to the time we had some working version of GaMorNet, it must have been about seven, eight months or so. And then to completely finish the work of building the network, classifying the galaxies, writing the paper and everything, it probably was one and a half years or so.”

Building ML Models Faster

Ghosh spent three to four months of that time exclusively on building and tuning his deep learning model. He then turned to Modulos AutoML to see whether it could produce a machine learning model with performance similar to GaMorNet’s, but without Ghosh’s time investment. We ran this experiment on a single-GPU machine, and after two weeks of compute, AutoML had reached performance levels similar to those of the hand-crafted GaMorNet. On a multi-GPU machine, AutoML would likely have delivered results even faster.

Modulos AutoML training image classifiers.

AutoML Accelerates the Research Process

“I think this is where AutoML primarily comes in. If you have spent enough time knowing your dataset, you have cleaned your dataset so that you know you can get reasonable answers, and you have a general idea of what you think might work, this is where I see AutoML coming in and then kind of taking off from that idea and then giving you more sense of what algorithm and what hyper-parameters will produce the best results on your dataset.”

— Aritra Ghosh, Yale University

“And since it keeps trying these models on itself without any kind of human interaction, that also helps. Even if I was trying different algorithms on a supercluster here, I can at most do one model a day. So I submit the job, wait for it to complete, I’m sleeping while the job completes, and only when I start my work day next day, I come back, see the model. And again, I make a decision seeing the results, for example to reduce the learning rate, do something else. AutoML keeps running new models based on its previous results. I start a run, forget about it and then keep checking it every two, three days or one day to see how it’s going. That means it does not require too much time investment from me when it’s running so that I can concentrate.”
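
The loop Ghosh describes, in which each new run is chosen from the results of earlier ones without a person in between, can be illustrated with a small sketch. This is not how Modulos AutoML works internally, which the article does not describe. It is a minimal example of an automated search over a single hyper-parameter, the learning rate. The function train_and_score is a hypothetical stand-in for a full training run, and its toy score, the search range, and the exploration probability are all assumptions made for illustration.

```python
import math
import random

LOG_LR_MIN, LOG_LR_MAX = -6.0, 0.0  # assumed search range for log10(learning rate)


def train_and_score(learning_rate: float) -> float:
    """Hypothetical stand-in for a training run; returns a validation score.

    The toy score peaks at learning_rate = 1e-3. A real run would train a
    network and evaluate it on held-out galaxy images.
    """
    return 1.0 / (1.0 + (math.log10(learning_rate) + 3.0) ** 2)


def automated_search(n_trials: int = 30, explore_prob: float = 0.3, seed: int = 0):
    """Run trials back to back, using earlier results to pick the next one."""
    rng = random.Random(seed)
    best_log_lr, best_score = None, -math.inf
    history = []  # (learning_rate, score) for every trial, in order

    for _ in range(n_trials):
        if best_log_lr is None or rng.random() < explore_prob:
            # Explore: sample anywhere in the allowed range.
            log_lr = rng.uniform(LOG_LR_MIN, LOG_LR_MAX)
        else:
            # Exploit: perturb the best setting found so far.
            log_lr = rng.gauss(best_log_lr, 0.5)
            log_lr = min(max(log_lr, LOG_LR_MIN), LOG_LR_MAX)

        learning_rate = 10.0 ** log_lr
        score = train_and_score(learning_rate)
        history.append((learning_rate, score))

        if score > best_score:
            best_log_lr, best_score = log_lr, score

    return 10.0 ** best_log_lr, best_score, history


if __name__ == "__main__":
    best_lr, best_score, history = automated_search()
    print(f"best learning rate: {best_lr:.2e}, score: {best_score:.3f}")
```

In the manual workflow from the quote, each pass through this loop would cost roughly a day, because a person has to inspect the result and submit the next job. Automating the decision step removes that wait, even though each training run takes just as long.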

Modulos AutoML lets researchers like Ghosh automate sophisticated machine learning projects, freeing up their time to focus on how to advance their projects, or to tackle more challenges at the same time.

To support researchers like Ghosh, Modulos offers special programs for academic and non-profit users.