The Institute for Astrophysics at PUC in Chile is using Modulos AutoML to support its research. Chilean scientists are at the forefront of astrophysics research, as the country’s mountains are amongst the best sites for telescopes. Students and postdoctoral researchers at the institute have access to Modulos AutoML running on the institute’s supercomputer, Geryon. Kevin Schawinski spoke with Professor Ezequiel Treister about how Automated Machine Learning is enabling cutting-edge astronomy research and training students in using machine learning.
What kinds of questions are you pursuing in your astrophysics research?
Ezequiel Treister: There are many problems in astrophysics that are based on a combination of big data and proper machine learning. The size and dimensionality of some problems are so overwhelming that the human brain cannot analyze them; it’s impossible. As a result, it’s very natural to try to combine machine learning techniques and astrophysical data to overcome this limitation.
Machine learning can be applied to a whole range of problems, if not every problem. In our team’s case specifically, we are focusing on galaxy morphologies, morphological classifications, and in particular on a program that you yourself started a long time ago: the identification of major galaxy mergers.
Machine learning is, of course, entering all sorts of research areas. What do you think are the barriers to entry for scientists seeking to use machine learning? Why is it difficult to adopt such technologies?
Ezequiel Treister: It is mostly to do with training. Basically, those belonging to the older generations – like myself – were not trained to use these techniques. Most of us do not have the skills to easily learn or start practicing the techniques. Also, more fundamentally, we don’t have the necessary skills to train those belonging to newer generations either.
It may be easier for the new generation of students and postdocs to start learning and using these techniques, but there is still a learning gap they need to cross, and it is difficult to overcome. Basically, there’s a whole language that needs to be learned: there’s knowledge to acquire on algorithms and technicalities, and hard programming skills and languages need to be learned as well. Whilst these skills can be learned, it certainly takes time, and not many people are willing to pay that price.
That’s a beautiful segue to my next questions because at Modulos we are working on Automated Machine Learning:
· What are the benefits of applying Automated Machine Learning to science projects like the project your team is working on?
· Is it easier to get started on the project by simply using the applications without dealing with the technology on a fundamental level?
Ezequiel Treister: The main benefit of using Modulos, and it makes me very excited about using it more widely, is that it allows us to naturally solve the problems associated with the kind of projects we have in mind. You tend to find these problems in early projects, typically with undergraduate students, but with graduate students too.
You tell them about machine learning and the scientific projects we want them to pursue, and immediately they start to spend a lot of time (days, weeks, and even months) focused on the implementation problems. For example, they have problems with libraries; the code doesn’t do what they want; they cannot code what they need to; and they have to make use of extensive feedback. So at the beginning you don’t trust the results, which means a lot of testing is needed.
This soon adds up to a significant time requirement. The students become frustrated too because they are interested in doing an astrophysical project, but instead they are worrying about a library that does not have something installed or is not doing the right thing.
To give you an example, Modulos allowed us to cross that bridge very easily because you can isolate the programming, the implementation of the code, from the scientific problem, and it still allows the students to understand what they’re doing at a conceptual level. They can jump directly into the scientific part, and in terms of motivation this is very important. In many ways it makes things much more straightforward for a student, as they do not have to deal with the implementation aspect.
Is it important for the astrophysics and science students you are training to gain machine learning skills? Is it important for them to have these skills if they want to start their own companies, or want to work in Chilean companies, for example?
Ezequiel Treister: Absolutely. One of the things that is keeping me very excited is that this summer, so in a couple of months, we are going to offer undergraduate students at our university, who come from astronomy but also from engineering and other schools, the chance to do research in person with us. It will be for a relatively short period of time, so you can imagine that if they typically have a month to do a scientific project and they need to start machine learning from scratch, they will most likely not achieve anything.
However, using Modulos allows them to jump directly into the problem: to actually start working on it, to see the results, and to analyze those results in a very short time span. That is fundamental because what they learn, which in our case will obviously be about astrophysical problems, can basically be translated to everything else they do. For example, if they come from an engineering background, they can still apply it in their satellite applications. There is of course basic research involved as well, but in different areas. The idea is that these particular internships become a great experience that they can replicate in the future.