AlphaD3M now has grammar, baby

There’s this lovely little paper called “Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar”. It’s by Drori et al. (ICML workshop, 2019), and here’s the paper link: https://arxiv.org/pdf/1905.10345.pdf

But, what is AlphaD3M? Don’t know? Read this summary before continuing, brave adventurer.

I won’t give you an overview of this paper here (the title kind of does that already).

So, what’s interesting about it? The takeaways? Give it to me nice and straight, you cry, and don’t you dare waste words on frivolous sentences.

Sure. Here you go.

Interesting shit

Constraining primitives with a context-free grammar works well. A sentence is made from words, and a pipeline is made from primitives. But only some word combinations make a valid sentence, and only some primitive combinations make a valid pipeline. It’s clear why a grammar is useful here: why allow pipelines that don’t make sense?
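
To make that concrete, here’s a toy sketch of the idea in Python. The grammar below is made up by me (the paper’s real grammar covers many more primitives and rules); the point is just that only sequences the grammar can derive ever count as pipelines.

```python
from itertools import product

# Made-up toy grammar (not the paper's): a pipeline is optional cleaning,
# then optional transformations, then exactly one estimator.
GRAMMAR = {
    "PIPELINE":  [["CLEANING", "TRANSFORM", "ESTIMATOR"]],
    "CLEANING":  [["imputer"], ["missing_indicator"], []],
    "TRANSFORM": [["one_hot"], ["pca"], ["one_hot", "pca"], []],
    "ESTIMATOR": [["ridge_classifier"], ["linear_svc"]],
}

def derive(symbol):
    """Yield every primitive sequence derivable from a grammar symbol."""
    if symbol not in GRAMMAR:        # terminal: an actual primitive
        yield [symbol]
        return
    for rule in GRAMMAR[symbol]:
        if not rule:                 # empty rule: this stage is optional
            yield []
            continue
        # Expand each part of the rule, then take the cross product.
        for combo in product(*(list(derive(s)) for s in rule)):
            yield [prim for part in combo for prim in part]

pipelines = list(derive("PIPELINE"))
print(len(pipelines))  # 24 grammatical pipelines, instead of every arbitrary
                       # ordering of the 6 primitives
print(pipelines[0])    # ['imputer', 'one_hot', 'ridge_classifier']
```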

The grammar vastly reduced the MCTS search space: the branching factor dropped roughly three-fold and the average search depth fell by an order of magnitude. The MCTS now searches through far less stuff, so it’s a lot faster.
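
Here’s a rough, self-contained sketch of why that happens (the pipelines and numbers are invented by me, not from the paper): given the set of pipelines a grammar can derive, a partial pipeline only has a handful of legal continuations, so MCTS branches over those instead of over every primitive.

```python
# Toy set of grammatical pipelines, e.g. as enumerated from a grammar
# like the one sketched above.
VALID_PIPELINES = [
    ["imputer", "one_hot", "ridge_classifier"],
    ["imputer", "pca", "linear_svc"],
    ["missing_indicator", "pca", "ridge_classifier"],
    ["one_hot", "linear_svc"],
]
ALL_PRIMITIVES = {prim for pipe in VALID_PIPELINES for prim in pipe}

def legal_next(prefix):
    """Primitives the grammar allows as the next step after `prefix`."""
    return {pipe[len(prefix)]
            for pipe in VALID_PIPELINES
            if len(pipe) > len(prefix) and pipe[:len(prefix)] == list(prefix)}

# Unconstrained search would branch over every primitive at every step...
print(len(ALL_PRIMITIVES))      # 6
# ...whereas the grammar leaves only a couple of legal moves.
print(legal_next(["imputer"]))  # {'one_hot', 'pca'}
```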

Using the grammar didn’t hurt model performance, and for some tasks performance was actually much better.

When comparing AutoML methods, the early part of the computation is what matters most. Given enough time, many methods will search the entire space and eventually reach the same performance level. What matters is how quickly they get to that optimal solution.

Pre-trained AlphaD3M trains twice as fast as AlphaD3M learning from scratch, and AutoSklearn is another factor of two slower than that.

Code is available for this variant of AlphaD3M, which is cool. It comes as a Dropbox folder.

Some other stuff

I think the architecture of the model is the same as in the original AlphaD3M paper. The paper doesn’t emphasise any changes, at least.

The pre-trained model is pre-trained using “other datasets”, which I guess means datasets not among the 74 OpenML datasets used here. You could say the context-free grammar is kind of pre-trained as well, since it’s based on machine learning pipelines that already exist.

Let’s talk a bit about primitives:

  • Primitives are divided into three categories: data cleaning, data transformation, and estimators. Since the authors wanted to compare this version of AlphaD3M against AutoSklearn, they only included sklearn primitives, because that’s all AutoSklearn can use. AlphaD3M can allegedly handle other primitives too.
  • There are 2 data cleaning primitives: one for missing-value imputation and one for creating a missing-value indicator column. They’re needed to get the estimators working, since some sklearn estimators don’t like missing values.
  • There are 11 data transformation primitives, including one-hot encoding, ordinal encoding, PCA, and feature selection. Stuff that transforms your data, unsurprisingly.
  • There are 38 estimator primitives, split into 16 classification and 22 regression primitives. Two examples: Ridge Classifier and Linear SVC. (One primitive from each category is wired together in the sketch below.)
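
Since those bullets stay abstract, here’s what one pipeline built from the three categories looks like in plain scikit-learn. This is my own hand-rolled example, not a pipeline from the paper’s search; the components are just standard sklearn primitives of the kinds listed above.

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer  # data cleaning primitive
from sklearn.decomposition import PCA     # data transformation primitive
from sklearn.svm import LinearSVC         # estimator primitive

# One primitive from each category, in the only order that makes sense:
# clean, then transform, then estimate.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("reduce", PCA(n_components=10)),
    ("clf", LinearSVC()),
])
# AlphaD3M's job is to pick and order steps like these automatically;
# the grammar just stops it from proposing nonsense orderings, like an
# estimator sitting in the middle of the pipeline.
```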

I appreciated the authors giving more detail about the primitives in this paper. It helped me understand the system a bit better.

Conclusion

AlphaD3M got faster and a bit better. It did this by (a) using a context-free grammar to restrict the space of ML pipelines, and (b) using a pre-trained model.