https://arxiv.org/abs/1706.06083

Looks to be an important paper in the field. Probably worth revisiting.

**Category:** Proposes a new framework for thinking about adversarial robustness, framing the problem in the language of robust optimisation.

**Content:** Related to: robust optimisation, classes of attacks and defences, extra parameters in the model.

**Correctness:** I noticed a few issues (see Shortcomings below), but most claims are hard for me to evaluate.

**Contributions:** Reframes adversarial training as two nested optimisation problems pitted against each other, which the authors call the saddle point problem. It turns out this can be solved in practice, despite non-convexity. They also argue that robustness requires the decision boundary to curve around the perturbation sets, implying you need more model parameters than standard training strictly requires.

**Clarity:** Medium. Spacious paper, not dense, but symbols not always defined before usage.

### Summary

This paper uses techniques from robust optimisation, an area I have not encountered much before. The survey they point to is Ben-Tal et al. (2009), *Robust Optimization*.

The basic idea is that you have two optimisation problems at odds with each other. The “inner maximisation” problem is where you try to perturb each datapoint adversarially in order to maximise the loss function. The “outer minimisation” problem is where you find the best architecture/model parameters for your neural network that minimises the results from the inner maximisation problem. The authors call this the **saddle point problem**.
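Written out, the saddle point objective composes the two problems, where $\mathcal{S}$ is the set of allowed perturbations (e.g. the $l_\infty$ ball of radius $\epsilon$) and $\mathcal{D}$ the data distribution:

$$
\min_{\theta} \rho(\theta), \qquad \rho(\theta) = \mathbb{E}_{(x,y) \sim \mathcal{D}} \left[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \right]
$$

The inner $\max$ produces the worst-case perturbation per data point; the outer $\min$ fits the parameters $\theta$ against those worst cases.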

The inner problem (adversarial example finding) is non-concave and the outer problem (model parameter tuning) is non-convex. Because of this, you’d expect significant difficulty solving the overall optimisation problem, with no guarantees possible. Yet it appears to be solvable in practice. The authors draw a parallel with ordinary neural network training, which succeeds because there are many local minima with similar loss values.
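The authors attack the inner maximisation with projected gradient descent (PGD): repeated signed gradient ascent steps on the loss, each followed by a projection back onto the allowed $l_\infty$ ball. As a toy sketch (my own illustration, not the paper's code), here is PGD against a logistic-regression model, chosen because the input gradient has a closed form; all names and parameter values are hypothetical:

```python
import numpy as np

def pgd_attack(x, y, w, b, eps, alpha, steps):
    """PGD within an l_inf ball of radius eps around x.

    Toy model: logit = w @ x + b with logistic loss, y in {0, 1}.
    dL/dx = (sigmoid(logit) - y) * w, so no autodiff is needed.
    """
    # Random start inside the ball (helps escape flat regions).
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        logit = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-logit))          # sigmoid
        grad_x = (p - y) * w                      # d(loss)/d(x_adv)
        x_adv = x_adv + alpha * np.sign(grad_x)   # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the ball
    return x_adv
```

The projection (the `np.clip` line) is what makes this "projected" gradient descent: however far an ascent step moves, the perturbation is folded back so $\|x_{adv} - x\|_\infty \le \epsilon$ always holds.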

Because this problem can be solved, the implication is that the trained network is protected against the whole class of $l_\infty$-bounded adversarial attacks. There may, of course, be attacks outside this class that it is not protected against.
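The outer minimisation then amounts to ordinary gradient descent, but evaluated at the perturbed points. A minimal sketch for the same toy linear model (again my own illustration; for brevity the inner maximiser here is a single FGSM-style step, the one-iteration special case of PGD):

```python
import numpy as np

def adv_train_step(x, y, w, b, eps, lr):
    """One adversarial-training step for logit = w @ x + b, logistic loss.

    Inner max: one signed gradient step of size eps on the input.
    Outer min: gradient descent on (w, b) at the perturbed input.
    """
    # Inner maximisation (single-step stand-in for PGD).
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w
    x_adv = x + eps * np.sign(grad_x)

    # Outer minimisation: descend on the loss at x_adv.
    p_adv = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
    grad_w = (p_adv - y) * x_adv
    grad_b = p_adv - y
    return w - lr * grad_w, b - lr * grad_b
```

Danskin's theorem is what justifies this: the gradient of the inner max with respect to $\theta$ is just the gradient of the loss evaluated at the inner maximiser.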

An interesting figure shows the decision boundary and the effect of adversarial perturbation: adversarial robustness makes the decision boundary “curve” around the $l_\infty$ ball of allowed perturbations for each data point. I do wonder what effect this has on overfitting, though. Maybe there is a tradeoff between overfitting and how much the boundary must curve?

**Shortcomings**

I couldn’t see any discussion of attacks outside the $l_\infty$ class. Defending only against this class could create a false sense of security.

The authors emphasise their MNIST results, yet their CIFAR results are much weaker. I’ve already seen from Carlini and Wagner (2017) that MNIST isn’t a good benchmark for evaluating adversarial robustness.

**Future topics**

- empirical risk minimisation
- $l_\infty$-bounded attacks (Goodfellow et al. (2015), *Explaining and Harnessing Adversarial Examples*)
- projected gradient descent
- robust optimisation
- first-order information: I think this refers to gradients of the loss function with respect to the input.