
Notes on DeepFool
27 Feb 2020 · 1046 words

Paper: DeepFool: a simple and accurate method to fool deep neural networks (Moosavi-Dezfooli et al., 2016).

Idea: To find an adversarial example for an image, look for the closest decision boundary, then move the image towards a linear approximation of that boundary by orthogonally projecting it onto the boundary. Once the image crosses the boundary, it is adversarial.


Affine functions. These are very similar to linear functions. The only difference is that while a linear function fixes the origin, an affine function doesn’t have to: an affine function is just a linear function that has been translated. In terms of a neural network, it’s usually the input multiplied by a weight matrix, plus a bias. In the paper an affine classifier is defined as $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, although the idea applies to anything of the form $wx + b.$

Level sets. A level set is a set where a function is equal to a constant value $c$. It is defined as $ L_c(f) = { (x_1, \cdots, x_n) : f(x_1, \cdots, x_n) =c }$. If there are two variables $x_1, x_2$, the level set is a line or curve; if there are three variables $x_1, x_2, x_3$, the level set is a plane or surface; for more variables the level set forms a hyperplane or hypersurface. The paper defines decision boundaries as level sets: i.e. $\mathscr{F} = {\mathbf{x} : \mathbf{w}^T\mathbf{x}+b = 0 }$.

Function linearisation. Say you have a function $f(x)$. The idea of linearisation is that the tangent line at the point $(a, f(a))$ approximates the function $f(x)$ well for $x$-values near $a$.

[image] The slope of the curve at $x=-1$ is a good approximation of the curve near that point.

You can approximate the output of $f$ at a point $x$ by calculating $f(x) \approx f(a) + f'(a)(x-a)$, the first-order Taylor expansion around the point $a$. For the curve above, $a=-1$. As you get further and further away from $a=-1$, the approximation gets worse and worse.

For example, say $f(x) = \sqrt{x}$, and you wanted to estimate the value $f(x)$ at the point $x=4.001$ (i.e. $\sqrt{4.001}$) using a linear approximation. You would calculate $f(a) + f'(a)(x-a)$, which equals $\sqrt{a} + \frac{1}{2\sqrt{a}}(x-a)$. You might use $a=4$ as a nearby point with a known square root. So with $x = 4.001$ and $a=4$, the linear approximation is $\sqrt{4.001} \approx \sqrt{4} + \frac{(4.001 - 4)}{2\sqrt{4}} = 2.00025$.
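The worked example above can be sketched in a few lines of Python. The function names here (`linear_approx`) are my own, not from the paper:

```python
import math

def linear_approx(f, df, a, x):
    """First-order Taylor expansion of f around a, evaluated at x."""
    return f(a) + df(a) * (x - a)

# Approximate sqrt(4.001) by linearising sqrt(x) around a = 4,
# where f'(a) = 1 / (2 * sqrt(a)).
approx = linear_approx(math.sqrt, lambda a: 1 / (2 * math.sqrt(a)), 4.0, 4.001)
exact = math.sqrt(4.001)
# approx is 2.00025; the true value agrees to about 8 decimal places.
```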


Overview: Say we are attacking $\mathbf{x}_0$, whose true label is the digit 4, and we want it classified as a 3. Then the decision boundary is given by the level set $\mathscr{F}_3 = {\mathbf{x}:F(\mathbf{x})_4 - F(\mathbf{x})_3 = 0 }$.

Let $f(\mathbf{x}) = F(\mathbf{x})_4 - F(\mathbf{x})_3 $. The hyperplane of the decision boundary is linearised with a first-order Taylor expansion: $\mathscr{F}'_3 = {\mathbf{x}: f(\mathbf{x}) \approx f(\mathbf{x}_0) + \langle \nabla_\mathbf{x} f(\mathbf{x}_0), \mathbf{x}-\mathbf{x}_0 \rangle = 0 }$. Then an orthogonal vector $\mathbf{r}(\mathbf{x}_0)$ is calculated from $\mathbf{x}_0$ to $\mathscr{F}'_3$, which is the same as finding the projection of $\mathbf{x}_0$ onto the plane $\mathscr{F}'_3$. The vector $\mathbf{r}(\mathbf{x}_0)$ is the perturbation that creates the adversarial example: it shifts $\mathbf{x}_0$ beyond the decision boundary $\mathscr{F}_3$.

[image] Hyperplanes $\mathscr{F}_i$ separate datapoints of different classes. (Adversarial Attacks and Defenses in Images, Graphs and Text: A Review, Xu et al., 2019)


Start with an affine/linear classifier, $f(x) = \mathbf{w}^T\mathbf{x}+b$.

The decision boundary is the hyperplane $\mathscr{F} = {x: \mathbf{w}^T\mathbf{x}+b=0 }$.

The robustness of $f$ at a point $\mathbf{x}_0$ is called $\Delta(\mathbf{x}_0;f)$ and is the shortest distance between $\mathbf{x}_0$ and $\mathscr{F}$: the smallest perturbation required to change the labelled class of the image. You find this distance by calculating the orthogonal projection of $\mathbf{x}_0$ onto $\mathscr{F}$, and it works out to be $\mathbf{r}_*(\mathbf{x}_0) := - \frac{f(\mathbf{x}_0)}{||\mathbf{w}||^2_2} \mathbf{w}$, a closed-form solution.
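The closed-form projection is simple enough to verify numerically. This is a minimal sketch with a made-up two-dimensional affine classifier (the weights and point are illustrative, not from the paper):

```python
import numpy as np

def minimal_perturbation(x0, w, b):
    """Closed-form minimal perturbation r_*(x0) = -(f(x0) / ||w||^2) * w,
    the orthogonal projection of x0 onto the hyperplane w^T x + b = 0."""
    f_x0 = w @ x0 + b
    return -(f_x0 / np.dot(w, w)) * w

# Hypothetical affine classifier and input point.
w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([2.0, 2.0])

r = minimal_perturbation(x0, w, b)
x_proj = x0 + r
# x_proj lies exactly on the decision boundary: w @ x_proj + b == 0
# (up to floating-point error), and ||r|| is the robustness at x0.
```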

[image] Figure 2 of the DeepFool paper.

So that’s for an affine classifier $f$.

Next comes the case where $f$ isn’t linear, but instead some curved surface. Like before, we want to find $\mathbf{r}_*(\mathbf{x}_0)$, the minimal perturbation that changes the label of $\mathbf{x}_0$.

Now there isn’t a closed-form solution for this, so an iterative method is used: at each step the boundary is linearised around the current iterate, the closed-form projection is applied, and the process repeats until the label changes.

[image] Pseudocode for the algorithm
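The binary iterative algorithm can be sketched as follows. The toy classifier here (a curved boundary $x_1^2 + x_2 - 2 = 0$) and the function names are my own illustrations; the overshoot factor mirrors the small multiplier the paper uses to push the final point past the boundary:

```python
import numpy as np

def deepfool_binary(x0, f, grad_f, max_iter=50, overshoot=0.02):
    """Iterative DeepFool for a binary classifier f (sketch).

    At each step the (possibly curved) boundary f(x) = 0 is linearised
    at the current iterate, and the closed-form projection for the
    affine case is applied: x <- x - (f(x) / ||grad||^2) * grad.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        if np.sign(f(x)) != np.sign(f(x0)):
            break  # crossed the boundary: the label has changed
        g = grad_f(x)
        x = x - (f(x) / np.dot(g, g)) * g
    # A small overshoot pushes the point just past the boundary.
    return x0 + (1 + overshoot) * (x - x0)

# Toy nonlinear classifier with a curved decision boundary.
f = lambda x: x[0] ** 2 + x[1] - 2.0
grad_f = lambda x: np.array([2.0 * x[0], 1.0])

x0 = np.array([1.0, 2.0])      # f(x0) > 0: the "positive" class
x_adv = deepfool_binary(x0, f, grad_f)
# f(x_adv) < 0: the perturbed point is on the other side.
```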

[image] Visualisation of the method, with original caption

This has all been for the binary case. I won’t go into detail for the multi-class case, but the same ideas hold, except you have multiple decision boundaries.

[image] The multiclass affine case

You are trying to move a point towards (and eventually over) the nearest decision boundary, since that is the one with the least perturbation.

If you have a curved decision boundary, you’ll need to take linear approximations and do the iterative algorithm again.


Here is the pseudocode for this algorithm:

[image] Pseudocode for the multi-class classification case
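For the multiclass affine case, a single step suffices: find the nearest pairwise boundary and project onto it. Below is a minimal sketch with a hypothetical 3-class affine classifier (the weights, biases, and point are illustrative, not from the paper):

```python
import numpy as np

def deepfool_multiclass_affine(x0, W, b, overshoot=0.02):
    """Minimal perturbation for a multiclass affine classifier (sketch).

    W holds one weight row per class, b the per-class biases. For every
    class k != k_hat, form w'_k = w_k - w_{k_hat} and f'_k = f_k - f_{k_hat};
    pick the closest boundary l = argmin |f'_k| / ||w'_k|| and project
    onto it (plus a small overshoot to cross it).
    """
    scores = W @ x0 + b
    k_hat = int(np.argmax(scores))      # currently predicted class
    w_diff = W - W[k_hat]               # w'_k for every k
    f_diff = scores - scores[k_hat]     # f'_k for every k
    norms = np.linalg.norm(w_diff, axis=1)

    dists = np.full(len(b), np.inf)     # never compare a class to itself
    mask = np.arange(len(b)) != k_hat
    dists[mask] = np.abs(f_diff[mask]) / norms[mask]
    l = int(np.argmin(dists))

    r = (np.abs(f_diff[l]) / norms[l] ** 2) * w_diff[l]
    return x0 + (1 + overshoot) * r

# Hypothetical 3-class affine classifier in 2D.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x0 = np.array([2.0, 1.5])               # predicted class: 0

x_adv = deepfool_multiclass_affine(x0, W, b)
# The prediction flips to class 1, the nearest boundary.
```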

The method is a bit slower than FGSM, but it finds much smaller perturbations. It’s been superseded by other methods today.
