This will briefly elaborate what I have done in the past two weeks. I won't be highlighting the troubleshooting part which took the maximum time in these 2 previous weeks, but I will be showcasing the pull requests we made and some math behind the gradient of a supposedly complex function.

I have learnt a lot about gradients, `adjoint`

s and `rrule`

s in these weeks. The learning curve was close to $\tan{\frac{\pi}{2}}$ (for me). It was very frustrating in the beginning, but at the end the results were very delightful.

Some PRs I got merged/will be merged -

There might be some more PRs in `Zygote`

and `StaticArrays`

, since we feel there is a bit of type piracy. You can also track the realtime progress here.

Some math now.

Our aim till now is to make warping functions in `ImageTransformations`

differentiable.

Warps can be defined to be $Y = f(X;W)$, where $Y$ is the output image/matrix, $X$ is the input image/matrix and $W$ is the transform.

Transforms can be represented as matrices; some examples of transforms are image resizing, rotation and homography transforms.

The current progress of how warps are generated is something like this -

create an interpolation object of the input image (input would always be an

`AbstractArray`

), and then create an extrapolation of the same interpolation. You can find more about Julia's Interpolations library here.Get the indices which preserve all the information from $X$ after applying the transform.

Apply transform pixelwise to these above indices and store the mapping data in the form of an image with above indices.

This may be a little vague (yes it is), but for that, you'll have to explore a bit of transforms yourself.

For the case where the Interpolation scheme $g$ (here a Linear BSpline) is non trainable and the transform $W$ is trainable, the math looks something like this -

$\frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times \frac{\partial{g}}{\partial{(i,j)}} \times \frac{\partial{(i,j)}}{\partial{W}}$where, $L$ is the loss/objective function. It can be as simple as

$L(x) = sum(x)$which gives us the sum of elements of an array.

Now, we see we have splitted the derivatives into 3 parts. We will go partwise for each of them.

$\frac{\partial{L}}{\partial{g}} \implies$ this part can be calculated by the AD itself.

$\frac{\partial{g}}{\partial{(i,j)}} \implies$ this part is implemented as `Interpolations.gradient(g, i, j)`

. Note that the input is assumed to be two-dimensional, hence the two indices for derivatives in each direction.

$\frac{\partial{(i,j)}}{\partial{W}} \implies$ This has not been implemented. The exact implementation details will be shown below.

The exact derivative should look something like this -

$\frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times \begin{bmatrix} \frac{\partial{g}}{\partial{i}} & \frac{\partial{g}}{\partial{j}}\\ \end{bmatrix} \times \begin{bmatrix} \frac{\partial{i}}{\partial{W}}\\ \frac{\partial{j}}{\partial{W}}\\ \end{bmatrix}$Now what we finally needed was this -

$\nabla = \begin{bmatrix} \frac{\partial{L}}{\partial{W_{1,1}}} & \frac{\partial{L}}{\partial{W_{1,2}}} & \frac{\partial{L}}{\partial{W_{1,3}}}\\ \frac{\partial{L}}{\partial{W_{2,1}}} & \frac{\partial{L}}{\partial{W_{2,2}}} & \frac{\partial{L}}{\partial{W_{2,3}}}\\ \frac{\partial{L}}{\partial{W_{3,1}}} & \frac{\partial{L}}{\partial{W_{3,2}}} & \frac{\partial{L}}{\partial{W_{3,3}}}\\ \end{bmatrix}$Some notations before the final results. We use homogenous coordinates in our calculations.

$\begin{bmatrix} i\\ j\\ 1\\ \end{bmatrix} \gets \begin{bmatrix} i_1\\ j_1\\ z\\ \end{bmatrix} = W \times \begin{bmatrix} i_0\\ j_0\\ 1\\ \end{bmatrix}$I won't finally derive all the results, but the final calculations were something like this -

$\frac{\partial{i}}{\partial{W}} = \frac{1}{z} \times \begin{bmatrix} i_0 & j_0 & 1\\ 0 & 0 & 0\\ -ii_0 & -ij_0 & -i\\ \end{bmatrix}$ $\frac{\partial{j}}{\partial{W}} = \frac{1}{z} \times \begin{bmatrix} 0 & 0 & 0\\ i_0 & j_0 & 1\\ -ji_0 & -jj_0 & -j\\ \end{bmatrix}$It finally becomes -

$\frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times [\frac{\partial{g}}{\partial{i}} \times \frac{\partial{i}}{\partial{W}} + \frac{\partial{g}}{\partial{j}} \times \frac{\partial{j}}{\partial{W}}]$(Also let me know if you find any calculation mistake)

I implemented the adjoint as per the above calculations, and finally got something that was working.

```
julia> Zygote.gradient(src3, h) do img, tform
c = sum(warp_me2(img, tform))
c
end[2].H
3×3 Matrix{Float64}:
1291.15 348.268 -3.61461
1486.71 439.969 -5.47212
25459.4 17443.4 -75.2837
```

I shall upload some really juicy results at a later point. I still have to get it working for $X$`::AbstractArray{<:Colorant}`

, since mathematical operations aren't defined for colorants as such, and that should take me a little time (?)

I with my mentor Dhairya and Johnny Chen have something really nice in mind. I hope (read expect) it comes to fruiton till the end evaluations.

I will be working on the above mentioned issues, as well as the type where the interpolation scheme is trainable ($\frac{\partial{L}}{\partial{X}}$), but the warp can/cannot be trainable. I am the most excited about this part.

Till then, cheers, be safe and stay healthy!

© Som V Tambe. Last modified: August 27, 2021. Website built with Franklin.jl and the Julia programming language.