Som V Tambe

Second JSoC blogpost

This will briefly elaborate what I have done in the past two weeks. I won't be highlighting the troubleshooting part which took the maximum time in these 2 previous weeks, but I will be showcasing the pull requests we made and some math behind the gradient of a supposedly complex function.

I have learnt a lot about gradients, adjoints and rrules in these weeks. The learning curve was close to tanπ2\tan{\frac{\pi}{2}} (for me). It was very frustrating in the beginning, but at the end the results were very delightful.

Some PRs I got merged/will be merged -

  1. Add gradients for FilledExtrapolations #439.

  2. Add differentiable homography warps #7 (huge PR).

There might be some more PRs in Zygote and StaticArrays, since we feel there is a bit of type piracy. You can also track the realtime progress here.

Some math now.

Our aim till now is to make warping functions in ImageTransformations differentiable.

Warps can be defined to be Y=f(X;W)Y = f(X;W), where YY is the output image/matrix, XX is the input image/matrix and WW is the transform.

Transforms can be represented as matrices; some examples of transforms are image resizing, rotation and homography transforms.

The current progress of how warps are generated is something like this -

  1. create an interpolation object of the input image (input would always be an AbstractArray), and then create an extrapolation of the same interpolation. You can find more about Julia's Interpolations library here.

  2. Get the indices which preserve all the information from XX after applying the transform.

  3. Apply transform pixelwise to these above indices and store the mapping data in the form of an image with above indices.

This may be a little vague (yes it is), but for that, you'll have to explore a bit of transforms yourself.

For the case where the Interpolation scheme gg (here a Linear BSpline) is non trainable and the transform WW is trainable, the math looks something like this -

LW=Lg×g(i,j)×(i,j)W\frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times \frac{\partial{g}}{\partial{(i,j)}} \times \frac{\partial{(i,j)}}{\partial{W}}

where, LL is the loss/objective function. It can be as simple as

L(x)=sum(x)L(x) = sum(x)

which gives us the sum of elements of an array.

Now, we see we have splitted the derivatives into 3 parts. We will go partwise for each of them.

Lg \frac{\partial{L}}{\partial{g}} \implies this part can be calculated by the AD itself.

g(i,j) \frac{\partial{g}}{\partial{(i,j)}} \implies this part is implemented as Interpolations.gradient(g, i, j). Note that the input is assumed to be two-dimensional, hence the two indices for derivatives in each direction.

(i,j)W \frac{\partial{(i,j)}}{\partial{W}} \implies This has not been implemented. The exact implementation details will be shown below.

The exact derivative should look something like this -

LW=Lg×[gigj]×[iWjW] \frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times \begin{bmatrix} \frac{\partial{g}}{\partial{i}} & \frac{\partial{g}}{\partial{j}}\\ \end{bmatrix} \times \begin{bmatrix} \frac{\partial{i}}{\partial{W}}\\ \frac{\partial{j}}{\partial{W}}\\ \end{bmatrix}

Now what we finally needed was this -

=[LW1,1LW1,2LW1,3LW2,1LW2,2LW2,3LW3,1LW3,2LW3,3] \nabla = \begin{bmatrix} \frac{\partial{L}}{\partial{W_{1,1}}} & \frac{\partial{L}}{\partial{W_{1,2}}} & \frac{\partial{L}}{\partial{W_{1,3}}}\\ \frac{\partial{L}}{\partial{W_{2,1}}} & \frac{\partial{L}}{\partial{W_{2,2}}} & \frac{\partial{L}}{\partial{W_{2,3}}}\\ \frac{\partial{L}}{\partial{W_{3,1}}} & \frac{\partial{L}}{\partial{W_{3,2}}} & \frac{\partial{L}}{\partial{W_{3,3}}}\\ \end{bmatrix}

Some notations before the final results. We use homogenous coordinates in our calculations.

[ij1][i1j1z]=W×[i0j01] \begin{bmatrix} i\\ j\\ 1\\ \end{bmatrix} \gets \begin{bmatrix} i_1\\ j_1\\ z\\ \end{bmatrix} = W \times \begin{bmatrix} i_0\\ j_0\\ 1\\ \end{bmatrix}

I won't finally derive all the results, but the final calculations were something like this -

iW=1z×[i0j01000ii0ij0i] \frac{\partial{i}}{\partial{W}} = \frac{1}{z} \times \begin{bmatrix} i_0 & j_0 & 1\\ 0 & 0 & 0\\ -ii_0 & -ij_0 & -i\\ \end{bmatrix} jW=1z×[000i0j01ji0jj0j] \frac{\partial{j}}{\partial{W}} = \frac{1}{z} \times \begin{bmatrix} 0 & 0 & 0\\ i_0 & j_0 & 1\\ -ji_0 & -jj_0 & -j\\ \end{bmatrix}

It finally becomes -

LW=Lg×[gi×iW+gj×jW] \frac{\partial{L}}{\partial{W}} = \frac{\partial{L}}{\partial{g}} \times [\frac{\partial{g}}{\partial{i}} \times \frac{\partial{i}}{\partial{W}} + \frac{\partial{g}}{\partial{j}} \times \frac{\partial{j}}{\partial{W}}]

(Also let me know if you find any calculation mistake)

I implemented the adjoint as per the above calculations, and finally got something that was working.

julia> Zygote.gradient(src3, h) do img, tform
                c = sum(warp_me2(img, tform))

3×3 Matrix{Float64}:
  1291.15    348.268   -3.61461
  1486.71    439.969   -5.47212
 25459.4   17443.4    -75.2837

I shall upload some really juicy results at a later point. I still have to get it working for XX::AbstractArray{<:Colorant}, since mathematical operations aren't defined for colorants as such, and that should take me a little time (?)

I with my mentor Dhairya and Johnny Chen have something really nice in mind. I hope (read expect) it comes to fruiton till the end evaluations.

I will be working on the above mentioned issues, as well as the type where the interpolation scheme is trainable (LX\frac{\partial{L}}{\partial{X}}), but the warp can/cannot be trainable. I am the most excited about this part.

Till then, cheers, be safe and stay healthy!