Function Repository Resource:

GridSampleLayer

Source Notebook

Define a neural net layer for image transformation based on input and index grids

Contributed by: Pierre-André Brousseau

ResourceFunction["GridSampleLayer"][n,{inWidth,inHeight},{outWidth,outHeight}]

represents a net layer that takes a feature array and a pixel coordinate array and recovers specified parts from the input array using bilinear interpolation.

ResourceFunction["GridSampleLayer"][n,{inWidth,inHeight},{outWidth,outHeight},mode]

uses either pixel coordinates or normalized pixel coordinates depending on mode.

Details

ResourceFunction["GridSampleLayer"] is a lookup operator into a feature array using pixel coordinates. Inside a neural network, it can be used to warp features between two views or to project a higher-order tensor by looking up pixel positions. This layer is often necessary in tasks where the coordinate array is a learning objective, such as optical flow, or where the feature array is a learning objective, such as learnable parametric encodings for NeRF or DeepSDF.
ResourceFunction["GridSampleLayer"] is typically used inside NetChain, NetGraph, etc.
The output of ResourceFunction["GridSampleLayer"] is a NetGraph with the following ports:
  • "Input" — a 2D feature array of dimensions {n,inWidth,inHeight}
  • "Index" — a 2D index array of dimensions {2,outWidth,outHeight}
In ResourceFunction["GridSampleLayer"][n,{inWidth,inHeight},{outWidth,outHeight}], n is the number of channels of the input feature array, {inWidth,inHeight} are its spatial dimensions and {outWidth,outHeight} are the spatial dimensions of the output.
The possible values for mode of ResourceFunction["GridSampleLayer"] are:
  • "Coordinates" — the "Index" port takes pixel coordinates (the default)
  • "Normalized" — the "Index" port takes normalized pixel coordinates in the range [-1,1]
Bilinear interpolation is a sampling technique where the output at a point is the weighted average of the four nearest points on the grid. This sampling is the same as in ResizeLayer with the "linear" Resampling method, i.e. piecewise linear interpolation.
This GridSampleLayer is an implementation of torch.nn.functional.grid_sample (spatial 4-D) with this specific set of parameters: mode='bilinear', padding_mode='reflection', align_corners=False.
GridSampleLayer supports Export to "ONNX" format.
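The bilinear weighting and the align_corners=False conventions named above can be sketched in plain Python. This is an illustrative reference, not the layer's implementation; the coordinate and reflection formulas follow the PyTorch grid_sample conventions cited above:

```python
import math

def bilinear(grid, x, y):
    """Interpolate a single channel (grid[row][col]) at a fractional
    0-based position: weighted average of the 4 nearest grid points."""
    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * grid[y0][x0]
            + wx * (1 - wy) * grid[y0][x0 + 1]
            + (1 - wx) * wy * grid[y0 + 1][x0]
            + wx * wy * grid[y0 + 1][x0 + 1])

def unnormalize(coord, size):
    """align_corners=False: -1 and 1 map to the outer edges of the corner
    pixels, so the valid pixel range is [-0.5, size - 0.5]."""
    return ((coord + 1) * size - 1) / 2

def reflect(x, size):
    """padding_mode='reflection': fold an out-of-range pixel position back
    into [-0.5, size - 0.5] by reflecting about the image borders."""
    low, span = -0.5, float(size)
    t = abs(x - low)
    extra, flips = t % span, int(t // span)
    return low + (extra if flips % 2 == 0 else span - extra)

print(bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5))  # -> 1.5
print(unnormalize(-1, 28), unnormalize(1, 28))       # -> -0.5 27.5
```

For example, sampling at the center of four points averages all four with equal weight 1/4, and a normalized coordinate of -1 on a 28-pixel axis lands at pixel position -0.5, the outer edge of the first pixel.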

Examples

Basic Examples (3) 

Create a GridSampleLayer to sample from a simple 1×28×28 array using a 16×16 coordinate array:

In[1]:=
gsLayer = ResourceFunction["GridSampleLayer"][1, {28, 28}, {16, 16}]
Out[1]=

Create a simple array from a MNIST digit:

In[2]:=
array = {ImageData[\!\(\*
GraphicsBox[
TagBox[RasterBox[CompressedData["
1:eJxTTMoPSmNiYGAo5gASQYnljkVFiZXBAkBOaF5xZnpeaopnXklqemqRRRJI
mQwU/x+6YKEoD6bglwqrL0BKTew2htQpUUaBV///H+XuwdR2mpGpEkhNYtqG
KdfD6A2i4piwOIG95BeQesUdii7zs50hHsxYzTgHTepfHBNY1///rmLvUaXe
xzFq34cwVVXRtIUwMTGxWfWAfSd9GkXqAKPXwm15rozapiUTxdDtuw22a4+D
Fw8jk9cvTD9AwC3GPFxS/zcyNeGUK2Z8hVPOSwen1H8tL9xykPDGCk4x1uGU
u69wDLeZxAIAa2LTqA==
"], {{0, 28}, {28, 0}}, {0, 255},
ColorFunction->GrayLevel],
BoxForm`ImageTag[
        "Byte", ColorSpace -> Automatic, Interleaving -> None],
Selectable->False],
DefaultBaseStyle->"ImageGraphics",
ImageSizeRaw->{28, 28},
PlotRange->{{0, 28}, {0, 28}}]\)]};

Sample using a coordinate array:

In[3]:=
coords = Transpose[Table[{i, j}, {i, 1, 16}, {j, 1, 16}], {2, 3, 1}];
sample = gsLayer[<|"Input" -> array, "Index" -> coords|>];
Image[sample, Interleaving -> False, ImageSize -> Small]
Out[5]=

Visualize a standard sampling using a Manipulate:

In[6]:=
Manipulate[(
  coords = Transpose[Table[{i + g, j + h}, {i, 1, 16}, {j, 1, 16}], {2, 3, 1}];
  sample = gsLayer[<|"Input" -> array, "Index" -> coords|>];
  Image[sample, Interleaving -> False, ImageSize -> Small]
  ), {{g, 6}, 0, 12, 0.5}, {{h, 6}, 0, 12, 0.5}]
Out[6]=

Visualize a subpixel sampling using a Manipulate:

In[7]:=
Manipulate[(
  coords = Transpose[
    Table[{i, j}, {i, g - 7/s, g + 8/s, 1/s}, {j, h - 7/s, h + 8/s, 1/s}], {2, 3, 1}];
  sample = gsLayer[<|"Input" -> array, "Index" -> coords|>];
  Image[sample, Interleaving -> False, ImageSize -> Small]
  ), {{g, 14}, 8, 20, 1}, {{h, 14}, 8, 20, 1}, {{s, 1}, 1, 4, 0.1}]
Out[7]=

Test GridSampleLayer on a test image of a house:

In[8]:=
image = ExampleData[{"TestImage", "House"}]
input = ImageData[image, Interleaving -> False];
{n, inH, inW} = Dimensions[input];
Out[8]=

Create a GridSampleLayer to transform a 3×256×256 image array using a 256×256 coordinate array:

In[9]:=
gsHouseLayer = ResourceFunction["GridSampleLayer"][n, {inH, inW}, {inH, inW}, "Normalized"]
Out[9]=

Apply a vertical translation:

In[10]:=
t = 0.2;
coords = Transpose[
   Table[{i, j}, {i, -1 + t, 1 + t, 2/(inH - 1)}, {j, -1, 1, 2/(inW - 1)}], {2, 3, 1}];
im = gsHouseLayer[<|"Input" -> input, "Index" -> coords|>];
Image[im, Interleaving -> False]
Out[11]=

Apply a rotation:

In[12]:=
\[Theta] = Pi/4;
coords = Transpose[
    Table[{{Cos[\[Theta]], -Sin[\[Theta]]}, {Sin[\[Theta]], Cos[\[Theta]]}} . {i, j}, {i, -1, 1, 2/(inH - 1)}, {j, -1, 1, 2/(inW - 1)}], {2, 3, 1}];
im = gsHouseLayer[<|"Input" -> input, "Index" -> coords|>];
Image[im, Interleaving -> False]
Out[15]=

Apply a crop and resize:

In[16]:=
{{c0x, c0y}, {c1x, c1y}} = {{64, 64}, {192, 192}};

lx = Rescale[c0x, {1, inH}, {-1, 1}];
hx = Rescale[c1x, {1, inH}, {-1, 1}];
ly = Rescale[c0y, {1, inW}, {-1, 1}];
hy = Rescale[c1y, {1, inW}, {-1, 1}];
coords = Transpose[
    Table[{i, j}, {i, lx, hx, (hx - lx)/(inH - 1)}, {j, ly, hy, (hy - ly)/(inW - 1)}], {2, 3, 1}];
im = gsHouseLayer[<|"Input" -> input, "Index" -> coords|>];
Image[im, Interleaving -> False]
Out[23]=
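The Rescale calls above map the crop corners from the pixel range {1, size} into the normalized range {-1, 1}. A hypothetical one-line Python equivalent of that mapping, for illustration only:

```python
def rescale(x, lo, hi):
    """Map x linearly from [lo, hi] to [-1, 1],
    as in Rescale[x, {lo, hi}, {-1, 1}]."""
    return -1 + 2 * (x - lo) / (hi - lo)

print(rescale(64, 1, 256))  # lower crop corner in normalized coordinates
```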

Create a network to sample from a parametric encoding, a NetArrayLayer, in a NetGraph:

In[24]:=
parametricGrid = NetInitialize@NetGraph@FunctionLayer[Block[{array, sample}, (
       array = NetArrayLayer["Output" -> {16, 48, 64}][];
       sample = ResourceFunction["GridSampleLayer"][16, {48, 64}, {32, 32}, "Coordinates"][<|"Input" -> array, "Index" -> #Input|>]
       )] &]
Out[24]=

Sample from a NetArrayLayer at desired locations. This can often be used as a lookup operator into a learnable array:

In[25]:=
indices = Transpose[Table[{i, j}, {i, 9, 40}, {j, 17, 48}], {2, 3, 1}];
sample = parametricGrid[indices];
sample // Dimensions
Out[26]=

Neat Examples (2) 

Image distortion in a neural network, inspired by ImageTransformation:

In[27]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/1d30c4b3-508f-447d-b6d9-02bc79b5a205"]
Out[9]=

Caricature effect:

In[28]:=
f[pt_] := With[{s = {.5, .5}}, Module[{r, a},
    r = Sqrt[Norm[pt - s] Max[s]]; a = ArcTan @@ (pt - s);
    s + r {Cos[a], Sin[a]}]];
coords = Transpose[
   Table[f[{i, j}]*2 - 1, {i, 0, 1, 1/(inH - 1)}, {j, 0, 1, 1/(inW - 1)}], {2, 3, 1}];
out = gsDistortionLayer[<|"Input" -> input, "Index" -> coords|>];
Image[out, Interleaving -> False]
Out[29]=

Fisheye effect:

In[30]:=
f[pt_] := With[{s = {0.55, 0.55}}, Module[{r, a},
      r = Norm[pt - s]^2/Norm[s]; a = ArcTan @@ (pt - s);
      s + r {Cos[a], Sin[a]}]]
coords = Transpose[
   Table[f[{i, j}]*2 - 1, {i, 0, 1, 1/(inH - 1)}, {j, 0, 1, 1/(inW - 1)}], {2, 3, 1}];
out = gsDistortionLayer[<|"Input" -> input, "Index" -> coords|>];
Image[out, Interleaving -> False]
Out[31]=

Spiral mirror effect:

In[32]:=
f[pt_] := With[{s = {.5, .5}}, Module[{r, a},
   r = Norm[pt - s]^2/Max[s]; a = ArcTan @@ (pt - s)/Pi*180;
   pt + {Mod[(a/200 + r/2.), 16/200.] - 8/200., 0}]]
coords = Transpose[
   Table[f[{i, j}]*2 - 1, {i, 0, 1, 1/(inH - 1)}, {j, 0, 1, 1/(inW - 1)}], {2, 3, 1}];
out = gsDistortionLayer[<|"Input" -> input, "Index" -> coords|>];
Image[out, Interleaving -> False]
Out[35]=

Curvy distortion:

In[36]:=
Clear[f];
f[pt_] := With[{s = {.5, .5}}, Module[{r, a, an},
    r = Norm[pt - s]; a = ArcTan @@ (pt - s); an = a + 2 r;
    s + r {Cos[an], Sin[an]}]];
coords = Transpose[
   Table[f[{i, j}]*2 - 1, {i, 0, 1, 1/(inH - 1)}, {j, 0, 1, 1/(inW - 1)}], {2, 3, 1}];
out = gsDistortionLayer[<|"Input" -> input, "Index" -> coords|>];
Image[out, Interleaving -> False]
Out[40]=

Optical flow is the apparent per-pixel motion field between consecutive frames of an image sequence. The MPI-Sintel dataset, available at http://sintel.is.tue.mpg.de/, is a benchmark for the evaluation of optical flow derived from the open-source 3D animated short film Sintel. Warp a Sintel image with its optical flow:

In[41]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/e27c1a7b-4bdd-4614-ad08-7e30dc9e6f44"]
Out[41]=
In[42]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/a33e081c-cf45-4340-a51d-e0c949225b51"]
Out[42]=
In[43]:=
gsOpticalFlow = ResourceFunction["GridSampleLayer"][n, {inH, inW}, {inH, inW}, "Coordinates"]
Out[43]=
In[44]:=
out = Table[
   gsOpticalFlow[<|"Input" -> input, "Index" -> coords + k*Reverse[flo]|>, TargetDevice -> "GPU"], {k,
     0, 1, 0.1}];
frames = Image[#, Interleaving -> False] & /@ Reverse[out];
FrameListVideo[Join[frames, Reverse[frames]]]
Out[45]=
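The lookup "Index" -> coords + k*Reverse[flo] above is a backward warp: each output pixel samples the input at its own coordinate plus the (scaled) flow vector. A minimal single-channel Python sketch of that idea, assuming the flow is stored as per-pixel (dx, dy) pairs and using clamped borders for simplicity rather than the layer's reflection padding:

```python
import math

def warp(img, flow):
    """Backward-warp one channel: out[y][x] samples img bilinearly at
    (x + flow[y][x][0], y + flow[y][x][1]), clamping to the image."""
    h, w = len(img), len(img[0])

    def sample(xf, yf):
        # clamp to the valid range, then bilinear lookup
        xf = min(max(xf, 0.0), w - 1.0)
        yf = min(max(yf, 0.0), h - 1.0)
        x0, y0 = int(math.floor(xf)), int(math.floor(yf))
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        wx, wy = xf - x0, yf - y0
        return ((1 - wx) * (1 - wy) * img[y0][x0] + wx * (1 - wy) * img[y0][x1]
                + (1 - wx) * wy * img[y1][x0] + wx * wy * img[y1][x1])

    return [[sample(x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
            for y in range(h)]

img = [[1.0, 2.0], [3.0, 4.0]]
still = [[(0.0, 0.0)] * 2 for _ in range(2)]  # zero flow: identity warp
shift = [[(1.0, 0.0)] * 2 for _ in range(2)]  # look up one pixel to the right
print(warp(img, still))  # -> [[1.0, 2.0], [3.0, 4.0]]
print(warp(img, shift))  # -> [[2.0, 2.0], [4.0, 4.0]]
```

Scaling the flow by k between 0 and 1, as in the Table above, interpolates between the identity warp and the full warp, producing the animation frames.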

Publisher

Pierre-André Brousseau

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.0 – 13 December 2024

Source Metadata

Author Notes

This implementation allows training of self-supervised stereo matching and optical flow estimation.

License Information