FastDVDNet Trained on DAVIS Data

Remove noise from a video

Released in 2019, this network achieves state-of-the-art results in video denoising. FastDVDNet outperforms its competitors thanks to its faster inference time and its ability to handle a wide range of noise levels within a single model.

Number of layers: 237 | Parameter count: 2,488,008 | Trained size: 11 MB

Training Set Information

Performance

Examples

Get the pre-trained net:

In[1]:=
NetModel["FastDVDNet Trained on DAVIS Data"]
Out[2]=

Basic usage

Get a noisy video:

In[3]:=
video = ResourceData["Sample Video: Horse Riding (Noisy)"];

Define a noise level for a video:

In[4]:=
noise = ConstantArray[1/8., {1, 256, 256}];
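
The noise input is effectively a per-pixel noise level (standard deviation) map, so it does not have to be constant; the Poisson example further down this page builds a spatially varying map. As a rough illustration only (the test image and the variables clean and noisy are not part of the original example), this is what additive Gaussian noise with standard deviation 1/8 on the unit intensity scale looks like:

clean = ExampleData[{"TestImage", "House"}];
noisy = ImageEffect[clean, {"GaussianNoise", 1/8}]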

Denoise a video:

In[5]:=
denoised = VideoFrameMap[
    NetModel["FastDVDNet Trained on DAVIS Data"][<|"Video" -> #, "Noise" -> noise|>] &, video, 5, 1]; // AbsoluteTiming
Out[6]=
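
The mapped function above receives a sliding window of five consecutive frames (window length 5, step 1) and returns the denoised central frame of each window. A single evaluation can be sketched as follows; the variable frames is introduced here only to illustrate the input format, and note that VideoFrameList returns frames sampled across the video rather than consecutive ones:

frames = VideoFrameList[video, 5];
NetModel["FastDVDNet Trained on DAVIS Data"][<|"Video" -> frames, "Noise" -> noise|>]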

Visualize five frames of the noisy and denoised videos:

In[7]:=
Grid[VideoFrameList[#, 5] & /@ {video, denoised}]
Out[7]=

Adapt to any size

Automatic frame resizing can be avoided by replacing the NetEncoder. First get the net:

In[8]:=
net = NetModel["FastDVDNet Trained on DAVIS Data"];

Get a noisy video:

In[9]:=
video = ResourceData["Sample Video: Skating (Noisy)"];

Create a new NetEncoder with the desired dimensions (to get a resizable net, the spatial dimensions should be divisible by 4):

In[10]:=
netEnc = NetEncoder[{"VideoFrames", First[Information[video, "OriginalRasterSize"]], "ColorSpace" -> "RGB", "TargetLength" -> 5, FrameRate -> Inherited}]
Out[10]=
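
If the original raster size is not already a multiple of four, it can be rounded first. The following is only a sketch (the variable netEncRounded is illustrative and not used in the steps below); it reuses the rounding idea from the memory-efficient example later on this page:

{w, h} = Round[First[Information[video, "OriginalRasterSize"]], 4];
netEncRounded = NetEncoder[{"VideoFrames", {w, h}, "ColorSpace" -> "RGB", "TargetLength" -> 5, FrameRate -> Inherited}]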

Attach the NetEncoder, define a noise level and run the net:

In[11]:=
resizedNet = NetReplacePart[
   net, {"Video" -> netEnc, "Noise" -> {1, Automatic, Automatic}}];
noise = ConstantArray[0.1, Prepend[Reverse[First[Information[video, "OriginalRasterSize"]]], 1]];
denoised = VideoFrameMap[resizedNet[<|"Video" -> #, "Noise" -> noise|>] &, video, 5, 1];

Visualize three frames of the noisy and denoised videos:

In[12]:=
Grid[VideoFrameList[#, 3] & /@ {video, denoised}]
Out[12]=

FastDVDNet architecture

The video encoder takes the first five frames of the input video and encodes them into an array of size (5, 3, 256, 256):

In[13]:=
NetExtract[NetModel["FastDVDNet Trained on DAVIS Data"], "Video"]
Out[14]=
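
The extracted encoder can also be applied to a video directly; checking the dimensions of its output (an illustrative check, with the variable encoder introduced here) confirms the (5, 3, 256, 256) encoding described above:

encoder = NetExtract[NetModel["FastDVDNet Trained on DAVIS Data"], "Video"];
Dimensions[encoder[video]]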

Note that the first three blocks share their weights. For each i, the tuple (Frame i, Frame i+1, Frame i+2, Noise) is given as input to block i to obtain an intermediate denoised frame. Finally, the three intermediate denoised frames, together with the noise map, are given as inputs to the last block to get the final denoised output frame:

In[15]:=
 NetExtract[
 NetModel["FastDVDNet Trained on DAVIS Data"], "FastDVDNet"]
Out[16]=
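
As a sketch of how to inspect this structure programmatically, the layer positions of the inner NetGraph can be listed with the standard "Layers" net property; their first elements show the named blocks of the graph:

Keys[Information[NetExtract[NetModel["FastDVDNet Trained on DAVIS Data"], "FastDVDNet"], "Layers"]]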

Extract the first and the last blocks of the net:

In[17]:=
{w, h} = First[Information[video, "OriginalRasterSize"]];
{blockIn, blockOut} = Map[NetReplacePart[
     NetExtract[
      NetModel["FastDVDNet Trained on DAVIS Data"], {"FastDVDNet", #}],
     {"Frame1" -> NetEncoder[{"Image", {w, h}}], "Frame2" -> NetEncoder[{"Image", {w, h}}], "Frame3" -> NetEncoder[{"Image", {w, h}}], "Noise" -> {1, h, w},
       "Output" -> NetDecoder["Image"]}
     ] &, {"block1", "block4"}];

All the blocks share a U-Net-style architecture. Explore the input block:

In[18]:=
blockIn
Out[18]=

Get a noisy video and define a noise level for it:

In[19]:=
video = ResourceData["Sample Video: Surfing (Noisy)"];
noise = ConstantArray[0.17, {1, h, w}];

Simply mapping the net over a list of frames can be inefficient, because some blocks process the same frame multiple times. To avoid this redundant computation, split the inference into two steps:

In[20]:=
interm = VideoFrameMap[
   blockIn[<| "Noise" -> noise, "Frame1" -> #[[1]], "Frame2" -> #[[2]], "Frame3" -> #[[3]] |>] &, video, 3, 1];

Get the final denoised video:

In[21]:=
final = VideoFrameMap[
   blockOut[<| "Noise" -> noise, "Frame1" -> #[[1]], "Frame2" -> #[[2]], "Frame3" -> #[[3]]  |>] &, interm, 3, 1];

Visualize three frames of the noisy video together with the intermediate and final results:

In[22]:=
Grid[VideoFrameList[#, 3] & /@ {video, interm, final}]
Out[22]=

Memory-efficient evaluation

Define a custom videoDenoise function for memory-efficient processing of video with Gaussian or Poisson noise:

In[23]:=
Clear[videoDenoise];
Options[videoDenoise] = {TargetDevice -> "CPU"};
videoDenoise[video_Video,
   {noiseType : "GaussianNoise" | "PoissonNoise", a_?Internal`RealValuedNumberQ} /; 0 < a <= 1, opts : OptionsPattern[]] :=
  Module[
   {VideoDenoiseFrame, w, h, blockIn, blockOut, net, sigmaFactor, interFrame2 = Null, interFrame3 = Null, interFrame4 = Null},
   {w, h} = Normal[Information[video]["VideoTracks"][1]["OriginalRasterSize"]];
   (* Round the spatial dimensions to multiples of 4, as required by the resizable net *)
   {w, h} = Round[{w, h}, 4];
   (* Load first level U-Net *)
   net = NetModel["FastDVDNet Trained on DAVIS Data"];
   blockIn = NetReplacePart[
     NetExtract[net, {"FastDVDNet", "block1"}],
     {"Frame1" -> NetEncoder[{"Image", {w, h}}],
      "Frame2" -> NetEncoder[{"Image", {w, h}}],
      "Frame3" -> NetEncoder[{"Image", {w, h}}],
      "Noise" -> {1, h, w},
      "Output" -> {3, h, w} }
     ];
   (* Load second level U-Net *)
   blockOut = NetReplacePart[
     NetExtract[net, {"FastDVDNet", "block4"}],
     {"Frame1" -> {3, h, w},
      "Frame2" -> {3, h, w},
      "Frame3" -> {3, h, w},
      "Noise" -> {1, h, w},
      "Output" -> NetDecoder["Image"] }
     ];
   (* Standard-deviation factor: used directly for Gaussian noise; scaled by the square root of the pixel intensity for Poisson noise *)
   sigmaFactor = Switch[noiseType, "GaussianNoise", a, "PoissonNoise", Sqrt[a/(100*(1 - a))]];
   (* Neural function to combine five frames into one denoised frame *)
   VideoDenoiseFrame[frames : {__Image}] :=
    Block[
     {sigma, interFrame},
     sigma = Switch[noiseType,
       "GaussianNoise",
       ConstantArray[sigmaFactor, {1, h, w}],
       "PoissonNoise",
       ArrayReshape[
        ImageData[
         Sqrt[ColorConvert[frames[[3]], "Grayscale"]]*
          sigmaFactor], {1, h, w}]
       ];
     interFrame = blockIn[AssociationThread[{"Frame1", "Frame2", "Frame3", "Noise"} -> Append[frames, sigma] ], TargetDevice -> OptionValue[TargetDevice]];
     interFrame2 = If[NumericArrayQ[interFrame3], interFrame3, interFrame];
     interFrame3 = If[NumericArrayQ[interFrame4], interFrame4, interFrame];
     interFrame4 = interFrame;
     blockOut[<|"Frame1" -> interFrame2, "Frame2" -> interFrame3, "Frame3" -> interFrame4, "Noise" -> sigma |>, TargetDevice -> OptionValue[TargetDevice]]
     ];
   (* Map denoise function onto the entire video *)
   VideoFrameMap[VideoDenoiseFrame, video, 3, 1]
   ];

Get a noisy video:

In[24]:=
video = ResourceData["Sample Video: Horse Riding (Noisy)"];

Get a final denoised video assuming a Gaussian distribution on the input noise:

In[25]:=
denoisedGaussian = videoDenoise[video, {"GaussianNoise", 1/8.}]; // AbsoluteTiming
Out[26]=

Get a final denoised video assuming a Poisson distribution on the input noise:

In[27]:=
denoisedPoisson = videoDenoise[video, {"PoissonNoise", 0.8}]; // AbsoluteTiming
Out[28]=
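
Since videoDenoise forwards its TargetDevice option to the network evaluations, the same computation can be run on a GPU, assuming a supported one is available (denoisedGPU is an illustrative variable name):

denoisedGPU = videoDenoise[video, {"GaussianNoise", 1/8.}, TargetDevice -> "GPU"]; // AbsoluteTiming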

Visualize four frames of the noisy video and the denoised results assuming Gaussian and Poisson distributions on the input noise:

In[29]:=
Grid[VideoFrameList[#, 4] & /@ {video, denoisedGaussian, denoisedPoisson}]
Out[29]=

Net information

Inspect the number of parameters of all arrays in the net:

In[30]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "ArraysElementCounts"]
Out[31]=

Obtain the total number of parameters:

In[32]:=
Information[
 NetModel[
  "FastDVDNet Trained on DAVIS Data"], "ArraysTotalElementCount"]
Out[33]=

Obtain the layer type counts:

In[34]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "LayerTypeCounts"]
Out[35]=

Display the summary graphic:

In[36]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "SummaryGraphic"]
Out[37]=

Export to ONNX

Export the net to the ONNX format:

In[38]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel["FastDVDNet Trained on DAVIS Data"]]
Out[39]=

Get the size of the ONNX file:

In[40]:=
FileByteCount[onnxFile]
Out[40]=

The byte count of the resource object is smaller because shared arrays are currently being duplicated when exporting to ONNX:

In[41]:=
ResourceObject["FastDVDNet Trained on DAVIS Data"]["ByteCount"]
Out[42]=

Check some metadata of the ONNX model:

In[43]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[43]=

Import the model back into the Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[44]:=
Import[onnxFile]
Out[44]=

Resource History

Reference

  • M. Tassano, J. Delon, T. Veit, "FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow
    Estimation," arXiv:1907.01361 (2019)
  • Available from: https://github.com/m-tassano/fastdvdnet
  • Rights: Copying and distribution of this file, with or without modification, are permitted in any
    medium without royalty provided the copyright notice and this notice are preserved. This file is
    offered as-is, without any warranty.
  • Author: Matias Tassano, mtassano at gopro dot com
  • Copyright: © 2019 Matias Tassano
  • License: GPL v3+, see GPLv3.txt
  • The sequences are Copyright GoPro 2018.