FastDVDNet Trained on DAVIS Data

Remove noise from a video

Released in 2019, this network achieves state-of-the-art results in video denoising, outperforming its competitors in inference speed while handling a wide range of noise levels with a single model.

Number of layers: 237 | Parameter count: 2,488,008 | Trained size: 11 MB

Training Set Information

Performance

Examples

Get the pre-trained net:

In[1]:=
NetModel["FastDVDNet Trained on DAVIS Data"]
Out[2]=

Basic usage

Get a noisy video:

In[3]:=
video = ResourceData["Sample Video: Horse Riding (Noisy)"];

Define a noise level for a video:

In[4]:=
noise = ConstantArray[1/8., {1, 256, 256}];
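
The noise input is a per-pixel map of the noise standard deviation, expressed on the same [0, 1] scale as the pixel data; 1/8 corresponds to a standard deviation of roughly 32 on a 0–255 pixel scale. An equivalent definition (a sketch, assuming the noise level is given as sigma255 on a 0–255 scale):

sigma255 = 32;
noise = ConstantArray[sigma255/255., {1, 256, 256}];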

Denoise the video (the net consumes five consecutive frames per output frame, so VideoFrameMap is applied with a window of five frames and an offset of one frame):

In[5]:=
denoised = VideoFrameMap[
    NetModel["FastDVDNet Trained on DAVIS Data"][<|"Video" -> #, "Noise" -> noise|>] &, video, 5, 1]; // AbsoluteTiming
Out[6]=

Visualize five frames of the noisy and the denoised videos:

In[7]:=
Grid[VideoFrameList[#, 5] & /@ {video, denoised}]
Out[7]=
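
To quantify the improvement, the peak signal-to-noise ratio (PSNR) can be computed against a clean reference when one is available (a minimal sketch; cleanFrame is a hypothetical noise-free frame, assumed to have the same dimensions and to correspond to the same time position as the extracted denoised frame):

psnr[img1_Image, img2_Image] := Module[{mse},
   (* mean squared error between the two images, assuming pixel values in [0, 1] *)
   mse = Mean[Flatten[(ImageData[img1] - ImageData[img2])^2]];
   10*Log10[1/mse]];
psnr[cleanFrame, First[VideoFrameList[denoised, 1]]]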

Adapt to any size

Automatic frame resizing can be avoided by replacing the NetEncoder. First get the net:

In[8]:=
net = NetModel["FastDVDNet Trained on DAVIS Data"];

Get a noisy video:

In[9]:=
video = ResourceData["Sample Video: Skating (Noisy)"];

Create a new NetEncoder with the desired dimensions (for the resized net to work, the spatial dimensions must be divisible by 4):

In[10]:=
netEnc = NetEncoder[{"VideoFrames", First[Information[video, "OriginalRasterSize"]], "ColorSpace" -> "RGB", "TargetLength" -> 5, FrameRate -> Inherited}]
Out[10]=
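
If the original raster dimensions are not multiples of 4, they can be rounded to the nearest multiple first (a sketch using the same rounding that the videoDenoise function below applies):

{w, h} = Round[First[Information[video, "OriginalRasterSize"]], 4];
netEnc = NetEncoder[{"VideoFrames", {w, h}, "ColorSpace" -> "RGB", "TargetLength" -> 5, FrameRate -> Inherited}]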

Attach the NetEncoder, define a noise level and run the net:

In[11]:=
resizedNet = NetReplacePart[
   net, {"Video" -> netEnc, "Noise" -> {1, Automatic, Automatic}}];
noise = ConstantArray[0.1, Prepend[Reverse[First[Information[video, "OriginalRasterSize"]]], 1]];
denoised = VideoFrameMap[resizedNet[<|"Video" -> #, "Noise" -> noise|>] &, video, 5, 1];

Visualize three frames of the noisy and the denoised videos:

In[12]:=
Grid[VideoFrameList[#, 3] & /@ {video, denoised}]
Out[12]=

FastDVDNet architecture

The video encoder takes the first five frames of the input video and encodes them into an array of dimensions {5, 3, 256, 256}:

In[13]:=
NetExtract[NetModel["FastDVDNet Trained on DAVIS Data"], "Video"]
Out[14]=

Note that the first three blocks are shared. (Frame_i, Frame_i+1, Frame_i+2, Noise) are given as inputs to block_i to obtain an intermediate denoised frame Intermediate_i. Finally, (Intermediate_1, Intermediate_2, Intermediate_3, Noise) are given as inputs to the last block to get the final denoised output frame:

In[15]:=
 NetExtract[
 NetModel["FastDVDNet Trained on DAVIS Data"], "FastDVDNet"]
Out[16]=
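
Schematically, for five consecutive input frames f1, …, f5 and a noise map (hypothetical placeholder names), the cascade computes the following, with block1, block2 and block3 sharing the same weights:

intermediate1 = block1[<|"Frame1" -> f1, "Frame2" -> f2, "Frame3" -> f3, "Noise" -> noise|>];
intermediate2 = block2[<|"Frame1" -> f2, "Frame2" -> f3, "Frame3" -> f4, "Noise" -> noise|>];
intermediate3 = block3[<|"Frame1" -> f3, "Frame2" -> f4, "Frame3" -> f5, "Noise" -> noise|>];
final = block4[<|"Frame1" -> intermediate1, "Frame2" -> intermediate2, "Frame3" -> intermediate3, "Noise" -> noise|>];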

Extract the first and the last blocks of the net:

In[17]:=
{w, h} = First[Information[video, "OriginalRasterSize"]];
{blockIn, blockOut} = Map[NetReplacePart[
     NetExtract[
      NetModel["FastDVDNet Trained on DAVIS Data"], {"FastDVDNet", #}],
     {"Frame1" -> NetEncoder[{"Image", {w, h}}], "Frame2" -> NetEncoder[{"Image", {w, h}}], "Frame3" -> NetEncoder[{"Image", {w, h}}], "Noise" -> {1, h, w},
       "Output" -> NetDecoder["Image"]}
     ] &, {"block1", "block4"}];

All the blocks share a U-Net type architecture. Explore the input block:

In[18]:=
blockIn
Out[18]=

Get a noisy video and define a noise level for it:

In[19]:=
video = ResourceData["Sample Video: Surfing (Noisy)"];
noise = ConstantArray[0.17, {1, h, w}];

Simply mapping the net over a list of frames can be inefficient because the shared blocks process the same frame multiple times. To avoid this redundant computation, split the inference into two steps:

In[20]:=
interm = VideoFrameMap[
   blockIn[<| "Noise" -> noise, "Frame1" -> #[[1]], "Frame2" -> #[[2]], "Frame3" -> #[[3]] |>] &, video, 3, 1];

Get the final denoised video:

In[21]:=
final = VideoFrameMap[
   blockOut[<| "Noise" -> noise, "Frame1" -> #[[1]], "Frame2" -> #[[2]], "Frame3" -> #[[3]]  |>] &, interm, 3, 1];

Visualize three frames of the noisy video together with the intermediate and the final results:

In[22]:=
Grid[VideoFrameList[#, 3] & /@ {video, interm, final}]
Out[22]=

Memory-efficient evaluation

Define a custom videoDenoise function for memory-efficient processing of video with Gaussian or Poisson noise:

In[23]:=
Clear[videoDenoise];
Options[videoDenoise] = {TargetDevice -> "CPU"};
videoDenoise[video_Video,
   {noiseType : "GaussianNoise" | "PoissonNoise", a_?Internal`RealValuedNumberQ} /; 0 < a <= 1, opts : OptionsPattern[]] :=
  Module[
   {VideoDenoiseFrame, w, h, blockIn, blockOut, net, sigmaFactor, interFrame2 = Null, interFrame3 = Null, interFrame4 = Null},
   {w, h} = Normal[Information[video]["VideoTracks"][1]["OriginalRasterSize"]];
   {w, h} = Round[{w, h}, 4];
   (* Load first level U-Net *) net = NetModel["FastDVDNet Trained on DAVIS Data"];
   blockIn = NetReplacePart[
     NetExtract[net, {"FastDVDNet", "block1"}],
     {"Frame1" -> NetEncoder[{"Image", {w, h}}],
      "Frame2" -> NetEncoder[{"Image", {w, h}}],
      "Frame3" -> NetEncoder[{"Image", {w, h}}],
      "Noise" -> {1, h, w},
      "Output" -> {3, h, w} }
     ];
   (* Load second level U-Net *)
   blockOut = NetReplacePart[
     NetExtract[net, {"FastDVDNet", "block4"}],
     {"Frame1" -> {3, h, w},
      "Frame2" -> {3, h, w},
      "Frame3" -> {3, h, w},
      "Noise" -> {1, h, w},
      "Output" -> NetDecoder["Image"] }
     ];
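   (* Convert the user-specified noise amount into a standard deviation on the [0, 1] pixel scale: used directly for Gaussian noise, and as a factor multiplying the square root of the pixel intensity for Poisson noise (see below) *)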
   sigmaFactor = Switch[noiseType, "GaussianNoise", a, "PoissonNoise", Sqrt[a/(100*(1 - a))]];
   (* Neural function to combine five frames into one denoised frame *) VideoDenoiseFrame[frames : {__Image}] :=
    Block[
     {sigma, interFrame},
     sigma = Switch[noiseType,
       "GaussianNoise",
       ConstantArray[sigmaFactor, {1, h, w}],
       "PoissonNoise",
       ArrayReshape[
        ImageData[
         Sqrt[ColorConvert[frames[[3]], "Grayscale"]]*
          sigmaFactor], {1, h, w}]
       ];
     interFrame = blockIn[AssociationThread[{"Frame1", "Frame2", "Frame3", "Noise"} -> Append[frames, sigma] ], TargetDevice -> OptionValue[TargetDevice]];
     interFrame2 = If[NumericArrayQ[interFrame3], interFrame3, interFrame];
     interFrame3 = If[NumericArrayQ[interFrame4], interFrame4, interFrame];
     interFrame4 = interFrame;
     blockOut[<|"Frame1" -> interFrame2, "Frame2" -> interFrame3, "Frame3" -> interFrame4, "Noise" -> sigma |>, TargetDevice -> OptionValue[TargetDevice]]
     ];
   (* Map denoise function onto the entire video *) VideoFrameMap[VideoDenoiseFrame, video, 3, 1]
   ];

Get a noisy video:

In[24]:=
video = ResourceData["Sample Video: Horse Riding (Noisy)"];
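
The sample resource is already noisy; if only a clean recording were available, a noisy test clip could be synthesized frame by frame with ImageEffect (a sketch, where cleanVideo is a hypothetical clean Video object):

noisyVideo = VideoFrameMap[ImageEffect[#, {"GaussianNoise", 1/8.}] &, cleanVideo];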

Get a final denoised video assuming a Gaussian distribution on the input noise:

In[25]:=
denoisedGaussian = videoDenoise[video, {"GaussianNoise", 1/8.}]; // AbsoluteTiming
Out[26]=

Get a final denoised video assuming a Poisson distribution on the input noise:

In[27]:=
denoisedPoisson = videoDenoise[video, {"PoissonNoise", 0.8}]; // AbsoluteTiming
Out[28]=

Visualize four frames of the noisy video and of the denoised results obtained assuming Gaussian and Poisson distributions on the input noise:

In[29]:=
Grid[VideoFrameList[#, 4] & /@ {video, denoisedGaussian, denoisedPoisson}]
Out[29]=

Net information

Inspect the number of parameters of all arrays in the net:

In[30]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "ArraysElementCounts"]
Out[31]=

Obtain the total number of parameters:

In[32]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "ArraysTotalElementCount"]
Out[33]=

Obtain the layer type counts:

In[34]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "LayerTypeCounts"]
Out[35]=

Display the summary graphic:

In[36]:=
Information[
 NetModel["FastDVDNet Trained on DAVIS Data"], "SummaryGraphic"]
Out[37]=

Export to ONNX

Export the net to the ONNX format:

In[38]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel["FastDVDNet Trained on DAVIS Data"]]
Out[39]=

Get the size of the ONNX file:

In[40]:=
FileByteCount[onnxFile]
Out[40]=

The byte count of the resource object is smaller because shared arrays are currently being duplicated when exporting to ONNX:

In[41]:=
ResourceObject["FastDVDNet Trained on DAVIS Data"]["ByteCount"]
Out[42]=

Check some metadata of the ONNX model:

In[43]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[43]=

Import the model back into the Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[44]:=
Import[onnxFile]
Out[44]=

Resource History

Reference

  • M. Tassano, J. Delon, T. Veit, "FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow
    Estimation," arXiv:1907.01361 (2019)
  • Available from:
  • Rights: Copying and distribution of this file, with or without modification,
    are permitted in any medium without royalty provided the copyright
    notice and this notice are preserved. This file is offered as-is,
    without any warranty.
  • Author: Matias Tassano, mtassano at gopro dot com
  • Copyright: © 2019 Matias Tassano
  • License: GPL v3+, see GPLv3.txt
  • The sequences are copyright GoPro 2018.