Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data

Increase the resolution of an image

Released in 2018, this architecture uses the GAN framework to train a very deep network that both upsamples and sharpens an image. It is based on the SRGAN architecture but makes use of a number of improvements to model architecture and training, such as Residual-in-Residual Dense Blocks and Relativistic GAN training, to achieve better visual quality.

Number of layers: 1,093 | Parameter count: 16,697,987 | Trained size: 72 MB

Training Set Information

Performance

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K \
and OST Data"]
Out[2]=

Evaluation function

ESRGAN has been trained to upscale images by a factor of 4. Create an evaluation function to handle any resize factor:

In[3]:=
Options[netevaluate] = {PerformanceGoal -> Automatic, TargetDevice -> "CPU"};
netevaluate[img_, imgScale_, OptionsPattern[]] := Block[{net, resizedNet, resizeScale, perfGoal, res, numResizes},
  (* Automatic favors quality for factors up to 4 and speed beyond that *)
  perfGoal = OptionValue[PerformanceGoal];
  If[perfGoal === Automatic, perfGoal = If[imgScale <= 4, "Quality", "Speed"]];
  net = NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"];
  (* Number of 4x passes; at least one, because the net is always applied once *)
  numResizes = Max[1, Ceiling[Log[4, imgScale]]];
  (* Corrective resize that brings the total 4^numResizes upscale down to imgScale *)
  resizeScale = imgScale/(4^numResizes);
  res = img;
  (* All passes except the last run at the full factor of 4 *)
  Do[
   resizedNet = NetReplacePart[net, "Input" -> NetEncoder[{"Image", ImageDimensions@res}]];
   res = resizedNet[res, TargetDevice -> OptionValue[TargetDevice]],
   numResizes - 1
   ];
  (* "Speed": shrink before the last pass so the net sees a smaller image *)
  If[perfGoal === "Speed", res = ImageResize[res, Scaled[resizeScale]]];
  resizedNet = NetReplacePart[net, "Input" -> NetEncoder[{"Image", ImageDimensions@res}]];
  res = resizedNet[res, TargetDevice -> OptionValue[TargetDevice]];
  (* "Quality": shrink after the last pass so the net sees the full-sized image *)
  If[perfGoal === "Quality", res = ImageResize[res, Scaled[resizeScale]]];
  res
  ]
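
The arithmetic that netevaluate uses to plan its work can be inspected on its own. The following is a minimal sketch, not part of the original resource: resizePlan is a hypothetical helper that reports, for a requested factor, how many times the net would be applied and which corrective resize factor would be used. It assumes factors greater than 1.

```wolfram
(* Hypothetical helper: mirrors the numResizes/resizeScale arithmetic of netevaluate *)
resizePlan[imgScale_] := Module[{numResizes, resizeScale},
  (* each application of the net multiplies the size by 4 *)
  numResizes = Ceiling[Log[4, imgScale]];
  (* factor that reduces the 4^numResizes upscale to the requested one *)
  resizeScale = imgScale/4^numResizes;
  <|"Factor" -> imgScale, "NetPasses" -> numResizes, "FinalResize" -> resizeScale|>
  ]

(* a factor of 3 needs one pass and a resize by 3/4; a factor of 6 needs two passes and a resize by 3/8 *)
resizePlan /@ {2, 3, 4, 6, 16}
```

Factors that are exact powers of 4 need no corrective resize, so the PerformanceGoal setting makes no difference to them.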

Basic usage

Get an image:

In[4]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/aa14d0e6-f5f8-48c1-ae9c-564d7cce8d0c"]

Downscale the image by a factor of 3:

In[5]:=
zoom = 3;
In[6]:=
downScaled = ImageResize[mandrill, Scaled[1/zoom]]
Out[6]=

Upscale the downscaled image using the net:

In[7]:=
upScaled = netevaluate[downScaled, zoom]
Out[7]=

Compare the details with a naively upscaled version and the original:

In[8]:=
naive = ImageResize[downScaled, Scaled[zoom]];
In[9]:=
Grid[
 {{"Naive", "Net", "Original"}, Table[ImageTrim[
    img, {{140, 60}, {240, 260}}], {img, {naive, upScaled, mandrill}}]},
 Frame -> All
 ]
Out[9]=

Evaluate the peak signal-to-noise ratio:

In[10]:=
10*Log[10, 1/Mean@Flatten@ImageData[(upScaled - mandrill)^2]]
Out[10]=
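
The expression above is the usual peak signal-to-noise ratio, 10 log10(MAX^2/MSE), with MAX = 1 for pixel values in the range 0 to 1. As a sketch under stated assumptions, it can be wrapped in a reusable function. The name psnr is hypothetical and not part of the original resource. The version below assumes both images have the same number of channels, and it resizes the first image to the dimensions of the reference in case rounding during downscaling left them a pixel apart.

```wolfram
(* Hypothetical helper: PSNR in decibels for pixel values in [0, 1] *)
psnr[img_Image, ref_Image] := Module[{a, b, mse},
  (* match the reference dimensions, then extract real-valued pixel data *)
  a = ImageData[ImageResize[img, ImageDimensions[ref]], "Real32"];
  b = ImageData[ref, "Real32"];
  (* mean squared error over all pixels and channels *)
  mse = Mean[Flatten[(a - b)^2]];
  10*Log10[1/mse]
  ]

(* usage, assuming upScaled, naive and mandrill are defined as above *)
(* {psnr[upScaled, mandrill], psnr[naive, mandrill]} *)
```

The function is undefined for identical images, where the mean squared error is zero.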

Control the time-quality tradeoff

When upscaling by factors of 4 or less, as in the examples in the previous section, ESRGAN is applied once. If necessary, the image is downscaled to match the required factor. It is possible to choose whether to downscale before or after ESRGAN is applied, affecting both the running time and the quality of the final result.

Get an image:

In[11]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/0ece1997-b481-4894-8be1-6b2e12a4634f"]

Downscale the image by a factor of 3:

In[12]:=
zoom = 3;
downScaled = ImageResize[horse, Scaled[1/zoom]]
Out[13]=

When setting PerformanceGoal to "Speed", the input image is first downscaled by a factor of 3/4 and then upscaled, using ESRGAN, by a factor of 4. This makes the network operate on the smallest possible image but throws away some details of the original, yielding a lower-quality result:

In[14]:=
speedResult = netevaluate[downScaled, zoom, PerformanceGoal -> "Speed"]
Out[14]=

When setting PerformanceGoal to "Quality", the input image is first upscaled, using ESRGAN, by a factor of 4 and then downscaled by a factor of 3/4. This is the default setting for factors of 4 or less and will make the network operate on the full-sized image, yielding a higher-quality result:

In[15]:=
qualityResult = netevaluate[downScaled, zoom, PerformanceGoal -> "Quality"]
Out[15]=

Compare the details with a naively upscaled version and the original:

In[16]:=
naive = ImageResize[downScaled, Scaled[zoom]];
In[17]:=
Grid[
 {{"Naive", "Speed", "Quality", "Original"}, Table[ImageTrim[
    img, {{160, 120}, {240, 240}}], {img, {naive, speedResult, qualityResult, horse}}]},
 Dividers -> All
 ]
Out[17]=

When upscaling by factors of more than 4, ESRGAN must be applied multiple times.

Get an image:

In[18]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/5fc2155b-13cc-47d0-bb08-2a9daf2f6dca"]

Downscale the image by a factor of 6:

In[19]:=
zoom = 6;
downScaled = ImageResize[scene, Scaled[1/zoom]]
Out[20]=

When setting PerformanceGoal to "Speed", the downscaling happens before the last evaluation of ESRGAN. Hence, for a factor of 6, the input image is first upscaled by a factor of 4, then downscaled by a factor of 6/16 and finally upscaled by a factor of 4 again. This is the default setting for factors larger than 4:

In[21]:=
speedResult = netevaluate[downScaled, zoom, PerformanceGoal -> "Speed"]
Out[21]=

When setting PerformanceGoal to "Quality", the downscaling happens after the last evaluation of ESRGAN. Hence, for a factor of 6, the input image is first upscaled by a factor of 4 twice and then downscaled by a factor of 6/16 (if available, set TargetDevice -> "GPU" for faster evaluation time):

In[22]:=
qualityResult = netevaluate[downScaled, zoom, PerformanceGoal -> "Quality"]
Out[22]=

Compare the details with a naively upscaled version and the original:

In[23]:=
naive = ImageResize[downScaled, Scaled[zoom]];
In[24]:=
Grid[
 {{"Naive", "Speed", "Quality", "Original"}, Table[ImageTrim[
    img, {{110, 20}, {170, 200}}], {img, {naive, speedResult, qualityResult, scene}}]},
 Dividers -> All
 ]
Out[24]=
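
The difference in running time between the two settings comes mostly from how many pixels the net has to process. The following rough sketch estimates that workload. The helper netInputPixels is hypothetical and not part of the original resource. It follows the same sequence of steps as netevaluate, ignores the rounding of image dimensions, and assumes that the cost of a pass scales with the number of input pixels.

```wolfram
(* Hypothetical helper: total number of pixels fed to the net over all passes *)
netInputPixels[{w_, h_}, imgScale_, goal_] := Module[{n, s, dims = {w, h}, total = 0},
  n = Ceiling[Log[4, imgScale]];   (* number of passes through the net *)
  s = imgScale/4^n;                (* corrective resize factor *)
  (* all passes except the last see the image at its current size, then enlarge it by 4 *)
  Do[total += Times @@ dims; dims = 4 dims, n - 1];
  (* "Speed" shrinks the image before the last pass; "Quality" shrinks it afterward *)
  If[goal === "Speed", dims = s dims];
  total + Times @@ dims
  ]

(* a 100x100 input upscaled by a factor of 6 *)
netInputPixels[{100, 100}, 6, #] & /@ {"Speed", "Quality"}
(* {32500, 170000} *)
```

Under these assumptions, the "Quality" setting processes roughly five times as many pixels as "Speed" for a factor of 6, while for a factor of 3 the ratio is 16/9.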

Net information

Inspect the number of parameters of all arrays in the net:

In[25]:=
NetInformation[
 NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K \
and OST Data"], "ArraysElementCounts"]
Out[26]=

Obtain the total number of parameters:

In[27]:=
NetInformation[
 NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K \
and OST Data"], "ArraysTotalElementCount"]
Out[28]=

Obtain the layer type counts:

In[29]:=
NetInformation[
 NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K \
and OST Data"], "LayerTypeCounts"]
Out[30]=

Display the summary graphic:

In[31]:=
NetInformation[
 NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K \
and OST Data"], "SummaryGraphic"]
Out[34]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[35]:=
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}], NetReplacePart[
   NetModel[
    "Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST \
Data"], "Input" -> "Image"], "MXNet"]
Out[36]=

Export also creates a net.params file containing parameters:

In[37]:=
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}]
Out[37]=

Get the size of the parameter file:

In[38]:=
FileByteCount[paramPath]
Out[38]=

The size is similar to the byte count of the resource object:

In[39]:=
ResourceObject[
  "Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST \
Data"]["ByteCount"]
Out[40]=

Requirements

Wolfram Language 12.0 (April 2019) or above
