Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data

Increase the resolution of an image

Released in 2018, this architecture uses the GAN framework to train a very deep network that both upsamples and sharpens an image. It is based on the SRGAN architecture but makes use of a number of improvements to model architecture and training, such as Residual-in-Residual Dense Blocks and Relativistic GAN training, to achieve better visual quality.

Number of layers: 1,093 | Parameter count: 16,697,987 | Trained size: 72 MB

Training Set Information

The main dataset used for training, Diverse 2K (DIV2K), contains 800 2K-resolution images. The Flickr2K and OutdoorSceneTraining (OST) datasets were used to enrich the training set with more diverse textures. The training data was augmented with random horizontal flips and 90-degree rotations.

Performance

This model achieves a peak signal-to-noise ratio of 27.03 dB and a perceptual index of 0.8153 on the Urban100 dataset with a scale factor of 4.

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=

NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"]

Out[2]=

Evaluation function

ESRGAN has been trained to upscale images by a factor of 4. Create an evaluation function to handle any resize factor:

In[3]:=

Options[netevaluate] = {PerformanceGoal -> Automatic, TargetDevice -> "CPU"};

netevaluate[img_, imgScale_, OptionsPattern[]] := Block[
  {net, resizedNet, resizedImg, resizeScale, perfGoal, res, numResizes},
  (* Resolve the default goal: "Quality" for factors up to 4, "Speed" beyond *)
  perfGoal = OptionValue[PerformanceGoal];
  If[perfGoal === Automatic && imgScale <= 4, perfGoal = "Quality"];
  If[perfGoal === Automatic && imgScale > 4, perfGoal = "Speed"];
  net = NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"];
  (* Number of 4x passes needed, and the residual resize that corrects the overshoot *)
  numResizes = Ceiling[Log[4, imgScale]];
  resizeScale = imgScale/(4^numResizes);
  res = img;
  (* All 4x passes except the last one *)
  Do[
    resizedNet = NetReplacePart[net, "Input" -> NetEncoder[{"Image", ImageDimensions@res}]];
    res = resizedNet[res, TargetDevice -> OptionValue[TargetDevice]];
    ,
    numResizes - 1
  ];
  (* "Speed" resizes before the last pass, "Quality" resizes after it *)
  If[perfGoal == "Speed", res = ImageResize[res, Scaled[resizeScale]]];
  resizedNet = NetReplacePart[net, "Input" -> NetEncoder[{"Image", ImageDimensions@res}]];
  res = resizedNet[res, TargetDevice -> OptionValue[TargetDevice]];
  If[perfGoal == "Quality", res = ImageResize[res, Scaled[resizeScale]]];
  res
]
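To make the scale bookkeeping in the evaluation function concrete, the following sketch tabulates the two derived quantities for a few requested factors. It only repeats the expressions for numResizes and resizeScale from the definition above; the list of sample factors is an arbitrary choice for illustration.

```wolfram
(* For each requested scale s: number of 4x passes and the residual resize factor *)
Table[
  {s, Ceiling[Log[4, s]], s/4^Ceiling[Log[4, s]]},
  {s, {2, 3, 4, 6, 16}}
] // TableForm
```

The rows for 3 and 6 should correspond to the residual factors 3/4 and 6/16 discussed in the sections on the time-quality tradeoff.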

Basic usage

Get an image:

In[4]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/aa14d0e6-f5f8-48c1-ae9c-564d7cce8d0c"]

Downscale the image by a factor of 3:

In[5]:=

In[6]:=

Out[6]=

Upscale the downscaled image using the net:

In[7]:=

Out[7]=
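The input cells for the downscaling and upscaling steps are not shown on this page. A minimal sketch of what they might look like follows, assuming the example image has been assigned to mandrill and using the names upScaled and naive that appear in the comparison grid. The name small is hypothetical.

```wolfram
(* Assumes `mandrill` holds the example image retrieved above *)
small = ImageResize[mandrill, Scaled[1/3]];   (* hypothetical name for the downscaled input *)

(* Upscale by a factor of 3 with the evaluation function defined earlier *)
upScaled = netevaluate[small, 3];

(* Plain interpolation to the same factor, for comparison *)
naive = ImageResize[small, Scaled[3]];

(* Check that the results have comparable dimensions *)
ImageDimensions /@ {mandrill, small, upScaled, naive}
```

Because ImageResize rounds to whole pixels, the upscaled dimensions may differ from the original by a pixel.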

Compare the details with a naively upscaled version and the original:

In[8]:=

In[9]:=

Grid[
{{"Naive", "Net", "Original"}, Table[ImageTrim[
img, {{140, 60}, {240, 260}}], {img, {naive, upScaled, mandrill}}]},
Frame -> All
]

Out[9]=

Evaluate the peak signal-to-noise ratio:

In[10]:=

Out[10]=
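The cell that computes the peak signal-to-noise ratio is not shown here. One possible way to compute it is sketched below. The function name psnr is hypothetical, the computation is done on RGB values normalized to the range 0 to 1, and the test image is resized to the reference dimensions first. Published benchmark figures are often computed differently (for example on the luminance channel only), so values from this sketch are not directly comparable with the number in the Performance section.

```wolfram
(* Hypothetical helper: PSNR in dB between a reference and a test image, peak value 1 *)
psnr[ref_Image, test_Image] := Module[{a, b, mse},
  (* ImageData returns channel values normalized to the range 0 to 1 *)
  a = ImageData[ColorConvert[RemoveAlphaChannel[ref], "RGB"]];
  b = ImageData[ColorConvert[
    RemoveAlphaChannel[ImageResize[test, ImageDimensions[ref]]], "RGB"]];
  mse = Mean[Flatten[(a - b)^2]];
  10 Log10[1/mse]
]

(* Usage, with the image names from the comparison grid above *)
psnr[mandrill, upScaled]
psnr[mandrill, naive]
```

The helper does not guard against identical images, where the mean squared error is zero.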

Control the time-quality tradeoff

When upscaling by factors of 4 or less, as in the examples in the previous section, ESRGAN is applied once. If necessary, the image is downscaled to match the required factor. It is possible to choose whether to downscale before or after ESRGAN is applied, affecting both the running time and the quality of the final result.

Get an image:

In[11]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/0ece1997-b481-4894-8be1-6b2e12a4634f"]

Downscale the image by a factor of 3:

In[12]:=

Out[13]=

When setting PerformanceGoal to "Speed", the input image is first downscaled by a factor of 3/4 and then upscaled, using ESRGAN, by a factor of 4. This will make the network operate on the smallest possible image but will throw away some details of the original, yielding a lower quality result:

In[14]:=

Out[14]=

When setting PerformanceGoal to "Quality", the input image is first upscaled, using ESRGAN, by a factor of 4 and then downscaled by a factor of 3/4. This is the default setting for factors of 4 or less and will make the network operate on the full-sized image, yielding a higher-quality result:

In[15]:=

Out[15]=
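The input cells for the two settings are not shown on this page. A sketch of how they might be run and timed follows, assuming the example image has been assigned to horse and using the names speedResult, qualityResult and naive from the comparison grid. The names small, speedTime and qualityTime are hypothetical.

```wolfram
(* Assumes `horse` holds the example image retrieved above *)
small = ImageResize[horse, Scaled[1/3]];

(* Resize first, then run the net on the smaller image *)
{speedTime, speedResult} =
  AbsoluteTiming[netevaluate[small, 3, PerformanceGoal -> "Speed"]];

(* Run the net on the full input, then resize the result *)
{qualityTime, qualityResult} =
  AbsoluteTiming[netevaluate[small, 3, PerformanceGoal -> "Quality"]];

(* Plain interpolation for comparison *)
naive = ImageResize[small, Scaled[3]];

(* Elapsed seconds for each setting *)
{speedTime, qualityTime}
```

The first call may include the time needed to download or load the model, so timings are best compared after a warm-up evaluation.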

Compare the details with a naively upscaled version and the original:

In[16]:=

In[17]:=

Grid[
{{"Naive", "Speed", "Quality", "Original"}, Table[ImageTrim[
img, {{160, 120}, {240, 240}}], {img, {naive, speedResult, qualityResult, horse}}]},
Dividers -> All
]

Out[17]=

When upscaling by factors of more than 4, ESRGAN must be applied multiple times.

Get an image:

In[18]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/5fc2155b-13cc-47d0-bb08-2a9daf2f6dca"]

Downscale the image by a factor of 6:

In[19]:=

Out[20]=

When setting PerformanceGoal to "Speed", the downscaling happens before the last evaluation of ESRGAN. Hence, for a factor of 6, the input image is first upscaled by a factor of 4, then downscaled by a factor of 6/16 and finally upscaled by a factor of 4 again. This is the default setting for factors larger than 4:

In[21]:=

Out[21]=

When setting PerformanceGoal to "Quality", the downscaling happens after the last evaluation of ESRGAN. Hence, for a factor of 6, the input image is first upscaled by a factor of 4 twice and then downscaled by a factor of 6/16 (if available, set TargetDevice -> "GPU" for faster evaluation time):

In[22]:=

Out[22]=
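As a worked check of the two orderings for a factor of 6, the arithmetic can be written out as follows. This only restates the factors given in the text; the pixel-count comparison ignores the rounding to whole pixels that image resizing performs, so it is an approximation.

```latex
% Both orderings give the same overall factor:
\[
\text{Speed: } 4 \cdot \tfrac{6}{16} \cdot 4 = 6,
\qquad
\text{Quality: } 4 \cdot 4 \cdot \tfrac{6}{16} = 6 .
\]
% Side length of the image fed to the last ESRGAN pass, relative to the input:
\[
\text{Speed: } 4 \cdot \tfrac{6}{16} = \tfrac{3}{2},
\qquad
\text{Quality: } 4 .
\]
% Ratio of pixel counts processed in the last pass (Quality over Speed):
\[
\left( \frac{4}{3/2} \right)^{2} = \frac{64}{9} \approx 7.1 .
\]
```

Under these assumptions, the last pass in the "Quality" setting processes roughly seven times as many pixels as in the "Speed" setting, which is consistent with "Speed" being the default for factors larger than 4.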

Compare the details with a naively upscaled version and the original:

In[23]:=

In[24]:=

Grid[
{{"Naive", "Speed", "Quality", "Original"}, Table[ImageTrim[
img, {{110, 20}, {170, 200}}], {img, {naive, speedResult, qualityResult, scene}}]},
Dividers -> All
]

Out[24]=

Net information

Inspect the number of parameters of all arrays in the net:

In[25]:=

NetInformation[
  NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"],
  "ArraysElementCounts"]

Out[26]=

Obtain the total number of parameters:

In[27]:=

NetInformation[
  NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"],
  "ArraysTotalElementCount"]

Out[28]=

Obtain the layer type counts:

In[29]:=

NetInformation[
  NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"],
  "LayerTypeCounts"]

Out[30]=

Display the summary graphic:

In[31]:=

NetInformation[
  NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"],
  "SummaryGraphic"]

Out[34]=

Export to MXNet

Export the net into a format that can be opened in MXNet:

In[35]:=

jsonPath = Export[
  FileNameJoin[{$TemporaryDirectory, "net.json"}],
  NetReplacePart[
    NetModel["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"],
    "Input" -> "Image"],
  "MXNet"]

Out[36]=

Export also creates a net.params file containing parameters:

In[37]:=

Out[37]=

Get the size of the parameter file:

In[38]:=

Out[38]=
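The input cells for locating and measuring the parameter file are not shown on this page. A sketch of how this might be done follows, assuming net.params is written next to the exported net.json, as the text above states. The name paramPath is hypothetical.

```wolfram
(* Assumes `jsonPath` is the path returned by the Export call above *)
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}];

(* Confirm that the parameter file exists before measuring it *)
FileExistsQ[paramPath]

(* Size of the parameter file in bytes *)
FileByteCount[paramPath]
```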

The size is similar to the byte count of the resource object:

In[39]:=

ResourceObject["Enhanced Super-Resolution GAN Trained on DIV2K, Flickr2K and OST Data"]["ByteCount"]

Out[40]=

Requirements

Wolfram Language 12.0 (April 2019) or above

Resource History

Date Created: 7 May 2019
Latest Update: 17 May 2019

Reference

X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, C. C. Loy, X. Tang, "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," arXiv:1809.00219 (2018)
Available from: https://github.com/xinntao/ESRGAN
Rights: Apache License 2.0