DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data

Detect and localize text in an image

Released in 2022, this family of segmentation networks introduces a novel framework for detecting arbitrary-shape scene text. A new module named Differentiable Binarization (DB) lets the segmentation network adaptively set the thresholds for binarization, which simplifies the post-processing and improves the performance of text detection. In addition, an efficient Adaptive Scale Fusion (ASF) module adaptively fuses features of different scales, improving scale robustness. Experiments show that with a ResNet-50 backbone, the network achieves state-of-the-art results on five standard scene text benchmarks, while with a lightweight ResNet-18 backbone it reaches competitive performance at real-time inference speed.
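
For reference, the DB module of the original paper replaces hard binarization with the approximate, differentiable step B = 1/(1 + Exp[-k (P - T)]), where P is the probability map, T is the learned threshold map and k is an amplifying factor (set to 50 in the paper). A minimal Wolfram Language sketch of this step, with an illustrative function name:

differentiableBinarization[probMap_, thresholdMap_, k_ : 50] := 1/(1 + Exp[-k (probMap - thresholdMap)])

At inference time, the evaluation function below simply thresholds the probability map with a fixed value.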

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific combination of architecture, backbone and training dataset. Inspect the available parameters:

In[2]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "Architecture" -> "DBNet", "Backbone" -> "Resnet18", "Dataset" -> "TotalText"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "Architecture" -> "DBNetPP", "Backbone" -> "Resnet50-OCLIP"}, "UninitializedEvaluationNet"]
Out[4]=

Evaluation function

Write an evaluation function to scale the result to the input image size and suppress the least probable detections:

In[5]:=
xywha2Parallelogram[input_] := Block[{cp, w, h, a, r, pts},
   (* build a Parallelogram from a {centerX, centerY, width, height, angle} descriptor *)
   cp = input[[;; 2]];
   {w, h} = input[[3 ;; 4]]/2;
   a = N[input[[5]]];
   (* three corners of the unrotated rectangle *)
   r = {cp - {w, h}, cp + {w, -h}, cp + {-w, h}};
   pts = RotationTransform[a, cp][r];
   Parallelogram[pts[[1]], {pts[[2]] - pts[[1]], pts[[3]] - pts[[1]]}]
];
parallelogram2xywha[Parallelogram[r1_, {v1_, v2_}]] := Block[{n1, n2, w, h, a},
   (* recover the {centerX, centerY, width, height, angle} descriptor of a Parallelogram *)
   {n1, n2} = {Norm[v1], Norm[v2]};
   If[n1 == 0 || n2 == 0, Return[{Sequence @@ (r1 + (v1 + v2)/2), 0., 0., 0.}]];
   (* take the longer side as the width and measure the angle along it *)
   If[n1 > n2,
      w = n1; h = n2; a = ArcTan[Sequence @@ v1],
      w = n2; h = n1; a = ArcTan[Sequence @@ v2]
   ];
   {Sequence @@ (r1 + (v1 + v2)/2), w, h, a}
];
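
As an illustrative check, converting a parallelogram to the {centerX, centerY, width, height, angle} descriptor and back should return the same region (the coordinates below are made up for the example):

pgram = Parallelogram[{0, 0}, {{4, 0}, {0, 2}}];
desc = parallelogram2xywha[pgram]
(* {2, 1, 4, 2, 0} *)
xywha2Parallelogram[desc]
(* the original parallelogram, up to numerical precision *)
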
In[6]:=
scaleParallelogram[pgram_, scale_] := Module[{cpx, cpy, w, h, a},
   (* scale the width and height of a parallelogram about its center *)
   {cpx, cpy, w, h, a} = parallelogram2xywha[pgram];
   xywha2Parallelogram[{cpx, cpy, scale[[1]]*w, scale[[2]]*h, a}]
];
parallelogramPerimeter[Parallelogram[r1_, {v11_, v12_}]] := 2*(Norm[v11] + Norm[v12]);
filterLowScoreDetections[{boundingParallelograms_, scores_}, acceptanceThreshold_] := With[
   {sel = Select[Transpose[{boundingParallelograms, scores}], #[[2]] >= acceptanceThreshold &]},
   If[sel === {}, {{}, {}}, Transpose[sel]] (* avoid Transpose[{}] when nothing passes *)
];
filterSmallPerimeterDetections[{boundingParallelograms_, scores_}, minPerimeter_] := With[
   {sel = Select[Transpose[{boundingParallelograms, scores}], parallelogramPerimeter[#[[1]]] >= minPerimeter &]},
   If[sel === {}, {{}, {}}, Transpose[sel]]
];
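
As a quick illustrative run on made-up detections, keep only the candidates scoring at least 0.5:

toyBoxes = {Parallelogram[{0, 0}, {{10, 0}, {0, 4}}], Parallelogram[{5, 5}, {{2, 0}, {0, 1}}]};
filterLowScoreDetections[{toyBoxes, {0.9, 0.2}}, 0.5]
(* {{Parallelogram[{0, 0}, {{10, 0}, {0, 4}}]}, {0.9}} *)
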
In[7]:=
Options[netevaluate] = {"MaskThreshold" -> 0.3, "AcceptanceThreshold" -> 0.1, "MinPerimeter" -> 13, "ScaledPadding" -> {1.3, 2.5}, "Output" -> "Regions" (* or "Masks" *)};
netevaluate[img_, OptionsPattern[]] := Module[
   {probabilitymap, inputImageDims, w, h, ratio, tRatio, probmap, mask, contours, boundingRects,
    probcrops, maskcrops, scores, boundingParallelograms, res, emptyResult},
   emptyResult = Association["BoundingBoxes" -> {}, "Scores" -> {}];
   probabilitymap = Image[NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"][img]];
   (* Scale the probability map back to the original image dimensions *)
   inputImageDims = ImageDimensions[img];
   {w, h} = ImageDimensions[probabilitymap];
   ratio = ImageAspectRatio[img];
   tRatio = ImageAspectRatio[probabilitymap];
   probmap = If[tRatio/ratio > 1,
      ImageResize[ImageCrop[probabilitymap, {w, w*ratio}], inputImageDims],
      ImageResize[ImageCrop[probabilitymap, {h/ratio, h}], inputImageDims]
   ];
   (* Binarize the probability map to get the mask *)
   mask = Binarize[probmap, OptionValue["MaskThreshold"]];
   If[SameQ[OptionValue["Output"], "Masks"], Return[mask]];
   (* Extract the contour of each connected mask region *)
   contours = ImageMeasurements[mask, "PerimeterPositions", CornerNeighbors -> True];
   (* Get an axis-aligned bounding box for each contour *)
   boundingRects = BoundingRegion[#, "MinRectangle"] & /@ contours;
   If[boundingRects === {}, Return[emptyResult]];
   (* Score each region as the mean intensity of the probability map inside its mask *)
   probcrops = ImageTrim[probmap, boundingRects];
   maskcrops = ImageTrim[mask, boundingRects];
   scores = MapThread[ImageMeasurements[ImageMultiply[#1, #2], "Mean"] &, {probcrops, maskcrops}];
   (* Get an oriented bounding parallelogram for each contour *)
   boundingParallelograms = BoundingRegion[#, "MinOrientedRectangle"] & /@ contours;
   (* Filter parallelograms by score *)
   res = filterLowScoreDetections[{boundingParallelograms, scores}, OptionValue["AcceptanceThreshold"]];
   If[res[[1]] === {}, Return[emptyResult]];
   (* Scale the resulting parallelograms *)
   boundingParallelograms = If[UnsameQ[OptionValue["ScaledPadding"], Automatic],
      scaleParallelogram[#, OptionValue["ScaledPadding"]] & /@ res[[1]],
      res[[1]]
   ];
   (* Filter out detections with small perimeters *)
   res = filterSmallPerimeterDetections[{boundingParallelograms, res[[2]]}, OptionValue["MinPerimeter"]];
   If[res[[1]] === {}, Return[emptyResult]];
   Association["BoundingBoxes" -> res[[1]], "Scores" -> res[[2]]]
];

Basic usage

Obtain the detected bounding boxes with their corresponding scores for a given image:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/8c24702b-fa56-4788-b52b-bdd4281b120d"]
In[9]:=
detection = netevaluate[testImage];

The output of the evaluation function is an Association containing the detected "BoundingBoxes" and "Scores":

In[10]:=
Keys[detection]
Out[10]=

The "BoundingBoxes" key is a list of Parallelogram expressions corresponding to the bounding regions of the detected objects:

In[11]:=
boxes = detection["BoundingBoxes"]
Out[11]=

Visualize the bounding regions:

In[12]:=
HighlightImage[testImage, boxes]
Out[12]=
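
The Parallelogram expressions can also feed downstream processing. For instance, a small helper (hypothetical, not part of the evaluation function) extracts the corner points of each region, which ImageTrim can use to crop the detections, e.g. for a later recognition step:

pgramCorners[Parallelogram[r_, {v1_, v2_}]] := {r, r + v1, r + v1 + v2, r + v2};
ImageTrim[testImage, pgramCorners[#]] & /@ boxes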

The "Scores" key contains the score values of the detected objects:

In[13]:=
detection["Scores"]
Out[13]=

Advanced usage

Get an image:

In[14]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/82246f4b-9314-420b-a89f-853355af775b"]

Get the text region mask via the option "Output" -> "Masks" and visualize it:

In[15]:=
mask = netevaluate[testImage, "Output" -> "Masks"];
mask = Dilation[mask, 10];
HighlightImage[testImage, mask, ImageSize -> Medium]
Out[17]=
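
The mask can be reused beyond visualization. As an illustrative application, Inpaint can use it to erase the detected text from the image:

Inpaint[testImage, mask]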

Obtain the boxes using the default evaluation and visualize them:

In[18]:=
boxes1 = netevaluate[testImage]["BoundingBoxes"];
In[19]:=
HighlightImage[testImage, Style[boxes1, Green], ImageSize -> Medium]
Out[19]=

Increase the "MinPerimeter" to remove small boxes:

In[20]:=
boxes2 = netevaluate[testImage, "MinPerimeter" -> 150]["BoundingBoxes"];

Visualize the selected and filtered-out boxes:

In[21]:=
HighlightImage[testImage, {Legended[Style[boxes2, Green], "Selected"],
   Legended[Style[Complement[boxes1, boxes2], Blue], "Filtered out"]},
  ImageSize -> Medium]
Out[21]=

Increase the "AcceptanceThreshold" to remove low probability boxes:

In[22]:=
boxes3 = netevaluate[testImage, "AcceptanceThreshold" -> 0.3]["BoundingBoxes"];

Visualize the selected and filtered-out boxes:

In[23]:=
HighlightImage[testImage, {Legended[Style[boxes3, Green], "Selected"],
   Legended[Style[Complement[boxes1, boxes3], Blue], "Filtered out"]},
  ImageSize -> Medium]
Out[23]=

Change the box padding with the "ScaledPadding" option:

In[24]:=
boxes4 = netevaluate[testImage, "ScaledPadding" -> {1.5, 3.5}][
   "BoundingBoxes"];

Visualize the original and padded boxes:

In[25]:=
HighlightImage[testImage, {Legended[Style[boxes1, Green], "Original"],
   Legended[Style[boxes4, Yellow], "Padded"]}, ImageSize -> Medium]
Out[25]=

The "MaskThreshold" option can help to filter noisy detections. Increasing the "MaskThreshold" helps to select the boxes with the strongest probability map:

In[26]:=
boxes5 = netevaluate[testImage, "MaskThreshold" -> 0.7]["BoundingBoxes"];

Visualize the selected boxes:

In[27]:=
HighlightImage[testImage, {Legended[Style[boxes5, Green], "Selected"]},
  ImageSize -> Medium]
Out[27]=
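
The options can also be combined in a single call; the threshold values below are illustrative:

boxes6 = netevaluate[testImage, "MaskThreshold" -> 0.5, "AcceptanceThreshold" -> 0.3, "MinPerimeter" -> 150]["BoundingBoxes"];
HighlightImage[testImage, Style[boxes6, Green], ImageSize -> Medium]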

Network result

Get an image:

In[28]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/9250a93e-b846-4b3e-b09d-066964fb936c"]

Get the probability map for the detected text:

In[29]:=
probabilitymap = NetModel[
    "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"][
   testImage];

Adjust the result dimensions to the original image shape:

In[30]:=
probabilitymap = Image[probabilitymap];
inputImageDims = ImageDimensions[testImage];
{w, h} = ImageDimensions[probabilitymap];
ratio = ImageAspectRatio[testImage];
tRatio = ImageAspectRatio[probabilitymap];
probmap = If[tRatio/ratio > 1,
   ImageResize[ImageCrop[probabilitymap, {w, w*ratio}], inputImageDims],
   ImageResize[ImageCrop[probabilitymap, {h /ratio, h}], inputImageDims]
   ];

Visualize the probability map:

In[31]:=
ImageCompose[
 Colorize[Threshold[probmap, 0.005], ColorFunction -> ColorData["TemperatureMap"], ColorRules -> {0 -> White}], {testImage, 0.5}]
Out[31]=

Binarize the probability map to obtain the mask:

In[32]:=
mask = Binarize[probmap, 0.3]
Out[32]=

Visualize the bounding boxes around the masked regions:

In[33]:=
contours = ImageMeasurements[mask, "PerimeterPositions", CornerNeighbors -> True];
boundingRects = BoundingRegion[#, "MinRectangle"] & /@ contours;
HighlightImage[testImage, boundingRects]
Out[35]=

Scale the boxes to cover the whole text:

In[36]:=
boundingParallelograms = BoundingRegion[#, "MinOrientedRectangle"] & /@ contours;
boundingParallelograms = scaleParallelogram[#, {1.2, 2.1}] & /@ boundingParallelograms;
HighlightImage[testImage, boundingParallelograms]
Out[38]=
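
For completeness, the per-region scores used by netevaluate (the mean probability inside each masked crop) can be reproduced for these contours:

probcrops = ImageTrim[probmap, boundingRects];
maskcrops = ImageTrim[mask, boundingRects];
MapThread[ImageMeasurements[ImageMultiply[#1, #2], "Mean"] &, {probcrops, maskcrops}]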

Net information

Inspect the number of parameters of all arrays in the net:

In[39]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "ArraysElementCounts"]
Out[39]=

Obtain the total number of parameters:

In[40]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "ArraysTotalElementCount"]
Out[40]=

Obtain the layer type counts:

In[41]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "LayerTypeCounts"]
Out[41]=

Display the summary graphic:

In[42]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "SummaryGraphic"]
Out[42]=

Export to ONNX

Export the net to the ONNX format:

In[43]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel[
   "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"]]
Out[43]=

Get the size of the ONNX file:

In[44]:=
FileByteCount[onnxFile]
Out[44]=

The size is similar to the byte count of the resource object:

In[45]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "ByteCount"]
Out[45]=

Check some metadata of the ONNX model:

In[46]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[46]=

Import the model back into Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[47]:=
Import[onnxFile]
Out[47]=
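
If needed, a NetEncoder can be reattached to the imported net. The sketch below assumes the input port is named "Input" and uses an illustrative 736x736 image size; check the original NetModel's input specification for the actual values:

net = Import[onnxFile];
(* the port name and image size are assumptions; inspect NetExtract[net, "Input"] and the original model's encoder before relying on them *)
NetReplacePart[net, "Input" -> NetEncoder[{"Image", {736, 736}}]]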

Resource History

Reference