DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data

Detect and localize text in an image

Released in 2022, this family of segmentation networks introduces a novel framework for detecting arbitrary-shape scene text. A new module named Differentiable Binarization (DB) lets the segmentation network adaptively set the thresholds for binarization, which simplifies the post-processing and improves the performance of text detection. In addition, an efficient Adaptive Scale Fusion (ASF) module adaptively fuses features of different scales, improving scale robustness. Experiments show that with a ResNet-50 backbone, the network achieves state-of-the-art results on five standard scene text benchmarks, while with a lightweight ResNet-18 backbone it reaches competitive performance at real-time inference speed.
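
For reference, the DB module of the original paper replaces hard binarization with the approximate, differentiable step B = 1/(1 + Exp[-k (P - T)]), where P is the probability map, T is the learned threshold map and k is an amplifying factor (set to 50 in the paper). A minimal Wolfram Language sketch of this step, with an illustrative function name:

differentiableBinarization[probMap_, thresholdMap_, k_ : 50] := 1/(1 + Exp[-k (probMap - thresholdMap)])

At inference time, the evaluation function below simply thresholds the probability map with a fixed value.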

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific combination of architecture, backbone and training dataset. Inspect the available parameters:

In[2]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the parameters:

In[3]:=
NetModel[{"DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "Architecture" -> "DBNet", "Backbone" -> "Resnet18", "Dataset" -> "TotalText"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "Architecture" -> "DBNetPP", "Backbone" -> "Resnet50-OCLIP"}, "UninitializedEvaluationNet"]
Out[4]=

Evaluation function

Write an evaluation function to scale the result to the input image size and suppress the least probable detections:

In[5]:=
xywha2Parallelogram[input_] := Block[{cp, w, h, a, r, pts},
   (* build a Parallelogram from a {centerX, centerY, width, height, angle} descriptor *)
   cp = input[[;; 2]];
   {w, h} = input[[3 ;; 4]]/2;
   a = N[input[[5]]];
   (* three corners of the unrotated rectangle *)
   r = {cp - {w, h}, cp + {w, -h}, cp + {-w, h}};
   pts = RotationTransform[a, cp][r];
   Parallelogram[pts[[1]], {pts[[2]] - pts[[1]], pts[[3]] - pts[[1]]}]
];
parallelogram2xywha[Parallelogram[r1_, {v1_, v2_}]] := Block[{n1, n2, w, h, a},
   (* recover the {centerX, centerY, width, height, angle} descriptor of a Parallelogram *)
   {n1, n2} = {Norm[v1], Norm[v2]};
   If[n1 == 0 || n2 == 0, Return[{Sequence @@ (r1 + (v1 + v2)/2), 0., 0., 0.}]];
   (* take the longer side as the width and measure the angle along it *)
   If[n1 > n2,
      w = n1; h = n2; a = ArcTan[Sequence @@ v1],
      w = n2; h = n1; a = ArcTan[Sequence @@ v2]
   ];
   {Sequence @@ (r1 + (v1 + v2)/2), w, h, a}
];
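
As an illustrative check, converting a parallelogram to the {centerX, centerY, width, height, angle} descriptor and back should return the same region (the coordinates below are made up for the example):

pgram = Parallelogram[{0, 0}, {{4, 0}, {0, 2}}];
desc = parallelogram2xywha[pgram]
(* {2, 1, 4, 2, 0} *)
xywha2Parallelogram[desc]
(* the original parallelogram, up to numerical precision *)
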
In[6]:=
scaleParallelogram[pgram_, scale_] := Module[{cpx, cpy, w, h, a},
   (* scale the width and height of a parallelogram about its center *)
   {cpx, cpy, w, h, a} = parallelogram2xywha[pgram];
   xywha2Parallelogram[{cpx, cpy, scale[[1]]*w, scale[[2]]*h, a}]
];
parallelogramPerimeter[Parallelogram[r1_, {v11_, v12_}]] := 2*(Norm[v11] + Norm[v12]);
filterLowScoreDetections[{boundingParallelograms_, scores_}, acceptanceThreshold_] := With[
   {sel = Select[Transpose[{boundingParallelograms, scores}], #[[2]] >= acceptanceThreshold &]},
   If[sel === {}, {{}, {}}, Transpose[sel]] (* avoid Transpose[{}] when nothing passes *)
];
filterSmallPerimeterDetections[{boundingParallelograms_, scores_}, minPerimeter_] := With[
   {sel = Select[Transpose[{boundingParallelograms, scores}], parallelogramPerimeter[#[[1]]] >= minPerimeter &]},
   If[sel === {}, {{}, {}}, Transpose[sel]]
];
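
As a quick illustrative run on made-up detections, keep only the candidates scoring at least 0.5:

toyBoxes = {Parallelogram[{0, 0}, {{10, 0}, {0, 4}}], Parallelogram[{5, 5}, {{2, 0}, {0, 1}}]};
filterLowScoreDetections[{toyBoxes, {0.9, 0.2}}, 0.5]
(* {{Parallelogram[{0, 0}, {{10, 0}, {0, 4}}]}, {0.9}} *)
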
In[7]:=
Options[netevaluate] = {"MaskThreshold" -> 0.3, "AcceptanceThreshold" -> 0.1, "MinPerimeter" -> 13, "ScaledPadding" -> {1.3, 2.5}, "Output" -> "Regions" (* or "Masks" *)};
netevaluate[img_, OptionsPattern[]] := Module[
   {probabilitymap, inputImageDims, w, h, ratio, tRatio, probmap, mask, contours, boundingRects,
    probcrops, maskcrops, scores, boundingParallelograms, res, emptyResult},
   emptyResult = Association["BoundingBoxes" -> {}, "Scores" -> {}];
   probabilitymap = Image[NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"][img]];
   (* Scale the probability map back to the original image dimensions *)
   inputImageDims = ImageDimensions[img];
   {w, h} = ImageDimensions[probabilitymap];
   ratio = ImageAspectRatio[img];
   tRatio = ImageAspectRatio[probabilitymap];
   probmap = If[tRatio/ratio > 1,
      ImageResize[ImageCrop[probabilitymap, {w, w*ratio}], inputImageDims],
      ImageResize[ImageCrop[probabilitymap, {h/ratio, h}], inputImageDims]
   ];
   (* Binarize the probability map to get the mask *)
   mask = Binarize[probmap, OptionValue["MaskThreshold"]];
   If[SameQ[OptionValue["Output"], "Masks"], Return[mask]];
   (* Extract the contour of each connected mask region *)
   contours = ImageMeasurements[mask, "PerimeterPositions", CornerNeighbors -> True];
   (* Get an axis-aligned bounding box for each contour *)
   boundingRects = BoundingRegion[#, "MinRectangle"] & /@ contours;
   If[boundingRects === {}, Return[emptyResult]];
   (* Score each region as the mean intensity of the probability map inside its mask *)
   probcrops = ImageTrim[probmap, boundingRects];
   maskcrops = ImageTrim[mask, boundingRects];
   scores = MapThread[ImageMeasurements[ImageMultiply[#1, #2], "Mean"] &, {probcrops, maskcrops}];
   (* Get an oriented bounding parallelogram for each contour *)
   boundingParallelograms = BoundingRegion[#, "MinOrientedRectangle"] & /@ contours;
   (* Filter parallelograms by score *)
   res = filterLowScoreDetections[{boundingParallelograms, scores}, OptionValue["AcceptanceThreshold"]];
   If[res[[1]] === {}, Return[emptyResult]];
   (* Scale the resulting parallelograms *)
   boundingParallelograms = If[UnsameQ[OptionValue["ScaledPadding"], Automatic],
      scaleParallelogram[#, OptionValue["ScaledPadding"]] & /@ res[[1]],
      res[[1]]
   ];
   (* Filter out detections with small perimeters *)
   res = filterSmallPerimeterDetections[{boundingParallelograms, res[[2]]}, OptionValue["MinPerimeter"]];
   If[res[[1]] === {}, Return[emptyResult]];
   Association["BoundingBoxes" -> res[[1]], "Scores" -> res[[2]]]
];

Basic usage

Obtain the detected bounding boxes with their corresponding scores for a given image:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/8c24702b-fa56-4788-b52b-bdd4281b120d"]
In[9]:=
detection = netevaluate[testImage];

The output of the evaluation function is an Association containing the detected "BoundingBoxes" and "Scores":

In[10]:=
Keys[detection]
Out[10]=

The "BoundingBoxes" key is a list of Parallelogram expressions corresponding to the bounding regions of the detected objects:

In[11]:=
boxes = detection["BoundingBoxes"]
Out[11]=

Visualize the bounding regions:

In[12]:=
HighlightImage[testImage, boxes]
Out[12]=
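
The Parallelogram expressions can also feed downstream processing. For instance, a small helper (hypothetical, not part of the evaluation function) extracts the corner points of each region, which ImageTrim can use to crop the detections, e.g. for a later recognition step:

pgramCorners[Parallelogram[r_, {v1_, v2_}]] := {r, r + v1, r + v1 + v2, r + v2};
ImageTrim[testImage, pgramCorners[#]] & /@ boxes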

The "Scores" key contains the score values of the detected objects:

In[13]:=
detection["Scores"]
Out[13]=

Advanced usage

Get an image:

In[14]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/82246f4b-9314-420b-a89f-853355af775b"]

Get the text region mask via the option "Output" -> "Masks" and visualize it:

In[15]:=
mask = netevaluate[testImage, "Output" -> "Masks"];
mask = Dilation[mask, 10];
HighlightImage[testImage, mask, ImageSize -> Medium]
Out[17]=
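
The mask can be reused beyond visualization. As an illustrative application, Inpaint can use it to erase the detected text from the image:

Inpaint[testImage, mask]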

Obtain the boxes using the default evaluation and visualize them:

In[18]:=
boxes1 = netevaluate[testImage]["BoundingBoxes"];
In[19]:=
HighlightImage[testImage, Style[boxes1, Green], ImageSize -> Medium]
Out[19]=

Increase the "MinPerimeter" to remove small boxes:

In[20]:=
boxes2 = netevaluate[testImage, "MinPerimeter" -> 150]["BoundingBoxes"];

Visualize the selected and filtered-out boxes:

In[21]:=
HighlightImage[testImage, {Legended[Style[boxes2, Green], "Selected"],
   Legended[Style[Complement[boxes1, boxes2], Blue], "Filtered out"]},
  ImageSize -> Medium]
Out[21]=

Increase the "AcceptanceThreshold" to remove low probability boxes:

In[22]:=
boxes3 = netevaluate[testImage, "AcceptanceThreshold" -> 0.3]["BoundingBoxes"];

Visualize the selected and filtered-out boxes:

In[23]:=
HighlightImage[testImage, {Legended[Style[boxes3, Green], "Selected"],
   Legended[Style[Complement[boxes1, boxes3], Blue], "Filtered out"]},
  ImageSize -> Medium]
Out[23]=

Change the box padding with the "ScaledPadding" option:

In[24]:=
boxes4 = netevaluate[testImage, "ScaledPadding" -> {1.5, 3.5}][
   "BoundingBoxes"];

Visualize the original and padded boxes:

In[25]:=
HighlightImage[testImage, {Legended[Style[boxes1, Green], "Original"],
   Legended[Style[boxes4, Yellow], "Padded"]}, ImageSize -> Medium]
Out[25]=

The "MaskThreshold" option can help to filter noisy detections. Increasing the "MaskThreshold" helps to select the boxes with the strongest probability map:

In[26]:=
boxes5 = netevaluate[testImage, "MaskThreshold" -> 0.7]["BoundingBoxes"];

Visualize the selected boxes:

In[27]:=
HighlightImage[testImage, {Legended[Style[boxes5, Green], "Selected"]},
  ImageSize -> Medium]
Out[27]=
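
The options can also be combined in a single call; the threshold values below are illustrative:

boxes6 = netevaluate[testImage, "MaskThreshold" -> 0.5, "AcceptanceThreshold" -> 0.3, "MinPerimeter" -> 150]["BoundingBoxes"];
HighlightImage[testImage, Style[boxes6, Green], ImageSize -> Medium]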

Network result

Get an image:

In[28]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/9250a93e-b846-4b3e-b09d-066964fb936c"]

Get the probability map for the detected text:

In[29]:=
probabilitymap = NetModel[
    "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"][
   testImage];

Adjust the result dimensions to the original image shape:

In[30]:=
probabilitymap = Image[probabilitymap];
inputImageDims = ImageDimensions[testImage];
{w, h} = ImageDimensions[probabilitymap];
ratio = ImageAspectRatio[testImage];
tRatio = ImageAspectRatio[probabilitymap];
probmap = If[tRatio/ratio > 1,
   ImageResize[ImageCrop[probabilitymap, {w, w*ratio}], inputImageDims],
   ImageResize[ImageCrop[probabilitymap, {h /ratio, h}], inputImageDims]
   ];

Visualize the probability map:

In[31]:=
ImageCompose[
 Colorize[Threshold[probmap, 0.005], ColorFunction -> ColorData["TemperatureMap"], ColorRules -> {0 -> White}], {testImage, 0.5}]
Out[31]=

Binarize the probability map to obtain the mask:

In[32]:=
mask = Binarize[probmap, 0.3]
Out[32]=

Visualize the bounding boxes around the masked regions:

In[33]:=
contours = ImageMeasurements[mask, "PerimeterPositions", CornerNeighbors -> True];
boundingRects = BoundingRegion[#, "MinRectangle"] & /@ contours;
HighlightImage[testImage, boundingRects]
Out[35]=

Scale the boxes to cover the whole text:

In[36]:=
boundingParallelograms = BoundingRegion[#, "MinOrientedRectangle"] & /@ contours;
boundingParallelograms = scaleParallelogram[#, {1.2, 2.1}] & /@ boundingParallelograms;
HighlightImage[testImage, boundingParallelograms]
Out[38]=
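
For completeness, the per-region scores used by netevaluate (the mean probability inside each masked crop) can be reproduced for these contours:

probcrops = ImageTrim[probmap, boundingRects];
maskcrops = ImageTrim[mask, boundingRects];
MapThread[ImageMeasurements[ImageMultiply[#1, #2], "Mean"] &, {probcrops, maskcrops}]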

Net information

Inspect the number of parameters of all arrays in the net:

In[39]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "ArraysElementCounts"]
Out[39]=

Obtain the total number of parameters:

In[40]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "ArraysTotalElementCount"]
Out[40]=

Obtain the layer type counts:

In[41]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "LayerTypeCounts"]
Out[41]=

Display the summary graphic:

In[42]:=
Information[
 NetModel[
  "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"], "SummaryGraphic"]
Out[42]=

Export to ONNX

Export the net to the ONNX format:

In[43]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel[
   "DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data"]]
Out[43]=

Get the size of the ONNX file:

In[44]:=
FileByteCount[onnxFile]
Out[44]=

The size is similar to the byte count of the resource object:

In[45]:=
NetModel["DBNet Text Detector Trained on ICDAR-2015 and Total-Text Data", "ByteCount"]
Out[45]=

Check some metadata of the ONNX model:

In[46]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[46]=

Import the model back into Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX:

In[47]:=
Import[onnxFile]
Out[47]=
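
If needed, a NetEncoder can be reattached to the imported net. The sketch below assumes the input port is named "Input" and uses an illustrative 736x736 image size; check the original NetModel's input specification for the actual values:

net = Import[onnxFile];
(* the port name and image size are assumptions; inspect NetExtract[net, "Input"] and the original model's encoder before relying on them *)
NetReplacePart[net, "Input" -> NetEncoder[{"Image", {736, 736}}]]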

Resource History

Reference