PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data

Detect and localize text in an image

This family of models introduces the novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances of arbitrary shape. The basic framework of PSENet is a feature pyramid network with a ResNet backbone; the network produces seven segmentation masks, each at a different scale. In the postprocessing step, PSENet uses a progressive scale expansion algorithm that gradually expands the minimal-scale kernel to the complete shape of the text instance, avoiding conflicting pixel labels at each expansion step. Experiments on CTW1500 validate the effectiveness of PSENet, achieving an F-measure of 74.3% at 27 FPS.

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by the dataset on which it was trained. Inspect the available parameters:

In[2]:=
NetModel["PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the training dataset:

In[3]:=
NetModel[{"PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "Dataset" -> "ICDAR2015"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "Dataset" -> "ICDAR2015"}, "UninitializedEvaluationNet"]
Out[4]=

Evaluation function

Write an evaluation function to scale the result to the input image size and suppress the least probable detections:

In[5]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/25cb88da-9189-4f4c-b6bf-c5119995f37c"]
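The actual definition of netevaluate comes from the cloud object above. As a purely illustrative sketch (not the hidden definition), the postprocessing can be assembled from the steps demonstrated in the "Network result" section; the name netevaluateSketch, the mask threshold of 0.5, the score threshold of 0.85 and the mean-probability scoring rule are all assumptions:

```wolfram
(* Illustrative sketch only; the actual netevaluate is defined by the cloud object above *)
netevaluateSketch[img_Image, maskThreshold_ : 0.5, scoreThreshold_ : 0.85] :=
 Module[{probs, textMask, kernelMask, components, instanceMasks, probImg, scores, scaled, regions, keep},
  probs = NetModel["PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"][img];
  (* first mask (largest scale) selects text pixels; last mask (smallest scale) separates instances *)
  textMask = Image[UnitStep[First[probs] - maskThreshold]];
  kernelMask = ImageMultiply[textMask, Image[UnitStep[Last[probs] - maskThreshold]]];
  components = MorphologicalComponents[kernelMask];
  instanceMasks = Table[Image[Unitize[SelectComponents[components, #Label == i &]]], {i, Max[components]}];
  (* single-step stand-in for the progressive scale expansion: grow each kernel inside the text mask *)
  instanceMasks = GeodesicDilation[#, textMask] & /@ instanceMasks;
  (* score each instance by its mean text probability (an assumed scoring rule) *)
  probImg = Image[First[probs]];
  scores = Mean[PixelValue[probImg, PixelValuePositions[#, 1]]] & /@ instanceMasks;
  (* rescale to the input size (ignoring the aspect-ratio cropping handled by scaleResult) and enclose in polygons *)
  scaled = ImageResize[#, ImageDimensions[img], Resampling -> "Nearest"] & /@ instanceMasks;
  regions = BoundingRegion[PixelValuePositions[#, 1], "MinConvexPolygon"] & /@ scaled;
  keep = Flatten[Position[scores, s_ /; s >= scoreThreshold]];
  <|"BoundingRegion" -> regions[[keep]], "Scores" -> scores[[keep]]|>
  ]
```

Applied to testImage, this sketch returns an Association with the same keys as the actual evaluation function, though its regions and scores will generally differ from those of the hidden definition.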

Basic usage

Obtain the detected bounding regions with their corresponding confidence scores for a given image:

In[6]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/680db4d5-3580-4712-b37a-7e17a6dd8695"]
In[7]:=
detection = netevaluate[testImage];

The evaluation function returns an Association with the keys "BoundingRegion" and "Scores":

In[8]:=
Keys[detection]
Out[8]=

The "BoundingRegion" is a list of Polygon expressions corresponding to the bounding regions of the detected objects:

In[9]:=
detection["BoundingRegion"]
Out[9]=

"Scores" contains the confidence scores of the detected objects:

In[10]:=
detection["Scores"]
Out[10]=

Visualize the bounding region for each text instance:

In[11]:=
HighlightImage[testImage, detection["BoundingRegion"], ImageLabels -> None]
Out[11]=

Get the individual masks via the option "Output"->"Masks":

In[12]:=
masks = netevaluate[testImage, "Output" -> "Masks"]
Out[12]=

Visualize the masks for each text instance with its assigned score:

In[13]:=
HighlightImage[testImage, Values@masks, ImageLegends -> Flatten@Values@detection["Scores"]]
Out[13]=

Network result

Get a sample image:

In[14]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/b462488a-4e49-487a-b5bb-3fd1be5a171e"]

The network computes seven segmentation masks for all the text instances, each at a different scale:

In[15]:=
results = NetModel[
    "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"][
   testImage];
In[16]:=
Dimensions[results]
Out[16]=

Visualize the segmentation masks:

In[17]:=
GraphicsRow[Map[Image, results]]
Out[17]=

Rescale the probability map of the first segmentation mask to the original image size:

In[18]:=
scaleResult[img_Image, orImg_Image] := Module[{inputImageDims, w, h, ratio, tRatio},
   (* scale the result to match the dimensions of the original image *)
   inputImageDims = ImageDimensions[orImg];
   {w, h} = ImageDimensions[img];
   ratio = ImageAspectRatio[orImg];
   tRatio = ImageAspectRatio[img];
   If[tRatio/ratio > 1,
    ImageResize[ImageCrop[img, {w, w*ratio}], inputImageDims],
    ImageResize[ImageCrop[img, {h/ratio, h}], inputImageDims]
    ]
   ];
In[19]:=
probmap = scaleResult[Image[results[[1]]], testImage];

Visualize the probability map for the presence of text:

In[20]:=
ImageCompose[
 Colorize[probmap, ColorFunction -> ColorData["TemperatureMap"], ColorRules -> {0 -> White}], {testImage, 0.6}]
Out[20]=

Threshold the results to get the masks:

In[21]:=
maskThreshold = 0.5;
masks = Image[Boole[#]] & /@ Map[# > maskThreshold &, results, {3}];

The first segmentation mask is used as the text mask because it has the largest scale, which allows the selection of entire text regions. Intersect the masks with the predicted text regions:

In[22]:=
textMask = masks[[1]];
kernelMasks = Map[ImageMultiply[textMask, #] &, masks];

The MorphologicalComponents function can create masks for each text instance, using the final segmentation mask. This mask, which has the smallest scale, clearly separates different text instances by keeping their boundaries apart:

In[23]:=
labels = MorphologicalComponents[kernelMasks[[-1]]];
Colorize@labels
Out[24]=

Use the SelectComponents function to split the components into different images:

In[25]:=
labels = Table[
  Image@SelectComponents[labels, SameQ[#Label, i] &],
  {i, Max[labels]}
  ]
Out[25]=

The progressive scale expansion algorithm starts from the pixels of multiple kernels and iteratively merges adjacent text pixels, avoiding conflicts over shared pixels and preserving the distinction between instances. Define a function that removes the shared pixels between kernels:

In[26]:=
removeIntersect[labels_, labelsPast_] := Module[{labelsNew, nLabels},
   nLabels = Length[labels];
   labelsNew = Table[
     ImageSubtract[labels[[i]], ImageMultiply[labelsPast[[i]], ImageAdd[labels[[Complement[Range[nLabels], {i}]]]]]],
     {i, nLabels}];
   Table[
    ImageMultiply[labelsNew[[i]], Sequence @@ ColorNegate@labelsNew[[Complement[Range[nLabels], {i}]]]],
    {i, nLabels}]
   ];

Apply the progressive scale expansion algorithm starting from the mask with the smallest scale and adding pixels progressively using the other masks:

In[27]:=
labels = Fold[removeIntersect[
     Map[Function[x, GeodesicDilation[x, #2]], #1], #1] &, labels, Reverse[kernelMasks]];
GraphicsRow[labels]
Out[28]=

Rescale the final list of masks to the original image size and visualize:

In[29]:=
masks = Map[scaleResult[#, testImage] &, labels];
HighlightImage[testImage, Thread[Range[Length[masks]] -> masks], ImageLabels -> None]
Out[30]=

It is possible to choose among several bounding region types. Find the contour points of each mask and compute different bounding region types to enclose each piece of text:

In[31]:=
contours = Map[
   Values[ComponentMeasurements[#, "PerimeterPositions", CornerNeighbors -> True]][[1]] &, masks];
regionTypes = {"MinRectangle", "MinOrientedRectangle", "MinConvexPolygon"};
regions = Table[BoundingRegion[contour[[1]], regType], {regType, regionTypes}, {contour, contours}];
In[32]:=
MapThread[
 Labeled[
   HighlightImage[testImage, Thread[Range[Length[#1]] -> #1], ImageLabels -> None], #2, Top] &,
 {regions, regionTypes}
 ]
Out[32]=

Net information

Inspect the number of parameters of all arrays in the net:

In[33]:=
Information[
 NetModel[
  "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "ArraysElementCounts"]
Out[33]=

Obtain the total number of parameters:

In[34]:=
Information[
 NetModel[
  "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "ArraysTotalElementCount"]
Out[34]=

Obtain the layer type counts:

In[35]:=
Information[
 NetModel[
  "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "LayerTypeCounts"]
Out[35]=

Display the summary graphic:

In[36]:=
Information[
 NetModel[
  "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "SummaryGraphic"]
Out[36]=

Export to ONNX

Export the net to the ONNX format:

In[37]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel[
   "PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data"]]
Out[37]=

Get the size of the ONNX file:

In[38]:=
FileByteCount[onnxFile]
Out[38]=

The size is similar to the byte count of the resource object:

In[39]:=
NetModel["PSENet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "ByteCount"]
Out[39]=

Check some metadata of the ONNX model:

In[40]:=
{OpsetVersion, IRVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[40]=

Import the model back into the Wolfram Language. Note that the NetEncoder and NetDecoder will be absent because they are not supported by the ONNX format:

In[41]:=
Import[onnxFile]
Out[41]=

Resource History

Reference