PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data

Detect and localize text in an image

The Pixel Aggregation Network (PAN) is a family of efficient and accurate text detectors featuring a low-cost segmentation head and learnable postprocessing. The segmentation head consists of the Feature Pyramid Enhancement Module (FPEM), which enhances the features with multilevel information, and the Feature Fusion Module (FFM), which fuses the enhanced features. The learnable pixel aggregation step then groups text pixels into instances via similarity vectors, which improves precision. On CTW1500, PAN achieves a 79.9% F-measure at 84.2 FPS.
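The pixel-aggregation idea can be illustrated outside the Wolfram workflow with a minimal NumPy sketch (the function name aggregate_pixels and the simple iterative border growth are assumptions for illustration, not the model's exact inference code): starting from a kernel region, neighboring text pixels are absorbed whenever their similarity vector lies close to the kernel's mean embedding.

```python
import numpy as np

def aggregate_pixels(text_mask, kernel_mask, emb, dist_thresh=0.5):
    """Grow a kernel region over the text mask by embedding similarity.

    text_mask:   (H, W) bool, predicted text region
    kernel_mask: (H, W) bool, shrunken kernel of one text instance
    emb:         (H, W, D) float, per-pixel similarity vectors
    """
    # The mean embedding of the kernel acts as the instance signature.
    mean = emb[kernel_mask].mean(axis=0)
    grown = kernel_mask.copy()
    changed = True
    while changed:
        changed = False
        # 4-connected dilation of the current region gives candidate pixels.
        border = np.zeros_like(grown)
        border[1:, :] |= grown[:-1, :]
        border[:-1, :] |= grown[1:, :]
        border[:, 1:] |= grown[:, :-1]
        border[:, :-1] |= grown[:, 1:]
        border &= text_mask & ~grown
        if border.any():
            # Keep only border pixels close to the kernel signature.
            d = np.linalg.norm(emb[border] - mean, axis=1)
            accept = np.zeros_like(grown)
            accept[border] = d < dist_thresh
            if accept.any():
                grown |= accept
                changed = True
    return grown
```

The expandComponent function below performs the same expansion with Wolfram image operations, using the distance from each perimeter pixel's similarity vector to the kernel's mean.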

Training Set Information

Model Information

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"]
Out[1]=

NetModel parameters

This model consists of a family of individual nets, each identified by a specific architecture. Inspect the available parameters:

In[2]:=
NetModel["PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "ParametersInformation"]
Out[2]=

Pick a non-default net by specifying the architecture:

In[3]:=
NetModel[{"PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data",
   "Dataset" -> "ICDAR2015"}]
Out[3]=

Pick a non-default uninitialized net:

In[4]:=
NetModel[{"PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data",
   "Dataset" -> "ICDAR2015"}, "UninitializedEvaluationNet"]
Out[4]=

Evaluation function

Write an evaluation function to extract the bounding regions and masks for each text instance:

In[5]:=
perimeter = ImageSubtract[Dilation[#, 1], #] &;
expandComponent[component_, similarity_, t_] := Module[{p, mean, dist, new},
  p = PixelValuePositions[perimeter@component, 1];
  mean = ImageMeasurements[similarity, "Mean", Masking -> component];
  dist = DistanceMatrix[PixelValue[similarity, p], {mean}][[All, 1]];
  new = Pick[p, UnitStep[t - dist], 1];
  ReplacePixelValue[component, new -> 1]]
In[6]:=
Options[netevaluate] = {"MaskThreshold" -> 0.5, "KernelThreshold" -> 0.2, "MinTextArea" -> 16, "RegionType" -> "MinOrientedRectangle", "Output" -> "Regions"};
netevaluate[img_, OptionsPattern[]] := Module[
   {result, kernel, similarity, comlist, labels, masks, elem, areas, i, inputImageDims, h, w, ratio, tRatio, contours, boundingReg},
   result = NetModel["PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"][img];
   kernel = Image[
     UnitStep[result["Kernel"] - OptionValue["KernelThreshold"]]*
      UnitStep[result["TextRegion"] - OptionValue["MaskThreshold"]]];
   similarity = Image[result["Similarity"], Interleaving -> False];
   comlist = Image /@ Values@ComponentMeasurements[kernel, "Mask"];
   labels = Map[expandComponent[#, similarity, OptionValue["MaskThreshold"]] &, comlist];
   (*filter the results by thresholding the instance area*)
   masks = Association[];
   elem = 1;
   For[i = 1, i <= Length[labels], i++,
    With[{label = labels[[i]]},
     areas = Values[ComponentMeasurements[label, "Area"]];
     If[SameQ[areas, {}], Continue[]];
     If[First[areas] >= OptionValue["MinTextArea"],
      AppendTo[masks, elem -> label];
      elem += 1
      ]
     ]
    ];
   (*scale the results to match the shape of the original image*)
   inputImageDims = ImageDimensions[img];
   {w, h} = ImageDimensions[kernel];
   ratio = ImageAspectRatio[img];
   tRatio = ImageAspectRatio[kernel];
   masks = Map[
     If[tRatio/ratio > 1,
       ImageResize[ImageCrop[#, {w, w*ratio}], inputImageDims],
       ImageResize[ImageCrop[#, {h/ratio, h}], inputImageDims]
       ] &, masks];
   If[SameQ[OptionValue["Output"], "Masks"], Return[masks]];
   (*get the text contours*)
   contours = Map[
     Values[ComponentMeasurements[#, "PerimeterPositions", CornerNeighbors -> True]][[1]] &, masks];
   (*get the bounding region of the requested type for each contour*)
   boundingReg = Map[BoundingRegion[#[[1]], OptionValue["RegionType"]] &, contours];
   boundingReg
   ];

Basic usage

Obtain the bounding boxes and masks for each text instance in a given image:

In[7]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/c554b551-f58a-4ce8-8565-7db9f95fdc81"]
In[8]:=
detection = netevaluate[testImage];

The output is an Association mapping each text-instance index to its bounding region:

In[9]:=
detection
Out[9]=

Visualize the bounding regions:

In[10]:=
HighlightImage[testImage, detection, ImageLabels -> None]
Out[10]=

Advanced usage

Get an image:

In[11]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/2d82ac04-570f-419d-a511-81fdee62edaf"]

Obtain the bounding regions using the default evaluation and visualize them:

In[12]:=
reg1 = netevaluate[testImage];
In[13]:=
HighlightImage[testImage, reg1, ImageLabels -> None]
Out[13]=

Get the individual masks via the option "Output" -> "Masks":

In[14]:=
reg2 = netevaluate[testImage, "Output" -> "Masks"]
Out[14]=
In[15]:=
HighlightImage[testImage, reg2, ImageLabels -> None]
Out[15]=

Increase the "MinTextArea" to remove small regions:

In[16]:=
reg3 = netevaluate[testImage, "MinTextArea" -> 500];
In[17]:=
HighlightImage[testImage, reg3, ImageLabels -> None]
Out[17]=

Set the region type to "MinConvexPolygon" to generate convex polygonal regions that fit the text more tightly:

In[18]:=
reg4 = netevaluate[testImage, "RegionType" -> "MinConvexPolygon"];
In[19]:=
HighlightImage[testImage, reg4, ImageLabels -> None]
Out[19]=

Network result

Get an image:

In[20]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/0eb24de7-a93c-4416-9f0b-32c0ab1005d1"]

Run the model on the image:

In[21]:=
result = NetModel[
    "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"][
   testImage];
In[22]:=
Keys[result]
Out[22]=

The model's outputs are the "TextRegion", "Kernel" and "Similarity" components. The text region matrix outlines the entire area of each text instance, while the kernel matrix helps distinguish between individual text instances. The similarity vector then guides the grouping of pixels within each instance:

In[23]:=
textProbMap = Image[result["TextRegion"]]
Out[23]=
In[24]:=
kernelMap = Image[result["Kernel"]]
Out[24]=
In[25]:=
similarityMap = Image[result["Similarity"], Interleaving -> False]
Out[25]=

Binarize the text probability map and the kernel. Multiply both images to obtain the final kernel:

In[26]:=
mask = Binarize[textProbMap, 0.5];
kernel = Binarize[kernelMap, 0.5];
kernel = ImageMultiply[kernel, mask]
Out[28]=

Split the detected instances:

In[29]:=
comlist = ComponentMeasurements[kernel, "Mask"]
Out[29]=

Use the expandComponent function to expand each kernel region, using the similarity map as a guide:

In[30]:=
labels = Map[expandComponent[#, similarityMap, 0.5] &, Image /@ Values@comlist]
Out[30]=

Filter the small areas:

In[31]:=
labels = Select[labels, (Values[ComponentMeasurements[#, "Area"]][[1]] > 80) &]
Out[31]=

All outputs are rectangular matrices with fixed dimensions of 160×192. Rescale the results to the shape of the original image:

In[32]:=
ImageDimensions /@ labels
Out[32]=
In[33]:=
scaleResult[img_Image, orImg_Image] := Module[{inputImageDims, w, h, ratio, tRatio},
   (*scale the results to match the shape of the original image*)
   inputImageDims = ImageDimensions[orImg];
   {w, h} = ImageDimensions[img];
   ratio = ImageAspectRatio[orImg];
   tRatio = ImageAspectRatio[img];
   If[
    tRatio/ratio > 1,
    ImageResize[ImageCrop[img, {w, w*ratio}], inputImageDims],
    ImageResize[ImageCrop[img, {h /ratio, h}], inputImageDims]
    ]
   ];
labels = Map[scaleResult[#, testImage] &, labels]
Out[34]=
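The crop-then-resize logic of scaleResult can be sketched in NumPy (a minimal illustration; the function name crop_to_aspect is an assumption, and the centered crop mirrors ImageCrop's default behavior): the fixed-size output is cropped to the original aspect ratio, after which it can be resized to the original dimensions.

```python
import numpy as np

def crop_to_aspect(mask, orig_w, orig_h):
    """Crop a fixed-size network mask back to the original aspect ratio.

    mask: (H, W) array at the network's fixed resolution (e.g. 160x192).
    The network preserved the aspect ratio and padded the rest, so the
    valid region is recovered by a centered crop; the caller then
    resizes the result to (orig_h, orig_w).
    """
    h, w = mask.shape
    ratio = orig_h / orig_w  # original aspect ratio
    t_ratio = h / w          # network-output aspect ratio
    if t_ratio / ratio > 1:
        # Output is taller than the original: crop the height.
        target_h = round(w * ratio)
        top = (h - target_h) // 2
        return mask[top:top + target_h, :]
    else:
        # Output is wider than the original: crop the width.
        target_w = round(h / ratio)
        left = (w - target_w) // 2
        return mask[:, left:left + target_w]
```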

Visualize the detected text instances:

In[35]:=
HighlightImage[testImage, Thread[Range[Length[labels]] -> labels], ImageLabels -> None]
Out[35]=

Net information

Inspect the number of parameters of all arrays in the net:

In[36]:=
Information[
 NetModel[
  "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "ArraysElementCounts"]
Out[36]=

Obtain the total number of parameters:

In[37]:=
Information[
 NetModel[
  "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "ArraysTotalElementCount"]
Out[37]=

Obtain the layer type counts:

In[38]:=
Information[
 NetModel[
  "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "LayerTypeCounts"]
Out[38]=

Display the summary graphic:

In[39]:=
Information[
 NetModel[
  "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"], "SummaryGraphic"]
Out[39]=

Export to ONNX

Export the net to the ONNX format:

In[40]:=
onnxFile = Export[FileNameJoin[{$TemporaryDirectory, "net.onnx"}], NetModel[
   "PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data"]]
Out[40]=

Get the size of the ONNX file:

In[41]:=
FileByteCount[onnxFile]
Out[41]=

The size is similar to the byte count of the resource object:

In[42]:=
NetModel["PANet Text Detector Trained on ICDAR-2015 and CTW1500 Data", "ByteCount"]
Out[42]=

Check some metadata of the ONNX model:

In[43]:=
{opsetVersion, irVersion} = {Import[onnxFile, "OperatorSetVersion"], Import[onnxFile, "IRVersion"]}
Out[43]=

Import the model back into the Wolfram Language. The NetEncoder and NetDecoder will be absent because ONNX does not support them:

In[44]:=
Import[onnxFile]
Out[44]=

Resource History

Reference