Resource retrieval
Get the pre-trained net:
NetModel parameters
This model consists of a family of individual nets, each identified by a specific architecture. Inspect the available parameters:
Pick a non-default net by specifying the architecture:
Pick a non-default uninitialized net:
Evaluation function
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
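The post-processing idea described above can be illustrated outside the Wolfram Language with a minimal plain-Python sketch. It is not the page's evaluation function: the names `postprocess`, `net_size`, `image_size` and the `min_score` cutoff of 0.5 are assumptions made for illustration.

```python
def postprocess(regions, scores, net_size, image_size, min_score=0.5):
    """Scale polygons from net coordinates to image coordinates and drop
    detections whose score is below min_score (all names are hypothetical)."""
    net_w, net_h = net_size
    img_w, img_h = image_size
    sx, sy = img_w / net_w, img_h / net_h  # per-axis scale factors
    kept = []
    for polygon, score in zip(regions, scores):
        if score < min_score:
            continue  # suppress the least probable detections
        scaled = [(x * sx, y * sy) for x, y in polygon]
        kept.append((scaled, score))
    return kept


detections = postprocess(
    regions=[[(0, 0), (10, 0), (10, 5), (0, 5)], [(2, 2), (4, 2), (4, 4)]],
    scores=[0.9, 0.2],
    net_size=(100, 50),
    image_size=(200, 200),
)
```

In this toy call, only the first region survives the cutoff, and its coordinates are stretched by a factor of 2 horizontally and 4 vertically.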
Basic usage
Obtain the detected bounding regions and their confidence scores for a given image:
The model returns "BoundingRegion" and "Scores":
The "BoundingRegion" is a list of Polygon expressions corresponding to the bounding regions of the detected objects:
"Scores" contains the confidence scores of the detected objects:
Visualize the bounding region for each text instance:
Get the individual masks via the option "Output"->"Masks":
Visualize the masks for each text instance with its assigned score:
Network result
Get a sample image:
The network computes seven prototyped segmentation masks for all the text instances at different scales:
Visualize the prototyped segmentation masks:
Rescale the probability map of the first segmentation mask to the original image size:
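As a rough illustration of the rescaling step, the following plain-Python sketch resizes a small probability map with nearest-neighbor sampling. The function name `resize_nearest` is hypothetical, and nearest-neighbor sampling is a simplifying assumption; the resampling method used on this page may differ.

```python
def resize_nearest(grid, new_height, new_width):
    """Resize a 2D list by nearest-neighbor sampling (illustrative only)."""
    old_height, old_width = len(grid), len(grid[0])
    return [
        [
            # Map each target pixel back to its source pixel.
            grid[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]


prob = [[0.1, 0.9],
        [0.8, 0.2]]
resized = resize_nearest(prob, 4, 4)
```

Each source value is repeated over a 2×2 block, so the resized map keeps the original probabilities and only changes the grid size.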
Visualize the probability map of having text:
Threshold the results to get the masks:
The first segmentation mask is used as the text mask because it has the largest scale that allows the selection of text regions. Intersect the masks with the predicted text regions:
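Thresholding a probability map and restricting a smaller-scale mask to the text mask can be sketched in plain Python as follows. The helper names `binarize` and `intersect`, the sample values, and the threshold of 0.5 are assumptions for illustration, not values taken from this page.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a probability map into a 0/1 mask (threshold is an assumption)."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def intersect(mask, text_mask):
    """Keep only the mask pixels that also lie inside the text mask."""
    return [
        [a & b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask, text_mask)
    ]


text_prob = [[0.9, 0.8, 0.1],
             [0.7, 0.6, 0.2]]
kernel_prob = [[0.9, 0.2, 0.9],
               [0.1, 0.7, 0.1]]

text_mask = binarize(text_prob)
kernel_mask = intersect(binarize(kernel_prob), text_mask)
```

The kernel pixel in the top-right corner is discarded because it falls outside the text mask, even though its own probability is high.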
The MorphologicalComponents function can create masks for each text instance, using the final segmentation mask. This mask, which has the smallest scale, clearly separates different text instances by keeping their boundaries apart:
Use the SelectComponents function to split the components into different images:
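For readers unfamiliar with component labeling, the following plain-Python sketch shows the general idea behind labeling connected regions of a binary mask and splitting them into one mask per instance. It is not the implementation of MorphologicalComponents or SelectComponents; the function names are hypothetical, and 4-connectivity is a simplifying assumption that may differ from the connectivity used on this page.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of a 0/1 mask with 1, 2, ... (0 = background)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # flood fill from the seed pixel
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def split_components(labels, count):
    """Return one 0/1 mask per labeled component."""
    return [
        [[1 if value == k else 0 for value in row] for row in labels]
        for k in range(1, count + 1)
    ]


mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
labels, count = label_components(mask)
instances = split_components(labels, count)
```

The toy mask contains two separate regions, so `instances` holds two masks of the same size as the input.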
The progressive scale expansion algorithm starts from the pixels of multiple kernels and iteratively merges adjacent text pixels, avoiding conflicts over shared pixels and preserving the distinction between instances. Define a function that removes the shared pixels between kernels:
Apply the progressive scale expansion algorithm starting from the mask with the smallest scale and adding pixels progressively using the other masks:
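A minimal plain-Python sketch of progressive scale expansion follows. It resolves conflicts by a first-come-first-served breadth-first search, in which a pixel keeps the label that reaches it first; this is a swapped-in variant and not the shared-pixel removal function defined on this page. The function name `expand_labels` and the toy data are assumptions for illustration.

```python
from collections import deque


def expand_labels(seed_labels, kernels):
    """Grow labeled seeds through successively larger 0/1 kernel masks.

    seed_labels: 2D list with 0 for background and 1, 2, ... for instances.
    kernels: list of 2D 0/1 masks ordered from smaller to larger scale.
    """
    height, width = len(seed_labels), len(seed_labels[0])
    labels = [row[:] for row in seed_labels]  # do not modify the input
    for kernel in kernels:
        # Start each round from every pixel that is already labeled.
        queue = deque(
            (r, c) for r in range(height) for c in range(width) if labels[r][c]
        )
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < height and 0 <= nx < width
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[y][x]  # first label to arrive wins
                    queue.append((ny, nx))
    return labels


seeds = [[0, 1, 0, 0, 0, 2, 0]]
kernels = [
    [[0, 1, 1, 0, 1, 1, 0]],  # medium-scale kernel
    [[1, 1, 1, 1, 1, 1, 1]],  # largest-scale kernel
]
expanded = expand_labels(seeds, kernels)
```

In this one-row example, the two instances grow toward each other and the contested middle pixel is assigned to whichever instance reaches it first, so the instances stay separate even though the largest kernel is a single connected region.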
Rescale the final list of masks to the original image size and visualize:
It is possible to choose a bounding region type. Find the contour points of each region and select a bounding region type to enclose each piece of text:
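One of the simplest bounding region types is an axis-aligned rectangle. The plain-Python sketch below computes it from the pixels of a single instance mask; the function name `bounding_box` is hypothetical, coordinates are array indices (x to the right, y downward), and the sketch uses all mask pixels instead of contour points, which yields the same extremes.

```python
def bounding_box(mask):
    """Return ((x_min, y_min), (x_max, y_max)) of the nonzero pixels, or None."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        return None  # empty mask: no region to enclose
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
box = bounding_box(mask)
```

Tighter region types, such as rotated rectangles or convex hulls, follow the same pattern but need more geometry than this sketch covers.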
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
Display the summary graphic:
Export to ONNX
Export the net to the ONNX format:
Get the size of the ONNX file:
The size is similar to the byte count of the resource object:
Check some metadata of the ONNX model:
Import the model back into Wolfram Language. However, the NetEncoder and NetDecoder will be absent because they are not supported by ONNX: