YOLOR Trained on MSCOCO Data
YOLOR (You Only Learn One Representation) is a family of object detection models published in May 2021. It is built around a unified network that can serve multiple tasks by integrating implicit and explicit knowledge, using techniques such as kernel space alignment, prediction refinement and multitask learning in a convolutional neural network. These models achieve object detection accuracy comparable to that of the Scaled-YOLOv4 models while running inference 88% faster.
Examples
Resource retrieval
Get the pretrained net:
NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
Pick a nondefault net by specifying the parameters:
Pick a nondefault uninitialized net:
Evaluation function
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
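The post-processing described here can be sketched independently of the net itself. The following Python sketch is an illustration under stated assumptions, not the evaluation function of this resource: it assumes boxes arrive as (x1, y1, x2, y2) corners in net-input pixels, that the image was resized with preserved aspect ratio and padded symmetrically, and that "suppressing" means a confidence threshold followed by greedy per-class non-maximum suppression. All function names and the default thresholds are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescale_box(box: Box, net_size: Tuple[int, int],
                image_size: Tuple[int, int]) -> Box:
    """Map a box from net-input pixels to original-image pixels.

    Assumption: the image was scaled to fit net_size with its aspect
    ratio preserved and padded equally on both sides. Sizes are
    (width, height).
    """
    net_w, net_h = net_size
    img_w, img_h = image_size
    scale = min(net_w / img_w, net_h / img_h)
    pad_x = (net_w - img_w * scale) / 2
    pad_y = (net_h - img_h * scale) / 2
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


def postprocess(boxes: Sequence[Box], scores: Sequence[float],
                labels: Sequence[str], net_size: Tuple[int, int],
                image_size: Tuple[int, int],
                score_threshold: float = 0.25,
                iou_threshold: float = 0.45) -> List[Tuple[Box, str, float]]:
    """Drop weak detections, apply greedy per-class NMS, then rescale."""
    # Keep only detections at or above the confidence threshold.
    candidates = [(s, b, l) for b, s, l in zip(boxes, scores, labels)
                  if s >= score_threshold]
    # Visit the strongest detections first.
    candidates.sort(key=lambda t: t[0], reverse=True)
    kept: List[Tuple[float, Box, str]] = []
    for s, b, l in candidates:
        # Discard a box that overlaps a stronger kept box of the same class.
        if all(l != kl or iou(b, kb) <= iou_threshold for _, kb, kl in kept):
            kept.append((s, b, l))
    return [(rescale_box(b, net_size, image_size), l, s) for s, b, l in kept]
```

For instance, `postprocess(boxes, scores, labels, (640, 640), (1280, 640))` returns a list of `(box, label, score)` tuples in original-image coordinates, strongest first. If the net instead stretches the image without padding, the rescaling step reduces to dividing x and y by separate width and height ratios.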
Basic usage
Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
Inspect which classes are detected:
Visualize the detection:
Network result
The network computes 102,200 bounding boxes and the probability that the objects in each box are of any given class:
Rescale the bounding boxes to the coordinates of the input image and visualize them scaled by their "objectness" measures:
Visualize all the boxes scaled by the probability that they contain a cat:
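As a rough illustration of how a per-box class weight can be derived, the Python sketch below multiplies each box's objectness by its conditional probability for one class and normalizes the result for use as a drawing opacity. Whether this net's outputs should be combined this way is an assumption based on the common YOLO convention; the function names and the class index used in the usage note are hypothetical.

```python
from typing import List, Sequence


def class_scores(objectness: Sequence[float],
                 class_probs: Sequence[Sequence[float]],
                 class_index: int) -> List[float]:
    """Per-box score for one class: objectness times class probability.

    Assumption: class_probs[i] holds the conditional class probabilities
    of box i, so the product is the probability that box i contains an
    object of the chosen class.
    """
    return [obj * probs[class_index]
            for obj, probs in zip(objectness, class_probs)]


def to_opacity(scores: Sequence[float]) -> List[float]:
    """Normalize scores by their maximum so the best box is fully opaque."""
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]
```

With a hypothetical class index `cat_index`, `to_opacity(class_scores(objectness, class_probs, cat_index))` gives one opacity per box, so that boxes unlikely to contain a cat fade out when all of them are drawn.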
Superimpose the cat prediction on top of the input received by the net:
Net information
Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
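The relation between the two counts is simple arithmetic: each array contributes the product of its dimensions, and the total is the sum over all arrays. A minimal Python sketch follows, using hypothetical array names and shapes that are not taken from this net.

```python
from math import prod
from typing import Dict, Tuple


def count_parameters(
    array_shapes: Dict[str, Tuple[int, ...]]
) -> Tuple[Dict[str, int], int]:
    """Return the element count of each array and the overall total."""
    per_array = {name: prod(shape) for name, shape in array_shapes.items()}
    return per_array, sum(per_array.values())


# Hypothetical example: a single 3x3 convolution with 3 input channels,
# 32 output channels and a bias vector.
example_shapes = {
    "conv1/Weights": (32, 3, 3, 3),
    "conv1/Biases": (32,),
}
per_array, total = count_parameters(example_shapes)
```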
Obtain the layer type counts:
Display the summary graphic:
Reference

C.Y. Wang, I.H. Yeh, H.Y.M. Liao, "You Only Learn One Representation: Unified Network for Multiple Tasks," arXiv:2105.04206v1 (2021)
 Available from:

Rights:
GNU General Public License