My question is the same as the title.
I followed this sample for the RZ/V2L https://github.com/renesas-rz/rzv_ai_sdk/tree/main/R01_object_detection but it only provides the compiled object detection model. I want to reproduce all the steps and check the model architecture used in this sample. I can generate a yolov3.onnx model from other repos, but I am not sure it is the same as the one in this example. Could you share the yolov3.onnx for this example? Thank you.
Hi AnhOi,

My apologies, we cannot provide the requested model.

Thank you.
Kind Regards.
The AI SDK demos are based on open-source trained models. For YOLOv3, the model and the dataset used to create the trained model can be found below:

YOLOv3: Darknet
Dataset: COCO

For information on how to translate the YOLOv3 model to run on the RZ/V2L DRP-AI, please refer here.
Hi PT_Renesas
I only want to check the shapes of the 3 outputs from Renesas's YOLOv3, because I could not find 3 nodes with the same shapes as Renesas mentioned.
In this link https://github.com/renesas-rz/rzv_ai_sdk/tree/main/R01_object_detection you said that
Input size: 1x3x416x416
Output1 size: 1x13x13x255
Output2 size: 1x26x26x255
Output3 size: 1x52x52x255
I followed this repo to convert from Darknet to YOLOv3 ONNX https://github.com/zldrobit/onnx_tflite_yolov3 and visualized the ONNX model in Netron, but I did not see 3 nodes corresponding with the shapes above.
From my point of view, it is hard for users to follow the guideline and customize it for other problems. If Renesas cannot provide a model for a specific dataset, you could provide a general guide to create the model (reproduce the architecture). I think that would be better for users.
Thank you.
Hi michael kosinski
Thank you for the response. I also found this link today. It would be better to link this guide https://github.com/renesas-rz/rzv_drp-ai_tvm/blob/main/docs/model_list/how_to_convert/How_to_convert_yolov3_onnx_model_V2L_V2M_V2MA.md from https://github.com/renesas-rz/rzv_ai_sdk/tree/main/R01_object_detection
Thank you again.
I converted from yolov3.pt to yolov3.onnx; here are the 3 outputs with their shapes. They seem to be different from the shapes in this link https://github.com/renesas-rz/rzv_ai_sdk/tree/bba59d0de1b6a6997eada351bc80e005fca76498/R01_object_detection#ai-model Could you please tell me the specific output nodes of YOLOv3 you used for this application?
Hi AnhOi,

Please check this header file. The output shape is defined in lines 148-165. There is a typo in the size description of the README.

Kind Regards.
Thank you. Let me check it more.
I checked the link that you mentioned. In this line of the header file https://github.com/renesas-rz/rzv_ai_sdk/blob/bba59d0de1b6a6997eada351bc80e005fca76498/R01_object_detection/src/define.h#L163 there is the output inference size, not the shape.
Because with different shapes, we can have the same output inference size.
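The point that size alone does not determine shape can be illustrated with a quick NumPy sketch: the same flat buffer of 1x13x13x255 = 43,095 floats can be viewed in either an NHWC-style or NCHW-style layout (both layouts here are just examples, not a claim about what the DRP-AI actually emits).

```python
# Sketch: one buffer, same element count, two different shapes.
import numpy as np

flat = np.arange(1 * 13 * 13 * 255, dtype=np.float32)  # 43,095 elements

nhwc = flat.reshape(1, 13, 13, 255)  # e.g. 1x13x13x255 as in the README
nchw = flat.reshape(1, 255, 13, 13)  # e.g. 1x255x13x13, a different layout

print(nhwc.size == nchw.size)    # True: identical inference size
print(nhwc.shape == nchw.shape)  # False: different shapes
```

So knowing the flat size from define.h still leaves the per-layer layout ambiguous, which is exactly the ambiguity being asked about.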
The shape of the tensor floatarr is determined by the number of output layers (NUM_INF_OUT_LAYER), the number of bounding boxes per grid cell (NUM_BB), the number of classes (NUM_CLASS), and the grid sizes (num_grids[]).
Each output layer has a grid of size num_grids[n] × num_grids[n], where n ranges from 0 to 2, meaning the grids are:

13 × 13
26 × 26
52 × 52
Each grid cell contains NUM_BB bounding boxes, and each bounding box has (NUM_CLASS + 5) values:

tx, ty, tw, th (4 box coordinates)
objectness (1 confidence score)
NUM_CLASS class probabilities
So, the total number of elements in floatarr is:
Total elements = Σ (n = 0 to 2) of NUM_BB × (NUM_CLASS + 5) × num_grids[n] × num_grids[n]
Substituting values:

3 × (80 + 5) × (13×13 + 26×26 + 52×52)
= 3 × 85 × (169 + 676 + 2704)
= 3 × 85 × 3549
= 904,995

Thus, the shape of floatarr is (904995,), a 1D tensor.
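For reference, the same arithmetic can be checked in a couple of lines of Python, using the constant values from define.h:

```python
# Check the total element count of floatarr from the define.h constants.
NUM_BB = 3              # bounding boxes per grid cell
NUM_CLASS = 80          # COCO classes
num_grids = [13, 26, 52]

total = sum(NUM_BB * (NUM_CLASS + 5) * g * g for g in num_grids)
print(total)  # 904995
```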
The 1D tensor floatarr can be split into three separate tensors, one per grid size in num_grids[]:

First tensor (13 × 13 grid): 13 × 13 × NUM_BB × (NUM_CLASS + 5) = 43,095 elements
Second tensor (26 × 26 grid): 26 × 26 × NUM_BB × (NUM_CLASS + 5) = 172,380 elements
Third tensor (52 × 52 grid): 52 × 52 × NUM_BB × (NUM_CLASS + 5) = 689,520 elements
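The split can be sketched in NumPy as below. The per-layer (grid, grid, 255) element order is an assumption for illustration; the real DRP-AI memory layout should be confirmed against the application source.

```python
# Sketch: split the flat 1D output buffer into three per-layer tensors.
# The (g, g, 255) order per layer is assumed here, not confirmed.
import numpy as np

NUM_BB, NUM_CLASS = 3, 80
num_grids = [13, 26, 52]
ch = NUM_BB * (NUM_CLASS + 5)  # 255 values per grid cell

# Stand-in for the inference output buffer of 904,995 floats.
floatarr = np.zeros(sum(ch * g * g for g in num_grids), dtype=np.float32)

layers, offset = [], 0
for g in num_grids:
    count = ch * g * g
    layers.append(floatarr[offset:offset + count].reshape(g, g, ch))
    offset += count

for t in layers:
    print(t.shape)  # (13, 13, 255), (26, 26, 255), (52, 52, 255)
```

The three slice sizes (43,095 / 172,380 / 689,520) add back up to the 904,995-element total, matching the sum above.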
Thank you for the detailed information. Yes, I understood the total number of elements of floatarr.
As I understand it, we will use the 3 default outputs of YOLOv3, because there is no note as there is for YOLOX https://confluence.renesas.com/display/REN/Translation+procedure+for+YOLOX . Sorry for the miscommunication; I only wanted to confirm the specific output nodes for YOLOv3.