I saw the first guide for translating YOLOX here: Translation procedure for YOLOX - Renesas-wiki - Renesas Confluence
I also saw the second guide for YOLOX here: rzv_drp-ai_tvm/how-to/sample_app_v2h/app_yolox_cam at main · renesas-rz/rzv_drp-ai_tvm · GitHub
I have some questions.
1. Is the first guide above used as part of the second guide?
2. In the second guide (rzv_drp-ai_tvm/how-to/sample_app_v2h/app_yolox_cam at main · renesas-rz/rzv_drp-ai_tvm · GitHub) you provide yolox-S_VOC.onnx here: https://github.com/renesas-rz/rzv_drp-ai_tvm/blob/main/how-to/sample_app_v2h/app_yolox_cam/yolox-S_VOC.onnx (as I understand it, this is the original model), but when translating the model you use YoloX-S_VOC_sparse70.onnx. Could you share the model YoloX-S_VOC_sparse70.onnx, please?
$TRANSLATOR/../onnx_models/YoloX-S_VOC_sparse70.onnx \
    -o yolox_cam \
    -t $SDK \
    -d $TRANSLATOR \
    -c $QUANTIZER \
    -s 1,3,640,640 \
    --images $TRANSLATOR/../GettingStarted/tutorials/calibrate_sample/ \
    -v 100
3. In the first guide, why do you split off the (Reshape > Concat > Transpose) block from the original model? I cannot follow the idea here. Couldn't we keep the original model as one block to reduce complexity?
4. I know that YOLOX needs further post-processing, namely non-maximum suppression. As I understand it, this (Reshape > Concat > Transpose) part and the non-maximum suppression are implemented in the .cpp files. Is that right? (See the rough sketch of my understanding below.)
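To make questions 3 and 4 concrete, here is my rough understanding of what the cut (Reshape > Concat > Transpose) block plus the non-maximum suppression does, as a minimal NumPy sketch. The head shapes, the strides 8/16/32, the 640x640 input and the 20 VOC classes are my assumptions for YOLOX-S; in the sample application this logic would live in the .cpp post-processing code.

# Minimal sketch (assumptions: three heads at strides 8/16/32, 640x640 input,
# VOC = 20 classes, so each head tensor is (1, 25, H, W)).
import numpy as np

def decode_yolox(head_outputs, num_classes=20):
    """Emulate the cut Reshape > Concat > Transpose block and decode boxes."""
    strides = (8, 16, 32)
    rows = []
    for out, stride in zip(head_outputs, strides):
        n, c, h, w = out.shape                                 # (1, 5+num_classes, H, W)
        out = out.reshape(n, c, h * w).transpose(0, 2, 1)[0]   # (H*W, C)
        gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        grid = np.stack((gx, gy), axis=-1).reshape(-1, 2)
        xy = (out[:, 0:2] + grid) * stride                     # box centers in pixels
        wh = np.exp(out[:, 2:4]) * stride                      # box sizes in pixels
        rows.append(np.concatenate((xy, wh, out[:, 4:]), axis=1))
    return np.concatenate(rows, axis=0)                        # (8400, 5+num_classes)

def nms(boxes_xywh, scores, iou_thr=0.45):
    """Plain greedy non-maximum suppression on (cx, cy, w, h) boxes."""
    x1 = boxes_xywh[:, 0] - boxes_xywh[:, 2] / 2
    y1 = boxes_xywh[:, 1] - boxes_xywh[:, 3] / 2
    x2 = boxes_xywh[:, 0] + boxes_xywh[:, 2] / 2
    y2 = boxes_xywh[:, 1] + boxes_xywh[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]
    return keep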
Kind regards,
Stefan
Hi Stefan
Thank you for your response.
Let me confirm this point.
DRP-AI TVM can handle that, but the last reshaping and concatenation would be processed on CPU and reduce the performance.
Does that mean we separate the (Reshape > Concat > Transpose) block from the original model, run this (Reshape > Concat > Transpose) block on the CPU, and run the first block (everything before the Reshape) on DRP-AI TVM to optimize inference time? Is that right? Is there any technique to know at which layer we need to split the model?
As I understand your response, when we run the whole model through DRP-AI TVM, it is not as optimized as the splitting method above. Is that right?
Furthermore, I read a Renesas document, see this link: https://www.renesas.com/en/document/whp/next-generation-highly-power-efficient-ai-accelerator-drp-ai3-10x-faster-embedded-processing?r=25471761 (page 8, figure 10). It says: "On the one hand, as AI processing speeds up, the processing time for algorithm-based image processing without AI, such as pre- and post-AI processing is becoming a relative bottleneck. In AI-MPUs, a portion of the image processing program is offloaded to the DRP, thereby contributing to the improvement of the overall system processing time". Can we run all image processing (loading the image with OpenCV, doing the preprocessing) and all post-processing on DRP-AI to improve the overall inference time?
Thank you so much.
Hi AnhOi,
DRP-AI TVM maps the operations of neural networks to DRP-AI and, if needed, to the CPU. Please check the DRP-AI TVM log file to see how the neural network will be processed.
Besides the last reshape/concat/transpose layer, the given YOLOx is completely mapped to DRP-AI.
It is not mandatory to split YOLOx for the translation with DRP-AI TVM. This depends on the format used in the target application.
The DRP-AI TVM sample application uses YOLOx without the reshape/concat/transpose layer. The cut procedure is described in Translation procedure for YOLOX - Renesas-wiki - Renesas Confluence.
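For reference, such a cut can also be scripted with the standard onnx Python API (onnx.utils.extract_model). This is only a sketch: the output tensor names below are placeholders and have to be looked up in the actual YOLOx graph, e.g. with Netron; the wiki page above describes the exact procedure used for the sample application.

# Sketch only: extract the sub-graph that ends right before the final
# Reshape/Concat/Transpose block. The output tensor names are placeholders;
# look them up in the actual YOLOx ONNX file (e.g. with Netron).
from onnx.utils import extract_model

extract_model(
    "yolox-S_VOC.onnx",          # original model
    "yolox-S_VOC_cut.onnx",      # model without the final Reshape/Concat/Transpose block
    input_names=["images"],      # network input as exported by YOLOX
    output_names=[               # placeholder names of the three detection-head outputs
        "head_p3_output",
        "head_p4_output",
        "head_p5_output",
    ],
)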
As mentioned in the white paper, DRP-AI is able to accelerate typical pre- and post-processing steps. This is used whenever applicable, of course. The DRP-AI TVM tutorial explains the details. If you want to discuss this in more detail, please submit a ticket.
Thanks a lot. It is much clearer to me now. I will come back if I have more questions.
Have a nice day.