TorchScript vs TensorRT for real-time inference

I have trained an object detection model to be used in production for real-time applications. I have the following two options. Can anyone suggest the best way to run inference on a Jetson Xavier for maximum performance? Any other suggestions are also welcome.
Convert the model to ONNX format and use it with TensorRT
Save the model as TorchScript and run inference in C++

On Jetson hardware, my experience is that using TensorRT is definitely faster. You can convert ONNX models to TensorRT using NVIDIA's ONNX parser. For optimal performance you can also enable mixed precision. How to convert ONNX to TensorRT is explained in the TensorRT documentation: Section 3.2.5 for the Python bindings and Section 2.2.5 for the C++ bindings.
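For illustration, here is a minimal sketch of that flow with the TensorRT Python bindings (API as in TensorRT 8; "model.onnx" and "model.engine" are assumed file names, not from the original post):

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Parse the ONNX file into a TensorRT network definition.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("ONNX parsing failed")

    # Build a serialized engine; FP16 enables mixed precision on GPUs that support it.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    serialized_engine = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(serialized_engine)

At runtime you would deserialize the engine with trt.Runtime and run it through an execution context; on older TensorRT versions the equivalent build call is builder.build_engine.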

I don't have any experience with the Jetson Xavier, but on the Jetson Nano TensorRT is a little faster than ONNX Runtime or PyTorch. TorchScript makes no noticeable difference compared to plain PyTorch.

Related

How can I get YOLOv4 inference times with OpenVINO that are as fast as OpenCV?

If I run a YOLOv4 model with leaky ReLU activations on my CPU with 256x256 RGB images in OpenCV with the OpenVINO backend, inference time plus non-max suppression is about 80 ms. If, on the other hand, I convert my model to an IR following https://github.com/TNTWEN/OpenVINO-YOLOV4 (which is linked from https://github.com/AlexeyAB/darknet), inference time when using the OpenVINO inference engine directly is roughly 130 ms, and that does not even include non-max suppression, which is quite slow when implemented naively in Python.
Unfortunately, OpenCV does not offer all of the control I would like for the models and inference schemes I want to try (e.g. I want to change batch size, import models from YOLO repositories other than darknet, etc.)
What is the magic that allows OpenCV with OpenVINO backend to be so much faster?
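For reference, a minimal sketch of the OpenCV-with-OpenVINO-backend path the question describes (the .cfg/.weights file names, image file, and thresholds are assumptions):

    import cv2

    # Load the darknet YOLOv4 model and ask OpenCV DNN to use its OpenVINO
    # (Inference Engine) backend on the CPU, if OpenCV was built with it.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    # DetectionModel handles pre-processing and non-max suppression internally.
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(256, 256), scale=1 / 255.0, swapRB=True)

    img = cv2.imread("image.jpg")
    class_ids, scores, boxes = model.detect(img, confThreshold=0.25, nmsThreshold=0.45)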
Inference performance is application-dependent and subject to many variables such as model size, model architecture, processors, etc.
This benchmark result shows performance results of running yolo-v4-tf on multiple Intel® CPUs, GPUs and VPUs.
For example, you may use an 11th Gen Intel® Core™ i7-11850HE @ 2.60 GHz CPU to run yolo-v4-tf, which gives an inference time of 80.4 ms.
yolo-v4-tf and yolo-v4-tiny-tf are public pre-trained models that you can use for learning and demo purposes or for developing deep learning software. You may download these models using Model Downloader.

How to convert a PyTorch model to TensorRT?

I have trained a classification model on an NVIDIA GPU and saved the model weights (checkpoint.pth). Now I want to deploy this model on a Jetson Nano and test it.
Should I convert it to TensorRT? If so, how do I convert it to TensorRT?
I am new to this, so it would be helpful if someone could correct me where I have it wrong.
The best way to achieve this is to export an ONNX model from PyTorch.
Next, use trtexec, the tool provided with the official TensorRT package, to build a TensorRT engine from the ONNX model.
You can refer to this page: https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/trtexec/README.md
trtexec is the more native tool; you can get it from the NVIDIA NGC images or download it directly from the official website.
If you use a tool such as torch2trt, it is easy to run into unsupported-operator issues, which can be complicated to resolve (especially if you are not familiar with writing TensorRT plugins).
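As a concrete example of the export-to-ONNX-then-trtexec route described above, here is a minimal sketch (the resnet18 stand-in, the input shape, and the assumption that checkpoint.pth holds a plain state_dict are placeholders for your own model):

    import torch
    from torchvision.models import resnet18  # stand-in for your own classifier

    model = resnet18()
    model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
    model.eval()

    # Export a fixed-shape ONNX graph; adjust the dummy input to your model's input size.
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "model.onnx", opset_version=11,
                      input_names=["input"], output_names=["output"])

    # Then, on the Jetson:
    #   trtexec --onnx=model.onnx --saveEngine=model.trt --fp16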
You can use this tool:
https://github.com/NVIDIA-AI-IOT/torch2trt
Here are more details on how to convert a model into an engine file:
https://github.com/NVIDIA-AI-IOT/torch2trt/issues/254
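For completeness, a minimal torch2trt sketch (again with a torchvision stand-in model; your own module is converted the same way, provided all of its operators are supported):

    import torch
    from torch2trt import torch2trt
    from torchvision.models import resnet18

    model = resnet18(pretrained=True).eval().cuda()
    x = torch.randn(1, 3, 224, 224).cuda()

    # Convert the PyTorch module directly into a TensorRT-backed module.
    model_trt = torch2trt(model, [x], fp16_mode=True)
    y = model_trt(x)

    # The converted module can be saved and later restored with torch2trt.TRTModule.
    torch.save(model_trt.state_dict(), "model_trt.pth")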

Efficient inference of 3D deep learning model (pytorch)

I am trying to use a PyTorch 3D UNet for inference (from here: https://github.com/wolny/pytorch-3dunet), which receives images of size (96, 96, 96). I would like to use it on CPU instances, but I am getting very high memory usage (~18 GB). After researching the subject I found out that this is due to the way convolutions are implemented on CPU (see https://discuss.pytorch.org/t/pytorch-high-memory-demand/2798/5). I thus have the following questions:
Is there a way to use a more memory-efficient implementation of the convolution in Pytorch?
How can I optimize my model for CPU inference? I saw that some tools like AWS Neo, Intel OpenVINO, etc. exist; could they solve my problem?
Does Tensorflow have a similar problem for using convolutions on CPU?
Any other tips or links on how to deploy such models efficiently are welcome!
Thanks!
You could benchmark your model's performance with DNN-Bench and choose the best inference engine for your application and your hardware. You might need to convert your model to ONNX first.
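If you go the ONNX route, a minimal sketch of exporting a 3D model and running it on CPU with ONNX Runtime might look like this (the tiny Conv3d stack is only a stand-in for the 3D UNet, and the (1, 1, 96, 96, 96) input shape is an assumption based on the question):

    import torch
    import torch.nn as nn
    import onnxruntime as ort

    # Stand-in model; any torch.nn.Module exports the same way.
    model = nn.Sequential(
        nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Conv3d(8, 1, 3, padding=1),
    ).eval()

    dummy = torch.randn(1, 1, 96, 96, 96)
    torch.onnx.export(model, dummy, "unet3d.onnx", opset_version=11,
                      input_names=["input"], output_names=["output"])

    # CPU inference with ONNX Runtime, which often has a smaller memory
    # footprint than eager PyTorch for convolution-heavy models.
    sess = ort.InferenceSession("unet3d.onnx", providers=["CPUExecutionProvider"])
    out = sess.run(None, {"input": dummy.numpy()})[0]
    print(out.shape)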

Is there a native library written in Julia for Machine Learning?

I have started using Julia. I read that it is faster than C.
So far I have seen some libraries like Knet and Flux, but both are for deep learning.
There is also PyCall, which lets you use Python inside Julia.
But I am interested in machine learning too, so I would like to use SVM, Random Forest, KNN, XGBoost, etc., but in Julia.
Is there a native library written in Julia for Machine Learning?
Thank you
A lot of algorithms are simply available through dedicated packages, like BayesNets.jl.
For "classical machine learning" MLJ.jl which is a pure Julia Machine Learning framework, it's written by the Alan Turing Institute with very active development.
For neural networks, Flux.jl is the way to go in Julia. It is also very active, GPU-ready, and allows all the exotic combinations that exist in the Julia ecosystem, like DiffEqFlux.jl, a package that combines Flux.jl and DifferentialEquations.jl.
Also keep an eye on Zygote.jl, a source-to-source automatic differentiation package that will serve as a backend for Flux.jl.
Of course, if you're more confident with Python ML tools you still have TensorFlow.jl and ScikitLearn.jl, but OP asked for pure Julia packages and those are just Julia wrappers of Python packages.
Have a look at this kNN implementation and this one for XGBoost.
There are SVM implementations, but they are outdated and unmaintained (search for SVM.jl). But, really, think about other algorithms for much better prediction quality and model-construction performance. Have a look at the OLS (orthogonal least squares) and OFR (orthogonal forward regression) algorithm family. You will easily find detailed algorithm descriptions that are easy to code in any suitable language. However, there is currently no Julia implementation I am aware of; I found only Matlab implementations and made my own Java implementation some years ago. I have plans to port it to Julia, but that currently has no priority and may take some years. Meanwhile, why not code it yourself? You won't find any other language that makes it easier to code a prototype and turn it into a highly efficient production algorithm running heavy loads on a CUDA-enabled GPGPU.
I recommend this quite new publication, to start with: Nonlinear identification using orthogonal forward regression with nested optimal regularization

What is the relationship between PyTorch and Torch?

There are two PyTorch repositories :
https://github.com/hughperkins/pytorch
https://github.com/pytorch/pytorch
The first clearly requires Torch and Lua and is a wrapper, but the second doesn't make any reference to the Torch project except with its name.
How is it related to the Lua Torch?
Here is a short comparison of PyTorch and Torch.
Torch:
A tensor library like NumPy; unlike NumPy, it has strong GPU support.
Lua is the scripting language Torch is built around (yes, you need a good understanding of Lua), and for that you will need the LuaRocks package manager.
PyTorch:
No need for the LuaRocks package manager, no need to write code in Lua. And because we are using Python, we can develop deep learning models with the utmost flexibility. We can also exploit major Python packages like SciPy, NumPy, Matplotlib, and Cython together with PyTorch's own autograd.
There is a detailed discussion on this on the PyTorch forum. Adding to that, both PyTorch and Torch use THNN: Torch provides Lua wrappers to the THNN library while PyTorch provides Python wrappers for the same.
PyTorch gives you recurrent nets, weight sharing, and efficient memory usage, with the flexibility of interfacing with C and the current speed of Torch.
For more insights, have a look at this discussion session here.
Just to clarify the confusion between both pytorch repositories:
pytorch/pytorch is very similar to (Lua) Torch but in Python. So it's a wrapper over THNN. This was written by Facebook too.
hughperkins/pytorch: I came across this repo when I was developing in Torch, before pytorch existed, but I have never used it, so I'm not quite sure whether it is a wrapper written in Python over (Lua) Torch, which is in turn a wrapper over THNN, or a wrapper over THNN and Lua. In either case, this is not the original version of Torch. It was written by Hugh Perkins when there was no Python alternative for Torch.
If you are wondering which one to go for, I would definitely recommend pytorch/pytorch as it communicates directly with THNN, is written by the people who made THNN and is continuously maintained. hughperkins/pytorch does not seem to be maintained anymore.
