I am using the GoogleNet model for binary classification of images. Earlier I was training in a virtual machine and now I am training on Ubuntu 14.04, and the two give me different results. I have tried hard to find where the problem is but couldn't pinpoint it.
I have trained the two models separately, one on Ubuntu 14.04 and the other in the virtual machine. Both models are trained on the CPU, cuDNN is not used in either, and for the BLAS library I am using the default ATLAS.
Any suggestions would be of great help.
Since you started your training from scratch in both cases and did not explicitly fix the random_seed parameter in your solver.prototxt, it is very likely that caffe initialized your model with different random weights for each of the two training processes. Starting from different points is very likely to end with differently trained models.
If you are concerned about possible differences in caffe between the two architectures, try repeating the training with the same random_seed parameter set in solver.prototxt.
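A minimal solver.prototxt fragment with the seed pinned might look like this (everything other than random_seed is a placeholder for whatever your solver already uses):

    # all values except random_seed are placeholders for your existing solver settings
    net: "train_val.prototxt"
    base_lr: 0.01
    max_iter: 10000
    solver_mode: CPU
    # same seed on both machines -> same random weight initialization
    random_seed: 42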
If I run a YoloV4 model with leaky ReLU activations on my CPU with 256x256 RGB images in OpenCV with an OpenVINO backend, inference time plus non-max suppression is about 80 ms. If, on the other hand, I convert my model to an IR following https://github.com/TNTWEN/OpenVINO-YOLOV4 (which is linked to from https://github.com/AlexeyAB/darknet), inference time directly using the OpenVINO inference engine is roughly 130 ms, and that does not even include non-max suppression, which is quite slow when implemented naively in Python.
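For reference, the OpenCV path being timed looks roughly like this (a sketch, not my exact code; the model file names, input image, and thresholds are placeholders):

    import cv2

    # Load the darknet cfg/weights and route execution through the OpenVINO (Inference Engine) backend.
    net = cv2.dnn.readNetFromDarknet("yolov4-leaky.cfg", "yolov4-leaky.weights")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    img = cv2.imread("frame.jpg")
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (256, 256), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    # Decode boxes and scores from the YOLO heads (omitted), then let OpenCV do the NMS.
    boxes, confidences = [], []   # filled by the decoding step
    if boxes:
        keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)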
Unfortunately, OpenCV does not offer all of the control I would like for the models and inference schemes I want to try (e.g. I want to change batch size, import models from YOLO repositories other than darknet, etc.)
What is the magic that allows OpenCV with OpenVINO backend to be so much faster?
Inference performance is application-dependent and subject to many variables such as model size, model architecture, processors, etc.
This benchmark shows the performance of running yolo-v4-tf on multiple Intel® CPUs, GPUs and VPUs.
For example, running yolo-v4-tf on an 11th Gen Intel® Core™ i7-11850HE @ 2.60GHz CPU gives an inference time of 80.4 ms.
yolo-v4-tf and yolo-v4-tiny-tf are public pre-trained models that you can use for learning and demo purposes or for developing deep learning software. You may download these models using Model Downloader.
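If you want to time the raw Inference Engine path on such a model yourself, a rough sketch with the pre-2022 OpenVINO Python API looks like this (the IR file names are placeholders for whatever Model Downloader/Model Optimizer produced):

    import time
    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="yolo-v4-tf.xml", weights="yolo-v4-tf.bin")
    input_name = next(iter(net.input_info))
    n, c, h, w = net.input_info[input_name].input_data.shape
    exec_net = ie.load_network(network=net, device_name="CPU")

    # Dummy input just to measure raw inference time; replace with a real preprocessed frame.
    dummy = np.random.rand(n, c, h, w).astype(np.float32)
    start = time.time()
    exec_net.infer(inputs={input_name: dummy})
    print("inference: %.1f ms" % ((time.time() - start) * 1000))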
I have a Keras model which is doing inference on a Raspberry Pi (with a camera). The Raspberry Pi has a really slow CPU (1.2 GHz) and no CUDA GPU, so the model.predict() stage is taking a long time (~20 seconds). I'm looking for ways to reduce that by as much as possible. I've tried:
Overclocking the CPU (+200 MHz), which gained me a few seconds.
Using float16 instead of float32.
Reducing the image input size as much as possible.
Is there anything else I can do to increase the speed during inference? Is there a way to simplify a model.h5 and take a drop in accuracy? I've had success with simpler models, but for this project I need to rely on an existing model so I can't train from scratch.
The VGG16/VGG19 architecture is very slow since it has a very large number of parameters. Check this answer.
Before any other optimization, try to use a simpler network architecture.
Google's MobileNet seems like a good candidate since it is implemented in Keras and was designed for more limited devices.
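If you can retrain or fine-tune at all, a minimal sketch of building a small MobileNet classifier in Keras (the input size, width multiplier alpha, and two-class head are just example choices; smaller input and alpha mean faster inference on the Pi):

    from keras.applications.mobilenet import MobileNet
    from keras.layers import GlobalAveragePooling2D, Dense
    from keras.models import Model

    # Small input resolution and a reduced width multiplier keep the compute low.
    base = MobileNet(input_shape=(128, 128, 3), alpha=0.25,
                     include_top=False, weights='imagenet')
    x = GlobalAveragePooling2D()(base.output)
    out = Dense(2, activation='softmax')(x)   # example: a two-class head
    model = Model(inputs=base.input, outputs=out)
    model.save('mobilenet_small.h5')          # fine-tune this, then deploy the .h5 on the Pi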
If you can't use a different network, you may compress the network with pruning. This blog post specifically covers pruning with Keras.
Maybe OpenVINO will help. OpenVINO is an open-source toolkit for network inference, and it optimizes inference performance by, e.g., pruning the graph and fusing some operations. ARM support is provided by the contrib repository.
Here are the instructions on how to build an ARM plugin to run OpenVINO on Raspberry Pi.
Disclaimer: I work on OpenVINO.
Some users might see this as an opinion-based question, but if you look closely, I am trying to explore the use of Caffe as a purely testing (inference) platform, as opposed to its currently popular use as a training platform.
Background:
I have installed all dependencies using Jetpack 2.0 on Nvidia TK1.
I have installed caffe and its dependencies successfully.
The MNIST example is working fine.
Task:
I have been given a convnet with all standard layers (not an open-source model).
The network weights, bias values, etc. are available after training. The training has not been done via caffe (it is a pretrained network).
The weights and biases are all in the form of MATLAB matrices (actually in a .txt file, but I can easily write code to turn them into matrices).
I CANNOT train this network with caffe and must use the given weights and bias values ONLY for classification.
I have my own dataset in the form of 32x32 pixel images.
Issue:
In all tutorials, details are given on how to deploy and train a network, and then use the generated .prototxt and .caffemodel files to validate and classify. Is it possible to implement this network in caffe and directly use my weights/biases and my own dataset to classify images? What are the available options here? I am a caffe-virgin so be kind. Thank you for the help!
The only issue here is:
How to initialize caffe net from text file weights?
I assume you have a 'deploy.prototxt' describing the net's architecture (layer types, connectivity, filter sizes etc.). The only issue remaining is how to set the internal weights of caffe.Net to pre-defined values saved as text files.
You can get access to caffe.Net internals; see the net surgery tutorial for how this can be done in Python.
Once you are able to set the weights according to your text file, you can net.save(...) the new weights into a binary caffemodel file to be used from now on. You do not have to train the net if you already have trained weights, and you can use it for generating predictions ("test").
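A rough sketch of that flow in Python, assuming one weight file and one bias file per parameterized layer (the file naming scheme and the assumption of exactly two blobs per layer are mine, not something caffe requires):

    import numpy as np
    import caffe

    caffe.set_mode_cpu()
    # Loads the architecture only; the weights start out randomly initialized.
    net = caffe.Net('deploy.prototxt', caffe.TEST)

    # For standard layers, net.params[layer][0] holds the weights and net.params[layer][1] the bias.
    for layer in net.params:
        W = np.loadtxt('%s_weights.txt' % layer).reshape(net.params[layer][0].data.shape)
        b = np.loadtxt('%s_bias.txt' % layer).reshape(net.params[layer][1].data.shape)
        net.params[layer][0].data[...] = W
        net.params[layer][1].data[...] = b

    # Save once; from now on load this .caffemodel together with deploy.prototxt for classification.
    net.save('pretrained.caffemodel')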
I have an old checkpoint (model specification) from a Cuda Convnet model trained by someone else a couple of years ago, and the training data is no longer available. I would like to find a way to convert this exact model to a Caffe model file. Is there a tool (currently available and supported) that does this? I would still be interested even if the conversion only targets another ML framework that can export Caffe models (e.g. Theano, Torch7, etc.) as a bridge.
I have a large dataset and I'm trying to build a DAgger classifier for it.
As you know, at training time I need to run the initially learned classifier on the training instances (predict them), one instance at a time.
Libsvm is too slow even for the initial learning.
I'm using OLL, but it requires each instance to be written to a file and then running the test code on it to get the prediction, which involves a lot of disk I/O.
I have considered working with vowpal_wabbit (though I'm not sure whether it will help with the disk I/O), but I don't have permission to install it on the cluster I'm working with.
Liblinear is also too slow and, I believe, again requires disk I/O.
What are the other alternatives I can use?
I recommend trying Vowpal Wabbit (VW). If Boost (and gcc or clang) is installed on the cluster, you can simply compile VW yourself (see the Tutorial). If Boost is not installed, you can compile it yourself as well.
VW contains more modern algorithms than OLL. Moreover, it contains several structured prediction algorithms (SEARN, DAgger) and provides both a C++ and a Python interface to them. See the iPython notebook tutorial.
As for the disk I/O: for one-pass learning, you can pipe the input data directly to vw (cat data | vw) or run vw --daemon. For multi-pass learning, you must use a cache file (the input data in a binary, fast-to-load format), which takes some time to create during the first pass (unless it already exists), but the subsequent passes are much faster thanks to the binary format.
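If the per-instance disk I/O when querying the current policy is the bottleneck (as in the OLL setup), you can also keep vw running as a daemon and send one example per line over a TCP socket; a rough sketch (the port is vw's default, the feature line is just an example):

    # start once, with the previously trained model:
    #   vw --daemon --port 26542 -i policy.model -t --quiet
    import socket

    def vw_predict(example, host='localhost', port=26542):
        """Send one VW-format example to the daemon and return its prediction."""
        s = socket.create_connection((host, port))
        s.sendall((example + '\n').encode())
        reply = s.makefile().readline().strip()   # the daemon answers one line per example sent
        s.close()
        return float(reply.split()[0])

    print(vw_predict("|f height:1.5 length:2.0"))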