I have just started to study ConvNets and I have a question on how to train them.
What I have understood is that a CNN can be used to extract features from images, even from pictures different from those used to train the network.
Since I would like to use a ConvNet such as VGG or Inception, I would like to start from a pretrained network and continue its training in order to adapt the weights to my dataset.
The problem is that I have 3D data and the most powerful CNNs are designed to use 2D data! Is there a smart way to feed the 2D ConvNet with 3D data which is not a raw averaging of the slices of the 3D images?
Thank you!
You can convert your 3D data to 2D by just concatenating the slices (you put them together in one large 2D image).
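For illustration, here is a minimal NumPy sketch of that idea, assuming the volume is stored as a (depth, height, width) array; the grid layout and the function name are just one possible choice:

    import numpy as np

    def volume_to_montage(volume, cols=8):
        """Tile the slices of a 3D volume (depth, height, width) into one 2D image."""
        depth, h, w = volume.shape
        rows = int(np.ceil(depth / cols))
        montage = np.zeros((rows * h, cols * w), dtype=volume.dtype)
        for i in range(depth):
            r, c = divmod(i, cols)
            montage[r * h:(r + 1) * h, c * w:(c + 1) * w] = volume[i]
        return montage

    # example: a 64-slice volume of 128x128 images becomes one 1024x1024 image
    vol = np.random.rand(64, 128, 128)
    img = volume_to_montage(vol, cols=8)
    print(img.shape)  # (1024, 1024)

The resulting 2D montage can then be resized to whatever input size the pretrained 2D network expects.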
Related
I need to separate images into three categories: vectors, photos with vectors, and pure photos. The classification needs to happen in "real time", which leads me to my question: what would be a good algorithm for classifying these types of images from an accuracy/performance trade-off perspective?
Images are not my speciality so all pointers are appreciated.
On the performance-heavy side of things I tried TensorFlow with Inception to get a baseline; however, the model reached only ~88% accuracy. I suspect this is due to Inception being a photo-classification model, and something like a solid-color vector graphic doesn't really fit into its world.
Surely there must be an easier/more lightweight solution than deep learning to detect such different types of images?
EXAMPLES (images omitted): a photo, a photo with vector, and a vector.
I trained up a very vanilla CNN using Keras/Theano that does a pretty good job of detecting whether a small (32x32) portion of an image contains a (relatively simple) object of type A or B (or neither). The output is an array of three numbers [prob(neither class), prob(A), prob(B)].

Now, I want to take a big image (512x680, methinks) and sweep across it, running the trained model on each 32x32 sub-image to generate a feature map that's 480x648, at each point consisting of a 3-vector of the aforementioned probabilities. Basically, I want to use my whole trained CNN as a (nonlinear) filter with three-dimensional output.

At the moment, I am cutting each 32x32 patch out of the image one at a time and running the model on it, then dropping the resulting 3-vectors into a big 3x480x648 array. However, this approach is very slow. Is there a faster/better way to do this?
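For reference, here is a minimal sketch of the sweep described above, assuming a Keras model called model that takes 32x32 single-channel, channels-last patches and returns three probabilities; batching one row of window positions per predict call is just one way to organise it, and it avoids a separate predict call per patch:

    import numpy as np

    def sweep(model, image, patch=32):
        """Run the trained patch classifier over every patch-sized window of `image`.

        Assumes a Keras model `model` mapping a (patch, patch, 1) input to 3
        probabilities; adjust the channel axis to match how the model was trained.
        """
        h, w = image.shape[:2]
        out_h, out_w = h - patch + 1, w - patch + 1
        result = np.zeros((3, out_h, out_w), dtype=np.float32)
        for y in range(out_h):
            # build one batch per row of window positions instead of predicting patch by patch
            row = np.stack([image[y:y + patch, x:x + patch] for x in range(out_w)])
            row = row[..., np.newaxis]                              # add a channel axis
            result[:, y, :] = model.predict(row, batch_size=out_w).T  # (out_w, 3) -> (3, out_w)
        return result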
I am a beginner in machine learning and am currently trying to learn about deep learning and ConvNets. I have been following the tutorials on tensorflow.org and have done the first two, but so far I have only worked through examples with 2D inputs (images).
My ultimate goal is to be able to train a CNN to recognise peaks in a spectrum (which is a 1D vector). Are there any tutorials/example code/suggestions for how I should start approaching this problem?
There is no real difference: your convolutional kernels will simply be rectangular instead of square, of size 1xK (as opposed to the typical KxK). Beyond that, very little changes.
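For example, here is a minimal sketch using the Keras API; the spectrum length, layer sizes and the binary peak/no-peak target are placeholder assumptions:

    from tensorflow.keras import layers, models

    # Minimal 1D ConvNet for fixed-length spectra; sizes here are arbitrary.
    spectrum_length = 1024

    model = models.Sequential([
        layers.Conv1D(16, kernel_size=9, activation='relu',
                      input_shape=(spectrum_length, 1)),  # each spectrum: 1D vector, one "channel"
        layers.MaxPooling1D(pool_size=4),
        layers.Conv1D(32, kernel_size=9, activation='relu'),
        layers.MaxPooling1D(pool_size=4),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid'),            # e.g. peak present / absent
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.summary()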
Let's say I have a perfect 3D model of the rigid object I am looking for.
I want to find this object in a scene image using the histogram of oriented gradients (HOG) algorithm.
One way to train my SVM would be to render this object on top of a bunch of random backgrounds, in order to generate the positive training examples.
But, is there a faster, more direct way to use the model to train the SVM? One that doesn't involve rendering it multiple times?
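For reference, here is a minimal sketch of the rendering-based baseline described above (not the faster alternative being asked for), assuming the rendered positives and background negatives are already available as equally sized grayscale crops; scikit-image's hog and scikit-learn's LinearSVC are just one possible toolset:

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def train_detector(rendered_positives, negatives):
        """Train a linear SVM on HOG features.

        `rendered_positives`: crops of the 3D model rendered on random backgrounds.
        `negatives`: random background crops. All crops must share one size so the
        HOG feature vectors have equal length.
        """
        X, y = [], []
        for img in rendered_positives:
            X.append(hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
            y.append(1)
        for img in negatives:
            X.append(hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
            y.append(0)
        clf = LinearSVC(C=0.01)
        clf.fit(np.array(X), np.array(y))
        return clf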
I need to prepare training data which I will then use with OpenCV's cascade classifier. I understand that for training data I'll need to provide rectangular image samples with aspect ratios that correspond to the -w and -h parameters in OpenCV's training commands.
I was fine with this idea, but then I saw the web-based annotation tool LabelMe.
People have labelled objects in LabelMe using complex polygons!
Can these polygons somehow be used in cascade training?
Wouldn't using irregular polygons improve the classification results?
If not, then what is the use of the complex polygons that outline objects in LabelMe'd images?
Data sets annotated with LabelMe are used for many different purposes. Some of them, like image segmentation, require tight boundaries, rather than bounding boxes.
On the other hand, the cascade classifier in OpenCV is designed to classify rectangular image regions. It is then used as part of a sliding-window object detector, which also works with bounding boxes.
Whether tight boundaries help improve object detection is an interesting question. There is evidence that the background pixels caught by the bounding box actually help the classification.
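For completeness, a minimal sketch of how a LabelMe-style polygon could be collapsed into the bounding-box line that OpenCV's positive-sample info file (the format consumed by opencv_createsamples / opencv_traincascade) expects; the image name and polygon below are made up:

    def polygon_to_info_line(image_name, polygon):
        """Collapse a polygon (list of (x, y) points) into an info-file line:
        image name, object count, then x y w h of the bounding box."""
        xs = [x for x, y in polygon]
        ys = [y for x, y in polygon]
        x_min, y_min = min(xs), min(ys)
        w, h = max(xs) - x_min, max(ys) - y_min
        return f"{image_name} 1 {x_min} {y_min} {w} {h}"

    # e.g. a polygon outlining an object in car01.jpg
    print(polygon_to_info_line("car01.jpg", [(34, 50), (120, 48), (130, 110), (30, 115)]))
    # -> car01.jpg 1 30 48 100 67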