Are convolution networks sensitive to the input features? - image-processing

How should I prepare input images for training in a CNN or any deep learning network? Do I have to prepare HOG or other filters alongside the original image? or just the images itself is enough?
I mean do I have to use threshold images as well or just the original image is sufficient?

Related

Best practice for large size image handling/processing with neural network

I have tried some neural network architectures for object classification and recognition. Such neural networks can distinguish cats from dogs, classify numbers from MNIST dataset, and recover private keys from public ones. A feature of such models is a small number of neurons in the last layer, and the input images are scaled to certain rather small sizes, for example, 224x224 pixels. Now I would like to try to solve more complex (for me) problems using a neural network. I'm interested in neural networks for image super resolution. For these purposes, I want to use autoencoders or a fully convolutional network like UNET. At the moment I don't understand how exactly to handle large size images. Is it necessary to feed the complete image to the input of the neural network, or is it necessary to process the image in parts, dividing it into a smaller tile and forming the final image from the fragments received at the output of the network? I think that in the first case, the network will become very large and will not be able to converge to good results in the learning process, and in the second case, artifacts will appear on the final image at the junctions of the resulting fragments. All the papers and articles I've read use small image sizes as examples.
But how then do generative adversarial networks (GAN) models, autoencoders for noise reduction, semantic segmentation, instance segmentation or image upscaling networks work? After all, the output of such a network should be a large image, for example, 2K, 4K, 8K resolution. Do I understand correctly that the number of input and output neurons in such networks is in the millions? How does this affect training time and convergence? Or are there some other ways to process large images with neural networks?

Training convolutional neural network (CNN) with images captured using Xbox Kinect in Keras

Can i train a convolutional neural network (CNN) on images captured with Xbox Kinect Sensor in Keras.?
Will using depth images rather than the ordinary RGB image increase the accuracy of the model that I intend to use to classify hand gestures.?
You can train CNNs with any signal...
In addition to an RGB image both Kinect versions, although working on different principles, yield a depth image. That means instead of intensity information each pixel encodes the distance from the object to the camera.
Processing intensity and depth images is pretty much the same thing. You can apply the same techniques to both.

sub patch generation mechanism for training fully convolutional neural network

I have a image set, consisting of 300 image pairs, i.e., raw image and mask image. A typical mask image is shown as follows. Each image has size of 800*800. I am trying to train a fully convolutional neural network model for this image set to perform the semantic segmentation. I am trying to generate the small patches (256*256) from the original images for constructing the training set. Are there any strategies recommended for this patch sampling process? Naturally, random sampling is a trivial approach. Here the area marked with yellow, foreground class, usually take 25% of the whole image area across the image set. It tends to reflect an imbalanced data set.
If you train a fully convolutional architecture, assuming 800x800 inputs and 25x25 outputs (after five 2x2 pooling layers, 25=800/2^5). Try to build the 25x25 outputs directly and train directly on them. You can add higher weights in the loss function for the "positive" labels to balance them with the "negative".
I definitely do not recommend sampling because it will be an expensive process and is not really fully convolutional.

how to make Multi-scale images to train the CNN

I am working on Convolution Neural Network using satellite images. I want to try Multi-scale problem. Can you please suggest me how can I make the multi-scale dataset. As the input of the CNN is fixed image is fixed (e.g. 100x100)
how can the images of different scale to train the system for multi-scale problem.
There is a similar question about YOLO9000:Multi-Scale Training?
Since there are only convolutional and pooling layers, so when you input multi-scale image, the weight parameter amount is same. Thus, multi-scale images can use a CNN model to train.
In different tasks, the methods are different. for example, in classification task, we can we can add a global pooling after the last layer; in detection task, the output is not fixed if we input multi-scale images.

What method should I use for image classification?

I am working on an image classification problem where I should be able to classify an image as say a watch with a rectangular dial/ a watch with a circular dial/ a shoe etc..
I have looked into Content Based Image Retrieval (using Dense SIFT for feature detection and Bag of Words + SVM for classification) and am currently exploring Convolutional Neural Networks (Unsupervised Feature Learning).
My problem is that the image is a photo taken from a camera and hence contains other elements (not there in training data). For example, my training data for watches with rectangular dials contains only the watch whereas my test image has the watch and a portion of the hand as well or my test image of a shoe has the shoe oriented in a different direction (when compared with the training data for shoes).
How do I address this issue?
Is CNN (Unsupervised Feature Learning) the correct approach or should I stick to D-SIFT + BOW + SVM?
How do I collect appropriate training data?
Thank You

Resources