How to use three-channel images in a feedforward neural network? - machine-learning

Since a feedforward neural network (not a CNN) has only a single layer of input data, it can apparently process only grayscale images. How do we make it process color (RGB) 3-channel images?

I would consider one of the following approaches (a rough code sketch follows the list):
1. Add the pixel-wise values of the R, G and B channels and feed the result to your feedforward neural network as a single-channel image.
2. Add the pixel-wise values of the RGB channels, divide by 3 (i.e. average the channels), and feed the result to your feedforward neural network as a grayscale image.
3. Prepend three linear layers to your model (one per channel), apply a non-linear function, add the outputs of the three layers neuron-wise, and feed the result as input to your feedforward neural network.
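A minimal PyTorch sketch of options 2 and 3 is below; the 32x32 input size, hidden width and class count are illustrative assumptions, not part of the question.

```python
import torch
import torch.nn as nn

H, W = 32, 32

def to_gray_average(x):
    # option 2: average the channels, x: (N, 3, H, W) -> (N, H*W)
    return x.mean(dim=1).flatten(start_dim=1)

class ChannelMixMLP(nn.Module):
    # option 3: one linear layer per channel, non-linearity, neuron-wise sum
    def __init__(self, h=H, w=W, hidden=128, classes=10):
        super().__init__()
        d = h * w
        self.per_channel = nn.ModuleList([nn.Linear(d, d) for _ in range(3)])
        self.act = nn.ReLU()
        self.mlp = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, classes))

    def forward(self, x):                       # x: (N, 3, H, W)
        chans = x.flatten(start_dim=2)          # (N, 3, H*W)
        mixed = sum(self.act(layer(chans[:, i]))
                    for i, layer in enumerate(self.per_channel))
        return self.mlp(mixed)

x = torch.rand(8, 3, H, W)       # dummy RGB batch
gray = to_gray_average(x)        # (8, 1024), ready for any plain MLP
logits = ChannelMixMLP()(x)      # (8, 10)
```

Option 2 discards the color information entirely, while option 3 lets the network learn its own per-channel mixing before the usual fully connected layers.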

Related

Best practice for large size image handling/processing with neural network

I have tried some neural network architectures for object classification and recognition. Such neural networks can distinguish cats from dogs, classify digits from the MNIST dataset, and recover private keys from public ones. A feature of such models is a small number of neurons in the last layer, and the input images are scaled down to rather small sizes, for example 224x224 pixels. Now I would like to try to solve more complex (for me) problems with a neural network. I'm interested in neural networks for image super-resolution. For this purpose I want to use autoencoders or a fully convolutional network like U-Net. At the moment I don't understand how exactly to handle large images. Is it necessary to feed the complete image to the input of the neural network, or should the image be processed in parts, dividing it into smaller tiles and forming the final image from the fragments received at the output of the network? I think that in the first case the network will become very large and will not be able to converge to good results during training, and in the second case artifacts will appear in the final image at the junctions of the resulting fragments. All the papers and articles I've read use small image sizes as examples.
But how then do generative adversarial network (GAN) models, autoencoders for noise reduction, semantic segmentation, instance segmentation, or image upscaling networks work? After all, the output of such a network should be a large image, for example at 2K, 4K, or 8K resolution. Do I understand correctly that the number of input and output neurons in such networks is in the millions? How does this affect training time and convergence? Or are there other ways to process large images with neural networks?
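Purely as an illustration of the tiling idea raised in the question (NumPy, sizes are assumptions, not a recommendation from any answer): split the image into overlapping tiles, run the model per tile, and reassemble from the tile centres so that seams fall inside the discarded overlap.

```python
import numpy as np

def tile_process(image, tile=256, overlap=32, process=lambda t: t):
    # process a large image in overlapping tiles and keep only tile centres
    step = tile - 2 * overlap
    h, w, _ = image.shape
    # reflect-pad so every original pixel ends up in the centre of some tile
    pad_h = (-h) % step + 2 * overlap
    pad_w = (-w) % step + 2 * overlap
    padded = np.pad(image, ((overlap, pad_h - overlap),
                            (overlap, pad_w - overlap), (0, 0)), mode="reflect")
    out = np.zeros_like(padded)
    for y in range(0, padded.shape[0] - tile + 1, step):
        for x in range(0, padded.shape[1] - tile + 1, step):
            patch = process(padded[y:y + tile, x:x + tile])
            out[y + overlap:y + tile - overlap,
                x + overlap:x + tile - overlap] = patch[overlap:-overlap,
                                                        overlap:-overlap]
    return out[overlap:overlap + h, overlap:overlap + w]

big = np.random.rand(1000, 1200, 3).astype(np.float32)
result = tile_process(big, process=lambda t: 1.0 - t)   # dummy per-tile "network"
```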

Training convolutional neural network (CNN) with images captured using Xbox Kinect in Keras

Can I train a convolutional neural network (CNN) in Keras on images captured with the Xbox Kinect sensor?
Will using depth images rather than ordinary RGB images increase the accuracy of the model that I intend to use to classify hand gestures?
You can train CNNs with any signal...
In addition to an RGB image, both Kinect versions, although working on different principles, yield a depth image. That means that instead of intensity information, each pixel encodes the distance from the object to the camera.
Processing intensity and depth images is pretty much the same thing. You can apply the same techniques to both.
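A minimal Keras sketch of that point, with an assumed 128x128 depth input and a hypothetical number of gesture classes: the depth map is fed exactly like a grayscale intensity image.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_gestures = 5    # hypothetical number of hand gesture classes

# A small CNN that takes a single-channel depth map just as it would a
# grayscale image; a 4-channel RGB-D input works the same way, only the
# last dimension of the input shape changes.
model = keras.Sequential([
    keras.Input(shape=(128, 128, 1)),            # depth image, 1 channel
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_gestures, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```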

sub patch generation mechanism for training fully convolutional neural network

I have an image set consisting of 300 image pairs, i.e., a raw image and a mask image. A typical mask image is shown below. Each image has a size of 800x800. I am trying to train a fully convolutional neural network on this image set to perform semantic segmentation, and I am generating small patches (256x256) from the original images to construct the training set. Are there any recommended strategies for this patch-sampling process? Naturally, random sampling is a trivial approach. The area marked in yellow, the foreground class, usually takes about 25% of the whole image area across the image set, so the data set tends to be imbalanced.
If you train a fully convolutional architecture with 800x800 inputs, you get 25x25 outputs after five 2x2 pooling layers (25 = 800/2^5). Try to build those 25x25 target maps directly and train on them. You can give higher weights in the loss function to the "positive" labels to balance them against the "negative" ones.
I definitely do not recommend patch sampling, because it is an expensive process and is not really fully convolutional.
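A small PyTorch sketch of the weighted pixel-wise loss (the 3:1 weight ratio and tensor shapes are illustrative assumptions based on the ~25% foreground share mentioned in the question):

```python
import torch
import torch.nn as nn

# Weight the rare "foreground" class in the loss instead of resampling patches.
class_weights = torch.tensor([1.0, 3.0])           # [background, foreground]
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 2, 25, 25, requires_grad=True)  # (N, classes, 25, 25) network output
targets = torch.randint(0, 2, (4, 25, 25))               # (N, 25, 25) integer mask labels
loss = criterion(logits, targets)
loss.backward()
print(loss.item())
```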

how to make Multi-scale images to train the CNN

I am working on a convolutional neural network with satellite images and want to tackle a multi-scale problem. Can you please suggest how I can build the multi-scale dataset? Since the input size of the CNN is fixed (e.g. 100x100),
how can images of different scales be used to train the system for the multi-scale problem?
There is a similar question about YOLO9000: Multi-Scale Training?
Since the network contains only convolutional and pooling layers, the number of weight parameters stays the same when you feed in images of different scales. Thus, multi-scale images can be trained with a single CNN model.
The method differs between tasks. For example, in a classification task you can add a global pooling layer after the last convolutional layer so the output size is fixed; in a detection task the output size is not fixed when you feed in multi-scale images.
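A minimal PyTorch sketch of the classification case (channel counts and the 10 classes are assumptions): global average pooling collapses whatever spatial grid the convolutions produce, so the same weights accept inputs of different scales.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),          # global average pooling -> (N, 32, 1, 1)
    nn.Flatten(),
    nn.Linear(32, 10),                # 10 hypothetical classes
)

for size in (100, 160, 224):          # different input scales, same weights
    x = torch.rand(2, 3, size, size)
    print(size, model(x).shape)       # always (2, 10)
```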

Are convolution networks sensitive to the input features?

How should I prepare input images for training a CNN or any other deep learning network? Do I have to compute HOG or other filter outputs alongside the original images, or are the images themselves enough?
I mean, do I have to use thresholded images as well, or is the original image sufficient?
