For an image recognition task one may use a pretrained convolutional neural network (such as VGG or GoogLeNet). These usually work well, but with one caveat: they are trained on RGB images. I'm looking for a good pretrained neural network that was trained on monochromatic (grayscale) images. Does anyone know of one?
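In case it helps, a common workaround is to adapt an RGB-pretrained model to single-channel input rather than finding a grayscale-pretrained one. Below is a minimal sketch of that idea (assumptions: PyTorch/torchvision with a ResNet-18 backbone; the channel-summing initialisation is just one reasonable choice):

```python
import torch
import torch.nn as nn
import torchvision

# Load an RGB-pretrained backbone (assumption: ResNet-18 from torchvision).
model = torchvision.models.resnet18(pretrained=True)

old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
new_conv = nn.Conv2d(1, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)
with torch.no_grad():
    # Sum the pretrained RGB filters over the channel axis so the response to a
    # grayscale image roughly matches the response to a gray RGB image.
    new_conv.weight.copy_(old_conv.weight.sum(dim=1, keepdim=True))
model.conv1 = new_conv

x = torch.randn(1, 1, 224, 224)  # single-channel input
out = model(x)                   # rest of the network is unchanged
```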
Related
I have tried some neural network architectures for object classification and recognition. Such networks can distinguish cats from dogs, classify digits from the MNIST dataset, and recover private keys from public ones. A common feature of these models is a small number of neurons in the last layer, and the input images are scaled down to fairly small sizes, for example 224x224 pixels.

Now I would like to tackle more complex (for me) problems with neural networks. I'm interested in networks for image super-resolution, and for this I want to use autoencoders or a fully convolutional network such as U-Net.

At the moment I don't understand how to handle large images. Should I feed the complete image to the input of the network, or should I process the image in parts, dividing it into smaller tiles and assembling the final image from the fragments produced at the output? I suspect that in the first case the network will become very large and will not converge to good results during training, and in the second case artifacts will appear in the final image at the seams between the reconstructed fragments. All the papers and articles I've read use small image sizes in their examples.
But how, then, do generative adversarial networks (GANs), denoising autoencoders, semantic segmentation, instance segmentation, or image upscaling networks work? After all, the output of such a network should be a large image, for example at 2K, 4K, or 8K resolution. Do I understand correctly that the number of input and output neurons in such networks runs into the millions? How does this affect training time and convergence? Or are there other ways to process large images with neural networks?
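To make the tiling idea mentioned above concrete, here is a minimal sketch of overlapping-tile inference (assumptions: PyTorch, a fully convolutional model whose output has the same spatial size as its input, e.g. a denoising autoencoder; the tile and overlap sizes are arbitrary illustrative values):

```python
import torch

def tiled_inference(model, image, tile=256, overlap=32):
    """Run a fully convolutional model over a large image tile by tile.

    image: tensor of shape (1, C, H, W). Overlapping tiles are processed
    independently and averaged back together; the overlap is what reduces
    seam artifacts at the tile borders.
    """
    _, _, H, W = image.shape
    out = torch.zeros_like(image)
    weight = torch.zeros_like(image)
    step = tile - overlap
    for top in range(0, H, step):
        for left in range(0, W, step):
            bottom = min(top + tile, H)
            right = min(left + tile, W)
            patch = image[:, :, top:bottom, left:right]
            with torch.no_grad():
                pred = model(patch)
            # Accumulate predictions and per-pixel counts for averaging.
            out[:, :, top:bottom, left:right] += pred
            weight[:, :, top:bottom, left:right] += 1.0
    return out / weight.clamp(min=1.0)
```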
I am using the torchvision.models.detection.maskrcnn_resnet50_fpn Mask R-CNN model from the PyTorch library. This model can detect the classes from the COCO dataset.
I can't figure out how to train the network so that it detects an additional class while still detecting the classes it was pre-trained on. Do you have any suggestions on this matter?
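For reference, torchvision's standard fine-tuning recipe replaces the predictor heads with new ones sized for your total class count. A minimal sketch (assumptions: the class count below is illustrative, and since the new heads start from scratch you would still need training data covering the original COCO classes you want to keep):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Assumption: 91 COCO categories (including background) plus one new class.
num_classes = 92

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor with one sized for the new number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask predictor the same way.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)

# The model is then fine-tuned with a dataset containing both the old and new classes.
```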
I recently implemented a simple perceptron. This type of perceptron (composed of a single neuron giving a binary output) can only solve problems where the classes are linearly separable.
I would like to implement simple shape recognition on images of 8 by 8 pixels. For example, I would like my neural network to be able to tell me whether what I have drawn is a circle or not.
How can I know whether the classes in this problem are linearly separable? Since there are 64 inputs, can they still be linearly separable? Can a simple perceptron solve this kind of problem? If not, what kind of perceptron can? I am a bit confused about that.
Thank you!
This problem, in the general sense, cannot be solved by a single-layer perceptron. In general, other network structures such as convolutional neural networks are better suited to image classification problems; however, given the small size of your images, a multilayer perceptron may be sufficient.
Many problems that are not linearly separable in the original input space become linearly separable once the data is mapped into a higher-dimensional representation. Adding extra layers to a network allows it to learn such a transformation.
Look into multilayer perceptrons or convolutional neural networks. Examples of classification on the MNIST dataset might be helpful as well.
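As a concrete illustration, a small multilayer perceptron for 8x8 inputs might look like the sketch below (assumptions: PyTorch, inputs flattened to 64 values, a binary circle/not-circle label; the hidden-layer size is an arbitrary choice):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),        # (N, 1, 8, 8) -> (N, 64)
    nn.Linear(64, 32),   # hidden layer lets the network learn a non-linear boundary
    nn.ReLU(),
    nn.Linear(32, 1),    # single logit: circle / not circle
)

# Dummy batch just to show the shapes and the loss computation.
x = torch.rand(16, 1, 8, 8)
targets = torch.randint(0, 2, (16, 1)).float()
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, targets)
```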
I am working on a convolutional neural network with satellite images, and I want to tackle a multi-scale problem. Can you please suggest how I can build a multi-scale dataset? Since the input size of the CNN is fixed (e.g. 100x100), how can images of different scales be used to train the system for a multi-scale problem?
There is a similar question about YOLO9000: Multi-Scale Training?
Since the network consists only of convolutional and pooling layers, the number of weight parameters stays the same regardless of the input resolution. Thus, a single CNN model can be trained on multi-scale images.
The method differs between tasks: in a classification task, we can add a global pooling layer after the last convolutional layer; in a detection task, the output size is not fixed when we feed in multi-scale images.
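To illustrate the global-pooling point, here is a minimal sketch of a classifier that accepts inputs of different spatial sizes (assumptions: PyTorch; the layer sizes and the 10-class output are arbitrary illustrative values):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # global pooling -> (N, 32, 1, 1) regardless of input size
    nn.Flatten(),
    nn.Linear(32, 10),         # e.g. 10 land-cover classes (illustrative)
)

# The same weights handle different input resolutions.
for size in (100, 150, 224):
    x = torch.randn(2, 3, size, size)
    print(model(x).shape)      # torch.Size([2, 10]) for every input size
```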
How should I prepare input images for training a CNN or any other deep learning network? Do I have to compute HOG or other filter responses alongside the original image, or are the images themselves enough?
I mean, do I have to use thresholded images as well, or is the original image sufficient?
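For what it's worth, a typical pipeline feeds only the raw image, resized and normalized; hand-crafted features such as HOG or thresholded copies are generally not needed, since the convolutional layers learn their own filters. A minimal sketch (assumption: torchvision transforms with ImageNet normalization statistics; the target size is illustrative):

```python
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((224, 224)),                      # match the network's expected input size
    T.ToTensor(),                              # PIL image [0, 255] -> float tensor [0.0, 1.0]
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet channel statistics
                std=[0.229, 0.224, 0.225]),
])
```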