I wish to use transfer learning to process images, and my images have different sizes.
I think that, in general, convolutional layers can take variable input sizes, but fully connected layers can only take input of a specific size.
However, the Keras implementations of VGG-16 and ResNet50 can take any image size larger than 32x32, even though they have fully connected layers. I wonder how a fixed fully connected layer size is obtained for different image dimensions?
Thanks very much!
What you are saying is misleading: you can build a VGG/ResNet Keras model with any input image size larger than 32x32, but once the model is built, you cannot change the input size, and that is usually the problem. So the model cannot really take variable-sized images.
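As a minimal sketch of how this is usually handled (assuming TensorFlow's Keras applications, a hypothetical 160x160 input size, and a hypothetical 10-class head): the spatial size is baked in when the model is built, and dropping the original fully connected top plus adding global pooling is the common way to reuse the convolutional base at a new size.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# The input size is fixed at build time; 160x160 is an arbitrary choice for this sketch.
base = VGG16(weights="imagenet", include_top=False, input_shape=(160, 160, 3))
base.trainable = False  # freeze the convolutional base for transfer learning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),         # collapses the conv features to a fixed-length vector
    layers.Dense(10, activation="softmax"),  # hypothetical 10-class classification head
])
model.summary()  # every layer shape is now tied to the 160x160 input chosen above
```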
I want to create an image classifier model using CreateML. I have images available in very high resolution but that comes at a cost in terms of data traffic and processing time, so I prefer to use images as small as possible.
The docs say that:
The images (...) don’t have to be a particular size, nor do they need to be the same size as each other. However, it’s best to use images that are at least 299 x 299 pixels.
I trained a test model with images of various sizes > 299x299px, and the model parameters in Xcode show the dimension 299x299px, which I understand is the normalized image size.
This dimension seems to be determined by the CreateML Image Classifier algorithm and is not configurable.
Does it make any sense to train the model with images that are larger than 299x299px?
If the image is not square (same height as width), will the training image be center-cropped to 299x299px during normalization, or will the parts of the image outside the square influence the model?
From reading and from experience training image classification models (but with no inside knowledge of Apple's implementation), it appears that Create ML scales incoming images to fit a 299 x 299 square. You would be wasting disk space and preprocessing time by providing larger images.
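If you do want to shrink the files before importing them, here is a simple sketch with Pillow (the folder names are hypothetical; Create ML will still do its own scaling on top of this):

```python
from pathlib import Path
from PIL import Image

src = Path("training_images_full")   # hypothetical folder of large originals
dst = Path("training_images_299")    # hypothetical output folder
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path)
    img = img.resize((299, 299), Image.LANCZOS)  # note: this squashes non-square images
    img.save(dst / path.name, quality=90)
```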
The best documentation I can find is the mlmodel file created by Create ML from the image classifier template: the input is explicitly defined as a 299 x 299 color image, and there is no option to change that setting in the stand-alone app.
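For example, you can confirm that input description from Python with coremltools (the file name here is hypothetical):

```python
import coremltools as ct

# Load the model produced by Create ML and print its declared inputs.
mlmodel = ct.models.MLModel("ImageClassifier.mlmodel")
print(mlmodel.get_spec().description.input)
# -> an imageType input with width: 299, height: 299
```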
Here is some documentation (it applies to the Image Classifier template, which uses ScenePrint by default):
https://developer.apple.com/documentation/createml/mlimageclassifier/featureextractortype/sceneprint_revision
There may be a Center/Crop option in the Playground workspace, but I never found it in the standalone app version of Create ML.
Given a dataset of images with a very large resolution, 8000x6000, what is the best way to train a neural network, for image segmentation and other tasks?
As the title says, should I slice each image into 800x600 patches, or should I resize the whole image to 800x600?
In the first case I would lose the global context, and in the second case I would lose a lot of detail, but the network would be aware of the content of the full image.
Thank you!
Resizing is a better option. Often your neural network does not require all the detail of a very high resolution, and a smaller resolution suffices. The content of the image, on the other hand, is important, and cropping can become problematic depending on which part of the image you crop.
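A quick sketch of the two options with Pillow (the file name is hypothetical), just to make the trade-off concrete:

```python
from PIL import Image

img = Image.open("scene_8000x6000.jpg")  # hypothetical 8000x6000 image

# Option 1: resize the whole image -- keeps global context, loses fine detail.
resized = img.resize((800, 600), Image.LANCZOS)

# Option 2: slice into 800x600 patches -- keeps detail, loses global context.
patches = [
    img.crop((left, top, left + 800, top + 600))
    for top in range(0, img.height, 600)
    for left in range(0, img.width, 800)
]
print(len(patches))  # 100 patches for an 8000x6000 image
```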
I am using the standard AlexNet model with image data of size 3*224*224. I artificially construct these images, which consist of numerous sub-images.
I am trying to recognize small, simple sub-images (100*2) that might sit at the side or corner of the 224*224 space.
Is AlexNet likely to handle this well? Or should the sub-image really take up most of the 224*224 area?
After some testing, I have found that AlexNet does not see very small objects within an image well. I suspect that this is due to the stride size.
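A rough back-of-the-envelope check (assuming the standard AlexNet geometry: a first convolution with kernel 11, stride 4 and padding 2, and a feature extractor that maps a 224x224 input to a 6x6 grid) shows why a 100*2 sub-image nearly vanishes:

```python
def conv_out(size, kernel=11, stride=4, pad=2):
    """Spatial size after AlexNet's first convolution."""
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(224))          # 55 -> the familiar 55x55 first-layer feature map

pixels_per_cell = 224 / 6     # ~37 input pixels feed each cell of the final 6x6 grid
print(100 / pixels_per_cell)  # ~2.7 cells for the 100-pixel side of the sub-image
print(2 / pixels_per_cell)    # ~0.05 cells for the 2-pixel side: it effectively disappears
```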
I'm working on a graduation project for image forgery detection using a CNN. Most of the papers I have read downscale the images before feeding the dataset to the network. I want to know how this process affects the image information.
Images are resized/rescaled to a specific size for a few reasons:
(1) It allows the user to set the input size to their network. When designing a CNN you need to know the shape (dimensions) of your data at each step; so, having a static input size is an easy way to make sure your network gets data of the shape it was designed to take.
(2) Using a full-resolution image as the input to the network is very inefficient (super slow to compute); a rough comparison is sketched after this list.
(3) In most cases, the features you want to extract/learn from an image are still present after downsampling. So in a way, resizing an image to a smaller size denoises it, filtering out many of the unimportant details for you.
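As a rough illustration of point (2) — the file name is hypothetical, and 224x224 is just a typical CNN input size:

```python
from PIL import Image

img = Image.open("suspect_photo.jpg")          # hypothetical full-resolution input
small = img.resize((224, 224), Image.LANCZOS)  # downscaled copy actually fed to the CNN

# A CNN's compute and memory cost scale with the number of input pixels.
full_pixels = img.width * img.height
print(f"{full_pixels / (224 * 224):.0f}x fewer pixels after resizing")
```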
Well, you change the image's size. Of course that changes its information.
You cannot reduce image size without omitting information. Simple case: throw away every second pixel to scale the image to 50%.
Scaling up adds new pixels. In its simplest form you duplicate pixels, creating redundant information.
More complex solutions create new pixels (less or more) by averaging neighbouring pixels or interpolating between them.
Scaling up is reversible. It neither creates nor destroys information.
Scaling down divides the amount of information by the square of the downscaling factor*. Upscaling after downscaling results in a blurred image.
(*This is true in a first approximation. If the image doesn't have high frequencies, they are not lost, hence no loss of information.)
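A small Pillow sketch of both effects (the file name is hypothetical): duplicate-pixel upscaling can be undone exactly, while downscaling followed by upscaling only gives back a blurred image.

```python
from PIL import Image

img = Image.open("sample.png")  # hypothetical input image

# Scaling up by pixel duplication is reversible (exact 2x nearest-neighbour upscale).
doubled = img.resize((img.width * 2, img.height * 2), Image.NEAREST)
restored = doubled.resize(img.size, Image.NEAREST)  # recovers the original pixels

# Scaling down throws information away: going back up only blurs.
halved = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
blurred = halved.resize(img.size, Image.LANCZOS)    # visibly softer than the original

restored.save("restored.png")
blurred.save("down_then_up.png")
```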
[Image showing the required content transfer between source and target images]
Essentially, is there a way to grow an image patch defined by a mask and keep it realistic?
For your first question, you are talking about image style transfer. In that case, CNNs may help you.
For the second, if I understand correctly, by growing you mean introducing variations in the image patch while keeping it realistic. If that's the goal, you may use GANs to generate images, provided you have a reasonably sized dataset to train with:
Image Synthesis with GANs
Intuitively, conditional GANs model the joint distribution of the input dataset (which in your case consists of the images you want to imitate) and can draw new samples (images) from the learned distribution, thereby allowing you to create more images with similar content.
Pix2Pix is the open-source code of a well-known paper that you can play around with to generate images. Specifically, let X be your input image and Y be a target image. You can train the network and feed X in to observe the output O of the generator. Thereafter, by tweaking the architecture a bit or by changing the skip connections (read the paper) and training again, you can generate variety in the output images O.
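To make the X-in, O-out workflow concrete, here is a minimal, untrained encoder-decoder generator in Keras. The real Pix2Pix generator is a deeper U-Net with skip connections, trained adversarially against a PatchGAN discriminator; this sketch only shows the plumbing.

```python
import numpy as np
from tensorflow.keras import layers, models

inp = layers.Input(shape=(256, 256, 3))  # X: the input image
x = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
x = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
out = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(x)  # O: generated image
generator = models.Model(inp, out)

X = np.random.rand(1, 256, 256, 3).astype("float32")  # stand-in for a real image
O = generator.predict(X)
print(O.shape)  # (1, 256, 256, 3)
```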
Font Style Transfer is an interesting experiment with text on images (rather than image on image, as in your case).