I am looking into reversing the transformations that ImageDataGenerator performs on my input images. I am able to retrieve which transformations were applied along with their parameter values.
Is there a way to use these values to reverse the transformations?
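In case it helps, here is a minimal sketch of one way to attempt this, assuming the recorded values came from ImageDataGenerator.get_random_transform and that the images are channels-last. Flips are their own inverse, rotations/shifts/shears get negated, and zooms get inverted. Note this is only approximate when several transforms are combined, because apply_transform composes them in a fixed order (and border pixels filled in during augmentation cannot be recovered).

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                             zoom_range=0.2, horizontal_flip=True)

x = np.random.rand(224, 224, 3)                  # stand-in for a real image
params = datagen.get_random_transform(x.shape)   # the recorded transformation values
x_aug = datagen.apply_transform(x, params)

# Build the "inverse" parameter set and apply it to the augmented image.
inverse_params = {
    'theta': -params.get('theta', 0),
    'tx': -params.get('tx', 0),
    'ty': -params.get('ty', 0),
    'shear': -params.get('shear', 0),
    'zx': 1.0 / params.get('zx', 1),
    'zy': 1.0 / params.get('zy', 1),
    'flip_horizontal': params.get('flip_horizontal', False),
    'flip_vertical': params.get('flip_vertical', False),
}
x_restored = datagen.apply_transform(x_aug, inverse_params)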
I am quite new to the deep learning game, and I was wondering: why do we flatten the last layer of the encoder in a VAE and then give the flattened output to a linear layer, which then approximates a location and scale parameter for the prior? Can't we just split the output of a convolutional layer and get the location and scale from there directly, or does the spatial information captured by a convolution mess up the scale and location?
Thanks a lot!
Why do we flatten the last layer of the encoder in a VAE?
There isn't really a good reason other than to make it convenient for printing or reporting. If, right before flattening, the encoder output has shape [BatchSize, 2, 2, 32], flattening it to [BatchSize, 128] simply makes it handy to list all 128 encoded values per sample. When the decoder then reshapes it back to [BatchSize, 2, 2, 32], all the spatial information is put back where it was. No spatial information is lost.
Of course, one may decide to use the encoder of a trained VAE as an image feature extractor. This is actually very useful when we have a LOT of unlabeled images to train a VAE with, but only a few labeled images. After training the VAE on the large unlabeled image set, the encoder effectively becomes a feature extractor. We can then feed the feature extractor's output into a dense layer whose purpose is to learn the labels. Having the encoder output a flattened representation is very useful in this situation.
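To make the flatten-and-reshape round trip concrete, here is a minimal Keras sketch using the shapes from the answer above; the input size, conv layer sizes and latent dimension are illustrative assumptions, not something from the question.

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 16  # illustrative

# Encoder: convolutions -> flatten -> dense heads for location and scale.
enc_in = layers.Input(shape=(16, 16, 3))
h = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(enc_in)  # 8x8x32
h = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(h)       # 4x4x32
h = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(h)       # 2x2x32
h = layers.Flatten()(h)                       # [BatchSize, 128] -- nothing is discarded
z_mean = layers.Dense(latent_dim)(h)          # location parameter
z_log_var = layers.Dense(latent_dim)(h)       # (log) scale parameter
encoder = tf.keras.Model(enc_in, [z_mean, z_log_var])

# Decoder: dense -> reshape puts the spatial layout straight back.
dec_in = layers.Input(shape=(latent_dim,))
d = layers.Dense(2 * 2 * 32, activation='relu')(dec_in)
d = layers.Reshape((2, 2, 32))(d)             # [BatchSize, 2, 2, 32] again
d = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(d)
d = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(d)
dec_out = layers.Conv2DTranspose(3, 3, strides=2, padding='same', activation='sigmoid')(d)
decoder = tf.keras.Model(dec_in, dec_out)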
Background
I am using an embedding layer for a categorical data column in Keras.
My understanding is that an embedding layer is simply a matrix consisting of trainable vectors, each mapped to an index.
My problem
After training has finished, I want to add a new index-vector pair to the embedding matrix.
(The vector is generated by me, no training involved at this stage.)
How do I do this?
I wish to use the newly added embedding in predictions as well.
Code
keras.layers.Embedding(number_of_categories, embedding_size, input_length=1)
I am especially stuck since the number of categories is hard-coded into the model architecture. Is there any way around this?
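One workaround is to rebuild the model with a larger embedding and transfer the weights over. A rough sketch, where old_model is your trained model, new_vector is your hand-made embedding row of shape (embedding_size,), and build_model is a hypothetical helper that recreates the same architecture for a given number of categories:

import numpy as np
from keras.layers import Embedding

old_emb = next(l for l in old_model.layers if isinstance(l, Embedding))
old_weights = old_emb.get_weights()[0]                    # (number_of_categories, embedding_size)
new_weights = np.vstack([old_weights, new_vector])        # append one row for the new index

new_model = build_model(old_weights.shape[0] + 1)         # same architecture, one extra category
for old_layer, new_layer in zip(old_model.layers, new_model.layers):
    if isinstance(new_layer, Embedding):
        new_layer.set_weights([new_weights])              # trained rows plus the new one
    elif old_layer.get_weights():
        new_layer.set_weights(old_layer.get_weights())    # copy everything else unchanged

The new index can then be used in predictions like any other category, since the rest of the architecture does not depend on the embedding's row count.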
I'm trying to use a pretrained VGG16 as an object localizer in TensorFlow on ImageNet data. In their paper, the group mentions that they basically just strip off the softmax layer and toss on either a 4-D or a 4000-D fc layer for bounding box regression. I'm not trying to do anything fancy here (sliding windows, R-CNN), just get some mediocre results.
I'm sort of new to this and I'm just confused about the preprocessing done here for localization. In the paper, they say that they scale the image to 256 as its shortest side, then take the central 224x224 crop and train on this. I've looked all over and can't find a simple explanation on how to handle localization data.
Questions: How do people usually handle the bounding boxes here?...
Do you use something like tf.image.sample_distorted_bounding_box, and then rescale the image based on that?
Do you just rescale/crop the image itself, and then transform the bounding box with the same scaling and offsets? Wouldn't this result in negative box coordinates in some cases?
How are multiple objects per image handled?
Do you just choose a single bounding box from the beginning, crop to that, and then train on this crop?
Or, do you feed it the whole (centrally cropped) image, and then try to predict 1 or more boxes somehow?
Does any of this generalize to the detection or segmentation challenges (like MS-COCO), or is it completely different?
Anything helps...
Thanks
Localization is usually performed with sliding windows: the network identifies the presence of the object you want in each window, and the intersection of the positive windows gives the localization.
Generalizing that to multiple objects works the same way.
Segmentation is more complex. You can train your model on a pixel mask with your object filled in, and have it output a pixel mask of the same size.
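On the rescale/crop question above: the usual approach is indeed to apply the exact same geometric transform to the box and then clip it to the crop, which is what avoids negative coordinates. A minimal sketch, assuming boxes are given as [xmin, ymin, xmax, ymax] in pixels of the original image:

import numpy as np

def transform_box(box, orig_h, orig_w, short_side=256, crop=224):
    # Map a box through "resize shortest side to 256, then central 224x224 crop".
    xmin, ymin, xmax, ymax = box
    scale = short_side / min(orig_h, orig_w)
    new_h, new_w = orig_h * scale, orig_w * scale

    # Offsets of the central crop inside the resized image.
    off_x = (new_w - crop) / 2.0
    off_y = (new_h - crop) / 2.0

    # Scale, shift into crop coordinates, then clip to the crop bounds.
    xmin = float(np.clip(xmin * scale - off_x, 0, crop))
    xmax = float(np.clip(xmax * scale - off_x, 0, crop))
    ymin = float(np.clip(ymin * scale - off_y, 0, crop))
    ymax = float(np.clip(ymax * scale - off_y, 0, crop))
    return [xmin, ymin, xmax, ymax]

Boxes that end up with zero width or height after clipping (the object was cropped away) are usually just dropped from the training set.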
Can Keras deal with input images of different sizes? For example, in a fully convolutional neural network, the input images can have any size. However, we need to specify the input shape when we create a network with Keras. So how can we use Keras to deal with different input sizes without resizing the input images to a common size? Thanks for any help.
Yes.
Just change your input shape to shape=(n_channels, None, None).
Where n_channels is the number of channels in your input image.
I'm using the Theano backend though, so if you are using TensorFlow you might have to change it to (None, None, n_channels).
You should use:
input_shape=(1, None, None)
None in a shape denotes a variable dimension. Note that not all layers will work with such variable dimensions, since some layers require shape information (such as Flatten).
https://github.com/fchollet/keras/issues/1920
For example, using keras's functional API your input layer would be:
For an RGB dataset:
inp = Input(shape=(3,None,None))
For a grayscale dataset:
inp = Input(shape=(1,None,None))
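Here is a sketch of a complete variable-size model in the functional API, using the channels-last (TensorFlow) ordering from the earlier answer; swap the shape to (3, None, None) for Theano. The key point is to end with a global pooling layer rather than Flatten, since Flatten needs a fully known shape. The layer sizes and class count are arbitrary.

from keras.layers import Input, Conv2D, GlobalMaxPooling2D, Dense
from keras.models import Model

inp = Input(shape=(None, None, 3))                         # height and width left variable
x = Conv2D(32, (3, 3), padding='same', activation='relu')(inp)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = GlobalMaxPooling2D()(x)                                # collapses any H x W to a fixed-length vector
out = Dense(10, activation='softmax')(x)

model = Model(inp, out)
model.compile(optimizer='adam', loss='categorical_crossentropy')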
Implementing arbitrarily sized input arrays with the same computational kernels can pose many challenges: e.g. on a GPU, you need to know how large a buffer to reserve and, more loosely, how much to unroll your loops. This is the main reason Keras requires constant input shapes; variable-sized inputs are too painful to deal with.
This more commonly occurs when processing variable-length sequences like sentences in NLP. The common approach is to establish an upper bound on the size (and crop longer sequences), and then pad the sequences with zeros up to this size.
(You could also include masking on zero values to skip computations on the padded areas, except that the convolutional layers in Keras might still not support masked inputs...)
I'm not sure whether the overhead of padding is prohibitive for 3D data structures; if you start getting memory errors, the easiest workaround is to reduce the batch size. Let us know about your experience with applying this trick to images!
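To illustrate the padding idea on images, a small sketch; images is assumed to be a Python list of channels-last arrays, and padding at the bottom/right is just one possible choice:

import numpy as np

def pad_to(img, target_h, target_w):
    # Zero-pad a channels-last image up to (target_h, target_w).
    h, w = img.shape[:2]
    return np.pad(img, ((0, target_h - h), (0, target_w - w), (0, 0)), mode='constant')

# Pad every image up to the largest one in the set, then stack into one batch.
target_h = max(im.shape[0] for im in images)
target_w = max(im.shape[1] for im in images)
batch = np.stack([pad_to(im, target_h, target_w) for im in images])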
Just use None when specifying the input shape. But I still do not know how to pass different-shaped images into the fit function.
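One common workaround for this is to train on batches of one image each, so every batch has a single consistent shape. A sketch, assuming the None-shaped model from above and that images/labels are Python lists of per-sample arrays:

for epoch in range(10):                                           # number of epochs is arbitrary
    for img, label in zip(images, labels):
        model.train_on_batch(img[None, ...], label[None, ...])    # batch of size 1

Alternatively, group images of identical shape into larger batches, or use the padding approach described above.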
I have some images of tags with shapes on them (circle, rectangle and blank). After processing the images with median blur and Gabor filters I can eliminate most of the effect that variable illumination had on the images and they look like this:
I've tried training an SVM using HOG, LDA, PCA, and the pixels themselves, but I can barely get past 40-60% accuracy. What I really want to do is use the information in the shapes in the images. Fourier descriptors were recommended to me, and while I've found a good tutorial about applying the Fourier transform to images using NumPy and OpenCV, I'm not sure how to go about extracting Fourier descriptors from an image and then identifying the ones that are unique to the different shapes. Does anyone know how to do this, or can you recommend an alternative technique for getting features from these images that would allow an SVM to distinguish between them?
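In case it helps, here is a rough sketch of one common recipe for contour-based Fourier descriptors with OpenCV and NumPy; the Otsu thresholding step and the number of coefficients kept are assumptions you would need to tune for your tags.

import cv2
import numpy as np

def fourier_descriptors(gray, n_coeffs=16):
    # Binarize and take the largest external contour as the shape boundary.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)  # 3 return values in OpenCV 3.x
    if not contours:                                        # e.g. a blank tag
        return np.zeros(n_coeffs)
    contour = max(contours, key=cv2.contourArea).squeeze()  # (N, 2) boundary points

    # Treat the boundary points as complex numbers and take the FFT.
    z = contour[:, 0].astype(float) + 1j * contour[:, 1].astype(float)
    coeffs = np.fft.fft(z)

    # Invariances: drop the DC term (translation), normalize by the first
    # remaining coefficient (scale), keep magnitudes only (rotation / start point).
    coeffs = coeffs[1:]
    descriptors = np.abs(coeffs) / (np.abs(coeffs[0]) + 1e-8)
    return descriptors[:n_coeffs]                           # fixed-length feature vector

The resulting fixed-length vectors can then be fed to the SVM in place of (or alongside) the HOG features.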