PyTorch/torchvision - modify images and labels of a Dataset object

So I have this line of code to load a dataset of images from two classes called "0" and "1" for simplicity:
train_data = torchvision.datasets.ImageFolder(os.path.join(TRAIN_DATA_DIR), train_transform)
and then I prepare the loader to be used with my model in this way:
train_loader = torch.utils.data.DataLoader(train_data, TRAIN_BATCH_SIZE, shuffle=True)
So for now each image is associated with a class. What I want to do, between those two lines of code, is take each image and apply a transformation to it, say a rotation by one of four angles (0, 90, 180, or 270 degrees), and add that information as an additional label with four classes: 0, 1, 2, 3. In the end I want the dataset to contain the rotated images and, as their labels, a list of two values: the class of the image and the applied rotation.
I tried this, and there is no error, but the dataset remains unchanged when I then print the labels:
for idx, label in enumerate(train_data.targets):
    train_data.targets[idx] = [label, 1]
Is there a nice way to do it by modifying directly train_data without requiring a custom dataset?

Is there a nice way to do it by modifying directly train_data without requiring a custom dataset?
No, there isn't. If you want to use datasets.ImageFolder, you have to accept its limited flexibility. In fact, ImageFolder is simply a subclass of DatasetFolder, which is pretty much what a custom dataset is. You can see the following section in the source code of its __getitem__:
if self.transform is not None:
    sample = self.transform(sample)
if self.target_transform is not None:
    target = self.target_transform(target)
This makes what you want impossible, since the transform you need must modify the image and the target together, whereas here they are transformed independently.
So, start by making your subclass of Dataset similar to DatasetFolder, and simply implement your own transform which takes in an image and a target at the same time and returns their transformed values. This is just an example of a transform class you could have, which would then need to be composed into a single function call:
class RotateTransform(object):
    def __call__(self, image, target):
        # Rotate the image randomly and adjust the target accordingly
        # ...
        return image, target
If that's too much trouble for your case, then the best option you have is what @jchaykow mentioned, which is simply to modify your files prior to running your code.
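For completeness, here is a minimal sketch of such a dataset. The class name RotatedImageFolder is made up, and the sketch assumes the rotation is applied to the PIL image (via torchvision.transforms.functional.rotate) before your usual train_transform runs:

import random
import torchvision
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset

class RotatedImageFolder(Dataset):
    """Wraps an ImageFolder, rotates each image by a random multiple of 90
    degrees, and returns (class index, rotation index) as the target."""
    def __init__(self, root, transform=None):
        self.inner = torchvision.datasets.ImageFolder(root)  # loads PIL images
        self.transform = transform

    def __len__(self):
        return len(self.inner)

    def __getitem__(self, idx):
        image, cls = self.inner[idx]          # PIL image, class index
        rot = random.randint(0, 3)            # 0, 1, 2, 3 -> 0, 90, 180, 270 degrees
        image = TF.rotate(image, 90 * rot)    # rotate the PIL image
        if self.transform is not None:
            image = self.transform(image)
        return image, (cls, rot)

# Usage, mirroring the question:
# train_data = RotatedImageFolder(TRAIN_DATA_DIR, train_transform)
# train_loader = torch.utils.data.DataLoader(train_data, TRAIN_BATCH_SIZE, shuffle=True)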

Related

How to create a custom Dataset for YOLO v3 by LabelImg

I have used LabelImg's "Save as YOLO" option to save my labels as .txt files with a format like
6 0.333984 0.585938 0.199219 0.160156
But I want it to be in this format
path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
How do I achieve that?
YOLO uses relative values rather than raw pixel values. In other words, after the class id the format is:
center-x center-y width height
where center-x is the x-coordinate of the box center expressed as a fraction of the image width. For example, if the image is 800px wide and the center is at 400px, center-x is written as 0.5.
So your LabelImg values are already correct for training YOLO. Also, in YOLO v3 you actually do need them all to be separate .txt files, rather than in one big long file. So you're already good to go.
I disagree with the above answer. Not all implementations require fractions of the image width and height, and the YOLO implementations I have used require a single train.txt file. One specific example, https://github.com/qqwweee/keras-yolo3, requires exactly the format mentioned in the question, where the four numbers are absolute pixel coordinates of the box corners (top-left x, top-left y, bottom-right x, bottom-right y) followed by the class number. Nevertheless, you can take those text files and merge them into a single CSV that also includes the image name in a column; this can be done with glob or the pandas library. You can then do the width and height calculations for the whole column at once in the CSV, prepend the path to the whole column at once, convert it to a text file, and it will be ready for input.
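As a rough illustration of that merge step (the file layout and extensions are assumptions; it expects each LabelImg .txt file to sit next to its .jpg image and uses Pillow to read the image size):

import glob
from PIL import Image

lines = []
for txt_path in glob.glob("dataset/*.txt"):
    img_path = txt_path.replace(".txt", ".jpg")
    w, h = Image.open(img_path).size
    boxes = []
    with open(txt_path) as f:
        for row in f:
            if not row.strip():
                continue
            cls, cx, cy, bw, bh = [float(v) for v in row.split()]
            # convert normalized center/size to absolute corner coordinates
            x_min = int((cx - bw / 2) * w)
            y_min = int((cy - bh / 2) * h)
            x_max = int((cx + bw / 2) * w)
            y_max = int((cy + bh / 2) * h)
            boxes.append("{},{},{},{},{}".format(x_min, y_min, x_max, y_max, int(cls)))
    lines.append(img_path + " " + " ".join(boxes))

with open("train.txt", "w") as f:
    f.write("\n".join(lines))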

What does the "source hidden state" refer to in the Attention Mechanism?

The attention weights are computed as
α_ts = exp(score(h_t, h_s)) / Σ_s' exp(score(h_t, h_s'))
and I want to know what the h_s refers to.
In the tensorflow code, the encoder RNN returns a tuple:
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...)
I think the h_s should be the encoder_state, but the github/nmt code gives a different answer:
# attention_states: [batch_size, max_time, num_units]
attention_states = tf.transpose(encoder_outputs, [1, 0, 2])

# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units, attention_states,
    memory_sequence_length=source_sequence_length)
Did I misunderstand the code, or does h_s actually mean the encoder_outputs?
The formula is probably from this post, so I'll use an NN picture from the same post:
Here, the h-bar(s) are all the blue hidden states from the encoder (the last layer), and h(t) is the current red hidden state from the decoder (also the last layer). In the picture t = 0, and you can see which blocks are wired to the attention weights with dotted arrows. The score function is usually either Luong's multiplicative form, score(h_t, h-bar(s)) = h_tᵀ W h-bar(s), or Bahdanau's additive form, score(h_t, h-bar(s)) = vᵀ tanh(W₁ h_t + W₂ h-bar(s)).
The TensorFlow attention mechanism matches this picture. In theory, the cell output is in most cases its hidden state (one exception is the LSTM cell, where the output is the short-term part of the state, and even in that case the output suits the attention mechanism better). In practice, TensorFlow's encoder_state is different from encoder_outputs when the input is padded with zeros: the state is propagated from the previous cell state while the output is zero. Obviously, you don't want to attend to trailing zeros, so it makes sense for h-bar(s) to be zero for these cells.
So encoder_outputs are exactly the arrows that go upward from the blue blocks. Later in the code, attention_mechanism is connected to each decoder_cell, so that its output goes through the context vector to the yellow block in the picture.
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units)
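As a small check of the padding behaviour described above, here is a tiny sketch, assuming TensorFlow 1.x (the same API family as the tf.contrib code): outputs are zero past sequence_length, while the returned state is the last valid one.

import numpy as np
import tensorflow as tf

inputs = tf.constant(np.random.rand(1, 5, 3), dtype=tf.float32)  # batch=1, max_time=5
cell = tf.nn.rnn_cell.GRUCell(4)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=[3], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out, st = sess.run([outputs, state])
    print(out[0, 3:])  # the two padded time steps are all zeros
    print(st)          # equals out[0, 2], the last valid output (GRU output == state)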

What is "Parameter" layer in caffe?

Recently I came across "Parameter" layer in caffe.
It seems like this layer exposes its internal parameter blob to "top".
What is this layer used for?
Can you give a usage example?
This layer was introduced in the pull request #2079, with the following description:
This layer simply holds a parameter blob of user-defined shape, and shares it as its single top.
which is exactly what you expected. It was introduced in the context of issue #1474, which basically proposes treating parameters like normal bottom blobs. To show why this can be useful, consider the following example (taken from issue #1474, by @longjon):
The inner product layer calculates C = A * B, where C is the top blob (output), B is a bottom blob, and A must be a parameter blob. This is very restrictive, as it makes it impossible to use the inner product layer to compute the inner product of two bottom blobs, e.g. to multiply two input matrices. Issue #1474 suggests a fundamental change: make parameters independent of the layer and instead treat them like normal bottom blobs.
As a first step in that direction, the Parameter layer was introduced. This allows you to define a new parameter, which you can then feed into the bottom of another layer.
The counterpart - a method for feeding parameters into a layer as a bottom blob - is proposed in pull request #2166, which isn't merged yet, as of Jan. 2017.
Though that is not merged yet, you can still use the Parameter layer to define new learnable parameters and feed them into other layers as bottom blobs.
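For illustration, here is a minimal pycaffe sketch (assuming Caffe's Python bindings are installed) that builds such a layer definition programmatically and prints the equivalent prototxt; the layer name and shape are arbitrary.

from caffe.proto import caffe_pb2

layer = caffe_pb2.LayerParameter()
layer.name = "my_weights"
layer.type = "Parameter"
layer.top.append("my_weights")                    # the parameter blob is shared as the single top
layer.parameter_param.shape.dim.extend([16, 32])  # user-defined shape of the parameter blob

print(layer)
# Prints something like:
# name: "my_weights"
# type: "Parameter"
# top: "my_weights"
# parameter_param {
#   shape {
#     dim: 16
#     dim: 32
#   }
# }

Pasting the printed block into a .prototxt gives you a blob named "my_weights" that other layers can then take as a bottom.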

Which is the better approach to train on images

Can we put the training data into separate directories for each class, loop through the images in each directory, and set the labels based on the directory? For example, I put 50 positive images in one directory and assign all of them the label 1, and put another 50 negative images in a second directory and assign all of them the label -1. Is this the right approach, or will it prevent the model from training properly?
string PosImagesDirectory="E:\\faces\\";
string NegImagesDirectory_2="D:\\not_faces\\";
I first loop through all the images of faces and assign them 1, and then loop through not_faces and assign them -1.
Or should I use the approach where I have only one directory, like
string YourImagesDirectory_2="D:\\images\\";
which contains both positive and negative images, take images randomly, and mark each image with a number indicating whether it is positive or negative? I am not clear about this approach.
I want to train on my images using feature algorithms like SIFT/HOG/BoW.
I don't understand your second approach. Do you mean to label them manually one image at a time when they are loaded?
I think that the first approach is fine. You do not need to label them manually; just iterate over each directory and label the images as you go.
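For example, a minimal Python sketch of that loop, using the directories from the question (the glob patterns and the .jpg extension are assumptions):

import glob

samples = []
for path in glob.glob(r"E:\faces\*.jpg"):
    samples.append((path, 1))    # positive class: faces
for path in glob.glob(r"D:\not_faces\*.jpg"):
    samples.append((path, -1))   # negative class: not faces

# samples is now a list of (image path, label) pairs ready for feature
# extraction (SIFT/HOG/BoW) and training.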

OpenCV: Generating points from image after thinning

I've run into an issue concerning generating floating point coordinates from an image.
The original problem is as follows:
the input image is handwritten text. From this I want to generate a set of points (just x,y coordinates) that make up the individual characters.
At first I used findContours in order to generate the points. Since this finds the edges of the characters, the image first needs to be run through a thinning algorithm, because I'm not interested in the shape of the characters, only the lines, or in this case, the points.
Input:
thinning:
So, I run my input through the thinning algorithm and all is fine; the output looks good. Running findContours on this, however, does not work out so well: it skips a lot of detail and I end up with something unusable.
The second idea was to generate bounding boxes (with findContours), use these bounding boxes to grab the characters from the thinning output, and take all non-white pixel indices as "points", offset by the bounding box position. This generates even worse output, and seems like a bad method.
Horrible code for this:
// Crop the thinned image to the bounding box and copy its pixels into a byte buffer
Mat temp = new Mat(edges, bb);
byte roi_buff[] = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roi_buff);
int COLS = temp.cols();
List<Point> preArrayList = new ArrayList<Point>();
for (int i = 0; i < roi_buff.length; i++)
{
    // every non-zero pixel becomes a point, offset by the bounding box position
    if (roi_buff[i] != 0)
    {
        Point tempP = bb.tl();   // tl() returns a new Point, so it is safe to modify
        tempP.x += i % COLS;     // column inside the ROI
        tempP.y += i / COLS;     // row inside the ROI
        preArrayList.add(tempP);
    }
}
Is there any alternatives or am I overlooking something?
UPDATE:
I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply take a scanline approach to grabbing all the pixels. If you look at the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them ordered by their neighbouring pixels, since I want to draw paths with the points later on (outside of OpenCV).
Is this possible?
You should look into implementing your own connected components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. You basically check for every pixel if it is connected to its left neighbour and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above it. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or use pointers to labels (so you can easily do a complete label change for an object).
This is basically what findContours does, but if you implement it yourself you have the freedom to go for 8-connectedness and even bridge a single-pixel or two-pixel gap. That way you get "almost-connected components labelling". It looks like you need this for the "w" in your example picture.
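A compact sketch of that two-pass labelling in Python, just to make the idea concrete (it assumes a 2D numpy array where non-zero means foreground, and resolves label equivalences with union-find; labels are not renumbered contiguously):

import numpy as np

def label_components(img):
    labels = np.zeros(img.shape, dtype=int)
    parent = {}                       # union-find: label -> parent label

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    next_label = 1
    h, w = img.shape
    # first pass: assign provisional labels and record equivalences
    for y in range(h):
        for x in range(w):
            if not img[y, x]:
                continue
            # labels of already-visited 8-connected neighbours
            neigh = [labels[y + dy, x + dx]
                     for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                     if 0 <= y + dy < h and 0 <= x + dx < w and labels[y + dy, x + dx]]
            if not neigh:
                labels[y, x] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                labels[y, x] = min(neigh)
                for n in neigh:       # label merge: strips joined in this row
                    union(n, min(neigh))
    # second pass: flatten equivalences
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels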
Once you have the image labelled this way, you can push all the pixels of a single label to a vector, and order them something like this. Find the top left pixel, push it to a new vector and erase it from the original vector. Now find the pixel in the original vector closest to it, push it to the new vector and erase from the original. Continue until all pixels have been transferred.
It will not be very fast this way, but it should be a start.
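And a rough Python sketch of the greedy ordering step, assuming pixels is a list of (x, y) tuples belonging to a single label:

def order_pixels(pixels):
    remaining = list(pixels)
    # start at the top-left-most pixel (smallest y, then smallest x)
    current = min(remaining, key=lambda p: (p[1], p[0]))
    remaining.remove(current)
    ordered = [current]
    while remaining:
        # jump to the nearest remaining pixel
        current = min(remaining,
                      key=lambda p: (p[0] - current[0]) ** 2 + (p[1] - current[1]) ** 2)
        remaining.remove(current)
        ordered.append(current)
    return ordered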
