How to create a custom dataset for YOLO v3 with LabelImg

I have used LabelImg's "Save as YOLO" option to save my labels as .txt files with contents like
6 0.333984 0.585938 0.199219 0.160156
But I want them in this format:
path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
How do I achieve that?

YOLO uses relative values rather than raw pixel values. In other words, each line of the .txt file is:
class center-x center-y width height
where center-x is a fraction of the image width: if the image is 800px wide and the box centre is at 400px, center-x is written as 0.5.
So your LabelImg values are already correct for training YOLO. Also, YOLO v3 actually expects separate .txt files, one per image, rather than one big long file. So you're already good to go.

Disagree with the above answer. Not all implementations use relative centre/width/height values, and the implementations of YOLO I used require a single train.txt file. A specific example, https://github.com/qqwweee/keras-yolo3, requires exactly the format mentioned in the question, but the four numbers are pixel coordinates: top-left x, top-left y, bottom-right x, bottom-right y (i.e. x_min, y_min, x_max, y_max), followed by the class number.
Nevertheless, you can use those text files and merge them into a CSV that also includes the name of the image in a column. This can be done with the glob or pandas libraries. You can do the width/height calculations on a whole CSV column at once, then prepend the path to the whole column at once, convert it to a text file, and it will be ready for input.
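A minimal sketch of that merge step, converting each LabelImg .txt file into one train.txt line; the images/ and labels/ directories, the .jpg extension, and the use of Pillow to read image sizes are my assumptions, so adjust them to your dataset:

import glob
import os
from PIL import Image

# Convert LabelImg's YOLO labels (class cx cy w h, all relative)
# into one "path x_min,y_min,x_max,y_max,class ..." line per image.
with open("train.txt", "w") as out:
    for label_path in sorted(glob.glob("labels/*.txt")):
        name = os.path.splitext(os.path.basename(label_path))[0]
        img_path = os.path.join("images", name + ".jpg")
        img_w, img_h = Image.open(img_path).size
        boxes = []
        with open(label_path) as f:
            for line in f:
                cls, cx, cy, bw, bh = line.split()
                cx, cy, bw, bh = map(float, (cx, cy, bw, bh))
                # relative centre/size -> absolute corner pixels
                x_min = int((cx - bw / 2) * img_w)
                y_min = int((cy - bh / 2) * img_h)
                x_max = int((cx + bw / 2) * img_w)
                y_max = int((cy + bh / 2) * img_h)
                boxes.append(f"{x_min},{y_min},{x_max},{y_max},{cls}")
        out.write(" ".join([img_path] + boxes) + "\n")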

Related

vips - How to create justified text?

I'm using the vips library for manipulating some images, specifically its Lua binding, lua-vips, and I'm trying to create justified text images. I guess there is no function in vips to do this directly, so I was wondering how to come up with an algorithm for it.
My first idea was to parse the text to be justified using a known justification algorithm, but measuring the empty space with the image width of each separate word instead of the number of characters. Then, for each line, place the text images of the words next to each other, adding the necessary space as black pixels between them.
However, I couldn't figure out how to check the line height, as it isn't necessarily equal to the text image height, so I'm not sure this is a good approach.
git master libvips supports justification now, and this feature should be in the upcoming libvips 8.8 (due spring 2019).
Use it like this:
$ vips text x.png "hello world sdkj hsdfkj herqkjh wehf" --width 100 --justify
That makes a justified block of text (the rendered output image isn't reproduced here).
Or from Lua:
x = vips.Image.text("hello world sdkj hsdfkj herqkjh wehf", {width = 100, justify = true})
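If you're stuck on a libvips older than 8.8, the manual approach from the question can work. Here is a minimal sketch using the Python binding pyvips for illustration (the lua-vips calls are analogous); note it aligns word tops rather than true baselines, which is exactly the line-height problem the question raises:

import pyvips

def justify_line(words, target_width, dpi=72):
    imgs = [pyvips.Image.text(w, dpi=dpi) for w in words]
    total = sum(i.width for i in imgs)
    gaps = len(imgs) - 1
    # distribute the leftover space across the inter-word gaps
    space = max(0, (target_width - total) // gaps) if gaps else 0
    line = imgs[0]
    for img in imgs[1:]:
        # shim inserts black pixels between the two images; align="low"
        # lines everything up on a common top edge, which only
        # approximates true baseline alignment
        line = line.join(img, "horizontal",
                         expand=True, shim=space, align="low")
    return line

line = justify_line("hello world sdkj hsdfkj".split(), 400)
line.write_to_file("line.png")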

Saving figures in julia

Two questions regarding the Winston package:
How do you change the dimensions of an image using savefig?
I have a 256x320 matrix that I plot as an image with Winston's imagesc() command, but when I try to save it using savefig("picture.png","width","height") I get the same 512x512-pixel picture and I can't resize it, no matter how I change the width and height values.
Is it possible to export a FramedPlot chart to an image?
Regards
Mike
To answer the first part: savefig() takes a variable number of arguments as name/value pairs, so the right way to call it is:
julia> savefig("name.png", "width", 10, "height", 20)

Finding coordinates of an image in a pdf to replace it with another one

I have a PDF which I would like to use as a template to create a new PDF. The goal is to place an image inside a particular placeholder rectangle in the original PDF. The creation of the original PDF is under my control, but the placeholder rectangle/bounds might be anywhere in it. I am thinking of using a dummy image (of the same dimensions) as the placeholder rectangle in the original PDF.
The Prawn gem supports placing an image at a given absolute/relative position within a page.
The trouble is that, since the rectangle or dummy image could be anywhere in the original PDF, I don't know what values to use for x and y in the Prawn call
pdf.image "/path/to/image", :at => [x,y]
Is there a way to get the coordinates of an image in the original PDF? My primitive understanding tells me that one would have to render the entire PDF to know this. Is that right? If yes, what would be a good way to render a PDF in memory (headless) and get the coordinates of various PDF objects (bounding rectangles, images, etc.)?
I am not limited by language/runtime here, as long as I can trigger it programmatically.
What could be other approaches to this problem?
Not an answer (e.g. I don't know the Ruby language), but in lieu of any others, and because I can't post a comment yet, here's what I think.
If the conditions stated above hold (the placeholder and replacement images are exactly the same size and use the same colour model, e.g. 24-bit RGB) and you control template creation (so you can store the placeholder inside the PDF uncompressed), it can be as quick and dirty as a raw replacement in the file treated as a byte string. E.g. fill the placeholder with red, then search for the pattern (0xFF0000) x W*H and replace it with raw image data. Which, of course, you can get any way you like, e.g.:
convert my_image.jpg RGB:- | ...
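A minimal sketch of that raw byte swap; the file names and placeholder size are assumptions, and it only works if the placeholder's image stream really is stored uncompressed:

# Replace an uncompressed solid-red RGB placeholder with raw image
# bytes of identical dimensions, treating the PDF as a byte string.
W, H = 150, 100                            # placeholder size in pixels (assumed)
placeholder = b"\xff\x00\x00" * (W * H)    # solid red, 24-bit RGB

with open("template.pdf", "rb") as f:
    pdf = f.read()
# raw RGB data, produced e.g. by: convert my_image.jpg RGB:replacement.rgb
with open("replacement.rgb", "rb") as f:
    raw = f.read()

assert len(raw) == len(placeholder), "image must match placeholder size"
with open("output.pdf", "wb") as f:
    f.write(pdf.replace(placeholder, raw, 1))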
If this solution is too dirty, or the conditions don't hold exactly, then parse the page content stream for a construct like
width 0 0 height x y cm
/name Do
It isn't the cleanest approach either, but for a vast number of simple page descriptions, x and y are the coordinates you are looking for.
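For illustration, a sketch of that scan over an already-extracted, decompressed content stream (pulling the stream out of the PDF, e.g. with a PDF library, is assumed to have happened first):

import re

# Scan a decompressed page content stream for "w 0 0 h x y cm" followed
# by "/Name Do" - the simple case: an image placed with no rotation/skew.
with open("page_content.stream", "rb") as f:
    stream = f.read().decode("latin-1")

pattern = re.compile(
    r"([\d.]+)\s+0\s+0\s+([\d.]+)\s+([\d.-]+)\s+([\d.-]+)\s+cm\s+"
    r"/(\S+)\s+Do")
for m in pattern.finditer(stream):
    w, h, x, y, name = m.groups()
    print(f"XObject /{name} at ({x}, {y}), size {w} x {h}")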
Further, if you control template creation, why not store additional information inside the PDF, e.g. as custom keys in the Info dictionary, and then read them back when using the template?

Word Openxml: how to get a text box the right size?

I'm using PHP to generate docx documents from a database. The generated documents contain column charts with labels attached (i.e. user shapes containing textboxes). In an attempt to get the textboxes to accommodate and display all of the text (i.e. it shouldn't be necessary for the user to resize a textbox to see all the text), my code calculates how many characters will fit into 3cm, adds linefeeds to the string as required, and tells me how many lines of text are needed. I have:
<a:xfrm xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:off x="1638276" y="1676399"/>
<a:ext cx="1257325" cy="'.(252000 * $labelLeftLines).'"/>
</a:xfrm>
which I believe should give me a text box around 3.5cm wide (the extra .5cm for the internal padding) and a height of .7cm multiplied by whatever is the value of $labelLeftLines. However, the text box always turns up as 3.5cm wide by .86cm high, which only ever displays one line of text.
If I add in 'autofit':
<a:bodyPr xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" vertOverflow="clip" wrap="square" rtlCol="0">
<a:spAutoFit/>
</a:bodyPr>
the generated file looks just the same. However, when I right-click on the textbox to inspect its properties, 'autofit' is indeed applied; I have to uncheck and recheck it to make it affect the textbox.
Any openXML gurus out there?
Hmm, some random floundering around revealed that the values I need to manipulate are here:
<cdr:relSizeAnchor xmlns:cdr="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing">
<cdr:from>
<cdr:x>0.47</cdr:x>
<cdr:y>0.75</cdr:y>
</cdr:from>
<cdr:to>
<cdr:x>0.67</cdr:x>
<cdr:y>1</cdr:y>
</cdr:to>
Changing those values does actually change the size of the textbox. The units appear to be fractions of the whole chart area, so going from 0.75 to 1 spans a quarter of the chart's height; here that quarter comes out as 1.43cm.
One day I'll maybe be able to find my way around the documentation.
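A small sketch of the unit arithmetic implied above; the chart height is inferred from the observed behaviour, not read from the file:

EMU_PER_CM = 360000              # OOXML drawing offsets/extents are in EMUs

# The <a:ext> values from the question:
print(1257325 / EMU_PER_CM)      # ~3.49 cm wide
print(252000 / EMU_PER_CM)       # 0.7 cm per line of text

# relSizeAnchor from/to values look like fractions of the chart area:
# 1 - 0.75 = 0.25 of the chart height came out as 1.43 cm, so
chart_h_cm = 1.43 / 0.25         # chart is ~5.7 cm tall
# to make a textbox h_cm tall, set: to_y = from_y + h_cm / chart_h_cm
print(0.75 + 1.43 / chart_h_cm)  # back to 1.0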

OpenCV: Generating points from image after thinning

I've run into an issue concerning generating floating-point coordinates from an image.
The original problem is as follows:
The input image is handwritten text. From this I want to generate a set of points (just x,y coordinates) that make up the individual characters.
At first I used findContours to generate the points. Since findContours finds the edges of the characters, the image first needs to be run through a thinning algorithm, because I'm not interested in the shape of the characters, only the lines, or in this case, the points.
(Input image and thinned result omitted here.)
So I run my input through the thinning algorithm and all is fine; the output looks good. Running findContours on this, however, does not work out so well: it skips a lot of stuff and I end up with something unusable.
The second idea was to generate bounding boxes (with findContours), use these bounding boxes to grab the characters from the thinning output, take all non-white pixel indices as "points", and offset them by the bounding-box position. This generates even worse output and seems like a bad method.
Horrible code for this:
// Grab the bounding-box region of the thinned image
Mat temp = new Mat(edges, bb);
byte[] roi_buff = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roi_buff);
int COLS = temp.cols();
List<Point> preArrayList = new ArrayList<Point>();
// Scanline pass: every non-zero pixel becomes a point, offset by the
// bounding box's top-left corner
for (int i = 0; i < roi_buff.length; i++)
{
    if (roi_buff[i] != 0)
    {
        Point tempP = bb.tl(); // tl() returns a fresh Point each call
        tempP.x += i % COLS;
        tempP.y += i / COLS;
        preArrayList.add(tempP);
    }
}
Are there any alternatives, or am I overlooking something?
UPDATE:
I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply take a scanline approach to grabbing all the pixels. If you look at the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them ordered by neighbouring pixels, since I want to draw paths with the points later on (outside of OpenCV).
Is this possible?
You should look into implementing your own connected components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. You basically check, for every pixel, whether it is connected to its left neighbour, and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or to use pointers to labels (so you can easily do a complete label change for an object).
This is basically what findContours does, but if you implement it yourself you have the freedom to go for 8-connectedness and even bridge a single-pixel or two-pixel gap. That way you get "almost-connected components labelling". It looks like you need this for the "w" in your example picture.
Once you have the image labelled this way, you can push all the pixels of a single label into a vector and order them something like this: find the top-left pixel, push it to a new vector and erase it from the original vector. Now find the pixel in the original vector closest to it, push it to the new vector and erase it from the original. Continue until all pixels have been transferred.
It will not be very fast this way, but it should be a start.
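A minimal sketch of that pipeline in Python, using OpenCV's built-in connectedComponents for the labelling step instead of a hand-rolled pass (that substitution, and the file name, are my own assumptions; the greedy ordering is the one described above):

import cv2
import numpy as np

# Label the thinned binary image (8-connected), then order each
# component's pixels by greedy nearest-neighbour so they trace a path.
img = cv2.imread("thinned.png", cv2.IMREAD_GRAYSCALE)
binary = (img > 0).astype(np.uint8)
n_labels, labels = cv2.connectedComponents(binary, connectivity=8)

paths = []
for label in range(1, n_labels):           # label 0 is the background
    ys, xs = np.nonzero(labels == label)
    pts = sorted(zip(xs.tolist(), ys.tolist()), key=lambda p: (p[1], p[0]))
    ordered = [pts.pop(0)]                  # start at the top-left-most pixel
    while pts:
        cx, cy = ordered[-1]
        # index of the remaining point closest to the last ordered one
        i = min(range(len(pts)),
                key=lambda j: (pts[j][0] - cx) ** 2 + (pts[j][1] - cy) ** 2)
        ordered.append(pts.pop(i))
    paths.append(ordered)

To bridge one- or two-pixel gaps (the "almost-connected" case mentioned above), one option is to dilate a copy of the image, label the dilated copy, and read the labels back at the original pixel positions.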
