OpenCV: Selecting pixel with mouse

I'm using the cv2.setMouseCallback function to select a pixel of an image shown in a window.
The callback function returns x and y integers that represent the position of the pixel in the image, but paying attention to its behaviour, it seems to me that it doesn't return the pixel you are over but rather the rounded value of a point on an imaginary axis.
If you look at the first two images, in both the mouse is over pixel 0,0, but the result is different if you move closer to other pixels.
OK, I know that in a real image the error is insignificant, but is this a bug?
cv2.namedWindow('image', cv2.WINDOW_NORMAL)          # Can be resized
cv2.resizeWindow('image', self.w, self.h)            # Reasonable size window
cv2.setMouseCallback('image', self.mouse_callback)   # Mouse callback
while not self.finished:
    cv2.imshow('image', self.img)
    k = cv2.waitKey(4) & 0xFF
    if k == 27:  # Esc quits the loop
        break
cv2.destroyAllWindows()

# mouse callback function
def mouse_callback(self, event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        print(x, y)

The bug should be fixed in future releases.
See http://code.opencv.org/issues/3409 if interested.

Related

Placing a shape inside another shape using opencv

I have two images and I need to place the second image inside the first image. The second image can be resized, rotated or skewed so that it covers as large an area of the other image as possible. As an example, in the figure shown below, the green circle needs to be placed inside the blue shape:
Here the green circle is transformed such that it covers a larger area. Another example is shown below:
Note that there may be multiple results. However, any similar result is acceptable, as shown in the above example.
How do I solve this problem?
Thanks in advance!
I tested the idea I mentioned earlier in the comments and the output is almost good. It could be better, but that takes time. The final code was too long and depends on one of my old personal projects, so I will not share it. But I will explain step by step how I wrote such an algorithm. Note that I have tested the algorithm many times; it is not yet 100% accurate.
For N iterations, do the following (a condensed sketch of the whole loop appears after the code samples below):
1. Copy the shape.
2. Transform it randomly.
3. Put the shape on the background.
4-1. If the shape exceeds the background, it is not acceptable; go back to the first step.
4-2. Otherwise, continue to step 5.
5. Calculate the width, height and number of pixels of the shape.
6. Keep a list of the best candidates and compare these three parameters (W, H, Pixels) with the members of the list. If a better item is found, save it.
I set the value of N to 5,000. The larger the number, the slower the algorithm runs, but the better the result.
You can use anything for the transform: mirror, rotate, shear, scale, resize, etc. But I used warpPerspective for this one.
import sys
import random
import cv2
import numpy as np

im1 = cv2.imread(sys.path[0]+'/Back.png')
im2 = cv2.imread(sys.path[0]+'/Shape.png')
bH, bW = im1.shape[:2]
sH, sW = im2.shape[:2]
# TopLeft, TopRight, BottomRight, BottomLeft of the shape
_inp = np.float32([[0, 0], [sW, 0], [sW, sH], [0, sH]])
# Random center the four corners are scattered around
cx = random.randint(5, sW-5)
ch = random.randint(5, sH-5)
o = 0
# Random transformed output
_out = np.float32([
    [random.randint(-o, cx-1), random.randint(1-o, ch-1)],
    [random.randint(cx+1, sW+o), random.randint(1-o, ch-1)],
    [random.randint(cx+1, sW+o), random.randint(ch+1, sH+o)],
    [random.randint(-o, cx-1), random.randint(ch+1, sH+o)]
])
# Transformed output (dsize is (width, height), and the source is im2)
M = cv2.getPerspectiveTransform(_inp, _out)
t = cv2.warpPerspective(im2, M, (bW, bH))
You can use countNonZero to find the number of pixels and findContours and boundingRect to find the shape size.
def getSize(msk):
    # The biggest contour of the mask determines the shape's bounding box
    cnts, _ = cv2.findContours(msk, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    cnts.sort(key=lambda p: max(cv2.boundingRect(p)[2], cv2.boundingRect(p)[3]), reverse=True)
    w, h = 0, 0
    if len(cnts) > 0:
        _, _, w, h = cv2.boundingRect(cnts[0])
    pix = cv2.countNonZero(msk)
    return pix, w, h
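For example, the mask passed to getSize can be built from the warped image t above; a minimal sketch, assuming a grayscale threshold (the value 10 is an assumption):

gray = cv2.cvtColor(t, cv2.COLOR_BGR2GRAY)                      # warped shape from above
_, mskShape = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)   # white where the shape is
pix, w, h = getSize(mskShape)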
To find the overlapping of the background and shape, you can do something like this:
make a mask from the background and a mask from the shape and use bitwise methods. Change this section according to the software you wrote; this is just an example :)
# White pixels left after the AND/XOR are shape pixels outside the background
mskMix = cv2.bitwise_and(mskBack, mskShape)
mskMix = cv2.bitwise_xor(mskMix, mskShape)
isCandidate = not np.any(mskMix == 255)
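Putting the pieces together, a condensed sketch of the whole search loop could look like this; randomTransform and makeMask are hypothetical helpers wrapping the warp and threshold code above, and the three parameters are compared lexicographically here for brevity:

best = None  # (pixels, w, h, transformed image)
for i in range(5000):                               # N iterations
    t = randomTransform(im2)                        # steps 1-3: copy, transform, place
    mskShape = makeMask(t)
    mskMix = cv2.bitwise_and(mskBack, mskShape)
    mskMix = cv2.bitwise_xor(mskMix, mskShape)
    if np.any(mskMix == 255):                       # step 4-1: shape exceeds background
        continue
    pix, w, h = getSize(mskShape)                   # step 5
    if best is None or (pix, w, h) > best[:3]:      # step 6: keep the best candidate
        best = (pix, w, h, t)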
For example, this is not a candidate answer; if you look closely at the image on the right, you will notice that the shape has exceeded the background.
I just tested the circle with 4 different backgrounds; And the results:
After 4879 Iterations:
After 1587 Iterations:
After 4621 Iterations:
After 4574 Iterations:
A few additional points: if you use a method like medianBlur to suppress the noise in the background mask and the shape mask, you may get a better solution.
I suggest you read about Evolutionary Computation, Metaheuristic and Soft Computing algorithms for better understanding of this algorithm :)

Pixel-perfect collisions in Monogame, with float positions

I want to detect pixel-perfect collisions between 2 sprites.
I use the following function, which I found online, but which makes total sense to me.
static bool PerPixelCollision(Sprite a, Sprite b)
{
    // Get Color data of each Texture
    Color[] bitsA = new Color[a.Width * a.Height];
    a.Texture.GetData(0, a.CurrentFrameRectangle, bitsA, 0, a.Width * a.Height);
    Color[] bitsB = new Color[b.Width * b.Height];
    b.Texture.GetData(0, b.CurrentFrameRectangle, bitsB, 0, b.Width * b.Height);

    // Calculate the intersecting rectangle
    int x1 = (int)Math.Floor(Math.Max(a.Bounds.X, b.Bounds.X));
    int x2 = (int)Math.Floor(Math.Min(a.Bounds.X + a.Bounds.Width, b.Bounds.X + b.Bounds.Width));
    int y1 = (int)Math.Floor(Math.Max(a.Bounds.Y, b.Bounds.Y));
    int y2 = (int)Math.Floor(Math.Min(a.Bounds.Y + a.Bounds.Height, b.Bounds.Y + b.Bounds.Height));

    // For each single pixel in the intersecting rectangle
    for (int y = y1; y < y2; ++y)
    {
        for (int x = x1; x < x2; ++x)
        {
            // Get the color from each texture
            Color colorA = bitsA[(x - (int)Math.Floor(a.Bounds.X)) + (y - (int)Math.Floor(a.Bounds.Y)) * a.Texture.Width];
            Color colorB = bitsB[(x - (int)Math.Floor(b.Bounds.X)) + (y - (int)Math.Floor(b.Bounds.Y)) * b.Texture.Width];
            // If both colors are not transparent (the alpha channel is not 0), there is a collision
            if (colorA.A != 0 && colorB.A != 0)
            {
                return true;
            }
        }
    }
    // If no collision occurred by now, we're clear.
    return false;
}
(All the Math.Floor calls are useless; I copied this function from my current code, where I'm trying to make it work with floats.)
It reads the color of the sprites in the rectangle portion that is common to both sprites.
This actually works fine when I display the sprites at x/y coordinates where x and y are ints (.Bounds.X and .Bounds.Y):
View an example
The problem with displaying sprites at int coordinates is that it results in very jagged movement along diagonals:
View an example
So ultimately I would like to not cast the sprite position to int's when drawing them, which results in a smooth(er) movement:
View an example
The issue is that PerPixelCollision works with ints, not floats, which is why I added all those Math.Floor calls. As is, it works in most cases, but it misses one row and one column of checking on the bottom and right (I think) of the common rectangle, because of the rounding induced by Math.Floor:
View an example
When I think about it, it makes sense. If x1 is 80 and x2 would actually be 81.5 but becomes 81 because of the cast, then the loop only runs for x = 80 and therefore misses the last column (in the example gif, the fixed sprite has a transparent column to the left of the visible pixels).
The issue is that no matter how hard I think about this, or what I try (and I have tried a lot of things), I cannot make it work properly. I am almost convinced that x2 and y2 should use Math.Ceiling instead of Math.Floor, so as to "include" the last pixel that is otherwise left out, but then I always get an index out of range of the bitsA or bitsB arrays.
Would anyone be able to adjust this function so that it works when Bounds.X and Bounds.Y are floats?
PS - could the issue possibly come from BoxingViewportAdapter? I am using this (from MonoGame.Extended) to "upscale" my game, which actually runs at 144p.
Remember, there is no such thing as a fractional pixel. For movement purposes it makes complete sense to use floats for the values and cast them to integer pixels when drawing. The problem is not in the fractional values, but in the way they are drawn.
The main reason the collisions do not appear to work correctly is the scaling. The new pixels in between the diagonals get their colors by averaging* the surrounding pixels. The effect makes the image appear larger than the original, especially on the diagonals.
*There are several methods that may be used for the scaling; bicubic and linear are the most common.
The only direct (pixel-perfect) solution is to compare the actual output after scaling. This requires rendering the entire screen twice, and requires scale-factor times more computations. (Not recommended.)
Since you are comparing the non-scaled images, your collisions appear to be off.
The other issue is movement speed. If you are moving faster than one pixel per Update(), detecting per-pixel collisions is not enough if the movement is to be restricted by the obstacle: you must resolve the collision.
For enemies or environmental hazards your original code is sufficient, and collision resolution is not required; it will merely give the player a minor advantage.
A simple resolution algorithm (see below for a mathematical solution) is to unwind the movement by half and check for collision. If it is still colliding, unwind the movement by a quarter; otherwise, advance it by a quarter and check for collision again. Repeat until the movement is less than 1 pixel. This runs log(speed) times.
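A minimal sketch of that halving scheme (written as Python pseudocode here; collides() and the float x/y fields are hypothetical stand-ins for your own types):

def resolve(sprite, start_x, start_y, dx, dy, collides):
    # Binary-search the largest fraction of the movement that is collision-free
    t, step = 1.0, 0.5
    while step * max(abs(dx), abs(dy)) >= 1.0:   # stop once the step is below one pixel
        sprite.x = start_x + dx * t
        sprite.y = start_y + dy * t
        t = t - step if collides(sprite) else t + step
        step /= 2
    sprite.x = start_x + dx * t                  # settle on the final fraction
    sprite.y = start_y + dy * t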
As for the top wall not colliding perfectly: if the starting Y value is not a multiple of the vertical movement speed, you will not land exactly on zero. I prefer to resolve this by setting Y = 0 when Y is negative. The same applies to X, and to X and Y > screen bounds - origin for the bottom and right of the screen.
I prefer to use mathematical solutions for collision resolution. In your example images you show a box colliding with a diamond; the diamond shape is represented mathematically by the Manhattan distance (Math.Abs(x1-x2) + Math.Abs(y1-y2)). From this fact it is easy to directly calculate the resolution of the collision.
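For the diamond case, the overlap test then reduces to a pair of absolute values; a tiny sketch (names hypothetical):

def inside_diamond(px, py, cx, cy, radius):
    # A diamond centered at (cx, cy) is exactly the set of points whose
    # Manhattan distance from the center is at most radius
    return abs(px - cx) + abs(py - cy) <= radius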
On optimizations:
Be sure to check that the bounding Rectangles are overlapping before calling this method.
As you have stated, remove all the Math.Floor calls, since the cast is sufficient. Move all calculations inside the loops that do not depend on the loop variable outside of the loops.
The (int)a.Bounds.Y * a.Texture.Width and (int)b.Bounds.Y * b.Texture.Width terms do not depend on the x or y variables and should be calculated and stored before the loops. The subtractions y - [above variable] should be stored in the "y" loop.
I would recommend using a bitboard(1 bit per 8 by 8 square) for collisions. It reduces the broad(8x8) collision checks to O(1). For a resolution of 144x144, the entire search space becomes 18x18.
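A sketch of what such a board could look like for the 144x144 case (cell size and names are assumptions; a real bitboard would pack the flags into integer bits):

CELL = 8
occupied = [[False] * (144 // CELL) for _ in range(144 // CELL)]  # 18x18 grid

def mark(x, y):                 # flag the 8x8 cell containing pixel (x, y)
    occupied[int(y) // CELL][int(x) // CELL] = True

def broad_phase(x, y):          # O(1) lookup before the per-pixel test
    return occupied[int(y) // CELL][int(x) // CELL]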
You can wrap your sprite in a Rectangle and use its Intersects function, which detects collisions.
Intersects - XNA

Repeated rotation - increasing image dimension at export to png

I want a user to draw something. I will rotate that image many times and save each file to a folder. The file name template is img<degree>.png; for example, img24.png is the original image rotated by 24 degrees. It's like using the Rotate tool, setting it to 24 degrees and exporting with default settings.
The problem is that every time I rotate and export to png, the files get bigger and bigger. When the original file is 100x100 and 380 B, the 9th file is 413x412 and 2.47 KB. I want the images to stay at the same size (100x100 in the above example).
(define (degrees-to-radians degrees) (/ (* degrees *pi*) 180))

(define (script-fu-rotate-and-save in-image in-drawable directory-name) ; degree)
  (let ((ind 0) (x 0) (y 0))
    (while (< ind 361)
      (set! x (car (gimp-image-width in-image)))
      (set! y (car (gimp-image-height in-image)))
      (gimp-item-transform-rotate in-drawable (degrees-to-radians ind) FALSE (/ x 2) (/ y 2))
      (file-png-save-defaults 1 in-image in-drawable
                              (string-append directory-name "/img" (number->string ind) ".png")
                              (string-append directory-name "/temp.png"))
      (set! ind (+ ind 45))
    )
  )
  ;(gimp-displays-flush) ; show changes on image
)

(script-fu-register
  "script-fu-rotate-and-save" ;name
  "rotate and save"
  "Rotates and saves"
  "me"
  "copyrights"
  "today"
  ""
  SF-IMAGE    "image-main"     0
  SF-DRAWABLE "drawable-main"  0
  SF-DIRNAME  "directory-name" ""
  ;SF-ADJUSTMENT "label" '(value lower upper step_inc page_inc digits type)
  ;SF-ADJUSTMENT "degree" '(1 1 360 1 1 0 0)
)

(script-fu-menu-register "script-fu-rotate-and-save" "<Image>/Rotate and save")
If you rotate a rectangular image, you must either obtain a slightly larger image or clip off some of the data. Often the area of interest is in fact roughly circular and the corners are background or transparent; however, it's unlikely that a rotate algorithm will make that decision for you.
If you rotate iteratively, you not only get an accumulation of size, you also get an accumulation of error, because the pixels don't match up (to see how to suppress this effect, look up rotatebyshear in the binary image library (here)). So the image will start to blur. You therefore need to always start from your original image and apply the total rotation.
If you compare gimp-item-transform-rotate to its - now deprecated - predecessor, you will notice that it has an additional parameter called clip-result, with four possible values (the number in parentheses is the numeric value of the option):
TRANSFORM-RESIZE-ADJUST (0)
TRANSFORM-RESIZE-CLIP (1)
TRANSFORM-RESIZE-CROP (2)
TRANSFORM-RESIZE-CROP-WITH-ASPECT (3)
The current gimp-item-* API gets the value from the current context; gimp-context-set-transform-resize is used to set the value you desire.
The default is TRANSFORM-RESIZE-ADJUST (0) - this enlarges the layer on every rotate, and if you rotate the same layer over and over again, the results become bigger and bigger.
You want to try TRANSFORM-RESIZE-CLIP (1) - this clips the rotated layer to the original size.
The remaining two options are a bit harder to understand; for those you definitely want to have a look at the user manual. These options are common to the transform tools, by the way.
The issue with error accumulation, as indicated in Malcolm's answer, remains. You definitely want to rotate a copy of the original layer by the accumulated angle, instead of rotating the same layer over and over again.
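Combining both fixes, a rough Python-Fu outline (untested; names follow the GIMP PDB):

from gimpfu import *
import math

def rotate_and_save(image, drawable, directory):
    # Clip each rotated layer to its original size instead of enlarging it
    pdb.gimp_context_set_transform_resize(TRANSFORM_RESIZE_CLIP)
    for deg in range(0, 361, 45):
        # Rotate a fresh copy of the original by the total angle,
        # so neither size nor pixel error accumulates
        copy = pdb.gimp_image_duplicate(image)
        layer = pdb.gimp_image_flatten(copy)
        pdb.gimp_item_transform_rotate(layer, math.radians(deg), TRUE, 0, 0)
        name = "%s/img%d.png" % (directory, deg)
        pdb.file_png_save_defaults(copy, layer, name, name)
        pdb.gimp_image_delete(copy)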

Wrong result using function fillPoly in opencv for very large images

I have a hard time solving an issue with mask creation. My image is large,
40959px x 24575px, and I'm trying to create a mask for it.
I noticed that I don't have a problem with images up to a certain size (I tested about 33000px x 22000px), but for dimensions larger than that I get an error inside my mask: it turns black in the middle of the polygon, and the white region extends itself to the left edge. The result should have no black area inside the polygon and no white area extending to the left edge of the image.
So my code looks like this:
import cv2
import numpy as np

pixel_points_list = latLonToPixel(dataSet, lat_lon_pairs)
print(pixel_points_list)
# This is the list I'm getting
# [[213, 6259], [22301, 23608], [25363, 22223], [27477, 23608], [35058, 18433], [12168, 282], [213, 6259]]
image = cv2.imread(in_tmpImgFilePath, -1)
print(image.shape)
# Value of image.shape: (24575, 40959, 4)
mask = np.zeros(image.shape, dtype=np.uint8)
roi_corners = np.array([pixel_points_list], dtype=np.int32)
print(roi_corners)
# contents of roi_corners:
"""
[[[  213  6259]
  [22301 23608]
  [25363 22223]
  [27477 23608]
  [35058 18433]
  [12168   282]
  [  213  6259]]]
"""
channel_count = image.shape[2]
ignore_mask_color = (255,)*channel_count
cv2.fillPoly(mask, roi_corners, ignore_mask_color)
cv2.imwrite("mask.tif", mask)
And this is the mask I'm getting with those coordinates (minified mask):
You can see that in the middle the mask is mirrored. I took the points from pixel_points_list and drew them on a coordinate system, and I get a valid polygon, but when using fillPoly I get wrong results.
Here is an even simpler example where I have only 4 (5) points:
roi_corners = np.array([[  213,  6259],
                        [22301, 23608],
                        [35058, 18433],
                        [12168,   282],
                        [  213,  6259]])
And I get:
Does anyone have a clue why this happens?
Thanks!
The issue is in the function CollectPolyEdges, called by fillPoly (and drawContours, fillConvexPoly, etc.).
Internally, it is assumed that the point coordinates (of integer type int32) have meaningful values only in the 16 lowest bits. In practice, you can draw correctly only if your points have coordinates up to 32768 (which is exactly the maximum x coordinate you can draw in your image).
This can't really be considered a bug, since your images are extremely large.
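A minimal way to reproduce the effect in isolation (on OpenCV builds affected by this issue) is a polygon whose x coordinates cross that 32768 line:

import cv2
import numpy as np

img = np.zeros((100, 40000), dtype=np.uint8)
pts = np.array([[[100, 10], [35000, 10], [35000, 90], [100, 90]]], dtype=np.int32)
cv2.fillPoly(img, pts, 255)   # on affected builds, the edge beyond 32768 wraps around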
As a workaround, you can scale your mask and your points down by a given factor, fill the polygon on the smaller mask, and then re-scale the mask back to the original size.
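A sketch of that workaround, applied to the code from the question (a factor of 2 keeps every coordinate below 32768 here; nearest-neighbor interpolation avoids introducing gray values into the mask):

factor = 2
small = np.zeros((image.shape[0] // factor, image.shape[1] // factor, channel_count), dtype=np.uint8)
small_corners = roi_corners // factor               # scaled-down polygon, still int32
cv2.fillPoly(small, small_corners, ignore_mask_color)
mask = cv2.resize(small, (image.shape[1], image.shape[0]), interpolation=cv2.INTER_NEAREST)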
As @DanMašek pointed out in the comments, this is in fact a bug, and it is not fixed yet.
In the bug discussion another workaround is mentioned: drawing using multiple ROIs with sizes less than 32768, correcting the coordinates for each ROI using the offset parameter of fillPoly.

How the window function works in STFT

Can anyone experienced in signal processing and the STFT explain to me why the window function in the image posted below is written as a function of (t-t'), given that t is the total time and t' is the width of the window?
I cannot figure it out: initially the window is located at t=0, and if the window length is 3, for example, then the window spans from t=0 to t=3; if the total time is T=10, for example, then the window function would be w(T-3), which is 7?! I really cannot understand it, and I believe there is something hidden that I fail to comprehend.
Kindly clarify this and provide guidance. Thanks.
Image:
Note that the width of the window function is constant throughout the entire STFT process, and the time t in the function g(t-t') indicates the current time on the time axis; it is a variable that changes each time the window is moved/shifted to the right over the main signal.
In other words, and I hope this clarifies it better, the "t" at the end of the time axis is NOT the "t" in the function g(t-t'). As I stated earlier, in the function g(t-t'), t is the current time on the time axis, which varies with each shift of the window function, and t' is the width of the window, which is constant throughout the entire STFT process.
t is your time variable, not the total time.
t' is not the width of the window; it is the integration variable in the integral, and the integral is missing a dt' at the right end.
g(x) is the window function; its width is not defined above, but it is represented by the width of the light-blue bell in the figure.
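Putting those pieces together, the integral in the image is presumably the standard STFT definition, reconstructed here in LaTeX with the missing dt' restored:

X(t, \omega) = \int_{-\infty}^{\infty} x(t')\, g(t - t')\, e^{-j\omega t'}\, dt'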
The image may suggest a different interpretation, but that interpretation would be wrong; if you apply these adjustments:
Swap the labels t and t' on the horizontal axis.
Change x(t) with x(t') on the vertical axis.
you are now looking at x(t') (black line) and at g(t-t') (upper contour of the light-blue zone) for a FIXED time t. The bell-shaped window function is centered around t, and the product of the bell and the signal is the function whose Fourier transform you are calculating in the equation; it is non-zero only in the proximity of the fixed value of t. Consistently, this quantity is the 'local', i.e. short-time, Fourier transform of the signal in the vicinity of the fixed time t.
You can do the same for all values of t (with a different figure for each value, with a bell shifted to the left/right), and obtain the STFT.
