Approximately optimize overlap (masking) of 2D images on a GPU - machine-learning

Let's say I have two grayscale images. x is 100x100 and y is 125x125.
y is similar to x but at a different scale.
What I would like is to stretch y by between 80% and 120% along the height and width dimensions (independently), and then pick a shift for the height and width dimensions, i.e. a start point for cropping to 100x100. Let's call y' = transform(y, stretch_h, stretch_w, shift_h, shift_w). I would like to minimize |y' - x|.
i.e. I want to stretch and shift y so that it overlaps / masks x as well as possible.
I'd like to do this on a GPU, ideally in 100 ms or less. I prefer this to be fast even if it is less exact.
How?
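One approach that fits the time budget is a brute-force grid search over the four parameters, evaluating all candidates in large batches on the GPU. Below is a minimal PyTorch sketch under these assumptions: x and y are float tensors in [0, 1], and the function name best_overlap and the grid resolutions (9 stretches and 9 shifts per axis) are illustrative choices, not requirements. F.affine_grid and F.grid_sample generate whole batches of stretched-and-shifted 100x100 crops of y at once, and the mean absolute error against x is scored per candidate.

    import torch
    import torch.nn.functional as F

    def best_overlap(x, y, n_stretch=9, n_shift=9, device="cuda"):
        """Brute-force search for the stretch/shift of y that best matches x.

        x: (H, W) target, y: (Hy, Wy) source, float tensors in [0, 1].
        Returns ((stretch_h, stretch_w, shift_y, shift_x), error); the shifts
        are crop centers in y's normalized [-1, 1] coordinates.
        """
        x = x.to(device)
        y = y.to(device)[None, None]              # (1, 1, Hy, Wy)
        H, W = x.shape
        Hy, Wy = y.shape[-2:]

        stretches = torch.linspace(0.8, 1.2, n_stretch, device=device)
        sh, sw = torch.meshgrid(stretches, stretches, indexing="ij")
        sh, sw = sh.flatten(), sw.flatten()

        # Half-extent of the sampled window in y's normalized coordinates:
        # stretching by s means the H-pixel crop covers H / s source pixels.
        ah = (H / sh) / Hy
        aw = (W / sw) / Wy

        shifts = torch.linspace(-1.0, 1.0, n_shift, device=device)
        best = (None, float("inf"))
        for i in range(len(sh)):
            # All in-bounds shift centers for this stretch pair, as one batch.
            ty = shifts * (1 - ah[i]).clamp(min=0)
            tx = shifts * (1 - aw[i]).clamp(min=0)
            tyg, txg = torch.meshgrid(ty, tx, indexing="ij")
            n = tyg.numel()
            theta = torch.zeros(n, 2, 3, device=device)
            theta[:, 0, 0] = aw[i]
            theta[:, 1, 1] = ah[i]
            theta[:, 0, 2] = txg.flatten()
            theta[:, 1, 2] = tyg.flatten()
            grid = F.affine_grid(theta, (n, 1, H, W), align_corners=False)
            crops = F.grid_sample(y.expand(n, -1, -1, -1), grid,
                                  align_corners=False)
            err = (crops - x).abs().mean(dim=(1, 2, 3))
            j = err.argmin()
            if err[j] < best[1]:
                best = ((sh[i].item(), sw[i].item(),
                         tyg.flatten()[j].item(), txg.flatten()[j].item()),
                        err[j].item())
        return best

A search of this size (a few thousand 100x100 candidates) should fit comfortably within 100 ms on a modern GPU. Both affine_grid and grid_sample are differentiable, so the best coarse candidate could also be refined with a few gradient-descent steps on the four parameters if more accuracy is needed.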

Related

Image Processing: how to find factor x and y of resizing image knowing that the object has an angle up to 90°

I built a classification model that takes as input an image of a rectangular object with fixed dimensions (w0 x h0), where w0 = 742 pixels and h0 = 572 pixels. The output of the model is the class of this object.
Now I have the same problem but with a bigger rectangular object that has new fixed dimensions (w x h), where w = 1077 pixels and h = 681 pixels.
I would like to resize the new image so that the object has exactly the same size as the old object, to fit the model I already built. How can I find the x and y factors for the image resizing, knowing that the object is not straight and can have an angle alpha from 0° to 90°? alpha is known to me.
Currently I have a bad solution (sketched in code below):
rotate the image by -alpha
resize the image using factors x = x0/x and y = y0/y
rotate the image back (+alpha)
Is it possible to calculate the resizing factors with respect to the angle alpha, so that I only resize the image without rotating it back and forth? Or maybe you know a function in OpenCV that does this calculation?
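For reference, here is a minimal OpenCV sketch of the rotate-resize-rotate workaround above; the function name resize_rotated and the plain border handling are illustrative assumptions.

    import cv2

    def resize_rotated(img, alpha_deg, fx, fy):
        """Rotate-resize-rotate workaround: align the object with the axes,
        resize anisotropically, then rotate back.

        alpha_deg: known object angle; fx, fy: resize factors measured on the
        axis-aligned object (e.g. fx = 742 / 1077, fy = 572 / 681).
        """
        h, w = img.shape[:2]
        # Rotate by -alpha so the object becomes axis-aligned.
        M = cv2.getRotationMatrix2D((w / 2, h / 2), -alpha_deg, 1.0)
        upright = cv2.warpAffine(img, M, (w, h))
        # Resize with independent x / y factors.
        resized = cv2.resize(upright, None, fx=fx, fy=fy)
        # Rotate back by +alpha.
        h2, w2 = resized.shape[:2]
        M2 = cv2.getRotationMatrix2D((w2 / 2, h2 / 2), alpha_deg, 1.0)
        return cv2.warpAffine(resized, M2, (w2, h2))

Note that the three steps compose into a single affine matrix R(+alpha) · diag(fx, fy) · R(-alpha), so one cv2.warpAffine call with that composed matrix would avoid the two extra resampling passes; a plain axis-aligned resize alone cannot reproduce anisotropic scaling along rotated axes.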

How Resolution changes X & Y coords

I am tracking the color of a pixel at X & Y at a resolution of 1920 by 1080, and I am simply wondering whether there is any mathematical way to keep tracking the same pixel accurately across various resolutions.
The pixel is static and not moving; however, I am aware that changing resolutions affects scaling and the monitor's X & Y coordinate system.
So any suggestions would be great!
As long as the whole screen area is filled, the same location on the physical screen (determined as the ratio of xLocation divided by xWidth and yLocation divided by yHeight, measured in centimeters or inches) will always also be at the same ratio of xPixelIndex divided by xTotalPixels and yPixelIndex divided by yTotalPixels.
Let's assume you have xReference and yReference of the target pixel, in a resolution WidthReference x HeightReference in which these coordinates mark the desired pixel.
Let's assume you have WidthCurrent and HeightCurrent of your screen size in pixels, for the resolution in which you want to target a pixel at the same physical location.
Let's assume that you need to determine xCurrent and yCurrent as the coordinates of the pixel in the current resolution.
Then calculate the current coordinates as:
xCurrent = (1.0 * WidthCurrent) / WidthReference * xReference;
yCurrent = (1.0 * HeightCurrent) / HeightReference * yReference;
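The same mapping as a tiny Python helper with a worked example (the function name remap_pixel is an illustrative choice):

    def remap_pixel(x_ref, y_ref, w_ref, h_ref, w_cur, h_cur):
        """Map a pixel coordinate from a reference resolution to the current one."""
        return (w_cur / w_ref * x_ref, h_cur / h_ref * y_ref)

    # The pixel (960, 540) at 1920x1080 maps to (1280, 720) at 2560x1440.
    print(remap_pixel(960, 540, 1920, 1080, 2560, 1440))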

Apply transform computed at lower resolution to a higher resolution object

I'm working on an iOS app that has to compute the transform matrix between consecutive real-time video frames. I'm using OpenCV to compute optical flow and then find the affine matrix.
This process was working perfectly but was a little slow, so I'm now downsampling each frame to half its size before processing it. The thing is, I later have to apply the transform to other video frames at the original resolution (double the one for which I compute the matrix).
My question is: how should I apply the transform matrix I have computed for a frame at resolution X to another frame at resolution 2X? I know I should "scale" the matrix somehow, but I'm not sure how. I've tried multiplying the translation components of the matrix by 2, and this works almost perfectly (although I don't understand why), but depending on the transformation it is sometimes not accurate.
One possible solution is to scale the frame to half its size, apply the transform and then scale it back to its original size, but this has a performance cost, which is why I'm trying to compute a single matrix I can use to transform the full-resolution frame.
If you use a full homography H (entries H_00 to H_22), the clean way is to conjugate it with the scaling matrix S = diag(sx, sy, 1): H' = S * H * S^-1. For an affine matrix with a uniform scale this reduces to multiplying the translation entries H_02 and H_12 by the scale factor, which is why doubling the translation components worked almost perfectly for you; with non-uniform scaling or perspective terms the other entries change as well.
I would recommend another workaround: rescale the point correspondences before estimating the transform. If x0[n], y0[n] are your start points and x1[n], y1[n] your end points, multiply both by the scale factors and then run findHomography or getAffineTransform. E.g. let w0 = 200 and h0 = 100 be the width and height of the frame you estimated the feature correspondences on, and w1 = 400 and h1 = 300 the frame you want to apply the transform to. Then sx = 2 and sy = 3 are the scale factors, and x0[n] = x0[n] * sx, y0[n] = y0[n] * sy, x1[n] = x1[n] * sx, etc.
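A minimal numpy sketch of the conjugation described above (the function name upscale_transform is illustrative):

    import numpy as np

    def upscale_transform(H, sx, sy):
        """Adapt a 3x3 homography estimated at low resolution to high resolution.

        Conjugation H' = S @ H @ inv(S) with S = diag(sx, sy, 1).
        """
        S = np.diag([sx, sy, 1.0])
        return S @ H @ np.linalg.inv(S)

    # For an affine matrix with uniform scale (sx == sy == 2), this simply
    # doubles the translation entries H[0, 2] and H[1, 2].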

What does closest pixels mean in a digital image?

I need to apply a simple filter to a digital image. The task says that for each pixel, I need to take the median of the closest pixels. Suppose the image is M x M: what are the closest pixels? Are they just the left, right, upper and lower pixels plus the current pixel (5 pixels in total), or do I need to take into account all 9 pixels in a 3x3 area?
Follow-up question: what if I want the median of the N closest pixels (N = 3)?
Thanks.
I am guessing you are trying to apply a median filter to a sample image. By definition of the median filter, for each output pixel you look at the neighboring pixels and take their median. Two sizes matter here: the image size, m x n, and the filter kernel size, x x y. If the kernel size is 3x3, you will need to look at 9 pixels: the current pixel plus its 8 surrounding neighbors.
Finding the median of an odd number of pixels is easy: if you have 3 pixels x1, x2 and x3 arranged in ascending order of their values, the median of this set of pixels is x2.
Now, if you have an even number of pixels, usually the average of the two pixels lying midway is computed. For example, say there are 4 pixels x1, x2, x3 and x4 arranged in ascending order of their values. The median of this set of pixels is (x2+x3)/2.
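To make the definition concrete, here is a naive numpy sketch of a 3x3 median filter. In practice a library routine such as scipy.ndimage.median_filter would be used; median_filter_3x3 is an illustrative name, and border replication is an assumed choice of edge handling.

    import numpy as np

    def median_filter_3x3(img):
        """Naive 3x3 median filter; edge pixels use a replicated border."""
        h, w = img.shape
        padded = np.pad(img, 1, mode="edge")
        # Stack the 9 shifted views of the image, one per kernel position,
        # then take the median across them for every pixel at once.
        stack = np.stack([padded[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
        return np.median(stack, axis=0)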

A 2D grid of 100 x 100 squares in OpenGL ES 2.0

I'd like to display a 2D grid of 100 x 100 squares. Each square is 10 pixels wide and filled with a color. The color of any square may be updated at any time.
I'm new to OpenGL and wondered whether I need to define the vertices for every square in the grid, or is there another way? I want to use OpenGL directly rather than a framework like Cocos2D for this simple task.
You can probably get away with just rendering the positions of your squares as points with a size of 10. GL_POINTS are always a fixed number of pixels wide and high, so that keeps your squares at 10 pixels regardless of the transform. If you render the squares as quads, you will have to make sure they are the right distance from the camera to be 10 pixels wide and high (the aspect ratio may also affect it).
