Stitching top-down photos of long items using OpenCV

I'm working on a project that includes taking top-down photos from three stationary cameras mounted above a long white table. Long packs of veneer are placed on the table to be photographed. The three images then need to be stitched together. The goal is to achieve a smooth transition on the border. I'm trying to use OpenCV to achieve this, but I'm running into problems. The images are either stitched with huge overlap and distortion or, most commonly, just give an "assertion failed" error. I've tried using both "panorama" and "scans" modes to no avail. I'm using the Emgu.CV.NET wrapper, but the logic behind it should be the same.
The reason I'm not joining images manually is due to slight chromatic distortions towards the edges of images, which cause the pack size to slightly differ in the place where the connection should be. Also, the exact position of cameras may shift over time, causing further shifts. I was hoping the existing algorithms could fix that.
Am I using the stitcher wrongly, or is it unfit for this task? Can someone recommend other tools or methods for this problem?
Here are example images that make the stitcher error out.
Code used:
class ImageStitching
{
    public static Mat StitchImages(List<Mat> images)
    {
        Mat output = new Mat();
        VectorOfMat matVector = new VectorOfMat();
        foreach (Mat img in images)
        {
            matVector.Push(img);
        }
        Stitcher stitcher = new Stitcher(Emgu.CV.Stitching.Stitcher.Mode.Scans);
        Brisk detector = new Brisk();
        stitcher.SetFeaturesFinder(detector);
        stitcher.Stitch(matVector, output);
        return output;
    }
}

Related

Image plotted by plt.imshow() is inverted while same image by cv2_imshow() is fine, how do I know what my neural net gets?

Here are my snippets for both of them:
from google.colab.patches import cv2_imshow
import cv2
pt = '/content/content/DATA/testing_data/1/126056495_AO_BIZ-0000320943-Process_IP_Cheque_page-0001.jpg' ##param
img = cv2.imread(pt)
cv2_imshow(img)
and here is the other one
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
pt = '/content/content/DATA/testing_data/1/126056495_AO_BIZ-0000320943-Process_IP_Cheque_page-0001.jpg'
image = mpimg.imread(pt)
plt.imshow(image)
Now, the image in the second case is displayed inverted, while the image on my system is upright.
What I am mostly afraid of is that if my ML model is consuming an inverted image, that is probably hurting my accuracy. What could be the reason for this, and how do I fix it?
(PS: I cannot share the pictures, unfortunately, as they are confidential.)
(Run on Google Colab.)
Any help is appreciated.
Your picture is upside-down when you use one method for reading, and upright when you use the other method?
You use two different methods to read the image file:
OpenCV's cv2.imread()
Matplotlib's mpimg.imread()
They behave differently. OpenCV's imread() respects the EXIF orientation metadata in the file and rotates the image as instructed. Matplotlib's imread() does not.
Solution: stick to OpenCV's imread(); don't use Matplotlib's function for loading the file.
The issue is not with matplotlib. When plt.imshow() is called, it presents the image with an origin in the top left corner, i.e. the Y-axis grows downward. That corresponds to how cv.imshow() behaves.
If your plot does have a Y-axis growing upwards, causing the image to stand upside-down, then you must have set the plot up in specific ways that aren't presented in your question.
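If you want Matplotlib's loader to agree with OpenCV's, you can apply the EXIF orientation yourself before display or training. A small sketch using Pillow (the helper name is mine):

```python
import numpy as np
from PIL import Image, ImageOps

def load_upright(path):
    """Load an image and apply its EXIF orientation tag, matching
    what cv2.imread() does by default."""
    img = Image.open(path)
    img = ImageOps.exif_transpose(img)  # no-op if there is no orientation tag
    return np.asarray(img)
```

Feeding the output of this helper to both plt.imshow() and your model guarantees that what you inspect is exactly what the network consumes.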

How to show a stereo camera with the Oculus Rift?

I use OpenCV to show the left and right images from a stereo camera in a new window. Now I want to see the same thing on the Oculus Rift, but when I connect the Oculus, the image doesn't turn into the characteristic circled image that suits the lenses inside the Oculus...
Do I need to process the image myself? Isn't it automatic?
This is the code that shows the window:
cap >> frame; // cap = camera 1, cap2 = camera 2 (operator>> already grabs a frame, so an extra cap.read(frame) would skip one)
sz1 = frame.size();
// second camera
cap2 >> frame2;
sz2 = frame2.size();
cv::Mat bothFrames(sz2.height, sz2.width + sz1.width, CV_8UC3);
// Move the right boundary to the left.
bothFrames.adjustROI(0, 0, 0, -sz1.width);
frame2.copyTo(bothFrames);
// Move the left boundary to the right, and the right boundary to the right.
bothFrames.adjustROI(0, 0, -sz2.width, sz1.width);
frame.copyTo(bothFrames);
// Restore the original ROI.
bothFrames.adjustROI(0, 0, sz2.width, 0);
cv::imencode(".jpg", bothFrames, buf, params);
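The ROI juggling above just places the two frames side by side (camera 2 on the left, camera 1 on the right). For reference, the same composition in Python/NumPy is a one-liner, assuming both frames share height and type:

```python
import numpy as np

def side_by_side(left, right):
    """Compose two equal-height frames into one image, left to right."""
    assert left.shape[0] == right.shape[0], "frames must share height"
    return np.hstack((left, right))
```

To match the C++ snippet, you would call it as side_by_side(frame2, frame).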
I have another problem. I'm trying to add the OVR library to my code, but I get an "ambiguous symbol" error because some classes inside the OVR library clash with names in the System namespace... The error arises when I add:
#include "OVR.h"
using namespace OVR;
The SDK is meant to perform lens distortion correction, chromatic aberration correction (different refractive indices for different colors of light cause color fringing in the image without correction), time warp, and possibly other corrections in the future. Unless you have a heavyweight graphics pipeline that you're hand-optimizing, it's best to use the SDK rendering option.
You can learn about the SDK and different kinds of correction here:
http://static.oculusvr.com/sdk-downloads/documents/Oculus_SDK_Overview.pdf
It also explains how the distortion corrections are applied. The SDK is open source so you could also just read the source for a more thorough understanding.
To fix your namespace issue, just don't switch to the OVR namespace! Every time you refer to something from the OVR namespace, prefix it with OVR:: (e.g. OVR::Math). This is, after all, the whole point of namespaces :p

Using BlobTrackerAuto to track people in computer vision application

I am currently trying to develop a system that tracks people in a queue using EmguCV (an OpenCV wrapper). I started by running and understanding the VideoSurveilance example that's in the Emgu package I downloaded. Here is my code, based on the example:
private static void processVideo(string fileName)
{
    Capture capture = new Capture(fileName);
    MCvFont font = new MCvFont(Emgu.CV.CvEnum.FONT.CV_FONT_HERSHEY_SIMPLEX, 1.0, 1.0);
    BlobTrackerAuto<Bgr> tracker = new BlobTrackerAuto<Bgr>();
    // I'm using a class that I implemented for foreground segmentation
    MyForegroundExtractor fgExtractor = new MyForegroundExtractor();
    Image<Bgr, Byte> frame = capture.QueryFrame();
    fgExtractor.initialize(frame);
    while (frame != null)
    {
        Image<Gray, Byte> foreground = fgExtractor.getForegroundImg(frame);
        tracker.Process(frame, foreground);
        foreach (MCvBlob blob in tracker)
        {
            if (isPersonSize(blob))
            {
                frame.Draw((Rectangle)blob, new Bgr(0, 0, 255), 3);
                frame.Draw(blob.ID.ToString(), ref font,
                    Point.Round(blob.Center), new Bgr(255.0, 255.0, 255.0));
            }
        }
        CvInvoke.cvShowImage("window", frame);
        CvInvoke.cvWaitKey(1);
        frame = capture.QueryFrame();
    }
}
The above code is meant to process each frame of an AVI video and show the processed frame with red rectangles around each person in the scene. I didn't like the results I was getting with the IBGFGDetector<Bgr> class used in the VideoSurveilance example, so I am trying to use my own foreground detector, built from Emgu's functions such as CvInvoke.cvRunningAvg(), CvInvoke.cvAbsDiff(), CvInvoke.cvThreshold() and cvErode()/cvDilate(). I have a few issues:
The video starts with a few people already in the scene. I am not getting the blobs corresponding to the people that are in the scene when the video starts.
Sometimes I "lose" a person for a few frames: I had the red rectangle drawn around a person for several seconds/frames and it disappears and after a while is drawn again with a different ID.
As you can see from the sample code, I check if the blob may be a person checking its height and width (isPersonSize() method), and draw the red rectangle only in the ones that pass in the test. How can I remove the ones that are not person sized?
I want to measure the time a person stays in the scene. What's the best way to know when a blob disappeared? Should I store the IDs of the blobs that I think correspond to people in an array and at each loop check if each one is still there using tracker.GetBlobByID()?
I think I am getting better results if I don't process every frame in the loop. I added a counter variable and an if-statement to process only every third frame:
if (i % 3 == 0)
tracker.Process(frame, foreground);
I added the if-statement because the program execution was really slow. But when I did that, I was able to track people that I couldn't track before.
To summarize, I would really appreciate if someone that is more used to OpenCV/EmguCV helped me by saying if it is a good approach to track people using BlobTrackerAuto, and by helping me with the issues above. I get the feeling that I am not taking advantage of the tools EmguCV can provide me.

What processing steps should I use to clean photos of line drawings?

My usual method of 100% contrast and some brightness adjustment to tweak the cutoff point works reasonably well for cleaning up photos of small sub-circuits or equations for posting on E&R.SE; however, sometimes it's not quite that great, like with this image:
What other methods besides contrast (or instead of) can I use to give me a more consistent output?
I'm expecting a fairly general answer, but I'll probably implement it in a script (that I can just dump files into) using ImageMagick and/or PIL (Python) so if you have anything specific to them it would be welcome.
Ideally a better source image would be nice, but I occasionally use this on other folks' images to add some polish.
The first step is to equalize the illumination differences in the image while taking into account the white balance issues. The theory here is that the brightest part of the image within a limited area represents white. By blurring the image beforehand we eliminate the influence of noise in the image.
from PIL import Image
from PIL import ImageFilter
im = Image.open(r'c:\temp\temp.png')
white = im.filter(ImageFilter.BLUR).filter(ImageFilter.MaxFilter(15))
The next step is to create a grey-scale image from the RGB input. By scaling to the white point we correct for white balance issues. By taking the max of R,G,B we de-emphasize any color that isn't a pure grey such as the blue lines of the grid. The first line of code presented here is a dummy, to create an image of the correct size and format.
grey = im.convert('L')  # dummy copy, just to get an image of the right size and mode
width, height = im.size
impix = im.load()
whitepix = white.load()
greypix = grey.load()
for y in range(height):
    for x in range(width):
        w = tuple(max(1, c) for c in whitepix[x, y])  # guard against division by zero
        greypix[x, y] = min(255, max(255 * impix[x, y][0] // w[0],
                                     255 * impix[x, y][1] // w[1],
                                     255 * impix[x, y][2] // w[2]))
The result of these operations is an image that has mostly consistent values and can be converted to black and white via a simple threshold.
Edit: It's nice to see a little competition. nikie has proposed a very similar approach, using subtraction instead of scaling to remove the variations in the white level. My method increases the contrast in the regions with poor lighting, and nikie's method does not - which method you prefer will depend on whether there is information in the poorly lighted areas which you wish to retain.
My attempt to recreate this approach resulted in this:
for y in range(height):
    for x in range(width):
        greypix[x, y] = min(255, max(255 + impix[x, y][0] - whitepix[x, y][0],
                                     255 + impix[x, y][1] - whitepix[x, y][1],
                                     255 + impix[x, y][2] - whitepix[x, y][2]))
I'm working on a combination of techniques to deliver an even better result, but it's not quite ready yet.
One common way to remove the different background illumination is to calculate a "white image" from the image, by opening the image.
In this sample Octave code, I've used the blue channel of the image, because the lines in the background are least prominent in this channel (EDITED: using a circular structuring element produces less visual artifacts than a simple box):
pkg load image;  % imdilate/imerode/fspecial come from the image package
src = imread('lines.png');
blue = src(:,:,3);
mask = fspecial("disk", 10) > 0;  % logical disk-shaped structuring element
opened = imerode(imdilate(blue, mask), mask);
Result:
Then subtract this from the source image:
background_subtracted = opened-blue;
(contrast enhanced version)
Finally, I'd just binarize the image with a fixed threshold:
binary = background_subtracted < 35;
How about detecting edges? That should pick up the line drawings.
Here's the result of Sobel edge detection on your image:
If you then threshold the image (using either an empirically determined threshold or Otsu's method), you can clean up the result using morphological operations (e.g. dilation and erosion). That will help you get rid of broken/double lines.
As Lambert pointed out, you can pre-process the image using the blue channel to get rid of the grid lines if you don't want them in your result.
You will also get better results if you light the page evenly before you image it (or just use a scanner), because then you don't have to worry about global vs. local thresholding as much.

Dynamically alter or destroy a Texture2D for drawing and collision detection

I am using XNA for a 2D project. I have a problem and I don't know which way to solve it. I have a texture (an image) that is drawn to the screen for example:
|+++|+++|
|---|---|
|+++|+++|
Now I want to be able to destroy part of that structure/image so that it looks like:
|+++|
|---|---|
|+++|+++|
so that collision detection also works with the new image.
Which way would be better to solve this problem:
Swap the whole texture with another texture, that is transparent in the places where it is destroyed.
Use some trickery with spriteBatch.Draw(sourceRectangle, destinationRectangle) to get the desired rectangles drawn, and also do collision checking with this somehow.
Split the texture into 4 smaller textures, each of which will be responsible for its own drawing/collision detection.
Use some other smart-ass way I don't know about.
Any help would be appreciated. Let me know if you need more clarification/examples.
EDIT: To clarify I'll provide an example of usage for this.
Imagine a 4x4 piece of wall that when shot at, a little 1x1 part of it is destroyed.
I'll take the third option:
3 - Split the texture into 4 smaller
textures each of which will be
responsible for its own
drawing/collision detection.
It's not hard to do. Basically it's the same as a TileSet structure. However, you'll need to change your code to fit this approach.
Read a little about tiles at: http://www-cs-students.stanford.edu/~amitp/gameprog.html#tiles
Many sites and books discuss tiles and how to use them to build game worlds, but you can apply this logic to anything whose whole is composed of small parts.
Let me quickly note the other options:
1 - Swap the whole texture with
another texture, that is transparent
in the places where it is destroyed.
No... having a different image for every possible state is bad. What if you need to change the texture? Will you remake every image again?
2- Use some trickery with
spriteBatch.Draw(sourceRectangle,
destinationRectangle) to get the
desired rectangles drawn, and also do
collision checking with this somehow.
Unfortunately that doesn't work, because spriteBatch.Draw only works with rectangles :(
4 Use some other smart-ass way I don't
know about.
I can't imagine any magic for this. Maybe you could use another image as a mask, but that would be extremely expensive to process.
Check out this article at Ziggyware. It is about deformable terrain, and might be what you are looking for. Essentially, the technique involves setting the pixels you want to hide to transparent.
Option #3 will work.
A more robust system (if you don't want to be limited to boxes) would use per-pixel collision detection. The process basically works as follows:
Calculate a bounding box (or circle) for each object
Check to see if two objects overlap
For each overlap, blit the sprites onto a hidden surface, comparing pixel values as you go. If a pixel is already set when you try to draw the pixel from the second sprite, you have a collision.
Here's a good XNA example (another Ziggyware article, actually): 2D Per Pixel Collision Detection
Some more links:
Can someone explain per-pixel collision detection
XNA 2-d per-pixel collision
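The per-pixel test described above can be sketched with boolean masks; here NumPy stands in for the hidden surface, and each sprite is assumed to carry a 2D opacity mask:

```python
import numpy as np

def pixels_collide(mask_a, pos_a, mask_b, pos_b):
    """Per-pixel collision: masks are 2D boolean arrays (True = opaque),
    pos_* are (x, y) screen positions. First intersect the bounding
    boxes, then AND the overlapping mask regions."""
    ax, ay = pos_a
    bx, by = pos_b
    ah, aw = mask_a.shape
    bh, bw = mask_b.shape
    # bounding-box intersection (cheap early-out)
    left, right = max(ax, bx), min(ax + aw, bx + bw)
    top, bottom = max(ay, by), min(ay + ah, by + bh)
    if left >= right or top >= bottom:
        return False
    sub_a = mask_a[top - ay:bottom - ay, left - ax:right - ax]
    sub_b = mask_b[top - by:bottom - by, left - bx:right - bx]
    return bool(np.any(sub_a & sub_b))
```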
I ended up choosing option 3.
Basically I have a Tile class that contains a texture and a dimension. Dimension n means that there are n*n subtiles within that tile. I also have an array that keeps track of which subtiles are destroyed. My class looks like this in pseudocode:
class Tile
    texture
    dimension
    int[,] subtiles; // 0 or 1 for each subtile

    public Tile() // constructor
        subtiles = new int[dimension, dimension];
        initialize_subtiles_to(1);

    public Draw() // this is how we know which ones to draw
        // iterate over subtiles
        for (int i ...)
            for (int j ...)
                if (subtiles[i, j] == 1)
                    Vector2 draw_pos = new Vector2(i * tilewidth, j * tileheight);
                    spritebatch.Draw(texture, draw_pos);
In a similar fashion I have a collision method that will check for collision:
public bool collides(Rectangle rect)
    // iterate over subtiles
    for i ...
        for j ...
            if (subtiles[i, j] == 0) continue;
            subtile_rect = // figure out the rect for this subtile
            if (subtile_rect.intersects(rect))
                return true;
    return false;
And so on. You can imagine how to "destroy" certain subtiles by setting their respective value to 0, and how to check if the whole tile is destroyed.
Granted, with this technique the subtiles will all share the same texture, but so far I can't think of a simpler solution.
