Subtract two images to remove the background with OpenCV

I'm using OpenCV 2.4.5 to track a fish in real time. The fish is inside a shuttle, which is also moving.
I only get frames from a grayscale camera.
I'm trying to isolate the fish using two images.
Before applying a threshold, I want to remove the background.
I tried subtracting the two images, but it doesn't work for the parts of the fish outside the shuttle.
Here are the two frames and my two results: https://app.box.com/s/3iug7wan8vz75j3usv7w
My code is as simple as this:
#include <opencv2/opencv.hpp>
using namespace cv;

Mat fg = imread("fg.tif", 1);       // current frame (fish present)
Mat bg = imread("bg.tif", 1);       // reference background frame
Mat result1 = abs(fg - bg);         // absolute difference
imwrite("withoutMask.tif", result1);
Mat result2;
bitwise_and(fg, result1, result2);  // keep original intensities where the frames differ
imwrite("withMask.tif", result2);
It works when the fish stays inside the shuttle, but not when it is outside.
The problem is that the part of the tail outside the shuttle should come out with the same intensity as the part inside.
I would really appreciate it if somebody could help me with this.
Thanks in advance.

Motion in video is almost always smooth from frame to frame, so the best approach for video processing is to estimate the moves and shifts between frames.
Your problem is that the shuttle is moving and the fish is swimming inside it.
1) Calculate motion vectors for small blocks of size 8 or 16 (or 4 if the image is low-resolution).
2) Using the motion vectors of many small blocks and the correlation between them, you can identify a few neighbouring blocks moving in a particular direction: that is your object/fish (you will know the approximate size and size variation of your fish :P).
3) The scene/background/shuttle makes up the rest of the image, so if you find lots of blocks with motion vectors of the same direction and magnitude, that is the background. You can also detect the shuttle using its corners.
Keep track of the last position of the fish to avoid heavy processing: search in the region close to the last position rather than in the whole image. (As I said, movement is smooth, so the next position will be very close to the previous one; there is no need to search the whole image.)
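As a rough illustration of step 1, here is a minimal block-matching sketch (my own, not the answerer's exact method; the block size, search radius, and function names are assumptions) that estimates one motion vector per block with matchTemplate:

#include <opencv2/opencv.hpp>
#include <vector>

struct BlockMV { cv::Point block; cv::Point2f mv; };

// For each blockSize x blockSize block in the previous frame, search a
// small window in the current frame for the best match (TM_SQDIFF) and
// record the displacement as that block's motion vector.
std::vector<BlockMV> blockMotion(const cv::Mat& prev, const cv::Mat& curr,
                                 int blockSize = 16, int search = 8)
{
    std::vector<BlockMV> vectors;
    for (int y = 0; y + blockSize <= prev.rows; y += blockSize) {
        for (int x = 0; x + blockSize <= prev.cols; x += blockSize) {
            cv::Mat templ = prev(cv::Rect(x, y, blockSize, blockSize));
            cv::Rect win(x - search, y - search,
                         blockSize + 2 * search, blockSize + 2 * search);
            win &= cv::Rect(0, 0, curr.cols, curr.rows);  // clip to image
            cv::Mat score;
            cv::matchTemplate(curr(win), templ, score, cv::TM_SQDIFF);
            cv::Point best;
            cv::minMaxLoc(score, 0, 0, &best, 0);  // minimum = best for TM_SQDIFF
            vectors.push_back({cv::Point(x, y),
                               cv::Point2f(float(win.x + best.x - x),
                                           float(win.y + best.y - y))});
        }
    }
    return vectors;
}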

There is a video module in OpenCV that provides some good background subtraction algorithms; take a look at it: http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#backgroundsubtractor
Moreover, there is a useful discussion here: http://answers.opencv.org/question/7998/object-detection-using-background-subtraction/
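For example, with the OpenCV 2.4 API the question uses, a minimal BackgroundSubtractorMOG2 loop might look like this (the video source is a placeholder):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("fish.avi");        // placeholder video source
    cv::BackgroundSubtractorMOG2 subtractor; // learns the background online
    cv::Mat frame, fgMask;
    while (cap.read(frame)) {
        subtractor(frame, fgMask);   // updates the model and outputs the mask
        cv::imshow("foreground", fgMask);
        if (cv::waitKey(30) >= 0) break;
    }
    return 0;
}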

Related

OpenCV - align stack of images - different cameras

We have a camera array arranged in an arc around a person (red dot). Think The Matrix: each camera fires at the same time, and then we create an animated GIF from the output. The problem is that it is nearly impossible to align the cameras exactly, so I am looking for a way in OpenCV to align the images better and make the result smoother.
I'm looking for general steps; I'm unsure of the order I would do things in. If I start with image 1 and match image 2 to it, then image 2 ends up further from image 3 than it was at the start, so matching 3 to 2 would be a bigger change... and the error would propagate. I have seen similar alignments done, though. Any help much appreciated.
Here's a thought: how about performing a quick and very simple "calibration" of the imaging system using a single reference point?
The best thing about this is that you can try it out quickly, and even if the results are too poor for you, they can give you more insight into the problem. The bad thing is that it may just not be good enough, because it's hard to think of anything "less advanced" than this. Here's the description:
Remove the object from the scene.
Place a small object (let's call it a "dot") at a position that roughly corresponds to the center of mass of the object you are about to record (the center of the area denoted by the red circle).
Record a single image with each camera.
Use some simple algorithm to find the position of the dot in every image.
Compute the distance from the dot position to the image center in every image.
Shift each image by (-x, -y), where (x, y) is the above-mentioned distance; after that, the dot should be located at the center of every image.
When recording the actual object, use these precomputed distances to shift all images. After you translate the images, they will be roughly aligned. But since you are shooting an object that is three-dimensional and of considerable size, I am not sure whether the alignment will be very convincing... I wonder what results you'd get, actually.
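A minimal sketch of the final shifting step, assuming the dot position has already been found in each calibration image (the helper name is mine):

#include <opencv2/opencv.hpp>

// Translate an image so the calibration dot lands at the image centre.
cv::Mat shiftToCentre(const cv::Mat& img, cv::Point2f dot)
{
    cv::Point2f centre(img.cols / 2.0f, img.rows / 2.0f);
    cv::Point2f d = centre - dot;  // the (-x, -y) shift described above
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, d.x,
                                           0, 1, d.y);  // pure translation
    cv::Mat shifted;
    cv::warpAffine(img, shifted, M, img.size());
    return shifted;
}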
If I understand the application correctly, you should be able to obtain the relative pose of each camera in your array using homographies:
https://docs.opencv.org/3.4.0/d9/dab/tutorial_homography.html
From here, the next step would be to correct for alignment issues by estimating the transform between each camera's actual position and its 'ideal' position in the array. These ideal positions could be computed relative to a single camera, or relative to the focus point of the array (which may simplify the calculation). For each image, applying this corrective transform produces an image that 'looks like' it was taken from the 'ideal' position.
Note that you may need to estimate relative camera pose in 3-4 array 'sections', as it looks like you have a full 180-degree array (e.g. estimate homographies for 4-5 cameras at a time). As long as there is some overlap between sections, it should work out.
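As a sketch of the homography step (ORB features and the RANSAC threshold are my choices, not the answerer's; OpenCV 3.x API):

#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the homography mapping points in imgA to points in imgB
// from ORB feature matches.
cv::Mat estimatePairHomography(const cv::Mat& imgA, const cv::Mat& imgB)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kpA, kpB;
    cv::Mat descA, descB;
    orb->detectAndCompute(imgA, cv::noArray(), kpA, descA);
    orb->detectAndCompute(imgB, cv::noArray(), kpB, descB);

    cv::BFMatcher matcher(cv::NORM_HAMMING, true);  // cross-check matches
    std::vector<cv::DMatch> matches;
    matcher.match(descA, descB, matches);
    if (matches.size() < 4) return cv::Mat();       // not enough correspondences

    std::vector<cv::Point2f> ptsA, ptsB;
    for (const cv::DMatch& m : matches) {
        ptsA.push_back(kpA[m.queryIdx].pt);
        ptsB.push_back(kpB[m.trainIdx].pt);
    }
    // RANSAC rejects outlier matches; warping imgA by the result makes it
    // 'look like' it was taken from imgB's viewpoint.
    return cv::findHomography(ptsA, ptsB, cv::RANSAC, 3.0);
}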
Most of my experience with this sort of thing comes from using MATLAB's stereo camera calibrator app and related functions. Their help page gives a good overview of how to get started estimating camera pose. OpenCV has similar functionality.
https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
The paper by Zhang cited there gives a great description of the mathematics of pose estimation from correspondences, if you're interested.

Detecting balls on a pool table

I'm currently working on a project where I need to get the positions of the balls on a pool table very reliably.
I'm using a Kinect v2 above the table as the source.
The initial image looks like this (after converting it from 16-bit to 8-bit by throwing away pixels that are not around table level):
Then I subtract a reference image of the empty table from the current image.
After thresholding and equalization it looks like this: (image)
It's fairly easy to detect the individual balls in a single image; the problem is that I have to do it continuously at 30 fps.
Difficulties:
Low-resolution image (512x424); a ball is around 4-5 pixels in diameter
The Kinect depth image has a lot of noise at this distance (2 meters)
Balls look different in the depth image; for example, the black ball appears kind of inverted compared to the others
If the balls touch each other, they can become one blob in the image; if I try to separate them with depth thresholding (using only the top of the balls), some of the balls can disappear from the image
It's really important that nothing other than balls is detected, e.g. cue, hands, etc.
My process, which kind of works but is not reliable enough:
16-bit to 8-bit conversion by thresholding
Subtracting a sample image of the empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary thresholding
Contour finding
Some further algorithms on the output coordinates
The problem is that a pool cue or a hand can be detected as a ball, and two balls touching can also cause issues. I also tried Hough circles, but with even less success. (It works nicely if the Kinect is closer, but then it can't cover the whole table.)
Any clues would be much appreciated.
Expanding on the comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" it with computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table (the image you posted could be 117% bigger and still cover the pockets).
Align the camera so it is perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top-down view than a slightly tilted one (which is what the depth gradient shows). (Sure, the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight?)
With a more reliable setup you should be able to threshold based on depth.
You can possibly threshold down to the centre of the balls, since the information below is occluded anyway. The balls do not deform, so if the radius decreases fast, the ball probably went into a pocket.
Once you have a clean thresholded image, you can use findContours() and minEnclosingCircle(). Additionally, you should constrain the result with minimum and maximum radius values to reject other objects that may be in view (hands, pool cues, etc.). Also have a look at moments(), and be sure to read Adrian's excellent Ball Tracking with OpenCV article.
It uses Python, but you should be able to find the equivalent OpenCV calls for the language you use.
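A minimal sketch of that contour step, assuming a clean binary mask (the radius bounds are placeholders to tune for the 4-5 px balls):

#include <opencv2/opencv.hpp>
#include <vector>

// Return the centres of blobs whose enclosing circle is ball-sized.
std::vector<cv::Point2f> findBalls(const cv::Mat& binaryMask,
                                   float minRadius = 2.0f, float maxRadius = 4.0f)
{
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(binaryMask.clone(), contours,  // clone: findContours may modify input
                     cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    std::vector<cv::Point2f> centres;
    for (size_t i = 0; i < contours.size(); ++i) {
        cv::Point2f centre;
        float radius;
        cv::minEnclosingCircle(contours[i], centre, radius);
        // Reject blobs too small or large to be a ball (hands, cues,
        // merged blobs of touching balls).
        if (radius >= minRadius && radius <= maxRadius)
            centres.push_back(centre);
    }
    return centres;
}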
In terms of tracking:
If you use OpenCV 2.4, you should look into its tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has its own list of contributed tracking algorithms (such as TLD).
I recommend starting with moments first: use the simplest and least computationally expensive setup initially and see how robust the results are before moving to the more complex algorithms (which will take time to understand and tune before they produce the expected results).
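For reference, the moments-based centroid mentioned above is only a few lines (a sketch; the helper name is mine):

#include <opencv2/opencv.hpp>
#include <vector>

// Centroid of a contour from its spatial moments.
cv::Point2f blobCentroid(const std::vector<cv::Point>& contour)
{
    cv::Moments m = cv::moments(contour);
    if (m.m00 == 0) return cv::Point2f(-1.f, -1.f);  // degenerate contour
    return cv::Point2f(float(m.m10 / m.m00), float(m.m01 / m.m00));
}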

People detection using HOG not finding anyone

I have a video of soccer in which the players are relatively far away from the camera and thus occupy small portions of the image. I'm using background subtraction to detect the players and the results are fine, but I have been asked to try detection using HOG.
I tried using detectMultiScale with the default descriptors provided by OpenCV, but I can't get any detections. I don't really understand how to make it work in this case, because on other sequences where the people are closer to the camera the detector works fine.
Here is a sample image: link
Thanks.
The descriptor you use with HOG determines the minimum size of person you can detect: with the DefaultPeopleDetector the detection window is 128 pixels high x 64 wide, so you can detect people around 90 px high. With the Daimler descriptor the detectable size is a bit smaller.
Your pedestrians are still too small for this, so you may need to magnify the whole image, or just the parts which show up as foreground in your background segmentation.
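A hedged sketch of the magnification idea (the scale factor and file name are placeholders):

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat frame = cv::imread("soccer.png");      // placeholder file name
    cv::Mat big;
    cv::resize(frame, big, cv::Size(), 2.0, 2.0);  // 2x magnification

    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    std::vector<cv::Rect> people;
    hog.detectMultiScale(big, people);

    // Map detections back to original-image coordinates.
    for (size_t i = 0; i < people.size(); ++i) {
        people[i].x /= 2; people[i].y /= 2;
        people[i].width /= 2; people[i].height /= 2;
    }
    return 0;
}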
Have a look at the function definition for detectMultiScale: http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#cascadeclassifier-detectmultiscale
It might be that you need to reduce the value of minSize so as to detect smaller people, or the people might just be too far away.

Idea needed about vehicle counting by analysing the white pixels of the binary foreground image

I am a bit new to image processing, so I'd like to ask you about finding the optimal solution for my problem; I'm not asking for code.
I couldn't think of a good idea yet, so I wanted to ask for your advice. I hope you can help.
I'm working on a project in OpenCV that is about counting vehicles from a video file or a live camera. Other people working on such projects generally track the moving objects and then count them, but I wanted to take a different viewpoint: ask the user to set a ROI (region of interest) on the video window and work only within this region (for a few reasons, such as not dealing with the whole frame and gaining some performance), as seen below. (By the way, the user can set more than one ROI, and is asked to set the height of each ROI to roughly two times a normal car, by sense of proportion.)
I've made some basic progress so far, like background updating, morphological filtering, and thresholding, and I can get the moving object as a binary image, something like below.
After that, I tried counting the white pixels of the final thresholded foreground frame and estimating whether it was a car or not by checking the total number of white pixels (I set a lower bound with a static calculation based on the height of the ROI). To illustrate, I drew a sample graphic:
As you can see from the graphic, it was easy to count the white pixels, check whether they trace a curve over time, and determine whether it was a car or just noise.
I was quite successful until two cars passed through my ROI together at the same time. My algorithm failed and counted them as one car, as you can guess :/ I tried different approaches to this problem, and to similar ones like long vehicles, but I couldn't find an optimal solution so far.
My question is: is it impossible to handle this task with this pixel-counting approach? If it is possible, what would your suggestion be? I hope some of you have faced something similar before and can help me.
All ideas are welcome; thanks in advance, friends.
Isolate the traffic from the background: take two images, run a high-pass filter on one of them, and convert the other to a binary image. Use the binary image to mask the filtered one; you should then be able to use edge detection to identify the roof of each vehicle as a quadrilateral and compute a relative measure of it (a sketch follows the list below).
You then have four scenarios:
no quadrilaterals - no cars
large quadrilaterals - trucks
multiple small quadrilaterals - several cars
a single small quadrilateral - one car
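A hypothetical sketch of the roof-as-quadrilateral idea (the Canny thresholds and approximation tolerance are guesses to tune):

#include <opencv2/opencv.hpp>
#include <vector>

// Find convex 4-corner contours (candidate vehicle roofs) in the masked image.
std::vector<std::vector<cv::Point> > findRoofQuads(const cv::Mat& masked)
{
    cv::Mat edges;
    cv::Canny(masked, edges, 50, 150);

    std::vector<std::vector<cv::Point> > contours, quads;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i) {
        std::vector<cv::Point> approx;
        cv::approxPolyDP(contours[i], approx,
                         0.02 * cv::arcLength(contours[i], true), true);
        if (approx.size() == 4 && cv::isContourConvex(approx))
            quads.push_back(approx);
    }
    return quads;
}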
In answer to your question "Is it possible to do this using pixel counting?":
The short answer is no, for the very reason you're quoting: mere pixel counting of static images is not enough.
If you are limited to pixel counting, you can try looking at the pixel-count velocity (the change in pixel count between successive frames); you might pick out different "velocity" shapes when one car, two cars, or a truck passes.
But just plain pixel counting? No. You need shape (geometric) information as well.
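A minimal sketch of that pixel-count-velocity idea (the ROI and video source are placeholders; OpenCV 2.4-style API):

#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap("traffic.avi");       // placeholder video source
    cv::BackgroundSubtractorMOG2 subtractor;   // OpenCV 2.4-style API
    cv::Rect roi(100, 200, 320, 120);          // example user-selected ROI

    cv::Mat frame, fgMask;
    int prevCount = 0;
    while (cap.read(frame)) {
        subtractor(frame, fgMask);
        int count = cv::countNonZero(fgMask(roi));
        int velocity = count - prevCount;      // signed change per frame
        std::printf("count=%d velocity=%d\n", count, velocity);
        prevCount = count;
    }
    return 0;
}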
If you apply any kind of thresholding algorithm (e.g. for background subtraction), don't forget to update the background whenever light levels change (e.g. day and night). Also consider the grief when it is partly cloudy, with sharp cloud shadows moving across your image.

OpenCV: Detect blinking lights in a video feed

I have a video feed. This video feed contains several lights blinking at different rates. All lights are the same color (they are all infrared LEDs). How can I detect the position and frequency of these blinking lights?
Disclaimer: I am extremely new to OpenCV. I do have a copy of Learning OpenCV, but I am finding it a bit overwhelming. If anyone could explain a solution in OpenCV terminology, it would be greatly appreciated. I am not expecting code to be written for me.
Threshold each image in the sequence with a threshold that makes the LEDs visible. If you can choose a threshold that keeps only the LEDs and removes the background, then you are more or less finished, since all you need to do now is keep track of each position that has seen an LED and count how often it occurs.
An intermediate step, if there is "background noise" in the thresholded image, would be to use erosion to remove small mistakes, and then maybe dilation to "close holes" in the blobs you are actually interested in.
If the scene is static, you could also build a simple background model by taking the median of a few frames, subtracting the resulting median image from each frame, and thresholding that. Stuff that has changed (your LEDs) will appear stronger.
If the scene is moving, I see no other (easy) solution than making sure the LEDs are bright enough to use the threshold approach given above.
As for OpenCV: if you know what you want to do, it is not very hard to find a function that does it. The hard part is coming up with a method to solve the problem, not the actual coding.
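A minimal sketch of the threshold + morphology + median-background steps above (the threshold value is a placeholder):

#include <opencv2/opencv.hpp>

// Produce a binary mask of the LEDs given a precomputed median background.
cv::Mat ledMask(const cv::Mat& gray, const cv::Mat& medianBackground)
{
    cv::Mat diff, mask;
    cv::absdiff(gray, medianBackground, diff);  // changed stuff (LEDs) appears stronger
    cv::threshold(diff, mask, 60, 255, cv::THRESH_BINARY);

    // Erode to remove small mistakes, then dilate to close holes in the blobs.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::erode(mask, mask, kernel);
    cv::dilate(mask, mask, kernel);
    return mask;
}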
If the LEDs are stationary, the problem is far simpler than when they are moving. Assuming they are stationary, one way to find the frequency is simply to keep a vector or array for each pixel location in which you store the values of that pixel, preferably after the preprocessing described by kigurai, over some time frame. You can then compute the 1D Fourier transform of those value vectors and find the fundamental frequency as the first significant component after the DC peak. If the DC peak is too low, it means there is no LED there.
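A hypothetical sketch of that per-pixel frequency estimate, assuming a thresholded on/off history has been collected for one pixel:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// history: thresholded values (0 or 1) of a single pixel over N frames.
// Returns the dominant blink frequency in Hz.
double dominantFrequency(const std::vector<float>& history, double fps)
{
    cv::Mat signal(history, true);              // N x 1, CV_32F
    cv::Mat spectrum;
    cv::dft(signal, spectrum, cv::DFT_COMPLEX_OUTPUT);

    // Strongest bin after the DC component (bin 0).
    double bestMag = 0.0;
    int bestBin = 0;
    int n = (int)history.size();
    for (int k = 1; k < n / 2; ++k) {
        cv::Vec2f c = spectrum.at<cv::Vec2f>(k);
        double mag = std::sqrt(double(c[0]) * c[0] + double(c[1]) * c[1]);
        if (mag > bestMag) { bestMag = mag; bestBin = k; }
    }
    return bestBin * fps / n;
}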
I hope this problem is still somewhat relevant, and that my solution makes sense.
