There is an operation called Shift which is performed after DFT to bring zero-frequency components to the center of the frequency spectrum.
I have two questions regarding this operation:
Why doesn't/can't the DFT automatically center the zero-frequency component?
What happens if we don't perform the shift operation after the DFT? That is, how does it affect other image-processing tasks?
Can anyone point me to some material about this shift operation?
References:
- fftw shift zero-frequency to the image center
- Why we shift the zero-frequency component to the center of the spectrum?
The DFT, by definition, uses n=0..N-1 and k=0..N-1, where n is the index into the time-domain signal and k the index into the frequency-domain signal. k also corresponds to the frequency. The DFT is defined this way in analogy to Fourier series.
Since the frequency in the DFT is periodic, one can think of k=N-1 as corresponding to k=-1 instead. The shift function thus moves the upper half of the frequencies to the left of the origin, so they can be more readily interpreted as negative frequencies. But this is solely a convenience for display: it brings the frequency-domain signal into a form we are more familiar with, probably because it makes some Fourier analysis easier to explain, so textbooks display it this way, and hence we learn about Fourier analysis by looking at frequency plots with the origin in the middle.
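To make the index-to-frequency correspondence concrete, here is a small sketch (Python/NumPy assumed, not part of the original answer): fftfreq shows the upper bins as negative frequencies, and fftshift merely permutes the bins for display.

import numpy as np

N = 8
freqs = np.fft.fftfreq(N)
print(freqs)                    # [ 0.    0.125  0.25   0.375 -0.5  -0.375 -0.25  -0.125]
print(np.fft.fftshift(freqs))   # [-0.5  -0.375 -0.25  -0.125  0.    0.125  0.25   0.375]

x = np.random.rand(N)
X = np.fft.fft(x)
# Shifting only permutes the bins; ifftshift undoes it exactly.
assert np.allclose(np.fft.ifftshift(np.fft.fftshift(X)), X)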
For most tasks in image processing, we do not need to shift the origin. Again, it is only for display that this is convenient and pretty.
For example, to compute cross-correlation:
cc = ifft( fft(img1) * conj(fft(img2)) )
Here, the top-left pixel of cc is the origin. If img1==img2, the top-left pixel will be the maximum value. If we had an fft function that shifted the origin to the middle, then the cross-correlation image cc would have its origin in the middle also. After finding the peak, we'd need to do some computations to figure out what the shift between img1 and img2 is. (Not that this is complicated, but it shows that shifting is not necessarily advantageous.)
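As a hedged illustration (NumPy assumed, images and shift values made up), the following sketch recovers a known shift directly from the peak position of the unshifted cross-correlation:

import numpy as np

img1 = np.zeros((64, 64))
img1[20:30, 15:25] = 1.0
img2 = np.roll(img1, shift=(5, 7), axis=(0, 1))   # img2 is img1 shifted by (5, 7)

cc = np.fft.ifft2(np.fft.fft2(img1) * np.conj(np.fft.fft2(img2))).real
peak = np.unravel_index(np.argmax(cc), cc.shape)
# The peak index (modulo the image size) gives the shift with no re-centering arithmetic.
print(peak)   # (59, 57) == (-5, -7) mod 64, i.e. an img1 -> img2 shift of (5, 7)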
When convolving, one often has a spatial-domain kernel with the origin in the middle (as for example in this recent question). In this case, one must shift the origin to the top-left before computing the DFT. But there is no point in shifting the origin of the frequency-domain signals just to multiply them together and then undo the shift. One can simply directly multiply the signals that have the origin in the top-left corner:
kernel = ifftshift(kernel)
filtered = ifft( fft(img) * fft(kernel) )
Note that there are two different shift functions, usually called fftshift and ifftshift. fftshift moves the origin from the top-left to the middle; ifftshift moves it from the middle back to the top-left. These two functions do exactly the same thing for even-sized signals (images), but differ when the sizes are odd.
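A quick NumPy check (assumed environment) of that last point:

import numpy as np

even = np.arange(6)
odd = np.arange(7)

print(np.fft.fftshift(even))    # [3 4 5 0 1 2]
print(np.fft.ifftshift(even))   # [3 4 5 0 1 2]   -> identical for even sizes

print(np.fft.fftshift(odd))     # [4 5 6 0 1 2 3]
print(np.fft.ifftshift(odd))    # [3 4 5 6 0 1 2] -> they differ for odd sizes

# ifftshift undoes fftshift, which matters when moving a centered kernel
# to the top-left corner before calling fft.
assert np.array_equal(np.fft.ifftshift(np.fft.fftshift(odd)), odd)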
I need to find the orientation of corn kernels in pictures (examples below); they are tilted at different angles to the left or right. I need to turn them upright (rotated so their long axis is vertical, like a water drop).
Is there any way I can do it easily?
As a starting point, find the image moments (and Hu moments for complex shapes like a pear). From the link:
Information about image orientation can be derived by first using the
second order central moments to construct a covariance matrix.
I suspect that using an image-processing library like OpenCV would give more reliable results in the common case.
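As a minimal sketch of the moments approach (OpenCV and NumPy assumed; the file name and thresholding step are illustrative assumptions), the orientation of the major axis follows from the second-order central moments:

import cv2
import numpy as np

img = cv2.imread("corn.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
# may need THRESH_BINARY_INV depending on whether the kernel is darker than the background
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

m = cv2.moments(mask, binaryImage=True)
# Central moments normalized by the area give the covariance-matrix entries.
mu20 = m["mu20"] / m["m00"]
mu02 = m["mu02"] / m["m00"]
mu11 = m["mu11"] / m["m00"]

# Orientation of the major axis with respect to the x axis, in radians.
theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
angle_deg = np.degrees(theta)
print("major-axis angle:", angle_deg)

# Rotate so the major axis becomes vertical (upright "water drop" pose);
# the exact sign/offset may need adjusting for your coordinate convention.
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg + 90, 1.0)
upright = cv2.warpAffine(img, M, (w, h))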
From the OP I got the impression you are a rookie at this, so I will stick to something simple:
compute the bounding box of the image
Simple enough: go through all pixels and remember the min and max x,y coordinates of non-background pixels.
compute critical dimensions
Just cast a few lines through the bounding box and compute the red points' positions. Select the start points; I chose 25%, 50%, and 75% of the height. First scan from the left and stop at the first non-background pixel, then scan from the right and stop at the first non-background pixel. (A small code sketch of these first two steps appears after the list.)
axis aligned position
Start rotating the image in small steps and remember/stop at the position where the red dots are symmetric, i.e. at almost the same distance from the left and from the right. The bounding box also has maximal height and minimal width in the axis-aligned position, so you can exploit that instead ...
determine the position
You have 4 options. If I call the distances l0,l1,l2 and r0,r1,r2,
l means measured from the left, r means measured from the right,
0 is the upper (bluish) line, 1 the middle, 2 the bottom,
then your wanted position is where (l0==r0)>=(l1==r1)>=(l2==r2) and the bounding box is bigger along the y axis than along the x axis, so rotate by 90 degrees until a match is found, or determine the orientation directly from the distances and rotate just once ...
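Here is a rough sketch of steps 1 and 2 in Python/NumPy (my assumption for the language; the original uses C++ and VCL), given a boolean mask of non-background pixels:

import numpy as np

def bbox_and_scanlines(mask):
    # mask: 2D boolean array, True for non-background pixels (an assumption)
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()   # bounding box

    dists = []
    for frac in (0.25, 0.50, 0.75):                  # the three scan lines
        y = int(y0 + frac * (y1 - y0))
        row = np.nonzero(mask[y, x0:x1 + 1])[0]
        if row.size:                                 # guard: the line actually crosses the object
            left = row[0]                            # first hit scanning from the left
            right = (x1 - x0) - row[-1]              # first hit scanning from the right
            dists.append((left, right))
    return (x0, y0, x1, y1), dists

# The object is axis-aligned (upright) when each (left, right) pair is roughly
# symmetric and the bounding box is taller than it is wide.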
[Notes]
You will need access to the image pixels, so I strongly recommend using Graphics::TBitmap from VCL. Look here: gfx in C, especially the section GDI Bitmap, and also at finding horizon on high altitude photo, which might help a bit.
I use C++ and VCL so you have to translate to Pascal but the VCL stuff is the same...
I need to calculate the distance between two points using a webcam. Now the catch is that I don't need it to be related in any way to actual measurements in cm or whatever. What I want is to use different webcams of different resolutions and have them all give the same measurement. I'll explain.
Suppose I am viewing a square shape using a 640x480 webcam and it measures as one unit. I then view the same object from the same position using a 1024x768 webcam, and it should still measure as 1 unit. How do I do this?
You didn't mention the process by which you are measuring the dimensions of the object. I'm going to assume you are measuring with a single camera. You can take this method as a reference; it can be applied to any methodology.
Here are the steps to measure the size of an object:
How would you measure the length of a line drawn in this picture?
You need a ruler as a reference. To make this ruler you have to know the real-world size of the ruler, which in our case will be expressed in pixels.
Now make a reference graph. I'm going to use a unit line as the reference graph, on a centimeter scale.
Place this graph in front of the camera and detect the two red dots. Now calculate the number of pixels between these two points. Let's assume the distance is 1000 pixels, so 1 cm spans 1000 pixels and 1 pixel corresponds to 0.001 cm. Take this cm-per-pixel value as the Reference_pixels_count.
Repeat step 4 for every resolution and find the Reference_pixels_count for that resolution.
Now place an object and capture its image. Find the corners, cycle through each pair of corners, and find the distance between them in pixels. Multiply this distance by the Reference_pixels_count to get the actual dimension of the object.
NOTE: This method only works for flat objects with negligible depth change.
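A hedged sketch of the calibration arithmetic (Python assumed; the pixel counts and point coordinates are made-up examples):

import math

def cm_per_pixel(ref_pixels_between_marks, ref_length_cm=1.0):
    # e.g. 1000 px between the two red dots of a 1 cm mark -> 0.001 cm/px
    return ref_length_cm / ref_pixels_between_marks

def measure_cm(p1, p2, cm_per_px):
    pixel_dist = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return pixel_dist * cm_per_px

# Example: the same physical edge seen by two cameras of different resolution.
k_640  = cm_per_pixel(1000)   # calibration at 640x480 (assumed number)
k_1024 = cm_per_pixel(1600)   # calibration at 1024x768 (assumed number)
print(measure_cm((10, 10), (510, 10), k_640))    # 0.5
print(measure_cm((16, 16), (816, 16), k_1024))   # 0.5 -> same unit on both cameras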
When it comes to cascade classifiers (using Haar-like features), I always read that methods like AdaBoost are used to select the 'best' features for detection. However, this only works if there is some initial set of features to begin boosting with.
Given a 24x24 pixel image there are 162,336 possible Haar features. I might be wrong here, but I don't think libraries like OpenCV initially test against all of these features.
So my question is how are the initial features selected or how are they generated? Is there any guideline about the initial number of features?
And if all 162,336 features are used initially, how are they generated?
From your question I understand that you want to know what the 162,336 features are.
Starting from the 4 original Viola-Jones features (http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework),
we can generate 162,336 features by varying the size of the 4 original features and their position on the 24x24 input image.
For example, consider one of the original features, which has two rectangles adjacent to each other horizontally.
Let each rectangle be 1 pixel wide and 1 pixel tall. If the first rectangle sits at (0,0) of the 24x24 image, that is one feature; if you move it horizontally by one pixel (to (1,0)), its position changes, so that is a second feature. In this way you can move it horizontally up to (22,0), generating 23 features. Similarly, moving along the vertical axis from (0,0) up to (0,23) generates 24 positions. Covering every position on the image ((1,1), (1,2), ..., (22,23)) therefore generates 23*24 = 552 features.
Now let each rectangle be 2 pixels wide and 1 pixel tall. If the first rectangle starts at (0,0) and is moved along the horizontal axis up to (20,0) as above, we get 21 positions; since the height is still 1 pixel, moving along the vertical axis from (0,0) to (0,23) gives 24 positions. Covering every position on the image therefore gives 24*21 = 504 features.
Increasing the width of each rectangle by one pixel at a time while keeping the height at 1 pixel, and covering the complete image each time, so that the feature width grows from 2 to 24 pixels, gives: no. of features = 24*(23+21+19+...+3+1)
Now let each rectangle be 1 pixel wide and 2 pixels tall. If the first rectangle starts at (0,0) and is moved along the horizontal axis up to (22,0), we get 23 positions; since its height is 2 pixels, moving along the vertical axis from (0,0) to (0,22) gives 23 positions. Covering every position on the image therefore gives 23*23 = 529 features.
Similarly, increasing the width of each rectangle by one pixel while keeping the height at 2 pixels, and covering the complete image each time, so that the feature width grows from 2 to 24 pixels, gives: no. of features = 23*(23+21+19+...+3+1)
Now, increasing the height of each rectangle by 1 pixel each time the width sweep is finished, until the height of each rectangle reaches 24 pixels, gives
no. of features = 24*(23+21+19+...+3+1) + 23*(23+21+19+...+3+1) + 22*(23+21+19+...+3+1) + ... + 2*(23+21+19+...+3+1) + 1*(23+21+19+...+3+1)
= (24+23+...+1) * (23+21+...+1) = 300 * 144 = 43,200 features
Now consider the 2nd original Viola-Jones feature, which has two rectangles with one above the other (i.e. the rectangles are stacked vertically). Since this is just the 1st original feature rotated, it also gives
no. of features = 43,200
Similarly, following the same process for the 3rd original Viola-Jones feature, which has 3 rectangles arranged horizontally, we get
no. of features = 24*(22+19+16+...+4+1) + 23*(22+19+16+...+4+1) + 22*(22+19+16+...+4+1) + ... + 2*(22+19+16+...+4+1) + 1*(22+19+16+...+4+1)
= (24+23+...+1) * (22+19+...+4+1) = 300 * 92 = 27,600
Now, the feature with 3 rectangles arranged vertically (one on top of another) gives
no. of features = 27,600 (since it is just the 3rd original Viola-Jones feature rotated)
Lastly, the 4th original Viola-Jones feature, which has 4 rectangles, gives
no. of features = 23*(23+21+19+...+3+1) + 21*(23+21+19+...+3+1) + 19*(23+21+19+...+3+1) + ... + 3*(23+21+19+...+3+1) + 1*(23+21+19+...+3+1)
= (23+21+...+1) * (23+21+...+1) = 144 * 144 = 20,736
Now summing up all these features we get 43,200 + 43,200 + 27,600 + 27,600 + 20,736
= 162,336 features
Thus, from these 162,336 features, AdaBoost selects a subset to form the strong classifier.
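If you want to verify the total, here is a small brute-force sketch (Python assumed, not from the original derivation) that enumerates every position and scale of the five shapes inside a 24x24 window:

def count_features(win, base_w, base_h):
    count = 0
    for w in range(base_w, win + 1, base_w):       # feature widths: multiples of the base width
        for h in range(base_h, win + 1, base_h):   # feature heights: multiples of the base height
            count += (win - w + 1) * (win - h + 1) # all top-left positions where it still fits
    return count

WIN = 24
totals = {
    "two-rect horizontal (2x1 base)": count_features(WIN, 2, 1),   # 43200
    "two-rect vertical (1x2 base)":   count_features(WIN, 1, 2),   # 43200
    "three-rect horizontal (3x1)":    count_features(WIN, 3, 1),   # 27600
    "three-rect vertical (1x3)":      count_features(WIN, 1, 3),   # 27600
    "four-rect (2x2 base)":           count_features(WIN, 2, 2),   # 20736
}
for name, n in totals.items():
    print(name, n)
print("total:", sum(totals.values()))   # 162336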
I presume you're familiar with Viola and Jones' original work on this topic.
You start by manually choosing a feature type (e.g. Rectangle A). This gives you a mask with which you can train your weak classifiers. To avoid moving the mask pixel by pixel and retraining (which would take a huge amount of time without any better accuracy), you can specify how much the feature moves in the x and y direction per trained weak classifier. The size of your jumps depends on your data size. The goal is for the mask to be able to move in and out of the detected object. The size of the feature can also be variable.
After you've trained multiple classifiers with a respective feature (i.e. mask position), you proceed with AdaBoost and Cascade training as usual.
The number of features/weak classifiers is highly dependent on your data and experimental setup (i.e. also on the type of classifier you use). You'll need to test the parameters extensively to find out which types of features work best (rectangles, circles, tetris-like objects, etc.). I worked on this 2 years ago, and it took us quite a long time to evaluate which features and feature-generation heuristics yielded the best results.
If you want to start somewhere, just take 1 of the 4 original Viola/Jones features and train a classifier applying it anchored at (0,0). Train the next classifier at (x,0), the next at (2x,0), ..., then (0,y), (0,2y), (0,4y), ..., (x,y), (x,2y), etc.
And see what happens. Most likely you'll see that it's OK to have fewer weak classifiers, i.e. you can increase the x/y step values which determine how the mask slides. You can also have the mask grow or do other things to save time. The reason this "lazy" feature generation works is AdaBoost: as long as these features make the classifiers slightly better than random, AdaBoost will combine them into a meaningful classifier.
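As a hedged sketch of that lazy generation (Python assumed; the step and scale parameters are arbitrary placeholders you would tune), instead of every position and scale you anchor one base shape on a coarse grid and let AdaBoost pick the useful masks:

def lazy_feature_grid(win=24, base_w=2, base_h=1, step_x=4, step_y=4, scale_step=2):
    feats = []
    w, h = base_w, base_h
    while w <= win and h <= win:
        for y in range(0, win - h + 1, step_y):
            for x in range(0, win - w + 1, step_x):
                feats.append((x, y, w, h))        # top-left corner and size of the mask
        w, h = w * scale_step, h * scale_step     # grow the mask between passes
    return feats

print(len(lazy_feature_grid()))   # on the order of a hundred masks instead of 162,336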
It seems to me that there is a little bit of confusion here.
Even the accepted answer does not seem correct to me (maybe I haven't understood it well).
The original Viola-Jones algorithm, its main later improvements such as the Lienhart-Maydt algorithm, and the OpenCV implementation all evaluate each and every feature of the feature set in turn.
You can check the source code of Opencv (and whatever implementation you prefer).
At the end of function void CvHaarEvaluator::generateFeatures() you have numFeatures, which is just 162,336 for BASIC mode and size 24x24.
And all of them are checked in turn, since the whole feature set is passed in the form of featureEvaluator (source):
bool isStageTrained = tempStage->train( (CvFeatureEvaluator*)featureEvaluator, curNumSamples,
_precalcValBufSize, _precalcIdxBufSize, *((CvCascadeBoostParams*)stageParams) );
Every weak classifier is constructed by checking each feature and choosing the one that yields the best result at that point (in the case of a decision tree the process is similar).
After this choice, the weights of the samples are changed accordingly, so that at the next round a different feature, again from the whole feature set, will be selected.
A single feature evaluation is computationally cheap, but multiplied by numFeatures it can be demanding.
The whole training of a cascade can take weeks, but the bottleneck is not the feature evaluation process; it is the gathering of negative samples at the later stages.
From the wikipedia link you provided I read:
in a standard 24x24 pixel sub-window, there are a total of M= 162,336
possible features, and it would be prohibitively expensive to evaluate
them all when testing an image.
Don't be misled by this; it means that after the long training process your detection algorithm will be very fast, because it only needs to check the few features selected during training.
I am new to OpenCV, so bear with me if there are simple things that I am missing here.
I am trying to work out a camera based system that can continuously output the speed of a vehicle with the following assumptions:
1. The camera is placed horizontally and the vehicle passes within 3 to 5 feet of the camera lens.
2. The speed will not be more than 30 km/h.
I was hoping to start with the concept of an optical mouse, which detects displacement from the surface pattern. However, I am unclear as to how to handle the background when the vehicle starts to enter the frame.
There are two methods I was interested in experimenting with, but I am looking for further input.
Detect the vehicle as it enters the frame and separate it from the background.
Use cvGoodFeaturesToTrack to find points on the vehicle.
Track the points across the next frame and calculate the horizontal velocity using the Lucas-Kanade pyramid function for optical flow.
Repeat
Please suggest corrections and amendments.
Also I request more experienced members to help me code this procedure efficiently since I don't know which are the most correct functions to use here.
Thanks in advance.
Hopefully you will be using a simple camera with 20 to 30 fps, placed perpendicular to the road but away from it. The objects, i.e. your cars, have a maximum velocity of about 8 m/s (30 km/h) in the object plane; calculate the corresponding speed in the image plane with the help of the lens you are using:
( speed in object plane / distance of camera from road ) = ( speed in image plane / focal length )
You should get the result in pixels per second if you know how much each pixel measures on the sensor...
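A back-of-the-envelope sketch of that relation (Python assumed; every number below is an assumed placeholder for your actual lens, sensor, distance and frame rate):

speed_obj = 30 / 3.6        # 30 km/h in m/s (~8.3 m/s)
distance = 1.5              # camera-to-road distance in m (assumed, ~5 ft)
focal_len = 0.004           # focal length in m (assumed 4 mm lens)
pixel_pitch = 3e-6          # size of one pixel on the sensor in m (assumed)
fps = 30                    # assumed frame rate

speed_img = speed_obj * focal_len / distance    # speed in the image plane, m/s
speed_px = speed_img / pixel_pitch              # pixels per second
print(speed_px / fps, "pixels per frame")       # ~250 px/frame with these numbers

With numbers like these the car crosses the frame in only a few frames, which is worth knowing before choosing a tracking method.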
Steps...
You can use frame differencing: subtract the current frame from the previous frame and take the absolute difference, then threshold the difference. This segments your moving car from the background; remember that it segments all moving objects, so if you want a car and not a moving person you can use a shape characteristic such as the height-to-width ratio. Fit a rectangle to the segmented part and repeat the same steps in each frame, keeping a record of the coordinate of the leading edge of the bounding box in every frame. That way, from the moment a car enters the view until it passes out of it, you know how long the car has persisted. Use the number of frames, the frame rate, and the coordinates of the leading edge of the bounding box to calculate the speed.
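A minimal sketch of this frame-differencing method (Python and OpenCV 4.x assumed; the video file name and threshold value are placeholders):

import cv2

cap = cv2.VideoCapture("road.avi")          # hypothetical video file
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
leading_edges = []                          # leading-edge x of the box, one entry per frame

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)                      # frame differencing
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x return convention for findContours
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        c = max(contours, key=cv2.contourArea)          # assume the biggest blob is the car
        x, y, w, h = cv2.boundingRect(c)
        if w > h:                                       # crude car-vs-person shape check
            leading_edges.append(x + w)                 # or x, depending on travel direction
    prev = gray

# speed ~ pixels travelled by the leading edge * fps / number of frames it was visible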
You can use goodFeaturesToTrack and optical flow from OpenCV; that way you can distinguish between fast-moving and slow-moving objects. But keep refreshing the points that goodFeaturesToTrack gives you, or else any new car coming into the camera view will not be picked up. Record the displacement of the set of points picked by goodFeaturesToTrack in each frame; that is the displacement of the moving object, and you calculate the speed the same way. The basic idea for calculating speed is to record the number of frames the object persists in the camera's field of view: if your camera is fixed, so is your field of view, hence what matters is in how many frames you are able to catch the object.
Remember: OpenCV's optical flow works for tracking slow-moving objects, or more precisely, the displacement of the feature points (determined by goodFeaturesToTrack) between 2 consecutive frames must be small for the algorithm to work. Large displacements will produce erroneous predictions, which is why the speed in the image plane is important; you should at least have a qualitative idea of it.
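A hedged sketch of method 2 (Python and OpenCV assumed; the file name and parameters are placeholders), with the periodic refresh of the tracked points mentioned above:

import cv2

cap = cv2.VideoCapture("road.avi")          # hypothetical video file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    if pts is None or len(pts) < 20 or frame_idx % 10 == 0:
        # refresh the feature points periodically so newly entering cars are tracked too
        pts = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
    else:
        # pyramidal Lucas-Kanade optical flow between consecutive frames
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                  winSize=(21, 21), maxLevel=3)
        good_new = nxt[status.flatten() == 1]
        good_old = pts[status.flatten() == 1]
        if len(good_new):
            dx = (good_new[:, 0, 0] - good_old[:, 0, 0]).mean()  # mean horizontal shift, px/frame
            print(frame_idx, dx)
        pts = good_new.reshape(-1, 1, 2) if len(good_new) else None

    prev_gray = gray
    frame_idx += 1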
NOTE: both methods are for single-object tracking; for multiple-object tracking you need some modifications. However, you can start with either of the methods; I think it will work.