Using Image recognition inside a Video with Vuforia and Unity - image-processing

I'm trying to do image recognition / tracking inside a video file played from Unity. Is it possible to do image recognition on a video file (not augmented reality) using the Vuforia API?
If not, does anyone else have any suggestions on how to accomplish this?
Thanks!

If you want to recognize a particular frame in your video stream, the simplest effective solution is to match the histogram of your sample frame against the histograms of the frames in your video stream. I don't know whether it can be done using the Vuforia API, but if you are interested in implementing an image processing algorithm yourself, the process is quite simple:
1) Convert your sample image to gray (if it's a color image).
2) Calculate the image histogram for a certain number of bins.
3) Store this histogram in a variable.
4) Now run your video file in a loop and extract frames from it. Apply the above 3 steps to each frame to get a histogram of the same size as that of the sample image.
5) Find the distance between the two histograms using a simple sum of squared differences and put a similarity threshold on it. If the distance is less than your threshold, the frame is quite similar to the sample image.
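The five steps above can be sketched in plain Python as follows. This is only a minimal illustration, not a Vuforia or Unity API: the function names, the 32 bins, the 0.01 threshold, and the use of flat lists of gray values in place of decoded video frames are all assumptions. The histograms are also normalized here (my choice, not part of the steps) so that the threshold does not depend on the frame size.

```python
def gray_histogram(pixels, bins=32):
    """Normalized histogram of 8-bit gray values (0..255) using `bins` bins."""
    hist = [0] * bins
    for value in pixels:
        hist[value * bins // 256] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """Sum of squared differences between two histograms of equal length."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def find_matching_frames(sample, frames, threshold=0.01, bins=32):
    """Return the indices of frames whose histogram is close to the sample's."""
    sample_hist = gray_histogram(sample, bins)
    return [
        index
        for index, frame in enumerate(frames)
        if histogram_distance(sample_hist, gray_histogram(frame, bins)) < threshold
    ]


# Tiny synthetic "video": flat lists of gray values stand in for decoded frames.
sample = [10] * 50 + [200] * 50
frames = [
    [128] * 100,             # uniform mid-gray frame
    [12] * 50 + [205] * 50,  # nearly the same content as the sample
    [200] * 50 + [10] * 50,  # same gray values in a different layout
]
matches = find_matching_frames(sample, frames)
print(matches)
```

Note that the third frame also matches: a histogram ignores where the pixels are, which is one reason to move on to the other approaches when layout matters.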
Another approach may be:
1) Find the color covariance matrix of the input sample (if it's a color image).
2) To compute it, convert each color channel (R, G, B) into a column vector and put them column-wise into a single variable, e.g. [R,G,B].
3) Get the column-wise mean and subtract it from each value of the respective column (center your data around the mean).
4) Now transpose your 3-column matrix and multiply it like:
Cov = [R,G,B]^T * [R,G,B];
5) The above will give you a 3 by 3 matrix.
6) Do the above for each frame and find the distance between the covariance matrix of the sample image and that of the query frame. Put a threshold on it to decide similarity.
A further extension of the above may be to find the eigenvalues of the covariance matrix and use them as features for the similarity calculation.
You can also try extracting a color histogram rather than a grayscale histogram.
For more complex situations you can go with key-point detection and matching approaches.
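The covariance approach can be sketched like this in plain Python. The function names are made up for illustration, and two details are my assumptions rather than part of the steps above: the product is divided by the number of pixels so that frames of different sizes stay comparable, and the Frobenius norm is used as the distance between the two 3 by 3 matrices.

```python
def color_covariance(pixels):
    """3x3 color covariance matrix of a list of (R, G, B) tuples."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Center every channel around its mean.
    centered = [[p[c] - means[c] for c in range(3)] for p in pixels]
    # Cov = X^T * X / n, where X is the centered n-by-3 matrix [R,G,B].
    return [
        [sum(row[i] * row[j] for row in centered) / n for j in range(3)]
        for i in range(3)
    ]


def covariance_distance(cov_a, cov_b):
    """Frobenius distance between two 3x3 matrices."""
    return sum(
        (cov_a[i][j] - cov_b[i][j]) ** 2 for i in range(3) for j in range(3)
    ) ** 0.5


sample_cov = color_covariance([(255, 0, 0), (0, 0, 0)])
similar_cov = color_covariance([(250, 0, 0), (5, 0, 0)])
other_cov = color_covariance([(0, 0, 255), (0, 0, 0)])

d_similar = covariance_distance(sample_cov, similar_cov)
d_other = covariance_distance(sample_cov, other_cov)
print(d_similar, d_other)
```

A frame would count as a match when its distance falls below a threshold that you tune on your own footage.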
Thank You

Related

How to assess image quality using image comparison

I would like to compare videos by quality (how non-blurry they are) by writing a C program. Someone told me to learn about the DFT (Discrete Fourier Transform) for image analysis and to use an FFT or DFT tool to learn the difference between blurred and detailed (non-blurry) copies of the same image.
(copied from other question):
Lets say we have different files with different video quality, one is extremely clear, other is blurred, one is having rough colors. Compare all files basically frame by frame and report to the user which has better quality.
So can anyone help me with this ??
Let's say we have various files having different video quality:
one is extremely clear, other is blurred, one is having rough colors.
Compare all files basically frame by frame and report to the user which has better quality.
(1) Color Quality detection...
To check which has better color, you analyze the histograms of the test images. A histogram is a count of how many pixels have intensity X, where X is a number from 0 to 255 (because each of the red, green and blue channels holds one of those 256 possible intensities).
There are many tutorials online about how to create a histogram since it's a basic task in computer graphics.
Generally it goes like:
First make 3 arrays (eg: hist_Red) to hold data for red, green and blue channels.
Break up (using FOR loop) each pixel into individual R/G/B channel components:
example:
temp_Red = this_pixel >> 16 & 0x0ff;
temp_Grn = this_pixel >> 8 & 0x0ff;
temp_Blu = this_pixel >> 0 & 0x0ff;
Then add +1 to that specific red/green/blue intensity in relevant histogram.
example:
hist_Red[ temp_Red ] += 1;
hist_Grn[ temp_Grn ] += 1;
hist_Blu[ temp_Blu ] += 1;
By accumulating these counts for red, green and blue, you will have the intensity distribution of each RGB channel in an array, which can be plotted as a chart. Check which image's arrays have the most values to find the image with the better color quality.
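For comparison, here is the same histogram loop as a short Python sketch. It assumes pixels packed as 0xRRGGBB integers, as in the shifts above. The `used_levels` helper is hypothetical: counting how many intensity levels actually occur is just one possible reading of "has the most values", not something the answer specifies.

```python
def rgb_histograms(pixels):
    """Build 256-bin histograms from pixels packed as 0xRRGGBB integers."""
    hist_red, hist_grn, hist_blu = [0] * 256, [0] * 256, [0] * 256
    for pixel in pixels:
        hist_red[(pixel >> 16) & 0xFF] += 1
        hist_grn[(pixel >> 8) & 0xFF] += 1
        hist_blu[pixel & 0xFF] += 1
    return hist_red, hist_grn, hist_blu


def used_levels(hist):
    """Number of intensity levels that occur at least once (hypothetical score)."""
    return sum(1 for count in hist if count > 0)


pixels = [0xFF8000, 0xFF8000, 0x102030]
hist_red, hist_grn, hist_blu = rgb_histograms(pixels)
print(used_levels(hist_red), used_levels(hist_grn), used_levels(hist_blu))
```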
(2) Detailed vs Blurred detection...
You can try using a convolution filter to detect blur in an image. Give the filter a kernel (e.g. a matrix). The 3x3 matrix shown below gives an edge-detect filter; blurred images give fewer edges (and therefore more black pixels).
Use logic to assume that more black pixels equals a more blurred image (less detail).
You can read about convolutions here
Lode's Computer Graphics Tutorial: Image Filtering
Image Convolution with C/C++ code
PDF Image Manipulation: Filters and Convolutions
PDF Read page 10 onwards : Convolution filters
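As a rough sketch of the blur check described in (2), the following Python applies a 3x3 edge-detect kernel and measures the fraction of dark output pixels. The kernel values are an assumption (a common Laplacian-style edge kernel), since the matrix from the answer is not reproduced here; the function names and the darkness threshold of 32 are also made up for illustration.

```python
# Assumed edge-detect kernel; it is symmetric, so correlation and
# convolution give the same result.
EDGE_KERNEL = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]


def edge_response(image, kernel=EDGE_KERNEL):
    """Filter a 2D gray image (list of rows) with a 3x3 kernel, skipping borders."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            row.append(min(255, abs(acc)))  # clamp to the 8-bit range
        out.append(row)
    return out


def dark_pixel_fraction(edges, threshold=32):
    """Fraction of filtered pixels that are (nearly) black, i.e. not on an edge."""
    values = [v for row in edges for v in row]
    return sum(1 for v in values if v < threshold) / len(values)


sharp = [[0, 0, 0, 255, 255, 255] for _ in range(5)]      # hard step edge
blurred = [[0, 51, 102, 153, 204, 255] for _ in range(5)]  # smooth ramp

sharp_score = dark_pixel_fraction(edge_response(sharp))
blurred_score = dark_pixel_fraction(edge_response(blurred))
print(sharp_score, blurred_score)
```

The image with the higher dark fraction is the one treated as more blurred. On real frames you would compare copies of the same frame, since the score also depends on the scene content.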

Detect triangles, ellipses and rectangles from an image

I am trying to detect the regions of traffic signs. Using OpenCV, my approach is as follows:
The color image:
Using the TanTriggs Preprocessing get rid of the illumination variances:
Equalize histogram:
And binarize (Cv2.Threshold(blobs, blobs, 127, 255, ThresholdTypes.BinaryInv)):
Iterate each blob using ConnectedComponents and get the mean color value using the blob as mask. If it is a red color then it may be a red sign.
Then get contours of this blob using FindContours.
Simplify the contours using ApproxPolyDP and check the points of each contour:
If 3 points then triangle shape is acceptable --> candidate for triangle sign
If 4 points then shape is acceptable --> candidate
If more than 4 points, BBox dimensions are acceptable and most of the points are on the ellipse fitted (FitEllipse) --> candidate
This approach works for the separated blobs in the binary image, like the circular 100km sign in my example. However if there is a connection to the outside objects, like the triangle left bottom part in the binary image, it fails.
Because, the mean value of this blob is far from red!
Using Erosion helps in some cases, however makes it worse in many of the other images.
Using different threshold values for the binarization also works for some, but fails on many; like the erosion.
Using HoughCircle is just very slow and I couldn't manage to get good results playing with the parameters.
I have tried using matchShapes but couldn't get good results.
Can anybody show me another way to achieve what I want (with a reasonable computational time)?
Any information, or code in any language, is welcome.
Edit:
Using the circularity measure (C = P^2 / (4πA)) or the approach I have described above, triangle and ellipse shapes can be found when they are separated. However when the contour is like this for example:
I could not find a robust way to extract the triangle piece. If I could, I would check the mean color, and decide if its a red sign candidate.
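For reference, the circularity measure C = P^2 / (4πA) mentioned in the edit can be computed for a closed polygon as in the sketch below. It is plain Python with made-up function names and takes the contour as a list of (x, y) vertices; with OpenCV you would normally get P and A from arcLength and contourArea instead. It does not solve the connected-blob problem, it only shows how the measure separates isolated shapes.

```python
import math


def polygon_perimeter(points):
    """Perimeter of a closed polygon given as a list of (x, y) vertices."""
    return sum(
        math.dist(points[k], points[(k + 1) % len(points)])
        for k in range(len(points))
    )


def polygon_area(points):
    """Area of a simple closed polygon via the shoelace formula."""
    twice_area = sum(
        points[k][0] * points[(k + 1) % len(points)][1]
        - points[(k + 1) % len(points)][0] * points[k][1]
        for k in range(len(points))
    )
    return abs(twice_area) / 2


def circularity(points):
    """C = P^2 / (4*pi*A): about 1 for a circle, larger for less round shapes."""
    return polygon_perimeter(points) ** 2 / (4 * math.pi * polygon_area(points))


circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0.5, math.sqrt(3) / 2)]

print(circularity(circle), circularity(square), circularity(triangle))
```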
Sorry, I don't have the kudos to comment, but can't you use the red colour?
import cv2
import numpy as np

img = cv2.imread("ms0QB.png")
grey = np.zeros(img.shape[:2], np.uint8)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Red wraps around the hue axis (OpenCV 8-bit hue runs 0..179), so take both ends.
mask = np.logical_or(hsv[:, :, 0] > 160, hsv[:, :, 0] < 10)
grey[mask] = 255
cv2.imshow("160<hue<182", grey)
cv2.waitKey()

Map one camera's colour profile to another

I have two cameras (A & B) with which I've taken photos of a calibration scene. I then corrected the distortion and used feature mapping to get a pixel-precise registration, resulting in the following:
As you can see, the colour response is quite different. What I would like to do now is take a new photo with A and answer the question: what would it look like if instead I had used camera B?
Is there some existing technique or algorithm to convert between the colour spaces/profiles of two cameras like this?
From the image you provided, it is not too hard to segment it into small squares. After that, take the mean (or even better, the median) of each square in both images. Now you have 2*m*n values, as follows: MeansReference_(m*n), MeansQuery_(m*n). Using the linear color correction matrix which is:
You can construct this linear system:
MeansReference[i][j]= C * MeansQuery[i][j]
Where:
MeansReference[i][j] is a vector (3*1) of the color (R,G,B) of the square [i,j] in the Reference image.
MeansQuery[i][j] is a vector (3*1) of the color (R,G,B) of the square [i,j] in the Query image.
C is the 3*3 Matrix (a11,a12,... ,a33)
Now, for each i,j you get 3 linear equations (one each for R, G, B). Since there are 9 unknowns (a11...a33) you need at least 9 equations, which means at least 3 squares (each square provides 3 equations). However, the more equations you construct, the more accurate the result.
How do you solve a linear system with more equations than unknowns? Use batch least-squares estimation (batch LSE). You can find great detail about it in the book Neuro-Fuzzy and Soft Computing by Jang, Sun, and Mizutani, or in any online source.
After you find the 9 variables you have a color correction matrix. Just apply it to any image from the new camera and you will get an image that looks like it was taken by the old camera. If you want the opposite, apply C^-1 instead.
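Here is a minimal pure-Python sketch of the least-squares fit, assuming you already have the per-square mean colors as lists of (R, G, B) values. The function names and the sample numbers are invented for illustration. It solves the normal equations (sum of q q^T) c_i = sum of q r_i for each row c_i of C; in practice a library routine such as NumPy's lstsq would do the same job in one call.

```python
def solve_linear(a, b):
    """Solve a*x = b for a small square system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_color_matrix(query_means, reference_means):
    """Least-squares 3x3 matrix C such that reference ~= C * query per square."""
    gram = [[sum(q[a] * q[b] for q in query_means) for b in range(3)]
            for a in range(3)]
    rows = []
    for i in range(3):  # one row of C per output channel
        rhs = [sum(q[a] * r[i] for q, r in zip(query_means, reference_means))
               for a in range(3)]
        rows.append(solve_linear(gram, rhs))
    return rows


def apply_color_matrix(c, pixel):
    """Map one (R, G, B) value through the 3x3 matrix c."""
    return [sum(c[i][j] * pixel[j] for j in range(3)) for i in range(3)]


# Synthetic check: generate reference colors from a known matrix, then recover it.
true_c = [[1.2, 0.1, 0.0], [0.0, 0.9, 0.05], [0.1, 0.0, 1.1]]
query_means = [(200, 30, 30), (30, 200, 30), (30, 30, 200), (120, 120, 120)]
reference_means = [apply_color_matrix(true_c, q) for q in query_means]

fitted_c = fit_color_matrix(query_means, reference_means)
print(fitted_c)
```

With real, noisy measurements the fitted matrix will only approximate the mapping, which is why using more squares than the minimum of 3 helps.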
Good Luck!

direction on image pattern description and representation

I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form. Could you please provide me with some research directions or concepts that I can follow to learn how to represent (mathematically describe) these forms of patterns? In general the pattern does not have a closed contour, nor can it be represented with analytical objects like boxes, circles etc.
By mathematically describe I'm assuming you mean derive from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel-based data of the form you showed, which appears to be 1 value per pixel, i.e. greyscale images.
One approach is to perform "spatial gridding" where you divide the image up into a regular grid of a constant size e.g. a 4x4 grid. You then average the pixel values within each cell of the grid. Then concatenate these values to form a 16 element vector - this coarsely describes the pixel distribution of the image.
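A minimal sketch of spatial gridding in plain Python, assuming the image is a list of rows of grey values and is at least as large as the grid in both directions (the function name is made up):

```python
def grid_descriptor(image, grid=4):
    """Average the pixel values in each cell of a grid x grid layout.

    Returns a flat list of grid*grid cell means, row by row.
    """
    height, width = len(image), len(image[0])
    descriptor = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            descriptor.append(sum(cell) / len(cell))
    return descriptor


# 8x8 test image: left half dark, right half bright.
image = [[0] * 4 + [100] * 4 for _ in range(8)]
descriptor = grid_descriptor(image)
print(descriptor)
```

Two images can then be compared by any vector distance between their 16-element descriptors.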
Another approach would be to use "image moments" which are 2D statistical moments. Use this equation:
mu_ij = sum_{x=1}^{W} sum_{y=1}^{H} (x - mu_x)^i (y - mu_y)^j f(x,y)
where f(x,y) is the pixel value at coordinates (x,y). W and H are the image width and height. The mu_x and mu_y indicate the average x and y. The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways; for example in the "Hu moments" we can compute 7 numbers using combinations of image moments:
The cool thing about the Hu moments is that you can translate, scale or rotate the image and still get the same 7 values (flipping the image only changes the sign of the seventh), which makes this a robust (translation-, scale- and rotation-invariant) image descriptor.
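To make this concrete, here is a small plain-Python sketch that computes central moments and the first two Hu moments from their standard definitions, then checks that a 90 degree rotation leaves them unchanged. The function names are made up, and only two of the seven values are computed to keep it short; OpenCV provides moments and HuMoments functions that return all seven.

```python
def central_moment(image, i, j):
    """mu_ij = sum over pixels of (x - mu_x)^i * (y - mu_y)^j * f(x, y)."""
    total = sum(sum(row) for row in image)
    mu_x = sum(x * v for row in image for x, v in enumerate(row)) / total
    mu_y = sum(y * v for y, row in enumerate(image) for v in row) / total
    return sum(
        (x - mu_x) ** i * (y - mu_y) ** j * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def first_two_hu_moments(image):
    """First two Hu moments, built from scale-normalized central moments."""
    mu00 = central_moment(image, 0, 0)

    def eta(i, j):
        return central_moment(image, i, j) / mu00 ** (1 + (i + j) / 2)

    hu1 = eta(2, 0) + eta(0, 2)
    hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return hu1, hu2


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 9, 9, 9, 0],
]
# Rotate the image by 90 degrees clockwise.
rotated = [list(row) for row in zip(*image[::-1])]

hu_original = first_two_hu_moments(image)
hu_rotated = first_two_hu_moments(rotated)
print(hu_original, hu_rotated)
```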
Hope this helps as a general direction to read more in.

OpenCV - Dynamically find HSV ranges for color

When given an image such as this:
And not knowing the color of the object in the image, I would like to be able to automatically find the best H, S and V ranges to threshold the object itself, in order to get a result such as this:
In this example, I manually found the values and thresholded the image using cv::inRange. The output I'm looking for is the best H, S and V ranges (min and max value each, a total of 6 integer values) to threshold the given object in the image, without knowing in advance what color the object is. I need to use these values later on in my code.
Keypoints to remember:
- All given images will be of the same size.
- All given images will have the same dark background.
- All the objects I'll put in the images will be of full color.
I can brute force over all possible permutations of the 6 HSV ranges values, threshold each one and find a clever way to figure out when the best blob was found (blob size maybe?). That seems like a very cumbersome, long and highly ineffective solution though.
What would be a good way to approach this? I did some research, and found that OpenCV has some machine learning capabilities, but I need to have the actual 6 values at the end of the process, and not just a thresholded image.
You could create a small 2-layer neural network for the task of dynamic HSV masking.
Steps:
Create/generate ground truth annotations for each image and the HSV range of the required object.
Design a small neural network with at least 1 conv layer and 1 fcn layer.
Input: mask of the image after applying the HSV range from the ground truth (m x n).
Output: m x n binary mask of the image.
Post-processing: multiply the mask with the original image to get the required object highlighted.
