I am building a website that does cool things using computer vision techniques, with videos recorded live and uploaded by users from their webcams. For this, I need the camera intrinsic and distortion parameters. I am trying to figure out the best way to compute these from the user-uploaded videos. We can make no assumptions about what videos a user might upload, but it is reasonable to assume that a human might be present in the video. I am still in the initial stages of this, but I am interested in knowing how others have solved this problem.
To be specific, below are the questions I would appreciate someone experienced in the group commenting on:
What algorithms, libraries and techniques are available to extract intrinsic and distortion parameters of any generic webcam available in the market? [I say "extract" and not "calibrate" to include cases where intrinsic parameters are just a method call away with no calibration necessary].
In general, how much variance have you observed in the intrinsic and distortion parameters of the webcams available in the market? Did you approximate them with a single set of intrinsic and distortion parameters, or what approach did you follow?
What camera self-calibration methods, if any, could be employed in these scenarios? Are there any open-source or commercial libraries available which might be of some help?
If we aim to calibrate the webcams using the videos users record and upload, what assumptions about the parameters [like fx==fy or no distortion params] make sense and sound reasonable to you?
Would a reasonable approximation of intrinsic and distortion params for all the cameras make sense? What would be a reasonable approach to validate how good particular intrinsic and distortion parameters are for a specific webcam?
Are there any other issues that need to be considered?
Sometimes I am the one who brings the bad news :) And so I do now.
For almost all your points there the clear answer is No, None, Not, and so on. Only for the last point, with the other issues, the answer is not a no, but a long list :).
Actually, camera calibration without a chessboard and some specific constraints is almost impossible.
The closest implementation to a no-assumptions calibration is found in the stitching module in OpenCV. However, it is not perfect, and it does not work on arbitrary videos. Give it a try.
There is the famous Camera Calibration Toolbox, a good Matlab implementation of extracting intrinsic and extrinsic parameters.
There is variance not only among webcams, but also between:
Different modules
Different zoom levels (this affects the optics)
I think that this is a really hard problem if you restrict yourself to making no assumptions regarding the video. Both the calibration and the evaluation are hard if you don't use something known, such as the checkerboard in the Camera Calibration Toolbox.
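If a rough guess is all you can get, the simplifications raised in the question (fx == fy, no distortion) can be written down directly. This is only a sketch of such a fallback, not a calibration: the 60 degree horizontal field of view is my own placeholder and the function name is hypothetical, so the result should be validated against a properly calibrated camera before you rely on it.

```python
import math

def approximate_intrinsics(width, height, horizontal_fov_deg=60.0):
    """Guess a pinhole camera matrix from the image size alone.

    Assumptions (not measured): square pixels (fx == fy), principal point
    at the image center, zero lens distortion, and a guessed field of view.
    """
    fx = (width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    fy = fx  # square pixels
    cx, cy = width / 2.0, height / 2.0
    K = [[fx, 0.0, cx],
         [0.0, fy, cy],
         [0.0, 0.0, 1.0]]
    # Distortion coefficients in OpenCV's order: k1, k2, p1, p2, k3
    dist = [0.0, 0.0, 0.0, 0.0, 0.0]
    return K, dist

K, dist = approximate_intrinsics(640, 480)
```

The only free parameter here is the field of view, which is exactly the quantity that varies between webcam modules and zoom levels, so treat it as something to tune or estimate per device.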
Many algorithms, including the ones currently used in OpenCV, require that known points can be detected (e.g. corners in a chessboard). You would have to require your users to take pictures of these known patterns, which ruins the concept of random videos. I don't have a solution to this, but you might want to consider requiring users to record videos of structured scenes (no specific patterns or objects) and use the algorithm described in:
"Camera calibration with lens distortion from low-rank textures"
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5995548&tag=1
Haven't tried it myself though.
Problem: I have a photo of an object (a manufactured part like the one in the attached photo below). Using my Android phone camera, I want to verify whether the object in the camera preview matches the template or not (in other words, is it the same part as the template?).
I can make the user move the camera so that the camera preview shows a view similar to the template. However, there will be different noise levels and/or lighting, and maybe a different background.
Question: What do you recommend for solving this problem? I was thinking of Canny edge extraction and then matching the camera frames against the Canny edges extracted from the template. Is this a good idea? If yes, could you please tell me how I can implement this? Any resources or samples? (I can do the Canny edge extraction but couldn't find a way to do the matching.)
If it is not a good idea, what do you recommend?
Things I have tried:
Feature Extraction and Matching: I used a few different extractor and matcher implementations from OpenCV, and my app is working and drawing the detected feature points, matches, etc. However, being a beginner with image processing, I cannot make sense of the result or tell what counts as a match. Any ideas, help, or good resources?
Template Matching: I used OpenCV template matching, but the performance was horrible and I decided that this cannot be the solution.
I tried object recognition with my phone on your test image and the results were positive.
Detector used: ORB (binary detector).
Descriptor used: ORB.
Matching technique: brute-force matching.
Image size: 640x480.
I was able to detect around 500 feature points. (That number of keypoints is about sufficient, but it might produce false matches when you have more images with similar-looking objects. You need to refine your matching to avoid false matches.)
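To make "brute-force matching" and "refine your matching" concrete, here is a minimal NumPy-only sketch. It assumes packed 32-byte binary descriptors, as ORB produces, and uses the nearest/second-nearest ratio test as the refinement step. The ratio test and the 0.75 value are my choice for illustration, not something I tuned in the experiment above, and the descriptors here are random stand-ins rather than real ORB output.

```python
import numpy as np

def hamming_matrix(query, train):
    """Pairwise Hamming distances between packed binary descriptors (uint8 rows)."""
    xor = np.bitwise_xor(query[:, None, :], train[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def ratio_test_matches(query, train, ratio=0.75):
    """Brute-force match every query descriptor, keeping only unambiguous matches.

    A match is kept when the best distance is clearly smaller than the
    second-best one. Requires at least two train descriptors.
    """
    dist = hamming_matrix(query, train)
    order = np.argsort(dist, axis=1)
    matches = []
    for qi in range(dist.shape[0]):
        best, second = order[qi, 0], order[qi, 1]
        if dist[qi, best] < ratio * dist[qi, second]:
            matches.append((qi, int(best), int(dist[qi, best])))
    return matches

# Synthetic stand-in for ORB output: 20 template descriptors, and 5 frame
# descriptors that are copies of the first five with 3 bits flipped.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(20, 32), dtype=np.uint8)
query = train[:5].copy()
query[:, 0] ^= 0b00000111
matches = ratio_test_matches(query, train)
```

Each entry of `matches` is (query index, train index, Hamming distance). Counting how many matches survive the test is a simple first answer to "how do I know what is a match".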
Result of object recognition on two different scales.
Regarding the difficulties you have in understanding object recognition: what exactly did you not understand (which specific topic)?
I recommend you go through these two books:
Learning OpenCV by Adrian Kaehler, Gary Bradski
OpenCV 2 Computer Vision Application Programming Cookbook by Robert Laganière (chapters 8 & 9).
Cheers!
From what I understand, Canny edge detection might not be an optimal solution. In my opinion, after some basic pre-processing of the test image, you should find its SIFT features and compare them with the SIFT features of the template. SIFT is really versatile, so it should work here too.
You can also try OpenSURF features. They are faster than SIFT, but I haven't had the opportunity to work with them enough to comment on their accuracy.
I work at an airport where we need to determine the visibility conditions of pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility is. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually with a human being who looks at the photos from the cameras placed at the end of each runway. So it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques, and OpenCV could make the implementation easier. If all the signs are similar, we can train our program to recognize the sign under specific conditions (lighting). Then we can use the trained classifier to check the visibility of the signs every hour using a simple script.
Haar-like feature extraction is already available in OpenCV. You can use it to train a classifier, which will output an .xml file, and then use that .xml file to detect the sign regularly.
I have done a similar project RTVTR(Real Time Vehicle Tracking and Recognition) using openCV and it worked great. http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering your questions:
How should I approach this problem?
It depends on the result you want/need to obtain. Is this a "hobby" project (even if job-related), or do you need to build a machine vision system to solve the problem, and should it be compliant with some regulations or standards?
Which algorithms should I study?
I am very interested in your question, but I am not an expert in the field of meteorology, so searching the relevant literature is a time-consuming task for me... so I reserve the right to update this part of the answer in the future. I think different algorithms will be involved in the solution: some are very general, such as algorithms for image segmentation, and some are very specific, such as how to measure the visibility.
Update: one of the keywords for searching the literature is Meteorological Visibility, for example
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute a measure of visibility with a function that takes the regions of all the signs and the background region as input.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm and it can be performed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray level of the signs and background regions.
You can follow a more sophisticated, machine-learning oriented approach where you implement an algorithm that mimics your current human-based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, and each training example is a pair consisting of a) the photo of the runway (the input) and b) the visibility for that photo as determined by a human (the desired output). The system is then trained on the training set, and when you give it a new photo as input it will give you back the visibility measure. I think you have a log of past visibility measures (METAR?), and if you saved the related images too, you will already have a relevant amount of data to build a training set and a test set.
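The simple gray-level approach can be sketched in a few lines. This is only an illustration under my own assumptions: the sign and a nearby background patch sit at fixed, known boxes given as (x, y, width, height), and the formula is a Michelson-style contrast of the two mean gray levels, which is one possible choice and not an established visibility standard. The frames are synthetic.

```python
import numpy as np

def region_contrast(gray, sign_box, background_box):
    """Contrast between the mean gray level of a sign and of its background.

    Boxes are (x, y, width, height) in pixels. Returns a value in [0, 1);
    values near 0 mean the sign has faded into the background.
    """
    def mean_of(box):
        x, y, w, h = box
        return float(gray[y:y + h, x:x + w].mean())

    sign, background = mean_of(sign_box), mean_of(background_box)
    return abs(sign - background) / (sign + background + 1e-9)

# Synthetic frames: a bright sign on a darker background, then the same
# scene washed out toward a uniform gray to imitate fog.
clear = np.full((120, 160), 100.0)
clear[40:60, 70:90] = 200.0
foggy = 0.2 * clear + 0.8 * 180.0

sign_box, background_box = (70, 40, 20, 20), (10, 40, 20, 20)
clear_contrast = region_contrast(clear, sign_box, background_box)
foggy_contrast = region_contrast(foggy, sign_box, background_box)
```

On real images you would compare the contrast against a threshold calibrated with the photos your operators have already labelled.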
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor's and uvts_cvs's replies are very helpful. While OpenCV mainly aims to recognize the sign pattern or even segment it from the background, when you extract the core feature in your problem, visibility, you may still need to include the background signal in your training set. I assume the manual check of visibility is based on image contrast; if so, the signal-to-noise ratio (SNR) or contrast-to-noise ratio (CNR) is a good feature for learning. A threshold is defined to classify 'visible-1' and 'invisible-0'. The SNR/CNR can be obtained automatically, especially if the sign position and size are fixed in your camera images.
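A rough sketch of the CNR idea, assuming fixed sign boxes ordered from nearest to farthest. The CNR definition used here (mean difference divided by the background standard deviation), the threshold of 3.0, and the assumption that the first sign stands 200 m from the camera are all mine; only the 200 m spacing comes from the question. The frame is synthetic.

```python
import numpy as np

SIGN_SPACING_M = 200  # from the question: one sign every 200 m

def crop(gray, box):
    x, y, w, h = box
    return gray[y:y + h, x:x + w]

def cnr(sign_pixels, background_pixels):
    """Contrast-to-noise ratio: mean difference over background noise."""
    noise = background_pixels.std()
    return float(abs(sign_pixels.mean() - background_pixels.mean()) / (noise + 1e-9))

def visibility_distance(gray, sign_boxes, background_boxes, threshold=3.0):
    """Distance to the farthest sign in the unbroken run of visible signs."""
    visible = 0
    for sign_box, background_box in zip(sign_boxes, background_boxes):
        if cnr(crop(gray, sign_box), crop(gray, background_box)) < threshold:
            break  # 'invisible-0': stop at the first sign lost in the haze
        visible += 1
    return visible * SIGN_SPACING_M

# Synthetic frame: noisy background, three signs with fading contrast.
rng = np.random.default_rng(0)
frame = 100.0 + rng.normal(0.0, 5.0, size=(100, 300))
sign_boxes = [(20 + 100 * i, 20, 20, 20) for i in range(3)]
background_boxes = [(20 + 100 * i, 60, 20, 20) for i in range(3)]
for (x, y, w, h), boost in zip(sign_boxes, [60.0, 30.0, 4.0]):
    frame[y:y + h, x:x + w] += boost

distance_m = visibility_distance(frame, sign_boxes, background_boxes)
```

The threshold is the part to learn from data: the past human decisions give you labelled examples of 'visible' and 'invisible' for each sign.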
Gather a whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try to solve it, even if the reward is not very high.
You can use the template matching functionality of openCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Where the template is the sign. If you manage to find a correct match, then the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
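As a sketch of what the matching score means, here is a NumPy-only version of zero-mean normalized cross-correlation, the same kind of score as OpenCV's TM_CCOEFF_NORMED mode. It is written for clarity, not speed (in practice you would call OpenCV instead). The 0.8 visibility threshold and the synthetic frame are assumptions for illustration.

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide the template over the image and return (best_score, (x, y)).

    The score is zero-mean normalized cross-correlation in [-1, 1];
    (x, y) is the top-left corner of the best-matching window.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_xy = -1.0, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat window or flat template: score undefined
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_score, best_xy

# Synthetic check: cut the "sign" out of a textured frame and find it again.
rng = np.random.default_rng(0)
frame = rng.random((40, 60))
sign_template = frame[10:18, 25:37].copy()
score, location = match_template_ncc(frame, sign_template)
sign_visible = score > 0.8  # assumed threshold, to be tuned on real photos
```

Because the cameras and signs are fixed, you can restrict the search to a small window around each known sign position, which is both faster and less prone to false matches.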
As this is a very controlled and static environment, you have perfect conditions to estimate the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I am reviewing this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. 2016. Doktorarbeit. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf).
I see two major directions you could follow up:
Model-based approaches: Advantages: Not so much dependent on your very specific setup. You do not need heavy collection of data.
Data-based approaches/ML: Advantages: Can hide the whole complexity of different light and weather conditions. You seem to have a good source of data if there are people doing the job right now. Very promising without much engineering effort (just use a lightweight CNN with a few layers or so).
You could also combine both, etc. etc. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.
I would like to use computer vision to do the following:
A camera is mounted outside a building, capturing a videostream of the street below. The camera is installed approximately 5-6 meters above the street.
Whenever a person wearing a certain kind of hat (white, round) is captured by the camera, an event should be triggered.
Which algorithm should I look into to implement this kind of behavior?
Is this best achieved by training the algorithm with sample data, or is there another way to tell it to look for this type of hat?
Also, how do I use multiple frames of video to increase the quality of detection?
Edit: Added a picture of the hat
Before we do everything in comments I will start an answer here.
The first link you posted describes a simple color-based detection. You can try that, but it will fail if there are other pixel clusters of similar color in the image. Your idea of combining it with tracking is good: Identify clusters, build trajectories over several images, and only accept plausible trajectories as a hit. For robust tracking you may want to look into Kalman filtering. A problem you will most likely encounter is that a "white" hat will hardly be "white" in the images your camera delivers.
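To give an idea of what "only accept plausible trajectories" could look like, here is a minimal constant-velocity Kalman filter in NumPy. Everything numeric in it (noise variances, the 15 pixel gate, the 3-frame warm-up) is an assumption for illustration, and the blob centroids are synthetic; on real footage they would come from your color clustering step.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements."""

    def __init__(self, x, y, dt=1.0, process_var=0.01, meas_var=1.0):
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 100.0  # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, measurement):
        innovation = np.asarray(measurement, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return innovation

def is_plausible_track(centroids, gate_px=15.0, warmup=3):
    """Reject a blob track if it jumps further than the filter predicts."""
    kf = ConstantVelocityKalman(*centroids[0])
    for i, centroid in enumerate(centroids[1:], start=1):
        kf.predict()
        innovation = kf.update(centroid)
        if i > warmup and np.linalg.norm(innovation) > gate_px:
            return False
    return True

# Synthetic tracks: a hat moving steadily vs. a blob flickering at random.
rng = np.random.default_rng(0)
frames = np.arange(20)
walker = np.stack([100 + 3.0 * frames, 50 + 1.0 * frames], axis=1)
walker = walker + rng.normal(0.0, 0.5, size=(20, 2))
flicker = rng.uniform([0, 0], [640, 480], size=(20, 2))

walker_ok = is_plausible_track(walker)
flicker_ok = is_plausible_track(flicker)
```

The gate is what rejects random white clusters: a reflection that appears in one place and then another will not follow a smooth path, while a person walking along the street will.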
The second link you refer to - boosted Classifiers Based on Haar-like Features - is for detection of more complex objects. It probably won't help you find white blobs. Invest your time and energy in learning about tracking.
I'm happy to repeat myself here: "Solving a computer vision problem" is not something like "sorting an array". OpenCV is not the C++ Standard Library. You can use a std::map without knowing anything about red-black trees. But (IMHO) you can't use vision APIs without knowing a good deal of the math and theory. Working solutions in computer vision are typically heavily tuned towards the specific problem scenario. Sorry if that sounds pedantic, but it explains why your question got beaten.
For my project, which is supposed to segment the hand region closest to the camera, I initially tried OpenCV's stereo vision example. However, the disparity map looks very bad and it is useless for me.
Is there any other method that is better than the OpenCV implementation and has some output (image/video)? Because my time is limited, I must choose one better algorithm and implement it.
Thank you.
OpenCV implements a number of stereo block matching algorithms, some of them pretty cutting edge.
Disparity maps always look bad except in very simple circumstances - the first step is to try and improve the source images, the lighting and the background.
If it was easy then everybody would be doing it and there would be no market for expensive 3D laser scanners.
Try the different block matching algorithms provided by OpenCV. The little bit of experimentation I've done so far seems to indicate that cv::StereoSGBM gives better disparity maps than cv::StereoBM, but is slower.
The performance of the block matching algorithms will depend on the parameters they are initialized with. Have a look at the stereo examples again here, and notice lines 195-222 where the algorithms are initialized.
I also suggest you use some basic GUI (OpenCV's highgui, for example) to manipulate these parameters in real time when fine-tuning the algorithm.
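To see why the parameters matter, it may help to look at a toy block matcher. This is not OpenCV's implementation, just a naive sum-of-absolute-differences search in NumPy on rectified grayscale images; the parameter names `max_disparity` and `block_size` are mine and only mirror the kind of knobs you tune in cv::StereoBM and cv::StereoSGBM.

```python
import numpy as np

def sad_block_matching(left, right, max_disparity=16, block_size=5):
    """Naive SAD block matching on rectified grayscale images.

    For each left pixel, compare its block against right-image blocks
    shifted by 0..max_disparity-1 pixels and keep the cheapest shift.
    Border pixels that cannot be evaluated are left at 0.
    """
    h, w = left.shape
    half = block_size // 2
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disparity, w - half):
            block = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [
                np.abs(block - right[y - half:y + half + 1,
                                     x - d - half:x - d + half + 1]).sum()
                for d in range(max_disparity)
            ]
            disparity[y, x] = int(np.argmin(costs))
    return disparity

# Synthetic pair: the right view is the left view shifted by 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((30, 60))
right = np.roll(left, -4, axis=1)
disparity = sad_block_matching(left, right, max_disparity=8, block_size=5)
```

If `max_disparity` is smaller than the true shift, or the block is too small for a poorly textured scene such as a bare hand, the cheapest shift is essentially arbitrary, which is one common reason disparity maps look bad.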
In brief, what are the available options for implementing the tracking of a particular image (a photo/graphic/logo) in a webcam feed using OpenCV? In particular, I am trying to collect opinions about the following:
Would HaarTraining be overkill (considering that these are not 3D objects but simply images to be tracked), or is it the only way out?
I have tried template matching and color-based detection, but these don't offer reliable tracking under varying illumination/scale/orientation at all.
Would SIFT/SURF feature matching work as reliably in video as with static image comparison?
I am a relative beginner to OpenCV, as is evident from my previous queries on SO (very helpful replies). Any cues or links to good resources for beginning an NFT implementation with OpenCV?
Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect/how much control you have over the environment. What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
1. Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster child application of Haar cascades is Viola Jones' face detection system which is largely geared towards frontal faces (can certainly be trained for many other things)
For a tutorial on doing Haar training using OpenCV, see here.
2. Try NCC or, better yet, Lucas-Kanade tracking (cvCalcOpticalFlowPyrLK, which is pyramidal, as in coarse-to-fine, LK; a 4-level pyramid usually works well) for a template. This is usually good up to 10% scale change or 10 degrees of rotation without template changes. Beyond that, you can use automatically evolving templates, which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
3. SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
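As a sketch of what a geometric verification step can look like, here is a small RANSAC loop in NumPy. It fits an affine transform, which is a simplification I chose to keep the example short (for a planar image seen under perspective a homography is the usual model), and the tolerance, iteration count and synthetic correspondences are all assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform (3x2 matrix) mapping src points to dst."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def ransac_affine(src, dst, iterations=200, tol=3.0, seed=0):
    """Return a boolean mask of matches consistent with one affine transform."""
    rng = np.random.default_rng(seed)
    A = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)
        M = fit_affine(src[sample], dst[sample])
        errors = np.linalg.norm(A @ M - dst, axis=1)
        inliers = errors < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Synthetic matches: 30 correct ones under a rotation + scale + shift,
# followed by 10 spurious ones pointing at random places in the frame.
rng = np.random.default_rng(1)
template_pts = rng.uniform(0, 200, size=(40, 2))
theta = np.deg2rad(10.0)
true_M = 1.1 * np.array([[np.cos(theta), np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])
frame_pts = template_pts @ true_M + np.array([50.0, 30.0])
frame_pts[30:] = rng.uniform(0, 400, size=(10, 2))

inliers = ransac_affine(template_pts, frame_pts)
```

The number of surviving inliers is also a convenient detection score: if only a handful of matches agree on one transform, the image is probably not in the frame.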
I'd be a bit concerned about the amount of computational time involved. If there isn't significant illumination/scale/in-plane rotation, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implementation. Note: 3rd party, not OpenCV.
It seems that none of the methods, when applied alone, could bring reliable results, unless it is a hobby project. Probably some adaptive algorithm would be more or less acceptable. For example, see a famous open-source project where they use machine learning.