OpenCV + ARToolkit - opencv

For school, I have to do an augmented reality project. ARToolKit is good for tracking markers, but my problem is that my procamcalib calibration can't be used by ARToolKit (procamcalib produces distortion coefficients, while ARToolKit expects a distortion factor).
I see that with OpenCV I can calibrate my PS Eye and apply the undistortion directly.
So my question is: can I grab the PS Eye image, undistort it, and then pass it to ARToolKit to get my markers' positions?
Thanks
(Sorry for my English, I'm a French student. If you have trouble reading this, I can explain again.)

It might be a bit of work to decouple the video code, but in the end you only need:
arDetectMarker(dataPtr, thresh, &marker_info, &marker_num)
with pixels from anywhere (e.g. an undistorted OpenCV Mat from your PS Eye).

I'm not entirely sure I understood your question, but you could run the example calibration program that comes with ARToolKit. More information can be found here: Calibrating Your Camera
You would then get the calibration result "camera_para.dat" in ARToolKit's bin/Data, which you can use later in your project.
If you happen to be using Unity for your AR project (if not, ignore the rest), simply import ARToolKit, then in the AR Controller inspector give your .dat file a unique name and include it in the "Camera Parameters" option.

Related

Eye to Hand Calibration OpenCV

I am new to eye-to-hand calibration. I have read the OpenCV documentation. It says that we need to use cv2.calibrateHandEye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam).
Can somebody clearly explain what input values we need to provide, where these values come from, and what matrix format they must be in? Particularly for (R_target2cam, t_target2cam). I am using a UFactory arm robot and an Intel RealSense camera, so I need to calibrate both. Kindly guide me.
These are my robot position coordinates:
So I think I have Rx, Ry, Rz for R_gripper2base and X, Y, Z for t_gripper2base. What are R_target2cam and t_target2cam, and where can I get their values?
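A minimal sketch of how the robot side of those inputs could be assembled, under loudly stated assumptions: Rx, Ry, Rz are taken to be roll, pitch, yaw in degrees about the fixed X, Y, Z axes (check your robot's manual, since the convention is not given in the post), and `pose_to_rt` is a hypothetical helper, not an OpenCV function. For the camera side, R_target2cam and t_target2cam are usually obtained by estimating the pose of a calibration pattern (for example a chessboard) in each camera image, e.g. with cv2.solvePnP followed by cv2.Rodrigues.

```python
import numpy as np

def pose_to_rt(x, y, z, roll_deg, pitch_deg, yaw_deg):
    """Convert one robot pose to (R, t).

    Assumes roll/pitch/yaw in degrees about the fixed X, Y, Z axes,
    composed as R = Rz(yaw) @ Ry(pitch) @ Rx(roll). This convention is
    an assumption; verify it against your robot's documentation.
    """
    r, p, yw = np.radians([roll_deg, pitch_deg, yaw_deg])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r), np.cos(r)]])
    Ry = np.array([[np.cos(p), 0, np.sin(p)],
                   [0, 1, 0],
                   [-np.sin(p), 0, np.cos(p)]])
    Rz = np.array([[np.cos(yw), -np.sin(yw), 0],
                   [np.sin(yw), np.cos(yw), 0],
                   [0, 0, 1]])
    R = Rz @ Ry @ Rx                              # 3x3 rotation matrix
    t = np.array([[x], [y], [z]], dtype=float)    # 3x1 translation vector
    return R, t

# One (R, t) pair per recorded robot pose; these example values are made up.
poses = [(300.0, 0.0, 200.0, 180.0, 0.0, 0.0),
         (320.0, 50.0, 220.0, 170.0, 10.0, 15.0)]
R_gripper2base = []
t_gripper2base = []
for pose in poses:
    R, t = pose_to_rt(*pose)
    R_gripper2base.append(R)
    t_gripper2base.append(t)
```

The two lists have one entry per robot pose and are meant to be paired, index by index, with the pattern poses seen by the camera at the same moments. The translations on both sides should use the same length unit, and in practice you would record many more poses than the two shown here.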

Improve image quality

I need to improve image quality, from low quality to high (HD) quality. I am using the OpenCV libraries. I experimented a lot with GaussianBlur(), Laplacian(), transformation functions, filter functions, etc., but all I managed was to convert the image to HD resolution while keeping the same quality. Is it possible to do this? Do I need to implement my own algorithm, or is there an existing way to do it? I would really appreciate any kind of help. Thanks in advance.
I used this link for my reference. It has other interesting filters that you can play with.
If you are using C++:
detailEnhance(Mat src, Mat dst, float sigma_s=10, float sigma_r=0.15f)
If you are using python:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
The variable 'sigma_s' determines how big the neighbourhood of pixels must be to perform filtering.
The variable 'sigma_r' determines how the different colours within the neighbourhood of pixels will be averaged with each other. Its range is from: 0 - 1. A smaller value means similar colors will be averaged out while different colors remain as they are.
Since you are looking for sharpness in the image, I would suggest you keep the neighbourhood size (sigma_s) as small as possible.
Here is the result I obtained for a sample image:
1. Original image:
2. Sharpened image for lower sigma_r value:
3. Sharpened image for higher sigma_r value:
Check the above mentioned link for more information.
How about applying Super Resolution in OpenCV? A reference article with more details can be found here: https://learnopencv.com/super-resolution-in-opencv/
So basically you will need to have the Python dependency opencv-contrib-python installed, together with a working version of opencv-python.
There are different techniques for the Super Resolution in OpenCV you can choose from, including EDSR, ESPCN, FSRCNN, and LapSRN. Code examples in both Python and C++ have been included in the tutorial article as well for easy reference.
A correction is needed. The call should be:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
Passing a kernel argument will give an error.
+1 to Kris Stern's answer.
If you are looking for a practical implementation of super resolution using a pretrained model in OpenCV, have a look at the notebook below, and at the video describing the details.
https://github.com/pankajr141/experiments/blob/master/Reasoning/ComputerVision/super_resolution_enhancing_image_quality_using_pretrained_models.ipynb
https://www.youtube.com/watch?v=JrWIYWO4bac&list=UUplf_LWNn0a9ubnKCZ-95YQ&index=4
Below is sample code using OpenCV:
import cv2  # dnn_superres is part of opencv-contrib-python
model_pretrained = cv2.dnn_superres.DnnSuperResImpl_create()
# set up the model: load the pretrained weights, then declare algorithm and scale
model_pretrained.readModel(filemodel_filepath)  # path to the pretrained model file
model_pretrained.setModel(modelname, scale)  # e.g. "edsr" and 4; must match the model file
# prediction (upscaling)
img_upscaled = model_pretrained.upsample(img_small)

Determine skeleton joints with a webcam (not Kinect)

I'm trying to determine skeleton joints (or at the very least to be able to track a single palm) using a regular webcam. I've looked all over the web and can't seem to find a way to do so.
Every example I've found is using Kinect. I want to use a single webcam.
There's no need for me to calculate the depth of the joints - I just need to be able to recognize their X, Y position in the frame. Which is why I'm using a webcam, not a Kinect.
So far I've looked at:
OpenCV (the "skeleton" functionality in it is a process of simplifying graphical models, but it's not a detection and/or skeletonization of a human body).
OpenNI (with NiTE) - the only way to get the joints is to use the Kinect device, so this doesn't work with a webcam.
I'm looking for a C/C++ library (but at this point would look at any other language), preferably open source (but, again, will consider any license) that can do the following:
Given an image (a frame from a webcam) calculate the X, Y positions of the visible joints
[Optional] Given a video capture stream call back into my code with events for joints' positions
Doesn't have to be super accurate, but would prefer it to be very fast (sub-0.1 sec processing time per frame)
Would really appreciate it if someone can help me out with this. I've been stuck on this for a few days now with no clear path to proceed.
UPDATE
2 years later a solution was found: http://dlib.net/imaging.html#shape_predictor
To track a hand using a single camera without depth information is a serious task and topic of ongoing scientific work. I can supply you a bunch of interesting and/or highly cited scientific papers on the topic:
M. de La Gorce, D. J. Fleet, and N. Paragios, “Model-Based 3D Hand Pose Estimation from Monocular Video.,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, Feb. 2011.
R. Wang and J. Popović, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics (TOG), 2009.
B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla, “Model-based hand tracking using a hierarchical Bayesian filter.,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 9, pp. 1372–84, Sep. 2006.
J. M. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proceedings of IEEE International Conference on Computer Vision, 1995, pp. 612–617.
Hand tracking literature survey in the 2nd chapter:
T. de Campos, “3D Visual Tracking of Articulated Objects and Hands,” 2006.
Unfortunately, I don't know of any freely available hand tracking library.
there is a simple way for detecting hand using skin tone. perhaps this could help... you can see the results on this youtube video. caveat: the background shouldn't contain skin colored things like wood.
here is the code:
''' Detect human skin tone and draw a boundary around it.
Useful for gesture recognition and motion tracking.
Inspired by: http://stackoverflow.com/a/14756351/1463143
Date: 08 June 2013
'''
# Required modules
import cv2
import numpy
# Constants for finding range of skin color in YCrCb
min_YCrCb = numpy.array([0,133,77],numpy.uint8)
max_YCrCb = numpy.array([255,173,127],numpy.uint8)
# Create a window to display the camera feed
cv2.namedWindow('Camera Output')
# Open the primary video capture device
videoFrame = cv2.VideoCapture(0)
# Process the video frames
keyPressed = -1 # -1 indicates no key pressed
while(keyPressed < 0): # any key pressed has a value >= 0
    # Grab video frame, decode it and return next video frame
    readSuccess, sourceImage = videoFrame.read()
    if not readSuccess: # stop if the camera returned no frame
        break
    # Convert image to YCrCb
    imageYCrCb = cv2.cvtColor(sourceImage,cv2.COLOR_BGR2YCR_CB)
    # Find region with skin tone in YCrCb image
    skinRegion = cv2.inRange(imageYCrCb,min_YCrCb,max_YCrCb)
    # Do contour detection on skin region; [-2] picks the contour list
    # whether findContours returns 2 values or 3 (OpenCV 3.x)
    contours = cv2.findContours(skinRegion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    # Draw every sufficiently large contour on the source image
    for i, c in enumerate(contours):
        area = cv2.contourArea(c)
        if area > 1000:
            cv2.drawContours(sourceImage, contours, i, (0, 255, 0), 3)
    # Display the source image
    cv2.imshow('Camera Output',sourceImage)
    # Check for user input to close program
    keyPressed = cv2.waitKey(1) # wait 1 millisecond in each iteration of while loop
# Close window and camera after exiting the while loop
cv2.destroyWindow('Camera Output')
videoFrame.release()
cv2.findContours is quite useful: you can find the centroid of a "blob" by using cv2.moments after you find the contours. have a look at the opencv documentation on shape descriptors.
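As a rough illustration of the centroid idea, here is a NumPy-only sketch that computes the raw moments m00, m10 and m01 of a binary mask and derives the centroid as (m10/m00, m01/m00). It runs on a synthetic mask rather than a camera frame, and `mask_centroid` is a hypothetical helper, not an OpenCV function. With OpenCV you would normally call cv2.moments on a contour and read the same keys; the values can differ slightly because contour moments are computed from the outline polygon.

```python
import numpy as np

def mask_centroid(mask):
    """Return (cx, cy) of the nonzero pixels in a 2D mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)     # row (y) and column (x) indices of foreground pixels
    m00 = xs.size                 # zeroth moment: number of foreground pixels
    if m00 == 0:
        return None
    m10 = xs.sum()                # first-order moment in x
    m01 = ys.sum()                # first-order moment in y
    return m10 / m00, m01 / m00

# Synthetic "skin" mask: a rectangle covering columns 20..39 and rows 10..29
mask = np.zeros((60, 80), dtype=np.uint8)
mask[10:30, 20:40] = 255
centroid = mask_centroid(mask)
```

In the skin-tone script, the same idea would apply to skinRegion (or to each large contour) to get one X, Y position per blob.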
i havent yet figured out how to make the skeletons that lie in the middle of the contour but i was thinking of "eroding" the contours till it is a single line. in image processing the process is called "skeletonization" or "morphological skeleton". here is some basic info on skeletonization.
here is a link that implements skeletonization in opencv and c++
here is a link for skeletonization in opencv and python
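The "erode until only a line is left" idea can be sketched without OpenCV. This is a minimal NumPy version of the classic morphological skeleton (repeated erosion, keeping at each step the pixels that an opening would remove), using a 3x3 cross as the structuring element. The helper names are hypothetical and the choice of structuring element is an assumption; the result is not guaranteed to be a connected, one-pixel-wide line.

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 cross; pixels outside the image count as background."""
    p = np.pad(img, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def dilate(img):
    """Binary dilation with the same 3x3 cross."""
    p = np.pad(img, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def morphological_skeleton(mask):
    """Union, over successive erosions, of the pixels removed by an opening."""
    img = mask.astype(bool)
    skel = np.zeros_like(img)
    while img.any():
        eroded = erode(img)
        opened = dilate(eroded)      # opening = erosion followed by dilation
        skel |= img & ~opened        # keep what the opening cannot restore
        img = eroded                 # continue with the shrunken shape
    return skel

# Synthetic blob: a 7x24 bar inside a 15x30 image
blob = np.zeros((15, 30), dtype=bool)
blob[4:11, 3:27] = True
skeleton = morphological_skeleton(blob)
```

With OpenCV, the same loop is typically written with cv2.erode and cv2.dilate on the thresholded skin mask.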
hope that helps :)
--- EDIT ----
i would highly recommend that you go through these papers by Deva Ramanan (scroll down after visiting the linked page): http://www.ics.uci.edu/~dramanan/
C. Desai, D. Ramanan. "Detecting Actions, Poses, and Objects with Relational Phraselets" European Conference on Computer Vision (ECCV), Florence, Italy, Oct. 2012.
D. Park, D. Ramanan. "N-Best Maximal Decoders for Part Models" International Conference on Computer Vision (ICCV) Barcelona, Spain, November 2011.
D. Ramanan. "Learning to Parse Images of Articulated Objects" Neural Info. Proc. Systems (NIPS), Vancouver, Canada, Dec 2006.
The most common approach can be seen in the following youtube video. http://www.youtube.com/watch?v=xML2S6bvMwI
This method is not very robust, as it tends to fail if the hand is rotated too much (e.g. if the camera is looking at the side of the hand or at a partially bent hand).
If you do not mind using two cameras, you can look into the work of Robert Wang. His current company (3GearSystems) uses this technology, augmented with a Kinect, to provide tracking. His original paper uses two webcams but has much worse tracking.
Wang, Robert, Sylvain Paris, and Jovan Popović. "6d hands: markerless hand-tracking for computer aided design." Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 2011.
Another option (again, if using more than a single webcam is possible) is to use an IR emitter. Your hand reflects IR light quite well whereas the background does not. By adding a filter to the webcam that blocks normal light (and removing the standard filter that does the opposite) you can create quite effective hand tracking. The advantage of this method is that segmenting the hand from the background is much simpler. Depending on the distance and the quality of the camera, you may need more IR LEDs in order to reflect sufficient light back into the webcam. The Leap Motion uses this technology to track the fingers and palms (it uses 2 IR cameras and 3 IR LEDs to also get depth information).
All that being said; I think the Kinect is your best option in this. Yes, you don't need the depth, but the depth information does make it a lot easier to detect the hand (using the depth information for the segmentation).
My suggestion, given your constraints, would be to use something like this:
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html
Here is a tutorial for using it for face detection:
http://opencv.willowgarage.com/wiki/FaceDetection?highlight=%28facial%29|%28recognition%29
The problem you have described is quite difficult, and I'm not sure that trying to do it using only a webcam is a reasonable plan, but this is probably your best bet. As explained here (http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=load#cascadeclassifier-load), you will need to train the classifier with something like this:
http://docs.opencv.org/doc/user_guide/ug_traincascade.html
Remember: Even though you don't require the depth information for your use, having this information makes it easier for the library to identify a hand.
At last I've found a solution. Turns out a dlib open-source project has a "shape predictor" that, once properly trained, does exactly what I need: it guesstimates (with a pretty satisfactory accuracy) the "pose". A "pose" is loosely defined as "whatever you train it to recognize as a pose" by training it with a set of images, annotated with the shapes to extract from them.
The shape predictor is described here on dlib's website
I don't know about possible existing solutions. If supervised (or semi-supervised) learning is an option, training decision trees or neural networks might already be enough (kinect uses random forests from what i have heard). Before you go such a path, do everything you can to find an existing solution. Getting Machine Learning stuff right takes a lot of time and experimentation.
OpenCV has machine learning components, what you would need is training data.
With the motion tracking features of the open source Blender project it is possible to create a 3D model based on 2D footage. No Kinect needed. Since Blender is open source, you might be able to use their Python scripts outside the Blender framework for your own purposes.
Have you ever heard of EyesWeb?
I have been using it for one of my projects and I thought it might be useful for what you want to achieve.
Here are some interesting publications: LNAI 3881 - Finger Tracking Methods Using EyesWeb and Powerpointing-HCI using gestures
Basically the workflow is:
You create your patch in EyesWeb
Prepare the data you want to send with a network client
Use this processed data on your own server (your app)
However, I don't know if there is a way to embed the real-time image processing part of EyesWeb into your own software as a library.

OpenCV : Building a simple 3d model

I have decided to use OpenCV to build a 3D scene from a series of 2D images. I found the example code that came with OpenCV [ build3dmodel.cpp Here ].
I just want to run this once and see what kind of outcome it gives. My knowledge of OpenCV is limited and I don't need to understand the whole code; I just want to know how to give inputs to this program (the image set) to see the output.
The command line for this code example takes the following parameters:
build3dmodel -i intrinsics_filename.yml [-d detector] [-de descriptor_extractor] -m model_name.yml
The first file is the camera matrix, which you obtain from the calibration process (there is a specific example for it). The detector and descriptor extractor must be valid FeatureDetector and DescriptorExtractor names. The model name is a bit confusing; it looks like part of the name of the yml file where the data will be saved.
First see some tutorial like introduction to OpenCV or OpenCV tutorial. Also, see input and output with OpenCV.

OpenCV Multilevel B-Spline Approximation

Hi (sorry for my English). I'm working on a university project in which I need to use the MBA (Multilevel B-Spline Approximation) algorithm to get some points (control points) of an image to use in other operations.
I have read a lot of papers about this algorithm, and I think I understand it, but I can't manage to write it.
The idea is: read an image, process the image (OpenCV), then get the control points of the image and use these points.
So the problem here is:
The algorithm uses a set of points {(x,y,z)}; this set of points is approximated by a surface generated from the control points obtained from MBA. The set of points {(x,y,z)} represents the data we need to approximate (the image).
So, the image is in cv::Mat format. How can I transform this format into an ordinary array so I can simply access and manipulate the data?
Here are some papers with an explanation of the method:
(Paper) REGULARIZED MULTILEVEL B-SPLINE REGISTRATION
(Paper)Scattered Data Interpolation with Multilevel B-splines
(Matlab)MBA
If someone can help, any guideline or idea will be appreciated.
Thanks in advance.
EDIT: I finally wrote the algorithm in C++ using armadillo and OpenCV.
I'm using armadillo, a C++ linear algebra library, to work with the matrices in the algorithm.
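For the cv::Mat-to-points step asked about above, a small sketch in Python (where OpenCV images are already NumPy arrays) may help show the idea. The author's actual C++/armadillo code is not shown in the post, so the data layout here is only an assumption: each pixel of a single-channel image becomes one sample (x, y, z), with z the pixel intensity. `image_to_points` is a hypothetical helper.

```python
import numpy as np

def image_to_points(gray):
    """Turn a 2D single-channel image into an (N, 3) array of (x, y, z) rows."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]                  # row (y) and column (x) index grids
    return np.column_stack([xs.ravel(),
                            ys.ravel(),
                            gray.ravel().astype(np.float64)])

# Tiny synthetic 3x4 "image" whose intensities are 0..11
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
points = image_to_points(gray)
```

In C++, the equivalent per-pixel access on a CV_8UC1 image is mat.at<uchar>(y, x), which can be used in a double loop to fill whatever point container the approximation code expects.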

Resources