Kinect - Detect where the user is looking - orientation

I would like to detect whether the user is watching the screen or not using Kinect.
I've noticed when displaying the body joints orientation (W, X, Y and Z values) that the head joint returns 0 for all orientations. Why is that? Should I use other joints to determine this information?

To determine whether the user is looking at the screen, you can only infer it: detect a tracked skeleton that is within a certain range of the sensor, perhaps.
As for the orientation values returning 0, that means the joint is not being tracked.
Source: Here
But basically, if you can detect a skeleton, there is a high chance the user is looking at you.

Related

finding the depth in arkit with SCNVector3Make

The goal of the project is to create a drawing app. I want it so that when I touch the screen and move my finger, the app follows the finger and leaves a trail of cyan paint. I did create it, BUT there is one problem: the paint's DEPTH is always randomly placed.
Here is the code; you just need to connect the sceneView with the storyboard.
https://github.com/javaplanet17/test/blob/master/drawingar
My question is how to make the depth consistent. By consistent I mean there is always the same distance between the paint and the camera.
If you run the code above you will see that I have printed out all the SCNMatrix4 values, but none of them is the DEPTH.
I have tried changing hitTransform.m43, but it only messes up the x and y.
If you want to get a point some consistent distance in front of the camera, you don’t want a hit test. A hit test finds the real world surface in front of the camera — unless your camera is pointed at a wall that’s perfectly parallel to the device screen, you’re always going to get a range of different distances.
If you want a point some distance in front of the camera, you need to get the camera’s position/orientation and apply a translation (your preferred distance) to that. Then to place SceneKit content there, use the resulting matrix to set the transform of a SceneKit node.
The easiest way to do this is to stick to SIMD vector/matrix types throughout rather than converting between those and SCN types. SceneKit adds a bunch of new accessors in iOS 11 so you can use SIMD types directly.
There’s at least a couple of ways to go about this, depending on what result you want.
Option 1
// set up z translation for 20 cm in front of whatever
// last column of a 4x4 transform matrix is translation vector
var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2
// get camera transform the ARKit way
// (currentFrame is optional; unwrap it properly in real code)
let cameraTransform = view.session.currentFrame!.camera.transform
// if we wanted, we could go the SceneKit way instead; result is the same
// let cameraTransform = view.pointOfView!.simdTransform
// set node transform by multiplying matrices
node.simdTransform = cameraTransform * translation
This option, using a whole transform matrix, not only puts the node a consistent distance in front of your camera, it also orients it to point the same direction as your camera.
Option 2
// distance vector for 20 cm in front of whatever
let translation = float3(x: 0, y: 0, z: -0.2)
// treat distance vector as in camera space, convert to world space
// (pointOfView is optional; unwrap it properly in real code)
let worldTranslation = view.pointOfView!.simdConvertPosition(translation, to: nil)
// set node position (not whole transform)
node.simdPosition = worldTranslation
This option sets only the position of the node, leaving its orientation unchanged. For example, if you place a bunch of cubes this way while moving the camera, they’ll all be lined up facing the same direction, whereas with Option 1 they’d all be facing different directions.
Going beyond
Both of the options above are based only on the 3D transform of the camera — they don’t take the position of a 2D touch on the screen into account.
If you want to do that, too, you’ve got more work cut out for you. Essentially what you’re doing is hit testing touches not against the world, but against a virtual plane that’s always parallel to the camera and a certain distance away. That plane is a cross section of the camera projection frustum, so its size depends on what fixed distance from the camera you place it at. A point on the screen projects to a point on that virtual plane, with its position on the plane scaling in proportion to its distance from the camera.
So, to map touches onto that virtual plane, there are a couple of approaches to consider. (Not giving code for these because it’s not code I can write without testing, and I’m in an Xcode-free environment right now.)
Make an invisible SCNPlane that’s a child of the view’s pointOfView node, parallel to the local xy-plane and some fixed z distance in front. Use SceneKit hitTest (not ARKit hit test!) to map touches to that plane, and use the worldCoordinates of the hit test result to position the SceneKit nodes you drop into your scene.
Use Option 1 or Option 2 above to find a point some fixed distance in front of the camera (or a whole translation matrix oriented to match the camera, translated some distance in front). Use SceneKit’s projectPoint method to find the normalized depth value Z for that point, then call unprojectPoint with your 2D touch location and that same Z value to get the 3D position of the touch location with your camera distance. (For extra code/pointers, see my similar technique in this answer.)
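For the second approach, here is a rough, untested sketch of what that projectPoint/unprojectPoint dance might look like (the helper name and the 0.2 m default are assumptions; sceneView is your ARSCNView and touchLocation comes from your touch handler):
import ARKit

// Hypothetical helper: returns a world-space position for a 2D touch,
// always `distance` metres in front of the camera.
func worldPosition(for touchLocation: CGPoint,
                   in sceneView: ARSCNView,
                   distance: Float = 0.2) -> SCNVector3? {
    guard let camera = sceneView.pointOfView else { return nil }
    // a point `distance` metres straight ahead of the camera, in world space
    let ahead = camera.convertPosition(SCNVector3(x: 0, y: 0, z: -distance), to: nil)
    // project it to find the normalized screen-space depth (z) at that distance
    let projected = sceneView.projectPoint(ahead)
    // unproject the touch location using that same depth
    let touch = SCNVector3(x: Float(touchLocation.x), y: Float(touchLocation.y), z: projected.z)
    return sceneView.unprojectPoint(touch)
}
Whatever position comes out, you’d assign it to your node’s position (or simdPosition), just as in Option 2.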

Tracking objects from camera; PID controlling; Parrot AR Drone 2

I am working on a project where I should implement an object-tracking technique using the camera of the Parrot AR Drone 2.0. The main idea is that the drone should be able to identify a specified colour and then follow it, keeping some distance from it.
I am using the OpenCV API to establish communication with the drone. This API provides the function:
ARDrone::move3D(double vx, double vy, double vz, double vr)
which moves the AR.Drone in 3D space and where
vx: X velocity [m/s]
vy: Y velocity [m/s]
vz: Z velocity [m/s]
vr: Rotational speed [rad/s]
I have written an application which does simple image processing on the images obtained from the camera of the drone using OpenCV, and it finds the contours of the object to be tracked.
Now the part I am struggling with is finding the technique by which I should compute the velocities to send to the move3D function. I have read that a common way of doing this kind of control is a PID controller. However, having read about it, I could not work out how it relates to this problem.
To summarise, my question is: how do I move a robot towards an object detected in its camera? And how do I find the coordinates of a detected object from the camera?
EDIT:
So, I just realized you are using a drone, and your coordinate system WRT the drone is likely x forward into the image, y left of the image (image columns), z up vertically (image rows). My answer uses coordinates WRT the camera: x = columns, y = rows, z = depth (into the image). Keep that in mind when you read my outline. Also, everything I wrote is pseudocode; it won't run without many modifications.
Original Post:
A PID controller is a proportional–integral–derivative controller. It decides an action sequence based on your specific error.
For your problem, let's assume optimal tracking means the rectangle is in the center of the image and takes up ~30% of the pixel space. This means that you move your camera/bot until these conditions are met. We will call these the goal parameters:
x_ideal = image_width / 2
y_ideal = image_height / 2
area_ideal = image_width * image_height * 0.3
Now let's say your bounding box is characterized by 4 parameters:
(x_bounding, y_bounding, width_bounding_box, height_bounding_box)
Your error would be something along the lines of:
x_err = x_bounding - x_ideal;
y_err = y_bounding - y_ideal;
z_err = area_ideal - (width_bounding_box * height_bounding_box)
Notice I have tied the z distance (depth) to the size of the object. This assumes that the object being tracked is rigid and doesn't change size, so any change in apparent size is due to the object's distance from the camera (a bigger bounding box means the object is close, a smaller one means it is far). This is a bit of an estimation, but without any parameters on the camera or the object itself we can only make these general statements.
We need to keep the sign in mind when creating our control sequences; this is why the order matters when doing the subtraction. Let's think about this logically: x_err measures how far off the bounding box is horizontally from the desired position, and its sign tells you which way the bot has to move to bring the object back toward the center of the image; likewise, a box that is too small means the object is too far away, and so on.
z_err < 0 : means bot is too close and needs to slow down, Vz should be reduced
z_err = 0 : keep the speed command the same, no change
z_err > 0 : we need to get closer, Vz should increase
x_err < 0 : means the bot is to the right and needs to turn left (decreasing x), Vx should be reduced
x_err = 0 : keep the speed in X the same, no change to Vx
x_err > 0 : means the bot is to the left and needs to turn right (increasing x), Vx should be increased
We can do the same for the y axis. Now we use these errors to create a command sequence for the bot.
That description sounds a lot like a PID controller: observe a state, figure out an error, create a control sequence to reduce the error, then repeat the process over and over. In your case, the velocities are the actions output by your algorithm. You will essentially have 3 PIDs running:
PID for X
PID for Y
PID for Z
Because these axes are orthogonal by nature, we can say each system is independent (and ideally it is): moving in the x direction shouldn't affect the y direction. This example also completely ignores the bearing information (Vr), but it's meant to be a thought exercise, not a complete solution.
The exact velocity of the corrections is determined by your PID coefficients, and this is where things get a little tricky. Here is an easy-to-read (almost no math) overview of PID control. You will have to play with your system (aka "tune" your parameters) through a bit of experimentation. This is made even more difficult because the camera is not a full 3D sensor, so we can't extract true measurements from the environment. It's hard to convert an error of ~30 pixels into m/s without knowing more about your sensor/environment, but I hope this gave you a general idea of how to proceed.
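To make that loop concrete, here is a minimal sketch of the three-PID structure in Swift (chosen only to match the rest of this page; the drone API itself is C++). Everything in it is an assumption to be replaced: the gain values are invented, and the pixel-to-velocity scaling is exactly the tuning problem described in the previous paragraph.
import Foundation

// A simple PID controller; the gains are placeholders that you would tune experimentally.
struct PID {
    var kp: Double, ki: Double, kd: Double
    private var integral = 0.0
    private var lastError = 0.0

    mutating func update(error: Double, dt: Double) -> Double {
        integral += error * dt
        let derivative = (error - lastError) / dt
        lastError = error
        return kp * error + ki * integral + kd * derivative
    }
}

// One PID per axis, as described above; gain values are made up for illustration.
var pidX = PID(kp: 0.001, ki: 0.0, kd: 0.0005)    // pixel offset -> velocity
var pidY = PID(kp: 0.001, ki: 0.0, kd: 0.0005)
var pidZ = PID(kp: 0.00001, ki: 0.0, kd: 0.0)     // pixel-area difference -> velocity

// Returns velocities in the *camera* frame (x = columns, y = rows, z = depth).
// You would remap these into the drone's frame (see the EDIT note above)
// before feeding them to something like ARDrone::move3D.
func controlStep(boundingBox: (x: Double, y: Double, w: Double, h: Double),
                 imageWidth: Double, imageHeight: Double,
                 dt: Double) -> (vx: Double, vy: Double, vz: Double) {
    // Goal parameters from above.
    let xIdeal = imageWidth / 2
    let yIdeal = imageHeight / 2
    let areaIdeal = imageWidth * imageHeight * 0.3

    // Errors, exactly as defined above.
    let xErr = boundingBox.x - xIdeal
    let yErr = boundingBox.y - yIdeal
    let zErr = areaIdeal - boundingBox.w * boundingBox.h

    return (pidX.update(error: xErr, dt: dt),
            pidY.update(error: yErr, dt: dt),
            pidZ.update(error: zErr, dt: dt))
}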

iOS camera lens Position change values

I am working on a camera app which supports iOS 8 and above. Due to my requirements I need to change the lens values (not the focus point). My lens position values are something like 36 mm, 45 mm and so on. How can I apply these values to the camera, or are there any other default values available? I am using AVCapture for taking photos. Any help would be appreciated.
Not sure if I understand your question, because when you move the lens position you are in fact changing the camera's focal point (the camera's optical properties), which is not the same thing as the focus point of interest, i.e. the area of interest in the image.
So, to change the lens position you can do it with this method:
- (void)setFocusModeLockedWithLensPosition:(float)lensPosition
completionHandler:(void (^)(CMTime syncTime))handler
You can find it in the AVCaptureDevice class reference (this will be the device you selected as the camera for your app). The lensPosition value is between 0 and 1; it's a relative positioning system, so you can't use those absolute values (36 mm and 45 mm).
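For reference, a minimal Swift sketch of that call (the helper name is made up; device is whatever AVCaptureDevice you already use for your session, and position must be the normalized 0–1 value described above):
import AVFoundation

// Locks the lens at a normalized position in 0...1 (not millimetres).
func lockLens(of device: AVCaptureDevice, at position: Float) {
    guard device.isFocusModeSupported(.locked) else { return }
    do {
        try device.lockForConfiguration()
        device.setFocusModeLocked(lensPosition: position) { _ in
            // called once the lens has finished moving
        }
        device.unlockForConfiguration()
    } catch {
        print("Could not lock device for configuration: \(error)")
    }
}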

How to measure user distance from wall

I need to measure the distance of a wall from the user. When the user opens the camera and points it at any surface, I need to get the distance. I have read the question "Is it possible to measure distance to object with camera?" and I used the code for finding the iPhone camera angle from here: http://blog.sallarp.com/iphone-accelerometer-device-orientation.
- (void)accelerometer:(UIAccelerometer *)accelerometer didAccelerate:(UIAcceleration *)acceleration
{
// Get the current device angle
float xx = -[acceleration x];
float yy = [acceleration y];
float angle = atan2(yy, xx);
}
d = h * tan(angle)
But nothing happens in the NSLog output or on the camera view.
In the comments, you shared a link to a video: http://youtube.com/watch?v=PBpRZWmPyKo.
That app is not doing anything particularly sophisticated with the camera; rather, it appears to calculate distances using basic trigonometry, and it accomplishes this by constraining the business problem in several critical ways:
First, the app requires the user to specify the height at which the phone's camera lens is being held.
Second, the user is measuring the distance to something sitting on the ground and aligning the bottom of that to some known location on the screen (meaning you have a right triangle).
Those two constraints, combined with the accelerometer and the camera's lens focal length, would allow you to calculate the distance.
If your target cross-hair was in the center of the screen, it greatly simplifies the problem and it becomes a matter of simple trigonometry, i.e. your d = h * tan(angle).
BTW, the "angle" code in the question appears to measure the rotation about the z-axis, the clockwise/counter-clockwise rotation as the device faces you. For this problem, though, you want to measure the rotation of the device about its x-axis, the forward/backward tilt. See https://stackoverflow.com/a/16555778/1271826 for an example of how to capture the device's orientation in space. Also, that answer uses CoreMotion, whereas the article referenced in your question uses an API that has since been deprecated.
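As a rough illustration of that right-triangle idea with CoreMotion, here is a sketch. It assumes the user supplies the camera height, the crosshair is in the center of the screen, and the device's pitch can stand in for the angle between the camera axis and straight down; a real app would need to be much more careful about which attitude angle it uses.
import CoreMotion
import Foundation

let motionManager = CMMotionManager()

// cameraHeight: height of the lens above the ground in metres, entered by the user.
func startDistanceUpdates(cameraHeight: Double, update: @escaping (Double) -> Void) {
    guard motionManager.isDeviceMotionAvailable else { return }
    motionManager.deviceMotionUpdateInterval = 1.0 / 30.0
    motionManager.startDeviceMotionUpdates(to: .main) { motion, _ in
        guard let attitude = motion?.attitude else { return }
        // Treat pitch as the tilt between the camera axis and straight down.
        // This only holds for the constrained setup described above.
        let angle = attitude.pitch
        guard angle > 0, angle < .pi / 2 else { return }   // avoid tan blowing up
        update(cameraHeight * tan(angle))                  // d = h * tan(angle)
    }
}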
The only way this would be possible is if you could read out the setting of the auto-focus mechanism in the lens. To my knowledge this is not possible.

iPhone augmented reality Euler angles rotation – roll issue

I’m working on an iOS augmented reality application.
It is location-based, not marker-based.
I use the GPS, compass and accelerometers to get latitude, longitude, altitude and the 3 euler angles: yaw, pitch and roll. I know using NSLog() that those 6 variables contain valid data.
My application shows some 3d objects over the camera view.
It works fine as long as I use everything but the roll angle.
If I add that third angle, the rotation applied to my OpenGL world is wrong. I do it this way in the main OpenGL draw method:
glRotatef(pitch, 1, 0, 0);
glRotatef(yaw, 0, 1, 0);
//glRotatef(roll, 0, 0, 1);
I think there is something wrong with this approach but am certainly not a specialist. Maybe I should create some sort of unique rotation matrix rather than 3 different ones?
Maybe that’s not easily possible? After all, most desktop video games, FPS games and the like, just let the user change the yaw and the pitch using the mouse, so only 2 angles, not 3. But unlike the mouse, which is a 2D device, a phone used for augmented reality can rotate in all three.
But then again, none of the AR tutorials I have seen online handle ‘roll’ properly. ‘Rolling’ your phone would either completely mess up the AR content or, with some roll-compensation strategy, do nothing at all.
So my question is, assuming I have my 3 Euler angles using the phone sensors, how should I apply them to my 3d opengl view?
I think you're likely talking about gimbal lock.
The essence of the problem is that if you rotate with Eulers then there's always a sequence to it. For example, you rotate around x, then around y, then z. But one axis can always become ambiguous because a preceding rotation can move it onto a different axis.
Suppose the rotation were 0 degrees around x, 90 degrees around y, then 20 degrees around z. So you do the x rotation and nothing has changed. You do the y rotation and everything moves 90 degrees. But now you've moved the z axis onto where the x axis was previously. So the z rotation will appear to be around x.
No matter what most people's instincts tell them, there's no way to avoid the problem. The kneejerk reaction is that you'll always rotate around the global axes rather than the local ones. That doesn't resolve the problem, it just reverses the order: the z rotation could then turn the y rotation, which has already occurred, into an x rotation.
You're right that you should aim to create a unique description of rotation separated from measuring angles.
For augmented reality it's actually not all that difficult.
The accelerometer tells you which way down is. The compass tells you which way north is. The two may not be orthogonal, though: the compass reading varies from being roughly at a right angle to gravity near the equator to being nearly parallel to the gravity vector at the poles.
So:
just accept the accelerometer vector as down;
get the cross product of down and the compass vector to get your side vector; it should point east or west;
then get the cross product of your side vector and your down vector to get a north vector that is suitably perpendicular.
You could equally use the dot product to remove the portion of the compass vector that lies in the direction of gravity, and take the cross product from there.
You'll want to normalise everything.
That gives you three basis vectors, so just put them directly into a matrix. No further work required.
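A minimal sketch of that recipe using simd, assuming you already have gravity and magnetic-north vectors as simd_float3 (e.g. converted from CMDeviceMotion); the column order and handedness are a convention choice you'd adjust to whatever your renderer expects.
import simd

// Builds an orientation basis from a gravity vector and a compass (magnetic
// north) vector, following the cross-product recipe above.
func orientationBasis(gravity: simd_float3, compassNorth: simd_float3) -> simd_float3x3 {
    // 1. Accept the accelerometer/gravity vector as "down".
    let down = simd_normalize(gravity)
    // 2. Side vector: perpendicular to both down and the raw compass reading.
    let side = simd_normalize(simd_cross(down, compassNorth))
    // 3. Re-derive a north vector that is exactly perpendicular to the other two.
    let north = simd_normalize(simd_cross(side, down))
    // Put the three basis vectors straight into a matrix, one per column.
    return simd_float3x3([side, north, down])
}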

Resources