How to use the VGA camera as an optical sensor?

I am designing an information kiosk which incorporates a mobile phone hidden inside the kiosk.
I wonder whether it would be possible to use the VGA camera of the phone as a sensor to detect when somebody is standing in front of the kiosk.
Which software components (e.g. Java, APIs, Bluetooth stack, etc.) would be required for code that uses the VGA camera for movement detection?

The obvious choice is to use face detection, but you would have to calibrate it to ensure that the detected face is close enough to the kiosk, perhaps by using the relative size of the face in the picture. This could be done with the OpenCV library, which is widely used. But as this kiosk would be deployed in places where you have little control over the lighting, there is a good chance of false positives and false negatives. You may also want to consider a proximity sensor in combination with face detection.
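For illustration, here is a minimal Python/OpenCV sketch of that idea, assuming a desktop-class runtime next to the camera; the 15% frame-width threshold for "close enough" is an invented calibration value you would tune on site:

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # the kiosk camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    frame_w = gray.shape[1]
    # Relative face width as a crude proximity estimate: a face wider
    # than 15% of the frame is treated as "standing at the kiosk".
    close = [f for f in faces if f[2] / frame_w > 0.15]
    if len(close) > 0:
        print("visitor detected near the kiosk")
cap.release()
```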

Depending on what platform the information kiosk is using, the options will vary... but assuming there is Linux somewhere underneath, you should take a look at the OpenCV library. And in case it is of any use, here's a link to my fun experiment to get a 'nod-controlled interface' for reading long web pages.
And speaking of false positives - or, even worse, false negatives - in the case of bad lighting or an unusual angle the chances are pretty high. So you'd need to complement this with some fallback mechanism, like an onscreen 'press here to start' button that is there by default, and then use an inactivity timeout alongside the face detection to avoid having just one information input vector.
Another idea (depending on the light conditions) might be to measure the overall amount of light in the picture: natural light should produce only slow changes, while a person walking up close to the kiosk would cause a rapid lighting change.
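A minimal sketch of that brightness heuristic, again assuming Python/OpenCV; the adaptation rate and jump threshold are invented values to tune:

```python
import cv2

cap = cv2.VideoCapture(0)
baseline = None
ALPHA = 0.02   # slow adaptation, so natural light drift is absorbed
JUMP = 15.0    # sudden brightness change (0-255 gray levels) = person

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mean = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).mean()
    if baseline is None:
        baseline = mean
    if abs(mean - baseline) > JUMP:
        print("rapid lighting change - someone may be at the kiosk")
    # Update the running average so slow changes never trigger.
    baseline = (1 - ALPHA) * baseline + ALPHA * mean
cap.release()
```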

In J2ME (Java for mobile phones), you can use the MMAPI (Mobile Media API) to capture frames from the camera.
Most phones support this.

Andrew's suggestion of OpenCV is good, and there are a lot of motion detection projects. But I would suggest adding a cheap CMOS camera rather than using the mobile phone camera.

Related

How does the Google Measure app work on Android?

I can see that it can measure horizontal and vertical distances with +/-5% accuracy. I have a use case in which I am trying to formulate an algorithm to detect the distance between two points in an image or video. Any pointers to how it could be working would be very useful to me.
I don't think the source is available for the Android Measure app, but it is ARCore-based, and I would expect it uses a combination of triangulation and knowledge it reads from the 'scene' (to use the Google ARCore term) it is viewing.
Like a human estimating the distance to a point by basic triangulation between two eyes and the point being looked at, a measurement app is able to look at multiple views of the scene and to measure, using its sensors, how far the device has moved between the different views. Even a small movement allows the same triangulation techniques to be used.
The reason for mentioning all this is to highlight that you do not have the same tools or information available to you if you are analysing image or video files without any position or sensor data. Hence, the Google Measure app may not be the best template for your particular problem.
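To make the triangulation step concrete, here is a toy sketch (not ARCore's actual code), assuming a purely sideways camera movement with a known baseline and a known focal length in pixels:

```python
# Classic two-view triangulation: depth Z = f * B / disparity.
def depth_from_two_views(focal_px: float, baseline_m: float,
                         x1_px: float, x2_px: float) -> float:
    """Depth of a point seen at x1_px in view 1 and x2_px in view 2,
    where the camera moved baseline_m sideways between the views."""
    disparity = abs(x1_px - x2_px)
    if disparity == 0:
        raise ValueError("no parallax - cannot triangulate")
    return focal_px * baseline_m / disparity

# A point that shifts 40 px when the phone moves 5 cm sideways,
# with a 1000 px focal length, is about 1.25 m away.
print(depth_from_two_views(1000.0, 0.05, 320.0, 280.0))
```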

Track user translation movement on iOS using sensors for a VR game?

I'm starting to experiment with VR game development on iOS. I learned a lot from the Google Cardboard SDK. It can track the user's head orientation, but it cannot track the user's translation. This limitation means the user can only look at the virtual environment from a fixed location (I know I can add auto-walk to the game, but it's just not the same).
Searching around the internet, some say translation tracking just can't be done using sensors alone, but it seems that by also using the magnetometer you can track the user's movement path, like this example.
I also found a different method called SLAM, which uses a camera and OpenCV to do feature tracking, then uses the feature point information to calculate translation. Here's an example from 13th Lab. And Google has the Tango project, which is more advanced, but it requires hardware support.
I'm quite new to this kind of topic, so I'm wondering: if I want to track not only the head orientation but also the head (or body) translation in my game, which method should I choose? SLAM seems pretty good, but it's also pretty difficult, and I think it will have a big impact on the CPU.
If you are familiar with this topic, please give some advice, thanks in advance!
If high accuracy is not important, you can try using the accelerometer to detect walking movement (basically a pedometer) and multiply the step count by an average human step length. Direction can be determined by the compass/magnetometer.
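A crude sketch of that pedometer idea, assuming raw accelerometer samples in m/s^2; the peak threshold and step length are invented values:

```python
import math

STEP_LENGTH_M = 0.7  # assumed average step length
THRESHOLD = 11.0     # gravity is ~9.8 m/s^2; a step swings above this

def estimate_distance(samples):
    """samples: iterable of (ax, ay, az) accelerometer readings."""
    steps, above = 0, False
    for ax, ay, az in samples:
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        if mag > THRESHOLD and not above:
            steps += 1          # rising edge of a peak = one step
        above = mag > THRESHOLD
    return steps * STEP_LENGTH_M
```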
High-accuracy tracking would likely require complex algorithms such as SLAM, though many such algorithms have already been implemented in VR libraries such as Vuforia or Kudan.
I disagree with you, Zhiquiang Li.
Look at this video made with Kudan: the video is quite stable, and moreover my smartphone is quite an old phone.
https://youtu.be/_7zctFw-O0Y

How does knocktounlock work?

I am trying to figure out how knocktounlock.com is able to detect "knocks" on the iPhone. I am sure they use the accelerometer to achieve this; however, all my attempts come up with false positives (if the user moves, jumps, etc., it sometimes fires).
Basically, I want to be able to detect when a user knocks/taps/smacks their phone (and to distinguish that from other things that also register on the accelerometer). So I am looking for sharp, high peaks. The device will be in a pocket, so it will not be moving much.
I have tried things like high-pass/low-pass filtering (not sure if there would be a better option).
This is a duplicate of this: Detect hard taps anywhere on iPhone through accelerometer But it has not received any answers.
Any help/suggestions would be awesome! Thanks.
EDIT: Looking for more thoughts before I accept the answer below. I did hear back from Knocktounlock, and they use the fourth derivative (jounce) to get better values to analyse. Which is interesting.
I would consider a knock on the iPhone to be exactly the same as bumping two phones together. Check out this GitHub repo:
https://github.com/joejcon1/iOS-Accelerometer-visualiser
Build & run the app on an iPhone and check out the spikes on the green line. You can see the value of the spike clearly.
Knocking the iPhone:
As you can see, the actual spike is very short in time when you knock the phone. The spike patterns are a little different for a hard knock versus a soft knock, but they can be distinguished programmatically.
Now let's see the accelerometer pattern when the iPhone moves freely in space:
As you can see, the spikes are bell-shaped, which means it takes a little time for the spike value to return to 0.
From these patterns it will be easier to recognise knocking. Good luck.
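As a sketch of that shape distinction (a knock is a very narrow spike, free movement a wide bell), assuming 100 Hz accelerometer samples; the thresholds are invented:

```python
SPIKE_LEVEL = 2.0        # magnitude (g) that counts as a spike at all
MAX_KNOCK_SAMPLES = 5    # a knock lasts well under ~50 ms at 100 Hz

def classify_spike(magnitudes):
    """magnitudes: consecutive accelerometer magnitudes, in g."""
    width = sum(1 for m in magnitudes if m > SPIKE_LEVEL)
    if width == 0:
        return "no spike"
    # Narrow burst = knock; wide bell-shaped hump = free movement.
    return "knock" if width <= MAX_KNOCK_SAMPLES else "free movement"
```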
Also, this will drain your battery, as the sensor will always be running and the iPhone needs to keep a persistent Bluetooth connection to the Mac.
P.S.: Also check this answer, https://stackoverflow.com/a/7580185/753603
I think the way to go here is using pattern recognition with accelerometer data.
You could (write and) train a classifier (e.g. k-nearest neighbours) with data you gathered and classified by hand. Neural networks are also an option. There will be many different ways to solve this problem, but there is probably no straightforward way of achieving it.
Some papers showing pattern recognition approaches to similar topics (activity, movement), like
http://www.math.unipd.it/~cpalazzi/papers/Palazzi-Accelerometer.pdf
(some more, but I am not allowed to post them with my reputation count. You can search for "pattern recognition accelerometer data")
There is also a master thesis about gesture recognition on the iPhone:
http://klingmann.ch/msc_thesis_marco_klingmann_iphone_gestures.pdf
In general you won't achieve 100% correct classification. Depending on the time and knowledge you have, the result will vary between good-and-usable and we-could-have-used-random-classification-instead.
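As a sketch of the k-nearest-neighbour route, assuming Python with scikit-learn, fixed-length windows of accelerometer magnitudes, and hand-made labels (the placeholder data and the feature choice are mine, not a recommendation):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def features(window: np.ndarray) -> np.ndarray:
    # A few simple descriptors of a window: peak, spread, sharpest jump.
    return np.array([window.max(), window.std(),
                     np.abs(np.diff(window)).max()])

# Placeholder training set; in practice these are hand-labelled windows
# of real accelerometer magnitudes (1 = knock, 0 = anything else).
X_windows = [np.random.rand(50) for _ in range(20)]
y = np.random.randint(0, 2, size=20)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit([features(w) for w in X_windows], y)

new_window = np.random.rand(50)  # a live window would go here
print("knock" if clf.predict([features(new_window)])[0] else "not a knock")
```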
Just a thought, but it could be useful to add the output of the microphone to the mix, listening for really short, loud noises at the same time that a possible "knock" movement has been detected.
I am surprised that the 4th derivative is needed; intuitively it feels to me that the 3rd ("jerk", the derivative of acceleration) should be enough. It is a big hint about what to keep an eye on, though.
It seems quite simple to me: collect accelerometer data at a high rate, plot it on a chart, observe. Calculate the first derivative from that, plot & observe. Then rinse & repeat with the derivative of the last one. Draw conclusions. I highly doubt you will need to do pattern recognition per se - clustering/classifiers/what-have-you - I think you will see a very distinct peak on one of your charts; you may only need to tune the collection rate and the smoothing.
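That workflow in sketch form, assuming NumPy and a 100 Hz collection rate:

```python
import numpy as np

def derivative_chain(accel_mag: np.ndarray, dt: float = 0.01):
    """Successive finite differences of the acceleration magnitude."""
    # A short moving average is the "smoothing" mentioned above.
    smooth = np.convolve(accel_mag, np.ones(5) / 5, mode="valid")
    jerk = np.diff(smooth) / dt      # 3rd derivative of position
    jounce = np.diff(jerk) / dt      # 4th derivative, per the EDIT above
    return smooth, jerk, jounce      # plot each and look for the peak
```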
What is more interesting to me is how come you don't have to be running the KnockToUnlock app for this to work. And if it is running in the background, who lets it run there for unlimited time? I don't think the accelerometer qualifies for unlimited background running. After some pondering, my guess is that the app connects to the Mac over Bluetooth as an accessory - and as such gets a pass from iOS to run in the background (and suck your battery, shhht).
To solve this problem you need to select the right sampling frequency. A tap (knock) has very high frequency content, so you should choose an accelerometer sampling rate of no lower than 50 Hz (perhaps even 100 Hz) for quality tap detection in the presence of noise from other movements.
The use of classifiers is necessary, but in order to save battery you should not call the classifier very often. You should write a simple algorithm that finds only taps and tap-like situations and reports when your program needs to call the classifier.
Note the gyroscope signal: it also responds to knocks. Moreover, the gyroscope signal does not need to be separated from a constant (gravity) component, and it contains less noise.
Here is a good video about the basics of working with smartphone sensors: http://talkminer.com/viewtalk.jsp?videoid=C7JQ7Rpwn2k#.UaTTYUC-2Sp
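A sketch of that cheap-gate-before-the-classifier idea; the 1.5 g jump threshold is invented:

```python
def run(sample_stream, classifier):
    """sample_stream: accelerometer magnitudes (g) at 50-100 Hz;
    classifier: the expensive model, called only on tap-like events."""
    prev = None
    for mag in sample_stream:
        # Cheap always-on check: only an abrupt jump wakes the classifier.
        if prev is not None and abs(mag - prev) > 1.5:
            classifier(mag)
        prev = mag
```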

Real Time Optical Flow

I'm using optical flow as a real-time obstacle detection and avoidance system for the visually impaired. I'm developing the application in C# and using Emgu CV for image processing. I use the Lucas-Kanade method, and I'm pretty satisfied with the speed of the algorithm. I am using monocular vision, which makes it hard to compute the depth to each of the tracked features accurately and to alert the user accordingly. I plan on using an ultrasonic sensor to help with obstacle detection, since depth computation is hard with a monocular camera. Any suggestions on how I could get an accurate estimate of depth using the camera alone?
You might want to check out this paper: A Robust Visual Odometry and Precipice Detection System Using Consumer-grade Monocular Vision. They use a nice trick for detecting obstacles as well as holes in the field of view.
Hate to give such a generic answer, but you'd be best off starting with a standard text on structure from motion to get an overview of the techniques. A good one is Richard Szeliski's recent book, available online (Chapter 7), and its references. After that, for your application you may want to look at recent work in SLAM - Oxford's Active Vision group has published some great work, and Andrew Davison's group too.
More a comment on RobAu's answer below: 'structure from motion' might give better search results than '3d from video'.
Depth from one camera will only work if the camera is moving. You could look into some 3D-from-video approaches. It is a very hard problem, especially when objects in the camera's field of view are moving as well.
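For illustration, a Python/OpenCV sketch of depth from known camera motion: track features with Lucas-Kanade flow and apply the same Z = f * T / disparity relation as stereo. It assumes a purely sideways translation that you measure elsewhere (e.g. odometry) and a static scene - both strong assumptions, as noted above:

```python
import cv2
import numpy as np

def depths(prev_gray, curr_gray, focal_px, translation_m):
    """Per-feature depth (metres) for a camera that moved sideways
    by translation_m between two grayscale frames."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return np.array([])
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    old = pts[status.flatten() == 1].reshape(-1, 2)
    new = nxt[status.flatten() == 1].reshape(-1, 2)
    disparity = np.abs(new[:, 0] - old[:, 0])   # horizontal pixel shift
    disparity[disparity < 1e-3] = np.nan        # no parallax, no depth
    return focal_px * translation_m / disparity
```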

Feature Detection in Noisy Images

I've built an imaging system with a webcam and feature matching, such that as I move the camera around I can track the camera's motion. I am doing something similar to here, except with the webcam frames as the input.
It works really well for "good" images, but images taken in really low light contain lots of noise (camera high gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very few features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection, because it operates in frequency space, whereas feature detectors operate spatially. Another benefit is that it is very fast compared with feature detection methods. I have a sub-pixel-accurate implementation available in the OpenCV trunk, located here.
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with them. Think of it like trying to detect translation in a snowstorm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
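For reference, a quick sketch of phase correlation between two frames using the OpenCV built-in (exposed in Python as cv2.phaseCorrelate; the file names are placeholders):

```python
import cv2
import numpy as np

# phaseCorrelate expects single-channel float images.
prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# A Hanning window suppresses edge effects in the FFT.
window = cv2.createHanningWindow(prev.shape[::-1], cv2.CV_32F)
(dx, dy), response = cv2.phaseCorrelate(prev, curr, window)
print(f"shift: ({dx:.2f}, {dy:.2f}) px, peak response: {response:.3f}")
# A weak peak response on near-featureless frames is the "greenout" case.
```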
Can you adjust the camera settings to work better in low-light conditions? Have you fully opened the iris? Can you live with lower frame rates? Setting a longer exposure time will allow the camera to gather more light, giving you more features at the cost of adding motion blur. Or, if low light is your default environment, you probably want something designed for it, like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friends :)
Histogram equalization may be of interest for improving the image contrast, but sometimes it just enhances the noise. OpenCV has a global histogram equalization function called equalizeHist. For a more localized implementation, you'll want to look at Contrast Limited Adaptive Histogram Equalization, or CLAHE for short. Here is a good article on it. This page has some nice examples, and some code.
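A short sketch contrasting the two, again in Python/OpenCV; the clip limit and tile grid are the commonly used defaults, not tuned values:

```python
import cv2

gray = cv2.imread("lowlight.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

global_eq = cv2.equalizeHist(gray)       # global histogram equalization

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(gray)             # localized (CLAHE) version
# Compare the two; both can also amplify the very noise you fight.
```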
