As you can tell from the title, I have a project on photo capture on mobile devices. I am supposed to detect whether the real-time input to the camera of a mobile device is stable or not. But so far, everything I've found is about stabilization itself, mostly for video, not about detecting whether the input is stable in the first place. And I haven't read or seen any paper that proposes metrics for image instability. There is work on blur and focus, but it isn't very concrete. Is there any way to quantify the "instability" of an image (assuming it's more than just blur, shake, and focus)?
The real-time input is presented as a series of preview images. If the camera is still and the object is still, then each of these images is going to be very similar to the previous one. Personally, I'd hang on to each preview image, and when the next one comes in, compare them pixel-by-pixel. Compute:
totalDifference = 0;
for (each pixel n)
    for (each colour channel R, G, B)
        totalDifference += abs(oldValue - newValue);
stability = 1 / (1 + totalDifference); // a value of 1 is stable, near 0 is unstable
although that would largely ignore instability in dark objects. Maybe you should use
totalDifference += abs((oldValue - newValue)/(oldValue + newValue)); // watch for divide by 0!
instead. Good luck!
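If it helps, here is a minimal Swift sketch of that comparison using the normalised difference, assuming each preview frame has already been converted into an array of grayscale (luma) bytes; the function name and the per-pixel averaging are my own additions, not part of any camera API:

// Sketch: compare two consecutive preview frames given as grayscale byte
// buffers of identical size and return a stability score in (0, 1].
func stabilityScore(previous: [UInt8], current: [UInt8]) -> Double {
    precondition(previous.count == current.count, "frames must be the same size")
    var totalDifference = 0.0
    for i in 0..<current.count {
        let oldValue = Double(previous[i])
        let newValue = Double(current[i])
        // Normalised difference so dark regions count as much as bright ones.
        let denominator = oldValue + newValue
        if denominator > 0 {
            totalDifference += abs(oldValue - newValue) / denominator
        }
    }
    // Average per pixel, then map to (0, 1]; 1.0 means the frames are identical.
    let meanDifference = totalDifference / Double(current.count)
    return 1.0 / (1.0 + meanDifference)
}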
I'm using the capturedImage of an ARFrame in ARKit to generate signatures for human faces and compare their similarity. However, because of the poor resolution of capturedImage, the quality of the comparison suffers compared to using the Vision framework.
capturedDepthData shows that the resolution of the capturedImage in ARKit is only 640x480.
First, I tried setting the video format to the highest available resolution:
let configuration = ARFaceTrackingConfiguration()
if let videoFormat = ARFaceTrackingConfiguration.supportedVideoFormats
    .sorted(by: { ($0.imageResolution.width * $0.imageResolution.height) < ($1.imageResolution.width * $1.imageResolution.height) })
    .last {
    configuration.videoFormat = videoFormat
}
But this does not seem to improve the resolution of the capturedImage.
Second, I tried using captureHighResolutionFrame, as well as changing the video format:
if let videoFormat = ARFaceTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
    configuration.videoFormat = videoFormat
}
However, according to the documentation:
The system delivers a high-resolution frame out-of-band, which means that it doesn't affect the other frames that the session receives at a regular interval
which means the session seems to alternate between the regular capturedImage and the high-resolution images rather than replacing the regular images, because the high-resolution capture is asynchronous. This is problematic because the size difference means displayTransform and CGAffineTransform have to be applied differently when scaling the images in each case.
On top of that, capturing images with this method plays the shutter sound 60 times per second.
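For context, here is a minimal sketch of an on-demand high-resolution request (assuming iOS 16's ARSession.captureHighResolutionFrame(completion:); the helper name is illustrative). Calling it once per user action rather than from session(_:didUpdate:) should at least avoid the repeated shutter sound:

import ARKit

// Sketch: request a single out-of-band high-resolution frame on demand
// (e.g. from a button tap) instead of once per delegate callback.
func captureHighResFace(from session: ARSession) {
    guard #available(iOS 16.0, *) else { return }
    session.captureHighResolutionFrame { frame, error in
        guard let frame = frame else {
            print("High-resolution capture failed:", error?.localizedDescription ?? "unknown error")
            return
        }
        // This frame is larger than the regular stream, so any displayTransform /
        // CGAffineTransform scaling must use this frame's own camera.imageResolution.
        let pixelBuffer = frame.capturedImage
        print("Got high-res frame:", CVPixelBufferGetWidth(pixelBuffer), "x", CVPixelBufferGetHeight(pixelBuffer))
    }
}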
I have not tried this specific model myself, but I hope it does a good job since people are using it, and transformer-based models have achieved excellent scores in NLP.
Here is the link to one of them, specialized in enhancing image resolution:
model. I hope this works well for your case.
Coming from Android, I'm trying to understand the AVCaptureDevice API and find a match between the different parameters of iOS and Android.
I'm working with continuous auto-exposure mode.
I'm having trouble with the following exposure parameters:
To my understanding:
exposureDuration - This is the length of time during which the exposure actually happens. It can be converted to seconds by using the value and timescale of this property.
exposureTargetOffset, exposureTargetBias - I'm not sure what these values represent - are they some kind of correction applied to reach the desired exposure level? What is this exposure target value?
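For reference, a minimal Swift sketch of that seconds conversion and of reading the other two values off an AVCaptureDevice (the helper name is just for illustration):

import AVFoundation

// Sketch: inspect the current auto-exposure state of a capture device.
func logExposureState(of device: AVCaptureDevice) {
    // exposureDuration is a CMTime; divide value by timescale to get seconds.
    let duration = device.exposureDuration
    let seconds = Double(duration.value) / Double(duration.timescale)
    print("exposure duration: \(seconds) s")
    print("target bias (EV): \(device.exposureTargetBias)")
    print("target offset (EV): \(device.exposureTargetOffset)")
}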
You aren't alone. I'm not a professional photographer either, so it's pretty confusing. I think your gut is leading you in the right direction.
If you set exposureDuration, you're out of "auto-exposure mode" and it'll freeze that exposure duration and current or specified ISO setting. If the light changes, you're stuck with that setting.
If you set the exposureTargetBias, it will mimic a fancy camera and move the automatically calculated exposure settings up or down an exposure value (combination of f-number and exposure duration). There's a standard value for exposure of an image, but sometimes you want to over-expose or under-expose for style or shutter-speed priority. Changing the bias tells the automatic exposure system to aim for a value over or under the "correct" standard value.
Here's a great article explaining it in iOS: https://www.imore.com/camera-api-ios-8-explained
Exposure compensation is expressed in f-stops. +1 f-stop doubles the brightness, -1 f-stop halves the brightness.
Developers can currently set exposure target biases between -8 and +8 for all existing iOS devices. However, Apple warns that that could change in the future.
If you have a new iPhone (11 or newer) you can even change the bias in real time.
Exposure Bias is explained here: https://digital-photography-school.com/using-exposure-bias-to-improve-picture-detail/
exposureTargetOffset tells you how well the camera is hitting your requested bias value. Sometimes it just can't adjust enough to darken the image (aiming at the sun, the camera tries to shorten the exposure time and drops the ISO very low) or lighten it (pitch-black closet, the camera tries to expose the image sensor for a long time and bumps up the ISO a ton to gather all the light, resulting in a dark and grainy image). If the camera can't hit the target or is in the process of adjusting to it, the offset tells you how far off it currently is. For video, the exposure is obviously limited by framerate.
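A short Swift sketch of that flow, assuming you already have an AVCaptureDevice; the -1 EV value and the helper name are just for illustration:

import AVFoundation

// Sketch: ask the auto-exposure system to aim one stop under the standard
// exposure, then check how far off it currently is.
func underexposeByOneStop(_ device: AVCaptureDevice) throws {
    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }

    if device.isExposureModeSupported(.continuousAutoExposure) {
        device.exposureMode = .continuousAutoExposure
    }

    // Clamp to the device's supported bias range (roughly -8...+8 EV today).
    let bias = max(device.minExposureTargetBias, min(device.maxExposureTargetBias, -1.0))
    device.setExposureTargetBias(bias) { _ in
        // exposureTargetOffset reports how far the current exposure is from the
        // requested target, in EV; 0 means the target is being met.
        print("current offset from target: \(device.exposureTargetOffset) EV")
    }
}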
I am trying to subtract 2 images using the function cvAbsDiff(img1, img2, dest);
It works, but sometimes when I bring my hand in front of my head or body, the hand is not clear and the background shows through: the background image (head) overlays my foreground (hand).
It works correctly on plain surfaces, i.e. when the background is uniform, like a wall.
Please check out my image so that you can better understand my problem:
http://www.2shared.com/photo/hJghiq4b/bg_overlays_foreground.html
If you have any solution or hint, please help me.
There's nothing wrong with your code. Background subtraction is not a preferred approach for motion detection or silhouette detection because it's not very robust. The problem occurs because the background and the foreground are similar in colour in many regions, so the subtraction there is close to zero and the foreground blends into the background. You might try:
- optical flow for motion detection
- if your task is just detecting a silhouette or a hand, training a HOG classifier for it
In case you do not want to try a new approach, you can play around with the threshold value (in your case 30): when you subtract similarly coloured regions, the difference is less than 30, so thresholding at 30 just blacks them out. You may also try HSV or some other colour space.
Putting in the relevant code would help. Also knowing what you're actually trying to achieve.
Which two images are you subtracting? I've subtracted subsequent images (so, images taken a fraction of a second apart), and that generally yields the edges of moving objects, for example the edges of a hand, and not the entire silhouette of the hand. I'm guessing you're taking the difference between the current frame and a static startup frame. It's possible that parts aren't different enough (skin on skin).
I've got some computer problems tonight; I'll test it out tomorrow (please at least post the steps you actually carry out) and let you know.
I'm still not sure what your ultimate goal is, although I'm guessing you want to do some gesture-recognition (since you have a vector called "fingers").
As Manpreet said, your biggest problem is robustness, and that stems from the subject and background having similar colours.
I reproduced your image by having my face in the static comparison image, then moving it. If I started with only background, the result was already much more robust and in any case didn't show any "overlaying".
The quick fix is to make sure you have a clean, subject-free static image.
Otherwise, you'll want a dynamic comparison image; the simplest is comparing frame_n with frame_n-1. This will generally give you just the moving edges though, so if you want the entire silhouette you can either:
1) Use a different segmentation algorithm (this is what I recommend: background subtraction is fast, and you can use it to determine a much smaller ROI in which to search, and then use a different algorithm for more robust segmentation).
2) Try to make a compromise between the static and dynamic comparison image, for example an average of the past 10 frames or something like that (see the sketch below). I don't know how well this works, but it would be quite simple to implement, so it's worth a try :).
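A conceptual sketch of option 2, in plain Swift over grayscale byte buffers rather than OpenCV, just to show the idea; the blending factor and threshold are arbitrary:

// Sketch: maintain a running-average background and compare each new frame
// against it instead of against a single static startup frame.
struct RunningBackground {
    private var background: [Double]
    private let alpha: Double   // how quickly the background adapts (0...1)

    init(firstFrame: [UInt8], alpha: Double = 0.1) {
        self.background = firstFrame.map(Double.init)
        self.alpha = alpha
    }

    // Returns a foreground mask (true = changed pixel) and updates the model.
    mutating func foregroundMask(for frame: [UInt8], threshold: Double = 30) -> [Bool] {
        var mask = [Bool](repeating: false, count: frame.count)
        for i in 0..<frame.count {
            let value = Double(frame[i])
            mask[i] = abs(value - background[i]) > threshold
            // Exponential moving average, roughly "an average of the last few frames".
            background[i] = alpha * value + (1 - alpha) * background[i]
        }
        return mask
    }
}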
Also, try with CV_THRESH_OTSU instead of 30 for your threshold value, see if you like that better.
Also, I noticed that the output often flares (regions which haven't changed switch from black to white). Checking with the live stream, I'm quite certain it's because of the webcam autofocusing / adjusting white balance, etc. If you're getting that too, turning off the autofocus should help (which, by the way, isn't done through OpenCV but depends on the camera; possibly check this: How to programmatically disable the auto-focus of a webcam?).
It seems to me that CoreAudio adds sound waves together when mixing into a single channel. My program will generate synthesised sounds. I know the amplitude of each of the sounds. When I play them together, should I add them together and scale the resulting wave to keep it within range? I can do it like this:
MaxAmplitude = max1 + max2 + max3          // maximum amplitude of each sound wave
if MaxAmplitude > 1 then                   // over range
    Output = (wave1 + wave2 + wave3) / MaxAmplitude   // scale back into range
else
    Output = (wave1 + wave2 + wave3)       // normal addition
end if
Can I do it this way? Should I pre-analyse the sound waves to find the actual maximum amplitude (because the peaks may not coincide on the timeline) and use that?
What I want is a way to play several synthesised sounds together seamlessly, without drastically reducing the overall volume. If I play a chord with several synthesised instruments, I don't want single notes to end up practically silent.
Thank you.
Changing the scale suddenly on a single sample basis, which is what your "if" statement does, can sound very bad, similar to clipping.
You can look into adaptive AGC (automatic gain control) which will change the scale factor more slowly, but could still clip or get sudden volume changes during fast transients.
If you use lookahead with the AGC algorithm to prevent sudden transients from clipping, then your latency will get worse.
If you do use AGC, then isolated notes may sound like they were played much more loudly than when played in a chord, which may not correctly represent a musical composition's intent (although this type of compression is common in annoying TV and radio commercials).
Scaling down the mixer output volume so that the notes never clip, and never have their volume reduced other than when the composition calls for it, will result in a mix with greatly reduced volume when there are many channels (which is why properly reproduced classical music on the radio is often too quiet to draw enough listeners to make enough money).
It's all a trade-off.
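As a rough illustration of the "slowly changing scale factor" idea, here is a toy Swift sketch (not a production limiter; the attack and release coefficients are arbitrary, and it assumes all sources have the same length):

// Sketch: mix several sources and apply a gain that drops quickly when the
// mix gets too loud and recovers slowly toward unity. As noted above, fast
// transients can still clip briefly because the gain lags the signal.
func mixWithSmoothedGain(_ sources: [[Float]]) -> [Float] {
    guard let length = sources.first?.count else { return [] }
    var output = [Float](repeating: 0, count: length)
    var gain: Float = 1.0
    let attack: Float = 0.5    // fast gain reduction when over range
    let release: Float = 0.001 // slow recovery toward unity gain

    for i in 0..<length {
        let sum = sources.reduce(Float(0)) { $0 + $1[i] }   // plain addition
        let peak = abs(sum)
        let target: Float = peak > 1 ? 1 / peak : 1         // gain needed to stay within [-1, 1]
        let rate = target < gain ? attack : release
        gain += (target - gain) * rate
        output[i] = sum * gain
    }
    return output
}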
I don't see this as a problem. If you know the max amplitude of all your waves (over all time), it should work. Just be sure not to change the gain on a per-sample basis; decide it at every "note-on" instead. It is a very simple approach but could suit your needs.
When I capture camera images of projected patterns using OpenCV via 'cvQueryFrame', I often end up with an unintended artifact: the projector's scan line. That is, since I'm unable to precisely time when 'cvQueryFrame' captures an image, the captured image does not respect the constant 30 Hz refresh of the projector. The result is the typical horizontal band familiar to anyone who has pointed a video camera at a TV screen.
Short of resorting to hardware sync, has anyone had some success with approximate (e.g., 'good enough') informal projector-camera sync in OpenCV?
Below are two solutions I'm considering, but I was hoping this is a common enough problem that an elegant solution might already exist. My less-than-elegant thoughts are:
Add a slider control in the cvWindow displaying the video for the user to control a timing offset from 0 to 1/30th second, then set up a queue timer at this interval. Whenever a frame is needed, rather than calling 'cvQueryFrame' directly, I would request a callback to execute 'cvQueryFrame' at the next firing of the timer. In this way, theoretically the user would be able to use the slider to reduce the scan line artifact, provided that the timer resolution is sufficient.
After receiving a frame via 'cvQueryFrame', examine the frame for the tell-tale horizontal band by looking for a delta in HSV values for a vertical column of pixels. Naturally this would only work when the subject being photographed contains a fiducial strip of uniform color under smoothly varying lighting.
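As a rough illustration of the second idea, here is a plain-Swift sketch (not OpenCV) that scans a single column of brightness values, e.g. the V channel of HSV along the fiducial strip, for the largest jump; the threshold is arbitrary:

// Sketch: return the row index of the largest brightness jump in a vertical
// column, which is where the scan-line band should sit, or nil if no jump
// exceeds the threshold.
func detectBandRow(columnBrightness: [Float], threshold: Float = 0.2) -> Int? {
    guard columnBrightness.count > 1 else { return nil }
    var bestRow: Int? = nil
    var bestDelta = threshold
    for row in 1..<columnBrightness.count {
        let delta = abs(columnBrightness[row] - columnBrightness[row - 1])
        if delta > bestDelta {
            bestDelta = delta
            bestRow = row
        }
    }
    return bestRow
}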
I've used several cameras with OpenCV, most recently a Canon SLR (7D).
I don't think your proposed solution will work. cvQueryFrame basically copies the next available frame from the camera driver's buffer (or advances a pointer in a memory-mapped region, or whatever your particular driver implementation does).
In any case, the timing of the cvQueryFrame call has no effect on when the image was captured.
So, as you suggested, hardware sync is really the only route, unless you have a special camera, like a Point Grey camera, that gives you explicit software control of the frame-integration start trigger.
I know this has nothing to do with synchronizing, but have you tried extending the exposure time? Or achieving the same effect by intentionally "blending" two or more images into one?