Programming camera for ROI selection - image-processing

I want to transfer an ROI instead of the full image from the camera. I am doing this to increase the (useful) data transfer rate from camera to PC (less data, less time), since I will be doing some image processing on the ROI. Basically, the user will define the ROI's coordinates, the camera will capture only that ROI, and only this ROI will be sent to the PC over USB or Gigabit Ethernet.
Is it possible to do this programmatically, since my application needs an ROI that changes dynamically? Do we have some APIs that let us define an ROI and program the camera accordingly?
I will be using C/C++ with OpenCV for the entire application.

Do we have some APIs that let us define ROI and program the camera accordingly?
This all depends on your camera and its driver. Some cameras produced by Point Grey have this feature, and they come with an SDK that provides an API for setting an ROI.
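For illustration, here is a rough C++ sketch of how an ROI ("Format7" partial-image mode) can be set with Point Grey's FlyCapture2 SDK. The mode, pixel format, and ROI values below are assumptions for the example, and offsets/sizes usually have to be multiples of the step sizes the camera reports, so verify everything against your SDK version and camera model.

#include "FlyCapture2.h"

int main()
{
    FlyCapture2::BusManager busMgr;
    FlyCapture2::PGRGuid guid;
    busMgr.GetCameraFromIndex(0, &guid);      // first camera on the bus

    FlyCapture2::Camera cam;
    cam.Connect(&guid);

    // Describe the ROI: offset and size in pixels, plus mode and pixel format.
    FlyCapture2::Format7ImageSettings fmt7;
    fmt7.mode        = FlyCapture2::MODE_0;
    fmt7.pixelFormat = FlyCapture2::PIXEL_FORMAT_MONO8;
    fmt7.offsetX     = 128;                   // example ROI: 640x480 at (128, 96)
    fmt7.offsetY     = 96;
    fmt7.width       = 640;
    fmt7.height      = 480;

    // Ask the camera whether this ROI is valid before applying it.
    bool valid = false;
    FlyCapture2::Format7PacketInfo packetInfo;
    cam.ValidateFormat7Settings(&fmt7, &valid, &packetInfo);
    if (valid)
        cam.SetFormat7Configuration(&fmt7, packetInfo.recommendedBytesPerPacket);

    cam.StartCapture();
    FlyCapture2::Image raw;
    cam.RetrieveBuffer(&raw);                 // only the ROI pixels cross the wire
    cam.StopCapture();
    cam.Disconnect();
    return 0;
}

To change the ROI dynamically you would stop capture, apply new Format7 settings, and restart capture. GigE Vision and USB3 Vision cameras from other vendors expose the same idea through GenICam features such as OffsetX, OffsetY, Width, and Height.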

Related

Using OpenVR TrackedCamera for OpenCV stereo rectification and block matching

I am trying to create a disparity map of the images created by the stereo camera mounted on the Valve Index VR headset. I am using OpenVR and OpenCV. OpenVR allows access to the cameras using the IVRTrackedCamera interface. In order to perform OpenCV StereoBM the left and right images need to be rectified.
My first question:
OpenVR allows for acquiring frames using GetVideoStreamFrameBuffer(). This method allows for passing an EVRTrackedCameraFrameType, either Distorted, Undistorted or MaximumUndistorted. What do the different FrameTypes mean? Can I assume the frames are already rectified onto a common plane when using the Undistorted or MaxUndistorted frametypes?
Second question:
If the frames are not yet rectified onto a common plane, how do I do so? With OpenVR I can get camera intrinsics for each individual camera using GetCameraIntrinsics(), again supplying an EVRTrackedCameraFrameType. I can also acquire the distortion parameters for each individual camera using GetArrayTrackedDeviceProperty(Prop_CameraDistortionCoefficients_Float_Array). Now, the two parameters I am missing for OpenCV's stereoRectify() are:
R – Rotation matrix between the coordinate systems of the first and the second cameras.
T – Translation vector between coordinate systems of the cameras.
Is it possible to acquire these parameters from OpenVR?
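For reference, here is a minimal C++ sketch of where those two missing extrinsics would go, assuming K1/D1 and K2/D2 have been assembled from GetCameraIntrinsics() and the distortion-coefficient property; R and T are exactly the left-to-right rotation and translation the question asks about.

#include <opencv2/opencv.hpp>

// K1/D1, K2/D2: intrinsics and distortion of the left and right cameras.
// R, T: rotation and translation from the first to the second camera (the missing pieces).
void rectifyPair(const cv::Mat& K1, const cv::Mat& D1,
                 const cv::Mat& K2, const cv::Mat& D2,
                 const cv::Mat& R,  const cv::Mat& T,
                 cv::Size imageSize)
{
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(K1, D1, K2, D2, imageSize, R, T,
                      R1, R2, P1, P2, Q, cv::CALIB_ZERO_DISPARITY);

    // Rectification maps for each camera; remap every frame before running StereoBM.
    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(K1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(K2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);
}

If OpenVR will not hand you R and T, the usual fallback is to calibrate the stereo pair yourself (e.g., with cv::stereoCalibrate and a chessboard), which returns both.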

How to convert webcam image to RGB Depth

I'm building an iPhone-like FaceID program using my PC's webcam. I'm following this notebook, which uses a Kinect to create RGB-D images. Can I use my webcam to capture several images for the same purpose?
Here's how to predict the person in the Kinect image. It uses a .dat file.
file1 = ('faceid_train/(2012-05-16)(154211)/011_1_d.dat')
inp1 = create_input_rgbd(file1)
inp2 = create_input_rgbd(file1)  # in practice the second input would come from a different capture
model_final.predict([inp1, inp2])
They use a Kinect to create RGB-D images, whereas you want to use only an RGB camera for something similar? The hardware is different, so there is no direct method.
You first have to estimate a depth map from the monocular image alone.
You can try Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries, as shown below. The depth it produces is fairly close to the real ground truth. For non-life-threatening cases (e.g., controlling a UAV or a car), you can use it any time.
The code and model are available at
https://github.com/JunjH/Revisiting_Single_Depth_Estimation
Edit the demo.py file to do single-image detection.
image = your_rgb_image                    # preprocessed single RGB frame (placeholder name)
deep_learned_fake_depth = model(image)    # depth map predicted by the pretrained model
# Add your additional classification routine behind.
Take note that this method cannot run in real time, so you can only apply it at keyframes. People usually use feature tracking to fake continuous detection in between, which is the common practice.
Also note that some phone devices do have a small depth-estimation sensor you can make use of. I'm not sure of the details, as I deal with Android and iOS at a very minimal level.

Kinect Point Cloud To Color Space Transform

Right now, I am trying to determine a transform matrix in the Kinect V2.0 framework. My end goal is the translation and rotation matrix necessary to convert my point cloud, which is relative to the depth sensor, into a point cloud relative to the color camera.
I know that I can manually calibrate (via OpenCV, etc.), to determine the transform, but I would like the actual camera matrix. I use the call MapDepthFrameToCameraSpaceUsingIntPtr, so I know that there is an internal understanding (aka Matrix transform) between depth space, and color space.
Does anyone know how to extract, or whether there exists, a matrix inside the Kinect v2 API that is used internally for the MapDepthFrameToCameraSpaceUsingIntPtr call? Or is there a way to translate a point cloud image frame into color camera space?
They presumably know the rotation and translation matrices and the color camera parameters internally. Unfortunately, as long as you use the Microsoft SDK, those data are not exposed (only the depth camera parameters are public). Either you calibrate the cameras yourself or you use the look-up table provided.
What you are trying to do (that transformation) is called registration. I have explained very clearly here how to do the registration.
I know that I can manually calibrate (via OpenCV, etc.) to determine the transform, but I would like the actual camera matrix.
Calibrating your camera is the only way to get the most accurate camera matrix for your Kinect sensor, since every Kinect sensor's camera matrix differs slightly from the others. That small difference becomes significant once you build your point cloud.
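As a rough sketch of that per-device calibration (standard OpenCV chessboard calibration; the board dimensions, square size, and file names below are assumptions):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const cv::Size boardSize(9, 6);        // inner corners of the chessboard (assumed)
    const float squareSize = 0.025f;       // square edge length in metres (assumed)

    // Reference 3D corner positions on the planar board (Z = 0).
    std::vector<cv::Point3f> boardCorners;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardCorners.emplace_back(x * squareSize, y * squareSize, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    for (int i = 0; i < 20; ++i) {         // ~20 views of the board from the color camera
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));
            imagePoints.push_back(corners);
            objectPoints.push_back(boardCorners);
        }
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\n" << cameraMatrix << std::endl;
    return 0;
}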
Does anyone know how to extract, or whether there exists, a matrix inside the Kinect v2 API that is used internally for the MapDepthFrameToCameraSpaceUsingIntPtr call?
You can extract part of the matrix, but not all of it. Something very important: MapDepthFrameToCameraSpaceUsingIntPtr is not processed on your CPU. It is calculated inside a chip in the Kinect hardware itself, and the values of the matrix are embedded in that chip. The reason is the sheer number of calculations this API call requires: the Kinect runs at 30 frames per second, each color frame has 1920 x 1080 pixels, and each depth frame has 512 x 424 pixels, so the mapping needs at least
30 x 512 x 424 = 6,512,640 calculations per second.
You can't build the point cloud in real-world coordinate space without knowing the camera matrix. If you build the point cloud directly from depth image coordinates, then that point cloud is in depth space.
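To make that concrete, here is a minimal C++ sketch of back-projecting a Kinect v2 depth frame into 3D camera space with a pinhole model. fx, fy, cx, cy stand in for the depth camera intrinsics you obtain from calibration (or from the SDK's depth camera intrinsics); the function and variable names are just for illustration.

#include <cstdint>
#include <vector>

struct Point3f { float x, y, z; };

// depthMm: row-major 512x424 depth frame in millimetres, as delivered by the Kinect v2.
std::vector<Point3f> depthToPointCloud(const std::vector<uint16_t>& depthMm,
                                       int width, int height,
                                       float fx, float fy, float cx, float cy)
{
    std::vector<Point3f> cloud;
    cloud.reserve(static_cast<size_t>(width) * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            float z = depthMm[v * width + u] * 0.001f;   // millimetres -> metres
            if (z <= 0.0f) continue;                      // skip invalid pixels
            cloud.push_back({ (u - cx) * z / fx,          // X = (u - cx) * Z / fx
                              (v - cy) * z / fy,          // Y = (v - cy) * Z / fy
                              z });
        }
    }
    return cloud;
}

Registering this cloud to the color camera then requires the extrinsics (R, t) between the two sensors, which is exactly what calibration provides.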
I have developed a prototype for 3D interaction with real time point cloud visualization.
You can check out my repository VRInteraction.
Demo video
Calibrated color and depth camera matrix
As you can see on the right-hand side of the video, it is a real-time 3D point cloud. I achieve this using CUDA (GPU acceleration), registering the depth frame to the color frame and building an RGBXYZ point cloud.

How do I generate stereo images from a mono camera?

I have a stationary mono camera which captures a single image frame at some fps.
Assuming the camera is not allowed to move, how do I generate a stereo image pair from the obtained single image frame? Do any algorithms exist for this? If so, are they available in OpenCV?
To get a stereo image, you need a stereo camera, i.e. a camera with two calibrated lenses, so you cannot get a stereo image from a single camera with traditional techniques.
However, with the magic of deep learning, you can obtain a depth image from a single camera.
And no, there's no built-in OpenCV function to do that.
The most common use of this kind of technique is in 3D TVs, which often offer 2D-to-3D conversion, and thus mono-to-stereo conversion.
Various algorithms are used for this; you can look at this state-of-the-art report.
There is also an optical way to do this.
If you can add binocular prisms/mirrors in front of your camera objective, you can obtain a real stereoscopic image from a single camera. That of course needs access to the camera and setting up the optics, and it introduces some problems such as wrong auto-focusing, the need for image calibration, etc.
You can also merge red/cyan filtered images together to maintain the camera's full resolution, as sketched below.
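A minimal OpenCV sketch of that red/cyan (anaglyph) merge, assuming you already have a left and a right view saved to disk (the file names are placeholders):

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat left  = cv::imread("left.png");    // BGR
    cv::Mat right = cv::imread("right.png");   // BGR, same size as left
    if (left.empty() || right.empty() || left.size() != right.size())
        return 1;

    std::vector<cv::Mat> l, r;
    cv::split(left, l);                        // l[0]=B, l[1]=G, l[2]=R
    cv::split(right, r);

    // Red channel from the left view, green/blue (cyan) from the right view.
    std::vector<cv::Mat> channels = { r[0], r[1], l[2] };
    cv::Mat anaglyph;
    cv::merge(channels, anaglyph);
    cv::imwrite("anaglyph.png", anaglyph);
    return 0;
}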
Here is a publication which might be helpful: Stereo Panorama with a single Camera.
You might also want to have a look at the OpenCV camera calibration module and at this page.

Send images to Kinect as input instead of camera feed

I want to send my own images to a Kinect SDK, either OpenNI or the Microsoft Kinect SDK, so that the SDK tells me the position of a user's hand, head, and so on.
I don't want to use the Kinect's camera feed. The images come from a paper, I want to do some image processing on them, and I need to work on exactly those images, so I cannot use my own body as input to the Kinect camera.
I don't mind whether it is Microsoft's Kinect SDK or OpenNI; it just needs to be able to take my RGB and depth images as input instead of the Kinect camera's.
Is it possible? If yes how can I do it?
I have the same question. I want a Kinect face-detection app to read images from the hard drive and return the Animation Units of the recognized face, so I can train a classifier for facial emotion recognition using the Animation Units as input features.
Thanks,
Daniel.