I'm doing some work with a camera and video stabilization with OpenCV.
Let's suppose I know exactly (in meters) how much my camera has moved from one frame to the next, and I want to use this to move the second frame back to where it should be.
I'm sure I have to do some math with this number before I build the translation matrix, but I'm a little lost with that... Any help?
Thanks.
EDIT: OK, I'll try to explain it better:
I want to remove the camera movement (shaking) from a video, and I know how much the camera has moved (and in which direction) from one frame to the next.
So what I want to do is move the second frame back to where it should be, using that information.
I have to build a translation matrix for each pair of frames and apply it to the second frame.
But here is where I have doubts: the information I have is in meters and describes the movement of the camera, while I'm now working with an image and pixels. I think I have to do some operations so that the translation is correct, but I'm not sure exactly what they are.
Knowing how much the camera has moved is not enough for creating a synthesized frame. For that you'll need the 3D model of the world as well, which I assume you don't have.
To demonstrate this, assume the camera movement is a pure translation and you are looking at two objects: one is very far away (a few kilometers) and the other is very close (a few centimeters). The very far object will hardly move in the new frame, while the very close one can move dramatically or even disappear from the field of view of the second frame. You need to know how much the viewing angle has changed for each point, and for that you need the 3D model.
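To put rough numbers on the near/far argument, here is a minimal sketch under a simple pinhole camera model, assuming the camera translates parallel to the image plane. The focal length in pixels and the 1 cm camera shift are made-up values for illustration, not something taken from the question.

```python
def pixel_shift(focal_px, camera_shift_m, depth_m):
    """Image displacement in pixels of a point at depth_m when the camera
    translates camera_shift_m parallel to the image plane (pinhole model)."""
    return focal_px * camera_shift_m / depth_m


FOCAL_PX = 1000.0  # assumed focal length expressed in pixels (hypothetical)
SHIFT_M = 0.01     # assumed camera movement: 1 cm sideways

far = pixel_shift(FOCAL_PX, SHIFT_M, 2000.0)  # object about 2 km away
near = pixel_shift(FOCAL_PX, SHIFT_M, 0.05)   # object about 5 cm away

print(f"far object moves {far:.3f} px, near object moves {near:.1f} px")
```

The same camera movement gives a different pixel shift for every depth, which is why a single translation matrix for the whole frame cannot be correct unless the scene is roughly at one known distance.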
Having sensor information may help in the case of rotation but it is not as useful for translations.
Related
We have this camera array arranged in an arc around a person (red dot). Think The Matrix - each camera fires at the same time and then we create an animated gif from the output. The problem is that it is near impossible to align the cameras exactly and so I am looking for a way in OpenCV to align the images better and make it smoother.
I'm looking for general steps, and I'm unsure of the order in which to do them. If I start with image 1 and match image 2 to it, then image 2 ends up further from image 3 than it was at the start. Matching 3 to 2 would then require a bigger change, and the error would propagate. I have seen similar alignments done, though. Any help much appreciated.
Here's a thought. How about performing a quick and very simple "calibration" of the imaging system by using a single reference point?
The best thing about this is you can try it out pretty quickly and even if results are too bad for you, they can give you some more insight into the problem. But the bad thing is it may just not be good enough because it's hard to think of anything "less advanced" than this. Here's the description:
Remove the object from the scene
Place a small object (let's call it a "dot") at a position that roughly corresponds to the center of mass of the object you are about to record (the center of the area denoted by the red circle).
Record a single image with each camera
Use some simple algorithm to find the position of the dot on every image
Compute distances from dot positions to image centers on every image
Shift images by (-x, -y), where (x, y) is the above mentioned distance; after that, the dot should be located in the center of every image.
When recording an actual object, use these precomputed distances to shift all images. After you translate the images, they will be roughly aligned. But since you are shooting an object that is three-dimensional and has considerable size, I am not sure whether the alignment will be very convincing ... I wonder what results you'd get, actually.
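The calibration steps above could be sketched roughly like this. It uses plain Python lists as stand-in images so it runs without any library; the function names are made up for this example, and with real frames you would more likely do the shift with something like OpenCV's `cv2.warpAffine`.

```python
def calibration_offsets(dot_positions, image_size):
    """Distance (x, y) from the image center to the dot, one entry per camera."""
    width, height = image_size
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # center in pixel indices
    return [(x - cx, y - cy) for (x, y) in dot_positions]


def shift_image(image, dx, dy, fill=0):
    """Shift a 2D list-of-rows image by integer (dx, dy).
    Positive dx moves content right, positive dy moves it down."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # where this output pixel comes from
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


# Toy calibration image: 5x5, the dot was detected at x=3, y=1.
image = [[0] * 5 for _ in range(5)]
image[1][3] = 1

offsets = calibration_offsets([(3, 1)], (5, 5))
dx, dy = offsets[0]

# Shift by (-x, -y) so the dot lands in the image center.
aligned = shift_image(image, -round(dx), -round(dy))
```

The offsets are computed once from the calibration shots and then reused for every real recording from the same camera.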
If I understand the application correctly, you should be able to obtain the relative pose of each camera in your array using homographies:
https://docs.opencv.org/3.4.0/d9/dab/tutorial_homography.html
From here, the next step would be to correct for alignment issues by estimating the transform between each camera's actual position and their 'ideal' position in the array. These ideal positions could be computed relative to a single camera, or relative to the focus point of the array (which may help simplify calculation). For each image, applying this corrective transform will result in an image that 'looks like' it was taken from the 'ideal' position.
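As a rough illustration of what applying such a corrective transform means at the pixel level, here is a small library-free sketch. The matrices are invented values, not estimates from real cameras; in OpenCV you would typically estimate the homography with `cv2.findHomography` and warp the whole image with `cv2.warpPerspective`.

```python
def apply_homography(H, point):
    """Map a pixel (x, y) through a 3x3 homography H (list of rows)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)  # divide by w to return to pixel coordinates


def compose(A, B):
    """Return the product A*B, i.e. apply B first and then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Hypothetical correction for one camera: a slight scale, then a shift.
scale = [[1.1, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, -4.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
actual_to_ideal = compose(shift, scale)

corrected = apply_homography(actual_to_ideal, (10.0, 20.0))
print(corrected)
```

Computing each camera's correction relative to one shared reference (a single camera or the focus point), rather than chaining camera 1 to 2 to 3, is one way to keep the error from building up along the arc.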
Note that you may need to estimate relative camera pose in 3-4 array 'sections', as it looks like you have a full 180deg array (e.g. estimate homographies for 4-5 cameras at a time). As long as you have some overlap between sections it should work out.
Most of my experience with this sort of thing comes from using MATLAB's stereo camera calibrator app and related functions. Their help page gives a good overview of how to get started estimating camera pose. OpenCV has similar functionality.
https://www.mathworks.com/help/vision/ug/stereo-camera-calibrator-app.html
The cited paper by Zhang gives a great description of the mathematics of pose estimation from correspondence, if you're interested.
I only know a little about OpenGL ES 2.0, such as drawing triangles to make a rectangle or a cube, and a little about vertex and fragment shaders, but not much. I have no idea how to handle this.
I shot a 360-degree video. How am I supposed to play it on iOS so that you can move your phone, or drag from one direction to another, to watch the video from a different view?
The effect should be like Kolor Eyes.
I think the steps are:
Get each frame from the video (the original looks like a sphere).
Process the frames one by one so that each can be viewed as a panorama.
I hope somebody can help me out. Thanks a lot.
The problem is not tied to iOS or any other specific platform; it is first of all an algorithmic one: how do you convert the pixels from the source view to a panoramic view? My best guess is something like a transfer function that takes pixel a at position A in the source image and transfers it to a corresponding pixel b at position B in the destination image.
Maybe you should check the basics of texture mapping, which is a common technique for mapping an image onto an arbitrary surface.
Just as an idea: the source is a radial view ranging from 0° to 360°, so what you need is to transfer this into a view where the angle increases horizontally from 0° to 360°. Each src pixel would need an angle and a distance. Given these two properties you could write a function which puts this into a different view.
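A minimal sketch of that angle-and-distance idea, written as the inverse mapping (for each pixel of the panoramic output, find where to sample in the radial source). The image center, the inner and outer radii, and the output size are assumptions you would have to measure for your own footage.

```python
import math


def panorama_to_source(u, v, pano_width, pano_height, center, r_inner, r_outer):
    """For panorama pixel (u, v), return the (x, y) position to sample
    in the radial source image."""
    # Column -> angle: the full panorama width covers 0 to 360 degrees.
    angle = 2.0 * math.pi * u / pano_width
    # Row -> distance from the center, between the inner and outer radius.
    radius = r_inner + (r_outer - r_inner) * v / (pano_height - 1)
    cx, cy = center
    return (cx + radius * math.cos(angle), cy + radius * math.sin(angle))


# Hypothetical setup: 1000x1000 source, ring between radius 100 and 400.
CENTER = (500.0, 500.0)
first = panorama_to_source(0, 0, 1440, 301, CENTER, 100.0, 400.0)
quarter = panorama_to_source(360, 300, 1440, 301, CENTER, 100.0, 400.0)
print(first, quarter)
```

The returned positions are generally not whole pixels, so you would interpolate between neighbouring source pixels. On a GPU the same function can live in a fragment shader that computes texture coordinates, which is where the texture mapping basics come in.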
I am wondering if there are any good frameworks available that would allow me to:
Identify a "ball" object in a video. There will ALWAYS be a ball object, usually an identifiable color, but not always the same darkness, etc
Track the movement of that ball object over time. For example, I need to know how far it moves (x, y coordinates) in a 5 second period.
Take into consideration camera movement. If the user backs up, twitches, etc, I still need my x,y calculations to be accurate based on the new scale factor of the video frame.
Can anyone point me to a library that would get me started down this path?
Thanks
You should look at the OpenCV library. You might be asking too much, but that's probably your best bet.
I want to track the head of a player in order to move the camera inside XNA.
When the player rotates left or right, the camera inside XNA will respond to this action and will also rotate.
I tried using the head joint from Skeleton Data and taking the vector value X,Y but this is not an accurate solution. I need another solution that can rotate the camera inside XNA.
Any suggestions?
You could use the Face Tracking API and measure how far a certain point on the user's face (like the nose) has moved, to decide whether or not the user is looking in a different direction. The points on a user's face are assembled like this:
Then you can see if the X changed and by what amount to see the rotation effects.
(You might want to see Facial Recognition with Kinect)
I'm trying to simulate a shaky cam in a static video. I could choose a couple of points randomly and then pan/zoom/warp using easing, but I was wondering if there's a better, more standard way.
A shaky camera will usually not include zooming. The image rotation component would also be very small, and can probably be ignored. You can probably get sufficient results with 2D translation only.
What you should probably do is define your shake path in time - the amount of image motion from the original static video for each frame - and then shift each frame by this amount.
You might want to crop your video a bit to hide any blank parts near the image border; remaining blank regions may be filled using in-painting. The shake path should be relatively smooth and not completely random jitter, since you are simulating physical hand motion.
To make the effect more convincing, you should also add motion-blur.
The direction of this blur is the same as the shake-path, and the amount is based on the current shake speed.
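One possible way to get such a path is to generate random jitter and smooth it with a moving average, then derive the blur from the frame-to-frame motion. This is only a sketch: the function names, the 8 px maximum offset and the window size are arbitrary choices for illustration.

```python
import random


def smooth_shake_path(num_frames, max_offset_px=8.0, window=9, seed=0):
    """Per-frame (x, y) offsets in pixels: random jitter smoothed by a
    moving average so that it looks like hand motion."""
    rng = random.Random(seed)  # fixed seed makes the path reproducible
    raw = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
           for _ in range(num_frames)]
    half = window // 2
    path = []
    for i in range(num_frames):
        lo, hi = max(0, i - half), min(num_frames, i + half + 1)
        n = hi - lo
        avg_x = sum(p[0] for p in raw[lo:hi]) / n
        avg_y = sum(p[1] for p in raw[lo:hi]) / n
        path.append((avg_x * max_offset_px, avg_y * max_offset_px))
    return path


def blur_vectors(path):
    """Frame-to-frame motion (dx, dy): its direction is the blur direction
    and its length is the blur amount. The first frame gets no blur."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]
    return [(0.0, 0.0)] + steps


path = smooth_shake_path(120)  # e.g. 120 frames of video
blur = blur_vectors(path)
```

Each frame is then shifted by its entry in `path` and blurred along its entry in `blur`. A larger `window` gives a slower, lazier shake, while a smaller one moves closer to plain jitter.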