Implementing a software gimbal - spatial

I have a game controller (e.g. Wii) which gives me its position and orientation in software.
The given spatial data has a lot of noise, both because of the hardware and because the user is a human, and human hands aren't stable.
I want to implement something similar to a gimbal, to stabilize the input and remove noise.
Ideally, if the hand is held in place, the spatial data should stay the same, even if in reality the hand moves a little bit.
And when the hand is moving, the data should be stable and smooth, and not all jiggly and noisy.
I tried simple things like a moving average, but it's really not as effective as an actual gimbal.
Searching for anything related to "gimbal" either results in game rotation math (gimbal lock), or electronics projects that use a physical gimbal.
Are there any resources related to stabilizing spatial data?
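
One family of techniques often recommended for exactly this kind of jittery interactive input is an adaptive low-pass filter such as the 1€ ("One Euro") filter: it smooths aggressively while the signal is slow (hand held in place) and relaxes the smoothing when the signal moves fast, so there is little lag. Below is a minimal Python sketch of that idea for one scalar axis sampled with a known time step dt; the parameter values are placeholders to tune.

import math

class OneEuroFilter:
    # Adaptive exponential smoothing: the cutoff frequency rises with the
    # (smoothed) speed of the signal, so a still hand gets heavy filtering
    # and a fast-moving hand gets low latency.
    def __init__(self, min_cutoff=1.0, beta=0.02, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # Hz, smoothing strength at rest (tune)
        self.beta = beta              # how fast smoothing relaxes with speed (tune)
        self.d_cutoff = d_cutoff      # cutoff used for the speed estimate itself
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def filter(self, x, dt):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Smoothed estimate of how fast the input is changing.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Cutoff grows with speed: still -> smooth hard, moving -> follow closely.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat

One filter instance per component (x, y, z and, say, each orientation component, renormalizing afterwards) gives roughly the behavior described: the output stays put when the hand only trembles, and follows smoothly when the hand really moves.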

Related

ARKit and Unity - How can I detect the act of hitting the AR object by a real world object from the camera?

Imagine someone in real life waved their hand and hit the 3D object in AR: how would I detect that? I basically want to know when something crosses over the AR object so I can know that something "hit" it and react.
Another example would be to place a virtual bottle on the table, then wave your hand in the air where the bottle is, and it gets knocked over.
Can this be done? If so, how? I would prefer Unity help, but if this can only be done natively via Xcode and ARKit, I would be open to that as well.
ARKit does solve a ton of issues with AR and makes them a breeze to work with. Your issue just isn't one of them.
As #Draco18s notes (and emphasizes well with the xkcd link 👍), you've perhaps unwittingly stepped into the domain of hairy computer vision problems. You have some building blocks to work with, though: ARKit provides pixel buffers for each video frame, and the projection matrix needed for you to work out what portion of the 2D image is overlaid by your virtual water bottle.
Deciding when to knock over the water bottle is then a problem of analyzing frame-to-frame differences over time in that region of the image. (And tracking that region's movement relative to the whole camera image, since the user probably isn't holding the device perfectly still.) The amount of analysis required varies depending on the sophistication of effect you want... a simple pixel diff might work (for some value of "work"), or there might be existing machine learning models that you could put together with Vision and Core ML...
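As a rough illustration of that frame-differencing idea (not ARKit-specific: it assumes you have already converted two consecutive pixel buffers to grayscale arrays and cropped them to the region the projection matrix says the bottle covers; the thresholds are arbitrary):

import numpy as np

def region_changed(prev_region, curr_region, pixel_thresh=25, change_frac=0.05):
    # prev_region / curr_region: uint8 grayscale crops of the same size,
    # taken from two consecutive camera frames.
    diff = np.abs(curr_region.astype(np.int16) - prev_region.astype(np.int16))
    moved = (diff > pixel_thresh).mean()   # fraction of pixels that changed a lot
    return moved > change_frac             # True -> treat it as a "hit"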
You should take a look at ManoMotion: https://www.manomotion.com/
They're working on this issue and are supposed to release a solution in the form of a library soon.

Calculate distance between camera and pixel in image

Can you please suggest ways of determining the distance between the camera and a pixel in an image (in real-world units, that is cm/m/..)?
The information I have is: camera horizontal (120 degrees) and vertical (90 degrees) field of view, camera angle (-5 degrees) and the height at which the camera is placed (30 cm).
I'm not sure if this is everything I need. Please tell me what information I should have about the camera and how I can calculate the distance between the camera and one pixel.
Maybe it isn't right to say 'distance between camera and pixel', but I guess it is clear what I mean. Please write in the comments if something isn't clear.
Thank you in advance!
What I think you mean is, "how can I calculate the depth at every pixel with a single camera?" Without adding some special hardware this is not feasible, as Rotem mentioned in the comments. There are exceptions, and though I expect you may be limited in time or budget, I'll list a few.
If you want to find depths so that your toy car can avoid collisions, then you needn't assume that depth measurement is required. Google "optical flow collision avoidance" and see if that meets your needs.
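For illustration, here is one crude way that idea could look with OpenCV's dense (Farneback) optical flow on two consecutive grayscale frames from the car's camera; the threshold is a guess to tune, not a tested value.

import cv2

def flow_warning(prev_gray, curr_gray, mag_thresh=8.0):
    # Dense optical flow between two consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    w = mag.shape[1]
    left, right = mag[:, :w // 2].mean(), mag[:, w // 2:].mean()
    # Rapidly growing flow on one side usually means something close on that side.
    if max(left, right) > mag_thresh:
        return "steer left" if right > left else "steer right"
    return "clear"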
If instead you want to measure depth as part of some Simultaneous Localization and Mapping (SLAM) scheme, then that's a different problem to solve. Though difficult to implement, and perhaps not remotely feasible for a toy car project, there are a few ways to measure distance using a single camera:
Project patterns of light, preferably with one or more laser lines or laser spots, and determine depth based on how the dots diverge or converge. The Kinect version 1 operates on this principle of "structured light," though the implementation is much too complicated to reproduce completely. For a simple collision warning you can apply the same principles, only more simply: for example, if the projected light pattern on the right side of the image changes quickly, turn left! Learning how to estimate distance using structured light is a significant project to undertake, but there are plenty of references.
Split the optical path so that one camera sensor can see two different views of the world. I'm not aware of optical splitters for tiny cameras, but they may exist. But even if you find a splitter, the difficult problem of implementing stereovision remains. Stereovision has inherent problems (see below).
Use a different sensor, such as the somewhat iffy but small Intel R200, which will generate depth data. (http://click.intel.com/intel-realsense-developer-kit-r200.html)
Use a time-of-flight camera. These are the types of sensors built into the Kinect version 2 and several gesture-recognition sensors. Several companies have produced or are actively developing tiny time-of-flight sensors. They will generate depth data AND provide full-color images.
Run the car only in controlled environments.
The environment in which your toy car operates is important. If you can limit your toy car's environment to a tightly controlled one, you can limit the need to write complicated algorithms. As is true with many imaging problems, a narrowly defined problem may be straightforward to solve, whereas the general problem may be nearly impossible to solve. If you want your car to run "anywhere" (which likely isn't true), assume the problem is NOT solvable.
Even if you have an off-the-shelf depth sensor that represents the best technology available, you would still run into limitations:
Each type of depth sensing has weaknesses. No depth sensors on the market do well with dark, shiny surfaces. (Some spot sensors do okay with dark, shiny surfaces, but area sensors don't.) Stereo sensors have problems with large, featureless regions, and also require a lot of processing power. And so on.
Once you have a depth image, you still need to run calculations, and short of having a lot of onboard processing power this will be difficult to pull off on a toy car.
If you have to make many compromises to use depth sensing, then you might consider just using a simpler ultrasound sensor to avoid collisions.
Good luck!

WebGL earth : how to make clouds

Problem
I would like to build a realistic view of the earth from low orbit (here ~300 km) with WebGL. That is to say, on the web, with all that it implies, and moreover on mobile. Do not stop reading here: to make this a little less difficult, the user can look everywhere but not pan, so the view only concerns a small 3000 km-wide area. But the view follows a satellite, so a few minutes later the user comes back to where they were before, with the slight shift of the earth's rotation, etc. So the clouds cannot be at the same place all the time.
I have actually already been able to include city lights, auroras, lightning... everything except clouds. I have seen a lot of demos by realtime-rendering enthusiasts and researchers, but none of them had a nice, realistic cloud layer. However, I am sure I am the 100(...)00th person thinking about doing this, so please enlighten me.
A few questions are implied:
What input to use for clouds? Live meteorological data?
What rendering possibilities? A transparent layer with a cloud map, modified with shaders? A few transparent layers to get a feeling of volumetric rendering? But how to cast shadows from one onto another: would the only solution then be using a mesh? Or could shadows be procedurally computed and mapped on the server every x minutes?
A few specifications
Here are some ideas summing up what I have not seen yet, sorted by importance:
clouds hide 60% of the earth.
clouds scatter city and lightning lights, and show Rayleigh scattering at night.
At this distance the parallax effect is visible and even quite awesome with the smallest clouds.
As far as I've seen, even expensive realtime meteorological online resources are not useful: they target rainy or stormy clouds with the help of UV and IR wavelengths, so they don't catch 100% of them and don't give the 'normal' view we all know. Moreover, the rare good cloud textures shot in visible light hardly differentiate ground from clouds: sometimes a 5000 km-long coastline stands in the middle of nowhere. A server may be able to use those images to create better textures.
When I look at those pictures I imagine that the least costly way would be to merge a few nice cloud meshes from a database containing different models, then slightly transform those meshes inside a shader while the user passes over. If the user is still there 90 minutes later when the view comes back, it does not matter if the models are not exactly the same again. However, a hurricane cannot just disappear.
What do you think about this ?
For such effects there is probably just one way to do it properly and that is:
Voxel maps + volume rendering, probably with back ray-tracing
As your position is fixed, it should not be so hard on memory requirements. You need to implement both Mie and Rayleigh scattering. Scattering can be simplified a lot and still look good; see:
simplified atmosphere scattering
realistic n-body solar system simulation
Voxel maps handle light gaps, shadows and scattering relatively easily, but need a lot of memory and computational power. All the other 2D techniques usually just painfully work around what 3D voxel maps do natively with little effort. For example:
Voxel map shadows
Procedural cloud map generators
You need this for each type of cloud so you have something to render. There are libs/demos/examples out there; see:
first relevant google hit
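
To make the voxel-map route a bit more concrete, this is the core accumulation that a back ray-tracer / ray marcher performs per screen pixel. In a real implementation this would live in a GLSL fragment shader sampling a 3D texture of cloud density; the Python sketch below only shows the logic, and the step size and extinction coefficient are arbitrary.

import numpy as np

def march_ray(density, origin, direction, step=0.5, sigma=0.8):
    # density: 3D array of cloud density in [0, 1], indexed as a voxel grid.
    # Returns the opacity accumulated along the ray (0 = clear sky, 1 = opaque cloud).
    pos = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    transmittance = 1.0
    for _ in range(256):                       # fixed step count for simplicity
        i, j, k = np.floor(pos).astype(int)
        if not (0 <= i < density.shape[0] and
                0 <= j < density.shape[1] and
                0 <= k < density.shape[2]):
            break                              # the ray left the volume
        absorb = density[i, j, k] * sigma * step
        transmittance *= np.exp(-absorb)       # Beer-Lambert attenuation
        if transmittance < 0.01:               # early exit once nearly opaque
            break
        pos += d * step
    return 1.0 - transmittance

Lighting (self-shadowing, Mie/Rayleigh scattering) is added by marching secondary rays from each sample toward the sun and accumulating in-scattered light, which the voxel representation makes relatively straightforward compared to 2D layer tricks.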

Structure from Motion (SfM) in a tunnel-like structure?

I have a very specific application in which I would like to try structure from motion to get a 3D representation. For now, all the software/code samples I have found for structure from motion are like this: "a fixed object that is photographed from all angles to create the 3D model". This is not my case.
In my case, the camera is moving in the middle of a corridor and looking forward. Sometimes the camera can look in another direction (left, right, up, down). The camera will never go back or look back; it always moves forward. Since the corridor is small, almost everything is visible (no hidden spots). The corridor can be very long sometimes.
I have tried this software and it doesn't work in my particular case (but it's fantastic with normal use). Can anybody suggest a library/software/tool/paper that could target my specific needs? Or did you ever need to implement something like that? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
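As a rough Python/OpenCV sketch of steps 3) and 4), assuming the image pair is already undistorted and K is the camera matrix from the calibration in step 1); it is only a starting point, with no outlier filtering beyond RANSAC:

import cv2
import numpy as np

def two_view_points(img1, img2, K):
    # 3) Match feature points between the two frames.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Recover the relative camera motion from the essential matrix.
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    # 4) Triangulate a 3D point cloud for the pair (scale is unknown).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)   # homogeneous, 4 x N
    return (pts4[:3] / pts4[3]).T                      # N x 3 points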
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views, as it operates directly on the intensity. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views; your views are just not arranged to provide a 360 degree view of the object.
I see you mentioned in your previous comment that the data is coming from a video? In that case, the problem could very well be the camera calibration. A camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion etc.). In the absence of knowledge about these, the bundler in VSfM uses information from the EXIF data of the image. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?
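For example, with Pillow (the file name is just an example); frames dumped from a video will almost always come back empty, which would support the missing-focal-length explanation:

from PIL import Image

# Inspect whatever EXIF tags the extracted frame carries, if any.
exif = Image.open("frame_0001.jpg").getexif()
print(dict(exif) if exif else "no EXIF data found")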

Algorithm for real-time tracking of several simple objects

I'm trying to write a program to track relative position of certain objects while I'm playing the popular game, League of Legends. Specifically, I want to track the x,y screen coordinates of any "minions" currently on the screen (The "minions" are the little guys in the center of the picture with little red and green bars over their heads).
I'm currently using the Java Robot class to send screen captures to my program while I'm playing, and am trying to figure out the best algorithm for locating the minions and tracking them so long as they stay on the screen.
My current thinking is to use a convolutional neural network to identify and locate the minions by the colored bars over their heads. However, I'd have to re-identify and locate the minions in every new frame, and this seems like it'd be computationally expensive if I want to do it in real time (~10-60 fps).
These sorts of computer vision algorithms aren't really my specialization, but it seems reasonable that algorithms exist which exploit the fact that objects in videos move in a continuous manner (i.e. they don't jump around from frame to frame).
So, is there an easily implementable algorithm for accomplishing this task?
Since this is a computer game, I think the color of the bars should be constant. The only way that might not be true is if dynamic illumination affects the health bar, which is highly unlikely.
Thus, just find all of the pixels with this specific color. Then do some morphological operations and segment the image into blobs. By selecting only the blobs that fit some criteria, you can find the location of the units.
I know that my answer does not involve video, but the operations should be so simple, that it should be very quick.
As for the tracking, for each point just find the closest one in the next frame.
Since the HUD location is constant, there should be no problem removing it.
Here is my quick and not-so-robust Matlab implementation, which has a few limitations:
Units must be quite healthy (At least 40 pixels wide)
The bars do not overlap.
function FindUnits()
    % Read the screenshot and work in double precision.
    img = double(imread('c:\1.jpg'));
    % Reference color of a (green) health bar, as a 1x1x3 array.
    green = cat(3, 149, 194, 151);
    % Per-pixel distance from the reference color.
    d = abs(img - repmat(green, [size(img,1) size(img,2)]));
    d = mean(d, 3);
    % Keep only pixels close enough to the bar color.
    mask = d < 30;
    % Remove isolated pixels.
    mask = imopen(mask, strel('square', 1));
    % Measure the connected components (candidate bars).
    rp = regionprops(mask, 'Centroid', 'MajorAxisLength', 'MinorAxisLength', 'Orientation');
    % Health bars are long and thin: keep only strongly elongated blobs.
    elongation = [rp.MajorAxisLength] ./ [rp.MinorAxisLength];
    rp(elongation < 20) = [];
    % Plot the detected bar centroids over the screenshot.
    xy = [rp.Centroid];
    x = xy(1:2:end);
    y = xy(2:2:end);
    figure; imshow('c:\1.jpg'); hold on; scatter(x, y, 'g');
end
You should use a model which includes a dynamic structure. For your object-tracking purpose, Hidden Markov Models (HMMs), or more generally Dynamic Bayesian Networks, are very well suited, and you can find a lot of resources on HMMs online. The issues you are going to face, however, depend on your system model. If your system dynamics can easily be represented as a linear Gauss-Markov model, then a simple Kalman filter will do fine. In the case of nonlinear, non-Gaussian dynamics you should use particle filtering, which is a sequential Monte Carlo method. Both the Kalman filter and the particle filter are sequential methods, so you use the results you have at the current step to obtain a result at the next time step.

I suggest you check some online tutorials and papers on multiple object tracking via particle filters. As far as I am concerned, the main difficulty you will have is the number of objects: you won't know in advance how many you need to track, an object you are tracking can simply disappear (you may kill those little guys, or they may just leave the screen), and some other guy can enter the screen. Hope this helps.
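For the linear Gauss-Markov case mentioned above, a minimal constant-velocity Kalman filter for a single minion's centroid could look like the sketch below (plain numpy; the state is [x, y, vx, vy], the measurement is the blob centre found in each frame, and the noise values are placeholders to tune).

import numpy as np

class CentroidKF:
    def __init__(self, x, y, dt=1/30, q=1.0, r=4.0):
        self.s = np.array([x, y, 0.0, 0.0])               # state: position and velocity
        self.P = np.eye(4) * 100.0                        # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)   # constant-velocity model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)    # we only observe position
        self.Q = np.eye(4) * q                            # process noise (tune)
        self.R = np.eye(2) * r                            # measurement noise, pixels^2 (tune)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                                 # predicted (x, y)

    def update(self, zx, zy):
        y = np.array([zx, zy]) - self.H @ self.s          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

Each frame you would call predict() for every track, associate each detected blob with the nearest predicted position, then call update() with that detection; tracks that stay unmatched for several frames are dropped (the minion died or left the screen) and unmatched detections start new tracks, which is the appearing/disappearing-object issue raised above.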
