Is it possible to store hydrogen nano-bubbles inside the van der Waals spaces of 2D layered materials using simulation - storage

I want to know if it is possible to simulate the intercalation of hydrogen nano-bubbles into 2D layered materials (as a potential way to store high density hydrogen energy) using first principles calculations. Which software would be the most suitable for this purpose? What are the important things to consider when performing such a study such as lattice mismatch, expansion/local distortion of the "host" material, concentration gradient, strain induced in the host material etc? And which 2D material would be the most suitable for this purpose (MXenes, graphene etc)? Also what is the average number of atoms contained in a hydrogen nano-bubble?

Related

OpenCV - Feature Matching vs Optical Flow

I am interested in making a motion tracking app using OpenCV, and there has been a wealth of information available online. However, I am a tad confused between feature matching and tracking features using a sparse optical flow algorithm such as Lucas-Kanade. With that in mind, I have the following questions:
What is the main difference between the two (feature matching and optical flow) if I have specified a region of pixels to track? I'm not interested in tracking in real time, if that helps clear up any assumptions.
In addition, since I'm not doing real time tracking, is it a better idea to use dense optical flow (Farneback) to keep track of the pixels in my specified region of interest?
Thank you.
I would like to add a few thoughts about that theme since I found this a very interesting question too.
As said before Feature Matching is a technique that is based on:
A feature detection step which returns a set of so called feature points. These feature points are located at positions with salient image structures, e.g. edge-like structures when you are using FAST or blob like structures if you are using SIFT or SURF.
The second step is the matching. The association of feature points extracted from two different images. The matching is based on local visual descriptors, e.g. histogram of gradients or binary patterns, that are locally extracted around the feature positions. The descriptor is a feature vector and associated feature point pairs are pairs a minimal feature vector distances.
Most feature matching methods are scale and rotation invariant and are robust for changes in illuminations (e.g caused by shadow or different contrast). Thus these methods can be applied to image sequences but are more often used to align image pairs captured from different views or with different devices.The disadvantage of Feature Matching methods is the difficulty of defining where the feature matches are spawn and that the feature pair (which in a image sequence are motion vectors) are in general very sparse. In addition the subpixel accuracy of matching approaches are very limited as most detector are fine-graded to integer positions.
From my experience the main advantage of feature matching approaches is that they can compute very large motions/ displacements.
OpenCV offers some feature matching methods but there are a lot of more recent, faster and more accurate approaches available online e.g.:
DeepMatching which relies on deep learning and are often used to initialize optical flow methods to help them deal with long-range motions.
Stereoscann which is a very fast approach at its origin proposed for visual odometry.
Optical flow methods in contrast rely on the minimization of the brightness constancy and additional constrain e.g. smoothness etc. Thus they derive motion vector based on spatial and temporal image gradients of a sequence of consecutive frames. Thus they are more suited image sequences rather than image pairs that are captured from very different view points. The main challenges in the estimation of motion with optical flow vectors are large motions, occlusion, strong illumination changes and changes of the appearance of the objects and mostly the low runtime. However optical flow methods can be highly accurate and compute dense motion fields which respect to shared motion boundaries of the objects in a scene.
However, the accuracy of different optical flow methods is very different. Local methods such as the PLK (Lucas Kanade) are in general less accurate but allow to compute pre selected motion vectors only and can thus be very fast. (In the recent years we have done some research to improve the accuracy of the local approach, see here for further information).
The main OpenCV trunk offers global approaches such as the Farnback. But this is a quite outdated approach. Try the OpenCV contrib trunk which more recent methods. But to get an good overview of the most recent methods take a look at the public optical flow benchmarks. Here you will find code and implementations as well e.g.:
MPI-Sintel optical flow benchmark
KITTI 2012 optical flow benchmark. Both offer links e.g. to git's or source code for some newer methods. Such as FlowFields.
But from my point of view I would not on an early stage reject a specific approach matching or optical flow. Try as much as possible available online implementations and see what is the best for your application.
Feature matching uses the feature descriptors to match features with one another (usually) using a nearest neighbor search in the feature descriptor space. The basic idea is you have descriptor vectors, and the same feature in two images should be near each other in the descriptor space, so you just match that way.
Optical flow algorithms do not look at a descriptor space, and instead, looks at pixel patches around features and tries to match those patches instead. If you're familiar with dense optical flow, sparse optical flow just does dense optical flow but on small patches of the image around feature points. Thus optical flow assumes brightness constancy, that is, that pixel brightness doesn't change between frames. Also, since you're looking around neighboring pixels, you need to make the assumption that neighboring points to your features move similarly to your feature. Finally, since it's using a dense flow algorithm on small patches, the points where they move cannot be very far in the image from the original feature location. If they are, then the pyramid-resolution approach is recommended, where you scale down the image before you do this so that what once was a 16 pixel translation is now a 2 pixel translation, and then you can scale up with the found transformation as your prior.
So feature matching algorithms are all-in-all far better when it comes to using templates where the scale is not exactly the same, or if there's a perspective difference in the image and template, or if the transformations are large. However, your matches are only as good as your feature detector is exact. On optical flow algorithms, as long as it's looking in the right spot, the transformations can be really, really precise. They're both computationally expensive a bit; optical flow algorithms being an iterative approach makes them expensive (and although you'd think the pyramid approach can eat up more costs by running on more images, it can actually make it faster in some cases to reach the desired accuracy), and nearest neighbor searches are also expensive. Optical flow algorithms OTOH can work really well when the transformations are small, but if anything in your scene messes with your lighting or you get some incorrect pixels (like say, even minor occlusion) can really throw it off.
Which one to use definitely depends on the project. For a project I worked on with satellite imagery, I used dense optical flow because the images of desert terrain I was working with did not have precise enough features (in location) and different feature descriptors happen to look relatively similar so searching that feature space wasn't giving tons of great matches. In this case, optical flow was the better method. However, if you were doing image alignment on satellite imagery of a city where buildings can occlude parts of the scene, there are a lot of features that will stay matched and give a better result.
The OpenCV Lucas-Kanade tutorial doesn't give a whole lot of insight but should get your code moving in the right direction with the above in mind.
key-point matching = sparse optical flow
KLT tracking is a good example of sparse flow, see the demo LKDemo.cpp (it had some python wrapper example too, cant remember it now).
for a dense example, see samples/python/opt_flow.py, using Farnebäcks method.
You are right in being confused... The entire world is confused about this terribly simple topic. Alot of the reason is because people believe Lucas-Kanade to be sparse flow (due to a terribly badly named and commented example in openCV: LKdemo which should be called KLTDemo).

Calculate distance between camera and pixel in image

Can you, please, suggest me ways of determining the distance between camera and a pixel in an image (in real world units, that is cm/m/..).
The information I have is: camera horizontal (120 degrees) and vertical (90 degrees) field of view, camera angle (-5 degrees) and the height at which the camera is placed (30 cm).
I'm not sure if this is everything I need. Please tell me what information should I have about the camera and how can I calculate the distance between camera and one pixel?
May be it isn't right to tell 'distance between camera and pixel ', but I guess it is clear what I mean. Please write in the comments if something isn't clear.
Thank you in advance!
What I think you mean is, "how can I calculate the depth at every pixel with a single camera?" Without adding some special hardware this is not feasible, as Rotem mentioned in the comments. There are exceptions, and though I expect you may be limited in time or budget, I'll list a few.
If you want to find depths so that your toy car can avoid collisions, then you needn't assume that depth measurement is required. Google "optical flow collision avoidance" and see if that meets your needs.
If instead you want to measure depth as part of some Simultaneous Mapping and Localization (SLAM) scheme, then that's a different problem to solve. Though difficult to implement, and perhaps not remotely feasible for a toy car project, there are a few ways to measure distance using a single camera:
Project patterns of light, preferably with one or more laser lines or laser spots, and determine depth based on how the dots diverge or converge. The Kinect version 1 operates on this principle of "structured light," though the implementation is much too complicated to reproduce completely. For a collision warning simple you can apply the same principles, only more simply. For example, if the projected light pattern on the right side of the image changes quickly, turn left! Learning how to estimate distance using structured light is a significant project to undertake, but there are plenty of references.
Split the optical path so that one camera sensor can see two different views of the world. I'm not aware of optical splitters for tiny cameras, but they may exist. But even if you find a splitter, the difficult problem of implementing stereovision remains. Stereovision has inherent problems (see below).
Use a different sensor, such as the somewhat iffy but small Intel R200, which will generate depth data. (http://click.intel.com/intel-realsense-developer-kit-r200.html)
Use a time-of-flight camera. These are the types of sensors built into the Kinect version 2 and several gesture-recognition sensors. Several companies have produced or are actively developing tiny time-of-flight sensors. They will generate depth data AND provide full-color images.
Run the car only in controlled environments.
The environment in which your toy car operates is important. If you can limit your toy car's environment to a tightly controlled one, you can limit the need to write complicated algorithms. As is true with many imaging problems, a narrowly defined problem may be straightforward to solve, whereas the general problem may be nearly impossible to solve. If you want your car to run "anywhere" (which likely isn't true), assume the problem is NOT solvable.
Even if you have an off-the-shelf depth sensor that represents the best technology available, you would still run into limitations:
Each type of depth sensing has weaknesses. No depth sensors on the market do well with dark, shiny surfaces. (Some spot sensors do okay with dark, shiny surfaces, but area sensors don't.) Stereo sensors have problems with large, featureless regions, and also require a lot of processing power. And so on.
Once you have a depth image, you still need to run calculations, and short of having a lot of onboard processing power this will be difficult to pull off on a toy car.
If you have to make many compromises to use depth sensing, then you might consider just using a simpler ultrasound sensor to avoid collisions.
Good luck!

Type of graph cut algorithm for 3D reconstruction

I have read several papers on using graph cuts for 3D reconstruction and I have noticed that there seem to be two alternative approaches to posing this problem.
One approach is volumetric and describes a 3D region of voxels for which a graph cut is used to infer a binary labelling (contains object of interest or does not) for each voxel. Papers which take this approach include Multi-View Stereo via Volumetric Graph Cuts and Occlusion Robust Photo-Consistency and A Surface Reconstruction Using Global Graph Cut Optimization.
The second approach is 2D and seeks to label each pixel of a reference image with the depth of the 3D point that projects there. Papers which take this approach include Computing Visual Correspondence with Occlusions via Graph Cuts.
I want to understand the advantages/disadvantages of each method and which are the most significant when choosing which method to use. So far I understand that some advantages of the first approach are:
It is a binary problem, so is solvable exactly with Max-Flow algorithms.
Provides simple methods of modelling occlusion.
And some advantages of the second approach are:
Smaller neighbor set for each node of the graph.
Easier to model smoothness (but does it give better results?).
Additionally, I would be interested in which situations I would be better off choosing one representation or the other and why.
The most significant difference is the type of scenes the algorithms are typically used with, and the way they represent the 3D shape of the object.
Volumetric approaches perform best...
with a large number of images...
taken from different viewpoints, well distributed around the object,...
of a more or less compact "object" (e.g. an artifact, in contrast, for example, to an outdoor scene observed by a vehicle camera).
Volumetric approaches are popular for reconstructing "objects" (especially artifacts). Given sufficient views (i.e. images), the algorithms give a complete volumetric (i.e. voxel) representation of the object's shape. This can be converted to a surface representation using Marching Cubes or similar method.
The second type of algorithms you identified are called stereo algorithms, and graph cuts are just one of many methods of solving such problems. Stereo is best...
if you have only two images...
with a fairly narrow baseline (i.e. distance between cameras)
Generalizations to more than two images (with narrow baselines) exist, but most of the literature deals with the binocular (i.e. two image) case. Some algorithms generalize more easily to more views than others.
Stereo algorithms only give you a depth map, i.e. an image with a depth value for each pixel. This does not allow you to look "around" the object. There are, however, 3D reconstruction systems that start with stereo on image pairs and combine the depth maps in order to get a representation of the complete object, which is a non-trivial problem of its own right. Interestingly, this is often approached using a volumetric representation as an intermediate step.
Stereo algorithms can be and are often used to for "scenes", e.g. the road observed by a pair of cameras in a vehicle, or people in a room for 3D video conferencen.
Some closing remarks
For both stereo and volumetric reconstruction, graph cuts are just one of several methods to solve the problem. Stereo, for example, can also be formulated as a continuous optimization problem, rather than a discrete one, which implies other optimization methods for its solution.
My answer contains a bunch of generalizations and simplifications. It is not meant to be a definitive treatment of the subject.
I don't necessarily agree that smoothness is easier in the stereo case. Why do you think so?

3D reconstruction -- How to create 3D model from 2D image?

If I take a picture with a camera, so I know the distance from the camera to the object, such as a scale model of a house, I would like to turn this into a 3D model that I can maneuver around so I can comment on different parts of the house.
If I sit down and think about taking more than one picture, labeling direction, and distance, I should be able to figure out how to do this, but, I thought I would ask if someone has some paper that may help explain more.
What language you explain in doesn't matter, as I am looking for the best approach.
Right now I am considering showing the house, then the user can put in some assistance for height, such as distance from the camera to the top of that part of the model, and given enough of this it would be possible to start calculating heights for the rest, especially if there is a top-down image, then pictures from angles on the four sides, to calculate relative heights.
Then I expect that parts will also need to differ in color to help separate out the various parts of the model.
As mentioned, the problem is very hard and is often also referred to as multi-view object reconstruction. It is usually approached by solving the stereo-view reconstruction problem for each pair of consecutive images.
Performing stereo reconstruction requires that pairs of images are taken that have a good amount of visible overlap of physical points. You need to find corresponding points such that you can then use triangulation to find the 3D co-ordinates of the points.
Epipolar geometry
Stereo reconstruction is usually done by first calibrating your camera setup so you can rectify your images using the theory of epipolar geometry. This simplifies finding corresponding points as well as the final triangulation calculations.
If you have:
the intrinsic camera parameters (requiring camera calibration),
the camera's position and rotation (it's extrinsic parameters), and
8 or more physical points with matching known positions in two photos (when using the eight-point algorithm)
you can calculate the fundamental and essential matrices using only matrix theory and use these to rectify your images. This requires some theory about co-ordinate projections with homogeneous co-ordinates and also knowledge of the pinhole camera model and camera matrix.
If you want a method that doesn't need the camera parameters and works for unknown camera set-ups you should probably look into methods for uncalibrated stereo reconstruction.
Correspondence problem
Finding corresponding points is the tricky part that requires you to look for points of the same brightness or colour, or to use texture patterns or some other features to identify the same points in pairs of images. Techniques for this either work locally by looking for a best match in a small region around each point, or globally by considering the image as a whole.
If you already have the fundamental matrix, it will allow you to rectify the images such that corresponding points in two images will be constrained to a line (in theory). This helps you to use faster local techniques.
There is currently still no ideal technique to solve the correspondence problem, but possible approaches could fall in these categories:
Manual selection: have a person hand-select matching points.
Custom markers: place markers or use specific patterns/colours that you can easily identify.
Sum of squared differences: take a region around a point and find the closest whole matching region in the other image.
Graph cuts: a global optimisation technique based on optimisation using graph theory.
For specific implementations you can use Google Scholar to search through the current literature. Here is one highly cited paper comparing various techniques:
A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.
Multi-view reconstruction
Once you have the corresponding points, you can then use epipolar geometry theory for the triangulation calculations to find the 3D co-ordinates of the points.
This whole stereo reconstruction would then be repeated for each pair of consecutive images (implying that you need an order to the images or at least knowledge of which images have many overlapping points). For each pair you would calculate a different fundamental matrix.
Of course, due to noise or inaccuracies at each of these steps you might want to consider how to solve the problem in a more global manner. For instance, if you have a series of images that are taken around an object and form a loop, this provides extra constraints that can be used to improve the accuracy of earlier steps using something like bundle adjustment.
As you can see, both stereo and multi-view reconstruction are far from solved problems and are still actively researched. The less you want to do in an automated manner the more well-defined the problem becomes, but even in these cases quite a bit of theory is required to get started.
Alternatives
If it's within the constraints of what you want to do, I would recommend considering dedicated hardware sensors (such as the XBox's Kinect) instead of only using normal cameras. These sensors use structured light, time-of-flight or some other range imaging technique to generate a depth image which they can also combine with colour data from their own cameras. They practically solve the single-view reconstruction problem for you and often include libraries and tools for stitching/combining multiple views.
Epipolar geometry references
My knowledge is actually quite thin on most of the theory, so the best I can do is to further provide you with some references that are hopefully useful (in order of relevance):
I found a PDF chapter on Multiple View Geometry that contains most of the critical theory. In fact the textbook Multiple View Geometry in Computer Vision should also be quite useful (sample chapters available here).
Here's a page describing a project on uncalibrated stereo reconstruction that seems to include some source code that could be useful. They find matching points in an automated manner using one of many feature detection techniques. If you want this part of the process to be automated as well, then SIFT feature detection is commonly considered to be an excellent non-real-time technique (since it's quite slow).
A paper about Scene Reconstruction from Multiple Uncalibrated Views.
A slideshow on Methods for 3D Reconstruction from Multiple Images (it has some more references below it's slides towards the end).
A paper comparing different multi-view stereo reconstruction algorithms can be found here. It limits itself to algorithms that "reconstruct dense object models from calibrated views".
Here's a paper that goes into lots of detail for the case that you have stereo cameras that take multiple images: Towards robust metric reconstruction
via a dynamic uncalibrated stereo head. They then find methods to self-calibrate the cameras.
I'm not sure how helpful all of this is, but hopefully it includes enough useful terminology and references to find further resources.
Research has made significant progress and these days it is possible to obtain pretty good-looking 3D shapes from 2D images. For instance, in our recent research work titled "Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks" took a big step in solving the problem of obtaining 3D shapes from 2D images. In our work, we show that you can not only go from 2D to 3D directly and get a good, approximate 3D reconstruction but you can also learn a distribution of 3D shapes in an efficient manner and generate/synthesize 3D shapes. Below is an image of our work showing that we are able to do 3D reconstruction even from a single silhouette or depth map (on the left). The ground-truth 3D shapes are shown on the right.
The approach we took has some contributions related to cognitive science or the way the brain works: the model we built shares parameters for all shape categories instead of being specific to only one category. Also, it obtains consistent representations and takes the uncertainty of the input view into account when producing a 3D shape as output. Therefore, it is able to naturally give meaningful results even for very ambiguous inputs. If you look at the citation to our paper you can see even more progress just in terms of going from 2D images to 3D shapes.
This problem is known as Photogrammetry.
Google will supply you with endless references, just be aware that if you want to roll your own, it's a very hard problem.
Check out The Deadalus Project, althought that website does not contain a gallery with illustrative information about the solution, it post several papers and info about the working method.
I watched a lecture from one of the main researchers of the project (Roger Hubbold), and the image results are quite amazing! Althought is a complex and long problem. It has a lot of tricky details to take into account to get an approximation of the 3d data, take for example the 3d information from wall surfaces, for which the heuristic to work is as follows: Take a photo with normal illumination of the scene, and then retake the picture in same position with full flash active, then substract both images and divide the result by a pre-taken flash calibration image, apply a box filter to this new result and then post-process to estimate depth values, the whole process is explained in detail in this paper (which is also posted/referenced in the project website)
Google Sketchup (free) has a photo matching tool that allows you to take a photograph and match its perspective for easy modeling.
EDIT: It appears that you're interested in developing your own solution. I thought you were trying to obtain a 3D model of an image in a single instance. If this answer isn't helpful, I apologize.
Hope this helps if you are trying to construct 3d volume from 2d stack of images !! You can use open source tool such as ImageJ Fiji which comes with 3d viewer plugin..
https://quppler.com/creating-a-classifier-using-image-j-fiji-for-3d-volume-data-preparation-from-stack-of-images/

Finding path obstacles in a 2D image

what approach would you recommend for finding obstacles in a 2D image?
Here are some key points I came up with till now:
I doubt I can use object recognition based on "database of obstacles" search, since I don't know what might the obstruction look like.
I assume color recognition might be problematic if the path does not differ a lot from the object itself.
Possibly, adding one more camera and computing a 3D image (like a Kinect does) would work, but that would not run as smooth as I require.
To illustrate the problem; robot can ride either left or right side of the pavement. In the following picture, left side is the correct choice:
If you know what the path looks like, this is largely a classification problem. Acquire a bunch of images of path at different distances, illumination, etc. and manually label the ground in each image. Use this labeled data to train a classifier that classifies each pixel as either "road" or "not road." Depending upon the texture of the road, this could be as simple as classifying each pixels' RGB (or HSV) values or using OpenCv's built-in histogram back-projection (i.e. cv::CalcBackProjectPatch()).
I suggest beginning with manual thresholds, moving to histogram-based matching, and only using a full-fledged machine learning classifier (such as a Naive Bayes Classifier or a SVM) if the simpler techniques fail. Once the entire image is classified, all pixels that are identified as "not road" are obstacles. By classifying the road instead of the obstacles, we completely avoided building a "database of objects".
Somewhat out of the scope of the question, the easiest solution is to add additional sensors ("throw more hardware at the problem!") and directly measure the three-dimensional position of obstacles. In order of preference:
Microsoft Kinect: Cheap, easy, and effective. Due to ambient IR light, it only works indoors.
Scanning Laser Rangefinder: Extremely accurate, easy to setup, and works outside. Also very expensive (~$1200-10,000 depending upon maximum range and sample rate).
Stereo Camera: Not as good as a Kinect, but it works outside. If you cannot afford a pre-made stereo camera (~$1800), you can make a decent custom stereo camera using USB webcams.
Note that professional stereo vision cameras can be very fast by using custom hardware (Stereo On-Chip, STOC). Software-based stereo is also reasonably fast (10-20 Hz) on a modern computer.

Resources