Graph-SLAM when it uses only odometry information, will it still run? and what is the outcome? - robotics

This is a kind of difficult question.
I know about EKF-SLAM, which uses a state from previous time to estimate next state as an online filter,
I also know about Graph-SLAM, which uses all states in past as full SLAM, and represents them as merely whole bunch of nodes and edges, and optimize structure of nodes and eges by minimizing error to estimate better states.
Now,
I know that there is no meaning in running EKF-SLAM with odometry info only, since what EKF does is estimating future state by balancing weight between Odometry info AND Observation of landmark info. so both are needed.
My question is, is it possible to run Graph-SLAM with only Odometry info and no landmark observation info whatsoever?
It seems like Graph-SLAM can run by gathering all Odometry info state upto current ones and converting them to nodes and edges just like it does when both Odo and obs are provided, and it can optimize the structure of nodes and edges.
Is is possible?
What does output mean? "Optimized" Odometry?
Any thought to it?
Thank you in advance. :)

preface: I am not 100% certain, these are just my assumptions/opinions
the point of SLAM is Simultaneous Localization And Mapping In order to do any mapping you need Observation of landmarks, or some other feature. Otherwise you are only performing localization.
Think if I dropped you a building you've never been in before and I said, create a map for me, BUT you can ONLY count your footsteps. You must not use any other senses (eyes closed, ears plugged, etc). You would quickly realize this is a nearly impossible task. If you use only odometry, something like a Kalman Filter, or EKF should work nicely, but again this is only doing localization, not mapping.
hope that helps

Related

How to track Fast Moving Objects?

I'm trying to create an application that will be able to track rapidly moving objects in video/camera feed, however have not found any CV/DL solution that is good enough. Can you recommend any computer vision solution for tracking fast moving objects on regular laptop computer and web cam? A demo app would be ideal.
For example see this video where the tracking is done in hardware (I'm looking for software solution) : https://www.youtube.com/watch?v=qn5YQVvW-hQ
Target tracking is a very difficult problem. In target tracking you will have two main issues: the motion uncertainty problem, and the origin uncertainty problem. The first one refers to the way you model object motion so you can predict its future state, and the second refers to the issue of data association(what measurement corresponds to what track, and the literature is filled with scientific ways in which this issue can be approached).
Before you can come up with a solution to your problem you will have to answer some questions yourself, regarding the tracking problem you want to solve. For example: what are the values that you what to track(this will define your state vector), how are those values related to one another, are you trying to perform single object tracking or multiple object tracking, how are the objects moving( do they have a relatively constant acceleration or velocity ) or not, do objects make turns, can objects also be occluded or not and so on.
The Kalman Filter is good solution to predict the next state of your system (once you have identified your process model). A deep learning alternative to the Kalman filter is the so called Deep Kalman Filter which essentially is used to do the same thing. In case your process or measurement models are not linear, you will have to linearize them before predicting the next state. Some solutions that deal with non-linear process or measurement models are the Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF).
Now related to fast moving objects, an idea you can use is to have a larger covariance matrix since the objects can move a lot more if they are fast, so the search space for the correct association has to be a bit larger. Additionally you can use multiple motion models in case your motion model cannot be satisfied with only one model.
In case of occlusions I will leave you this stack overflow thread, where I have given an answer covering more details regarding occlusion handling in case of tracking. I have added some references for you to read. You will have to provide more details in your question, if you would like to receive more information regarding a solution (for example you should define fast moving objects with respect to camera frame rate).
I personally do not think there is a silver bullet solution for the tracking problem, I prefer to tailor a solution to the problem I am trying to solve.
The tracking problem is complicated. It is also more in the realm of control systems than computer vision. It would be also helpful to know more about your situation, as the performance of the chosen method pretty much depends on your problem constraints. Are you interested in real-time tracking? Are you trying to reconstruct an existing trajectory? Are there multiple targets? Just one? Are the physical properties of the targets (i.e. velocity, direction, acceleration) constant?
One of the most basic tracking methods is implemented by a Linear Dynamic System (LDS) description, in concrete, a discrete implementation, since we’re working with discrete frames of information. This method is purely based on physics, and its prediction is very sensitive. Depending on your application, the error rate could be acceptable… or not.
A more robust solution is Kalman’s Filter, and it is pretty much the go-to answer when tracking is needed. It implements prediction based on all the measurements obtained so far during the model's lifetime. It mainly works on constant-based measurements (velocity and acceleration) although it can be extended to handle non-constant models. If you are working with targets that won't exhibit a drastic change in their velocity, this is what you (probably) should implement.
I'm sorry I can't provide you with more, but the topic is pretty extensive and, admittedly, the details are beyond my area of expertise. Hopefully, this info should give you a little bit of context for finding a solution.
The problem of tracking fast-moving objects (FMO) is a known research topic in computer vision. FMOs are defined as objects which move over a distance larger than their size in one video frame. The solutions which have been proposed use classical image processing and energy minimization to establish their trajectories and sharp appearance.
If you need a demo app, I would suggest this GitHub repository: https://github.com/rozumden/fmo-cpp-demo. The demo is written in OpenCV/C++ and runs in real-time. The authors also provide a mobile app version, which is still in testing mode. Using this demo app you can track any fast moving objects in real-time without even providing an object model. However, if you provide object size in real-world units, the app can also estimate object speed.
A more sophisticated algorithm is open-sourced here: https://github.com/rozumden/deblatting_python, written in Python and PyTorch for speed-up. The repository contains a solution to the deblatting (deblurring and matting) problem, exactly what happens when a Fast Moving Object appears in front of a camera.

Image analysis technique to determine approximate change in view over a short period of time?

I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there an machine vision algorithm/function that can analyze two JPEG's and tell if they belong to the same scene and viewpoint, yet can also deliver a numerical monotonically increasing value that tells me rough how different they are? If I could get that numerical value and compare it to the number of milliseconds past, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of machine vision library that implements it, which one it is.
but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis
What you are suggesting is a complex problem by itself, so forget about 'lightweight' solutions. Probably you are going to need something like optical flow.
Other options I would recommend you looking into are:
Vanishing points detection and variation from image to image. This quite fits into your problem domain Wikipedia
Disparity map: related to optical flow. Used for stereographic vision, but I think you can use it for the kind of application you are looking for. Take a look at this

Machine Learning: sign visibility

I work at an airport where we need to determine the visibility conditions of pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility is. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually with a human being who looks at the photos from the cameras placed at the end of each runway. So it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques. openCV could make the implementation easier. If all the signs are similar then ,we can train our program to recognize the sign in a specific conditions(lights). Then, we can use the trained classifier to check for the visibility of signs every hours using a simple script.
There is harr-like feature extraction already in openCV. You can use to train classifier which will output a .xml file and use that .xml file for detecting the sign regularly.
I have done a similar project RTVTR(Real Time Vehicle Tracking and Recognition) using openCV and it worked great. http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering to your questions:
How should I approach this problem?
It depends on the result you want/need to obtain. Is this an "hobby" project (even if job-related) or do you need to build a machine vision system to solve the problem and should it be compliant with some regulations or standard?
Which algorithms should I study?
I am very interested in your question but I am not an expert in the field of meteorology and so searching in the relative literature is, for me, a time consuming task... so I reserve to update this part of the answer in the future. I think there will be different algorithms involved in the solution of the problem, some are very general like for example algorithms for the image segmentation, some are very specific like for example how to measure the visibility.
Update: one of the keyword for searching in the literature is Meteorological Visibility, for example
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute the measure of visibility according to some procedure, the measure is computed by a function that has as input the regions of all the signs and the background region.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm and it can be performed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray level of the signs and background regions.
You can follow a more sophisticated and machine-learning oriented approach where you implement an algorithm that mimics your current human being based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, each training example is a pair composed by a) the photo of the runway (the input) and b) the visibility related to that photo and computed by human (the desired output). Then the system is trained by means of the training set and when you give a new photo as input it will give you back the visibility measure. I think you have a log for past visibility measures (METAR?) and if you saved the related images too, you will already have a relevant amount of data in order to build a training set and a test set.
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor and uvts_cvs 's replies are very helpful. While the opencv mainly aims to recognize the sign pattern or even segment it from the background, when you extract the core feature in your problem : visibility, you may still need to include the background signal in your training set. I assume manual check of visibility is based on image contrast, if so, the signal-to-noise ratio(SNR) or contrast-to-noise ratio(CNR) is a good feature in learning. A threshold is defined to classify 'visible-1' and 'invisible-0'. The SNR/CNR can be obtained automatically especially if your sign position and size are fixed in your camera images.
Gather whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try solve it, even if reward would not be very high.
You can use the template matching functionality of openCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Where the template is the sign. If you manage to find a correct match, then the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
As this is a very controlled and static environment, you have perfect conditions to estimate the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I am reviewing this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. 2016. Doktorarbeit. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf).
I see two major directions you could follow up:
Model-based approaches: Advantages: Not so much dependent on your very specific setup. You do not need heavy collection of data.
Data-based approaches/ML: Advantages: Can hide the whole complexity of different light and weather conditions. You seem to have a good source of data if there are people doing the job right now. Very promising without much engineering effort (just use a light-weighted CNN with few layers or so).
You could also combine both, etc. etc. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.

Are there any good non-predictive path following algorithms?

All the path following steering algorithms (e.g. for robots steering to follow a colored terrain) that I can find are predictive, so they rely on the robot being able to sense some distance beyond its body.
I need path following behavior on a robot with a light sensor on its underside. It can only see terrain it is directly over and so can't make any predictions; are there any standard examples of good techniques to use for this?
I think that the technique you are looking for will most likely depend on what environment will you be operating in as well as to what type of your resources will your robot have access to. I have used NXT robots in the past, so you might consider this video interesting (This video is not mine).
Assuming that you will be working on a flat non glossy surface, you can let your robot wander around until it finds a predefined colour. The robot can then kick in a 'path following' mechanism and will keep tracking the line. If it does not sense the line any more, it might want to try to turn right and/or left (since the line might no longer be under the robot because it has found a bend).
In this case though the robot will need in advance what is the colour of the line that it needs to follow.
The reason the path finding algorithms you are seeing are predictive is because the robot needs to be able to interpret what it is "seeing" in context.
For instance, consider a coloured path in the form of a straight line. Even in this simple example, how is the robot to know:
Whether there is a coloured square in front of it, hence it should advance
Which direction it is even travelling in.
These two questions are the fundamental goals the algorithm you are looking for would answer (and things would get more complex as you add more difficult terrain and paths).
The first can only be answered with suitable forward-looking ability (hence a predictive algorithm), and the latter can only be answered with some memory of the previous state.
Based solely on the details you provided in your question, you wouldn't be able to implement an appropriate solution. Although, I would imagine that your sensor input and on-board memory would in fact be suitable for a predictive solution, you may just need to investigate further what the capabilities of your hardware allow for.

People counting using OpenCV

I'm starting a search to implement a system that must count people flow of some place.
The final idea is to have something like http://www.youtube.com/watch?v=u7N1MCBRdl0 . I'm working with OpenCv to start creating it, I'm reading and studying about. But I'd like to know if some one can give me some hints of source code exemples, articles and anything elese that can make me get faster on my deal.
I started with blobtrack.exe sample to study, but I got not good results.
Tks in advice.
Blob detection is the correct way to do this, as long as you choose good threshold values and your lighting is even and consistent; but the real problem here is writing a tracking algorithm that can keep track of multiple blobs, being resistant to dropped frames. Basically you want to be able to assign persistent IDs to each blob over multiple frames, keeping in mind that due to changing lighting conditions and due to people walking very close together and/or crossing paths, the blobs may drop out for several frames, split, and/or merge.
To do this 'properly' you'd want a fuzzy ID assignment algorithm that is resistant to dropped frames (ie blob ID remains, and ideally predicts motion, if the blob drops out for a frame or two). You'd probably also want to keep a history of ID merges and splits, so that if two IDs merge to one, and then the one splits to two, you can re-assign the individual merged IDs to the resulting two blobs.
In my experience the openFrameworks openCv basic example is a good starting point.
I'll not put this as the right answer.
It is just an option for those who are able to read in Portugues or can use a translator. It's my graduation project and there is the explanation of a option to count people in it.
Limitations:
It's do not behave well on envirionaments that change so much the background light.
It must be configured for each location that you will use it.
Advantages:
It's fast!
I used OpenCV to do the basic features as, capture screen, go trough the pixels, etc. But the algorithm to count people was done by my self.
You can check it on this paper
Final opinion about this project: It's not prepared to go alive, to became a product. But it works very well as base for study.

Resources