Lane tracking with a camera: how to get distance from camera to the lane? - opencv

i am doing final year project as lane tracking using a camera. the most challenging task now is how i can measure distance between the camera (the car that carries it actually) and the lane.
While the lane is easily recognized (Hough line transform) but i found no way to measure distance to it.
given the fact that there is a way to measure distance to object in front of camera based on Pixel width of the object, but it does not work here be because the nearest point of the line, is blind in the camera.

What you want is to directly infer the depth map with a monocular camera.
You can refer my answer here
https://stackoverflow.com/a/64687551/11530294
Usually, we need a photometric measurement from a different position in the world to form a geometric understanding of the world(a.k.a depth map). For a single image, it is not possible to measure the geometric, but it is possible to infer depth from prior understanding.
One way for a single image to work is to use a deep learning-based method to direct infer depth. Usually, the deep learning-based approaches are all based on python, so if you only familiar with python, then this is the approach that you should go for. If the image is small enough, i think it is possible for realtime performance. There are many of this kind of work using CAFFE, TF, TORCH etc. you can search on git hub for more option. The one I posted here is what i used recently
reference:
Godard, Clément, et al. "Digging into self-supervised monocular depth estimation." Proceedings of the IEEE international conference on computer vision. 2019.
Source code: https://github.com/nianticlabs/monodepth2
The other way is to use a large FOV video for a single camera-based SLAM. This one has various constraints such as need good features, large FOV, slow motion, etc. You can find many of this work such as DTAM, LSDSLAM, DSO, etc. There are a couple of other packages from HKUST or ETH that does the mapping given the position(e.g if you have GPS/compass), some of the famous names are REMODE+SVO open_quadtree_mapping etc.
One typical example for a single camera-based SLAM would be LSDSLAM. It is a realtime SLAM.
This one is implemented based on ROS-C++, I remember they do publish the depth image. And you can write a python node to subscribe to the depth directly or the global optimized point cloud and project it into a depth map of any view angle.
reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European conference on computer vision. Springer, Cham, 2014.
source code: https://github.com/tum-vision/lsd_slam

Related

Creating Meshes from Pointclouds of Urban Scenes

I want to create high fidelity meshes of urban street scenes using pointcloud data. The data consists of pointclouds using a HDL64-e and the scenes are very similar to the one in the Kitti Dataset.
Currently im only able to use the 'raw' point clouds and odometry of the car. Previous works already implemented the LeGO-LOAM algorithm to create a monolithic map and better odometry estimates.
Available Data:
Point Clouds with 10Hz timings
Odometry estimates with higher frequencies (LOAM Output)
Monolithic map of the scene (LOAM Output) (~1.500.000 Points)
I already did some research and came to the conclusion, that I can either
use the monolithic map with algorithms like Poisson Reconstruction, Advancing Front, etc... (using CGAL)
go the robotics way and use some packages like Voxgraph (which uses Marching Cubes internally)
As we might want to integrate image data at a later step the second option would be preferred.
Questions:
Is there a State-of-the-Art way to go?
Is it possible to get a mesh that can preserve small features like curbs and sign posts? (I know there might be a feasable limit on how fine the mesh can be)
I am very interested in some feedback and a discourse on how to tackle this problem 'the right way'.
Thank you for your suggestions/answers in advance!

Real Time Camera/Image Based Large-Scale Slam Algorithm

I want to use an already implemented SLAM algorithm for mapping my college campus.
I have found some algorithms on OpenSLAM.org and some other independent ones such as LSD-SLAM and Hector SLAM, which show some promises but they have limitation such as they use LIDAR, or don't extend to large dataset etc.
SLAM has been an active topic for many years and some groups have also mapped an entire town. Can someone point me to such efficient algorithm?
My requirements are:
It must use RGB camera/cameras.
Preferably produce (somewhat) dense map of area.
It should be able to map large area (I have seen some algo which can only map up to a desk table or room, but they usually lose track if there is jerk in camera motion (Observed in LSD SLAM) or take very few landmarks which is only useful for study purposes).
Preferably a ROS implementation.

Machine Learning: sign visibility

I work at an airport where we need to determine the visibility conditions of pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility is. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually with a human being who looks at the photos from the cameras placed at the end of each runway. So it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques. openCV could make the implementation easier. If all the signs are similar then ,we can train our program to recognize the sign in a specific conditions(lights). Then, we can use the trained classifier to check for the visibility of signs every hours using a simple script.
There is harr-like feature extraction already in openCV. You can use to train classifier which will output a .xml file and use that .xml file for detecting the sign regularly.
I have done a similar project RTVTR(Real Time Vehicle Tracking and Recognition) using openCV and it worked great. http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering to your questions:
How should I approach this problem?
It depends on the result you want/need to obtain. Is this an "hobby" project (even if job-related) or do you need to build a machine vision system to solve the problem and should it be compliant with some regulations or standard?
Which algorithms should I study?
I am very interested in your question but I am not an expert in the field of meteorology and so searching in the relative literature is, for me, a time consuming task... so I reserve to update this part of the answer in the future. I think there will be different algorithms involved in the solution of the problem, some are very general like for example algorithms for the image segmentation, some are very specific like for example how to measure the visibility.
Update: one of the keyword for searching in the literature is Meteorological Visibility, for example
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute the measure of visibility according to some procedure, the measure is computed by a function that has as input the regions of all the signs and the background region.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm and it can be performed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray level of the signs and background regions.
You can follow a more sophisticated and machine-learning oriented approach where you implement an algorithm that mimics your current human being based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, each training example is a pair composed by a) the photo of the runway (the input) and b) the visibility related to that photo and computed by human (the desired output). Then the system is trained by means of the training set and when you give a new photo as input it will give you back the visibility measure. I think you have a log for past visibility measures (METAR?) and if you saved the related images too, you will already have a relevant amount of data in order to build a training set and a test set.
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor and uvts_cvs 's replies are very helpful. While the opencv mainly aims to recognize the sign pattern or even segment it from the background, when you extract the core feature in your problem : visibility, you may still need to include the background signal in your training set. I assume manual check of visibility is based on image contrast, if so, the signal-to-noise ratio(SNR) or contrast-to-noise ratio(CNR) is a good feature in learning. A threshold is defined to classify 'visible-1' and 'invisible-0'. The SNR/CNR can be obtained automatically especially if your sign position and size are fixed in your camera images.
Gather whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try solve it, even if reward would not be very high.
You can use the template matching functionality of openCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Where the template is the sign. If you manage to find a correct match, then the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
As this is a very controlled and static environment, you have perfect conditions to estimate the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I am reviewing this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. 2016. Doktorarbeit. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf).
I see two major directions you could follow up:
Model-based approaches: Advantages: Not so much dependent on your very specific setup. You do not need heavy collection of data.
Data-based approaches/ML: Advantages: Can hide the whole complexity of different light and weather conditions. You seem to have a good source of data if there are people doing the job right now. Very promising without much engineering effort (just use a light-weighted CNN with few layers or so).
You could also combine both, etc. etc. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.

Using Augmented Reality libraries for Academic Project

I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.

Image classification: recognizing various features of many buildings from images

So, let's say I have the longitude/latitude or address of many buildings
I can get satellite images, "street view", and perhaps 3d/perspective views of buildings.
I want to find the:
height,
number of floors,
floor area (max building footprint)
of the building. about 200k buildings.
Is there a library for recognizing buildings from satellite shots or pictures? Kind of like face detection I suppose.
Any other suggestions?
Thanks!
Yes and no. You will have to program many of the feature recognition yourself, but this should just be edge detection and point-of-interest detection (corners, straight lines, grids) to work out how many windows high a building is.
To do this you might want to checkout OpenCV (Computer Vision) which has C routines for real-time image analysis and common operations such as edge detection. I can also recommend DIP3e as a book to read and Image Processing in C and it's Java counterpart (if Java floats your boat).
Image processing is fun, and this sounds like a great project. There is tons on the Internet to learn from ... check out AI techniques to train image recognition systems to get better using a known sample set. Fun times!
Also, be careful of perspective ;)
good luck

Resources