detecting geometric shapes in UML, flowcharts - opencv

I am looking for ways to automatically image-process a state-machine diagram (e.g. a Finite State Machine; see a sample at http://www.artima.com/designtechniques/images/TrafficLight.gif) and turn it into a state-transition table. I am told that this is a solved problem, in that there are multiple automatic image processing solutions out there that already process diagrams and turn them into some internal notation. Is this true? Also, are there similar solutions for flowcharts or other UML diagrams (like message sequence charts)?
FYI, for image processing I am using OpenCV on Windows. For text recognition (i.e. OCR), I have found Tesseract useful. Before building my own mechanism to automatically process state machines, I wish to know if this is a solved problem.

Related

How to prepare training data for image segmentation

I am using bounding-box marking tools like BBox and YOLO marker for object localisation. I wanted to know whether there are any equivalent marking tools available for image segmentation tasks. How are people in academia and research preparing datasets for these image segmentation tasks? The recent Kaggle competition severstal-steel-defect-detection has pixel-level segmentation information. Which tool did they use to prepare this data?
Generally speaking, it is a pretty complex but common task, so you'll likely be able to find several tools. Supervise.ly is a good example. Look through the demo to understand the actual complexity.
Another way is to use OpenCV to get some specific results. We did that, but the results were pretty rough. Another problem is performance. There are a couple of reasons we use 4K video.
Long story short, we decided to implement a custom tool to get required results (and do that fast enough).
(see in action)
Just to summarize, if you want to build a training set for segmentation you have the following options:
Use available services (pretty much all of them will require additional manual work)
Use OpenCV to deal with a specially prepared input
Develop a custom solution to deal with a properly prepared input, providing full control and accurate results
The third option seems to be the most flexible solution. Here are some examples. Those are custom multi-color segmentation results. You might get the impression that a custom implementation is way more complex, but as it turned out, if you properly implement some straightforward algorithm you might be surprised by the result. We were interested in accurate, pixel-perfect results:
(see in action)
I have created a simple script to generate colored masks of annotation to be used for semantic segmentation.
You will need to use the VIA (VGG Image Annotator) tool, which lets you mark a region as a polygon. Once a polygon is created for a class/attribute, you can give it an attribute name and save the annotation as a CSV file. The x,y coordinates of the polygon are what get saved in the CSV file.
The program and steps to use are present at: https://github.com/pateldigant/SemanticAnnotator
If you have any doubts regarding the use of this script, you can leave a comment.
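To illustrate the idea of turning polygon annotations into a colored mask, here is a minimal sketch in plain Python. It is not the linked script. It assumes the polygon vertices have already been read from the annotation file into two parallel lists of x and y coordinates, and the names `point_in_polygon`, `polygon_to_mask` and `class_id` are hypothetical. A pixel is filled when its center lies inside the polygon (even-odd rule).

```python
def point_in_polygon(px, py, xs, ys):
    """Even-odd rule: count polygon edges crossed by a ray going right from (px, py)."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        xi, yi = xs[i], ys[i]
        xj, yj = xs[j], ys[j]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def polygon_to_mask(width, height, xs, ys, class_id=1):
    """Return a height x width mask with class_id inside the polygon and 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Test the pixel center, not its corner.
            if point_in_polygon(col + 0.5, row + 0.5, xs, ys):
                mask[row][col] = class_id
    return mask


# Hypothetical annotation: a rectangle marked as class 2.
xs = [1, 5, 5, 1]
ys = [1, 1, 4, 4]
mask = polygon_to_mask(7, 6, xs, ys, class_id=2)
for row in mask:
    print(row)
```

For real images a library rasterizer such as OpenCV's `cv2.fillPoly` would be much faster; the loop above only shows what the mask contains.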

Is there an OCR API that can count objects?

Is there an OCR API that could be used for recognizing and counting objects in an image? Or can this be done with another image processing technique?
For example, if I take a close-up photo of three boxes, the API would just return the number 3 as a result.
You can look into OpenCV, which is popular for programmers learning about image processing and vision. You'll find an endless number of posts here on StackOverflow about OpenCV.
http://opencv.org/
Some freeware GUIs and free starter versions of commercial image processing packages will allow you to test image processing techniques without having to write the code. ImageJ is old but still worth checking out:
http://rsbweb.nih.gov/ij/
I don't want to show favoritism towards any of my sisters and brothers in the image processing world, but if you google for "machine vision free" or "computer vision free" and add words such as "GUI" you should be able to quickly find some free software that will allow you to test different image processing techniques just by using your mouse.
Along with your OCR algorithm, you'll need a segmentation method to count objects.
One such technique is the connected components algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The typical connected components algorithm would rely on some preprocessing:
Find a binarization threshold.
Apply the binarization threshold to generate an image of black (0) and white (1) values.
Run the connected components algorithm and label all components (objects).
Filter the results by size and other parameters. For example, you probably don't want to include foreground objects that are only a few pixels in size.
Check the size of the list of filtered components.
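The steps above can be sketched in plain Python as follows. This is only an illustration on a tiny hand-made grayscale image: the fixed threshold of 128, the minimum size of 3 pixels and the function names are assumptions, and the labeling uses a simple 4-connected flood fill. In OpenCV the same pipeline would typically go through `cv2.threshold` and `cv2.connectedComponentsWithStats`.

```python
from collections import deque


def binarize(gray, threshold):
    """Return a 0/1 image: 1 where the pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]


def label_components(binary):
    """4-connected labeling by flood fill; returns the pixel count of each component."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    sizes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def count_objects(gray, threshold=128, min_size=3):
    """Binarize, label, drop components smaller than min_size, and count the rest."""
    sizes = label_components(binarize(gray, threshold))
    return len([s for s in sizes if s >= min_size])


# Three bright "boxes" plus one isolated noise pixel on a dark background.
image = [
    [0,   0,   0,   0, 0,   0,   0, 0],
    [0, 200, 200,   0, 0, 220, 220, 0],
    [0, 200, 200,   0, 0, 220, 220, 0],
    [0,   0,   0,   0, 0,   0,   0, 0],
    [0, 210, 210, 210, 0,   0, 255, 0],
    [0,   0,   0,   0, 0,   0,   0, 0],
]
print(count_objects(image))  # 3
```

With `min_size=1` the noise pixel would be counted as a fourth object, which is why the size filter matters.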
This is a simple, low-level method, but it's useful in many situations. Even if you think you need a more complicated technique, I would strongly recommend that you first become familiar with connected components before moving on. Until one grasps the subtleties of lighting, binarization, and component labeling, it's unlikely one can learn much useful about more complicated algorithms. There really are no shortcuts.
There are other, more complicated methods, but before suggesting which might be appropriate you would have to be more specific about what kind of objects you want to find.
With any image processing question, always include one or more sample images. It's generally not useful to talk about image processing algorithms without first understanding the image set with which you are working. What may be obvious to you will not be obvious to others, especially those who have spent years working on OCR applications and who have had to deal with a wide variety of backgrounds, scripts, and specifications.

Sketch-based Image Retrieval with OpenCV or LIRe

I'm currently reading for BSc Creative Computing with the University of London and I'm in my last year of my studies. The only remaining module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists in drawing sketches in perspective with the use of 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-processing these images and indexing them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing based on that perspective. Such an approach could help comic artists or concept designers in quickly sketching their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I've downloaded the LIRe demo v0.9 and I played around with the included sample. I've also developed a small application which automatically downloads, indexes and searches for similar images in order to better understand the inner workings of the engine. Both approaches returned very good results even with a limited set of images (~300).
The next experiment was to test the output response when a sketch rather than an actual image is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch with the rendered images (which are of course then linked to the 3D model). I've tried this approach by comparing several sketches to a small set of images and the results were quite good - see http://claytoncurmi.net/wordpress/?p=17. However, when I tried with a different set of images, the results weren't as good as in the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
So, my questions are:
1. Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2. Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how can this be done?
PS. I've read several papers about this subject, however I didn't find any documentation about the actual implementation of such a system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton

image segmentation techniques

I am working on a computer vision application and I am stuck at a conceptual roadblock. I need to recognize a set of logos in a video, and so far I have been using feature matching methods like SIFT (and ASIFT by Yu and Morel), SURF, FERNS -- basically everything in the "Common Interfaces of Generic Descriptor Matchers" section of the OpenCV documentation. But recently I have been researching methods used in OCR/Random Trees classifier (I was playing with this dataset: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition) and thinking that this might be a better way to go about finding the logos. The problem is that I can't find a reliable way to automatically segment an arbitrary image.
My questions:
Should I bother looking into methods other than descriptor/keypoint, or is this the best way to recognize a typical logo (stylized, few colors, sharp edges)?
How can I segment an arbitrary image (or a video frame, in my case) so that I can properly match against a sample database?
It would seem that HaarCascades work in a similar way (databases of samples), but I can't figure out how the processes are related. Is there segmentation going on there?
Sorry if these questions are too broad. I'm trying to wrap my head around this stuff with little help. Thanks!
It seems like segmentation is not what you want. I think it has to do more with object detection and recognition. You want to detect the presence of a certain set of logos, in a certain set of images. This doesn't seem related to segmentation which is about labeling surfaces or areas of a common color, texture, shape, etc., although examining segmentation based methods may be useful.
I would definitely encourage you to look at the problem and examine all possible methods that can be applied, not only the fashionable ones (such as SIFT, GLOH, SURF, etc). I would recommend you look at older, simpler methods like simple template matching, chamfering, etc.
Haar cascades became popular after a 2001 paper by Viola and Jones that used them for face detection (similar to what you see in modern point-and-shoot cameras). It does sound a bit similar to the problem you are interested in. You should perhaps also examine this part of the problem, but try not to focus too much on the learning part.

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (e.g. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with OpenCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is e.g. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn C++ or Python to develop this application. This is not a problem if my movie and screenshots are suitable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV into which I can input my test data and then execute its feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would take a frame of a video file and run any computations on it.
You can apply several different methods of detecting a character in that image, but it's not so easy to make it flexible enough to find that person lying on the floor, for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write code that plays the video and takes a video frame, let's say, every 500 ms (or whatever interval you desire), gets a segmentation of the object you think would be that character, and compares it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
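As a rough sketch of the "take a frame every 500 ms" part, the helper below computes which frame indices to grab for a given frame rate. The function name `frames_to_sample` and its parameters are hypothetical. With OpenCV, the frame rate would normally come from a `cv2.VideoCapture` object (its `CAP_PROP_FPS` property), and each selected frame would then be passed to whatever detection step you choose.

```python
def frames_to_sample(fps, total_frames, interval_ms=500):
    """Return the indices of the frames closest to every interval_ms of playback."""
    if fps <= 0 or interval_ms <= 0:
        raise ValueError("fps and interval_ms must be positive")
    indices = []
    k = 0
    while True:
        # Frame index at time k * interval_ms, truncated to a whole frame.
        idx = int(k * interval_ms * fps / 1000.0)
        if idx >= total_frames:
            break
        # Skip duplicates when the interval is shorter than one frame.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


# A 2-second clip at 30 frames per second, sampled every 500 ms.
print(frames_to_sample(30, 60))  # [0, 15, 30, 45]
```

Computing each index from the elapsed time, rather than adding a fixed rounded step, avoids drift when the interval is not a whole number of frames.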
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation, which basically shifts the reference image(s) over the frame and checks how similar both parts are. But this is unfortunately very sensitive to rotation, deformation or other noise... so you wouldn't get that person if they are, e.g., lying down. And I doubt you can get all those calculations done in real time...
Basically: yes, OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods, and you'd need to find one that works for your images... it's not a trivial task though...
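The cross-correlation idea can be sketched in plain Python: slide the template over the frame and score each position with zero-mean normalized cross-correlation. The tiny arrays and the function names `ncc` and `match_template` are made up for illustration. OpenCV provides this as `cv2.matchTemplate` (for example with `cv2.TM_CCOEFF_NORMED`), which is what you would use on real frames.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    # A perfectly flat patch has no structure to correlate with.
    return num / den if den > 0 else 0.0


def match_template(frame, template):
    """Shift the template over every position in the frame; return best (x, y) and score."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score


template = [[9, 1],
            [1, 9]]
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
position, score = match_template(frame, template)
print(position, score)

# The same pattern rotated by 90 degrees scores as a complete mismatch.
print(ncc([[1, 9], [9, 1]], template))
```

The second print shows the rotation sensitivity in miniature: a rotated copy of the template correlates negatively with the original, so a standing reference image will not match a character lying down.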
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want: instance detection.
Their work on Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can do exactly that with just a few mouse clicks.

Resources