How to check images for custom characters? - image-processing

I have a set of image files that I can identify. Rather than doing OCR, I'd like to search only for matches within that set. What's the ideal platform to quickly find matches?

OpenCV is an advanced computer vision library. It can recognize text blocks, colors, shapes, etc., so it might be of use.
Tesseract can be trained to handle languages, but I can't see a reason why you couldn't train it with shapes. Here's a really confusing training guide.
ImageMagick can also be useful. It leans heavily on chaining CLI parameters, but you can get it to find images. It's not perfect for this application, but it's been done before. The documentation is huge, but it's about as complete and well illustrated as I could wish for (I'm a frequent user, as it's handy for quick image operations via the CLI). Here's the image comparison documentation.
I would suggest OpenCV, but it's up to you. Good luck!

Related

How to prepare training data for image segmentation

I am using bounding-box marking tools like BBox and YOLO marker for object localisation. I wanted to know whether there are equivalent marking tools available for image segmentation tasks, and how people in academia and research prepare data sets for such tasks. The recent Kaggle competition severstal-steel-defect-detection has pixel-level segmentation information. Which tool was used to prepare this data?
Generally speaking, it is a pretty complex but common task, so you'll likely be able to find several tools. Supervise.ly is a good example. Look through the demo to understand the actual complexity.
Another way is to use OpenCV to get some specific results. We did that, but the results were pretty rough. Performance is another problem, for a couple of reasons; one is that we work with 4K video.
Long story short, we decided to implement a custom tool to get the required results (and to do that fast enough).
Just to summarize, if you want to build a training set for segmentation you have the following options:
Use available services (pretty much all of them will require additional manual work)
Use OpenCV to deal with a specially prepared input
Develop a custom solution to deal with a properly prepared input, providing full control and accurate results
The third option seems to be the most flexible. Here are some examples: those are custom multi-color segmentation results. You might get the impression that a custom implementation is far more complex, but as it turns out, if you properly implement a straightforward algorithm, you might be surprised by the result. We were interested in accurate, pixel-perfect results.
I have created a simple script to generate colored masks of annotation to be used for semantic segmentation.
You will need the VIA (VGG Image Annotator) tool, which lets you mark a region as a polygon. Once a polygon is created for a class/attribute, you can give it an attribute name and save the annotation as a CSV file; the x,y coordinates of the polygon are what get saved in the CSV.
The program and steps to use are present at: https://github.com/pateldigant/SemanticAnnotator
If you have any doubts about using the script, you can leave a comment.

Emgu CV Surf picture detection against known database?

I'm trying to compare an image against a known set of images and find the closest match(es) using Emgu CV and Surf. I've found a lot of people trying to do the same thing but not a complete solution that uses the GPU for speed.
The closest I've gotten is the tutorial here:
http://romovs.github.io/blog/2013/07/05/matching-image-to-a-set-of-images-with-emgu-cv/
However that doesn't take advantage of the GPU and it's really slow for my application. I need something fast like the SurfFeature sample.
So I tried to refactor that tutorial code to match the SurfFeature logic that uses the GPU. Everything was going well, with GpuMats replacing Matrix objects here and there. But I ran into a major problem when I got to the core of the tutorial above, that is to say, the logic that concatenates all of the descriptors into one large matrix. I couldn't find a way to append GpuMats to each other, and even if I could, there's no guarantee that the FlannIndex search routine would work with the GPU-based code.
So now I'm stuck on something I thought would be relatively straightforward. A number of people have surely tried to do this over the years, so I'm really surprised that there isn't a published solution.
If you could help me, I'd be most appreciative. To summarize, I need to do the following:
Build a large in-memory (on the GPU) list of descriptors and keypoints for a known set of images using Surf (as per the SurfFeature sample).
Given an unknown image, search against the in-memory data to find the closest match (if any).
Thanks in advance if you can help!

image segmentation techniques

I am working on a computer vision application and I am stuck at a conceptual roadblock. I need to recognize a set of logos in a video, and so far I have been using feature matching methods like SIFT (and ASIFT by Yu and Morel), SURF, FERNS -- basically everything in the "Common Interfaces of Generic Descriptor Matchers" section of the OpenCV documentation. But recently I have been researching methods used in OCR, such as the Random Trees classifier (I was playing with this dataset: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition), and thinking that this might be a better way to find the logos. The problem is that I can't find a reliable way to automatically segment an arbitrary image.
My questions:
Should I bother looking into methods other than descriptor/keypoint matching, or is this the best way to recognize a typical logo (stylized, few colors, sharp edges)?
How can I segment an arbitrary image (or a video frame, in my case) so that I can properly match it against a sample database?
It would seem that Haar cascades work in a similar way (databases of samples), but I can't figure out how the processes are related. Is there segmentation going on there?
Sorry if these questions are too broad. I'm trying to wrap my head around this stuff with little help. Thanks!
It seems like segmentation is not what you want; this has more to do with object detection and recognition. You want to detect the presence of a certain set of logos in a certain set of images. That isn't really segmentation, which is about labeling surfaces or areas of a common color, texture, shape, etc., although examining segmentation-based methods may still be useful.
I would definitely encourage you to look at the problem and examine all the methods that can be applied, not only the fashionable ones (such as SIFT, GLOH, SURF, etc.). I would also recommend looking at older, simpler methods like plain template matching, chamfer matching, etc.
Haar cascades became popular after the 2001 Viola-Jones paper, which used them for face detection (similar to what you see in modern point-and-shoot cameras). That does sound a bit similar to the problem you are interested in, so you should perhaps examine this part of the field too, but try not to focus too much on the learning part.

A good method for detecting the presence of a particular feature in an image

I have made a video chat, but as usual, a lot of men like to, ehm, abuse the service (I leave it up to you to figure out the nature of such abuse), which is not something I endorse in any way, nor do most of my users. No, I have not stolen chatroulette.com :-) Frankly, I am half-embarrassed to bring this up here, but my question is technical and rather specific:
I want to filter/deny users based on their video content when that content is of an offensive character, like a user flashing his junk on camera. What kind of image comparison algorithm would suit my needs?
I have spent a week or so reading scientific papers and have become aware of multiple approaches and their implementations, such as SIFT, SURF and some wavelet-based methods. Each of these has drawbacks and advantages, of course. But since the nature of my image comparison is highly specific (to deny service if a certain body part appears on video in a range of positions), I am wondering which of the methods will suit me best?
Currently, I lean towards something along the following (Wavelet-based plus something I assume to be some proprietary innovations):
http://grail.cs.washington.edu/projects/query/
With the above, I can simply draw the offending body part and expect offending content to be considered a match beyond a threshold. Then again, I am unsure whether the method is invariant to transformations, and if it is, to which kinds; the paper isn't really specific on that.
Alternatively, I am thinking that a SURF implementation could do, but I am afraid it could give me false positives. Can such an implementation be trained to recognize/give weight to a specific feature?
I am aware that there exist numerous questions on SURF and SIFT here, but most of them are generic in that they usually explain how to "compare" two images. My comparison is feature specific, not generic. I need a method that does not just compare two similar images, but one which can give me a rank/index/weight for a feature (however the method lets me describe it, be it an image itself or something else) being present in an image.
It looks like you need not feature detection but object recognition, i.e. the Viola-Jones method.
Take a look at the facedetect.cpp example shipped with OpenCV (there are also several ready-to-use Haar cascades: a face detector, a body detector...). It uses image features called Haar wavelets. If you want to use color information, take a look at the CamShift algorithm (also available in OpenCV).
This is more of a computer vision problem. You have to recognize objects in your image/video sequence; for that, you can use a lot of different algorithms (most of them work in the spectral domain, which is why you will have to apply a transform first).
In order to be accurate, you will also need a knowledge base or, at least, some descriptors that will define the object.
Try OpenCV, it has some algorithms already implemented (and basic descriptors included).
There are applications/algorithms out there that you can "train" (like neural networks) and that can identify objects based on that training. Most of them (at least the good ones) are not very popular and can only be found in research groups specializing in computer vision, object recognition, AI, etc.
Good luck!

Computer Vision with Mathematica

Does anybody here do computer vision work in Mathematica? I would like to know what external libraries are available for doing that. The built-in image processing functions are not enough. I am looking for things like SURF, stereo, camera calibration, multi-view geometry, etc.
How difficult would it be to wrap OpenCV for use in Mathematica?
Apart from the extensive set of image processing tools that are now (in version 8) natively present in Mathematica, which include a number of CV algorithms such as morphological object detection, image segmentation and feature detection (see figure below), there's the new LibraryLink functionality, which makes working with DLLs very easy. You wouldn't have to change OpenCV much to be able to call it from Mathematica: just write some wrappers for the functions to be called and you're basically done.
I don't think such a thing exists, but I'm getting started.
It has the advantage that you can apply analytic methods: for example, rather than hacking endlessly in OpenCV or even MATLAB, you can compute a quantity analytically and see that the method leading to this matrix is numerically unstable as a function of the input variables. Then you don't need to hack at all, as it would be pointless.
As for wrapping OpenCV, that doesn't seem to make sense. The better procedure would be to fix bad implementations in OpenCV based on your analysis in Mathematica and on paper.
Agreeing with Peter, I don't believe that forcing Mathematica to use OpenCV is a great thing.
All of the computer vision people that I've talked to, read about, and seen examples from are using MATLAB and the Image Processing Toolbox. It's either that, or an OpenCV-compatible language plus OpenCV.
Mathematica has a rich set of tools for image processing, but I'm uncertain about the computer vision capabilities.