I have this project I'm working on. A part of the project involves multiple test runs during which screenshots of an application window are taken. Now, we have to ensure that screenshots taken between consecutive runs match (barring some allowable changes). These changes could be things like filenames, dates, different logos, etc. within the application window that we're taking a screenshot of.
I had the bright idea to automate the process of doing this checking. Essentially my idea was this. If I could somehow mathematically quantify the difference between a screenshot from the N-1th run and the Nth run, I could create a binary labelled dataset that mapped feature vectors of some sort to a label (0 for pass or 1 for fail if the images do not adequately match up). The reason for all of this was so that my labelled data would help make the model understand what scale of changes are acceptable, because there are so many kinds that are acceptable.
Now lets say I have access to lots of data that I have meticulously labelled, in the thousands. So far I have tried using SIFT in opencv using keypoint matching to determine a similarity score between images. But this isn't an intelligent, learning process. Is there some way I could take some information from SIFT and use it as my x-value in my dataset?
Here are my questions:
what would that be the information I need as my x-value? It needs to be something that represents the difference between two images. So maybe the difference between feature vectors from SIFT? What do I do when those vectors are of slightly different dimensions?
Am I on the right track with thinking about using SIFT? Should I look elsewhere and if so where?
Thanks for your time!
The approach that is being suggested in the question goes like this -
Find SIFT features of two consecutive images.
Use those to somehow quantify the similarity between two images (sounds reasonable)
Use this metric to first classify the images into similar and non-similar.
Use this dataset to train a NN do to the same job.
I am not completely convinced if this is a good approach. Let's say that you created the initial classifier with SIFT features. You are then using this data to train a NN. But this data will definitely have a lot of wrong labels. Because if it didn't have a lot of wrong labels, what's stopping you from using your original SIFT based classifier as your final solution?
So if your SIFT based classification is good, why even train a NN? On the other hand, if it's bad, you are giving a lot of wrong labeled data to the NN for training. I think the latter is a probably a bad idea. I say probably because there is a possibility that maybe the wrong labels just encourage the NN to generalize better, but that would require a lot of data, I imagine.
Another way to look at this is, let's say that your initial classifier is 90% accurate. That's probably the upper limit of the performance for the NN that you are looking at when talking about training it with this data.
You said that the issue that you have with your first approach is that 'it's not a an intelligent, learning process'. I think it's the wrong approach to think that the former approach is always inferior to the latter. SIFT is a powerful tool that can solve a lot of problems without all the 'black-boxness' of an NN. If this problem can be solved with sufficient accuracy using SIFT, I think going after a learning based approach is not the way to go, because again, a learning based approach isn't necessarily superior.
However, if the SIFT approach isn't giving you good enough results, definitely start thinking of NN stuff, but at that point, using the "bad" method to label the data is probably a bad idea.
Also in relation, I think you could potentially be underestimating the amount of data that is needed for this. You mentioned data in the thousands, but that's honestly, not a lot. You would need a lot more, I think.
One way I would think about instead doing this -
Do SIFT keyponits detection for a sample reference image.
Manually filter out keypoints that does not belong to the things in the image that are invariant. That is, just take keypoints at the locations in the image that is guaranteed (or very likely) to be always present.
When you get a new image, compute the keypoints and do matching with the reference image.
Set some threshold of the ratio of good matches to the total number of matches.
Depending on your application, this might give you good enough results.
If not, and if you really want your solution to be NN based, I would say you need to manually label the dataset as opposed to using SIFT.
4 years ago I posted this question and got a few answers that were unfortunately outside my skill level. I just attended a build tour conference where they spoke about machine learning and this got me thinking of the possibility of using ML as a solution to my problem. i found this on the azure site but i dont think it will help me because its scope is pretty narrow.
Here is what i am trying to achieve:
i have a source image:
and i want to which one of the following symbols (if any) are contained in the image above:
the compare needs to support minor distortion, scaling, color differences, rotation, and brightness differences.
the number of symbols to match will ultimately at least be greater than 100.
is ML a good tool to solve this problem? if so, any starting tips?
As far as I know, Project Oxford (MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused to Face related tasks (detection, verification, etc), OCR and Image description. And apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know an out of the box solution for your object detection problem; there are easy enough approaches that you could try and that would give you some start point results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the more tedious step and paradoxically a crucial one. I will assume you have a good amount of images to work with. What would you need to do is to pick a fixed window size and extract positive and negative examples.
If some of the images in your dataset are in different sizes you would need to rescale them to a common size. You don't need to get too crazy about the size, probably 30x30 images would be more than enough. To make things easier I would turn the images to gray scale too.
2) Pick a classification algorithm and train it:
There is an awful amount of classification algorithms out there. But if you are new to machine learning I will pick the one I would understand the most. Keeping that in mind, I would check out logistic regression which give decent results, it's easy enough for starters and have a lot of libraries and tutorials. For instance, this one or this one. At first I would say to focus in a binary classification problem (like if there is an UD logo in the picture or not) and when you master that one you can jump to the multi-class case. There are resources for that too or you can always have several models one per logo and run this recipe for each one separately.
To train your model, you just need to read the images generated in the step 1 and turn them into a vector and label them accordingly. That would be the dataset that will feed your model. If you are using images in gray scale, then each position in the vector would correspond to a pixel value in the range 0-255. Depending on the algorithm you might need to rescale those values to the range [0-1] (this is because some algorithms perform better with values in that range). Notice that rescaling the range in this case is fairly easy (new_value = value/255).
You also need to split your dataset, reserving some examples for training, a subset for validation and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as naive as possible.
3) Perform the detection:
So now let's start the fun part. Given any image you want to run your model and produce coordinates in the picture where there is a logo. There are different ways to do this and I will describe one that probably is not the best nor the more efficient, but it's easier to develop in my opinion.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1 and then feed them to your model.
If the model give you a positive answer then you mark that window in the original image. Since the logo might appear in different scales you need to repeat this process with different window sizes. You also would need to tweak the amount of space between windows.
4) Rinse and repeat:
At the first iteration it's very likely that you will get a lot of false positives. Then you need to take those as negative examples and retrain your model. This would be an iterative process and hopefully on each iteration you will have less and less false positives and fewer false negatives.
Once you are reasonable happy with your solution, you might want to improve it. You might want to try other classification algorithms like SVM or Deep Learning Artificial Neural Networks, or to try better object detection frameworks like Viola-Jones. Also, you will probably need to use crossvalidation to compare all your solutions (you can actually use crossvalidation from the beginning). By this moment I bet you would be confident enough that you would like to use OpenCV or another ready to use framework in which case you will have a fair understanding of what is going on under the hood.
Also you could just disregard all this answer and go for an OpenCV object detection tutorial like this one. Or take another answer from another question like this one. Good luck!
I've been tasked to carry out a benchmark of an existing classifier for my company. The biggest problem currently is differentiating between different type of transportation's, like recognizing if i'm currently in a train, driving a car or bicycling so this is the main focus.
I've been reading alot about LSTM, http://en.wikipedia.org/wiki/Long_short_term_memory and its recent success in handwriting and speech-recognition, where the time between significant events could be pretty long.
So, my first thought about the problem with train/bus is that there probably isn't such a clear and short cycle as there is when walking/running for instance so long-term memory is probably crucial.
Have anyone tried anything similar with decent results?
Or is there other techniques that could potentially solve this problem better?
I've worked on mode of transportation detection using smartphone accelerometers. The main result I've found is that almost any classifier will do; the key problem is then the set of features. (This is no different from many other machine learning problems.) My feature set ended up containing time-domain and frequency-domain values, both taken from time-series sliding-window segmentation.
Another problem is that the accelerometer can be placed anywhere. On the body, it can be anywhere and in any orientation. If the user is driving, is the phone in a pocket, in a bag, on a car seat, attached to a suction-cup window mount, etc.?
If you want to avoid these problems, use GPS instead of the accelerometer. You can make relatively accurate classifications with that sensor, but the cost is the battery usage.
I am somewhat of an amateur farmer and I have a precious cherry tomato plant growing in a pot. Lately, to my chagrin, I have discovered that my precious plant has been the victim of a scheme perpetrated by the evil Manduca Quinquemaculata - also known as the Tomato Hornworm (http://insects.tamu.edu/images/insects/common/images/cd-43-c-txt/cimg308.html).
While smashing the last worm I saw, I thought to myself, if I were to use a webcam connected to my computer with a program running, would it be possible to use some kind of an application to monitor my precious plant? These pests are supremely camouflaged and very difficult for my naive eyes to detect.
I've seen research using artificial neural networks (ANNs) for all sorts of things such as recognizing people's faces, etc., and so maybe it would be possible to locate the pest with an ANN.
I have several questions though that I would like some suggestions though.
1) Is there a ranking of the different ANNs in terms of how good they are at classifying? Are multilayer perceptrons known to be better than Hopfields? Or is this a question to which the answer is unknown?
2) Why do there exist several different activation functions that can be used in ANNs? Sigmoids, hyperbolic tangents, step functions, etc. How would one know which function to choose?
3) If I had an image of a plant w/ a worm on one of the branches, I think that I could train a neural network to look for branches that are thin, get fat for a short period, and then get thin again. I have a problem though with branches crossing all over the place. Is there a preprocessing step that could be applied on an image to distinguish between foreground and background elements? I would want to isolate individual branches to run through the network one at a time. Is there some kind of nice transformation algorithm?
Any good pointers on pattern recognition and image processing such as books or articles would be much appreciated too.
Sincerely,
mj
Tomato Hornworms were harmed during the writing of this email.
A good rule of thumb for machine learning is: better features beat better algorithms. I.e if you feed the raw image pixels directly into your classifier, the results will be poor, no matter what learning algorithm you use. If you preprocess the image and extract features that are highly correlated with "caterpillar presence", then most algorithms will do a decent job.
So don't focus on the network topology, start with the computer vision task.
Do these little suckers move around regularly? If so, and if the plant is quite static (meaning no wind or other forces that make it move), then a simple filter to find movement could be sufficient. That would bypass the need of any learning algorithm, which are often quite difficult to train and implement.
A developer I am working with is developing a program that analyzes images of pavement to find cracks in the pavement. For every crack his program finds, it produces an entry in a file that tells me which pixels make up that particular crack. There are two problems with his software though:
1) It produces several false positives
2) If he finds a crack, he only finds small sections of it and denotes those sections as being separate cracks.
My job is to write software that will read this data, analyze it, and tell the difference between false-positives and actual cracks. I also need to determine how to group together all the small sections of a crack as one.
I have tried various ways of filtering the data to eliminate false-positives, and have been using neural networks to a limited degree of success to group cracks together. I understand there will be error, but as of now, there is just too much error. Does anyone have any insight for a non-AI expert as to the best way to accomplish my task or learn more about it? What kinds of books should I read, or what kind of classes should I take?
EDIT My question is more about how to notice patterns in my coworker's data and identify those patterns as actual cracks. It's the higher-level logic that I'm concerned with, not so much the low-level logic.
EDIT In all actuality, it would take AT LEAST 20 sample images to give an accurate representation of the data I'm working with. It varies a lot. But I do have a sample here, here, and here. These images have already been processed by my coworker's process. The red, blue, and green data is what I have to classify (red stands for dark crack, blue stands for light crack, and green stands for a wide/sealed crack).
In addition to the useful comments about image processing, it also sounds like you're dealing with a clustering problem.
Clustering algorithms come from the machine learning literature, specifically unsupervised learning. As the name implies, the basic idea is to try to identify natural clusters of data points within some large set of data.
For example, the picture below shows how a clustering algorithm might group a bunch of points into 7 clusters (indicated by circles and color):
(source: natekohl.net)
In your case, a clustering algorithm would attempt to repeatedly merge small cracks to form larger cracks, until some stopping criteria is met. The end result would be a smaller set of joined cracks. Of course, cracks are a little different than two-dimensional points -- part of the trick in getting a clustering algorithm to work here will be defining a useful distance metric between two cracks.
Popular clustering algorithms include k-means clustering (demo) and hierarchical clustering. That second link also has a nice step-by-step explanation of how k-means works.
EDIT: This paper by some engineers at Phillips looks relevant to what you're trying to do:
Chenn-Jung Huang, Chua-Chin Wang, Chi-Feng Wu, "Image Processing Techniques for Wafer Defect Cluster Identification," IEEE Design and Test of Computers, vol. 19, no. 2, pp. 44-48, March/April, 2002.
They're doing a visual inspection for defects on silicon wafers, and use a median filter to remove noise before using a nearest-neighbor clustering algorithm to detect the defects.
Here are some related papers/books that they cite that might be useful:
M. Taubenlatt and J. Batchelder, “Patterned Wafer Inspection Using Spatial Filtering for Cluster Environment,” Applied Optics, vol. 31, no. 17, June 1992, pp. 3354-3362.
F.-L. Chen and S.-F. Liu, “A Neural-Network Approach to Recognize Defect Spatial Pattern in Semiconductor Fabrication.” IEEE Trans. Semiconductor Manufacturing, vol. 13, no. 3, Aug. 2000, pp. 366-373.
G. Earl, R. Johnsonbaugh, and S. Jost, Pattern Recognition and Image Analysis, Prentice Hall, Upper Saddle River, N.J., 1996.
Your problem falls in the very broad field of image classification. These types of problems can be notoriously difficult, and at the end of the day, solving them is an art. You must exploit every piece of knowledge you have about the problem domain to make it tractable.
One fundamental issue is normalization. You want to have similarly classified objects to be as similar as possible in their data representation. For example, if you have an image of the cracks, do all images have the same orientation? If not, then rotating the image may help in your classification. Similarly, scaling and translation (refer to this)
You also want to remove as much irrelevant data as possible from your training sets. Rather than directly working on the image, perhaps you could use edge extraction (for example Canny edge detection). This will remove all the 'noise' from the image, leaving only the edges. The exercise is then reduced to identifying which edges are the cracks and which are the natural pavement.
If you want to fast track to a solution then I suggest you first try the your luck with a Convolutional Neural Net, which can perform pretty good image classification with a minimum of preprocessing and noramlization. Its pretty well known in handwriting recognition, and might be just right for what you're doing.
I'm a bit confused by the way you've chosen to break down the problem. If your coworker isn't identifying complete cracks, and that's the spec, then that makes it your problem. But if you manage to stitch all the cracks together, and avoid his false positives, then haven't you just done his job?
That aside, I think this is an edge detection problem rather than a classification problem. If the edge detector is good, then your issues go away.
If you are still set on classification, then you are going to need a training set with known answers, since you need a way to quantify what differentiates a false positive from a real crack. However I still think it is unlikely that your classifier will be able to connect the cracks, since these are specific to each individual paving slab.
I have to agree with ire_and_curses, once you dive into the realm of edge detection to patch your co-developers crack detection, and remove his false positives, it seems as if you would be doing his job. If you can patch what his software did not detect, and remove his false positives around what he has given you. It seems like you would be able to do this for the full image.
If the spec is for him to detect the cracks, and you classify them, then it's his job to do the edge detection and remove false positives. And your job to take what he has given you and classify what type of crack it is. If you have to do edge detection to do that, then it sounds like you are not far from putting your co-developer out of work.
There are some very good answers here. But if you are unable to solve the problem, you may consider Mechanical Turk. In some cases it can be very cost-effective for stubborn problems. I know people who use it for all kinds of things like this (verification that a human can do easily but proves hard to code).
https://www.mturk.com/mturk/welcome
I am no expert by any means, but try looking at Haar Cascades. You may also wish to experiment with the OpenCV toolkit. These two things together do face detection and other object-detection tasks.
You may have to do "training" to develop a Haar Cascade for cracks in pavement.
What’s the best approach to recognize patterns in data, and what’s the best way to learn more on the topic?
The best approach is to study pattern recognition and machine learning. I would start with Duda's Pattern Classification and use Bishop's Pattern Recognition and Machine Learning as reference. It would take a good while for the material to sink in, but getting basic sense of pattern recognition and major approaches of classification problem should give you the direction. I can sit here and make some assumptions about your data, but honestly you probably have the best idea about the data set since you've been dealing with it more than anyone. Some of the useful technique for instance could be support vector machine and boosting.
Edit: An interesting application of boosting is real-time face detection. See Viola/Jones's Rapid Object Detection using a Boosted Cascade of Simple
Features (pdf). Also, looking at the sample images, I'd say you should try improving the edge detection a bit. Maybe smoothing the image with Gaussian and running more aggressive edge detection can increase detection of smaller cracks.
I suggest you pick up any image processing textbook and read on the subject.
Particularly, you might be interested in Morphological Operations like Dilation and Erosion, which complements the job of an edge detector. Plenty of materials on the net...
This is an image processing problem. There are lots of books written on the subject, and much of the material in these books will go beyond a line-detection problem like this. Here is the outline of one technique that would work for the problem.
When you find a crack, you find some pixels that make up the crack. Edge detection filters or other edge detection methods can be used for this.
Start with one (any) pixel in a crack, then "follow" it to make a multipoint line out of the crack -- save the points that make up the line. You can remove some intermediate points if they lie close to a straight line. Do this with all the crack pixels. If you have a star-shaped crack, don't worry about it. Just follow the pixels in one (or two) directions to make up a line, then remove these pixels from the set of crack pixels. The other legs of the star will recognized as separate lines (for now).
You might perform some thinning on the crack pixels before step 1. In other words, check the neighbors of the pixels, and if there are too many then ignore that pixel. (This is a simplification -- you can find several algorithms for this.) Another preprocessing step might be to remove all the lines that are too thin or two faint. This might help with the false positives.
Now you have a lot of short, multipoint lines. For the endpoints of each line, find the nearest line. If the lines are within a tolerance, then "connect" the lines -- link them or add them to the same structure or array. This way, you can connect the close cracks, which would likely be the same crack in the concrete.
It seems like no matter the algorithm, some parameter adjustment will be necessary for good performance. Write it so it's easy to make minor changes in things like intensity thresholds, minimum and maximum thickness, etc.
Depending on the usage environment, you might want to allow user judgement do determine the questionable cases, and/or allow a user to review the all the cracks and click to combine, split or remove detected cracks.
You got some very good answer, esp. #Nate's, and all the links and books suggested are worthwhile. However, I'm surprised nobody suggested the one book that would have been my top pick -- O'Reilly's Programming Collective Intelligence. The title may not seem germane to your question, but, believe me, the contents are: one of the most practical, programmer-oriented coverage of data mining and "machine learning" I've ever seen. Give it a spin!-)
It sounds a little like a problem there is in Rock Mechanics, where there are joints in a rock mass and these joints have to be grouped into 'sets' by orientation, length and other properties. In this instance one method that works well is clustering, although classical K-means does seem to have a few problems which I have addressed in the past using a genetic algorithm to run the interative solution.
In this instance I suspect it might not work quite the same way. In this case I suspect that you need to create your groups to start with i.e. longitudinal, transverse etc. and define exactly what the behviour of each group is i.e. can a single longitudinal crack branch part way along it's length, and if it does what does that do to it's classification.
Once you have that then for each crack, I would generate a random crack or pattern of cracks based on the classification you have created. You can then use something like a least squares approach to see how closely the crack you are checking fits against the random crack / cracks you have generated. You can repeat this analysis many times in the manner of a Monte-Carlo analysis to identify which of the randomly generated crack / cracks best fits the one you are checking.
To then deal with the false positives you will need to create a pattern for each of the different types of false positives i.e. the edge of a kerb is a straight line. You will then be able to run the analysis picking out which is the most likely group for each crack you analyse.
Finally, you will need to 'tweak' the definition of different crack types to try and get a better result. I guess this could either use an automated approach or a manual approach depending on how you define your different crack types.
One other modification that sometimes helps when I'm doing problems like this is to have a random group. By tweaking the sensitivity of a random group i.e. how more or less likely a crack is to be included in the random group, you can sometimes adjust the sensitivty of the model to complex patterns that don't really fit anywhere.
Good luck, looks to me like you have a real challenge.
You should read about data mining, specially pattern mining.
Data mining is the process of extracting patterns from data. As more data are gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
A good book on the subject is Data Mining: Practical Machine Learning Tools and Techniques
(source: waikato.ac.nz) ](http://www.amazon.com/Data-Mining-Ian-H-Witten/dp/3446215336 "ISBN 0-12-088407-0")
Basically what you have to do is apply statistical tools and methodologies to your datasets. The most used comparison methodologies are Student's t-test and the Chi squared test, to see if two unrelated variables are related with some confidence.