Recognising a drawn line using neural networks in a web app - machine-learning

Basically, I was weighing up some options for a software idea I had. The web app thing is a bit of a constraint on the project, so I'm assuming I would be writing this in js.
I need to create a drawable area for the user, which is okay, allow them to draw and then compare the input to a correct example. This is just an arrow, but the arrow can be double headed (normal point arrow) or single headed (half an arrowhead), so the minute details are fairly important, as is the location.
Now, I've read around for a few hours or so, and it seems to be that a good approach is to downsample the input so I am just comparing a couple of pixels. I am wondering though if there is a simpler way to achieve what I want here, and if there are good resources for learning what I feel is a very basic implementation of image recognition. Also having never implemented something like this, I'm a little worried about the little details of something like this, like speed; obviously feedback has to be fairly quick.
Thanks.

Use openCV. It already has the kind of use cases you want (location, style etc. of the image). There are many other open source libraries but not many as robust as this.
After that you have to decide all the possible images you want to make as the standard image, then get training examples for each of these standard images (each of these std images would be your one single class).
Now use the pixels as the features (openCV will do it for you with minimum help) and do your classification training. Not you have to provide these training images and have at least a good amount of training images for each class. Then use this trained classifier to classify the images that are drawn by your users. You can put GUI on top of it to adapt to your needs that you posted above.

Related

Image Recognition - Finding glowing object

Most game anti-cheat use heuristic approach such as detecting known binaries signature or preventing third party library injection. But, Valve software use deep learning to combat cheat. Valve feed its AI with view angle, fire rate, etc. And its quite working good.
My question is, how do i make such thing but with images instead of data?
Consider this example
Not - Cheat :
Cheating :
Is it possible to make a model like that?
Well images are also just data.
You can seperate each image and its pixels into f.e. its raw numbers for like rgb.
This way you could model a network based on converted inputs of pixels from your image.
In this example, the pattern would probably just recognize huge spikes of those vibrant colors since those value will differ alot from the usual environment.
If the question aims to archive some kind of "visual cheat detection" and is not all about deep-learning, you could simply check the images pixels manually, if you know the color of your "cheat"-overlay or simply detect differences, and flag them this way.

Blur people from 1000+ images

I need to anonymize people (and maybe later license plates) from thousand images automatic.
I search through the internet to make a solution on my own with openCV/emguCV, but so far the detection rate is rather bad.
Then I came across Amazon Rekognition, which also looks good but has a steep learning curve for me.
I am somewhat confused that there is no software out there to anonymize pictures without userinput, I though in the age of StreetView this would be easier.
Am I missing something out here?
One of the simplest face localisation API I'm aware of is this one (Python, but based on dlib, which is C++ library).
It's well documented and almost ridiculously easy to use from Python.
It will give you the coordinates of a bounding box which you can blur.
Note that two different detectors you can use. The "classic" one is quite fast but misses some faces, especially when not seen full frontal. The one based on a deep learning model is much better but is quite slow without a GPU.
If you want to be a bit more sophisticated, it can give you facial feature locations (but only with the "classic" detector) and you could place the center of a blurring circe on the nose or so, but for a large number of images, I would just go for the bounding box.

How to rearrange images by pixel groups

I would like to create an image transition program. It should shift pixel areas from one image and transition them to another based on certain criteria, like colour and shape.
To do this, I need to be able to analyse the image, split it into groups, and shift these groups.
The first problem already starts with determining the pixel groups. They should not be chosen at random or perfect polygons/shapes. Does anyone know of an algorithm that can differentiate different textures/surroundings/borders?
Next, I need to do the slight adjustments to the areas in order to make them fit to the new image. Then the areas will be moved. That'll not be as hard as the first problem.
Performance doesn't matter that much; first I have to get the program working. It can take an hour to load the transition beforehand or whatever ;)
Could anyone give me some advice where to start or what technologies/APIs I could use? I'm fine with most programming languages, preferably C#, VB, JavaScript, PHP, Java, etc. The platform doesn't matter either.
I know, this is complex, but I gave my best to try to explain it. Any ideas?
Your first task, grouping according to color/texture/etc. is called segmentation. There are many approaches and algorithms to do it, and none is absolutely better than all other, as many things in image processing, the best algorithm depends on your image and your specific functional/artistic goal.
The general idea is to define multiple distances between pixels, like one distance would be based only on the position of pixels, another on the difference in their color, a more advanced metric could take the neighborhood into account to do something related to shape, contour orientations or texture. Then you would combine these distances (for example in a weighted sum) to get a "clever" measure of how similar two pixels are. After that you compute more or less exhaustively all distances and group similar pixels according to some thresholds (like how big the final groups are).
If you don't want to research and implement all that, you'd be better off using an existing image processing library. I suggest looking at OpenCV and the "segmentation" keyword. You'll get implementations of k-means, watershed and meanshift algorithms which are probably of interest for achieving your effect.
OpenCV is C++ but it also have bindings in Java and Python I think, and probably other.
For your second task, you need a mix of moving and blending pixels, but that's simpler and you can do it "by hand", or look at morphing algorithms.
A quick search revealed this blog post with a source code using OpenCV to morph two images. You also have some ready-made libraries in a few languages, have a look at related questions.
You could even directly call a command-line utility: xmorph but doesn't seem portable or imagemagick (see this script) which is more modern but not doesn't implement a real morphing algorithm AFAIK.

How can I use computer vision to find a shape in an image?

I have a simple photograph that may or may not include a logo image. I'm trying to identify whether a picture includes the logo shape or not. The logo (rectangular shape with a few extra features) could be of various sizes and could have multiple occurrences. I'd like to use Computer Vision techniques to identify the location of these logo occurrences. Can someone point me in the right direction (algorithm, technique?) that can be used to achieve this goal?
I'm quite a novice to Computer Vision so any direction would be very appreciative.
Thanks!
Practical issues
Since you need a scale-invariant method (that's the proper jargon for "could be of various sizes") SIFT (as mentioned in Logo recognition in images, thanks overrider!) is a good first choice, it's very popular these days and is worth a try. You can find here some code to download. If you cannot use Matlab, you should probably go with OpenCV. Even if you end up discarding SIFT for some reason, trying to make it work will teach you a few important things about object recognition.
General description and lingo
This section is mostly here to introduce you to a few important buzzwords, by describing a broad class of object detection methods, so that you can go and look these things up. Important: there are many other methods that do not fall in this class. We'll call this class "feature-based detection".
So first you go and find features in your image. These are characteristic points of the image (corners and line crossings are good examples) that have a lot of invariances: whatever reasonable processing you do to to your image (scaling, rotation, brightness change, adding a bit of noise, etc) it will not change the fact that there is a corner in a certain point. "Pixel value" or "vertical lines" are bad features. Sometimes a feature will include some numbers (e.g. the prominence of a corner) in addition to a position.
Then you do some clean-up, like remove features that are not strong enough.
Then you go to your database. That's something you've built in advance, usually by taking several nice and clean images of whatever you are trying to find, running you feature detection on them, cleaning things up, and arrange them in some data structure for your next stage —
Look-up. You have to take a bunch of features form your image and try to match them against your database: do they correspond to an object you are looking for? This is pretty non-trivial, since on the face of it you have to consider all subsets of the bunch of features you've found, which is exponential. So there are all kinds of smart hashing techniques to do it, like Hough transform and Geometric hashing.
Now you should do some verification. You have found some places in the image which are suspect: it's probable that they contain your object. Usually, you know what is the presumed size, orientation, and position of your object, and you can use something simple (like a convolution) to check if it's really there.
You end up with a bunch of probabilities, basically: for a few locations, how probable it is that your object is there. Here you do some outlier detection. If you expect only 1-2 occurrences of your object, you'll look for the largest probabilities that stand out, and take only these points. If you expect many occurrences (like face detection on a photo of a bunch of people), you'll look for very low probabilities and discard them.
That's it, you are done!

An algorithm for a drawing and painting robot - any tips?

Algorithm for a drawing and painting robot -
Hello
I want to write a piece of software which analyses an image, and then produces an image which captures what a human eye perceives in the original image, using a minimum of bezier path objects of varying of colour and opacity.
Unlike the recent twitter super compression contest (see: stackoverflow.com/questions/891643/twitter-image-encoding-challenge), my goal is not to create a replica which is faithful to the image, but instead to replicate the human experience of looking at the image.
As an example, if the original image shows a red balloon in the top left corner, and the reproduction has something that looks like a red balloon in the top left corner then I will have achieved my goal, even if the balloon in the reproduction is not quite in the same position and not quite the same size or colour.
When I say "as perceived by a human", I mean this in a very limited sense. i am not attempting to analyse the meaning of an image, I don't need to know what an image is of, i am only interested in the key visual features a human eye would notice, to the extent that this can be automated by an algorithm which has no capacity to conceptualise what it is actually observing.
Why this unusual criteria of human perception over photographic accuracy?
This software would be used to drive a drawing and painting robot, which will be collaborating with a human artist (see: video.google.com/videosearch?q=mr%20squiggle).
Rather than treating marks made by the human which are not photographically perfect as necessarily being mistakes, The algorithm should seek to incorporate what is already on the canvas into the final image.
So relative brightness, hue, saturation, size and position are much more important than being photographically identical to the original. The maintaining the topology of the features, block of colour, gradients, convex and concave curve will be more important the exact size shape and colour of those features
Still with me?
My problem is that I suffering a little from the "when you have a hammer everything looks like a nail" syndrome. To me it seems the way to do this is using a genetic algorithm with something like the comparison of wavelet transforms (see: grail.cs.washington.edu/projects/query/) used by retrievr (see: labs.systemone.at/retrievr/) to select fit solutions.
But the main reason I see this as the answer, is that these are these are the techniques I know, there are probably much more elegant solutions using techniques I don't now anything about.
It would be especially interesting to take into account the ways the human vision system analyses an image, so perhaps special attention needs to be paid to straight lines, and angles, high contrast borders and large blocks of similar colours.
Do you have any suggestions for things I should read on vision, image algorithms, genetic algorithms or similar projects?
Thank you
Mat
PS. Some of the spelling above may appear wrong to you and your spellcheck. It's just international spelling variations which may differ from the standard in your country: e.g. Australian standard: colour vs American standard: color
There is an model that can implemented as an algorithm to calculate a saliency map for an image, determining which parts of the image would get the most attention from a human.
The model is called itti koch model
You can find a startin paper here
And more resources and c++ sourcecode here
I cannot answer your question directly, but you should really take a look at artist/programmer (Lisp) Harold Cohen's painting machine Aaron.
That's quite a big task. You might be interested in image vectorizing (don't know what it's called officially), which is used to take in rasterized images (such as pictures you take with a camera) and outputs a set of bezier lines (i think) that approximate the image you put in. Since good algorithms often output very high quality (read: complex) line sets you'd also be interested in simplification algorithms which can help enormously.
Unfortunately I am not next to my library, or I could reccomend a number of books on perceptual psychology.
The first thing you must consider is the physiology of the human eye is such that when we examine an image or scene, we are only capturing very small bits at a time, as our eyes dart around rapidly. Our mind peices the different parts together to try and form a whole.
You might start by finding an algorithm for the path of an eyeball as it darts around. Perhaps it is attracted to contrast?
Next is that our eyes adjust the "exposure" depending on the context. It's like those high dynamic range images, if they were peiced together not by multiple exposures of a whole scene, but by many small images, each balanced on its own, but blended into its surroundings to form a high dynamic range.
Now there was a finding in a monkey brain that there is a single neuron that lights up if there's a diagonal line in the upper left of its field of vision. Similar neurons can be found for vertical lines, and horizontal lines in various areas of that monkey's field of vision. The "diagonalness" determines the frequency with which that neuron fires.
one might speculated that other neurons might be found and mapped to other qualities such as redness, or texturedness, and other things.
There's something humans can do that I've not seen a computer program ever able to do. it's something called "closure", where a human is able to fill in information about something that they are seeing, that doesn't actually exist in the image. an example:
*
* *
is that a triangle? If you knew that it was in advance, then you could probably make a program to connect the dots. But what if it's just dots? How can you know? I wouldn't attempt this one unless I had some really clever way of dealing with that one.
There are many other facts about human perception you might be able to use. Good luck, you've not picked a straightforward task.
i think a thing that could help you in this enormous task is human involvement. i mean data. like you could have many people sitting staring at random dots (like from the previous post) and connect them as they see right. you could harness that data.

Resources