How can I manipulate the image sent to the monitor before it gets displayed?

I have been reading into GPU abstractions and framebuffers, but I have not been able to find specific literature that satisfies my needs. I would like to programmatically manipulate the image sent from the GPU to the monitor before the monitor displays it. Essentially, I want to manipulate the pixels on the screen (rearranging them) to effectively create a distortion when one uses the computer. If possible, an OS-independent solution would be preferred, and I am open to any language.

DLL code injection might work (http://en.wikipedia.org/wiki/DLL_injection). Or you could write a screensaver that starts with a screenshot of the current display and then alters that image. The distortion could be done as a warp using matrix transformations or other image transforms.
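As a very rough illustration of the screenshot-then-distort idea (not the live GPU-to-monitor path), here is a minimal Python sketch. It assumes the cross-platform mss package for screen capture and OpenCV for the warp, neither of which is mentioned above:

    # Minimal sketch, assuming mss (pip install mss) and opencv-python.
    # Grabs the current screen, applies a sine-wave ripple, and shows the
    # warped copy full-screen -- a crude stand-in for a distorting screensaver.
    import numpy as np
    import cv2
    import mss

    with mss.mss() as sct:
        # monitors[1] is the primary monitor; convert BGRA capture to BGR
        shot = cv2.cvtColor(np.array(sct.grab(sct.monitors[1])), cv2.COLOR_BGRA2BGR)

    h, w = shot.shape[:2]
    # Build a remap grid that shifts x coordinates by a sine wave of y
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    xs_warped = xs + 15 * np.sin(ys / 40.0)
    distorted = cv2.remap(shot, xs_warped, ys, cv2.INTER_LINEAR)

    cv2.namedWindow("distortion", cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty("distortion", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    cv2.imshow("distortion", distorted)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Looping the capture/warp/display cycle gives a crude full-screen distortion effect, but note that it operates on screenshots rather than intercepting the actual scan-out; hooking the real display path generally requires OS- or driver-specific mechanisms.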

Related

Extracting foreground objects from an image to run through convolution neural net

I am new to computer vision and image recognition. For my first CV project I am developing a tool that detects apples (the fruit) in images.
What I have so far:
I developed a convolutional neural net in Python using TensorFlow that determines whether something is an apple or not. The drawback is that my CNN only works on images where the apple is the only object in the image. My training data set looks something like this:
What I want to achieve: I would like to be able to detect apples in an image and put a border around each one. The images, however, would be full of other objects, like in this image of a picnic:
Possible approaches:
Sliding Window: I would break my photo down into smaller images. I would start with a large window size in the top-left corner and move to the right by a step size. When I get to the right border of the image I would move down a certain amount of pixels and repeat. This is effectively a sliding window, and every one of these smaller images would be run through my CNN.
The window size would get smaller and smaller until an apple is found. The downside of this is that I would be running hundreds of smaller images through my CNN, which would take a long time to detect an apple. Additionally, if there isn't an apple in the image at all, a lot of time would be wasted for nothing.
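For concreteness, a minimal sketch of the crop generator I have in mind (the window size, step, and the classify_apple callback are placeholders, not code I have written):

    # Sketch of the sliding-window idea: yield crops of a Pillow image at a
    # given window size and step, so each crop can be fed to the classifier.
    from PIL import Image

    def sliding_windows(image, window=224, step=112):
        """Yield (left, top, crop) tuples covering the whole image."""
        width, height = image.size
        for top in range(0, max(height - window, 0) + 1, step):
            for left in range(0, max(width - window, 0) + 1, step):
                yield left, top, image.crop((left, top, left + window, top + window))

    # Usage: run every crop through the existing apple/not-apple CNN.
    # img = Image.open("picnic.jpg")
    # for left, top, crop in sliding_windows(img):
    #     if classify_apple(crop):   # hypothetical wrapper around my CNN
    #         print("possible apple near", (left, top))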
Extracting foreground objects: Another approach could be to extract all the foreground elements from an image (using OpenCV maybe?) and run those objects through my CNN.
Compared to the sliding window approach, I would be running a handful of images through my CNN vs. hundreds of images.
These are two approaches I could think of, but I was wondering if there are better ones out there in terms of speed. The sliding window approach would eventually work, but it would take a really long time to find the bounding box around the apple.
I would really appreciate if someone could give me some guidance (maybe I'm on a completely wrong track?), a link to some reading material or some examples of code for extracting foreground elements. Thanks!
A better way to do this is to use the Single Shot MultiBox Detector (SSD) or "You Only Look Once" (YOLO). Until these approaches were designed, it was common to detect objects the way you suggest in the question.
There is a Python implementation of SSD here. OpenCV is used in the YOLO implementation. You can retrain the networks for apples in case the current versions do not detect them, or if your project requires you to build a system from scratch.
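As a rough sketch of what running a pre-trained YOLO model through OpenCV's DNN module looks like (the cfg/weights file names, the 0.5 confidence threshold, and the 416x416 input size are illustrative assumptions, not part of the answer above; a standard COCO-trained model already includes an "apple" class):

    # Sketch: detect objects with a pre-trained YOLO model via OpenCV's DNN module.
    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # assumed files
    image = cv2.imread("picnic.jpg")
    h, w = image.shape[:2]

    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    # Each detection row: [cx, cy, w, h, objectness, class scores...]
    for detection in np.vstack(outputs):
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > 0.5:
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            cv2.rectangle(image, (x, y), (x + int(bw), y + int(bh)), (0, 255, 0), 2)

    cv2.imwrite("detections.jpg", image)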

How to match images with unknown rotation differences

I have a collection of about 3000 images that were taken from a camera suspended from a weather balloon in flight. The camera points in a different direction in each image but is generally aimed down, so each image shares a significant area (40-50%) with the previous one, but at a slightly different scale and rotated by an arbitrary (and inconsistent) amount. The image metadata includes a timestamp, so I know with certainty the correct order of the images and the elapsed time between each.
I want to process these images into a single video. If I simply string them together it will be great for making people seasick, but won't really capture the amazingness of the set :)
The specific part I need help with is finding the rotation of each image relative to the previous one. Is there a library somewhere that can identify regions of overlap between two images when the images themselves are rotated relative to each other? If I can find 2-3 common points (or more), I can do the remaining calculations to determine the amount of rotation and the offset so I can put them together correctly. Alternatively, if there is a library that calculates both of those things for me, that would be even better.
I can do this in any language, with a slight preference for either Java or Python. The data is in Hadoop, so Java is the most natural language, but I can use scripting languages as well if necessary.
Since I'm new to image processing, I don't even know where to start. Any help is greatly appreciated!
For a problem like this you could look into SIFT. This algorithm detects local features in images. OpenCV has an implementation of it; you can read about it here.
You could also try SURF, which is a similar type of algorithm. OpenCV has this implemented as well; you can read about that here.
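As a sketch of how this could look in Python with OpenCV (the frame file names are placeholders; SIFT is in the main module from OpenCV 4.4 on, and in opencv-contrib-python before that): match SIFT features between consecutive frames, then fit a similarity transform to recover rotation, scale and offset.

    # Sketch: estimate relative rotation/scale/offset between two frames via SIFT.
    import math
    import cv2
    import numpy as np

    img1 = cv2.imread("frame_0001.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_0002.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test keeps only distinctive matches.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Similarity transform (rotation + scale + translation), robust to outliers.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    angle = math.degrees(math.atan2(M[1, 0], M[0, 0]))
    scale = math.hypot(M[0, 0], M[1, 0])
    print(f"rotation: {angle:.1f} deg, scale: {scale:.3f}, offset: {M[:, 2]}")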

Sparse Image matching in iOS

I am building an iOS app that, as a key feature, incorporates image matching. The problem is that the images I need to recognize are small 10x10 orienteering plaques with simple large text on them. They can be quite reflective and will be outside (so the light conditions will be variable). Sample image
There will be up to 15 of these types of image in the pool and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that with the image-matching software I have tried, Aurasma and (slightly more successfully) arlabs, they can't distinguish between the plaques, as they are primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned, and I have considered using GPS to refine the selection, but the only reliable way I have found is to get the user to manually enter the text. One of the key attractions we have based the product around is being able to detect these plaques that are already in place, without having to set up any additional material.
Can anyone suggest a piece of software that would work (and is iOS friendly), or a method of detection that would be effective and interactive/pleasing for the user?
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially; the plaques can be positioned basically anywhere: fences, walls, and posts, in either wooded or open areas, but overwhelmingly outdoors.
I'm not an iOS programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "Computer Vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOs library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate" / white or "text" / black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance measure is a good candidate. It essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant to misalignment than comparing binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
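A minimal sketch of such a prototype in Python/OpenCV (the fixed 200x200 working size, the file names, and the two reference plaque names are illustrative assumptions): binarize both images to a text mask, then score the query against each reference with a chamfer-style distance based on the distance transform.

    # Sketch of binarize + chamfer matching against pre-stored reference plaques.
    import cv2
    import numpy as np

    def binarize_text(path, size=(200, 200)):
        """Return a mask with 255 on (dark) text pixels, 0 elsewhere."""
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        gray = cv2.resize(gray, size)
        _, text = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return text

    def chamfer_score(reference_text, query_text):
        """Average distance from each query text pixel to the nearest reference text pixel."""
        dist_to_ref = cv2.distanceTransform(255 - reference_text, cv2.DIST_L2, 3)
        query_pixels = query_text > 0
        return float(dist_to_ref[query_pixels].mean()) if query_pixels.any() else np.inf

    query = binarize_text("photo_of_plaque.jpg")
    references = {name: binarize_text(f"{name}.png") for name in ["plaque_01", "plaque_02"]}
    best = min(references, key=lambda name: chamfer_score(references[name], query))
    print("best match:", best)

On a device, the reference distance transforms would be pre-computed and stored, as noted above, so only the query image needs binarizing at runtime.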
I managed to find a solution that is working quite well. It's not fully optimized yet, but I think it's just a matter of tweaking filters, as I'll explain later on.
Initially I tried to set up OpenCV, but it was very time-consuming with a steep learning curve; it did, however, give me an idea. The key to my problem is really detecting the characters within the image and ignoring the background, which was basically just noise. OCR was designed exactly for this purpose.
I found the free library tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and with plenty of customizability. At first the results were very random, but applying a sharpening and monochromatic filter and a color invert worked well to clean up the text. Next I marked out a target area on the UI and used that to cut out the rectangle of the image to process. Processing is slow on large images, and this cut the time dramatically. The OCR filter allowed me to restrict the allowable characters, and as the plaques follow a standard configuration this improved the accuracy.
So far it's been successful with the grey-background plaques, but I haven't found the correct filter for the red and white editions. My goal is to add color detection and remove the need to feed in the data type.
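For reference, a desktop sketch of the same pipeline (pytesseract standing in for the tesseract-ios wrapper above; the crop rectangle, sharpening kernel, and character whitelist are illustrative placeholders):

    # Sketch: crop to the target area, sharpen, binarize/invert, then OCR with
    # a restricted character set.
    import cv2
    import numpy as np
    import pytesseract

    frame = cv2.imread("camera_frame.jpg")
    x, y, w, h = 100, 200, 300, 150            # assumed target rectangle from the UI
    roi = frame[y:y + h, x:x + w]

    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    sharpen = cv2.filter2D(gray, -1, kernel)
    _, mono = cv2.threshold(sharpen, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mono = cv2.bitwise_not(mono)               # colour invert step, if text ends up light

    text = pytesseract.image_to_string(
        mono, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    print("plaque text:", text.strip())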

How can I compare images of the same origin that were cropped?

Suppose I have an image file/URL, and I want my software to search for it within a set of up to 100 images (or at least in that order of magnitude). The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them (the two images may have been cropped differently, or compressed differently).
The question is: is this a feasible task, given that I won't have any of the images before the search takes place (i.e., there won't be any indexing prior to the search)? Is it likely to work in sub-second time (remember that the compare set is quite small)? And if it is feasible, which tools can I use for this task? These could be software components or even an online service (I can live with that for a proof of concept). Can OpenSURF help me here?
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them.
If "slight processing" doesn't involve rotation, but only "cropping", then simple cross-correlation should work, if there could be perspective correction, rotation, lens distortion correction, then things are more complicated.
I think this method is quite forgiving to slight color corrections. Anyway, you can always convert both images to grayscale and compare grayscale versions if you want.
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
You can start with cvMatchTemplate from the OpenCV library (the link points to the C version of the API, but it is also available for C++ and Python). Use the cropped image as a template, and look for it in all your images.
If the images you compare have dark features on light backgrounds, you may benefit from using CV_TM_CCOEFF or CV_TM_CCOEFF_NORMED methods. They both subtract the average over the template area from both images. Normalized methods (CV_TM_*_NORMED) generally work better but are slower than their non-normalized counterparts.
You may consider doing some preprocessing on the images before the cross-correlation. If you normalize them first, the cross-correlation will be less sensitive to slight brightness/contrast modifications. If you detect edges first, as suggested by @misha, you'll lose color/lightness information, but the results for contour overlapping will be much better.
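A minimal sketch of this with the Python API (the file names and the ~0.8 acceptance threshold are illustrative; the ~100 candidate images would be listed in place of the two shown):

    # Sketch: normalised cross-correlation template matching over a small set of images.
    import cv2

    template = cv2.imread("cropped_query.jpg", cv2.IMREAD_GRAYSCALE)

    best_name, best_score = None, -1.0
    for name in ["candidate_001.jpg", "candidate_002.jpg"]:   # the compare set
        image = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, location = cv2.minMaxLoc(result)
        if score > best_score:
            best_name, best_score = name, score

    print(f"best match: {best_name} (score {best_score:.2f}, accept if above ~0.8)")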
jetxee set you off on the right track. However, if you simply use template matching, you can run into problems where the background interferes with your template matching result. For example, if your template is a building and your background is primarily light (e.g. desert sand), then the template matching will fail because the lighter background will always return a higher cross-correlation than the darker template. Here is an example of this problem.
The way you solve it is the same as what is in the link:
1. Perform edge detection on both your template and the target image.
2. Throw the original template and image away.
3. Perform template matching using the edge-detected template and the edge-detected target image.
As far as forgiving slight processing, the edge detection step will take care of that. As long as the edges in the two images are not modified significantly (blurred, optically distorted), the approach will work.
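Sketched in Python/OpenCV (the Canny thresholds and file names are assumptions to be tuned for your images):

    # Sketch: match on edge maps instead of raw pixels to reduce background bias.
    import cv2

    image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("cropped_query.jpg", cv2.IMREAD_GRAYSCALE)

    image_edges = cv2.Canny(image, 50, 150)
    template_edges = cv2.Canny(template, 50, 150)

    result = cv2.matchTemplate(image_edges, template_edges, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    print("edge-based match score:", score, "at", top_left)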
I know you are not looking specifically for algorithms, but nonetheless, let me suggest the following which can accomplish exactly what you are trying to do, very efficiently...
For cropped versions of the same image, including rotation, the Fourier-Mellin transform or a log-polar transform (watch out for the artsy semi-nude drawing - a good source nonetheless) will give you the translation, rotation and scale coefficients between the two images, allowing you to determine what operations were needed to go from one to the other.
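A rough sketch of the log-polar variant in Python/OpenCV, assuming same-sized grayscale inputs (the file names are placeholders, and the sign conventions of the recovered shift may need flipping depending on which image is the reference): rotation and scale become a plain translation between the log-polar transforms of the two Fourier magnitude spectra, which cv2.phaseCorrelate can measure.

    # Sketch: recover rotation and scale via log-polar transform of the Fourier magnitude.
    import cv2
    import numpy as np

    def log_polar_spectrum(gray):
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
        h, w = gray.shape
        center = (w / 2.0, h / 2.0)
        max_radius = min(center)
        return cv2.warpPolar(np.log1p(spectrum).astype(np.float32), (w, h), center,
                             max_radius, cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)

    a = cv2.imread("image_a.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    b = cv2.imread("image_b.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

    (shift_x, shift_y), _ = cv2.phaseCorrelate(log_polar_spectrum(a), log_polar_spectrum(b))
    h, w = a.shape
    rotation_deg = 360.0 * shift_y / h                       # vertical axis maps to angle
    scale = np.exp(shift_x * np.log(min(w, h) / 2.0) / w)    # horizontal axis is log-radius
    print(f"rotation ~ {rotation_deg:.1f} deg, scale ~ {scale:.3f}")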

Creating a 3D effect from a 2D image

I have a random 2D image. I would like to be able to present the image in 3D. This doesn't have to be very detailed; it would be fine even if the image were arbitrarily broken into layers, like a pop-up cutout from a children's book.
The goal would be that a given image would look normal when directly viewed, but that if a viewer were to move/tilt left, right, up or down, there would be a 3D effect.
This is similar but not exactly the same as this question here:
How to create 3D streoscopic images using MATLAB with image tool?
This is complete overkill:
http://make3d.cs.cornell.edu/
And this is probably on the right track:
http://www.imagemagick.org/Usage/distorts/#perspective
My ideal implementation would be an automated PHP script with ImageMagick that is fed an image and spits out as a result either (in order of preference):
Images representing each layer, from nearest to deepest (closer to the child's pop-up book layer analogy)
5 images representing the said views (direct, left, right, top, bottom)
Has this been done (either of the above ideal implementations), or does anyone know how to do all, or part, of this?
As far as the first part of your question is concerned, it sounds like your ideal implementation is http://make3d.cs.cornell.edu/, except that:
you want it simpler (return images from a fixed set of angles as opposed to a walkthrough)
you want it with ImageMagick and PHP
I think that last restriction is unrealistic because there's a fair amount of maths and computer vision behind this kind of problem. ImageMagick will help you with lower-level image processing tasks like affine transforms, but it doesn't really provide the required higher-level computer vision functionality like 3D image reconstruction.
So my advice would be to try and work around that restriction somehow. If you implement the approach using more suitable tools (like C++ and OpenCV, for example, or Matlab, as the Make3D guys did), then you can wrap that in a CGI application so your PHP scripts can access it. Cornell (the authors of Make3D) had a similar thing going a while back, but it looks like they're not doing it any more.
For the second part of your question, the theory behind what you want to do has been fairly well-researched. See here for a list of depth estimation papers. Here is what things look like in source.
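To make the pop-up-book idea concrete, here is a rough Python/OpenCV sketch (rather than the requested PHP/ImageMagick, per the advice above). It assumes you already have a per-pixel depth map for the photo, which is exactly the hard part the depth-estimation papers address; given that, it slices the image into a few layers and shifts nearer layers more to fake a change of viewpoint:

    # Sketch: fake left/right views by shifting depth layers by different amounts.
    import cv2
    import numpy as np

    image = cv2.imread("photo.jpg")
    depth = cv2.imread("photo_depth.png", cv2.IMREAD_GRAYSCALE)  # assumed: 0 = far, 255 = near

    def render_view(image, depth, shift_px, layers=5):
        h, w = depth.shape
        out = np.zeros_like(image)
        bins = np.linspace(0, 256, layers + 1)
        for i in range(layers):                           # paint far layers first
            mask = (depth >= bins[i]) & (depth < bins[i + 1])
            dx = int(round(shift_px * i / (layers - 1)))  # nearer layers move more
            M = np.float32([[1, 0, dx], [0, 1, 0]])
            shifted_img = cv2.warpAffine(image, M, (w, h))
            shifted_mask = cv2.warpAffine(mask.astype(np.uint8), M, (w, h)).astype(bool)
            out[shifted_mask] = shifted_img[shifted_mask]
        return out

    for name, dx in [("left", -15), ("direct", 0), ("right", 15)]:
        cv2.imwrite(f"view_{name}.jpg", render_view(image, depth, dx))

The shifted layers leave small holes where they uncover the background; a real implementation would inpaint or stretch the farther layers to fill them.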
