When capturing a photo using AVFoundation classes, stains appear in certain areas of the image.
This happens on iOS 14.4 with an iPhone 12 Pro.
I managed to reproduce it using different custom ISO and exposure time settings and using the default auto setting.
Both for single photo and bracket captures.
Both with maxPhotoQualityPrioritization set to quality and to balanced.
Both with ultra wide angle and wide angle cameras.
It's not deterministic. It seems to be most prominent in images with high light contrast and mixed light sources (where natural light mixes with artificial light and some areas are more brightly lit than others). It is also more prominent when capturing multiple images using bracket settings with both negative and positive exposure biases. (See the example image.)
Does anybody know any fix or a workaround for this?
What you describe as "stains" looks like areas that are "blown out": one or more color channels is at its maximum, so all detail there is lost. This is known as "blown highlights" in photography, and it shows up as blobs of solid color with no detail in them.
In your case, it looks like the "stains" are completely blown out in all 3 color channels.
If you use Photoshop you can display a histogram of the image, as well as a mode where areas that are oversaturated are shown in red.
See this link, for example, for a description of how to do that.
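If you want to flag those regions programmatically instead of eyeballing a Photoshop histogram, here is a minimal sketch in Python with OpenCV/numpy (the filename and the 250 cutoff are arbitrary choices of mine, not anything that comes from AVFoundation):

import cv2
import numpy as np

img = cv2.imread("capture.jpg")            # 8-bit BGR image; hypothetical filename
blown = np.all(img >= 250, axis=2)         # True where all three channels are at (or near) maximum
print("blown-out pixels:", int(blown.sum()), "of", blown.size)

# Optional: paint the blown areas red, similar to Photoshop's clipping overlay
overlay = img.copy()
overlay[blown] = (0, 0, 255)               # BGR red
cv2.imwrite("blown_overlay.png", overlay)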
I would like to build a very simple AR app which is able to detect a white sheet of A4 paper in its surroundings. I thought it would be enough to use Apple's image recognition sample project together with a white sample image in the aspect ratio of an A4 sheet, but the ARSession fails with:
One or more reference images have insufficient texture: white_a4,
NSLocalizedRecoverySuggestion=One or more images lack sufficient
texture and contrast for accurate detection. Image detection works
best when an image contains multiple high-contrast regions distributed
across its extent.
Is there a simple way to detect sheets of paper using ARKit? Thanks!
I don't think even ARKit 3.0 is ready to detect an abstract white sheet at the moment.
If you have a white sheet with some markers at its corners, or some text on it, or even a white sheet placed in a well-defined environment (that is detection based on the surroundings, not on the sheet itself), then it makes some sense.
But a plain white sheet of paper has no distinct marks on it, so ARKit has no understanding of what it is, what its color is (outside a room it has a cold tint, for instance, but inside a room it has a warm tint), what its contrast is (contrast is an important property in image detection), or how it is oriented (which mainly depends on your point of view).
The whole point of image detection is that ARKit detects an image, not the absence of one.
So, for successful detection you'll need to give ARKit not only the sheet but its surroundings as well.
Also, you can look at Apple's recommendations for working with image detection:
Enter the physical size of the image in Xcode as accurately as possible. ARKit relies on this information to determine the distance of the image from the camera. Entering an incorrect physical size will result in an ARImageAnchor that’s the wrong distance from the camera.
When you add reference images to your asset catalog in Xcode, pay attention to the quality estimation warnings Xcode provides. Images with high contrast work best for image detection.
Use only images on flat surfaces for detection. If an image to be detected is on a nonplanar surface, like a label on a wine bottle, ARKit might not detect it at all, or might create an image anchor at the wrong location.
Consider how your image appears under different lighting conditions. If an image is printed on glossy paper or displayed on a device screen, reflections on those surfaces can interfere with detection.
I must add that you need a unique texture pattern, not a repetitive one.
What you could do is run a simple ARWorldTrackingConfiguration where you periodically analyze the camera image for rectangles using the Vision framework.
This post (https://medium.com/s23nyc-tech/using-machine-learning-and-coreml-to-control-arkit-24241c894e3b) describes how to use ARKit in combination with CoreML.
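The post above works with ARKit plus Vision/CoreML in Swift; just to illustrate the rectangle-detection idea itself, here is a rough contour-based sketch in Python with OpenCV (the Canny thresholds and minimum area are guesses you would need to tune, and the filenames are placeholders):

import cv2

frame = cv2.imread("camera_frame.jpg")                    # hypothetical captured frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

# OpenCV 4.x: findContours returns (contours, hierarchy)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    # Approximate each contour by a polygon; 4 corners plus a large area = candidate sheet
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.contourArea(approx) > 1000:
        cv2.drawContours(frame, [approx], -1, (0, 255, 0), 2)

cv2.imwrite("detected_rectangles.png", frame)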
Overview
I am attempting to build a prototype of a vision system that would apply pattern matching to figure out the orientation of boxes (e.g. soap boxes).
Image sample
Below are real-time captured images of soap boxes in the actual environment, showing two of the four possible orientations (Front_Straight and Back_Inverted).
The real-time images will be very similar to these (approximately 300x200 pixels per image).
The template images will be provided to the system in advance, and it has to determine the orientation of boxes moving on a conveyor. The boxes on the conveyor are guided so that they can take only one of 4 possible orientations: Front_Straight, Front_Inverted, Back_Straight and Back_Inverted, i.e. the boxes cannot be at an angle. The camera and the conveyor are fixed, so the image size of the real-time boxes is a constant 300px by 200px. (I have used a monochrome camera; if needed, a colour camera can be used too.)
Some properties of the vision system prototype:
Fixed, constant lighting.
The real-time image of each box will be quite low-res, as attached (approx. 300x200 px per box).
Minimal motion blur or imaging artefacts.
OpenCV C++ based coding environment.
An Intel Core i5 CPU based PC will be used.
Problem Statement
I am looking for a lightweight yet robust algorithm that can reliably match the template images against real-time images of boxes on the conveyor to extract the face and orientation. I am new to feature matching, so please guide me as to which feature detector and matcher will be most suitable for this particular case. Also, please let me know whether it is possible to attain 97%+ accuracy using low-res real-time images like the ones attached.
You have a very fortunate case: your images show very little variation. Any feature detector should perform very well in this scenario, and since OpenCV exposes them through a common interface, they are very easy to compare against each other. In my experience, ORB tends to be quite fast and gives good results, but I expect SIFT/SURF to work in your case too.
I wouldn't expect the resolution to be a problem.
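For illustration, here is roughly what the comparison could look like with ORB in OpenCV's Python API (your environment is C++, but the calls map one-to-one; the filenames and the distance cutoff are placeholders to tune). Each of the four orientation templates is matched against the captured box image and the orientation with the most good matches wins:

import cv2

orb = cv2.ORB_create(nfeatures=500)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # Hamming distance for ORB's binary descriptors

templates = {
    "Front_Straight": cv2.imread("front_straight.png", cv2.IMREAD_GRAYSCALE),
    "Front_Inverted": cv2.imread("front_inverted.png", cv2.IMREAD_GRAYSCALE),
    "Back_Straight":  cv2.imread("back_straight.png",  cv2.IMREAD_GRAYSCALE),
    "Back_Inverted":  cv2.imread("back_inverted.png",  cv2.IMREAD_GRAYSCALE),
}
capture = cv2.imread("box_300x200.png", cv2.IMREAD_GRAYSCALE)

kp_c, des_c = orb.detectAndCompute(capture, None)

scores = {}
for name, tmpl in templates.items():
    kp_t, des_t = orb.detectAndCompute(tmpl, None)
    matches = bf.match(des_t, des_c)
    good = [m for m in matches if m.distance < 40]       # distance cutoff is a guess; tune it
    scores[name] = len(good)

print(max(scores, key=scores.get), scores)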
I am facing the following problem: when I capture a photo of an LCD display, there are annoying rainbow stripes on it.
Is there any way to remove them from the image with some computer vision techniques?
What keywords should I google for? Or maybe you can point me to some useful related links/papers.
My goal is to run OCR on the image afterwards.
With a low threshold, Wolf-Jolion binarization finds a lot of connected components, which causes slow and poor recognition.
With a higher threshold, some characters vanish from the image.
Source photo:
First binarization:
Second binarization:
P.S. The photos were taken of a MacBook Pro Retina display with an iPhone 6 camera.
What you see is called the moiré effect. It is caused by sampling the screen's pixel grid with your camera's pixel grid.
Simply change the angle and/or distance slightly to avoid the stripes; then you don't need any image processing.
Besides that, these stripes should not bother any decent OCR engine.
If you insist on image processing then a global threshold should do the trick.
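For completeness, a global threshold is essentially a one-liner in OpenCV; a quick Python sketch (Otsu's method picks the threshold from the histogram automatically, and the filenames are placeholders):

import cv2

img = cv2.imread("screen_photo.jpg", cv2.IMREAD_GRAYSCALE)
# Otsu's method chooses a single global threshold from the image histogram
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("binarized.png", binary)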
I think I understand what color profiles are. What I do not understand is the difference between manipulating a photo, for example in Photoshop, in 16 bpp sRGB versus 16 bpp Adobe RGB. My monitor can only show me sRGB.
Is there any difference in the algorithms?
Maybe there is some preprocessing executed before the program displays the results of my work (for example, AdobeRGB(0.3, 0.25, 0.82) being displayed as sRGB(0.301, 0.253, 0.819) on my monitor)?
Is there any sense in using different color profiles when I am not using the ICC profile of my monitor/printer?
In general, what should I do if I wanted to develop my own graphics-manipulation application that supports profiles other than sRGB (for example, in Qt)?
The color space your image uses determines how your 16 bits per pixel should relate to the output produced by your monitor, i.e., it determines what colors the numbers actually represent.
This can make a difference to how some algorithms should be computed if they are supposed to produce realistic, natural-looking, or consistent results.
Say you composite a semi-transparent yellow on top of a dark red background: what kind of brown do you get? If the algorithm always mixes the pixel data the same way, then even when the yellow and red look the same on your monitor, the brown you get might differ depending on your color space.
A more 'correct' way to do mixing would be to transform your pixel data into a consistent color space, mix, and then transform back. If the original colors look the same on two monitors with different calibrated profiles, then they will transform into the same numbers in a consistent color space, and the mix result will transform back into results that look the same on both monitors even though the pixel values might be different.
Natural-looking compositing with semi-transparency is a good example of an algorithm that has to take your color space into account in order to produce realistic results. Other effects that have to look 'natural', like specular highlights, shadows, etc., similarly need to do physically accurate math in a consistent color space.
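To make the compositing point concrete, here is a small numpy sketch (my own illustration, assuming 0..1 sRGB values and the standard sRGB transfer curve): the same 50/50 mix of a yellow and a dark red gives different browns depending on whether you blend the encoded numbers directly or blend in linear light.

import numpy as np

def srgb_to_linear(c):
    # Standard sRGB decoding: gamma-encoded value -> linear light
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    # Standard sRGB encoding: linear light -> gamma-encoded value
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

yellow = np.array([1.0, 1.0, 0.0])    # semi-transparent foreground
red    = np.array([0.5, 0.0, 0.0])    # dark red background
alpha  = 0.5

naive = alpha * yellow + (1 - alpha) * red                    # blending the encoded numbers
physical = linear_to_srgb(alpha * srgb_to_linear(yellow) +
                          (1 - alpha) * srgb_to_linear(red))  # blending actual light
print(naive, physical)                                        # two noticeably different browns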
To answer your specific questions:
Yes, as explained, many algorithms should perform different calculations with different color spaces.
Yes, there is. The image's color space defines what the data means in terms of physical light. If you display it with an ICC calibrated profile, it is transformed into the numbers that your monitor needs to accurately display your image.
It should make very little difference which color space you use for your image, except that some display software won't take it into account. Making sRGB images is better for cross-system compatibility, but I think Adobe RGB has a bigger gamut and can actually represent some green colors that sRGB can't. You should use printer and monitor calibration so that you can SEE what your image really looks like.
I think I answered that above.
There's no difference in the algorithms, because you operate in an RGB color space and not in the XYZ color space. Monitors, as you said, show colors differently; the red primary on one monitor may not exactly match the red primary on another. In order to define different RGB color spaces in a common manner, they are related to the CIE 1931 XYZ color space. Every monitor or system converts RGB colors to XYZ according to the profile used, for example: RGB (1,0,0) = XYZ (0.4358, 0.2224, 0.0139) in sRGB and XYZ (0.7977, 0.2880, 0.0000) in ProPhoto RGB.
For further information see:
http://ninedegreesbelow.com/photography/xyz-rgb.html
http://www.ryanjuckett.com/programming/rgb-color-space-conversion/
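For the curious, the RGB-to-XYZ step is just a 3x3 matrix applied to the linearized RGB values. A small numpy sketch with the commonly published D65 sRGB matrix follows; note that the exact numbers depend on the white-point adaptation the profile uses, which is presumably why the values quoted above (which look like D50-adapted, ICC-style colorants) differ slightly.

import numpy as np

def srgb_to_linear(c):
    # Undo the sRGB transfer curve before applying the matrix
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

# Linear sRGB -> CIE XYZ for a D65 white point (commonly published values)
M_SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def srgb_to_xyz(rgb):
    return M_SRGB_TO_XYZ @ srgb_to_linear(rgb)

print(srgb_to_xyz([1.0, 0.0, 0.0]))   # ~ [0.4125, 0.2127, 0.0193]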
Gamut mapping explained by analogy
If you change color spaces, you may lose some of the information because the mapping from one to the other may not be injective (invertible). You may choose among different rendering intents to pick the mapping that throws away only the information you find least useful.
This analogy might illustrate the consequences of converting an image to a smaller color space when the original space is larger than the one of your device: You can very well represent a 3D object in the computer, but you will never actually see it, because your screen is flat and thus able to display only 2D images. You can view projections of the object, you can view cuts through the object, but you need a 3D printer to get something really 3D out of it.
Even if you have no 3D printer, it is worth representing the object in 3D and not as a fixed 2D projection. Otherwise, you would not be able to make all those 2D cuts and projections, and even if you bought a 3D printer in the future, you could not print the object anymore.
The 3D object is a picture in the larger space, a fixed 2D projection is a picture in a smaller space, screen is a device with the smaller color space and 3D printer is a device with the larger color space. End of analogy.
ICC workflow
If you take a photo, your camera should assign a profile to it, describing the device color space of the camera. The profile defines the mapping of the numbers inside the picture (coordinates in the device color space) to real-world colors (coordinates in an absolute color space). Therefore, without a profile the numbers really have no meaning, and anyone is free to make up any mapping they like.
If you shoot RAW, you do the color space conversion when developing the photo; if you shoot JPEG, the camera performs this task for you.
In the opposite direction, when displaying or printing: if the output device is not calibrated and has no profile, the real-world colors stored in the image might not match what actually comes out of the device. The mapping between the image color space and the output device space is then somewhat arbitrary and cannot guarantee that the colors will be preserved.
Actual answers
The difference in manipulating the photo in sRGB and Adobe RGB is that Adobe RGB is larger and thus preserves more information for further processing.
The difference in algorithms has already been explained by Matt Timmermans in another answer. Regarding color blending, you might want to know more about perceptually uniform color spaces (see e.g. a closed Q & A on SO).
Yes, conversion from Adobe RGB to sRGB is not an identity and thus requires some processing. Where exactly this processing is done (device driver, OS kernel, image processing software) depends on the source and target, the OS and their settings. If you convert the spaces in Photoshop, it does the computation itself. Windows has a built-in color management module that takes care of converting an image with a profile to the device color space of the output device.
The image you want to display/print might be stored in some rather exotic color space. If the OS guesses it is in sRGB (Windows would), it might give odd results. It is better to provide as much information as possible to the color management system. Even uncalibrated devices might be assigned some generic profiles, some guesswork might take place. And maybe, you’ll calibrate and characterize your device someday, or you’ll send the image to someone with such a device.
Qt itself does not support color management. However, KDE, which is built atop Qt, supports some color management via Oyranos.
When should we expect complete color management for KDE?
If we are talking about color management in Qt, not anytime soon. If we are talking about decent color management implemented in the compositor (KWin), sooner than not anytime soon. It also depends on how quickly the graphics applications adapt to these new color management things.
You could use Oyranos or another color management system directly in your application. Google told me about a thesis about getting color management to Qt, too.
Related reading
Generalities about colors # color-management-guide.com
ICC FAQ
Windows 7: Change color management settings
Windows Vista: Color management settings FAQ
Introduction to Color Management in Microsoft Windows Operating Systems
Windows Color System # MSDN
I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to DjVu and OCR them). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example, if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with a gradual change in light intensity across the surface (ambient light) and another with inverse-square falloff (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it as white or black.
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
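For what it's worth, here is one way the chop-and-interpolate idea could look in Python with OpenCV/numpy (my own sketch, not a tested recipe: the tile grid size is a guess, and tiles that contain no text at all will still get an Otsu threshold, which is exactly the "where to draw the line" problem mentioned above):

import cv2
import numpy as np

img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)       # hypothetical filename
tiles_y, tiles_x = 8, 8                                   # tile grid: tune for your images

h, w = img.shape
thresh_map = np.zeros((tiles_y, tiles_x), dtype=np.float32)
for ty in range(tiles_y):
    for tx in range(tiles_x):
        tile = img[ty * h // tiles_y:(ty + 1) * h // tiles_y,
                   tx * w // tiles_x:(tx + 1) * w // tiles_x]
        # Otsu threshold for this tile only (unreliable for tiles with no text)
        t, _ = cv2.threshold(tile, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        thresh_map[ty, tx] = t

# Interpolate the per-tile thresholds into a smooth full-resolution threshold surface,
# so there are no discontinuities at tile boundaries
thresh_surface = cv2.resize(thresh_map, (w, h), interpolation=cv2.INTER_LINEAR)
bw = (img > thresh_surface).astype(np.uint8) * 255
cv2.imwrite("binarized.png", bw)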
EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but if you do: a few years ago my team and I photographed about 250,000 pages with a camera and converted them to (almost black-and-white) greyscale images, which we then turned into DjVu (and also made PDFs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple, unsophisticated solution which worked very well in practice. This solution should also work for creating black-and-white images rather than greyscale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in the exact same position.
b) We put a camera on a tripod on top of the table, pointing down at the taped picture frame. On a bar about a foot wide, attached to the external flash holder on top of the camera, we mounted two "modelling lights". These can be purchased at any good camera shop and are designed to provide even illumination. The camera was shaded from the lights by putting a small cardboard box around each modelling light. We photographed in greyscale, which we then processed further. (Our pages were old, browned paper with blue ink writing, so your case should be simpler.)
Processing of the images
We used the free software package IrfanView.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then, in interactive mode, adjust the brightness, contrast and gamma settings until it was close to black and white. (We used greyscale, but by setting the bit depth to 2 you will get black and white when you batch-process all the pages.)
After determining the best color correction, we interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit), but as long as the letters are evenly illuminated, I think your OCR software should ignore big black areas like the one between a two-page spread. You can always get rid of the black between a two-page spread or at the edges by cropping the pages twice: once for the left-hand pages and once for the right-hand pages. IrfanView also lets you number your pages cleverly so you can re-merge them in the correct order, i.e. rename your pages something like page-xxxA for left-hand pages and page-xxxB for right-hand pages, and the pages will then sort correctly by name.
If you still need a solution I hope some of the above is useful to you.
I would recommend calibrating the camera, considering that your lighting setup is fixed (that is, the lights do not move between pictures) and your camera is grayscale (not color).
Take a picture of a white sheet of paper which covers the whole workable area of your "scanner". Store this picture; it tells you what white paper looks like at each pixel. Now, when you take a picture of a document to scan, you can reload your "white reference picture" and even out the illumination before applying a threshold.
Let's call the white reference REF, the picture DOC, the evenly illuminated picture EVEN, and the maximum value of a pixel MAX (for 8-bit imaging it is 255). For each pixel:
EVEN = DOC * (MAX/REF)
Notes:
Beware of the parentheses: most image processing libraries use the image's pixel type for computations on pixel values, and a simple multiplication will overflow your pixels. If necessary, write the loop yourself and use a 32-bit integer for intermediate computations.
The white reference image can be smoothed before being used in the process. Any smoothing or blurring filter will do; don't hesitate to apply it aggressively.
The MAX value in the formula above represents the target pixel value in the resulting image. Using the maximum pixel value targets a bright white, but you can adjust this value to target a light gray instead.
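A rough numpy version of the formula above (REF is the white-reference shot and DOC the document photo; the filenames and the blur radius are placeholders):

import cv2
import numpy as np

REF = cv2.imread("white_reference.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
DOC = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
MAX = 255.0

# Smooth the white reference aggressively, as suggested in the notes above
REF = cv2.GaussianBlur(REF, (51, 51), 0)
REF = np.maximum(REF, 1.0)                 # guard against division by zero

# EVEN = DOC * (MAX / REF), computed in float to avoid overflowing the pixel type
EVEN = np.clip(DOC * (MAX / REF), 0, 255).astype(np.uint8)

# A single global threshold should now work on the evened-out image
_, bw = cv2.threshold(EVEN, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("evened.png", EVEN)
cv2.imwrite("binarized.png", bw)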
Well, usually the image processing I do is highly time-sensitive, so a complex algorithm like the one you're seeking wouldn't work for me. But... have you considered chopping the image up into smaller pieces and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image with variable lighting conditions (I am assuming here that you are talking about a standard mostly-white page with dark text).
It's a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/sixteenths and re-color them so that the average grayscale level is similar across the page. (This might break if you have pages with large margins, though.)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used to normalize the original image. It is often enough to take the difference between the illumination estimate and the original image (for dark text on a bright background, subtract the image from the estimate so the letters come out as bright objects) and then do a threshold-based segmentation.
This approach is based on grayscale morphological filters and could be implemented in MATLAB as below:
img = imread('filename.png');
% Estimate the background: closing removes dark structures (the letters)
% smaller than the structuring element, leaving only the illumination.
illumination = imclose(img, strel('disk', 10));
% Background minus image (a bottom-hat) leaves the letters as bright
% objects on a dark, evenly lit background, and stays positive for uint8.
imgCorrected = illumination - img;
% graythresh returns a normalized level in [0,1], so binarize with im2bw.
thresholdValue = graythresh(imgCorrected);
bw = im2bw(imgCorrected, thresholdValue);
For an example with real images take a look at this guide from mathworks. For further reading about the use of morphological image analysis this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
High-pass to alleviate the low-frequency illumination gradient
Local threshold with an appropriate radius
Adaptive thresholding is the keyword. Quote from a 2003 article by R. Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
input.jpg
output.jpg
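If you are working in OpenCV rather than ImageMagick, cv2.adaptiveThreshold does essentially the same local thresholding; a quick Python sketch (block size and offset are guesses to tune, filenames are placeholders):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
# Compare each pixel against the mean of its 51x51 neighbourhood minus a small offset
bw = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 51, 10)
cv2.imwrite("output.png", bw)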
You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a Threshold Hysteresis with a rate of change control. Here is the link to the normal Threshold Hysteresis. Set the first threshold to a typical white value. Set the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values in between the first and second threshold. Ideally if the difference is positive, then act normally. But if it is negative, you only want to threshold if the difference is small.
This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
Why don't you use simple opening and closing operations?
Try this and just look at the results:
src: the source image
src - open(src)
close(src) - src
In particular, look at the close(src) - src result: using different window sizes, you will get the background of the image.
I think this helps.
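A quick OpenCV sketch of that idea in Python (the window size is a guess; for dark text on a bright page, closing with a window larger than the strokes estimates the background, and close(src) - src leaves the text):

import cv2

src = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))   # window larger than the pen strokes

background = cv2.morphologyEx(src, cv2.MORPH_CLOSE, kernel)       # close(src): the background estimate
text = cv2.subtract(background, src)                              # close(src) - src: the dark details

cv2.imwrite("background.png", background)
cv2.imwrite("text.png", text)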