How can I automatically determine whether an image file depicts a photo or a 'graphic'?
For example, using ImageMagick?
I am somewhat at the limits of my knowledge here, but I read a paper and have worked out a way to calculate image entropy with ImageMagick - some clever person might like to check it!
#!/bin/bash
image=$1
# Get number of pixels in image
px=$(convert -format "%w*%h\n" "$image" info:|bc)
# Calculate entropy: e = -sum(p*ln p), where p = count/px for each grey level
# (awk's log() is the natural log, so e is in nats; divide by log(2) for bits)
# See these lecture slides: www1.idc.ac.il/toky/imageProc-10/Lectures/04_histogram_10.ppt
convert "$image" -colorspace gray -depth 8 -format "%c" histogram:info:- | \
awk -F: -v px=$px '{p=$1/px;e+=-p*log(p)} END {print e}'
So, you would save the script above as entropy, then do the following once to make it executable:
chmod +x entropy
Then you can use it like this:
./entropy image.jpg
It does seem to produce bigger numbers for true photos and lower numbers for computer graphics.
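To make the arithmetic in the awk one-liner explicit, here is a small Ruby sketch of the same entropy calculation. It assumes the histogram text looks like the `histogram:info:` sample further down (each line starts with `<count>:`). It divides by the sum of the counts rather than by `%w*%h`; for a full histogram the two are the same. The helper names are made up for this illustration.

```ruby
# Shannon entropy of a grey-level histogram, mirroring the awk one-liner:
# e = -sum(p * ln p) with p = count / total_pixels.
# Math.log (like awk's log()) is the natural logarithm, so the result is in
# nats; divide by Math.log(2) to get bits.

# Pull the counts out of "convert ... -format %c histogram:info:-" text.
# Each line starts with "<count>:" (the awk script's -F: / $1 does the same).
def histogram_counts(text)
  text.each_line.map { |line| line.split(":", 2).first.to_i }.select(&:positive?)
end

def histogram_entropy(counts)
  total = counts.sum.to_f
  counts.select(&:positive?).sum do |count|
    prob = count / total
    -prob * Math.log(prob)
  end
end

# Two histogram lines, treated here as if they were a whole 539-pixel image.
sample = <<~HIST
        2: (  0,  0,  0) #000000 gray(0)
      537: (255,255,255) #FFFFFF gray(255)
HIST

counts = histogram_counts(sample)          # [2, 537]
puts histogram_entropy(counts)             # low: almost everything is white
puts histogram_entropy(Array.new(256, 1))  # ln(256), about 5.545 nats (8 bits)
```

A flat histogram over all 256 grey levels gives the maximum possible value, and a single solid colour gives zero, which matches the intuition that graphics score lower than photos.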
Another idea would be to look at the inter-channel correlation. In digital photos, the different wavelengths of light are normally quite strongly correlated with each other: if the red component increases, the green and blue components tend to increase too, and if red decreases, green and blue tend to decrease as well. Computer graphics, by contrast, tend to be drawn with big, bold primary colours, so a big red bar graph or pie chart will not be correlated between the channels at all. I took a digital photo of a landscape, resized it to 1 pixel wide by 64 pixels high, and show it using ImageMagick below; you will see that where red goes down, so do green and blue...
convert DSC01447.JPG -resize 1x64! -depth 8 txt:
0,0: (168,199,235) #A8C7EB srgb(168,199,235)
0,1: (171,201,236) #ABC9EC srgb(171,201,236)
0,2: (174,202,236) #AECAEC srgb(174,202,236)
0,3: (176,204,236) #B0CCEC srgb(176,204,236)
0,4: (179,205,237) #B3CDED srgb(179,205,237)
0,5: (181,207,236) #B5CFEC srgb(181,207,236)
0,6: (183,208,236) #B7D0EC srgb(183,208,236)
0,7: (186,210,236) #BAD2EC srgb(186,210,236)
0,8: (188,211,235) #BCD3EB srgb(188,211,235)
0,9: (190,212,236) #BED4EC srgb(190,212,236)
0,10: (192,213,234) #C0D5EA srgb(192,213,234)
0,11: (192,211,227) #C0D3E3 srgb(192,211,227)
0,12: (191,208,221) #BFD0DD srgb(191,208,221)
0,13: (190,206,216) #BECED8 srgb(190,206,216)
0,14: (193,207,217) #C1CFD9 srgb(193,207,217)
0,15: (181,194,199) #B5C2C7 srgb(181,194,199)
0,16: (158,167,167) #9EA7A7 srgb(158,167,167)
0,17: (141,149,143) #8D958F srgb(141,149,143)
0,18: (108,111,98) #6C6F62 srgb(108,111,98)
0,19: (89,89,74) #59594A srgb(89,89,74)
0,20: (77,76,61) #4D4C3D srgb(77,76,61)
0,21: (67,64,49) #434031 srgb(67,64,49)
0,22: (57,56,43) #39382B srgb(57,56,43)
0,23: (40,40,34) #282822 srgb(40,40,34)
0,24: (39,38,35) #272623 srgb(39,38,35)
0,25: (38,37,37) #262525 srgb(38,37,37)
0,26: (40,39,38) #282726 srgb(40,39,38)
0,27: (78,78,57) #4E4E39 srgb(78,78,57)
0,28: (123,117,90) #7B755A srgb(123,117,90)
0,29: (170,156,125) #AA9C7D srgb(170,156,125)
0,30: (168,154,116) #A89A74 srgb(168,154,116)
0,31: (153,146,96) #999260 srgb(153,146,96)
0,32: (156,148,101) #9C9465 srgb(156,148,101)
0,33: (152,141,98) #988D62 srgb(152,141,98)
0,34: (151,139,99) #978B63 srgb(151,139,99)
0,35: (150,139,101) #968B65 srgb(150,139,101)
0,36: (146,135,98) #928762 srgb(146,135,98)
0,37: (145,136,97) #918861 srgb(145,136,97)
0,38: (143,133,94) #8F855E srgb(143,133,94)
0,39: (140,133,92) #8C855C srgb(140,133,92)
0,40: (137,133,92) #89855C srgb(137,133,92)
0,41: (136,133,91) #88855B srgb(136,133,91)
0,42: (131,124,81) #837C51 srgb(131,124,81)
0,43: (130,121,78) #82794E srgb(130,121,78)
0,44: (134,123,78) #867B4E srgb(134,123,78)
0,45: (135,127,78) #877F4E srgb(135,127,78)
0,46: (135,129,79) #87814F srgb(135,129,79)
0,47: (129,125,77) #817D4D srgb(129,125,77)
0,48: (106,105,65) #6A6941 srgb(106,105,65)
0,49: (97,99,60) #61633C srgb(97,99,60)
0,50: (120,121,69) #787945 srgb(120,121,69)
0,51: (111,111,63) #6F6F3F srgb(111,111,63)
0,52: (95,98,55) #5F6237 srgb(95,98,55)
0,53: (110,111,63) #6E6F3F srgb(110,111,63)
0,54: (102,105,60) #66693C srgb(102,105,60)
0,55: (118,120,66) #767842 srgb(118,120,66)
0,56: (124,124,68) #7C7C44 srgb(124,124,68)
0,57: (118,120,65) #767841 srgb(118,120,65)
0,58: (114,116,64) #727440 srgb(114,116,64)
0,59: (113,114,63) #71723F srgb(113,114,63)
0,60: (116,117,64) #747540 srgb(116,117,64)
0,61: (118,118,65) #767641 srgb(118,118,65)
0,62: (118,117,65) #767541 srgb(118,117,65)
0,63: (114,114,62) #72723E srgb(114,114,62)
Statistically, this is the covariance. I would use the red and green channels of a photo to evaluate it, because a Bayer filter has two green sites for every red site and every blue site, so the green channel is averaged across two samples and is therefore the least susceptible to noise. The blue channel is the most susceptible to noise. So the code for measuring the covariance can be written like this:
#!/bin/bash
# Calculate Red-Green covariance of the image supplied as parameter
image=$1
convert "$image" -depth 8 txt: | awk '
/^#/ {next}  # skip the "# ImageMagick pixel enumeration" header line
{
  # $2 looks like "(R,G,B)": strip the "(" and keep R and G
  split($2,a,",")
  sub(/\(/,"",a[1])
  n++
  R[n]=a[1]
  G[n]=a[2]
  # sub(/\)/,"",a[3]); B[n]=a[3]
}
END{
  # Calculate mean of R and G (and B)
  for(i=1;i<=n;i++){
    Rmean+=R[i]
    Gmean+=G[i]
    # Bmean+=B[i]
  }
  Rmean/=n
  Gmean/=n
  # Bmean/=n
  # Calculate Green-Red (and Green-Blue) population covariance
  for(i=1;i<=n;i++){
    GRcov+=(G[i]-Gmean)*(R[i]-Rmean)
    # GBcov+=(G[i]-Gmean)*(B[i]-Bmean)
  }
  GRcov/=n
  # GBcov/=n
  print "Green Red covariance: ",GRcov
  # print "Green Blue covariance: ",GBcov
}'
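One caveat with raw covariance is that it scales with the spread of each channel, so a dull, low-contrast photo can produce a small number even when its channels move together. A common way to normalise this, swapped in here rather than taken from the answer above, is the Pearson correlation coefficient: covariance divided by the product of the two standard deviations, which always lies between -1 and 1. The Ruby sketch below computes both, using population statistics (dividing by N) like the awk script. Returning 0.0 for a perfectly flat channel is my own choice, because the coefficient is undefined there.

```ruby
# Population covariance (divides by N, like the awk script) and the Pearson
# correlation coefficient, which rescales covariance into [-1, 1].
def mean(xs)
  xs.sum.to_f / xs.size
end

def covariance(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } / xs.size
end

def pearson(xs, ys)
  sx = Math.sqrt(covariance(xs, xs))
  sy = Math.sqrt(covariance(ys, ys))
  return 0.0 if sx.zero? || sy.zero? # flat channel: correlation undefined
  covariance(xs, ys) / (sx * sy)
end

# Red and green of the first five rows of the 1x64 landscape dump above.
red   = [168, 171, 174, 176, 179]
green = [199, 201, 202, 204, 205]
puts covariance(green, red) # about 8.08
puts pearson(green, red)    # close to 1: the channels rise together
```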
I did some testing and that also works quite well. However, graphics with big white or black backgrounds also appear well correlated, because red=green=blue on white and black (and on all grey-toned areas), so you would need to be careful with them. That leads to another thought: photos almost never contain pure white or black (unless really poorly exposed), whereas graphics often have white backgrounds, so another test you could use is to count the solid black and white pixels like this:
convert photo.jpg -colorspace gray -depth 8 -format %c histogram:info:- | grep -E "\(0\)|\(255\)"
2: ( 0, 0, 0) #000000 gray(0)
537: (255,255,255) #FFFFFF gray(255)
This one has 2 black and 537 pure white pixels.
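If you want a number rather than eyeballing the grep output, the counts can be turned into the fraction of the image that is pure black or pure white. Here is a minimal Ruby sketch. It assumes each histogram line ends in `gray(N)` as in the sample above; other ImageMagick versions may print the colour differently. The helper names are hypothetical, and the 640x480 size in the usage line is made up.

```ruby
# Count pure-black (gray 0) and pure-white (gray 255) pixels in the text from
#   convert IMAGE -colorspace gray -depth 8 -format %c histogram:info:-
# Assumes each line ends in "gray(N)", as in the sample output above.
HIST_LINE = /^\s*(\d+):.*gray\((\d+)\)/

def extreme_pixel_counts(histogram_text)
  black = 0
  white = 0
  histogram_text.each_line do |line|
    next unless (m = line.match(HIST_LINE))
    count = m[1].to_i
    level = m[2].to_i
    black += count if level.zero?
    white += count if level == 255
  end
  { black: black, white: white }
end

# Fraction of the image that is pure black or white; total_pixels could come
# from: convert IMAGE -format "%w*%h" info: | bc
def extreme_fraction(counts, total_pixels)
  (counts[:black] + counts[:white]).to_f / total_pixels
end

sample = <<~HIST
        2: (  0,  0,  0) #000000 gray(0)
      537: (255,255,255) #FFFFFF gray(255)
HIST

counts = extreme_pixel_counts(sample)  # 2 black, 537 white
puts extreme_fraction(counts, 640 * 480) # for a hypothetical 640x480 image
```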
I should imagine you probably have enough for a decent heuristic now!
Following on from my comment, you can use these ImageMagick commands:
# Get EXIF information
identify -format "%[EXIF*]" image.jpg
# Get number of colours
convert image.jpg -format "%k" info:
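The `%k` escape reports the number of unique colours. A graphic drawn with a handful of flat colours will usually have far fewer unique colours per pixel than a photo. The Ruby sketch below shows that calculation on the `txt:` pixel dump format used in the landscape example earlier. It assumes lines shaped like `0,0: (168,199,235) #A8C7EB srgb(168,199,235)` and skips a header line starting with `#`. The three-pixel dump is made up, and the helper name is hypothetical.

```ruby
# Count distinct colours (what the "%k" escape reports) from the text of
#   convert IMAGE -depth 8 txt:
# and relate that count to the number of pixels.
PIXEL_LINE = /^\d+,\d+:\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)/

def colour_stats(txt_dump)
  pixels = txt_dump.each_line.filter_map do |line|
    m = line.match(PIXEL_LINE)
    m && m.captures.map(&:to_i) # [r, g, b]; header lines do not match
  end
  unique = pixels.uniq.size
  {
    pixels: pixels.size,
    unique_colours: unique,
    colours_per_pixel: pixels.empty? ? 0.0 : unique.to_f / pixels.size
  }
end

# Made-up 3-pixel dump reusing values from the landscape listing.
dump = <<~TXT
  # ImageMagick pixel enumeration: 1,3,255,srgb
  0,0: (168,199,235)  #A8C7EB  srgb(168,199,235)
  0,1: (171,201,236)  #ABC9EC  srgb(171,201,236)
  0,2: (168,199,235)  #A8C7EB  srgb(168,199,235)
TXT

p colour_stats(dump) # 3 pixels, 2 unique colours
```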
Other responders may suggest further parameters, and you can find most of them using:
identify -verbose image.jpg
Compute the entropy of the image. Artificial images usually have much lower entropy than photographs.
I've got a problem converting a pdflatex-generated PDF image to a PNG image using the standalone package.
The pixelated rendering of the text in the converted image (PDF->PS->PNG via gs and ImageMagick?) is awfully blurry and inferior in quality (sharpness, crispness etc.) to the screen-dumped original PDF.
I have checked out these StackExchange posts:
Standalone diagrams with TikZ?
TikZ to non-PDF
and been guided in the setup of my workflow by the standalone package manual. But even after considerable experimental adjustment of the various conversion settings in the code below, I have been unable to improve the quality of the output PNG image.
A sample of the settings I have played with:
density (increase the dpi)
size (increase/decrease the dimensions)
the TikZ picture width/height dimensions (no optimum found, but if too small the PNG image width does not equal that specified among the documentclass parameters)
using the command={} option, I have also played with options such as -quality and -set colorspace RGB (though I didn't really know what I was doing here)
Another approach I have taken is to try to set the TikZ picture width and height dimensions (in cm) in such a way that they agree with the conversion dimensions given among the documentclass parameters (using a dpi + pixels -> cm converter).
None of this worked! So any help in converting from PDF to PNG using the standalone package that preserves the sharpness and crispness of the rendered text in the image would be hugely appreciated.
For reference the versions of the various systems/applications I'm using are:
Windows 7
MiKTeX 2.9
TeXnicCenter
gs 9.09
ImageMagick 6.8.6 Q16 (32-bit)
standalone package installed using MiKTeX package manager late Aug 2013
\documentclass[preview,convert={density=300,size=900x300,outext=.png}]{standalone}
\usepackage{tikz}
\usepackage{pgf}
\usepackage{pgfplots}
\begin{document}
\pgfplotsset{every x tick label/.style={at={(1,0)}, yshift=-0.15cm, xshift=-0.0cm, inner sep=0pt, font=\normalsize}}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-2.1*pi:2.1*pi, samples=1000,
width=30.0cm,
height=10.0cm,
axis x line*=middle,
x axis line style={densely dotted, opacity=0.75},
axis y line*=middle,
y axis line style={densely dotted, opacity=0.75},
ymin=-1.1,
ymax=1.1,
xtick={-6.28318530717959, -5.65486677646163, -5.02654824574367, -4.71238898038469, -4.39822971502571, -3.76991118430775, -3.14159265358979, -2.51327412287183, -1.88495559215388, -1.5707963267949, -1.25663706143592, -0.628318530717959, 0, 0.628318530717959, 1.25663706143592, 1.5707963267949, 1.88495559215388, 2.51327412287183, 3.14159265358979, 3.76991118430775, 4.39822971502571, 4.71238898038469, 5.02654824574367, 5.65486677646163, 6.28318530717959},
xticklabels={$-2\pi$, $-\frac{9\pi}{5}$, $-\frac{8\pi}{5}$, $-\frac{3\pi}{2}$, $-\frac{7\pi}{5}$, $-\frac{6\pi}{5}$, $-\pi$, $-\frac{4\pi}{5}$, $-\frac{3\pi}{5}$, $-\frac{\pi}{2}$, $-\frac{2\pi}{5}$, $-\frac{\pi}{5}$, $0$, $\frac{\pi}{5}$, $\frac{2\pi}{5}$, $\frac{\pi}{2}$, $\frac{3\pi}{5}$, $\frac{4\pi}{5}$, $\pi$, $\frac{6\pi}{5}$, $\frac{7\pi}{5}$, $\frac{3\pi}{2}$, $\frac{8\pi}{5}$, $\frac{9\pi}{5}$, $2\pi$},
ytick=\empty,
enlargelimits=false, clip=true, axis on top]
\addplot [line width=0.5,cyan!50!black] {sin(deg(5*x))*cos(deg(x))};
\end{axis}
\end{tikzpicture}
\end{document}
In order to investigate this problem I first created a PDF from your posted tikz/tex code (after copying it into a tikz.tex file):
pdflatex tikz.tex
pdflatex tikz.tex
The resulting PDF does contain the illustration as a vector graphic, not a raster image. Hence, pdfimages -list will NOT detect it.
Then I tested two ways to convert the resulting PDF file to a PNG:
Using ImageMagick's convert (which employs Ghostscript behind your back as a 'delegate' to process the PDF input)
Using Ghostscript directly
1. Using convert with -density 720
I've used this command to create a PNG from the PDF:
convert -density 720 tikz.pdf tikz1.png
Here is the result:
Why did I use -density 720? Because 720 PPI is the default resolution which Ghostscript uses when creating PDFs (unless you override this default setting by providing your own via -rNxM on the gs command line)...
The resulting image has a size of 374 kB (the PDF had 49 kB) and a width x height dimension of 8060 x 2390 pixels. Any pixelization (which will happen whenever you create a PNG!) is not immediately visible at that resolution.
The runtime for a loop running this command 10 times was 47 seconds.
2. Using Ghostscript directly
To achieve the direct PNG conversion with a Ghostscript command I used:
gs -o tikz-gs.png -sDEVICE=pngalpha -r720 \
-dAlignToPixels=0 -dGridFitTT=2 \
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
tikz.pdf
Here is the resulting PNG:
Its file size is 308 kB, with dimensions of 8060 x 2390 pixels.
The runtime for a loop running this command 10 times was 17 seconds.
Analysis
PDF input file: Sizes of components
After looking at the source code of the PDF file after uncompressing all objects, I came up with the following statistics:
Total size of 5 embedded Type1 fonts................................ 38615 Bytes
Total size of /Contents stream (mainly used by vector drawing)...... 32630 Bytes
Rest of PDF structure ("overhead", if you want)..................... 5827 Bytes
---------------------------------------------------------------------------------
Total size of PDF (after uncompressing objects)..................... 77072 Bytes
The fonts are Type 1 (i.e. PostScript) fonts, according to the output of pdffonts. They are all embedded as subsets:
pdffonts tikz.pdf
name type encoding emb sub uni object ID
-------------------------- ------------ ---------------- --- --- --- ---------
FXXUVH+CMSY10 Type 1 Builtin yes yes no 7 0
BCSIZL+CMR10 Type 1 Builtin yes yes no 8 0
SFJZUV+CMMI10 Type 1 Builtin yes yes no 9 0
WPSSUY+CMR7 Type 1 Builtin yes yes no 10 0
SYHYOI+CMMI7 Type 1 Builtin yes yes no 11 0
Because...
...fonts (unless they are raster fonts) are just another, very efficient way to encode vector shapes, namely the glyphs that depict text characters,
...fonts plus vector drawing make up more than 90% of the total PDF size,
...there is no way in hell to create a PNG raster image from the 49 kB PDF (compressed; 75 kB uncompressed) that is not several times larger than the original PDF file, if you want to avoid directly visible "pixelization" and "blur".
Even if you use a resolution of 720 PPI (which creates a 308 kB-sized PNG), you'll still see pixelization once you start zooming in. Such pixelization does not occur with the PDF (because all its shapes are defined as vectors).
The following three images are screenshots:
top, from the tikz.pdf file at a high zoom level (~1000%),
center, from the tikz.png created with 720 PPI (at a similar zoom level),
bottom, from the tikz72.png created with 72 PPI (at a similar zoom level):
The text sizes used for the annotation of the coordinate axis are only around 10 points. If you rasterize those, you'll get clearly visible pixelization at any resolution below 400 PPI, maybe even above...
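The numbers behind this are simple: 1 pt is 1/72 inch, and pixels = inches × PPI. The Ruby sketch below works them out using the 8060 x 2390 px at 720 PPI figures quoted above. The last part rests on an assumption I have not verified against the standalone source. It assumes that the question's `size=900x300` is handed to ImageMagick as a resize after rendering at `density=300`. If so, the whole page gets squeezed into 900 px of width, which would explain much of the blur.

```ruby
# Rasterisation arithmetic: 1 pt = 1/72 inch, pixels = inches * PPI.
POINTS_PER_INCH = 72.0

# Height in pixels of a font's em square (its nominal size) at a given PPI.
def em_pixels(font_pt, ppi)
  font_pt / POINTS_PER_INCH * ppi
end

# Page size implied by the 8060 x 2390 px rendering at 720 PPI above.
PAGE_W_IN = 8060 / 720.0 # about 11.2 in
PAGE_H_IN = 2390 / 720.0 # about 3.3 in

[72, 300, 720].each do |ppi|
  printf("%3d PPI: 10 pt em = %5.1f px, page = %d x %d px\n",
         ppi, em_pixels(10, ppi), (PAGE_W_IN * ppi).round, (PAGE_H_IN * ppi).round)
end

# Assumption: size=900x300 resizes the 300 PPI rendering to 900 px wide,
# giving an effective resolution of roughly 80 PPI.
effective_ppi = 900 / PAGE_W_IN
puts format("effective resolution after resize: %.0f PPI", effective_ppi)
puts format("10 pt em at that resolution: %.1f px", em_pixels(10, effective_ppi))
```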
My Ghostscript is a self-compiled 9.17 GIT PRERELEASE. My ImageMagick is 6.9.0-0 Q16 x86_64.
There once was an image, possibly with alpha transparency, overlaid onto both a white background and black background. I have access to the two resulting images, but not the original, and I want to retrieve the original.
I have some Ruby code written up to do this, but, I think simply by nature of being in Ruby, it's not as fast as it needs to be. Here's the basic logic, iterating pixel by pixel:
MAX_VALUE = 255  # largest value a channel can have (8-bit)
Pixel = Struct.new(:red, :green, :blue, :alpha)
BLACK       = Pixel.new(0, 0, 0, MAX_VALUE)
WHITE       = Pixel.new(MAX_VALUE, MAX_VALUE, MAX_VALUE, MAX_VALUE)
TRANSPARENT = Pixel.new(0, 0, 0, 0)

def original_pixel(pixel_on_black, pixel_on_white)
  if pixel_on_black == pixel_on_white
    # Matching pixels indicate 100% opacity in the original.
    pixel_on_black
  elsif pixel_on_black == BLACK && pixel_on_white == WHITE
    # Black on black and white on white indicate 0% opacity in the original.
    TRANSPARENT
  else
    # Since it's not one of the simple cases, then we'll do some math.
    # First, find the alpha value. This equation follows from the equations
    # for composing on black (a*c) and on white (a*c + (1 - a)*MAX_VALUE).
    alpha = pixel_on_black.red - pixel_on_white.red + MAX_VALUE
    return TRANSPARENT if alpha <= 0  # guard against dividing by zero
    # Now that we know the alpha value, undo its multiplicative effect on the
    # pixel on black. By dividing. Ta da. (Use a float ratio: Integer
    # division would truncate alpha / MAX_VALUE to 0 or 1.)
    alpha_ratio = alpha.to_f / MAX_VALUE
    Pixel.new(pixel_on_black.red / alpha_ratio,
              pixel_on_black.green / alpha_ratio,
              pixel_on_black.blue / alpha_ratio,
              alpha)
  end
end
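A quick way to convince yourself that the algebra holds is to composite a known pixel onto black and onto white and then run it back through the same formulas. The Ruby sketch below does that for one pixel, per channel, using floats and alpha in 0..MAX_VALUE. The names `over` and `recover` are made up for this illustration. Real 8-bit images are rounded to integers, so recovery there is only approximate.

```ruby
# Round-trip check of the unblending algebra. Compositing colour c with
# alpha a (0..MAX_VALUE) over a background gives
#   (a / MAX) * c + (1 - a / MAX) * background
MAX_VALUE = 255.0

def over(c, a, background)
  ratio = a / MAX_VALUE
  ratio * c + (1 - ratio) * background
end

# Invert the two composites: alpha from the red channel, colour by division.
def recover(on_black, on_white)
  alpha = on_black[0] - on_white[0] + MAX_VALUE
  return [0.0, 0.0, 0.0, 0.0] if alpha <= 0 # fully transparent: colour arbitrary
  ratio = alpha / MAX_VALUE
  on_black.map { |v| v / ratio } + [alpha]
end

original = [200.0, 100.0, 50.0]
alpha    = 128.0
on_black = original.map { |c| over(c, alpha, 0.0) }
on_white = original.map { |c| over(c, alpha, MAX_VALUE) }
p recover(on_black, on_white) # approximately [200.0, 100.0, 50.0, 128.0]
```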
So that's nice, and it works and all. However, this code needs to end up running blazing-fast, and iterating pixels in Ruby is not acceptable. It looks like, unless this function already exists somewhere, it would be in my best interest to come up with a series of ImageMagick options that would do the trick.
I'm researching ImageMagick's command line tool now, because it looks really, really powerful, and it looks like either -fx or a series of fancy -function arguments would do the same thing as my code above. I'll keep trying, too, but are there any ImageMagick pros out there who already know how to put all that together?
EDIT: I have an -fx version running now :)
convert image_on_black.png image_on_white.png -matte -channel alpha -fx "u.r + 1 - v.r" -channel RGB -fx "(u.a == 0) ? 1 : (u / u.a)" output.png
It's almost an exact translation of the original code, broken into channels. First, iterate over the alpha channel and set the correct alpha values. Then iterate over the RGB channels, dividing each channel by the alpha value (unless the alpha is zero, in which case the colour can be set to anything, since dividing by zero throws an error; here I chose 1, for white).
Now time to work on converting these into more explicit arguments, since the -fx expression is reevaluated for each pixel, which isn't great.
Mkay, I surprised myself here, and I think I found an answer. It's four ImageMagick commands; maybe they could be worked into one somehow, though I doubt it.
convert input_on_white.png -channel RGB -negate /tmp/convert_negative.png && \
convert input_on_black.png /tmp/convert_negative.png -alpha Off -compose Plus -composite /tmp/convert_alpha.png && \
composite input_on_black.png /tmp/convert_alpha.png -alpha Off -channel RGB -compose Divide /tmp/convert_division.png && \
convert /tmp/convert_division.png /tmp/convert_alpha.png -compose CopyOpacity -composite plasma_output.png
(Obviously, when done, clean up those temporary files. Also, use an actual tempfile system, rather than using hardcoded paths.)
The strategy is to first create a grayscale image that represents the alpha mask. We'll be emulating the line of code alpha = pixel_on_black.red - pixel_on_white.red + MAX_VALUE, which can be rewritten as alpha = pixel_on_black.red + (MAX_VALUE - pixel_on_white.red).
So, line 1: we create an image that represents the second term of that equation, a negated version of the RGB channels of image-on-white. We save it as a temporary file.
Then, line 2: we want to add that negative image to the image-on-black. Use ImageMagick's Plus composition, and save that as the temporary alpha mask. The result is a grayscale image where white represents areas that should have 100% opacity in the final image, and black represents areas that will later be fully transparent.
Then, line 3: bring the image-on-black back to the original RGB colors. Since the image-on-black was created by multiplying the RGB channels by the alpha ratio, we divide by the alpha mask image to undo that effect.
Finally, line 4: take the color-corrected image from line 3 and apply the alpha mask from line 2, using ImageMagick's CopyOpacity composition function. Ta da!
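To check the four-command pipeline on a few pixels without running ImageMagick, here is a Ruby sketch that models each command as a whole-image, channel-wise operation. It scales values to 0.0..1.0, as ImageMagick's compose maths does. Several details are my assumptions: clamping in Plus and Divide, returning 0 where the mask is 0, and taking the mask's red channel as opacity (all three mask channels should agree in theory). It is a model for checking the idea, not ImageMagick's own implementation.

```ruby
# Per-pixel model of the four-command pipeline. Images are arrays of
# [r, g, b] pixels with channels in 0.0..1.0.

# Line 1: -channel RGB -negate
def negate(img)
  img.map { |px| px.map { |v| 1.0 - v } }
end

# Line 2: -compose Plus (channel-wise sum, clamped to 1.0)
def plus(a, b)
  a.zip(b).map { |pa, pb| pa.zip(pb).map { |x, y| [x + y, 1.0].min } }
end

# Line 3: -compose Divide (colour divided by the mask, channel by channel)
def divide(colour, mask)
  colour.zip(mask).map do |pc, pm|
    pc.zip(pm).map { |c, m| m.zero? ? 0.0 : [c / m, 1.0].min }
  end
end

# Line 4: -compose CopyOpacity (the mask becomes the alpha channel)
def copy_opacity(img, mask)
  img.zip(mask).map { |px, pm| px + [pm[0]] }
end

# Half-transparent pixel, fully transparent pixel, fully opaque pixel.
on_black = [[0.4, 0.2, 0.1], [0.0, 0.0, 0.0], [0.9, 0.5, 0.3]]
on_white = [[0.9, 0.7, 0.6], [1.0, 1.0, 1.0], [0.9, 0.5, 0.3]]

alpha_mask = plus(on_black, negate(on_white))
result     = copy_opacity(divide(on_black, alpha_mask), alpha_mask)
p result # roughly [[0.8, 0.4, 0.2, 0.5], [0.0, 0.0, 0.0, 0.0], [0.9, 0.5, 0.3, 1.0]]
```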
My original strategy took anywhere from 5-10 seconds. This strategy takes less than a second. Much, much, much better.
Unsurprisingly, asking for help is what drove me to find the answer myself. Regardless, I'll leave this question open for 48 hours to see if anyone finds a slightly more optimal solution. Thanks!