GM has a build option called quantum which defines the bit depth to use when reading an image. Building GM with a high quantum means that images of smaller bit depth will take a lot more memory.
What is the quantum here? Can anyone give me some resources about this?
It is a build-time setting, which means you need to recompile GraphicsMagick in order to change it.
If you build with Q8, each of your pixels in an image can have 2^8 unique values, i.e. 256 shades of grey.
If you build with Q16, each pixel can have 2^16 unique values - i.e. 65,536 shades of grey.
So, with a larger quantum setting, on the plus side, you will get smoother gradients and fewer rounding errors, for example. The downside is that your processing may take longer (CPU-dependent) and will need more RAM.
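To make the memory/precision trade-off concrete: when an 8-bit image is loaded into a Q16 build, each 8-bit sample is (typically) widened onto the 16-bit scale, so no precision is gained from the file but every sample occupies twice the storage. A small stand-alone C++ sketch of that widening (illustrative only, not GraphicsMagick source):

#include <cstdint>
#include <cstdio>

// Widen an 8-bit sample to the 16-bit quantum scale: 0x00 -> 0x0000, 0xFF -> 0xFFFF.
static uint16_t scaleCharToQuantum16(uint8_t v) {
    return static_cast<uint16_t>(v * 257u);   // 257 = 0xFFFF / 0xFF
}

int main() {
    for (int v : {0, 1, 128, 255}) {
        unsigned q = scaleCharToQuantum16(static_cast<uint8_t>(v));
        std::printf("8-bit %3d -> 16-bit %5u\n", v, q);
    }
    return 0;
}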
You can check your current setting with:
gm version
Sample Output
GraphicsMagick 1.3.27 Q16 http://www.GraphicsMagick.org/
Copyright (C) 2002-2017 GraphicsMagick Group.
Additional copyrights and licenses apply to this software.
See http://www.GraphicsMagick.org/www/Copyright.html for details.
Feature Support:
Native Thread Safe yes
Large Files (> 32 bit) yes
Large Memory (> 32 bit) yes
BZIP yes
DPS no
FlashPix no
FreeType yes
Ghostscript (Library) no
JBIG no
JPEG-2000 no
JPEG yes
Little CMS no
Loadable Modules yes
OpenMP no
PNG yes
TIFF yes
TRIO no
UMEM no
WebP no
WMF no
X11 no
XML yes
ZLIB yes
Host type: x86_64-apple-darwin17.3.0
Configured using the command:
./configure '--prefix=/usr/local/Cellar/graphicsmagick/1.3.27' '--disable-dependency-tracking' '--enable-shared' '--disable-static' '--with-modules' '--without-lzma' '--disable-openmp' '--with-quantum-depth=16' '--without-gslib' '--with-gs-font-dir=/usr/local/share/ghostscript/fonts' '--without-x' '--without-lcms2' 'CC=clang' 'CXX=clang++'
Final Build Parameters:
CC = clang
CFLAGS = -g -O2 -Wall -D_THREAD_SAFE
CPPFLAGS = -I/usr/local/opt/freetype/include/freetype2 -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/libxml2
CXX = clang++
CXXFLAGS = -D_THREAD_SAFE
LDFLAGS = -L/usr/local/opt/freetype/lib -L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/lib
LIBS = -lfreetype -lbz2 -lz -lltdl -lm -lpthread
The very first line has Q16 in it, meaning my Quantum is 16.
According to this website, quantum is:
The base type used to represent color samples in GraphicsMagick is the
Quantum type. Pixels are represented by a structure of Quantum values.
For example, an RGB pixel contains red, green, and blue quantums,
while an RGBA pixel contains red, green, blue, and opacity quantums.
The maximum value that a Quantum can attain is specified by a constant
value represented by the MaxRGB define, which is itself determined by
the number of bits in a Quantum. The QuantumDepth build option
determines the number of bits in a Quantum.
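As a rough illustration of what that quoted definition means in code, here is a simplified C++ sketch (not GraphicsMagick's actual headers) of how the QuantumDepth build option drives the Quantum type, MaxRGB and the per-pixel storage:

#include <cstdint>

#define QuantumDepth 16                   // build option: 8, 16 or 32

#if QuantumDepth == 8
typedef uint8_t  Quantum;                 // one color sample
#elif QuantumDepth == 16
typedef uint16_t Quantum;
#else
typedef uint32_t Quantum;
#endif

// Largest value a Quantum can hold: 255, 65535 or 4294967295.
#define MaxRGB ((Quantum)((1ULL << QuantumDepth) - 1))

// An RGBA pixel is four quantums, so it takes 4, 8 or 16 bytes depending on the build.
struct PixelPacket {
    Quantum red, green, blue, opacity;
};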
Related
I'm trying to do some FFTs with MKL's ComputeForward method. Sometimes I get bins with zero in both the real and imaginary parts. I.e., I'm doing an FFT of floats over 20480 samples of a 16 kHz tone sampled at 1.024 Msps, thus 50 Hz resolution per bin. Bin 9920, which corresponds to 496 kHz, is 0+0i.
The rest of the 10240 bins seem correct.
I've done the FFT in Octave and that bin should fit without problems in a float.
What can cause this?
NOTE:
Curiously enough, the failing bin is symmetric with respect to the 16 kHz tone, that is, the 16 kHz tone is at bin 320, and 9920 is the 320th bin counting from the right.
Matlab/Octave is not officially supported by Intel MKL, which might be causing the error. This could possibly be solved by using a supported language such as C/C++, DPC++, or Fortran.
https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-system-requirements.html
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-bin-zero-real-and-imaginary-parts-part-II/m-p/1296917#M31708
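For what it's worth, here is a minimal C++ sketch of setting up the same real-to-complex forward transform directly through MKL's DFTI interface (the synthetic 16 kHz tone and the printed bin indices are only illustrative; status checks omitted):

#include <mkl_dfti.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const MKL_LONG N = 20480;                 // samples
    const float fs = 1.024e6f;                // 1.024 Msps -> 50 Hz per bin
    const float f0 = 16000.0f;                // 16 kHz test tone

    std::vector<float> in(N);
    for (MKL_LONG n = 0; n < N; ++n)
        in[n] = std::sin(2.0f * 3.14159265f * f0 * (float)n / fs);

    std::vector<MKL_Complex8> out(N / 2 + 1); // conjugate-even half spectrum

    DFTI_DESCRIPTOR_HANDLE h = NULL;
    DftiCreateDescriptor(&h, DFTI_SINGLE, DFTI_REAL, 1, N);
    DftiSetValue(h, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    DftiSetValue(h, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    DftiCommitDescriptor(h);
    DftiComputeForward(h, in.data(), out.data());
    DftiFreeDescriptor(&h);

    // 50 Hz/bin: bin 320 is the 16 kHz tone, bin 9920 is the one reported as 0+0i.
    const int bins[] = {320, 9920};
    for (int k : bins)
        std::printf("bin %5d: %g %+gi\n", k, out[k].real, out[k].imag);
    return 0;
}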
I am facing this strange issue where I am trying to read the blob of a WebP image through MagickReadImageBlob, and in the next line I just fetch the same blob using MagickGetImageBlob. Strangely, the final blob size comes out smaller. So, can anyone explain this behaviour?
I am using ImageMagick 6.9.8-10 Q16 x86_64 on Ubuntu 16.04.
So, can anyone explain this behaviour?
The MagickReadImageBlob decodes an image-file buffer into a raster of authentic pixels.
The MagickGetImageBlob encodes that raster back into an image-file buffer.
WebP can be either lossy or lossless, and can apply different compression techniques during encoding. It is quite possible that the encoding routine simply found a different way to store the raster than the original file used. Your version of ImageMagick has a quantum depth of 16 (Q16), so decoding/scaling WebP's 24-bit color + 8-bit alpha to Q16 might also introduce some encoding variations. Try setting MagickSetImageDepth(wand, 8) to see if that helps.
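As a minimal sketch of that round trip with the depth forced back to 8 (MagickWand C API as shipped with ImageMagick 6; the file name and WEBP format hint are illustrative, error handling omitted):

#include <wand/MagickWand.h>   // ImageMagick 6.x header layout
#include <cstdio>

int main() {
    MagickWandGenesis();
    MagickWand *wand = NewMagickWand();

    // Load the original file into memory to act as the input blob.
    FILE *f = std::fopen("input.webp", "rb");
    std::fseek(f, 0, SEEK_END);
    long inSize = std::ftell(f);
    std::rewind(f);
    unsigned char *inBlob = new unsigned char[inSize];
    std::fread(inBlob, 1, (size_t)inSize, f);
    std::fclose(f);

    MagickReadImageBlob(wand, inBlob, (size_t)inSize);  // decode into pixels
    MagickSetImageDepth(wand, 8);                       // avoid Q16 re-quantisation effects
    MagickSetImageFormat(wand, "WEBP");                 // re-encode with the same format

    size_t outSize = 0;
    unsigned char *outBlob = MagickGetImageBlob(wand, &outSize);
    std::printf("in: %ld bytes, out: %zu bytes\n", inSize, outSize);

    MagickRelinquishMemory(outBlob);
    delete[] inBlob;
    wand = DestroyMagickWand(wand);
    MagickWandTerminus();
    return 0;
}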
Under Windows, I need to choose between Q8 and Q16. I know that Q8 means 8 bits-per-pixel component (e.g. 8-bit red, 8-bit green, etc.), whereas Q16 means 16 bits-per-pixel component. I also know that Q16 uses twice as much memory as Q8. Therefore, I must choose carefully.
What is a 16 bits-per-pixel component? Does a JPEG image support 16 bits-per-pixel components? Does a picture taken with a smartphone's digital camera have 8 bits-per-pixel or 16 bits-per-pixel components?
I just need to load JPEG images, crop/resize them and save them. I also need to save the pictures in 2 different variants: one with the ICC color profile included and another without any ICC profile (sRGB).
What is a 16 bits-per-pixel component?
Each "channel" (e.g. Red, Green, Blue) can have a value between 0x0000 (no color), and 0xFFFF (full color). This allows greater depth of color, and more precision calculations.
For example. A "RED" pixel displayed with QuantumDepth of 8...
$ convert -size 1x1 xc:red -depth 8 rgb:- | hexdump
0000000 ff 00 00
0000003
The same for a QuantumDepth of 16...
$ convert -size 1x1 xc:red -depth 16 rgb:- | hexdump
0000000 ff ff 00 00 00 00
0000006
And for Q32..? You guessed it.
$ convert -size 1x1 xc:red -depth 32 rgb:- | hexdump
0000000 ff ff ff ff 00 00 00 00 00 00 00 00
000000c
All in all, more memory is allocated to represent each color value. It gets a little more complex with HDRI imaging.
Does a JPEG image support 16 bits-per-pixel components? Are the pictures we take with a smartphone camera 8 bits-per-pixel or 16 bits-per-pixel components?
I believe JPEGs are 8-bit, but I could be wrong here. I do know that most photographers keep all the RAW files from the device because JPEG doesn't preserve all the detail captured by the camera sensor. Here's a great write-up with examples.
I just need to load JPEG images, crop/resize them and save them. I also need to save the pictures in 2 different variants: one with the ICC color profile included and another without any ICC profile (sRGB).
ImageMagick was designed to be the "Swiss Army knife" of encoders & decoders (plus a large number of features). When reading a file, it decodes the format into what it calls "authentic pixels", which are managed internally. The size of this internal storage can be configured at compile time, and for convenience the pre-built binaries are offered as Q8, Q16 and Q32, plus additional HDRI support.
If you're focused on quality, Q16 is a safe option. Q8 will be way faster, but limiting at times.
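For the concrete task in the question (load a JPEG, crop/resize, then save one copy with the embedded ICC profile and one without), a minimal Magick++ sketch might look like the following. File names and geometry values are illustrative, and strip() simply removes all profiles/metadata, which is only appropriate if the pixel data is already effectively sRGB:

#include <Magick++.h>

int main(int argc, char **argv) {
    Magick::InitializeMagick(argv[0]);

    Magick::Image img("input.jpg");               // decode the JPEG
    img.resize(Magick::Geometry("1024x768"));     // resize, aspect ratio preserved
    img.crop(Magick::Geometry(800, 600, 0, 0));   // crop 800x600 from the top-left corner

    img.write("with_profile.jpg");                // any embedded ICC profile is carried over

    img.strip();                                  // drop ICC profile and other metadata
    img.write("no_profile_srgb.jpg");
    return 0;
}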
Also, you can find an answer here (it covers the .NET package, but the meaning is the same): https://github.com/dlemstra/Magick.NET/tree/main/docs#q8-q16-or-q16-hdri
Q8, Q16 or Q16-HDRI?
Versions with Q8 in the name are 8 bits-per-pixel component (e.g.
8-bit red, 8-bit green, etc.), whereas, Q16 are 16 bits-per-pixel
component. A Q16 version permits you to read or write 16-bit images
without losing precision but requires twice as much resources as the
Q8 version. The Q16-HDRI version uses twice the amount of memory as
the Q16. It is more precise because it uses a floating point (32
bits-per-pixel component) and it allows out-of-bound pixels (less than
0 and more than 65535). The Q8 version is the recommended version. If
you need to read/write images with a better quality you should use the
Q16 version instead.
I need to convert many TIFF images to JPEG per second. Currently I'm using libmagick++ (Q16). I'm in the process of compiling ImageMagick Q8, as I read that it may improve performance (especially because I'm only working with 8-bit images).
CImg also looks like a good option, and GraphicsMagick claims to be faster than ImageMagick. I haven't tested either of those yet, but I was wondering if there are any other alternatives that could be faster than ImageMagick Q8?
I'm looking for a Linux only solution.
UPDATE with GraphicsMagick & ImageMagick Q8
Base comparison (see comment to Mark): 0.2 secs with ImageMagick Q16
I successfully compiled GraphicsMagick with Q8, but it turned out to be about 30% slower than ImageMagick (0.3 secs).
After compiling ImageMagick with Q8, there was a gain of about 25% (0.15 secs). Nice :)
UPDATE with VIPS
Thanks to Mark's post, I gave VIPS a try, using the 7.38 version found in the Ubuntu Trusty repositories:
time vips copy input.tiff output.jpg[Q=95]
real 0m0.105s
user 0m0.130s
sys 0m0.038s
Very nice :)
I also tried 7.42 (from ppa:dhor/myway) but it seems slightly slower:
real 0m0.134s
user 0m0.168s
sys 0m0.039s
I will try to compile VIPS from source and see if I can beat that time. Well done Mark!
UPDATE: with VIPS 8.0
Compiled from source, vips-8.0 gets practically the same performance as 7.38:
real 0m0.100s
user 0m0.137s
sys 0m0.031s
Configure command:
./configure CC=c99 CFLAGS=-O2 --without-magick --without-OpenEXR --without-openslide --without-matio --without-cfitsio --without-libwebp --without-pangoft2 --without-zip --without-png --without-python
I have a few thoughts...
Thought 1
If your input images are 15MB and, for argument's sake, your output images are 1MB, you are already using 80MB/s of disk bandwidth to process 5 images a second - which is already around 50% of what a sensible disk might sustain. I would do a little experiment with using a RAMdisk to see if that might help, or an SSD if you have one.
Thought 2
Try experimenting with using VIPS from the command line to convert your images. I benchmarked it like this:
# Create dummy input image with ImageMagick
convert -size 3288x1152! xc:gray +noise gaussian -depth 8 input.tif
# Check it out
ls -lrt
-rw-r--r--# 1 mark staff 11372808 28 May 11:36 input.tif
identify input.tif
input.tif TIFF 3288x1152 3288x1152+0+0 8-bit sRGB 11.37MB 0.000u 0:00.000
Convert to JPEG with ImageMagick
time convert input.tif output.jpg
real 0m0.409s
user 0m0.330s
sys 0m0.046s
Convert to JPEG with VIPS
time vips copy input.tif output.jpg
real 0m0.218s
user 0m0.169s
sys 0m0.036s
Mmm, seems a good bit faster. YMMV of course.
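If you would rather stay in-process (you are currently linking libmagick++), the same conversion can presumably be done through the vips8 C++ API instead of shelling out. A minimal sketch, assuming libvips 8 built with its C++ binding (vips-cpp):

#include <vips/vips8>

int main(int argc, char **argv) {
    if (VIPS_INIT(argv[0]))                 // start libvips
        vips_error_exit(NULL);

    vips::VImage in = vips::VImage::new_from_file("input.tif",
        vips::VImage::option()->set("access", VIPS_ACCESS_SEQUENTIAL)); // stream rather than decode fully up front
    in.write_to_file("output.jpg",
        vips::VImage::option()->set("Q", 95));                          // JPEG quality 95

    vips_shutdown();
    return 0;
}

Build with something like: g++ convert.cpp $(pkg-config vips-cpp --cflags --libs)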
Thought 3
Depending on the result of your test on disk speed, if your disk is not the limiting factor, consider using GNU Parallel to process more than one image at a time if you have a quad core CPU. It is pretty simple to use and I have always had excellent results with it.
For example, here I sequentially process 32 TIFF images created as above:
time for i in {0..31} ; do convert input-$i.tif output-$i.jpg; done
real 0m11.565s
user 0m10.571s
sys 0m0.862s
Now, I do exactly the same with GNU Parallel, doing 16 in parallel at a time
time parallel -j16 convert {} {.}.jpg ::: *tif
real 0m2.458s
user 0m15.773s
sys 0m1.734s
So, that's now 13 images per second, rather than 2.7 per second.
On my system, for a 5 MP image with a large window size (75px) it takes a whopping 140 ms (roughly 20 times as long as linear operations) to complete, and I am looking to optimize it. I have noticed that the OpenCV gpu module does not implement a GPU version of adaptiveThreshold, so I have been thinking of implementing that algorithm for the GPU myself.
Can I hope for any speedup if I implement an adaptive threshold algorithm in CUDA, given a large window size (50px+) and a large image (5 MP+), and ignoring the overhead of loading memory onto the GPU?
adaptiveThreshold documentation on opencv.org:
http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold
Building on Eric's answer:
The NPP CUDA library does not implement adaptiveThreshold, but it can be used to get an adaptive threshold in a VERY straightforward way (just tested it and anecdotally it works):
1. Run a box filter on src (i.e. compute the mean window value for every pixel), storing the result in an intermediate image tmp.
2. Subtract a number K from each pixel in tmp.
3. Run a compare function between src and tmp into dst. The end.
The code may look like this (here K=0, so the 2nd step is omitted):
// step 1: box filter src -> intermediate (per-pixel window mean)
nppiFilterBox_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                     oDeviceIntermediate.data(), oDeviceIntermediate.pitch(),
                     oSizeROI, oAdapThreshWindowSize, oAnchor);
// step 3: result = (src < intermediate), i.e. the thresholded image
nppiCompare_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                   oDeviceIntermediate.data(), oDeviceIntermediate.pitch(),
                   oDeviceResult.data(), oDeviceResult.pitch(),
                   oSizeROI, NPP_CMP_LESS);
Also, Wikipedia claims that applying a box filter 3 times in a row approximates a Gaussian filter to 97% accuracy.
Yes, this algorithm can be optimized on the GPU. I would expect to see an excellent speedup.
For ADAPTIVE_THRESH_MEAN_C, you could use a standard parallel reduction to calculate the arithmetic mean. For ADAPTIVE_THRESH_GAUSSIAN_C, you might use a kernel that performs per-pixel Gaussian attenuation combined with a standard parallel reduction for the sum.
Implementing it in CUDA should give you a satisfying performance gain.
Since your window size is large, this operation should be compute-bound. The theoretical lower bound for a 5 MP image with a 75px window on a Tesla K20X GPU is about
5e6 * 75 * 75 / 3.95 Tflops ≈ 7 ms
Here's a white paper about image convolution. It shows how to implement a high-performance box filter with CUDA.
http://docs.nvidia.com/cuda/samples/3_Imaging/convolutionSeparable/doc/convolutionSeparable.pdf
The Nvidia NPP library also provides a function nppiFilterBox(), which can be used to implement ADAPTIVE_THRESH_MEAN_C directly.
http://docs.nvidia.com/cuda/cuda-samples/index.html#box-filter-with-npp
For ADAPTIVE_THRESH_GAUSSIAN_C, the function nppiFilter() with a proper mask could be used.
NPP doc pp.1009 http://docs.nvidia.com/cuda/pdf/NPP_Library.pdf
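As a rough sketch of that idea: build a small integer Gaussian mask on the host, copy it to device memory and hand it to nppiFilter_8u_C1R; the subtract/compare steps are then the same as in the box-filter snippet earlier. Everything here (the 5x5 mask size, function/parameter names, and the omitted border/ROI handling) is illustrative only:

#include <cuda_runtime.h>
#include <npp.h>

// Per-pixel Gaussian-weighted mean of src, written to dst (device pointers,
// pitched allocations as returned by e.g. nppiMalloc_8u_C1).
void gaussianMean(const Npp8u *pSrcDev, int srcStep,
                  Npp8u *pDstDev, int dstStep, NppiSize roi)
{
    // 5x5 binomial (approximately Gaussian) kernel; coefficients sum to 256.
    const Npp32s hKernel[25] = {
        1,  4,  6,  4, 1,
        4, 16, 24, 16, 4,
        6, 24, 36, 24, 6,
        4, 16, 24, 16, 4,
        1,  4,  6,  4, 1 };

    Npp32s *dKernel = NULL;
    cudaMalloc((void **)&dKernel, sizeof(hKernel));             // NPP expects the kernel in device memory
    cudaMemcpy(dKernel, hKernel, sizeof(hKernel), cudaMemcpyHostToDevice);

    NppiSize  kSize  = { 5, 5 };
    NppiPoint anchor = { 2, 2 };                                // centre of the mask
    nppiFilter_8u_C1R(pSrcDev, srcStep, pDstDev, dstStep, roi,
                      dKernel, kSize, anchor, 256);             // divisor 256 normalises the weights

    cudaFree(dKernel);
}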