Slow font rendering in ImageMagick

I'm using the following ImageMagick script (with Imagick for PHP) to generate an image of a font. This script takes about 0.1 seconds to generate an image of about 30 characters at size 48. The target speed is about 0.01 seconds. I'm afraid switching to the GD library may be the only way to achieve this (I read here that text generation is much faster in GD). However, without features like gravity and trim, it's much more cumbersome to generate this type of image using GD. Does anyone see an obvious bottleneck in this code, or is it time to switch libraries?
$image = new Imagick();
$draw = new ImagickDraw();
$background = new ImagickPixel('none');
$draw->setFont($font);
$draw->setFontSize($size);
$draw->setFillColor(new ImagickPixel('#'.$color));
$draw->setGravity(Imagick::GRAVITY_CENTER);
$draw->annotation(0, 0, $text);
$image->newImage(5*mb_strlen($text, 'UTF-8')*$size, 5*$size, $background);
$image->setImageFormat('png');
$image->drawImage($draw);
$image->trimImage(0);
$image->writeImage($path_server['dirname'].'/'.$path_server['basename']);

The answer was to switch libraries, but not to GD. Rather, I switched to GraphicsMagick, which is a fork of ImageMagick that focuses on efficiency and optimization. According to the GraphicsMagick website, it's used by some of the world's largest photo sites including Flickr and Etsy. The following GraphicsMagick code runs about 10 times faster than the corresponding ImageMagick code, which allowed me to hit my target of 0.01 seconds per operation (actually it's closer to 0.008 seconds):
$image = new Gmagick();
$draw = new GmagickDraw();
$draw->setfont($font);
$draw->setfontsize($size);
$draw->setfillcolor('#'.$color);
$draw->setgravity(Gmagick::GRAVITY_CENTER);
$draw->annotate(0, 0, mb_ereg_replace('%', '%%', $text));
$image->newimage(5*mb_strlen($text)*$size, 5*$size, 'none', 'png');
$image->drawimage($draw);
$image->trimimage(0);
$image->writeimage($path_server['dirname'].'/'.$path_server['basename']);
You'll notice that there are a few other nice features as well. For example, instead of having to define a color by creating an ImagickPixel object, most functions simply take a color as a string. Also, the function names seem more self-consistent in GraphicsMagick (annotate instead of annotation). Needless to say, I'm pretty happy with it.

Related

Scan video for text string?

My goal is to find the title screen from a movie trailer. I need a service where I can search a video for a string and get back the frame containing that string. Pretty obscure--does anything like this exist?
e.g. for this movie, I'd scan for "Sausage Party" and retrieve the title frame.
Edit: I found the CloudSight API, which would actually work, except the cost is prohibitive at $0.04 per call, assuming I need to split the video into 1-second intervals and scan every image (at least 60 calls per video).
No exact service that I can find, but you could attempt to do this yourself...
ffmpeg -i sausage_party.mp4 -r 1 %04d.png
/usr/local/bin/parallel --no-notice -j 8 \
/usr/local/bin/tesseract -psm 6 -l eng {} {.} \
::: *.png
This extracts one frame per second from the video file and then uses tesseract to OCR each frame into a text file with the same name as the image (e.g. 0135.txt). However, your results are going to vary massively depending on the font used and the quality of the video file.
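To close the loop and actually find the matching frame, you can then just search the OCR output files for your string. A minimal Python sketch, assuming the frames and .txt files produced by the commands above sit in the current directory (the title string is a placeholder):
import glob

target = "sausage party"  # placeholder: the title you're looking for
# With -r 1, each extracted frame corresponds to roughly one second of video.
for txt in sorted(glob.glob("*.txt")):
    with open(txt, errors="ignore") as f:
        if target in f.read().lower():
            print("match in frame", txt.replace(".txt", ".png"))
            break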
You'd probably find it cheaper/easier to use something like Amazon Mechanical Turk, especially since the OCR is going to have a hard time doing this automatically.
Another option could be implementing this service yourself using the Scene Text Detection and Recognition module in OpenCV (docs.opencv.org/3.0-beta/modules/text/doc/text.html). You can take a look at this video to get an idea of how such a system would operate. As pointed out above, the accuracy will depend on the font used in the movie titles, the quality of the video files, and the OCR engine.
OpenCV relies on Tesseract as the underlying OCR, but alternatively you could use the text detection and localization functions (docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html) in OpenCV to find text areas in the image and then employ a different OCR to perform the recognition. The text detection and localization stage can be done very quickly, so achieving real-time performance would mostly be a matter of picking a fast OCR.
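For example, the detection stage alone might look roughly like this in Python--a sketch assuming opencv-contrib is installed and the trained classifier XML files that ship with the opencv_contrib text-module samples are on hand:
import cv2

img = cv2.imread("frame.png")  # placeholder frame from the video
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Two-stage Neumann/Matas ER filters from the OpenCV text module; the
# classifier XML files come with the opencv_contrib sample data.
er1 = cv2.text.createERFilterNM1(
    cv2.text.loadClassifierNM1("trained_classifierNM1.xml"),
    16, 0.00015, 0.13, 0.2, True, 0.1)
er2 = cv2.text.createERFilterNM2(
    cv2.text.loadClassifierNM2("trained_classifierNM2.xml"), 0.5)

# Detect candidate text regions and draw their bounding boxes.
regions = cv2.text.detectRegions(gray, er1, er2)
for pts in regions:
    x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("detected.png", img)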

Tesseract on iOS - bad results

After spending over 10 hours compiling Tesseract against libc++ so it works with OpenCV, I'm having trouble getting any meaningful results. I'm trying to use it for digit recognition; the image data I'm passing in is a small (50x50) square image containing either one digit or none.
I've tried both the eng and equ tessdata (from Google Code); the results differ, but neither reliably guesses the digits. With the eng data I get '4\n\n' or '\n\n' as the result most of the time (even when there's no digit in the image), with confidence anywhere from 1 to 99.
With the equ data I get '\n\n' with confidence 0-4.
I also tried binarizing the image, and the results are more or less the same; I don't think it should be necessary anyway, since the images are already filtered pretty cleanly.
I'm assuming something is wrong, since these images are easy to recognize compared to even the simplest of the example images.
Here's the code:
Initialization:
_tess = new TessBaseAPI();
_tess->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
_tess->SetVariable("tessedit_char_whitelist", "0123456789");
_tess->SetVariable("classify_bln_numeric_mode", "1");
Recognition:
char *text = _tess->TesseractRect(imageData, (int)bytes_per_pixel, (int)bytes_per_line, 0, 0, (int)imageSize.width, (int)imageSize.height);
I'm getting no errors. TESSDATA_PREFIX is set properly and I've tried different methods for recognition. imageData looks ok when inspected.
Here are some sample images:
http://imgur.com/a/Kg8ar
Should this work with the regular training data?
Any help is appreciated; this is my first time trying Tesseract out and I could have missed something.
EDIT:
I've found this:
_tess->SetPageSegMode(PSM_SINGLE_CHAR);
I'm assuming it should be used in this situation; I tried it, but got the same results.
I think Tesseract is a bit overkill for this task. You would be better off with a simple neural network trained explicitly on your images. At my company, we recently tried to use Tesseract on iOS for an OCR task (scanning utility bills with the camera), but it was too slow and inaccurate for our purposes (scanning took more than 30 seconds on an iPhone 4, at a tremendously low FPS). In the end, I trained a neural network specifically for our target font, and this solution not only beat Tesseract (it could scan flawlessly even on an iPhone 3GS) but also a commercial ABBYY OCR engine, from which we had been given a sample.
This course's material would be a good start in machine learning.
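As a rough illustration of how small such a model can be, here is a hypothetical sketch in Python using scikit-learn's built-in 8x8 digits dataset (the dataset, layer size, and split are placeholders, not what we actually shipped); a model like this would be trained offline and then ported to the device:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small labeled set of digit images (8x8 grayscale, flattened to 64 features).
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# One small hidden layer is plenty for clean, single-font digits.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))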

How do I choose a pixel format when creating a new Texture2D?

I'm using the SharpDX Toolkit, and I'm trying to create a Texture2D programmatically, so I can manually specify all the pixel values. And I'm not sure what pixel format to create it with.
SharpDX doesn't even document the toolkit's PixelFormat type (they have documentation for another PixelFormat class but it's for WIC, not the toolkit). I did find the DirectX enum it wraps, DXGI_FORMAT, but its documentation doesn't give any useful guidance on how I would choose a format.
I'm used to plain old 32-bit bitmap formats with 8 bits per color channel plus 8-bit alpha, which is plenty good enough for me. So I'm guessing the simplest choices will be R8G8B8A8 or B8G8R8A8. Does it matter which I choose? Will they both be fully supported on all hardware?
And even once I've chosen one of those, I then need to further specify whether it's SInt, SNorm, Typeless, UInt, UNorm, or UNormSRgb. I don't need the sRGB colorspace. I don't understand what Typeless is supposed to be for. UInt seems like the simplest -- just a plain old unsigned byte -- but it turns out it doesn't work; I don't get an error, but my texture won't draw anything to the screen. UNorm works, but there's nothing in the documentation that explains why UInt doesn't. So now I'm paranoid that UNorm might not work on some other video card.
Here's the code I've got, if anyone wants to see it. Download the SharpDX full package, open the SharpDXToolkitSamples project, go to the SpriteBatchAndFont.WinRTXaml project, open the SpriteBatchAndFontGame class, and add code where indicated:
// Add new field to the class:
private Texture2D _newTexture;
// Add at the end of the LoadContent method:
_newTexture = Texture2D.New(GraphicsDevice, 8, 8, PixelFormat.R8G8B8A8.UNorm);
var colorData = new Color[_newTexture.Width*_newTexture.Height];
_newTexture.GetData(colorData);
for (var i = 0; i < colorData.Length; ++i)
colorData[i] = (i%3 == 0) ? Color.Red : Color.Transparent;
_newTexture.SetData(colorData);
// Add inside the Draw method, just before the call to spriteBatch.End():
spriteBatch.Draw(_newTexture, new Vector2(0, 0), Color.White);
This draws a small rectangle with diagonal lines in the top left of the screen. It works on the laptop I'm testing it on, but I have no idea how to know whether that means it's going to work everywhere, nor do I have any idea whether it's going to be the most performant.
What pixel format should I use to make sure my app will work on all hardware, and to get the best performance?
The formats in the SharpDX Toolkit map to the underlying DirectX/DXGI formats, so you can, as usual with Microsoft products, get your info from the MSDN:
DXGI_FORMAT enumeration (Windows)
32-bit textures are a common choice for most texture scenarios and perform well even on older hardware. UNorm means, as already answered in the comments, that the byte values are presented to the shader as floats in the range 0.0 .. 1.0; it is the usual way to access color data in textures. (UInt formats, by contrast, have to be read as integer types in the shader, which is why sampling one as a float draws nothing.)
If you look at the Hardware Support for Direct3D 10Level9 Formats (Windows) page, you will see that DXGI_FORMAT_R8G8B8A8_UNORM as well as DXGI_FORMAT_B8G8R8A8_UNORM are supported on DirectX 9 hardware, so you will not run into compatibility problems with either of them.
Performance depends on how your device is initialized (RGBA/BGRA?), and on what hardware (i.e. the supported DX feature level) and OS you are running your software on. You will have to run your own tests to find out (though with formats this common and similar, the difference should be a single-digit percentage at most).

Using existing tools, how can I extract into separate images the Luma, Cb, Cr channels of a JPEG image?

I am seeking a method to extract the Luma (Y), Cb (blue-difference chroma), and Cr (red-difference chroma) channels of the following JPEG images into separate images:
Seattle Police Department image #1
Seattle Police Department image #2
Seattle Police Department image #3
I would like results equivalent to this example from Wikipedia.
The output must be calculated directly from the JPEG Start-of-Scan (SOS) data and other data in the JPEG, rather than 'back calculated' from the RGB images output by a decompressor. The purpose of this task is to produce images which represent the 'raw data' as closely as possible.
Are there existing tools which can accomplish this? I am considering throwing together something using Python, PyImage, etc., but I am surprised my search for open-source or free tools has come up empty. I am aware there are many libraries that could help, and I am open to hearing about more.
For this question, the correct answer would be a tool chain of free and/or open-source tools which can do the job. Tools with source are preferred. These tools can run on any platform, but Linux or Win32 would be immediately useful.
Answer inspired by codelogic
Given the libjpeg implementation, change djpeg.c and wrppm.c.
wrppm.c:
189: case JCS_RGB:
190: + case JCS_YCbCr:
191: /* emit header for raw PPM format */
djpeg.c:
560: case FMT_PPM:
561: + cinfo.quantize_colors = 0;
562: + cinfo.out_color_space = JCS_YCbCr;
Obviously, this is a quick hack, since I have a private copy where the PPM output is always forced to YCbCr, but it works, and I thank you, codelogic, for your Stone Code Logic.
As suggested, your best bet would be to use libjpeg directly. Specifically, you might be able to set the jpeg_decompress_struct's out_color_space member to JCS_YCbCr instead of JCS_RGB and read the scanlines as usual. Here's some sample code (GPL).
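If you end up rolling something in Python as mentioned in the question, note that Pillow's JPEG loader (which wraps libjpeg) exposes the same idea: calling draft() before the pixel data is decoded switches libjpeg's output color space, so the channels come straight from the JPEG decode path instead of being back-calculated from RGB. A minimal sketch (file names are placeholders):
from PIL import Image

im = Image.open("spd_image_1.jpg")  # placeholder file name
# Ask libjpeg for raw YCbCr output instead of converting to RGB;
# draft() must be called before the image data is actually loaded.
im.draft("YCbCr", im.size)
y, cb, cr = im.split()
y.save("luma.png")
cb.save("cb.png")
cr.save("cr.png")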
Well, the obvious one is libjpeg.

How to read a bitmap in OCaml?

I want to read a bitmap file from the file system using OCaml and store the pixels (the colors) in an array that has the dimensions of the bitmap, with each pixel taking one cell in the array.
I found the function Graphics.dump_image : image -> color array array, but it doesn't read from a file.
CAMLIMAGE should do it. There is also a Debian package (libcamlimage-ocaml-dev), as well as an installation through GODI, if you use that to manage your OCaml packages.
As a useful example of reading and manipulating images in ocaml, I suggest looking over the code for a seam removal algorithm over at eigenclass.
You can also, as stated by jonathan --but not well--, call C functions from OCaml, such as ImageMagick's. Although you would have to do a lot of manipulation of the image data to bring the image into OCaml, you could always write C for all the functions that manipulate the image as an abstract data type --though this seems to be the complete opposite of what you want: writing most of the program in C, not OCaml.
Since I recently wanted to play around with camlimages (and had some trouble installing it --I had to modify two of the ml files to fix compilation errors, very simple ones though), here is a quick program, black_and_white.ml, and how to compile it. This should get someone painlessly started with the package (especially for dynamic image generation):
let () =
  let width = int_of_string Sys.argv.(1)
  and length = int_of_string Sys.argv.(2)
  and name = Sys.argv.(3)
  and black = {Color.Rgb.r = 0; g = 0; b = 0}
  and white = {Color.Rgb.r = 255; g = 255; b = 255} in
  (* start from an all-black image, then paint the top half white *)
  let image = Rgb24.make width length black in
  for i = 0 to width - 1 do
    for j = 0 to (length / 2) - 1 do
      Rgb24.set image i j white
    done
  done;
  Png.save name [] (Images.Rgb24 image)
And to compile,
ocamlopt.opt -I /usr/local/lib/ocaml/camlimages/ ci_core.cmxa graphics.cmxa ci_graphics.cmxa ci_png.cmxa black_and_white.ml -o black_and_white
And to run,
./black_and_white 20 20 test1.png
I don't know of an out-of-the-box way to do it. You could open the file with open_in and read it a byte at a time with input_char, suck in the header and the data, and build up the color array array that way for simple formats (e.g. BMPs), but for anything like JPEGs or PNGs a roll-your-own solution would probably be more work than you want to get into.
You could also use one of the numerous SDL bindings for OCaml, specifically the SDL_image ones, which let you load all kinds of images easily, and provides functions to access individual pixels and raw data as an array.
OCamlSDL is a popular one.
If you don't want to use CAMLIMAGE, raw RGB or PNM/PPM images (which have an easy-to-create header followed by raw RGB values) are commonly used. ImageMagick lets you view these formats or convert them into more usable ones.
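For reference, binary PPM (P6) really is just a one-line ASCII header followed by raw RGB bytes, which is what makes it so easy to generate from any language. A minimal Python sketch (size and file name are arbitrary placeholders):
# Binary PPM (P6): header "P6\n<width> <height>\n<maxval>\n", then RGB triplets.
width, height = 4, 2
with open("out.ppm", "wb") as f:
    f.write(b"P6\n%d %d\n255\n" % (width, height))
    f.write(bytes([255, 0, 0]) * (width * height))  # solid red image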
