Retaining the colour component when doing OCR using ImageMagick - tesseract - imagemagick

My initial input is a colour, multi-column JPG file. I run ImageMagick on it to create a TIFF file, and tesseract 4.0 then performs OCR on the TIFF and converts it to a PDF with the text in a searchable form.
The problem is that the TIFF output from ImageMagick is monochrome (which it has to be for tesseract to extract the text correctly), so the final PDF is monochrome too, with the highlightable text on top of it. What I am trying to figure out is: is there a way to retain the colour of the original document when ImageMagick converts it to TIFF?
I am running on Ubuntu 14.0
The goal is to start with a coloured JPG image (a book scan, but I don't have control over the scan process, so I always get a JPG) which has text on it, and convert it to a PDF file which looks the same as the JPG but with the text in a searchable/highlightable format.
My ImageMagick command to convert the JPG to TIFF is
convert -density 300 MyImage.jpg -depth 8 -lat 30x30-5% MyImage.tiff
MyImage.tiff is black and white, which works best for Tesseract to do its OCR.
Tesseract command to convert to PDF is
tesseract MyImage.tiff MyImage pdf
But the final PDF will be black and white. What I want is the text overlaid on a colour version of the original JPG.
Tesseract only gives decent results when given a monochrome input TIFF file.

Related

How to get optimal result when converting SVG to PDF with ImageMagick?

In Windows 10, I use ImageMagick-7.0.10-Q16. When I execute this command-line:
magick clocky.svg test.pdf
... then I get a PDF where the image in the PDF is very pixelated:
This is the source SVG image: https://svgshare.com/s/V0f
How can I get an optimal result when converting SVG to PDF without pixelation with ImageMagick?
Isn't there a way to include the SVG natively in the PDF, so that the image in the PDF always looks smooth at all zoom levels? (As Inkscape evidently does when saving an SVG to PDF.)
In ImageMagick, you would specify a density before reading the SVG and then resize afterwards. So
magick -density 288 clocky.svg -resize 25% test.pdf
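Here 288 = 4 * 72, where 72 is the default density, so the SVG is rendered at 4 times the default resolution and we then resize by 1/4 = 25%.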
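The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name `supersample_cmd` and its arguments are hypothetical, and it assumes a base density of 72. It prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: build the "render large, then shrink" command
# for a given supersampling factor, assuming a base density of 72.
supersample_cmd() {
    factor=$1   # e.g. 4 means render at 4x the base density
    input=$2    # e.g. clocky.svg
    output=$3   # e.g. test.pdf
    density=$((72 * factor))
    # Resize percentage that undoes the supersampling (100 / factor).
    percent=$(awk -v f="$factor" 'BEGIN { printf "%g", 100 / f }')
    printf '%s\n' "magick -density $density $input -resize ${percent}% $output"
}

# Print the command for a factor of 4.
supersample_cmd 4 clocky.svg test.pdf
```

With a factor of 4 this reproduces the command given above; a larger factor trades processing time for a smoother raster result.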

imagemagick convert RGB PNG to CMYK PDF

I am trying to create a PDF file using LaTeX. However, LaTeX does not handle TIFF or any other image format capable of both transparency and CMYK. The only solution I can think of is to convert the PNG images to PDF and embed those in the file.
I am somewhat familiar with imagemagick, however, I am having trouble figuring out how to convert a PNG (probably in the RGB/SRGB colour space) to a PDF in the CMYK colour space.
How do I go about doing this conversion so that the colours are correct and the transparency remains?
In ImageMagick, you should use a CMYK profile to do the conversion:
convert input.png -profile USWebCoatedSWOP.icc output.pdf
Note, however, that ImageMagick will simply put the raster image into a vector PDF shell. It will not vectorize the image.

How can I convert .psd files to jpg,pdf and png (RGB/CMYK) using imagemagick

Can I use multiple variations of the ImageMagick command below to convert a PSD to PNG, JPG and PDF with RGB and CMYK settings?
convert -colorspace rgb/cmyk original.psd convert.png/jpg/pdf
I am a newbie trying to play with file formats and I also want to convert the output files based on RGB and CMYK.
Any help would be greatly appreciated.

Ruby + RMagick + base64 image + RGB conversion from GrayScale doesn't work

I'm uploading a base64-encoded image to a RoR application. When I receive the image, it has an RGB color scheme (correct). When I write the image to a file to be uploaded with the paperclip gem, the image's color scheme changes from RGB to grayscale.
Here is the code:
source = src.gsub(/^data:image\/(png|jpg|jpeg);base64,/,"")
blob = Base64.decode64(source)
img = Magick::Image.from_blob(blob).first
img.colorspace = Magick::SRGBColorspace
img.add_profile "#{Rails.root.to_s}/lib/color_profiles/RGB.icc"
img.write(url = "#{Rails.root.to_s}/tmp/#{self.id}_logo.png")
image = File.open(url)
The img object is correctly an RGB image, but if I check the resulting file:
identify -format "%[colorspace]" #{url}
the reported color scheme is Gray.
Additional info:
The uploaded image is all black with white text. If I upload the same image with a red background, the final image is correctly an RGB image.
There seems to be a bug in ImageMagick 6.9.9.27 and 7.0.7.15 when reporting the conversion of a grayscale image to RGB PNG. Identify -verbose is reporting grayscale but the string format %[colorspace] is properly reporting sRGB as are the PNG tags. I have reported this bug. For example:
convert logo: -colorspace gray logo.jpg
convert logo.jpg PNG24:logo.png
convert logo.png -format "%[colorspace]" info:
sRGB
identify -verbose logo.png
...
Colorspace: Gray
...
png:IHDR.color-type-orig: 2
png:IHDR.color_type: 2 (Truecolor)
I do not understand. Is your image a color image or a grayscale only image?
IM 6.7.7.10 was during a time that ImageMagick was changing from non-linear gray to linear gray and back again. And also had RGB and sRGB swapped. So you may have a version where gray was linear (darker than non-linear gray) or where RGB and sRGB were swapped. You can convert back to non-linear using one of the following (I do not recall which to use at this time). The other will convert from linear to non-linear. If I assume your input image was grayscale and not color, then try one of these:
convert input -colorspace RGB -colorspace gray result
or
convert input -colorspace sRGB -colorspace gray result
If it is not grayscale, but color only, then leave off the -colorspace gray in these commands.
I would urge you to upgrade if you can. You are well over 200 versions old.
P.S. It is also possible your profile is causing a problem. I don't know what the RGB.icc profile is. Is that an Adobe RGB profile or an sRGB profile?
Can you reproduce your problem using Command Line ImageMagick? If so, post the command line you used. Sorry I do not know Ruby or RMagick.
P.S. 2 Apart from the lighter/darker issue, if you are trying to convert a grayscale image to color, then you will need to specify the output as PNG24:name.png. That is the only way to force a grayscale image to report colorspace=RGB without inserting color pixels.

Image conversion to Grayscale using ImageMagick is very dark

I converted a bunch of "normal" JPG photos via
convert infile -colorspace Gray outfile
to monochrome. However, the result is very dark for all images. Here is a sample conversion: original photo and converted monochrome image.
Is there a better way to convert a photo-realistic image with ImageMagick to gray-scale?
The documentation states that when changing the color space, the colors are converted from their original gamma to linear before the conversion. You need to convert them back to an appropriate gamma.
convert infile -colorspace Gray -gamma 2.2 outfile
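To see why a gamma of 2.2 brightens a too-dark result, here is a rough numeric sketch for a single pixel value. It assumes that -gamma applies the power law value^(1/gamma) to values normalized to the range 0 to 1; the function name `apply_gamma` is hypothetical and nothing here calls ImageMagick itself.

```shell
#!/bin/sh
# Illustration only: apply value^(1/gamma) to one normalized pixel
# value (0..1). The power-law form is an assumption about how -gamma
# behaves, not code taken from ImageMagick.
apply_gamma() {
    awk -v v="$1" -v g="$2" 'BEGIN { printf "%.3f\n", v ^ (1 / g) }'
}

# A linear mid-gray of 0.5 becomes noticeably brighter after gamma 2.2.
apply_gamma 0.5 2.2
```

Black (0) and white (1) are unchanged by the power law, so only the midtones are lifted, which matches the "very dark" symptom being corrected.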

Resources