In a project we use GIMP to create banners (which are saved in the GIMP native format). All the work is done by humans. But it is often tedious work which consists of replacing one logo with an other logo with the same dimensions or one piece of text with an other piece of text. Sometimes many 100 times.
What is the best way to automatically replace an image (with the same dimensions) or a text in a GIMP file? Does it make more sense to script it within GIMP or is it better to do open heard surgery on the file itself without GIMP? Or is there a command line tool which I can use for this?
Related
I'm working on a college project that involves OCRing a certain digit-code (with a few other characters as seperators - mainly '.','/' etc..) .
that digit code (printed on products for example) is usually in "digital" fonts (e.g. 7-segment-like font, or a pixelated font etc.).
So I am trying to train Tesseract on several digital fonts I've found online, similar to those used with these code.
The thing is, that Tesseract recognizes the tiff files I provide it as blank pages.
Things I've tried:
1. creating a .box file using JTesseract & qt-box (and adjusting the boxes manually) : in this case, the box & tiff are read by Tesseract and I'm getting the output "1 Page", but no characters are recognized and the tr file in blank.
creating a .box file with Tesseract's makebox - in this case no boxes are created at all.
PS - I manage to train it just fine using more traditional fonts (Arial for example)
Any ideas?
Im attaching an image of such an example font.
Thank you!
I managed to work around most of the issues. Posting it in case it could help anyone else:
I did 2 steps to get Tesseract to identify my text:
Image processing on the training images - I've applied some image processing methods (mainly dilate, erode and some blur) to sort of "connect" the pixels in the text that were segmented or separated from one another. Its VERY IMPORTANT to apply the same steps exactly to the images to be fed to the OCR.
I've noticed that simply saving images as TIFF/PNGs via code doesn't save the DPI setting in the header for some reason (and Tesseract identified the as 0 DPI). I assume there's a code-way to do that but I didn't have time, so I just opened the files in Photoshop and saved them from there.
I'm not entirely sure if it was step 1,2 or both that solved my issue, but most characters were eventually identified.
I'm looking for a way to locate known text within an image.
Specifically, I'm trying to create a tool convert a set of scanned pages into PDFs that support searching and copy+paste. I understand how this is usually done: OCR the page, retaining the position of the text, and then add the text as an invisible layer to the PDF. Acrobat has this functionality built in, and tesseract can output hOCR files (containing the recognized text along with its location), which can be used by hocr2pdf to generate a text layer.
Unfortunately, my source images are rather low quality (at most 150 DPI, with plenty of JPEG artifacts, and non-solid backgrounds behind some of the text), leading to pretty poor OCR results. However, I do have the a copy of the text (sans pictures and layout) that appears on each page.
Matching already known text to it's location on a scanned page seems like it would be much easier to do accurately, but I failed to discover any software with this capability built-in. How can I leverage existing software to do this?
Edit: The text varies in size and font, though passages of it are consistent.
The thought that springs to mind for me would be a cross-correlation. So, I would take the list of words that you know occur on the page and render them one at a time onto a canvas to create a picture of that word. You would need to use a similar font and size as the words in the document - which is what I asked in my comment. Then I would run a normalised cross-correlation of the picture of the word against the scanned image to see where it occurs. I would do all that with ImageMagick which is available for Windows and OSX (use homebrew on OS X) and included in most Linux distros.
So, let's take a screengrab of the second paragraph of your question and look for the word pretty - where you mention pretty poor OCR.
First, you need to render the word pretty onto a white background. The command will be something like this:
convert -background white -fill black -font Times -pointsize 14 label:pretty word.png
Result:
Then perform a normalised cross-correlation using Fred Weinhaus's script from here like this:
normcrosscorr -p word.png scan.png correlation-result.png
Match Coords: (504,30) And Score In Range 0 to 1: (0.999803)
and you can see the coordinates of the match are 504,30.
Result:
Another Idea
Another idea might be to take Google's Tesseract-OCR and replace the standard dictionary with the text file containing the words on the page you are processing...
I'm making a GUI for selecting regions to crop from images. I have been using Seesaw and cans select rectangular regions, but cannot find a way to set an image to the background of seesaw.canvas. This suggests using icons on labels. Can I make a label paintable and then use it as a canvas? Is there a way to overlap a label and a canvas or somehow use a panel that gives a background to its contents?
I think Quil has this functionality, but I'm not sure how to build a GUI around its draw, setup, sketch form if I want add widgets.
Existing solutions would appreciated as well, as long as I can decompose them. Using GIMP or Photoshop isn't an option for the workflow I want: multiple crops per photo, of different kinds on each page and different metadata added depending on the type of image outlined. Any suggestions for libraries for working with metadata for photos? I was planning on using a shell interface to exiftool, but a more portable option may be better.
You can draw a java.awt.Image (or sub-class) to a canvas with seesaw.graphics/image-shape:
(require '[seesaw.graphics :as g])
(defn paint-canvas [c g2d]
(g/draw g2d (g/image-shape my-image 0 0) (g/style)))
It seems like that should do it.
Also note that labels (and all Seesaw widgets) are paintable. Just set the :paint option like on a canvas and paint away.
Business Insight:
We are in education domain and we have a requirement to automate the conversion of labeled images (EPS), into interactive exercises
(using HTML/SVG/JavaScript), used by students.
Technical Insight:
Layered EPS files is what we get from the pubishers. The EPS files should be converted into two PNG files: [1.png] Which has label texts only [2.png] Everything else but label texts.
Then [1.png] should be run through some advanced OCR (?) program that should output the label texts along with their positions (X,Y coords) in the image. Then HTML/JavaScript could be used to overlay the label texts over the [2.png] along with some interactions like Drag'n'drop using JavaScript.
Tried so far:
Manually converted the EPS into PNG and used ImageMagick and Tessaract OCR to get the label text alone.
Question:
How far the above requirements of image processing (EPS->PNG+text labels with coords) could be automated and what are the best tools that could be used? Appreciate the help in advance.
PS: I'm an UI developer and could handle the HTML/JavaScript part, if just the coords are provided for the labels.
I am using subfloats to import 2 .png files in a figure to basically generate subfigures. There's no space between the figures when I compile it. How do I put some white space in between them? And is it possible to convert them to black and white using LaTeX?
Try \vskip 1em between the figures. If this doesn't work, post your nonworking figure code so we can see what's going on.
To put white space in between them, you can throw a \quad—or anything that makes whitespace—in between the subfloats within the figure. Converting to grayscale is not in the graphicx package as far as I can see; all the literature on it says to use an external program. My first instinct is to explore TikZ (a LaTeX package) just because it is so powerful, but I have no idea if it can do anything with external graphics. This Page deals with compile time options to specify a grayscale or color version. For example, if you have all the color images in one folder, you batch convert them to grayscale and put them in a separate folder with identical filenames. Then, at compile time, specify from which folder to get the images. See the comments for the graphicspath implementation.