Alfresco Transformer - Ubuntu Script Multi page PDF to OCR

I am having a problem when calling a script to do a transformation. I have a script on Ubuntu that splits a multipage PDF into single-page PDF files, converts each page to TIFF with convert (from ImageMagick), generates HTML with Tesseract OCR, converts that back to a PDF with a text layer, and merges everything back into a single PDF with a text layer.
The script works fine in the console, but under Alfresco, because of different environment variables in the PATH, it uses a different convert (/opt/alfresco-3.4.d/common/bin/convert) instead of /usr/bin/convert. The result is a PDF 1.3 file instead of a TIFF, so Tesseract does nothing. The servlet container is Tomcat. I tried copying /usr/bin/convert to the Catalina home and to the Alfresco common directory, renaming convert to conv and calling that, and so on, but nothing changed.
How can I tell Alfresco to use the right convert instead of its own /opt/alfresco-3.4.d/common/bin/convert?
Thanks
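
One way to sidestep the PATH difference described above, offered as a sketch rather than a confirmed fix for this Alfresco setup: call the system binary by its absolute path, or hand the child process a PATH that lists /usr/bin first. The helper names below (env_with_path_first, resolve_tool) are made up for illustration, and the idea carries over to a shell script by writing /usr/bin/convert explicitly.

```python
import os
import shutil


def env_with_path_first(directory, env=None):
    """Return a copy of env whose PATH searches `directory` before anything else."""
    env = dict(os.environ if env is None else env)
    # Drop empty entries and any existing copy of `directory`, then put it first.
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env


def resolve_tool(name, env):
    """Return the absolute path that `name` resolves to under env's PATH, or None."""
    return shutil.which(name, path=env["PATH"])


if __name__ == "__main__":
    env = env_with_path_first("/usr/bin")
    # Shows which convert a child process started with this env would pick up.
    print(resolve_tool("convert", env))
```

Passing the returned dictionary as env= to subprocess.run, together with the resolved absolute path as the first argument, keeps the choice of binary independent of whatever PATH Tomcat was started with.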

Related

imagemagick convert pdf with text(no scan) to gibberish

I am converting PDF files to images with ImageMagick, but this particular file comes out as gibberish.
To keep it simple, I am just running
convert file.pdf out.jpg
My guess is that the file mixes text PDF pages and image PDF pages, and that this causes the trouble. Can you help?
The pages of the document that are text are converted to gibberish; the last page, which is actually a scan, is fine.
This is the link to the original file
EDIT: I found that files without a mix of text and scans also cause issues, namely files that contain text data rather than scanned images. So the question is how to set up ImageMagick to convert a PDF with pure text to an image without getting this output.

The problem was with Ghostscript 9.22; updating to 9.23 fixed it.
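
To check whether a machine has the version reported above, something like the following might help. It is a minimal sketch that assumes the Ghostscript executable is called gs and prints its version number for gs --version; the function names are hypothetical, and only 9.22 is flagged because that is the only version this answer reports as bad.

```python
import re
import subprocess


def parse_version(text):
    """Turn a string such as '9.22' into a comparable tuple (9, 22)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_reported_bad(version_text, bad=(9, 22)):
    """True if the version equals the one reported as producing gibberish."""
    return parse_version(version_text) == bad


def installed_ghostscript_version(gs="gs"):
    """Ask the local Ghostscript for its version string (assumes `gs` is on PATH)."""
    result = subprocess.run([gs, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_ghostscript_version()
    print(version, "-> reported bad" if is_reported_bad(version) else "-> not the reported version")
```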

Ruby on Rails code for converting TeX to PNG on the server side

I am creating a Ruby on Rails app very similar to Wikipedia. In Wikipedia you can add math equations to a page using TeX/LaTeX. When you preview the page, these TeX equations are converted to PNG images on the server side and the images are embedded in the text. Do you know of any code that I can use to do that in Rails? There is a gem, rails-latex, but it doesn't do that. As far as I know, MathJax doesn't do that either.

MediaWiki uses Texvc to create PNG images from TeX; see http://www.mediawiki.org/wiki/Texvc
Texvc is an external program, so you have to start it via system() and save the result in a special folder.
For example like this:
system('texvc /home/wiki/tmp /home/wiki/math "y=x+2" iso-8859-1 "rgb 1.0 1.0 1.0"')

How to convert a string to pdf file in Python without using temp txt-file on HDD?

I have a big library in plain .txt format.
I need to convert these files to PDF (from inside a Python script, not from the command line), but first I need to manipulate the text of the original files.
I read each file's content into a string and make the needed changes; then I want to write the changed string to a PDF file without creating a temporary text file on the HDD.
Is there any way to do that?
Thanks in advance.
P.S. BTW, the library is in Russian, so I suppose I'll need to take care of encodings?

Use the ReportLab toolkit: http://www.reportlab.com/software/opensource/rl-toolkit/
(It is also on PyPI: pip install reportlab; or, if you are running Linux, use the package manager.)
The default built-in PDF fonts do not support Russian, so you will have to do something like:
canvas.setFont('DejaVuSans',10)
(replace 'DejaVuSans' with the name of an installed font that you know contains your characters).
This embeds that font in your PDF and makes the resulting file about 20K bigger than it would otherwise be.
It is also possible to generate the PDF in memory, if that is necessary.
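
A minimal sketch of the in-memory route, assuming ReportLab is installed: the canvas writes into an io.BytesIO buffer instead of a file, and a TrueType font is registered before setFont is called. The function name text_to_pdf_bytes, the margins, and the font path in the usage note are illustrative assumptions, not part of ReportLab or of the question.

```python
import io


def text_to_pdf_bytes(text, font_path=None, font_name="DejaVuSans", font_size=10):
    """Render plain text to a PDF held entirely in memory and return its bytes."""
    # Imported here so the sketch can be loaded even where ReportLab is missing.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfgen import canvas

    buffer = io.BytesIO()  # stands in for a file on disk
    pdf = canvas.Canvas(buffer, pagesize=A4)

    if font_path is not None:
        # Register and embed a TrueType font that contains the needed glyphs.
        pdfmetrics.registerFont(TTFont(font_name, font_path))
        active_font = font_name
    else:
        active_font = "Helvetica"  # built-in font, no Cyrillic support

    width, height = A4
    margin = 50
    leading = font_size * 1.4
    y = height - margin
    pdf.setFont(active_font, font_size)
    for line in text.splitlines():
        if y < margin:  # page is full: start a new one
            pdf.showPage()
            pdf.setFont(active_font, font_size)  # font state resets on each page
            y = height - margin
        pdf.drawString(margin, y, line)  # no wrapping: long lines run off the page
        y -= leading
    pdf.save()
    return buffer.getvalue()
```

For Russian text, a call might look like text_to_pdf_bytes(changed_text, font_path="/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"), where that path is only a guess at where a Linux package installs the font. As for encodings, decoding the source files into ordinary Python strings before drawing should be enough, since drawString accepts Unicode text.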

ImageMagick create PDF version 1.4 from image?

I know that I can use ImageMagick's convert tool to turn different image files into PDF documents. However, is there some way to specify what version of PDF document I want to use for the output? Can I convert an image to a PDF v1.4 document?
I am trying to find a way to automate the conversion of image files (probably SVG) to PDF files that need to be sent to a printing service. The printer's service requires the PDF files to meet certain requirements, and one of them is that the PDF file is v1.4. My version of convert is "6.5.7-8 2010-12-02 Q16".
Thanks,
Carl

This question on superuser.com
https://superuser.com/questions/193791/batch-convert-pdf-versions
will give you some hints on how to change the version number in the PDF afterwards.
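
One common way to do that afterwards is to rewrite the file with Ghostscript's pdfwrite device and its -dCompatibilityLevel option. The sketch below only builds that command and reads back the version declared in a file's header; it assumes gs is installed, and the function names and file names are hypothetical.

```python
import re


def pdf_header_version(data):
    """Return the version declared in a PDF header, e.g. '1.4', or None."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


def ghostscript_rewrite_command(src, dst, level="1.4"):
    """Build a Ghostscript command that rewrites src to dst at the given PDF level."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={level}",
        "-dNOPAUSE",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    # in.pdf and out.pdf are placeholder names for the convert output and the result.
    print(" ".join(ghostscript_rewrite_command("in.pdf", "out.pdf")))
```

The command list can be passed to subprocess.run, and pdf_header_version can then be applied to the first bytes of the output to confirm what the header declares. It looks only at the header line, so it is a quick sanity check rather than a full validation against the printer's requirements.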

SVG files in Raphael, can they be used?

I have an SVG file that I would like to display via Raphael (each svg file is a node in a tree I'm trying to draw, the actual connections of the tree will be made by raphael). I tried something like:
var vector_image = paper.image("test.svg", 50,50,50,50);
but no dice, seems only "real" image files like png or jpeg are accepted? I find this very strange as Raphael itself uses Scalable Vector Graphics.
Is there any way (short of parsing the SVG files into JavaScript snippets and pasting them into the HTML document) to display existing SVG files using Raphael (or any other vector-based JavaScript graphics engine)?
If parsing it will have to be, is there any easy way to do this, short of just manually scraping the files? I'm running this code on a Ruby on Rails server, so I'd like to avoid solutions outside this framework, if possible (I've heard of one PHP solution through this site...I'd rather code by hand than add another language onto this project).
-Jenny

It's currently not possible to display existing SVG with Raphael, and there are apparently no plans for the implementation of SVG editing (see this forum post).
As for alternative JavaScript libraries, a newer alternative is Snap.svg, which can load external SVG files via its Snap.load() function.
