ImageMagick: splitting a large PDF into PNGs

I have a PDF that I'd like to split into individual images, one image per page. I am using the following ImageMagick command to do so:
convert -density 400 mypdf.pdf out.png
It works fine, but I tested it on the first 5 pages of my PDF and it took 10 seconds; at this rate it would take about half an hour to split the whole PDF. That seems strange to me, considering that I'm not doing anything fancy: I'm not rotating the images or modifying them in any way. I'd like to know if there is a faster way to do this. Thanks.
Also, I'd like to preserve the quality. I was doing it before without the density flag, but the quality dropped dramatically.

PDF rendering is a bit of a mess.
The best-known system is probably Ghostscript, together with MuPDF, the library-oriented renderer from the same vendor. It's extremely fast and scales well to large documents. Unfortunately the licensing (AGPL) is difficult, so you can't really link to the library directly.
ImageMagick gets around this restriction by shelling out to the ghostscript command-line tool, but of course that means that rendering a page of a PDF is now a many-stage process: the PDF is copied to /tmp, ghostscript is executed with a set of command-line flags to render the document out to an image file in /tmp, this temporary image file is read back in again, a page is extracted and finally the image is written to the output PNG.
On my laptop I see:
$ time convert -density 400 nipguide.pdf[8] x.png
real 0m2.598s
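If the ImageMagick layer isn't adding anything for you, you can also call Ghostscript directly and skip the temporary files. Something along these lines (standard gs flags; the %03d pattern just numbers the output pages) should render every page at 400 dpi:
gs -dBATCH -dNOPAUSE -sDEVICE=png16m -r400 -sOutputFile=out-%03d.png mypdf.pdf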
The other popular PDF renderer is poppler. This came out of the xpdf document previewer project, so it's fast, but is only really happy rendering to RGB. It can struggle on large documents too, and it's GPL, so you can't link to it without also becoming GPL.
libvips links directly to poppler-glib for PDF rendering, so you save some copies. I see:
$ time vips copy nipguide.pdf[page=8,dpi=400] x.png
real 0m0.904s
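The command above renders a single page. Since the question is about splitting the whole document, a rough pyvips sketch along these lines (same filename and dpi as the question, assuming your libvips build has PDF support; n-pages is the page-count metadata the PDF loader exposes) would loop over every page:
import pyvips

# open once just to read the page count (no pixels are rendered yet)
doc = pyvips.Image.new_from_file("mypdf.pdf", dpi=400)
n_pages = doc.get("n-pages")

# render each page to its own PNG
for i in range(n_pages):
    page = pyvips.Image.new_from_file("mypdf.pdf", dpi=400, page=i, access="sequential")
    page.write_to_file("out-%d.png" % i)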
Finally, there's PDFium. This is the PDF render library from Chrome -- it's the old Foxit PDF previewer, rather crudely cut out and made into a library. It's a little slower than poppler, but it has a very generous license, which means you can use it in situations where poppler would just not work.
There's an experimental libvips branch which uses PDFium for PDF rendering. With that, I see:
$ time vips copy nipguide.pdf[page=8,dpi=400] x.png
real 0m1.152s

If you have Python installed, you should try PyMuPDF. It is a Python binding for MuPDF, extremely easy to use and extremely fast (3 times faster than xpdf).
Rendering PDF pages is bread-and-butter business for this package. Use a script like this:
#----------------------------------------------------------------------------------
import sys
import fitz                                    # PyMuPDF

fname = sys.argv[1]                            # get filename from command line
doc = fitz.open(fname)                         # open the file
mat = fitz.Matrix(2, 2)                        # controls resolution: scale factor in x and y direction
for page in doc:
    pix = page.getPixmap(matrix=mat, alpha=False)
    pix.writePNG("p-%i.png" % page.number)     # write the page's image
#----------------------------------------------------------------------------------
More to "Matrix":
This form scales each direction by a factor of 2. So the resulting PNG becomes about 4 times larger than the default version in original, 100% size. Both dimensions can be scaled independently. Rotation or rendering only parts of a page is possible also.
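If you prefer to think in DPI rather than scale factors: PDF pages are defined at 72 dpi, so the zoom factor is simply the target dpi divided by 72. For example, to roughly match the 400-dpi ImageMagick command from the question:
zoom = 400 / 72                 # target dpi divided by the PDF base resolution of 72
mat = fitz.Matrix(zoom, zoom)   # about 400 dpi in both directions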
More on PyMuPDF:
It is available as a binary wheel for Windows, OSX and all Linux versions from PyPI, so installation is a matter of a few seconds. The license for the Python part is GNU GPL 3, and for the MuPDF part GNU AGPL 3, so it's open source and free. Building closed-source commercial products on top of it is excluded, but you can freely distribute under the same licenses.

Related

Change pixel order of a .tiff file from rgbrgb to rrggbb (interleaved to non-interleaved)

I have been trying to figure out a way to create non-interleaved .tiff files, as described here: https://questionsomething.wordpress.com/2012/07/26/databending-using-audacity-effects/ (under the heading of "The photographic base").
It seems like it's a trivial thing using Photoshop, but I'm on Linux and would hate to get myself a copy just for this one option. If anyone knows of a way, be it via ImageMagick, hacking the GIMP or some obscure program, I'd be glad for any suggestions.
In TIFF parlance, you have a file in the contiguous planar configuration and want the separate planar configuration.
The tiffcp utility that comes with LibTIFF can do this for you. Use the -p separate option:
tiffcp -p separate src.tif dest.tif
See the man page.
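If you want to verify the result, the tiffinfo tool (also part of LibTIFF) prints the planar configuration, for example:
tiffinfo dest.tif
The output for the converted file should report a separate planar configuration rather than a single image plane.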

How to convert CorelDraw .WI wavelet-compressed image

I have a large sample of .WI images I need to convert to e.g. JPEGs, but the format now seems defunct.
The mimetype is image/wavelet.
The compression algorithm was developed by Summus, a US company that also now seems defunct.
The last CorelDraw support for the format was under 32-bit Windows. If I go down the hardware route I need to be able to make calls to a server via e.g. REST.
I think under *nix djvulibre might be able to open the files, but I haven't been able to test this yet.
Another option is to re-implement the codec myself.
It would be a nice-have to be able to script this.
Here's an example file http://www.wolfgang-rolke.de/graphics/wavelet.wi

LaTeX-generated PDF unreadable

Of late, I have observed that PDFs generated from LaTeX files are unreadable in certain email clients (when previewing the attachment in Outlook) as well as in the printed hard copy: math symbols like inner products, integrals etc. overlap with each other, making the file ugly and unreadable. Surprisingly, the same file looks perfectly fine when viewed using the ShareLaTeX built-in PDF viewer as well as the desktop version of Adobe Reader.
The ShareLaTeX documentation suggests switching the PDF viewer from built-in to native. Upon changing to native, even the browser version had unreadable characters.
[https://www.sharelatex.com/learn/Kb/Changing_PDF_viewer]
So, I would like to know if there is a better way to compile the tex file in ShareLaTeX so that it's readable across platforms and in print.
Most of the "PDF generation from TeX" issues posted on Stack Overflow point out problems with viewing images, but the PDF files I am generating don't contain any images.
Thanks in advance!
AFAIK there's not a single built-in PDF viewer (browser, e-mail client, ...) that works well. But what you could test is whether \usepackage{lmodern} makes things better ...
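For reference, the usual way to try that is to load lmodern together with the T1 font encoding in the preamble; whether it helps depends on why the symbols are breaking in the first place:
\usepackage[T1]{fontenc}
\usepackage{lmodern}
Latin Modern ships as scalable vector fonts, which weak PDF renderers tend to handle much better than bitmap (Type 3) fonts.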

In Carrierwave, how to compress images for Google PageSpeed

When I use Google PageSpeed, I'm being told I need to compress my images. Example:
Compressing https://xxx.s3.amazonaws.com/xxxx.jpg could save 33.2KiB (66% reduction).
I'm not sure how to make Google happy here. In Carrierwave, I have the following setting:
version :thumb do
  process resize_to_fill: [340, 260]
  process :quality => 86
end
If I set the quality to anything lower than 86, the image doesn't look so good. Is there some other setting/trick I'm missing to compress images in a way that will make Google PageSpeed happy and help my site load fast?
I haven't tried the resize_to_limit helper myself, but it may help you:
process :resize_to_limit => [340, 260]
It will resize the image to fit within the specified dimensions while retaining the original aspect ratio. It will only resize the image if it is larger than the specified dimensions.
There are a couple of ways you can optimize images: desktop and online. For the desktop, I would suggest using the jpegoptim utility to optimize JPEG files.
It provides lossless optimization (based on optimizing the Huffman tables) and "lossy" optimization based on setting a maximum quality factor.
If you are on Linux, install it from your terminal:
sudo apt-get install jpegoptim
Then go to the folder where your image is and first check its size:
du -sh photo.jpg
after that run this command below to optimize it:
jpegoptim photo.jpg
You will see the output.
You can also compress a given image down to a specific target size, but that disables the lossless optimization (see the example below).
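For example, the size-targeting mode looks roughly like this (the 50k target is just an illustration):
jpegoptim --size=50k photo.jpg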
You can also optimize your images in batch with this command:
jpegoptim *.JPG
Another desktop approach is to do basic optimization manually with Photoshop or GIMP: cropping unnecessary space, reducing color depth to the lowest acceptable level, removing image comments, and using the "Save for web" option.
You can use online solutions too. There are plenty of them; I suggest these, for example:
https://tinypng.com
https://kraken.io
There is also the WebP format (developed by Google). Chrome and Opera support it, but Firefox does not, so images basically need to be served conditionally, based on the HTTP Accept header sent by browsers capable of displaying the format. Check the blog post on this if you opt for WebP; there is a gem you can use (Rails 4).
I hope it helps,

Optimize PDF file size in a Rails app

For a Rails app that works a lot with uploaded, image-heavy PDF files, I'm looking for a way to optimize the file size of the uploaded PDFs.
Adobe Acrobat has a 'save as reduced file size PDF' option which often halves the file size when images are included.
I would like to do a similar action that is triggered after a file upload in my rails app.
Any ideas?
While #lzap's comment may be true, if you still want to give it a shot, you might look at pdftk (PDF Toolkit). It's a library for manipulating and creating PDF files that looks like it offers the ability to compress a given PDF file.
The library can be installed on most major operating systems, so if you have the ability to install it on your host, then simply call:
system("pdftk uncompressed-input.pdf output compressed-outpu.pdf compress")
inside your Rails app whenever you want to compress a particular PDF file. I have no idea how long this would take, and if you are compressing many PDFs at the same time, you may want to consider handing the work off to a background job (without this, Rails will wait until the compression is done before returning anything to the browser, probably causing a timeout error for long-running groups of compression calls).
Also, if your file names come from user input, be extra careful to avoid injection attacks.
