Let's say I have a directory with 5 TIFF files in it and I want to convert some of them to a multipage PDF, but there are other TIFFs in the same directory that I do not want in that PDF.
In other words, I want to convert file1.TIF, file2.TIF, file3.TIF --> foo.pdf, but I want to ignore file4.TIF and file5.TIF located in the same folder.
It would seem from the documentation that the only way to do this is to provide ImageMagick with a text file listing out the files and then point to it when calling the program, as in:
convert @FilesToConvert.txt C:\foo3.pdf
Is there no way to make the call inline though, so that I don't have to create a separate text file for each conversion?
Thanks in advance!
You should be able to use:
convert file1.TIF file2.TIF file3.TIF foo.pdf
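If the set of files is picked by a script rather than typed by hand, the same inline call can be assembled programmatically. The following is only a minimal Ruby sketch: the helper name build_convert_command is hypothetical, the file names are the ones from the question, and it assumes the ImageMagick executable is called convert (ImageMagick 7 installs it as magick).

```ruby
# Build the argument vector for an inline ImageMagick call such as:
#   convert file1.TIF file2.TIF file3.TIF foo.pdf
# `build_convert_command` is a hypothetical helper, not part of ImageMagick.
def build_convert_command(inputs, output)
  raise ArgumentError, "no input files given" if inputs.empty?
  # Inputs come first, in page order; the last argument is the output file.
  ["convert", *inputs, output]
end

cmd = build_convert_command(%w[file1.TIF file2.TIF file3.TIF], "foo.pdf")
puts cmd.join(" ")
```

To actually run the conversion, the array can be handed to system(*cmd). Passing the arguments separately bypasses the shell, so file names containing spaces need no extra quoting.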
I want to write a simple text PDF file, and I use the Synopse PDF library for Delphi.
Is it possible to write one line of text to the file and have a new line inserted automatically, without specifying coordinates?
The easiest way is to use the mORMotReport.pas unit.
It makes it very easy to add text, with automatic line breaks and page layout.
See this sample folder as reference.
I need sample code for converting a DocumentFormat.OpenXml.Presentation.Shape into an HTML SVG element in .NET. Please help me with this.
Thanks in advance.
RagesH.
If the presentation stores its shapes as WMF images, you can pass each image as a file or stream and convert it to SVG. There are tools that convert WMF to SVG.
Refer to this page.
This one is for Java: https://code.google.com/p/wmf2svg/
To use it with .NET, you have to convert the JAR into a DLL and reference that.
For converting a JAR into a DLL, refer to https://code.google.com/p/jar2ikvmc/
This works for me.
Hope this helps!
Org-mode is a powerful editor. I use it to write scientific notes and export them to TeX/PDF files.
When I export an .org file to a .tex or .pdf file, org-mode generates some code automatically, such as:
\hypersetup{
pdfkeywords={},
pdfsubject={},
pdfcreator={Emacs Org-mode version 7.8.11}}
That code puts some information on the first page of the PDF file, which is useless to me.
How can I prevent that code from appearing in the generated .tex file and its PDF?
I use this in my init file:
(setq org-latex-with-hyperref nil)
Then you have to put your own \hypersetup wherever you want it.
I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX).
So far I've found the rather old and simple PDF-toolkit (a pdftotext wrapper) and PDF-reader, which was unable to read most of my files, even though both libraries provide exactly the functionality I was looking for.
My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem?
You might find Docsplit useful:
Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)
After trying different methods, I'm using PDF-Toolkit now. It's quite old, but it's fast, stable and reliable. Besides, it really doesn't need to be new, because it just wraps the xpdf commandline utilities.
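To illustrate the wrapping approach described above, here is a minimal Ruby sketch that shells out to pdftotext directly. It is not PDF-Toolkit's API: the helper names pdftotext_command and extract_text are hypothetical, and the sketch assumes the pdftotext binary (from xpdf or poppler) is on the PATH.

```ruby
require "open3"

# Build the pdftotext argument vector. Using "-" as the output name
# asks pdftotext to write the extracted text to standard output.
# Both helper names here are hypothetical, chosen for illustration.
def pdftotext_command(pdf_path, first_page: nil, last_page: nil)
  cmd = ["pdftotext", "-enc", "UTF-8"]
  cmd += ["-f", first_page.to_s] if first_page  # first page to convert
  cmd += ["-l", last_page.to_s] if last_page    # last page to convert
  cmd + [pdf_path, "-"]
end

# Run pdftotext and return the extracted text, raising if it exits non-zero.
def extract_text(pdf_path, **page_range)
  out, err, status = Open3.capture3(*pdftotext_command(pdf_path, **page_range))
  raise "pdftotext failed: #{err}" unless status.success?
  out
end
```

For a large file, calling extract_text("big.pdf", first_page: 1, last_page: 10) (with a hypothetical file name) limits the work to the pages that are actually needed.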
You could use JRuby and a Java PDF parsing library such as Apache PDFBox (https://www.ohloh.net/p/pdfbox). See also http://java-source.net/open-source/pdf-libraries.
Did you have a look at the CombinePDF library?
It's a pure Ruby solution that allows some PDF manipulation, such as extracting pages, overlaying one PDF page over another, page numbering, writing basic text and tables, etc.
Here's an example of stamping an existing PDF file with a logo. The example reads a PDF file, extracts one page to use as a stamp, and stamps another PDF file with it.
require 'combine_pdf'
company_logo = CombinePDF.load("company_logo.pdf").pages[0]
pdf = CombinePDF.load "content_file.pdf"
pdf.pages.each {|page| page << company_logo}
pdf.save "content_with_logo.pdf"
You can also stamp text, number pages or write simple tables:
require 'combine_pdf'
pdf = CombinePDF.load "content_file.pdf"
pdf.number_pages #adds page numbers. you can add formatting and placement options.
pdf.pages.each {|page| page.textbox "One Way To Stamp"}
# you can use a shortcut method to stamp pages
pdf.stamp_pages "Another way to stamp"
#you can use the shortcut method for both text and PDF stamps
company_logo = CombinePDF.load("company_logo.pdf").pages[0]
pdf.stamp_pages company_logo
# you can write simple tables
pdf.pages[0].write_table headers: ['first name', 'surname'], table_data: [['John', 'Doe'], ['Mr.', 'Smith']]
pdf.save "content_with_logo.pdf"
It's not meant for complex operations, but it complements most PDF authoring libraries and allows you to use PDF templates instead of writing the whole thing from scratch.
Here are some options:
http://en.wikipedia.org/wiki/List_of_PDF_software
From that link, and from searching SourceForge, there are a couple of command-line utilities that might do what you want, like this one: http://pdftohtml.sourceforge.net/
Depending on your requirements and what the PDFs look like, you could look at using the Google Docs API (uploading the PDF and then downloading it as text), or could also try something like gocr. I've had a lot of luck parsing image text with gocr in the past, and you'd just have to bounce out to the shell to do it, like gocr -i whatever.pdf (I think it works with PDFs).
The downside to all of these is that they're not pure-Ruby implementations, but lots of the good (and free) OCR projects seem to be done that way.
If you just need to get the text content out of a PDF file, pdftohtml on SourceForge is efficient.
It is not suited to dealing with images.