DataTables, PDF and special characters - character-encoding

I am using DataTables and the TableTools PDF export function. The PDF export does not handle certain special characters and translates them into rubbish (or ISO equivalents, I guess). The characters are ●, ○, and ٭.
Is there any way to define the character set for the PDF so I can preserve those special characters? (I'm guessing that character set is the problem) Or any other workaround?

No, there isn't a way to configure the character set for the PDF. DataTables, or specifically its TableTools add-on, uses a fairly limited Flash-based PDF exporter.
You can, however, edit the ActionScript used to make the TableTools Flash add-on.
Download TableTools and look in the archive's \media\as3 directory for .as files.
If you don't have Adobe's software for Flash authoring, you might try the open source Adobe Flex.

A late answer (to myself), but others could benefit. I ended up using mPDF instead. It supports UTF-8, languages with special characters, and embedded stylesheets.

Related

Latex generated pdf unreadable

Of late, I have observed that PDFs generated from LaTeX files are unreadable in certain email clients (when previewing the attachment in Outlook) as well as in printed hard copy. Math symbols in particular, such as inner products and integrals, overlap with each other, making the file ugly and unreadable. Surprisingly, the same file looks perfectly fine when viewed in the ShareLaTeX built-in PDF viewer as well as in the desktop version of Adobe Reader.
The ShareLaTeX documentation suggests switching the PDF viewer from built-in to native. After changing to native, even the browser version had unreadable characters.
[https://www.sharelatex.com/learn/Kb/Changing_PDF_viewer]
So, I would like to know if there is a better way to compile the tex file in ShareLaTeX so that it's readable across platforms and in print.
Most of the "pdf generation from tex" related issues posted on StackOverflow point out problems with viewing images. As such the pdf files I am generating don't contain any images.
Thanks in advance !
AFAIK there's not a single built-in PDF viewer (browser, e-mail client, ...) that works well. But what you could test is whether \usepackage{lmodern} makes things better ...

How do I convert HTML into document form? [duplicate]

This question already has answers here:
Convert HTML to word file ?
(2 answers)
Closed 8 years ago.
I'd like to be able to convert HTML to either docx or RTF. There are plenty of Ruby gems for creating docx and RTF docs, but they are just for creating an empty document, which you can then programmatically add stuff to.
The issue with those gems is there is no way to accurately convert the format of a webpage to be the same/similar on a printable page. There are a lot of complexities with HTML tags, and the position of those tags due to their CSS attributes.
With my current knowledge of the gems out there for RTF and Word creation, I'd have to write an HTML parser and convert all the HTML tags to similar OpenXML tags, such as bold and italic, and then position things based on the CSS. But with position: relative/absolute, rendering a document page would be extremely difficult.
I'm wondering if there are any recent developments, or if there is some soon-to-be-released gem or service or tool to be able to handle this conversion.
There is a gem that is supposed to convert Word to and from HTML, but it has no documentation and can only be found at https://www.ruby-toolbox.com/gems/word_parsing and on rubygems. I've also been unsuccessful installing it on my local machine due to dependency issues, and since there is no documentation, there is no mention of how to fix the dependencies.
There are services out there that will convert PDF to "word", and converting HTML to PDF has already been solved by multiple people or gems. This service: http://www.pdftoword.com/ converts PDF to RTF, and even separates out the images in the resulting document. Their issue is that it runs on a Windows server -- I need something cross platform, because the app I'm working on is Ruby on Rails running on Unix based servers.
I've published a little gem that generates docx files from html templates.
https://github.com/docxtor/docxtor
It can insert page numbers, build footers/headers from the contents of given <div>s, and translate <h1> headings to document headings.
The catch is that all word processors parse docx format differently. So the resulting files are read just fine by Libre Office on Mac, but wouldn't open in Google Docs.
Any help and/or feedback on a gem is much appreciated!
I'm also looking for this kind of solution. I think it's worth looking at https://github.com/bagilevi/docx_builder, although I haven't tried it yet. Also read this article: http://rubythings.blogspot.com/2011/05/creating-word-documents-in-rails.html
If someone could come up with a better solution, we all would be thankful :)

Customizable / Dynamic SWF generation

I wondered if anybody knows how customizable Flash SWF files are made: there appears to be a template SWF into which the user can enter some changes (e.g. text or an image), and the user then receives a newly compiled SWF file with those changes.
Some examples:
- http://flashfreezer.com/landingconfetti/index.html
Constraints:
- user receives a single output swf file that can be played with all their changes included. ie there is no reading from an xml file, or using Flashvars.
Been trying different things for a few weeks with no luck!
There are a number of ways, but generally the most common is to either use a SWF generating library (like PHP's) or through server-side compiling.
Normally, this will be a custom or proprietary library which uses the same language that the server is running (and there are open-source libraries for this in PHP, Perl, Python, Java, C++... etc). The SWF is generated and served up with the appropriate headers so that the browser knows how to handle it. Often this will involve a pre-defined template which is then modified slightly for the new input. Only occasionally does this involve manipulating a pre-generated SWF directly.
The other option is to have a command line call to the Flash IDE or the Flex compiler (and, technically, this can work for CS3 and CS4, though in a very nasty and hackish way) to generate a new version of the SWF on the fly. This is often slower, but it will generally yield a more finished feel to a product.
You could try Swiffotron. It can modify SWF files and do text replace type things on both text elements and in compiled actionscript.
Here's a swiffotron xml job file that does some text replacing.
And here's a swiffotron XML job file that modifies instances on the stage.
I didn't check the site, but the only way I can think of is to read the requirement details through flash (this can be done through plain html also) and then generate the AS files from their templates and compile them at the server side (using mxmlc or other compilers) and give back the SWF.
I get the impression that you're looking for SwfMill. SwfMill creates a swf based on an XML file that you create/define. You could use SwfMill on the server to generate a swf based on user input.

Creating Microsoft Word (.docx) documents in Ruby

Is there an easy way to create Word documents (.docx) in a Ruby application? Actually, in my case it's a Rails application served from a Linux server.
A gem similar to Prawn but for DOCX instead of PDF would be great!
As has been noted, there don't appear to be any libraries to manipulate Open XML documents in Ruby, but OpenXML Developer has complete documentation on the format of Open XML documents.
If what you want is to send a copy of a standard document (like a form letter) customized for each user, it should be fairly simple given that a DOCX is a ZIP file that contains various parts in a directory hierarchy. Have a DOCX "template" that contains all the parts and tree structure that you want to send to all users (with no real content), then simply create new (or modify existing) pieces that contain the user-specific content you want and inject it into the ZIP (DOCX file) before sending it to the user.
For example: You could have document-template.xml that contains Dear [USER-PLACEHOLDER]:. When a user requests the document, you replace [USER-PLACEHOLDER] with the user's name, then add the resulting document.xml to the your-template.docx ZIP file (which would contain all the images and other parts you want in the Word document) and send that resulting document to the user.
Note that if you rename a .docx file to .zip it is trivial to explore the structure and format of the parts inside. You can remove or replace images or other parts very easily with any ZIP manipulation tools or programmatically with code.
Generating a brand new Word document with completely custom content from raw XML would be very difficult without access to an API to make the job easier. If you really need to do that, you might consider installing Mono, then use VB.NET, C# or IronRuby to create your Open XML documents using the Open XML Format SDK 1.0. Since you would just be using the Microsoft.Office.DocumentFormat.OpenXml.Packaging Namespace to manipulate Open XML documents, it should work okay in Mono, which seems to support everything the SDK requires.
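The placeholder idea from this answer (swapping [USER-PLACEHOLDER] inside document.xml) can be sketched roughly as below. This is only an illustration: `fill_placeholder` and `xml_escape` are made-up helper names, the XML fragment is a simplified stand-in for a real document.xml, and it assumes Word stored the placeholder as one unbroken piece of text. Putting the result back into the .docx ZIP is not shown, since that needs a ZIP library.

```ruby
# Hypothetical helpers, not part of any gem.
PLACEHOLDER = "[USER-PLACEHOLDER]"
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Escape the characters that would otherwise break the XML part.
def xml_escape(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Replace every occurrence of the placeholder with the escaped user name.
def fill_placeholder(template_xml, user_name)
  safe_name = xml_escape(user_name)
  # Block form: the replacement is inserted literally, backslashes included.
  template_xml.gsub(PLACEHOLDER) { safe_name }
end

# Simplified stand-in for the contents of document-template.xml.
template = "<w:p><w:r><w:t>Dear [USER-PLACEHOLDER]:</w:t></w:r></w:p>"
document_xml = fill_placeholder(template, "Smith & Sons")
puts document_xml
```

Escaping matters here because a name containing & or < would otherwise produce a part that Word refuses to open.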
Maybe this gem is interesting for you.
https://github.com/trade-informatics/caracal/
It's like Prawn, but for docx.
You can use Apache POI. It is written in Java, but integrates with Ruby as an extension.
This is an old question but there's a new answer. If you'd like to turn an HTML doc into a Word (docx) doc, just use the 'htmltoword' gem:
https://github.com/karnov/htmltoword
I'm not sure why there was answer creep and everyone started posting templating solutions, but this answers the OP's question. Just like Prawn, except Word instead of PDF.
UPDATE:
There's also pandoc and an API wrapper for pandoc called docverter. Both have slightly complicated installs, since pandoc is a Haskell library.
I know if you serve a HTML document as a word document with the .doc extension, it will open in Word just fine. Just don't do anything fancy.
Edit: Here is an example using classic ASP. http://www.aspdev.org/asp/asp-export-word/
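For a Ruby app, the same trick might look something like the sketch below. The helper name `html_as_doc` and the file name report.doc are invented for the example, and the inputs are assumed to be trusted HTML; whether a given Word version opens the result cleanly is something you would have to check yourself.

```ruby
require "tmpdir"

# Hypothetical helper: wraps an HTML fragment in a minimal page.
def html_as_doc(body_html, title)
  <<~HTML
    <html>
      <head><meta charset="utf-8"><title>#{title}</title></head>
      <body>#{body_html}</body>
    </html>
  HTML
end

# Write plain HTML to a file whose name ends in .doc.
path = File.join(Dir.tmpdir, "report.doc")
File.write(path, html_as_doc("<h1>Report</h1><p>Hello</p>", "Report"))
puts "wrote #{path}"
```

In a web app you would send the same string as a download with a .doc file name instead of writing it to disk.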
Using a technique very similar to that suggested by Grant Wagner I have created a Ruby html to word gem that should allow you to easily output Word docx files from your ruby app. You can check it out at http://github.com/nickfrandsen/htmltoword - Simply pass it a html string and it will create a corresponding word docx file.
def show
  respond_to do |format|
    format.docx do
      file = Htmltoword::Document.create params[:docx_html_source], "file_name.docx"
      send_file file.path, :disposition => "attachment"
    end
  end
end
Hope you find it useful. If you have any problems with it feel free to open a github issue.
Disclosure: I'm the leader of the docxtemplater project.
I know you're looking for a ruby solution, but because all other solutions only tell you how to do it globally, without giving you a library that does exactly what you want, here's a solution based on JS or NodeJS (works in both)
DocxTemplater Library
Demo of the library
You can also use it in the commandline:
npm install docxtemplater -g
docxtemplater <configFile>
----config.docxFile: The input file in docx format
----config.outputFile: The outputfile of the document
Doccy (doccyapp.com) has an API that does just that, which you can use. It supports docx, odt and pages, and converts to PDF as well if you like.
Further to Grant's answer, you can also send Word a "Flat OPC" file, which is essentially the docx unzipped and concatenated to create a single xml file. This way, you can replace [USER-PLACEHOLDER] in one file and be done with it (ie no zipping or unzipping).
If anyone is still looking at this, this post explains how to use an XML data source. This works nicely for me.
http://seroter.wordpress.com/2009/12/23/populating-word-2007-templates-through-open-xml/
Check out this github repo: https://github.com/jawspeak/ruby-docx-templater
It allows you to create a document from a word template.
If you're running on Windows, of course, it's a matter of WIN32OLE and some pain with the Word COM objects.
Chances are that you're serving from a *nix environment, though. Word 2007 uses the "Microsoft Office Open XML" format (*.docx), which older versions can open using the appropriate compatibility pack from Microsoft.
Some of the more recent Office apps (2002/XP and 2003 at least) had their own XML formats which may also be useable.
I'm not aware of any Ruby tools to make the process easier, sadly.
If it can be made acceptable, I think I'd be inclined to go down the renamed-html file route. I just saved a document as HTML from WordXP, renamed it to a .doc and opened it without problem.
I encountered the same problem. Unfortunately, I could not manipulate the XML because my clients need to fill in the templates themselves, and that is not always possible (for example, Office for Mac does not allow it).
As a solution to this problem, I made a simple gem which lets you use an RTF document as a template with embedded Ruby: https://github.com/eicca/rtf-templater
I tested it and it works OK for filling in reports and documents. However, formatting displays badly for complex loops and conditions.

Search Words in pdf files

Is it possible to search for "words" in PDF files with Delphi?
I have code with which I can search in many other file types (exe, dll, txt), but it doesn't work with PDF files.
It depends on the structure of the specific PDF.
If the PDF is made of images (scanned pages), then you have to OCR each image and build a full-text index inside the PDF. (To see if it's image-based, open it with Notepad and look for obj tags full of random chars.) There are a few utilities and apps that do this kind of work for you; CVision PDF Compressor is one that I have used before.
If the pdf is a standard PDF, then you should be able to open it like any other text file and search for the words.
Here is a page that details some of the structure of a PDF. This is an SO post on the same topic.
The components/libraries mentioned in the answer to this question should do what you need.
I'm just working on a project that does this. The method I use is to convert the PDF file to plain text (with pdftotext.exe) and create an index on the resulting text. We do the same with word and other office files, works pretty good!
Searching directly into pdf files from Delphi (without external app) is more difficult I think. If you find anything, please update here as I would also be very interested in that!
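The "convert to text, then index" approach can be sketched like this (in Ruby rather than Delphi, purely as an illustration). The extraction step itself is left out: the hash below uses made-up file names and hard-coded strings standing in for whatever pdftotext would have produced, and `build_index` and `search` are hypothetical helper names.

```ruby
# Build an inverted index: word => list of documents containing it.
def build_index(texts)
  index = Hash.new { |hash, word| hash[word] = [] }
  texts.each do |name, text|
    # Lowercase, split into alphanumeric words, count each word once per file.
    text.downcase.scan(/[[:alnum:]]+/).uniq.each { |word| index[word] << name }
  end
  index
end

# Case-insensitive lookup; unknown words give an empty list.
def search(index, word)
  index.fetch(word.downcase, [])
end

# Stand-ins for text already extracted from two PDF files.
texts = {
  "a.pdf" => "Invoice for Delphi training",
  "b.pdf" => "Delphi and PDF search"
}
index = build_index(texts)
puts search(index, "delphi").inspect
puts search(index, "PDF").inspect
```

Building the index once and querying it many times is what makes this faster than re-scanning every file on each search.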
One option I have used is to use Microsoft's ifilter technology, this is used by windows desktop search and many other products such as sharepoint and SQL server full-text search.
It supports almost any office/office-like file format, even dwg, msg, pdf, and files in zip/rar archives.
The easiest way to use it is to run FiltDump.exe on any files you have, and index the text output.
To know about the filters installed on your PC, you can use ifilter explorer.
Wikipedia has some links on its ifilters page.
Quick PDF Library's GetPageText function can give you the words from a PDF as well as the page number and the co-ordinates of those words - sometimes useful for highlighting.
PDF is not just a binary representation. Think of it as a tree of objects, where an object node has some metadata and some content information. Some of these objects have string data, some don't. Some of these are even encrypted, and some are compressed. So, there's very little chance your string finder will work on any arbitrary PDF.
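To make the compression point concrete, here is a small sketch using Ruby's standard zlib bindings. It builds a hand-made fragment shaped like a single Flate-compressed stream object (not a complete PDF), then shows that a plain byte search misses a word that is found once the stream is inflated. The helper name `inflate_streams` is invented, and real PDFs add further hurdles this ignores: other filters, encryption, object streams, and text encoded through font tables.

```ruby
require "zlib"

# Stand-in for a page content stream; a real one comes from the PDF writer.
content = "BT /F1 12 Tf (Hello Delphi, hello Delphi search) Tj ET"
compressed = Zlib::Deflate.deflate(content)

# A hand-made fragment shaped like one PDF stream object.
header = "4 0 obj\n<< /Length #{compressed.bytesize} /Filter /FlateDecode >>\nstream\n"
pdf_fragment = header.b + compressed + "\nendstream\nendobj\n".b

# Returns the body of every stream, inflated when it is valid zlib data.
def inflate_streams(data)
  data.scan(/stream\r?\n(.*?)\nendstream/m).flatten.map do |raw|
    begin
      Zlib::Inflate.inflate(raw)
    rescue Zlib::Error
      raw # not Flate-compressed (or damaged): keep the bytes as they are
    end
  end
end

raw_hit = pdf_fragment.include?("Delphi")
inflated_hit = inflate_streams(pdf_fragment).any? { |s| s.include?("Delphi") }
puts "raw search: #{raw_hit}, after inflating: #{inflated_hit}"
```

Even after inflating, text in real content streams is often split into small fragments, which is why a dedicated PDF library or a text extractor is usually the more reliable route.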

Resources