Chinese and Japanese character encoding issues when exporting HTML to PDF - ruby-on-rails

I run a web-based timeline maker that lets users create timelines in HTML/JavaScript and then export them to PDF files for printing when they're done.
I have had several users report issues with exporting their timelines to PDFs when the timelines contain certain Unicode characters. Here, for example, is a screenshot showing the web page and the PDF file that is generated:
I've been trying to wrap my head around why some Unicode character blocks like Block Elements and Georgian will export but Chinese and Japanese will not. Also, the export works correctly when I perform it on my local computer, but results in the above output when exporting on Heroku.
Does anyone know what might be causing this?
For completeness, the backend is in Ruby on Rails and it uses the PDFKit gem to convert the HTML page to a PDF and the site is hosted on Heroku.

It sounds like it might be an issue with the fonts on the server. The webpage version of the timeline renders correctly because you obviously have the correct font on the client machine that is running the browser. The PDF on the other hand is generated on the server, and thus has to use a font available to it there.
If that's the case, then using a font that both exists on the server and supports the correct CJK characters should fix this issue.

Having personally experienced this with Rails and Heroku, I can tell you the reason is either (A) the fonts on your system not matching the fonts on Heroku, (B) PDFKit having trouble loading custom fonts linked through CSS, or some combination of both.
Most likely, you are referencing fonts on your local system (which contain glyphs for special characters) that don't match the fonts on Heroku. Run fc-list in Heroku's bash to get a list of their installed fonts, and substitute your font(s) for one that has the needed extended charset. However, now you will have to ensure that this font is also installed on your local machine. (Even worse, you could use different fonts for dev and production.)
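As a rough sketch of that check, assuming fontconfig's fc-list is on the PATH and Python is available wherever you run it (both assumptions; any scripting language would do), the helper below lists the font families that report coverage for a given language code such as ja or zh-cn. The function names are made up for illustration.

```python
import subprocess


def parse_families(fc_list_output):
    """Turn the output of `fc-list ... family` into a sorted list of unique names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font can report several comma-separated (localized) family names.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def fonts_for_language(lang):
    """Ask fontconfig which installed families cover `lang`, e.g. 'ja' or 'zh-cn'."""
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_families(result.stdout)
```

If fonts_for_language("ja") comes back empty on the server but not on your local machine, that would be consistent with the missing glyphs in the exported PDF.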
You can also try uploading fonts to Heroku and linking them from there. However, I've found this method to be unreliable across multiple systems or dev/staging/production environments, because every system has to have the required fonts installed. And even then, PDFKit makes you jump through hoops to get CSS fonts to work (for example, because different operating systems interpret font names in subtly different ways).
The best solution I've found is to encode and embed fonts directly into CSS. Base-64 encode a font, and add it to the stylesheet:
@font-face {
  font-family: 'OpenSans';
  src: url(data:font/truetype;charset=utf-8;base64,AAEAAAATAQA...) format('truetype');
}
Now you have a portable, self-contained stylesheet that does not depend on the fonts installed on any particular system.
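One way to produce such a rule without pasting base64 by hand is a small build-time helper. This is only a sketch in Python; the function name and the font file name in the usage comment are hypothetical, and the MIME type simply mirrors the one used in the rule above.

```python
import base64


def font_face_css(font_bytes, family):
    """Return an @font-face rule with the font embedded as a base64 data URI."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url(data:font/truetype;charset=utf-8;base64,{encoded}) format('truetype');\n"
        "}\n"
    )


# Usage (hypothetical file name):
#   from pathlib import Path
#   css = font_face_css(Path("OpenSans-Regular.ttf").read_bytes(), "OpenSans")
```

Keep in mind that CJK fonts are often several megabytes, so the resulting stylesheet grows by roughly a third more than the font file's size.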

If you use Docker and are having the same issue, try installing Japanese fonts inside the container: apt-get install fonts-takao-mincho
If that works, add it to your Dockerfile:
# Japanese fonts
RUN apt-get update && apt-get install -y \
    fonts-takao-mincho

Related

Latex generated pdf unreadable

Of late, I have observed that PDFs generated from LaTeX files are unreadable in certain email clients (when previewing the attachment in Outlook) as well as in printed hard copy. Math symbols in particular, such as inner products and integrals, overlap with each other, making the file ugly and unreadable. Surprisingly, the same file looks perfectly fine when viewed in the ShareLatex built-in PDF viewer as well as in the desktop version of Adobe Reader.
The ShareLatex documentation suggests switching the PDF viewer from built-in to native. After I changed to native, even the browser version had unreadable characters.
[https://www.sharelatex.com/learn/Kb/Changing_PDF_viewer]
So, I would like to know if there is a better way to compile the tex file in ShareLatex so that it's readable across platforms and in print.
Most of the "pdf generation from tex" related issues posted on StackOverflow point out problems with viewing images. As such the pdf files I am generating don't contain any images.
Thanks in advance !
AFAIK there's not a single built-in PDF viewer (browser, e-mail client, ...) that works well. But what you could test is whether \usepackage{lmodern} makes things better ...

Generating a docx file using Pandoc: images missing! Due to multiple requests?

I'm generating a markdown document using my Rails 4.2 app which includes images that are on the same server (in the public folder).
Using pandoc (pandoc-ruby 1.0.0), I want to convert the document into various formats, especially HTML (to preview it in the browser) and DOCX (to download it).
The preview in the browser works perfectly. But when converting to DOCX, the images aren't included. I guess this is due to multiple requests to the referenced images while pandoc is generating the document.
I have already experimented with setting allow_concurrency to true, but this didn't solve the problem. Also, it happens on both the development and the production environment (while in development, it takes a long time, and in production it doesn't - maybe due to some differences in timeout limits?).
I have already found a workaround: instead of referencing the images by URL, I embed them as base64 strings in the document. But this can't be the solution of choice, as it bloats the HTML document a lot. Also, on production, I already get RuntimeError (Stack space overflow: current size 8388608 bytes) from pretty small embedded images. So I have to find a real solution.
Reference the images by file path instead of url if they are on the same server.
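A minimal sketch of that rewrite, shown in Python for illustration (the question itself uses Ruby, and the same regex approach carries over). It assumes plain Markdown image syntax without titles; the function name, host name, and public directory are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Matches ![alt](target); targets containing spaces or titles are left alone.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def localize_images(markdown, public_dir, own_hosts):
    """Rewrite image URLs that point at our own server into local file paths."""

    def replace(match):
        alt, target = match.group(1), match.group(2)
        parsed = urlparse(target)
        if parsed.scheme in ("http", "https") and parsed.hostname in own_hosts:
            # Map the URL path onto the public folder on disk.
            local = PurePosixPath(public_dir) / unquote(parsed.path).lstrip("/")
            return f"![{alt}]({local})"
        return match.group(0)  # external or already-local image: unchanged

    return IMAGE_PATTERN.sub(replace, markdown)
```

With the images referenced as files, the converter reads them from disk and no longer has to request them from the same server that is busy handling the conversion request.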

Cross platform way to get icon of firefox exe

I'm trying to get the path to the default firefox executable cross platform.
I tried the way recommended here:
https://stackoverflow.com/a/24056586/1828637
However, it's not working on Mac OS or Linux.
On Mac it shows this: http://i.imgur.com/xu5GrF8.png
On Linux (tested on Ubuntu 14) it shows this: http://i.imgur.com/QxWKxbH.png
I was hoping to get the .xpm on Linux, the .icns on Mac OS, and the .ico on Windows. I mean the container file that holds all of the image sizes, not just a .ico of the single 64x64 image.
Thanks
Your title and question ask two different things, which is a bit confusing. I am not clear on if you want just a way to find the Firefox executable, or a way to extract the currently used (or default?) icon from Firefox.
Icon files:
If you are just looking for a URL to use within Firefox, they should be located at:
chrome://branding/content/icon128.png
chrome://branding/content/icon64.png
chrome://branding/content/icon32.png
chrome://branding/content/icon16.png
They do not appear to exist in .ico files within the Firefox distribution. In fact there are only 4 .ico files in the entire distribution. They are all within the browser/omni.ja file at (windows assumed as primary based on your statements in prior questions):
chrome\browser\skin\classic\browser\customizableui\customizeFavicon.ico
chrome\browser\skin\classic\aero\browser\customizableui\customizeFavicon.ico
chrome\browser\skin\classic\browser\preferences\in-content\favicon.ico
chrome\browser\skin\classic\aero\browser\preferences\in-content\favicon.ico
omni.ja files are just zip format files with the extension changed to .ja instead of .zip. You can change the file extension back to .zip and read it with any appropriate archive handler.
The chrome:// URLs are:
chrome://skin/customizableui/customizeFavicon.ico
chrome://skin/preferences/favicon.ico
I think you can only get access to two of them at a time through chrome://skin/, depending on whether you are using aero. If you really need access to both, you could use nsIZipReader to open the actual omni.ja file.
Executable file:
You already had a better way to get the executable file. From your comment it is:
FileUtils.getFile('XREExeF', []);

DataTables, PDF and special characters

I am using DataTables and the TableTools PDF export function. The PDF export does not handle certain special characters and translates them into rubbish (or ISO equivalents, I guess). The characters are '●', '○', and '٭'.
Is there any way to define the character set for the PDF so I can preserve those special characters? (I'm guessing that character set is the problem) Or any other workaround?
No, there isn't a way to configure the character set for the PDF. DataTables, or specifically its TableTools add-on, uses a fairly limited Flash-based PDF exporter.
You can, however, edit the ActionScript used to make the TableTools Flash add-on.
Download TableTools and look in the archive's \media\as3 directory for .as files.
If you don't have Adobe's software for Flash authoring, you might try the open source Adobe Flex.
A late answer (to myself), but others could benefit. I ended up using mPDF instead. It supports UTF-8, languages with special characters, and embedded stylesheets.

How to convert a string to pdf file in Python without using temp txt-file on HDD?

I have a big library in plain txt-format.
I need to convert these files to PDF (from inside a Python script, not from the command line), but first I need to make some changes to the text of the original files.
I read each file's content into a string, make the needed changes, and then want to write the changed string to a PDF file without creating a temporary text file on the HDD.
Is there any way to do that?
Thanks in advance.
P.S. BTW, the library is in Russian, so I suppose I'll need to take care of encodings?
use the ReportLab toolkit: http://www.reportlab.com/software/opensource/rl-toolkit/
(it is also on PyPi: pip install reportlab; or if you are running Linux use the package manager)
The default built-in fonts of PDF do not support Russian, so you will have to do something like:
canvas.setFont('DejaVuSans',10)
(replace 'DejaVuSans' with the name of a font you know has your characters in it; a TrueType font has to be registered with pdfmetrics.registerFont before setFont can use it).
This will incorporate that font in your PDF and make the resulting file about 20K bigger than without.
It is also possible to generate the PDF to memory, if that is necessary.
