wicked_pdf shows unknown character on unicode pdf conversion (ruby)


I'm trying to create a pdf from a html page using wicked_pdf (version 1.1) and wkhtmltopdf-binary gems.
My html page contains a calendar emoji that displays well in the browser whatever font I use
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv='content-type' content='text/html; charset=utf-8' />
<style>
unicode {
font-family: 'OpenSansEmoji', sans-serif;
}
@font-face {
font-family: 'OpenSansEmoji';
src: url(data:font/truetype;charset=utf-8;base64,<-- encoded_font_base64_string-->) format('truetype');
}
</style>
</head>
<body>
<div><unicode>📅</unicode></div>
</body>
</html>
However, when I try to generate the PDF using the WickedPdf.new.pdf_from_html_file method of the gem in the rails console,
File.open(File.expand_path('~/<--pdf_filename-->.pdf'), 'wb+') {|f| f.write WickedPdf.new.pdf_from_html_file('<--absolute_path_of_html_file-->')}
I get the following result:
PDF result with unknown character
As you can see, the first calendar icon is displayed properly, but it is followed by a second, unknown character, and we do not know where it comes from.
I have looked into UTF-8 and UTF-16 encoding and surrogate pairs, as suggested by this related post (stackoverflow_emoji_wkhtmltopdf), and at this issue (wkhtmltopdf_git_issue), but I still can't make this character disappear!
If you have any clue, it's more than welcome.
Thanks in advance for your help!
EDIT
Following the comments from Eric Duminil and petkov.np, I can confirm that the code above works properly for me on Linux. This seems to be a Linux vs macOS issue. Can anyone suggest what the core of the issue is in the macOS binding and whether it can be fixed?

I've edited this answer several times, please see the notes at the end as well as the comments.
I'm using macOS 10.12.2 and have the same issue. I'm listing all the browser etc. versions, although I suspect the biggest factor is the OS/wkhtmltopdf build.
Chrome: Version 55.0.2883.95 (64-bit)
Safari: Version 10.0.2 (12602.3.12.0.1)
wkhtmltopdf: 0.12.3 (with patched qt)
I'm using the following example snippet:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8">
<style type="text/css">
p {
font-family: 'EmojiSymbols', sans-serif;
}
@font-face {
font-family: 'EmojiSymbols';
src: local('EmojiSymbols-Regular.woff'), url('EmojiSymbols-Regular.woff') format('woff');
}
span:before {
content: '\01F60B';
}
</style>
</head>
<body>
<p>
😋
<span></span>
😋
😋
ð
</p>
</body>
</html>
I'm calling wkhtmltopdf with the --encoding 'UTF-8' option.
You can see the rendered result here (I'm sorry for the lame screenshot). Some brief conclusions:
Safari doesn't render the 'raw' UTF-8 bytes properly. It seems to treat them just as the raw byte sequence (last line in the html paragraph).
Safari renders everything fine.
Chrome renders everything fine.
With the above option, wkhtmltopdf renders the raw bytes (sort of) ok, but doesn't render the CSS content attribute properly. Every 'proper' occurrence of the unicode symbol is followed by this strange phantom symbol.
I've tried literally everything but the results are the same. For me, the fact that even Safari doesn't render the raw bytes properly indicates some system-level problem that is macOS specific. It's unclear to me whether this should be reported as a wkhtmltopdf issue or whether there is some misbehaving dependency in the macOS build.
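To make the "raw bytes" line in the snippet concrete, here is a minimal Ruby sketch of how the four UTF-8 bytes of the emoji turn into "ð" plus invisible control characters when they are read as Latin-1. It assumes the emoji in the snippet is U+1F60B, which is what the CSS escape '\01F60B' suggests; the variable names are only for illustration.

```ruby
# The emoji from the snippet, assumed to be U+1F60B.
emoji = "\u{1F60B}"

# Its UTF-8 encoding is four bytes long.
utf8_bytes = emoji.bytes

# Reinterpret those same four bytes as ISO-8859-1 (Latin-1), one character per byte,
# then convert back to UTF-8 so the result can be inspected.
misread = emoji.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")
puts misread.codepoints.inspect
puts misread[0]
```

The first byte, 0xF0, is "ð" in Latin-1; the remaining three bytes land in the C1 control range, which is why only "ð" is visible.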
EDIT: Safari seems to work fine, my markup was broken.
EDIT: A CSS workaround may do the trick, please check the comments below.
FINAL EDIT: As shown in the comments, the CSS 'hack' that solves the issue is text-rendering: optimizeLegibility;. It seems to be needed only on macOS/OS X.
From my comment below:
I just found this issue. It seems irrelevant at first glance, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. As the issue author also uses OS X, it's apparent there is some problem with wkhtmltopdf builds for this OS.
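For the Rails side, a minimal sketch of applying that workaround before the HTML is handed to wicked_pdf. The helper name with_legibility_style and the choice of the body selector are assumptions for illustration, not part of the gem; the sketch only edits the HTML string, and the actual PDF call is left as a comment because it needs the gem and the wkhtmltopdf binary.

```ruby
# Hypothetical helper (not part of wicked_pdf): injects the text-rendering
# workaround into an HTML string before it is converted to PDF.
LEGIBILITY_CSS = "<style>body { text-rendering: optimizeLegibility; }</style>"

def with_legibility_style(html)
  if html.include?("</head>")
    # Put the rule at the end of <head> so it comes after the page's own styles.
    html.sub("</head>", "#{LEGIBILITY_CSS}</head>")
  else
    # No <head> found: fall back to prepending the rule.
    LEGIBILITY_CSS + html
  end
end

html = "<html><head><meta charset=\"utf-8\"></head><body><p>\u{1F4C5}</p></body></html>"
patched = with_legibility_style(html)
puts patched

# With the wicked_pdf gem loaded, the patched markup could then be rendered, e.g.:
#   File.binwrite("out.pdf", WickedPdf.new.pdf_from_string(patched))
```

Since text-rendering is an inherited property, setting it on body should cover the whole page; a narrower selector would work too if only some elements show the phantom character.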

Related

Why do some characters such as "ç" look different from other characters?

I've got a French text on a website using "Nunito" from Google Fonts.
On Safari, I found that my text had bolder letters for characters such as "ç" or "é". Looking again, I realized they also differ in other browsers, just not as much.
I've tried including the font in different ways (link, font-face), nothing does the trick.
<html>
<head>
<meta charset="utf-8">
<link href="https://fonts.googleapis.com/css?family=Nunito:700&display=swap" rel="stylesheet">
<style>
body {
font-size:20px;
font-family: 'Nunito', Arial, sans-serif;
}
</style>
</head>
<body>
comment ça marche ?
</body>
</html>
In the example, the "ç" looks off.

At some point, I went and typed some text on Google Fonts directly, and it looked right.
That got me thinking... and trying my example again.
Bing!
The text I had was copied/pasted from what the marketing sent me. That text didn't work, while "typed" text did.
The "ç" in the text I had was character code 99 ("c") followed by 807 (the combining cedilla drawn below it). Chrome and Firefox attached the two in an odd way that more or less worked, but Safari ignored the font and took the whole character from Arial.
The "ç" I typed in Google Fonts was character code 231, which is a single precomposed character from the Latin-1 range.
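If the text passes through Ruby before it reaches the page, the two forms can be told apart and unified with Unicode normalization. A minimal sketch, assuming the pasted string is UTF-8; the sample strings are built from the code points mentioned above, and the variable names are only for illustration.

```ruby
# Pasted text: "c" (99) followed by the combining cedilla (807, U+0327).
pasted = "comment c\u0327a marche ?"
# Typed text: the single precomposed character 231 (U+00E7).
typed = "comment \u00E7a marche ?"

# The two strings look the same on screen but are not equal.
puts pasted == typed

# NFC normalization composes "c" + cedilla into the single character.
normalized = pasted.unicode_normalize(:nfc)
puts normalized == typed
puts normalized.codepoints.include?(231)
```

Normalizing copy-pasted content to NFC before storing or rendering it avoids relying on each browser to combine the two code points itself.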

ITMS-9000 "element "img" not allowed here; expected..."

Trying to get an ePub file to pass through Apple's ePub checker but get two errors multiple times.
(1) element "img" not allowed here; expected the element...
This is the coding on the page:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link href="../Styles/Style.css" type="text/css" rel="stylesheet"/>
<title></title>
</head>
<body>
<h2>Tokyo</h2>
<p>Japan is made up of five main islands: Hokkaido, Honshu, Shikoku, Kyushu, and Okinawa. Over three-quarters of the 127 million people in Japan live on Honshu, the largest and most developed island. Tokyo, the capital, lies on its eastern shore.</p>
<img alt="Tokyo Metropolis" src="../Images/Tokyo-Metropolis.jpg"/>
<p>Tokyo Metropolis, one of Japan’s 47 prefectures, is comprised of two areas: the <a class="hook" id="Special-Wards-23">23 special wards</a>, which together make up what most consider to be Tokyo, and the rest—the cities and towns that lie to the west. It is best thought of as a constellation of cities that have, over the course of time, merged into one vast urban sprawl which is home to over 13 million people.</p>
I have the alt attribute set correctly and the image displays correctly in iBooks.
CSS for img is as follows:
img
{
display: block;
margin-left: auto;
margin-right: auto;
margin-top: 15px;
margin-bottom: 15px;
padding: 1px;
border: 1px solid #021a40;
background-color: #FFFFFF;
}
I've looked around at numerous forums but am none the wiser as to why I'm getting this error.
(2) Same error but in relation to ul tags ("element "ul" not allowed here; expected end-tag or element "li"...")
Html here...
<ul>
<li><b>Introduction</b></li>
<ul>
<li>Tokyo</li>
<li>A Brief History</li>
<ul>
<li>The Emergence of Japan</li>
[Html cut short as it is a table of contents and long].
I think this is because I have nested lists, but this works perfectly in iBooks so I don't know why it is causing an error at validation.
I'd be very grateful for some help!

The second one is clear: lists can only contain list items. That's how it is.
You say "this works perfectly in iBooks" but that's not true. It doesn't work perfectly. It's just that the app's error handling routines happen to handle this in such a way that the result looks roughly like what you expected. This will not be the same on other machines, other versions of the app etc. Avoid such errors.
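A rough way to spot this kind of nesting before uploading is to scan the markup for a list opened directly inside another list rather than inside an li. This is a minimal Ruby sketch, assuming well-formed XHTML in which every element is either closed or self-closed; the helper name misnested_lists is hypothetical, and it is a simple tag scanner, not a replacement for a real ePub validator.

```ruby
# Hypothetical helper: returns one message per <ul>/<ol> that is a direct child
# of another <ul>/<ol>. Assumes well-formed XHTML (all tags closed or self-closed).
def misnested_lists(markup)
  stack = []
  problems = []
  markup.scan(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/) do |closing, name, self_closing|
    name = name.downcase
    if closing == "/"
      # Closing tag: unwind the stack back to the matching open element.
      idx = stack.rindex(name)
      stack.slice!(idx..-1) if idx
    elsif self_closing != "/"
      # Opening tag: a list whose parent is also a list is the reported error.
      if %w[ul ol].include?(name) && %w[ul ol].include?(stack.last)
        problems << "<#{name}> directly inside <#{stack.last}>"
      end
      stack.push(name)
    end
  end
  problems
end

bad  = "<ul><li><b>Introduction</b></li><ul><li>Tokyo</li></ul></ul>"
good = "<ul><li><b>Introduction</b><ul><li>Tokyo</li></ul></li></ul>"
puts misnested_lists(bad).inspect
puts misnested_lists(good).inspect
```

The second string shows the valid form: the nested ul sits inside the li it belongs to, before that li is closed.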
The first error message is more subtle.
What version of HTML does the file identify itself as?
If it's XHTML 1.x or HTML 4.x strict, then plain text and inline elements are officially not allowed at the body level. Don't ask me why, I don't know.
If the file version is HTML 4.01 Transitional or HTML5 (or the XHTML equivalents) then images as children of the body are fine.
If anybody can tell me why this difference exists, I'd be delighted!
As for a solution, if you can't change the HTML version to HTML5 or XHTML5, then simply putting everything in the body in one big div will do the trick. Just put <div> right after the <body> and </div> just before the </body>.

Display '⤭' in iOS safari through CSS content property

The html looks like this:
<html>
<head>
<style type="text/css">
h1:before
{
content: '\292d';
}
</style>
</head>
<body>
<h1>Sample Text</h1>
</body>
</html>
So, I've already converted the '⤭' character to its escaped code point, which shows fine in my desktop browser; however, on iPhone, it's blank!

The problem could be with content: '\292d'; itself. Although the content property is said to be supported in Safari 1.0 and up, it still does not work properly there.
I used it for displaying images, and they showed up in Inspect Element but not in the browser window. The escape '\292d' itself is in fact supported.
Instead, try putting the character directly inside the tag, or use JavaScript if you want it to be inserted dynamically.
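For reference, the value in the content property is a backslash followed by the character's Unicode code point in hexadecimal. A minimal Ruby sketch of deriving it; the helper name css_escape is hypothetical, and the sketch assumes the character after the escape in the stylesheet is not itself a hex digit (otherwise the escape would need padding to six digits or a trailing space).

```ruby
# Hypothetical helper: builds the CSS escape for a single character,
# i.e. a backslash followed by the code point in hexadecimal.
def css_escape(char)
  format("\\%x", char.ord)
end

arrow = "\u292D"       # the character from the question
emoji = "\u{1F60B}"    # a character outside the Basic Multilingual Plane

puts css_escape(arrow)
puts css_escape(emoji)
```

Going the other way, [0x292D].pack("U") gives back the literal character, which can then be placed directly inside the tag as suggested above.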

Strange extra characters in rendered html on IE 8

I have an ASP.Net MVC site that I want to render some custom HTML 5 canvasses in. I am getting a strange issue with the server serving up extra characters that are not in the source code.
In order to use an HTML 5 canvas in IE 8 you have to add the following tag in the html head:
<!--[if IE]><script src="../../Scripts/excanvas.js"></script><![endif]-->
For some reason this is served up as:
<!--[if IE]>IE]><script src="../../Scripts/excanvas.js"></scr<![endif]-->
Of course the duff markup causes the excanvas script to not be loaded by IE. I can't understand why the line gets garbled. I have the following doctype which is documented at http://www.w3schools.com/html5/tag_doctype.asp:
<!DOCTYPE html>
I'm not familiar with using HTML 5 or the new doctype so I'm suspicious of it. I'm also hosting on Apache with Mono so maybe that's what's garbling the line.
The page in question is at: http://openancestry.org/FamilyTree/Simpsons
Has anyone seen this before or know why I can't use the "if IE" syntax?
UPDATE:
Well, I'm pretty sure it's either Mono or Apache that's garbling the HTML, so I've used the workaround below, which adds a compatibility meta tag for IE8 and includes excanvas for any IE that predates IE9.
I'd still appreciate any answers on why the HTML gets garbled.
<% if (Request.Browser.Browser.Contains("IE") && float.Parse(Request.Browser.Version) < 9) { %>
<% if (float.Parse(Request.Browser.Version) > 7) { %>
<meta http-equiv="X-UA-Compatible" content="IE=7" />
<% } %>
<script type="text/javascript" src="../../Scripts/excanvas.js"></script>
<% } %>

Before I answer, I want to point out that you are missing type="text/javascript" in your example.
It is possible that the ASP.NET parser in Mono is mangling your comment. What version of Mono are you using, and on what platform?
I just tried this on Mono 2.10 on Mac and did not have this problem.

Printing in IE8 Has #href contents inline

Can someone tell me how to stop IE8 from printing the value of the href for an A tag next to the link text? For example, this markup
<a href="/site/page.html">Some Link</a>
comes out as
Some Link(/site/page.html)
when printed. How can I stop this?

This doesn't happen for me in IE8 and I've never spotted it. I also can't find it in the Internet Options anywhere.
It is possible that you have some software on your computer that does this, for example AVG Anti-Virus adds content to web pages to tell you that it has checked the links being displayed for potentially harmful content - so your system-security software may be expanding all links to show you where they actually point, to prevent phishing attacks.
If you do have some anti-phishing software on your machine, you'll have to find the option within that.
Update - It is almost certainly some clever CSS.
I have created the following test page to demonstrate how you can add the URL to a link using CSS generated content. If this was used within a print stylesheet, this would explain how the URL is getting added to the link when you are printing the page. To stop this, you would have to save a copy of the web page, remove the style rule from the print-only style sheet and then open your copy and print it!
<html>
<head>
<title>Test</title>
<style type="text/css">
a:after {
content: " [" attr(href) "] ";
}
</style>
</head>
<body>
<h1>Test</h1>
<p>This is a test to see if this
Link Shows A URL</p>
</body>
</html>
