Why do some characters such as "ç" look different from other characters? - character-encoding

I've got a French text on a website using "Nunito" from Google Fonts.
On Safari, I noticed that accented characters such as "ç" or "é" looked bolder than the rest of the text. Looking again, I realized they also differ in other browsers, just not as much.
I've tried including the font in different ways (link, @font-face), but nothing does the trick.
<html>
<head>
<meta charset="utf-8">
<link href="https://fonts.googleapis.com/css?family=Nunito:700&display=swap" rel="stylesheet">
<style>
body {
font-size:20px;
font-family: 'Nunito', Arial, sans-serif;
}
</style>
</head>
<body>
comment ça marche ?
</body>
</html>
In the example, the "ç" looks off.

At some point, I typed some text directly on Google Fonts, and it looked right.
That got me thinking, and I tried my example again.
Bing!
The text I had was copied and pasted from what marketing sent me. That text didn't work, while text I typed myself did.
The "ç" in the pasted text was character code 99 ("c") followed by 807 (the combining cedilla, U+0327, placed below it). Chrome and Firefox combined the two somewhat awkwardly, which more or less worked, but Safari gave up and took the whole glyph from Arial.
The "ç" I typed on Google Fonts was character code 231 (U+00E7), a single precomposed character that is also part of the Latin-1 encoding.
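One way to guard against this kind of pasted text is to normalize it to Unicode NFC before publishing. The following is a minimal sketch, assuming the text passes through JavaScript at some point; the `pasted` string and the `codePoints` helper are made up for illustration.

```javascript
// "c" (99) followed by the combining cedilla (807), as in the pasted text.
const pasted = "comment c\u0327a marche ?";

// NFC composes a base letter and a combining mark into a single
// precomposed code point where one exists.
const normalized = pasted.normalize("NFC");

// Helper (hypothetical name): list the code points of a string.
const codePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));

console.log(codePoints("c\u0327"));                     // [ 99, 807 ]
console.log(codePoints("c\u0327".normalize("NFC")));    // [ 231 ]
console.log(normalized === "comment \u00e7a marche ?"); // true
```

After normalization the browser only needs the single "ç" glyph, which the web font is more likely to provide than a combining cedilla.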

Related

wicked_pdf shows unknown character on unicode pdf conversion (ruby)

I'm trying to create a pdf from a html page using wicked_pdf (version 1.1) and wkhtmltopdf-binary gems.
My html page contains a calendar emoji that displays well in the browser whatever font I use
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv='content-type' content='text/html; charset=utf-8' />
<style>
unicode {
font-family: 'OpenSansEmoji', sans-serif;
}
@font-face {
font-family: 'OpenSansEmoji';
src: url(data:font/truetype;charset=utf-8;base64,<-- encoded_font_base64_string-->) format('truetype');
}
</style>
</head>
<body>
<div><unicode>📅</unicode></div>
</body>
</html>
However, when I try to generate the PDF using the WickedPdf.new.pdf_from_html_file method of the gem in the rails console,
File.open(File.expand_path('~/<--pdf_filename-->.pdf'), 'wb+') {|f| f.write WickedPdf.new.pdf_from_html_file('<--absolute_path_of_html_file-->')}
I get the following result:
[Image: PDF result with unknown character]
As you can see, the first calendar icon is displayed properly, but it is followed by a second character, and we do not know where it's coming from.
I have investigated UTF-8 and UTF-16 encoding and surrogate pairs, as suggested by this related post (stackoverflow_emoji_wkhtmltopdf), and looked at this issue (wkhtmltopdf_git_issue), but I still can't make this character disappear!
If you have any clue, it's more than welcome.
Thanks in advance for your help!
EDIT
Following the comments from Eric Duminil and petkov.np, I can confirm that the code above works properly for me on Linux. This seems to be a Linux vs. macOS issue. Can anyone suggest what the core of the issue is in the macOS binding and whether it can be fixed?
I've edited this answer several times, please see the notes at the end as well as the comments.
I'm using macOS 10.12.2 and have the same issue. I'm listing all the browser etc. versions, although I suspect the biggest factor is the OS/wkhtmltopdf build.
Chrome: Version 55.0.2883.95 (64-bit)
Safari: Version 10.0.2 (12602.3.12.0.1)
wkhtmltopdf: 0.12.3 (with patched qt)
I'm using the following example snippet:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8">
<style type="text/css">
p {
font-family: 'EmojiSymbols', sans-serif;
}
@font-face {
font-family: 'EmojiSymbols';
src: local('EmojiSymbols-Regular.woff'), url('EmojiSymbols-Regular.woff') format('woff');
}
span:before {
content: '\01F60B';
}
</style>
</head>
<body>
<p>
😋
<span></span>
😋
😋
😋
</p>
</body>
</html>
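For reference, the CSS escape '\01F60B' used in the snippet above is simply the emoji's Unicode code point written in hexadecimal. The sketch below (variable names are illustrative) shows how the same character appears as a CSS escape, as a UTF-16 surrogate pair, and as UTF-8 bytes, which can help when checking what a tool actually receives.

```javascript
const emoji = "😋";

// The full Unicode code point; this is what the CSS escape encodes.
const codePoint = emoji.codePointAt(0);

// CSS escapes take up to six hex digits, so pad to six.
const cssEscape = "\\" + codePoint.toString(16).toUpperCase().padStart(6, "0");

// JavaScript strings are UTF-16, so this emoji occupies two code units.
const surrogates = [emoji.charCodeAt(0), emoji.charCodeAt(1)].map((u) =>
  u.toString(16)
);

// UTF-8 encodes the same code point as four bytes.
const utf8Bytes = Array.from(new TextEncoder().encode(emoji), (b) =>
  b.toString(16)
);

console.log(cssEscape);  // \01F60B
console.log(surrogates); // [ 'd83d', 'de0b' ]
console.log(utf8Bytes);  // [ 'f0', '9f', '98', '8b' ]
```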
I'm calling wkhtmltopdf with the --encoding 'UTF-8' option.
You can see the rendered result here (I'm sorry for the lame screenshot). Some brief conclusions:
Safari doesn't render the 'raw' UTF-8 bytes properly. It seems to treat them just as the raw byte sequence (last line in the html paragraph).
Safari renders everything fine.
Chrome renders everything fine.
With the above option, wkhtmltopdf renders the raw bytes (sort of) OK, but doesn't render the CSS content property properly. Every 'proper' occurrence of the Unicode symbol is followed by this strange phantom symbol.
I've tried literally everything, but the results are the same. For me, the fact that even Safari doesn't render the raw bytes properly indicates some system-level problem that is macOS-specific. It's unclear to me whether this should be reported as a wkhtmltopdf issue or whether there is some misbehaving dependency in the macOS build.
EDIT: Safari seems to work fine, my markup was broken.
EDIT: A CSS workaround may do the trick, please check the comments below.
FINAL EDIT: As shown in the comments, the CSS 'hack' that solves the issues is using text-rendering: optimizeLegibility;. This seems to only be needed on macOS/OS X.
From my comment below:
I just found this issue. It seems irrelevant at first glance, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. As the issue author also uses OS X, it's apparent there is some problem with wkhtmltopdf builds for this OS.

ITMS-9000 "element "img" not allowed here; expected..."

I'm trying to get an ePub file to pass Apple's ePub checker, but I get two errors, each multiple times.
(1) element "img" not allowed here; expected the element...
This is the markup on the page:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link href="../Styles/Style.css" type="text/css" rel="stylesheet"/>
<title></title>
</head>
<body>
<h2>Tokyo</h2>
<p>Japan is made up of five main islands: Hokkaido, Honshu, Shikoku, Kyushu, and Okinawa. Over three-quarters of the 127 million people in Japan live on Honshu, the largest and most developed island. Tokyo, the capital, lies on its eastern shore.</p>
<img alt="Tokyo Metropolis" src="../Images/Tokyo-Metropolis.jpg"/>
<p>Tokyo Metropolis, one of Japan’s 47 prefectures, is comprised of two areas: the <a class="hook" id="Special-Wards-23">23 special wards</a>, which together make up what most consider to be Tokyo, and the rest—the cities and towns that lie to the west. It is best thought of as a constellation of cities that have, over the course of time, merged into one vast urban sprawl which is home to over 13 million people.</p>
I have the alt attribute set correctly, and the image displays correctly in iBooks.
CSS for img is as follows:
img
{
display: block;
margin-left: auto;
margin-right: auto;
margin-top: 15px;
margin-bottom: 15px;
padding: 1px;
border: 1px solid #021a40;
background-color: #FFFFFF;
}
I've looked around at numerous forums but am none the wiser as to why I'm getting this error.
(2) The same error, but in relation to ul tags ("element "ul" not allowed here; expected end-tag or element "li"...")
Html here...
<ul>
<li><b>Introduction</b></li>
<ul>
<li>Tokyo</li>
<li>A Brief History</li>
<ul>
<li>The Emergence of Japan</li>
[Html cut short as it is a table of contents and long].
I think this is because I have nested lists, but this works perfectly in iBooks so I don't know why it is causing an error at validation.
I'd be very grateful for some help!
The second one is clear: a ul may only contain li elements as direct children, so a nested ul has to go inside an li rather than directly inside the outer ul. That's how it is.
You say "this works perfectly in iBooks" but that's not true. It doesn't work perfectly. It's just that the app's error handling routines happen to handle this in such a way that the result looks roughly like what you expected. This will not be the same on other machines, other versions of the app etc. Avoid such errors.
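To make the nesting rule concrete, here is a small JavaScript sketch that builds a table of contents in which every nested ul sits inside its parent li. The `renderList` and `escapeXml` helpers and the shape of the `toc` data are assumptions made for illustration; they are not taken from the question.

```javascript
// Escape the characters that are special in XHTML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render entries as a <ul>; a nested list goes INSIDE its parent <li>,
// never directly inside the <ul>.
function renderList(entries) {
  const items = entries.map((entry) => {
    const children =
      entry.children && entry.children.length > 0
        ? renderList(entry.children)
        : "";
    return `<li>${escapeXml(entry.title)}${children}</li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}

// Hypothetical data mirroring the headings from the question.
const toc = [
  {
    title: "Introduction",
    children: [
      { title: "Tokyo" },
      {
        title: "A Brief History",
        children: [{ title: "The Emergence of Japan" }],
      },
    ],
  },
];

console.log(renderList(toc));
```

In the output, "Tokyo" and "A Brief History" appear in a ul that is a child of the "Introduction" li, which is the structure the validator expects.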
The first error message is more subtle.
What version of HTML does the file identify itself as?
If it's XHTML 1.x or HTML 4.x strict, then plain text and inline elements are officially not allowed at the body level. Don't ask me why, I don't know.
If the file version is HTML 4.01 Transitional or HTML5 (or the XHTML equivalents) then images as children of the body are fine.
If anybody can tell me why this difference exists, I'd be delighted!
As for a solution, if you can't change the HTML version to HTML5 or XHTML5, then simply putting everything in the body in one big div will do the trick. Just put <div> right after the <body> and </div> just before the </body>.

Display '⤭' in iOS safari through CSS content property

The html looks like this:
<html>
<head>
<style type="text/css">
h1:before
{
content: '\292d';
}
</style>
</head>
<body>
<h1>Sample Text</h1>
</body>
</html>
So, I've already converted the '⤭' character to its hexadecimal escape, which shows fine in my desktop browser; however, on iPhone, it's blank!
The problem could be with content: '\292d';. Even though content is said to be supported in Safari 1.0 and up, it still does not work properly.
I used it for displaying images, and they showed up in the element inspector but not in the browser window. The escape '\292d' itself is in fact supported.
Instead, try putting the character directly inside the tag, or use JavaScript if you want it to be inserted dynamically.
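The JavaScript route could look roughly like the sketch below. The `withPrefix` helper is a made-up name, and the DOM part only runs where a `document` exists. Note that if the device has no font containing a glyph for U+292D, this approach will show a blank or fallback box just like the CSS version.

```javascript
// U+292D is the code point behind the CSS escape '\292d'.
const ARROW = String.fromCodePoint(0x292d);

// Pure helper (hypothetical): return the text with the symbol in front,
// without adding it twice.
function withPrefix(text, prefix = ARROW) {
  return text.startsWith(prefix) ? text : prefix + text;
}

// Only touch the DOM when one exists (i.e. in a browser).
if (typeof document !== "undefined") {
  document.querySelectorAll("h1").forEach((h1) => {
    h1.textContent = withPrefix(h1.textContent);
  });
}
```

If this is used, the h1:before rule should be removed so the symbol is not rendered twice on browsers where the CSS version works.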

How can I print a multi-page report in my web application?

I have a web application that produces its reports in HTML format. Sometimes these reports become very long and the window shows scrollbars.
The problem is that I can only print what I see on the web page: whenever I print, I get no more than one page, so I lose the rest of the report that I expected to appear on the following pages.
What do I have to do?
The nature of the problem
Your problem is associated with styling.
It is hard to tell what exactly your problem is, since we have not seen your stylesheets. In any case, you should rewrite them so that they do not crop the pages.
Apply different stylesheets to screen and print
One idea is to restrict the current stylesheet to screen media and apply a different one specifically to print media.
You can do it like that in HTML:
<link rel="stylesheet" type="text/css" media="screen" href="screen.css" />
<link rel="stylesheet" type="text/css" media="print" href="print.css" />
or like that in CSS (example from W3C):
@import url("fancyfonts.css") screen;
@media print {
/* style sheet for print goes here */
}
Print-specific styling
For details on print-specific styles see the following page: http://www.w3.org/TR/CSS2/page.html
In your case the following styling may become useful:
table { page-break-inside: auto; }
tr { page-break-inside: avoid; page-break-after: auto; }
thead { display: table-header-group; }
tfoot { display: table-footer-group; }
It will allow for page breaks inside the table, will try to avoid page breaks inside rows, and will repeat both headers and footers of the table on each page. However, check whether it works in your target browsers, to be sure.
If you are using ASP.NET, use Crystal Reports.
If you are using Java EE, use JasperReports.
If you are using PHP, use FPDF (there may be something else too).
These tools are better suited to building reports than pure HTML.

Printing in IE8 Has #href contents inline

Can someone tell me how to stop IE8 from printing the value of the href of an A tag next to its text? For example, this markup
<a href="/site/page.html">Some Link</a>
comes out as
Some Link(/site/page.html)
when printed. How can I stop this?
This doesn't happen for me in IE8 and I've never spotted it. I also can't find it in the Internet Options anywhere.
It is possible that you have some software on your computer that does this, for example AVG Anti-Virus adds content to web pages to tell you that it has checked the links being displayed for potentially harmful content - so your system-security software may be expanding all links to show you where they actually point, to prevent phishing attacks.
If you do have some anti-phishing software on your machine, you'll have to find the option within that.
Update - It is almost certainly some clever CSS.
I have created the following test page to demonstrate how you can add the URL to a link using CSS generated content. If this was used within a print stylesheet, this would explain how the URL is getting added to the link when you are printing the page. To stop this, you would have to save a copy of the web page, remove the style rule from the print-only style sheet and then open your copy and print it!
<html>
<head>
<title>Test</title>
<style type="text/css">
a:after {
content: " [" attr(href) "] ";
}
</style>
</head>
<body>
<h1>Test</h1>
<p>This is a test to see if this
Link Shows A URL</p>
</body>
</html>

Resources