Character Encoding Issue in XML

This word is causing me problems. Brúðkaup
In my cms, at the top of the webpage I have this line.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The database stores the word above as Brúðkaup and has a charset of latin1
At the top of my xml file I have the following:
<?xml version="1.0" encoding="UTF-8" ?>
Is the database using the wrong character encoding? Even if it is, why does the word show correctly in HTML when I specify the UTF-8 charset, yet not in XML when I do something similar?
The XML is generated by PHP. I have tried to add the following in my script.
header('charset=utf-8');
This doesn't make any difference. Any ideas?

Since the data is latin1 (= ISO-8859-1 or windows-1252) encoded, it needs to be converted to UTF-8 in order to be displayed on a page in UTF-8 encoding. The tools for this depend on the software you use to get data from the database and put it into an HTML or XML document.
If the HTML file shows correctly, then either such a conversion was made at some point, or the browser is actually interpreting the HTML file as latin1 encoded. This would happen if the server sends an HTTP Content-Type header whose charset parameter says so; HTTP headers override meta tags.
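The conversion described above can be sketched in a few lines. This is only an illustration in Python (the question itself uses PHP), and the variable names are made up for the example. It shows that latin1 bytes are not valid UTF-8 as they stand, and what the word looks like when UTF-8 bytes are read as latin1.

```python
# The word from the question, written with escapes: Brúðkaup
word = "Br\u00fa\u00f0kaup"

# How the bytes would look in a latin1 column versus in a UTF-8 document.
latin1_bytes = word.encode("latin-1")   # b'Br\xfa\xf0kaup'
utf8_bytes = word.encode("utf-8")       # b'Br\xc3\xba\xc3\xb0kaup'

# Converting latin1 data to UTF-8 before writing it into a UTF-8 document.
converted = latin1_bytes.decode("latin-1").encode("utf-8")

# latin1 bytes sent unchanged in a document declared as UTF-8 are invalid.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# The opposite mistake: UTF-8 bytes interpreted as latin1 give mojibake.
mojibake = utf8_bytes.decode("latin-1")  # 'BrÃºÃ°kaup'

print(converted, latin1_is_valid_utf8, mojibake)
```

An XML parser is required to reject byte sequences that are invalid for the declared encoding, whereas browsers rendering HTML are more forgiving, which may be part of why the two behave differently here.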

Related

Why special characters (apostrophe and others) show like this â€™ in an HTML file

I have a Markdown file encoded as UTF-8 without BOM (the .md file was generated by a tool from a Word document). I converted this Markdown to HTML using Jekyll. The .md file content contains special characters (apostrophe, hyphen, and so on).
Example content in the MD file:
dont't, **ListView** control
The converted HTML looks like this:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>dont’t, <strong>ListView</strong> control</p>
</body>
</html>
Opening the HTML file directly gives the exact result: dont’t, ListView control. I want to load the same HTML file into an ASP.NET MVC Razor view through Html.Action, using the syntax given below.
The MVC Razor view accesses the HTML file via an action method:
@Html.Action("GetHtmlPage", "Products", new { path = "~/Views/Products/WhatsNew/" + Model.Platform + ".html" })
Action code:
public ActionResult GetHtmlPage(string path)
{
    return new FilePathResult(path, "text/html");
}
Using the above MVC syntax, I can successfully load the HTML file into my view. But the output in the browser looks like this, even though the HTML file itself is unchanged:
dontâ€™t, ListViewÂ control
The apostrophe is displayed as â€™, and a stray Â is added after the bold element.
How can I display the special characters correctly in the browser when the HTML file is loaded into the Razor view? I have been stuck on this up to today.
It appears that your HTML document is advertising itself as UTF-8. However, if it is not actually in UTF-8 format, or if the Markdown file is not in UTF-8, the characters may not actually be UTF-8 encoded. So check the encodings of your files.
If that doesn't resolve the problem, then you need to use HTML Entities. Or you need to use ASCII text only for punctuation.
For example, look at the apostrophe in your sample HTML and note that it is slanted at an angle (a single right quote, Unicode character U+2019) as opposed to the straight apostrophe (Unicode character U+0027, which is also an ASCII character).
Note that for those characters to display reliably in HTML documents, it is best to use the HTML entities for those characters. Therefore, the Markdown document should look like this:
Don&rsquo;t, **ListView** control
The HTML entity &rsquo; tells the browser to display a single right quote, Unicode character U+2019.
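To make the relationship between the two apostrophes, the entity, and the garbled output concrete, here is a small Python sketch (Python is used only for illustration; the variable names are made up). It assumes the named entity `&rsquo;` and its numeric form `&#8217;`, both of which stand for U+2019, and it shows what the UTF-8 bytes of that character look like when they are misread as windows-1252.

```python
import html

right_quote = "\u2019"       # RIGHT SINGLE QUOTATION MARK, the slanted one
ascii_apostrophe = "\u0027"  # APOSTROPHE, the straight ASCII one

# Both entity forms decode to the same character, U+2019.
decoded_named = html.unescape("Don&rsquo;t")
decoded_numeric = html.unescape("Don&#8217;t")

# U+2019 takes three bytes in UTF-8 ...
utf8_bytes = right_quote.encode("utf-8")  # b'\xe2\x80\x99'

# ... and those three bytes, read as windows-1252, become three characters.
misread = utf8_bytes.decode("cp1252")     # 'â€™'

# Going the other way: replace every non-ASCII character with a numeric
# entity, so the file is pure ASCII and no longer depends on the charset.
entity_form = "Don\u2019t".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decoded_named, utf8_bytes, misread, entity_form)  # last item: Don&#8217;t
```

A file that contains only ASCII plus entities displays the same under UTF-8, ISO-8859-1, or windows-1252, which is why entities sidestep the problem rather than fixing the underlying encoding mismatch.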
Note that Markdown does not convert such characters to HTML entities for you. You have to do it yourself. You could use SmartyPants to do conversions, but it converts the ASCII characters to the richer characters as HTML entities. In that case, your Markdown should look like this:
Don't, **ListView** control
Of course, you could just use the ASCII characters and not bother with SmartyPants if you want.
However, be aware that if you are using MS Word, that program is configured by default to automatically replace the ASCII character you type (using the apostrophe key on your keyboard) with the fancy character. For this reason, it is generally recommended not to use a word processor (like MS Word) for editing Markdown documents. Use a plain text editor instead.
If you really must use MS Word there are a few ways to disable the auto-replace behavior. See this for more info about how word processors act with these types of characters and how to disable that behavior.
I was having this same problem with a markdown to html converter (pandoc) and I found the solution here. Just adding the following header solved my issue:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

character set encoding issue with multi-language featured site

I am suffering from a simple, common problem.
My site is multi-language, built with the CodeIgniter framework.
For example, for French I have used
$lang['login'] = 'ConnÈcter';
This then appeared as Conn�cter in the view.
I then solved this by adding
<meta charset="ISO-8859-1">
which then resolved the issue.
But when the content is loaded with characters like
Sáenz-Mata & Jiménez-Bremont
then it is changed to
SÃ¡enz-Mata & JimÃ©nez-Bremont
Note that é is changed to Ã© even when I use
<meta charset="ISO-8859-1">
When the above meta tag is removed, it gives me Conn�cter when the language is switched to French.
So please suggest something that can handle both situations.
I hope somebody understands it (I got messed up describing it).
Thanks.
Use <meta charset="utf-8">
Use UTF-8 consistently for all pages, as explained in the CodeIgniter User guide. Make sure the encoding of each file matches its declared encoding. What you are experiencing now is caused by mixing encodings (UTF-8 and ISO-8859-1 mostly).
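Both symptoms in the question follow from this mix of encodings, which a short Python sketch can reproduce (Python is used only for illustration). It assumes, as the symptoms suggest but the thread does not confirm, that the language file is saved as ISO-8859-1 while the loaded content is UTF-8; the variable names are hypothetical.

```python
# Assumed: the language file is saved as ISO-8859-1 (ConnÈcter).
latin1_file_bytes = "Conn\u00c8cter".encode("iso-8859-1")

# Assumed: the loaded content is UTF-8 (Sáenz-Mata & Jiménez-Bremont).
utf8_content_bytes = "S\u00e1enz-Mata & Jim\u00e9nez-Bremont".encode("utf-8")

# Page treated as UTF-8: the lone 0xC8 byte is not valid UTF-8, so it is
# shown as the replacement character U+FFFD.
shown_as_utf8 = latin1_file_bytes.decode("utf-8", errors="replace")
# 'Conn\ufffdcter'

# Page declared as ISO-8859-1: every two-byte UTF-8 sequence turns into
# two separate characters.
shown_as_latin1 = utf8_content_bytes.decode("iso-8859-1")
# 'SÃ¡enz-Mata & JimÃ©nez-Bremont'

# The fix suggested above: re-save the language file as UTF-8 so that
# everything on the page uses one encoding.
fixed_file_bytes = latin1_file_bytes.decode("iso-8859-1").encode("utf-8")

print(shown_as_utf8)
print(shown_as_latin1)
print(fixed_file_bytes.decode("utf-8"))
```

No single meta tag can handle both byte streams at once, which is why converting the files is needed rather than switching the declared charset back and forth.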

ASP MVC 3, UTF-8 HTML charset not showing Polish chars. (Razor)

As title says. Story is, I've changed meta mark-up of my _Layout.cshtml page from:
<meta charset="utf-8" />
to
<meta content="text/html; charset=utf-8">
Effect? No Polish characters on page. Ok, let's revert the change. Effect? No Polish characters on page.
Btw it affects ONLY _Layout.cshtml, all other views show Polish letters properly. Proper letters are replaced by "Ĺ‚" characters.
Any ideas? Thought about changing browser, but it didn't work. Same stuff happens on different computer.
No other changes were made. Tried to revert project to older version from repository, didn't work.
Opened in notebook and saved again with UTF-8 encoding set. Worked.
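The "Ĺ‚" pair is what the Polish letter ł looks like when its UTF-8 bytes are read as windows-1250. One plausible explanation, which is an assumption and not confirmed in the thread, is that the layout file had lost its UTF-8 signature (BOM) and was being read with the system's Central European code page. A small Python sketch of that byte-level effect:

```python
import codecs

polish_l = "\u0142"                     # ł

# ł is two bytes in UTF-8.
utf8_bytes = polish_l.encode("utf-8")   # b'\xc5\x82'

# Read as windows-1250, those two bytes become two unrelated characters.
misread = utf8_bytes.decode("cp1250")   # 'Ĺ‚'

# With a UTF-8 signature (BOM) in front, a reader that honours the BOM can
# recognise the encoding; Python's "utf-8-sig" codec strips it on decode.
with_bom = codecs.BOM_UTF8 + utf8_bytes
round_trip = with_bom.decode("utf-8-sig")

print(utf8_bytes, misread, round_trip)
```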

strange UTF-8 byte encoding issue with Rails, IE, PostgreSQL, delayed_job

I'm seeing a relatively strange (and hard to diagnose) error with a combination of IE8, Rails 3.0.3, PostgreSQL and delayed_job.
I have a text area on one of my pages, and in the controller I delay a message with delayed_job which includes an object which has the content from the text area:
SomeMailer.delay.send_message(message)
This works fine on Chrome, FF, Safari. However in IE8 only, and only when I actually enter text in the text area, and it looks like only when I enter a carriage return in the text area (I think), I get this error from the controller:
invalid byte sequence in UTF-8
This appears to me to be when delayed_job is serializing the job to the database via ActiveRecord, that it doesn't like the character encoding in the newline (\r\n). It's a bit hard to figure out because I don't know if this is an IE, Rails, delayed_job or Postgres issue.
Side Note: I'm getting this error locally, but it doesn't appear that this error appears on Heroku - so maybe they have their database configured better than I do?
Environment:
Rails 3.0.3
Ruby 1.9.2
Postgres 8.4 - encoding UTF8, collation en_US.UTF-8
delayed_job 2.1.4
IE 8
Any thoughts would be appreciated.
Are you setting your encoding in the HTML that is being sent to IE8? e.g.:
<!doctype html>
<head>
<meta charset="utf-8">
</head>
It's possible that the other browsers are working around the missing information and assuming UTF-8 when encoding the data from your text area.

Webpage won't show certain characters

I'm noticing that in places where our site uses special characters on a webpage, such as ¡ or ¿ or even "special quotes" (like MS-word) it displays this funky � character
Is there something I can do to fix this? Is this a charset thing?
I know I could use html entities, such as
&iexcl;
But, I wanted to see if there was something else to address this since I notice some other sites don't need to use the special code.
Thanks
Did you try using the following meta tag?
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
