This only seems to happen in Chrome, but when AJAX is used to send a JSON-encoded string containing é (which itself is typed by entering Alt+0233), it somehow ends up as a tab (character 9) in the database and "consumes" the following character.
I'm no stranger to é being shown instead of é because that's the UTF8-encoded "version" being treated as iso-8559-1, but what could cause 0b11101001 to become 0b00001001?
Although I have not found the reason for the encoding error, I have managed to fix the issue by checking the character encoding of the sent data and re-encoding as necessary.
Related
Try to use Google to find Wikipedia article about De Morgan's laws.
Click the link, and see the URL. At least in Chrome, it will be
https://en.wikipedia.org/wiki/De_Morgan%27s_laws
' is percent-encoded as %27, despite it is a valid URL character (and even more, if you manually change it in address bar from %27 to ', it will work). Why?
While aposthrope may be valid char, URL-encoded version is also equally valid!
Not sure if there is a hard reason, so this is kinda "soft" answer: Aposthrope (and/or double quote) needs to be escaped somehow if URL is ever put into for example JSON or XML. URL encoding them as part of sanitizing URLs solves this one way, and protects against poor JSON/XML handling and programmer errors. It's just pragmatic.
Decoding these certain valid chars in HTTP responses' headers etc (so browser shows them "right") should be possible and maybe nice, but extra work and code. Note that there are also chars where decoding would not be ok, so this would have to be selective! So at least in this case it just wasn't done I guess. So if a char gets URL-encoded at any step of the whole page loading operation chain, they stay that way.
I have a local json file with some descriptions of an app and I have found a weird behaviour when parsing \u0092 and \u0091 characters.
When json file contains these characters, the corresponding parsed NSString is printed like "?" and in UIlabel it dissapears completely.
Example "L\u2019H\u00e9r." is showed as "LHér." instead of "L'Hér."
If I replace this characters with \u2019, then I can see the caracter ' in UILabel
Does anybody any clue about this?
EDIT: For the moment I will substitute both of them with character \u2019, it is also a ' and there is no problem confusing it with a control character. Thank you all!
This answer is a little speculative, but I hope it gets you on the right tracks.
Your best bet may be to give up and substitute \u0091 and \u0092 for something else as a preprocessing step before string display. These are control characters and are unprintable in most encodings. But:
If rest of the file is proper UTF, your json file probably has problems: encoding is wrong (CP-1250?) while you read the file as UTF, some error has been made when converting the file, or a similar issue. So another solution is of course fixing your file.
If you're not sure about how your file is encoded, it may simply be encoded in CP-1250 - so reading the file using NSWindowsCP1250StringEncoding might fix your problem.
BTW, if you hardcode a string #"\u0091", you'll get a compilation time error Universal character name refers to a control character. Yes, not even a warning, it's that much unprintable in Unicode ;)
I'm using Ruby 2.0 and Rails 3.2.14. My view is littered several UTF-8 characters, mainly currency symbols like บาท and د.إ etc. I noticed some
(ActionView::Template::Error) "incompatible character encodings: ASCII-8BIT and UTF-8
in our production code and promptly tried visiting the page url on my browser without any issues. On digging in, I realised the error was actually caused by BingBot and few spiders. So when I tried to curl the same url, I was able to reproduce the issue. So, if I try
curl http://localhost:3000/?x=✓
I get the error where UTF-8 symbols are used in the view code. I also realised that if use HTML encoded strings in place of the symbols, this does not occur. However, I prefer using the actual symbols.
And I have already tried setting Encoding.default_external = Encoding::UTF_8 in environment.rb adding #encoding: utf-8 magic comment to top of file and it does not help.
So, why does this error occur? What is the difference between hitting this url on browser and on CURL besides cookies? And how do I go about fixing this issue and allow BingBot to index our site? Thanks.
The culprit that was leaking non UTF-8 characters in my template was an innocuous meta tag for Facebook Open Graph
%meta{property: "og:url", content: request.url}
And when the request is non-standard, this causes the encoding issue. Changing it to
%meta{property: "og:url", content: request.url.force_encoding('UTF-8')}
made the trick.
That error message usually occurs when you try to concatenate strings with different character encodings.
Is your database set to use UTF-8 as well?
If not, you could have a problem when you try to insert the non-UTF8 values into your UTF-8 template.
The character encoding starts to irritate me.
It took me a while to get everything from the DB in the right encoding on the screen, but with help from the i18n helper, this worked out.
Now I only have one more problem: saving text...
If i add some letters with accents (eg é ç ...) in a text field and want to save it, already in my controller it show as some exotic combination of characters.
Could someone tell me why this is and how I can fix this.
Everything is in UTF-8 btw
Thanks!
//Edit:
When I save the form, this is my log output
Parameters: {"free_text"=>"test 1 2 é",
And everything is capable of UTF-8...
Can you illustrate your output?
Let me guess your situation.
Supposed that you log those characters in controller in log/development.log or production.log.
If you view that log in terminal, you should ensure your terminal is capable to show UTF-8, with appropriate font. Also, your shell is capable to show UTF-8 and so do your the text viewer.
i am using Last.fm API to fetch some info of artists .I save info in DB and then display on my webpage.
But characters like “ (double quote) are shown as “ .
Example Artist info http://www.last.fm/music/David+Penn
and i got the first line as " Producer, arranger, dj and musician from Madrid-Spain. He has his own record company “Zen Recordsâ€, and ".
Mine Db is UTF-8 but i dunno why this error is still coming .
This seems to be a character encoding error. Confirm that you are reading the webpage as the correct encoding and are showing the results in the correct encoding.
You should be using UTF-8 all the way through. Check that:
your connection to the database is UTF-8 (using mysql_set_charset);
the pages you're outputting are marked as UTF-8 (<meta http-equiv="Content-Type" content="text/html;charset=utf-8">);
when you output strings from the database, you HTML-encode them using htmlspecialchars() and not htmlentities().
htmlentities HTML-encodes all non-ASCII characters, and by default assumes you are passing it bytes in ISO-8859-1. So if you pass it “ encoded as UTF-8 (bytes 0xE2, 0x80, 0x9C), you'd get â, instead of the expected “ or “. This can be fixed by passing in utf-8 as the optional $charset argument.
However it's usually easier to just use htmlspecialchars() instead, as this leaves non-ASCII characters alone, as raw bytes instead of HTML entity references. This results in a smaller page output, so is preferable as long as you're sure the HTML you're producing will keep its charset information (which you can usually rely on, except in context like sending snippets of HTML in a mail or something).
htmlspecialchars() does have an optional $charset argument too, but setting it to utf-8 is not critical since that results in no change of behaviour over the default ISO-8859-1 charset. If you are producing output in old-school multibyte encodings like Shift-JIS you do have to worry about setting this argument correctly, but today that's quite rare as most sane people use UTF-8 in preference.