I'm using Rails 3.2.1 and I have been stuck on a problem for quite a while.
I'm using the Oracle enhanced adapter and I have a RAW(16) (UUID) column. When I try to display the data, one of two things happens:
1) I see weird symbols
2) I get "incompatible character encoding: ASCII-8BIT and UTF-8".
In my application.rb file I added the
config.encoding = 'utf-8'
and in my view file I added
# encoding: utf-8
But so far nothing has worked.
I also tried adding html_safe, but it failed.
How can I safely display my UUID data?
Thank you very much
Answer:
I used the unpack method to convert the binary, with the parameters H8H4H4H4H12, and joined the resulting array in the end :-)
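For reference, here is a minimal sketch of that approach (the sample bytes are a made-up UUID, not data from the question):

```ruby
# A hypothetical 16-byte value, as it might come back from a RAW(16) column
raw = ["550e8400e29b41d4a716446655440000"].pack('H*')

# Unpack into the five hex groups of a UUID and join them with dashes
uuid = raw.unpack('H8H4H4H4H12').join('-')
# => "550e8400-e29b-41d4-a716-446655440000"
```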
The RAW datatype is a string of bytes that can take any value. This includes binary data that doesn't translate to anything meaningful in ASCII or UTF-8 or in any character set.
You should really read Joel Spolsky's note about character sets and unicode before continuing.
Now, since the data can't be translated reliably to a string, how can we display it? Usually we convert or encode it, for instance:
we could use the hexadecimal representation, where each byte is converted to two [0-9A-F] characters (in Oracle, using the RAWTOHEX function). This is fine for displaying small binary fields such as RAW(16).
you can also use other encodings such as base 64, in Oracle with the UTL_ENCODE package.
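Both encodings can also be produced on the Ruby side; a minimal sketch (the sample bytes are arbitrary):

```ruby
require 'base64'

raw = "\xDE\xAD\xBE\xEF".b           # arbitrary binary data

hex = raw.unpack('H*').first         # => "deadbeef"
b64 = Base64.strict_encode64(raw)    # => "3q2+7w=="
```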
Related
I keep getting an exception: ActiveRecord::StatementInvalid: PG::UntranslatableCharacter: ERROR: character with byte sequence 0xe2 0x80 0x99 in encoding "UTF8" has no equivalent in encoding "LATIN1". I did some checking, and it looks like that byte sequence is the right single quotation mark (a curly apostrophe). What is the best way to handle this? Just strip out the character, or convert the whole database to UTF-8? If it is converting to UTF-8, how can I do that permanently? It always seems to revert if I do it in the shell.
I don't understand what you mean by "revert, if done in the shell", but: you seem to have an application where some parts (at least the database) use the LATIN1 encoding, while one part (your Rails app) uses UTF-8. IMO it is best to have everything in Unicode, but to what extent a conversion makes sense cannot be said in general. For example, if your database is also being processed by other tools, and those expect Latin1, a conversion is not sensible.
In any case, you need to define a clear borderline between where you use which encoding, and handle conversion at that border. This applies not only to the database, but also - for example - to the HTML pages you generate (hopefully UTF-8), to files uploaded by users and processed by your application, and so on.
If you convert to an encoding where certain characters can not be represented - as in your case - you have only three choices:
Reject the data (it must have been generated somewhere, perhaps as user input in a web form)
Simply remove the offending characters
Replace the offending characters with a placeholder (for instance, a question mark)
None of these options is very pleasant, but if converting your database to UTF-8 is no option, you should deal with this problem at the point where the problem string is generated, and not when it is written into the database.
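In Ruby, the last two options can be handled at that border with String#encode's replacement options; a minimal sketch:

```ruby
s = "O’Reilly"   # U+2019 has no equivalent in LATIN1

# Replace the offending character with a placeholder
replaced = s.encode("ISO-8859-1", undef: :replace, replace: "?")
# Simply remove it
removed  = s.encode("ISO-8859-1", undef: :replace, replace: "")
```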
I have imported a shapefile into a PostgreSQL database that uses the Latin1 character encoding (the database could not import it in UTF-8 format). When I retrieve a value using the PQgetvalue() method, some special characters come back incorrectly. For example, the field value "STURDEEÿAVENUE" is received incorrectly as "STURDEEÃ¿AVENUE".
Since you are getting the data back as UTF-8, your client_encoding is probably wrong. It can be set per connection and controls the encoding in which strings are sent back to the client. By setting the variable to Latin1 immediately after connecting, you can retrieve the strings in the desired encoding.
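The underlying mismatch can be reproduced in plain Ruby; this sketch shows how a single Latin-1 byte (ÿ, 0xFF) turns into two garbage characters when it is transcoded to UTF-8 on the way out but read back as Latin-1:

```ruby
# "ÿ" is the single byte 0xFF in Latin-1
latin1 = "\xFF".force_encoding("ISO-8859-1")

# With client_encoding set to UTF8, the server transcodes it to two bytes
utf8_bytes = latin1.encode("UTF-8")    # bytes C3 BF

# A client that then treats those bytes as Latin-1 sees two characters
mojibake = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
# => "Ã¿"
```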
How can I get a list of compatible encodings for a Ruby String? (MRI 1.9.3)
Use case: I have some user provided strings, encoded with UTF-8. Ideally I need to convert them to ISO/IEC 8859-1 (8-bit), but I also need to fallback to unicode when some special characters are present.
Also, is there a better way to accomplish this? Maybe I am testing the wrong thing.
EDIT- adding more details
Thanks for the answers; I should probably add some context.
I know how to perform encoding conversion.
I'm looking for a way to quickly find out if a string can be safely encoded to another encoding or, to put it in another (and quite wrong) way, what is the minimum encoding to support all the characters in that string.
Just converting the strings to a 16-bit encoding is not an option, because they will be sent as SMS messages, and using a 16-bit encoding cuts the number of available characters per message from 160 down to 70.
I need to convert them to a 16-bit encoding only when they contain a special character that is not supported in ISO/IEC 8859-1.
Unfortunately, Ruby’s idea of encoding compatibility is not fully congruent with your use case. However, trying to encode your UTF-8 string as ISO-8859-1 and catching the error that is thrown when a conversion is not possible will achieve what you are after:
str = 'your UTF-8 string'
begin
  str.encode!('ISO-8859-1')
rescue Encoding::UndefinedConversionError
  # str is left unchanged, still UTF-8
end
will convert your string to ISO-8859-1 if possible and leave it as UTF-8 if not.
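Wrapped in a small helper (hypothetical name) for the SMS use case, the same pattern might look like:

```ruby
# Hypothetical helper: try the 8-bit encoding, fall back to UTF-8
def sms_encode(str)
  str.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError
  str
end

sms_encode("héllo").encoding   # => ISO-8859-1
sms_encode("h…llo").encoding   # => UTF-8 (U+2026 has no Latin-1 mapping)
```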
Note this uses encode, which actually transcodes the string using Encoding::Converter (i.e. rewrites the string’s bytes into the target encoding’s representation of the same characters), unlike force_encoding, which just changes the encoding flag (i.e. tells Ruby to interpret the string’s existing bytes according to the new encoding).
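The difference is easy to see in the raw bytes; a minimal sketch:

```ruby
s = "é"                                  # UTF-8, bytes C3 A9

t = s.encode("ISO-8859-1")               # transcodes the bytes
t.bytes                                  # => [0xE9]

u = s.dup.force_encoding("ISO-8859-1")   # same bytes, new label (now misread)
u.bytes                                  # => [0xC3, 0xA9]
```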
Ruby's standard library provides the Encoding class and Encoding::Converter; they are probably your best friends in this case.
#!/usr/bin/env ruby
# encoding: utf-8
converter = Encoding::Converter.new("UTF-8", "ISO-8859-1")
converted = converter.convert("é")
puts converted.encoding
# => ISO-8859-1
puts converted.dump
# => "\xE9"
Is valid_encoding? (an instance method of String) useful? That is:
try_str = str.force_encoding("ISO-8859-1")
str = try_str if try_str.valid_encoding?
You can combine force_encoding and encode as below (here, tagging the bytes as ISO-8859-1 and then converting them to UTF-8):
1.9.3p194 :002 > puts "é".force_encoding("ISO-8859-1").encode("UTF-8")
é
=> nil
Linked Answer
"Some String".force_encoding("ISO-8859-1")
(Note that Ruby's name for this encoding is "ISO-8859-1"; "ISO/IEC 8859-1" is not recognized.) You can also refer to the Rails documentation on encoding.
I have a field scraped from a utf-8 page:
"O’Reilly"
And saved in a yml file:
:name: "O\xE2\x80\x99Reilly"
(\xE2\x80\x99 is the correct UTF-8 byte sequence for this apostrophe)
However when I load the value into a hash and yield it to a page tagged as utf-8, I get:
OâReilly
I looked up the character â, which is encoded in UTF-16 as 0x00E2, and the characters \x80 and \x99 were invisible but present after the â when I pasted the string. I assume this means my app is outputting three UTF-16 characters instead of one UTF-8.
How do I make rails interpret a 3-byte UTF-8 code as a single character?
Ruby strings are sequences of bytes instead of characters:
$ irb
>> "O\xE2\x80\x99Reilly"
=> "O\342\200\231Reilly"
Your string is a sequence of 10 bytes but 8 characters (as you know). The safest way to ensure you output the correct string in HTML (I assume you want HTML since you mentioned Rails) is to convert non-ASCII characters to HTML entities; in your case to
O&#8217;Reilly
This takes some work, but it should help in cases where you send your HTML as UTF-8 but your end user has set his or her browser to override it and show Latin-1 or some other restricted charset.
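A rough sketch of that entity-conversion approach (replacing every non-ASCII character with a numeric HTML entity):

```ruby
s = "O’Reilly"

# Map each character above the ASCII range to its numeric entity
entities = s.each_char.map { |c| c.ord > 127 ? "&##{c.ord};" : c }.join
# => "O&#8217;Reilly"
```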
Ultimately this was caused by loading a syck file (generated by an external script) with psych (in rails). Loading with syck solved the issue:
# in ruby environment
puts YAML::ENGINE.yamler  # => "syck"
# in rails
puts YAML::ENGINE.yamler  # => "psych"
# in webapp
YAML::ENGINE.yamler = 'syck'
a = YAML::load(file_saved_with_syck)
a[index][:name]  # => "O’Reilly"
YAML::ENGINE.yamler = 'psych'
I assume this means my app is outputting three UTF-16 characters instead of one UTF-8.
It's not really UTF-16, which is rarely used on the web (and largely breaks there). Your app is outputting three Unicode characters (including the two invisible control codes), but that's not the same thing as the UTF-16 encoding.
The problem would seem to be that the YAML file is being read in as if it were ISO-8859-1-encoded, so that the \xE2 byte maps to character U+00E2 and so on. I am guessing you are using Ruby 1.9 and the YAML is being parsed into byte strings with the associated ASCII-8BIT encoding instead of UTF-8, causing the strings to undergo a round of transcoding (mangling) later.
If this is the case you might have to force_encoding the read strings back to what they should have been, or set default_internal to cause the strings to be read back into UTF-8. Bit of a mess this.
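If the strings really do come back tagged as ASCII-8BIT, re-tagging the bytes is enough; a minimal sketch:

```ruby
# Simulate a string read in as raw bytes (ASCII-8BIT)
name = "O\xE2\x80\x99Reilly".b

# Re-tag the bytes as the UTF-8 they actually are
name.force_encoding("UTF-8")
name.valid_encoding?   # => true
name                   # => "O’Reilly"
```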
I have a Ruby-on-Rails app that accepts a binary file upload, stores it as an ActiveRecord object in a local database, and passes a hex equivalent of the binary blob to a back-end web service for processing. This usually works great.
Two days ago, I ran into a problem with a file containing the hex sequence \x25\x32\x35, %25 in ASCII. The binary representation of the file was stored properly in the database but the hex string representation of the file that resulted from
sample.binary.unpack('H*').to_s
was incorrect. After investigating, I found that those three bytes were converted to hex string 25, the representation for %. It should have been 253235, the representation for %25
It makes sense for Ruby or Rails or ActiveRecord to do this: %25 is the proper URL-encoded value for %. However, I need to turn off this optimization or validation or whatever it is. I need blob.unpack('H*') to include a hex equivalent for every byte of the blob.
One (inefficient) way to solve this is to store a hex representation of the file in the database. Grabbing the file directly from the HTTP POST request works fine:
params[:sample].read.unpack('H*').to_s
That stores the full 253235. Something about the roundtrip to the database (sqlite) or the HTTPClient post from the front-end web service to the back-end web service (hosted within WEBrick) is causing the loss of fidelity.
Eager to hear any ideas, willing to try whatever to test out suggestions. Thanks.
This is a known issue with Rails and its SQLite adapter:
There is a bug filed here in the old rails system (with patch):
https://rails.lighthouseapp.com/projects/8994/tickets/5040
And a new bug filed here in the new rails issue tracking system:
https://github.com/rails/rails/issues/2407
Any string that contains '%00' will be mangled when converted to binary and back. A binary that contains the string '%25' will be converted to '%', which is what you are seeing.
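A rough illustration of the effect described above (this is not the adapter's actual code path, just the symptom, sketched with CGI.unescape):

```ruby
require 'cgi'

blob = "%25".b                       # the three bytes 0x25 0x32 0x35
blob.unpack('H*').first              # => "253235", the correct hex dump

# The buggy adapter effectively URL-unescaped blob data on the way out:
mangled = CGI.unescape(blob)         # => "%"
mangled.unpack('H*').first           # => "25"
```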