Hash.from_xml double escapes & - ruby-on-rails

>> h={:title => "hi & mv288" }
=> {:title=>"hi & mv288"}
>> h.to_xml
=> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<hash>\n <title>hi &amp; mv288</title>\n</hash>\n"
>> Hash.from_xml h.to_xml
=> {"hash"=>{"title"=>"hi & mv288"}}
If you notice line#2 and #4, the & characters in the title value became & after
a series of Hash.to_xml and from_xml method calls.
Is there any way to prevent Hash.from_xml from converting & into &.

We switched the xml parser to Nokogiri to resolve this issue.
Add this line in your environment.rb
ActiveSupport::XmlMini.backend = 'Nokogiri'
You will have to have nokogiri gem installed though. If you need a pure
java implementation of nokogiri, check this out.
https://github.com/tenderlove/nokogiri/wiki/pure-java-nokogiri-for-jruby
The installation command is,
gem install nokogiri --pre
You may also use LibXml as XmlMiini.backend to resolve this issue.

Related

Ruby Net::IMAP - Attachment Filenames with special characters/umlauts [duplicate]

I have the following header:
From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao#example.com.br>
I can easily split out the stuff before the <, which leaves me with
"=?iso-8859-1?Q?Marta_Falc=E3o?="
What can I use to turn this into "Marta Falcão"?
Using the newer Mail gem:
Mail::Encodings.value_decode(str) or
Mail::Encodings.unquote_and_convert_to(str, to_encoding)
Thanks to Roland Illig for his comment, which led me to two options:
install rfc2047-ruby and call Rfc2047.decode(header)
install TMail and call TMail::Unquoter.unquote_and_convert_to(header, 'utf-8') or better yet TMail::Address.parse(header).friendly, the latter of which strips out the <email address> part
Use Ruby to implement RFC 2047 isn't hard:
module Rfc2047
TOKEN = /[\041\043-\047\052\053\055\060-\071\101-\132\134\136\137\141-\176]+/.freeze
ENCODED_TEXT = /[\041-\076\100-\176]+/.freeze
ENCODED_WORD = /=\?(?<charset>#{TOKEN})\?(?<encoding>[QB])\?(?<encoded_text>#{ENCODED_TEXT})\?=/i.freeze
class << self
def encode(input)
"=?#{input.encoding}?B?#{[input].pack('m0')}?="
end
def decode(input)
match_data = ENCODED_WORD.match(input)
raise ArgumentError if match_data.nil?
charset, encoding, encoded_text = match_data.captures
decoded =
case encoding
when 'Q', 'q' then encoded_text.unpack1('M')
when 'B', 'b' then encoded_text.unpack1('m')
end
decoded.force_encoding(charset)
end
end
end
Rfc2047.decode '=?iso-8859-1?Q?Marta_Falc=E3o?=' # => Marta_Falcão
Update
mikel/mail is currently having an encoding issue which might not decode the string correctly.
If that really bothers you, you can try new_rfc_2047:
$ gem install new_rfc_2047
$ ruby -rrfc_2047 -e 'puts Rfc2047.decode "From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao#example.com.br>"'
From: Marta Falcão <marta.falcao#example.com.br>
Since the source code of mikel/mail is a little too complicated for me to do the modification, I just made my own gem for this.
Gem source is here: https://github.com/tonytonyjan/rfc_2047/

Empty array in smarter_csv

I am using Ruby 2.1.1 and the gem smarter_csv version 1.0.17. I am trying to get a simple example to work but keep getting an empty array. Any hints on what I am doing wrong.
2.1.1 :006 > f = File.open("/Users/diasks2/test.csv", "rb")
2.1.1 :007 > f.read
=> "source_language,target_language\rhello,hola\rlibrary,biblioteca"
2.1.1 :008 > require 'smarter_csv'
=> true
2.1.1 :009 > data = SmarterCSV.process("/Users/diasks2/test.csv")
=> []
This should work:
data = SmarterCSV.process("/Users/diasks2/test.csv", row_sep: "\r")
When not indicating row_sep, smarter_csv will use the system new line separator which is "\r\n" for windows machines, and "\r" for unix machines. Your file has "\r" to separate the lines, so you need to explicitly indicate this.
Edit - apparently there is a feature for smarter_csv to guess the row separator - if you use the following:
data = SmarterCSV.process("/Users/diasks2/test.csv", row_sep: :auto)
The gem should identify file with lines ending with \n, \r\n, or \r and parse them accordingly.

Soft signs in rails console

I want to create multiple categories via console and I want to be able add soft signs. At this moment I can't do that.
It's very important to project that I can save category names with soft signs.
Can somebody tip me where to search? I searched such tag - soft signs rails.
There wasn't any usefull resource.
Thanks
EDIT
Soft signs in my native language is like this.
Ā,Š,Ē,Ž with that symbol called soft sign abowe the character.
At this moment when I try to save new category record it shows me this kind off error
thodError: undefined methodcache_ancestry!' for #
But I am sure that I didn't change anything in models or controllers :(
What version of Ruby is this? What you're seeing there are either US-ASCII strings with UTF-8 data in them (Ruby 1.9) or byte arrays (Ruby 1.8).
If you're using Ruby 1.8, you may need to use Iconv to convert your encoding from US-ASCII to UTF-8. If you're using Ruby 1.9, then make sure you're creating UTF-8 strings and it should work just fine.
Note that those escape sequences are correct - that is the literal byte array of those characters, assuming the proper encoding is applied, so you may not need to actually change anything. If the bytes are right, everything's fine - you're just seeing ruby interpret the string as ASCII rather than UTF-8 or whatnot.
In Ruby 1.8, when you #inspect a string, you get the escaped version, but putsing it will show you the actual string:
1.8.7 :021 > s = "Komunālās mašīnas"
=> "Komun\304\201l\304\201s ma\305\241\304\253nas"
1.8.7 :022 > puts s
Komunālās mašīnas
In 1.9, you get the correct display all around, so long as your encoding is right:
1.9.3p327 :001 > s = "Komunālās mašīnas"
=> "Komunālās mašīnas"
1.9.3p327 :004 > s.force_encoding "US-ASCII"
=> "Komun\xC4\x81l\xC4\x81s ma\xC5\xA1\xC4\xABnas"
1.9.3p327 :005 > puts s
Komunālās mašīnas
Check this out Edgars:
#encoding: UTF-8
t = 'ŠšÐŽžÀÁÂÃÄAÆAÇÈÉÊËÌÎÑNÒOÓOÔOÕOÖOØOUÚUUÜUÝYÞBßSàaáaâäaaæaçcèéêëìîðñòóôõöùûýýþÿƒ'
fallback = {
'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
}
p t.encode('us-ascii', :fallback => fallback)
See Ruby 1.9.x replace sets of characters with specific cleaned up characters in a string
EDIT:
To get all the characters for your language you will need to add them as desired to the fallback hash. When I run "Komunālās mašīnas" as the variable 't' I get this:
t = "Komunālās mašīnas"
t.encode('us-ascii', :fallback => fallback)
Encoding::UndefinedConversionError: U+0101 from UTF-8 to US-ASCII
You can tell from this where the problem lies by googling U+0101 which shows
http://www.charbase.com/0101-unicode-latin-small-letter-a-with-macron
So now you know which letter is not working and you can add it to the fallback hash like so:
fallback = { OTHER DEFINITIONS , 'ā'=>'a'}
Here's a place to start:
http://www.ascii-codes.com/cp775.html

Fastercsv shows malformedCSVError, what am i doing wrong?

I am implementing in Ruby and i am running a project which reads a CSV file to add users.
but when i pick my file it just gives always the same error:
FasterCSV::MalformedCSVError in User importController#match
Illegal quoting on line 1.
my CSV file just exists of :
"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0
anyone who knows what can be wrong?
Try to upgrade your FasterCSV gem version. With the latest version it works:
FasterCSV.parse_line '"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0'
=> ["RubenPersoon1", "test", "Bauwens", "Ruben", "rub#gmail.com", "0"]
ruby-1.8.7-p352 :005 > FasterCSV.parse '"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0'
=> [["RubenPersoon1", "test", "Bauwens", "Ruben", "rub#gmail.com", "0"]]
Also, keep in mind that if you are on Ruby 1.9.2, FasterCSV is already included. Just require 'csv' and use the CSV class.

Rails 2.3.2/Ruby 1.8.6 Encoding Question - ActionController returning UTF-8?

I have a pretty simple Rails question regarding encoding that I can't find an answer to.
Environment:
Rails 2.3.2/Ruby1.8.6
I am not setting any encoding options within the Rails environment currently, have left everything to defaults.
If I read a String from disk from a text file - and send it via Rails render :text functionality using Apache/Phusion, what encoding should the client expect?
Thank you for any answers,
Since about Rails 1.2, Rails sets Ruby 1.8's $KCODE magic variable to "UTF8". It includes ActiveSupport::CoreExtensions::String::Multibyte to patch around issues with otherwise ambiguous per-character/per-byte operators. Your text file should be UTF-8, Ruby will pass it through and your application layout should specify a META tag declaring the document's charset to be UTF-8 too:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then it should all 'just work', but there are some gotchas described below.
If you're on a Mac, running "script/console" in Terminal.app and then pasting unusual character sequences directly into the terminal from e.g. the Character Viewer is a good way to play around and demonstrate this to your own satisfaction, since the whole OS works in UTF-8. I don't know what the equivalent would be for Windows or an arbitrary Linux distribution.
For example, "⇒" - RIGHTWARDS DOUBLE ARROW - is Unicode 21D2, UTF8 0xE2 (226), 0x87 (125), 0x92 (146). If I paste that into Terminal and ask for the byte values I get the expected result:
>> $KCODE
=> "UTF8"
>> "⇒"
=> "\342\207\222"
>> puts "⇒"
⇒
...but...
>> "⇒"[0]
=> 226
>> "⇒"[1]
=> 135
>> "⇒"[2]
=> 146
>> "⇒"[3]
=> nil
Note how you're still getting byte access with "[]". See the documentation on the Multibyte extensions in the Rails API (for Rails 2.2, e.g. at http://railsapi.com/) if you want to do string operations, otherwise things like "foo.reverse" will do the wrong thing; "foo.mb_chars.reverse" gets it right by using the "mb_chars" proxy.

Resources