Soft signs in rails console - ruby-on-rails

I want to create multiple categories via the console, and I need to be able to use soft signs in their names. At the moment I can't do that.
It's very important for the project that I can save category names with soft signs.
Can somebody point me to where I should look? I searched for "soft signs rails",
but there wasn't any useful resource.
Thanks
EDIT
In my native language, soft signs look like this:
Ā, Š, Ē, Ž, where the mark above the character is called a soft sign.
At the moment, when I try to save a new category record, I get this kind of error:
NoMethodError: undefined method `cache_ancestry!' for #
But I am sure that I didn't change anything in models or controllers :(

What version of Ruby is this? What you're seeing there are either US-ASCII strings with UTF-8 data in them (Ruby 1.9) or byte arrays (Ruby 1.8).
If you're using Ruby 1.8, you may need to use Iconv to convert your encoding from US-ASCII to UTF-8. If you're using Ruby 1.9, then make sure you're creating UTF-8 strings and it should work just fine.
Note that those escape sequences are correct - they are the literal bytes of those characters, assuming the proper encoding is applied, so you may not need to change anything. If the bytes are right, everything's fine - you're just seeing Ruby interpret the string as ASCII rather than UTF-8.
In Ruby 1.8, when you #inspect a string, you get the escaped version, but printing it with puts will show you the actual string:
1.8.7 :021 > s = "Komunālās mašīnas"
=> "Komun\304\201l\304\201s ma\305\241\304\253nas"
1.8.7 :022 > puts s
Komunālās mašīnas
In 1.9, you get the correct display all around, so long as your encoding is right:
1.9.3p327 :001 > s = "Komunālās mašīnas"
=> "Komunālās mašīnas"
1.9.3p327 :004 > s.force_encoding "US-ASCII"
=> "Komun\xC4\x81l\xC4\x81s ma\xC5\xA1\xC4\xABnas"
1.9.3p327 :005 > puts s
Komunālās mašīnas
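On Ruby 1.9 and later, the claim that the bytes stay the same and only the encoding label differs can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names are illustrative.

```ruby
# "Komunālās mašīnas", written with \u escapes so the literal is UTF-8
# regardless of how the source file itself is encoded.
s = "Komun\u0101l\u0101s ma\u0161\u012Bnas"

# Relabel a copy as US-ASCII. force_encoding changes only the label;
# it never touches the underlying bytes.
ascii_view = s.dup.force_encoding("US-ASCII")

puts s.encoding                    # UTF-8
puts ascii_view.encoding           # US-ASCII
puts s.bytes == ascii_view.bytes   # true: identical bytes
puts ascii_view.valid_encoding?    # false: bytes above 127 are not ASCII
puts s.length                      # 17 characters
puts s.bytesize                    # 21 bytes (four two-byte characters)

# Relabelling the same bytes as UTF-8 again restores the original string.
restored = ascii_view.dup.force_encoding("UTF-8")
puts restored == s                 # true
```

If the bytes compare equal but the text still displays as escapes, the data itself is intact and only the label (or the terminal) is off.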

Check this out Edgars:
#encoding: UTF-8
t = 'ŠšÐŽžÀÁÂÃÄAÆAÇÈÉÊËÌÎÑNÒOÓOÔOÕOÖOØOUÚUUÜUÝYÞBßSàaáaâäaaæaçcèéêëìîðñòóôõöùûýýþÿƒ'
fallback = {
'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
}
p t.encode('us-ascii', :fallback => fallback)
See "Ruby 1.9.x replace sets of characters with specific cleaned up characters in a string".
EDIT:
To get all the characters for your language you will need to add them as desired to the fallback hash. When I run "Komunālās mašīnas" as the variable 't' I get this:
t = "Komunālās mašīnas"
t.encode('us-ascii', :fallback => fallback)
Encoding::UndefinedConversionError: U+0101 from UTF-8 to US-ASCII
You can tell from this where the problem lies by googling U+0101 which shows
http://www.charbase.com/0101-unicode-latin-small-letter-a-with-macron
So now you know which letter is not working and you can add it to the fallback hash like so:
fallback = fallback.merge('ā' => 'a') # keeps the other definitions and adds the missing letter
Here's a place to start:
http://www.ascii-codes.com/cp775.html
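As a rough sketch of the fallback approach applied to the Latvian example, the snippet below uses a deliberately tiny, hypothetical table (only three letters) and also shows that the :fallback option accepts a lambda, which allows a default replacement instead of an exception. The table contents and variable names are assumptions made for illustration.

```ruby
# Hypothetical, deliberately small table; extend it with the letters you need.
latvian_fallback = {
  "\u0101" => "a",  # ā
  "\u0161" => "s",  # š
  "\u012B" => "i"   # ī
}

name = "Komun\u0101l\u0101s ma\u0161\u012Bnas"  # "Komunālās mašīnas"

# With a Hash fallback, any character missing from the table raises
# Encoding::UndefinedConversionError, as shown in the answer above.
ascii_name = name.encode("US-ASCII", fallback: latvian_fallback)
puts ascii_name

# A lambda fallback can supply a default instead of raising.
lenient = ->(char) { latvian_fallback.fetch(char, "?") }
lenient_result = "\u0112ka".encode("US-ASCII", fallback: lenient)  # Ē is not in the table
puts lenient_result
```

Note that this replaces the accented letters rather than preserving them, so it only fits cases where an ASCII-only form is acceptable.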

Related

How to URL encode a string in Ruby

How do I URI::encode a string like:
\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a
to get it in a format like:
%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
as per RFC 1738?
Here's what I tried:
irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
from (irb):123
from /usr/local/bin/irb:12:in `<main>'
Also:
irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
from (irb):126
from /usr/local/bin/irb:12:in `<main>'
I looked all about the internet and haven't found a way to do this, although I am almost positive that the other day I did this without any trouble at all.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str
=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:
>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"
>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"
CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.
See "What's the difference between URI.escape and CGI.escape?" for more discussion.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Taken from @J-Rou's comment.
I was originally trying to escape special characters in a file name only, not on the path, from a full URL string.
ERB::Util.url_encode didn't work for my use:
helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"
Based on two answers in "Why is URI.escape() marked as obsolete and where is this REGEXP::UNSAFE constant?", it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both are behaving the same to me:
URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
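One way to escape only the file name, sketched here under the assumption that the base URL is already valid and the file name is available as a separate string, is to percent-encode just that last segment. The helper name url_for_file is hypothetical.

```ruby
require 'erb'

# Hypothetical helper: percent-encode only the file name and append it
# to a base URL that is assumed to be valid already.
def url_for_file(base_url, file_name)
  "#{base_url.chomp('/')}/#{ERB::Util.url_encode(file_name)}"
end

puts url_for_file("http://example.com/files", "my report (final).pdf")
# prints http://example.com/files/my%20report%20%28final%29.pdf
```

Because the scheme, host, and path never pass through the encoder, their slashes and colons are left alone.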
You can use the Addressable gem for that:
require 'addressable/uri'
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a"
It uses a more modern format than CGI.escape; for example, it properly encodes a space as %20 and not as a + sign. You can read more in "The application/x-www-form-urlencoded type" on Wikipedia.
2.1.2 :008 > CGI.escape('Hello, this is me')
=> "Hello%2C+this+is+me"
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
=> "Hello,%20this%20is%20me"
Code:
require 'uri'
str = "http://localhost/with spaces and spaces"
encoded = URI::encode(str) # URI.encode is obsolete and was removed in Ruby 3.0
puts encoded
Result:
http://localhost/with%20spaces%20and%20spaces
I created a gem to make URI encoding stuff cleaner to use in your code. It takes care of binary encoding for you.
Run gem install uri-handler, then use:
require 'uri-handler'
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
It adds the URI conversion functionality to the String class. You can also pass it an optional argument naming the encoding you would like to use. By default it falls back to the 'binary' encoding if the straight UTF-8 encoding fails.
If you want to "encode" a full URL without having to think about manually splitting it into its different parts, I found the following worked in the same way that I used to use URI.encode:
URI.parse(my_url).to_s

Regex error (Java::JavaLang::ArrayIndexOutOfBoundsException: 4) on 3 character non-English strings

The following steps reproduce the error. Are there any workarounds or fixes? Thanks.
This happens with both JRuby 1.5.2/Rails 2.3.9 and JRuby 1.6/Rails 3.0.5.
regex = /(aaa|bbb):/
str = "\343\202\242:"
str =~ regex
Steps in action.
d:\myapp>jruby script/rails console
Loading development environment (Rails 3.0.5)
irb(main):001:0> regex = /(aaa|bbb):/
=> /(aaa|bbb):/
irb(main):002:0> str = "\343\202\242:"
=> "péó:"
irb(main):003:0> str =~ regex
Java::JavaLang::ArrayIndexOutOfBoundsException: 4
from org.jcodings.MultiByteEncoding.safeLengthForUptoFour(MultiByteEncoding.java:5
from org.jcodings.specific.NonStrictUTF8Encoding.length(NonStrictUTF8Encoding.java
from org.joni.Matcher.forwardSearchRange(Matcher.java:124)
from org.joni.Matcher.search(Matcher.java:432)
from org.jruby.RubyRegexp.search(RubyRegexp.java:1474)
from org.jruby.RubyRegexp.op_match(RubyRegexp.java:1391)
from org.jruby.RubyString.op_match(RubyString.java:1557)
from org.jruby.RubyString$i$1$0$op_match.call(RubyString$i$1$0$op_match.gen:65535)
from org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:
from org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:139)
from org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
from org.jruby.ast.NewlineNode.interpret(NewlineNode.java:103)
from org.jruby.ast.RootNode.interpret(RootNode.java:129)
from org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
from org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:160)
from org.jruby.RubyKernel.evalCommon(RubyKernel.java:1134)
... 158 levels...
from org.jruby.RubyKernel$s$1$0$require.call(RubyKernel$s$1$0$require.gen:65535)
from org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(Java
from org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
from org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
from org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:
from org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:139)
from script.rails.__file__(script/rails:6)
from script.rails.load(script/rails)
from org.jruby.Ruby.runScript(Ruby.java:670)
from org.jruby.Ruby.runNormally(Ruby.java:574)
from org.jruby.Ruby.runFromMain(Ruby.java:423)
from org.jruby.Main.doRunFromMain(Main.java:278)
from org.jruby.Main.internalRun(Main.java:198)
from org.jruby.Main.run(Main.java:164)
from org.jruby.Main.run(Main.java:148)
from org.jruby.Main.main(Main.java:128)
irb(main):004:0>
Perhaps the regex is not delimited.
How about
str =~ /(aaa|bbb):/
or
regex = '(aaa|bbb):'
str =~ /#{regex}/
I think because you're passing a string that contains multibyte characters you need to pass the /u regex parameter to parse it as a UTF-8 string.
Just checked this in different versions of Ruby and it only appears to happen in JRuby so I think you've found a bug ;)
If you use something like "string".to_java_string first, it appears to work, but it's actually converting the string to ISO-8859-1 first, which you don't want. To maintain your encoding, just use .to_java and pass it the regex.
Here's a workaround that I think should work:
regex = /(aaa|bbb):/u
str = "\343\202\242:"
str.to_java =~ regex
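For comparison, this is roughly what the same match looks like on a Ruby where it behaves normally (the answer above reports that only JRuby raised). It is a sketch for checking a fix or workaround against the expected result; the explicit force_encoding is there only to make the UTF-8 label independent of the source file's encoding.

```ruby
regex = /(aaa|bbb):/

# "\343\202\242" is the three-byte UTF-8 sequence for U+30A2 (KATAKANA LETTER A).
# .b copies the bytes; force_encoding then labels that copy as UTF-8.
str = "\343\202\242:".b.force_encoding(Encoding::UTF_8)

puts str.valid_encoding?   # true
puts str.length            # 2 characters
puts str.bytesize          # 4 bytes
p(str =~ regex)            # nil: no match, and no exception
p("aaa:" =~ regex)         # 0: match at the first character
p(str =~ /:/)              # 1: offsets are counted in characters, not bytes
```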

Rails 2.3.2/Ruby 1.8.6 Encoding Question - ActionController returning UTF-8?

I have a pretty simple Rails question regarding encoding that I can't find an answer to.
Environment:
Rails 2.3.2/Ruby 1.8.6
I am not setting any encoding options within the Rails environment currently, have left everything to defaults.
If I read a String from disk from a text file - and send it via Rails render :text functionality using Apache/Phusion, what encoding should the client expect?
Thank you for any answers,
Since about Rails 1.2, Rails sets Ruby 1.8's $KCODE magic variable to "UTF8". It includes ActiveSupport::CoreExtensions::String::Multibyte to patch around issues with otherwise ambiguous per-character/per-byte operators. Your text file should be UTF-8; Ruby will pass it through unchanged, and your application layout should specify a META tag declaring the document's charset to be UTF-8 too:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then it should all 'just work', but there are some gotchas described below.
If you're on a Mac, running "script/console" in Terminal.app and then pasting unusual character sequences directly into the terminal from e.g. the Character Viewer is a good way to play around and demonstrate this to your own satisfaction, since the whole OS works in UTF-8. I don't know what the equivalent would be for Windows or an arbitrary Linux distribution.
For example, "⇒" - RIGHTWARDS DOUBLE ARROW - is Unicode U+21D2, which in UTF-8 is 0xE2 (226), 0x87 (135), 0x92 (146). If I paste that into Terminal and ask for the byte values I get the expected result:
>> $KCODE
=> "UTF8"
>> "⇒"
=> "\342\207\222"
>> puts "⇒"
⇒
...but...
>> "⇒"[0]
=> 226
>> "⇒"[1]
=> 135
>> "⇒"[2]
=> 146
>> "⇒"[3]
=> nil
Note how you're still getting byte access with "[]". See the documentation on the Multibyte extensions in the Rails API (for Rails 2.2, e.g. at http://railsapi.com/) if you want to do string operations, otherwise things like "foo.reverse" will do the wrong thing; "foo.mb_chars.reverse" gets it right by using the "mb_chars" proxy.
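For contrast, here is a small sketch of the same checks on Ruby 1.9 or later, where strings carry their encoding and "[]" indexes characters rather than bytes. This is not the Ruby 1.8.6 setup from the question; it only illustrates the byte/character distinction described above.

```ruby
arrow = "\u21D2"  # RIGHTWARDS DOUBLE ARROW

p arrow.bytes           # [226, 135, 146]
p arrow.length          # 1 character
p arrow.bytesize        # 3 bytes
p(arrow[0] == arrow)    # true: [] returns the whole character on 1.9+

text = "a\u21D2b"
p(text.reverse == "b\u21D2a")  # true: reverse works per character
p text.bytes.length            # 5
```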

Character Encoding issue in Rails v3/Ruby 1.9.2

I get this error sometimes "invalid byte sequence in UTF-8" when I read contents from a file. Note - this only happens when there are some special characters in the string. I have tried opening the file without "r:UTF-8", but still get the same error.
open(file, "r:UTF-8").each_line { |line| puts line.strip(",") } # line.strip generates the error
Contents of the file:
# encoding: UTF-8
290919,"SE","26","Sk‰l","",59.4500,17.9500,, # this errors out
290956,"CZ","45","HornÌ Bradlo","",49.8000,15.7500,, # this errors out
290958,"NO","02","Svaland","",58.4000,8.0500,, # this works
This is a CSV file I got from an outside source, and I am trying to import it into my DB. It did not come with "# encoding: UTF-8" at the top; I added that because I read somewhere that it would fix this problem, but it did not. :(
Environment:
Rails v3.0.3
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.5.0]
Ruby has a notion of an external encoding and an internal encoding for each file. This allows you to work with a file in UTF-8 in your source, even when the file is stored in a more esoteric format. If your default external encoding is UTF-8 (which it is if you're on Mac OS X), all of your file I/O is going to be in UTF-8 as well. You can check this using File.open('file').external_encoding. What you're doing when you open your file and pass "r:UTF-8" is forcing the same external encoding that Ruby is using by default.
Chances are, your source document isn't in UTF-8 and those non-ASCII characters aren't mapping cleanly to UTF-8 (if they were, you would get the correct characters and no error; if they mapped incorrectly, you would get incorrect characters and no error). What you should do is try to determine the encoding of the source document, then have Ruby transcode the document on read, like so:
File.open(file, "r:windows-1251:utf-8").each_line { |line| puts line.strip(",") }
If you need help determining the encoding of the source, give this Python library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).
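The transcode-on-read idea can be tried end to end with a throwaway file. In this sketch the sample row and the ISO-8859-1 source encoding are assumptions made for illustration, not a diagnosis of the file in the question.

```ruby
require 'tempfile'

# Hypothetical sample row, stored as ISO-8859-1 bytes.
latin1_row = "290919,\"SE\",\"26\",\"Sk\u00E5l\"\n".encode(Encoding::ISO_8859_1)

rows = Tempfile.create('cities') do |file|
  file.binmode
  file.write(latin1_row)
  file.close

  # "r:ISO-8859-1:UTF-8" means: the file is ISO-8859-1 on disk,
  # hand the program UTF-8 strings.
  File.open(file.path, "r:ISO-8859-1:UTF-8") do |io|
    io.each_line.map { |line| line.chomp.split(",") }
  end
end

city = rows.first.last
puts city            # the city field, still wrapped in its CSV quotes
puts city.encoding   # UTF-8
```

Once the lines arrive as valid UTF-8, ordinary string methods stop raising "invalid byte sequence in UTF-8".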
If you want to change your file's encoding, you can use the charlock_holmes gem:
https://github.com/brianmario/charlock_holmes
require 'charlock_holmes/string'
content = File.read('test2.txt')
if !content.is_utf8?
  detection = CharlockHolmes::EncodingDetector.detect(content)
  utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
end
Then you can save your new content in a temp file and overwrite your original file.
Hope this helps.
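Where adding a gem is not an option, a much cruder standard-library sketch is to keep the bytes as UTF-8 when they are valid UTF-8 and otherwise assume one fixed source encoding. This does no real detection; the helper name to_utf8 and the ISO-8859-1 default are assumptions for illustration only.

```ruby
# Hypothetical helper: treat the bytes as UTF-8 when they are valid UTF-8,
# otherwise assume a fixed encoding (ISO-8859-1 here, purely as an example).
def to_utf8(raw, assumed_encoding = Encoding::ISO_8859_1)
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  raw.dup.force_encoding(assumed_encoding).encode(Encoding::UTF_8)
end

already_utf8 = "Sk\u00E5l".b   # UTF-8 bytes carrying a binary label
latin1_bytes = "Sk\xE5l".b     # the same word as ISO-8859-1 bytes

puts to_utf8(already_utf8) == "Sk\u00E5l"   # true
puts to_utf8(latin1_bytes) == "Sk\u00E5l"   # true
```

A wrong guess for the assumed encoding produces wrong characters without any error, which is why a detector is preferable when the source is unknown.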

Ruby On Rails and UTF-8

I have a Rails application with a SayController, a hello action, and the view template say/hello.html.erb. When I add a Cyrillic character like "ю", I get an error:
ArgumentError in SayController#hello
invalid byte sequence in UTF-8
Headers:
{"Cache-Control"=>"no-cache",
"X-Runtime"=>"11",
"Content-Type"=>"text/html; charset=utf-8"}
If I try to write this letter with embedded Ruby,
<%= "ю" %>
I don't get any error, but it displays a question mark in a black square (�) instead of the letter.
I use Windows 7 x64, Ruby 1.9.1p378, Rails 2.3.5, WEBrick server.
A likely cause of this error is that the file which contains the Cyrillic letters is not encoded in UTF-8, but perhaps in some Russian encoding like KOI8. This makes the characters impossible to interpret as UTF-8 (and rightly so!).
So double check that your file is properly encoded in UTF-8.
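One way to do that double check from Ruby 1.9 or later is to read the raw bytes and ask whether they form valid UTF-8. The helper name utf8_file? is hypothetical, and the two temporary files exist only to demonstrate it.

```ruby
require 'tempfile'

# Hypothetical helper: true when the file's raw bytes form valid UTF-8.
def utf8_file?(path)
  File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
end

# Demo: write the letter "ю" (U+044E) once as UTF-8 and once as KOI8-R.
results = ["UTF-8", "KOI8-R"].map do |encoding_name|
  Tempfile.create('template') do |file|
    file.binmode
    file.write("\u044E".encode(encoding_name))
    file.close
    [encoding_name, utf8_file?(file.path)]
  end
end

results.each { |name, ok| puts "#{name}: #{ok}" }
```

A true result does not prove the file was meant as UTF-8, but a false result means the template needs to be re-saved in UTF-8.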
Create an initializer file (e.g. encoding_fix.rb) under your_app/config/initializers with the following content:
Encoding.default_internal = Encoding::UTF_8 if RUBY_VERSION > "1.9"
Encoding.default_external = Encoding::UTF_8 if RUBY_VERSION > "1.9"
This sets both the default internal and the default external encoding to UTF-8.

Resources