How do I create a checkmark in unicode? - ruby-on-rails

I would think it would be:
"✓".encode(:unicode)
But I think this is not the correct usage of .encode. And when I say:
"✓".encode('Unicode')
it cannot do the conversion.

If you're using Ruby 1.9 (which has much better built-in encoding support), you can do this:
> checkmark = "\u2713"
# => "✓"
> checkmark.encoding
# => #<Encoding:UTF-8>
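If you'd rather build the character from its codepoint than remember the escape sequence, here are two equivalent one-liners (a quick sketch, still assuming Ruby 1.9+):
# pack a Unicode codepoint into a UTF-8 string
checkmark = [0x2713].pack("U")           # => "✓"
# or ask Integer#chr for the character in a specific encoding
checkmark = 0x2713.chr(Encoding::UTF_8)  # => "✓"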

Related

Soft signs in rails console

I want to create multiple categories via the console, and I want to be able to add soft signs. At the moment I can't do that.
It's very important to the project that I can save category names with soft signs.
Can somebody give me a tip on where to search? I searched for the tag "soft signs rails".
There wasn't any useful resource.
Thanks
EDIT
Soft signs in my native language look like this:
Ā, Š, Ē, Ž, i.e. characters with the symbol called a soft sign above them.
At the moment, when I try to save a new category record it shows me this kind of error:
NoMethodError: undefined method `cache_ancestry!' for #
But I am sure that I didn't change anything in models or controllers :(
What version of Ruby is this? What you're seeing there are either US-ASCII strings with UTF-8 data in them (Ruby 1.9) or byte arrays (Ruby 1.8).
If you're using Ruby 1.8, you may need to use Iconv to convert your encoding from US-ASCII to UTF-8. If you're using Ruby 1.9, then make sure you're creating UTF-8 strings and it should work just fine.
Note that those escape sequences are correct - that is the literal byte sequence of those characters, assuming the proper encoding is applied, so you may not need to actually change anything. If the bytes are right, everything's fine - you're just seeing Ruby interpret the string as ASCII rather than UTF-8 or whatnot.
In Ruby 1.8, when you #inspect a string, you get the escaped version, but putsing it will show you the actual string:
1.8.7 :021 > s = "Komunālās mašīnas"
=> "Komun\304\201l\304\201s ma\305\241\304\253nas"
1.8.7 :022 > puts s
Komunālās mašīnas
In 1.9, you get the correct display all around, so long as your encoding is right:
1.9.3p327 :001 > s = "Komunālās mašīnas"
=> "Komunālās mašīnas"
1.9.3p327 :004 > s.force_encoding "US-ASCII"
=> "Komun\xC4\x81l\xC4\x81s ma\xC5\xA1\xC4\xABnas"
1.9.3p327 :005 > puts s
Komunālās mašīnas
Check this out Edgars:
#encoding: UTF-8
t = 'ŠšÐŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûýþÿƒ'
fallback = {
  'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
  'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
  'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
  'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
  'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
  'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
  'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
}
p t.encode('us-ascii', :fallback => fallback)
See Ruby 1.9.x replace sets of characters with specific cleaned up characters in a string
EDIT:
To get all the characters for your language you will need to add them as desired to the fallback hash. When I run "Komunālās mašīnas" as the variable 't' I get this:
t = "Komunālās mašīnas"
t.encode('us-ascii', :fallback => fallback)
Encoding::UndefinedConversionError: U+0101 from UTF-8 to US-ASCII
You can tell from this where the problem lies by googling U+0101 which shows
http://www.charbase.com/0101-unicode-latin-small-letter-a-with-macron
So now you know which letter is not working and you can add it to the fallback hash like so:
fallback = { OTHER DEFINITIONS , 'ā'=>'a'}
Here's a place to start:
http://www.ascii-codes.com/cp775.html
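Since this is a Rails app, ActiveSupport's transliterate may also save you from maintaining the fallback hash by hand (a quick sketch; check that its built-in approximations cover all of the letters you need):
require 'active_support/inflector'
ActiveSupport::Inflector.transliterate("Komunālās mašīnas")
# => "Komunalas masinas"   (characters it has no approximation for become "?")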

"Sunspot" Gem makes distinction between UTF-8 chars

In a Rails app I started using sunspot => https://github.com/sunspot/sunspot/blob/master/README.md
Everything went OK until I noticed this (taken from the rails-console):
1.9.3p194 :002 > MyModel.search{fulltext "leon"}.results
=> [#<MyModel id: 16, name: "Leon">]
1.9.3p194 :003 > MyModel.search{fulltext "león"}.results
=> [#<MyModel id: 18, name: "León">]
How can I tell the system not to make a distinction between "leon" and "león"?
(I want something like search{fulltext "leon"} => [#<MyModel id: 16 ...>, #<MyModel id: 18 ...>])
I've been looking into this problem and I find the same response every time:
this line in the Gemfile works in the meantime, until the next release of rsolr:
gem 'rsolr', :git => "https://github.com/mwmitchell/rsolr.git"
Thanks
In schema.xml you need to add a character filter, as described in AnalyzersTokenizersTokenFilters. For example:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
and in mapping-ISOLatin1Accent.txt you should have entries that map each Unicode byte sequence to an ASCII character sequence. You can see an example here:
mapping-ISOLatin1Accent.txt
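The mapping file is just "source" => "replacement" pairs, one per line; a few illustrative entries (a sketch, not the complete file that ships with Solr):
# á => a
"\u00E1" => "a"
# é => e
"\u00E9" => "e"
# ñ => n
"\u00F1" => "n"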
Thanks for the responses. I actually solved it last night with another idea I took from http://codeshooter.wordpress.com/2011/01/13/full-text-search-in-in-rails-with-sunspot-and-solr/
The idea is, in Restaurant.rb:
text :name do
  self.name.my_normalize
end
and the normalization method (presumably added to String) is:
def my_normalize
  to_s.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/, '').downcase
end
That method turns strings like "äáàÁÄÀ" into "aaaaaa".
You need to make changes inside the Solr (the application, not the gem) configuration files. Solr is "embedded" in the gem, but you can access its configuration as if it were installed separately. Have a look at the Solr documentation.

Rails ActiveRecord invalid byte sequence in UTF-8 issue

I am using MSSQL 2005.
I have a StringIO object that contains my zip file content.
Here is how I obtain zip binary data:
stringio = Zip::ZipOutputStream::write_buffer do |zio|
  Eclaim.find_by_sql("SET TEXTSIZE 67108864")
  zio.put_next_entry("application.xml")
  # zio.write @claim_db[:xml]
  biblio = Nokogiri::XML('<?xml version="1.0" encoding="utf-8"?>' + @claim_db[:xml], &:noblanks)
  zio.write biblio.to_xml
  builder = Nokogiri::XML::Builder.new(:encoding => 'utf-8') do |xml|
    xml.documents {
      docs.where("ext not in (#{PROHIBITED_EXTS.collect{|v| "'#{v}'"}.join(', ')})").each{|doc|
        zio.put_next_entry("#{doc[:materialtitle_id]}.#{doc[:ext]}")
        zio.write doc[:efile]
        xml.document(:id => doc[:materialtitle_id]) {
          xml.title doc[:title]
          xml.code doc[:code]
          xml.filename "#{doc[:materialtitle_id]}.#{doc[:ext]}"
          xml.extname doc[:ext]
        }
      }
    }
  end
  zio.put_next_entry("docs.xml")
  zio.write builder.to_xml
end
stringio
In my controller I try:
data.rewind
@claim.docs.create(
  :title => 'Some file',
  :ext => 'zip',
  :size => data.length,
  :receive_date => Time.now,
  :efile => data.sysread
)
But Rails complains: invalid byte sequence in UTF-8.
Please help me with this.
If the stream is configured as a UTF-8 stream, you can't write compressed binary data (which may contain any byte value) to it.
I think setting the data to a binary encoding before writing:
data.force_encoding "ASCII-8BIT"
might help.
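A minimal sketch of that idea applied to the controller code above (an assumption about where the relabelling belongs; force_encoding only relabels the bytes, it does not transcode them):
data.rewind
efile = data.read
efile.force_encoding(Encoding::ASCII_8BIT)  # a.k.a. BINARY: mark the bytes as binary, don't transcode
@claim.docs.create(
  :title => 'Some file',
  :ext => 'zip',
  :size => efile.bytesize,
  :receive_date => Time.now,
  :efile => efile
)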
What is complaining: Ruby, ActiveRecord, or SQL Server? My guess is SQL Server. Make sure the data type of the efile field in the database is a binary BLOB.
You should break your problem down into its constituent parts and troubleshoot smaller units. Remove complexity until you get to the source of the issue. For instance, have you tried doing a simple Document.create with the attributes, say in the console, to rule out the possibility that your controller code is buggy? Something like Document.create :efile => File.read('sometiny.zip'), and just go from there.
Whether that works or breaks, you have a much simpler support request and a better noise-to-issue ratio. Right now I suspect your controller code, not the SQL Server adapter or the connection mode, as I have tested both to the hilt with simple binary data. Assuming the above does not work, you can then move on to examining smaller components.
For instance, what is the data type of the efile column? Run Document.columns_hash['efile'] in the console to find out and look at its #sql_type. Is it something suitable like varbinary(max)?
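For example (assuming your model is named Document):
Document.columns_hash['efile'].sql_type
# => "varbinary(max)"   # a suitable binary type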
Moving on from there, what connection mode are you using with the SQL Server adapter, TinyTDS? By default TinyTDS will convert everything to UTF-8 as needed and is really smart about things. I have tested it with everything from binary data to many different encodings. BTW, if you are using TinyTDS, did you make sure that you compiled FreeTDS with libiconv so it can do all this properly? You can easily check by running tsql -C in a console, assuming you have FreeTDS's binaries in your path. This should output a few lines; look for "iconv library: yes". Also make sure you are running FreeTDS 0.91 or better!
Lastly, a bit of advice: that SET TEXTSIZE is wrong there. You only want to do it once per connection. See https://github.com/rails-sqlserver/activerecord-sqlserver-adapter#configure-connection--app-name

How to URL encode a string in Ruby

How do I URI::encode a string like:
\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a
to get it in a format like:
%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
as per RFC 1738?
Here's what I tried:
irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
from (irb):123
from /usr/local/bin/irb:12:in `<main>'
Also:
irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
from (irb):126
from /usr/local/bin/irb:12:in `<main>'
I've looked all over the internet and haven't found a way to do this, although I'm almost positive that I did it the other day without any trouble at all.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str
=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:
>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"
>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"
CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.
See "What's the difference between URI.escape and CGI.escape?" for more discussion.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Taken from @J-Rou's comment
I was originally trying to escape special characters in a file name only, not on the path, from a full URL string.
ERB::Util.url_encode didn't work for my use:
helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"
Based on two answers in "Why is URI.escape() marked as obsolete and where is this REGEXP::UNSAFE constant?", it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both behave the same for me:
URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
You can use the Addressable::URI gem for that:
require 'addressable/uri'
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a"
It uses a more modern format than CGI.escape; for example, it properly encodes a space as %20 rather than as a + sign. You can read more in "The application/x-www-form-urlencoded type" on Wikipedia.
2.1.2 :008 > CGI.escape('Hello, this is me')
=> "Hello%2C+this+is+me"
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
=> "Hello,%20this%20is%20me"
Code:
str = "http://localhost/with spaces and spaces"
encoded = URI::encode(str)
puts encoded
Result:
http://localhost/with%20spaces%20and%20spaces
I created a gem to make URI encoding stuff cleaner to use in your code. It takes care of binary encoding for you.
Run gem install uri-handler, then use:
require 'uri-handler'
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
It adds the URI conversion functionality to the String class. You can also pass it an optional argument with the encoding string you would like to use. By default it falls back to 'binary' encoding if straight UTF-8 encoding fails.
If you want to "encode" a full URL without having to think about manually splitting it into its different parts, I found the following worked in the same way that I used to use URI.encode:
URI.parse(my_url).to_s

Can I use a regular expression to extract the domain from a URL?

Suppose I want to turn this :
http://en.wikipedia.org/wiki/Anarchy
into this :
en.wikipedia.org
or even better, this :
wikipedia.org
Is this even possible in regex?
Why use a regex when Ruby has a library for it? The URI library:
ruby-1.9.1-p378 > require 'uri'
=> true
ruby-1.9.1-p378 > uri = URI.parse("http://en.wikipedia.org/wiki/Anarchy")
=> #<URI::HTTP:0x000001010a2270 URL:http://en.wikipedia.org/wiki/Anarchy>
ruby-1.9.1-p378 > uri.host
=> "en.wikipedia.org"
ruby-1.9.1-p378 > uri.host.split('.')
=> ["en", "wikipedia", "org"]
Splitting the host is one way to separate the domains, but I'm not aware of a reliable way to get the base domain -- you can't just count, in the event of a URL like "http://somedomain.otherdomain.school.ac.uk" vs "www.google.com".
/http:\/\/([^\/]*).*/ will produce en.wikipedia.org from the string you provided.
/http:\/\/.{0,3}\.([^\/]*).*/ will produce wikipedia.org.
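In Ruby you can apply those patterns with String#[] and a capture-group index (the trailing .* isn't needed when you only want the capture), for example:
url = "http://en.wikipedia.org/wiki/Anarchy"
url[/http:\/\/([^\/]*)/, 1]          # => "en.wikipedia.org"
url[/http:\/\/.{0,3}\.([^\/]*)/, 1]  # => "wikipedia.org"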
yes
Now I know you haven't asked for how, and you haven't specified a language, but I'll answer anyway... (note, this works for all language subsites, not just en.wikipedia...)
perl:
$url =~ s,http://[a-z]{2}\.(wikipedia\.org)/.*,$1,;
ruby:
url = url.sub(/http:\/\/[a-z]{2}\.(wikipedia\.org)\/.*/, '\1')
php:
$url = preg_replace('|http://[a-z]{2}\.(wikipedia\.org)/.*|', '$1', $url);
Of course, for this particular example, you don't even need a regex, just this will do:
url = 'wikipedia.org'
but I jest...
you probably want to handle any URL and pull out the domain part, and it should also work for domains in different countries, eg: foo.co.uk.
In which case, I'd use Mark Rushakoff's solution to get the hostname and then a regex to pull out the domain:
domain = host.sub(/^.*\.([^.]+\.[^.]+(\.[a-z]{2})?)$/, '\1')
Hope this helps
Also, if you want to learn more, I have a regex tutorial online: http://tech.bluesmoon.info/2006/04/beginning-regular-expressions.html
Sure all you would have to do is search on http://(.*)/wiki/Anarchy
In Perl (Sorry I don't know Ruby, but I expect it's similar)
$string_to_search =~ /http:\/\/(.*?)\//; and $1 should give you en.wikipedia.org.
To get rid of the en, search on /http:\/\/en\.(.*?)\// instead, which leaves wikipedia.org in $1.
That should do it.
Update: In case you're not familiar with regexes, I would recommend picking up a regex book; this one really rocks and I like it: Mastering Regular Expressions (I saw it on half.com the other day for $14.99 used). To clarify what I suggested above: look for the string http://en., then capture anything up to the next /; that capture ends up in $1 (in Perl, at least; I'm not sure if it's the same in Ruby), and a simple print $1 will print the string.
