I have a string with spaces (one simple space and one ideographic space):
"qwe rty uiop".gsub(/[\s]+/,'') #=> "qwe rtyuiop"
How can I add all the space codepoints (for example 3000, 2060, 205F) to my pattern?
In Ruby 1.9 I just added \u3000 and the other codes, but how do I do it in 1.8?
I think I found the answer. ActiveSupport::Multibyte::Chars has a UNICODE_WHITESPACE constant. Solution:
pattern = ActiveSupport::Multibyte::Chars::UNICODE_WHITESPACE.collect do |c|
  Array(c).pack "U*"  # the constant holds Unicode codepoints; pack each into a UTF-8 string
end.join '|'
puts "qwe　rty uiop".mb_chars.gsub(/#{pattern}/, '')
#=> qwertyuiop
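For reference, the Ruby 1.9+ version the question alludes to could look roughly like this (the codepoints beyond \u3000 are simply the ones the question mentions):

"qwe　rty uiop".gsub(/[\s\u3000\u205F\u2060]+/, '')
#=> "qwertyuiop"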
Related
How can I create a ruby regular expression that includes a unicode character?
For example, I would like to match the character "\u0002" in my regular expression.
You can write /\x02/ :
"\u0002" =~ /\x02/
#=> 0
If you're not sure, you can just start from a string :
Regexp.new("\u0002")
#=> /\x02/
Here's another example :
"☀☁☂" =~ /\u2602/
#=> 2
As mentioned by @TomLord in the comments, you can also specify a range. To check if a string includes a Unicode arrow:
"↹" =~ /[\u2190-\u21FF]/
#=> 0
I'm trying to parse words out of a string and put them into an array. I've tried the following thing:
#string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
puts #string1.scan(/\s([^\,\.\s]*)/)
It seems to do the trick, but it's a bit shaky (I should include more special characters, for example). Is there a better way to do this in Ruby?
Optional: I have a CS course description. I intend to extract all the words out of it, place them in a string array, remove the most common English words from the resulting array, and then use the rest of the words as tags that users can use to search for CS courses.
Use the split method:
words = @string1.split(/\W+/)
This splits the string into an array based on a regular expression. \W means any "non-word" character, and the + collapses runs of consecutive delimiters.
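Applied to the question's string, that gives:

@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
@string1.split(/\W+/)
#=> ["oriented", "design", "decomposition", "encapsulation", "and", "testing", "Uses"]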
For me the best way to split sentences is:
line.split(/[^[[:word:]]]+/)
It works perfectly even with multilingual words and punctuation marks:
line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
Well, you could split the string on spaces if that's your delimiter of interest
@string1.split(' ')
Or split on non-word characters or word boundaries
\W # Any non-word character
\b # A word boundary
Or on whitespace
\s # Any whitespace character
Hint: try testing each of these on http://rubular.com
And note that Ruby 1.9 has some differences from 1.8.
For Rails you can use something like this:
@string1.split(/\s/).delete_if(&:blank?)
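For the question's string, this yields the following (note that punctuation stays attached to the words):

@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
@string1.split(/\s/).delete_if(&:blank?)
#=> ["oriented", "design,", "decomposition,", "encapsulation,", "and", "testing.", "Uses"]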
I would write something like this:
@string1
  .split(/,+|\s+/)                     # split on any ',' or any whitespace (space, tab, newline)
  .reject(&:empty?)
  .map { |w| w.gsub(/\W+$|^\W+/, '') } # \W+$ => trailing punctuation; ^\W+ => leading punctuation
irb(main):047:0> @string1 = "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
=> "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
irb(main):048:0> @string1.split(/,+|\s+/).reject(&:empty?).map { |w| w.gsub(/\W+$|^\W+/, '') }
=> ["oriented", "design", "with", "qwe", "and", "testing", "can't", "rubyisgood", "and", "rails", "is", "good"]
How can I check whether a string contains a special character in Ruby? A regular expression would also be fine.
Please let me know.
Use str.include?.
Returns true if str contains the given string or character.
"hello".include? "lo" #=> true
"hello".include? "ol" #=> false
"hello".include? ?h #=> true
special = "?<>',?[]}{=-)(*&^%$#`~{}"
regex = /[#{special.gsub(/./){|char| "\\#{char}"}}]/
You can then use the regex to test if a string contains the special character:
if some_string =~ regex
This looks a bit complicated: what's going on in this bit
special.gsub(/./){|char| "\\#{char}"}
is to turn this
"?<>',?[]}{=-)(*&^%$#`~{}"
into this:
"\\?\\<\\>\\'\\,\\?\\[\\]\\}\\{\\=\\-\\)\\(\\*\\&\\^\\%\\$\\#\\`\\~\\{\\}"
Which is every character in special, escaped with a \ (which itself is escaped in the string, i.e. \\ not \). This is then used to build a regex like this:
/[<every character in special, escaped>]/
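A shorter way to build an equivalent character class, if you'd rather not hand-roll the escaping, is Regexp.escape (using the same special string as above):

special = "?<>',?[]}{=-)(*&^%$#`~{}"
regex = /[#{Regexp.escape(special)}]/

"hello" =~ regex   #=> nil
"hel#lo" =~ regex  #=> 3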
"foobar".include?('a')
# => true
Why not use the inverse of the POSIX [:alnum:] class?
Here [:alnum:] covers 0-9, a-z, and A-Z.
"Hel#lo".index( /[^[:alnum:]]/ )
This returns nil when the string has no special character, which makes it the easiest check, I think.
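For example, checking both cases:

"Hel#lo".index(/[^[:alnum:]]/)  #=> 3 (position of the first special character)
"Hello".index(/[^[:alnum:]]/)   #=> nil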
How about this method, in Ruby 2.0.0 and above?
def check_for_a_special_character(string)
  /\W/ === string
end
Therefore, with:
!"He#llo"[/\W/].nil? #=> true
!"Hello"[/\W/].nil?  #=> false
If you are looking for a particular character, you can build a set of characters you want to allow and check whether a given character is part of that set:
puts String([*"a".."z"].join).include? "a" #true
puts String([*"a".."z"].join).include? "$" #false
I think this is flexible because you are not limited in what you include in or exclude from that set:
puts String([*"a".."z",*0..9,' '].join).include? " " #true
I have a data file in TXT format, and I'd like to parse the URL field from it using the Ruby code below:
f = File.open(txt_file, "r")
f.each_line { |line|
rows = line.split(',')
rows[3].each do |url|
next if url=="URL"
puts url
end
}
TXT contains:
name,option,price,URL
"x", "0,0,0,0,0,0", "123.40","http://domain.com/xym.jpg"
"x", "0,0,0,0,0,0", "111.34","http://domain.com/yum.jpg"
output:
0
Why does the output come from the option field "0,0,0,0,0,0"? How do I skip this and get the URL field?
Environment
ruby 1.8.7
rails 2.3.8
gem 1.3.7
I'd check out a CSV parsing tool to make this easier:
require 'rubygems'
require 'faster_csv'
FasterCSV.foreach(txt_file, :quote_char => '"',
                  :col_sep => ',', :row_sep => :auto) do |row|
  next if row[3] == "URL"  # skip the header row
  puts row[3]
end
Also, I think you're misunderstanding how split() works. If you run split() on one line from your file, you get back an array of the columns for that single line, not the multidimensional array that rows[3].each would suggest.
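To make the problem concrete, here is roughly what the naive split does to one data line:

line = '"x", "0,0,0,0,0,0", "123.40","http://domain.com/xym.jpg"'
rows = line.split(',')
rows.length  #=> 9, because the quoted option field is split into pieces
rows[3]      #=> "0"
# In Ruby 1.8, String#each iterates over lines, so rows[3].each yields "0", which is what gets printed.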
EDIT: Before reading further, note that I completely agree with the answer by Jeff Swensen; I'll leave my answer here regardless.
I'm not entirely sure what your inner loop (rows[3].each) is for, because you can't convert a single line into a 'row' when you only have a single URL. You could split on the ** characters and return an Array of URLs, but then you would still need to remove the extra double quotes, or you could use a regular expression, like so:
#!/usr/bin/env ruby
f = DATA
urls = f.readlines.map do |line|
line[/([^"]+)"\*\*/, 1]
end
urls.compact!
p urls
__END__
name ,option,price, **URL**
"x", "0,0,0,0,0,0", "123.40",**"http://domain.com/xym.jpg"**
"x", "0,0,0,0,0,0", "111.34",**"http://domain.com/yum.jpg"**
The call to compact is needed because map inserts nil objects whenever a line doesn't match the expression. For the String#[] method, see its documentation.
The reason that "0" is the result is that your code is blindly splitting on the comma char when you seem to be expecting parsing CSV-style (where column values may contain delimiter chars if the entire column value is enclosed in quotes. I highly suggest using a csv parser. If you are using Ruby 1.9.2, then you will already have access to the FasterCSV library.
If you are sure that the fields you want are always surrounded by double quotations, you can use that as the basis for extracting rather than the comma.
File.open(txt_file) do |f|
f.each_line do |l|
cols = l.scan(/(?<!\\)"(.*?)(?<!\\)"/)
cols[3].tap{|url| puts url if url}
end
end
In your code, the opened IO is never closed, which is bad practice. It is better to use a block so that you do not forget to close it.
The two (?<!\\)" in the regex match non-escaped double quotations. They use negative lookbehind.
.*? is a non-greedy match, which avoids a match from exceeding a non-escaped double quotation.
tap is to avoid repeating the cols[3] operation twice in puts and if.
Edit again
If you use Ruby 1.8.7, you can either
update your regex engine to Oniguruma by following the easy steps at http://oniguruma.rubyforge.org/
or
replace the regex; the version below also avoids tap. Use the following instead:
File.open(txt_file) do |f|
f.each_line do |l|
cols = l.scan(/(?:\A|[^\\])"(.*?[^\\]|)"/)
url = cols[3]
puts url if url
end
end
I would recommend using Oniguruma. It is the new regex engine used since Ruby 1.9, is much more powerful and faster than the one used in Ruby 1.8, and can be installed easily on Ruby 1.8.
The data is in CSV format, but if all you want to do is grab the last field in the string, then do just that:
text =<<EOT
name,option,price,URL
"x", "0,0,0,0,0,0", "123.40","http://domain.com/xym.jpg"
"x", "0,0,0,0,0,0", "111.34","http://domain.com/yum.jpg"
EOT
require 'pp'
text.lines.map{ |l| l.split(',').last }
If you want to clean up the double-quotes and trailing line-breaks:
text.lines.map{ |l| l.split(',').last.gsub('"', '').chomp }
# => ["URL", "http://domain.com/xym.jpg", "http://domain.com/yum.jpg"]
For an admin function in a Rails app, I want to be able to store regexes in the DB (as strings), and add them via a standard controller action.
I've run into 2 issues:
1) The Rails parameter filters seem to be automatically escaping backslashes (escape characters), which messes up the regex. For instance:
\s{1,2}(foo)
becomes:
\\s{1,2}(foo)
2) So then I tried to use write_attribute to gsub instances of double backslashes with single backslashes (essentially unescaping them). This proved to be much trickier than expected. (I'm using Ruby 1.9.2, if it matters.) Some things I've found:
"hello\\world".gsub(/\\/, ' ') #=> "hello world"
"hello\\world".gsub(/\\/, "\\") #=> "hello\\world"
"hello\\world".gsub(/\\/, '\\') #=> "hello\\world"
What I'm trying to do is:
"hello\\world".gsub(/\\/, something) #=> "hello\world"
I'd love to know both solutions.
1) How can you safely pass and store regexes as params to a Rails controller action?
2) How can you substitute double backslashes with a single backslash?
In short, you can't substitute a double backslash with a single one here, because the double backslash you are seeing is just the escaped representation of a single backslash character in a string; the string itself already contains only one. What you can do is the following:
Regexp.new("hello\\world") #=> /hello\world/
This will convert your string into a regular expression. So that means: store your regular expressions as strings (with the escaped characters) and convert them into regular expressions when you want to compare against them:
regexp = "\\s{1,2}(foo)"
reg = Regexp.new(regexp) #=> /\s{1,2}(foo)/
" foo" =~ reg #=> 0