Ruby regex to match string between variable and two digits


I'm not getting what I want from my regex.
I'm looking through @my_string, which contains:
"Black Multi/Wide Calf":[{"large":"http://ecx.images-willy.com/images
/I/41suNF66r4L.jpg","variant":"MAIN","hiRes":"http://ecx.images-willy.com
/images/I/51knTtAU6mL._UL1000_.jpg","thumb":"http://ecx.images-willy.com
/images/I/41suNF66r4L._US40_.jpg","main":{"http://ecx.images-willy.com/images
/I/51knTtAU6mL._UY500_.jpg":["500","500""]}}],"Dark Brown":
[{"large":"http://ecx.images......
And I have a variable which is:
@color = "Black Multi"
And my regex looks like:
/^#{@color}(.*)\d+(}\])$/i.match(@my_string)
I want the string that starts with "Black Multi" and ends with }]:
Black Multi/Wide Calf":[{"large":"http://ecx.images-willy.com/images
/I/41suNF66r4L.jpg","variant":"MAIN","hiRes":"http://ecx.images-willy.com
/images/I/51knTtAU6mL._UL1000_.jpg","thumb":"http://ecx.images-willy.com
/images/I/41suNF66r4L._US40_.jpg","main":{"http://ecx.images-willy.com/images
/I/51knTtAU6mL._UY500_.jpg":["500","500""]}}]
I'm getting nil with what I have. Where did I jack this up?

It looks like your string is a JSON-encoded object. Don't try to parse it using regex. Instead parse it using a JSON parser, then access its contents like normal.
require 'json'
my_string = '{"Black Multi/Wide Calf":[{"large":"http://ecx.images-willy.com/images/I/41suNF66r4L.jpg","variant":"MAIN","hiRes":"http://ecx.images-willy.com/images/I/51knTtAU6mL._UL1000_.jpg","thumb":"http://ecx.images-willy.com/images/I/41suNF66r4L._US40_.jpg","main":{"http://ecx.images-willy.com/images/I/51knTtAU6mL._UY500_.jpg":["500","500"]}}]}'
obj = JSON[my_string]
# => {"Black Multi/Wide Calf"=>
# [{"large"=>"http://ecx.images-willy.com/images/I/41suNF66r4L.jpg",
# "variant"=>"MAIN",
# "hiRes"=>
# "http://ecx.images-willy.com/images/I/51knTtAU6mL._UL1000_.jpg",
# "thumb"=>
# "http://ecx.images-willy.com/images/I/41suNF66r4L._US40_.jpg",
# "main"=>
# {"http://ecx.images-willy.com/images/I/51knTtAU6mL._UY500_.jpg"=>
# ["500", "500"]}}]}
Because it's now a regular object, in this case a hash, it's easy to access its key/value pairs:
obj["Black Multi/Wide Calf"] # => [{"large"=>"http://ecx.images-willy.com/images/I/41suNF66r4L.jpg", "variant"=>"MAIN", "hiRes"=>"http://ecx.images-willy.com/images/I/51knTtAU6mL._UL1000_.jpg", "thumb"=>"http://ecx.images-willy.com/images/I/41suNF66r4L._US40_.jpg", "main"=>{"http://ecx.images-willy.com/images/I/51knTtAU6mL._UY500_.jpg"=>["500", "500"]}}]
And it's easy to drill down:
obj["Black Multi/Wide Calf"][0]['large'] # => "http://ecx.images-willy.com/images/I/41suNF66r4L.jpg"
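To tie this back to the @color variable in the question, one option is to keep only the keys that start with that color. This is a sketch: it assumes the full payload is a JSON object keyed by color name, uses a plain local `color` instead of the instance variable, and the "Dark Brown" entry is placeholder data because its contents are cut off in the question.

```ruby
require 'json'

# Shortened sample; the "Dark Brown" value is placeholder data because the
# question elides it.
my_string = '{"Black Multi/Wide Calf":[{"large":"http://ecx.images-willy.com/images/I/41suNF66r4L.jpg","variant":"MAIN"}],' \
            '"Dark Brown":[{"variant":"MAIN"}]}'
color = "Black Multi"

obj = JSON.parse(my_string)

# Keep only the entries whose key begins with the requested color.
matches = obj.select { |key, _images| key.start_with?(color) }

matches.each do |key, images|
  puts "#{key}: #{images.first['large']}"
end
```

Matching on the parsed keys means the nesting of brackets and braces inside the value never has to be described in a pattern.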

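You need to add the "multiline" flag (/m) to the regex. In Ruby, /m lets . match newlines as well. The pattern below also drops the ^ and $ anchors and the \d+ from the original: the text has a quote before Black Multi, }] is followed by more data rather than a line end, and the characters just before }] are "]}, not digits: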
str[/Black Multi.*?\}\]/m]
#=> "Black Multi/Wide Calf\"......\"500\"\"]}}]"
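If you do stay with a regex, the pattern can still be built from the color variable. A minimal sketch, assuming a plain local `color` and a shortened sample string containing a line break; Regexp.escape is an addition (not part of the answer) so that characters such as . or / in a color name would be matched literally:

```ruby
color = "Black Multi"

# Shortened sample containing a line break, like the question's output.
my_string = <<~'DATA'
  "Black Multi/Wide Calf":[{"large":"http://ecx.images-willy.com/images
  /I/41suNF66r4L.jpg","variant":"MAIN"}],"Dark Brown":[{"variant":"MAIN"}]
DATA

# /m lets .*? cross the line break; the lazy quantifier stops at the first }].
with_m = my_string[/#{Regexp.escape(color)}.*?\}\]/m]
puts with_m

# Without /m, . cannot match "\n", and there is no }] on the first line.
without_m = my_string[/#{Regexp.escape(color)}.*?\}\]/]
p without_m # => nil
```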

Related

How to return the same result from str.scan and str.match in Ruby using regex

I have a method to capture the extension as a group using a regex:
def test(str)
word_match = str.match(/\.(\w*)/)
word_scan = str.scan(/\.(\w*)/)
puts word_match, word_scan
end
test("test.rb")
It prints:
.rb
rb
Why do I get different answers?

The reason is that match and scan return different objects. match returns either a MatchData object or nil, while scan returns an Array. You can see this by calling the class method on your variables:
puts word_match.class # => MatchData
puts word_scan.class # => Array
If you take a look at the to_s method on MatchData you'll notice it returns the entire matched string, rather than the captures. If you wanted just the captures you could use the captures method.
puts word_match.captures # => "rb"
puts word_match.captures.class # => Array
If you pass a block to match, it yields the MatchData to the block and returns the block's result, so you can pull out just the captures. Note that scan wraps each match's captures in its own array:
word_match = str.match(/\.(\w*)/) { |m| m.captures } # => ["rb"]
puts word_scan.inspect # => [["rb"]]
puts word_match # => rb
More information on these methods and how they work can be found in the ruby-doc for the String class.
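To get the same value out of both calls, unwrap the results to the same shape. A short sketch; the `&.` safe-navigation operator assumes Ruby 2.3 or later and covers the case where match returns nil:

```ruby
str = "test.rb"

# match: MatchData (or nil) -> array of this match's captures
via_match = str.match(/\.(\w*)/)&.captures
# scan: one captures array per match -> take the first match's captures
via_scan = str.scan(/\.(\w*)/).first
# String#[] with a group index returns just that capture of the first match
via_index = str[/\.(\w*)/, 1]

p via_match # => ["rb"]
p via_scan  # => ["rb"]
p via_index # => "rb"
```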

Don't write your own code for this; take advantage of Ruby's built-in File.extname:
File.extname("test.rb") # => ".rb"
File.extname("a/b/d/test.rb") # => ".rb"
File.extname(".a/b/d/test.rb") # => ".rb"
File.extname("foo.") # => "." on non-Windows ("" on Windows)
File.extname("test") # => ""
File.extname(".profile") # => ""
File.extname(".profile.sh") # => ".sh"
You're missing some cases. Compare the above to the output of your attempts:
fnames = %w[
test.rb
a/b/d/test.rb
.a/b/d/test.rb
foo.
test
.profile
.profile.sh
]
fnames.map { |fn|
fn.match(/\.(\w*)/).to_s
}
# => [".rb", ".rb", ".a", ".", "", ".profile", ".profile"]
fnames.map { |fn|
fn.scan(/\.(\w*)/).to_s
}
# => ["[[\"rb\"]]",
# "[[\"rb\"]]",
# "[[\"a\"], [\"rb\"]]",
# "[[\"\"]]",
# "[]",
# "[[\"profile\"]]",
# "[[\"profile\"], [\"sh\"]]"]
The documentation for File.extname says:
Returns the extension (the portion of file name in path starting from the last period).
If path is a dotfile, or starts with a period, then the starting dot is not dealt with the start of the extension.
An empty string will also be returned when the period is the last character in path.
On Windows, trailing dots are truncated.
The File class has many more useful methods to pick apart filenames. There's also the Pathname class which is very useful for similar things.
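A few of those File and Pathname helpers side by side, as a quick sketch on a sample path:

```ruby
require 'pathname'

path = "a/b/d/test.rb"

p File.extname(path)          # => ".rb"
p File.basename(path)         # => "test.rb"
p File.basename(path, ".*")   # => "test"  (".*" strips whatever extension there is)
p File.dirname(path)          # => "a/b/d"

pn = Pathname.new(path)
p pn.extname                  # => ".rb"
p pn.basename.to_s            # => "test.rb"
p pn.sub_ext(".txt").to_s     # => "a/b/d/test.txt"
```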

Include apostrophe with .split()

I'm trying to display an array of words from a user's post. However, the method I'm using treats an apostrophe like whitespace.
<%= var = Post.pluck(:body) %>
<%= var.join.downcase.split(/\W+/) %>
So if the input text was: The baby's foot
it would output the baby s foot,
but it should be the baby's foot.
How do I accomplish that?

Accepted answer is too naïve:
▶ "It’s naïve approach".split(/[^'\w]+/)
#⇒ [
# [0] "It",
# [1] "s",
# [2] "nai",
# [3] "ve",
# [4] "approach"
# ]
This is because it's almost 2016, and many users might want to use their real names, like, you know, José Østergaard. And as you might notice, the plain apostrophe isn't the only one in use: there's also the typographic ’.
▶ "It’s naïve approach".split(/[^'’\p{L}\p{M}]+/)
#⇒ [
# [0] "It’s",
# [1] "naïve",
# [2] "approach"
# ]
Further reading: Character Properties.
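Applied to the question's pipeline, a sketch might look like the following. It uses a plain array as a stand-in for Post.pluck(:body) so it runs outside Rails, and joins with a space, because join with no argument glues the last word of one post onto the first word of the next.

```ruby
# Stand-in for Post.pluck(:body)
bodies = ["The baby's foot", "It’s José’s naïve approach"]

# Split on anything that is not an apostrophe (straight or typographic),
# a letter, or a combining mark.
words = bodies.join(" ").downcase.split(/[^'’\p{L}\p{M}]+/)
p words
# => ["the", "baby's", "foot", "it’s", "josé’s", "naïve", "approach"]
```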

Along the lines of mudasobwa's answer, here's what \w and \W bring to the party:
chars = [*' ' .. "\x7e"].join
# => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
That's the usual visible lower-ASCII characters we'd see in code. See the Regexp documentation for more information.
Grabbing the characters that match \w returns:
chars.scan(/\w+/)
# => ["0123456789",
# "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
# "_",
# "abcdefghijklmnopqrstuvwxyz"]
Conversely, grabbing the characters that don't match \w, or that match \W:
chars.scan(/\W+/)
# => [" !\"\#$%&'()*+,-./", ":;<=>?@", "[\\]^", "`", "{|}~"]
\w is defined as [a-zA-Z0-9_], which isn't what we'd normally call "word" characters. Instead, it's the set of characters typically used in variable names.
If you're dealing with only lower-ASCII characters, use the character class [a-zA-Z].
For instance:
chars = [*' ' .. "\x7e"].join
lower_ascii_chars = '[a-zA-Z]'
not_lower_ascii_chars = '[^a-zA-Z]'
chars.scan(/#{lower_ascii_chars}+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
chars.scan(/#{not_lower_ascii_chars}+/)
# => [" !\"\#$%&'()*+,-./0123456789:;<=>?@", "[\\]^_`", "{|}~"]
Instead of defining your own, you can take advantage of the POSIX definitions and character properties:
chars.scan(/[[:alpha:]]+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
chars.scan(/\p{Alpha}+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
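The difference shows up as soon as text leaves lower ASCII. A quick comparison on a name with an accented letter; the last line folds the apostrophe into a class with the POSIX bracket, as one possible answer to the original split question:

```ruby
name = "José"

p name.scan(/[a-zA-Z]+/)     # => ["Jos"]   (é is outside a-z)
p name.scan(/[[:alpha:]]+/)  # => ["José"]
p name.scan(/\p{Alpha}+/)    # => ["José"]

# Keep apostrophes inside words while splitting on everything else.
p "The baby's foot".split(/[^'[:alpha:]]+/) # => ["The", "baby's", "foot"]
```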
Regular expressions always seem like a wonderful new wand to wave when extracting information from a string, but, like the Sorcerer's Apprentice found out, they can create havoc when misused or not understood.
Knowing this should help you write smarter patterns. Apply it to what the documentation shows and you should be able to figure out a pattern that does what you want.

You can use the following regex instead of /\W+/:
var.join.downcase.split(/[^'\w]+/)
/\W/ matches any non-word character, and the apostrophe is one of them.
To stay as close as possible to the original intent, we can use /[^'\w]/, which matches any character that is neither an apostrophe nor a word character.

Running that string through irb with the same split call that you wrote in your comment gets this:
irb(main):008:0> "The baby's foot".split(/\W+/)
=> ["The", "baby", "s", "foot"]
However, if you use split without an explicit delimiter, you get the split you're looking for:
irb(main):009:0> "The baby's foot".split
=> ["The", "baby's", "foot"]
Does that get you what you're looking for?
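Because split with no argument breaks only on runs of whitespace, any punctuation stays attached to the neighboring word. A quick check:

```ruby
p "The baby's foot".split   # => ["The", "baby's", "foot"]
p "The baby's foot.".split  # => ["The", "baby's", "foot."]
p "one,two three".split     # => ["one,two", "three"]
```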

Ruby on Rails Titleize Underscore,Hyphenated and Single Quote Name

Rails' titleize method removes hyphens and underscores, and the capitalize method doesn't capitalize the word that comes after a hyphen or underscore. I want to do something like the following:
sam-joe denis-moore → Sam-Joe Denis-Moore
sam-louise o'donnell → Sam-Louise O'Donnell
arthur_campbell john-foo → Arthur_Campbell John-Foo
What pattern do I need to use with the gsub below to do this?
"sam-joe denis-moore".humanize.gsub(??) { $1.capitalize }
# => "Sam-Joe Denis-Moore"
Any help is really appreciated.

While lurker's answer works, it's far more complicated than it needs to be. As you surmised, you can do this with gsub alone:
INITIAL_LETTER_EXPR = /(?:\b|_)[a-z]/
arr = [ "sam-joe denis-moore",
"sam-louise o'donnell",
"arthur_campbell john-foo" ]
arr.each do |str|
puts str.gsub(INITIAL_LETTER_EXPR) { $&.upcase }
end
# => Sam-Joe Denis-Moore
# Sam-Louise O'Donnell
# Arthur_Campbell John-Foo

Try this:
my_string.split(/([ _-])/).map(&:capitalize).join
You can put whatever delimiters you like in the regex. I used a space, _, and -. So, for example:
'sam-joe denis-moore'.split(/([ _-])/).map(&:capitalize).join
Results in:
'Sam-Joe Denis-Moore'
What it does is:
.split(/([ _-])/) splits the string into an array of substrings at the given delimiters, and keeps the delimiters as substrings
.map(&:capitalize) maps the resulting array of strings to a new array of strings, capitalizing each string in the array (delimiters, when capitalized, are unaffected)
.join joins the resulting array of substrings back together for the final result
You could, if you want, monkey patch the String class with your own titleize:
class String
def my_titleize
self.split(/([ _-])/).map(&:capitalize).join
end
end
Then you can do:
'sam-joe denis-moore'.my_titleize
=> 'Sam-Joe Denis-Moore'
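One caveat: with only a space, _ and - as delimiters, "o'donnell" stays a single piece and capitalize turns it into "O'donnell". A sketch that adds the apostrophe to the character class; the method name titleize_name is hypothetical:

```ruby
# Hypothetical helper: split on space, underscore, apostrophe or hyphen,
# keep the delimiters (capture group), capitalize each piece, rejoin.
def titleize_name(str)
  str.split(/([ _'-])/).map(&:capitalize).join
end

[
  "sam-joe denis-moore",
  "sam-louise o'donnell",
  "arthur_campbell john-foo"
].each { |s| puts titleize_name(s) }
# Sam-Joe Denis-Moore
# Sam-Louise O'Donnell
# Arthur_Campbell John-Foo
```

Note that capitalize also downcases the rest of each piece, so a name like "McDonald" would come out as "Mcdonald".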

How can I write quoted values in en.yml?

I'm writing a script that will add new translations to the en.yml file. However, when I'm dumping them back to the file, my strings are in the following format:
some_key: This is the value
I'm trying to make the output be:
some_key: "This is the value"
I'm writing the translations like this:
File.open(yaml_file, "w") do |f|
f.write(translations.to_yaml)
end
Where translations is the hash containing all the translations.
Is there any way of adding these quotes, besides manually parsing/rewriting the YAML file?

The plain (unquoted) scalar representation is the preferred one when the scalar doesn't require escaping.
In your case, the String:
This is the value
doesn't need to be in quotes, thus, if you supply the following YAML:
key: "This is the value"
the processor may return:
key: This is the value
because they are totally equivalent. However, if you actually want to enter a quoted string as value, then you should use:
key: '"This is the value"'
or escape the double quote:
key: "\"This is the value\""
I took a quick look at the Psych emitter code (the one invoked by to_yaml), and there doesn't seem to be an option to force quoting on a scalar.
I don't even see the option implemented in the scalar emitter code.
def visit_Psych_Nodes_Scalar o
@handler.scalar o.value, o.anchor, o.tag, o.plain, o.quoted, o.style
end
In other words, you cannot enforce quoting.

Updated for hash conversion
def value_stringify(hash)
hash.each do |k,v|
if v.kind_of? Hash
hash[k]= value_stringify(v)
else
hash[k]="\"#{v}\""
end
end
end
Now use the converted hash to store yaml.
File.open(yaml_file, "w") do |f|
f.write(value_stringify(translations).to_yaml)
end
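Now it should work. Note that this puts the quote characters into the values themselves, so they will be read back as part of each string.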

The format you get is valid YAML. However, if you really want this you could temporarily modify your data before converting it.
Normal:
{ "foo" => "bar" }.to_yaml
# => "---\nfoo: bar\n"
With a trailing space:
{ "foo" => "bar " }.to_yaml
# => "---\nfoo: 'bar '\n"
Note that you get single quotes, not double quotes. So if you temporarily modify your data, you could add a placeholder that you remove later.
Example:
{ "foo" => "foo --REMOVE-- ", "bar" => "bar --REMOVE-- " }.to_yaml
.gsub(' --REMOVE-- ', '')
# => "---\nfoo: 'foo'\nbar: 'bar'\n"
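Another route, not taken in the answers above, is to emit the YAML, parse it back into Psych's node tree, and mark the value scalars as double-quoted before writing it out again. A sketch, assuming translations is a (possibly nested) hash of strings; the sample data is made up:

```ruby
require 'yaml'

translations = { "en" => { "some_key" => "This is the value", "greeting" => "Hello" } }

ast = Psych.parse_stream(translations.to_yaml)

# Mapping children alternate key, value, key, value...; restyle only the values
# that are scalars (nested mappings are visited separately by grep).
ast.grep(Psych::Nodes::Mapping).each do |mapping|
  mapping.children.each_slice(2) do |_key, value|
    next unless value.is_a?(Psych::Nodes::Scalar)
    value.plain  = false
    value.quoted = true
    value.style  = Psych::Nodes::Scalar::DOUBLE_QUOTED
  end
end

yaml = ast.yaml
puts yaml
```

Unlike wrapping the strings in literal quote characters, this only changes how the values are written; loading the file gives back the original strings.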

convert ruby hash to URL query string ... without those square brackets

In Python, I can do this:
>>> import urlparse, urllib
>>> q = urlparse.parse_qsl("a=b&a=c&d=e")
>>> urllib.urlencode(q)
'a=b&a=c&d=e'
In Ruby[+Rails] I can't figure out how to do the same thing without "rolling my own," which seems odd. The Rails way doesn't work for me -- it adds square brackets to the names of the query parameters, which the server on the other end may or may not support:
>> q = CGI.parse("a=b&a=c&d=e")
=> {"a"=>["b", "c"], "d"=>["e"]}
>> q.to_params
=> "a[]=b&a[]=c&d[]=e"
My use case is simply that I want to muck with some of the values in the query-string portion of the URL. It seemed natural to lean on the standard library and/or Rails, and write something like this:
uri = URI.parse("http://example.com/foo?a=b&a=c&d=e")
q = CGI.parse(uri.query)
q.delete("d")
q["a"] << "d"
uri.query = q.to_params # should be to_param or to_query instead?
puts Net::HTTP.get_response(uri)
but only if the resulting URI is in fact http://example.com/foo?a=b&a=c&a=d, and not http://example.com/foo?a[]=b&a[]=c&a[]=d. Is there a correct or better way to do this?

In modern Ruby this is simply:
require 'uri'
URI.encode_www_form(hash)
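Applied to the question's example, a sketch might look like this. It swaps CGI.parse for URI.decode_www_form, which returns repeated keys as separate pairs. encode_www_form also accepts a hash whose values are arrays, such as CGI.parse's output, and repeats the key for each element:

```ruby
require 'uri'

uri = URI.parse("http://example.com/foo?a=b&a=c&d=e")

# decode_www_form keeps repeated keys as separate [key, value] pairs.
pairs = URI.decode_www_form(uri.query) # => [["a", "b"], ["a", "c"], ["d", "e"]]
pairs.reject! { |key, _value| key == "d" }
pairs << ["a", "d"]

uri.query = URI.encode_www_form(pairs)
puts uri # => http://example.com/foo?a=b&a=c&a=d

# A hash with array values (the shape CGI.parse returns) works too.
p URI.encode_www_form("a" => ["b", "c"], "d" => ["e"]) # => "a=b&a=c&d=e"
```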

Quick Hash to a URL Query Trick (to_query comes from Rails' ActiveSupport):
"http://www.example.com?" + { language: "ruby", status: "awesome" }.to_query
# => "http://www.example.com?language=ruby&status=awesome"
Want to do it in reverse? Use CGI.parse:
require 'cgi'
# Only needed for IRB, Rails already has this loaded
CGI::parse "language=ruby&status=awesome"
# => {"language"=>["ruby"], "status"=>["awesome"]}

Here's a quick function to turn your hash into query parameters:
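require 'uri'
# URI.encode was removed in Ruby 3.0, so escape each key and value separately.
def hash_to_query(hash)
hash.map { |k, v| "#{URI.encode_www_form_component(k.to_s)}=#{URI.encode_www_form_component(v.to_s)}" }.join("&")
end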

The way Rails handles query strings of that type means you have to roll your own solution, as you have. It is somewhat unfortunate if you're dealing with non-Rails apps, but makes sense if you're passing information to and from Rails apps.

As a simple plain Ruby solution (or RubyMotion, in my case), just use this:
class Hash
def to_param
self.to_a.map { |x| "#{x[0]}=#{x[1]}" }.join("&")
end
end
{ fruit: "Apple", vegetable: "Carrot" }.to_param # => "fruit=Apple&vegetable=Carrot"
It only handles simple hashes, though.
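That patch interpolates keys and values without URL-escaping them, which matters as soon as a value contains & or a space. A quick comparison with made-up parameters:

```ruby
require 'uri'

params = { "q" => "fish & chips", "page" => "2" }

# Plain interpolation, as in the Hash#to_param patch above: no escaping.
naive = params.map { |k, v| "#{k}=#{v}" }.join("&")
# URI.encode_www_form escapes each key and value.
escaped = URI.encode_www_form(params)

puts naive   # => q=fish & chips&page=2
puts escaped # => q=fish+%26+chips&page=2
```

The unescaped version can't be parsed back unambiguously, because the & inside the value looks like a parameter separator.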
