How to split dot "." only before equal "=" in Ruby

I need to split a string at a period that comes before an equal sign to assign to a hash. E.g.,
"Project.risksPotentialAfterSum=Pot. aft."
should be split like this:
{"Project" =>{"risksPotentialAfterSum" => "Pot. aft."}}
For now, I use str.split(/[\.=]/,2) which has a problem for the value that comes after the equal sign. Any ideas?

str = "Project.risksPotentialAfterSum=Pot. aft."
m = str.match(/\A(?<obj>.+?)\.(?<prop>[^.]+?)=(?<val>.+)/)
#=> #<MatchData "Project.risksPotentialAfterSum=Pot. aft." obj:"Project" prop:"risksPotentialAfterSum" val:"Pot. aft.">
h = { m[:obj]=>{ m[:prop]=>m[:val] } }
#=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}
That regex says, roughly:
Starting at the start of the string,
find just about anything on the same line (name it 'obj') up until you see a period,
that is followed by one or more characters that aren't a period (name it 'prop') up until you see an equals sign,
and name whatever comes after the equals sign 'val'.
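The pattern described above can be wrapped in a small helper. This is only a sketch: the names ASSIGNMENT and parse_assignment are invented for illustration, and returning nil for input that does not match is an assumption about the desired behaviour.

```ruby
# Hypothetical helper built around the named-capture regex from the answer.
ASSIGNMENT = /\A(?<obj>.+?)\.(?<prop>[^.]+?)=(?<val>.+)/

def parse_assignment(line)
  m = ASSIGNMENT.match(line)
  return nil unless m # assumption: signal "no match" with nil

  { m[:obj] => { m[:prop] => m[:val] } }
end

simple = parse_assignment('Project.risksPotentialAfterSum=Pot. aft.')
# Only the last period before "=" separates obj from prop.
dotted = parse_assignment('a.b.c=d')
missing = parse_assignment('no equals here')
```

Because prop may not contain a period, an input such as a.b.c=d ends up with "a.b" as the object name and "c" as the property.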

ruby-1.9.2-p136 :028 > str
=> "Project.risksPotentialAfterSum=Pot. aft."
ruby-1.9.2-p136 :029 > split = str.split(/\.|=/,3)
=> ["Project", "risksPotentialAfterSum", "Pot. aft."]
ruby-1.9.2-p136 :030 > Hash[*[split[0],Hash[*split[1,2]]]]
=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}
Concepts used here:
Using | (alternation) in the regex: match either the pattern on the left or the pattern on the right of |.
Using the splat operator
Create hash based on list.
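The limit argument to split is what keeps the periods in the value intact. A minimal sketch comparing the two calls (the variable names are chosen here for illustration):

```ruby
str = 'Project.risksPotentialAfterSum=Pot. aft.'

# Without a limit, every "." and "=" is a split point, so the value is broken up.
without_limit = str.split(/\.|=/)

# With a limit of 3, splitting stops after two separators and the
# third element receives the untouched remainder of the string.
with_limit = str.split(/\.|=/, 3)

# Destructuring is an alternative to the Hash[*...] construction.
obj, prop, val = with_limit
nested = { obj => { prop => val } }
```

Note that this relies on the first period appearing before the equals sign; an object name that itself contains a period would be split in the wrong place.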

Instead of using string splitting you could consider using regular expression matching and capturing the values that you're interested in.
m = "Project.risksPotentialAfterSum=Pot. aft.".match /(\w+)\.(\w+)=(.*)/
h = {m[1] => {m[2] => m[3]}}
#=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}

Related

How do I replace each substring in a string?

I have a string:
story = 'A long foo ago, in a foo bar baz, baz away...foobar'
I also have matches from this string (the dictionary is dynamic, it doesn't depend on me)
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar'] # words can be repeated
How can I wrap each match in ** (for example, **foo**) to get this result:
story = 'A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foobar**'
For example, my code:
string_matches.each do |word|
  story.gsub!(/#{word}/, "**#{word}**")
end
returned:
"A long ****foo**** ago, in a ****foo**** **bar** ****baz****, ****baz**** away...****foo******bar**"
If you need to check if the words are matched as whole words, you may use
story.gsub(/\b(?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source})\b/, '**\0**')
If the whole word check is not necessary use
story.gsub(Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }), '**\0**')
Details
\b - a word boundary
(?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source}) - this creates a pattern like (?:foobar|foo|bar|baz) that matches a single word from the deduplicated list of keywords, sorted by length in descending order. See "Order of regular expression operator (..|.. ... ..|..)" for why this is necessary.
\b - a word boundary
The \0 in the replacement pattern is the replacement backreference referring to the whole match.
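To illustrate why the length-descending sort matters, here is a small sketch that builds the union both ways. The variable names are invented for this example, and sort_by is used in place of the sort block purely for brevity.

```ruby
words = %w[foo bar baz foobar]
story = 'A long foo ago, in a foo bar baz, baz away...foobar'

# Alternation tries branches left to right, so "foo" wins before
# "foobar" is ever considered when the list is left unsorted.
unsorted = Regexp.union(words)
# Longest keywords first, so "foobar" is tried before "foo".
sorted = Regexp.union(words.sort_by { |w| -w.length })

wrong = story.gsub(unsorted, '**\0**')
right = story.gsub(sorted, '**\0**')
```

With the unsorted union the text ends in **foo****bar**; with the sorted one it ends in **foobar**. Because gsub makes a single pass, already wrapped text is never matched a second time.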
A slight change will nearly get you there:
irb(main):001:0> string_matches.uniq.each { |word| story.gsub!(/#{word}/, "**#{word}**") }
=> ["foo", "bar", "baz", "foobar"]
irb(main):002:0> story
=> "A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foo****bar**"
The trouble with the final part of the resulting string is that foobar has been matched by both foo and foobar.

Split a database query string using regex in ruby

I have a query string which I want to separate out
created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30' AND updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30' AND user_id = 5 AND status = 'closed'
Like this
created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'
updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'
user_id = 5
status = 'closed'
This is just an example string; I want to separate the query string dynamically. I know I can't just split on AND because of patterns like BETWEEN .. AND.
You might be able to do this with regex but here's a parser that may work for your use case. It can surely be improved but it should work.
require 'time'
def parse(sql)
  arr = []
  split = sql.split(' ')
  date_counter = 0
  split.each_with_index do |s, i|
    date_counter = 2 if s == 'BETWEEN'
    time = Time.parse(s.strip) rescue nil
    date_counter -= 1 if time
    arr << i+1 if date_counter == 1
  end
  arr.select(&:even?).each do |index|
    split.insert(index + 2, 'SPLIT_ME')
  end
  split = split.join(' ').split('SPLIT_ME').map{|l| l.strip.gsub(/(AND)$/, '')}
  split.map do |line|
    line[/^AND/] ? line.split('AND') : line
  end.flatten.select{|l| !l.empty?}.map(&:strip)
end
This is not really a single regex, but more a simple parser. It works as follows:
1. Match from the start of the string up to a whitespace character followed by either and or between followed by another whitespace character. The matched text is removed from where_cause and saved in statement.
2. If the remaining string now starts with whitespace, between, and whitespace, that part is appended to statement and removed from where_cause together with what follows it, allowing one and. Matching stops when the end of the string is reached or another and is encountered.
3. If point 2 didn't match, check whether the string starts with whitespace, and, and whitespace. If so, remove this from where_cause.
4. Finally, add statement to the statements array if it isn't an empty string.
All matching is case-insensitive.
where_cause = "created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30' AND updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30' AND user_id = 5 AND status = 'closed'"
statements = []
until where_cause.empty?
  statement = where_cause.slice!(/\A.*?(?=[\s](and|between)[\s]|\z)/mi)
  if where_cause.match? /\A[\s]between[\s]/i
    between = /\A[\s]between[\s].*?[\s]and[\s].*?(?=[\s]and[\s]|\z)/mi
    statement << where_cause.slice!(between)
  elsif where_cause.match? /\A[\s]and[\s]/i
    where_cause.slice!(/\A[\s]and[\s]/i)
  end
  statements << statement unless statement.empty?
end
pp statements
# ["created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'",
# "updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'",
# "user_id = 5",
# "status = 'closed'"]
Note: Ruby uses \A to match the start of the string and \z to match the end of a string instead of the usual ^ and $, which match the beginning and ending of a line respectively. See the regexp anchor documentation.
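The difference between the line anchors and the string anchors only shows up on multi-line input. A short sketch, using a made-up two-line string:

```ruby
text = "first line\nAND second line"

# ^ and $ match at every line boundary inside the string.
line_start = text.match?(/^AND/)          # true: "AND" begins the second line
line_end   = text.match?(/first line$/)   # true: the first line ends before "\n"

# \A and \z match only at the very start and very end of the string.
string_start = text.match?(/\AAND/)          # false: the string starts with "first"
string_end   = text.match?(/first line\z/)   # false: more text follows
```

This is why the parser uses \A and \z: a where clause that happens to contain a newline should not be treated as starting over at each line.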
You can replace every [\s] with \s if you like. I've added them in to make the regex more readable.
Keep in mind that this solution isn't perfect, but it might give you an idea of how to solve the issue. The reason I say this is that it doesn't account for the words and/between appearing in column names or inside string literals.
The following where clause:
where_cause = "name = 'Tarzan AND Jane'"
Will output:
#=> ["name = 'Tarzan", "Jane'"]
This solution also assumes correctly structured SQL queries. The following queries don't result in what you might think:
where_cause = "created_at = BETWEEN AND"
# TypeError: no implicit conversion of nil into String
# ^ does match /\A[\s]between[\s]/i, but not the #slice! argument
where_cause = "id = BETWEEN 1 AND 2 BETWEEN 1 AND 3"
#=> ["id = BETWEEN 1 AND 2 BETWEEN 1", "3"]
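One way to get around the quoted-string limitation is to tokenize first, so that a quoted literal is a single token, and only then look for AND. The following is a rough sketch rather than a real SQL parser: split_conditions is a hypothetical name, and it assumes single-quoted literals with no escaped quotes, tokens separated by whitespace, and well-formed BETWEEN x AND y expressions.

```ruby
# Hypothetical helper: splits a where clause on top-level ANDs.
def split_conditions(where_clause)
  # A token is either a complete single-quoted literal or a run of non-whitespace.
  tokens = where_clause.scan(/'[^']*'|\S+/)
  conditions = [[]]
  pending_between = false # true after BETWEEN until its own AND is seen

  tokens.each do |token|
    case token.upcase
    when 'BETWEEN'
      pending_between = true
      conditions.last << token
    when 'AND'
      if pending_between
        pending_between = false
        conditions.last << token # this AND belongs to BETWEEN
      else
        conditions << []         # this AND separates two conditions
      end
    else
      conditions.last << token
    end
  end

  conditions.reject(&:empty?).map { |parts| parts.join(' ') }
end

result = split_conditions("name = 'Tarzan AND Jane' AND age BETWEEN 1 AND 2 AND id = 5")
```

Joining the tokens with a single space normalizes whitespace outside quoted literals, which may or may not be acceptable for your use case.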
I'm not certain if I understand the question, particularly in view of the previous answers, but if you simply wish to extract the indicated substrings from your string, and all column names begin with lowercase letters, you could write the following (where str holds the string given in the question):
str.split(/ +AND +(?=[a-z])/)
#=> ["created_at BETWEEN '2018-01-01T00:00:00+05:30' AND '2019-01-01T00:00:00+05:30'",
# "updated_at BETWEEN '2018-05-01T00:00:00+05:30' AND '2019-05-01T00:00:00+05:30'",
# "user_id = 5",
# "status = 'closed'"]
The regular expression reads, "match one or more spaces, followed by 'AND', followed by one or more spaces, followed by a positive lookahead that contains a lowercase letter". Being in a positive lookahead, the lowercase letter is not part of the match that is returned.
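A shortened, made-up clause shows what the lookahead buys you compared with matching the lowercase letter directly:

```ruby
clause = "a = 1 AND b BETWEEN 'X' AND 'Y' AND c = 2"

# The lookahead checks for a lowercase letter without consuming it,
# so the column name stays attached to the following piece.
with_lookahead = clause.split(/ +AND +(?=[a-z])/)

# Without the lookahead the letter is part of the separator and is lost.
without_lookahead = clause.split(/ +AND +[a-z]/)
```

The AND inside BETWEEN 'X' AND 'Y' is left alone because it is followed by a quote rather than a lowercase letter. An unquoted lowercase operand after BETWEEN ... AND would defeat this heuristic.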

Array range being recognised as string - unable to convert to integer

I've written a function within a model to scrape a site and store certain attributes within a separate model (story):
def get_content
  request = HTTParty.get("#{url}")
  doc = Nokogiri::HTML(request.body)
  doc.css("#{anchor}")["#{range}"].each do |entry|
    story = self.stories.new
    story.title = entry.text
    story.url = entry[:href]
    story.save
  end
end
This uses the url, anchor, and range attributes of a Sections variable. The range attribute is stored as an array range - i.e. 0..2 or 11..13 - however, I'm being told that it can't convert a string into an integer. I've tried storing range as an integer and as a string, but both fail.
I realise I could input the beginning and end of the range as two separate integers in my db, and put ["#{beginrange}".."#{endrange}"] but this seems a messy way of doing it.
Any other ideas? Many thanks in advance
Hmm, if you are sure that the range is always a string like '1..2' ('<Integer>..<Integer>'), you can use the eval method:
In my IRB console:
1.9.3p0 :032 > (eval "1..2").each { |l| puts l }
1
2
=> 1..2
1.9.3p0 :033 > (eval "1..2").inspect
=> "1..2"
1.9.3p0 :034 > (eval "1..2").class
=> Range
In your case:
doc.css("#{anchor}")[eval(range)].each do |entry|
#...
end
But eval is kind of dangerous. Only use it if you are sure that the range attribute always holds a range in string form (validations and a regex can enforce this).
There's a couple things I see wrong.
["#{beginrange}".."#{endrange}"] creates a range of characters, not a range of integers, which Array[] needs:
beginrange = 1
endrange = 2
["#{beginrange}".."#{endrange}"]
=> ["1".."2"]
[beginrange..endrange]
=> [1..2]
But, you're storing the representation of the array range you need as a string. If I had a string representation of a range, I'd use this:
range_value = '1..2'
[Range.new(*range_value.scan(/\d+/).map(&:to_i))]
=> [1..2]
Or, if there was a chance I'd encounter an exclusive-range:
[Range.new(*range_value.scan(/\d+/).map(&:to_i), range_value['...'])]
=> [1..2]
range_value = '1...2'
[Range.new(*range_value.scan(/\d+/).map(&:to_i), range_value['...'])]
=> [1...2]
Those are all good when you can't trust your Range string representation's source, i.e., the value is coming from a form or a file someone else created. If you own the incoming value, or, for convenience, stored it as a string in a database, you can easily recreate the range using eval:
eval('1..2').class
=> Range
eval('1..2')
=> 1..2
eval('1...2')
=> 1...2
People are afraid of eval because, used unwisely, it is dangerous. That doesn't mean we should avoid using it; instead, we should use it when it's safe.
You could use a regex to check the format of the string, raise an exception if it's not acceptable, then continue:
raise "Invalid range value received" if (!range_value[/\A\d+\s*\.{2,3}\s*\d+\z/])
[eval(range_value)]
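The same validation regex can also do the parsing, which avoids eval altogether. A minimal sketch, where RANGE_FORMAT and parse_range are names made up for this example and raising ArgumentError is an assumed error-handling choice:

```ruby
# Same shape as the validation regex above, with capture groups added.
RANGE_FORMAT = /\A(\d+)\s*(\.{2,3})\s*(\d+)\z/

# Hypothetical helper: turns "0..2" or "11...13" into a Range of integers.
def parse_range(text)
  m = RANGE_FORMAT.match(text)
  raise ArgumentError, "Invalid range value: #{text.inspect}" unless m

  # The third argument to Range.new is true for an exclusive ("...") range.
  Range.new(m[1].to_i, m[3].to_i, m[2] == '...')
end

inclusive = parse_range('0..2')
exclusive = parse_range('11...13')
slice = %w[a b c d][parse_range('0..2')]
```

In the scraping method from the question this would be used as doc.css(anchor)[parse_range(range)], assuming range holds the stored string.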

Ruby (Rails) unescape a string -- undo Array.to_s

Have been hacking together a couple of libraries, and had an issue where a string was getting 'double escaped'.
for example:
Fixed example
> x = ['a']
=> ["a"]
> x.to_s
=> "[\"a\"]"
>
Then again to
\"\[\\\"s\\\"\]\"
This was happening while dealing with http headers. I have a header which will be an array, but the http library is doing its own character escaping on the array.to_s value.
The workaround I found, was to convert the array to a string myself, and then 'undo' the to_s. Like so:
formatted_value = value.to_s
if value.instance_of?(Array)
  formatted_value = formatted_value.gsub(/\\/,"") #remove backslash
  formatted_value = formatted_value.gsub(/"/,"") #remove double quote
  formatted_value = formatted_value.gsub(/\[/,"") #remove [
  formatted_value = formatted_value.gsub(/\]/,"") #remove ]
end
value = formatted_value
... There's gotta be a better way ... (without needing to monkey-patch the gems I'm using). (Yeah, this breaks if my string actually contains those characters.)
Suggestions?
** UPDATE 2 **
Okay. Still having troubles in this neighborhood, but now I think I've figured out the core issue. It's serializing my array to json after a to_s call. At least, that seems to be reproducing what I'm seeing.
['a'].to_s.to_json
I'm calling a method in a gem that is returning the results of a to_s, and then I'm calling to_json on it.
I've edited my answer due to your edited question:
I still can't duplicate your results!
>> x = ['a']
=> ["a"]
>> x.to_s
=> "a"
But when I change the last call to this:
>> x.inspect
=> "[\"a\"]"
So I'll assume that's what you're doing?
It's not necessarily escaping the values, per se. It's storing the string like this:
%{["a"]}
or rather:
'["a"]'
In any case. This should work to un-stringify it:
>> x = ['a']
=> ["a"]
>> y = x.inspect
=> "[\"a\"]"
>> z = Array.class_eval(y)
=> ["a"]
>> x == z
=> true
I'm skeptical about the safety of using class_eval, though. Be wary of user input, because it may produce unintended side effects (and by that I mean code injection attacks) unless you're very sure you know where the original data came from, or what was allowed through to it.
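For arrays of plain strings, the inspect output happens to be valid JSON, so JSON.parse can reverse it without evaluating any code. This is a sketch under that assumption; it will not work for elements whose inspect form differs from JSON (symbols, nil, or strings where Ruby escapes characters that JSON does not). The double-parse line assumes a json library version that accepts a top-level string, which is the case for json 2.x.

```ruby
require 'json'

x = ['a']

# Undo a single stringification (the inspect / to_s case).
stringified = x.inspect
restored = JSON.parse(stringified)

# Undo the to_s followed by to_json case from the question's update:
# the outer parse yields the string '["a"]', the inner parse the array.
double_encoded = x.to_s.to_json
restored_twice = JSON.parse(JSON.parse(double_encoded))
```

If you control the producing side, serializing with to_json once and parsing once is more robust than relying on to_s at all.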

Newb Regular Expression Question - Ruby 1.9.2

I'm pretty new to programming, specifically Ruby and I've been bumping my head working with regular expressions.
What I have is a string such as the following:
s = 'Your Product Costs $10.00'
What I'm trying to do is write an expression, so my match data is only equal to the price, for example I've been able to do the following.
r = /[$]\d....\Z/
and therefore
match = r.match s
=> #<MatchData "$10.00">
My problem is, what if the product price is $100.00? Then I don't have enough wildcards and my match is nil.
Is there a way with regular expressions to say "wildcards until the end of a string" or "wildcards until [characters]"? Or will I have to find the length of a string, find the location of my $ character, and count it out based off each input?
Thanks.
A regular expression that matches an amount and also ensures that the match is an actual number and not a succession of digits or random characters:
/\$([1-9]\d*|0)(\.\d{1,2})?/
Matches $123.1, $123.12, $123, $0.12 etc.
Doesn't match $01.12, $12. etc. as a whole (without anchors it still matches the leading $0 and $12).
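To see how the pattern behaves on the rejected inputs, and how an end anchor changes that, here is a small sketch. PRICE and PRICE_AT_END are names invented for this example, and anchoring with \z assumes the price is the last thing in the string, as in the question.

```ruby
PRICE        = /\$([1-9]\d*|0)(\.\d{1,2})?/
PRICE_AT_END = /\$([1-9]\d*|0)(\.\d{1,2})?\z/

# Unanchored: the regex settles for the longest valid prefix it can find.
partial_leading_zero = PRICE.match('$01.12')[0]
partial_trailing_dot = PRICE.match('$12.')[0]

# Anchored at the end: the malformed amounts no longer match at all.
rejected_leading_zero = PRICE_AT_END.match('$01.12')
rejected_trailing_dot = PRICE_AT_END.match('$12.')

# A well-formed price at the end of the string still matches.
accepted = PRICE_AT_END.match('Your Product Costs $100.00')[0]
```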
Further reading: http://www.regular-expressions.info/
Here's the regexp you want:
\$(\d+\.\d+)
The ruby code to test:
str1 = 'Your Product Costs $10.00'
str2 = 'Your Product Costs $100.00'
regexp = /\$(\d+\.\d+)/
regexp.match str1 # => #<MatchData "$10.00" 1:"10.00">
regexp.match str2 # => #<MatchData "$100.00" 1:"100.00">
The key is to check for the "." character. There's a great website for testing your regular expressions: http://rubular.com/
Maybe you want this?
r = /\$\d+\.\d{1,2}/
s = 'Your product costs $10.00'
s2 = 'Your product costs $1000.00'
r.match s
=> #<MatchData "$10.00">
r.match s2
=> #<MatchData "$1000.00">
It accepts any number of digits before the dot, and one to two digits after the dot. You can change this in the {1,2} part. Note that if you change the minimum to 0, you have to make the dot optional.
With both the dot and digits after the decimal point optional:
r = /\$\d+(?:\.\d{1,2})?/
s = "Anything blabla $100060"
r.match s
=> #<MatchData "$100060">
With unlimited number of digits after the decimal point:
r = /\$\d+\.\d+/
s = "Product price is: $1560.5215010"
r.match s
=> #<MatchData "$1560.5215010">
With unlimited number of digits after the decimal point and optional dot:
r = /\$\d+(?:\.\d{1,})?/
s = "Product price is: $1500"
s2 = "Product price is: $19.921501"
r.match s
=> #<MatchData "$1500">
r.match s2
=> #<MatchData "$19.921501">
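If a string can contain several prices, scan with a capture group collects all of them in one call. A brief sketch with made-up sample text; PRICE_PATTERN is a name chosen here, and converting with Float is only for illustration (floating point is usually a poor fit for real money handling).

```ruby
# Optional fractional part, with the "$" kept outside the capture group.
PRICE_PATTERN = /\$(\d+(?:\.\d+)?)/

text = 'Widget $10.00, gadget $100.00, gizmo $1500'

# scan returns one array per match when the regex has a capture group,
# so flatten turns [["10.00"], ...] into a flat list of strings.
amounts = text.scan(PRICE_PATTERN).flatten
numbers = amounts.map { |a| Float(a) }
```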
