Parsing a string list with multiple values into JSON - ruby-on-rails

I have about thirty thousand records with a string column that has been stored in the following format, with different keys:
"something: this, this and that, that, other stuff, another: name, another name, last: here"
In rails, I want to change it into a hash like
{
something: [ "this", "this and that", "that" ],
another: [ "name", "another name" ],
last: [ "here" ]
}
Is there a way to do this elegantly? I was thinking of splitting at the colon, then doing a reverse search of the first space.

There are about a hundred ways to solve this. A pretty straightforward one is this:
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"
key = nil
str.scan(/\s*([^,:]+)(:)?\s*/).each_with_object({}) do |(val, colon), hsh|
  if colon
    key = val.to_sym
    hsh[key] = []
  else
    hsh[key] << val
  end
end
# => {
# something: ["this", "this and that", "that", "other stuff"],
# another: ["name", "another name"],
# last: ["here"]
# }
It works by scanning the string with the following regular expression:
/
\s* # any amount of optional whitespace
([^,:]+) # one or more characters that aren't , or : (capture 1)
(:)? # an optional trailing : (capture 2)
\s* # any amount of optional whitespace
/x
Then it iterates over the matches and puts them into a hash. When a match has a trailing colon (capture 2), a new hash key is created with an empty array for a value. Otherwise the value (capture 1) is added to the array for the most recent key.
Or…
A somewhat less straightforward but cleverer approach is to let the RegExp do more work:
MATCH_LIST_ENTRY = /([^:]+):\s*((?:[^,]+(?:,\s*|$))+?)(?=[^:,]+:|$)/
def parse_list2(str)
  str.scan(MATCH_LIST_ENTRY).map do |k, vs|
    [k.to_sym, vs.split(/,\s*/)]
  end.to_h
end
I won't pick apart the RegExp for this one, but it's simpler than it looks. Regexper does a pretty good job of explaining it.
You can see both of these in action on repl.it here: https://repl.it/@jrunning/LongtermMidnightblueAssembler

If str is the string given in the example, the desired hash can be constructed as follows.
str.split(/, *(?=\p{L}+:)/).
  each_with_object({}) do |s,h|
    k, v = s.split(/: +/)
    h[k.to_sym] = v.split(/, */)
  end
#=> {:something=>["this", "this and that", "that", "other stuff"],
# :another=>["name", "another name"],
# :last=>["here"]}
Note:
str.split(/, *(?=\p{L}+:)/)
#=> ["something: this, this and that, that, other stuff",
# "another: name, another name",
# "last: here"]
This regular expression reads, "match a comma followed by zero or more spaces, the match to be immediately followed by one or more Unicode letters followed by a colon, (?=\p{L}+:) being a positive lookahead".

elegantly:
result_hash = {}
current_key = nil # declared outside the block so it persists between matches
string.scan(/(?<key>[\w]+(?=:))|(?<value>[\s\w]+(?=(,|\z)))/) do |key, value|
  if key.present?
    result_hash[key] = []
    current_key = key
  elsif value.present?
    result_hash[current_key] << value.strip
  end
end
then jsonize:
json = result_hash.to_json
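For anyone trying this outside Rails, here is a minimal end-to-end sketch in plain Ruby. It borrows the split-on-lookahead idea from the earlier answer and uses the standard json library in place of to_json. The method name parse_list is made up for illustration, and the sketch assumes every key is a single run of letters followed by a colon.

```ruby
require 'json'

# Hypothetical helper (not from the answers above): turns
# "key: a, b, other: c" into { "key" => ["a", "b"], "other" => ["c"] }.
# Assumes every key is a single run of letters followed by a colon.
def parse_list(str)
  str.split(/,\s*(?=\p{L}+:)/).each_with_object({}) do |chunk, hash|
    key, values = chunk.split(/:\s*/, 2)
    hash[key] = values.to_s.split(/,\s*/)
  end
end

str = "something: this, this and that, that, other stuff, another: name, another name, last: here"
json = JSON.generate(parse_list(str))
puts json
```

With thirty thousand records it is probably worth running something like this over a small sample first and checking for keys that contain spaces or digits, since those would not match the lookahead.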

Related

In Ruby, how to return a sentence with each word reversed in place?

I want to write a method that takes a string as an argument and returns the same sentence with each word reversed in place.
example:-
reverse_each_word("Hello there, and how are you?")
#=> "olleH ,ereht dna woh era ?uoy"
I tried:
def reverse_each_word(string)
  array = []
  new_array = array << string.split(" ")
  array.collect { |word| word.join(" ").reverse }
end
but the return is
["?uoy era woh dna ,ereht olleH"]
The whole sentence is reversed, and I'm not sure why the result is wrapped in [ ].
Any help?
As an alternative approach, you could use gsub to find all words and reverse them:
str = "Hello there, and how are you?"
str.gsub(/\S+/, &:reverse)
#=> "olleH ,ereht dna woh era ?uoy"
/\S+/ matches one or more (+) non-whitespace (\S) characters, so punctuation attached to a word is reversed along with it.
def reverse_each_word(string)
  new_array = string.split(" ")
  new_array.collect { |word| word.reverse }.join(" ")
end
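To see why the attempt in the question returned a one-element array containing the whole sentence reversed, it can help to look at the intermediate values. The sketch below is only an illustration of that reasoning, using the sentence from the question.

```ruby
sentence = "Hello there, and how are you?"

# `array << sentence.split(" ")` pushes the whole word list as ONE element,
# so `array` ends up nested: [["Hello", "there,", "and", ...]].
array = []
array << sentence.split(" ")
puts array.length

# collect therefore runs once, on the inner array: it joins the words back
# into the sentence and reverses the entire string.
whole_reversed = array.collect { |words| words.join(" ").reverse }
p whole_reversed

# Reversing each word separately and joining afterwards gives the wanted result.
def reverse_each_word(string)
  string.split(" ").map(&:reverse).join(" ")
end

puts reverse_each_word(sentence)
```

The [ ] in the question's output is simply the outer array that collect returns.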

gsub regex only match if beginning of string or has space

I have a bunch of phrases with missing apostrophes, and I have an array of fixes like so:
phrase = "i d let some"
def contractions_to_fix
  [
    { missing: "let s", fixed: "let's" },
    { missing: "i d", fixed: "i'd" }
  ]
end
I'm trying to loop through the array of contractions to replace them, like this:
contractions_to_fix.each do |contraction|
  if phrase.include? contraction[:missing]
    idea_title.gsub! contraction[:missing], contraction[:fixed]
  end
end
The goal, for this example, would be to return "i'd let some"; however, every regex I've tried so far returns an incorrect response.
For example:
contraction[:missing] results in "i'd let'some"
/\bcontraction[:missing]\b/ results in "i d let some"
Any help would be much appreciated!
The easiest way to code the exact requirement in your title is to flip your condition around: "isn't preceded or followed by a non-space":
idea_title.gsub!(/(?<!\S)#{Regexp.escape(contraction[:missing])}(?!\S)/, contraction[:fixed])
Though /\b#{...}\b/ should work for the example you gave. Your problem is likely the fact that you feed a String as the pattern into gsub! instead of Regexp, so you are literally looking for \b (backslash and lowercase B), not a word boundary. Try it with
idea_title.gsub!(/\b#{Regexp.escape(contraction[:missing])}\b/, contraction[:fixed])
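As a rough, self-contained sketch of the lookaround version applied to the whole list of fixes: the method name fix_contractions and the CONTRACTIONS constant are made up here for illustration, and the sketch returns a new string instead of mutating idea_title.

```ruby
# Hypothetical wrapper around the lookaround pattern; the method name and the
# CONTRACTIONS constant are illustrative, not from the original post.
CONTRACTIONS = [
  { missing: "let s", fixed: "let's" },
  { missing: "i d",   fixed: "i'd" }
].freeze

def fix_contractions(phrase)
  CONTRACTIONS.reduce(phrase) do |text, contraction|
    # Match only when the fragment is not glued to a non-space on either side.
    pattern = /(?<!\S)#{Regexp.escape(contraction[:missing])}(?!\S)/
    text.gsub(pattern, contraction[:fixed])
  end
end

puts fix_contractions("i d let some")   # i'd let some
puts fix_contractions("let s go")       # let's go
```

In the first call "let s" is left alone because it is immediately followed by the "o" of "some", which is exactly the case that went wrong with the plain string replacement.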
arr = [
  { missing: "let s", fixed: "let's" },
  { missing: "i d", fixed: "i'd" }
]
h = arr.reduce({}) { |h,g| h.merge(g[:missing]=>g[:fixed]) }
#=> {"let s"=>"let's", "i d"=>"i'd"}
r = /\b(?:#{h.keys.join('|')})\b/
#=> /\b(?:let s|i d)\b/
"i d want to let some".gsub(r, h)
#=> "i'd want to let some"
This uses the (second) form of String#gsub, which takes a hash as its second argument and no block.
One may alternatively calculate h as follows.
h = arr.map { |g| g.values_at(:missing, :fixed) }.to_h
#=> {"let s"=>"let's", "i d"=>"i'd"}
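If the fragments could ever contain regex metacharacters, joining the keys with '|' would build a different pattern than intended. One possible guard, sketched here as an assumption and not as part of the answer above, is to let Regexp.union do the escaping. The variable names are illustrative.

```ruby
fixes = { "let s" => "let's", "i d" => "i'd" }

# Regexp.union escapes each key and joins the alternatives with "|".
pattern = /\b(?:#{Regexp.union(fixes.keys)})\b/

fixed = "i d want to let some".gsub(pattern, fixes)
puts fixed
```

The hash form of gsub looks up each matched substring as a key, so the keys must be the exact text that the pattern can match.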

How to remove words that have the same letter more than once?

I am trying to remove words that have the same letter more than once. I have tried squeeze, but all it does is remove words that have duplicate letters next to each other.
Here is the code at the moment:
array = []
File.open('word.txt').each do |line|
  if line.squeeze == line
    array << line
  end
end
Input from word.txt
start
james
hello
joins
regex
Output that I am looking for
james
joins
Any suggestions on how I can get around this?
Perhaps something like this:
array = []
File.open('word.txt').each do |line|
  chars = line.chars
  array << line if chars.uniq == chars
end
or shorter:
array = File.open('word.txt').select { |word| word.chars.uniq == word.chars }
You could use a regular expression, for example:
re = /
(.) # match and capture a single character
.*? # any number of characters in-between (non-greedy)
\1 # match the captured character again
/x
Example:
'start'[re] #=> "tart"
'james'[re] #=> nil
'hello'[re] #=> "ll"
'joins'[re] #=> nil
'regex'[re] #=> "ege"
It can be passed to grep to return all matched lines:
IO.foreach('word.txt').grep(re)
#=> ["start\n", "hello\n", "regex\n"]
or to grep_v to return the other lines:
IO.foreach('word.txt').grep_v(re)
#=> ["james\n", "joins\n"]
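Here is a small sketch that runs without word.txt, using the sample words from the question as an in-memory string. It also strips the trailing newlines, which the file-based versions above keep. The variable names are illustrative.

```ruby
# The sample words stand in for word.txt so the sketch runs on its own.
words = "start\njames\nhello\njoins\nregex\n".each_line.map(&:chomp)

# squeeze only collapses ADJACENT repeats, so "start" passes the original check.
puts "start".squeeze == "start"

# Keep a word only if removing duplicate letters leaves its length unchanged.
unique_letter_words = words.select { |word| word.chars.uniq.length == word.length }

p unique_letter_words   # ["james", "joins"]
```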

Can't get rid of some characters when pushing string to array

I'm creating some kind of custom tags that I'll use later to filter some data. However, when I add the tags to an array, I get the following:
"[\"witcher 3\", \"badass\", \"epic\"]"
@tags = []
params[:tags].split(', ').map do |tag|
  @tags.push(tag.strip)
end
# About 5 lines under
FileDetail.create!(path: path, creation_date: date, tags: @tags)
Why do these \ show up, and why doesn't .strip work?
Thank you in advance
You are setting an array of strings in @tags, and \" represents an escaped character, in this case ", which Ruby uses to delimit String objects.
Consider the following code (and try it in IRB):
foo = ["bar", "baz"]
#=> ["bar", "baz"]
foo.inspect
#=> "[\"bar\", \"baz\"]"
foo.each { |f| puts "tag: #{f}" }
# tag: bar
# tag: baz
As you can see, there is really no \ character to strip from the string; it's just how Ruby displays a String. So your code doesn't need the .strip method:
@tags = []
params[:tags].split(', ').map do |tag|
  @tags.push(tag)
end
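To make the point about display versus content concrete, here is a small sketch. It assumes, and this is only a guess about the setup in the question, that the array is being converted to a String somewhere on its way into the tags attribute; the variable names are illustrative.

```ruby
tags = "witcher 3, badass, epic".split(', ')

# Converting the array to a String (presumably what happens when it is
# written to a string attribute) yields its inspect form, quotes included.
as_string = tags.to_s

puts as_string   # prints the characters the string really contains
p as_string      # prints the same string with its quotes escaped

# No backslash is actually stored in the string.
puts as_string.include?("\\")
```

If the intent is to store a real list, the usual options are an array-typed or serialized column; which one applies depends on the database and is outside what the question shows.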
Not related to your question, but still relevant: the split method returns an array, so there is no need to create one beforehand and push items to it; just assign the returned array to @tags.
For example:
params[:tags] = "witcher 3, badass, epic"
#=> "witcher 3, badass, epic"
@tags = params[:tags].split(', ')
#=> ["witcher 3", "badass", "epic"]
If you want, you can still use map and strip to remove leading and trailing spaces:
params[:tags] = "witcher 3, badass , epic "
#=> "witcher 3, badass , epic "
params[:tags].split(",").map(&:strip)
#=> ["witcher 3", "badass", "epic"]

How to expand a string in Ruby based on some condition?

I have a string a5bc2cdf3. I want to expand it to aaaaabcbccdfcdfcdf.
In the string is a5, so the resulting string should contain 5 consecutive "a"s, "bc2" results in "bc" appearing 2 times consecutively, and cdf should repeat 3 times.
If input is a5bc2cdf3, and output is aaaaabcbccdfcdfcdf how can I do this in a Ruby method?
def get_character(compressed_string, index)
  expanded_string = calculate_expanded_string(compressed_string)
  expanded_string[index] # the character at the requested index, e.g. 3
end
def calculate_expanded_string(compressed_string)
  # this is the part I don't know how to write; it should return the expanded string
end
You may use a regex like
.gsub(/([a-zA-Z]+)(\d+)/){$1*$2.to_i}
See the Ruby online demo
The /([a-zA-Z]+)(\d+)/ will match substrings with 1+ letters (([a-zA-Z]+)) and 1+ digits ((\d+)) and will capture them into 2 groups that are later used inside a block to return the string you need.
Note that instead of [a-zA-Z] you might consider using \p{L} that can match any letters.
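A minimal sketch of the \p{L} variant wrapped in a method; the name expand is hypothetical, and the second call uses non-ASCII letters only to show what [a-zA-Z] would miss.

```ruby
# Hypothetical helper name; /(\p{L}+)(\d+)/ is the \p{L} variant mentioned above.
def expand(compressed)
  # $1 is the run of letters, $2 the repeat count that follows it.
  compressed.gsub(/(\p{L}+)(\d+)/) { $1 * $2.to_i }
end

puts expand("a5bc2cdf3")   # aaaaabcbccdfcdfcdf
puts expand("é2ß3")        # ééßßß
```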
You want to break out of gsub once the specified index is reached in the original "compressed" string. It is still possible, see this Ruby demo:
s = 'a5bc2cdf3' # input string
index = 5 # break index
result = "" # expanded string
s.gsub!(/([a-zA-Z]+)(\d+)/){ # regex replacement
result << $1*$2.to_i # add to the resulting string
break if Regexp.last_match.end(0) >= index # Break if the current match end index is bigger or equal to index
}
puts result[index] # Show the result
# => b
For brevity, you may replace Regexp.last_match with $~.
I would propose using scan to move over the compressed string, with a simple regex that detects groups of non-digit characters followed by their count as a decimal number: /([^\d]+)(\d+)/.
def get_character(compressed_string, index)
  result = nil
  compressed_string.scan(/([^\d]+)(\d+)/).inject(0) do |total_length, (chars, count)|
    decoded_string = chars * count.to_i
    total_length += decoded_string.length
    if index < total_length
      result = decoded_string[-(total_length - index)]
      break
    else
      total_length
    end
  end
  result
end
Knowing the current (total) length, one can break out of the loop if the current expanded string includes the requested index. The string is never decoded entirely.
This code gives the following results:
get_character("a5bc2cdf3", 5) # => "b"
get_character("a5bc2cdf3", 10) # => "d"
get_character("a5bc2cdf3", 20) # => nil
Just another way. I prefer Wiktor's method by a long way.
def stringy str, index
  lets, nums = str.split(/\d+/), str.split(/[a-z]+/)[1..-1].map(&:to_i)
  ostr = lets.zip(nums).map { |l,n| l*n }.join
  ostr[index]
end
str = 'a5bc2cdf3'
p stringy str, 5 #=> "b"
I'd use:
str = "a5bc2cdf3"
str.split(/(\d+)/).each_slice(2).map { |s, c| s * c.to_i }.join # => "aaaaabcbccdfcdfcdf"
Here's how it breaks down:
str.split(/(\d+)/) # => ["a", "5", "bc", "2", "cdf", "3"]
This works because split will return the value being split on if it's in a regex group: /(\d+)/.
str.split(/(\d+)/).each_slice(2).to_a # => [["a", "5"], ["bc", "2"], ["cdf", "3"]]
The resulting array can be broken into the string to be repeated and its associated count using each_slice(2).
str.split(/(\d+)/).each_slice(2).map { |s, c| s * c.to_i } # => ["aaaaa", "bcbc", "cdfcdfcdf"]
That array of arrays can then be processed in a map that uses String's * to repeat the characters.
And finally join concatenates all the resulting expanded strings back into a single string.
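One behaviour of this chain that may be worth knowing is what happens when the input ends with letters that have no count. The sketch below wraps the chain in a method (the name expand_with_split is made up) and shows that case; whether dropping the trailing group is acceptable depends on the input you expect.

```ruby
# Hypothetical wrapper around the split/each_slice chain shown above.
def expand_with_split(str)
  str.split(/(\d+)/).each_slice(2).map { |s, c| s * c.to_i }.join
end

puts expand_with_split("a5bc2cdf3")   # aaaaabcbccdfcdfcdf

# A trailing group with no count has c == nil, and nil.to_i is 0,
# so that group is dropped instead of raising an error.
p expand_with_split("a2xyz")          # "aa"
```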
