How to parse text file in Ruby on Rails - ruby-on-rails

Below is the text file:
Old count: 56
S id: 1
M id: 1
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
.
.
.
.
I have used the delimiter "---------------" to separate the blocks for each id.
How can I parse the file so that the "New count" values within each delimited block are added together, e.g. 2+20+32 = 54 for the first block?
The result should be an array of hashes: count << {'new count' => 54} for the first block, and so on for the remaining blocks.
I have tried this:
@data = ""
File.open("out2", "r") do |f|
f.each_line do |line|
@data += line
end
end
s_rec = @data.split("------")
s_rec.each do |rec|
row_s = rec.split(/\n/)
row_s.each do |row|
if row.include?("New count")
rv = row.split(":")
@db = rv[1]
end
end
end

Not sure what output format you are trying to achieve, but given the text:
text = <<__
Old count: 56
S id: 1
M id: 1
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
.
.
.
.
__
this:
text
.split(/^-{5,}/)
.map{|s| s.scan(/\bNew count: (\d+)/).map{|match| match.first.to_i}.inject(:+)}
gives:
[
54,
4,
nil
]
In response to the comment, still not clear what you want because what you wrote is not a valid Ruby object, but this:
text
.scan(/^S id: (\d+).+?^New count: (\d+)/m)
.inject(Hash.new(0)){|h, (k, v)| h[k.to_i] += v.to_i; h}
.map{|k, v| {"S id" => k, "new count" => v}}
gives:
[
{
"S id" => 1,
"new count" => 54
},
{
"S id" => 2,
"new count" => 4
}
]
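For the exact shape the question mentions (one {'new count' => total} hash per block), a minimal sketch along the same lines could look like the following. The method name sum_new_counts and the sample variable are illustrative assumptions, not part of the original answer, and Array#sum assumes Ruby 2.4 or later. Blocks that contain no "New count" line are skipped instead of being reported as nil.

```ruby
# Hypothetical helper: sums the "New count" values inside each
# dash-delimited block and returns one hash per block.
def sum_new_counts(text)
  text
    .split(/^-{5,}$/)                                   # one chunk per delimiter line
    .map { |chunk| chunk.scan(/^New count: (\d+)/).flatten.map(&:to_i) }
    .reject(&:empty?)                                   # drop chunks without any counts
    .map { |values| { 'new count' => values.sum } }
end

# Sample input copied from the question (the trailing dots are omitted).
sample = <<~TEXT
  Old count: 56
  S id: 1
  M id: 1
  New count: 2
  Old count: 56
  S id: 1
  M id: 2
  New count: 20
  Old count: 56
  S id: 1
  M id: 2
  New count: 32
  -----------------------------
  Old count: 2
  S id: 2
  M id: 1
  New count: 4
  --------------------------------
TEXT

count = sum_new_counts(sample)
p count
```

To read from the file in the question instead, File.read("out2") could be passed in place of sample.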

I'd start with:
data = 'Old count: 56
S id: 1
M id: 1
New count: 2
Old count: 56
S id: 1
M id: 2
New count: 20
Old count: 56
S id: 1
M id: 2
New count: 32
-----------------------------
Old count: 2
S id: 2
M id: 1
New count: 4
--------------------------------
'
ary = data.split("\n").slice_before(/^---/).map{ |a| a.select{ |s| s['New count:'] }.map{ |s| s[/\d+/].to_i }.inject(:+) }.compact
Which gives me an array:
[
[0] 54,
[1] 4,
]
compact is needed because the trailing ---- block delimiter produces a slice that contains only the delimiter line; that slice has no "New count" lines, so inject returns nil for it.
From that point it's easy to create a hash:
Hash[ ary.map.with_index(1) { |v, i| ["S #{ i }", "new count #{ v }" ] } ]
Which looks like:
{
"S 1" => "new count 54",
"S 2" => "new count 4"
}
Breaking it down, the code through slice_before returns:
[
[0] [
[ 0] "--------------------------------",
[ 1] "Old count: 56",
[ 2] "S id: 1",
[ 3] "M id: 1 ",
[ 4] "New count: 2",
[ 5] "Old count: 56",
[ 6] "S id: 1",
[ 7] "M id: 2",
[ 8] "New count: 20",
[ 9] "Old count: 56",
[10] "S id: 1",
[11] "M id: 2",
[12] "New count: 32"
],
[1] [
[0] "-----------------------------",
[1] "Old count: 2",
[2] "S id: 2",
[3] "M id: 1",
[4] "New count: 4"
]
]
From there it's straightforward, selecting the lines that are needed in each sub-array, extracting out the values, and summing them using inject.
Once that's done it's simply using map and with_index to build the string and name/value pairs, then let Hash turn them into a hash.
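The same pipeline can be written one step per line, which makes the intermediate values easier to inspect. This is only a sketch: the variable names are made up for illustration, and the input is a shortened stand-in for the sample data, keeping just enough lines to show two blocks and the trailing delimiter.

```ruby
# Shortened stand-in for the sample data (assumption: same layout as above).
data = "Old count: 56\nNew count: 2\nNew count: 20\nNew count: 32\n" \
       "-----\nOld count: 2\nNew count: 4\n-----\n"

lines  = data.split("\n")
slices = lines.slice_before(/^---/).to_a               # a new slice starts at each delimiter line
picked = slices.map { |slice| slice.grep(/New count:/) }   # keep only the count lines
values = picked.map { |slice| slice.map { |line| line[/\d+/].to_i } }
sums   = values.map { |nums| nums.inject(:+) }         # nil for the delimiter-only slice
totals = sums.compact                                  # drop that nil

result = Hash[totals.map.with_index(1) { |v, i| ["S #{i}", "new count #{v}"] }]
p result
```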

Related

Elixir Accumulator List of Maps

Can you help me implement an accumulator over a list of maps?
[
%{
score: 1,
name: "Javascript",
},
%{
score: 2,
name: "Elixir",
},
%{
score: 10,
name: "Elixir",
}
]
The result should be:
[
%{
score: 12,
name: "Elixir",
},
%{
score: 1,
name: "Javascript",
}
]
I will appreciate your suggestion.
Regards
Assuming your original list is stored in the local variable input, one might start with Enum.reduce/3, using Map.update/4 as the reducer.
Enum.reduce(input, %{}, fn %{score: score, name: name}, acc ->
Map.update(acc, name, score, & &1 + score)
end)
#⇒ %{"Elixir" => 12, "Javascript" => 1}
If you insist on having a list of maps as the result (which is way less readable, IMSO), go further and Enum.map/2 the result:
Enum.map(%{"Elixir" => 12, "Javascript" => 1}, fn {name, score} ->
%{name: name, score: score}
end)
#⇒ [%{name: "Elixir", score: 12},
# %{name: "Javascript", score: 1}]
To sum it up:
input
|> Enum.reduce(%{}, fn %{score: score, name: name}, acc ->
Map.update(acc, name, score, & &1 + score)
end)
|> Enum.map(& %{name: elem(&1, 0), score: elem(&1, 1)})
#⇒ [%{name: "Elixir", score: 12},
# %{name: "Javascript", score: 1}]
Sidenote: maps in Erlang (and, hence, in Elixir) are not ordered. That means that if you want the resulting list to be sorted by name or by score, you should explicitly Enum.sort/2 it:
Enum.sort(..., & &1.score > &2.score)
#⇒ [%{name: "Elixir", score: 12},
# %{name: "Javascript", score: 1}]
A simple way could be to use Enum.group_by/3 to group the items by name, then Enum.sum/1 to sum the scores:
list
|> Enum.group_by(& &1.name, & &1.score)
|> Enum.map(fn {name, score} -> %{name: name, score: Enum.sum(score)} end)
Output:
[%{name: "Elixir", score: 12}, %{name: "Javascript", score: 1}]
If you were looking to create & use a more generalized solution, you could create your own Merger module.
defmodule Merger do
def merge_by(enumerable, name_fun, merge_fun) do
enumerable
|> Enum.group_by(name_fun)
|> Enum.map(fn {_name, items} -> Enum.reduce(items, merge_fun) end)
end
end
list = [
%{score: 1, name: "Javascript"},
%{score: 2, name: "Elixir"},
%{score: 10, name: "Elixir"}
]
Merger.merge_by(list, & &1.name, &%{&1 | score: &1.score + &2.score})
# => [%{name: "Elixir", score: 12}, %{name: "Javascript", score: 1}]

Distribute items into containers in twos - Rails

I have a list of 10 items -- it is an array of hashes.
[{ id: 1, name: 'one'}, { id: 2, name: 'two' } .. { id: 10, name: 'ten' }]
I also have a random number of containers -- let's say 3, in this case. These containers are hashes with array values.
{ one: [], two: [], three: [] }
What I want to do, is iterate over the containers and drop 2 items at a time resulting in:
{
one: [{id:1}, {id:2}, {id:7}, {id:8}],
two: [{id:3}, {id:4}, {id:9}, {id:10}],
three: [{id:5}, {id:6}]
}
Also, if the item list is an odd number (11), the last item is still dropped into the next container.
{
one: [{id:1}, {id:2}, {id:7}, {id:8}],
two: [{id:3}, {id:4}, {id:9}, {id:10}],
three: [{id:5}, {id:6}, {id:11}]
}
note: the hashes are snipped here so it's easier to read.
My solution is something like this: (simplified)
x = 10
containers = { one: [], two: [], three: [] }
until x < 1 do
containers.each do |c|
c << 'x'
c << 'x'
end
x -= 2
end
puts containers
I'm trying to wrap my head around how I can achieve this but I can't seem to get it to work.
Round-robin pair distribution into three bins:
bins = 3
array = 10.times.map { |i| i + 1 }
# => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
array.
each_slice(2). # divide into pairs
group_by. # group into bins
with_index { |p, i| i % bins }. # round-robin style
values. # get rid of bin indices
each(&:flatten!) # join pairs in each bin
Completely different approach, stuffing bins in order:
base_size, bins_with_extra = (array.size / 2).divmod(bins)
pos = 0
bins.times.map { |i|
length = 2 * (base_size + (i < bins_with_extra ? 1 : 0)) # how much in this bin?
array[pos, length].tap { pos += length } # extract and advance
}
# => [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
If you absolutely need to have this in a hash (binned_array here holds the round-robin result from above),
Hash[%i(one two three).zip(binned_array)]
# => {:one=>[1, 2, 7, 8], :two=>[3, 4, 9, 10], :three=>[5, 6]}
The lovely (but likely not as performant) solution hinted at by Stefan Pochmann:
bins.times.with_object(array.to_enum).map { |i, e|
Array.new(2 * (base_size + (i < bins_with_extra ? 1 : 0))) { e.next }
}
This is just to show a different approach (and I would probably not use this one myself).
Given an array of items and the containers hash:
items = (1..10).to_a
containers = { one: [], two: [], three: [] }
You could dup the array (in order not to modify the original one) and build an enumerator that cycles each_value in the hash:
array = items.dup
enum = containers.each_value.cycle
Using the above, you can shift 2 items off the array and push them to the next container until the array is empty?:
enum.next.push(*array.shift(2)) until array.empty?
Result:
containers
#=> {:one=>[1, 2, 7, 8], :two=>[3, 4, 9, 10], :three=>[5, 6]}
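The odd-sized case from the question (11 items) falls out of the same idea, because shift(2) simply returns the single remaining item on the last turn. A minimal sketch, with the helper name distribute and its per_turn parameter being assumptions made for illustration:

```ruby
# Hypothetical helper: deals items out `per_turn` at a time, cycling
# through the container arrays. The caller's items array is not modified.
def distribute(items, containers, per_turn = 2)
  queue = items.dup
  bins  = containers.each_value.cycle       # endless enumerator over the arrays
  bins.next.push(*queue.shift(per_turn)) until queue.empty?
  containers
end

result = distribute((1..11).to_a, { one: [], two: [], three: [] })
p result
```

The containers hash is filled in place and also returned, so the call can be chained or ignored as needed.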
You can use Enumerable#each_slice to iterate over the range 1...10 (that is, 1 through 9) in slices of 3 and append each slice's elements to an array of arrays:
containers = [
[],
[],
[]
]
(1...10).each_slice(3) do |slice|
containers[0] << slice[0]
containers[1] << slice[1]
containers[2] << slice[2]
end
p containers
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Sum and Average Array of Array

I am trying to sum an array of arrays and get the average at the same time. The original data is in JSON form. I have to convert it to an array of arrays in order to render the graph, because the graph does not accept an array of hashes.
I first convert the output to JSON and decode it, as below.
ActiveSupport::JSON.decode(@output.first(10).to_json)
And the result of the above action is shown below.
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"a", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"a", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"a", "pay"=> 8, ... },
{"name"=>"aaa", "job"=>"b", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"b", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"b", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"b", "pay"=> 10, ... },
]
Then I retrieved the job and pay by converting to array of array.
a = []
ActiveSupport::JSON.decode(output.to_json).each { |h|
a << [h['job'], h['pay']]
}
The result of the above operation is as below.
a = [["a", 2], ["a", 4], ["a", 6], ["a", 8],
["b", 2], ["b", 4], ["b", 6], ["b", 10]]
The code below will give me the sum of each element in the form of array of array.
a.inject({}) { |h,(job, data)| h[job] ||= 0; h[job] += data; h }.to_a
And the result is as below
[["a", 20], ["b", 22]]
However, I am trying to get the average of the array. The expected output is as below.
[["a", 5], ["b", 5.5]]
I could count the elements for each job and divide each sum by its count, but I was wondering if there is an easier and more efficient way to get the average.
output = [
{"name"=>"aaa", "job"=>"a", "pay"=> 2 },
{"name"=>"zzz", "job"=>"a", "pay"=> 4 },
{"name"=>"xxx", "job"=>"a", "pay"=> 6 },
{"name"=>"yyy", "job"=>"a", "pay"=> 8 },
{"name"=>"aaa", "job"=>"b", "pay"=> 2 },
{"name"=>"zzz", "job"=>"b", "pay"=> 4 },
{"name"=>"xxx", "job"=>"b", "pay"=> 6 },
{"name"=>"yyy", "job"=>"b", "pay"=> 10 },
]
output.group_by { |obj| obj['job'] }.map do |key, list|
[key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
end
The group_by method will transform your list into a hash with the following structure:
{"a"=>[{"name"=>"aaa", "job"=>"a", "pay"=>2}, ...], "b"=>[{"name"=>"aaa", "job"=>"b", ...]}
After that, for each pair in that hash, we calculate the mean of its 'pay' values and return a pair [key, mean]. We use map for that, returning a pair with:
The key itself ("a" or "b").
The mean of the values. Note that each group is a list of hashes, so to retrieve the values we extract the 'pay' entry of each hash; that's what list.map { |obj| obj['pay'] } is used for. Finally, we calculate the mean by summing all elements with .reduce(:+) and dividing by the list size as a float.
Not the most efficient solution, but it's practical.
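On Ruby 2.4 or later, the same group-then-average idea can be written a little more compactly with Enumerable#sum and Integer#fdiv. This is a sketch rather than a replacement for the answer above; the rows are trimmed to the two keys that matter, and the variable name averages is an assumption.

```ruby
# Trimmed copy of the question's data (only 'job' and 'pay' are kept).
output = [
  { 'job' => 'a', 'pay' => 2 }, { 'job' => 'a', 'pay' => 4 },
  { 'job' => 'a', 'pay' => 6 }, { 'job' => 'a', 'pay' => 8 },
  { 'job' => 'b', 'pay' => 2 }, { 'job' => 'b', 'pay' => 4 },
  { 'job' => 'b', 'pay' => 6 }, { 'job' => 'b', 'pay' => 10 }
]

averages = output
  .group_by { |row| row['job'] }                        # job => rows for that job
  .map { |job, rows| [job, rows.sum { |row| row['pay'] }.fdiv(rows.size)] }

p averages
```

fdiv performs floating-point division directly, so no explicit to_f is needed.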
Comparing this answer with @EricDuminil's, here's a benchmark with a list of size 8,000,000:
def Wikiti(output)
output.group_by { |obj| obj['job'] }.map do |key, list|
[key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
end
end
def EricDuminil(output)
count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
job = hash['job']
count, sum = mem[job]
mem[job] = count + 1, sum + hash['pay']
end
result = count_and_sum.map do |job, (count, sum)|
[job, sum / count.to_f]
end
end
require 'benchmark'
Benchmark.bm do |x|
x.report('Wikiti') { Wikiti(output) }
x.report('EricDuminil') { EricDuminil(output) }
end
user system total real
Wikiti 4.100000 0.020000 4.120000 ( 4.130373)
EricDuminil 4.250000 0.000000 4.250000 ( 4.272685)
This method should be reasonably efficient. It creates a temporary hash with job name as key and [count, sum] as value:
output = [{ 'name' => 'aaa', 'job' => 'a', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'a', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'a', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'a', 'pay' => 8 },
{ 'name' => 'aaa', 'job' => 'b', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'b', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'b', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'b', 'pay' => 10 }]
count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
job = hash['job']
count, sum = mem[job]
mem[job] = count + 1, sum + hash['pay']
end
#=> {"a"=>[4, 20], "b"=>[4, 22]}
result = count_and_sum.map do |job, (count, sum)|
[job, sum / count.to_f]
end
#=> [["a", 5.0], ["b", 5.5]]
It requires 2 passes, but the created objects aren't big. In comparison, calling group_by on a huge array of hashes isn't very efficient.
How about this (Single pass iterative average calculation)
accumulator = Hash.new {|h,k| h[k] = Hash.new(0)}
a.each_with_object(accumulator) do |(k,v),obj|
obj[k][:count] += 1
obj[k][:sum] += v
obj[k][:average] = (obj[k][:sum] / obj[k][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}
Obviously average is just recalculated on every iteration but since you asked for them at the same time this is probably as close as you are going to get.
Using your "output" instead looks like
output.each_with_object(accumulator) do |h,obj|
key = h['job']
obj[key][:count] += 1
obj[key][:sum] += h['pay']
obj[key][:average] = (obj[key][:sum] / obj[key][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}
As Sara Tibbetts' comment suggests, my first step would be to convert it like this:
new_a = a.reduce({}){ |memo, item| memo[item[0]] ||= []; memo[item[0]] << item[1]; memo}
which puts it in this format:
{"a"=>[2, 4, 6, 8], "b"=>[2, 4, 6, 10]}
You can then use slice! (provided by ActiveSupport) to filter the keys you want:
new_a.slice!(key1, key2, ...)
Then do another pass to get the final format:
new_a.reduce([]) do |memo, (k,v)|
avg = v.inject{ |sum, el| sum + el }.to_f / v.size
memo << [k,avg]
memo
end
I elected to use Enumerable#each_with_object with the object being an array of two hashes, the first to compute totals, the second to count the number of values that are totalled. Each hash is created with Hash.new(0), zero being the default value. See Hash::new for a fuller explanation. In short, if a hash defined as h = Hash.new(0) does not have a key k, h[k] returns 0 (h is not modified). h[k] += 1 expands to h[k] = h[k] + 1. If h does not have a key k, h[k] on the right of the equality returns 0 (see note 1).
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2},
{"name"=>"zzz", "job"=>"a", "pay"=> 4},
{"name"=>"xxx", "job"=>"a", "pay"=> 6},
{"name"=>"yyy", "job"=>"a", "pay"=> 8},
{"name"=>"aaa", "job"=>"b", "pay"=> 2},
{"name"=>"zzz", "job"=>"b", "pay"=> 4},
{"name"=>"xxx", "job"=>"b", "pay"=> 6},
{"name"=>"yyy", "job"=>"b", "pay"=>10}
]
htot, hnbr = output.each_with_object([Hash.new(0), Hash.new(0)]) do |f,(g,h)|
s = f["job"]
g[s] += f["pay"]
h[s] += 1
end
htot.merge(hnbr) { |k,o,n| o.to_f/n }.to_a
#=> [["a", 5.0], ["b", 5.5]]
If .to_a at the end is dropped, the hash {"a"=>5.0, "b"=>5.5} is returned. The OP might find that more useful than the array.
I've used the form of Hash#merge that uses a block to determine the values of keys that are present in both hashes being merged.
Note that htot={"a"=>20, "b"=>22} and hnbr={"a"=>4, "b"=>4}.
Note 1: If the reader is wondering why h[k] on the left of = doesn't return zero as well, it's a different method: Hash#[]= versus Hash#[].
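The two behaviours relied on above, the Hash.new(0) default and the block form of Hash#merge, can be checked in isolation. A small sketch with made-up variable names:

```ruby
counts = Hash.new(0)

looked_up       = counts['a']        # Hash#[] returns the default, 0
keys_after_read = counts.keys.dup    # still empty: reading stored nothing

counts['a'] += 1                     # counts['a'] = counts['a'] + 1, stored via Hash#[]=

# Block form of Hash#merge: the block is called only for keys present in both hashes.
totals   = { 'a' => 20, 'b' => 22 }
sizes    = { 'a' => 4,  'b' => 4 }
averages = totals.merge(sizes) { |_key, total, size| total.to_f / size }

p counts, averages
```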

How to group rows based on multiple fields?

I have a notifications table where I email out reports to people based on the frequency they've selected.
If an email address exists across multiple questions and has the same frequency...then I need to group them together so I can send one report for both questions.
[
#<Notification id: 1, question_id: 58, email: "john@example.com", frequency: "daily">,
#<Notification id: 2, question_id: 25, email: "john@example.com", frequency: "daily">,
#<Notification id: 3, question_id: 47, email: "john@example.com", frequency: "monthly">,
#<Notification id: 3, question_id: 19, email: "frank@example.org", frequency: "monthly">
]
So in my example data, 3 reports would be sent:
1 to john@example.com for questions 58 and 25 on a daily basis
1 to john@example.com for question 47 on a monthly basis
1 to frank@example.org for question 19 on a monthly basis
I may not be explaining this very well, so let me know if something needs clarification.
You can achieve this with a regular group_by:
@notifications = your_method_to_retrieve_notifications
@notifications = @notifications.group_by do |notif|
notif.frequency + ':' + notif.email
end
This will group your notifications like this:
@notifications = {
'daily:john@example.com' => [<Notification id: 1>, #etc.],
'monthly:john@example.com' => [# list of notifications],
'monthly:frank@example.org' => [# list of notif]
}
If you want only an array of list of notifications grouped by frequency & email, use the above method and add this:
@notifications = your_method_to_retrieve_notifications
@notifications = @notifications.group_by do |notif|
notif.frequency + ':' + notif.email
end.values
# returns Array of arrays like:
[
[<Notification id: 1 frequency: "daily", email "a@b.com">,<Notification id: 2 frequency: "daily", email "a@b.com">],
[<Notification id: 3 frequency: "daily", email "xx@yy.com">],
[<Notification id: 4 frequency: "monthly", email "a@b.com">,<Notification id: 5 frequency: "monthly", email "a@b.com">],
]
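A variation on the same grouping uses an array as the group key instead of a concatenated string, which avoids having to pick a separator and keeps both fields available. The sketch below uses a Struct named NotificationRow as a stand-in for the ActiveRecord model; that name, the sample rows, and the reports variable are assumptions for illustration, and transform_values assumes Ruby 2.4 or later.

```ruby
# Stand-in for the Notification model (hypothetical; a real app would use the model).
NotificationRow = Struct.new(:id, :question_id, :email, :frequency)

rows = [
  NotificationRow.new(1, 58, 'john@example.com',  'daily'),
  NotificationRow.new(2, 25, 'john@example.com',  'daily'),
  NotificationRow.new(3, 47, 'john@example.com',  'monthly'),
  NotificationRow.new(4, 19, 'frank@example.org', 'monthly')
]

# One entry per [frequency, email] pair, listing the question ids to report on.
reports = rows
  .group_by { |n| [n.frequency, n.email] }
  .transform_values { |group| group.map(&:question_id) }

reports.each do |(frequency, email), question_ids|
  puts "#{frequency} report to #{email} for questions #{question_ids.join(', ')}"
end
```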

Sorting a hash given an Array including information about the order criteria

I am using Ruby on Rails 3.1 and I would like to order a Hash of Arrays according to the order specified in another Array. For example, I have:
# This is the Hash of Arrays mentioned above.
hash = {
1 => [
"Value 1 1",
"Value 1 2",
"Value 1 n",
],
2 => [
"Value 2 1",
"Value 2 2",
"Value 2 n",
],
3 => [
"Value 3 1",
"Value 3 2",
"Value 3 n",
],
m => [
"Value m 1",
"Value m 2",
"Value m n",
]
}
and
# This is the Array mentioned above.
array = [m, 3, 1, 2]
I would like to order hash keys as "stated"/"specified" in the array in order to have:
# Note that Hash keys are ordered as in the Array.
ordered_hash = {
m => [
"Value m 1",
"Value m 2",
"Value m n",
],
3 => [
"Value 3 1",
"Value 3 2",
"Value 3 n",
],
1 => [
"Value 1 1",
"Value 1 2",
"Value 1 n",
],
2 => [
"Value 2 1",
"Value 2 2",
"Value 2 n",
]
}
How can I do that (maybe using the Enumerable Ruby module or some Ruby on Rails method unknown to me)?
sorted_array = hash.sort_by { |k,v| array.index(k) }
If you want ordering and a Hash on Ruby 1.8, you'll need to use ActiveSupport::OrderedHash (from Ruby 1.9 on, a plain Hash preserves insertion order), e.g.
sorted_array = hash.sort_by { |k,v| array.index(k) }
sorted_hash = ActiveSupport::OrderedHash[sorted_array]
On this toy example, James' method using array.index will be fine, but if the hash or array were large, you wouldn't want to call .index over and over. A more efficient way is to build the key/value pairs directly from the array:
Hash[array.map {|i| [i, hash[i]]}]
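On Ruby 2.1 or later, where a plain Hash keeps insertion order and Array#to_h is available, both ideas can be sketched as follows. The symbol :m stands in for the question's placeholder key m, the value arrays are shortened, and the variable names are assumptions.

```ruby
hash  = { 1 => ['Value 1 1'], 2 => ['Value 2 1'], 3 => ['Value 3 1'], :m => ['Value m 1'] }
order = [:m, 3, 1, 2]

# Option 1: walk the order array and look each key up (one hash lookup per key).
by_lookup = order.map { |key| [key, hash[key]] }.to_h

# Option 2: precompute each key's position once, then sort by it,
# avoiding a repeated Array#index scan inside sort_by.
position = order.each_with_index.to_h
by_sort  = hash.sort_by { |key, _values| position[key] }.to_h

p by_lookup.keys
```

Option 1 assumes every key in order exists in hash; a missing key would appear with a nil value.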
