How to group and count the last N results? - ruby-on-rails

In Rails, you can use:
Model.group(:field).count
to yield something like:
{"a"=>7, "b"=>5, "c"=>3, "d"=>3, "e"=>4}
But how can I count ONLY over the last N rows, not the entire table, with the DATABASE doing the calculation?
These do not work:
Model.limit(100).group(:field).count
limit restricts the number of keys in the resulting hash, not the number of table rows that are counted.
Model.last(100).group(:field).count
last returns an Array, so calling group on it raises an error.
I'm using:
* Ruby 2.3.3p222
* Rails 4.2.4
* pg 9.5.6

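As you mentioned, the limit is applied to the grouped result, not to the rows themselves. A simple workaround is to pick the ids of the rows you want in a subquery and group only those. The explicit order on id is what makes them the last 100 rather than an arbitrary 100:
Model.where(id: Model.order(id: :desc).limit(100).select(:id)).group(:field).count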

Array objects can also be grouped using group_by. Note that this loads the 100 records and counts them in Ruby rather than in the database:
grouped = Model.last(100).group_by(&:field).map { |k,v| [k, v.length] }
This returns an array of [value, count] pairs:
#=> [["Field value 1", value_1_count], ["Field value 2", value_2_count], etc...]
The array of pairs can also be turned into a hash:
grouped.each_with_object({}) { |value, memo| memo[value[0]] = value[1] }
To sum up, try the following:
Model.last(100)
.group_by(&:field)
.each_with_object({}) { |(key, value), memo| memo[key] = value.length }
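The in-memory variant can be tried without a database. This is a minimal sketch in which a hypothetical Row struct stands in for the model records; only the take-last-N and grouping steps are shown, not the query.

```ruby
# Hypothetical stand-in for model records; not part of the original answer.
Row = Struct.new(:id, :field)

rows = [
  Row.new(1, "a"), Row.new(2, "b"), Row.new(3, "a"),
  Row.new(4, "c"), Row.new(5, "a"), Row.new(6, "b")
]

# Take the last 4 rows, group them by field, and count each group.
last_counts = rows.last(4)
                  .group_by(&:field)
                  .each_with_object({}) { |(key, group), memo| memo[key] = group.length }

p last_counts
```

Only rows 3 to 6 contribute, so the "a" on row 1 and the "b" on row 2 are not counted.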

Related

How do I find the maximum value of an object's attribute that occurs exactly twice in an array?

I'm using Ruby 2.4. If I want to find the maximum number of a numeric attribute of my model, I can do
max_num = my_objects_arr.maximum(:numeric_attr)
but how would I find the maximum among the attribute values that occur exactly twice in my array? For example, say my objects array has three objects:
obj1 - numeric_attr = 3
obj2 - numeric_attr = 3
obj3 - numeric_attr = 4
The maximum of the attributes above that occur exactly twice would be "3". Although "4" is the maximum of all attributes, it only occurs once in the array.
array = [1, 2, 3, 2, 3, 4]
array.group_by { |e| e } # group_by(&:itself) since 2.2
.select { |_, v| v.count == 2 }
.keys
.max
#⇒ 3
For objects and attributes:
my_objects_arr.group_by { |o| o.numeric_attr }
.select { |_, v| v.count == 2 }
.keys
.max
To get the objects themselves:
my_objects_arr.group_by { |o| o.numeric_attr }
.select { |_, v| v.count == 2 }
.max_by(&:first)
.last
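As a runnable sketch of the two object-based variants above, assuming a hypothetical Item struct in place of the real model objects:

```ruby
# Hypothetical stand-in for the objects in my_objects_arr.
Item = Struct.new(:name, :numeric_attr)

items = [Item.new("obj1", 3), Item.new("obj2", 3), Item.new("obj3", 4)]

# Keep only the groups whose value occurs exactly twice.
groups = items.group_by(&:numeric_attr).select { |_, members| members.size == 2 }

max_value = groups.keys.max

# max_by returns nil for an empty hash, so guard before calling last.
max_objects = groups.empty? ? [] : groups.max_by(&:first).last

p max_value
p max_objects.map(&:name)
```

The guard matters when no value occurs exactly twice: keys.max then returns nil, and an unguarded max_by(...).last would raise NoMethodError.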
Since you are using Rails calculations (e.g. #maximum), this should work for you:
my_objects_arr
.group(:numeric_attr)
.having("count(numeric_attr) = 2")
.maximum(:numeric_attr)
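This groups the rows by numeric_attr, keeps only the groups that contain exactly 2 rows, and computes the maximum within each group. Note that a grouped calculation returns a hash keyed by numeric_attr rather than a single number, so take the largest key (for example with .keys.max) to get the final answer.
Approximate SQL: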
SELECT
MAX(numeric_attr)
FROM
[SOME TABLE]
GROUP BY
numeric_attr
HAVING
COUNT(numeric_attr) = 2
arr = [1, 2, 3, 2, 3, 4]
arr.each_with_object(Hash.new(0)) {|n,h| h[n] += 1}.select {|_,nbr| nbr == 2}.keys.max
#=> 3
This uses the form of Hash::new that creates a hash h with a default value of zero. That means that if h does not have a key k, h[k] returns zero (without altering the hash). This refers to the method Hash#[], not to be confused with Hash#[]=.
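A short sketch of that distinction, using a hypothetical key :k: reading a missing key returns the default without storing anything, while += stores the key because it ends in a call to Hash#[]=.

```ruby
h = Hash.new(0)

missing = h[:k]             # Hash#[] returns the default; nothing is stored
stored_before = h.key?(:k)  # still false after the read

h[:k] += 1                  # expands to h[:k] = h[:k] + 1, i.e. Hash#[]=
stored_after = h.key?(:k)

p missing, stored_before, stored_after, h
```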

Ruby aggregate selective values within a collection of hashes

I have an array of hashes with the keys being countries and the values being number of days.
I would like to aggregate over the hashes and sum the values for the countries that are the same.
The array could look like this: countries = [{"Country"=>"Brazil", "Duration"=>731/1 days}, {"Country"=>"Brazil", "Duration"=>365/1 days}]
I would like this to return something along the lines of: [{"Country" => "Brazil", "Duration"=>1096/1 days}]
I tried answers from other questions on SO, like this one:
countries.inject{|new_h, old_h| new_h.merge(old_h) {|_, old_v, new_v| old_v + new_v}}
which produces {"Country"=>"BrazilBrazil", "Duration"=>1096/1 days}
Is there a way to selectively only merge specific values?
This uses the form of Hash::new that creates an empty hash with a default value (here 0). For a hash h created that way, h[k] returns the default value if the hash does not have a key k. The hash is not modified.
countries = [{"Country"=>"Brazil", "Duration"=>"731/1 days"},
{"Country"=>"Argentina", "Duration"=>"123/1 days"},
{"Country"=>"Brazil", "Duration"=>"240/1 days"},
{"Country"=>"Argentina", "Duration"=>"260/1 days"}]
countries.each_with_object(Hash.new(0)) {|g,h| h[g["Country"]] += g["Duration"].to_i }.
map { |k,v| { "Country"=>k, "Duration"=>"#{v}/1 days" } }
#=> [{"Country"=>"Brazil", "Duration"=>"971/1 days"},
# {"Country"=>"Argentina", "Duration"=>"383/1 days"}]
The first hash is passed to the block and assigned to the block variable g:
g = {"Country"=>"Brazil", "Duration"=>"731/1 days"}
At this time h #=> {}. We then compute
h[g["Country"]] += g["Duration"].to_i
#=> h["Brazil"] += "731/1 days".to_i
#=> h["Brazil"] = h["Brazil"] + 731
#=> h["Brazil"] = 0 + 731 # h["Brazil"] on the right returns the default, 0
See String#to_i for an explanation of why "731/1 days".to_i returns 731.
h["Brazil"] on the right of the equality returns the default value of 0 because h does not (yet) have a key "Brazil". Note that h["Brazil"] on the right is syntactic sugar for h.[]("Brazil"), whereas on the left it is syntactic sugar for h.[]=("Brazil", h["Brazil"] + 731). It is Hash#[] that returns the default value when the hash does not have the given key. The remaining steps are similar.
You may update your code as follows:
countries.inject do |new_h, old_h|
new_h.merge(old_h) do |k, old_v, new_v|
if k=="Country" then old_v else old_v + new_v end
end
end
# => {"Country"=>"Brazil", "Duration"=>1096}
where you basically use the k (for key) argument to switch among different merging policies.
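One caveat: inject with merge folds the whole array into a single hash, so it only fits input that contains one country. When several countries are mixed, a possible alternative is to group first and then sum per group. The sketch below assumes the durations are plain integers (days), which is a simplification of the values in the question.

```ruby
# Assumption: durations are plain integers, not the "731/1 days" values from the question.
countries = [
  { "Country" => "Brazil",    "Duration" => 731 },
  { "Country" => "Argentina", "Duration" => 123 },
  { "Country" => "Brazil",    "Duration" => 365 }
]

# Group the hashes by country, then sum the durations inside each group.
totals = countries.group_by { |h| h["Country"] }.map do |country, rows|
  { "Country" => country,
    "Duration" => rows.inject(0) { |sum, row| sum + row["Duration"] } }
end

p totals
```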

Updating Ruby Hash Values with Array Values

I've created the following hash keys with values parsed from PDF into array:
columns = ["Sen", "P-Hire#", "Emp#", "DOH", "Last", "First"]
h = Hash[columns.map.with_index.to_h]
=> {"Sen"=>0, "P-Hire#"=>1, "Emp#"=>2, "DOH"=>3, "Last"=>4, "First"=>5}
Now I want to update the value of each key with 6 equivalent values from another parsed data array:
rows = list.text.scan(/^.+/)
row = rows[0].tr(',', '')
@data = row.split
=> ["2", "6", "239", "05/05/67", "Harp", "Erin"]
I can iterate over @data in the view and it will list each of the 6 values. When I try to do the same in the controller it sets the same value to each key:
data.each do |e|
h.update(h){|key,v1| (e) }
end
=>
{"Sen"=>"Harper", "P-Hire#"=>"Harper", "Emp#"=>"Harper", "DOH"=>"Harper", "Last"=>"Harper", "First"=>"Harper"}
So it's setting the value of each key to the last value of the looped array...
I would just do:
h.keys.zip(@data).to_h
If the only purpose of h is as an interim step getting to the result, you can dispense with it and do:
columns.zip(@data).to_h
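A self-contained sketch of the zip approach, using the column names and the parsed row from the question as literal arrays (a local variable data is used here in place of the controller's instance variable):

```ruby
columns = ["Sen", "P-Hire#", "Emp#", "DOH", "Last", "First"]
data    = ["2", "6", "239", "05/05/67", "Harp", "Erin"]

# zip pairs each column name with the value at the same position;
# to_h turns the resulting [key, value] pairs into a hash.
record = columns.zip(data).to_h

p record
```

If data is shorter than columns, zip pads the missing values with nil, so the hash still has all six keys.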
There are several ways to solve this problem, but a more direct and straightforward way would be:
columns = ["Sen", "P-Hire#", "Emp#", "DOH", "Last", "First"]
...
@data = row.split
h = Hash.new
columns.each_with_index do |column, index|
h[column] = @data[index]
end
Another way, reusing the original h from the question, whose values are the column indices:
h.each do |key, index|
h[key] = @data[index]
end
Like I said, there are several ways of solving the issue and the best is always going to depend on what you're trying to achieve.

Ruby - Splitting Array after getting data from Mysql

I have the following code to fetch the data from MySQL database into my rails controller
@main = $connection.execute("SELECT * FROM builds WHERE platform_type IS NOT NULL")
This returns a mysql2 type object which behaves like an array, I guess.
I want to split this into 2 arrays, first one where platform_type is 'TOTAL' and everything else in the other array.
It actually returns a Mysql2::Result object. Of course you can do
totals = []
others = []
main.each { |r|
(r['platform_type'] == 'TOTAL' ? totals : others) << r
}
but why not do it the Rails way, with something like the following (assuming a Build model backed by the builds table; where.not requires Rails 4 or later):
Build.where(platform_type: 'TOTAL')
Build.where.not(platform_type: [nil, 'TOTAL'])
Try array.select. Something like:
total = @main.select { |build| build.platform_type == 'TOTAL' }
not_total = @main.reject { |build| build.platform_type == 'TOTAL' }
http://matthewcarriere.com/2008/06/23/using-select-reject-collect-inject-and-detect/
Even better, use Enumerable#partition, which does both in a single pass, as per this answer: Ruby Select and Reject in one method
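A minimal sketch of partition, assuming each row is a hash with string keys (the real shape of the rows depends on how the query result is configured, so treat the sample data as hypothetical):

```ruby
# Hypothetical rows; the ids and platform names are made up for illustration.
rows = [
  { "id" => 1, "platform_type" => "TOTAL" },
  { "id" => 2, "platform_type" => "ios" },
  { "id" => 3, "platform_type" => "android" },
  { "id" => 4, "platform_type" => "TOTAL" }
]

# partition returns two arrays: elements for which the block is truthy,
# then the rest, preserving the original order in both.
totals, others = rows.partition { |row| row["platform_type"] == "TOTAL" }

p totals.map { |row| row["id"] }
p others.map { |row| row["id"] }
```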

Best way to analyse data using ruby

I would like to analyse data in my database to find out how many times certain words appear.
Ideally I would like a list of the top 20 words used in a particular column.
What would be the easiest way of going about this?
Create an autovivified hash and then loop through the rows populating the hash and incrementing the value each time you get the same key (word). Then sort the hash by value.
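A minimal sketch of that idea. It uses a hash with a default value of 0, a simpler relative of a fully autovivified hash, and assumes the column values have already been loaded into an array of strings (the sample rows are hypothetical).

```ruby
# Hypothetical column values; in practice these would come from the database.
rows = ["how now brown cow", "how now", "brown how"]

counts = Hash.new(0)
rows.each do |text|
  text.split.each { |word| counts[word] += 1 }
end

# Sort by descending count, breaking ties alphabetically, and keep the top 2.
top = counts.sort_by { |word, count| [-count, word] }.first(2)

p top
```

Sorting on [-count, word] makes the order of tied words deterministic, which plain sorting by count does not guarantee.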
A word counter...
I wasn't sure if you were asking how to get rails to work on this or how to count words, but I went ahead and did a column-oriented ruby wordcounter anyway.
(BTW, at first I did try the autovivified hash, what a cool trick.)
# col: a column name or number
# strings: a String, Array of Strings, Array of Array of Strings, etc.
def count(col, *strings)
(@h ||= {})[col = col.to_s] ||= {}
[*strings].flatten.each { |s|
s.split.each { |word|
@h[col][word] ||= 0
@h[col][word] += 1
}
}
end
def formatOneCol a
limit = 2
a.sort { |e1,e2| e2[1]<=>e1[1] }.each { |results|
printf("%9d %s\n", results[1], results[0])
return unless (limit -= 1) > 0
}
end
def formatAllCols
@h.sort.each { |a|
printf("\n%9s\n", "Col " + a[0])
formatOneCol a[1]
}
end
count(1,"how now")
count(1,["how", "now", "brown"])
count(1,[["how", "now"], ["brown", "cow"]])
count(2,["you see", "see you",["how", "now"], ["brown", "cow"]])
count(2,["see", ["see", ["see"]]])
count("A_Name Instead","how now alpha alpha alpha")
formatAllCols
$ ruby count.rb
Col 1
3 how
3 now
Col 2
5 see
2 you
Col A_Name Instead
3 alpha
1 how
$
DigitalRoss's answer looks too verbose to me. Also, since you tagged this ruby-on-rails and said you use a DB, I'm assuming you need an ActiveRecord model, so here is a full solution.
In your model:
def self.top_strs(column_symbol, top_num)
h = Hash.new(0)
find(:all, :select => column_symbol).each do |obj|
obj.send(column_symbol).split.each do |word|
h[word] += 1
end
end
h.sort_by { |_, count| -count }.first(top_num) # exactly top_num entries, highest count first
end
for example, model Comment, column body:
Comment.top_strs(:body, 20)
