I would like to analyse data in my database to find out how many times certain words appear.
Ideally I would like a list of the top 20 words used in a particular column.
What would be the easiest way of going about this?
Create an autovivified hash and then loop through the rows populating the hash and incrementing the value each time you get the same key (word). Then sort the hash by value.
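A minimal sketch of that idea, assuming the column values have already been loaded into an array (the `rows` array and the `top_words` name are made up for illustration). It uses `Hash.new(0)`, a hash with a default value of 0, which is the usual Ruby stand-in for an autovivified counter. Lowercasing the words is my assumption, drop it if case matters.

```ruby
# Count words across rows using a hash whose missing keys default to 0.
# `rows` is a hypothetical stand-in for the column values read from the database.
def top_words(rows, limit = 20)
  counts = Hash.new(0)
  rows.each do |text|
    # to_s guards against NULL columns; scan yields each run of word characters
    text.to_s.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  end
  # Sort by descending count, breaking ties alphabetically for stable output.
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

rows = ["how now brown cow", "How now", nil, "brown cow now"]
top_words(rows, 3)
# => [["now", 3], ["brown", 2], ["cow", 2]]
```

Sorting on `[-count, word]` keeps the result deterministic when several words share the same count.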
A word counter...
I wasn't sure if you were asking how to get rails to work on this or how to count words, but I went ahead and did a column-oriented ruby wordcounter anyway.
(BTW, at first I did try the autovivified hash, what a cool trick.)
# col: a column name or number
# strings: a String, Array of Strings, Array of Array of Strings, etc.
def count(col, *strings)
  (@h ||= {})[col = col.to_s] ||= {}
  [*strings].flatten.each { |s|
    s.split.each { |word|
      @h[col][word] ||= 0
      @h[col][word] += 1
    }
  }
end
def formatOneCol a
  limit = 2
  a.sort { |e1,e2| e2[1]<=>e1[1] }.each { |results|
    printf("%9d %s\n", results[1], results[0])
    return unless (limit -= 1) > 0
  }
end
def formatAllCols
  @h.sort.each { |a|
    printf("\n%9s\n", "Col " + a[0])
    formatOneCol a[1]
  }
end
count(1,"how now")
count(1,["how", "now", "brown"])
count(1,[["how", "now"], ["brown", "cow"]])
count(2,["you see", "see you",["how", "now"], ["brown", "cow"]])
count(2,["see", ["see", ["see"]]])
count("A_Name Instead","how now alpha alpha alpha")
formatAllCols
$ ruby count.rb
Col 1
3 how
3 now
Col 2
5 see
2 you
Col A_Name Instead
3 alpha
1 how
$
digitalross's answer looks too verbose to me. Also, since you tagged the question ruby-on-rails and said you use a DB, I'm assuming you need an ActiveRecord model, so I'm giving you a full solution.
in your model:
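def self.top_strs(column_symbol, top_num)
  h = Hash.new(0)
  # Rails 2/3 finder syntax; on Rails 4+ use select(column_symbol).each instead
  find(:all, :select => column_symbol).each do |obj|
    # to_s guards against NULL values in the column
    obj.send(column_symbol).to_s.split.each do |word|
      h[word] += 1
    end
  end
  # highest counts first; first(top_num) returns exactly top_num pairs
  h.sort_by { |_word, count| -count }.first(top_num)
end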
for example, model Comment, column body:
Comment.top_strs(:body, 20)
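If you would rather not instantiate a model object per row, the counting itself can be done on plain strings. This is only a sketch: `texts` is a hypothetical array standing in for whatever the column query returns (for example the result of a `pluck` call on newer Rails versions, which I'm assuming is available to you).

```ruby
# Hypothetical helper: `texts` stands in for the raw column values.
def top_strs(texts, top_num)
  counts = texts.compact.flat_map(&:split).each_with_object(Hash.new(0)) do |word, h|
    h[word] += 1
  end
  # max_by(n) returns the n pairs with the highest counts, largest first.
  counts.max_by(top_num) { |_word, count| count }
end

bodies = ["see you", "you see", nil, "see"]
top_strs(bodies, 1)
# => [["see", 3]]
```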
So, I have an after_save hook on the Review model which calls the calculate_specific_rating method of the Product model. The method looks like this:
def calculate_specific_rating
  ratings = reviews.reload.all.pluck(:rating)
  specific_rating = Hash.new(0)
  ratings.each { |rating| specific_rating[rating] += 1 }
  self.specific_rating = specific_rating
  save
end
Right now, it returns
specific_rating => {
"2"=> 3, "4"=> 1
}
I want it to return like:
specific_rating => {
"1"=> 0, "2"=>3, "3"=>0, "4"=>1, "5"=>0
}
Also, is it okay to initialize a new hash every time a review is saved? I would like an alternative. Thanks
You can build a range from 1 up to the maximum value in ratings plus 1 and convert it to a hash: for each number in the range, the block returns a pair whose first element is the number (as a string) and whose second element is how many times that number occurs in ratings. Note that the upper bound then depends on the data; if the scale is fixed at 1 to 5, use (1..5) instead.
self.specific_rating = (1..ratings.max + 1).to_h { |e| [e.to_s, ratings.count(e)] }
save
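For a fixed scale, a sketch like the following always produces every key, whatever ratings happen to exist. It assumes the ratings are integers on a 1 to 5 scale; `rating_histogram` and its `scale` parameter are names I made up.

```ruby
# Assumption: ratings are integers and the scale is fixed (1..5 by default).
def rating_histogram(ratings, scale = 1..5)
  counts = ratings.each_with_object(Hash.new(0)) { |r, h| h[r] += 1 }
  # Build every key on the scale so missing ratings show up as 0.
  scale.to_h { |r| [r.to_s, counts[r]] }
end

rating_histogram([2, 4, 2, 2])
# => {"1"=>0, "2"=>3, "3"=>0, "4"=>1, "5"=>0}
```

Counting once into a hash and then reading from it avoids rescanning the ratings array for every value on the scale.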
You could also do something like this -
def calculate_specific_rating
  ratings = [1,2,3,4,5]
  existing_ratings = reviews.group_by(&:rating).map{|k,v| [k, v.count]}.to_h
  Hash[(ratings - existing_ratings.keys).map {|x| [x, 0]}].merge(existing_ratings)
end
which gives
{3=>0, 4=>0, 5=>0, 2=>3, 1=>1}
I have an array PARTITION which stores days.
I want to group_by my posts (ActiveRecord::Relation) according to how old they are and which partition they fall into.
Example: PARTITION = [0, 40, 60, 90]
I want to group posts which are 0 to 40 days old, 40 to 60 days old, 60 to 90 days old and older than 90 days.
Please note that I will get the array data from an external source, and I don't want to use a where clause: I am using includes, and where fires a new DB query, which makes includes useless.
How can I do this?
Here's a simple approach:
posts.each_with_object(Hash.new { |h, k| h[k] = [] }) do |post, hash|
  days_old = (Date.today - post.created_at.to_date).to_i
  case days_old
  when 0..39
    hash[0] << post
  when 40..59
    hash[40] << post
  when 60..89
    hash[60] << post
  when 90..Float::INFINITY # or 90.. in the newest Ruby versions
    hash[90] << post
  end
end
This iterates through the posts, along with a hash which has a default value of an empty array.
Then, we simply check how many days ago a post was created and add it to the relevant key of the hash.
This hash is then returned when all posts have been processed.
You can use whatever you want for the keys (e.g. hash["< 40"]), though I've used your partitions for illustrative purposes.
The result will be something akin to the following:
{ 0: [post_1, post_3, etc],
40: [etc],
60: [etc],
90: [etc] }
Hope this helps - let me know if you've got any questions.
Edit: it's a little trickier if your PARTITIONS are coming from an external source, though the following would work:
# transform the PARTITIONS into an array of ranges
ranges = PARTITIONS.map.with_index do |p, i|
  # last range runs from the final partition to infinity
  next p..Float::INFINITY if i + 1 == PARTITIONS.length
  # every other range ends one day before the next partition starts
  p..(PARTITIONS[i + 1] - 1)
end
# loop through the posts with a hash with arrays as the default value
posts.each_with_object(Hash.new { |h, k| h[k] = [] }) do |post, hash|
  days_old = (Date.today - post.created_at.to_date).to_i
  # add the post to the hash key for the range that contains its age
  range = ranges.find { |r| r.include?(days_old) }
  hash[range] << post if range
end
A final edit:
Bit silly using each_with_object when group_by will handle this perfectly. Example below:
posts.group_by do |post|
  days_old = (Date.today - post.created_at.to_date).to_i
  case days_old
  when 0..39
    0
  when 40..59
    40
  when 60..89
    60
  when 90..Float::INFINITY # or 90.. in the newest Ruby versions
    90
  end
end
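The same group_by idea also works when the partitions arrive from outside. Below is a rough sketch under a few assumptions: the partitions are plain integers (day counts), every post responds to created_at, and the helper names (`partition_for`, `group_by_age`) are invented for this example. The Struct only stands in for a real model so the snippet runs on its own.

```ruby
require "date"

# Returns the largest partition start that is <= days_old.
def partition_for(days_old, partitions)
  partitions.sort.reverse.find { |start| days_old >= start }
end

# Groups already-loaded posts in Ruby, so no extra query is fired.
def group_by_age(posts, partitions, today = Date.today)
  posts.group_by do |post|
    partition_for((today - post.created_at.to_date).to_i, partitions)
  end
end

post = Struct.new(:title, :created_at) # hypothetical stand-in for the model
today = Date.new(2024, 1, 1)
posts = [
  post.new("fresh",   today - 3),
  post.new("older",   today - 45),
  post.new("ancient", today - 200)
]

grouped = group_by_age(posts, [0, 40, 60, 90], today)
titles = grouped.transform_values { |ps| ps.map(&:title) }
# => {0=>["fresh"], 40=>["older"], 90=>["ancient"]}
```

Partitions with no posts simply do not appear as keys; merge the result into a hash of empty arrays if you need them all.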
Assumptions:
This partitioning is for display purposes.
The attribute you want to group by is days.
You want the result as a hash:
{ 0 => [<Post1>], 40 => [<Post12>], 60 => [<Post41>], 90 => [<Post101>] }
add these methods to your model
# post.rb
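def self.age_partitioned
  all.group_by(&:age_partition)
end
def age_partition
  # first partition start that the age reaches, checked from oldest to newest
  [90, 60, 40, 0].find { |start| days >= start } # replace days by the correct attribute name
end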
# Now to use it
Post.where(filters).includes(:all_what_you_want).age_partitioned
Based on the description in the post, something like the following could help you group the data:
result_array_0_40 = [];result_array_40_60 = [];result_array_60_90 = [];result_array_90 = [];
result_json = {}
Now we need to iterate over the values and manually group them into key-value pairs:
PARTITION.each do |x|
result_array_0_40.push(x) if (0..40).include?(x)
result_array_40_60.push(x) if (40..60).include?(x)
result_array_60_90.push(x) if (60..90).include?(x)
result_array_90.push(x) if x > 90
result_json["0..40"] = result_array_0_40
result_json["40..60"] = result_array_40_60
result_json["60..90"] = result_array_60_90
result_json["90+"] = result_array_90
end
Hope it Helps!!
I just wrote a method that I'm pretty sure is terribly written. I can't figure out if there is a better way to write this in ruby. It's just a simple loop that is counting stuff.
Of course, I could use a select or something like that, but that would require looping over my array twice. Is there a way to increment several variables in one loop without declaring them before the loop? Something like a multiple select, I don't know. It gets even worse when I have more counters.
Thank you!
failed_tests = 0
passed_tests = 0
tests.each do |test|
  case test.status
  when :failed
    failed_tests += 1
  when :passed
    passed_tests += 1
  end
end
You could do something clever like this:
tests.each_with_object(failed: 0, passed: 0) do |test, memo|
memo[test.status] += 1
end
# => { failed: 1, passed: 10 }
You can use the #reduce method:
failed, passed = tests.reduce([0, 0]) do |(failed, passed), test|
  case test.status
  when :failed
    [failed + 1, passed]
  when :passed
    [failed, passed + 1]
  else
    [failed, passed]
  end
end
Or with a Hash with default value, this will work with any statuses:
tests.reduce(Hash.new(0)) do |counter, test|
counter[test.status] += 1
counter
end
Or even enhancing this with @fivedigit's idea:
tests.each_with_object(Hash.new(0)) do |test, counter|
counter[test.status] += 1
end
Assuming Rails 4 (using 4.0.x here), I would suggest:
tests.group(:status).count
# -> {'passed' => 2, 'failed' => 34, 'anyotherstatus' => 12}
This will group all records by any possible :status value, and count each individual occurrence.
Edit: adding a Rails-free approach
Hash[tests.group_by(&:status).map{|k,v| [k,v.size]}]
Group the elements by their status.
Map each group to a [status, count] pair.
Turn the array of pairs into a Hash, so each count is accessible by its status, e.g. result[:passed].
hash = tests.reduce(Hash.new(0)) { |hash,element| hash[element.status] += 1; hash }
This will return a hash with the count of elements for each status.
ex:
class Test
  attr_reader :status
  def initialize
    @status = ['pass', 'failed'].sample
  end
end
array = []
5.times { array.push Test.new }
hash = array.reduce(Hash.new(0)) { |hash,element| hash[element.status] += 1; hash }
=> {"failed"=>3, "pass"=>2}
res_array = tests.map{|test| test.status}
failed_tests = res_array.count :failed
passed_tests = res_array.count :passed
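On Ruby 2.7 or newer, Enumerable#tally does the counting in a single pass without declaring any counters up front. A small sketch follows; the Struct is just a hypothetical stand-in for the real test objects.

```ruby
# Hypothetical test objects; only #status matters here.
test_result = Struct.new(:status)
tests = [:passed, :failed, :passed, :skipped].map { |s| test_result.new(s) }

# tally builds a { value => occurrences } hash (Ruby 2.7+).
counts = tests.map(&:status).tally
# => {:passed=>2, :failed=>1, :skipped=>1}

passed_tests  = counts.fetch(:passed, 0)
errored_tests = counts.fetch(:errored, 0) # statuses never seen fall back to 0
```

Using fetch with a default avoids nil for statuses that did not occur at all.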
What is the best way to incrementally iterate through a pair of hashes in Ruby? Should I convert them to arrays? Should I go in an entirely different direction? I am working on a problem where the code is supposed to determine what to bake, and in what quantities, for a bakery given 2 inputs: the number of people to be fed and their favorite food. They bake 3 things (keys in my_list) and each baked item feeds a set number of people (value in my_list).
def bakery_num(num_of_people, fav_food)
  my_list = {"pie" => 8, "cake" => 6, "cookie" => 1}
  bake_qty = {"pie_qty" => 0, "cake_qty" => 0, "cookie_qty" => 0}
  if my_list.has_key?(fav_food) == false
    raise ArgumentError.new("You can't make that food")
  end
  index = my_list.key_at(fav_food)
  until num_of_people == 0
    bake_qty[index] = (num_of_people / my_list[index])
    num_of_people = num_of_people - bake_qty[index]
    index += 1
  end
  return "You need to make #{pie_qty} pie(s), #{cake_qty} cake(s), and #{cookie_qty} cookie(s)."
end
The goal is to output a list for the bakery that will result in no uneaten food. When doing the math, the modulo would then be divided into the next food item.
Thanks for the help.
What is the best way to incrementally iterate through a pair of hashes in Ruby?
Since the keys of bake_qty conveniently have a '_qty' appended to them from their corresponding keys in my_list, you can use this to your advantage:
max_value = my_list[fav_food]
my_list.each do |key,value|
next if max_value < value
qty = bake_qty[key+'_qty']
...
end
You could use the inject method. The accumulator carries the people who are still unfed from one food to the next:
until num_of_people == 0
  num_of_people = my_list.inject(num_of_people) do |t,(k,v)|
    bake_qty["#{k}_qty"] += t / v
    t % v
  end
end
You can sort your hash at the beginning to ensure that your first food is the fav food.
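Putting those pieces together, here is one possible complete version. Treat it as a sketch: the ordering rule (favourite food first, then the rest from largest to smallest serving size) is my reading of the question, and the `servings` and `quantities` names are mine.

```ruby
def bakery_num(num_of_people, fav_food)
  servings = { "pie" => 8, "cake" => 6, "cookie" => 1 }
  raise ArgumentError, "You can't make that food" unless servings.key?(fav_food)

  # Favourite food first, then the remaining items from largest to smallest.
  order = [fav_food] + (servings.keys - [fav_food]).sort_by { |food| -servings[food] }
  quantities = Hash.new(0)
  # The accumulator is the number of people still unfed after each food.
  order.inject(num_of_people) do |remaining, food|
    quantities[food] = remaining / servings[food]
    remaining % servings[food]
  end

  "You need to make #{quantities['pie']} pie(s), #{quantities['cake']} cake(s), " \
    "and #{quantities['cookie']} cookie(s)."
end

bakery_num(41, "cake")
# => "You need to make 0 pie(s), 6 cake(s), and 5 cookie(s)."
```

Because a cookie feeds exactly one person, the remainder always reaches zero by the end of the pass, so no outer loop is needed.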
I want to make a loop on a variable that can be altered inside of the loop.
first_var.sort.each do |first_id, first_value|
  second_var.sort.each do |second_id, second_value|
    difference = first_value - second_value
    if difference >= 0
      second_var.delete(second_id)
    else
      second_var[second_id] += first_value
      if second_var[second_id] == 0
        second_var.delete(second_id)
      end
      first_var.delete(first_id)
    end
  end
end
The idea behind this code is that I want to use it for calculating how much money a certain user is going to give some other user. Both variables contain hashes: first_var contains the users that will get money, and second_var contains the users that are going to pay. The loop is supposed to "fill up" a user that should get money, and when a user is full, or a user is out of money, take that user out of the loop and continue filling up the rest of the users.
How do I do this, because this doesn't work?
Okay. What it looks like you have is two hashes, hence the "id, value" split.
If you are looping through arrays and you want to use the index of the array, you would want to use Array.each_index.
If you are looping through an Array of objects, and 'id' and 'value' are attributes, you only need to call some arbitrary block variable, not two.
Let's assume these are two hashes, H1 and H2, of equal length, with common keys. You want to do the following: if H1[key] is greater than H2[key], remove key from H2; otherwise, add H1[key] to H2[key] and store the result in H2[key].
H1.each_key do |k|
  if H1[k] > H2[k] then
    H2.delete(k)
  else
    H2[k] = H2[k]+H1[k]
  end
end
Assume you are looping through two arrays, and you want to sort them by value, and then if the value in A1[x] is greater than the value in A2[x], remove A2[x]. Else, sum A1[x] with A2[x].
a = a1.sort
b = a2.sort
a.each_index do |k|
  if a[k] > b[k]
    b[k] = nil
  else
    b[k] = a[k] + b[k]
  end
end
a2 = b.compact
Based on the new info: you have a hash for payees and a hash for payers. Let's call them ees and ers just for convenience. The difficult part of this is that as you modify the ers hash, you might confuse the loop. One way to do this--poorly--is as follows.
e_keys = ees.keys
r_keys = ers.keys
e = 0
r = 0
until e == e_keys.length or r == r_keys.length
  ees[e_keys[e]] = ees[e_keys[e]] + ers[r_keys[r]]
  x = max_value - ees[e_keys[e]]
  ers[r_keys[r]] = x >= 0 ? 0 : x.abs
  ees[e_keys[e]] = [ees[e_keys[e]], max_value].min
  if ers[r_keys[r]] == 0 then r+= 1 end
  if ees[e_keys[e]] == max_value then e+=1 end
end
The reason I say that this is not a great solution is that I think there is a more "ruby" way to do this, but I'm not sure what it is. This does avoid any problems that modifying the hash you are iterating through might cause, however.
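Another way to sidestep the mutation problem is to never touch the input hashes at all and walk two sorted lists of pairs with two indexes. This is only a sketch under my own assumptions about the data: payees maps an id to the amount still owed to that user, payers maps an id to the amount that user must pay, and all amounts are positive. The `settle` name and the shape of the returned transfers are invented here.

```ruby
# payees: { id => amount still owed to them }, payers: { id => amount they must pay }
# Both shapes are assumptions; amounts are expected to be positive numbers.
def settle(payees, payers)
  # Hash#sort returns fresh [id, amount] pairs, so the inputs are never mutated.
  receive = payees.sort
  pay     = payers.sort
  transfers = []
  i = j = 0
  while i < receive.length && j < pay.length
    amount = [receive[i][1], pay[j][1]].min
    transfers << { from: pay[j][0], to: receive[i][0], amount: amount }
    receive[i][1] -= amount
    pay[j][1]     -= amount
    payee_full  = receive[i][1].zero?
    payer_empty = pay[j][1].zero?
    i += 1 if payee_full   # payee is "full", move to the next one
    j += 1 if payer_empty  # payer is out of money, move to the next one
  end
  transfers
end

payees = { "ann" => 30, "bob" => 20 }
payers = { "cat" => 25, "dan" => 25 }
settle(payees, payers)
# => [{:from=>"cat", :to=>"ann", :amount=>25},
#     {:from=>"dan", :to=>"ann", :amount=>5},
#     {:from=>"dan", :to=>"bob", :amount=>20}]
```

Each step either fills a payee or empties a payer, so the loop always advances and finishes after at most payees.size + payers.size transfers.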
Do you mean?
some_value = 5
arrarr = [[],[1,2,5],[5,3],[2,5,7],[5,6,2,5]]
arrarr.each do |a|
a.delete(some_value)
end
arrarr now has the value [[], [1, 2], [3], [2, 7], [6, 2]]
I think you can sort of alter a variable inside such a loop but I would highly recommend against it. I'm guessing it's undefined behaviour.
here is what happened when I tried it
a.each do |x|
p x
a = []
end
prints
1
2
3
4
5
and a is [] at the end
while
a.each do |x|
p x
a = []
end
prints nothing
and a is [] at the end
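For what it's worth, the difference that matters here is between rebinding the variable and mutating the array object that each is walking. The sketch below shows both; using clear for the mutating case is my choice for illustration and not necessarily what the snippets above did.

```ruby
a = [1, 2, 3, 4, 5]
seen = []
a.each do |x|
  seen << x
  a = []   # rebinds the name `a`; the array being iterated is untouched
end
# seen => [1, 2, 3, 4, 5], a => []

b = [1, 2, 3, 4, 5]
seen_b = []
b.each do |x|
  seen_b << x
  b.clear  # mutates the array being iterated, so the loop stops early
end
# seen_b => [1], b => []
```

In the first loop each keeps iterating the original array object, which nothing has changed; in the second the array itself is emptied after the first element.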
If you can, I'd try using each/map/filter/select etc.; otherwise build a new array while looping through a normally.
Or loop over numbers from x to y
1.upto(5).each do |n|
do_stuff_with(arr[n])
end
Assuming:
some_var = [1,2,3,4]
delete_if sounds like a viable candidate for this:
some_var.delete_if { |a| a == 1 }
p some_var
=> [2,3,4]