I am using the elasticsearch-rails gem to retrieve data from Elasticsearch dynamically, meaning that the result can have zero or more aggregations depending on the user's choices.
Imagine a response like this:
(...)
"aggregations"=>
{"agg_insignia_id"=>
{"buckets"=>
[{"key"=>1,
"key_as_string"=>"1",
"doc_count"=>32156,
"agg_chain_id"=>
{"buckets"=>
[{"key"=>9,
"key_as_string"=>"9",
"doc_count"=>23079,
"agg_store_id"=>
{"buckets"=>
[{"key"=>450,
"key_as_string"=>"450",
"doc_count"=>145,
"agg_value"=>{"value"=>1785.13}},
{"key"=>349,
"key_as_string"=>"349",
"doc_count"=>143,
"agg_value"=>{"value"=>1690.37}},
How can I transform that data into tabular form, like this?
| insignia_id | chain_id | store_id | value |
| 1 | 9 | 450 | 1785.13 |
| 1 | 9 | 349 | 1690.37 |
(...)
EDIT: To be clear about the response I am looking for, there are two options: a simple array of arrays, or an array of hashes.
Array style: [[insignia_id, chain_id, store_id, value], [1,9,450,1785.13], [1,9,349,1690.37],...]
Array of Hashes style: [{insignia_id => 1, chain_id => 9, store_id => 450, value => 1785.13}, {insignia_id => 1, chain_id => 9, store_id => 349, value => 1690.37 }]
The latter is closer to ActiveRecord style.
OK, so I came up with a solution for an array response.
First, I added a helper that the main method relies on:
class Hash
  def deep_find(key, object=self, found=nil)
    if object.respond_to?(:key?) && object.key?(key)
      return object[key]
    elsif object.is_a? Enumerable
      object.find { |*a| found = deep_find(key, a.last) }
      return found
    end
  end
end
now for the array algorithm (added in a concern):
def self.to_table_array(data, aggs, final_table = nil, row = [])
  final_table = [aggs.keys] if final_table.nil?
  hash_tree = data.deep_find(aggs.keys.first)
  if aggs.values.uniq.length == 1 && aggs.values.uniq == [:data]
    aggs.keys.each do |agg|
      row << data[agg]["value"]
    end
    final_table << row
  else
    hash_tree["buckets"].each_with_index do |h, index|
      row.pop if index > 0
      aggs.shift if index == 0
      row << h["key_as_string"]
      final_table = to_table_array(h, aggs.clone, final_table, row.clone)
    end
  end
  final_table
end
The call for this method could be made like this:
#_fields = { "insignia_id" => :row, "chain_id" => :row, "store_id" => :row, "value" => :data }
#res.response => Elasticsearch response
result = to_table_array(res.response, _fields)
Some things are quite specific to this case, as you can see from the _fields variable. I'm also assuming each aggregation is named after the term itself. The rest is much the same for every possible case.
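The sample response in the question names its aggregations agg_insignia_id, agg_chain_id and so on, so the naming assumption may not hold as-is. Below is a minimal recursive sketch that takes the name prefix as a parameter and does not need to reopen Hash. The method name aggs_to_rows, the prefix keyword and the SAMPLE_AGGS constant are hypothetical, and the sketch assumes every level is a terms-style aggregation with a "buckets" array (fetch raises KeyError otherwise).

```ruby
# Hypothetical helper, not part of elasticsearch-rails.
# node:   the "aggregations" hash, or a bucket nested inside it
# dims:   bucket dimensions in nesting order, e.g. ["insignia_id", "chain_id"]
# metric: name of the leaf metric, e.g. "value"
def aggs_to_rows(node, dims, metric, prefix: "agg_", row: [])
  # Leaf level: append the metric value and emit the finished row.
  return [row + [node.dig("#{prefix}#{metric}", "value")]] if dims.empty?

  dim, *rest = dims
  node.fetch("#{prefix}#{dim}").fetch("buckets").flat_map do |bucket|
    aggs_to_rows(bucket, rest, metric, prefix: prefix, row: row + [bucket["key"]])
  end
end

# Sample data copied from the question (only the keys used here).
SAMPLE_AGGS = {
  "agg_insignia_id" => { "buckets" => [
    { "key" => 1, "doc_count" => 32156,
      "agg_chain_id" => { "buckets" => [
        { "key" => 9, "doc_count" => 23079,
          "agg_store_id" => { "buckets" => [
            { "key" => 450, "doc_count" => 145, "agg_value" => { "value" => 1785.13 } },
            { "key" => 349, "doc_count" => 143, "agg_value" => { "value" => 1690.37 } }
          ] } }
      ] } }
  ] }
}

dims  = %w[insignia_id chain_id store_id]
table = [dims + ["value"]] + aggs_to_rows(SAMPLE_AGGS, dims, "value")
# table => [["insignia_id", "chain_id", "store_id", "value"],
#           [1, 9, 450, 1785.13],
#           [1, 9, 349, 1690.37]]
```

Each recursive call builds a new row array instead of mutating a shared one, so no pop/clone bookkeeping is needed.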
Getting an array of hashes from here is fairly simple: only a few lines need to change.
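One way to get the array-of-hashes form without touching the recursion is to convert the finished table afterwards. This is a small sketch, assuming the first row of the table holds the column names as in the array style above; table_to_hashes is a hypothetical name.

```ruby
# Hypothetical helper: turns [[header...], [row...], ...] into an array of hashes.
def table_to_hashes(table)
  header, *rows = table
  # Pair each cell with its column name, then build a hash per row.
  rows.map { |row| header.zip(row).to_h }
end

table = [
  ["insignia_id", "chain_id", "store_id", "value"],
  ["1", "9", "450", 1785.13],
  ["1", "9", "349", 1690.37]
]

records = table_to_hashes(table)
# records.first => {"insignia_id"=>"1", "chain_id"=>"9", "store_id"=>"450", "value"=>1785.13}
```

The keys come out as strings because the header row holds strings; the bucket values are strings too, since the method above pushes key_as_string.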
I put a lot of effort into this. I hope it helps someone other than me.
I have an array PARTITION which stores days.
I want to group_by my posts (ActiveRecord::Relation) according to how old they are and which partition they fall in.
Example: PARTITION = [0, 40, 60, 90]
I want to group posts which are 0 to 40 days old, 40 to 60 days old, 60 to 90 days old and older than 90 days.
Please note that I will get the array data from an external source, and I don't want to use a where clause: I am using includes, and where fires a DB query, which makes includes useless.
How can I do this?
Here's a simple approach:
posts.each_with_object(Hash.new { |h, k| h[k] = [] }) do |post, hash|
  days_old = (Date.today - post.created_at.to_date).to_i
  case days_old
  when 0..39
    hash[0] << post
  when 40..59
    hash[40] << post
  when 60..89
    hash[60] << post
  when 90..Float::INFINITY # or 90.. in the newest Ruby versions
    hash[90] << post
  end
end
This iterates through the posts, along with a hash which has a default value of an empty array.
Then, we simply check how many days ago a post was created and add it to relevant key of the hash.
This hash is then returned when all posts have been processed.
You can use whatever you want for the keys (e.g. hash["< 40"]), though I've used your partitions for illustrative purposes.
The result will be something akin to the following:
{ 0 => [post_1, post_3, etc],
  40 => [etc],
  60 => [etc],
  90 => [etc] }
Hope this helps - let me know if you've got any questions.
Edit: it's a little trickier if your PARTITIONS are coming from an external source, though the following would work:
# transform the PARTITIONS into an array of ranges
ranges = PARTITIONS.map.with_index do |p, i|
  # the last range runs from the final partition to infinity
  # (next, not return: return is not valid for leaving a block early here)
  next p..Float::INFINITY if i + 1 == PARTITIONS.length
  # every other range runs from this partition to the next partition minus 1
  p..(PARTITIONS[i + 1] - 1)
end
# loop through the posts with a hash with arrays as the default value
posts.each_with_object(Hash.new { |h, k| h[k] = [] }) do |post, hash|
  days_old = (Date.today - post.created_at.to_date).to_i
  # find the range that contains the post's age and add the post under that key
  range = ranges.find { |r| r.include?(days_old) }
  hash[range] << post if range
end
A final edit:
It's a bit silly to use each_with_object when group_by handles this perfectly. Example below:
posts.group_by do |post|
  days_old = (Date.today - post.created_at.to_date).to_i
  case days_old
  when 0..39
    0
  when 40..59
    40
  when 60..89
    60
  when 90..Float::INFINITY # or 90.. in the newest Ruby versions
    90
  end
end
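Since the partitions come from an external source, the hard-coded case statement can be replaced by a lookup against the array itself. The following is a minimal sketch, not tied to ActiveRecord: Post is a stand-in Struct, and group_by_partition and its today: keyword are hypothetical names introduced here so the example runs on its own.

```ruby
require "date"

# Stand-in for the ActiveRecord model; only created_at is needed.
Post = Struct.new(:title, :created_at)

# Groups posts under the largest partition boundary their age has reached.
def group_by_partition(posts, partitions, today: Date.today)
  bounds = partitions.sort.reverse # check the largest boundary first
  posts.group_by do |post|
    days_old = (today - post.created_at.to_date).to_i
    bounds.find { |lower| days_old >= lower }
  end
end

today = Date.new(2024, 1, 1)
posts = [
  Post.new("a", today - 5),
  Post.new("b", today - 40),
  Post.new("c", today - 75),
  Post.new("d", today - 200)
]

grouped = group_by_partition(posts, [0, 40, 60, 90], today: today)
# grouped.keys => [0, 40, 60, 90]
```

Because this works on already-loaded records, it should not trigger a further query on a relation loaded with includes. A post dated in the future has a negative age and ends up under the nil key.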
Assumptions:
This partitioning is for display purposes.
The attribute you want to group by is days.
You want the result as a hash:
{ 0 => [<Post1>], 40 => [<Post12>], 60 => [<Post41>], 90 => [<Post101>] }
Add these methods to your model:
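# post.rb
def self.age_partitioned
  # all respects the current scope, so this works at the end of a where/includes chain
  all.group_by(&:age_partition)
end
def age_partition
  # largest partition the age has reached; replace days by the correct attribute name
  [90, 60, 40, 0].find { |partition| days >= partition }
end
# Now to use it
Post.where(filters).includes(:all_what_you_want).age_partitioned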
As per the description given in the post, something like the following could help you group the data:
result_array_0_40 = []; result_array_40_60 = []; result_array_60_90 = []; result_array_90 = []
result_json = {}
Now we need to iterate over the values and manually group them into key-value pairs. The ranges exclude their upper bound so that a boundary value such as 40 lands in exactly one group:
PARTITION.each do |x|
  result_array_0_40.push(x) if (0...40).include?(x)
  result_array_40_60.push(x) if (40...60).include?(x)
  result_array_60_90.push(x) if (60...90).include?(x)
  result_array_90.push(x) if x >= 90
end
result_json["0..40"] = result_array_0_40
result_json["40..60"] = result_array_40_60
result_json["60..90"] = result_array_60_90
result_json["90+"] = result_array_90
Hope it helps!
Say I had a record in my database like
+----+-----------+----------+
| id | firstname | lastname |
+----+-----------+----------+
| 1 | 'Bill' | nil |
+----+-----------+----------+
(note last name is nil)
Is there any way I can retrieve the above record using the following hash structure as search parameters:
vals = {firstname: "Bill", lastname: "test"}
Table.where(vals)
(ie: find the closest match, ignoring the nil column value in the table)
(I'm thinking of checking each key in the hash individually and stopping when a match is found, but I'm wondering if there is a more efficient way, especially for larger tables.)
You could write a custom search.
def self.optional_where params
  query_params = params.keys.map do |k|
    "(#{k} = ? OR #{k} IS NULL)"
  end.join(" AND ")
  where(query_params, *params.values)
end
Then you would use it like
Table.optional_where(vals)
This will produce the following query:
SELECT "tables".* FROM "tables" WHERE ((firstname = 'Bill' OR firstname IS NULL) AND (lastname = 'test' OR lastname IS NULL))
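Note that the column names are interpolated directly into the SQL string, so the keys of the hash should not come from user input unchecked. Here is a small, database-free sketch of the same fragment building with a column whitelist; optional_where_args and ALLOWED_COLUMNS are hypothetical names, and the result is meant to be splatted into where.

```ruby
# Hypothetical whitelist of searchable columns.
ALLOWED_COLUMNS = %w[firstname lastname].freeze

# Returns [sql_fragment, *bind_values]; keys outside the whitelist are dropped.
def optional_where_args(params, allowed: ALLOWED_COLUMNS)
  safe = params.select { |column, _value| allowed.include?(column.to_s) }
  fragment = safe.keys.map { |column| "(#{column} = ? OR #{column} IS NULL)" }.join(" AND ")
  [fragment, *safe.values]
end

args = optional_where_args({ firstname: "Bill", lastname: "test", "1=1; --": "x" })
# args => ["(firstname = ? OR firstname IS NULL) AND (lastname = ? OR lastname IS NULL)",
#          "Bill", "test"]

# In the model this could then be used as: where(*args)
```

The values still go through ? placeholders, so only the column names need the extra check.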
Let's make a custom search like this:
scope :custom_search, -> (params) {
  params.each do |k, v|
    # allow nil alongside the requested value(s)
    params[k] = if v.is_a? Array
      (v + [nil]).uniq
    else
      [v, nil]
    end
  end
  where(params)
}
Then we use it like:
search_params = {firstname: "Bill", lastname: "test"}
Table.custom_search(search_params)
The generated SQL will be roughly:
SELECT "tables".* FROM "tables" WHERE ("tables"."firstname" = 'Bill' OR "tables"."firstname" IS NULL) AND ("tables"."lastname" = 'test' OR "tables"."lastname" IS NULL)
This means you don't care if one or more fields are nil.
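The parameter rewriting can also be done without mutating the caller's hash. This is a sketch, assuming Ruby 2.4+ for Hash#transform_values; with_nil_alternatives is a hypothetical name.

```ruby
# Hypothetical helper: returns a new hash where every value also allows nil.
def with_nil_alternatives(params)
  params.transform_values { |value| (Array(value) + [nil]).uniq }
end

search_params = { firstname: "Bill", lastname: ["test", "other"] }
conditions = with_nil_alternatives(search_params)
# conditions => {:firstname=>["Bill", nil], :lastname=>["test", "other", nil]}

# In the model this could then be used as: where(conditions)
```

search_params is left unchanged, so it can be reused for other queries.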
I just wrote a method that I'm pretty sure is terribly written. I can't figure out if there is a better way to write this in ruby. It's just a simple loop that is counting stuff.
Of course, I could use a select or something like that, but that would require looping twice over my array. Is there a way to increment several variables in one loop without declaring them before the loop? Something like a multiple select, I don't know. It's even worse when I have more counters.
Thank you!
failed_tests = 0
passed_tests = 0
tests.each do |test|
  case test.status
  when :failed
    failed_tests += 1
  when :passed
    passed_tests += 1
  end
end
You could do something clever like this:
tests.each_with_object(failed: 0, passed: 0) do |test, memo|
memo[test.status] += 1
end
# => { failed: 1, passed: 10 }
You can use the #reduce method:
failed, passed = tests.reduce([0, 0]) do |(failed, passed), test|
  case test.status
  when :failed
    [failed + 1, passed]
  when :passed
    [failed, passed + 1]
  else
    [failed, passed]
  end
end
Or with a Hash with default value, this will work with any statuses:
tests.reduce(Hash.new(0)) do |counter, test|
counter[test.status] += 1
counter
end
Or even enhancing this with @fivedigit's idea:
tests.each_with_object(Hash.new(0)) do |test, counter|
counter[test.status] += 1
end
Assuming Rails 4 (using 4.0.x here), I would suggest:
tests.group(:status).count
# -> {'passed' => 2, 'failed' => 34, 'anyotherstatus' => 12}
This will group all records by any possible :status value, and count each individual occurrence.
Edit: adding a Rails-free approach
Hash[tests.group_by(&:status).map{|k,v| [k,v.size]}]
Group by each element's status value.
Map the grouping to an array of [value, counter] pairs.
Turn the array of pairs into the key-value entries of a Hash, so each count is accessible by its status, e.g. result[:passed].
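The three steps above can be sketched with the intermediate values spelled out. TestResult is a stand-in Struct and count_by_status a hypothetical name, both introduced only to make the example runnable.

```ruby
# Stand-in for whatever object carries a status.
TestResult = Struct.new(:status)

def count_by_status(tests)
  grouped = tests.group_by(&:status)                           # {status => [tests with that status]}
  pairs   = grouped.map { |status, items| [status, items.size] } # [[status, count], ...]
  pairs.to_h                                                   # {status => count}
end

tests  = [:passed, :failed, :passed, :skipped, :passed].map { |s| TestResult.new(s) }
result = count_by_status(tests)
# result => {:passed=>3, :failed=>1, :skipped=>1}
```

Array#to_h (Ruby 2.1+) does the same job as wrapping the pairs in Hash[...].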
hash = tests.reduce(Hash.new(0)) { |hash,element| hash[element.status] += 1; hash }
This will return a hash with the count of the elements.
Example:
class Test
  attr_reader :status
  def initialize
    @status = ['pass', 'failed'].sample
  end
end
array = []
5.times { array.push Test.new }
hash = array.reduce(Hash.new(0)) { |hash,element| hash[element.status] += 1; hash }
=> {"failed"=>3, "pass"=>2}
res_array = tests.map{|test| test.status}
failed_tests = res_array.count :failed
passed_tests = res_array.count :passed
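On Ruby 2.7 or newer, Enumerable#tally does the counting in a single pass and needs no counters declared up front. A minimal sketch, with TestResult as a stand-in Struct and status_counts as a hypothetical name:

```ruby
# Stand-in for whatever object carries a status.
TestResult = Struct.new(:status)

# Counts how many tests have each status (requires Ruby 2.7+ for tally).
def status_counts(tests)
  tests.map(&:status).tally
end

tests  = [:failed, :passed, :passed, :failed, :passed].map { |s| TestResult.new(s) }
counts = status_counts(tests)
# counts => {:failed=>2, :passed=>3}

# fetch with a default covers statuses that never occurred.
skipped = counts.fetch(:skipped, 0)
# skipped => 0
```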
What is the best way to achieve the following? I have the following array of actions under ABC:
ABC:-
ABC:Actions,
ABC:Actions:ADD-DATA,
ABC:Actions:TRANSFER-DATA,
ABC:Actions:EXPORT,
ABC:Actions:PRINT,
ABC:Detail,
ABC:Detail:OVERVIEW,
ABC:Detail:PRODUCT-DETAIL,
ABC:Detail:EVENT-LOG,
ABC:Detail:ORDERS
I want to format this as:
ABC =>{Actions=> [ADD-DATA,TRANSFER-DATA,EXPORT,PRINT], Detail => [Overview, Product-detail, event-log,orders]}
There's probably a ton of ways to do it but here's one:
a = ["ABC:Actions",
"ABC:Actions:ADD-DATA",
"ABC:Actions:TRANSFER-DATA",
"ABC:Actions:EXPORT",
"ABC:Actions:PRINT",
"ABC:Detail",
"ABC:Detail:OVERVIEW",
"ABC:Detail:PRODUCT-DETAIL",
"ABC:Detail:EVENT-LOG",
"ABC:Detail:ORDERS"]
a.map { |action| action.split(":") }.inject({}) do |m, s|
  m[s.at(0)] ||= {}
  m[s.at(0)][s.at(1)] ||= [] if s.at(1)
  m[s.at(0)][s.at(1)] << s.at(2) if s.at(2)
  m
end
The map call returns an array in which each string of the original array has been split into an array of the elements that were separated by ":". For example [["ABC","Actions","ADD-DATA"] ... ]
The inject call then builds up a hash by going through each of these "split" arrays. It creates a mapping for the first element, if one doesn't already exist, to an empty hash, e.g. "ABC" => {}. Then it creates a mapping in that hash for the second element, if one doesn't already exist, to an empty array, e.g. "ABC" => { "Detail" => [] }. Then it adds the third element to that array to give something like "ABC" => { "Detail" => ["OVERVIEW"] }. Then it goes onto the next "split" array and adds that to the hash too in the same way.
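The same steps can be written with the three parts of each string given names, which may make the walk-through above easier to follow. This is a sketch of an equivalent approach using each_with_object; nest_actions is a hypothetical name.

```ruby
# Hypothetical helper: builds {root => {group => [items]}} from "root:group:item" strings.
def nest_actions(paths)
  paths.each_with_object({}) do |path, tree|
    root, group, item = path.split(":")
    tree[root] ||= {}            # first element maps to a hash
    next unless group
    tree[root][group] ||= []     # second element maps to an array
    tree[root][group] << item if item # third element is appended to that array
  end
end

sample = ["ABC:Actions", "ABC:Actions:ADD-DATA", "ABC:Actions:EXPORT",
          "ABC:Detail", "ABC:Detail:OVERVIEW"]

nested = nest_actions(sample)
# nested => {"ABC"=>{"Actions"=>["ADD-DATA", "EXPORT"], "Detail"=>["OVERVIEW"]}}
```

With each_with_object the accumulator does not have to be returned from the block, so the trailing m of the inject version is not needed.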
I would do this as below:
a = ["ABC:Actions",
"ABC:Actions:ADD-DATA",
"ABC:Actions:TRANSFER-DATA",
"ABC:Actions:EXPORT",
"ABC:Actions:PRINT",
"ABC:Detail",
"ABC:Detail:OVERVIEW",
"ABC:Detail:PRODUCT-DETAIL",
"ABC:Detail:EVENT-LOG",
"ABC:Detail:ORDERS"]
m = a.map{|i| i.split(":")[1..-1]}
# => [["Actions"],
# ["Actions", "ADD-DATA"],
# ["Actions", "TRANSFER-DATA"],
# ["Actions", "EXPORT"],
# ["Actions", "PRINT"],
# ["Detail"],
# ["Detail", "OVERVIEW"],
# ["Detail", "PRODUCT-DETAIL"],
# ["Detail", "EVENT-LOG"],
# ["Detail", "ORDERS"]]
m.each_with_object(Hash.new([])){|(i,j),ob| ob[i] = ob[i] + [j] unless j.nil? }
# => {"Actions"=>["ADD-DATA", "TRANSFER-DATA", "EXPORT", "PRINT"],
# "Detail"=>["OVERVIEW", "PRODUCT-DETAIL", "EVENT-LOG", "ORDERS"]}
It was just interesting to do it with group_by :)
a = ['ABC:Actions',
'ABC:Actions:ADD-DATA',
'ABC:Actions:TRANSFER-DATA',
'ABC:Actions:EXPORT',
'ABC:Actions:PRINT',
'ABC:Detail',
'ABC:Detail:OVERVIEW',
'ABC:Detail:PRODUCT-DETAIL',
'ABC:Detail:EVENT-LOG',
'ABC:Detail:ORDERS']
result = a.map { |action| action.split(":") }.group_by(&:shift)
result.each do |k1,v1|
  result[k1] = v1.group_by(&:shift)
  result[k1].each { |k2,v2| result[k1][k2] = v2.flatten }
end
p result
{"ABC"=>{"Actions"=>["ADD-DATA", "TRANSFER-DATA", "EXPORT", "PRINT"], "Detail"=>["OVERVIEW", "PRODUCT-DETAIL", "EVENT-LOG", "ORDERS"]}}
I would like to analyse data in my database to find out how many times certain words appear.
Ideally I would like a list of the top 20 words used in a particular column.
What would be the easiest way of going about this?
Create an autovivified hash and then loop through the rows populating the hash and incrementing the value each time you get the same key (word). Then sort the hash by value.
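A minimal sketch of that idea in plain Ruby follows. It uses Hash.new(0), a hash with a default value of 0, which is enough for counting (a block-based default would be the fully autovivifying variant). The name top_words, the lowercasing and the \w+ word pattern are assumptions for illustration; the texts array stands in for the values of one database column.

```ruby
# Hypothetical helper: returns the `limit` most frequent words as [word, count] pairs.
def top_words(texts, limit)
  counts = Hash.new(0) # missing words start at 0
  texts.each do |text|
    # nil column values become "", which yields no words
    text.to_s.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  end
  # Highest count first; ties are broken alphabetically for a stable result.
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

column_values = ["how now brown cow", "How now", nil, "now"]
top = top_words(column_values, 2)
# top => [["now", 3], ["how", 2]]
```

For the top 20 words of a column, the call would be top_words(values, 20) with values loaded from the database.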
A word counter...
I wasn't sure if you were asking how to get rails to work on this or how to count words, but I went ahead and did a column-oriented ruby wordcounter anyway.
(BTW, at first I did try the autovivified hash, what a cool trick.)
# col: a column name or number
# strings: a String, Array of Strings, Array of Array of Strings, etc.
def count(col, *strings)
  (@h ||= {})[col = col.to_s] ||= {}
  [*strings].flatten.each { |s|
    s.split.each { |s|
      @h[col][s] ||= 0
      @h[col][s] += 1
    }
  }
end
def formatOneCol a
  limit = 2
  a.sort { |e1,e2| e2[1]<=>e1[1] }.each { |results|
    printf("%9d %s\n", results[1], results[0])
    return unless (limit -= 1) > 0
  }
end
def formatAllCols
  @h.sort.each { |a|
    printf("\n%9s\n", "Col " + a[0])
    formatOneCol a[1]
  }
end
count(1,"how now")
count(1,["how", "now", "brown"])
count(1,[["how", "now"], ["brown", "cow"]])
count(2,["you see", "see you",["how", "now"], ["brown", "cow"]])
count(2,["see", ["see", ["see"]]])
count("A_Name Instead","how now alpha alpha alpha")
formatAllCols
$ ruby count.rb
Col 1
3 how
3 now
Col 2
5 see
2 you
Col A_Name Instead
3 alpha
1 how
$
DigitalRoss's answer looks too verbose to me. Also, since you tagged ruby-on-rails and said you use a DB, I'm assuming you need an ActiveRecord model, so here is a full solution.
In your model:
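def self.top_strs(column_symbol, top_num)
  h = Hash.new(0)
  # pluck loads only the one column; on Rails 2 the equivalent was find(:all, :select => column_symbol)
  pluck(column_symbol).each do |text|
    # to_s guards against NULL column values
    text.to_s.split.each do |word|
      h[word] += 1
    end
  end
  # highest counts first, limited to exactly top_num entries
  h.sort_by { |_word, count| -count }.first(top_num)
end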
For example, with model Comment and column body:
Comment.top_strs(:body, 20)