How to test dependence between X and Y in annual data? - machine-learning

This is a sample from my original data set.
I need to test whether y is significantly dependent on x (y ~ x), and I am thinking about how to model that properly.
The problem is that this is annual data: each combination of destination and model cannot be duplicated within a year, so [Milan; XG] cannot occur twice in 2018. But there is no guarantee that each combination of model and destination occurs in every year; for instance, [Milan; XG] can be present in 2018 but not in 2020.
Could you tell me which model would fit this problem, and how?
structure(list(destination = c("Milan", "MIlan", "MIlan", "London",
"London", "Paris", "Paris", "Paris", "Paris", "Rome", "Rome",
"Milan", "MIlan", "Brasil", "Brasil", "Brasil", "Brasil", "Paris",
"Rome", "Rome", "Milan", "London", "London", "London", "NY",
"NY"), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2020L, 2020L,
2020L), model = c("XG", "A", "B", "XG", "A", "XG", "B", "C",
"D", "A", "XG", "A", "XG", "A", "D", "XG", "B", "XG", "B", "A",
"B", "A", "XG", "C", "A", "XG"), y = c(34, 134, 31, 22, 345,
435, 45, 453, 55, 348, 89, 785, 980, 889, 567, 547, 33, 2354,
34, 22, 452, 4524, 435, 345, 345, 534), x = c(17.6695512, 35.07829364,
16.87201322, 14.21338109, 56.28538067, 63.20198065, 20.3278907,
64.4963535, 22.47332875, 56.52957002, 28.58782161, 84.90258016,
94.86348995, 90.35182737, 72.15685394, 70.8728216, 17.4077656,
147.0243572, 17.6695512, 14.21338109, 64.42512614, 203.8202633,
63.20198065, 56.28538067, 56.28538067, 70.02557581)), class = "data.frame", row.names = c(NA,
-26L))
output:
destination year model y x
1 Milan 2018 XG 34 17.66955
2 MIlan 2018 A 134 35.07829
3 MIlan 2018 B 31 16.87201
4 London 2018 XG 22 14.21338
5 London 2018 A 345 56.28538
6 Paris 2018 XG 435 63.20198
7 Paris 2018 B 45 20.32789
8 Paris 2018 C 453 64.49635
9 Paris 2018 D 55 22.47333
10 Rome 2018 A 348 56.52957
11 Rome 2018 XG 89 28.58782
12 Milan 2019 A 785 84.90258
13 MIlan 2019 XG 980 94.86349
14 Brasil 2019 A 889 90.35183
15 Brasil 2019 D 567 72.15685
16 Brasil 2019 XG 547 70.87282
17 Brasil 2019 B 33 17.40777
18 Paris 2019 XG 2354 147.02436
19 Rome 2019 B 34 17.66955
20 Rome 2019 A 22 14.21338
21 Milan 2020 B 452 64.42513
22 London 2020 A 4524 203.82026
23 London 2020 XG 435 63.20198
24 London 2020 C 345 56.28538
25 NY 2020 A 345 56.28538
26 NY 2020 XG 534 70.02558


Merging & Summing nested hashes in Ruby

What I'm trying to do is very similar to the question outlined in this post, but I have one additional problem in that the nested values of my hash need to have their dates grouped and the values of each date summed. The goal is to create a Multiple Series Graph in Chartkick.
The query, grabbing a month range for example:
arr = LineItem.includes(:order, :product)
              .where(orders: { order_date: Date.parse("Jan 1 2020")..Date.parse("Feb 1 2020") })
              .map do |line_item|
                { name: line_item.product.model_number,
                  data: { line_item.order.order_date.strftime('%a %b %d, %Y') => line_item.order_quantity } }
              end
The output hash:
=> [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}},
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}},
{:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}},
...
My expected hash, which should group by name, then group by date and sum the values:
=> [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>1}},
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>13, "Wed Jan 22, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5, "Thu Jan 23, 2020"=>4}},
...
However, after running this code:
arr.group_by {|h| h[:name]}.map { |k,v| { name: k, data: v.map {|h| h[:data]}.reduce(&:merge)}}
this is the output:
=> [
{:name=>"RP-AP02", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>1, "Wed Jan 22, 2020"=>1}},
{:name=>"RP-AP04", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>3}},
{:name=>"RP-AP01", :data=>{"Tue Jan 21, 2020"=>5, "Thu Jan 23, 2020"=>3}},
...
The output generated does group the name and data, but does not sum the quantities. I'm grouping it by day here as an example, but would also like the option of grouping it by week & month. In the past 8 hours of monkeying with this, I've also tried using Groupdate to no avail.
There are many ways to obtain the desired return value. Here are two. First I define arr.
arr = [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}},
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}},
{:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}}]
The first calculation employs the methods Enumerable#group_by and Hash#transform_values.
arr.group_by { |h| h[:name] }
   .map do |k, v|
     { name: k,
       data: v.group_by { |h| h[:data].keys.first }
              .transform_values { |a| a.sum { |h| h[:data].values.first } } }
   end
#=> [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7,
"Tue Jan 21, 2020"=>4,
"Wed Jan 22, 2020"=>1}},
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2,
"Tue Jan 21, 2020"=>13,
"Wed Jan 22, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5,
"Thu Jan 23, 2020"=>4}}]
Note:
arr.group_by { |h| h[:name] }
#=> {"FR-GP02"=>[{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}],
"FR-GP04"=>[{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}},
{:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}}],
"FR-GP01"=>[{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}},
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}}]}
map's block variables initially equal the following:
k = "FR-GP02"
v = [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
{:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}]
Then the value of :data in the first hash being created is computed as follows:
f = v.group_by do |h|
      h[:data].keys.first
    end
#=> {"Mon Jan 20, 2020"=>[
# {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
# {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}}],
# "Tue Jan 21, 2020"=>[
# {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
# {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}}],
# "Wed Jan 22, 2020"=>[
# {:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}]}
and lastly,
f.transform_values { |a| a.sum { |h| h[:data].values.first }}
#=> {"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>1}
Here is a second way to obtain the desired result.
arr.each_with_object(Hash.new(0)) do |g, h|
  d, n = g[:data].flatten
  h[[g[:name], d]] += n
end.group_by { |(name, _), _| name }
   .map do |name, arr|
     { name: name, data: arr.each_with_object({}) { |((_, d), t), h| h[d] = t } }
   end
#=> (as above)
The steps are as follows.
s = arr.each_with_object(Hash.new(0)) do |g, h|
  d, n = g[:data].flatten
  h[[g[:name], d]] += n
end
#=> {["FR-GP02", "Mon Jan 20, 2020"]=>7,
# ["FR-GP02", "Tue Jan 21, 2020"]=>4,
# ["FR-GP02", "Wed Jan 22, 2020"]=>1,
# ["FR-GP04", "Mon Jan 20, 2020"]=>2,
# ["FR-GP04", "Tue Jan 21, 2020"]=>13,
# ["FR-GP04", "Wed Jan 22, 2020"]=>3,
# ["FR-GP01", "Tue Jan 21, 2020"]=>5,
# ["FR-GP01", "Thu Jan 23, 2020"]=>4}
This uses the form of Hash::new that takes an argument, called the default value (usually, as here, zero), and no block. If a hash is defined
h = Hash.new(0)
and does not have a key k (possibly after key-value pairs have been added), h[k] will return the default value. This means that in the expression
h[[g[:name], d]] += n
if h does not have the key [g[:name], d], the value for that key is initialized to zero before n is added. If h does have that key, its current value is increased by n.
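A minimal illustration of the same default value at work, counting occurrences of words:

```ruby
counts = Hash.new(0)                         # missing keys read as 0
%w[a b a c a b].each { |w| counts[w] += 1 }  # no nil check needed on first sight of a key
counts #=> {"a"=>3, "b"=>2, "c"=>1}
```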
Continuing the calculation,
t = s.group_by { |(name,_),_| name }
#=> {"FR-GP02"=>[[["FR-GP02", "Mon Jan 20, 2020"], 7],
# [["FR-GP02", "Tue Jan 21, 2020"], 4],
# [["FR-GP02", "Wed Jan 22, 2020"], 1]],
# "FR-GP04"=>[[["FR-GP04", "Mon Jan 20, 2020"], 2],
# [["FR-GP04", "Tue Jan 21, 2020"], 13],
# [["FR-GP04", "Wed Jan 22, 2020"], 3]],
# "FR-GP01"=>[[["FR-GP01", "Tue Jan 21, 2020"], 5],
# [["FR-GP01", "Thu Jan 23, 2020"], 4]]}
Lastly,
t.map do |name, arr|
  { name: name, data: arr.each_with_object({}) { |((_, d), t), h| h[d] = t } }
end
#=> (as above)
Here and earlier I've made good use of Ruby's powerful array decomposition technique.
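A third, more compact option (my own sketch, not part of the answer above): the original one-liner failed because a bare reduce(&:merge) lets later hashes overwrite earlier ones, but Hash#merge also accepts a block that decides what to do when keys collide, so the duplicate dates can be summed right there:

```ruby
# Small sample in the shape of the question's arr
arr = [
  { name: "FR-GP02", data: { "Mon Jan 20, 2020" => 2 } },
  { name: "FR-GP02", data: { "Mon Jan 20, 2020" => 5 } },
  { name: "FR-GP02", data: { "Tue Jan 21, 2020" => 1 } },
  { name: "FR-GP04", data: { "Tue Jan 21, 2020" => 4 } }
]

result = arr.group_by { |h| h[:name] }
            .map do |name, hashes|
              { name: name,
                data: hashes.map { |h| h[:data] }
                            .reduce { |acc, d| acc.merge(d) { |_date, a, b| a + b } } }
            end
#=> [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>1}},
#    {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}}]
```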

Is there an easy way to extract tensor by viewing the elements as indices?

The input tensor is shown below:
input =
[[ 0 0 1 2]
[ 0 3 4 5]
[ 0 6 7 8]
[ 1 9 10 11]
[ 1 12 13 14]
[ 1 15 16 17]
[ 1 18 19 20]
[ 1 21 22 23]
[ 1 24 25 26]
[ 1 27 28 29]
[ 1 30 31 32]
[ 2 33 34 35]
[ 2 36 37 38]
[ 2 39 40 41]]
I want to extract the rows block-wise according to the first element of each row (0, 1, 2). Can anyone help me with this? Thanks!
An off-the-shelf function would be great.

Rails - Convert string to time

I'm trying to parse these strings into times: "3 on Jun 20", "Jun 20 at 3", "Jun 20 at 300".
Using DateTime.parse doesn't turn "3" or "300" into "3:00 AM"; it just returns Wed, 20 Jun 2018 00:00:00 +0000.
Does anyone have an idea how to parse these integers into times?
There's Chronic, a "natural language date/time parser":
require 'chronic'
Chronic.parse('3 on Jun 20') #=> 2018-06-20 15:00:00 +0200
Chronic.parse('Jun 20 at 3') #=> 2018-06-20 15:00:00 +0200
Chronic.parse('Jun 20 at 300') #=> 2018-06-20 15:00:00 +0200
Just out of curiosity, trying to reinvent chronic in 4 LOCs :)
["3 on Jun 20", "Jun 20 at 3", "Jun 20 at 300"].map do |dt|
  d, t = dt.split(/\s+at\s+/i)
  t, d = dt.split(/\s+on\s+/i) unless t
  next [dt] unless t && d   # `next`, not `return`: skip unparseable items without leaving the block
  t = t[0..-3] + (t[-2..-1] ? ":" << t[-2..-1] : t[/.{,2}\z/] + ":00")
  [d, t]                    # [["Jun 20", "3:00"], ["Jun 20", "3:00"], ["Jun 20", "3:00"]]
end.map { |dt| DateTime.parse dt.join ' ' }
Use strptime to parse a custom format:
DateTime.strptime("3 on Jun 20", "%H on %b %d")
https://ruby-doc.org/stdlib-2.5.0/libdoc/date/rdoc/DateTime.html#method-c-strptime
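If you'd rather avoid a gem, here is a rough sketch in the same spirit (the parse_loose name and the regexes are my own, and the year is an assumption since the inputs omit it): normalize all three shapes into one canonical form, then hand it to strptime.

```ruby
require 'date'

# Hypothetical helper: rewrite the three input shapes into a single
# "%b %d at %H:%M" form before parsing with strptime.
def parse_loose(str, year: 2018)
  s = str.sub(/\A(\d{1,4})\s+on\s+(.*)\z/, '\2 at \1')  # "3 on Jun 20" -> "Jun 20 at 3"
  s = s.sub(/at\s+(\d)(\d{2})\z/, 'at \1:\2')           # "... at 300"  -> "... at 3:00"
  s = s.sub(/at\s+(\d{1,2})\z/, 'at \1:00')             # "... at 3"    -> "... at 3:00"
  DateTime.strptime("#{s} #{year}", '%b %d at %H:%M %Y')
end

parse_loose('3 on Jun 20')   #=> 2018-06-20T03:00:00+00:00
parse_loose('Jun 20 at 3')   #=> 2018-06-20T03:00:00+00:00
parse_loose('Jun 20 at 300') #=> 2018-06-20T03:00:00+00:00
```

Unlike Chronic, this only covers the three shapes shown; any new format means another normalization rule.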

Rails: Calculate days between date range given in string format

How can I calculate the number of days in a given date range in a Rails controller?
Example:
daterange = "August 24 - Sept 11, 2016"
Desired output: 19
Also keep in mind that the date range string may change to something like:
"December 24 2016 - Jan 11, 2017"
Try this:
(daterange.split("-")[1].to_date - daterange.split("-")[0].to_date).to_i + 1
# daterange = "August 24 - Sept 11, 2016"
# => 19
# daterange = "December 24 2016 - Jan 11, 2017"
# => 19
This should work:
def day_difference(daterange)
  daterange.split('-').map(&:to_date).inject { |r, e| (e - r).to_i + 1 }
end
With your given examples:
day_difference("August 24 - Sept 11, 2016")
# => 19
day_difference("December 24 2016 - Jan 11, 2017")
# => 19
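Outside Rails (no ActiveSupport, hence no String#to_date), the same idea works with Date.parse; a sketch, with the hypothetical name days_in_range, and with the caveat that a half lacking a year (like "August 24") falls back to the current year:

```ruby
require 'date'

# Split the range on "-", parse each half, count days inclusively.
def days_in_range(daterange)
  first, last = daterange.split('-').map { |s| Date.parse(s) }
  (last - first).to_i + 1
end

days_in_range("December 24 2016 - Jan 11, 2017") #=> 19
```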

Remove duplicates from Ruby arrays and sum

I have the following array of [date, value] pairs:
array = [[12 Mar 2015, 0], [12 Mar 2015, 5], [13 Mar 2015, 0], [14 Mar 2015, 49], [15 Mar 2015, 51], [15 Mar 2015, 10], [16 Mar 2015, 110], [17 Mar 2015, 0], [18 Mar 2015, 31], [19 Mar 2015, 47], [20 Mar 2015, 0], [21 Mar 2015, 0], [22 Mar 2015, 138], [22 Mar 2015, 10], [23 Mar 2015, 0]]
You can see that there are entries with duplicate dates. How would one sum the values while grouping by date? This is what I am looking for:
array = [[12 Mar 2015, 5], [13 Mar 2015, 0], [14 Mar 2015, 49], [15 Mar 2015, 61], [16 Mar 2015, 110], [17 Mar 2015, 0], [18 Mar 2015, 31], [19 Mar 2015, 47], [20 Mar 2015, 0], [21 Mar 2015, 0], [22 Mar 2015, 148], [23 Mar 2015, 0]]
Your array of dates should look like this, with the dates as strings:
array = [["12 Mar 2015", 0], ["12 Mar 2015", 5], ["13 Mar 2015", 0], ["14 Mar 2015", 49], ["15 Mar 2015", 51], ["15 Mar 2015", 10], ["16 Mar 2015", 110], ["17 Mar 2015", 0], ["18 Mar 2015", 31], ["19 Mar 2015", 47], ["20 Mar 2015", 0], ["21 Mar 2015", 0], ["22 Mar 2015", 138], ["22 Mar 2015", 10], ["23 Mar 2015", 0]]
grouped = array.inject(Hash.new(0)) do |result, itm|
  result[itm.first] += itm.last
  result
end.to_a
UPDATED
Many thanks to @nathanvda; inject({}) do |hash, [time, index]| was my mistake. In any case, his solution is clearer.
array.inject({}) do |hash, item|
  time, index = item
  hash[time] = hash.fetch(time, 0) + index
  hash
end.to_a
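For what it's worth, a sketch of the same grouping with no explicit accumulator, using group_by plus Enumerable#sum (Ruby 2.4+); insertion order of the dates is preserved:

```ruby
array = [["12 Mar 2015", 0], ["12 Mar 2015", 5], ["13 Mar 2015", 0],
         ["22 Mar 2015", 138], ["22 Mar 2015", 10]]

# Group the pairs by date, then sum the values within each group.
result = array.group_by(&:first)
              .map { |date, pairs| [date, pairs.sum(&:last)] }
#=> [["12 Mar 2015", 5], ["13 Mar 2015", 0], ["22 Mar 2015", 148]]
```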
