How to build a table where each cell is built using the 2 dimensions - ruby-on-rails

Desired output
Each User has child Plan objects, each of which has child PlanDate objects. PlanDate has an attribute ddate which is just a date. Plan has an attribute mtype that can be either M, V, or C (haha MVC, subconscious techy much?). For a given week (let's just say the current week), I'd like to print out a table that looks like this:
--------------------------------------------------------------
| User | Mon | Tue | Wed | Thu | Fri | Other attributes of User
--------------------------------------------------------------
| Eric | M   |     | M   |     | M   | ...
--------------------------------------------------------------
| Erin | V   | V   | V   | V   | V   | ...
--------------------------------------------------------------
| Jace |     | C   | C   |     |     | ...
--------------------------------------------------------------
| Kris | C   |     |     |     |     | ...
--------------------------------------------------------------
| Tina | V   |     | V   |     | V   | ...
--------------------------------------------------------------
| Lily | M   | M   | M   | M   | M   | ...
--------------------------------------------------------------
The order of the Users in the rows doesn't really matter to me; I may add the Ransack gem to make it orderable, but ignore that for now. A given User may not have PlanDates with a ddate for every day in a given week, and there's certainly no relationship between the PlanDates across Users.
Proposed options
I feel like there are two options:
In the view, print the column headers with a data attribute for the day in question, and print the row headers with a data attribute for the user id in question (first selecting all the users who do have a grandchild PlanDate with a ddate somewhere in the current week). Then, at each intersection, use the two data attributes to query ActiveRecord.
In the model, generate a data hash that can create the table, and then pass that hash to the view via the controller.
Option 1 makes more intuitive sense to me, having been a lifelong Excel user, but it breaks MVC entirely, so I'm trying to go with Option 2, and the challenges below relate to Option 2. That said, if you can make a case for Option 1, go for it! I feel like if you can convince me to do Option 1, I can implement it without the same problems...
Challenges with Option 2
I can build a hash with one dimension as a key, and a hash of the other dimension as an array. For example, if the days of the current week were used as the key:
{
  Mon => [Eric, Erin, Kris, Tina, Lily],
  Tue => [Erin, Jace, Lily],
  Wed => [Eric, Erin, Jace, Kris, Lily],
  Thu => [Erin, Lily],
  Fri => [Eric, Erin, Tina, Lily]
}
But the problem is that once I get there, I'm not sure how to deal with the blanks in the data... If I converted the hash above into a table, I would only know how to make the Users appear as a list under each date, and that wouldn't look like my desired output at all, because there would be no gaps in the data. For example, on Monday there's no Jace, but there needs to be a blank space for Jace so that it's easy for the viewer to look across and see: ah, there's no Jace on Monday, but there is a Jace on Tuesday and Wednesday.

Oh actually I just needed a minute to think logically about this... I just need a nested hash, one with the first dimension, one with the second. So a method like this:
def plan_table
  # 1: get an array of the current week's working days (Mon..Fri)
  week = (Date.today.at_beginning_of_week..(Date.today.at_end_of_week - 2.days)).to_a
  # 2: find the ids of all users that have a PlanDate in the current week
  current_week_users = User.select { |u| u.plans.any? { |p| p.plan_dates.any? { |pd| week.include?(pd.ddate.to_date) } } }.map(&:id)
  # 3: build the hash
  week_table = {}
  week.each do |day|
    week_table[day] = {}
    current_week_users.each do |user_id|
      # For each user in the hash we built already, check whether that user has
      # a PlanDate that 1) falls on this date, 2) has a non-canceled plan, and
      # 3) belongs to this user. The extra nil checks on user_id and plan_id
      # are my ole' paranoia about objects being created without parents.
      potential_pd = PlanDate.select { |pd| pd.ddate.to_date == day && !pd.plan_id.nil? && pd.plan.status != "canceled" && !pd.plan.user_id.nil? && pd.plan.user_id == user_id }
      week_table[day][user_id] = potential_pd.empty? ? "" : potential_pd.first.plan.mtype
    end
  end
  week_table
end
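For what it's worth, the blank-cell handling can be exercised without ActiveRecord at all. The following plain-Ruby sketch is an illustration only: build_grid and the [user, day, mtype] row format are made up for the example, standing in for the PlanDate records.

```ruby
# Build the nested day => { user => mtype } hash from a flat list of rows,
# filling "" for every user/day intersection that has no row.
def build_grid(rows, days, users)
  lookup = rows.each_with_object({}) do |(user, day, mtype), h|
    h[[user, day]] = mtype
  end
  days.each_with_object({}) do |day, table|
    table[day] = users.each_with_object({}) do |user, row|
      row[user] = lookup.fetch([user, day], "")
    end
  end
end

rows = [["Eric", "Mon", "M"], ["Jace", "Tue", "C"]]
grid = build_grid(rows, %w[Mon Tue], %w[Eric Jace])
grid["Mon"]  #=> { "Eric" => "M", "Jace" => "" }
```

The view can then iterate days for the columns and users for the rows, and every cell is a simple hash lookup with the blank already in place.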

Related

Dataflow stream python windowing

I am new to using Dataflow. I have the following logic:
An event is added to Pub/Sub.
Dataflow reads from Pub/Sub and gets the event.
From the event, I look up in MySQL which segments the event is related to, and this step returns a list of relations. These segments are independent of one another.
Each segment can be divided into two MySQL result tables, for email and mobile, and these are independent as well.
Each segment has rules, which can number from 1 to n. I would like to process this step in parallel and collect all the results. I have tried to use windows, but I am not sure how to write the logic so that when I get the combined results from all rules inside one segment, all of them are collected in an end function that writes the final result into MySQL depending on the rule results (boolean).
Here is what I have so far:
testP = beam.Pipeline(options=options)

ReadData = (
    testP
    | 'ReadData' >> beam.io.ReadFromPubSub(subscription=str(options.pubsubsubscriber.get())).with_output_types(bytes)
    | 'Decode' >> beam.Map(lambda x: x.decode('utf-8'))
    | 'GetSegments' >> beam.ParDo(getsegments(options))
)

processEmails = (ReadData
    | 'GetSubscribersWithRulesForEmails' >> beam.ParDo(GetSubscribersWithRules(options, 'email'))
    | 'ProcessSubscribersSegmentsForEmails' >> beam.ParDo(ProcessSubscribersSegments(options, 'email'))
)

processMobiles = (ReadData
    | 'GetSubscribersWithRulesForMobiles' >> beam.ParDo(GetSubscribersWithRules(options, 'mobile'))
    | 'ProcessSubscribersSegmentsForMobiles' >> beam.ParDo(ProcessSubscribersSegments(options, 'mobile'))
)

# For the sake of testing, only the window for email is written
windowThis = (processEmails
    | beam.WindowInto(
        beam.window.FixedWindows(1),
        trigger=beam.transforms.trigger.Repeatedly(
            beam.transforms.trigger.AfterProcessingTime(1 * 10)),
        accumulation_mode=beam.transforms.trigger.AccumulationMode.DISCARDING)
    | beam.CombinePerKey(beam.combiners.ToListCombineFn())
    | beam.ParDo(print_windows)
)
In this case, because all of your elements have the exact same timestamp, I would use their message ID and their timestamp to group them with session windows. It would be something like this:
testP = beam.Pipeline(options=options)

ReadData = (
    testP
    | 'ReadData' >> beam.io.ReadFromPubSub(subscription=str(options.pubsubsubscriber.get())).with_output_types(bytes)
    | 'Decode' >> beam.Map(lambda x: x.decode('utf-8'))
    | 'GetSegments' >> beam.ParDo(getsegments(options))
)

# At this point, ReadData contains (key, value) pairs with a timestamp.
# Now we perform all of the processing.
processEmails = (ReadData | ....)
processMobiles = (ReadData | .....)

# Now we window by sessions with a 1-second gap. This is okay because all of
# the elements for any given key have the exact same timestamp.
windowThis = (processEmails
    | beam.WindowInto(beam.window.Sessions(1))  # Default trigger is fine
    | beam.CombinePerKey(beam.combiners.ToListCombineFn())
    | beam.ParDo(print_windows)
)

How to add hours and minutes that are stored as strings

I'm trying to add hours and minutes that are stored in the database like this:
+----+---------+-------+
| id | user_id | time |
+----+---------+-------+
| 1 | 4 | 03:15 |
| 2 | 4 | 02:22 |
+----+---------+-------+
The time field is a string. How can I add up the hours and minutes to get a result like 05:37?
I tried this
current_user.table.pluck(:time).sum(&:to_f)
but the output is only 5.
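That happens because String#to_f parses only the leading numeric portion of each string, so the minutes after the colon are silently dropped:

```ruby
# to_f stops at the first character that can't be part of a Float,
# so each "HH:MM" string contributes only its hours.
"03:15".to_f                     #=> 3.0
"02:22".to_f                     #=> 2.0
["03:15", "02:22"].sum(&:to_f)   #=> 5.0
```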
If you read the times from your table into an array you obtain something like
arr = ["03:15", "02:22"]
You could then write
arr.sum do |s|
  h, m = s.split(':').map(&:to_i)
  60*h + m
end.divmod(60).join(':')
#=> "5:37"
See Array#sum (introduced in MRI v2.4) and Integer#divmod. To support Ruby versions earlier than 2.4 use Enumerable#reduce in place of Array#sum.
The three steps are as follows.
mins = arr.sum do |s|
  h, m = s.split(':').map(&:to_i)
  60*h + m
end
#=> 337

hm = mins.divmod(60)
#=> [5, 37]

hm.join(':')
#=> "5:37"
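As noted above, on Ruby versions earlier than 2.4 (where Array#sum is unavailable) the same calculation can be written with Enumerable#reduce:

```ruby
arr = ["03:15", "02:22"]

# Accumulate total minutes across all "HH:MM" strings.
total_minutes = arr.reduce(0) do |sum, s|
  h, m = s.split(':').map(&:to_i)
  sum + 60 * h + m
end
#=> 337

total_minutes.divmod(60).join(':')
#=> "5:37"
```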
Please check this sample of summing times in Ruby:
require 'time'
t = Time.parse("3:15")
puts t.strftime("%H:%M")
t2 = Time.parse("02:22")
puts t2.strftime("%H:%M")
t3 = t.to_i + t2.to_i
puts Time.at(t3).utc.strftime("%H:%M")
This is a fast way to sum times below a 24-hour span. For a solution that is correct in every case, please check Cary's code above.
Here is a small Ruby gem, made from Cary's code sample, which extends the Ruby Array class with a sum_strings method. For example, ["12:23","23:30","1:2"].sum_strings(':') will return "36:55".
Gem https://github.com/nezirz/sum_strings/
Sample project with using Gem: https://github.com/nezirz/use_sum_strings

How do I perform a "diff" on two Sources given a key using Apache Beam Python SDK?

I posed the question generically because maybe there is a generic answer. But a specific example is comparing two BigQuery tables with the same schema but potentially different data. I want a diff, i.e. what was added, deleted, and modified, with respect to a composite key, e.g. the first two columns.
Table A
C1 C2 C3
-----------
a  a  1
a  b  1
a  c  1
Table B
C1 C2 C3   # Notes if comparing B to A
-------------------------------------
a  a  1    # No change to the key a + a
a  b  2    # Key a + b changed from 1 to 2
           # Deleted key a + c with value 1
a  d  1    # Added key a + d
I basically want to be able to make/report the comparison notes.
Or from a Beam perspective I may want to Just output up to 4 labeled PCollections: Unchanged, Changed, Added, Deleted. How do I do this and what would the PCollections look like?
What you want to do here, basically, is join two tables and compare the result of that, right? You can look at my answer to this question, to see the two ways in which you can join two tables (Side inputs, or CoGroupByKey).
I'll also code a solution for your problem using CoGroupByKey. I'm writing the code in Python because I'm more familiar with the Python SDK, but you'd implement similar logic in Java:
def make_kv_pair(x):
    """Output the record with the (x[0], x[1]) key added."""
    return ((x[0], x[1]), x)

table_a = (p | 'ReadTableA' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysA' >> beam.Map(make_kv_pair))

table_b = (p | 'ReadTableB' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysB' >> beam.Map(make_kv_pair))

joined_tables = ({'table_a': table_a, 'table_b': table_b}
                 | beam.CoGroupByKey())

output_types = ['changed', 'added', 'deleted', 'unchanged']

class FilterDoFn(beam.DoFn):
    def process(self, element):
        key, values = element
        table_a_value = list(values['table_a'])
        table_b_value = list(values['table_b'])
        if table_a_value == table_b_value:
            yield pvalue.TaggedOutput('unchanged', key)
        elif len(table_a_value) < len(table_b_value):
            yield pvalue.TaggedOutput('added', key)
        elif len(table_a_value) > len(table_b_value):
            yield pvalue.TaggedOutput('deleted', key)
        elif table_a_value != table_b_value:
            yield pvalue.TaggedOutput('changed', key)

key_collections = (joined_tables
                   | beam.ParDo(FilterDoFn()).with_outputs(*output_types))

# Now you can handle each output
key_collections.unchanged | WriteToText(...)
key_collections.changed | WriteToText(...)
key_collections.added | WriteToText(...)
key_collections.deleted | WriteToText(...)
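The per-key classification itself is independent of Beam. As a sanity check, here is the same logic sketched in plain Ruby (diff is a hypothetical helper made up for the example, with hashes of composite key => row value standing in for the grouped tables):

```ruby
# Classify every composite key found in either table as unchanged,
# changed, added, or deleted, comparing B against A.
def diff(a, b)
  result = { unchanged: [], changed: [], added: [], deleted: [] }
  (a.keys | b.keys).each do |key|
    if !a.key?(key)        then result[:added] << key
    elsif !b.key?(key)     then result[:deleted] << key
    elsif a[key] == b[key] then result[:unchanged] << key
    else                        result[:changed] << key
    end
  end
  result
end

table_a = { ["a", "a"] => 1, ["a", "b"] => 1, ["a", "c"] => 1 }
table_b = { ["a", "a"] => 1, ["a", "b"] => 2, ["a", "d"] => 1 }
diff(table_a, table_b)
#=> {unchanged: [["a", "a"]], changed: [["a", "b"]], added: [["a", "d"]], deleted: [["a", "c"]]}
```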

How to get index being tested on where block using spock?

Is there any way to get the index being tested on Spock?
I have a where block like this:
where:
column1 | column2
1       | 3
1       | 4
2       | 5
6       | 8
I want to know if it's possible to get the index being executed on my test.
So If I'm running the first test (1 | 3) my index would be 0.
If I'm running the third test (2 | 5) my index would be 2.
Is there any way to get this index inside my test?
The trivial answer would be to add an index or case or some categorical variable that can be checked in the result section.
where:
idx | column1 | column2
0   | 1       | 3
1   | 1       | 4
2   | 2       | 5
3   | 6       | 8
But I have to wonder if the where clause is being used to run what maybe should be multiple test cases under the guise of a single test.
The idx shouldn't be telling the test code what checks to execute in the expect/then block, and shouldn't be driving any code that is being tested.
If the test would yield different results when the order of the inputs was:
where:
idx | column1 | column2
0   | 6       | 8
1   | 2       | 5
2   | 1       | 4
3   | 1       | 3
then I think the test needs to be broken up, because order sensitivity would seem to indicate that the test is testing something other than just the pairs of column1 and column2 values, and using where isn't exactly appropriate.
There is a built-in variable for this (now?): #iterationCount; see the Spock documentation on data-driven testing. You can use it in your method's name like
@Unroll
def "testing entry #iterationCount from where block"() {
  // ...
}
There is #featureName available as well, by the way.
You need to add @Unroll to your test definition and then put #column1 in the test name to output the value:
@Unroll
def "something #column1 and #column2"() {
  ...
}
Would something like the below suffice?
import spock.lang.*

@Unroll
class MyFirstSpec extends Specification {
  def "column1 is #column1 and column2 is #column2 where index is #index"() {
    expect:
    if (index in [0, 1]) {
      assert column1 < column2
    } else {
      assert column1 == column2
    }

    where:
    [column1, column2, index] << [
      [1, 2, 4, 5], [2, 3, 4, 5]
    ].transpose().withIndex()*.flatten()
  }
}
Note:
withIndex() is available from Groovy 2.4.0
Adding an index column can be done in a Groovy way. Just add one of the following lines to the end of the where: block. The list should have the same number of items as there are rows of data:
idx << [0,1,2,3]
or
idx << (0..3).collect()
Reference: Section "Combining Data Tables, Data Pipes, and Variable Assignments" of http://spockframework.org/spock/docs/1.0/data_driven_testing.html

Calculate hierarchical labels for Google Sheets using native functions

Using Google Sheets, I want to automatically number rows like so:
The key is that I want this to use built-in functions only.
I have a working implementation where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
 * Calculate the Work Breakdown Structure id for this row.
 *
 * @param {range} priorIds IDs that precede this one.
 * @param {range} names The names for this row.
 * @return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
 * @customfunction
 */
function WBS_ID(priorIds, names){
  if (Array.isArray(names[0])) names = names[0];
  if (!names.join("")) return "";
  var lastId, pieces = [];
  for (var i = priorIds.length; i-- && !lastId;) lastId = priorIds[i][0];
  if (lastId) pieces = (lastId + "").split('.').map(function(s){ return s * 1 });
  for (var i = 0; i < names.length; i++){
    if (names[i]){
      pieces.length = i + 1;
      pieces[i] = (pieces[i] || 0) + 1;
      return pieces.join(".");
    }
  }
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this (one where the ID is calculated deterministically from ROW()) is acceptable, and possibly even desirable.
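Independent of any spreadsheet or Apps Script, the numbering rule is just "increment the counter at this row's depth and discard the deeper counters". Here is a hypothetical plain-Ruby sketch of that rule (wbs_ids and the depth-array input are made up for illustration), with nil marking a blank row:

```ruby
# Given each row's 0-based depth (the column its name sits in),
# produce the hierarchical id for every row, skipping blanks.
def wbs_ids(depths)
  counters = []
  depths.map do |depth|
    next "" if depth.nil?                 # blank row: no id
    counters = counters.first(depth + 1)  # drop counters deeper than this row
    counters[depth] = (counters[depth] || 0) + 1
    counters.map { |c| c || 1 }.join(".") # missing levels default to 1
  end
end

wbs_ids([0, 1, 1, 2, nil, 1, 0])
#=> ["1", "1.1", "1.2", "1.2.1", "", "1.3", "2"]
```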
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns, which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out at me as making possible something I hadn't thought of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().
Lol, this is hilariously the longest (and very likely the most unnecessarily complicated) way to combine formulas, but I thought it was interesting that it does in fact work, as long as you add a 1 in the first row and then in the second row add:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
In my defense, I've had a very long day at work; complicating what should be a simple thing seems to be my thing today :)
Foreword
Spreadsheet built-in functions don't include an equivalent of JavaScript's .map. The alternative is to use the spreadsheet's array-handling features and iteration patterns.
A "complete solution" could include the use of built-in functions to automatically transform the user input into a simple table and return the Work Breakdown Structure number (WBS). Some people refer to transforming user input into a simple table as "normalization", but including that would make this post too long for the Stack Overflow format, so it will focus on presenting a short formula to obtain the WBS.
It's worth saying that using formulas to transform large data sets into a simple table as part of the continuous spreadsheet calculations, in this case of the WBS, will make the spreadsheet too slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Leave it blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
|   | A            | B  | C         |
+---+--------------+----+-----------+
| 1 | Task         | ID | Parent ID |
| 2 | General task | 1  |           |
| 3 | Subtask 1    | 2  | 1         |
| 4 | Subtask 2    | 3  | 1         |
| 5 | Subsubtask 1 | 4  | 2         |
| 6 | Subsubtask 2 | 5  | 2         |
+---+--------------+----+-----------+
Remark: this could also help to reduce the processing time required by a custom function.
Second, add the below formula to D2, then fill down as needed:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+----------+
|   | A            | B  | C         | D        |
+---+--------------+----+-----------+----------+
| 1 | Task         | ID | Parent ID | WBS      |
| 2 | General task | 1  |           | 1        |
| 3 | Subtask 1    | 2  | 1         | 1.1      |
| 4 | Subtask 2    | 3  | 1         | 1.2      |
| 5 | Subsubtask 1 | 4  | 2         | 1.1.1    |
| 6 | Subsubtask 2 | 5  | 2         | 1.1.2    |
+---+--------------+----+-----------+----------+
Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID (or 0), plus 1
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""
