I have a problem making a query for the following case:
        +--------------------hasManager-------------------+
        |                         |                       |
        |        property:isPersonalMngr=true (bool)      |
        |                                                 v
[ Employee ]-- hasShift -->[ Shift ]-- hasManager -->[ Manager ]
                   |           |            |
                   |           |    property:isPersonalMngr=false (bool)
                   |           |
                   |  property:name (text)
                   |
       property:baseShift (bool)
For a manager 'John', who manages shifts and can also be the personal manager of an employee, I want to return all the employees he's managing, with the list of shifts for each employee. Each employee has a 'baseShift' (say: 'night' / 'day') and a scheduled shift ('wed123').
Eg:
[ 'Employee1', [ 'night', 'wed123', 'sat123' ]]
[ 'Employee2', [ 'day', 'mon123', 'tue123' ]]
For the shift employees I have this:
g.V('John').in('hasManager').in('hasShift').hasLabel('Employee')
For the personally managed employees I have this:
g.V('John').in('hasManager').hasLabel('Employee')
How do I combine these two AND add the name property of the shift in a list?
Thanks.
To test this, I created the following graph. Hope this fits your data model from above:
g.addV('Manager').property(id,'John').as('john').
addV('Manager').property(id,'Terry').as('terry').
addV('Manager').property(id,'Sally').as('sally').
addV('Employee').property(id,'Tom').as('tom').
addV('Employee').property(id,'Tim').as('tim').
addV('Employee').property(id,'Lisa').as('lisa').
addV('Employee').property(id,'Sue').as('sue').
addV('Employee').property(id,'Chris').as('chris').
addV('Employee').property(id,'Bob').as('bob').
addV('Shift').property('name','mon123').as('mon123').
addV('Shift').property('name','tues123').as('tues123').
addV('Shift').property('name','sat123').as('sat123').
addV('Shift').property('name','wed123').as('wed123').
addE('hasManager').from('tom').to('john').property('isPersonalMngr',true).
addE('hasManager').from('tim').to('john').property('isPersonalMngr',true).
addE('hasManager').from('lisa').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('sue').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('chris').to('sally').property('isPersonalMngr',true).
addE('hasManager').from('bob').to('sally').property('isPersonalMngr',true).
addE('hasShift').from('tom').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('tues123').property('baseShift','night').
addE('hasShift').from('lisa').to('wed123').property('baseShift','night').
addE('hasShift').from('sue').to('sat123').property('baseShift','night').
addE('hasShift').from('chris').to('wed123').property('baseShift','day').
addE('hasShift').from('bob').to('sat123').property('baseShift','day').
addE('hasShift').from('bob').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('wed123').property('baseShift','day').
addE('hasManager').from('mon123').to('terry').property('isPersonalMngr',false).
addE('hasManager').from('tues123').to('sally').property('isPersonalMngr',false).
addE('hasManager').from('wed123').to('john').property('isPersonalMngr',false).
addE('hasManager').from('sat123').to('terry').property('isPersonalMngr',false)
From this, the following query generates output in the format that you're looking for:
gremlin> g.V('John').
           union(
             inE('hasManager').has('isPersonalMngr',true).outV(),
             inE('hasManager').has('isPersonalMngr',false).outV().in('hasShift')).
           dedup().
           map(union(id(),out('hasShift').values('name').fold()).fold())
==>[Tom,[mon123]]
==>[Tim,[tues123,wed123]]
==>[Lisa,[wed123]]
==>[Chris,[wed123]]
A note on your data model - you could likely simplify things by having two different types of edges for hasManager and that would remove the need for a boolean property on those edges. Instead, you could have hasOrgManager and hasShiftManager edges and that would remove the need for the property checks when traversing those edges.
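To make the logic of the traversal easier to follow, here is a plain-Python sketch that mimics its steps (the two `union` branches, `dedup`, then collecting shift names) on the sample graph above. It is only an illustration of the logic, not Gremlin; the function name `employees_with_shifts` and the tuple layout are assumptions made for this example.

```python
# Edges of the sample graph as plain tuples (illustrative layout, not a Gremlin API).
# hasManager: (source, manager, isPersonalMngr)
has_manager = [
    ("Tom", "John", True), ("Tim", "John", True),
    ("Lisa", "Terry", True), ("Sue", "Terry", True),
    ("Chris", "Sally", True), ("Bob", "Sally", True),
    ("mon123", "Terry", False), ("tues123", "Sally", False),
    ("wed123", "John", False), ("sat123", "Terry", False),
]
# hasShift: (employee, shift name)
has_shift = [
    ("Tom", "mon123"), ("Tim", "tues123"), ("Lisa", "wed123"),
    ("Sue", "sat123"), ("Chris", "wed123"), ("Bob", "sat123"),
    ("Bob", "mon123"), ("Tim", "wed123"),
]


def employees_with_shifts(manager):
    # First union branch: employees this manager manages personally.
    personal = [src for src, dst, flag in has_manager if dst == manager and flag]
    # Second union branch: shifts this manager runs, then employees on those shifts.
    shifts = [src for src, dst, flag in has_manager if dst == manager and not flag]
    on_shift = [emp for emp, shift in has_shift if shift in shifts]
    # dedup(): keep the first occurrence of each employee.
    employees = list(dict.fromkeys(personal + on_shift))
    # map(...): pair each employee with the folded list of shift names.
    return [[emp, [s for e, s in has_shift if e == emp]] for emp in employees]


result = employees_with_shifts("John")
for row in result:
    print(row)
```

Tim appears in both branches (personally managed and on shift 'wed123'), which is why the deduplication step matters.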
I have a PCollection which consists of an ID column and seven value columns. There are several rows for each ID.
I would like to compute the mean of the seven columns per unique ID.
Is there a way to achieve this without going through each element programmatically and creating key/value pair per each element?
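import apache_beam as beam

table_pcoll = ....

def per_column_average(rows, ignore_elms=(ID_INDEX,)):
    # GroupByKey yields an iterable per key; materialize it so it can be
    # indexed and traversed once per column.
    rows = list(rows)
    return [sum(row[idx] if idx not in ignore_elms else 0
                for row in rows) / len(rows)
            for idx, _ in enumerate(rows[0])]

# Key each row by its ID, group the rows, then average every column per key.
keyed_averaged_elm = (table_pcoll
                      | beam.Map(lambda x: (x[ID_INDEX], x))
                      | beam.GroupByKey()
                      | beam.Map(lambda x: (x[0], per_column_average(x[1]))))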
Sorry about the nasty one-liner. I hope that helps.
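To check the averaging logic outside a pipeline, the same key / group / average steps can be sketched with the standard library only. This is a minimal sketch: `ID_INDEX = 0`, the helper `mean_per_id`, and the three-column sample rows are assumptions made for the example, and the ID column is reported as 0 because it is ignored in the sum.

```python
from collections import defaultdict

ID_INDEX = 0  # assumed position of the ID column in each row


def per_column_average(rows, ignore_elms=(ID_INDEX,)):
    rows = list(rows)
    return [sum(row[idx] if idx not in ignore_elms else 0
                for row in rows) / len(rows)
            for idx, _ in enumerate(rows[0])]


def mean_per_id(table):
    # Stand-in for Map -> GroupByKey: collect the rows that share an ID.
    grouped = defaultdict(list)
    for row in table:
        grouped[row[ID_INDEX]].append(row)
    # Stand-in for the final Map: average each group column by column.
    return {key: per_column_average(rows) for key, rows in grouped.items()}


table = [("a", 1.0, 2.0), ("a", 3.0, 4.0), ("b", 5.0, 6.0)]
result = mean_per_id(table)
print(result)
```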
I have the following in my Google cloud storage
Advertiser | Event
__________________
100 | Click
101 | Impression
100 | Impression
100 | Impression
101 | Impression
My output of the pipeline should be something like
Advertiser | Count
100 | 3
101 | 2
First I used groupByKey, the output is like
100 Click, Impression, Impression
101 Impression, Impression
How to proceed from here?
Instead of a GroupByKey, you may want to use a combining transform, which is a composite that can partially aggregate values before the group-by-key step rather than collecting every value per key. Your pipeline can look something like this:
Python
import apache_beam as beam

collection_contents = [(100, 'Click'),
                       (101, 'Impression'),
                       (100, 'Impression'),
                       (100, 'Impression'),
                       (101, 'Impression')]
input_collection = pipeline | beam.Create(collection_contents)
counts = input_collection | beam.combiners.Count.PerKey()
This should output a collection with the shape you are looking for. The Count family of transforms is available as apache_beam.transforms.combiners.Count (also reachable as beam.combiners.Count).
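For intuition about what a per-key count produces, here is a standard-library sketch of the same computation on the sample data. It uses `collections.Counter` as a stand-in and is not the Beam API; the variable names are illustrative.

```python
from collections import Counter

events = [(100, 'Click'),
          (101, 'Impression'),
          (100, 'Impression'),
          (100, 'Impression'),
          (101, 'Impression')]

# Count how many events each advertiser has, ignoring the event type.
counts = Counter(advertiser for advertiser, _event in events)

for advertiser, count in sorted(counts.items()):
    print(advertiser, count)
```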
Java
The same transforms exist for Java in the org.apache.beam.sdk.transforms package:
PCollection<KV<Integer, Long>> resultColl = inputColl.apply(Count.perKey());
This counting pattern has been described in the 'word count' sample of Apache Beam.
Find the sample at Github apache beam sample: wordcount.py. The counting starts at line 95.
I have a list of variables for which I want to create a list of numbered variables. The intent is to use these with the reshape command to create a stacked data set. How do I keep them in order? For instance, with this code
local ct = 1
foreach x in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
gen runs`ct' = `x'
local ct = `ct' + 1
}
when I use the reshape command it generates an order as
runs1 runs10 runs11 ... runs2 runs22 ...
rather than the desired
runs01 runs02 runs03 ... runs26
Preserving the order is necessary in this analysis. I'm trying to add a leading zero to all ct values less than 10 when assigning variable names.
Generating a series of identifiers with leading zeros is a documented and solved problem: see e.g. here.
local j = 1
foreach v in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
    local J : di %02.0f `j'
    rename `v' runs`J'
    local ++j
}
Note that I used rename rather than generate. If you are going to reshape the variables afterwards, the labour of copying the contents is unnecessary. Indeed the default float type for numeric variables used by generate could in some circumstances result in loss of precision.
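The effect of the leading zero on ordering is easy to see outside Stata. The following Python sketch (illustrative only, with names chosen for the example) builds the 26 suffixes both ways and sorts them as strings: unpadded names sort as runs1, runs10, runs11, ..., while two-digit padding keeps the numeric order.

```python
# 26 variables, as in the question.
plain = [f"runs{j}" for j in range(1, 27)]
padded = [f"runs{j:02d}" for j in range(1, 27)]  # same idea as the %02.0f format

# String sorting puts "runs10" before "runs2" when there is no padding.
print(sorted(plain)[:4])

# With zero padding, string order and numeric order agree.
print(sorted(padded) == padded)
```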
I note that there may also be a solution with rename groups.
All that said, it's hard to follow your complaint about what reshape does (or does not) do. If you have a series of variables like runs* the most obvious reshape is a reshape long and for example
clear
set obs 1
gen id = _n
foreach v in q61 q77 q99 q121 q143 {
gen `v' = 42
}
reshape long q, i(id) j(which)
list
+-----------------+
| id which q |
|-----------------|
1. | 1 61 42 |
2. | 1 77 42 |
3. | 1 99 42 |
4. | 1 121 42 |
5. | 1 143 42 |
+-----------------+
works fine for me; the column order information is preserved and no use of rename was needed at all. If I want to map the suffixes to 1 up, I can just use egen, group().
So, that's hard to discuss without a reproducible example. See
https://stackoverflow.com/help/mcve for how to post good code examples.
Using Google Sheets, I want to automatically number rows like so:
The key is that I want this to use built-in functions only.
I have an implementation working where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way that custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
 * Calculate the Work Breakdown Structure id for this row.
 *
 * @param {range} priorIds IDs that precede this one.
 * @param {range} names The names for this row.
 * @return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
 * @customfunction
 */
function WBS_ID(priorIds,names){
  if (Array.isArray(names[0])) names = names[0];
  if (!names.join("")) return "";
  var lastId,pieces=[];
  for (var i=priorIds.length;i-- && !lastId;) lastId=priorIds[i][0];
  if (lastId) pieces = (lastId+"").split('.').map(function(s){ return s*1 });
  for (var i=0;i<names.length;i++){
    if (names[i]){
      var s = pieces.concat();
      pieces.length=i+1;
      pieces[i] = (pieces[i]||0) + 1;
      return pieces.join(".");
    }
  }
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this (where the ID is calculated deterministically from ROW()) is acceptable, and possibly even desirable.
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out to me as making possible something that I didn't think of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().
Lol, this is hilariously long (and very likely the most unnecessarily complicated way to combine formulas), but I thought it was interesting that it does in fact work. Just type a 1 in the first row, then add this in the second row:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
In my defense, I've had a very long day at work; complicating what should be a simple thing seems to be my thing today :)
Foreword
Spreadsheet built-in functions don't include an equivalent to JavaScript's .map. The alternative is to use the spreadsheet's array-handling features and iteration patterns.
A "complete solution" could use built-in functions to automatically transform the user input into a simple table and return the Work Breakdown Structure (WBS) number. Some people refer to transforming the user input into a simple table as "normalization", but including that would make this post too long for the Stack Overflow format, so it focuses on presenting a short formula to obtain the WBS.
It's worth noting that using formulas to transform large data sets into a simple table as part of the continuous spreadsheet recalculation (here, of the WBS) will make the spreadsheet slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Leave it blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
|   | A            | B  | C         |
+---+--------------+----+-----------+
| 1 | Task         | ID | Parent ID |
| 2 | General task | 1  |           |
| 3 | Subtask 1    | 2  | 1         |
| 4 | Subtask 2    | 3  | 1         |
| 5 | Subsubtask 1 | 4  | 2         |
| 6 | Subsubtask 2 | 5  | 2         |
+---+--------------+----+-----------+
Remark: This could also help reduce the processing time required by a custom function.
Second, add the below formula to D2, then fill down as needed,
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+----------+
|   | A            | B  | C         | D        |
+---+--------------+----+-----------+----------+
| 1 | Task         | ID | Parent ID | WBS      |
| 2 | General task | 1  |           | 1        |
| 3 | Subtask 1    | 2  | 1         | 1.1      |
| 4 | Subtask 2    | 3  | 1         | 1.2      |
| 5 | Subsubtask 1 | 4  | 2         | 1.1.1    |
| 6 | Subsubtask 2 | 5  | 2         | 1.1.2    |
+---+--------------+----+-----------+----------+
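The idea behind the formula (a task's WBS is its parent's WBS, a period, and the running count of siblings seen so far) can be sketched in Python for readers who want to check it outside the spreadsheet. This is a rough sketch, not a translation of the formula: the function name `wbs_numbers` is hypothetical, parents are assumed to be listed before their children, and root tasks are numbered by order of appearance, which matches the ascending rank the formula uses only when root IDs increase down the sheet.

```python
def wbs_numbers(tasks):
    """tasks: list of (task_id, parent_id) in sheet order; parent_id is None for roots."""
    wbs = {}
    children_seen = {}  # parent_id -> number of children numbered so far
    for task_id, parent_id in tasks:
        children_seen[parent_id] = children_seen.get(parent_id, 0) + 1
        position = children_seen[parent_id]
        if parent_id is None:
            wbs[task_id] = str(position)
        else:
            # Parent's WBS, a period, then this task's position among its siblings.
            wbs[task_id] = f"{wbs[parent_id]}.{position}"
    return wbs


# The ID / Parent ID columns from the table above.
tasks = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]
result = wbs_numbers(tasks)
print(result)
```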
Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID (or 0), plus 1
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""
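The branching in this formula can be restated as a small Python function, which may be easier to step through than the nested IFs. This is a sketch under assumptions: `next_wbs_id` and its `level` argument are names invented here, where `level` is 1, 2 or 3 depending on whether column B, C or D holds the value, and `prior_id` plays the role of the cell above.

```python
def next_wbs_id(prior_id: str, level: int) -> str:
    """Return the ID that follows prior_id for an item at the given level (1 to 3)."""
    parts = [int(p) for p in prior_id.split(".")]
    first = parts[0]
    second = parts[1] if len(parts) > 1 else None
    third = parts[2] if len(parts) > 2 else None

    if level == 1:
        # Value in the 1st column: bump the first part, drop the rest.
        return str(first + 1)
    if level == 2:
        # Value in the 2nd column: keep the first part, bump the second (or start at 1).
        return f"{first}.{(second or 0) + 1}"
    # Value in the 3rd column: keep the first two parts, bump the third (or start at 1).
    second_part = second if second is not None else 1
    third_part = third + 1 if third is not None else 1
    return f"{first}.{second_part}.{third_part}"


# Start from the manually typed "1", then walk through a few rows.
ids = ["1"]
for level in [2, 3, 3, 2, 1]:
    ids.append(next_wbs_id(ids[-1], level))
print(ids)
```

As in the formula, every row depends only on the ID directly above it, which is why blank rows between items are not supported.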