Calculate hierarchical labels for Google Sheets using native functions - google-sheets

Using Google Sheets, I want to automatically number rows like so:
The key is that I want this to use built-in functions only.
I have an implementation working where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way that custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
* Calculate the Work Breakdown Structure id for this row.
*
* #param {range} priorIds IDs that precede this one.
* #param {range} names The names for this row.
* #return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
* #customfunction
*/
function WBS_ID(priorIds,names){
if (Array.isArray(names[0])) names = names[0];
if (!names.join("")) return "";
var lastId,pieces=[];
for (var i=priorIds.length;i-- && !lastId;) lastId=priorIds[i][0];
if (lastId) pieces = (lastId+"").split('.').map(function(s){ return s*1 });
for (var i=0;i<names.length;i++){
if (names[i]){
var s = pieces.concat();
pieces.length=i+1;
pieces[i] = (pieces[i]||0) + 1;
return pieces.join(".");
}
}
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this—where the ID is calculated determinstically from the ROW())—is acceptable (and possibly even desirable).
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out to me as making possible something that I didn't think of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().

Lol this is hilariously the longest (and very likely the most unnecessarily complicated way to combine formulas) but because I thought it was interesting that it does in fact work, so long as you just add a 1 in the first row then in the second row you add:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
in my defense ive had a very long day at work - complicating what should be a simple thing seems to be my thing today :)

Foreword
Spreadsheet built-in functions doesn't include an equivalent to JavaScript .map. The alternative is to use the spreadsheets array handling features and iteration patterns.
A "complete solution" could include the use of built-in functions to automatically transform the user input into a simple table and returning the Work Breakdown Structure number (WBS) . Some people refer to transforming the user input into a simple table as "normalization" but including this will make this post to be too long for the Stack Overflow format, so it will be focused in presenting a short formula to obtain the WBS.
It's worth to say that using formulas for doing the transformation of large data sets into a simple table as part of the continuous spreadsheet calculations, in this case, of WBS, will make the spreadsheet to slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Left blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
| | A | B | C |
+---+--------------+----+-----------+
| 1 | Task | ID | Parent ID |
| 2 | General task | 1 | |
| 3 | Substast 1 | 2 | 1 |
| 4 | Substast 2 | 3 | 1 |
| 5 | Subsubtask 1 | 4 | 2 |
| 6 | Subsubtask 2 | 5 | 2 |
+---+--------------+----+-----------+
Remark: This also could help to reduce of required processing time of a custom funcion.
Second, add the below formula to D2, then fill down as needed,
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+----------+
| | A | B | C | D |
+---+--------------+----+-----------+----------+
| 1 | Task | ID | Parent ID | WBS |
| 2 | General task | 1 | | 1 |
| 3 | Substast 1 | 2 | 1 | 1.1 |
| 4 | Substast 2 | 3 | 1 | 1.2 |
| 5 | Subsubtask 1 | 4 | 2 | 1.1.1 |
| 6 | Subsubtask 2 | 5 | 2 | 1.1.2 |
+---+--------------+----+-----------+----------+

Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID (or 0), plus 1
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""

Related

Esttab: Create new row with logarithm of a beta coefficient

I am using Stata and I'm currently trying to figure out how to create new row that shows me the relative effect of a certain coefficient.
eststo, title(log_total[1]]): reg log_total a b
eststo, title(log_total[2]]): reg log_total a b c
esttab using total.tex
To give a sample code, this is what I have.
However, in the end besides the rows for a, b, and c I want to have a row that says effect for a where I calculate exp(a)-1 and where I want to print exp(a)-1 %.
The table should look the following:
| | total[1]| total[2]|
|:---- |:------:| -----:|
| a| 0.014| 0.021|
| b| 0.031| 0.005|
| c| | 0.082|
| Effect| 1.4 %| 2.1 %|
How can I add this "Effect" row to my table using esttab? I tried using estadd which works for fixed values but I was not able to figure out how to include a calculation in there.
Thank you a lot!

Gremlin, combine two queries and join data

I have a problem making a query for the following case:
+--------------------hasManager-------------------+
| | |
| property:isPersonalMngr=true (bool) |
| v
[ Employee ]-- hasShift -->[ Shift ]-- hasManager -->[ Manager ]
| | |
| | property:isPersonalMngr=false (bool)
| |
| property:name (text)
|
property:baseShift (bool)
For a manager 'John', who is managing shifts and can also be a personal manager of an empoyee, I want return all the employees he's managing with the list of shifts for each employee. Each empoyee has a 'baseShift' (say: 'night' / 'day') and a scheduled shift ('wed123')
Eg:
[ 'Employee1', [ 'night', 'wed123', 'sat123' ]]
[ 'Employee2', [ 'day', 'mon123', 'tue123' ]]
For the shift employees I have this:
g.V('John').in('hasManager').in('hasShift').hasLabel('Employee')
For the personal managed I have this:
g.V('John').in('hasManager').hasLabel('Employee')
How do I combine these two AND add the name property of the shift in a list?
Thanks.
To test this, I created the following graph. Hope this fits your data model from above:
g.addV('Manager').property(id,'John').as('john').
addV('Manager').property(id,'Terry').as('terry').
addV('Manager').property(id,'Sally').as('sally').
addV('Employee').property(id,'Tom').as('tom').
addV('Employee').property(id,'Tim').as('tim').
addV('Employee').property(id,'Lisa').as('lisa').
addV('Employee').property(id,'Sue').as('sue').
addV('Employee').property(id,'Chris').as('chris').
addV('Employee').property(id,'Bob').as('bob').
addV('Shift').property('name','mon123').as('mon123').
addV('Shift').property('name','tues123').as('tues123').
addV('Shift').property('name','sat123').as('sat123').
addV('Shift').property('name','wed123').as('wed123').
addE('hasManager').from('tom').to('john').property('isPersonalMngr',true).
addE('hasManager').from('tim').to('john').property('isPersonalMngr',true).
addE('hasManager').from('lisa').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('sue').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('chris').to('sally').property('isPersonalMngr',true).
addE('hasManager').from('bob').to('sally').property('isPersonalMngr',true).
addE('hasShift').from('tom').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('tues123').property('baseShift','night').
addE('hasShift').from('lisa').to('wed123').property('baseShift','night').
addE('hasShift').from('sue').to('sat123').property('baseShift','night').
addE('hasShift').from('chris').to('wed123').property('baseShift','day').
addE('hasShift').from('bob').to('sat123').property('baseShift','day').
addE('hasShift').from('bob').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('wed123').property('baseShift','day').
addE('hasManager').from('mon123').to('terry').property('isPersonalMngr',false).
addE('hasManager').from('tues123').to('sally').property('isPersonalMngr',false).
addE('hasManager').from('wed123').to('john').property('isPersonalMngr',false).
addE('hasManager').from('sat123').to('terry').property('isPersonalMngr',false)
From this, the follow query generates an output in the format that you're looking for:
gremlin> g.V('John').
union(
inE('hasManager').has('isPersonalMngr',true).outV(),
inE('hasManager').has('isPersonalMngr',false).outV().in('hasShift')).
dedup().
map(union(id(),out('hasShift').values('name').fold()).fold())
==>[Tom,[mon123]]
==>[Tim,[tues123,wed123]]
==>[Lisa,[wed123]]
==>[Chris,[wed123]]
A note on your data model - you could likely simplify things by having two different types of edges for hasManager and that would remove the need for a boolean property on those edges. Instead, you could have hasOrgManager and hasShiftManager edges and that would remove the need for the property checks when traversing those edges.

Add categories in MDS plot

I) PROBLEM
Let’s say I have a matrix like this with distances (in kilometers) between the homes of different people.
| | Person 1 | Person 2 | Person 3 |
|----------|----------|----------|----------|
| Person 1 | | | |
| Person 2 | 24 | | |
| Person 3 | 17 | 153 | |
And I have a data table like this:
| Person | Party |
|----------|----------|
| Person 1 | Party A |
| Person 2 | Party B |
| Person 3 | Party C |
I want to do multidimensional scaling (dissimilarity by distance) to visualize i) how close each person lives to another; ii) which party each person votes for (different colors for each party)
II) CURRENT RESULT
My current plot of MDS (made with SPSS) is like this (I don’t use a code line, but a menu commands in SPSS).
:
III) EXPECTED RESULT
I want to add a different color for each person depending on which party this person votes for:
IV) QUESTION(S)
Can I do it in SPSS? How to add the data about votes in the matrix and how to show it in MDS plot?
EDIT
There is quite the same problem and solution for R.
R) Create double-labeled MDS plot
But I want to do it in SPSS.
I don't believe it's possible to create a plot like the one you show directly from either of the MDS procedures currently available in SPSS Statistics, PROXSCAL or ALSCAL. I think what you'd need to do would be to save the common space coordinates to a new dataset or file, then add the Party variable to that new dataset or file, define it as Nominal in the measurement level designation in the Data Editor, and then use the Grouped Scatter option under Scatter/Dot in the chart Gallery in the Chart Builder, defining groups by the Party variable.
The PROXSCAL procedure lets you save things from the dialogs in the Output sub-dialog. The ALSCAL procedure only supports saving out of common space coordinates and other things using command syntax, specifically using the OUTFILE subcommand (you can paste the command from the dialogs, then add this subcommand).

Creating numbered variable names using the foreach command

I have a list of variables for which I want to create a list of numbered variables. The intent is to use these with the reshape command to create a stacked data set. How do I keep them in order? For instance, with this code
local ct = 1
foreach x in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
gen runs`ct' = `x'
local ct = `ct' + 1
}
when I use the reshape command it generates an order as
runs1 runs10 runs11 ... runs2 runs22 ...
rather than the desired
runs01 runs02 runs03 ... runs26
Preserving the order is necessary in this analysis. I'm trying to add a leading zero to all ct values less than 10 when assigning variable names.
Generating a series of identifiers with leading zeros is a documented and solved problem: see e.g. here.
local j = 1
foreach v in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
local J : di %02.0f `j'
rename `v' runs`J'
local ++j
}
Note that I used rename rather than generate. If you are going to reshape the variables afterwards, the labour of copying the contents is unnecessary. Indeed the default float type for numeric variables used by generate could in some circumstances result in loss of precision.
I note that there may also be a solution with rename groups.
All that said, it's hard to follow your complaint about what reshape does (or does not) do. If you have a series of variables like runs* the most obvious reshape is a reshape long and for example
clear
set obs 1
gen id = _n
foreach v in q61 q77 q99 q121 q143 {
gen `v' = 42
}
reshape long q, i(id) j(which)
list
+-----------------+
| id which q |
|-----------------|
1. | 1 61 42 |
2. | 1 77 42 |
3. | 1 99 42 |
4. | 1 121 42 |
5. | 1 143 42 |
+-----------------+
works fine for me; the column order information is preserved and no use of rename was needed at all. If I want to map the suffixes to 1 up, I can just use egen, group().
So, that's hard to discuss without a reproducible example. See
https://stackoverflow.com/help/mcve for how to post good code examples.

Associate scatter plot data with per-item labels

I have some data in a Google Sheets table, formatted like so:
Label | ValueA | ValueB
------+--------+-------
A | 1 | 1
B | 1 | 2
A | 3 | 3
B | 2 | 4
C | 9 | 1
I would like to render a scatterplot, with a single colored point for each entry, in which everything with an A label is color 1, everything with a B label is color 2, and so on, and they all share the same coordinate space.
I've poked around quite a bit in the options available in the UI, but nothing seems to do it. Multi color plots can be made, but they never associate the labels the way I want them to.
I guess this will take some scripting to do, but I really don't know where to start.
Maybe try a bubble chart instead?:
I suspect what you really want may be:
but the logic of the data layout that seems to be required to achieve this escapes me.

Resources