Add categories in MDS plot - spss

I) PROBLEM
Let’s say I have a matrix like this with distances (in kilometers) between the homes of different people.
| | Person 1 | Person 2 | Person 3 |
|----------|----------|----------|----------|
| Person 1 | | | |
| Person 2 | 24 | | |
| Person 3 | 17 | 153 | |
And I have a data table like this:
| Person | Party |
|----------|----------|
| Person 1 | Party A |
| Person 2 | Party B |
| Person 3 | Party C |
I want to do multidimensional scaling (dissimilarity by distance) to visualize (i) how close each person lives to the others and (ii) which party each person votes for (a different color for each party).
II) CURRENT RESULT
My current MDS plot looks like this (I made it through the SPSS menus, not with command syntax):
(screenshot of the current MDS plot)
III) EXPECTED RESULT
I want to give each person a different color depending on which party this person votes for (mock-up image of the colored plot).
IV) QUESTION(S)
Can I do it in SPSS? How do I add the vote data to the matrix, and how do I show it in the MDS plot?
EDIT
There is nearly the same problem, with a solution, for R: "Create double-labeled MDS plot".
But I want to do it in SPSS.

I don't believe it's possible to create a plot like the one you show directly from either of the MDS procedures currently available in SPSS Statistics (PROXSCAL or ALSCAL). I think you'd need to save the common space coordinates to a new dataset or file, add the Party variable to it, set its measurement level to Nominal in the Data Editor, and then use the Grouped Scatter option under Scatter/Dot in the Chart Builder gallery, defining groups by the Party variable.
PROXSCAL lets you save the coordinates from the Output sub-dialog. ALSCAL can save the common space coordinates (and other results) only through command syntax, using the OUTFILE subcommand: paste the command from the dialogs, then add this subcommand.


Esttab: Create new row with logarithm of a beta coefficient

I am using Stata and am trying to figure out how to add a new row that shows the relative effect of a certain coefficient. As sample code, this is what I have:
eststo, title(log_total[1]): reg log_total a b
eststo, title(log_total[2]): reg log_total a b c
esttab using total.tex
However, besides the rows for a, b, and c, I want a row labeled "Effect" for a, in which I calculate exp(a)-1 and print it as a percentage.
The table should look like the following:
| | total[1]| total[2]|
|:---- |:------:| -----:|
| a| 0.014| 0.021|
| b| 0.031| 0.005|
| c| | 0.082|
| Effect| 1.4 %| 2.1 %|
How can I add this "Effect" row to my table using esttab? I tried using estadd which works for fixed values but I was not able to figure out how to include a calculation in there.
Thank you a lot!
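For reference, the arithmetic behind the requested row is small. Here is a Python sketch of the calculation only (it is not Stata or esttab code, and the helper name relative_effect is made up for illustration), using the coefficients on a from the example table:

```python
import math


def relative_effect(beta: float) -> str:
    """Format exp(beta) - 1 as a percentage with one decimal place."""
    return f"{(math.exp(beta) - 1) * 100:.1f} %"


# Coefficients on `a` taken from the two columns of the example table
coef_a = {"total[1]": 0.014, "total[2]": 0.021}

# One "Effect" cell per model column
effect_row = {model: relative_effect(beta) for model, beta in coef_a.items()}
print(effect_row)
```

For small coefficients exp(a)-1 is close to a itself, which is why the Effect row in the example looks like the coefficient expressed in percent.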

Gremlin, combine two queries and join data

I have a problem making a query for the following case:
    +--------------------hasManager-------------------+
    |                        |                        |
    |      property:isPersonalMngr=true (bool)        |
    |                                                 v
[ Employee ]-- hasShift -->[ Shift ]-- hasManager -->[ Manager ]
    |                          |             |
    |                          |   property:isPersonalMngr=false (bool)
    |                          |
    |                 property:name (text)
    |
property:baseShift (bool)
For a manager 'John', who manages shifts and can also be the personal manager of an employee, I want to return all the employees he manages, with the list of shifts for each employee. Each employee has a 'baseShift' (say, 'night' or 'day') and scheduled shifts (e.g. 'wed123').
Eg:
[ 'Employee1', [ 'night', 'wed123', 'sat123' ]]
[ 'Employee2', [ 'day', 'mon123', 'tue123' ]]
For the shift employees I have this:
g.V('John').in('hasManager').in('hasShift').hasLabel('Employee')
For the personally managed employees I have this:
g.V('John').in('hasManager').hasLabel('Employee')
How do I combine these two AND add the name property of the shift in a list?
Thanks.
To test this, I created the following graph. Hope this fits your data model from above:
g.addV('Manager').property(id,'John').as('john').
addV('Manager').property(id,'Terry').as('terry').
addV('Manager').property(id,'Sally').as('sally').
addV('Employee').property(id,'Tom').as('tom').
addV('Employee').property(id,'Tim').as('tim').
addV('Employee').property(id,'Lisa').as('lisa').
addV('Employee').property(id,'Sue').as('sue').
addV('Employee').property(id,'Chris').as('chris').
addV('Employee').property(id,'Bob').as('bob').
addV('Shift').property('name','mon123').as('mon123').
addV('Shift').property('name','tues123').as('tues123').
addV('Shift').property('name','sat123').as('sat123').
addV('Shift').property('name','wed123').as('wed123').
addE('hasManager').from('tom').to('john').property('isPersonalMngr',true).
addE('hasManager').from('tim').to('john').property('isPersonalMngr',true).
addE('hasManager').from('lisa').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('sue').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('chris').to('sally').property('isPersonalMngr',true).
addE('hasManager').from('bob').to('sally').property('isPersonalMngr',true).
addE('hasShift').from('tom').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('tues123').property('baseShift','night').
addE('hasShift').from('lisa').to('wed123').property('baseShift','night').
addE('hasShift').from('sue').to('sat123').property('baseShift','night').
addE('hasShift').from('chris').to('wed123').property('baseShift','day').
addE('hasShift').from('bob').to('sat123').property('baseShift','day').
addE('hasShift').from('bob').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('wed123').property('baseShift','day').
addE('hasManager').from('mon123').to('terry').property('isPersonalMngr',false).
addE('hasManager').from('tues123').to('sally').property('isPersonalMngr',false).
addE('hasManager').from('wed123').to('john').property('isPersonalMngr',false).
addE('hasManager').from('sat123').to('terry').property('isPersonalMngr',false)
From this, the following query generates output in the format that you're looking for:
gremlin> g.V('John').
union(
inE('hasManager').has('isPersonalMngr',true).outV(),
inE('hasManager').has('isPersonalMngr',false).outV().in('hasShift')).
dedup().
map(union(id(),out('hasShift').values('name').fold()).fold())
==>[Tom,[mon123]]
==>[Tim,[tues123,wed123]]
==>[Lisa,[wed123]]
==>[Chris,[wed123]]
A note on your data model - you could likely simplify things by having two different types of edges for hasManager and that would remove the need for a boolean property on those edges. Instead, you could have hasOrgManager and hasShiftManager edges and that would remove the need for the property checks when traversing those edges.
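If it helps to see what the traversal is doing step by step, here is a rough plain-Python model of it. This is only a sketch, not Gremlin: the edge lists are copied from the sample graph above, shift vertices are identified by their name (the real graph gives them generated ids), and employees_with_shifts is a made-up helper name.

```python
# Edge lists mirroring the sample graph: (from, to, isPersonalMngr)
has_manager = [
    ("Tom", "John", True), ("Tim", "John", True),
    ("Lisa", "Terry", True), ("Sue", "Terry", True),
    ("Chris", "Sally", True), ("Bob", "Sally", True),
    ("mon123", "Terry", False), ("tues123", "Sally", False),
    ("wed123", "John", False), ("sat123", "Terry", False),
]
# (employee, shift name)
has_shift = [
    ("Tom", "mon123"), ("Tim", "tues123"), ("Lisa", "wed123"),
    ("Sue", "sat123"), ("Chris", "wed123"), ("Bob", "sat123"),
    ("Bob", "mon123"), ("Tim", "wed123"),
]


def employees_with_shifts(manager):
    # Branch 1 of the union: sources of personal hasManager edges
    personal = [src for src, dst, is_personal in has_manager
                if dst == manager and is_personal]
    # Branch 2: shifts the manager runs, then the employees on those shifts
    shifts = [src for src, dst, is_personal in has_manager
              if dst == manager and not is_personal]
    on_shift = [emp for shift in shifts
                for emp, name in has_shift if name == shift]
    # dedup(): keep the first occurrence of each employee, preserving order
    employees = list(dict.fromkeys(personal + on_shift))
    # map(...): pair each employee with the folded list of shift names
    return [[emp, [name for e, name in has_shift if e == emp]]
            for emp in employees]


result = employees_with_shifts("John")
print(result)
```

The dedup step matters because Tim is reached through both branches: he is personally managed by John and also works the wed123 shift that John manages.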

count after groupby in Google Dataflow

I have the following in my Google cloud storage
Advertiser | Event
__________________
100 | Click
101 | Impression
100 | Impression
100 | Impression
101 | Impression
My output of the pipeline should be something like
Advertiser | Count
100 | 3
101 | 2
First I used groupByKey, the output is like
100 Click, Impression, Impression
101 Impression, Impression
How to proceed from here?
Instead of a GroupByKey, you may want to use a combine transform: a composite that pre-aggregates values before the group-by-key step and finishes the aggregation after it. Your pipeline can look something like this:
Python
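import apache_beam as beam

# (advertiser, event) pairs standing in for the file contents
collection_contents = [(100, 'Click'),
                       (101, 'Impression'),
                       (100, 'Impression'),
                       (100, 'Impression'),
                       (101, 'Impression')]

with beam.Pipeline() as pipeline:
    input_collection = pipeline | beam.Create(collection_contents)
    # Count.PerKey emits one (key, count) pair per distinct key
    counts = input_collection | beam.combiners.Count.PerKey()
This should output a collection with the shape you are looking for. The Count family of transforms lives in the apache_beam.transforms.combiners module (also reachable as beam.combiners).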
Java
The same transforms exist for Java in the org.apache.beam.sdk.transforms package:
PCollection<KV<Integer, Long>> resultColl = inputColl.apply(Count.perKey());
This counting pattern is described in the word count sample of Apache Beam.
See wordcount.py among the Apache Beam samples on GitHub. The counting starts at line 95.
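To make the "how to proceed from here" part concrete, here is a plain-Python sketch (no Beam involved, just the standard library) of the two routes on the sample data: counting the size of each group after grouping, and counting per key directly, which is the result a per-key count transform is meant to produce.

```python
from collections import Counter, defaultdict

events = [(100, "Click"), (101, "Impression"), (100, "Impression"),
          (100, "Impression"), (101, "Impression")]

# Route 1: group first (like the GroupByKey output in the question),
# then take the size of each group
grouped = defaultdict(list)
for advertiser, event in events:
    grouped[advertiser].append(event)
counts_from_groups = {advertiser: len(evts) for advertiser, evts in grouped.items()}

# Route 2: count per key directly, without building the groups at all
counts_direct = Counter(advertiser for advertiser, _ in events)

print(counts_from_groups)
print(dict(counts_direct))
```

Route 2 never holds the full list of events per advertiser in memory, which is the same reason the combine-based version is preferable to group-then-count in a distributed pipeline.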

Calculate hierarchical labels for Google Sheets using native functions

Using Google Sheets, I want to automatically number rows like so:
(screenshot of the numbered sheet)
The key is that I want this to use built-in functions only.
I have an implementation working where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way that custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
 * Calculate the Work Breakdown Structure id for this row.
 *
 * @param {range} priorIds IDs that precede this one.
 * @param {range} names The names for this row.
 * @return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
 * @customfunction
 */
function WBS_ID(priorIds,names){
  if (Array.isArray(names[0])) names = names[0];
  if (!names.join("")) return "";
  var lastId,pieces=[];
  for (var i=priorIds.length;i-- && !lastId;) lastId=priorIds[i][0];
  if (lastId) pieces = (lastId+"").split('.').map(function(s){ return s*1 });
  for (var i=0;i<names.length;i++){
    if (names[i]){
      var s = pieces.concat();
      pieces.length=i+1;
      pieces[i] = (pieces[i]||0) + 1;
      return pieces.join(".");
    }
  }
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this (where the ID is calculated deterministically from ROW()) is acceptable, and possibly even desirable.
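To pin down the numbering rule independently of Apps Script, here is a rough Python port of the function. It is a sketch with two deliberate simplifications: prior_ids is a flat list rather than a column range, and a skipped level is padded with 0 (the JavaScript version leaves a hole there instead). The name wbs_id is just a Python-style spelling of WBS_ID.

```python
def wbs_id(prior_ids, names):
    """Return the WBS id for a row, given the ids above it and its name cells."""
    if not "".join(names):
        return ""  # blank row: no id, and it does not advance the numbering
    # Most recent non-empty id above this row
    last_id = next((p for p in reversed(prior_ids) if p), "")
    pieces = [int(s) for s in str(last_id).split(".")] if last_id else []
    # Depth is the index of the first non-empty name column
    depth = next(i for i, name in enumerate(names) if name)
    # Drop deeper levels; pad with 0 if a level was skipped (assumption)
    pieces = pieces[:depth + 1] + [0] * (depth + 1 - len(pieces))
    pieces[depth] += 1
    return ".".join(str(p) for p in pieces)


print(wbs_id(["1", "1.1", "", "1.2"], ["", "", "Baz"]))
```

The blank entry in the list of prior ids is skipped when looking for the last id, which is how blank rows end up ignored in the numbering.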
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out to me as making possible something that I didn't think of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().
Lol, this is hilariously the longest (and very likely the most unnecessarily complicated) way to combine formulas, but I thought it was interesting that it does in fact work. Just put a 1 in the first row, then in the second row add:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
In my defense, I've had a very long day at work; complicating what should be a simple thing seems to be my thing today :)
Foreword
Spreadsheet built-in functions don't include an equivalent of JavaScript's .map. The alternative is to use the spreadsheet's array-handling features and iteration patterns.
A "complete solution" could use built-in functions to automatically transform the user input into a simple table and return the Work Breakdown Structure (WBS) number. Some people call transforming the user input into a simple table "normalization", but including it would make this post too long for the Stack Overflow format, so the post focuses on a short formula to obtain the WBS.
It's worth saying that using formulas to transform large data sets into a simple table as part of the continuous spreadsheet recalculation (here, of the WBS) will make the spreadsheet slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root-level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Leave it blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
|   | A            | B  | C         |
+---+--------------+----+-----------+
| 1 | Task         | ID | Parent ID |
| 2 | General task | 1  |           |
| 3 | Subtask 1    | 2  | 1         |
| 4 | Subtask 2    | 3  | 1         |
| 5 | Subsubtask 1 | 4  | 2         |
| 6 | Subsubtask 2 | 5  | 2         |
+---+--------------+----+-----------+
Remark: This could also help reduce the processing time required by a custom function.
Second, add the formula below to D2, then fill down as needed:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+----------+
|   | A            | B  | C         | D        |
+---+--------------+----+-----------+----------+
| 1 | Task         | ID | Parent ID | WBS      |
| 2 | General task | 1  |           | 1        |
| 3 | Subtask 1    | 2  | 1         | 1.1      |
| 4 | Subtask 2    | 3  | 1         | 1.2      |
| 5 | Subsubtask 1 | 4  | 2         | 1.1.1    |
| 6 | Subsubtask 2 | 5  | 2         | 1.1.2    |
+---+--------------+----+-----------+----------+
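The same rule can be written outside the spreadsheet as a quick check. This Python sketch assumes rows arrive in sheet order, each parent appears before its children, and root IDs are ascending (so counting roots in order matches the RANK part of the formula); wbs_from_parents is a made-up name.

```python
def wbs_from_parents(rows):
    """rows: (task_id, parent_id or None) pairs in sheet order -> {task_id: wbs}."""
    wbs = {}
    child_count = {}  # parent_id -> children seen so far (the COUNTIF part)
    for task_id, parent_id in rows:
        child_count[parent_id] = child_count.get(parent_id, 0) + 1
        position = str(child_count[parent_id])
        if parent_id is None:
            # Root task: numbered among the other roots
            wbs[task_id] = position
        else:
            # Child task: parent's WBS (the INDEX/MATCH part) plus its position
            wbs[task_id] = wbs[parent_id] + "." + position
    return wbs


# ID / Parent ID columns from the table above
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]
result = wbs_from_parents(rows)
print(result)
```

Each row only needs its parent's already-computed WBS and a running count of siblings, which is why the table layout keeps the formula short.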
Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID plus 1 (or 1 if there is none)
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""
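For anyone who wants to check the branches without a spreadsheet, here is the same logic as a small Python sketch. The level argument (1, 2 or 3) stands in for "which of columns B, C, D has a value", and next_id is a made-up name.

```python
def next_id(prev_id, level):
    """Return the id that follows prev_id for a row whose name is at level 1, 2 or 3."""
    parts = [int(p) for p in str(prev_id).split(".")]
    first = parts[0]
    if level == 1:
        # Value in the 1st column: bump the 1st part, drop the rest
        return str(first + 1)
    if level == 2:
        # Value in the 2nd column: keep the 1st part, bump the 2nd (or start from 0)
        second = parts[1] if len(parts) > 1 else 0
        return f"{first}.{second + 1}"
    # Otherwise 3rd column: keep the 1st and 2nd parts (2nd defaults to 1),
    # bump the 3rd part (or start at 1)
    second = parts[1] if len(parts) > 1 else 1
    third = parts[2] + 1 if len(parts) > 2 else 1
    return f"{first}.{second}.{third}"


# Walk a small outline: the first id "1" is typed by hand, as in the answer
ids = ["1"]
for level in [2, 3, 3, 2, 1]:
    ids.append(next_id(ids[-1], level))
print(ids)
```

Because each id depends only on the id directly above it, a blank row would break the chain, which is why this approach does not allow blank lines between items.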

Associate scatter plot data with per-item labels

I have some data in a Google Sheets table, formatted like so:
Label | ValueA | ValueB
------+--------+-------
A | 1 | 1
B | 1 | 2
A | 3 | 3
B | 2 | 4
C | 9 | 1
I would like to render a scatterplot, with a single colored point for each entry, in which everything with an A label is color 1, everything with a B label is color 2, and so on, and they all share the same coordinate space.
I've poked around quite a bit in the options available in the UI, but nothing seems to do it. Multi color plots can be made, but they never associate the labels the way I want them to.
I guess this will take some scripting to do, but I really don't know where to start.
Maybe try a bubble chart instead? (chart screenshot)
I suspect what you really want may be something like this (chart screenshot), but the logic of the data layout that seems to be required to achieve it escapes me.
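One layout that may produce this, assuming the chart treats every column after the first as its own series and gives each series one color, is to spread ValueB across one column per label and leave the other cells blank. A Python sketch of that reshaping on the sample data (the variable names are made up, and this only builds the table, not the chart):

```python
# (Label, ValueA, ValueB) rows from the question
rows = [("A", 1, 1), ("B", 1, 2), ("A", 3, 3), ("B", 2, 4), ("C", 9, 1)]

# One series column per distinct label
labels = sorted({label for label, _, _ in rows})
header = ["ValueA"] + labels

# One output row per point: the x value, then y in its own label's column
# and blanks in the others
wide = [[x] + [y if label == col else "" for col in labels]
        for label, x, y in rows]

print(header)
for row in wide:
    print(row)
```

Pasting the resulting table into the sheet and charting it as a scatter plot should then give one color per label, if the series assumption holds.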
