Deleting Value Labels in SPSS - spss

I imported data from SurveyMonkey into SPSS, and SurveyMonkey automatically assigns values and value labels. My values and labels currently look something like this:
1 "Married"
2 "Single"
3 "777"
4 "999"
I recoded the variables so that 3=777 and 4=999. Then I set 777 and 999 to missing. I then used ADD VALUE LABELS to add 777 = "Refused" and 999 = "Don't know". How do I use syntax to delete the values and value labels for 3 and 4? These are no longer valid since I recoded values 3 and 4. I know I can use VALUE LABELS to replace all my values and labels, but I would have to specify all my categories, which would be tedious. Ideally I would want to recode the 3 and 4 values, add value labels for the new 777 and 999 values, and delete the old 3 and 4. If I only had a few variables I would consider doing it a different way, but I want to write syntax that I can use for a list of about 100 variables. I will also be pulling data from SurveyMonkey on a weekly basis and would like to have the syntax file to rename, recode, and add value labels ready to go each time I pull the data.

I don't believe there is a way to delete the value labels for specific values only. So the workaround is to explicitly redefine the labels for the entire set of values:
DATA LIST FREE / MS.
BEGIN DATA
1 2 3 4
END DATA.
/* 1. Original value labels */.
VALUE LABELS MS 1 "Married" 2 "Single" 3 "777" 4 "999".
CTABLES /TABLE MS[C].
/* 2. Recode values and re-label. Note: 3 and 4 still have labels defined (now blank), so CTABLES still registers them as categories */.
RECODE MS (3=777) (4=999).
ADD VALUE LABELS MS 3 "" 4 "" 777 "Refused" 999 "Unknown".
CTABLES /TABLE MS[C].
/* 3. Workaround: explicitly assign the entire set of value labels */.
VALUE LABELS MS 1 "Married" 2 "Single" 777 "Refused" 999 "Unknown".
CTABLES /TABLE MS[C].
Update:
Well, nothing is impossible in the realm of computing. Raynald Levesque outlines a workaround here, and Ruben Geert van den Berg also provides a Python solution on his website.
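For a list of about 100 variables, the VALUE LABELS workaround from step 3 could be generated rather than typed by hand. The sketch below is only an illustration in plain Python (it does not use the spss module); the function name relabel_syntax and its parameters are hypothetical, and it assumes every variable passed in shares the same categories.

```python
def relabel_syntax(variables, labels, recodes, missing):
    """Build SPSS syntax that recodes values and resets the full label set.

    variables: list of variable names.
    labels:    dict of value -> label AFTER recoding (old values left out).
    recodes:   dict of old value -> new value.
    missing:   list of values to declare as user-missing.
    """
    names = " ".join(variables)
    recode_part = " ".join(
        "({}={})".format(old, new) for old, new in sorted(recodes.items())
    )
    # Double any embedded double quote so the label stays a valid SPSS string.
    label_part = " ".join(
        '{} "{}"'.format(value, text.replace('"', '""'))
        for value, text in sorted(labels.items())
    )
    missing_part = ", ".join(str(value) for value in missing)
    return "\n".join([
        "RECODE {} {}.".format(names, recode_part),
        "MISSING VALUES {} ({}).".format(names, missing_part),
        # VALUE LABELS (unlike ADD VALUE LABELS) drops labels not listed here.
        "VALUE LABELS {} {}.".format(names, label_part),
    ])


syntax = relabel_syntax(
    variables=["MS"],
    labels={1: "Married", 2: "Single", 777: "Refused", 999: "Don't know"},
    recodes={3: 777, 4: 999},
    missing=[777, 999],
)
print(syntax)
```

The printed text can be saved to a .sps file and run each week; variables with different categories would need their own call with their own labels dict.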

This can be done with a Python BEGIN PROGRAM ... END PROGRAM block inside SPSS syntax:
DATA LIST FREE / MS (F1.0).
BEGIN DATA
END DATA.
VALUE LABELS MS 1 "Married" 2 "Single" 3 "777" 4 "999".
ADD VALUE LABELS MS 777 "Refused" 999 "Don't know".
BEGIN PROGRAM.
# Python 2 syntax (print statement).
import spss
qst='MS'
values=[3,4]
with spss.DataStep():
    datasetObj=spss.Dataset()
    varObj = datasetObj.varlist[qst]
    valObj=varObj.valueLabels
    print 'Before:',valObj
    for i in values:
        try:
            # Remove the label for this value only.
            del valObj[i]
        except Exception:
            # The value has no label; skip it.
            continue
    print 'After:',valObj
END PROGRAM.
Output Log:
Before: {1.0: 'Married', 2.0: 'Single', 3.0: '777', 4.0: '999', 999.0: "Don't know", 777.0: 'Refused'}
After: {1.0: 'Married', 2.0: 'Single', 777.0: 'Refused', 999.0: "Don't know"}

Related

Tableau question: How to link a reference table to a dynamic calculated field value (which is an integer)? I'm assigning P values

Since Tableau does not have a function for p-values (correct me if I'm wrong here), I created a spreadsheet with all possible sample sizes under two different alphas/significance levels, and I need to connect the appropriate p-value to a calculated field from the main database source (an aggregate count of people). I assumed I could easily match numbers with a condition to bring back the p-value in a calculated field, yet I'm hitting a brick wall. The biggest issue seems to be that the field I want to join the p-value reference table to is an aggregated integer. Also, I do not have any extensions, and my end result needs to be an integer, not a graph.
Any secret tricks here?
Seems I cannot blend the reference table in nor join it to an aggregate?
Thanks!
I found a workaround by calculating the critical value for a two-tailed t-test in Tableau. However, I didn't figure out how to join on an aggregated calculated field. Workaround: I used a conditional statement, copying and pasting about 100 critical values, keyed on (sample size - 2), i.e. the degrees of freedom, into a calculated field. To save time, use Excel to fill the conditions down to 120. Worked like a charm!
Here is the conditional logic for alpha = .2 (80% confidence) in a two-tailed t-test (replace the // line with about 117 rows):
IF [degrees of freedom] = 1 THEN 3.08
ELSEIF [degrees of freedom] = 2 THEN 1.89
ELSEIF [degrees of freedom] = 3 THEN 1.64
// ELSEIF [degrees of freedom] = ... THEN ...   (continue down to 120)
ELSEIF [degrees of freedom] > 120 THEN 1.28
END
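Instead of filling the rows down in Excel, the calculated-field text could also be generated from a small lookup table. This is a minimal sketch: the function name tableau_critical_value_field is hypothetical, and the table holds only the three critical values quoted above (the remaining rows would come from a t-table).

```python
def tableau_critical_value_field(critical_values, tail_value,
                                 field="[degrees of freedom]"):
    """Return a Tableau IF/ELSEIF expression from a df -> critical value table."""
    dfs = sorted(critical_values)
    lines = []
    for position, df in enumerate(dfs):
        keyword = "IF" if position == 0 else "ELSEIF"
        lines.append("{} {} = {} THEN {}".format(
            keyword, field, df, critical_values[df]))
    # Anything above the largest tabulated df falls back to the large-sample value.
    lines.append("ELSEIF {} > {} THEN {}".format(field, dfs[-1], tail_value))
    lines.append("END")
    return "\n".join(lines)


# Only the values quoted in the answer; extend the dict to cover 4 through 120.
table = {1: 3.08, 2: 1.89, 3: 1.64}
expression = tableau_critical_value_field(table, tail_value=1.28)
print(expression)
```

Because the final branch is built from the largest key in the table, there is no gap between the last tabulated row and the fallback.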

Collect value that is below other values

I'm trying to figure out how to collect the value that is always in LINE 9 of texts with this same template:
Aposta
Sport: 11.718.177
Compartilhar
Feita por
Privado
em 25/06/2021 às 10:04
Vitória
10:04 25/06/2021
Katerina Siniakova - Sorribes Tormo, Sara
2nd set jogo 6 - vencedor
Vitória
Katerina Siniakova
1,30
2-0
In this case, the value of LINE 9 is:
Vitória
I tried to use:
=TRANSPOSE(SPLIT(A1,"
"))
And after creating a column with the separate values, I tried using QUERY to remove the first lines of text and using LIMIT 9 to keep only the value of ROW 9, but QUERY joins the values from other lines and ends up giving a wrong value.
Note: I will need to use it to analyze texts like this on several different lines in Column A, so I should look for an option that can also be used as ARRAY so I don't need to put a different formula on each line.
This will give you the 9th column of an array split on line breaks (CHAR(10), the line feed character), for every row in A2:A:
=INDEX(SPLIT(A2:A,CHAR(10),0,0),,9)
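As a rough illustration of what the formula does, the same logic can be sketched in Python. The helper name nth_line and the sample cells are made up for the example; the point is that empty pieces are kept (as with the two 0 arguments to SPLIT), so blank lines still count toward the position.

```python
def nth_line(text, n):
    """Return the n-th line (1-based) of text, counting empty lines too."""
    parts = text.split("\n")  # empty pieces are kept, so positions stay stable
    return parts[n - 1] if 0 < n <= len(parts) else ""


# Hypothetical cell contents: the first has a blank third line.
cells = ["a\nb\n\nd", "x\ny"]
fourth_lines = [nth_line(cell, 4) for cell in cells]
print(fourth_lines)
```

A cell with fewer lines than requested yields an empty string, similar to the empty cell the spreadsheet formula leaves for short texts.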

SPSS set labels equal to value in column

As the title says.
What I'm trying to do is find a way to set the value labels of one column equal to the values in another column.
A B
1 Car
2 Bike
3 Van
1 Car
3 Van
Column A contains the numeric values. Column B contains the labels.
I want to tell SPSS to take the value 1 and assign it the label "Car" (and so on), as is classically done manually with:
VALUE LABELS A
1 "Car"
2 "Bike"
3 "Van".
Execute.
The syntax below will automatically create a new syntax that adds the value labels as you described.
Before starting, I'm recreating the sample data you posted to demonstrate on:
data list list/A (f1) B (a10).
begin data
1 "Car"
2 "Bike"
3 "Van"
1 "Car"
3 "Van"
end data.
dataset name orig.
Now we get to work:
* first we aggregate the data to get only one line for every value/label pair.
dataset declare agg.
aggregate out=agg /break A B /nn=n.
dataset activate agg.
* now we use the data to create a syntax file with the value label commands.
string cat (a50).
compute cat=concat('"', B, '"').
write out="yourpath\my labels syntax.sps" /"add value labels A ", A, cat, ".".
execute.
* getting back to the original data we can now execute the syntax.
dataset activate orig.
insert file="yourpath\my labels syntax.sps".
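The same idea (collapse the data to unique value/label pairs, then emit one command per pair) can be sketched outside SPSS as well. This is only an illustration in plain Python; the function name value_label_commands is hypothetical and the rows are the sample data from the question.

```python
def value_label_commands(rows, variable="A"):
    """Return one ADD VALUE LABELS command per unique (value, label) pair."""
    # Strip stray whitespace so " Car" and "Car" count as the same label.
    unique_pairs = sorted({(value, label.strip()) for value, label in rows})
    return [
        # Double embedded double quotes to keep the label a valid SPSS string.
        'ADD VALUE LABELS {} {} "{}".'.format(
            variable, value, label.replace('"', '""'))
        for value, label in unique_pairs
    ]


rows = [(1, "Car"), (2, "Bike"), (3, "Van"), (1, "Car"), (3, "Van")]
commands = value_label_commands(rows)
print("\n".join(commands))
```

The resulting lines could be written to a .sps file and run with INSERT, as in the syntax above.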

Gephi - generate graph using matrix data

Can you help me visualise an undirected graph?
I have about 500 strings that look like this:
;javascript;java;tapestry;d;design;jquery;css;html;com;air;testing;events;crm;soa;documentation;.a;email;iso;dynamic;mobile;this;project;resolution;s;automation;web;like;e-commerce;profile;commerce;out;jobs;inventory;operators;environment;system;include;integration;relationship;field;implementation;key;.profile;planning;knockout.js;sun;packaging;collaboration;report;public;virtual;communication;send;state;member;execution;solution;provider;members;continuous;writing;e;cuba;required;transactional;subject;manual;capacity;portfolio;.so;leader;take
;c;python;java;.a;basic;equivalent;cad;requirements;catia;.x;nx;self;communication;selected;base;summary
;javascript;c;python;java;rest;android;security;linux;sql;git;design;perl;css;html;svn;yaml;architecture;ios;json;api;ubuntu;pyramid;deployment;bash;documentation;configuration;frameworks;module;object;.a;multitasking;centos;hosting;project;fluent;administrator;monitoring;control;specifications;web;version;platform;admin;components;out;minimum;environment;system;include;using;key;falcon;communication;migrate;deadlines;ansible;back;cycle;production;red;analysis;administration;graphic;maintenance;autonomy;french;required;environments;hat;lead;arch;take
and what I would like to do with them is calculate and visualise the edges between the shared elements of the strings. For example, if javascript and python both appear in the first two strings, then the edge between them would get thicker with every matching occurrence across all strings in the final graph.
What I've done so far is parse the strings and encode each one as a row of a 1/0 matrix, with the string names as column names (in a csv file), but that did not seem to work because I don't know whether Gephi can read the column names as labels.
        javascript   java   tapestry
---------------------------------------
Row 1   1            0      1
Row 2   0            1      0
Row 3   1            1      1
So I transposed the matrix to get the strings all in a column, but those columns enumerated by number don't mean much to me.
name Col1 Col2 Col3 Col4
------- ------------------------
javascript 1 0 0 0 1
java 0 1 1 0 0
tapestry 1 0 1 0 1
I am thinking that a matrix multiplied by its transpose might help, although I'm not sure how to interpret the result.
You wrote: "what I would like to do with them is calculate and visualise the edges between the shared elements of the strings."
Gephi only visualizes what is created manually or selected for import, so the edges have to be calculated beforehand.
You asked: "Can you help me visualise an undirected graph?"
Graph as .csv
Turning associations into a static graph (as a Gephi-compatible .csv file):
Nodes
List unique names, save to .csv like:
id,label
0,"node #1"
1,"node #2"
2,"node #3"
Optionally, add additional columns as required.
Edges
Enumerate associations, add to weight for each occurrence, save to .csv, like:
id,source,target,label,weight
0,0,1,"edge #1",1
1,0,2,"edge #2",3
Alternatively, create a new edge per association occurrence and merge them after import.
Do not create edges for pairs with a weight of 0 (no connection).
The source and target columns reference node ids.
The label column is optional.
The file may contain a type column (with the value Directed or Undirected).
Optionally, convert weight to a 0.0 - 1.0 scale (as opposed to a count) by recalculating it as:
weight = weight / highest_weight
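The node and edge lists described above can be built directly from the semicolon-separated strings. The sketch below is one possible way to do it, assuming each string counts a tag at most once; the function name build_graph_csv and the shortened sample strings are made up for the example. For a 1/0 matrix M with one row per string, the pair counts it produces are the off-diagonal entries of M transposed times M.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_graph_csv(lines):
    """Return (nodes_csv, edges_csv) text for tag strings like ';java;css'."""
    pair_counts = Counter()
    labels = set()
    for line in lines:
        # Drop the empty piece before the leading ';' and any duplicates.
        tags = sorted({tag for tag in line.split(";") if tag})
        labels.update(tags)
        # Each unordered pair sharing a string adds 1 to that edge's weight.
        pair_counts.update(combinations(tags, 2))

    node_ids = {label: index for index, label in enumerate(sorted(labels))}

    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["id", "label"])
    for label in sorted(node_ids):
        writer.writerow([node_ids[label], label])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["id", "source", "target", "weight"])
    for edge_id, ((a, b), weight) in enumerate(sorted(pair_counts.items())):
        writer.writerow([edge_id, node_ids[a], node_ids[b], weight])
    return nodes.getvalue(), edges.getvalue()


# Shortened, hypothetical versions of the strings in the question.
sample = [";javascript;java;tapestry", ";c;python;java", ";javascript;c;python;java"]
nodes_csv, edges_csv = build_graph_csv(sample)
print(nodes_csv)
print(edges_csv)
```

Pairs that never co-occur are simply absent from the counter, so no zero-weight edges are written.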

Count selected elements for each line and create an arrayformula that groups by number of counts

We have asked users:
What to do with the money?
[ ] paint the bridge
[ ] rebuild the school
[ ] keep the money
[ ] Other : [____________________]
Here is the spreadsheet with their answers:
A B
1 Name Choices
2 Lilia paint the bridge, rebuild the school, keep the money
3 Paul rebuild the school, paint the bridge, do something else
4 Margerite keep the money, I don't know, do what you want
5 John paint the bridge
...
800
I want a formula that outputs the number of official choices (excluding Other) picked per user.
With the first 4 rows of data, the formula would output this table:
D                            E
Nbr of choices a user made   Frequency (Nbr of users who made these choices)
0                            0
1                            2
2                            1
3                            1
Couldn't find a way to get this right with a single formula. For a start, I wanted to split each line (of B2:B) by ",", but couldn't find a way to apply a function (SPLIT) to each line in a formula...
Even with 800 rows of data (B2:B), the resulting table (D2:E5) would always be 4 rows long plus titles (and two columns wide).
I could do this in C2, and replicate manually with the "+" corner icon...
=countif(B2;"*rebuild the school*")+countif(B2;"*keep the money*")+countif(B2;"*paint the bridge*")
And then do in E2:
=arrayformula(countif(C2:C;D2:D5))
But I'd like to generate the table of frequencies in one formula, without any manual action (without C column).
So I am looking for a way to "map" the first function to each row, put this in the second fn.
ANSWER by Akshin Jalilov EXPLAINED
This is the answer by Akshin Jalilov, but shorter (and with international notations)
=ARRAYFORMULA(COUNTIF(ARRAYFORMULA(IF(B2:B="";;COUNTIF(ARRAYFORMULA
(IFERROR(IF(FIND("paint the bridge";B2:B);Row(B2:B);0)));"="&row(B2:B))
+COUNTIF(ARRAYFORMULA(IFERROR(IF(FIND(
"rebuild the school";B2:B);Row(B2:B);0)));"="&row(B2:B))
+COUNTIF(ARRAYFORMULA(IFERROR(IF(FIND(
"keep the money";B2:B);Row(B2:B);0)));"="&row(B2:B))));"="&D2:D5))
Step1:
IF(FIND("rebuild the school";B2:B);Row(B2:B);0)
This means: for each row of B2:B, look for "rebuild the school". If it is found, return the row number; the 0 is meant for the not-found case, although FIND actually returns a #VALUE! error there.
Step2:
=ARRAYFORMULA(IFERROR(Step1))
Wrap this in an ARRAYFORMULA so that you return the results for each row.
IFERROR is there to turn the #VALUE! error that FIND returns for rows without a match into an empty value, so the error does not stop the process.
Step3:
=ARRAYFORMULA(IF(B2:B="";;COUNTIF(ARRAYFORMULA(IFERROR(IF(FIND("paint the bridge";B2:B);Row(B2:B);0)));"="&row(B2:B))+countif(Step2)+countif(ARRAYFORMULA(IFERROR(IF(FIND("keep the money";B2:B);Row(B2:B);0)));"="&row(B2:B))))
This counts the valid votes made by each user. It is equivalent to the C2 formula in my manual process, but it is now part of a single global formula.
Step4:
Lastly, the rest of the formula counts the frequency of each possible vote count.
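The two stages (count the official choices per row, then count how many rows have each count) are easier to see outside the spreadsheet. Here is a minimal Python sketch of the same logic using the four sample rows from the question; the names OFFICIAL and choice_frequencies are made up for the example.

```python
from collections import Counter

OFFICIAL = ("paint the bridge", "rebuild the school", "keep the money")


def choice_frequencies(responses):
    """Map each possible number of official choices to the number of users."""
    # Stage 1: per user, count how many official options appear in the answer.
    per_user = [
        sum(option in answer for option in OFFICIAL)
        for answer in responses
        if answer  # skip empty rows, like IF(B2:B="";;...)
    ]
    # Stage 2: frequency of each count, including counts nobody reached.
    counts = Counter(per_user)
    return {n: counts.get(n, 0) for n in range(len(OFFICIAL) + 1)}


responses = [
    "paint the bridge, rebuild the school, keep the money",
    "rebuild the school, paint the bridge, do something else",
    "keep the money, I don't know, do what you want",
    "paint the bridge",
]
frequencies = choice_frequencies(responses)
print(frequencies)  # {0: 0, 1: 2, 2: 1, 3: 1}
```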
I know this formula is large but this is the closest I got to what you want.
Now to make it easy, name your responses range "Responses". I assume it is B2:B.
Here is the formula:
=ARRAYFORMULA(Countif(ARRAYFORMULA(IF(Responses="",,COUNTIF(VLOOKUP(row(Responses),({ARRAYFORMULA(Row(Responses)),ARRAYFORMULA(IFERROR(IF(FIND("paint the bridge",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("rebuild the school",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("keep the money",Responses),Row(Responses),0)))}),2),"="&row(Responses))+COUNTIF(VLOOKUP(row(Responses),({ARRAYFORMULA(Row(Responses)),ARRAYFORMULA(IFERROR(IF(FIND("paint the bridge",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("rebuild the school",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("keep the money",Responses),Row(Responses),0)))}),3),"="&row(Responses))+COUNTIF(VLOOKUP(row(Responses),({ARRAYFORMULA(Row(Responses)),ARRAYFORMULA(IFERROR(IF(FIND("paint the bridge",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("rebuild the school",Responses),Row(Responses),0))),ARRAYFORMULA(IFERROR(IF(FIND("keep the money",Responses),Row(Responses),0)))}),4),"="&row(Responses)))),"="&D2:D5))
Here is an example of how it works. I am not sure which one exactly you wanted, so I added both.
