Error Adding Variables in SPSS - spss

I am using Data > Merge Files > Add Variables in SPSS. The two .sav files both contain a variable called "Student_No", which is numeric with the same width in each file. I am using this as the key variable on which to match cases. I am not indicating that cases are not sorted. It makes no difference whether I indicate that the active or the non-active data set is keyed. In either case the new variables are not properly matched with the cases.
What are some of the potential problems that might be causing this mismatch?

The dialog box pastes STAR JOIN syntax in some cases and MATCH FILES in others. There were some problems with STAR JOIN in older versions of Statistics, so you might need to use MATCH FILES instead. See the Command Syntax Reference for that command on how to do this.
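As a rough illustration of the MATCH FILES route, the sketch below uses Python only to assemble the syntax text. The file name other.sav is a placeholder, and the sketch assumes the second file was already saved in ascending Student_No order, since MATCH FILES with BY expects every input to be sorted on the key.

```python
def match_files_syntax(other_file: str, key: str) -> str:
    """Build SPSS syntax that adds variables from other_file to the active dataset."""
    lines = [
        f"SORT CASES BY {key}.",    # sort the active dataset on the key
        "MATCH FILES",
        "  /FILE=*",                # * refers to the active dataset
        f"  /FILE='{other_file}'",  # assumption: this file is already sorted by the key
        f"  /BY {key}.",
        "EXECUTE.",
    ]
    return "\n".join(lines)


# other.sav is a hypothetical file name used only for illustration.
print(match_files_syntax("other.sav", "Student_No"))
```

If the second file is not sorted, it would have to be opened, sorted with SORT CASES, and saved before the merge.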


Check values existence using spss syntax

I need to check whether values exist based on some conditions.
For example, I have 3 variables, varA, varB and varC. varC should be non-empty only if varA>varB (the condition).
I normally use syntax like the following to flag problems in a check variable and then run a frequency on it to see if there are errors:
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
exe.
fre ck_varC.
exe.
I had some errors when the condition became complex and when the condition contained missing() or other functions, but I could have made a mistake.
Do you think there is an easier way of doing these checks?
Thanks in advance.
EDIT: here is an example of what I mean. Think of a questionnaire with some routing: you ask everyone their age; if they are between 17 and 44, you ask them if they work; if they work, you ask them how many hours.
I have an Excel tool where I list all variables with all conditions; it then generates syntax like the example above, with the same structure for every variable, covering both situations: a value is present that shouldn't be there, or a value is missing that should be there.
Is there an easier way of doing that? Is this structure always valid no matter what the condition is?
In SPSS, missing values are not numbers, so a comparison that involves a missing value evaluates to missing rather than true or false, and the IF command is then skipped. You need to explicitly program those scenarios as well. You have varC covered (partially), but no scenario where varA or varB has missing data is covered.
(As good practice, you might initialize your check variable as sysmis or 0, using syntax):
numeric ck_varC (f1.0).
compute ck_varC=0.
if missing(varC) and (varA>varB) ck_varC=1.
if not(missing(varC)) and not(varA>varB) ck_varC=2.
***additional conditional scenarios go here:.
if missing(varA) or missing(varB) ck_varC=3.
...
fre ck_varC.
By the way - you do not need any of the exe. commands if you are going to run your syntax as a whole.
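To make the four outcomes of the check concrete, here is a minimal Python sketch of the same decision logic, with None standing in for an SPSS missing value. The function name ck_value is made up for this illustration; it is a model of the syntax above, not SPSS code.

```python
from typing import Optional


def ck_value(var_a: Optional[float], var_b: Optional[float],
             var_c: Optional[float]) -> int:
    """Mirror the check variable from the syntax above (None = missing)."""
    if var_a is None or var_b is None:
        return 3  # the condition itself cannot be evaluated
    should_have_value = var_a > var_b
    if var_c is None and should_have_value:
        return 1  # a value that should be there is missing
    if var_c is not None and not should_have_value:
        return 2  # a value is there that should not be
    return 0      # consistent with the routing


print(ck_value(2, 1, None), ck_value(1, 2, 5), ck_value(None, 2, 5), ck_value(2, 1, 5))
```

Returning 3 first has the same effect as placing the missing(varA) or missing(varB) line last in the syntax, where it overwrites any earlier code.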
Later edit, after the poster updated the question:
Your syntax would be something like this. Note the use of the range function, which is not mandatory but might be useful for you in the future.
I am also assuming that work is a string variable, so its values need to be referenced using quotation marks.
if missing(age) ck_age=1.
if missing(work) and range(age,17,44) ck_work=1.
if missing(hours) and work="yes" ck_hours=1.
if not (missing (age)) and not(1>0) ck_age=2. /*this will never happen because of the not(1>0).
if not(missing(work)) and (not range(age,17,44)) ck_work=2. /*note that if age is missing, this ck_work won't be set here.
if not(missing(hours)) and (not(work="yes")) ck_hours=2.
EXECUTE.
String comparisons are case sensitive.
There is no system-missing equivalent for strings; an empty or blank string ("") is still a valid string value. not(work="yes") is therefore true when work is blank ("").
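The Excel tool mentioned in the question could also be sketched as a few lines of Python that emit the same pair of IF commands for every routed variable. The function name and the (variable, condition) input format are assumptions made for this sketch, and the caveats above still apply: a missing value in the condition leaves the check unset, and missing() does not flag a blank string.

```python
def routing_check_syntax(rules):
    """rules: list of (variable, condition) pairs; returns SPSS syntax as text."""
    lines = []
    for var, condition in rules:
        # 1 = value is missing although the routing condition holds
        lines.append(f"if missing({var}) and ({condition}) ck_{var}=1.")
        # 2 = value is present although the routing condition does not hold
        lines.append(f"if not(missing({var})) and not({condition}) ck_{var}=2.")
    check_vars = " ".join(f"ck_{var}" for var, _ in rules)
    lines.append(f"fre {check_vars}.")
    return "\n".join(lines)


print(routing_check_syntax([
    ("work", "range(age,17,44)"),
    ("hours", 'work="yes"'),
]))
```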

How can I merge several files on SPSS by variable label?

I have 48 .sav data sets containing the results of a monthly survey. I need to merge the cases of all common variables from them in order to come up with a 4-year aggregate. As I'm new to SPSS and not very proficient with syntax (although I can follow it), I would normally do this using Data - Merge files - Add Cases, but most of these common variables have different variable names in each data set, as the questions are not always asked in the same order and some questions only appear in one or two data sets.
However, the variable labels do not change from one data set to another. It would be great if someone knows a way to merge these data sets by variable label instead of variable name. Swapping variable names and variable labels would also do, as then I could use Data - Merge files - Add Cases without problems.
Many thanks beforehand!
The merge procedures such as ADD FILES (Data > Merge Files > Add Cases) provide a capability to rename variables in the input files before merging. However, if there are a lot of variables to merge, this would get pretty tedious and error prone. Also, the dialog box supports only merging two files, while syntax allows up to 50.
Variable labels are generally not valid as variable names due to the typical presence of characters such as blanks and punctuation and length restrictions. If you have a rule that could be used to turn labels into valid variable names, that could be automated, or if the variables are always in the same order and are present in all the files, they could be renamed something like V1, V2, ...
The renaming could be done manually in syntax that you would craft for each file, or this could be done with a short Python program that you run on each file. I can write that for you if you provide details and, preferably, a sample dataset to test with (jkpeck AT gmail.com).
The Python code could loop over all the sav files in a directory and apply the renaming logic to each in one step.
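As a sketch of what such a label-to-name rule might look like, the Python below turns each label into a candidate variable name and builds one RENAME VARIABLES command. The rule itself (underscores for anything other than ASCII letters, digits and underscores, a v_ prefix when the name would not start with a letter, truncation to 64 characters) is an assumption for illustration. The mapping from current names to labels is assumed to have been read from each file already, which this sketch does not do.

```python
import re


def label_to_name(label: str, max_len: int = 64) -> str:
    """Turn a variable label into a candidate SPSS variable name (illustrative rule)."""
    name = re.sub(r"[^A-Za-z0-9_]+", "_", label.strip()).strip("_")
    if not name or not name[0].isalpha():
        name = "v_" + name  # names must start with a letter
    return name[:max_len].rstrip("_")


def rename_syntax(names_to_labels: dict) -> str:
    """Build one RENAME VARIABLES command from {current_name: label}."""
    old = list(names_to_labels)
    new = [label_to_name(names_to_labels[name]) for name in old]
    # SPSS variable names are not case sensitive, so compare in lower case.
    if len({name.lower() for name in new}) != len(new):
        raise ValueError("labels do not map to unique variable names")
    return f"RENAME VARIABLES ({' '.join(old)} = {' '.join(new)})."


# Hypothetical names and labels, for illustration only.
print(rename_syntax({"V3": "Age in years", "V7": "Hours worked"}))
```

Running the same rule over every file gives matching names wherever the labels match, after which ADD FILES can pair the variables by name. Labels that differ only in punctuation or accented characters would collide under this rule and raise the error.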

How can I set "999" as the DEFAULT missing value in SPSS/PASW?

I'm importing a very large dataset into SPSS. Many fields in the dataset contain a "999" value, indicating a missing value. I want to instruct SPSS to view them as such. However, by default each variable in SPSS is set to having "no missing values". In Variable View, you have to define "999" as a "discrete missing value" for each variable. With hundreds of variables, though, this is a lot of work.
Therefore: is there a way to define "discrete missing value 999" as the default missing value for each variable on import? This would save me a lot of work, but I cannot find the answer online (I only get tutorials on how to define 999 as the missing value for each variable separately, as I am doing now).
Your help would be greatly appreciated!
Edit: Now that I think about it, I can easily replace each "999" in the dataset with an empty cell. Aren't empty cells considered missing values by SPSS?
Syntax is your friend here as pointed out for the MISSING VALUES command. But you may have other metadata that is the same for many variables such as value labels or the measurement level. You can set those in syntax for multiple commands, but you might want to investigate the APPLY DICTIONARY command (Data > Copy Data Properties in the menus). Using it you can set up one variable with all the metadata to be shared and then apply all those specifications to a bunch of other variables.
I think you can change it for one variable, then copy that cell, select all the other cells in the Missing column (in Variable View) and paste into all of them at once.
(The ctrl-C, ctrl-V shortcuts might not work)
But yes, empty cells are read as missing too.
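For the syntax route, a single MISSING VALUES command can cover a whole list of variables. The sketch below uses Python only to assemble that command; the variable names are placeholders, and the list is assumed to contain numeric variables only, since a numeric code such as 999 cannot be declared missing for a string variable.

```python
def missing_values_syntax(variables, code=999):
    """Build one MISSING VALUES command for a list of numeric variables."""
    if not variables:
        raise ValueError("at least one variable name is required")
    return f"MISSING VALUES {' '.join(variables)} ({code})."


# q1, q2, q3 are hypothetical variable names.
print(missing_values_syntax(["q1", "q2", "q3"]))
```

Pasting the resulting line into a syntax window and running it sets 999 as a discrete missing value for every listed variable.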

How do I run multiple sets of regressions in SPSS without having to retype the command each time?

How do I run multiple sets of regressions in SPSS without having to retype the command each time or without having to change the dependent variable every single time manually?
I need to run a lot of regressions with the same independent variables but I need to change the dependent variable. Is there a possibility to make this process easier?
Thank you very much for your help.
Note also that if you have to repeat this process for each country, you can use SPLIT FILE with the country id, and statistical procedures, including REGRESSION, will automatically iterate over each country.
Let's say you have 50 dependent variables, each of which needs to be regressed on the same predictors using the same regression options. Paste your list of dependent variables into Excel as a vertical list (cells b1:b50). Into cells a1:a50 paste the part of your regression syntax that comes before the name of the dependent variable, right up to and including "/dependent ". Into cells c1:c50 paste the part of your syntax that follows the name of the dependent variable. Then in cell d1 type "=concatenate(a1,b1,c1)". Paste that formula down through cells d2:d50 and you'll have your 50 commands to paste into a syntax window. It may show gridlines; SPSS will not have any problem with these.
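The same concatenation idea can be sketched in Python instead of Excel. The variable names and the bare-bones REGRESSION template here are placeholders; in practice you would paste the subcommands from your own working regression command into the template.

```python
def regression_commands(dependents, predictors):
    """Build one REGRESSION command per dependent variable, same predictors each time."""
    template = (
        "REGRESSION\n"
        "  /DEPENDENT {dep}\n"
        "  /METHOD=ENTER {preds}."
    )
    preds = " ".join(predictors)
    return [template.format(dep=dep, preds=preds) for dep in dependents]


# y1, y2 and x1..x3 are hypothetical variable names.
commands = regression_commands(["y1", "y2"], ["x1", "x2", "x3"])
print("\n\n".join(commands))
```

The printed text can be pasted into a syntax window and run as a whole.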
btw, What sort of research context is it that requires these identically-configured regressions for a large number of outcomes?

Define every symbol as a command in LaTeX

I'm working on a large project involving multiple documents typeset in LaTeX. I want to be consistent in my use of symbols, so it might be a nice idea to define a command for every symbol that has a specific meaning throughout the project. Does anyone have any experience with this? Are there issues I should pay attention to?
To be a little more specific: say that, throughout the document, I denote something called permeability by a script P. Would it be a good idea to define
\providecommand{\permeability}{\mathscr{P}}
or would this be more like the case "defining a command for $n$"?
A few tips:
Using \providecommand will define that command only if it's not been previously defined. So if you're not getting the results you expected, you may be trying to define a command that's been defined elsewhere.
If you wrap the math in your commands with \ensuremath, it will do the right thing regardless of whether you're in math mode when you issue the command:
\providecommand{\permeability}{\ensuremath{\mathscr{P}}}
Now I can easily use \permeability in text or $\permeability$ in math mode.
Using your own commands allows you to easily change the typographical representation of something later. For instance:
\newcommand{\vect}[1]{\ensuremath{\mathbf{#1}}}
would print \vect{x} as a boldfaced x. If you later decide you prefer arrows above your vectors, you could change the command to:
\newcommand{\vect}[1]{\ensuremath{\vec{#1}}}
I have been doing this for anything that has a specific meaning and is longer than a single symbol, mostly to save typing:
\newcommand{\objId}{\mbox{$\mathit{objId}$}\xspace}
\newcommand{\insOp}[1]{#1\mbox{$^+$}\xspace}
\newcommand{\delOp}[1]{#1\mbox{$^-$}\xspace}
Then I noticed that I had also stopped making inconsistency errors (objId vs ObjId vs ObjID), so I agree that it is a nice idea.
However, I am not sure it is a good idea when the symbol in the output is just a single Latin letter, as in:
\newcommand{\numOfObjs}{$n$}
It is too easy to type a single symbol and forget about it even though a command was defined for it.
EDIT: using your example IMHO it'd be a good idea to define \permeability because it is more than a single P that you have to type in without the command. But it's a close call.
