SPSS merging two datasets - some different variables, no duplicate cases/rows - spss

I have two SPSS files (.sav). The two files have some different variables (columns) and they have completely different rows/cases. I know that SPSS can "add variables" or "add cases", but I need to do both.
I want to merge the files together, keeping all variables from both files.

You want "Add Cases" (Data > Merge Files > Add Cases...). A dialogue will ask which variables to keep in the new data file: move every variable from the list on the left into the list on the right, so that all variables from both files are included.
If the two files have variables in common, each pair is merged into a single variable, provided both have the same type (for example, you cannot merge two variables with the same name if one is numeric and the other is string).
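The same merge can be written in syntax. This is a minimal sketch: the two file paths are hypothetical placeholders. By default ADD FILES keeps every variable from both files, and cases from a file that lacks a variable get missing values on it.

```spss
* Hypothetical paths. Replace them with your own .sav files.
ADD FILES
  /FILE='C:\data\file1.sav'
  /FILE='C:\data\file2.sav'.
EXECUTE.
```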

Related

How to build "if statement" in SPSS Modeler?

Could you please advise how to build an "if statement" in SPSS Modeler when we have two data sources?
One data source (1) is a table (an output node generated by SPSS Modeler) listing all the IDs that we need to work with further.
The other data source (2) is an Excel file that also lists IDs: some of the IDs from (1) plus some additional ones. Each of these IDs has a value assigned, and those values need to be added to data source (1), though not necessarily to the table itself.
So if an ID from (1) is in (2), we would like to assign the value from (2) to that ID in (1) and store the result in a table or, even better, in a file.
Thank you very much for your help / advice.
Patricia
Based on your problem, it sounds like you want to merge these datasets. This can be done easily in Modeler with the Merge node; just make sure the key fields have the same name, or Modeler won't recognize them as a key. You can see an example here
You can also create a flag variable using the Derive node, see example here
You will have to use the Merge node to combine the two datasets, but the key ID fields don't need to have the same name. With the Condition option in the Merge node, the keys need not share a name or even a type.
Syntax example for the merge Node - option condition: 'ID' = 'id'

How do I programmatically merge cases from datasets with conflicting variable names?

I want to add cases from many SPSS datasets to one SPSS dataset.
Here's my code:
DATASET ACTIVATE DataSet1.
ADD FILES /FILE=*
/FILE='Path\to\dataset.sav'.
EXECUTE.
But I get this error: Mismatched variable types on the input files.
I want SPSS to ignore the conflicting columns and add cases only from the columns where there is no conflict.
How do I do this?
This occurs because variables with the same name in the two data sources either have different format types (STRING, NUMERIC, DATE, etc.) or are both strings but of different lengths.
The latter case, string variables of different lengths, can be solved like this:
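* Example dataset 1: V is a string of width 1.
DATA LIST FREE / V(A1).
BEGIN DATA.
a b c
END DATA.
DATASET NAME DS1.
* Example dataset 2: V is a string of width 2.
DATA LIST FREE / V(A2).
BEGIN DATA.
1 2 3
END DATA.
DATASET NAME DS2.
* Widen V to the largest width found across both datasets.
STATS ADJUST WIDTHS VARIABLES=ALL WIDTH=MAX /FILES DS1 DS2.
* With matching widths, the cases can now be appended.
DATASET ACTIVATE DS1.
ADD FILES FILE=* /FILE=DS2.
EXECUTE.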
However, a mismatch between different format types is a bit more complicated to solve because of the many possible permutations, so you would probably want to assess which variables are problematic and harmonize or delete them before merging the files. The exercise is probably worth carrying out anyway, since variables with the same name but different format types can be a sign of erroneous data.
If you know which variables conflict, you can use the KEEP subcommand to select the others, or you can use the RENAME subcommand to assign new names and adjust the results afterwards.
If you need to harmonize the variables and the issue is something like differing string lengths for variables that should be the same, the STATS ADJUST WIDTHS extension command can harmonize the widths before you merge.
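As a sketch of the RENAME approach, suppose the only conflicting variable is called zip (a hypothetical name), numeric in the active dataset and string in the file being added. Renaming it in the incoming file removes the clash, and DROP then leaves the renamed copy out of the result. The path is the placeholder from the question.

```spss
* zip and zip_conflict are hypothetical names used for illustration.
* RENAME applies to the file named on the immediately preceding FILE subcommand.
ADD FILES
  /FILE=*
  /FILE='Path\to\dataset.sav'
  /RENAME=(zip=zip_conflict)
  /DROP=zip_conflict.
EXECUTE.
```

Cases that come from the added file will then be missing on zip. Leave out the DROP subcommand if you would rather keep the renamed variable and reconcile the two afterwards.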

Can SPSS treat a collection of Nominal Variables as one variable?

I have a lot of movie data from IMDB and I'm in the middle of cleaning up the data and making it so that 1 row = 1 movie as the database often has multiple records for a single film.
I've restructured the data so that what was a single 'Country' variable with multiple cases for a single film, is now a set of 29 country columns. A single film may have up to 29 countries affiliated with it (most have just 1 or 2).
I plan to do some simple descriptive statistics and expected frequencies, perhaps look for correlations with other variables like genre etc.
Is it possible to have SPSS treat all 29 variables as a single variable? It doesn't matter which of the country variables a country is present in, just that it is present in one of them. For example I might want to find all Indian films, and ask SPSS to check for each row, whether 'India' is in any one of the country variables and return the row if it is present in any of them.
Is this possible, or do I just need to manually instruct SPSS with a list of OR conditions whenever I run a query?
There are two types of multiple response sets: multiple dichotomy, which would be 29 yes/no variables as you describe, and multiple category, in which you have a list of arbitrary categories. See the MRSETS command for details.
Once defined, CTABLES can do all your statistical calculations on these, and these sets can also be used in graphics constructed in the Chart Builder or GGRAPH commands.
Don't confuse the sets created by MRSETS with the older MULTIPLE RESPONSE procedure, which is still available. MRSETS definitions persist with the data and are used with CTABLES and GGRAPH only.
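A minimal sketch of defining a multiple category set, assuming the 29 country columns are named country1 to country29 (hypothetical names), sit next to each other in the file, and all share the same type:

```spss
* country1 TO country29 and the set name $countries are assumptions.
MRSETS
  /MCGROUP NAME=$countries LABEL='Countries' VARIABLES=country1 TO country29
  /DISPLAY NAME=[$countries].

* Count how many films list each country, whichever column it appears in.
CTABLES
  /TABLE $countries [COUNT].
```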
With the ANY function, as Andy said above, you would use the individual variables, but you can use TO. So, for example, you could write
COMPUTE FILM7 = ANY(7, f1 TO f29).
if you have MC variables. If using the MD structure, you would have to check, say, variable f7 in this example.
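Applied to the question's example, a sketch might look like this. It assumes string variables named country1 to country29 (hypothetical names) that are adjacent in the file, and that the country is stored as the text 'India'.

```spss
* Flag films that list India in any of the 29 country columns.
COMPUTE indian = ANY('India', country1 TO country29).
EXECUTE.

* Restrict later procedures to the flagged films, then switch the filter off.
FILTER BY indian.
FREQUENCIES VARIABLES=country1.
FILTER OFF.
```

FILTER leaves the other rows in the file, whereas SELECT IF would delete them permanently.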

Large file merging problems in SPSS

I have a large dataset of over 4000 cases with over 500 variables. I want to add this set of variables to another dataset containing most of the same cases but only around 10 variables.
Both of the datasets contain an ID variable that allows me to match the cases. The larger dataset is a keyed table because there are cases in there that aren't in the smaller set and are therefore of no interest to me.
I'm very comfortable with merging the files but my problem arises when I look at the new dataset. The variables are in there but all the values turn up missing. This only applies to the variables that were added to the active dataset. I checked to see if the key variable had any duplicates and it didn't.
I wonder why this happens, and if there is a way to fix this?
I can add that I have done this very often before without this problem.

Merging files in spss

Hi,
I have a problem merging files. Here's what I need to do: I have selected 200 cases out of 7000 in ArcMap (a GIS program). In the process I lost some of the cases' variable information.
Now I would like to get those variables back into my smaller dataset, so I used Data > Merge Files > Add Variables with ID as the match, choosing "Match cases on key variables in sorted files" and "Both files provide cases".
This gave a dataset of all 7000 cases, except that the variables which already existed in the first table were not added to the merged dataset. I also tried all the other choices, but none of them gave me the result I wanted, which is the 200 cases together with the variables that were lost in the process.
So in a nutshell, how do I merge/replace the info from the variables in dataset A into the variables in dataset B without the extra cases from A (only the info for the selected 200 cases out of 7000)?
Offhand:
Create a new variable in the reduced dataset with the value 1.
Match the files.
Sort by the new variable.
Delete all cases that don't have the value 1 on this variable.
I don't see why you are choosing "Both files provide cases". You want to use the 7000-case file as a keyed table, with ID as the key, and match it with the 200-case file, which provides all the cases. Assuming that you select all the variables you want from the large file, this should give you the desired result.
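In syntax, the keyed-table merge could be sketched as follows. It assumes the 200-case file is the active dataset, the 7000-case file is saved at a hypothetical path, and both are sorted by ID.

```spss
* The path is a hypothetical placeholder for the 7000-case file.
* The table file must also be sorted by ID.
SORT CASES BY ID.
MATCH FILES
  /FILE=*
  /TABLE='C:\data\all_7000_cases.sav'
  /BY ID.
EXECUTE.
```

Only the 200 cases from the active dataset appear in the result. When a variable name exists in both files, MATCH FILES takes the values from the file listed first. To pull the lost values from the large file, it may therefore be necessary to rename or delete the damaged variables in the small file before merging.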
