How do I programmatically merge cases from datasets with conflicting variable names?

I want to add cases from many SPSS datasets to one SPSS dataset.
Here's my code:
DATASET ACTIVATE DataSet1.
ADD FILES /FILE=*
/FILE='Path\to\dataset.sav'.
EXECUTE.
But I get this error: Mismatched variable types on the input files.
I want SPSS to ignore the conflicting columns and add cases only from the columns where there is no conflict.
How do I do this?

This occurs because variables with the same name in the two data sources either have different format types (STRING, NUMERIC, DATE, etc.) or are both strings but of different lengths.
The latter case, string variables of different lengths, can be solved like this:
DATA LIST FREE / V(A1).
BEGIN DATA.
a b c
END DATA.
DATASET NAME DS1.
DATA LIST FREE / V(A2).
BEGIN DATA.
1 2 3
END DATA.
DATASET NAME DS2.
STATS ADJUST WIDTHS VARIABLES=ALL WIDTH=MAX /FILES DS1 DS2.
DATASET ACTIVATE DS1.
ADD FILES FILE=* /FILE=DS2.
However, a mismatch between different format types is more complicated to solve because of the many possible permutations, so you would probably want to assess which variables are problematic and harmonize or delete them before merging the files. This exercise is probably worth carrying out anyway, since variables with the same name but different format types can be a sign of erroneous data.

If you know which variables conflict, you can use the KEEP subcommand to select the others, or you can use the RENAME command to assign new names and adjust the results afterwards.
If you need to harmonize the names and the issue is something like differing string lengths for variables that should be the same, the STATS ADJUST WIDTHS extension command can harmonize the widths before you merge.
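As a minimal sketch of the renaming approach, assuming a hypothetical conflicting variable called conflictVar (the name is an illustration, not taken from the question), the variable can be renamed in the incoming file so that the two definitions no longer clash:

```spss
* Sketch only: conflictVar and conflictVar_2 are hypothetical names.
* RENAME applies to the file named on the FILE subcommand just before it.
DATASET ACTIVATE DataSet1.
ADD FILES /FILE=*
  /FILE='Path\to\dataset.sav'
  /RENAME=(conflictVar = conflictVar_2).
EXECUTE.
```

After the merge, conflictVar holds values only for the original cases and conflictVar_2 only for the appended ones, so either one can then be converted or removed (for example with DELETE VARIABLES).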

Related

Merging and appending data on SPSS

I am collecting data in CSPro and exporting it to SPSS, and I want to append the new data to the old data, so that I don't end up with the same files that I worked on for my last data.
Is there any syntax to sort that out?

If you are looking for a way to append one SPSS file to another (after exporting your new data to a new file), the syntax is:
add files
/file="path1/filename1.sav"
/file="path2/filename2.sav".
You may have trouble if the same string variables have different widths in the two files. If so, you need to choose an appropriate width and force it on all relevant variables before adding the files:
get file="path1/filename1.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil1.
get file="path2/filename2.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil2.
add files /file=fil1 /file=fil2.
execute.

How to define large set of properties of a node without having to type them all?

I have imported a CSV file into Neo4j. I have been trying to define a large number of properties (all the columns) for each node. How can I do that without having to type in each name?
I have been trying this:
USING PERIODIC COMMIT
load csv WITH headers from "file:///frozen_catalog.csv" AS line
//Creating nodes for each product id with its properties
CREATE (product:product{id : line.`o_prd`,
Gross_Price_Average: TOINT(line.`Gross_Price_Average`),
O_PRD_SPG: TOINT(line.`O_PRD_SPG`)});

You can add properties from maps. For example:
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
MERGE (P:Product {productID: row.productID})
SET P += row
http://neo4j.com/docs/developer-manual/current/cypher/clauses/set/#set-adding-properties-from-maps

The LOAD CSV command cannot perform automatic type conversion to ints on certain fields; that must be done explicitly (though you can avoid having to explicitly mention all the other fields by using the map projection feature to transform your line data before setting it as in stdob--'s suggestion).
You may want to take a look at Neo4j's import tool, as this will allow you to specify field types in the headers, which should perform the type conversion for you.
That said, 77 columns is a lot of data to store on individual nodes. You may want to take another look at your data and figure out whether some of those properties would be better modeled as nodes with their own label and relationships to your product nodes. You mentioned that some of these are categorical properties. Categories are well suited to being modeled separately as nodes instead of as properties, and some of your other properties may work better as nodes as well.

Can SPSS treat a collection of Nominal Variables as one variable?

I have a lot of movie data from IMDB and I'm in the middle of cleaning up the data and making it so that 1 row = 1 movie as the database often has multiple records for a single film.
I've restructured the data so that what was a single 'Country' variable with multiple cases for a single film, is now a set of 29 country columns. A single film may have up to 29 countries affiliated with it (most have just 1 or 2).
I plan to do some simple descriptive statistics and expected frequencies, perhaps look for correlations with other variables like genre etc.
Is it possible to have SPSS treat all 29 variables as a single variable? It doesn't matter which of the country variables a country is present in, just that it is present in one of them. For example I might want to find all Indian films, and ask SPSS to check for each row, whether 'India' is in any one of the country variables and return the row if it is present in any of them.
Is this possible, or do I just need to manually give SPSS a list of OR conditions whenever I run a query?

There are two types of multiple response sets: multiple dichotomy, which would be 29 yes/no variables as you describe, and multiple category, in which you have a list of arbitrary categories. See the MRSETS command for details.
Once defined, CTABLES can do all your statistical calculations on these, and these sets can also be used in graphics constructed in the Chart Builder or GGRAPH commands.
Don't confuse the sets created by MRSETS with the older MULTIPLE RESPONSE procedure, which is still available. MRSETS definitions persist with the data and are used with CTABLES and GGRAPH only.
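A minimal sketch of a multiple category set for the question's data, assuming the 29 country columns are adjacent in the file and named country1 to country29 (hypothetical names), and that the Custom Tables option is installed for CTABLES:

```spss
* Sketch only: country1 TO country29 and $countries are hypothetical names.
* Define a multiple category set over the 29 country columns.
MRSETS
  /MCGROUP NAME=$countries LABEL='Countries of origin'
   VARIABLES=country1 TO country29.
* Count how many films list each country in any of the columns.
CTABLES
  /TABLE $countries [COUNT].
```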
With the ANY function, as Andy said above, you would use the individual variables, but you can use TO. So, for example, you could write
COMPUTE FILM7 = ANY(7, f1 to f29).
if you have MC variables. If using the MD structure, you would have to check, say, variable f7 in this example.
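For the 'India' case in the question, the same idea could be sketched as follows, assuming the country columns are adjacent string variables named country1 to country29 (hypothetical names) and that the stored spelling matches 'India' exactly:

```spss
* Sketch only: country1 TO country29 and isIndia are hypothetical names.
* ANY returns 1 when the first argument matches any of the listed variables.
COMPUTE isIndia = ANY('India', country1 TO country29).
EXECUTE.
* Show only the flagged films without deleting the others.
FILTER BY isIndia.
```

FILTER can be switched off again with FILTER OFF; SELECT IF would instead remove the other films permanently.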

Large file merging problems in SPSS

I have a large dataset of over 4000 cases with over 500 variables. I want to add this set of variables to another dataset containing most of the same cases but only around 10 variables.
Both of the datasets contain an ID variable that allows me to match the cases. The larger dataset is a keyed table because there are cases in there that aren't in the smaller set and are therefore of no interest to me.
I'm very comfortable with merging the files but my problem arises when I look at the new dataset. The variables are in there but all the values turn up missing. This only applies to the variables that were added to the active dataset. I checked to see if the key variable had any duplicates and it didn't.
I wonder why this happens, and if there is a way to fix this?
I can add that I have done this very often before without this problem.

Merging files in spss

Hi,
I have a problem in merging files. Here's what I need to do: I have chosen 200 cases from 7000 in ArcMap (GIS-program). In the process I have lost some of the cases' variable information.
Now I would like to get the variables back into my smaller dataset. I used Data > Merge Files > Add Variables, with ID as the match variable, matching cases on key variables in sorted files, with both files providing cases.
This gave a dataset with all 7000 cases, except that the variables that already existed in the first table were not added to the merged dataset. I also tried all the other choices, but none of them gave me the result I wanted, which is the 200 cases with the variables that were lost in the process added back.
So, in a nutshell, how do I merge or replace the information from the variables in dataset A into the variables in dataset B without the extra cases from A (only the information for the selected 200 cases out of 7000)?

Offhand:
Create a new variable in the reduced dataset with the value 1.
Match the files.
Sort by the new variable.
Delete all cases that don't have the value 1 on this variable.

I don't see why you are choosing both files provide cases. You want to use the 7000-case file as a keyed table using ID as the key and match it with the 200-case file, which provides all the cases. Assuming that you select all the variables from the large file that you want, this should give you the desired result.
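A minimal sketch of that keyed-table match in syntax, assuming hypothetical file paths and dataset names and that the key variable is called ID in both files:

```spss
* Sketch only: the file paths and dataset names are hypothetical.
* Both inputs must be sorted by the key for a keyed-table match.
GET FILE='path/to/small200.sav'.
SORT CASES BY ID.
DATASET NAME small.
GET FILE='path/to/large7000.sav'.
SORT CASES BY ID.
DATASET NAME large.
* small supplies the cases and large only contributes variables.
MATCH FILES /FILE=small
  /TABLE=large
  /BY ID.
EXECUTE.
```

When a variable name exists in both files, the values come from the file named first, so variables that lost their information in the smaller file would probably need to be dropped or renamed there before the match.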
