Merging files in SPSS
Hi,
I have a problem merging files. Here's what I need to do: I selected 200 cases out of 7000 in ArcMap (a GIS program). In the process, I lost some of the variable information for these cases.
Now I would like to get those variables back into my smaller dataset. I used Data > Merge Files > Add Variables, with ID as the key, "Match cases on key variables in sorted files", and "Both files provide cases".
This gave a dataset of all 7000 cases, and the variables that already existed in the first table were not added to the merged dataset. I also tried all the other options, but none of them gave me the result I wanted, which is the 200 cases with the lost variables added back.
So, in a nutshell: how do I merge/replace the information from the variables in dataset A into dataset B without getting the extra cases from A (only the information for the 200 selected cases out of 7000)?
Off the top of my head:
Create a new variable in the reduced dataset with the value 1.
Match the files.
Sort by the new variable.
Delete all cases that don't have the value 1 on this variable.
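The four steps above can be sketched in Python on toy data. This is only an illustration of the logic, not SPSS syntax: the lists `full` and `reduced`, the variables `A` and `B`, and the flag name `selected` are all hypothetical stand-ins for the two SPSS files.

```python
# Hypothetical toy data: the full 7000-case table shrunk to three rows.
full = [
    {"ID": 1, "A": "x", "B": 10},
    {"ID": 2, "A": "y", "B": 20},
    {"ID": 3, "A": "z", "B": 30},
]
# The reduced dataset (the selected cases) has lost variable B.
reduced = [{"ID": 1, "A": "x"}, {"ID": 3, "A": "z"}]

# Step 1: flag every case in the reduced dataset with the value 1.
for row in reduced:
    row["selected"] = 1

# Step 2: match the files on ID with both files providing cases,
# so the merged result contains every ID found in either file.
by_id = {}
for row in full + reduced:
    by_id.setdefault(row["ID"], {}).update(row)
merged = [by_id[k] for k in sorted(by_id)]

# Steps 3-4: sorting on the flag and deleting unflagged cases
# amounts to keeping only the rows whose flag equals 1.
result = [row for row in merged if row.get("selected") == 1]
```

The result keeps only IDs 1 and 3, each with `B` restored from the full table.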
I don't see why you are choosing "Both files provide cases". You want to use the 7000-case file as a keyed table, with ID as the key, and match it to the 200-case file, which provides all the cases. Provided you select all the variables you want from the large file, this should give you the desired result.
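As a rough Python analogy of a keyed-table match (not the SPSS implementation), the case file drives the output and the table is only looked up by key. The names `keyed_table`, `cases`, and the helper `table_match` are hypothetical.

```python
# Hypothetical toy data standing in for the two SPSS files.
keyed_table = {  # 7000-case file, indexed by the key variable ID
    1: {"A": "x", "B": 10},
    2: {"A": "y", "B": 20},
    3: {"A": "z", "B": 30},
}
cases = [{"ID": 1}, {"ID": 3}]  # 200-case file: it provides the cases


def table_match(cases, table, keep):
    """Attach the variables in `keep` from the keyed table to each case.

    Only IDs present in `cases` appear in the output; a case without a
    matching key gets None (standing in for system-missing).
    """
    out = []
    for case in cases:
        looked_up = table.get(case["ID"], {})
        out.append({**case, **{v: looked_up.get(v) for v in keep}})
    return out


matched = table_match(cases, keyed_table, keep=["B"])
print(matched)  # [{'ID': 1, 'B': 10}, {'ID': 3, 'B': 30}]
```

Because the table never contributes cases of its own, ID 2 never appears in the output.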
Related
I am collecting data in CSPro and exporting it to SPSS. I want to append the new data to the old data, so that I don't end up with files separate from the ones I worked on with my previous data.
Is there any syntax to sort that out?
If you are looking for a way to combine (append) two SPSS files after exporting your new data to a new file, the syntax is:
add files
/file="path1/filename1.sav"
/file="path2/filename2.sav".
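What ADD FILES does to the cases can be pictured with a small Python sketch: the cases of each file are stacked, and a variable that exists in only one file is left missing for the other file's cases. The helper `add_files` and the sample variables are hypothetical; this is an analogy, not SPSS itself.

```python
def add_files(*files):
    """Stack the cases of several files, roughly like ADD FILES.

    A variable missing from one file is filled with None (standing in
    for system-missing). Variable order follows first appearance.
    """
    variables = []
    for f in files:
        for row in f:
            for name in row:
                if name not in variables:
                    variables.append(name)
    return [{v: row.get(v) for v in variables} for f in files for row in f]


# Hypothetical old and new exports.
old = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
new = [{"id": 3, "age": 25, "region": "N"}]
combined = add_files(old, new)
```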
You may have trouble if the same string variables have different widths in the two files. If so, choose an appropriate width and apply it to all the relevant variables before adding the files:
get file="path1/filename1.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil1.
get file="path2/filename2.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil2.
add files /file=fil1 /file=fil2.
execute.
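One way to choose "the appropriate width" is to take the widest declaration of each string variable across the files. The Python sketch below assumes you have already noted each file's string widths by hand (the dictionaries `file1` and `file2` are hypothetical) and builds the variable list for the ALTER TYPE lines above.

```python
def common_widths(*dictionaries):
    """Return, for each string variable, the largest width across files.

    Each dictionary maps variable name -> declared width (e.g. 50 for A50).
    """
    widths = {}
    for d in dictionaries:
        for name, width in d.items():
            widths[name] = max(width, widths.get(name, 0))
    return widths


def format_spec(widths):
    # Build the variable list used in an ALTER TYPE command.
    return " ".join(f"{name} (a{w})" for name, w in widths.items())


# Hypothetical widths recorded from the two files.
file1 = {"stringVar1": 40, "stringVar2": 150}
file2 = {"stringVar1": 50, "stringVar2": 120}
print(format_spec(common_widths(file1, file2)))
# stringVar1 (a50) stringVar2 (a150)
```

Taking the maximum avoids truncating any existing values when the widths are forced to match.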
I have imported a CSV file into Neo4j. I have been trying to define a large number of properties (all the columns) for each node. How can I do that without having to type in each name?
I have been trying this:
USING PERIODIC COMMIT
load csv WITH headers from "file:///frozen_catalog.csv" AS line
//Creating nodes for each product id with its properties
CREATE (product:product{id : line.`o_prd`,
Gross_Price_Average: TOINT(line.`Gross_Price_Average`),
O_PRD_SPG: TOINT(line.`O_PRD_SPG`)});
You can add properties from maps. For example:
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
MERGE (P:Product {productID: row.productID})
SET P += row
http://neo4j.com/docs/developer-manual/current/cypher/clauses/set/#set-adding-properties-from-maps
LOAD CSV cannot automatically convert certain fields to integers; that must be done explicitly. However, you can avoid listing all the other fields by name by using the map projection feature to transform your line data before setting it, as in stdob--'s suggestion.
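The idea of "copy every column, then override only the ones that need conversion" can be illustrated in Python. This is an analogy of what a map projection achieves, not Cypher; the CSV content and the set `INT_FIELDS` are hypothetical, apart from the column names taken from the question.

```python
import csv
import io

# Hypothetical two-row CSV standing in for frozen_catalog.csv.
data = io.StringIO(
    "o_prd,Gross_Price_Average,O_PRD_SPG,Category\n"
    "P1,12,3,frozen\n"
    "P2,7,1,dairy\n"
)
INT_FIELDS = {"Gross_Price_Average", "O_PRD_SPG"}  # only these need casting


def project(line):
    # Copy every column as-is, then override only the integer fields.
    # Empty values are left untouched rather than converted.
    return {**line, **{k: int(line[k]) for k in INT_FIELDS if line.get(k)}}


properties = [project(line) for line in csv.DictReader(data)]
```

Every other column passes through as a string without being named anywhere.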
You may also want to look at Neo4j's import tool, which lets you specify field types in the headers and should perform the type conversion for you.
That said, 77 columns is a lot of data to store on individual nodes. You may want to take another look at your data and figure out whether some of those properties would be better modeled as nodes with their own label, connected to your product nodes by relationships. You mentioned some of these were categorical properties. Categories are well suited to being modeled as separate nodes instead of as properties, and some of your other properties might work better as nodes as well.
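A minimal sketch of that remodeling, assuming a single categorical column: pull the distinct categories out as their own entities and keep product-to-category pairs as the relationships. The helper `split_categories` and the sample products are hypothetical, and the output is plain Python data rather than graph writes.

```python
def split_categories(products, field):
    """Turn a categorical property into separate category entities.

    Returns (products without the field, distinct categories,
    (product_id, category) relationship pairs).
    """
    categories = sorted({p[field] for p in products if p.get(field)})
    rels = [(p["id"], p[field]) for p in products if p.get(field)]
    stripped = [{k: v for k, v in p.items() if k != field} for p in products]
    return stripped, categories, rels


# Hypothetical products with one categorical property.
products = [
    {"id": "P1", "price": 12, "category": "frozen"},
    {"id": "P2", "price": 7, "category": "dairy"},
    {"id": "P3", "price": 9, "category": "frozen"},
]
nodes, cats, rels = split_categories(products, "category")
```

Each distinct category would then become one node, and each pair one relationship, instead of repeating the category string on every product.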
Could you please advise how to build an "if statement" in SPSS Modeler when we have two data sources?
One data source (1) is a table (an output node generated by SPSS Modeler) that lists all the IDs we need to work with further.
The other data source (2) is an Excel file that also lists IDs: some of them are in (1), but there are additional ones as well. Each of these IDs has an assigned value, and these values need to be added to data source (1), not necessarily to the table.
So if an ID from (1) is in (2), we would like to assign the value from (2) to that ID in (1) and store it in a table or, even better, in a file.
Thank you very much for your help / advice.
Patricia
Based on your description, it sounds like you want to merge these datasets. This is easy to do in Modeler with the Merge node; just make sure the key variables have the same name, or Modeler won't recognize them as a key. You can see an example here
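The result the question asks for corresponds to a join that keeps every ID from (1) and adds the value from (2) where the key matches. The Python sketch below assumes that join type (in Modeler this would presumably be a partial outer join with (1) as the kept input); the data and the helper `merge_on_key` are hypothetical.

```python
# Hypothetical stand-ins for the Modeler table (1) and the Excel file (2).
source1 = [{"ID": 101}, {"ID": 102}, {"ID": 103}]
source2 = [{"ID": 102, "value": 5.0}, {"ID": 103, "value": 7.5}, {"ID": 999, "value": 1.0}]


def merge_on_key(left, right, key):
    """Keep every record of `left`; add fields from `right` when keys match.

    Unmatched left records get None (standing in for $null$).
    Assumes the key is unique in `right`.
    """
    lookup = {r[key]: r for r in right}
    extra = list(dict.fromkeys(f for r in right for f in r if f != key))
    return [
        {**row, **{f: lookup.get(row[key], {}).get(f) for f in extra}}
        for row in left
    ]


merged = merge_on_key(source1, source2, "ID")
```

ID 999 exists only in (2), so it is dropped; ID 101 exists only in (1), so it keeps a null value.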
You can also create a flag variable using the Derive node, see example here
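The flag idea amounts to asking, for each ID in (1), whether it also occurs in (2). A minimal Python sketch with hypothetical data (the field name `in_source2` is made up for illustration):

```python
# Hypothetical stand-ins for the two data sources.
source1 = [{"ID": 101}, {"ID": 102}, {"ID": 103}]
source2 = [{"ID": 102, "value": 5.0}, {"ID": 103, "value": 7.5}, {"ID": 999, "value": 1.0}]

# Collect the IDs of (2) once, then flag each record of (1).
ids_in_2 = {row["ID"] for row in source2}
flagged = [{**row, "in_source2": row["ID"] in ids_in_2} for row in source1]
```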
You do have to use the Merge node to combine the two datasets, but the key IDs don't need to have the same name. With the Condition option in the Merge node, the keys need not have the same name or even the same variable type.
Example condition for the Merge node (Condition option): 'ID' = 'id'
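A condition-based merge pairs records whenever a test expression is true instead of requiring a shared key name. The Python sketch below is an analogy, not Modeler's implementation; the helper `merge_on_condition` and the data are hypothetical, and the string conversion is an assumption made so that an integer `ID` can be compared with a string `id`.

```python
def merge_on_condition(left, right, condition):
    """Pair each left record with the first right record satisfying
    condition(left_row, right_row); unmatched left rows keep only their own fields."""
    out = []
    for l in left:
        match = next((r for r in right if condition(l, r)), None)
        out.append({**l, **match} if match else dict(l))
    return out


# Hypothetical sources whose keys differ in name (ID vs id) and type (int vs str).
source1 = [{"ID": 101}, {"ID": 102}]
source2 = [{"id": "102", "value": 5.0}]

# Convert before comparing, since the key types differ.
merged = merge_on_condition(source1, source2, lambda l, r: str(l["ID"]) == r["id"])
```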
I want to add cases from many SPSS datasets to one SPSS dataset.
Here's my code:
DATASET ACTIVATE DataSet1.
ADD FILES /FILE=*
/FILE='Path\to\dataset.sav'.
EXECUTE.
But I get this error: Mismatched variable types on the input files.
I want SPSS to ignore the conflicting columns and add cases only from the columns where there is no conflict.
How do I do this?
This occurs because variables with the same name in the two data sources either have different format types (STRING, NUMERIC, DATE, etc.) or are both strings but of different lengths.
The latter case, string variables of different lengths, can be solved like this:
DATA LIST FREE / V(A1).
BEGIN DATA.
a b c
END DATA.
DATASET NAME DS1.
DATA LIST FREE / V(A2).
BEGIN DATA.
1 2 3
END DATA.
DATASET NAME DS2.
STATS ADJUST WIDTHS VARIABLES=ALL WIDTH=MAX /FILES DS1 DS2.
DATASET ACTIVATE DS1.
ADD FILES FILE=* /FILE=DS2.
However, a mismatch of format types is a bit more complicated to solve because of the many possible permutations, so you would probably want to assess which variables are problematic and harmonize or delete them before merging the files. This exercise is probably worth doing anyway, because the same variable name with different format types can be a sign of erroneous data.
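Assessing which variables are problematic can be done systematically by comparing the two files' variable definitions. The Python sketch assumes you have written each file's dictionary out as name -> (type, width); the helper `classify` and the sample definitions are hypothetical.

```python
def classify(dict1, dict2):
    """Compare variables shared by two files.

    Each dictionary maps name -> (type, width), e.g. ("STRING", 8).
    Returns name -> "ok", "width" (strings of different length), or "type".
    """
    result = {}
    for name in dict1.keys() & dict2.keys():
        (t1, w1), (t2, w2) = dict1[name], dict2[name]
        if t1 != t2:
            result[name] = "type"  # needs harmonizing or dropping
        elif t1 == "STRING" and w1 != w2:
            result[name] = "width"  # fixable by adjusting widths
        else:
            result[name] = "ok"
    return result


# Hypothetical variable definitions for two datasets.
ds1 = {"V": ("STRING", 1), "age": ("NUMERIC", 8), "code": ("STRING", 4)}
ds2 = {"V": ("STRING", 2), "age": ("NUMERIC", 8), "code": ("NUMERIC", 8)}
conflicts = classify(ds1, ds2)
```

Variables marked "width" can go through the width adjustment shown above; those marked "type" are the ones to inspect by hand.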
If you know which variables conflict, you can use the KEEP subcommand to select the others, or you can use the RENAME command to assign new names and adjust the results afterwards.
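The KEEP approach, keeping everything except the conflicting variables, can be sketched in Python as follows. The helpers `non_conflicting` and `add_files_keep` and the data are hypothetical; a variable present in only one file is kept and left missing for the other file's cases.

```python
def non_conflicting(types1, types2):
    """Variables that are absent from one file or have the same type in both."""
    names = list(dict.fromkeys([*types1, *types2]))
    return [v for v in names
            if v not in types1 or v not in types2 or types1[v] == types2[v]]


def add_files_keep(file1, file2, types1, types2):
    # Append file2's cases to file1, keeping only non-conflicting variables.
    keep = non_conflicting(types1, types2)
    return [{v: row.get(v) for v in keep} for row in file1 + file2]


# Hypothetical files where "code" is STRING in one and NUMERIC in the other.
types1 = {"id": "NUMERIC", "code": "STRING", "score": "NUMERIC"}
types2 = {"id": "NUMERIC", "code": "NUMERIC", "extra": "STRING"}
file1 = [{"id": 1, "code": "A1", "score": 3}]
file2 = [{"id": 2, "code": 7, "extra": "x"}]
rows = add_files_keep(file1, file2, types1, types2)
```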
If you need to harmonize the names and the issue is something like differing string lengths for variables that should be the same, the STATS ADJUST WIDTHS extension command can harmonize the widths before you merge.
I have a large dataset of over 4000 cases with over 500 variables. I want to add this set of variables to another dataset containing most of the same cases but only around 10 variables.
Both of the datasets contain an ID variable that allows me to match the cases. The larger dataset is a keyed table because there are cases in there that aren't in the smaller set and are therefore of no interest to me.
I'm very comfortable with merging the files, but my problem arises when I look at the new dataset: the variables are there, but all their values are missing. This only applies to the variables that were added to the active dataset. I checked whether the key variable had any duplicates, and it didn't.
I wonder why this happens. Is there a way to fix it?
I should add that I have done this many times before without this problem.
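One thing worth checking, assuming the key values themselves are the culprit: count how many IDs in the active dataset actually find a match in the keyed table. Values that look alike can be stored differently (here, a number versus a padded string). The helper `key_match_report` and the sample IDs are hypothetical.

```python
def key_match_report(active_ids, table_ids, normalize=lambda k: k):
    """Return (matched, total): how many active keys occur in the keyed table."""
    table = {normalize(k) for k in table_ids}
    hits = sum(1 for k in active_ids if normalize(k) in table)
    return hits, len(active_ids)


# Hypothetical keys: numeric in one file, padded strings in the other.
active = [12, 15, 20]
table = ["12 ", "15 ", "99 "]
raw = key_match_report(active, table)
cleaned = key_match_report(active, table, normalize=lambda k: str(k).strip())
```

A raw count of zero matches would produce exactly the symptom described: the added variables exist but every value is missing.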