SPSS: aggregate and count different values

In SPSS I have a variable with many different values (8-digit numbers, e.g. 00000000). Every row is a person. I want to aggregate this data by postal area and count the number of distinct values within each postal area. Is there a way?
The result within a postal area should range from 1 to N: 1 = every person has the same value, N = every person has a different value.

Aggregate in two steps. Assuming your dataset is named data1, with variables var1 (the variable of interest) and postalcode, I would do this:
First, create a dataset step1 with one row for each combination of values of postalcode and var1 (this is also possible with the CASESTOVARS command):
dataset declare step1.
dataset activate data1.
aggregate outf=step1 /break=postalcode var1 /n=n(var1).
Then create a dataset result with one row for each postalcode and a variable n holding the number of rows that postalcode has in the previous dataset step1:
dataset declare result.
dataset activate step1.
aggregate outf=result /break=postalcode /n=n(var1).
So, in conclusion: first break by both of the variables, then break only by the variable of postal code. This should do the trick!
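The two-step logic can also be checked outside SPSS. Below is a minimal Python sketch (not SPSS syntax) of the same idea; the sample rows are hypothetical and only illustrate the counting.

```python
from collections import Counter

# Hypothetical sample rows: (postalcode, var1), one row per person.
rows = [
    ("1011", "00000001"),
    ("1011", "00000001"),
    ("1011", "00000002"),
    ("2022", "00000003"),
    ("2022", "00000003"),
]

# Step 1: collapse to one entry per (postalcode, var1) combination.
step1 = set(rows)

# Step 2: count the remaining entries per postalcode.
result = Counter(postalcode for postalcode, _ in step1)

print(dict(result))
```

Postal code 1011 contains two distinct values and 2022 contains one, which is what the second AGGREGATE returns in n.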


How to select SpatRaster layers by their names?

I have a SpatRaster (150 x 150 x 1377) that shows the temporal evolution of precipitation. Each layer is a given hour in a 2-month interval, but some hours are missing, so the dataset isn't continuous. The layer names are strings of the form "YYYYMMDDhhmm".
I need the mean value for every three-hour interval, whether the interval is complete or has missing data. For complete intervals I want to average the three layers; for intervals with missing data I want to average the two available layers or, if two are missing, use the single remaining layer as the average.
How can I use the layer names to decide how to act?
I've already tried the code below, but it averages three consecutive layers by index rather than by hour. How can I convert the names to date-time values (for example with the tidyverse) so that I can use rollapply() to check whether the date-time I expect is found two steps back? Is there another way to check this?
HSAF=rast(c((paste0(resfolder, "HSAF_final1_5.tif")),(paste0(resfolder, "HSAF_final6_10.tif")),(paste0(resfolder, "HSAF_final11_15.tif")),
(paste0(resfolder, "HSAF_final16_20.tif")),(paste0(resfolder, "HSAF_final21_25.tif")),(paste0(resfolder, "HSAF_final26_30.tif")),
(paste0(resfolder, "HSAF_final31_N04.tif")),(paste0(resfolder, "HSAF_finalN05_N08.tif")),(paste0(resfolder, "HSAF_finalN09_N13.tif")),
(paste0(resfolder, "HSAF_finalN14_N18.tif")),(paste0(resfolder, "HSAF_finalN19_N23.tif")),(paste0(resfolder, "HSAF_finalN24_N28.tif")),
(paste0(resfolder, "HSAF_finalN29_N30.tif"))))
index=names(HSAF)
j=2
for (i in seq(1,3, by=3))
{third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
newraster<- c(first_el, second_el, third_el)
newraster<- mean(newraster, filename=paste0(tempfile(), ".tif"))
names(newraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
}
for (i in seq(4,1374 , by=3))
{ third_el<- HSAF[index[i+j]]
second_el <- HSAF[index[i+j-1]]
first_el<- HSAF[index[i+j-2]]
subraster<- c(first_el, second_el, third_el)
subraster<- mean(subraster, filename=paste0(tempfile(), ".tif"))
names(subraster)<- paste0(index[i+j-2],index[i+j-1],index[i+j])
add(newraster)<- subraster
}
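The selection logic asked about depends only on the layer names, not on the raster data. Below is a minimal sketch of that logic in Python (not R), assuming the names follow "YYYYMMDDhhmm" and that the three-hour windows are aligned to 00:00, 03:00, 06:00 and so on; both the alignment and the sample names are assumptions. Each window keeps whichever layers exist, so the mean would be taken over three, two or one layer.

```python
from collections import defaultdict
from datetime import datetime


def group_by_three_hours(layer_names):
    """Map each 3-hour window start to the layer names that fall inside it."""
    groups = defaultdict(list)
    for name in layer_names:
        stamp = datetime.strptime(name, "%Y%m%d%H%M")
        # Floor the hour to the start of its 3-hour window (0, 3, 6, ...).
        window_start = stamp.replace(hour=stamp.hour - stamp.hour % 3, minute=0)
        groups[window_start].append(name)
    return dict(groups)


# Hypothetical names: 02:00, 04:00 and 05:00 are missing.
names = ["202001010000", "202001010100", "202001010300",
         "202001010600", "202001010700", "202001010800"]
groups = group_by_three_hours(names)

for start, members in sorted(groups.items()):
    print(start.strftime("%Y%m%d%H%M"), len(members), members)
```

The resulting groups of names could then be used to subset the layers of each window before averaging them.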

SPSS - How to create a 'Totals' row (not a column)

I have a dataset like this:
Program Timely_Count Total_Count
PROG1 51,761 53,356
PROG2 232,371 235,769
PROG3 100,756 110,859
PROG4 25,713 36,309
PROG5 17,985 18,995
PROG6 24,673 24,732
I want to create a "Total" row (not a column) so when I save this into Excel I will have a table that looks like this:
Program Timely_Count Total_Count
PROG1 51,761 53,356
PROG2 232,371 235,769
PROG3 100,756 110,859
PROG4 25,713 36,309
PROG5 17,985 18,995
PROG6 24,673 24,732
TOTAL 453,259 480,020
I know I can use the AGGREGATE command to add a totals column, but that does not format the dataset the way I need for this report.
I also need this in syntax, since it is run multiple times per day on multiple datasets. I have SPSS version 22, if that helps.
Aggregate first, then add the aggregated results back to your original table.
First let's recreate your sample data (without the thousands separators, because DATA LIST LIST treats commas as value delimiters):
data list list/Program (a20) Timely_Count Total_Count (2f8).
begin data
PROG1 51761 53356
PROG2 232371 235769
PROG3 100756 110859
PROG4 25713 36309
PROG5 17985 18995
PROG6 24673 24732
end data.
Now run this:
dataset name OrigData.
dataset declare tot.
aggregate /out='tot'/break = /Timely_Count Total_Count=sum(Timely_Count Total_Count).
add files /file=*/file=tot.
recode program (""="TOTAL").
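As a cross-check of the aggregate-then-append idea, here is a minimal Python sketch (not SPSS syntax) that sums both count columns and appends the sums as a final "TOTAL" row. The figures are those from the question; the variable names are illustrative.

```python
# Sample figures from the question, without thousands separators.
rows = [
    ("PROG1", 51761, 53356),
    ("PROG2", 232371, 235769),
    ("PROG3", 100756, 110859),
    ("PROG4", 25713, 36309),
    ("PROG5", 17985, 18995),
    ("PROG6", 24673, 24732),
]

# Aggregate with no break variable: one sum per count column.
timely_total = sum(timely for _, timely, _ in rows)
total_total = sum(total for _, _, total in rows)

# Append the aggregated row to the original table.
table = rows + [("TOTAL", timely_total, total_total)]

for program, timely, total in table:
    print(f"{program:<6} {timely:>9,} {total:>9,}")
```

The last row reproduces the totals the question expects (453,259 and 480,020).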

How to apply content-based filtering in Neo4j

I have data in the format below, where the first column holds the product nodes and all following columns hold properties of the products. I want to apply a content-based filtering algorithm using cosine similarity in Neo4j. For that, I believe I need to define the fx columns as properties of each product node, read these properties as a vector, and then compute the cosine similarity between products. I am having trouble with two things:
1. How to define these columns as properties in one go (there could be more than 100 columns).
2. How to read all the property values as a vector so that I can apply cosine similarity.
Product f1 f2 f3 f4 f5
P1 0 1 0 1 1
P2 1 0 1 1 0
P3 1 1 1 1 1
P4 0 0 0 1 0
You can use LOAD CSV to input your data.
For example, this query will read in your data file and output for each input line (ignoring the header line) a name string and a props collection:
LOAD CSV FROM 'file:///data.csv' AS line FIELDTERMINATOR ' '
WITH line SKIP 1
RETURN HEAD(line) AS name, [p IN TAIL(line) | TOFLOAT(p)] AS props
Even though your data has a header line, the above query skips over it, as it is not needed. In fact, we don't want to use the WITH HEADERS option of LOAD CSV, since that would convert each data line into a map, whereas it is more convenient for our current purposes to get each data line as a collection of values.
The above query assumes that all the columns are space-separated, that the first column will always contain a name string, and that all other columns contain the numeric values that should be put into the same collection (named props).
If you replace RETURN with WITH, you can append additional clauses to the query that make use of the name and props values.
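To make the target computation concrete, here is a minimal Python sketch (not Cypher) of cosine similarity applied to props collections like the ones the query produces. The function name cosine_similarity and the zero-vector handling are illustrative assumptions; the vectors are the sample rows from the question.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# name -> props, as in the sample data of the question.
props = {
    "P1": [0, 1, 0, 1, 1],
    "P2": [1, 0, 1, 1, 0],
    "P3": [1, 1, 1, 1, 1],
    "P4": [0, 0, 0, 1, 0],
}

names = sorted(props)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(props[first], props[second])
        print(first, second, round(score, 3))
```

Keeping each product's features in a single ordered collection, as the query does with props, is what makes this pairwise computation straightforward.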

SPSS merge datasets with add variables only links 1 case

I have the following syntax to merge two datasets. I expect the resulting dataset (test1) to contain 5 cases, 4 of which (ids 2 to 5) have a value in variable set2.
Instead, I get dataset test1 with 5 cases, but only 1 of them (the case with id 5) has a value in variable set2.
Do I need to contact my ICT department, or am I misunderstanding something about merging data in SPSS? I am used to working with SAS, R and SQL, but need to help someone merge data in SPSS.
INPUT PROGRAM.
LOOP id=1 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set1 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test1.
INPUT PROGRAM.
LOOP id=2 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set2 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test2.
DATASET ACTIVATE test1.
STAR JOIN
/SELECT t0.set1, t1.set2
/FROM * AS t0
/JOIN 'test2' AS t1
ON t0.id=t1.id
/OUTFILE FILE=*.
results in:
id    set1  set2
1,00  1,74  .
2,00  1,58  .
3,00  1,01  .
4,00   ,12  .
5,00  2,52  ,79
SPSS version 21
When I run the syntax you provide, I get the desired results (and not what you indicate).
If it continues to fail (after contacting SPSS support), try using MATCH FILES:
DATASET ACTIVATE test1.
SORT CASES BY ID.
DATASET ACTIVATE test2.
SORT CASES BY ID.
MATCH FILES FILE=test1 /FILE=test2 /BY ID.
DATASET NAME Result.
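For comparison, the result expected from a merge keyed on id can be sketched in Python (not SPSS syntax). The set1 values and the set2 value for id 5 are taken from the output in the question; the other set2 values are hypothetical, since the random draws differ on every run.

```python
# id -> set1, values as shown in the question's output.
test1 = {1: 1.74, 2: 1.58, 3: 1.01, 4: 0.12, 5: 2.52}
# id -> set2; only the value for id 5 comes from the question, the rest are hypothetical.
test2 = {2: 0.40, 3: 1.10, 4: 0.95, 5: 0.79}

# Keyed merge: every id from either dataset, None where a value is absent.
merged = [
    {"id": key, "set1": test1.get(key), "set2": test2.get(key)}
    for key in sorted(test1.keys() | test2.keys())
]

for row in merged:
    print(row)
```

Only id 1 ends up without a set2 value, which matches the 4 out of 5 filled cases the question expects.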

Sort the values of one variable in descending order while the other variables stay unchanged, using SPSS syntax

I have two variables (id and Var1) in SPSS, as shown below. I want to sort Var1 in descending order without the other variables being reordered along with it, i.e. the other variables should stay as they were before the sort.
My data is...
id Var1
-- ----
M-1 3
M-2 4
M-3 2
M-4 7
But I want like this..
id Var1
-- ----
M-1 7
M-2 4
M-3 3
M-4 2
My Syntax/code is...
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
sort cases by Var1(D).
execute.
When I run this code, id is also sorted according to Var1. I do not want the sort to apply to all variables; I only want to sort the currently selected variable in SPSS.
Can anyone help using SPSS syntax?
You could split the dataset, sort the Var1 variable, and then merge the parts back together. One way to do so would be this:
* create data.
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
DATASET NAME ids.
DATASET COPY sortvar.
* Delete sort variable (Var1) from dataset "ids".
DELETE VARIABLES Var1.
* Keep only sort variable in dataset "sortvar".
DATASET ACTIVATE sortvar.
DELETE VARIABLES id.
* sort Var1.
SORT CASES BY Var1(D).
* Merge datasets.
MATCH FILES
/FILE=ids
/FILE=sortvar.
EXECUTE.
If you have lots of variables to delete in the sortvar dataset, you could also use the MATCH FILES command with the KEEP subcommand:
* Delete all variables but Var1.
DATASET ACTIVATE sortvar.
MATCH FILES
/FILE=*
/KEEP=Var1.
Alternatively, you can use the SAVE command in combination with the KEEP or DROP subcommands to split the dataset.
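The split, sort and merge idea comes down to sorting one column on its own and pairing it back with the untouched column by position. A minimal Python sketch (not SPSS syntax) with the sample data from the question:

```python
# The two columns of the sample data, kept as separate lists ("split").
ids = ["M-1", "M-2", "M-3", "M-4"]
var1 = [3, 4, 2, 7]

# Sort only Var1 in descending order; ids keeps its original order.
sorted_var1 = sorted(var1, reverse=True)

# Merge the columns back together by row position, without a key.
result = list(zip(ids, sorted_var1))

for row in result:
    print(*row)
```

As with MATCH FILES without a BY subcommand, rows are paired purely by position, so any original link between an id and its Var1 value is lost.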
