Merging different parts of one file - based on a variable in the file - spss

I have a data file that looks like the first picture. I am reading it into SPSS using FILE TYPE MIXED so that it looks like the second picture. How can I merge the cases so that cases with the same ID are combined into one? The variable Age is repeated, so it does not matter which value is selected, but it would be good if it were possible to select the first one.
Here is an example of the code I am using to read the data:
FILE TYPE MIXED RECORD=RecordID 1
/ WILD =WARN.
RECORD TYPE 1.
DATA LIST
/ ID 8-9 JobType 3-4 Age 5-7.
RECORD TYPE 2.
DATA LIST
/ ID 3-4 Sex 11 Salary 5-8.
RECORD TYPE 3.
DATA LIST
/ ID 6-7 Age 8-10 Hiring 3-5.
END FILE TYPE.
BEGIN DATA
1 1 39 1
1 3 27 2
1 2 27 3
1 3 25 4
2 1 9000 0
2 2 7500 0
2 3 4750 1
2 4 7250 1
3 76 1 39
3 98 2 27
3 8 3 27
3 44 4 25
END DATA.
LIST.

This should work:
sort cases by ID RecordID.
casestovars id=ID/index=RecordID.
If the ages are identical they collapse into one column. If they aren't, you'll get three age columns, and you'll be able to choose the one you prefer.
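To make the intended result concrete, here is a minimal Python sketch of the same long-to-wide merge. It is not what CASESTOVARS does internally: it simply combines rows that share an ID and keeps the first value seen for any repeated variable such as Age. The `records` list is a hand-typed reading of the sample data above, and `merge_by_id` is a hypothetical helper name.

```python
# Rows as they might look after FILE TYPE MIXED has read the sample data
# (assumption: values transcribed by hand from the BEGIN DATA block).
records = [
    {"RecordID": 1, "ID": 1, "JobType": 1, "Age": 39},
    {"RecordID": 1, "ID": 2, "JobType": 3, "Age": 27},
    {"RecordID": 1, "ID": 3, "JobType": 2, "Age": 27},
    {"RecordID": 1, "ID": 4, "JobType": 3, "Age": 25},
    {"RecordID": 2, "ID": 1, "Salary": 9000, "Sex": 0},
    {"RecordID": 2, "ID": 2, "Salary": 7500, "Sex": 0},
    {"RecordID": 2, "ID": 3, "Salary": 4750, "Sex": 1},
    {"RecordID": 2, "ID": 4, "Salary": 7250, "Sex": 1},
    {"RecordID": 3, "ID": 1, "Hiring": 76, "Age": 39},
    {"RecordID": 3, "ID": 2, "Hiring": 98, "Age": 27},
    {"RecordID": 3, "ID": 3, "Hiring": 8, "Age": 27},
    {"RecordID": 3, "ID": 4, "Hiring": 44, "Age": 25},
]


def merge_by_id(rows, key="ID", index="RecordID"):
    """Combine rows sharing `key`; the first value seen for a variable wins."""
    merged = {}
    # Sort like "sort cases by ID RecordID" so "first" means lowest RecordID.
    for row in sorted(rows, key=lambda r: (r[key], r[index])):
        target = merged.setdefault(row[key], {key: row[key]})
        for name, value in row.items():
            if name not in (key, index):
                target.setdefault(name, value)  # keep the first value only
    return [merged[k] for k in sorted(merged)]


wide = merge_by_id(records)
for case in wide:
    print(case)
```

Each ID ends up as a single case with JobType, Age, Salary, Sex and Hiring, and Age is taken from record type 1.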

Related

Product co-purchasing or bundles given a product ID

I have data (Amazon product co-purchasing) in two columns whose values are product IDs. I would like to select values in the ranges 100-299, 300-399, 400-999, and all other values, and group them. I want to create bundles (co-purchases) between products in one group and products in another, e.g. 100-299 with 300-399, or 400-999 with 100-299. The original data has two columns, FromNode and ToNode. Below are a few lines of the original data. Some values (product IDs) appear in both columns.
FromNode ToNode
0 1
0 2
0 3
0 4
0 5
1 0
1 2
1 4
1 5
1 15
2 0
2 11
2 13
2 14
3 65
3 66
3 67
I am using
df[df[['FromNode', 'ToNode']].isin([100,101,102...299]).any(1)]
to pick the values in the range, but it seems I have to list all the values in the isin argument. Is there an efficient way to pass just the range 100-299 to isin to fetch the values? Or should I combine both columns into one and use iloc to select the values? Any tips will help.

SPSS selection of same ID rows based on difference between rows

I have a dataset where I have rows of data for each ID. Each row reflects a different time each ID has accessed the website. I have also created a variable which tells me how many months there were between each visit. I want to select all the cases from time 1 to last time value for each ID if they have returned after at least 1 month. What do I do?
ID Time MonthSince
1 1 .
1 2 0
2 1 .
2 2 1
3 1 .
3 2 0
I would like the dataset to look as follows:
ID Time MonthSince Filter
1 1 . Not Selected
1 2 0 Not Selected
2 1 . Selected
2 2 1 Selected
3 1 . Not Selected
3 2 0 Not Selected
I suggest calculating the total of MonthSince for each ID. If this total is zero, we know the ID never returned after at least one month, and we can filter those cases out:
aggregate outfile=* mode=addvariables/break=ID/TotMonths=sum(MonthSince).
select if TotMonths>0.
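The same logic can be sketched in plain Python as an illustration of what the two commands above compute. The `visits` list mirrors the sample data, with `None` standing in for the missing value shown as "."; the variable names are assumptions made for this sketch.

```python
from collections import defaultdict

# Sample data from the question; None stands in for SPSS's missing value ".".
visits = [
    {"ID": 1, "Time": 1, "MonthSince": None},
    {"ID": 1, "Time": 2, "MonthSince": 0},
    {"ID": 2, "Time": 1, "MonthSince": None},
    {"ID": 2, "Time": 2, "MonthSince": 1},
    {"ID": 3, "Time": 1, "MonthSince": None},
    {"ID": 3, "Time": 2, "MonthSince": 0},
]

# Step 1 (like AGGREGATE ... /break=ID): total MonthSince per ID, skipping missing.
tot_months = defaultdict(int)
for visit in visits:
    if visit["MonthSince"] is not None:
        tot_months[visit["ID"]] += visit["MonthSince"]

# Step 2 (like SELECT IF): keep every row of an ID whose total is above zero.
selected = [visit for visit in visits if tot_months[visit["ID"]] > 0]

for visit in selected:
    print(visit)  # only the two rows for ID 2 remain
```

Because the total is attached to every row of an ID, the first visit (where MonthSince is missing) is kept along with the later ones.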

ARRAYFORMULA with repetition

I have two columns of data, and would like to distribute the elements of one of these columns over several rows. I can easily calculate the index of the element I need, but cannot figure out how to access the element.
Formula for index: =ARRAYFORMULA(IF(A:A,CEILING(ROW(A:A)/3+1),""))
A  B   Desired output  Index
1  11  22              2
2  22  22              2
3  33  22              2
4  44  33              3
5      33              3
6      33              3
7      44              4
How can I modify my formula for the index so that it yields the item of column B at the calculated index?
I tried =ARRAYFORMULA(IF(A:A, INDEX(B:B, CEILING(ROW(A:A)/3+1), 1), "")) but that only repeats the first element (22) 7 times.
Use Vlookup instead of Index:
=ARRAYFORMULA(IF(A:A,vlookup(CEILING(ROW(A:A)/3+1),A:B,2),""))
EDIT
It isn't necessary to use a key column, you could use something like this:
=ARRAYFORMULA(vlookup(CEILING(sequence(counta(B:B)*3)/3+1),{row(B:B),B:B},2))
assuming you wanted to generate three rows for each non-blank row in column B not counting the first one.
Or if you want to be different, use a concatenate/split approach:
=ArrayFormula(flatten(split(rept(filter(B:B,B:B<>"",row(B:B)>1)&"|",3),"|")))
(all the above assume you want to ignore the first row in col B and start with 22).
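For readers who want to check the index arithmetic outside a spreadsheet, here is a small Python sketch. It assumes the data starts in row 1 with no header row, as in the example, and `repeat_index` is a hypothetical helper that reproduces CEILING(ROW/3+1) with integer arithmetic.

```python
def repeat_index(row, repeats=3):
    """1-based index into column B for a 1-based sheet row.

    Same value as CEILING(row / repeats + 1), computed without floats.
    """
    return (row + repeats - 1) // repeats + 1


col_b = [11, 22, 33, 44]   # column B from the example
rows = range(1, 8)         # sheet rows 1..7 (the rows where column A has a value)

# Look up the element of B at the calculated index (convert to 0-based).
output = [col_b[repeat_index(row) - 1] for row in rows]
print(output)  # [22, 22, 22, 33, 33, 33, 44]
```

This is the per-row lookup that the VLOOKUP formulas perform, whereas INDEX inside ARRAYFORMULA returned only a single value.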

merge two data set and paste one column

I have two data sets and I need to merge them in the way shown below.
My first data set:
data1=data.frame(store=c(12,13),product=c(1,2))
data1
store product
12 1
13 2
data2=data.frame(product=c(1,1,2,2,2),promo=c("promo1","promo2","promo1","promo2","promo3"))
data2
product promo
1 promo1
1 promo2
2 promo1
2 promo2
2 promo3
Desired data set:
store product numberofpromo promo
12 1 2 promo1;promo2
13 2 3 promo1;promo2;promo3
Thank you
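library(dplyr)
# One row per product: promos pasted together with ";" plus the number of promos.
data=data.frame(data2%>%
group_by(product) %>%
summarise(promotion=paste(promo,collapse=";"),
promo_say=n()))
After computing this summary, you need to join it to data1 by product, for example with left_join(data1, data, by="product").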
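The two steps (summarise per product, then join to the store table) can also be sketched in plain Python. This is an illustration of the idea rather than a translation of the dplyr code. The output column names follow the desired data set in the question, and the variable names are assumptions for this sketch.

```python
# The two data sets from the question.
data1 = [{"store": 12, "product": 1}, {"store": 13, "product": 2}]
data2 = [
    (1, "promo1"), (1, "promo2"),
    (2, "promo1"), (2, "promo2"), (2, "promo3"),
]

# Step 1: group promos by product (the group_by + summarise step).
promos_by_product = {}
for product, promo in data2:
    promos_by_product.setdefault(product, []).append(promo)

# Step 2: join the summary onto data1 by product (the join step).
result = []
for row in data1:
    promos = promos_by_product.get(row["product"], [])
    result.append({
        "store": row["store"],
        "product": row["product"],
        "numberofpromo": len(promos),
        "promo": ";".join(promos),
    })

for row in result:
    print(row)
```

Using `.get(..., [])` keeps stores whose product has no promos, which corresponds to a left join.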

Join two tables and return data from both

I have three tables, as follows.
**Table 1**            **Table 2**              **Table 3**
Lot_no(pk)             Lot_no(pk/fk)            Lot_no(fk)
Name                   job type                 Material
Phone                  Printing qty             Trim
Here is some sample data:
**Table 1**            **Table 2**              **Table 3**
1 Mian Sultan xyz      1 Reverse 50,000pcs      1 PVC 20
2 Mian Usman xyz       2 New 10,000pcs          1 INK 30
                                                2 MILKY 25
                                                2 INK 35
I just want to show data from Table 2 and Table 3 on the basis of lot_no.
For example, if the user enters lot_no=1, the result should be displayed as
1 Reverse 50,000pcs
1 PVC 20
1 INK 30
If the user enters lot_no=2, then similarly
2 New 10,000pcs
2 MILKY 25
2 INK 35
My query is as follows:
@lotnum int (variable declaration in the stored procedure)
SELECT table2.lot_no, table2.job_type, table2.printing_qty,
table3.material, table3.trim
FROM table2
INNER JOIN table3 ON (table2.lot_no=table3.lot_no)
WHERE table2.lot_no=@lotnum AND table3.lot_no=@lotnum;
It shows me the correct result, but when I use it in Crystal Reports with lot_no=1, it shows only
1 Reverse 50,000pcs
1 PVC 20
It does not show
1 INK 30
The same happens when lot_no=2.
Please guide me. Thanks.
