Product co-purchasing or bundles given a product ID - analysis

I have data (an Amazon product co-purchasing network) in two columns whose values are product IDs. I would like to select the IDs in the ranges 100-299, 300-399 and 400-999, plus all other values, and group them. I want to create bundles (co-purchasing pairs) between products in one group and products in another, e.g. 100-299 with 300-399, or 400-999 with 100-299. The original data has two columns, FromNode and ToNode; below are a few lines of it. Some values (product IDs) appear in both columns.
FromNode ToNode
0 1
0 2
0 3
0 4
0 5
1 0
1 2
1 4
1 5
1 15
2 0
2 11
2 13
2 14
3 65
3 66
3 67
I am using
df[df[['FromNode', 'ToNode']].isin([100,101,102...299]).any(axis=1)]
to pick the values in the range, but it seems I have to list every value in the isin argument. Is there an efficient way to pass just the range 100-299 to isin() to fetch the values? Or should I combine both columns into one and use iloc to select the values? Any tips will help.
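A minimal pandas sketch of one way to do this, assuming the data is already in a DataFrame `df` with integer FromNode/ToNode columns. `Series.between(100, 299)` (inclusive on both ends) or `isin(range(100, 300))` replaces the hand-written list, and a small labelling helper (`id_group`, a hypothetical name) tags each ID with its range so cross-group pairs can be counted with `pd.crosstab`. The rows with IDs of 100 and above are made up so that every group is populated.

```python
import numpy as np
import pandas as pd

# The first five rows come from the question; the rest are hypothetical,
# added so that every range group has at least one product.
df = pd.DataFrame({
    "FromNode": [0, 0, 1, 2, 3, 150, 120, 450, 350],
    "ToNode":   [1, 2, 15, 11, 65, 310, 460, 101, 999],
})

# Rows where either column is in 100-299 (inclusive); no need to list every ID.
mask = df["FromNode"].between(100, 299) | df["ToNode"].between(100, 299)
in_100_299 = df[mask]
# Equivalent: df[df[["FromNode", "ToNode"]].isin(range(100, 300)).any(axis=1)]


def id_group(ids: pd.Series) -> pd.Series:
    """Label each product ID with its range group (hypothetical helper)."""
    conditions = [ids.between(100, 299), ids.between(300, 399), ids.between(400, 999)]
    labels = ["100-299", "300-399", "400-999"]
    return pd.Series(np.select(conditions, labels, default="other"), index=ids.index)


df["FromGroup"] = id_group(df["FromNode"])
df["ToGroup"] = id_group(df["ToNode"])

# Co-purchasing pairs whose two products fall in different groups.
cross = df[df["FromGroup"] != df["ToGroup"]]

# Number of pairs for each (from group, to group) combination.
pair_counts = pd.crosstab(cross["FromGroup"], cross["ToGroup"])

# The individual bundles for one specific pair of groups.
bundles_100_300 = cross[(cross["FromGroup"] == "100-299") & (cross["ToGroup"] == "300-399")]
print(pair_counts)
```

Because some IDs appear in both columns, A→B and B→A are counted as separate pairs here; if direction does not matter, the two group labels could be sorted within each row before counting.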

Related

SPSS selection of same ID rows based on difference between rows

I have a dataset with several rows of data for each ID. Each row reflects a different time that ID accessed the website. I have also created a variable giving the number of months since the previous visit. For each ID, I want to select all cases from time 1 to the last time value if the ID returned after at least 1 month. How can I do this?
ID Time MonthSince
1 1 .
1 2 0
2 1 .
2 2 1
3 1 .
3 2 0
I would like the dataset to look as follows:
ID Time MonthSince Filter
1 1 . Not Selected
1 2 0 Not Selected
2 1 . Selected
2 2 1 Selected
3 1 . Not Selected
3 2 0 Not Selected
I suggest calculating the total of MonthSince for each ID. If this total is zero, none of that ID's visits came a month or more after the previous one, so we can filter these cases out:
aggregate outfile=* mode=addvariables/break=ID/TotMonths=sum(MonthSince).
select if TotMonths>0.
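For readers outside SPSS, the same per-ID logic can be sketched in pandas, assuming the missing MonthSince values (".") are loaded as NaN. `groupby(...).transform("sum")` plays the role of AGGREGATE with MODE=ADDVARIABLES. Note that SELECT IF removes the unselected cases, while the desired output keeps them with a Filter label, so this sketch writes a label column instead of dropping rows.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "." (missing) is represented as NaN.
df = pd.DataFrame({
    "ID":         [1, 1, 2, 2, 3, 3],
    "Time":       [1, 2, 1, 2, 1, 2],
    "MonthSince": [np.nan, 0, np.nan, 1, np.nan, 0],
})

# Per-ID total of MonthSince copied back to every row (NaN is skipped by sum).
df["TotMonths"] = df.groupby("ID")["MonthSince"].transform("sum")

# Label every case instead of deleting, to match the desired Filter column.
df["Filter"] = np.where(df["TotMonths"] > 0, "Selected", "Not Selected")
print(df)
```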

ARRAYFORMULA with repetition

I have two columns of data, and would like to distribute the elements of one of these columns over several rows. I can easily calculate the index of the element I need, but cannot figure out how to access the element.
Formula for index: =ARRAYFORMULA(IF(A:A,CEILING(ROW(A:A)/3+1),""))
A  B   Desired output  Index (from formula)
1  11  22              2
2  22  22              2
3  33  22              2
4  44  33              3
5      33              3
6      33              3
7      44              4
How can I modify my formula for the index so that it yields the item of column B at the calculated index?
I tried =ARRAYFORMULA(IF(A:A, INDEX(B:B, CEILING(ROW(A:A)/3+1), 1), "")) but that only repeats the first element (22) 7 times.
Use Vlookup instead of Index:
=ARRAYFORMULA(IF(A:A,vlookup(CEILING(ROW(A:A)/3+1),A:B,2),""))
EDIT
It isn't necessary to use a key column; you could use something like this:
=ARRAYFORMULA(vlookup(CEILING(sequence((counta(B:B)-1)*3)/3+1),{row(B:B),B:B},2))
assuming you wanted to generate three rows for each non-blank row in column B not counting the first one.
Or if you want to be different, use a concatenate/split approach:
=ArrayFormula(flatten(split(rept(filter(B:B,B:B<>"",row(B:B)>1)&"|",3),"|")))
(all the above assume you want to ignore the first row in col B and start with 22).
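The index arithmetic is easier to check outside the spreadsheet. This Python sketch reproduces what the formulas compute for column B from the question: output row r reads B at row CEILING(r/3 + 1), which equals (r + 2) // 3 + 1 in integer arithmetic, so rows 2, 3 and 4 of B each appear three times. It is only a check of the logic, not a Sheets formula.

```python
# Column B from the question; b[0] is row 1 (value 11), which the formulas skip.
b = [11, 22, 33, 44]
REPEAT = 3  # each value is repeated three times


def source_row(r: int, k: int = REPEAT) -> int:
    """1-based row of B read by output row r: CEILING(r/k + 1) in integer form."""
    return (r + k - 1) // k + 1


n_out = REPEAT * (len(b) - 1)  # three output rows per value, excluding row 1
via_index = [b[source_row(r) - 1] for r in range(1, n_out + 1)]

# Same result as the REPT/SPLIT/FLATTEN idea: repeat every value after row 1.
via_repeat = [value for value in b[1:] for _ in range(REPEAT)]

print(via_index)
```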

Merging different parts of one file - based on a variable in the file

I have a data file that looks like the first picture, and I read it into SPSS using FILE TYPE MIXED so that it looks like the second picture. How can I merge cases that have the same ID value into one case? The variable Age is repeated, so it does not matter which value is kept, but it would be good to be able to select the first one.
Here is an example of the code I am using to read the data:
FILE TYPE MIXED RECORD=RecordID 1
/ WILD =WARN.
RECORD TYPE 1.
DATA LIST
/ ID 8-9 JobType 3-4 Age 5-7.
RECORD TYPE 2.
DATA LIST
/ ID 3-4 Sex 11 Salary 5-8.
RECORD TYPE 3.
DATA LIST
/ ID 6-7 Age 8-10 Hiring 3-5.
END FILE TYPE.
BEGIN DATA
1 1 39 1
1 3 27 2
1 2 27 3
1 3 25 4
2 1 9000 0
2 2 7500 0
2 3 4750 1
2 4 7250 1
3 76 1 39
3 98 2 27
3 8 3 27
3 44 4 25
END DATA.
LIST.
This should work:
sort cases by ID RecordID.
casestovars id=ID/index=RecordID.
If the ages are identical they collapse into one column. If they aren't, you'll get three age columns, and you'll be able to choose the one you prefer.
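For comparison, the same merge can be sketched in pandas on a small hypothetical frame shaped like FILE TYPE MIXED output (one row per record, missing values where a record type has no such variable). After sorting by ID and RecordID, `GroupBy.first()` keeps the first non-missing value of each column, which also answers the "select the first Age" part. The values and the second, differing Age for ID 2 are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per record, NaN where a record type
# does not carry that variable. ID 2 has two different ages on purpose.
df = pd.DataFrame({
    "ID":       [1, 2, 1, 2, 1, 2],
    "RecordID": [1, 1, 2, 2, 3, 3],
    "JobType":  [1, 3, np.nan, np.nan, np.nan, np.nan],
    "Age":      [39, 27, np.nan, np.nan, 39, 28],
    "Salary":   [np.nan, np.nan, 9000, 7500, np.nan, np.nan],
    "Hiring":   [np.nan, np.nan, np.nan, np.nan, 76, 98],
})

# One row per ID: first() takes the first non-missing value in each column,
# so after sorting by RecordID the Age from record type 1 is kept.
merged = (
    df.sort_values(["ID", "RecordID"])
      .drop(columns="RecordID")
      .groupby("ID")
      .first()
)
print(merged)
```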

ArrayFormula retrieve list by multiple criteria

Input
Data sheet
TaskId ClientId Canceled
1 1 0
2 1 0
3 1 0
4 2 0
5 2 1
6 2 0
7 3 0
Report sheet
ClientId
1
1
2
3
Desired Output
An array formula that gets all TaskIds from Data for the clients listed in Report, where Canceled = 0:
TaskIds
1
2
3
1
2
3
4
6
7
I have a JOIN + FILTER formula that I drag down, which gives me all TaskIds for each client:
ClientId TaskIds
1 1,2,3
1 1,2,3
2 4,6
3 7
Then I get my result from this helper_column:
=transpose(split(join(",", helper_column)))
I want to make this work without having to drag the formula down.
Try this:
=ARRAYFORMULA(TRANSPOSE(SPLIT(CONCATENATE("🍻"&TRANSPOSE(IF(TRANSPOSE(A11:A14)=B2:B8,IF(C2:C8=0,A2:A8,""),""))),"🍻")))
A11:A14=Report sheet Client ID.
A2:C8=Data sheet values.
Cheers 🍻
In the 'Report' tab of this spreadsheet, in cell B2, I entered
=arrayformula(vlookup(A2:A5&"", regexreplace({unique(filter(Data!B2:B, Data!C2:C=0))&"", trim(transpose(query(if((transpose(unique(filter(Data!B2:B, Data!C2:C=0)))=filter(Data!B2:B, Data!C2:C=0))*len(filter(Data!B2:B, Data!C2:C=0)),filter(Data!A2:A, Data!C2:C=0)&",",),,50000)))},",$", ), 2, 0))
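The logic behind both formulas (build the per-client list, then flatten it in Report order) can be sketched in pandas with the sample Data and Report sheets from the question. `tasks_by_client` is a hypothetical name standing in for the helper column; it is not a Sheets feature.

```python
import pandas as pd

# Data and Report sheets from the question.
data = pd.DataFrame({
    "TaskId":   [1, 2, 3, 4, 5, 6, 7],
    "ClientId": [1, 1, 1, 2, 2, 2, 3],
    "Canceled": [0, 0, 0, 0, 1, 0, 0],
})
report = pd.DataFrame({"ClientId": [1, 1, 2, 3]})

# Keep only tasks that were not canceled, then list them per client
# (the equivalent of the JOIN + FILTER helper column).
active = data[data["Canceled"] == 0]
tasks_by_client = active.groupby("ClientId")["TaskId"].apply(list)
report["TaskIds"] = report["ClientId"].map(
    lambda c: ",".join(str(t) for t in tasks_by_client.get(c, []))
)

# Flatten in Report order (the equivalent of TRANSPOSE(SPLIT(JOIN(...)))).
result = [t for c in report["ClientId"] for t in tasks_by_client.get(c, [])]
print(result)
```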

Using Sum when grouping by one or more columns in LINQ

I have two tables like the ones below.
Votes on comments:
VoteId VoteValue UserId CommentId DateAdded
1 1 1 1 10/11/2013
2 1 5 1 10/14/2013
3 1 9 2 09/08/2013
4 1 11 3 01/03/2014
Points awarded to users:
PointId Date PointValue UserId
1 10/11/2013 1 1
2 10/14/2013 1 5
3 09/08/2013 1 9
4 01/03/2014 1 11
I need to find the 10 users who took the most votes across all comments each month. First I tried to write the LINQ like this:
var object = (db.Comments.
Where(c => c.ApplicationUser.Id == comment.ApplicationUser.Id).
FirstOrDefault()).ToList();
I can't work out how to use Sum and add the points to my table. Any help?
I hope it's clear.
First extract the month from the DateTime value, then group by month, take the sum of the votes over all comments, sort in descending order of that sum, and use Take(10) at the end.
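The steps in that answer can be sketched in pandas as a language-neutral illustration; in LINQ they correspond to GroupBy, Sum, OrderByDescending and Take. Two assumptions are worth stating: the dates are read as month/day/year, and the UserId in the vote table is taken to be the user credited with the vote. The month is kept as a year-month period so that, for example, January 2014 and January 2013 stay separate.

```python
import pandas as pd

# Vote table from the question (dates assumed to be month/day/year).
votes = pd.DataFrame({
    "VoteId":    [1, 2, 3, 4],
    "VoteValue": [1, 1, 1, 1],
    "UserId":    [1, 5, 9, 11],
    "CommentId": [1, 1, 2, 3],
    "DateAdded": pd.to_datetime(
        ["10/11/2013", "10/14/2013", "09/08/2013", "01/03/2014"], format="%m/%d/%Y"
    ),
})

# 1. Extract the month as a year-month period.
votes["Month"] = votes["DateAdded"].dt.to_period("M")

# 2. Sum vote values per user within each month, over all comments.
totals = votes.groupby(["Month", "UserId"], as_index=False)["VoteValue"].sum()

# 3. Sort by total descending within each month and keep the top 10 users.
top10 = (
    totals.sort_values(["Month", "VoteValue"], ascending=[True, False])
          .groupby("Month")
          .head(10)
)
print(top10)
```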
