rails group_by_day_of_week and sort - ruby-on-rails

I have two models, Member and Slot, joined through a middle table named MemberSlot for a has_many :through relationship.
A user comes and registers for a slot.
Slot has a date field named slot_date.
For metrics on the index page, I have to show how many times each member registered for a slot on each day of the week, using the date field slot_date.
For example:
User    Monday  Tuesday  Wednesday  Thursday  Saturday  Sunday  Total
---------------------------------------------------------------------
Faisal  0       1        1          3         0         1       6
User2   1       0        0          0         1         2       4
User3   0       0        0          0         0         0       0
and then allow the user to sort this table by a column such as Monday or Total.
I tried something like
Slot.joins(member_slots: [:member]).group("members.id").group_by_day_of_week(:slot_date).count
The issue with this is that I cannot sort the result by a specific day or by the total, and I then have to fetch each member in a loop to display it.
So it is not workable.
Can someone suggest a proper way to handle this?
Waiting for your response.

Related

Development of a feature per row or from today's date

I have a problem. When an order comes in, I want to predict in how many days the customer will place their next order.
I have already created my target variable, next_purchase_in_days, which specifies in how many days the customer will order again. This is what I would like to predict.
Since I have too few features, I want to do feature engineering. I would like to include how many orders the customer has placed in the last 90 days. So far, I have counted back from today's date to get the number of orders each customer placed in the last 90 days.
Is it better to compute, per row, how many orders the customer had placed up to that row's date? Please see the example below.
So does it make more sense to calculate this from today's date and include it as a feature, or should it be recalculated for each row?
    customerId  fromDate    next_purchase_in_days
0   1           2021-02-22  24
1   1           2021-03-18  4
2   1           2021-03-22  109
3   1           2021-02-10  12
4   1           2021-09-07  133
8   3           2022-05-17  61
10  3           2021-02-22  133
11  3           2021-02-22  133
Example
# What I have
    customerId  fromDate    next_purchase_in_days  purchase_in_last_90_days
0   1           2021-02-22  24                     0
1   1           2021-03-18  4                      0
2   1           2021-03-22  109                    0
3   1           2021-02-10  12                     0
4   1           2021-09-07  133                    0
8   3           2022-05-17  61                     1
10  3           2021-02-22  133                    1
11  3           2021-02-22  133                    1
# Or does this make more sense?
    customerId  fromDate    next_purchase_in_days  purchase_in_last_90_days
0   1           2021-02-22  24                     1
1   1           2021-03-18  4                      2
2   1           2021-03-22  109                    3
3   1           2021-02-10  12                     0
4   1           2021-09-07  133                    0
8   3           2022-05-17  61                     1
10  3           2021-02-22  133                    0
11  3           2021-02-22  133                    0
You can address this in a number of ways, but something interesting to consider is the interaction between Date & Customer ID.
Dates have meaning to humans beyond just timekeeping. They carry emotional and cultural importance: holidays, weekends, seasons, anniversaries, etc. So there is a conditional relationship between the probability of a purchase and events: P(x|E)
Customer Ids theoretically represent a single person, or at the very least a single business with a limited number of people responsible for purchasing.
Certain people/corporations are just more likely to spend.
So here are a number of ways to address this:
Find a list of holidays relevant to the users. For instance if they are US based find a list of US recognized holidays. Then create a feature based on each date: Date_Till_Next_Holiday or (DTNH for short).
Dates also have cyclical aspects that can encode probability. Day of the year (1-365), Days of the week (1-7), week numbers (1-52), Months (1-12), Quarters (1-4). I would create additional columns encoding each of these.
To address the customer interaction, have a running total of past purchases. You could call it Purchases_to_date, and would be an integer (0...n) where n is the number of previous purchases. I made a notebook to show you how to do running totals.
Humans tend to share purchasing patterns with other humans. You could run a k-means cluster algorithm that splits customers into 3-4 groups based on all the previous info, and then use their cluster-number as a feature. Sklearn-Kmeans
So based on all that you could engineer 8 different columns. I would then run Principal Component Analysis (PCA) to reduce that to 3-4 features.
You can use Sklearn-PCA to do PCA.
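To make the running-total idea concrete, here is a minimal sketch in plain Python (not the notebook mentioned above). It computes the history features per row, counting only orders placed strictly before that row's date, so a row never sees its own or later orders. The function name add_history_features and the key purchases_in_window are made up for illustration, and only customer 1 from the question is used.

```python
from datetime import date, timedelta


def add_history_features(orders, window_days=90):
    """Build one feature row per order from strictly earlier orders.

    orders: iterable of (customer_id, order_date) pairs.
    All names here are illustrative, not taken from any library.
    """
    orders = list(orders)
    rows = []
    for customer_id, order_date in orders:
        window_start = order_date - timedelta(days=window_days)
        # Earlier orders of the same customer only (same-day orders excluded).
        earlier = [d for cid, d in orders
                   if cid == customer_id and d < order_date]
        rows.append({
            "customerId": customer_id,
            "fromDate": order_date,
            "purchases_to_date": len(earlier),
            "purchases_in_window": sum(1 for d in earlier if d >= window_start),
            "day_of_week": order_date.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "month": order_date.month,
        })
    return rows


# Customer 1 from the question, in the original row order.
orders = [
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
    (1, date(2021, 3, 22)),
    (1, date(2021, 2, 10)),
    (1, date(2021, 9, 7)),
]
features = add_history_features(orders)
for row in features:
    print(row["fromDate"], row["purchases_to_date"], row["purchases_in_window"])
```

For these five rows the 90-day counts come out as 1, 2, 3, 0, 0, which matches the second table in the question. Computing the feature per row like this, rather than from today's date, keeps each training row limited to information that was available when that order came in.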

Google Sheets (Drop Down list to change data in rows below)

I can't seem to find any information about this particular issue anywhere, unbelievable.
I'm trying to change the rows with Balance, Weekly PnL, Weekly PnL% etc. to go with the weeks.
So the rows in week 1 would have different data from the rows in week 2 and 3 etc.
Can this even be done in Google Sheets?
Assuming you have sheet Data with the following format:
Week    #           Amount
Week 1  Balance     1
Week 2  Balance     2
Week 3  Balance     3
Week 1  Weekly PnL  4
Week 2  Weekly PnL  5
Week 3  Weekly PnL  6
...     ...         ...
=ARRAYFORMULA(IF(A2:A="",,VLOOKUP(A2:A,FILTER(Data!B:C,Data!A:A=B1),2,FALSE)))
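If it helps to see what the formula is doing, here is a rough Python sketch of the same logic: filter the Data rows down to the week chosen in the drop-down (B1 in the formula), then look up each row label from column A in what is left. The function name amounts_for_week is invented for this sketch, and a missing label gives None here where VLOOKUP would give #N/A.

```python
def amounts_for_week(data_rows, selected_week, labels):
    """Mimic VLOOKUP over FILTER(Data!B:C, Data!A:A = selected_week).

    data_rows: (week, label, amount) tuples, like the Data sheet above.
    labels: the row labels on the dashboard (column A in the formula).
    """
    lookup = {}
    for week, label, amount in data_rows:
        # Keep the first match per label, as VLOOKUP with FALSE does.
        if week == selected_week and label not in lookup:
            lookup[label] = amount
    # Blank or unknown labels yield None.
    return [lookup.get(label) for label in labels]


data = [
    ("Week 1", "Balance", 1),
    ("Week 2", "Balance", 2),
    ("Week 3", "Balance", 3),
    ("Week 1", "Weekly PnL", 4),
    ("Week 2", "Weekly PnL", 5),
    ("Week 3", "Weekly PnL", 6),
]
week_2 = amounts_for_week(data, "Week 2", ["Balance", "Weekly PnL"])
print(week_2)  # [2, 5]
```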

Is it possible to sum a value if the subtraction of two values on the same row equals something?

I'm trying to build a sheet where I can see how much I have to pay each month.
Let's say I have the following table
Current installment (CI)  Total installments (TI)  Installment amount (IA)
1                         3                        $100
1                         1                        $200
2                         3                        $150
1                         3                        $75
2                         4                        $150
1                         1                        $50
So, for the first month, I would sum the installment amount of every row where TI-CI >= 1. For the following month I would do the same, but with TI-CI >= 2.
And the result would be something like this:
1st month debt  $475 (the result of 100+150+75+150)
2nd month debt  $325 (the result of 100+75+150)
3rd month debt  $100
Is this possible at all?
try:
=IFNA(SUM(FILTER(C$2:C, (B$2:B-A$2:A)>=ROW(A1))))
and drag down
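As a sanity check on the logic, here is a small Python sketch of the same rule: for month k, sum IA over the rows where TI - CI >= k, which is what ROW(A1) supplies as the formula is dragged down. The function name monthly_debts is made up for this sketch. Note that with the six sample rows no row has TI - CI >= 3, so the third month sums to 0 here, and the formula would presumably leave that cell empty because of IFNA.

```python
def monthly_debts(rows, months):
    """Sum installment amounts still owed in each upcoming month.

    rows: (current_installment, total_installments, amount) tuples.
    Month k counts every row with total - current >= k.
    """
    return [
        sum(amount for current, total, amount in rows if total - current >= month)
        for month in range(1, months + 1)
    ]


# The sample table from the question: (CI, TI, IA).
rows = [
    (1, 3, 100),
    (1, 1, 200),
    (2, 3, 150),
    (1, 3, 75),
    (2, 4, 150),
    (1, 1, 50),
]
debts = monthly_debts(rows, 3)
print(debts)  # [475, 325, 0]
```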

Conditional array formulas

I have a massive dataset and am preparing a dashboard based on this dataset.
On my dashboard, I have a drop-down menu that allows me to select a month of my choice, from Jan to Apr.
Visitor Jan Feb Mar Apr
Jenny 2 3 0 1
Peter 2 0 1 3
Charley 0 2 4
Charley 1 2 2 3
Sam 1 4 2 3
Peter 2 2 5 0
John 3 3 6 9
Robin 4 0 7 0
I am looking for a formula that will give me the number of unique visitors who have been active at least once in the month that I choose from the drop-down menu.
Hoping this is really clear, but if not, please feel free to shoot back your questions.
This may be easier with Excel 2013, but if the results you want from your example are 6, 5, 5, and 5 for Jan>April respectively then perhaps:
Create a PivotTable from multiple consolidation ranges (example of how here), and for VALUES choose Sum of Value.
Count the non-zero values in the PT by column with a formula such as:
=COUNTIF(H5:H10,">"&0)
The above however would not be convenient for repetition each month, though a whole year might be prepared at one time.
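For comparison, the count being asked for can be sketched in a few lines of Python. This is not the PivotTable approach above, just the same result computed directly: collect the distinct visitor names with a non-zero value in the chosen month. The function name unique_active_visitors is invented here, and the incomplete Charley row (three values only) is left out because its column alignment is unclear; the other Charley row is non-zero in every month, so the counts are unaffected.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr"]

# Rows from the question, minus the incomplete Charley row.
visits = [
    ("Jenny", [2, 3, 0, 1]),
    ("Peter", [2, 0, 1, 3]),
    ("Charley", [1, 2, 2, 3]),
    ("Sam", [1, 4, 2, 3]),
    ("Peter", [2, 2, 5, 0]),
    ("John", [3, 3, 6, 9]),
    ("Robin", [4, 0, 7, 0]),
]


def unique_active_visitors(visits, month):
    """Count distinct visitors with at least one non-zero value in month."""
    col = MONTHS.index(month)
    # A set drops duplicate names such as the two Peter rows.
    return len({name for name, counts in visits if counts[col] > 0})


counts = {month: unique_active_visitors(visits, month) for month in MONTHS}
print(counts)  # {'Jan': 6, 'Feb': 5, 'Mar': 5, 'Apr': 5}
```

These agree with the 6, 5, 5, and 5 quoted above.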

SPSS dataset restructuring involving variable for survey completion date

I'm using SPSS and have a dataset comprised of individuals' responses to a survey question. This is longitudinal data, so the subjects have taken the survey at least twice and some as many as four or five times.
My variables are ID (scale), date of survey completion (date - dd-mmm-yyyy), and response to survey question (scale).
The dataset is sorted by ID then date (ascending). Each date corresponds to survey time 1, time 2, etc. What I would like to do is compute a new variable time that corresponds to the survey completion dates for a particular participant. I would then like to use that variable to complete a long-to-wide restructuring of the dataset.
So, I'd like to accomplish the following and am not sure how to go about doing it:
1) I have something like this:
ID  Date         Assessment_Answer
----------------------------------
1   01-Jan-2009  4
1   01-Jan-2010  1
1   01-Jan-2011  5
2   15-Oct-2012  6
2   15-Oct-2012  0
2) Want to compute another variable that would give me this:
ID  Date         Assessment_Answer  Time
-----------------------------------------
1   01-Jan-2009  4                  Time1
1   01-Jan-2010  1                  Time2
1   01-Jan-2011  5                  Time3
2   15-Oct-2012  6                  Time1
2   15-Oct-2013  0                  Time2
3) And restructure so that I have something like this:
ID  Time1  Time2  Time3  Time4
------------------------------
1   4      1      5
2   6      0
You can use sequential case processing to create a variable that is a counter within each ID. So for example:
*Making fake data.
DATA LIST FREE / ID (F1.0) Date (DATE10) Assessment_Answer (F1.0).
BEGIN DATA
1 01-Jan-2009 4
1 01-Jan-2010 1
1 01-Jan-2011 5
2 15-Oct-2012 6
2 15-Oct-2012 0
END DATA.
*Making counter within ID.
SORT CASES BY Id Date.
DO IF ($casenum = 1) OR (Id <> LAG(ID)).
COMPUTE Time = 1.
ELSE.
COMPUTE Time = LAG(Time) + 1.
END IF.
FORMATS Time (F2.0).
EXECUTE.
Now you can use CASESTOVARS to reshape the data like you requested.
CASESTOVARS
/ID = Id
/INDEX = Time
/DROP Date.
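For readers who do not use SPSS, the two steps above (a counter within each ID, then a long-to-wide reshape) can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the syntax above; the function name long_to_wide is made up, and the data is taken from step 2 of the question.

```python
from datetime import date


def long_to_wide(records):
    """Number each subject's surveys by date and pivot answers to columns.

    records: (subject_id, survey_date, answer) tuples in any order.
    Returns {subject_id: {"Time1": answer, "Time2": answer, ...}}.
    """
    wide = {}
    # Sort by ID, then date, like SORT CASES BY Id Date.
    for subject_id, _survey_date, answer in sorted(records, key=lambda r: (r[0], r[1])):
        row = wide.setdefault(subject_id, {})
        # The counter restarts at 1 for every new ID.
        row[f"Time{len(row) + 1}"] = answer
    return wide


records = [
    (1, date(2009, 1, 1), 4),
    (1, date(2010, 1, 1), 1),
    (1, date(2011, 1, 1), 5),
    (2, date(2012, 10, 15), 6),
    (2, date(2013, 10, 15), 0),
]
wide = long_to_wide(records)
print(wide)
```

Subjects with fewer surveys simply have fewer TimeN keys, which corresponds to the empty cells in the wide table.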
