I have a massive dataset and am preparing a dashboard based on this dataset.
On my dashboard, I have a drop-down menu that allows me to select a month of my choice, from Jan to Apr.
Visitor Jan Feb Mar Apr
Jenny 2 3 0 1
Peter 2 0 1 3
Charley 0 2 4
Charley 1 2 2 3
Sam 1 4 2 3
Peter 2 2 5 0
John 3 3 6 9
Robin 4 0 7 0
I am looking for a formula that will give me the number of unique visitors who have been active at least once in the month that I choose from the drop-down menu.
Hoping this is really clear, but if not, please feel free to shoot back your questions.
This may be easier with Excel 2013, but if the results you want from your example are 6, 5, 5, and 5 for Jan to Apr respectively, then perhaps:
Create a PivotTable from multiple consolidation ranges and, for VALUES, choose Sum of Value.
Count the non-zero values in the PT by column with a formula such as:
=COUNTIF(H5:H10,">"&0)
The above however would not be convenient for repetition each month, though a whole year might be prepared at one time.
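To sanity-check the expected counts outside the spreadsheet, here is a minimal Python sketch of the same logic: add up each visitor's rows, then count the visitors whose total for the chosen month is above zero. The helper name unique_active_visitors is hypothetical. The incomplete "Charley 0 2 4" row is left out because it is unclear which month is missing; Charley's other row already covers every month.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr"]

# Sample rows from the question; the incomplete "Charley 0 2 4" row is omitted.
ROWS = [
    ("Jenny", 2, 3, 0, 1),
    ("Peter", 2, 0, 1, 3),
    ("Charley", 1, 2, 2, 3),
    ("Sam", 1, 4, 2, 3),
    ("Peter", 2, 2, 5, 0),
    ("John", 3, 3, 6, 9),
    ("Robin", 4, 0, 7, 0),
]


def unique_active_visitors(rows, month):
    """Count distinct visitors whose summed activity in `month` is above zero."""
    col = MONTHS.index(month) + 1  # +1 skips the visitor name
    totals = defaultdict(int)
    for row in rows:
        totals[row[0]] += row[col]
    return sum(1 for total in totals.values() if total > 0)


counts = {month: unique_active_visitors(ROWS, month) for month in MONTHS}
print(counts)  # {'Jan': 6, 'Feb': 5, 'Mar': 5, 'Apr': 5}
```

The month passed to the function plays the role of the drop-down selection on the dashboard.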
Related
Hi everybody, I am trying to query an already formatted Google Sheets spreadsheet. I am able to filter some of the data (I used =query(x,select * where ... )). The output I get is the following:
               may  may  june  june  july  july  july
planned  name  1    0    1     1     2     3     1
Now I want to refer to all the numbers under may (or june or july) in order to do some operation. I can't just select the values I want because I need to automate it.
How can I get all the columns containing a specific marker (in my case the name of the month)? If that is not possible, can you suggest a different way to do it? (I am not very experienced with Google Sheets or Excel.)
Since query can't select rows, you'd transpose it first and then select the columns you want and then retranspose it back, if needed:
Input:
               may  may  june  june  july  july  july
planned  name  1    0    1     1     2     3     1
Formula(select columns >0):
=QUERY(TRANSPOSE(A27:I28),"Select * where Col2>0")
Output:
      planned name
may   1
june  1
june  1
july  2
july  3
july  1
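The transpose-then-filter idea can also be sketched in Python, which may help to see what the formula does step by step. This is only an illustration. The grid layout (two empty cells in front of the month names) is an assumption based on the A27:I28 range, and the july sum at the end is a hypothetical example of an automated per-month operation.

```python
# Two spreadsheet rows, as in the question; "" marks the assumed empty cells.
grid = [
    ["", "", "may", "may", "june", "june", "july", "july", "july"],
    ["planned", "name", 1, 0, 1, 1, 2, 3, 1],
]

# Transpose: each original column becomes a (label, value) row.
transposed = list(zip(*grid))

# Keep rows whose second field is a number above zero (like "where Col2>0").
selected = [
    (label, value)
    for label, value in transposed
    if isinstance(value, (int, float)) and value > 0
]

# Hypothetical per-month operation: sum the kept values for july.
july_total = sum(value for label, value in selected if label == "july")
print(selected)
print(july_total)
```

Once the data is in (month, value) rows, selecting everything for one month is a plain filter on the first field.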
I have a problem. When an order comes in, I want to predict in how many days the customer will place their next order.
I have already created my target variable next_purchase_in_days. This specifies in how many days the customer will place an order again. And I would like to predict this.
Since I have too few features, I want to do feature engineering. I would like to specify how many orders the customer has placed in the last 90 days. For example, I have calculated back from today's date how many orders the customer has placed in the last 90 days.
Is it better to say per row how many orders the customer has placed? Please see below for the example.
So does it make more sense to calculate this from today's date and include it as a feature or should it be recalculated for each row?
    customerId  fromDate    next_purchase_in_days
0   1           2021-02-22  24
1   1           2021-03-18  4
2   1           2021-03-22  109
3   1           2021-02-10  12
4   1           2021-09-07  133
8   3           2022-05-17  61
10  3           2021-02-22  133
11  3           2021-02-22  133
Example
# What I have
    customerId  fromDate    next_purchase_in_days  purchase_in_last_90_days
0   1           2021-02-22  24                     0
1   1           2021-03-18  4                      0
2   1           2021-03-22  109                    0
3   1           2021-02-10  12                     0
4   1           2021-09-07  133                    0
8   3           2022-05-17  61                     1
10  3           2021-02-22  133                    1
11  3           2021-02-22  133                    1
# Or does this make more sense?
    customerId  fromDate    next_purchase_in_days  purchase_in_last_90_days
0   1           2021-02-22  24                     1
1   1           2021-03-18  4                      2
2   1           2021-03-22  109                    3
3   1           2021-02-10  12                     0
4   1           2021-09-07  133                    0
8   3           2022-05-17  61                     1
10  3           2021-02-22  133                    0
11  3           2021-02-22  133                    0
You can address this in a number of ways, but something interesting to consider is the interaction between Date & Customer ID.
Dates have meaning to humans beyond just timekeeping. They carry emotional and cultural importance: holidays, weekends, seasons, anniversaries, etc. So there is a conditional relationship between the probability of a purchase and events: P(x|E)
Customer Ids theoretically represent a single person, or at the very least a single business with a limited number of people responsible for purchasing.
Certain people/corporations are just more likely to spend.
So here are a number of ways to address this:
Find a list of holidays relevant to the users. For instance, if they are US-based, find a list of US-recognized holidays. Then create a feature based on each date: Date_Till_Next_Holiday (or DTNH for short).
Dates also have cyclical aspects that can encode probability: day of the year (1-365), day of the week (1-7), week number (1-52), month (1-12), quarter (1-4). I would create additional columns encoding each of these.
To address the customer interaction, keep a running total of past purchases. You could call it Purchases_to_date; it would be an integer (0...n), where n is the number of previous purchases.
I made a notebook to show you how to do running totals.
Humans tend to share purchasing patterns with other humans. You could run a k-means clustering algorithm that splits customers into 3-4 groups based on all the previous info, and then use their cluster number as a feature. Sklearn-Kmeans
So based on all that you could engineer 8 different columns. I would then run Principal Component Analysis (PCA) to reduce that to 3-4 features.
You can use Sklearn-PCA to do PCA.
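As a rough illustration of the calendar and running-total features (not the notebook mentioned above), here is a minimal Python sketch using only the standard library. It assumes the history features are computed relative to each row's own fromDate and that only strictly earlier orders count, so same-day orders are ignored. The function name build_features and the feature names other than Purchases_to_date are hypothetical.

```python
from datetime import date, timedelta

# (customerId, fromDate) pairs taken from the question's sample.
orders = [
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
    (1, date(2021, 3, 22)),
    (1, date(2021, 2, 10)),
    (1, date(2021, 9, 7)),
    (3, date(2022, 5, 17)),
    (3, date(2021, 2, 22)),
    (3, date(2021, 2, 22)),
]


def build_features(orders, window_days=90):
    """Return one feature dict per order, using only strictly earlier orders."""
    window = timedelta(days=window_days)
    features = []
    for customer, day in orders:
        earlier = [d for c, d in orders if c == customer and d < day]
        features.append({
            "customerId": customer,
            "fromDate": day,
            # Cyclical calendar features
            "day_of_year": day.timetuple().tm_yday,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "week": day.isocalendar()[1],
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            # Customer history features
            "Purchases_to_date": len(earlier),
            "purchases_in_last_90_days": sum(1 for d in earlier if day - d <= window),
        })
    return features


rows = build_features(orders)
for row in rows:
    print(row["customerId"], row["fromDate"],
          row["Purchases_to_date"], row["purchases_in_last_90_days"])
```

Under these assumptions every row only sees orders that were already known at its own date. The holiday, clustering, and PCA steps are left out of the sketch.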
I have a long (several thousand lines and growing) list of data in Sheets with a date column and additional data columns. Here's a simplified example of this list (=TAB1):
Date Number Product-ID
02.09.2021 123 1
02.09.2021 2 1
01.09.2021 15 1
01.09.2021 675 2
01.09.2021 45 2
01.09.2021 52 1
31.08.2021 2 1
31.08.2021 78 1
31.08.2021 44 1
31.08.2021 964 2
30.08.2021 1 2
29.08.2021 ...
...
Three remarks:
The date is formatted to European standard DD.MM.YYYY
There definitely is more than one line per day per product (could be a big number depending on the day)
(for the formulas below) In the European standard Sheets uses ; instead of , as in =IF(A;B;C)
In a different tab (=TAB2), I want to add up all the numbers for a unique date for Product-ID 1. So far I've done it like this:
Date Sum (if Product-ID=1)
=UNIQUE('TAB1'!A2:A) =ARRAYFORMULA(SUMIF('TAB1'!A:A&'TAB1'!C:C;A2:A&"1";'TAB1'!B:B))
02.09.2021 125
01.09.2021 67
31.08.2021 124
30.08.2021 1
29.08.2021 ...
...
This works fine so far. Here's what I want to do now:
For every month (here: August and September 2021) I need an additional line above the current date (in this case: above 02.09.2021) AND above each completed month, holding the sum of column B over the whole month. Here's how it should look:
Date Sum (if Product-ID=1)
September 2021 192
02.09.2021 125
01.09.2021 67
August 2021 125
31.08.2021 124
30.08.2021 1
29.08.2021 ...
Of course, the line for the next day (03.09.2021) should be added above 02.09.2021 and below the sum for the month when it's automatically added to TAB1 on the next day.
I tried to play around with something like =IF(DAY(UNIQUE('TAB1'!A2:A))=1;...;...) but didn't get far.
Does anyone have an idea how to do something like this?
You want to learn about QUERY().
Put this in cell A1 of an empty tab:
=QUERY('TAB1'!A2:C,"select A,SUM(B) where C = 1 group by A")
It makes a very big difference whether your product IDs are text or numbers. The above was written as if they are numbers, but you might have just been simplifying. If they are text, you would write it like this:
=QUERY('TAB1'!A2:C,"select A,SUM(B) where C = '1XYZ' group by A")
Note the single quotes.
If the IDs are a MIX of text and numbers, then you need to force them all to text values in the original data by highlighting the IDs column and choosing Format>Number>Plain Text from the menu bar.
UPDATE:
I understand the requirements better now for intermixing a cumulative month total into the output. This may work.
=ARRAYFORMULA(QUERY({QUERY({EOMONTH('TAB1'!A2:A,0),'TAB1'!B2:C},"select 'Total',Col1,SUM(Col2) where Col3 = 1 group by 'Total',Col1 label 'Total''',SUM(Col2)''",0);QUERY('TAB1'!A2:C,"select '',A,SUM(B) where C = 1 group by '',A label '''',SUM(B)''",0)},"order by Col2,Col1",0))
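To make the intended result easier to check, here is a minimal Python sketch of the same grouping: daily sums for one Product-ID, with a month total placed above the days of each month, newest first. It uses only the sample rows for 31.08 to 02.09 (the rest of the list is truncated in the question). The function name daily_and_monthly_sums and the numeric "MM.YYYY" month label are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# (date, number, product_id) rows from the question's sample, 31.08 to 02.09.
rows = [
    ("02.09.2021", 123, 1), ("02.09.2021", 2, 1),
    ("01.09.2021", 15, 1), ("01.09.2021", 675, 2),
    ("01.09.2021", 45, 2), ("01.09.2021", 52, 1),
    ("31.08.2021", 2, 1), ("31.08.2021", 78, 1),
    ("31.08.2021", 44, 1), ("31.08.2021", 964, 2),
]


def daily_and_monthly_sums(rows, product_id=1):
    """Return report lines: each month total followed by its days, newest first."""
    daily = defaultdict(int)
    for text, number, pid in rows:
        if pid == product_id:
            daily[datetime.strptime(text, "%d.%m.%Y").date()] += number

    monthly = defaultdict(int)
    for day, total in daily.items():
        monthly[(day.year, day.month)] += total

    report = []
    for year, month in sorted(monthly, reverse=True):
        # Month total first, then the days of that month in descending order.
        report.append((f"{month:02d}.{year}", monthly[(year, month)]))
        for day in sorted(daily, reverse=True):
            if (day.year, day.month) == (year, month):
                report.append((day.strftime("%d.%m.%Y"), daily[day]))
    return report


report = daily_and_monthly_sums(rows)
for label, total in report:
    print(label, total)
```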
I'm trying to build a sheet where I can see how much I have to pay each month.
Let's say I have the following table
Current installment (CI)  Total installments (TI)  Installment amount (IA)
1                         3                        $100
1                         1                        $200
2                         3                        $150
1                         3                        $75
2                         4                        $150
1                         1                        $50
So, for the first month, I sum the installment amount of every row where TI-CI >= 1. For the following month I do the same, but with TI-CI >= 2.
And the result would be something like this
1st month debt  $475 (the result of 100+150+75+150)
2nd month debt  $325 (the result of 100+75+150)
3rd month debt  $100
Is this possible at all?
try:
=IFNA(SUM(FILTER(C$2:C, (B$2:B-A$2:A)>=ROW(A1))))
and drag down
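The same rule (sum IA wherever TI-CI is at least the month number) can be written as a short Python sketch for checking the first two months by hand. The function name debt_in_month is hypothetical.

```python
# (current installment, total installments, installment amount) from the question.
debts = [
    (1, 3, 100),
    (1, 1, 200),
    (2, 3, 150),
    (1, 3, 75),
    (2, 4, 150),
    (1, 1, 50),
]


def debt_in_month(debts, month):
    """Sum the amounts of all rows that still have at least `month` installments left."""
    return sum(amount for current, total, amount in debts if total - current >= month)


schedule = [debt_in_month(debts, month) for month in (1, 2)]
print(schedule)  # [475, 325]
```

Here the month argument plays the role of ROW(A1) in the formula, which increases by one as the formula is dragged down.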
I have an easy-to-append monthly purchase log:
month prod count
-----------------
jan water 10
jan bread 20
feb bread 2
feb water 1
And I want to get a friendlier summary table:
prod jan feb
-------------
water 10 1
bread 20 2
Any idea how I can get this report with new months in the log appearing automatically as new columns?
I managed to get the month heads with a =ArrayFormula(TRANSPOSE(UNIQUE(FILTER(log!A2:A, log!A2:A<>"")))) and I am ok with entering the prod column by hand but I only managed to have a formula per column for count. And that means I need to drag the formula with each new month added to the log...
Any ideas? Thanks!
Try this formula:
=QUERY(A:C,"select B, sum(C) where A <> '' group by B pivot A")
See more info here:
https://developers.google.com/chart/interactive/docs/querylanguage
The pivoted columns are ordered alphabetically (feb before jan), so use month numbers (1, 2, 3) instead of names to get them in calendar order.
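For comparison, here is a minimal Python sketch of what the pivot does: one row per product, one column per month, with counts summed per cell. The function name pivot is hypothetical. Unlike the QUERY pivot, this sketch keeps the months in the order they first appear in the log.

```python
# (month, prod, count) rows from the question's purchase log.
log = [
    ("jan", "water", 10),
    ("jan", "bread", 20),
    ("feb", "bread", 2),
    ("feb", "water", 1),
]


def pivot(log):
    """Build a summary table with products as rows and months as columns."""
    months = []  # months in first-seen order
    table = {}   # prod -> {month: summed count}
    for month, prod, count in log:
        if month not in months:
            months.append(month)
        cells = table.setdefault(prod, {})
        cells[month] = cells.get(month, 0) + count

    header = ["prod"] + months
    body = [[prod] + [cells.get(m, 0) for m in months] for prod, cells in table.items()]
    return [header] + body


summary = pivot(log)
for line in summary:
    print(line)
```

Appending a row for a new month to the log adds a new column to the summary without any further changes.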