I have several columns I would like to plot as percentages. Since I have 10 individual columns for each category, I would like to calculate the percent total. How can I do this? For example, my desired outcome for percentage of staff Dec 1 = count(Staff dec 1)/ total (count of staff dec1 + count staff dec2.....staff count dec10).
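The calculation described above (each column's count divided by the sum of all ten counts) can be sketched as follows. This is only an illustration in Python: the function name `percent_of_total` and the staff counts are made up, since the question does not include real data or name the tool in use.

```python
def percent_of_total(counts):
    """Return each count as a percentage of the sum of all counts."""
    total = sum(counts)
    if total == 0:
        # Avoid dividing by zero when every column is empty.
        return [0.0 for _ in counts]
    return [100.0 * c / total for c in counts]


# Hypothetical staff counts for Dec 1 .. Dec 10 (not taken from the question).
staff_counts = [12, 8, 10, 15, 5, 10, 10, 10, 10, 10]
staff_pct = percent_of_total(staff_counts)
print(staff_pct[0])  # share of Dec 1 in the ten-day total
```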
I'm modeling out our expansion plans and want to avoid having a row for each new location. Is there a way in Google Sheets to do this via formula?
# of Locations          1      1      2      2      3      3      4
Sales Location 1     1725   1984   2281   2624   3017   3470   3990
Sales Location 2                   1725   1984   2281   2624   3017
Sales Location 3                                 1725   1984   2281
Sales Location 4                                               1725
Total Sales          1725   1984   4006   4607   7023   8077  11013
I looked at HLOOKUP and OFFSET, attempting to reuse the first location's sales for each new location, just offset by the period in which that location opens.
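The logic behind the table can be sketched like this, in Python rather than as a Sheets formula, only to make the idea explicit. It assumes every new location follows the same sales curve as Location 1, shifted to start in the period where the "# of Locations" row first reaches that location's number. The function name `total_sales` is hypothetical.

```python
# Sales curve of the first location and the planned number of locations per period.
base_sales = [1725, 1984, 2281, 2624, 3017, 3470, 3990]
num_locations = [1, 1, 2, 2, 3, 3, 4]


def total_sales(base, counts):
    """Sum the base curve once per open location, shifted by its opening period."""
    # Location k (1-based) opens in the first period where the count reaches k.
    openings = [
        next(t for t, c in enumerate(counts) if c >= k)
        for k in range(1, max(counts) + 1)
    ]
    totals = []
    for t in range(len(counts)):
        # A location opened at period o is at position t - o on the base curve.
        totals.append(sum(base[t - o] for o in openings if o <= t))
    return totals


totals = total_sales(base_sales, num_locations)
print(totals)
```

For the fourth and sixth periods this gives 4608 and 8078, one more than the table shows, presumably because the displayed figures are rounded.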
I have a problem. When an order comes in, I want to predict in how many days the customer will place the next order.
I have already created my target variable, next_purchase_in_days. It specifies in how many days the customer will place an order again, and this is what I would like to predict.
Since I have too few features, I want to do feature engineering. I would like to add how many orders the customer has placed in the last 90 days. So far, I have counted back from today's date how many orders the customer has placed in the last 90 days.
Or is it better to compute, per row, how many orders the customer had placed in the 90 days before that row's date? Please see the example below.
So does it make more sense to calculate this from today's date and include it as a feature, or should it be recalculated for each row?
customerId fromDate next_purchase_in_days
0 1 2021-02-22 24
1 1 2021-03-18 4
2 1 2021-03-22 109
3 1 2021-02-10 12
4 1 2021-09-07 133
8 3 2022-05-17 61
10 3 2021-02-22 133
11 3 2021-02-22 133
Example
# What I have
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 0
1 1 2021-03-18 4 0
2 1 2021-03-22 109 0
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 1
11 3 2021-02-22 133 1
# Or does this make more sense?
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 1
1 1 2021-03-18 4 2
2 1 2021-03-22 109 3
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 0
11 3 2021-02-22 133 0
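For reference, the per-row variant can be sketched as below: a minimal standard-library version that counts, for each order, the same customer's earlier orders within the preceding 90 days. It uses only the customer 1 rows from the example. The function name `purchases_in_last_90_days` and the choice to exclude same-day orders are assumptions. A count built this way uses only information that was available at the time of each order.

```python
from datetime import date, timedelta

# (customerId, fromDate) for customer 1, in the same row order as the example.
orders = [
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
    (1, date(2021, 3, 22)),
    (1, date(2021, 2, 10)),
    (1, date(2021, 9, 7)),
]


def purchases_in_last_90_days(rows, window_days=90):
    """For each row, count the customer's earlier orders inside the window."""
    counts = []
    for customer, order_date in rows:
        window_start = order_date - timedelta(days=window_days)
        # Strictly earlier than the row's own date, so the order itself
        # (and any other order on the same day) is not counted.
        n = sum(
            1
            for other_customer, other_date in rows
            if other_customer == customer and window_start <= other_date < order_date
        )
        counts.append(n)
    return counts


per_row_counts = purchases_in_last_90_days(orders)
print(per_row_counts)
```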
You can address this in a number of ways, but something interesting to consider is the interaction between Date & Customer ID.
Dates have meaning to humans beyond just timekeeping. They carry emotional and cultural importance: holidays, weekends, seasons, anniversaries, etc. So there is a conditional relationship between the probability of a purchase and events: P(x|E)
Customer Ids theoretically represent a single person, or at the very least a single business with a limited number of people responsible for purchasing.
Certain people/corporations are just more likely to spend.
So here are a number of ways to address this:
Find a list of holidays relevant to the users. For instance, if they are US-based, find a list of US-recognized holidays. Then create a feature based on each date: Date_Till_Next_Holiday (or DTNH for short).
Dates also have cyclical aspects that can encode probability: day of the year (1-365), day of the week (1-7), week number (1-52), month (1-12), quarter (1-4). I would create additional columns encoding each of these.
To address the customer interaction, keep a running total of past purchases. You could call it Purchases_to_date; it would be an integer (0...n), where n is the number of previous purchases. I made a notebook to show you how to do running totals.
Humans tend to share purchasing patterns with other humans. You could run a k-means clustering algorithm that splits customers into 3-4 groups based on all the previous info, and then use their cluster number as a feature. Sklearn-Kmeans
So based on all that, you could engineer 8 different columns. I would then run Principal Component Analysis (PCA) to reduce that to 3-4 features.
You can use Sklearn-PCA to do PCA.
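The date-based columns and the running total described above could be built roughly like this. It is a sketch using only the standard library; the function names and the single holiday date are placeholders, not part of the original answer or its notebook. Clustering and PCA are left out here.

```python
from datetime import date


def calendar_features(d):
    """Cyclical calendar encodings for one date."""
    return {
        "day_of_year": d.timetuple().tm_yday,   # 1-365 (366 in leap years)
        "day_of_week": d.isoweekday(),          # 1 = Monday ... 7 = Sunday
        "week_number": d.isocalendar()[1],      # ISO week, 1-52 (sometimes 53)
        "month": d.month,                       # 1-12
        "quarter": (d.month - 1) // 3 + 1,      # 1-4
    }


def days_till_next_holiday(d, holidays):
    """Days from d to the nearest holiday on or after d, or None if there is none."""
    upcoming = [h for h in holidays if h >= d]
    return (min(upcoming) - d).days if upcoming else None


def purchases_to_date(rows):
    """Running count of a customer's earlier orders, aligned with the input rows."""
    by_date = sorted(range(len(rows)), key=lambda i: rows[i][1])
    seen = {}
    result = [0] * len(rows)
    for i in by_date:
        customer = rows[i][0]
        result[i] = seen.get(customer, 0)
        seen[customer] = result[i] + 1
    return result


# Customer 1 rows from the question; the holiday list is a made-up placeholder.
rows = [
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
    (1, date(2021, 3, 22)),
    (1, date(2021, 2, 10)),
    (1, date(2021, 9, 7)),
]
placeholder_holidays = [date(2021, 4, 4)]

features = calendar_features(rows[0][1])
dtnh = days_till_next_holiday(rows[0][1], placeholder_holidays)
running = purchases_to_date(rows)
print(features, dtnh, running)
```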
I have a long (several thousand rows and growing) list of data in Sheets, which has a date column and additional columns with data. Here's a simplified example of this list (=TAB1):
Date Number Product-ID
02.09.2021 123 1
02.09.2021 2 1
01.09.2021 15 1
01.09.2021 675 2
01.09.2021 45 2
01.09.2021 52 1
31.08.2021 2 1
31.08.2021 78 1
31.08.2021 44 1
31.08.2021 964 2
30.08.2021 1 2
29.08.2021 ...
...
Three remarks:
The date is formatted to European standard DD.MM.YYYY
There definitely is more than one line per day per product (could be a big number depending on the day)
(for the formulas below) In the European standard Sheets uses ; instead of , as in =IF(A;B;C)
In a different tab (=TAB2), I want to add up all the numbers for a unique date for Product-ID 1. So far I've done it like this:
Date Sum (if Product-ID=1)
=UNIQUE('TAB1'!A2:A) =ARRAYFORMULA(SUMIF('TAB1'!A:A&'TAB1'!C:C;A2:A&"1";'TAB1'!B:B))
02.09.2021 125
01.09.2021 67
31.08.2021 124
30.08.2021 1
29.08.2021 ...
...
This works fine so far. Here's what I want to do now:
For every month (here: August and September 2021) I need an additional line above the current date (in this case: above 02.09.2021) AND above each completed month, to sum column B over the whole month. Here's how it should look:
Date Sum (if Product-ID=1)
September 2021 192
02.09.2021 125
01.09.2021 67
August 2021 125
31.08.2021 124
30.08.2021 1
29.08.2021 ...
Of course, the line for the next day (03.09.2021) should be added above 02.09.2021 and below the sum for the month when it's automatically added to TAB1 on the next day.
I tried to play around with something like =IF(DAY(UNIQUE('TAB1'!A2:A))=1;...;...) but didn't get far.
Does anyone have an idea how to achieve something like this?
You want to learn about QUERY().
Put this in cell A1 of an empty tab:
=QUERY('TAB1'!A2:C,"select A,SUM(B) where C = 1 group by A")
It makes a very big difference whether your product IDs are text or numbers. The above was written as if they are numbers, but you might have just been simplifying. If they are text, you would write it like this:
=QUERY('TAB1'!A2:C,"select A,SUM(B) where C = '1XYZ' group by A")
Note the single quotes.
If the IDs are a mix of text and numbers, then you need to force them all to text values in the original data by highlighting the IDs column and choosing Format>Number>Plain Text from the menu bar.
UPDATE:
I understand the requirements better now for intermixing a cumulative month total into the output. This may work.
=ARRAYFORMULA(QUERY({QUERY({EOMONTH('TAB1'!A2:A,0),'TAB1'!B2:C},"select 'Total',Col1,SUM(Col2) where Col3 = 1 group by 'Total',Col1 label 'Total' '',SUM(Col2) ''",0);QUERY('TAB1'!A2:C,"select '',A,SUM(B) where C = 1 group by '',A label '' '',SUM(B) ''",0)},"order by Col2,Col1",0))
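To make the intended result easier to check outside Sheets, here is a rough Python sketch of the same idea: sum per day for one product, sum per month, then place each month's total above that month's days. It is not a translation of the formula. The function names are hypothetical, and the data is limited to the sample rows from 31.08.2021 onward.

```python
from collections import defaultdict
from datetime import datetime

# (date as DD.MM.YYYY, number, product ID): sample rows from TAB1.
rows = [
    ("02.09.2021", 123, 1),
    ("02.09.2021", 2, 1),
    ("01.09.2021", 15, 1),
    ("01.09.2021", 675, 2),
    ("01.09.2021", 45, 2),
    ("01.09.2021", 52, 1),
    ("31.08.2021", 2, 1),
    ("31.08.2021", 78, 1),
    ("31.08.2021", 44, 1),
    ("31.08.2021", 964, 2),
]


def daily_and_monthly_sums(rows, product_id):
    """Sum the numbers per day and per (year, month) for one product."""
    daily = defaultdict(int)
    monthly = defaultdict(int)
    for text, number, pid in rows:
        if pid != product_id:
            continue
        d = datetime.strptime(text, "%d.%m.%Y").date()
        daily[d] += number
        monthly[(d.year, d.month)] += number
    return dict(daily), dict(monthly)


def interleave(daily, monthly):
    """Newest first, with each month's total placed above that month's days."""
    out = []
    for year, month in sorted(monthly, reverse=True):
        out.append((f"{year}-{month:02d} total", monthly[(year, month)]))
        days = sorted(
            (d for d in daily if (d.year, d.month) == (year, month)),
            reverse=True,
        )
        for d in days:
            out.append((d.strftime("%d.%m.%Y"), daily[d]))
    return out


daily, monthly = daily_and_monthly_sums(rows, product_id=1)
report = interleave(daily, monthly)
for label, value in report:
    print(label, value)
```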
I'm trying to sum values across multiple Google Sheets spreadsheets (workbooks), grouped by date. For example, I want to sum all the Delta values for March 2, 2020 across multiple spreadsheets, and each spreadsheet will have 0 or more values for that date.
Here's an example with 2 spreadsheets:
Spreadsheet 1:
Date Start Stop Delta
Mon 02Mar20 16:51 16:56 0:05
Mon 02Mar20 16:56 17:00 0:03
Tue 03Mar20 18:45 18:49 0:03
Tue 03Mar20 19:04 19:06 0:01
Spreadsheet 2:
Date Start Stop Delta
Mon 02Mar20 8:38 8:49 0:11
Tue 03Mar20 4:47 4:50 0:03
Tue 03Mar20 17:42 17:55 0:13
Tue 03Mar20 17:58 18:45 0:47
Tue 03Mar20 18:53 19:03 0:10
I want to have a dynamic sum of the Delta columns across spreadsheets by each day in a separate spreadsheet. So here's what I would like to autogenerate. Specifically, the sum of the Delta values for Spreadsheet 1 and Spreadsheet 2 for each day (0:08, 0:11, 0:04, 1:10):
Date Total Spreadsheet 1 Spreadsheet 2
Mon 02Mar20 0:19 0:08 0:11
Tue 03Mar20 1:14 0:04 1:10
I tried using IMPORTRANGE but I'm not sure how to make the sums dynamic for each day. I don't know ahead of time how many entries I'll have for each date in Spreadsheet 1 and 2 so I want to have a way to auto determine how many rows to sum up each day for Spreadsheet 1 and 2. I'm guessing I would need to use QUERY or FILTER to filter all the imported values from IMPORTRANGE but I'm not sure how to do that.
I made a simple dataset to sum up. I have a spreadsheet like this:
As you can see, the total of the values for 2nd March would be 2 and for 3rd March would be 20.
In a different spreadsheet, I have my dashboard:
The formula I've used in B2 is:
=SUMPRODUCT(--(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1rnap9LJQJaqriiJLSsF7EWQLwBUiNviktxDAMFfW0ZE";"Hoja 1!A1:A4")=$A2);IMPORTRANGE("https://docs.google.com/spreadsheets/d/1rnap9LJQJaqriiJLSsF7EWQLwBUiNviktxDAMFfW0ZE";"Hoja 1!B1:B4"))
This is how it works:
--(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1rnap9LJQJaqriiJLSsF7EWQLwBUiNviktxDAMFfW0ZE";"Hoja 1!A1:A4")=$A2) compares the values of column A in Workbook 1 with the date in column A of my main dashboard. Because we've used a double unary operator, this returns an array of 1s and 0s depending on whether there is a match or not (in this case, an array like {1;1;0;0}).
IMPORTRANGE("https://docs.google.com/spreadsheets/d/1rnap9LJQJaqriiJLSsF7EWQLwBUiNviktxDAMFfW0ZE";"Hoja 1!B1:B4") will return as array the values of column B in Workbook 1, in this case it will return {1;1;10;10}
SUMPRODUCT multiplies both arrays element by element and sums the results, in this case {1;1;0;0} * {1;1;10;10} = {1;1;0;0}, and the sum of this final array is 2.
Same logic applied to second date, we would obtain {0;0;1;1} * {1;1;10;10} = {0;0;10;10} -> 20
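The arithmetic in the steps above can be reproduced in a few lines of Python, purely to illustrate what SUMPRODUCT does with the two arrays. The date strings and the function name `sumproduct_for` are placeholders. The values are the ones from the explanation.

```python
# Column A (dates) and column B (values) of the example workbook.
dates = ["2 March", "2 March", "3 March", "3 March"]
values = [1, 1, 10, 10]


def sumproduct_for(target, dates, values):
    """Build the 1/0 match array, multiply it by the values, and sum."""
    mask = [1 if d == target else 0 for d in dates]   # like --(range = $A2)
    return sum(m * v for m, v in zip(mask, values))   # like SUMPRODUCT(mask; values)


total_2_march = sumproduct_for("2 March", dates, values)
total_3_march = sumproduct_for("3 March", dates, values)
print(total_2_march, total_3_march)
```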
Just add each workbook in its own column with the same formula, and then do a normal sum in your main dashboard to get the grand total of all values in all workbooks for a specific date:
Hope this helps.
NOTICE: Of course, this method will work only if your dates are dates (not strings/texts) and the times in Delta are date/times too (not strings/texts)
I have a Rails 3 application and the following problem. I need to calculate a price based on the following:
Term ranges from 1 to 365 days (1 year).
Tariffs are stored in a table and come in two kinds: for one day and for half a month (15 days).
Example 1:
Prices: 1 day price = 0.5 and 15 days = 6
Term : 45 days.
Price: To get the price we divide the number of days by 15 (45/15 = 3) and multiply the result by the tariff for 15 days (3*6 = 18). Final price 18.
Example 2:
Prices: 1 day price = 0.5 and 15 days = 6
Term : 79 days.
Price: To get the price we find the largest whole number of half-month periods, in this case 75 days, and again divide that number by 15 (75/15 = 5) and multiply the result by the tariff for 15 days (5*6 = 30). However, there are 4 more days to account for, so we multiply them by the price for a day (4*0.5 = 2). The sum of the two results forms the final price (30+2 = 32).
The period is submitted through a form and can be anything from 1 day to 365 days; prices per day and per 15 days are stored in a database. My question is how to do the calculation in Ruby/Rails so that the code always calculates the half-month periods and the remainder, if any.
Any help is appreciated.
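Use integer division together with the modulo operator:
79 / 15 # dividing two integers performs a Euclidean (integer) division: the number of full 15-day periods
=> 5
79 % 15 # gives you the remainder of that division: the days left over
=> 4
79.divmod(15) # returns both values at once
=> [5, 4]
5 * 6 + 4 * 0.5 # full periods at the 15-day tariff plus leftover days at the daily tariff
=> 32.0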