BigQuery - Optimal way of filtering by timestamp events - join

Suppose I have a table containing all days and some data for multiple factory location productions each day. A second table contains a list of days where some incident occurred, which invalidates the results for the next week (i.e., an incident on the 1st of January invalidates results until the 8th of January)
TABLE A: factory production
Factory date colA colB colC colD colF colE ... colZ
Brazil '2022-01-01'
Brazil '2022-01-02'
.
.
.
Sweden '2022-12-31'
TABLE B: occurrence
Location date
Brazil '2022-02-15'
Brazil '2022-02-17'
Italy '2022-07-28'
Sweden '2022-09-09'
Sweden '2022-09-14'
what is the cleanest way of removing all the 'invalid' days from A (based on location+incident)?
An obvious solution is to use JOIN TABLE B using location and dates with the week interval + GROUP BY to deduplicate (potential consecutive days in TABLE B), but it would need to treat the long list of columns. Is there a more elegant/efficient solution?

Related

How to get the average of the last cells in a column based on the last month?

I have a sheet that contains an amount in column D and the week number(A), month number (B), and year(C). How do I get the average of the last month's amount (D)? So in the example below, I am looking for the average of D251 to D254 based on last month (Column B) 7. Then next month, with it's September, The same cell would average all the "Amounts" in column D for the month of last month, Aug 8th. I hope that all makes sense.
How do I get the average of the last month's amount (D)?
You can use AverageIF, or possibly AverageIFs (plural) to account for prior year. I updated my example. You also might consider a different structure/column for time that combines Year_Month. See example in sheet.
Formula used:
=IFERROR(AVERAGEIFS(D:D,B:B,if(B2=1,12,B2-1),C:C,if(B2=1,C2-1,C2)),"First Period")
See this example.
Please try the following
=AVERAGE(INDEX(FILTER(A2:D,B2:B=MAX(B2:B)),,4))

Commission calculation based on sliding month (googlesheets)

I have to pay a commission to Agents (affiliates) based on the following conditions:
the commission starts on a monthly basis following a USER (linked to the Agent) first deposit/purchase on a website
agents have a decreasing commission, ex: 1 month following first deposit of their USER = 30% of sales, 2d month period following 1st deposit of USER: 25% of sales, etc
Commission are paid on a month basis calculation (ex: from 01/07/2020 till 31/07/2020)
If a USER makes a first purchase on June 22d and if sales commission for 1st period is 30%, then agent is eligible to a 30% commission on sales from june 22d till July 22d, then 25% for sales from 23rd july till 23rd august, etc
I have designed a googlesheets (see below) that serves the purpose (using 12 columns to get the correct commission% for a specific user on a specific day!), but I am trying to find a more straight forward formula to get the applicable com. % based on the commission sliding table and the first deposit date of a specific user.
Can anyone help?
The google sheets showing my calc is here:
https://docs.google.com/spreadsheets/d/1I1gzZ670hJH8HwCGizzbvlQkg0dgAvgOSQTfOUL0VgU/edit?usp=sharing
This might help you. (Updated to correct the row number of where the formula should go.)
If you insert a new column in your sheet, to the right of Column W, the Current Commision Month, and paste the following formula in the cell where the Current Commision Month header text should appear (Row 9 in your sample) of that column, it replicates the results in your Current Commision Month.
But it does not require columns I through V. You can test that by deleting columns I through V - you can use Undo and Redo to go back and forth, if necessary. Depending on how you use your "End" column - the logic wasn't clear to me - the info for that can also be gained in this one column.
={"Current
Commision
Month";"";ArrayFormula(
if(
($H11:H<>"") * ($A11:A>=date(year(H11:H),month(H11:H),day(H11:H))),
ifs( $A11:A< date(year(H11:H),month(H11:H)+1,day(H11:H)),1,
$A11:A< date(year(H11:H),month(H11:H)+2,day(H11:H)),2,
$A11:A< date(year(H11:H),month(H11:H)+3,day(H11:H)),3,
$A11:A< date(year(H11:H),month(H11:H)+4,day(H11:H)),4,
$A11:A< date(year(H11:H),month(H11:H)+5,day(H11:H)),5,
$A11:A< date(year(H11:H),month(H11:H)+6,day(H11:H)),6,
$A11:A>=date(year(H11:H),month(H11:H)+7,day(H11:H)),9999),
""))}
The first IF test is to check that the FirstDeposit date is not blank, and that the sale date is greater than or equal to the FirstDeposit date.
The IFS tests go through and check whether the sale date is less than one of the months, and stops at the first value (commission month) that is greater than the sale date. If never, it places a vlaue of 9999.
Note that the "9999" values are just to indicate the sale date is greater than the "End" date, and can be changed to blanks or whatever you want.
[![enter image description here][1]][1]
I've added a sample tab with the final result. Let me know if this helps. There may be several other enhancements possible for your sheet, in particular the use of ARRAYFORMULAS to fill values down many of your columns.
I haven't spent time on the actual commision calculations, in the final columns, but if you feel that still needs improvements, I can try to simplify there as well.
[1]: https://i.stack.imgur.com/TfFZ5.png

'weeknum' function

Here is what I'm trying to do:
I have three Spreadsheets.
(1) Days (holds increase in user-nr per day)
(2) Weeks (is supposed to sum up user increases so they are shown each week)
(3) Months (is supposed to sum up user increases so they are shown each month)
To give an example: if we have 10 users on Monday, 20 more on Tuesday, 15 more on Wednesday (that's when the next calendar week starts), then I want in the sheet "weeks" to see e.g. 45 users in calendar week 27 or so.
So what I try is this: =SUMIF(WEEKNUM(Days!A2:A977); A2; Days!B2:B977)
A holds the date of the day
B holds the number of users.
What happens is it does not sum up the number of the users shown in B, but only gives the number in the first cell of the weeknumber shown in A2.
What is my mistake?
The formula seems to be correct, but two things are needed.
You should embrace it in ArrayFormula()
You should use one more weeknum()
=ArrayFormula(sumif(weeknum(Days!A2:A977);weeknum(Days!A2);Days!B2:B977))

Rails: How to group records that have the same Date (month day year) for a Datetime field?

I have a Purchase model and I want to group all of the records by their ordered_datetime field. However, I don't care about the time, I just want to group by the date. So if there are 2 orders ordered on:
5/12/2014 12:00PM
5/12/2014 3:00PM
They should be grouped together even though they happened at different times during the day.
Is there a way to do this? Purchase.uniq.pluck(:ordered_datetime) separates the 2 records into 2 groups since their times are different.
You can use the DATE function on the timestamp column:
Purchase.group('DATE(ordered_datetime)').count
Which returns each date with a purchase count.
You can also sort the dates by adding an order clause:
Purchase.group('DATE(ordered_datetime)').order('date_ordered_datetime').count

Store the day of the week and time?

I have a two-part question about storing days of the week and time in a database. I'm using Rails 4.0, Ruby 2.0.0, and Postgres.
I have certain events, and those events have a schedule. For the event "Skydiving", for example, I might have Tuesday and Wednesday and 3 pm.
Is there a way for me to store the record for Tuesday and Wednesday in one row or should I have two records?
What is the best way to store the day and time? Is there a way to store day of week and time (not datetime) or should these be separate columns? If they should be separate, how would I store the day of the week? I was thinking of storing them as integer values, 0 for Sunday, 1 for Monday, since that's how the wday method for the Time class does it.
Any suggestions would be super helpful.
Is there a way for me to store the the record for Tuesday and
Wednesday in one row or do should I have two records?
There are several ways to store multiple time ranges in a single row. #bma already provided a couple of them. That might be useful to save disk space with very simple time patterns. The clean, flexible and "normalized" approach is to store one row per time range.
What is the best way to store the day and time?
Use a timestamp (or timestamptz if multiple time zones may be involved). Pick an arbitrary "staging" week and just ignore the date part while using the day and time aspect of the timestamp. Simplest and fastest in my experience, and all date and time related sanity-checks are built-in automatically. I use a range starting with 1996-01-01 00:00 for several similar applications for two reasons:
The first 7 days of the week coincide with the day of the month (for sun = 7).
It's the most recent leap year (providing Feb. 29 for yearly patterns) at the same time.
Range type
Since you are actually dealing with time ranges (not just "day and time") I suggest to use the built-in range type tsrange (or tstzrange). A major advantage: you can use the arsenal of built-in Range Functions and Operators. Requires Postgres 9.2 or later.
For instance, you can have an exclusion constraint building on that (implemented internally by way of a fully functional GiST index that may provide additional benefit), to rule out overlapping time ranges. Consider this related answer for details:
Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL
For this particular exclusion constraint (no overlapping ranges per event), you need to include the integer column event_id in the constraint, so you need to install the additional module btree_gist. Install once per database with:
CREATE EXTENSION btree_gist; -- once per db
Or you can have one simple CHECK constraint to restrict the allowed time period using the "range is contained by" operator <#.
Could look like this:
CREATE TABLE event (event_id serial PRIMARY KEY, ...);
CREATE TABLE schedule (
event_id integer NOT NULL REFERENCES event(event_id)
ON DELETE CASCADE ON UPDATE CASCADE
, t_range tsrange
, PRIMARY KEY (event_id, t_range)
, CHECK (t_range <# '[1996-01-01 00:00, 1996-01-09 00:00)') -- restrict period
, EXCLUDE USING gist (event_id WITH =, t_range WITH &&) -- disallow overlap
);
For a weekly schedule use the first seven days, Mon-Sun, or whatever suits you. Monthly or yearly schedules in a similar fashion.
How to extract day of week, time, etc?
#CDub provided a module to deal with it on the Ruby end. I can't comment on that, but you can do everything in Postgres as well, with impeccable performance.
SELECT ts::time AS t_time -- get the time (practically no cost)
SELECT EXTRACT(DOW FROM ts) AS dow -- get day of week (very cheap)
Or in similar fashion for range types:
SELECT EXTRACT(DOW FROM lower(t_range)) AS dow_from -- day of week lower bound
, EXTRACT(DOW FROM upper(t_range)) AS dow_to -- same for upper
, lower(t_range)::time AS time_from -- start time
, upper(t_range)::time AS time_to -- end time
FROM schedule;
db<>fiddle here
Old sqliddle
ISODOW instead of DOW for EXTRACT() returns 7 instead of 0 for sundays. There is a long list of what you can extract.
This related answer demonstrates how to use range type operator to compute a total duration for time ranges (last chapter):
Calculate working hours between 2 dates in PostgreSQL
Check out the ice_cube gem (link).
It can create a schedule object for you which you can persist to your database. You need not create two separate records. For the second part, you can create schedule based on any rule and you need not worry on how that will be saved in the database. You can use the methods provided by the gem to get whatever information you want from the persisted schedule object.
Depending how complex your scheduling needs are, you might want to have a look at RFC 5545, the iCalendar scheduling data format, for ideas on how to store the data.
If you needs are pretty simple, than that is probably overkill. Postgresql has many functions to convert date and time to whatever format you need.
For a simple way to store relative dates and times, you could store the day of week as an integer as you suggested, and the time as a TIME datatype. If you can have multiple days of the week that are valid, you might want to use an ARRAY.
Eg.
ARRAY[2,3]::INTEGER[] = Tues, Wed as Day of Week
'15:00:00'::TIME = 3pm
[EDIT: Add some simple examples]
/* Custom the time and timetz range types */
CREATE TYPE timerange AS RANGE (subtype = time);
--drop table if exists schedule;
create table schedule (
event_id integer not null, /* should be an FK to "events" table */
day_of_week integer[],
time_of_day time,
time_range timerange,
recurring text CHECK (recurring IN ('DAILY','WEEKLY','MONTHLY','YEARLY'))
);
insert into schedule (event_id, day_of_week, time_of_day, time_range, recurring)
values
(1, ARRAY[1,2,3,4,5]::INTEGER[], '15:00:00'::TIME, NULL, 'WEEKLY'),
(2, ARRAY[6,0]::INTEGER[], NULL, '(08:00:00,17:00:00]'::timerange, 'WEEKLY');
select * from schedule;
event_id | day_of_week | time_of_day | time_range | recurring
----------+-------------+-------------+---------------------+-----------
1 | {1,2,3,4,5} | 15:00:00 | | WEEKLY
2 | {6,0} | | (08:00:00,17:00:00] | WEEKLY
The first entry could be read as: the event is valid at 3pm Mon - Fri, with this schedule occurring every week.
The second entry could be read as: the event is valid Saturday and Sunday between 8am and 5pm, occurring every week.
The custom range type "timerange" is used to denote the lower and upper boundaries of your time range.
The '(' means "inclusive", and the trailing ']' means "exclusive", or in other words "greater than or equal to 8am and less than 5pm".
Why not just store the datestamp then use the built in functionality for Date to get the day of the week?
2.0.0p247 :139 > Date.today
=> Sun, 10 Nov 2013
2.0.0p247 :140 > Date.today.strftime("%A")
=> "Sunday"
strftime sounds like it can do everything for you. Here are the specific docs for it.
Specifically for what you're talking about, it sounds like you'd need an Event table that has_many :schedules, where a Schedule would have a start_date timestamp...

Resources