I would like to get a result such as the following:
name from_value to_value at
tag A 10 15 2019-02-11 16:00
tag B 1 2 2019-02-11 16:00
tag A 15 20 2019-02-11 16:05
tag B 2 3 2019-02-11 16:05
tag A 20 25 2019-02-11 16:10
tag B 3 4 2019-02-11 16:10
basically a column "from_value" (previous value current point) and a column "to_value" (current value current point).
To select only the current point value I do:
SELECT value FROM data WHERE "name"='tag A'
What if I wanted to select also the previous value?
SELECT prev(value) AS "from_value", value AS "to_value" FROM data WHERE "name"='tag A'
Can I do something like the above or I need to always save the previous value every time for every new point?
With group by time you can use last() and difference() functions to get value changes per time interval.
SELECT LAST(value)-DIFFERENCE(LAST(value)) as FromValue, LAST(value) as ToValue
FROM demo where time > 1549983975150000000
GROUP BY time(10ms),tagA FILL(none)
name: demo
tags: tagA=1
time FromValue ToValue
---- --------- -------
1549984410470000000 10
1549984421820000000 10 15
1549984431180000000 15 17
1549984436350000000 17 10
1549984753810000000 10 10
SELECT * FROM demo
name: demo
time tagA value
---- ---- -----
1549984410475859753 1 10
1549984421827992234 1 15
1549984431180379398 1 17
1549984436356232522 1 10
1549984753817094214 1 10
Related
I have a sheet which looks like :
Employee
Sat 10 /01
Sun 10 /02
Mon 10 /03
Tue 10 /04
Wed 10 /05
a
1
1
b
1
1
c
1
1
d
1
1
1
1
e
1
1
1
I have transposed this sheet:
Employee
a
b
c
d
e
Sat 10 /01
1
1
1
Sun 10 /02
1
1
1
Mon 10 /03
1
1
1
Tue 10 /04
1
1
1
Wed 10 /05
1
I just want that each of entries are connected with each other which simply means, using this formula on each cell i.e B2 in sheet 2 should be connected using a formula : =Sheet2!B2 and B3 should be =Sheet2!C2, but dragging down the formula it gives =Sheet2!B3, =Sheet2!B4 and so on, but I want this formula to work horizontally.
I'd appreciate your help because this data is quite large (approximately 90-100 employees).
Use this if you want to transpose sheet1 in sheet2
=TRANSPOSE(Sheet1!A1:500)
Is there anyway I can create a series like =Sheet4!B2, =Sheet4!C2, =Sheet4!D2
try dragging down:
=INDIRECT(ADDRESS(2, ROW(A1), 4,, "Sheet4"))
I have a problem. I want to predict when the customer will place another order in how many days if an order comes in.
I have already created my target variable next_purchase_in_days. This specifies in how many days the customer will place an order again. And I would like to predict this.
Since I have too few features, I want to do feature engineering. I would like to specify how many orders the customer has placed in the last 90 days. For example, I have calculated back from today's date how many orders the customer has placed in the last 90 days.
Is it better to say per row how many orders the customer has placed? Please see below for the example.
So does it make more sense to calculate this from today's date and include it as a feature or should it be recalculated for each row?
customerId fromDate next_purchase_in_days
0 1 2021-02-22 24
1 1 2021-03-18 4
2 1 2021-03-22 109
3 1 2021-02-10 12
4 1 2021-09-07 133
8 3 2022-05-17 61
10 3 2021-02-22 133
11 3 2021-02-22 133
Example
# What I have
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 0
1 1 2021-03-18 4 0
2 1 2021-03-22 109 0
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 1
11 3 2021-02-22 133 1
# Or does this make more sense?
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 1
1 1 2021-03-18 4 2
2 1 2021-03-22 109 3
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 0
11 3 2021-02-22 133 0
You can address this in a number of ways, but something interesting to consider is the interaction between Date & Customer ID.
Dates have meaning to humans beyond just time keeping. They are associated with emotional, and culturally importance. Holidays, weekends, seasons, anniversaries etc. So there is a conditional relationship between the probability of a purchase and Events: P(x|E)
Customer Ids theoretically represent a single person, or at the very least a single business with a limited number of people responsible for purchasing.
Certain people/corporations are just more likely to spend.
So here are a number of ways to address this:
Find a list of holidays relevant to the users. For instance if they are US based find a list of US recognized holidays. Then create a
feature based on each date: Date_Till_Next_Holiday or (DTNH for
short).
Dates also have cyclical aspects that can encode probability. Day of the > year (1-365), Days of the week (1-7), week numbers (1-52),
Months (1-12), Quarters (1-4). I would create additional columns
encoding each of these.
To address the customer interaction, have a running total of past purchases. You could call it Purchases_to_date, and would be an
integer (0...n) where n is the number of previous purchases.
I made a notebook to show you how to do running totals.
Humans tend to share purchasing patterns with other humans. You could run a k-means cluster algorithm that splits customers into 3-4
groups based on all the previous info, and then use their
cluster-number as a feature. Sklearn-Kmeans
So based on all that you could engineer 8 different columns. I would then run Principle Component Analysis (PCA) to reduce that to 3-4 features.
You can use Sklearn-PCA to do PCA.
I'm trying to retrieve the sum of same values that has the same timestamp.
My query is
SELECT value FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z'
My returned values are
time value
---- -----
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-07T00:00:00Z 1
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-08T00:00:00Z 2
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
2021-03-09T00:00:00Z 3
How can I change my query the result will be
time sum
---- -----
2021-03-07T00:00:00Z 3
2021-03-08T00:00:00Z 6
2021-03-09T00:00:00Z 9
SELECT SUM(value) FROM dashboards WHERE time >= '2021-03-07T00:00:00Z' AND time <= '2021-03-09T00:00:00Z' GROUP BY time(1h) FILL(none)
GROUP BY time(1h) - group results by time column with interval of 1h
FILL(none) - ignore empty results
I have a table like this
Row time viewCount
1 00:00:00 31
2 00:00:01 44
3 00:00:02 78
4 00:00:03 71
5 00:00:04 72
6 00:00:05 73
7 00:00:06 64
8 00:00:07 70
I would like to aggregate this into
Row time viewCount
1 00:00:00 31
2 00:15:00 445
3 00:30:00 700
4 00:45:00 500
5 01:00:04 121
6 01:15:00 475
.
.
.
Please help. Thanks in advance
Supposing that you actually have a TIMESTAMP column, you can use an approach like this:
#standardSQL
SELECT
TIMESTAMP_SECONDS(
UNIX_SECONDS(timestamp) -
MOD(UNIX_SECONDS(timestamp), 15 * 60)
) AS time,
SUM(viewCount) AS viewCount
FROM `project.dataset.table`
GROUP BY time;
It relies on conversion to and from Unix seconds in order to compute the 15 minute intervals. Note that it will not produce a row with a zero count for an empty 15 minute interval unlike Mikhail's solution, however (it's not clear if this is important to you).
Below is for BigQuery Standard SQL
Note: you provided simplified example of your data and below follows it - so instead of each 15 minutes aggregation, it uses each 2 sec aggregation. This is for you to be able to easy test / play with it. It is easily can be adjusted to 15 minutes by changing SECOND to MINUTE in 3 places and 2 to 15 in 3 places. Also this example uses TIME data type for time field as it is in your example so it is limited to just 24 hour period - most likely in your real data you have DATETIME or TIMESTAMP. In this case you will also need to replace all TIME_* functions with respective DATETIME_* or TIMESTAMP_* functions
So, finally - the query is:
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIME '00:00:00' time, 31 viewCount UNION ALL
SELECT TIME '00:00:01', 44 UNION ALL
SELECT TIME '00:00:02', 78 UNION ALL
SELECT TIME '00:00:03', 71 UNION ALL
SELECT TIME '00:00:04', 72 UNION ALL
SELECT TIME '00:00:05', 73 UNION ALL
SELECT TIME '00:00:06', 64 UNION ALL
SELECT TIME '00:00:07', 70
),
period AS (
SELECT MIN(time) min_time, MAX(time) max_time, TIME_DIFF(MAX(time), MIN(time), SECOND) diff
FROM `project.dataset.table`
),
checkpoints AS (
SELECT TIME_ADD(min_time, INTERVAL step SECOND) start_time, TIME_ADD(min_time, INTERVAL step + 2 SECOND) end_time
FROM period, UNNEST(GENERATE_ARRAY(0, diff + 2, 2)) step
)
SELECT start_time time, SUM(viewCount) viewCount
FROM checkpoints c
JOIN `project.dataset.table` t
ON t.time >= c.start_time AND t.time < c.end_time
GROUP BY start_time
ORDER BY start_time, time
and result is:
Row time viewCount
1 00:00:00 75
2 00:00:02 149
3 00:00:04 145
4 00:00:06 134
Is it possible to get individual data from cumulative?
Output of the following query is
SELECT mean("value") FROM "statsd_value" WHERE "type_instance" = 'counts' AND time > now() - 5m GROUP BY time(10s) fill(none)
TimeStamp Value
1463393810 0
1463393820 10
1463393830 23
1463393840 34
1463393850 67
1463393860 90
1463393870 104
Basically, the above data is cumulative data, I want to get individual data from that similar to this
TimeStamp Value
1463393820 10
1463393830 13
1463393840 11
1463393850 33
1463393860 23
1463393870 14
Is it possible to form query to get data in this way?
InfluxQL provides a difference function that will give you the functionality that you're looking for.
The query would look like this:
SELECT difference(mean("value")) FROM "statsd_value" WHERE "type_instance" = 'counts' AND time > now() - 5m GROUP BY time(10s) fill(none)
TimeStamp Value
1463393820 10
1463393830 13
1463393840 11
1463393850 33
1463393860 23
1463393870 14