I am trying to calculate the average difference between order_time and pickup_time, grouped by runner. In the customers table, however, the same order_time can appear more than once (one row per pizza in the order), and this throws the calculation off.
customers_order_table
order_id  customer_id  pizza_id  exclusions  extras  order_time
------------------------------------------------------------------------
1         101          1         NULL        NULL    2020-01-01 18:05:00
2         101          1         NULL        NULL    2020-01-01 19:01:00
3         102          1         NULL        NULL    2020-01-02 23:51:00
3         102          2         NULL        NULL    2020-01-02 23:51:00
runners_orders_table
order_id  runner_id  pickup_time          distance  duration  cancellation
------------------------------------------------------------------------
1         1          2020-01-01 18:15:34  20        32        NULL
2         1          2020-01-01 19:10:54  20        27        NULL
3         1          2020-01-03 00:12:37  13,4      20        NULL
4         2          2020-01-04 13:53:03  23,4      40        NULL
My calculated field currently works out as: (9+10+21+21+15+15)/6
But it should be: (9+10+21+15)/4
It is picking up two rows for the same order.
The only solution I found is to create a new table without duplicated values, with these columns:
order_id, runner_id, pickup_time, order_time
Any other suggestions?
It seems like you may want something like {FIXED order_id: max(pickup_time - order_time)}, because it sounds like you need a single time delta for each order. Then hopefully Tableau will let you take the average of that calculation when you have runner_id and the new field in the view.
Mako212
I changed my calculated field from
DATEDIFF('minute',[Order Time],[Pickup Time])
to:
{ FIXED [Order Id]: max(DATEDIFF('minute',[Order Time],[Pickup Time]))}
Now the average calculation is correct, with no more duplicates.
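The effect of the FIXED expression can be illustrated outside Tableau. What follows is a minimal Python sketch, not Tableau code: it joins the sample rows shown in the question, keeps one delta per order_id (the max over duplicate rows), and only then averages per runner. The helper name `minutes_between` and the whole-minute truncation are assumptions made for illustration, chosen to behave like a minute-level date difference.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (order_id, pizza_id, order_time) from the sample; order 3 appears twice.
customer_orders = [
    (1, 1, "2020-01-01 18:05:00"),
    (2, 1, "2020-01-01 19:01:00"),
    (3, 1, "2020-01-02 23:51:00"),
    (3, 2, "2020-01-02 23:51:00"),
]
# (order_id, runner_id, pickup_time) from the sample.
runner_orders = [
    (1, 1, "2020-01-01 18:15:34"),
    (2, 1, "2020-01-01 19:10:54"),
    (3, 1, "2020-01-03 00:12:37"),
    (4, 2, "2020-01-04 13:53:03"),
]


def minutes_between(start, end):
    """Hypothetical helper: difference in whole minutes, seconds dropped."""
    s = datetime.strptime(start, FMT).replace(second=0)
    e = datetime.strptime(end, FMT).replace(second=0)
    return int((e - s).total_seconds() // 60)


pickup = {oid: (rid, ts) for oid, rid, ts in runner_orders}

# Row-level deltas, duplicates included (what the original field averaged).
naive = [minutes_between(ts, pickup[oid][1]) for oid, _pizza, ts in customer_orders]
naive_avg = sum(naive) / len(naive)

# One delta per order_id: the max over that order's duplicate rows.
per_order = {}
for oid, _pizza, ts in customer_orders:
    rid, picked_up = pickup[oid]
    delta = minutes_between(ts, picked_up)
    key = (rid, oid)
    per_order[key] = max(per_order.get(key, delta), delta)

# Average the per-order deltas for each runner.
by_runner = defaultdict(list)
for (rid, _oid), delta in per_order.items():
    by_runner[rid].append(delta)
avg_by_runner = {rid: sum(v) / len(v) for rid, v in by_runner.items()}

print(naive_avg)      # 15.25, order 3 counted twice
print(avg_by_runner)  # runner 1 averages (10 + 9 + 21) / 3
```

Order 4 has no matching row in the customer sample shown, so runner 2 does not appear in the result here.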
Related
I have a dataset for which I need the COUNT DISTINCT of IDs. However, it seems that certain IDs have values in BOTH value columns when only one is supposed to be populated. I'm currently running a SQL query to fix the table, but in the meantime I was playing around with Tableau to see if I could remedy this there, and got stuck.
ID
COLUMN1
COLUMN2
1
1
2
1
0
3
1
0
4
0
5
1
This gives me a COUNTD of 7 (3 from COLUMN1 and 4 from COLUMN2). I want it to count ALL the IDs where
Has a value in COLUMN1
Has a value in COLUMN2 >= 0 but DOES NOT HAVE A VALUE IN COLUMN1
I should get a count of 5 IDs total.
Is there a way to do this in tableau?
Let's say, for simplicity's sake, I have the following table:
id amount p_id date
------------------------------------------------
1 5 1 2020-01-01T01:00:00
2 10 1 2020-01-01T01:10:00
3 15 2 2020-01-01T01:20:00
4 10 3 2020-01-01T03:30:00
5 10 4 2020-01-01T03:50:00
6 20 1 2020-01-01T03:40:00
Here's a sample response I want:
{
"2020-01-01T01:00:00": 25, -- this is from adding records with ids: 2 and 3
"2020-01-01T03:00:00": 55 -- this is from adding records with ids: 3,4,5 and 6
}
I want to get the total (sum(amount)) of all unique p_id's grouped by the hour.
The row chosen per p_id is the one with the latest date. So for example, the first value in the response above doesn't include id 1 because the record with id 2 has the same p_id and the date on that row is later.
The one tricky thing is I want to include the summation of all the amount per p_id if their date is before the hour presented. So for example, in the second value of the response (with key "2020-01-01T03:00:00"), even though id 3 has a timestamp in a different hour, it's the latest for that p_id 2 and therefore gets included in the sum for "2020-01-01T03:00:00". But the row with id 6 overrides id 2 with the same p_id 1.
In other words: always take the latest amount for each p_id so far, and compute the sum for every distinct hour found in the table.
Create a CTE that includes row_number() over (partition by p_id, date_trunc('hour',"date") order by "date" desc) as pid_hr_seq
Then write your query against that CTE with where pid_hr_seq = 1.
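The rule stated in the question (always take the latest amount for each p_id so far, and sum those for every distinct hour found in the table) can be sketched in plain Python. This is only an illustration of that rule on the sample rows, not a translation of the CTE; the function name `hourly_latest_totals` is an assumption introduced here.

```python
from datetime import datetime, timedelta

# (id, amount, p_id, date) rows from the sample table.
rows = [
    (1, 5, 1, "2020-01-01T01:00:00"),
    (2, 10, 1, "2020-01-01T01:10:00"),
    (3, 15, 2, "2020-01-01T01:20:00"),
    (4, 10, 3, "2020-01-01T03:30:00"),
    (5, 10, 4, "2020-01-01T03:50:00"),
    (6, 20, 1, "2020-01-01T03:40:00"),
]


def hourly_latest_totals(rows):
    """For each distinct hour, sum the latest amount seen so far per p_id."""
    parsed = sorted(
        ((datetime.fromisoformat(date), p_id, amount) for _id, amount, p_id, date in rows),
        key=lambda r: r[0],
    )
    hours = sorted({ts.replace(minute=0, second=0, microsecond=0) for ts, _, _ in parsed})

    totals = {}
    latest = {}  # p_id -> most recent amount up to the end of the current hour
    i = 0
    for hour in hours:
        hour_end = hour + timedelta(hours=1)
        # Consume every row dated before the end of this hour.
        while i < len(parsed) and parsed[i][0] < hour_end:
            _ts, p_id, amount = parsed[i]
            latest[p_id] = amount
            i += 1
        totals[hour.isoformat()] = sum(latest.values())
    return totals


result = hourly_latest_totals(rows)
print(result)
# {'2020-01-01T01:00:00': 25, '2020-01-01T03:00:00': 55}
```

Because `latest` is carried from one hour to the next, the row with id 3 still contributes to the 03:00 total, while id 6 replaces id 2 for p_id 1, which matches the sample response.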
I have a data file that looks like the first picture. I am reading it into SPSS using FILE TYPE MIXED, so that it looks like the second picture. How can I merge the cases so that cases with the same ID end up as a single case? The variable Age is repeated, so it does not matter which value is kept, but it would be good if the first value could be selected.
Here is an example of the code I am using to read the data:
FILE TYPE MIXED RECORD=RecordID 1
/ WILD =WARN.
RECORD TYPE 1.
DATA LIST
/ ID 8-9 JobType 3-4 Age 5-7.
RECORD TYPE 2.
DATA LIST
/ ID 3-4 Sex 11 Salary 5-8.
RECORD TYPE 3.
DATA LIST
/ ID 6-7 Age 8-10 Hiring 3-5.
END FILE TYPE.
BEGIN DATA
1 1 39 1
1 3 27 2
1 2 27 3
1 3 25 4
2 1 9000 0
2 2 7500 0
2 3 4750 1
2 4 7250 1
3 76 1 39
3 98 2 27
3 8 3 27
3 44 4 25
END DATA.
LIST.
This should work:
sort cases by ID RecordID.
casestovars id=ID/index=RecordID.
If the ages are identical they collapse into one column. If they aren't, you'll get three age columns, and you'll be able to choose the one you prefer.
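To make the intended outcome concrete, here is a small Python sketch of merging records by ID while keeping the first value seen for any repeated variable such as Age. It is not SPSS syntax and it does not reproduce CASESTOVARS exactly (it never creates several Age columns); it only illustrates the "first value wins" merge the question asks for. The record values are read off the first two IDs in the sample data lines, assuming the tokens appear in column order, and `merge_by_id` is a hypothetical name.

```python
# Values for IDs 1 and 2 as read from the sample data lines
# (assumption: tokens appear in the same order as the column positions).
records = [
    {"RecordID": 1, "ID": 1, "JobType": 1, "Age": 39},
    {"RecordID": 1, "ID": 2, "JobType": 3, "Age": 27},
    {"RecordID": 2, "ID": 1, "Salary": 9000, "Sex": 0},
    {"RecordID": 2, "ID": 2, "Salary": 7500, "Sex": 0},
    {"RecordID": 3, "ID": 1, "Hiring": 76, "Age": 39},
    {"RecordID": 3, "ID": 2, "Hiring": 98, "Age": 27},
]


def merge_by_id(records):
    """Merge records sharing an ID into one case; the first value seen wins."""
    merged = {}
    # Sort by ID, then record type, so "first" means the lowest RecordID.
    for rec in sorted(records, key=lambda r: (r["ID"], r["RecordID"])):
        case = merged.setdefault(rec["ID"], {"ID": rec["ID"]})
        for name, value in rec.items():
            if name != "RecordID":
                case.setdefault(name, value)  # keep an existing value if present
    return list(merged.values())


cases = merge_by_id(records)
for case in cases:
    print(case)
```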
I wrote a Hive query to compute the 33rd and 66th percentiles on multiple columns of a table that contains integer values (including 0).
To filter out outliers, I added a >0 filter before computing each percentile.
I have 46 columns, and I calculate the 33rd and 66th percentiles on each column, with the >0 filter applied to that column.
Then I join these results to get a single table with the 33rd and 66th percentiles of all the columns.
My issue is that the query doesn't execute. With 2 columns it works fine, but it fails with this large number of joins. Can someone suggest an alternative approach?
Data looks like this:
C1| C2| C3
---------------
0 | 2 | 3
1 | 0 | 2
2 | 0 | 0
for C1, the data will be [1,2]; for C2 -> [2]; for C3 -> [3,2]
You don't need the joins.
Just use Hive's percentile UDF:
select percentile(C1,0.33),.....,percentile(C46,0.33) from table
UNION ALL
select percentile(C1,0.66),.....,percentile(C46,0.66) from table
This gives you a table having 46 columns with first row indicating the 33rd percentile of each column and 2nd row indicating the 66th percentile of each column
Or you can get everything in a single row:
select percentile(C1,0.33),.....,percentile(C46,0.33) , percentile(C1,0.66),.....,percentile(C46,0.66) from table
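The single-pass idea, including the per-column >0 filter from the question, can be sketched in Python on the three-row sample. The `percentile` helper below uses linear interpolation between neighbouring ranks; that definition is an assumption for illustration and is not a statement about how Hive computes its result.

```python
def percentile(values, p):
    """Hypothetical helper: p-th percentile (0 <= p <= 1) by linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        return None  # no values survived the filter
    pos = (len(ordered) - 1) * p
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Sample rows (C1, C2, C3) from the question.
rows = [
    (0, 2, 3),
    (1, 0, 2),
    (2, 0, 0),
]

result = {}
for index, column in enumerate(zip(*rows), start=1):
    positive = [v for v in column if v > 0]  # the question's >0 filter, per column
    result[f"C{index}"] = (percentile(positive, 0.33), percentile(positive, 0.66))

for name, (p33, p66) in result.items():
    print(name, p33, p66)
```

Each column is filtered and summarised independently in one loop, so no join between per-column results is needed.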
I have three tables, as follows.
**Table 1** **Table 2** **Table 3**
Lot_no(pk) Lot_no(pk/fk) Lot_no(fk)
Name job type Material
Phone Printing qty Trim
Here is some sample data:
**Table 1** **Table 2** **Table 3**
1 Mian Sultan xyz 1 Reverse 50,000pcs 1 PVC 20
2 Mian Usman xyz 2 New 10,000pcs 1 INK 30
2 MILKY 25
2 INK 35
I just want to show data from Table 2 and Table 3 on the basis of lot_no.
For example, if the user enters lot_no=1, the result should be displayed as:
1 Reverse 50,000pcs
1 PVC 20
1 INK 30
Similarly, if the user enters lot_no=2:
2 New 10,000pcs
2 MILKY 25
2 INK 35
My query is as follows:
#lotnum int (Variable declaration in stored procedure)
SELECT table2.lot_no, table2.job_type, table2.printing qty,
table3.material, table3.trim
FROM table2
INNER JOIN table3 ON (table2.lot_no=table3.lot_no)
WHERE table2.lot_no=#lotnum AND table3.lot_no=#lotnum;
It gives me the correct result, but when I use it in Crystal Reports with lot_no=1, the report shows only:
1 Reverse 50,000pcs
1 PVC 20
It doesn't show:
1 INK 30
The same thing happens when lot_no=2.
Please guide me. Thanks.