PSQL select & replace chars - psql

How do I select from two tables in PSQL by matching on the name, when the menus are different but the data is almost identical?
I've tried combining select and replace in a psql command.
SELECT * FROM americanmenu JOIN europeanmenu ON replace(usmenu.type, 'US', 'EU') = eumenu.type WHERE usmenu.type = eumenu.type
table US
ID | type | year
----------------------
01 | wine 1 us | 2001
02 | wine 2 us | 2002
table EU
ID | TYPE | year
--------------------
01 | wine 1 eu | 2001
02 | wine 2 eu | 2002
There are additional columns for price and taste ratings, which I did not include because this is the gist of the problem. I want to select from the US table by type, replace the last two characters with "eu", and compare both tables, even though there is a lot of identical data. Thanks!

Query:
t=# SELECT *
FROM americanmenu a
JOIN europeanmenu e ON replace(a.type, 'us','eu') = e.type;
id | type | year | id | type | year
-----+-------------+------+-----+-------------+------
01 | wine 1 us | 2001 | 01 | wine 1 eu | 2001
02 | wine 2 us | 2002 | 02 | wine 2 eu | 2002
(2 rows)
Time: 0.297 ms
prepare:
t=# create table americanmenu (id text, type text, year int);
CREATE TABLE
Time: 4.515 ms
t=# create table europeanmenu (id text, type text, year int);
CREATE TABLE
Time: 15.218 ms
t=# copy americanmenu from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 01 | wine 1 us | 2001
>> 02 | wine 2 us | 2002
>> \.
COPY 2
Time: 7144.563 ms
t=# copy europeanmenu from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 01 | wine 1 eu | 2001
>> 02 | wine 2 eu | 2002
>> \.
COPY 2
Time: 9729.000 ms
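
One caveat with the query above: replace(a.type, 'us', 'eu') rewrites every occurrence of 'us' in the string, not only the trailing one, so a type such as "house wine us" would no longer match. The question asks about the last two characters, so here is a minimal Python sketch of the same join with a suffix-only swap. The helper names swap_suffix and join_menus are made up for illustration, and the rows are the sample data from the question.

```python
# Rows mirror the sample tables from the question: (id, type, year).
american_menu = [("01", "wine 1 us", 2001), ("02", "wine 2 us", 2002)]
european_menu = [("01", "wine 1 eu", 2001), ("02", "wine 2 eu", 2002)]


def swap_suffix(value, old="us", new="eu"):
    """Replace `old` only when it is the trailing suffix of `value`."""
    if value.endswith(old):
        return value[: len(value) - len(old)] + new
    return value


def join_menus(us_rows, eu_rows):
    """Inner join on the US type with its trailing 'us' rewritten to 'eu'."""
    eu_by_type = {}
    for row in eu_rows:
        eu_by_type.setdefault(row[1], []).append(row)
    joined = []
    for us_row in us_rows:
        for eu_row in eu_by_type.get(swap_suffix(us_row[1]), []):
            joined.append(us_row + eu_row)
    return joined


matches = join_menus(american_menu, european_menu)
for row in matches:
    print(row)
```

In PostgreSQL itself the same idea could probably be expressed with a pattern anchored at the end of the string instead of a plain replace, but that variant is not tested here.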

Related

Query select by most recent date and group line items

I just discovered that my source data is being updated from a third-party source which I can't change. As such, my orders file actually has all order history, including updates of quantity, etc.
I am trying to create a sheet that pulls ONLY the summary values for the most recent version of the order. The example below is an actual extract from the data set - without all the extra data. As you can see, Bill's order was updated three times before it shipped.
I need to group on Order Number and return ONLY the last update from 09/08/2021.
There are obviously many rows (17,000 to be exact), covering approximately 8,000 orders. About 10% of the orders are updated like this. Does anyone have any suggestions for grouping and reporting on the latest date?
A B C D E
Order Number | Name | Item | QTY | Updated
1001 | Bill | ABC | 10 | 30/07/2021
1001 | Bill | DEF | 5 | 30/07/2021
1001 | Bill | GHI | 5 | 30/07/2021
1001 | Bill | ABC | 10 | 07/08/2021
1001 | Bill | DEF | 5 | 07/08/2021
1001 | Bill | GHI | 7 | 07/08/2021
1001 | Bill | ABC | 2 | 09/08/2021
1001 | Bill | DEF | 4 | 09/08/2021
1001 | Bill | GHI | 2 | 09/08/2021
I want a query that groups by order number, keeps only the last update, and sums the QTY.
For this subset of data, the result should look like this.
1001 | Bill | 8 | 09/08/2021
=query(Orders!A1:E,"Select A, B, Sum(D), E group by A, B, E Where E = date ‘”&Text(Max(Orders!E:E),"YYY-MM-DD”)&”‘”,1)
I am getting an error. Any idea? Thanks
try:
=QUERY(Orders!A1:E,
"select A,B,sum(D),max(E)
Where E = date '"&TEXT(MAX(Orders!E:E), "YYYY-MM-DD")&"'
group by A,B", 1)
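
Note that the formula above filters on the single newest date in the whole column, which works for this one-order extract. With thousands of orders, each order presumably has its own latest update date. The following Python sketch shows that per-order logic on the sample rows; the function name latest_totals is hypothetical, and it assumes the Updated column is in dd/mm/yyyy format as in the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (order number, name, item, qty, updated).
orders = [
    (1001, "Bill", "ABC", 10, "30/07/2021"),
    (1001, "Bill", "DEF", 5, "30/07/2021"),
    (1001, "Bill", "GHI", 5, "30/07/2021"),
    (1001, "Bill", "ABC", 10, "07/08/2021"),
    (1001, "Bill", "DEF", 5, "07/08/2021"),
    (1001, "Bill", "GHI", 7, "07/08/2021"),
    (1001, "Bill", "ABC", 2, "09/08/2021"),
    (1001, "Bill", "DEF", 4, "09/08/2021"),
    (1001, "Bill", "GHI", 2, "09/08/2021"),
]


def latest_totals(rows):
    """Sum QTY per order, using only the rows from that order's newest update."""
    parsed = [
        (order, name, qty, datetime.strptime(updated, "%d/%m/%Y").date())
        for order, name, _item, qty, updated in rows
    ]
    # First pass: newest update date for each order number.
    latest = {}
    for order, _name, _qty, day in parsed:
        if order not in latest or day > latest[order]:
            latest[order] = day
    # Second pass: add up quantities that belong to that newest date.
    totals = defaultdict(int)
    for order, name, qty, day in parsed:
        if day == latest[order]:
            totals[(order, name, day)] += qty
    return [
        (order, name, total, day.strftime("%d/%m/%Y"))
        for (order, name, day), total in sorted(totals.items())
    ]


print(latest_totals(orders))  # [(1001, 'Bill', 8, '09/08/2021')]
```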

Google Sheets - sumif using condition on row and column

I have a spreadsheet that looks like the following
TABLE 1
ID/Month | May | June | July | August | September | October
ID101 | 30 | 50 | 50 | 80 | 20 | 60
ID201 | 20 | 30 | 10 | 40 | 30 | 50
ID101 | 10 | 50 | 60 | 80 | 70 | 20
ID301 | 20 | 80 | 70 | 40 | 40 | 70
ID101 | 30 | 70 | 80 | 50 | 90 | 50
ID301 | 80 | 20 | 30 | 20 | 60 | 20
TABLE 2
ID | Date | Value
ID101 | July | ?
ID201 | September | ?
ID301 | June | ?
? is the sum of the values in TABLE 1 where the ID matches and the month column is less than or equal to the date specified in TABLE 2.
So
for ID101 | July | ? I need to find the sum of values in row ID101 in TABLE 1 and May/June/July columns
for ID201 | September | ? I need to find the sum of values in row ID201 in TABLE 1 and May/June/July/August/September columns
How do I do a sumif like an index/match lookup, where I can apply conditions on both a column (IDs) and a row (dates less than or equal to a given date)?
You can use SUMPRODUCT function:
=SUMPRODUCT((J2=$A$2:$A$7)*(MONTH(K2&1)>=MONTH($B$1:$G$1&1))*$B$2:$G$7)
For MONTH(K2&1) to convert your month names to the correct numbers, you must set the spreadsheet locale to United States in the spreadsheet settings.
You'll want to do three things:
1. Un-pivot your table ("wide" to "long") so that each value is in its own row, identified by an ID and a Month (this is a little tricky)
2. Give your month a numeric value (the MONTH() function comes in handy)
3. Use SUMIFS to check the ID column and the Month column.
Here's a working sample: Google Sheets link
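
To make the three steps above concrete, here is a small Python sketch on the data from TABLE 1. It is only an illustration of the logic, not the linked sheet: the names unpivot and sum_up_to are invented, and instead of MONTH() it uses the position of each month in the header row, which assumes the columns are in chronological order.

```python
MONTHS = ["May", "June", "July", "August", "September", "October"]

# TABLE 1 from the question: (ID, values for May..October).
table1 = [
    ("ID101", [30, 50, 50, 80, 20, 60]),
    ("ID201", [20, 30, 10, 40, 30, 50]),
    ("ID101", [10, 50, 60, 80, 70, 20]),
    ("ID301", [20, 80, 70, 40, 40, 70]),
    ("ID101", [30, 70, 80, 50, 90, 50]),
    ("ID301", [80, 20, 30, 20, 60, 20]),
]


def unpivot(rows):
    """Wide to long: one (id, month position, value) tuple per cell."""
    return [
        (row_id, position, value)
        for row_id, values in rows
        for position, value in enumerate(values)
    ]


def sum_up_to(long_rows, wanted_id, last_month):
    """Sum values for one ID over all months up to and including last_month."""
    cutoff = MONTHS.index(last_month)
    return sum(
        value
        for row_id, position, value in long_rows
        if row_id == wanted_id and position <= cutoff
    )


long_rows = unpivot(table1)
print(sum_up_to(long_rows, "ID101", "July"))       # 430
print(sum_up_to(long_rows, "ID201", "September"))  # 130
print(sum_up_to(long_rows, "ID301", "June"))       # 200
```

The SUMPRODUCT formula applies the same two conditions, but as a product of masks over the wide table rather than over an un-pivoted copy.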

How do I make a calculation field reference specific values from a pivot table in google sheets

So I'm making a punch in/out dashboard in google sheets. It uses a google form to populate a sheet with my employees punches like so:
Timestamp | Name | Punch Type | Time
6/2/2020 15:09:55 | Bob | 1. Start Shift | 7:30:00 AM
6/2/2020 15:10:45 | Bob | 2. Start Lunch | 11:00:00 AM
6/2/2020 15:11:08 | Bob | 3. End Lunch | 11:30:00 AM
6/2/2020 16:01:04 | Bob | 4. End Day | 4:00:00 PM
...
I then used this source data to make a pivot table that looks like this:
AVERAGE of Time | Punch Type
Name | 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day
Bob | 7:30:00 AM | 11:00:00 AM | 11:30:00 AM | 4:00:00 PM
...
In this pivot table, I want to add a column at the end that is a calculated field of
("4. End Day" - "1. Start Shift") - ("3. End Lunch" - "2. Start Lunch").
I'm encountering two road blocks here. First is when I go to add a calculated field in the pivot table editor panel, it creates 4 new columns instead of just one:
| Punch Type | Values
| 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day
Name | AVERAGE of Time.. | AVERAGE of Time.. | AVERAGE of Time.. | AVERAGE of Time..
Bob | 7:30:00 AM | 0 | 11:00:00 AM | 0 | 11:30:00 AM | 0 | 4:00:00 PM | 0
...
The second issue is that I can't figure out how to reference the columns with the timestamps to do this calculation.
Basically my end goal is a pivot table that looks like this:
AVERAGE of Time | Punch Type
Name | 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day | Total Hours
Bob | 7:30:00 AM | 11:00:00 AM | 11:30:00 AM | 4:00:00 PM | 8.0
...
[Screenshot: Pivot Table settings in the Pivot Table Editor Panel, before adding the calculated field]

Find total duration of many overlapping times

I have a list of dates and times for employee time sheets. The times begin in column F, and end in column G. Sometimes there are overlapping times for projects. The employee does not get paid for overlapping projects, yet we need to track each project separately. I would like to be able to look at columns E, F and G and find any overlapping projects, and return a single time entry. In the example below, notice that line 1 does NOT overlap with the others, but that there is a series of overlapping entries in lines 2-6. They don't necessarily all overlap, but are more like a "chain." I want to write a formula (not a script) to solve this.
+---+------------+------------+----------+
| | E | F | G |
+---+------------+------------+----------+
| 1 | 10/11/2017 | 12:30 PM | 1:00 PM |
| 2 | 10/11/2017 | 1:00 PM | 3:00 PM |
| 3 | 10/11/2017 | 2:15 PM | 6:45 PM |
| 4 | 10/11/2017 | 2:30 PM | 3:00 PM |
| 5 | 10/11/2017 | 2:15 PM | 6:45 PM |
| 6 | 10/11/2017 | 3:00 PM | 6:45 PM |
+---+------------+------------+----------+
I would want to evaluate these columns and return the total duration of each "chain" on the final line of the series of overlaps. In my example below, we'll put that in column H. It finds 5.75 hours for the series that begins in row 2 and ends in row 6 (1 pm to 6:45 pm).
+---+------------+------------+----------+------------+
| | E | F | G | H |
+---+------------+------------+----------+------------+
| 1 | 10/11/2017 | 12:30 PM | 1:00 PM | 0.5 |
| 2 | 10/11/2017 | 1:00 PM | 3:00 PM | overlap |
| 3 | 10/11/2017 | 2:15 PM | 6:45 PM | overlap |
| 4 | 10/11/2017 | 2:30 PM | 3:00 PM | overlap |
| 5 | 10/11/2017 | 2:15 PM | 6:45 PM | overlap |
| 6 | 10/11/2017 | 3:00 PM | 6:45 PM | 5.75 |
+---+------------+------------+----------+------------+
I've tried writing queries, but keep finding myself back at the beginning. If anyone has a suggestion, I'd love to know it! Thank you in advance.
Neill
My Solution
To solve this I need 2 extra columns:
Step 1. Return "overlap" or "ok"
Two lines overlap when one of them starts before the other ends and ends after the other starts.
I made a query formula to check this:
=if(QUERY(ArrayFormula({value(E1:E+F1:F),VALUE(E1:E+G1:G)}),
"select count(Col1) where
Col1 < "&value(G1+E1-1/10^4)&"
and Col2 > "&value(F1+E1+1/10^4)&" label Count(Col1) ''",0)>1,"overlap","ok")
Drag the formula down. The result is column:
ok
overlap
overlap
overlap
overlap
ok
ok
overlap
overlap
overlap
overlap
ok
In the formula:
value is used to compare numbers. Each pair must be compared as date + time.
-1/10^4 and +1/10^4 are used because of imprecision in query
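
For readers who want to see the Step 1 test outside of a spreadsheet, here is a rough Python equivalent applied to the six rows from the question. The names to_interval and overlap_flags are made up, and the date format is an assumption (all rows share one date, so month/day order does not affect the result). Because the comparison uses exact datetime values, the 1/10^4 tolerance is not needed here.

```python
from datetime import datetime

# Columns E, F, G from the question's table.
entries = [
    ("10/11/2017", "12:30 PM", "1:00 PM"),
    ("10/11/2017", "1:00 PM", "3:00 PM"),
    ("10/11/2017", "2:15 PM", "6:45 PM"),
    ("10/11/2017", "2:30 PM", "3:00 PM"),
    ("10/11/2017", "2:15 PM", "6:45 PM"),
    ("10/11/2017", "3:00 PM", "6:45 PM"),
]


def to_interval(day, start, end):
    """Combine date + time into (start, end) datetimes, like E+F and E+G."""
    fmt = "%m/%d/%Y %I:%M %p"  # assumed format
    return (
        datetime.strptime(f"{day} {start}", fmt),
        datetime.strptime(f"{day} {end}", fmt),
    )


def overlap_flags(intervals):
    """Flag a row as 'overlap' when more rows than itself intersect it."""
    flags = []
    for start, end in intervals:
        hits = sum(
            1
            for other_start, other_end in intervals
            if other_start < end and other_end > start
        )
        # Every row matches itself once, so a count above 1 means overlap.
        flags.append("overlap" if hits > 1 else "ok")
    return flags


intervals = [to_interval(*entry) for entry in entries]
print(overlap_flags(intervals))
# ['ok', 'overlap', 'overlap', 'overlap', 'overlap', 'overlap']
```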
Step 2. Get Time Chains
This part is tricky. My solution will only work if data is sorted like in the example.
Enter 1 in cell I1. In cell I2 enter the formula:
=if(or(and(H1=H2,H2="overlap"),and(H2="ok",H1="overlap")),I1,I1+1)
Drag the formula down. The result is column:
1
2
2
2
2
2
3
4
4
4
4
4
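
The Step 2 rule can be checked with a few lines of Python. This sketch (chain_ids is a hypothetical name) applies the same condition as the formula to the Step 1 result column listed above and reproduces the chain numbers shown.

```python
def chain_ids(flags):
    """Number the chains: keep the id while inside or closing an overlap run."""
    if not flags:
        return []
    ids = [1]  # corresponds to entering 1 in cell I1
    for previous, current in zip(flags, flags[1:]):
        same_chain = (previous == current == "overlap") or (
            current == "ok" and previous == "overlap"
        )
        ids.append(ids[-1] if same_chain else ids[-1] + 1)
    return ids


# The Step 1 result column from the answer.
flags = ["ok"] + ["overlap"] * 4 + ["ok", "ok"] + ["overlap"] * 4 + ["ok"]
print(chain_ids(flags))  # [1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4]
```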
Step 3. Get Durations
In J1 paste and copy down the formula:
=if(H1="ok",
round(QUERY(ArrayFormula({value(E:E+F:F),VALUE(E:E+G:G),I:I}),
"select max(Col2) - min(Col1) where Col3 = "&I1
&" label max(Col2) - min(Col1) ''")*24,2),"")
The query gets the duration of each group found in Step 2, as the latest end minus the earliest start.
round is used because of imprecision in query
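
As a cross-check on the durations, here is a different technique from the formulas above: a standard interval merge in Python. It sorts the entries itself, so it does not depend on the sheet order. Times are written as decimal hours for brevity, and chain_durations is an invented name. Entries that merely touch (one ends at 1:00 PM, the next starts at 1:00 PM) are treated as separate chains, matching the question.

```python
# Columns F and G from the question as decimal hours (12:30 PM -> 12.5).
entries = [
    (12.5, 13.0),
    (13.0, 15.0),
    (14.25, 18.75),
    (14.5, 15.0),
    (14.25, 18.75),
    (15.0, 18.75),
]


def chain_durations(intervals):
    """Merge overlapping intervals and return each chain's length in hours."""
    chains = []
    for start, end in sorted(intervals):
        if chains and start < chains[-1][1]:
            # Starts before the current chain ends: extend that chain.
            chains[-1][1] = max(chains[-1][1], end)
        else:
            chains.append([start, end])
    return [round(end - start, 2) for start, end in chains]


print(chain_durations(entries))  # [0.5, 5.75]
```

The two values match column H in the question: 0.5 for the first row and 5.75 for the chain in rows 2 to 6.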

select a row except all the other rows based on some condition psql

Consider this scenario:
There is a table T (name,habit) where combination of name and habit is the primary key for the table T.
Suppose the data is as follows:
name | habit
a1 | smoking
a1 | drinking
a2 | sleeping
a3 | jogging
a2 | jogging
a4 | sleeping
Now I want to select the names whose habits are all unique, i.e. not shared with any other name. Here a2, a3 and a4 clearly have habits in common, so they should be filtered out.
So the output should be like
OUTPUT:
name
a1
My question:
How can I do this using EXCEPT in psql?
You don't need EXCEPT for it:
t=# with a as (
select *,count(1) over (partition by habit)
from t
)
select distinct name
from a
where count = 1;
name
------
a1
(1 row)
schema:
t=# create table T (name text,habit text);
CREATE TABLE
Time: 14.162 ms
t=# copy t from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a1 | smoking
>> a1 | drinking
>> a2 | sleeping
>> a3 | jogging
>> a2 | jogging
>> a4 | sleeping
>> \.
COPY 6
Time: 3216.573 ms
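
Since the question asked about EXCEPT, here is a sketch of that set-difference idea in Python: take all names and subtract the names that share at least one habit with someone else. The function name is made up for illustration. As far as I can tell, this is slightly stricter than the window-function query above, which would also return a name that has one unique habit and one shared habit; on the sample data both give a1.

```python
from collections import Counter

# Sample rows from table T: (name, habit); the pair is the primary key.
rows = [
    ("a1", "smoking"),
    ("a1", "drinking"),
    ("a2", "sleeping"),
    ("a3", "jogging"),
    ("a2", "jogging"),
    ("a4", "sleeping"),
]


def names_with_only_unique_habits(rows):
    """All names EXCEPT those that share any habit with another name."""
    habit_counts = Counter(habit for _name, habit in rows)
    all_names = {name for name, _habit in rows}
    sharing = {name for name, habit in rows if habit_counts[habit] > 1}
    return sorted(all_names - sharing)


print(names_with_only_unique_habits(rows))  # ['a1']
```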
