join / union in presto to keep email in one column

I'm trying to join two tables together in Presto:
select o.email
, o.user_id
, c.email
, c.sessions
from datasource o
full join datasource2 c
on o.email = c.email
This yields:
email              user_id  email            sessions
jeff#sessions.com  123      NULL             NULL
mike#berkley.com   987      NULL             NULL
jared#swiss.com    384      jared#swiss.com  14
steph#berk.com     333      NULL             NULL
NULL               NULL     lisa#hart.com    12
The problem is that I want to do multiple joins on multiple data sources using email. The only workaround I can think of is to use this as a subquery, create a new column that takes one email and, if it is NULL, takes the other, then perform the full join on datasource3, and repeat.

You want to use COALESCE, which will choose the non-NULL value of the two.
COALESCE is useful for a lot of things. It can take more than two values and returns the first non-NULL value it gets. If all of them are NULL, it simply returns NULL.
SELECT
COALESCE(o.email, c.email) AS email
, o.user_id
, c.sessions
FROM datasource o
FULL JOIN datasource2 c
ON o.email = c.email
For the official documentation on COALESCE see here:
https://prestodb.io/docs/current/functions/conditional.html
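As a rough illustration of what COALESCE does to the joined rows, here is a minimal Python sketch. The coalesce helper and the joined_rows list are hypothetical stand-ins (they are not Presto APIs); the row values are copied as they appear in the question's output, with None playing the role of NULL.

```python
def coalesce(*values):
    """Return the first argument that is not None (SQL NULL), else None."""
    for value in values:
        if value is not None:
            return value
    return None


# Rows shaped like the FULL JOIN output in the question:
# (o.email, o.user_id, c.email, c.sessions)
joined_rows = [
    ("jeff#sessions.com", 123, None, None),
    ("mike#berkley.com", 987, None, None),
    ("jared#swiss.com", 384, "jared#swiss.com", 14),
    ("steph#berk.com", 333, None, None),
    (None, None, "lisa#hart.com", 12),
]

# Collapse the two email columns into one, as COALESCE(o.email, c.email) would.
merged_rows = [
    (coalesce(o_email, c_email), user_id, sessions)
    for o_email, user_id, c_email, sessions in joined_rows
]

for row in merged_rows:
    print(row)
```

Because the helper accepts any number of arguments, a third data source would simply contribute its email as one more argument.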

Related

Populating Fact Tables(Data Warehouse) and Querying

I am not sure how to query my fact tables (covid and vaccinations). I populated the dimensions with dummy data. Am I supposed to leave the fact tables empty? As far as I know, they would get populated when I write the queries.
I am not sure how to query the tables. I have tried different things, but I get an empty result.
Below is a link to the schema.
I want to find out the "TotalDeathsUK" (fact table COVID) for the last year caused by each "Strain" (my strain table has 3 strains in total).

You can use MERGE to populate your fact table COVIDFact:
MERGE INTO factcovid
USING (
    SELECT centerid,
           dateid,
           patientid,
           strainid
    FROM yourstagingfacttable
) AS f
ON factcovid.centerid = f.centerid AND factcovid.dateid = f.dateid... -- the join columns
-- Matched rows are left untouched, so no WHEN MATCHED clause is needed
WHEN NOT MATCHED THEN
    INSERT VALUES
    (
        f.centerid,
        f.dateid,
        f.patientid,
        f.strainid
    );
And for VaccinationsFact:
MERGE INTO vaccinations
USING (
    SELECT centerid,
           dateid,
           patientid,
           vaccineid
    FROM yourstagingfacttable
) AS f
ON vaccinations.centerid = f.centerid -- join condition(s)
-- Matched rows are left untouched, so no WHEN MATCHED clause is needed
WHEN NOT MATCHED THEN
    INSERT VALUES
    (
        f.centerid,
        f.dateid,
        f.patientid,
        f.vaccineid
    );
For the TotalDeathsUK measure:
SELECT S.[Name] AS Strain, COUNT(CF.PatientID) AS [Count of Deaths]
FROM CovidFact AS CF
LEFT JOIN Strain AS S ON S.StrainID = CF.StrainID
LEFT JOIN Time AS T ON CF.DateID = T.DateID
LEFT JOIN TreatmentCenter AS TR ON TR.CenterID = CF.CenterID
LEFT JOIN City AS C ON C.CityID = TR.CityID
WHERE C.Country LIKE 'UK' AND T.Year = 2020
AND Result LIKE 'Death' -- you should add a Result column to check whether the patient survived or died
GROUP BY S.[Name]

Hive: Optimal way to JOIN using 2 ON conditions with NULL

I have 2 tables which look as below.
Table_A
ID1   ID2   NAME
112   NULL  ADAM
132   990   BRIAN
NULL  980   CARL
Table_B
ID1   ID2   SURNAME
112   NULL  LEVINE
132   990   LARA
NULL  980   JOHNSON
If I join the tables as below, the NULL comparisons do not match, so no surname is returned for ADAM:
SELECT A.NAME,B.SURNAME
FROM
TABLE_A A
LEFT JOIN
TABLE_B B
ON A.ID1 = B.ID1
AND
A.ID2 = B.ID2;
I added a check for NULL in the ON clause for ID2 which did work but the operation turned out to be costly for even small tables. (See below)
SELECT A.NAME,B.SURNAME
FROM
TABLE_A A
LEFT JOIN
TABLE_B B
ON
(A.ID1 = B.ID1 OR (A.ID1 IS NULL AND B.ID1 IS NULL))
AND
(A.ID2 = B.ID2 OR (A.ID2 IS NULL AND B.ID2 IS NULL));
What would be the right way to go about this comparison?

To join NULLs like normal values, use NVL() function to substitute NULL with some value which is not used normally in the data, for example -9999:
SELECT A.NAME,B.SURNAME
FROM
TABLE_A A
LEFT JOIN
TABLE_B B
ON NVL(A.ID1,-9999) = NVL(B.ID1,-9999)
AND
NVL(A.ID2,-9999) = NVL(B.ID2,-9999);

Older Hive versions don't support OR expressions in the ON condition.
There, the join condition should consist purely of equality expressions.
I prefer the COALESCE function:
SELECT A.NAME,B.SURNAME
FROM
TABLE_A A
LEFT JOIN
TABLE_B B
ON
COALESCE(A.ID1, 'missing') = COALESCE(B.ID1, 'missing')
AND
COALESCE(A.ID2, 'missing') = COALESCE(B.ID2, 'missing')

This is a typical scenario that calls for a NULL-safe equality operator, which Hive supports natively through the GenericUDF <=>. To quote the documentation, this operator:
Returns same result with EQUAL(=) operator for non-null operands,
but returns TRUE if both are NULL, FALSE if one of the them is NULL.
So the SQL is as simple as the following:
select
a.name,
b.surname
from table_a a
left join table_b b
on a.id1 <=> b.id1 and a.id2 <=> b.id2;
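To see the difference between plain and NULL-safe equality on the question's data, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite rather than Hive: SQLite has no <=> operator, so its IS operator, which treats two NULLs as equal, is used as the closest analogue. Table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id1 INTEGER, id2 INTEGER, name TEXT);
    CREATE TABLE table_b (id1 INTEGER, id2 INTEGER, surname TEXT);
    INSERT INTO table_a VALUES (112, NULL, 'ADAM'), (132, 990, 'BRIAN'), (NULL, 980, 'CARL');
    INSERT INTO table_b VALUES (112, NULL, 'LEVINE'), (132, 990, 'LARA'), (NULL, 980, 'JOHNSON');
""")

# Plain equality: NULL = NULL is not true, so ADAM and CARL find no match.
plain_equality = conn.execute("""
    SELECT a.name, b.surname
    FROM table_a a
    LEFT JOIN table_b b ON a.id1 = b.id1 AND a.id2 = b.id2
    ORDER BY a.name
""").fetchall()

# NULL-safe comparison (SQLite's IS, standing in for Hive's <=>).
null_safe = conn.execute("""
    SELECT a.name, b.surname
    FROM table_a a
    LEFT JOIN table_b b ON a.id1 IS b.id1 AND a.id2 IS b.id2
    ORDER BY a.name
""").fetchall()

print(plain_equality)
print(null_safe)
```

With the NULL-safe comparison, every name is paired with its surname and no sentinel value such as -9999 or 'missing' is needed.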

Find records with ID in array of IDS and keep the order of records matching that of IDs [duplicate]

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?

You can do it quite easily with VALUES (), (), which was introduced in PostgreSQL 8.2.
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
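When the id list comes from application code, the (id, ordering) pairs can be generated rather than typed by hand. The sketch below assumes Python's built-in sqlite3 module instead of PostgreSQL, so the VALUES list is wrapped in a CTE to name its columns; the comments table contents and the wanted_ids list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 6)],
)

wanted_ids = [1, 3, 2, 4]

# One "(?, ?)" placeholder pair per wanted id: (id, position in the list).
placeholders = ", ".join("(?, ?)" for _ in wanted_ids)
params = []
for position, comment_id in enumerate(wanted_ids, start=1):
    params.extend([comment_id, position])

query = f"""
    WITH x (id, ordering) AS (VALUES {placeholders})
    SELECT c.id
    FROM comments c
    JOIN x ON c.id = x.id
    ORDER BY x.ordering
"""
ordered_ids = [row[0] for row in conn.execute(query, params)]
print(ordered_ids)
```

Only the placeholders are interpolated into the SQL text; the ids themselves are passed as bound parameters.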

In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number

Just because it is so difficult to find and deserves to be spread: in MySQL this can be done much more simply, but I don't know whether it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')

With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);

I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC

Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index

In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')

On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?

To do this, I think you should probably have an additional "ORDER" table that maps IDs to their order (effectively doing what your response to your own question said). You can then use it as an additional column in your select and sort on that column.
In that way, you explicitly describe the ordering you desire in the database, where it should be.

Without a sequence (requires 8.4 or later):
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter

SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j."order"
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id" || ',' IN ',1,3,2,4,')

And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)

create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself(the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id

Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;

select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a field called ids,
and ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.

Let's get a visual impression of what was already said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
 id |   status   |   description
----+------------+------------------
  4 | processing | work on postgres
  6 | deleted    | need some rest
  3 | pending    | garden party
  5 | completed  | work on html
And you want to order the list of tasks by status.
The status is one of a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
 id |   status   |   description
----+------------+------------------
  4 | processing | work on postgres
  3 | pending    | garden party
  5 | completed  | work on html
  6 | deleted    | need some rest
Credit #user80168

I agree with all the other posters who say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. "ORDER BY comments.sort DESC". If you want to sort these in a different order every time, then SQL won't be for you in this case.

Getting Conditional Count in Join with Laravel Query Builder

I am trying to achieve the following with Laravel Query builder.
I have a table called deals. Below is the basic schema:
id
deal_id
merchant_id
status
deal_text
timestamps
I also have another table called merchants whose schema is
id
merchant_id
merchant_name
about
timestamps
Currently I am getting deals using the following query
$deals = DB::table('deals')
    ->join('merchants', 'deals.merchant_id', '=', 'merchants.merchant_id')
    ->where('merchant_url_text', $merchant_url_text)
    ->get();
Since only 1 merchant is associated with a deal, I am getting deals and related merchant info with the query.
Now I have a 3rd table called tbl_deal_votes. Its schema looks like
id
deal_id
vote (1 if voted up, 0 if voted down)
timestamps
What I want to do is join this 3rd table (on deal_id) to my existing query and be able to also get the upvotes and down votes each deal has received.

To do this in a single query you'll probably need to use SQL subqueries, which don't seem to have good fluent query support in Laravel 4/5. Since you're not using Eloquent objects, the raw SQL is probably easiest to read. (Note that the example below ignores your deals.deal_id and merchants.merchant_id columns, which can likely be dropped. Instead, it uses your deals.id and merchants.id fields by convention.)
$deals = DB::select(
DB::raw('
SELECT
deals.id AS deal_id,
deals.status,
deals.deal_text,
merchants.id AS merchant_id,
merchants.merchant_name,
merchants.about,
COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
FROM
deals
JOIN merchants ON (merchants.id = deals.merchant_id)
LEFT JOIN (
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id
) tbl_upvotes ON (tbl_upvotes.deal_id = deals.id)
LEFT JOIN (
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id
) tbl_downvotes ON (tbl_downvotes.deal_id = deals.id)
')
);
If you'd prefer to use fluent, this should work:
$upvotes_subquery = '
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id';
$downvotes_subquery = '
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id';
$deals = DB::table('deals')
->select([
DB::raw('deals.id AS deal_id'),
'deals.status',
'deals.deal_text',
DB::raw('merchants.id AS merchant_id'),
'merchants.merchant_name',
'merchants.about',
DB::raw('COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count'),
DB::raw('COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count')
])
->join('merchants', 'merchants.id', '=', 'deals.merchant_id')
->leftJoin(DB::raw('(' . $upvotes_subquery . ') tbl_upvotes'), function($join) {
$join->on('tbl_upvotes.deal_id', '=', 'deals.id');
})
->leftJoin(DB::raw('(' . $downvotes_subquery . ') tbl_downvotes'), function($join) {
$join->on('tbl_downvotes.deal_id', '=', 'deals.id');
})
->get();
A few notes about the fluent query:
Used the DB::raw() method to rename a few selected columns. Otherwise, there would have been a conflict between deals.id and merchants.id in the results.
Used COALESCE to default null votes to 0.
Split the subqueries into separate PHP strings to improve readability.
Used left joins for the subqueries so deals with no upvotes/downvotes still show up.
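To check the behaviour of the left-joined subqueries on a small data set, here is a sketch using Python's built-in sqlite3 module rather than Laravel. The table layout is a trimmed-down, hypothetical version of the schema in the question (no merchants table), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY, deal_text TEXT);
    CREATE TABLE tbl_deal_votes (id INTEGER PRIMARY KEY, deal_id INTEGER, vote INTEGER);
    INSERT INTO deals VALUES (1, 'first deal'), (2, 'second deal'), (3, 'third deal');
    -- Deal 1: two upvotes and one downvote; deal 2: one downvote; deal 3: no votes.
    INSERT INTO tbl_deal_votes (deal_id, vote) VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

rows = conn.execute("""
    SELECT deals.id,
           COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
           COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
    FROM deals
    LEFT JOIN (
        SELECT deal_id, COUNT(*) AS upvotes_count
        FROM tbl_deal_votes WHERE vote = 1 GROUP BY deal_id
    ) tbl_upvotes ON tbl_upvotes.deal_id = deals.id
    LEFT JOIN (
        SELECT deal_id, COUNT(*) AS downvotes_count
        FROM tbl_deal_votes WHERE vote = 0 GROUP BY deal_id
    ) tbl_downvotes ON tbl_downvotes.deal_id = deals.id
    ORDER BY deals.id
""").fetchall()

for deal_id, upvotes, downvotes in rows:
    print(deal_id, upvotes, downvotes)
```

The deal without any votes still appears, with both counts defaulted to 0 by COALESCE.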

store procedure nulls or zero value in column

I have a stored procedure that is meant for updating a table. When I execute it, it returns NULLs or zero values for some columns. This is the logic used:
IF OBJECT_ID('tempdb..#exxPresessions_john') IS NOT NULL
DROP TABLE #exxPresessions_john
SELECT c.claim_id,
c.completed_date,
wp.createdon,
COUNT(DISTINCT wp.WebSessionId) AS websessions
INTO #exxPresessions_john
FROM dbo.web_PageviewsID wp WITH (NOLOCK)
JOIN #CliamID_john c WITH (NOLOCK)
ON c.claim_id = wp.claimid
WHERE ClaimType IS NOT NULL
AND c.completed_date > wp.createdon
GROUP BY
claim_id,
completed_date,
createdon
ORDER BY claim_id;
CREATE INDEX idx_index2 ON #WebPresessions_nosa (claim_id);
This is the condition:
completed_date > created_date
It returns NULL because completed_date is NULL or zero.
I tried this, but it did not work:
AND ISNULL(c.completed_date, 0) > wp.createdon

You need to add one more condition like this:
WHERE ClaimType IS NOT NULL
AND c.completed_date is not null --Removes the null completed_date
AND c.completed_date > wp.createdon
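How a NULL completed_date behaves in such a filter can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the claims table with its three rows is hypothetical; dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (claim_id INTEGER, completed_date TEXT, createdon TEXT);
    INSERT INTO claims VALUES
        (1, '2021-03-10', '2021-03-01'),
        (2, NULL,         '2021-03-01'),
        (3, '2021-02-01', '2021-03-01');
""")

# Comparing NULL with anything yields NULL, which a WHERE clause treats as not true.
null_comparison = conn.execute("SELECT NULL > '2021-03-01'").fetchone()[0]

# Filter with the explicit NULL check from the answer.
with_check = [row[0] for row in conn.execute("""
    SELECT claim_id FROM claims
    WHERE completed_date IS NOT NULL
      AND completed_date > createdon
    ORDER BY claim_id
""")]

# Same filter without the explicit check, for comparison.
without_check = [row[0] for row in conn.execute("""
    SELECT claim_id FROM claims
    WHERE completed_date > createdon
    ORDER BY claim_id
""")]

print(null_comparison, with_check, without_check)
```

In this sketch both filters keep the same rows, because a comparison against NULL is never true; the explicit IS NOT NULL check mainly makes the intent visible to the reader.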
