I am not sure how to query my fact tables (COVID and vaccinations). I populated the dimensions with dummy data; am I supposed to leave the fact tables empty? As far as I know, they would get populated when I write the queries.
I am not sure how to query the tables. I have tried different things, but I get an empty result.
Below is a link to the schema.
I want to find out the "TotalDeathsUK" (fact table COVID) for the last year caused by each "Strain" (my Strain table has 3 strains in total).
You can use MERGE to populate your fact table COVIDFact:
MERGE INTO factcovid
USING (
    SELECT centerid,
           dateid,
           patientid,
           strainid
    FROM   yourstagingfacttable
) AS f
ON factcovid.centerid = f.centerid AND factcovid.dateid = f.dateid -- ... the join columns
-- no WHEN MATCHED clause is needed: matched rows are simply left alone
WHEN NOT MATCHED THEN
INSERT VALUES
(
    f.centerid,
    f.dateid,
    f.patientid,
    f.strainid
);
And for VaccinationsFact:
MERGE INTO vaccinations
USING (
    SELECT centerid,
           dateid,
           patientid,
           vaccineid
    FROM   yourstagingfacttable
) AS f
ON vaccinations.centerid = f.centerid -- join condition(s)
-- again, no WHEN MATCHED clause: matched rows are left alone
WHEN NOT MATCHED THEN
INSERT VALUES
(
    f.centerid,
    f.dateid,
    f.patientid,
    f.vaccineid
);
For the TotalDeathsUK measure:
SELECT S.[Name] AS Strain, COUNT(CF.PatientID) AS [Count of Deaths]
FROM CovidFact AS CF
LEFT JOIN Strain AS S ON S.StrainID = CF.StrainID
LEFT JOIN Time AS T ON CF.DateID = T.DateID
LEFT JOIN TreatmentCenter AS TR ON TR.CenterID = CF.CenterID
LEFT JOIN City AS C ON C.CityID = TR.CityID
WHERE C.Country LIKE 'UK' AND T.Year = 2020
AND Result LIKE 'Death' -- you should add a Result column to record whether the patient survived or died
GROUP BY S.[Name]
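If CovidFact doesn't have that Result column yet, here is a minimal sketch of adding it (assuming SQL Server syntax to match the query above; the column type and example values are my own assumptions):
ALTER TABLE CovidFact ADD Result VARCHAR(20); -- e.g. 'Death' or 'Survived', populated when each fact row is loaded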
I'm working with very large tables in Hive so I'd like to avoid having to create a whole new table when joining a single column from Table 2 to Table 1.
My first pass using the INSERT and UPDATE statements with the following test data didn't work.
Is there a way to do this or is it simpler to just create Table 3 by joining Table 1 to Table 2 and then dropping Table 1?
DROP TABLE IF EXISTS table_1;
CREATE TABLE table_1 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64));
INSERT INTO TABLE table_1
VALUES ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');
DROP TABLE IF EXISTS table_2;
CREATE TABLE table_2 (id VARCHAR(64), prodt_cd VARCHAR(64));
INSERT INTO TABLE table_2
VALUES ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');
-- attempt to add a prodt_cd column to table_1 and populate it from table_2 (neither statement is valid in Hive):
INSERT INTO TABLE table_1 prodt_cd VARCHAR(64);
UPDATE table_1 t1 SET t1.prodt_cd = t2.prodt_cd
INNER JOIN table_2 t2
ON t1.id = t2.id;
After some research and help from Mike67, I found a solution.
It appears Hive (at least the version I'm using) does not support updating a column via a join or MERGE statements, but a simple alternative is to create an empty table and then populate it with the fields from a join:
DROP TABLE IF EXISTS table_3;
CREATE TABLE table_3 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64), prodt_cd VARCHAR(64)); -- table_1's columns plus the new prodt_cd
INSERT INTO TABLE table_3
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b
ON a.id = b.id;
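Alternatively, here is a sketch of the same idea using CREATE TABLE ... AS SELECT (CTAS), which Hive supports for plain managed tables, so the extra column doesn't have to be declared by hand:
DROP TABLE IF EXISTS table_3;
CREATE TABLE table_3 AS
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b
ON a.id = b.id;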
I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How can I achieve that?
You can do it quite easily with VALUES (), (), introduced in PostgreSQL 8.2.
The syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery; we can use the set-returning function like a table directly (a.k.a. a "table function").
A string literal to pass in the array, instead of an ARRAY constructor, may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Just because it is so difficult to find and deserves to be spread: in MySQL this can be done much more simply, but I don't know if it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
This removes the need to manually assign/maintain a position for each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
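If you'd rather not follow the link, a definition along these lines does the job (a sketch; the version on the wiki may differ slightly):
CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS
$$
SELECT i
FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS g(i)  -- walk the array positions
WHERE $1[i] = $2
LIMIT 1;
$$ LANGUAGE sql IMMUTABLE;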
In PostgreSQL:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
To do this, I think you should probably have an additional "ORDER" table that defines the mapping of IDs to order (effectively doing what your response to your own question does), which you can then join in as an additional column on your SELECT and sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
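As a minimal sketch of that idea (comment_order and its column names are hypothetical, introduced here only for illustration):
CREATE TABLE comment_order (comment_id int PRIMARY KEY, sort_order int);
INSERT INTO comment_order VALUES (1, 1), (3, 2), (2, 3), (4, 4);

SELECT c.*
FROM comments c
JOIN comment_order o ON o.comment_id = c.id
ORDER BY o.sort_order;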
sans SEQUENCE, works only on 8.4:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id" || ',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id
A slight improvement, I think, over the version that uses a sequence (declared as SETOF so it returns one row per array element):
CREATE OR REPLACE FUNCTION in_sort(anyarray, OUT id anyelement, OUT ordinal int)
RETURNS SETOF record
LANGUAGE sql AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1), array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a column called ids; ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.
Let's get a visual impression of what has already been said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit #user80168
I agree with all the other posters who say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. ORDER BY comments.sort DESC. If you want to sort these in a different order every time, then... SQL won't be for you in this case.
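A rough sketch of that suggestion (sort is a column name introduced here for illustration):
ALTER TABLE comments ADD COLUMN sort integer;
-- populate comments.sort with whatever ranking you want, then:
SELECT * FROM comments ORDER BY comments.sort DESC;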
In Power Query (the version included with Excel 2016, PC), is it possible to refer to a computed column of a related table?
Say I have an SQLite database as follows:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE products (
iddb INTEGER NOT NULL,
px FLOAT,
PRIMARY KEY (iddb)
);
INSERT INTO "products" VALUES(0,0.0);
INSERT INTO "products" VALUES(1,1.1);
INSERT INTO "products" VALUES(2,2.2);
INSERT INTO "products" VALUES(3,3.3);
INSERT INTO "products" VALUES(4,4.4);
CREATE TABLE sales (
iddb INTEGER NOT NULL,
quantity INTEGER,
product_iddb INTEGER,
PRIMARY KEY (iddb),
FOREIGN KEY(product_iddb) REFERENCES products (iddb)
);
INSERT INTO "sales" VALUES(0,0,0);
INSERT INTO "sales" VALUES(1,1,1);
INSERT INTO "sales" VALUES(2,2,2);
INSERT INTO "sales" VALUES(3,3,3);
INSERT INTO "sales" VALUES(4,4,4);
INSERT INTO "sales" VALUES(5,5,0);
INSERT INTO "sales" VALUES(6,6,1);
INSERT INTO "sales" VALUES(7,7,2);
INSERT INTO "sales" VALUES(8,8,3);
INSERT INTO "sales" VALUES(9,9,4);
COMMIT;
Basically we have products (iddb, px) and sales of those products (iddb, quantity, product_iddb).
I load this data into Power Query by:
A. creating an ODBC data source using the SQLite3 driver: testDSN
B. in Excel: Data / New Query, feeding it this connection string: Provider=MSDASQL.1;Persist Security Info=False;DSN=testDSN;
Now in Power Query I add a computed column, say px10 = px * 10, to the products table.
In the sales table, I can expand the products table into product.px, but not product.px10. Shouldn't it be doable? (In this simplified example I could expand product.px first and then create the px10 column in the sales table, but then any new table needing px10 from products would require me to repeat the work...)
Any input is appreciated.
I would add a Merge step from the sales query to connect it to the product query (which will include your calculated column). Then I would expand the Table returned to get your px10 column.
This is instead of expanding the Value column representing the product SQL table, which gets generated using the SQL foreign key.
You will have to come back and add any further columns added to the product query to the expansion list, but at least the column definition is only in one place.
In functional programming you don't modify existing values, only create new values. When you add the new column to product it creates a new table, and doesn't modify the product table that shows up in related tables. Adding a new column over product can't show up in Odbc's tables unless you apply that transformation to all related tables.
What you could do is generalize the "add a computed column" into a function that takes a table or record and adds the extra field. Then just apply that over each table in your database.
Here's an example against Northwind in SQL Server
let
Source = Sql.Database(".", "Northwind_Copy"),
AddAColumn = (data) => if data is table then Table.AddColumn(data, "UnitPrice10x", each [UnitPrice] * 10)
else if data is record then Record.AddField(data, "UnitPrice10x", data[UnitPrice] * 10)
else data,
TransformedSource = Table.TransformColumns(Source, {"Data", (data) => if data is table then Table.TransformColumns(data, {"Products", AddAColumn}, null, MissingField.Ignore) else data}),
OrderDetails = TransformedSource{[Schema="dbo",Item="Order Details"]}[Data],
#"Expanded Products" = Table.ExpandRecordColumn(OrderDetails, "Products", {"UnitPrice", "UnitPrice10x"}, {"Products.UnitPrice", "Products.UnitPrice10x"})
in
#"Expanded Products"
The database is Teradata.
I have two tables which I am trying to join. The table structures are below. When I join these tables I expect to get two rows as output, but I am getting 4 rows. What is the reason for this behavior? A join based on three keys should uniquely identify a row, but I am still getting 4 rows as output. Any help is appreciated.
TableA
Weekkey|segment|type|users
201501|1|A|100
201501|1|B|100
TableB
Weekkey|segment|type|revenue
201501|1|A|200
201501|1|B|200
When I join these two tables using the following query, I get the following result:
select a.*, b.revenue
from tablea a left join tableb b on a.weekkey=b.weekkey
and a.segment=b.segment
and a.type=b.type
Weekkey|segment|type|revenue|users
201501|1|A|200|100
201501|1|B|200|100
201501|1|A|200|100
201501|1|B|200|100
Using SQL Server, here are DDL and sample data along with the query you posted. The output you say you are getting doesn't happen here.
create table #tablea
(
Weekkey int
, segment int
, type char(1)
, users int
)
insert #tablea
select 201501, 1, 'A', 100 union all
select 201501, 1, 'B', 100
create table #TableB
(
Weekkey int
, segment int
, type char(1)
, revenue int
)
insert #TableB
select 201501, 1, 'A', 200 union all
select 201501, 1, 'B', 200
select a.*
, b.revenue
from #tablea a
left join #tableb b on a.weekkey = b.weekkey
and a.segment = b.segment
and a.type = b.type
drop table #tablea
drop table #TableB
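Since the posted sample joins cleanly here, it may be worth checking whether the three-column key is really unique in your actual tables; a quick check, sketched against the table name from the question:
SELECT weekkey, segment, type, COUNT(*) AS cnt
FROM tableb
GROUP BY weekkey, segment, type
HAVING COUNT(*) > 1;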