Informix one to many format issue - informix

Trying to fix my Informix query results format from a one to many relationship. My current query is using a JOIN but is creating a new line for every time there is a match to the JOIN ON condition. I should add the below is only an example, the real data is thousands of entries with about a 100 unique "category" entries so I cant hard code WHERE statements, it needs to read each entry and add if a match. I tried a GROUP_CONCAT however is just returned an error, guess its not a informix function, I also tried reading this thread but have yet been unable to get working. Show a one to many relationship as 2 columns - 1 unique row (ID & comma separated list)
Any help will be appreciated.
IBM/Informix-Connect Version 3.70.UC4
IBM/Informix LIBGLS LIBRARY Version 5.00.UC5
IBM Informix Dynamic Server Version 11.70.FC8W1
Tables
movie
name rating movie_id
rio g 1
horton g 2
blade r 3
lotr_1 pg13 4
lotr_2 pg13 5
paul_blart pg 6
category
cat_name id
kids 1
comedy 2
action 3
fantasy 4
category_member
movie_name cat_name catmem_id
lotr_1 action 1
lotr_1 fantasy 2
rio kids 3
rio comedy 4
When I use
#!/bin/bash
echo "SET isolation dirty read;
UNLOAD to /export/home/movie/movieDetail.unl DELIMITER ','
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category b ON b.movie_name = a.name
;" | dbaccess thedb;
What I get is
rio,g,kids
rio,g,comedy
lotr_1,pg13,action
lotr_1,pg13,fantasy
What I would like is
rio,g,kids,comedy
lotr_1,pg13,action,fantasy

Install the GROUP_CONCAT user-defined aggregate
You must install the GROUP_CONCAT user-defined aggregate from SO 715350 (referenced in your question) into your database. The GROUP_CONCAT aggregate is not defined by Informix, but can be added if you use the SQL from that question. One difference between that and a normal built-in function is that you need to install the aggregate in each database in the server where you need to use it. There might be a way to do a 'global install' (for all databases in a given server), but I've forgotten (or, more accurately, never learned) how to do it.
Writing your queries
With the sample database listed at the bottom:
The query in the question does not run:
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category b ON b.movie_name = a.name;
SQL -217: Column (movie_name) not found in any table in the query (or SLV is undefined).
This can be fixed by changing category to category_member. This produces:
SELECT a.name, a.rating, b.cat_name
FROM movie a
LEFT JOIN category_member b ON b.movie_name = a.name;
rio g kids
rio g comedy
horton g
blade r
lotr_1 pg13 action
lotr_1 pg13 fantasy
lotr_2 pg13
paul_blart pg
The LEFT JOIN appears to be unwanted. And using GROUP_CONCAT produces approximately the desired answer:
SELECT a.name, a.rating, GROUP_CONCAT(b.cat_name)
FROM movie a
JOIN category_member b ON b.movie_name = a.name
GROUP BY a.name, a.rating;
rio g kids,comedy
lotr_1 pg13 action,fantasy
If you specify the delimiter as ,, the commas in the data from the GROUP_CONCAT operator will be escaped to avoid ambiguity:
SELECT a.NAME, a.rating, GROUP_CONCAT(b.cat_name)
FROM movie a
JOIN category_member b ON b.movie_name = a.NAME
GROUP BY a.NAME, a.rating;
rio,g,kids\,comedy
lotr_1,pg13,action\,fantasy
Within standard Informix utilities, there isn't a way to avoid that; they don't leave the selected/unloaded data in an ambiguous format.
I'm not convinced that the database schema is very well organized. The Movie table is OK; the Category table is OK; but the Category_Member table would be more orthodox if it used the schema:
DROP TABLE IF EXISTS category_member;
CREATE TABLE category_member
(
movie_id INTEGER NOT NULL REFERENCES Movie(Movie_id),
category_id INTEGER NOT NULL REFERENCES Category(Id),
PRIMARY KEY(movie_id, category_id)
);
INSERT INTO category_member VALUES(4, 3);
INSERT INTO category_member VALUES(4, 4);
INSERT INTO category_member VALUES(1, 1);
INSERT INTO category_member VALUES(1, 2);
-- Use GROUP_CONCAT
SELECT a.NAME, a.rating, GROUP_CONCAT(c.cat_name)
FROM movie a
JOIN category_member b ON b.movie_id = a.movie_id
JOIN category c ON b.category_id = c.id
GROUP BY a.NAME, a.rating;
The output from this query is the same as from the previous one, but the joining is more orthodox.
Sample database
DROP TABLE IF EXISTS movie;
CREATE TABLE movie
(
name VARCHAR(20) NOT NULL UNIQUE,
rating CHAR(4) NOT NULL,
movie_id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO movie VALUES("rio", "g", 1);
INSERT INTO movie VALUES("horton", "g", 2);
INSERT INTO movie VALUES("blade", "r", 3);
INSERT INTO movie VALUES("lotr_1", "pg13", 4);
INSERT INTO movie VALUES("lotr_2", "pg13", 5);
INSERT INTO movie VALUES("paul_blart", "pg", 6);
DROP TABLE IF EXISTS category;
CREATE TABLE category
(
cat_name VARCHAR(10) NOT NULL UNIQUE,
id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO category VALUES("kids", 1);
INSERT INTO category VALUES("comedy", 2);
INSERT INTO category VALUES("action", 3);
INSERT INTO category VALUES("fantasy", 4);
DROP TABLE IF EXISTS category_member;
CREATE TABLE category_member
(
movie_name VARCHAR(20) NOT NULL,
cat_name VARCHAR(10) NOT NULL,
catmem_id SERIAL NOT NULL PRIMARY KEY
);
INSERT INTO category_member VALUES("lotr_1", "action", 1);
INSERT INTO category_member VALUES("lotr_1", "fantasy", 2);
INSERT INTO category_member VALUES("rio", "kids", 3);
INSERT INTO category_member VALUES("rio", "comedy", 4);

Related

Populating Fact Tables(Data Warehouse) and Querying

I am not sure how to query my fact tables(covid and vaccinations), I populated the dimensions with dummy data, I am supposed to leave the fact tables empty? As far as I know, they would get populated when I write the queries.
I am not sure how to query the tables I have tried different things, but I get an empty result.
Below is a link to the schema.
I want to find out the "TotalDeathsUK"(fact table COVID) for the last year caused by each "Strain"(my strain table has 3 strain in total.
You can use MERGE to poulate your fact table COVIDFact :
MERGE
INTO factcovid
using (
SELECT centerid,
dateid,
patientid,
strainid
FROM yourstagingfacttable ) AS f
ON factcovid.centerid = f.centerid AND factcovid.dateid=f.dateid... //the join columns
WHEN matched THEN
do nothing WHEN NOT matched THEN
INSERT VALUES
(
f.centerid,
f.dateid,
f.patientid,
f.strainid
)
And for VaccinationsFact :
MERGE
INTO vaccinations
using (
SELECT centerid,
dateid,
patientid,
vaccineid
FROM yourstagingfacttable ) AS f
ON factcovid.centerid = f.centerid //join condition(s)
WHEN matched THEN
do nothing WHEN NOT matched THEN
INSERT VALUES
(
f.centerid,
f.dateid,
f.patientid,
f.vaccineid
)
For the TotalDeathUK measure :
SELECT S.[Name] AS Strain, COUNT(CF.PatientID) AS [Count of Deaths] FROM CovidFact AS CF
LEFT JOIN Strain AS S ON S.StrainID=CF.StrainID
LEFT JOIN Time AS T ON CF.DateID=T.DateID
LEFT JOIN TreatmentCenter AS TR ON TR.CenterID=CF.CenterID
LEFT JOIN City AS C ON C.CityID = TR.CityID
WHERE C.Country LIKE 'UK' AND T.Year=2020
AND Result LIKE 'Death' // you should add a Result column to check if the Patient survived or died
GROUP BY S.[Name]

Is it possible to join a column from Table B to Table A without creating a new table C in Hive SQL?

I'm working with very large tables in Hive so I'd like to avoid having to create a whole new table when joining a single column from Table 2 to Table 1.
My first pass using the INSERT and UPDATE statements with the following test data didn't work.
Is there a way to do this or is it simpler to just create Table 3 by joining Table 1 to Table 2 and then dropping Table 1?
DROP TABLE IF EXISTS table_1;
CREATE TABLE table_1 (id VARCHAR(64), cost INT, diag_cd VARCHAR(64));
INSERT INTO TABLE table_1
VALUES ('A0001', 1000, 'A1'), ('A0001', 2000, 'B1'), ('A0001', 3000, 'B1'),
('B0001', 5000, 'A1'), ('B0001', 10000, 'B1'), ('B0001', 15000, 'C1'),
('C0001', 11000, 'B1'), ('C0001', 14000, 'C1'), ('C0001', 20000, 'C1');
DROP TABLE IF EXISTS table_2;
CREATE TABLE table_2 (id VARCHAR(64), prodt_cd VARCHAR(64));
INSERT INTO TABLE table_2
VALUES ('A0001', 'OAP'), ('B0001', 'OAPIN'), ('C0001', 'MOAPIN');
INSERT INTO TABLE table_1 prodt_cd VARCHAR(64);
UPDATE table_1 t1 SET t1.prodt_cd = t2.prodt_cd
INNER JOIN table_2 t2
ON t1.id = t2.id;
After some research and help from Mike67 I found a solution.
It appears Hive does not support COLUMN UPDATE or MERGE statements, but a simple alternative is to create an empty table and then populate it with fields from a join:
DROP TABLE IF EXISTS table_3;
CREATE TABLE table_3 LIKE table_1;
INSERT INTO TABLE table_3
SELECT a.*, b.prodt_cd
FROM table_1 AS a
LEFT OUTER JOIN table_2 AS b
ON a.id = b.id;

Find records with ID in array of IDS and keep the order of records matching that of IDs [duplicate]

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order which in my happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with (introduced in PostgreSQL 8.2) VALUES (), ().
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced with in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Just because it is so difficult to find and it has to be spread: in mySQL this can be done much simpler, but I don't know if it works in other SQL.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
sans SEQUENCE, works only on 8.4:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(','+"comments"."id"+',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built-in in 8.3, but you can create one yourself(the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
that function can work in any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
here, [bbs] is the main table that has a field called ids,
and, ids is the array that store the comments.id .
passed in postgresql 9.6
Lets get a visual impression about what was already said. For example you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an interger and order the list numerical:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit #user80168
I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments then add another integer column to one of your tables to hold your sort criteria and sort by that value. eg "ORDER BY comments.sort DESC " If you want to sort these in a different order every time then... SQL won't be for you in this case.

Use computed columns in related table in power query on SQLITE via odbc

In power query ( version included with exel 2016, PC ), is it possible to refer to a computed column of a related table?
Say I have an sqlite database as follow
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE products (
iddb INTEGER NOT NULL,
px FLOAT,
PRIMARY KEY (iddb)
);
INSERT INTO "products" VALUES(0,0.0);
INSERT INTO "products" VALUES(1,1.1);
INSERT INTO "products" VALUES(2,2.2);
INSERT INTO "products" VALUES(3,3.3);
INSERT INTO "products" VALUES(4,4.4);
CREATE TABLE sales (
iddb INTEGER NOT NULL,
quantity INTEGER,
product_iddb INTEGER,
PRIMARY KEY (iddb),
FOREIGN KEY(product_iddb) REFERENCES products (iddb)
);
INSERT INTO "sales" VALUES(0,0,0);
INSERT INTO "sales" VALUES(1,1,1);
INSERT INTO "sales" VALUES(2,2,2);
INSERT INTO "sales" VALUES(3,3,3);
INSERT INTO "sales" VALUES(4,4,4);
INSERT INTO "sales" VALUES(5,5,0);
INSERT INTO "sales" VALUES(6,6,1);
INSERT INTO "sales" VALUES(7,7,2);
INSERT INTO "sales" VALUES(8,8,3);
INSERT INTO "sales" VALUES(9,9,4);
COMMIT;
basically we have products ( iddb, px ) and sales of those products ( iddb, quantity, product_iddb )
I load this data in power query by:
A. creating an ODBC data source using SQLITE3 driver : testDSN
B. in excel : data / new query , feeding this connection string Provider=MSDASQL.1;Persist Security Info=False;DSN=testDSN;
Now in power query I add a computed column, say px10 = px * 10 to the product table.
In the sales table, I can expand the product table into product.px, but not product.px10 . Shouldn't it be doable? ( in this simplified example I could expand first product.px and then create the px10 column in the sales table, but then any new table needinng px10 from product would require me to repeat the work... )
Any inputs appreciated.
I would add a Merge step from the sales query to connect it to the product query (which will include your calculated column). Then I would expand the Table returned to get your px10 column.
This is instead of expanding the Value column representing the product SQL table, which gets generated using the SQL foreign key.
You will have to come back and add any further columns added to the product query to the expansion list, but at least the column definition is only in one place.
In functional programming you don't modify existing values, only create new values. When you add the new column to product it creates a new table, and doesn't modify the product table that shows up in related tables. Adding a new column over product can't show up in Odbc's tables unless you apply that transformation to all related tables.
What you could do is generalize the "add a computed column" into a function that takes a table or record and adds the extra field. Then just apply that over each table in your database.
Here's an example against Northwind in SQL Server
let
Source = Sql.Database(".", "Northwind_Copy"),
AddAColumn = (data) => if data is table then Table.AddColumn(data, "UnitPrice10x", each [UnitPrice] * 10)
else if data is record then Record.AddField(data, "UnitPrice10x", data[UnitPrice] * 10)
else data,
TransformedSource = Table.TransformColumns(Source, {"Data", (data) => if data is table then Table.TransformColumns(data, {"Products", AddAColumn}, null, MissingField.Ignore) else data}),
OrderDetails = TransformedSource{[Schema="dbo",Item="Order Details"]}[Data],
#"Expanded Products" = Table.ExpandRecordColumn(OrderDetails, "Products", {"UnitPrice", "UnitPrice10x"}, {"Products.UnitPrice", "Products.UnitPrice10x"})
in
#"Expanded Products"

SQL Join based on three keys

Database is Teradata
I have two table which I am trying to join. Following are the table structures. When I join these table I expect to get two rows as output but getting 4 rows.what is reason for this behavior. Join based on three keys should uniquely identify a row but still getting 4 rows as output. Any help is appreciated.
TableA
Weekkey|segment|type|users
201501|1|A|100
201501|1|B|100
TableB
Weekkey|segment|type|revenue
201501|1|A|200
201501|1|B|200
when I join these two table using the following query i get the following result
select a.* ,b.user
from tablea a left join tableb b on a.weekkey=b.weekkey
and a.segment=b.segment
and a.type=b.type
Weekkey|segment|type|revenue|users
201501|1|A|200|100
201501|1|B|200|100
201501|1|A|200|100
201501|1|B|200|100
Using sql server, here is ddl and sample data along with the query you posted. The output you state you are getting doesn't happen here.
create table #tablea
(
Weekkey int
, segment int
, type char(1)
, users int
)
insert #tablea
select 201501, 1, 'A', 100 union all
select 201501, 1, 'B', 100
create table #TableB
(
Weekkey int
, segment int
, type char(1)
, revenue int
)
insert #TableB
select 201501, 1, 'A', 200 union all
select 201501, 1, 'B', 200
select a.*
, b.revenue
from #tablea a
left join #tableb b on a.weekkey = b.weekkey
and a.segment = b.segment
and a.type = b.type
drop table #tablea
drop table #TableB

Resources