View with LEFT JOIN is Updatable? - join

According to MariaDB documentation, a VIEW can't have a outer join to be updatable:
A view cannot be used for updating if it uses any of the following:
... an outer join ...
As far as i know, "outer join" include "left join" and "right join", right?
But when i test it (in mariaDB v10.1.25), it is updatable! What am I missing here?
DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1(id INT, a TEXT);
INSERT INTO t1(id,a) VALUES (1,'a'),(2,'b'),(3,'c');
CREATE TABLE t2(id INT, b TEXT);
INSERT INTO t2(id,b) VALUES (1,'+'),(1,'-'),(2,'*');
CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 LEFT JOIN t2 USING(id);
#UPDATE v1 SET a='x' WHERE id=3; # worked
#UPDATE v1 SET a='y' WHERE b IS NULL; # worked
#UPDATE v1 SET b='y' WHERE b IS NULL; # worked! even so, it does not make sense and does not update anything
SELECT * FROM t1;
SELECT * FROM t2;

Some entries in the ChangeLogs. (This is too long for a Comment; perhaps the Answer is buried in one of the links.)
----- 2015-08-03 5.7.8 Release Candidate -- Bugs Fixed -- -----
An assertion could be raised if a multiple-table UPDATE ( /doc/refman/5.7/en/update.html ) of a view, where the same column was used in the SET and JOIN clauses, was used as a prepared statement. (Bug #76962, Bug #21045724)
http://bugs.mysql.com/bug.php?id=76962
----- 2015-03-09 5.7.6 Milestone 16 -- Optimizer Notes -- -----
To handle a derived table (subquery in the FROM clause) or view reference, ...
mysql> SET optimizer_switch = 'derived_merge=off'; ...
http://bugs.mysql.com/bug.php?id=59203
----- 2015-06-04 MariaDB 10.1.5 & 2015-05-07 MariaDB 10.0.18 & 2015-05-01 MariaDB 5.5.43 -- -- -----
MDEV-7613 : MariaDB 5.5.40 server crash on update table left join with a view
http://localhost/kb/en/mariadb-5540-release-notes/
----- 2009-12-07 5.5.0 Milestone 2 & 2009-10-06 5.1.40 -- Bugs Fixed -- -----
A multiple-table UPDATE ( http://dev.mysql.com/doc/refman/5.5/en/update.html ) involving a natural join and a mergeable view raised an assertion. (Bug #47150)
http://bugs.mysql.com/bug.php?id=47150
Note: 5.5 predates MariaDB 10.1, and the fix should be included; 5.7 comes after, so it won't be in 10.1.

Related

Why does Hive warn that this subquery would cause a Cartesian product?

According to Hive's documentation it supports NOT IN subqueries in a WHERE clause, provided that the subquery is an uncorrelated subquery (does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One work around is to use not exists.
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
FROM vegetables v
WHERE v.name = f.name)
or a left join
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
You got cartesian join because this is what Hive does in this case. vegetables table is very small (just one row) and it is being broadcasted to perform the cross (most probably map-join, check the plan) join. Hive does cross (map) join first and then applies filter. Explicit left join syntax with filter as #VamsiPrabhala said will force to perform left join, but in this case it works the same, because the table is very small and CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see what is exactly happening.

Suspected alias issue in bigquery join

I am relatively new to bigquery and think I have an aliasing problem but can't work out what it is. Essentially, I have two tables and while the first table has the majority of the required information the second table has a date of birth that I need to join. I have written the below query and the two initial SELECT statements work in isolation and appear to return the expected values. However, when attempting to join the two tables I get an error stating:
Unrecognized name: t1_teams at [10:60]
WITH table_1 AS (SELECT competition_name, stat_season_name,
matchdata_Date, t1_teams.name, t1_players.Position, CAST(REGEXP_REPLACE(t1_players.uID, r'[a-zA-Z]', '') AS NUMERIC) AS Player_ID1, t1_players.First, t1_players.Last
FROM `prod.feed1`,
UNNEST(teams) AS t1_teams, UNNEST(t1_teams.Players) as t1_players),
table_2 AS (SELECT t2_players.uID AS Player_ID2, t2_players.stat_birth_date
FROM `prod.feed2`,
UNNEST(players) AS t2_players)
SELECT competition_name, stat_season_name, matchdata_Date, t1_teams.name, t1_players.Position, t1_players.uID, t1_players.First, t1_players.Last, t2_players.stat_birth_date
FROM table_1
LEFT JOIN table_2
ON Player_ID1 = Player_ID2
WHERE competition_name = "EPL"
AND stat_season_name = "Season 2018/2019"
Any help in steering me in the right direction would be greatly appreciated as my reading of the bigquery documentation and other searches have drawn a blank.
The problem is here:
WITH table_1 AS (
SELECT
competition_name,
stat_season_name,
matchdata_Date,
-- this line
t1_teams.name,
...
You're selecting t1_teams.name, so you end up with just name an an output column from the select list. If you want to refer to t1_teams later, then select that instead:
WITH table_1 AS (
SELECT
competition_name,
stat_season_name,
matchdata_Date,
-- this line
t1_teams,
...

5702 : The SQL Server is terminating this process because the database view

after modifying the view as below, if use the view in the select statements by the application(as below query), then we are getting
DB error 5702: The SQL Server is terminating this process.
Here's the query
select *
from view1 v1,
view2 v2
where v1.column in(select M_FLOW_ID from VW_NETPAY_UNDO)
is the issue because more number of view in sql statements ?
Most provably is that your query is runing for a long long time and sybase decided to stop the process, root of the problem ist that you have a cross join between the 2 Views.
You should match the 2 views,(on View1.columnA= View2.columnA)
select *
from view1 v1,
view2 v2 on V1.id = V2.id -- id like common value column
where v1.column in(select colum1 from table1)
.If you don´t do that you have a cartesian products between the 2 Views and then the resuld wouldnt arrise.
Best Regards,
Enrique

Add uniqueness constraint to Postgres text array contents [duplicate]

I'm trying to come up with a PostgreSQL schema for host data that's currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.
One thing I'd like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can't be assigned. This would be easy if hosts could only have one name, but since they can have more than one it's more complicated.
I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I'd like to avoid having everybody need to do joins for even the simplest query:
select hostnames.name,hosts.*
from hostnames,hosts
where hostnames.name = 'foobar'
and hostnames.host_id = hosts.id;
I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:
select * from hosts where names #> '{foobar}';
When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?
If not, does anyone know of another data-modeling approach that would make more sense?
The righteous path
You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.
Table could look like this:
CREATE TABLE hostname (
hostname_id serial PRIMARY KEY
, host_id int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname text UNIQUE
);
The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
Create a view like this:
CREATE VIEW v_host AS
SELECT h.*
, array_agg(hn.hostname) AS hostnames
-- , string_agg(hn.hostname, ', ') AS hostnames -- text instead of array
FROM host h
JOIN hostname hn USING (host_id)
GROUP BY h.host_id; -- works in v9.1+
Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
Queries can use the view like a table. Searching for a hostname will be much faster this way:
SELECT *
FROM host h
JOIN hostname hn USING (host_id)
WHERE hn.hostname = 'foobar';
Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.
The dark side (what you actually asked)
If I can't convince you of the righteous path, here is some assistance for the dark side:
Here is a demo how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY); -- pk enforces uniqueness
Trigger function:
CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
-- split UPDATE into DELETE & INSERT
IF TG_OP = 'UPDATE' THEN
IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN -- keep going
ELSE
RETURN NEW; -- exit, nothing to do
END IF;
END IF;
IF TG_OP IN ('DELETE', 'UPDATE') THEN
DELETE FROM hostname h
USING unnest(OLD.hostnames) d(x)
WHERE h.hostname = d.x;
IF TG_OP = 'DELETE' THEN RETURN OLD; -- exit, we are done
END IF;
END IF;
-- control only reaches here for INSERT or UPDATE (with actual changes)
INSERT INTO hostname(hostname)
SELECT h
FROM unnest(NEW.hostnames) h;
RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();
SQL Fiddle with test run.
Use a GIN index on the array column host.hostnames and array operators to work with it:
Why isn't my PostgreSQL array index getting used (Rails 4)?
Check if any of a given array of values are present in a Postgres array
In case anyone still needs what was in the original question:
CREATE TABLE testtable(
id serial PRIMARY KEY,
refs integer[],
EXCLUDE USING gist( refs WITH && )
);
INSERT INTO testtable( refs ) VALUES( ARRAY[100,200] );
INSERT INTO testtable( refs ) VALUES( ARRAY[200,300] );
and this would give you:
ERROR: conflicting key value violates exclusion constraint "testtable_refs_excl"
DETAIL: Key (refs)=({200,300}) conflicts with existing key (refs)=({100,200}).
Checked in Postgres 9.5 on Windows.
Note that this would create an index using the operator &&. So when you are working with testtable, it would be times faster to check ARRAY[x] && refs than x = ANY( refs ).
P.S. Generally I agree with the above answer. In 99% cases you'd prefer a normalized schema. Please try to avoid "hacky" stuff in production.

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID(‘database_name')
USERNAME(’user_name')
PASSWORD(’ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.
You will get the same error when you try CREATE TABLE AS SELECT in Teradata. There are three column names, SRC_SYS_NM & SRC_SYS_PROD_ID & SRC_SYS_STORE_ID, which are used multiple times (with different table aliases) within the SELECT.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally the TRIMs in the joins are a very bad idea. You will probably not save that much spool, but force the optimizer to redistribute all spools for join-preparation.

Resources