I'm trying to come up with a PostgreSQL schema for host data that's currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.
One thing I'd like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can't be assigned. This would be easy if hosts could only have one name, but since they can have more than one it's more complicated.
I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I'd like to avoid having everybody need to do joins for even the simplest query:
select hostnames.name, hosts.*
from hostnames, hosts
where hostnames.name = 'foobar'
  and hostnames.host_id = hosts.id;
I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:
select * from hosts where names @> '{foobar}';
When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?
If not, does anyone know of another data-modeling approach that would make more sense?
The righteous path
You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.
Table could look like this:
CREATE TABLE hostname (
hostname_id serial PRIMARY KEY
, host_id int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname text UNIQUE
);
The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
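For reference, a minimal sketch of the host table assumed above (the column list is illustrative; add whatever per-host attributes you need):
CREATE TABLE host (
  host_id serial PRIMARY KEY
  -- ... other per-host attributes
);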
Create a view like this:
CREATE VIEW v_host AS
SELECT h.*
, array_agg(hn.hostname) AS hostnames
-- , string_agg(hn.hostname, ', ') AS hostnames -- text instead of array
FROM host h
JOIN hostname hn USING (host_id)
GROUP BY h.host_id; -- works in v9.1+
Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
Queries can use the view like a table. Searching for a hostname will be much faster this way:
SELECT *
FROM host h
JOIN hostname hn USING (host_id)
WHERE hn.hostname = 'foobar';
Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
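The view itself can also be queried directly like a table. Just note that a predicate on the aggregated array cannot use the base-table indexes, which is why the join above is the faster lookup path. A minimal example:
SELECT *
FROM   v_host
WHERE  hostnames @> '{foobar}';  -- convenient, but not index-supported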
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.
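A minimal sketch of that variant (same names as above; remember that a materialized view must be refreshed to reflect later writes):
CREATE MATERIALIZED VIEW mv_host AS
SELECT h.*, array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;

REFRESH MATERIALIZED VIEW mv_host;  -- after writes to host / hostname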
The dark side (what you actually asked)
If I can't convince you of the righteous path, here is some assistance for the dark side:
Here is a demo of how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY); -- pk enforces uniqueness
Trigger function:
CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- split UPDATE into DELETE & INSERT
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS NOT DISTINCT FROM NEW.hostnames THEN
         RETURN NEW;  -- exit, nothing to do
      END IF;
   END IF;

   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname h
      USING  unnest(OLD.hostnames) d(x)
      WHERE  h.hostname = d.x;

      IF TG_OP = 'DELETE' THEN
         RETURN OLD;  -- exit, we are done
      END IF;
   END IF;

   -- control only reaches here for INSERT or UPDATE (with actual changes)
   INSERT INTO hostname(hostname)
   SELECT h
   FROM   unnest(NEW.hostnames) h;

   RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();
-- in Postgres 10 and older, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION
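A quick test run might look like this (illustrative; the second INSERT reuses 'bar', so the trigger's INSERT into hostname violates its primary key):
INSERT INTO host VALUES ('{foo,bar}');  -- OK
INSERT INTO host VALUES ('{bar,baz}');  -- ERROR: duplicate key value violates unique constraint "hostname_pkey"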
Use a GIN index on the array column host.hostnames and array operators to work with it. See these related answers:
Why isn't my PostgreSQL array index getting used (Rails 4)?
Check if any of a given array of values are present in a Postgres array
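Creating the GIN index is a one-liner (the index name is illustrative):
CREATE INDEX host_hostnames_gin_idx ON host USING gin (hostnames);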
In case anyone still needs what was in the original question:
CREATE TABLE testtable (
    id   serial PRIMARY KEY,
    refs integer[],
    EXCLUDE USING gist (refs WITH &&)
);
INSERT INTO testtable( refs ) VALUES( ARRAY[100,200] );
INSERT INTO testtable( refs ) VALUES( ARRAY[200,300] );
and this would give you:
ERROR: conflicting key value violates exclusion constraint "testtable_refs_excl"
DETAIL: Key (refs)=({200,300}) conflicts with existing key (refs)=({100,200}).
Checked in Postgres 9.5 on Windows.
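One caveat: stock PostgreSQL ships no default GiST operator class for integer[], so this exclusion constraint normally requires the intarray extension (which provides GiST operator classes for integer arrays) to be installed first:
CREATE EXTENSION IF NOT EXISTS intarray;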
Note that this creates an index using the operator &&. So when you are working with testtable, checking ARRAY[x] && refs can be many times faster than x = ANY(refs), because only the former can use that index.
P.S. Generally I agree with the answer above. In 99% of cases you'd prefer a normalized schema. Please try to avoid "hacky" stuff in production.
Related
I've got multiple master tables in the same format with the same variables. I now want to left join another variable, but I can't combine the master tables due to limited storage on my computer. Is there a way I can left join a variable onto multiple master tables within one PROC SQL? Maybe with the help of a macro?
The LEFT JOIN code looks like this for one join, but I'm looking for an alternative to copying and pasting it five times:
PROC SQL;
    CREATE TABLE New AS
    SELECT a.*, b.Value
    FROM Old a LEFT JOIN Additional b
        ON a.ID = b.ID;
QUIT;
You can't do it in one create table statement, as it only creates one table at a time. But you can do a few things, depending on what your actual limiting factor is (you mention a few).
If you simply want to avoid writing the same code five times, but otherwise don't care how it executes, then just write the code in a macro, as you mention.
%macro update_table(old=, new=);
    PROC SQL;
        CREATE TABLE &new. AS
        SELECT a.*, b.Value
        FROM &old. a LEFT JOIN Additional b
            ON a.ID = b.ID;
    QUIT;
%mend update_table;

%update_table(old=old1, new=new1)
%update_table(old=old2, new=new2)
%update_table(old=old3, new=new3)
Of course, if the names of the five tables are in a pattern, you can perhaps automate this further based on that pattern, but you don't give sufficient information to figure that out.
If, on the other hand, you need to do this more efficiently in terms of processing than running the SQL query five times, it can be done a number of ways, depending on the specifics of your additional table and your limitations. It looks to me like you have a good use case for a format lookup here, for example; see Jenine Eason's paper, Proc Format, a Speedy Alternative to Sort/Merge. If you're just merging on the ID, this is very easy.
data for_format;
    set additional;
    start = ID;
    label = value;
    fmtname = 'AdditionalF';  *or '$AdditionalF' if ID is character-valued;
    output;
    if _n_ = 1 then do;  *create an "other" option so the format returns missing if not found;
        hlo = 'o';
        label = ' ';
        output;
    end;
run;

*build the format from the control dataset;
proc format cntlin=for_format;
run;
And then you just have five data steps with a PUT statement adding the value (e.g., value = put(ID, AdditionalF.);). Or you could simply attach the format to the ID variable, and it would display that value in most PROCs (handy if this is something like a classifier that you don't truly need "in" the data).
You can do this in a single pass through the data in a Data Step, using a hash table to look up values.
data new1 new2 new3;
    set old1(in=a) old2(in=b) old3(in=c);
    format value best.;
    if _n_ = 1 then do;
        %create_hash(lk, id, value, "Additional");
    end;
    value = .;
    rc = lk.find();
    drop rc;
    if a then output new1;
    else if b then output new2;
    else if c then output new3;
run;
%create_hash() macro available here.
You could, alternatively, use Joe's format with the same Data Step syntax.
If my question title is not clear, let me detail it more:
In my Rails application I am using PL/SQL as my database in local (development), and there is a Customer table where the primary ID starts from 1, 2, 3... and so on. Until I started using the same database in both local (development) and production, there was no problem creating customers. But after I started running local (development) and production in parallel against the same DB, customer creation often fails because the IDs come out as duplicates. How can I configure local/production to change the next index to start with for the ID to a different number, to avoid this conflict?
E.g.: I want to keep using the IDs (1, 2, ... etc.) locally, and in production I want the next ID to start from 50000 and continue on from there.
If I understand your question correctly, you have a development environment and a live environment both linked to the same database and the same table (say the table name is customer_tab, and the primary id column is cus_id).
Although this is a highly discouraged practice, if you want the primary id to be in sequence regardless of where the insert comes from (live or dev), you can use a sequence and a trigger. That is, the insert statement will not supply the primary id; it will leave it null. On insert, a trigger then fills it in from the sequence. Something like this:
/* Create sequence */
create sequence customer_tab_seq
    start with 5000
    increment by 1;

/* Create trigger */
CREATE OR REPLACE TRIGGER customer_tab_tri
BEFORE INSERT ON customer_tab
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
    IF INSERTING THEN
        IF :NEW.cus_id IS NULL THEN
            SELECT customer_tab_seq.NEXTVAL INTO :NEW.cus_id FROM DUAL;
        END IF;
    END IF;
END;
/
In this case, the first row inserted will be given 5000 (regardless of whether it comes from live or dev), the second will be given 5001, and so on. And it will save you future conflicts.
I have a Rails app that uses postgreSQL.
I recently did a backup of production and restored it to development.
When I try to add a Payment record in development, I get:
ERROR: duplicate key value violates unique constraint "payments_pkey"
DETAIL: Key (id)=(1) already exists.
Yet, there is only one record in the table with id=1 and the payments_id_seq has Current value = 1.
So, why isn't Rails trying to add id=2 ??
Thanks for the help!
PS - is there a script or command in pgadmin to force the id_seq to be correct?
If you receive a PostgreSQL unique key violation error message ("duplicate key value violates unique constraint..."), your primary key sequence is probably out of sync, e.g. after populating the database.
Use
ActiveRecord::Base.connection.reset_pk_sequence!('[table_name]')
to fix the sequence for the given table (e.g. 'payments' here).
Presumably whatever method you used to copy your database didn't update your sequences along the way. A standard dump/restore should have taken care of that, but if you copied things row-by-row by hand then you'll have to fix things using setval.
If you only need to fix the sequence for a table T, then you could do this from the console:
ActiveRecord::Base.connection.execute(%q{
  select setval('T_id_seq', m)
  from (
    select max(id) from T
  ) as dt(m)
})
or you could feed that SQL to pgadmin. You'd repeat that for each table T.
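If you'd rather not hard-code the sequence name, pg_get_serial_sequence can look it up for you (a sketch, assuming the conventional id serial column on the payments table from the question):
SELECT setval(pg_get_serial_sequence('payments', 'id'),
              COALESCE(MAX(id), 1))
FROM payments;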
I have a TSimpleDataSet based on a dynamically created SQL query. I need to know which field is the primary key.
SimpleDataSet1.DataSet.SetSchemaInfo(stIndexes, 'myTable' ,'');
This code tells me that I have a primary key with the name 'someName', but how can I know which field (column) works with this index?
A primary key/index can comprise several columns (not just one).
The schema stIndexes dataset will return the PK name INDEX_NAME and the columns that construct that PK/Index (COLUMN_NAME). INDEX_TYPE will tell you which index types you have (eSQLNonUnique/eSQLUnique/eSQLPrimaryKey).
I have never worked with TSimpleDataSet, but check whether the index information is stored in
IndexDefs[TIndexDef].Name/Fields/Options: if ixPrimary is in Options, then this is your PK, and Fields lists the columns belonging to that index.
Take a look at the source at SqlExpr.pas: TCustomSQLDataSet.AddIndexDefs.
Note how TCustomSQLDataSet returns the TableName (and then the indexs information) from the command text:
...
if FCommandType = ctTable then
  TableName := FCommandText
else
  TableName := GetTableNameFromSQL(CommandText);
DataSet := FSQLConnection.OpenSchemaTable(stIndexes, TableName, '', '', '');
...
I think the simple dataset does not provide that information.
However, I am sure there are components for that. For an Oracle database, check Devart's ODAC.
Basically, it involves only one extra query to the database.
However, it is not something components offer by default, because the extra query leads to slower response times.
For an Oracle database, query user_indexes (or user_cons_columns for the columns of the key).
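A sketch of such a lookup against the standard Oracle data dictionary views (the table name is illustrative):
SELECT cols.column_name, cols.position
FROM   user_constraints cons
JOIN   user_cons_columns cols ON cols.constraint_name = cons.constraint_name
WHERE  cons.table_name = 'MYTABLE'
AND    cons.constraint_type = 'P'  -- primary key
ORDER  BY cols.position;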
I have 14 fields which are similar, and I search for the string 'A' in each of them. After that, I would like to order the results by the "position" field.
-- some SET commands in order to remove a lot of useless output
def col='col01'

select '&col' "Fieldname",
       &col "value",
       position
from oneTable
where &col like '%A%'
/

-- then for the second field, I only have to type two lines
def col='col02'
/
...
def col='col14'
/
This writes out all the fields which contain 'A'. The problem is that those fields are not ordered by position.
If I use UNION between the queries, I cannot take advantage of the substitution variable (&col), and I would have to write a Unix shell script to do the substitutions in ksh instead. The problem is of course that the database connection details would have to be hard-coded in that script (and connecting is not easy stuff).
If I use a REFCURSOR with OPEN, I cannot group the result sets together. I have only one query at a time and cannot make a UNION of them (print refcursor1 union refcursor2 and print refcursor1+refcursor2 raise an exception, and select * from refcursor1 union select * from refcursor2 does not work either).
How can I concatenate the results into one big "REFCURSOR"? Or use a UNION between two distinct runs ('/') of my query, something like holding the query while typing a new definition of the variable?
Thank you for any advice.
Does this answer your question?
CREATE TABLE #containingAValueTable
(
    FieldName  VARCHAR(10),
    FieldValue VARCHAR(1000),
    position   int
)

def col='col01'

INSERT INTO #containingAValueTable
    (FieldName, FieldValue, position)
SELECT '&col' "Fieldname",
       &col "value",
       position
FROM yourTable
WHERE &col LIKE '%A%'
/

-- then for the second field, I only have to type two lines
def col='col02'
INSERT INTO...
/

def col='col14'
/

SELECT * FROM #containingAValueTable ORDER BY position

DROP TABLE #containingAValueTable
But I'm not totally sure about your use of the 'substitution variable' called "col" (and I only have SQL Server to test my query, so I used explicit field names).
edit: Sorry for the '#' character; we use it so often in SQL Server for temporary tables that I didn't even know it was SQL Server specific (moreover, I think it is mandatory in SQL Server for creating a temporary table). Whatever, I'm happy I could be useful to you.
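For completeness, since the question is about Oracle/SQL*Plus, here is a rough equivalent using a global temporary table (a sketch; the name is illustrative, and unlike a SQL Server temp table it is created once, not per session):
CREATE GLOBAL TEMPORARY TABLE containing_a_value_table (
    fieldname  VARCHAR2(10),
    fieldvalue VARCHAR2(1000),
    position   NUMBER
) ON COMMIT PRESERVE ROWS;

-- then the same INSERT ... SELECT pattern with &col as above, and finally:
SELECT * FROM containing_a_value_table ORDER BY position;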