enforcing consistency among multiple 1:m relationships - normalization

Given these business rules:
Users have 0 or more accounts and all accounts are associated with a single user
Users have 0 or more assets and all assets are associated with a single user
An asset may be associated with a single account. If it is assigned to any account, that account must belong to the user associated with the asset.
Assume the following proposed schema:
User
-id
Account
-id
-user_id
Asset
-id
-user_id
-account_id (Nullable)
It appears there is a weakness in this schema, since an asset could be
assigned to an account that belongs to a different user than the asset.
Is this addressed by one of the normal forms, leading to a better
schema? If it is not covered by normalization, is the best constraint
then on the business-logic side?

The only part of this that normalization might deal with is the nullable column: in Chris Date's understanding, if a column allows NULL, the relation isn't in 1NF.
If you were trying to strictly follow the relational model, I think you'd handle this with an assertion, but most SQL platforms don't support assertions. In SQL, I believe you're looking for something along these lines (tested in PostgreSQL):
create table users (
user_id integer primary key
);
create table accounts (
user_id integer not null references users (user_id),
account_id integer not null unique,
primary key (user_id, account_id)
);
create table assets (
user_id integer not null references users (user_id),
asset_id integer not null unique,
account_id integer null,
primary key (user_id, asset_id),
foreign key (user_id, account_id) references accounts (user_id, account_id)
);
-- Insert 3 users.
insert into users values (1), (2), (3);
-- User 1 has two accounts, user 2 has 3 accounts, user 3 has none.
insert into accounts values
(1, 100),
(1, 101),
(2, 102),
(2, 103),
(2, 104);
-- User 1 has 1 asset not associated with an account.
insert into assets values (1, 200, null);
-- User 1 has 1 asset associated with account 101
insert into assets values (1, 201, 101);
-- User 1 tries to associate an asset with account 102, which doesn't belong to user 1.
insert into assets values (1, 202, 102);
-- Fails with a foreign key violation.
-- User 2 has two assets not associated with an account.
insert into assets values
(2, 500, null),
(2, 501, null);
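For what it's worth, the same composite-key trick can be checked from Python against SQLite, which also uses MATCH SIMPLE semantics for composite foreign keys (a NULL in any referencing column skips the check). A minimal sketch of the schema above, trimmed to two users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
create table users (user_id integer primary key);
create table accounts (
    user_id integer not null references users (user_id),
    account_id integer not null unique,
    primary key (user_id, account_id)
);
create table assets (
    user_id integer not null references users (user_id),
    asset_id integer not null unique,
    account_id integer,
    primary key (user_id, asset_id),
    foreign key (user_id, account_id) references accounts (user_id, account_id)
);
insert into users values (1), (2);
insert into accounts values (1, 100), (2, 102);
""")

# A NULL account_id skips the composite FK check entirely (MATCH SIMPLE).
conn.execute("insert into assets values (1, 200, null)")
# Account 100 belongs to user 1, so the pair (1, 100) exists in accounts.
conn.execute("insert into assets values (1, 201, 100)")

# Account 102 belongs to user 2, so (1, 102) is not in accounts: rejected.
try:
    conn.execute("insert into assets values (1, 202, 102)")
    cross_user_allowed = True
except sqlite3.IntegrityError:
    cross_user_allowed = False

print(cross_user_allowed)  # False
```

The composite foreign key is what does the work: an asset can only name an (user_id, account_id) pair that actually exists, so it can never point at another user's account.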

I would suggest dropping the account_id foreign key from the assets table completely. Since account_id is related to your user table, you can join assets to users, and then perform a left join from users to accounts (if that is the table where account_id is the primary key). If the left join returns a row, the asset is linked to an account and the user is the same. This way you enforce that constraint.
Hope this helps,
Regards
ElChe

Related

How to query DynamoDB for multiple ids, e.g. student_id in [1,2,3] #rails

I have a table in DynamoDB:
student_id, name, age, address; partition_key: student_id; table_name: students
Let's say the student ids are 1, 2, 3, etc. I want to query students based on ids, e.g. the SQL select * from students where id in (1,2,3). How do I do the same in DynamoDB? Please help me with the query params.
I tried:
params = {
  :table_name => "students",
  :key_condition_expression => "student_id IN :student_id",
  :expression_attribute_values => { ":student_id" => [1, 2, 3] }
}
DynamoDB does not allow you to apply conditions to the partition key (e.g. PK in [1,2,3]); you need to specify it explicitly. However, you can perform various condition operations on sort keys (<, >, begins_with, etc.).
If you can identify the primary key, which I'm assuming is your student_id and you want students with id 1, 2, and 3, then you can use the client batch_get_item().
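In case it helps, the same batch-get idea in Python's boto3 looks roughly like this; build_batch_get_request is a hypothetical helper (not an AWS API), and the low-level client expects typed attribute values:

```python
# Sketch of the batch-get approach for fetching several known partition keys.
# build_batch_get_request is a hypothetical helper, not part of any AWS SDK.
def build_batch_get_request(table_name, student_ids):
    # The low-level DynamoDB API wants typed values: {"N": "1"} for numbers.
    keys = [{"student_id": {"N": str(sid)}} for sid in student_ids]
    return {table_name: {"Keys": keys}}

request_items = build_batch_get_request("students", [1, 2, 3])

# With real credentials you would then call:
#   import boto3
#   client = boto3.client("dynamodb")
#   response = client.batch_get_item(RequestItems=request_items)
#   items = response["Responses"]["students"]
print(request_items["students"]["Keys"][0])  # {'student_id': {'N': '1'}}
```

Note that batch_get_item fetches by exact key rather than evaluating an IN condition, which is why it works where a key_condition_expression cannot.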

Unable to find column names in a FK constraint

I have created two tables in Snowflake.
create or replace TRANSIENT TABLE TESTPARENT (
COL1 NUMBER(38,0) NOT NULL,
COL2 VARCHAR(16777216) NOT NULL,
COL3 VARCHAR(16777216) NOT NULL,
constraint UNIQ_COL3 unique (COL3)
);
create or replace TRANSIENT TABLE TESTCHILD3 (
COL_A NUMBER(38,0) NOT NULL,
COL_B NUMBER(38,0) NOT NULL,
ABCDEF VARCHAR(16777216) NOT NULL,
constraint FKEY_1 foreign key (COL_A, COL_B) references TEST_DB.PUBLIC.TESTPARENT1(COL1,COL2),
constraint FKEY_2 foreign key (ABCDEF) references TEST_DB.PUBLIC.TESTPARENT(COL3)
);
Now I want to run a query and see the names of the columns involved in the FKEY_2 foreign key on table TESTCHILD3, but it seems there is no DB table/view that keeps this information. I can find the column names for UNIQUE and PRIMARY KEY constraints, but there is nothing for FOREIGN KEYs.
EDIT
I have already tried INFORMATION_SCHEMA.TABLE_CONSTRAINTS, along with INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS and all the other system tables, with no luck. Only DESC TABLE gives me some info on constraints and columns, but it is also missing the FOREIGN KEY constraint information.
SHOW IMPORTED KEYS IN TABLE <fk_table_name>;
Updated answer:
I was checking on something unrelated and noticed a very efficient way to list all primary and foreign keys:
show exported keys in account; -- Foreign keys
show primary keys in account;
When you limit the call to a table, it appears you have to request the foreign keys that point to the parent table:
show exported keys in table "DB_NAME"."SCHEMA_NAME"."PARENT_TABLE";
You can check the documentation for how to limit the show command to a specific database or schema, but this returns rich information in a table very quickly.
Maybe you can try to query this view: INFORMATION_SCHEMA.TABLE_CONSTRAINTS.
Note: TABLE_CONSTRAINTS only displays objects for which the current role for the session has been granted access privileges.
For more see: https://docs.snowflake.net/manuals/sql-reference/info-schema/table_constraints.html

Delete all records that are not the latest

I have a table that deliberately has duplicates in it. In this instance the things that will be duplicated are a deviceId and the datetime. The table has three columns, deviceId, datetime and value (plus an incremental primary key). Sometimes when the customer re-evaluates their data, they notice that a value is incorrect; they then update it and send the data for re-processing. As a consequence, I need to be able to delete records that are not the very latest records. I can't do it by datetime, as this will also be duplicated in some cases, and I can't truncate the staging table.
To delete the dupes I have the following:
;WITH DupeData AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY tblMeterData_Id, fldDateTime, fldValue, [fldBatchId], [fldProcessed]
                              ORDER BY fldDateTime) AS ROW
    FROM [Stage.tblMeterData]
)
DELETE FROM DupeData
WHERE ROW > 1
The problem with this, is it seems to delete a random duplicate.
I want to keep the latest record that is in the staging area and delete any others that are not the latest record. I can then update the relevant row with the new value, with the latest data, when I take it from staging into prod.
Is there any primary or unique key on the table?
If there is a unique id, the easiest way is below.
I'm not sure about performance, but it should work fine on small amounts of data.
DELETE FROM [Stage.tblMeterData]
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               -- ORDER BY id DESC keeps the most recent row (highest id) in each group
               ROW_NUMBER() OVER (PARTITION BY tblMeterData_Id, fldDateTime, fldValue, [fldBatchId], [fldProcessed]
                                  ORDER BY id DESC) AS ROW
        FROM [Stage.tblMeterData]
    ) q
    WHERE q.ROW > 1
)
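If it helps to experiment outside SQL Server, here is a minimal sketch of the same keep-only-the-latest pattern in Python's sqlite3 (SQLite 3.25+ for window functions), with simplified, hypothetical column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table staging (
    id integer primary key,   -- incremental surrogate key
    device_id integer,
    reading_at text,
    value real
);
-- Device 7 was re-sent twice for the same timestamp; id 3 is the latest.
insert into staging values
    (1, 7, '2020-01-01 00:00', 10.0),
    (2, 7, '2020-01-01 00:00', 11.5),
    (3, 7, '2020-01-01 00:00', 12.0),
    (4, 8, '2020-01-01 00:00', 99.0);
""")

# Keep only the row with the highest id per (device_id, reading_at) group.
conn.execute("""
delete from staging
where id in (
    select id from (
        select id,
               row_number() over (
                   partition by device_id, reading_at
                   order by id desc
               ) as rn
        from staging
    ) q
    where q.rn > 1
)
""")

survivors = [row[0] for row in conn.execute("select id from staging order by id")]
print(survivors)  # [3, 4]
```

Ordering the window by the incremental id descending is what makes "latest" well-defined even when the datetime itself is duplicated.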

Order with DISTINCT ids in rails with postgres

I have the following code to join two tables microposts and activities with micropost_id column and then order based on created_at of activities table with distinct micropost id.
Micropost.joins("INNER JOIN activities ON activities.micropost_id = microposts.id")
         .where('activities.user_id = ?', id)
         .order('activities.created_at DESC')
         .select("DISTINCT (microposts.id), *")
which should return whole micropost rows. This is not working in my development environment:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
If I add activities.created_at to the SELECT DISTINCT, I get repeated micropost ids, because they have distinct activities.created_at values. I have done a lot of searching to get here, but the problem persists because of this Postgres rule against a random selection.
I want to select distinct micropost ids based on the order of activities.created_at.
Please help.
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2)
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT bit in favor of using GROUP BY, since Postgres handles them much better. The MAX(activities.created_at) bit tells Postgres to, for each group of activities with the same micropost_id, sort by only the most recent.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
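The GROUP BY / MAX pattern is portable SQL; as a quick illustration, here is a minimal sketch in Python's sqlite3 with made-up timestamps (the query itself matches the Postgres version above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table microposts (id integer primary key);
create table activities (
    id integer primary key,
    micropost_id integer references microposts (id),
    user_id integer,
    created_at text
);
insert into microposts values (1), (2);
insert into activities values
    (1, 1, 5, '2020-01-03'),  -- micropost 1's most recent activity
    (2, 1, 5, '2020-01-01'),
    (3, 2, 5, '2020-01-02');
""")

# One row per micropost, ordered by its most recent activity.
rows = conn.execute("""
    select microposts.id
    from microposts
    join activities on activities.micropost_id = microposts.id
    where activities.user_id = ?
    group by microposts.id
    order by max(activities.created_at) desc
""", (5,)).fetchall()

print([r[0] for r in rows])  # [1, 2]
```

Micropost 1 appears first because its latest activity ('2020-01-03') is newer than micropost 2's ('2020-01-02'), even though micropost 1 also has an older activity.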
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq

using a table in an sql condition without incorporating that table's contents into the results

Say I have four tables, users, contacts, files, and userfiles.
Users can upload files and have contacts. They can choose to share their uploaded files with their contacts.
When a user selects one or more of their uploaded files, I want to show a list of their contacts that they are not already sharing all of their selected files with. So if they selected one file, it'd show the contacts that can't already see that file. If they selected multiple files, it'd show the contacts that can't already see all of the files.
Right now I'm trying a query like this (using sqlite3):
select users.user_id, users.display_name
from users, contacts, userfiles
where contacts.user_id = :user_id
and contacts.contact_id = users.user_id
and (
userfiles.user_id != users.user_id
and userfiles.file_id != :file_id
);
Note that the last line is auto-generated in a loop in the case of multiple selected files.
Where :user_id is the user trying to share the file, and :file_id is the file which, if a user can already see that file, they are omitted from the result. What I end up with is a list of contacts which are sharing any files other than the selected one, so if the user is sharing multiple files with any one contact, that contact shows up in the list multiple times.
How can I avoid the duplicates? I just want to check if the file is already being shared, not grab all of the contents of userfiles that don't involve a particular file or files.
select users.user_id, users.display_name
from users, contacts as c
where c.user_id = :user_id
and c.contact_id = users.user_id
and not exists (
select user_id
from userfiles as uf
where uf.user_id = c.contact_id
and uf.file_id in (:file_ids)
);
Note that :file_ids is all your file_ids, separated with commas. No more looping to run multiple queries!
EDIT:
This is the data I'm running as a test:
create table users (user_id integer primary key, display_name text);
insert into users values (1,"bob");
insert into users values (2,"jim");
insert into users values (3,"bill");
insert into users values (4,"martin");
insert into users values (5,"carson");
create table contacts (user_id integer, contact_id integer);
insert into contacts select u1.user_id, u2.user_id from users u1, users u2 where u1.user_id != u2.user_id;
create table userfiles (user_id integer, file_id integer);
insert into userfiles values (1,10);
insert into userfiles values (2,10);
insert into userfiles values (3,10);
insert into userfiles values (4,10);
insert into userfiles values (1,20);
insert into userfiles values (2,30);
Then, if I run my query with :user_id = 5 and :file_ids = 20,30, I get:
select users.user_id, users.display_name
from users, contacts as c
where c.user_id = 5
and c.contact_id = users.user_id
and not exists (
select user_id
from userfiles as uf
where uf.user_id = c.contact_id
and uf.file_id in (20,30)
);
UserID | Display_Name
3      | bill
4      | martin
That seems like what you want, as I understand it: that is, only the users who do not have any of the file IDs. If I misunderstood something, please let me know.
This seemed to work; I'm not sure it's optimal, but it's the only way I could figure it out:
select users.user_id, users.display_name
from users, contacts
where contacts.user_id = :user_id
and contacts.contact_id = users.user_id
and (
select count(*)
from userfiles
where userfiles.user_id = users.user_id
and userfiles.file_id in (:file_ids)
) < :number_of_files;
It selects all contacts except the ones that match all of the file_ids. It still selects the contacts that match only some of the file_ids, since it grabs the count of the contact's files matching the specified IDs and then checks whether that count is less than the number of ids provided.
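As a quick check of the counting approach, here is a sketch in Python's sqlite3 using the same test data as the earlier answer; with file ids (20, 30) and :number_of_files = 2, every contact comes back, since no single contact already sees both files:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (user_id integer primary key, display_name text);
insert into users values (1,'bob'),(2,'jim'),(3,'bill'),(4,'martin'),(5,'carson');
create table contacts (user_id integer, contact_id integer);
insert into contacts
    select u1.user_id, u2.user_id from users u1, users u2
    where u1.user_id != u2.user_id;
create table userfiles (user_id integer, file_id integer);
insert into userfiles values (1,10),(2,10),(3,10),(4,10),(1,20),(2,30);
""")

# Contacts of user 5 who do NOT already see all of files 20 and 30.
rows = conn.execute("""
    select users.user_id, users.display_name
    from users, contacts
    where contacts.user_id = ?
      and contacts.contact_id = users.user_id
      and (select count(*)
           from userfiles
           where userfiles.user_id = users.user_id
             and userfiles.file_id in (20, 30)) < ?
    order by users.user_id
""", (5, 2)).fetchall()

print(rows)
```

Contrast this with the NOT EXISTS version, which would return only bill and martin here: bob sees file 20 and jim sees file 30, but neither sees both, so the count-based query keeps them in the list.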