Is my ER-Diagram for yearly data on trade and transportation OK?

As part of my school project, I'm supposed to design/make a database where one can store/update/retrieve yearly data about international trade and transportation. To begin with, I isolated a small part of the database in order to start small.
Firstly, I tried to design a diagram that would store the number of passengers (aggregate counts, not individual passengers) who embarked/disembarked on/off ships in each port of every country every year, and how many of them were local and how many were foreign passengers (I don't need those two to interact).
(Ignore the Passengers entity at the top.) The inwards_outwards entity would give me a table in the database that would look like this:
Secondly, I tried to design the diagram of a table where I could store origin-destination data: for the passengers that arrived in (or left from) a country, how many came from (or went to) each of the other countries, and so on.
For instance, in 2011, 20 passengers flew from England to France, 10 to Germany, etc.; and in 2011, 23 passengers arrived in England from France, 19 from Germany, etc.
The od_hellas entity would give me a table like this:
Questions:
Do the above look OK to you?
Is there a more efficient way to store yearly data?
Is what I'm trying to make doable in the context of a project? Any advice in general?

You can do this with three tables as shown below.
If you want to add data about individual passengers, then you would need a fourth table, "Passenger".
The value in your "Numbers" column can be calculated from the base data by using SQL COUNT, something like this:
SELECT COUNT(passengerNr)
FROM Trade.Departure
WHERE portCode = N'EL_OGRPIR';
To get the data by year, you just add something like [AND YEAR("date") = 2011] (this depends on how you choose to store your date data).
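For example, a minimal sketch of a yearly count for one port, assuming the Trade.Departure table defined below and its datetime "date" column:

SELECT COUNT(passengerNr) AS passengers
FROM Trade.Departure
WHERE portCode = N'EL_OGRPIR'
  AND YEAR("date") = 2011;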
Here is the logical view of the tables.
Here is the SQL DDL that you would use to generate the tables in a database. (e.g. you could cut and paste this SQL into the "New Query" panel in SQL Server Management Studio.)
CREATE SCHEMA Trade
GO
CREATE TABLE Trade.Port
(
    portCode nchar(15) NOT NULL,
    countryCode nchar(2) NOT NULL,
    portName nchar(50) NOT NULL,
    type nchar(10) CHECK (type IN (N'SeaPort', N'AirPort', N'LandBorder')) NOT NULL,
    CONSTRAINT Port_PK PRIMARY KEY(portCode)
)
GO
CREATE TABLE Trade.Departure
(
    passengerNr int NOT NULL,
    portCode nchar(15) NOT NULL,
    "date" datetime NOT NULL,
    isInternational bit,
    CONSTRAINT Departure_PK PRIMARY KEY(passengerNr, portCode)
)
GO
CREATE TABLE Trade.Arrival
(
    passengerNr int NOT NULL,
    portCode nchar(15) NOT NULL,
    "date" datetime NOT NULL,
    isInternational bit,
    CONSTRAINT Arrival_PK PRIMARY KEY(passengerNr, portCode)
)
GO
ALTER TABLE Trade.Departure ADD CONSTRAINT Departure_FK FOREIGN KEY (portCode) REFERENCES Trade.Port (portCode) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
ALTER TABLE Trade.Arrival ADD CONSTRAINT Arrival_FK FOREIGN KEY (portCode) REFERENCES Trade.Port (portCode) ON DELETE NO ACTION ON UPDATE NO ACTION
GO
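To connect this back to the question, here is a hedged sketch of how an inwards_outwards-style yearly summary could be derived from these tables. The reading of the bit column (isInternational = 1 means a foreign passenger) is my assumption:

-- Yearly embarkation counts per port, split into local and foreign passengers.
SELECT p.countryCode,
       d.portCode,
       YEAR(d."date") AS tradeYear,
       SUM(CASE WHEN d.isInternational = 1 THEN 1 ELSE 0 END) AS foreignPassengers,
       SUM(CASE WHEN d.isInternational = 0 THEN 1 ELSE 0 END) AS localPassengers
FROM Trade.Departure d
JOIN Trade.Port p ON p.portCode = d.portCode
GROUP BY p.countryCode, d.portCode, YEAR(d."date");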

Related

Looking up data in an Oracle (12.1) table using keys from a text file

I have a table with approximately 8 million rows in it. It has a uniqueness constraint on a column called Customer_Identifier. This is a varchar(10) field; it is not the primary key, but it is unique.
I wish to retrieve some customer rows from this table using SQL Developer. I have been given a text file with each record containing a search key value in columns 1-10. This query will need to be reused a few times, with different customer_identifier values. Sometimes I will be given a few customer_identifier values (<1000 of them), sometimes many (between 1000 and 10000 of them). For the times when I want fewer than 1000 values, it's pretty straightforward to use an IN clause: I can edit the text file to wrap the keys in quotes and insert commas as appropriate. But Oracle has a hard limit of 1000 values in an IN list.
I only have read rights to the database, so creating and managing a new physical table is out of the question :-(.
Is there a way that I can treat the text file as a table in Oracle 12.1, and thus use it to join to my customer table on the customer_identifier column?
Brgds
Chris
Yes, you can treat a text file as an external table. But you may need DBA assistance to create a new directory, if you don't have access to a directory defined in the database.
Thanks to Oracle Base
**Create a directory object pointing to the location of the files.**
CREATE OR REPLACE DIRECTORY ext_tab_data AS '/data';
**Create the external table using the CREATE TABLE..ORGANIZATION EXTERNAL syntax. This defines the metadata for the table describing how it should appear and how the data is loaded.**
CREATE TABLE countries_ext (
  country_code      VARCHAR2(5),
  country_name      VARCHAR2(50),
  country_language  VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_tab_data
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      country_code      CHAR(5),
      country_name      CHAR(50),
      country_language  CHAR(50)
    )
  )
  LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;
**Once the external table is created, it can be queried like a regular table.**
SQL> SELECT *
2 FROM countries_ext
3 ORDER BY country_name;
COUNT COUNTRY_NAME                 COUNTRY_LANGUAGE
----- ---------------------------- -----------------------------
ENG   England                      English
FRA   France                       French
GER   Germany                      German
IRE   Ireland                      English
SCO   Scotland                     English
USA   United States of America     English
WAL   Wales                        Welsh

7 rows selected.
SQL>
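To answer the original question directly: once the keys from the text file are exposed this way, the join is ordinary SQL. A minimal sketch, assuming an external table customer_keys_ext (built like countries_ext above, but with a single VARCHAR2(10) column for the ten-character keys) and a customer table named customers; both names are placeholders:

-- Join the file-backed keys to the real table on the unique column.
SELECT c.*
FROM customers c
JOIN customer_keys_ext k
  ON c.customer_identifier = k.customer_identifier;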

Delete all records that are not the latest

I have a table that deliberately has duplicates in it. In this instance the things that will be duplicated are a deviceId and the datetime. The table has three columns: deviceId, datetime and value (there is also an incremental primary key). Sometimes when the customer re-evaluates their data, they notice that the value is incorrect; they then update it and send the data for re-processing. As a consequence, I need to be able to delete records that are not the very latest records. I can't do it by datetime, as this will also be duplicated in some cases, and I can't truncate the staging table.
To delete the dupes I have the following:
;WITH DupeData AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY tblMeterData_Id, fldDateTime, fldValue, [fldBatchId], [fldProcessed]
                              ORDER BY fldDateTime) AS ROW
    FROM [Stage.tblMeterData])
DELETE FROM DupeData
WHERE ROW > 1
The problem with this is that it seems to delete a random duplicate.
I want to keep the latest record that is in the staging area and delete any others that are not the latest record. I can then update the relevant row with the new value, with the latest data, when I take it from staging into prod.
Is there any primary or unique key on the table?
If there is a unique id, the easiest way is the query below.
Not sure about performance, but it should work OK on small amounts of data.
DELETE FROM [Stage.tblMeterData]
WHERE id IN
(
    SELECT id
    FROM
    (
        SELECT id,
               -- order by id descending so row 1 is the latest record per group
               ROW_NUMBER() OVER (PARTITION BY tblMeterData_Id, fldDateTime, fldValue, [fldBatchId], [fldProcessed]
                                  ORDER BY id DESC) AS RowNum
        FROM [Stage.tblMeterData]
    ) q
    WHERE q.RowNum > 1
);

How to migrate complex Rails database to use UUID primary keys Postgresql

I have a database I would like to convert to use UUIDs as the primary keys in PostgreSQL.
I have roughly 30 tables with deep multi-level associations. Is there an 'easy' way to convert all current IDs to UUIDs?
From this: https://coderwall.com/p/n_0awq, I can see that I could alter the table in a migration. I was thinking something like this:
for client in Client.all
  # Retrieve children
  underwritings = client.underwritings
  # Change primary key
  execute 'ALTER TABLE clients ALTER COLUMN id TYPE uuid;'
  execute 'ALTER TABLE clients ALTER COLUMN id SET DEFAULT uuid_generate_v1();'
  # Get new id - is this already generated?
  client_id = client.id
  for underwriting in underwritings
    locations = underwriting.locations
    other_records = underwriting.other_records...
    execute 'ALTER TABLE underwritings ALTER COLUMN id TYPE uuid;'
    execute 'ALTER TABLE underwritings ALTER COLUMN id SET DEFAULT uuid_generate_v1();'
    underwriting.client_id = client_id
    underwriting.save
    underwriting_id = underwriting.id
    for location in locations
      buildings = location.buildings
      execute 'ALTER TABLE locations ALTER COLUMN id TYPE uuid;'
      execute 'ALTER TABLE locations ALTER COLUMN id SET DEFAULT uuid_generate_v1();'
      location.underwriting_id = underwriting_id
      location.save
      location_id = location.id
      for building in buildings
        ...
      end
    end
    for other_record in other_records
      ...
    end
    ...
    ...
  end
end
Questions:
Will this work?
Is there an easier way to do this?
Will child records be retrieved properly as long as they are retrieved before the primary key is changed?
Will the new primary key be already generated as soon as the alter table is called?
Thanks very much for any help or tips in doing this.
I found these migrations to be quite tedious. It is possible to use direct queries to PostgreSQL to convert tables with existing data.
For primary key:
-- requires the uuid-ossp extension for uuid_generate_v4()
ALTER TABLE students
  ALTER COLUMN id DROP DEFAULT,
  ALTER COLUMN id SET DATA TYPE UUID USING (uuid(lpad(replace(text(id),'-',''), 32, '0'))),
  ALTER COLUMN id SET DEFAULT uuid_generate_v4();
For other references:
ALTER TABLE students
  ALTER COLUMN city_id SET DATA TYPE UUID USING (uuid(lpad(replace(text(city_id),'-',''), 32, '0')));
The above left-pads the integer value with zeros and converts it to a UUID. This approach does not require id mapping, and if needed the old id can be recovered from the UUID.
As there is no data copying, this approach works quite fast.
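For example, a quick sanity check of that expression (mirroring the cast style used above), showing that the old id stays visible in the last digits of the generated UUID:

SELECT uuid(lpad(replace(text(42), '-', ''), 32, '0'));
-- 00000000-0000-0000-0000-000000000042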
To handle these and the more complicated case of polymorphic associations, please use https://github.com/kreatio-sw/webdack-uuid_migration. This gem adds helpers to ActiveRecord::Migration to ease these migrations.
I think trying to do something like this through Rails would just complicate matters. I'd ignore the Rails side of things completely and just do it in SQL.
Your first step is to grab a complete backup of your database. Then restore that backup into another database to:
Make sure that your backup works.
Give you a realistic playpen where you can make mistakes without consequence.
First you'd want to clean up your data by adding real foreign keys to match all your Rails associations. There's a good chance that some of your FKs will fail, if they do you'll have to clean up your broken references.
Now that you have clean data, rename all your tables to make room for the new UUID versions. For a table t, we'll refer to the renamed table as t_tmp. For each t_tmp, create another table to hold the mapping from the old integer ids to the new UUID ids, something like this:
create table t_id_map (
old_id integer not null,
new_id uuid not null default uuid_generate_v1()
)
and then populate it:
insert into t_id_map (old_id)
select id from t_tmp
And you'll probably want to index t_id_map.old_id while you're here.
This gives us the old tables with integer ids and a lookup table for each t_tmp that maps the old id to the new one.
Now create the new tables with UUIDs replacing all the old integer and serial columns that held ids; I'd add real foreign keys at this point as well; you should be paranoid about your data: broken code is temporary, broken data is usually forever.
Populating the new tables is pretty easy at this point: simply use insert into ... select ... from constructs and JOIN to the appropriate t_id_map tables to map the old ids to the new ones. Once the data has been mapped and copied, you'll want to do some sanity checking to make sure everything still makes sense. Then you can drop your t_tmp and t_id_map tables and get on with your life.
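As a concrete sketch of that insert into ... select step, suppose t has a name column and a parent_id reference to some parent table p (all names here are placeholders in the spirit of the t / t_tmp / t_id_map naming above):

-- Copy rows into the new UUID table, mapping both the row's own id
-- and its foreign key through the lookup tables.
insert into t (id, name, parent_id)
select m.new_id, tmp.name, pm.new_id
from t_tmp tmp
join t_id_map m on m.old_id = tmp.id
join p_id_map pm on pm.old_id = tmp.parent_id;

Use a left join on p_id_map instead if parent_id is nullable, so rows without a parent are not dropped.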
Practice that process on a copy of your database, script it up, and away you go.
You would of course want to shut down any applications that access your database while you're doing this work.
I didn't want to add foreign keys, and I wanted to use a Rails migration. Anyway, here is what I did, in case others are looking to do this (example for 2 tables; I did 32 in total):
def change
  execute 'CREATE EXTENSION "uuid-ossp";'

  execute <<-SQL
    ALTER TABLE buildings ADD COLUMN guid uuid DEFAULT uuid_generate_v1() NOT NULL;
    ALTER TABLE buildings ALTER COLUMN guid SET DEFAULT uuid_generate_v1();
    ALTER TABLE buildings ADD COLUMN location_guid uuid;
    ALTER TABLE clients ADD COLUMN guid uuid DEFAULT uuid_generate_v1() NOT NULL;
    ALTER TABLE clients ALTER COLUMN guid SET DEFAULT uuid_generate_v1();
    ALTER TABLE clients ADD COLUMN agency_guid uuid;
    ALTER TABLE clients ADD COLUMN account_executive_guid uuid;
    ALTER TABLE clients ADD COLUMN account_representative_guid uuid;
  SQL

  for record in Building.all
    location = record.location
    record.location_guid = location.guid
    record.save
  end

  for record in Client.all
    agency = record.agency
    record.agency_guid = agency.guid
    account_executive = record.account_executive
    record.account_executive_guid = account_executive.guid unless account_executive.blank?
    account_representative = record.account_representative
    record.account_representative_guid = account_representative.guid unless account_representative.blank?
    record.save
  end

  execute <<-SQL
    ALTER TABLE buildings DROP CONSTRAINT buildings_pkey;
    ALTER TABLE buildings DROP COLUMN id;
    ALTER TABLE buildings RENAME COLUMN guid TO id;
    ALTER TABLE buildings ADD PRIMARY KEY (id);
    ALTER TABLE buildings DROP COLUMN location_id;
    ALTER TABLE buildings RENAME COLUMN location_guid TO location_id;
    ALTER TABLE clients DROP CONSTRAINT clients_pkey;
    ALTER TABLE clients DROP COLUMN id;
    ALTER TABLE clients RENAME COLUMN guid TO id;
    ALTER TABLE clients ADD PRIMARY KEY (id);
    ALTER TABLE clients DROP COLUMN agency_id;
    ALTER TABLE clients RENAME COLUMN agency_guid TO agency_id;
    ALTER TABLE clients DROP COLUMN account_executive_id;
    ALTER TABLE clients RENAME COLUMN account_executive_guid TO account_executive_id;
    ALTER TABLE clients DROP COLUMN account_representative_id;
    ALTER TABLE clients RENAME COLUMN account_representative_guid TO account_representative_id;
  SQL
end

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have been changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem arises when I have, for example, 2 tables which I want to join into one dimension, and only one of them has changed. For example, I have the tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the row from the Towns table. The problem is that when I do the join between Countries and Towns, there is no row in the Countries change set, so the join will result in an empty set.
Do you have an idea how to solve it? Of course there might be more difficult cases, involving 3 and more tables, and consequential joins.
This is a typical problem found when doing real-time Change Data Capture, or even incremental-only daily changes.
There are multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process: when a row is loaded into towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc.), as sketched below.
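As a rough T-SQL sketch of that trigger idea (all names here are hypothetical: stg_towns and stg_countries are the staging tables, Countries is the full source table):

CREATE TRIGGER trg_stg_towns_country
ON stg_towns
AFTER INSERT
AS
BEGIN
    -- Pull the parent country of every newly staged town into the country
    -- staging table, so the downstream join always finds its match.
    INSERT INTO stg_countries (ID, Name)
    SELECT c.ID, c.Name
    FROM Countries c
    JOIN inserted i ON i.Country_ID = c.ID
    WHERE NOT EXISTS (SELECT 1 FROM stg_countries s WHERE s.ID = c.ID);
END;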
There is a lot I could babble on about for more information, but I will be specific to what is in your question. I would suggest the following to get the results...
1st pass: everything that matches via the join.
UNION ALL
2nd pass: all towns where there isn't a matching country (a left outer join with a WHERE condition that requires the ID in the Countries table to be null/missing).
You would default the Country ID value in that unmatched join to something designated as an "unmatched value". Typically 0 or -1 is used, or a series of standard negative numbers that you can assign descriptions to later to identify why the data is bad; for your example, -1 could be "Found Town Without Country".
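Put together, the two passes might look like this (stg_towns and stg_countries are hypothetical staging tables holding the change sets, plus any rows pulled in by a trigger as above):

-- 1st pass: towns whose country is present
SELECT t.ID AS town_id, t.Name AS town_name, c.ID AS country_id
FROM stg_towns t
JOIN stg_countries c ON c.ID = t.Country_ID
UNION ALL
-- 2nd pass: towns with no matching country; default the key to -1
-- ("Found Town Without Country")
SELECT t.ID, t.Name, -1 AS country_id
FROM stg_towns t
LEFT OUTER JOIN stg_countries c ON c.ID = t.Country_ID
WHERE c.ID IS NULL;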

LINQ to SQL and Data Projection, MVC

Hi, I have a table with some values (IDs), and of course when I get the result I get just the int IDs, but I want to present them in a more user-friendly way: for example, when it's the number 1, I want to show the string "Available", and when it's 2, "Not available". I'm in an N-tier environment and I need to get this done in the Model. What's the best way to accomplish this? Do I have to declare another class to project the strings, or must I use something like a dictionary (Key -> Value)?
Right now I just have this:
return from t in db.products where t.productID == productID select t;
If you are using LINQ to SQL, you need another table to contain the product status:
Table Name: ProductStatus
Fields: ProductStatusID int Identity Primary Key
ProductStatus nvarchar(50)
Add a field to your Products Table:
Field to Add: ProductStatusID int
Add some statuses to your new table, and set the ProductStatusID of each product to an appropriate status id.
Add a constraint that connects the two ProductStatusID fields together. The easiest way do this is to create a diagram in SQL Server Management Studio Express, drag both tables onto the diagram, and then drag the ProductStatusID field from the ProductStatus table to the Products table, and click OK on the dialog that opens.
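A minimal T-SQL sketch of those table changes, assuming the table names Products and ProductStatus from the description above:

-- Lookup table holding the status descriptions.
CREATE TABLE ProductStatus
(
    ProductStatusID int IDENTITY PRIMARY KEY,
    ProductStatus nvarchar(50) NOT NULL
);

-- New column on Products, plus the FK that ties the two together.
ALTER TABLE Products ADD ProductStatusID int;

ALTER TABLE Products ADD CONSTRAINT FK_Products_ProductStatus
    FOREIGN KEY (ProductStatusID) REFERENCES ProductStatus (ProductStatusID);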
Rebuild your Linq to SQL data classes. You do this by deleting and recreating the DBML file, and dragging your tables into the designer again.
When you get a products object (p) from your dataContext object, you should now see this:
p.ProductStatus <-- The text description of the product's status.
LINQ to SQL will reach into your ProductStatus table and look up the appropriate status description.
