Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have changed in the source tables since the last extraction. Then I run the transformation only for these rows. The problem arises when I have, for example, two tables that I want to join into one dimension, and only one of them has changed. For example, I have the tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not changed, so CDC for these tables shows me only the new row from the Towns table. The problem is that when I join Countries and Towns, there is no row in the Countries change set, so the join returns an empty set.
Do you have an idea how to solve this? Of course there might be more difficult cases, involving three or more tables and chained joins.

This is a typical problem in real-time Change Data Capture, and even with incremental-only daily loads.
There are multiple ways to solve this.
One way would be to join the changed rows on the natural keys against the dimension or mapping table, rather than against the other table's change set, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
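As a rough illustration of the first suggestion, the sketch below joins only the changed Towns rows against a full country dimension instead of the Countries change set. It uses Python's built-in sqlite3 module; the table names towns_delta and dim_country are hypothetical, and the sample row assumes the new town references country 1.

```python
import sqlite3


def join_delta_to_dimension(conn):
    """Join changed town rows against the full country dimension."""
    return conn.execute(
        """
        SELECT t.id, t.name, c.name
        FROM towns_delta AS t      -- CDC change set: changed rows only
        JOIN dim_country AS c      -- full dimension, not a change set
          ON c.id = t.country_id
        ORDER BY t.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE towns_delta (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO dim_country VALUES (1, 'France');
    INSERT INTO towns_delta VALUES (2, 'Paris', 1);  -- assumed sample row
    """
)

rows = join_delta_to_dimension(conn)
print(rows)
```

Because the right-hand side of the join is the whole dimension, an unchanged country still resolves for a newly captured town.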

There is a lot more I could say, but I will stick to what is in your question. I would suggest the following to get the results:
1st pass: everything that matches via the join
UNION ALL
2nd pass: all towns that have no country
(a left outer join with a WHERE condition that
requires the ID in the Countries table to be null/missing).
In the unmatched pass, default the Country ID to a designated "unmatched" value. Typically 0 or -1 is used, or a series of standard negative numbers to which you can later assign descriptions that explain why the data is bad. In your example, -1 could mean "Found Town Without Country".
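The two passes could look roughly like the sketch below, run through Python's sqlite3 module with the Countries and Towns data from the question. The sentinel constant name and the lowercase table names are assumptions made for the example.

```python
import sqlite3

UNMATCHED_COUNTRY_ID = -1  # assumed sentinel: "Found Town Without Country"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE towns (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (1, 'France');
    INSERT INTO towns VALUES (1, 'Lyon', 1), (2, 'Paris', 2);
    """
)

query = """
    -- 1st pass: towns whose country exists
    SELECT t.id, t.name, c.id
    FROM towns AS t
    JOIN countries AS c ON c.id = t.country_id
    UNION ALL
    -- 2nd pass: towns without a matching country get the sentinel value
    SELECT t.id, t.name, ?
    FROM towns AS t
    LEFT OUTER JOIN countries AS c ON c.id = t.country_id
    WHERE c.id IS NULL
    ORDER BY 1
"""

rows = conn.execute(query, (UNMATCHED_COUNTRY_ID,)).fetchall()
for row in rows:
    print(row)
```

Every town survives the load, and the rows carrying the sentinel can be reported or repaired later.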

Related

Joining two tables based on matching two columns

I'm trying to join two tables:
Table A has three columns: State, County, and Count (of Farmer's Markets in said county)
Table B has several columns: State, County, and several data columns (like food access score)
I'm trying to combine them in such a way as to put the Count for each State/County combination (since there are multiple counties with the same name) together with the State and County and data columns from Table B.
I've been banging my head on SAS, trying to get a join to cooperate. I read a few other questions on here, but I can't find where the mistake is in my code.
PROC SQL;
CREATE TABLE WORK.QUERY1
AS
SELECT FMDV4.State, FMDV4.County, FMDV4.Count, CFSDV1.GROC14,
CFSDV1.SUPERC14, CFSDV1.CONVS14, CFSDV1.SPECS14, CFSDV1.FOODINSEC_13_15,
CFSDV1.PCT_LACCESS_POP15, CFSDV1.DIRSALES_FARMS12, CFSDV1.FMRKT16,
CFSDV1.FOODHUB16, CFSDV1.CSA12, CFSDV1.POVRATE15, CFSDV1.PERPOV10
FROM FNLPRJT.CFSDV1 AS CFSDV1
INNER JOIN FNLPRJT.FMDV4 AS FMDV4
ON (( CFSDV1.State = FMDV4.State ) AND ( CFSDV1.County =
FMDV4.County ));
QUIT;
I also tried a few variants, like:
PROC SQL;
CREATE TABLE WORK.QUERY1
AS
SELECT FMDV4.State, FMDV4.County, FMDV4.Count, CFSDV1.GROC14,
CFSDV1.SUPERC14, CFSDV1.CONVS14, CFSDV1.SPECS14, CFSDV1.FOODINSEC_13_15,
CFSDV1.PCT_LACCESS_POP15, CFSDV1.DIRSALES_FARMS12, CFSDV1.FMRKT16,
CFSDV1.FOODHUB16, CFSDV1.CSA12, CFSDV1.POVRATE15, CFSDV1.PERPOV10
FROM FNLPRJT.CFSDV1 AS CFSDV1
INNER JOIN FNLPRJT.FMDV4 AS FMDV4
ON CFSDV1.State = FMDV4.State
WHERE CFSDV1.County = FMDV4.County;
QUIT;
I get a table of 0 rows with the columns as they should be (State, County, Count, ). I'm just missing the dang data! Can anyone please help me find my mistake?
The key values may differ in letter case between the two tables. Can you try
propcase(CFSDV1.State) = propcase(FMDV4.State)
and
propcase(CFSDV1.County) = propcase(FMDV4.County);
If this doesn't work, try character functions like trim and compress to remove any blanks that might be present in the data.
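The effect of case and blank mismatches can be reproduced outside SAS. The sketch below is not SAS code: it uses Python's sqlite3 module, with UPPER and TRIM standing in for propcase and trim, and the sample rows and the market_count column name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fmdv4 (state TEXT, county TEXT, market_count INTEGER);
    CREATE TABLE cfsdv1 (state TEXT, county TEXT, groc14 INTEGER);
    -- Same county, but different letter case and a trailing blank.
    INSERT INTO fmdv4 VALUES ('Texas', 'Travis', 12);
    INSERT INTO cfsdv1 VALUES ('TEXAS', 'travis ', 85);
    """
)


def count_matches(join_condition):
    """Count joined rows for a given ON condition (trusted input only)."""
    sql = f"SELECT COUNT(*) FROM cfsdv1 AS c JOIN fmdv4 AS f ON {join_condition}"
    return conn.execute(sql).fetchone()[0]


exact = count_matches("c.state = f.state AND c.county = f.county")
normalized = count_matches(
    "UPPER(TRIM(c.state)) = UPPER(TRIM(f.state)) "
    "AND UPPER(TRIM(c.county)) = UPPER(TRIM(f.county))"
)
print(exact, normalized)
```

The exact comparison finds no rows, while the normalized comparison matches the pair, which is the symptom described in the question.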

Compare 3 tables in SQLite

Originally, I had 2 tables. I normalized them since the relationship between these tables is many-to-many. Now I have 3.
Jobs
jID PK
jName
jDesc
jEarnings
jTags
Course
cID PK
cName
cDesc
cSchool
cProgram
JobsCourse
ID PK
jID FK
cID FK
My app displays a table view of the jobs.
When a job is clicked, it displays the UIViewController for that job plus a table view of the related courses.
How do I query the JobsCourse table so that I can get all the courses related to a certain job?
You can join the two tables, e.g.:
SELECT course.* FROM course INNER JOIN jobscourse ON jobscourse.cID = course.cID WHERE jobscourse.jID = ?
That gets all entries from course where the jID in jobscourse is equal to some value.
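To try the query against the schema from the question, a small sketch using Python's sqlite3 module follows. Only a few of the listed columns are created, and the job and course names are invented sample data.

```python
import sqlite3


def courses_for_job(conn, job_id):
    """Return (cID, cName) for every course linked to the given job."""
    return conn.execute(
        """
        SELECT Course.cID, Course.cName
        FROM Course
        INNER JOIN JobsCourse ON JobsCourse.cID = Course.cID
        WHERE JobsCourse.jID = ?
        ORDER BY Course.cID
        """,
        (job_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Jobs (jID INTEGER PRIMARY KEY, jName TEXT);
    CREATE TABLE Course (cID INTEGER PRIMARY KEY, cName TEXT);
    CREATE TABLE JobsCourse (
        ID INTEGER PRIMARY KEY,
        jID INTEGER REFERENCES Jobs(jID),
        cID INTEGER REFERENCES Course(cID)
    );
    INSERT INTO Jobs VALUES (1, 'Nurse'), (2, 'Developer');
    INSERT INTO Course VALUES (10, 'Biology'), (11, 'Programming'), (12, 'Statistics');
    INSERT INTO JobsCourse (jID, cID) VALUES (1, 10), (1, 12), (2, 11), (2, 12);
    """
)

developer_courses = courses_for_job(conn, 2)
print(developer_courses)
```

The ? placeholder is bound to the selected job's jID at query time, so the same statement serves every row of the jobs table view.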

Data normalization / Searching across multiple fields

I have some denormalized data, along the lines of the following:
FruitData:
LOAD * INLINE [
ID,ColumnA, ColumnB, ColumnC
1,'Apple','Pear','Banana'
2,'Banana','Mango','Strawberry'
3,'Pear','Strawberry','Kiwi'
];
MasterFruits:
LOAD * INLINE [
Fruitname
'Apple'
'Banana'
'Pear'
'Mango'
'Kiwi'
'Strawberry'
'Papaya'
];
And what I need to do is compare these fields to a master list of fruit (held in another table). This would mean that if I chose Banana, IDs 1 and 2 would come up and if I chose Strawberry, IDs 2 and 3 would come up.
Is there any way I can create a listbox that searches across all 3 fields at once?
A list box is just a mechanism that lets you select a value in a certain field as a filter. The real magic behind what QlikView does comes from the associations made in the data model. Since your tables have no common field, you couldn't, for example, load a list box for Fruitname, click a value, and have it alter list boxes for other fields such as ColumnA, ColumnB, or ColumnC. To get the behavior you want, you need to associate the two tables. This can be accomplished by concatenating the various columns into one column (essentially normalizing the data).
[LinkTable]:
LOAD Distinct ColumnA as Fruitname,
ID
Resident FruitData;
Concatenate([LinkTable])
LOAD Distinct ColumnB as Fruitname,
ID
Resident FruitData;
Concatenate([LinkTable])
LOAD Distinct ColumnC as Fruitname,
ID
Resident FruitData;
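This produces a single LinkTable with the fields Fruitname and ID. It associates with MasterFruits through Fruitname and with FruitData through ID, so selecting a fruit in a Fruitname list box filters FruitData to the matching IDs.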
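The reshaping done by the three concatenated loads can be mimicked outside QlikView. The following Python sketch is only an analogy, not QlikView script: it stacks the three columns into distinct (Fruitname, ID) pairs and looks up the IDs for a chosen fruit. The function names are hypothetical.

```python
fruit_data = [
    {"ID": 1, "ColumnA": "Apple", "ColumnB": "Pear", "ColumnC": "Banana"},
    {"ID": 2, "ColumnA": "Banana", "ColumnB": "Mango", "ColumnC": "Strawberry"},
    {"ID": 3, "ColumnA": "Pear", "ColumnB": "Strawberry", "ColumnC": "Kiwi"},
]


def build_link_table(rows, columns=("ColumnA", "ColumnB", "ColumnC")):
    """Stack the fruit columns into distinct (Fruitname, ID) pairs."""
    return sorted({(row[col], row["ID"]) for row in rows for col in columns})


def ids_for_fruit(link_table, fruit):
    """Return the IDs associated with one fruit name."""
    return sorted(row_id for name, row_id in link_table if name == fruit)


link_table = build_link_table(fruit_data)
print(ids_for_fruit(link_table, "Banana"))
print(ids_for_fruit(link_table, "Strawberry"))
```

A fruit that appears in the master list but in none of the three columns, such as Papaya, simply has no pairs in the link table.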

How to select the max record for each of a group of candidates in Grails?

I have a table that gets populated every day with records from reporting systems.
I have a list of the serial numbers that I am interested in returning in an asset list.
How do I get Grails to return the records that match the maximum "epoch" entry for each asset? In SQL I would join the table back to itself after picking out the maximum, such as:
select a.* from assetTable a inner join (select sn, max(epoch) epoch from assetTable group by sn) b on a.sn = b.sn and a.epoch = b.epoch
but I cannot figure out how to get this done efficiently with Grails...
From a domain class perspective it is pretty simple. Consider, for the sake of example, that I have a single domain class "AssetTable" and it has Integer epoch, String sn, ...
Literally, all I want to do is get the latest entry (all fields) for a subset of serial numbers (sn) that I have in a List.
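For reference, the SQL shape described above, restricted to a list of serial numbers, can be sketched with Python's sqlite3 module. This is not a Grails solution; it only shows the result the query is expected to return. The status column and the sample rows are hypothetical.

```python
import sqlite3


def latest_per_serial(conn, serials):
    """Return the newest row (by epoch) for each requested serial number."""
    if not serials:
        return []
    placeholders = ", ".join("?" for _ in serials)
    sql = f"""
        SELECT a.sn, a.epoch, a.status
        FROM assetTable AS a
        INNER JOIN (
            SELECT sn, MAX(epoch) AS epoch
            FROM assetTable
            WHERE sn IN ({placeholders})
            GROUP BY sn
        ) AS b ON a.sn = b.sn AND a.epoch = b.epoch
        ORDER BY a.sn
    """
    return conn.execute(sql, list(serials)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assetTable (sn TEXT, epoch INTEGER, status TEXT);
    INSERT INTO assetTable VALUES
        ('A1', 100, 'ok'), ('A1', 200, 'fail'),
        ('B2', 150, 'ok'), ('C3', 300, 'ok');
    """
)

latest = latest_per_serial(conn, ["A1", "B2"])
print(latest)
```

Filtering on the serial numbers inside the subquery keeps the grouping limited to the assets of interest.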

SQL Syntax Challenge

I have two tables, one containing a list of different options users can select from. For example:
tbl_options
id_option
option
The next table I use to store which of these options the user selects. For example:
tbl_selected
id_selected
id_option
id_user
I use PHP to loop through the tbl_options table to generate a full list of checkboxes that the user can select from. When a user selects an option, the id_option and id_user are stored in the tbl_selected table. When a user deselects an option, the id_selected record is deleted from the tbl_selected table.
The challenge I am having is the best way to retrieve the full list of options in tbl_options, plus having the query indicate the associated records stored in the tbl_selected table.
I've tried LEFT JOIN'ing tbl_options to tbl_selected which provides me with the full list of options, but as soon as I add the WHERE id_user = ### the query only returns those records with values in tbl_selected. Ideally, I would like to see the results from a query as follows:
id_option option id_user
1 Apples 3
2 Oranges 3
3 Bananas
4 Pears
5 Peaches 3
This would indicate that user #3 has stored Apples, Oranges and Peaches. This also indicates that user #3 has not selected Bananas or Pears.
Is this possible using a SQL statement or should I pursue a different technique?
Your problem is that the user-restriction is applied to the whole query. To apply it only to the Join condition you need to add it to the ON clause like this:
select o.id_option, o.[option], s.id_user
from tbl_options o
left outer join tbl_selected s
on o.id_option = s.id_option and s.id_user = 3
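The difference between filtering in the WHERE clause and in the ON clause can be checked with a small sketch using Python's sqlite3 module. The option names come from the question; the extra selection by user 7 is invented to show that other users' rows do not leak into the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_options (id_option INTEGER PRIMARY KEY, "option" TEXT);
    CREATE TABLE tbl_selected (
        id_selected INTEGER PRIMARY KEY,
        id_option INTEGER,
        id_user INTEGER
    );
    INSERT INTO tbl_options VALUES
        (1, 'Apples'), (2, 'Oranges'), (3, 'Bananas'), (4, 'Pears'), (5, 'Peaches');
    INSERT INTO tbl_selected (id_option, id_user) VALUES
        (1, 3), (2, 3), (5, 3), (3, 7);
    """
)

# User filter in WHERE: unselected options are dropped after the join.
where_sql = """
    SELECT o.id_option, o."option", s.id_user
    FROM tbl_options AS o
    LEFT OUTER JOIN tbl_selected AS s ON o.id_option = s.id_option
    WHERE s.id_user = ?
    ORDER BY o.id_option
"""

# User filter in ON: every option is kept, id_user is NULL when unselected.
on_sql = """
    SELECT o.id_option, o."option", s.id_user
    FROM tbl_options AS o
    LEFT OUTER JOIN tbl_selected AS s
        ON o.id_option = s.id_option AND s.id_user = ?
    ORDER BY o.id_option
"""

where_rows = conn.execute(where_sql, (3,)).fetchall()
on_rows = conn.execute(on_sql, (3,)).fetchall()
print(where_rows)
print(on_rows)
```

In the PHP loop, a NULL id_user then means the checkbox should be rendered unchecked.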
