Data normalization / Searching across multiple fields - normalization

have some denormalized data, along the lines of the following:
FruitData:
LOAD * INLINE [
ID,ColumnA, ColumnB, ColumnC
1,'Apple','Pear','Banana'
2,'Banana','Mango','Strawberry'
3,'Pear','Strawberry','Kiwi'
];
MasterFruits
LOAD * INLINE [
Fruitname
'Apple'
'Banana'
'Pear'
'Mango'
'Kiwi'
'Strawberry'
'Papaya'
];
And what I need to do is compare these fields to a master list of fruit (held in another table). This would mean that if I chose Banana, IDs 1 and 2 would come up and if I chose Strawberry, IDs 2 and 3 would come up.
Is there any way I can create a listbox that searches across all 3 fields at once?

A list box is just a mechanism to allow you to "select" a value in a certain field as a filter. The real magic behind what Qlikview is doing comes from the associations made in the data model. Since your tables have no common field you couldn't, for example, load a List Box for Fruitname and click something and have it alter List Boxes for other fields such as ColumnA, B, or C. To get the behavior you want you need to associate the two tables. This is can be accomplished by concatenating the various columns into one column (essentially normalizing the data).
[LinkTable]:
LOAD Distinct ColumnA as Fruitname,
ID
Resident FruitData;
Concatenate([LinkTable])
LOAD Distinct ColumnB as Fruitname,
ID
Resident FruitData;
Concatenate([LinkTable])
LOAD Distinct ColumnC as Fruitname,
ID
Resident FruitData;
You can see the table this produces here:
and the data model looks like this:
and finally, the desired behavior:

Related

How can I update a single field from the results of a select with a join

I wish to change a flag from Y to N. I have been given a select which gives me two records for which I would like to change the value.
As a newbie to anything more advanced than the basics, I am totally at a loss.
As this is a live table, I would am too cautious to attempt this without fully understanding that I will update the correct records.
SELECT a.*, '||' , b.* FROM basecode b, type t
WHERE b.b_id IN ('Val1', 'Val2', 'Val3')
AND b.btype_id = t.ttype_id
The resultant code gives me records with multiple fields, but I just wish to change just a couple of records flag fields from 'Y' to 'N'
Stripping away the other fields I have
iflag='Y'
oflag='Y'
And just want them to be set to 'N' from the previous select.

Count occurrence of values in a serialized attribute(array) in Active Admin dashboard (Rails, Active admin 1.0, Postgresql database, postgres_ext gem)

I'd like to have a basic table summing up the number of occurence of values inside arrays.
My app is a Daily Deal app built to learn more Ruby on Rails.
I have a model Deals, which has one attribute called Deal_goal. It's a multiple select which is serialized in an array.
Here is the deal_goal taken from schema.db:
t.string "deal_goal",:array => true
So a deal A can have deal= goal =[traffic, qualification] and another deal can have as deal_goal=[branding, traffic, acquisition]
What I'd like to build is a table in my dashboard which would take each type of goal (each value in the array) and count the number of deals whose deal_goal's array would contain this type of goal and count them.
My objective is to have this table:
How can I achieve this? I think I would need to group each deal_goal array for each type of value and then count the number of times where this goals appears in the arrays. I'm quite new to RoR and can't manage to do it.
Here is my code so far:
column do
panel "top of Goals" do
table_for Deal.limit(10) do
column ("Goal"), :deal_goal ????
# add 2 columns:
'nb of deals with this goal'
'Share of deals with this goal'
end
end
Any help would be much appreciated!
I can't think of any clean way to get the results you're after through ActiveRecord but it is pretty easy in SQL.
All you're really trying to do is open up the deal_goal arrays and build a histogram based on the opened arrays. You can express that directly in SQL this way:
with expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select goal, count(*) n
from expanded_deals
group by goal
And if you want to include all four goals even if they don't appear in any of the deal_goals then just toss in a LEFT JOIN to say so:
with
all_goals(goal) as (
values ('traffic'),
('acquisition'),
('branding'),
('qualification')
),
expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select all_goals.goal goal,
count(expanded_deals.id) n
from all_goals
left join expanded_deals using (goal)
group by all_goals.goal
SQL Demo: http://sqlfiddle.com/#!15/3f0af/20
Throw one of those into a select_rows call and you'll get your data:
Deal.connection.select_rows(%q{ SQL goes here }).each do |row|
goal = row.first
n = row.last.to_i
#....
end
There's probably a lot going on here that you're not familiar with so I'll explain a little.
First of all, I'm using WITH and Common Table Expressions (CTE) to simplify the SELECTs. WITH is a standard SQL feature that allows you to produce SQL macros or inlined temporary tables of a sort. For the most part, you can take the CTE and drop it right in the query where its name is:
with some_cte(colname1, colname2, ...) as ( some_pile_of_complexity )
select * from some_cte
is like this:
select * from ( some_pile_of_complexity ) as some_cte(colname1, colname2, ...)
CTEs are the SQL way of refactoring an overly complex query/method into smaller and easier to understand pieces.
unnest is an array function which unpacks an array into individual rows. So if you say unnest(ARRAY[1,2]), you get two rows back: 1 and 2.
VALUES in PostgreSQL is used to, more or less, generate inlined constant tables. You can use VALUES anywhere you could use a normal table, it isn't just some syntax that you throw in an INSERT to tell the database what values to insert. That means that you can say things like this:
select * from (values (1), (2)) as dt
and get the rows 1 and 2 out. Throwing that VALUES into a CTE makes things nice and readable and makes it look like any old table in the final query.

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only rows that have been changed in the source tables since the last extraction. Then I do the transformation only for this rows. The problem is when I have for example 2 tables which I want to join into one dimension, and only one of them has changed. For example I have table Countries and Towns as following:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now lets say a new row is added to Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the row from Towns table. The problem is when I do the join between Countries and Towns, there is no row in Countries change set, so the join will result in empty set.
Do you have an idea how to solve it? Of course there might be more difficult cases, involving 3 and more tables, and consequential joins.
This is a typical problem found when doing Realtime Change-Data-Capture, or even Incremental-only daily changes.
There's multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
There is allot i could babble on for more information on but i will be specific to what is in your question. I would suggest the following to get the results...
1st Pass is where everything matches via the join...
Union All
2nd Pass Gets all towns where there isn't a country
(left outer join with a where condition that
requires the ID in the countries table to be null/missing).
You would default the Country ID value in that unmatched join to something designated as a "Unmatched Value" typically 0 or -1 is used or a series of standard -negative numbers that you could assign descriptions to later to identify why data is bad for your example -1 could be "Found Town Without Country".

SQL Syntax Challenge

I have two tables, one containing a list of different options users can select from. For example:
tbl_options
id_option
option
The next table I use to store which of these options the user selects. For example:
tbl_selected
id_selected
id_option
id_user
I use PHP to loop through the tbl_options table to generate a full list of checkboxes that the user can select from. When a user selects an option, the id_option and id_user are stored in the tbl_selected table. When a user deselects an option, the id_selected record is deleted from the tbl_selected table.
The challenge I am having is the best way to retrieve the full list of options in tbl_options, plus having the query indicate the associated records stored in the tbl_selected table.
I've tried LEFT JOIN'ing tbl_options to tbl_selected which provides me with the full list of options, but as soon as I add the WHERE id_user = ### the query only returns those records with values in tbl_selected. Ideally, I would like to see the results from a query as follows:
id_option option id_user
1 Apples 3
2 Oranges 3
3 Bananas
4 Pears
5 Peaches 3
This would indicate that user #3 has stored Apples, Oranges and Peaches. This also indicates that user #3 has not selected Bananas or Pears.
Is this possible using a SQL statement or should I pursue a different technique?
Your problem is that the user-restriction is applied to the whole query. To apply it only to the Join condition you need to add it to the ON clause like this:
select o.id_option, o.[option], s.id_user
from tbl_options o
left outer join tbl_selected s
on o.id_option = s.id_option and s.id_user = 3

Create Select clause within Stored Procedure?

I have two tables coming which has data from two different systems. I need to reconcile the data in these two tables
The column mapping needs to be made configurable.
E.g.:
Table A Table B
Col1A, Col2A Col1B, Col2B
MappingTable
Col1A = Col1B
Col2A= Col2B
So I would need to have two result sets like this based on the mappings in the table.This needs to be decided dynamically. i.e. Select _____ from A. The _____ needs to be filled dynamically.
Select Col1A, Col2A from A
Select Col2B, Col1B from B
Is this possible in MySQL?
Use a UNION clause if the column sets are similar (if they're not you won't be able to combine their results in a meaningful way, anyway).
select col1a, col2a from A
union
select col1b, col2b from b

Resources