I'm stuck trying to create a query that pulls results from at least three different tables with many-to-many relationships.
I want to end up with a table that lists cases, their outcomes and their complaints.
Each case may have none, one or multiple outcomes, and the same applies to complaints. I want each case listed once, with subsequent columns listing all the outcomes and complaints related to that case.
I have tried GROUP_CONCAT to get the outcomes into one column instead of repeating the cases, but when I use UNION to combine the outcomes and complaints, one column header overwrites the other.
Any help appreciated and here's the link to the fiddle http://sqlfiddle.com/#!2/d111e/2/0
I suggest you START with this query structure:
SELECT
c.caseID, c.caseTitle, c.caseSynopsis /* if you add more columns here, add them to the GROUP BY as well */
, GROUP_CONCAT(DISTINCT co.concern) AS concerns /* DISTINCT: both lists are joined to the same case row, so rows multiply */
, GROUP_CONCAT(DISTINCT re.resultText) AS results
FROM caseSummaries AS c
LEFT JOIN JNCT_CONCERNS_CASESUMMARY AS JCC ON c.caseID = JCC.caseSummary_FK
LEFT JOIN CONCERNS AS co ON JCC.concerns_FK = co.concernsID
LEFT JOIN JNCT_RESULT_CASESUMMARY AS JRC ON c.caseID = JRC.caseSummary_FK
LEFT JOIN RESULTS AS re ON JRC.result_FK = re.result_ID
GROUP BY
c.caseID, c.caseTitle, c.caseSynopsis /* add more ... here also */
;
Treat caseSummaries as the driving table; everything else "hangs off" it through LEFT JOINs, so cases with no concerns or results are still listed.
Please note that although MySQL will let you omit them, you should place EVERY non-aggregating column that you include in the SELECT clause into the GROUP BY clause as well.
Also see: http://sqlfiddle.com/#!2/2d1a79/7
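Because both junction tables are joined to the same case row, a case with 2 concerns and 3 results produces 2 × 3 = 6 joined rows before grouping, so a plain GROUP_CONCAT repeats each value. A sketch using Python's sqlite3 (SQLite also has group_concat) on a made-up miniature of the schema compares that with aggregating each list in its own derived table before joining. Table and column names are copied from the query above; the data and the reduced column set are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE caseSummaries (caseID INTEGER PRIMARY KEY, caseTitle TEXT);
CREATE TABLE CONCERNS (concernsID INTEGER PRIMARY KEY, concern TEXT);
CREATE TABLE RESULTS (result_ID INTEGER PRIMARY KEY, resultText TEXT);
CREATE TABLE JNCT_CONCERNS_CASESUMMARY (caseSummary_FK INTEGER, concerns_FK INTEGER);
CREATE TABLE JNCT_RESULT_CASESUMMARY (caseSummary_FK INTEGER, result_FK INTEGER);
INSERT INTO caseSummaries VALUES (1, 'Case A'), (2, 'Case B');
INSERT INTO CONCERNS VALUES (1, 'noise'), (2, 'delay');
INSERT INTO RESULTS VALUES (1, 'refund'), (2, 'apology'), (3, 'repair');
-- Case A has 2 concerns and 3 results; Case B has neither.
INSERT INTO JNCT_CONCERNS_CASESUMMARY VALUES (1, 1), (1, 2);
INSERT INTO JNCT_RESULT_CASESUMMARY VALUES (1, 1), (1, 2), (1, 3);
"""

# Both lists joined directly: rows multiply (concerns x results) before grouping.
NAIVE = """
SELECT c.caseID, group_concat(co.concern), group_concat(re.resultText)
FROM caseSummaries AS c
LEFT JOIN JNCT_CONCERNS_CASESUMMARY AS jcc ON c.caseID = jcc.caseSummary_FK
LEFT JOIN CONCERNS AS co ON jcc.concerns_FK = co.concernsID
LEFT JOIN JNCT_RESULT_CASESUMMARY AS jrc ON c.caseID = jrc.caseSummary_FK
LEFT JOIN RESULTS AS re ON jrc.result_FK = re.result_ID
GROUP BY c.caseID
ORDER BY c.caseID
"""

# Each list is aggregated per case in its own derived table, then joined 1:1.
PRE_AGGREGATED = """
SELECT c.caseID, co.concerns, re.results
FROM caseSummaries AS c
LEFT JOIN (SELECT j.caseSummary_FK AS caseID, group_concat(x.concern) AS concerns
           FROM JNCT_CONCERNS_CASESUMMARY AS j
           JOIN CONCERNS AS x ON j.concerns_FK = x.concernsID
           GROUP BY j.caseSummary_FK) AS co ON co.caseID = c.caseID
LEFT JOIN (SELECT j.caseSummary_FK AS caseID, group_concat(x.resultText) AS results
           FROM JNCT_RESULT_CASESUMMARY AS j
           JOIN RESULTS AS x ON j.result_FK = x.result_ID
           GROUP BY j.caseSummary_FK) AS re ON re.caseID = c.caseID
ORDER BY c.caseID
"""


def run_both():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    naive = con.execute(NAIVE).fetchall()
    fixed = con.execute(PRE_AGGREGATED).fetchall()
    con.close()
    return naive, fixed
```

In the naive result, Case A's concern list has six entries; in the pre-aggregated one it has two. The derived-table form should carry over to MySQL largely unchanged, and unlike GROUP_CONCAT(DISTINCT ...) it keeps two different links that happen to share the same text.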
I recently started using SAS and have only had basic training, which didn't cover PROC SQL. I'd like to read up a bit more on SAS SQL when I have the time.
For now, I've found a solution to what I wanted to do, but I'm having difficulty understanding what is happening.
My issue started when I wanted to find out which subjects in my dataset have a certain value for all their records. I reused a snippet of code I had written previously and thought I understood, adding a couple more variables and GROUP BY clauses:
data have;
input subject:$1. myvar:1. mycount:1.;
datalines;
a 1 1
a 0 2
a 0 3
b 1 1
b 0 2
b 1 3
c 1 1
c 1 2 /*This subject has myvar = 1 for all its observations*/
;
run;
*find subjects;
proc sql;
create table want as
/* select*/
/* distinct x.subject */
/* from */
(select distinct subject, count(myvar) as myvar_c
from have where myvar = 1 group by subject) x,
(select distinct subject, max(mycount) as max_c
from have group by subject) y
where x.subject = y.subject and x.myvar_c = y.max_c;
quit;
When I restore (uncomment) the commented-out 'select distinct x.subject from' lines in the CREATE TABLE statement, the above code works as it should.
However, I've previously also created another piece of code, to select all subjects in my dataset that have two types of records:
data have2;
input subject:$1. mytype:1.;
datalines;
a 1
a 0
a 0
b 1
b 0
b 1
c 1
c 1 /*This subject doesn't have two types of records in all its observations*/
;
run;
*Find subjects;
proc sql;
create table want2 as select
distinct x.subject from
have2 x,
(select distinct subject, count(distinct mytype) as mytype_c from have2 group by subject) y
where y.mytype_c = 2 and x.subject = y.subject;
quit;
This is similar, but it didn't require the additional SELECT statement: the first code has three SELECT statements, while the second only needs two.
Can someone explain exactly why this is required?
Or point me to some good documentation that describes these types of joins? Can anyone also tell me the specific name of this type of join, where you only use a comma?
While writing this, I also realised I could have taken the code I initially wrote to find subjects that have only one type of record and tweaked it for my current issue >.< but I would still like to know what is happening in the first example.
The SQL join construct
FROM ONE, TWO, THREE, …
is known as a CROSS JOIN and is a join without criteria. The comma (,) syntax is less prevalent today, and the following construct is recommended:
FROM ONE
CROSS JOIN TWO
CROSS JOIN THREE
The result set is a cartesian product and the number of rows is the product of the number of rows in the cross joined tables.
When the query adds join criteria in the WHERE clause, the result is the same as an INNER JOIN on those criteria.
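A quick way to convince yourself of both statements is a sketch in Python's sqlite3 (SQLite standing in for SAS here; the tables one and two and their rows are made up): the comma join and CROSS JOIN return the same 3 × 2 = 6 rows, and adding a WHERE condition gives the same rows as INNER JOIN ... ON.

```python
import sqlite3


def compare_joins():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE one (k INTEGER, a TEXT);
        CREATE TABLE two (k INTEGER, b TEXT);
        INSERT INTO one VALUES (1, 'x'), (2, 'y'), (3, 'z');
        INSERT INTO two VALUES (1, 'p'), (2, 'q');
    """)
    # Cartesian product written two ways.
    comma = con.execute("SELECT one.k, a, b FROM one, two").fetchall()
    cross = con.execute("SELECT one.k, a, b FROM one CROSS JOIN two").fetchall()
    # Same product filtered by a WHERE condition vs. an explicit INNER JOIN.
    comma_where = con.execute(
        "SELECT one.k, a, b FROM one, two WHERE one.k = two.k ORDER BY one.k"
    ).fetchall()
    inner = con.execute(
        "SELECT one.k, a, b FROM one INNER JOIN two ON one.k = two.k ORDER BY one.k"
    ).fetchall()
    con.close()
    return comma, cross, comma_where, inner
```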
The SAS documentation for PROC SQL is a good starting point and includes examples; the relevant page is the joined-table component:
joined-table Component
Joins a table with itself or with other tables or views.
…
Table of Contents
Syntax
Required Arguments
Optional Argument
Details
Types of Joins
Joining Tables
Table Limit
Specifying the Rows to Be Returned
Table Aliases
Joining a Table with Itself
Inner Joins
Outer Joins
Cross Joins
Union Joins
Natural Joins
Joining More Than Two Tables
Comparison of Joins and Subqueries
General tip:
If you want to fool around (fiddle) with SQL queries in a browser, try visiting the SQL Fiddle website.
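As for why the first example needs the extra SELECT: as far as I can tell, CREATE TABLE ... AS must be followed by a query, i.e. something starting with SELECT. The two parenthesised subqueries x and y are in-line views (derived tables), and in-line views can only appear in the FROM clause of a SELECT; the outer `select distinct x.subject from` is that SELECT, and the comma between x and y is the cross join described above, filtered by the WHERE. In the second example the outer SELECT was already there, and only one in-line view was needed because have2 itself is the other joined table. The sketch below reproduces this with Python's sqlite3 instead of SAS (an assumption: SQLite's grammar is stricter in places, but it rejects the form without an outer SELECT for the same reason). It also shows a different technique, GROUP BY with HAVING min(myvar) = 1, which answers "all records have myvar = 1" without relying on mycount numbering the records 1..n.

```python
import sqlite3

# Same rows as the `have` data step in the question.
HAVE = [("a", 1, 1), ("a", 0, 2), ("a", 0, 3),
        ("b", 1, 1), ("b", 0, 2), ("b", 1, 3),
        ("c", 1, 1), ("c", 1, 2)]


def find_subjects():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE have (subject TEXT, myvar INTEGER, mycount INTEGER)")
    con.executemany("INSERT INTO have VALUES (?, ?, ?)", HAVE)

    # Without an outer SELECT the statement is not a valid query and is rejected.
    try:
        con.execute("""
            CREATE TABLE want_broken AS
            (SELECT subject, count(myvar) AS myvar_c FROM have WHERE myvar = 1 GROUP BY subject) x,
            (SELECT subject, max(mycount) AS max_c FROM have GROUP BY subject) y
            WHERE x.subject = y.subject AND x.myvar_c = y.max_c
        """)
        rejected = False
    except sqlite3.OperationalError:
        rejected = True

    # Working form: the outer SELECT ... FROM hosts the two in-line views.
    con.execute("""
        CREATE TABLE want AS
        SELECT DISTINCT x.subject
        FROM (SELECT subject, count(myvar) AS myvar_c
              FROM have WHERE myvar = 1 GROUP BY subject) AS x,
             (SELECT subject, max(mycount) AS max_c
              FROM have GROUP BY subject) AS y
        WHERE x.subject = y.subject AND x.myvar_c = y.max_c
    """)
    want = [r[0] for r in con.execute("SELECT subject FROM want ORDER BY subject")]

    # Alternative technique: subjects whose smallest myvar is 1.
    alt = [r[0] for r in con.execute(
        "SELECT subject FROM have GROUP BY subject HAVING min(myvar) = 1 ORDER BY subject")]
    con.close()
    return rejected, want, alt
```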
There are three models that matter here: Objective, Student, and Seminar. All are associated with has_and_belongs_to_many.
There is an ObjectiveStudent join model that includes columns "ready" and "points_all_time". There is an ObjectiveSeminar join model that includes column "priority".
I need to collect all of the objectives that are associated with a given student and also with a given seminar.
They also need to be marked with a "priority" above zero in the seminar. So I think I need this line:
obj_sems = ObjectiveSeminar.where(:seminar => given_seminar).where("priority > ?", 0)
Finally, they also need to be objectives where the student is ready but has not scored above 7. So I think I need this line:
obj_studs = ObjectiveStudent.where(:user => given_student, :ready => true).where("points_all_time <= ?", 7)
Is there a way to gather all the objectives whose join table records appear in both of the above queries? Note that neither of the lists return objectives; they return objective_seminars, and objective_students, respectively. My end goal is to collect the objectives that meet all of the above criteria.
Or am I approaching this all wrong?
Bonus question: I would also love to sort the objectives by their priority in the given seminar. But I'm afraid that would add too much to the database load. What are your thoughts on this?
Thank you in advance for any insight.
To get Objectives, you need to start your query from the Objective model.
To apply AND conditions across the associated tables, you need inner joins with those tables.
Finally, you need a distinct operator so that each objective is fetched only once.
The extended version of what (I think) you need is the following (the priority column is selected explicitly because, with DISTINCT, PostgreSQL for one requires ORDER BY columns to appear in the select list):
Objective.joins(objective_seminars: :seminar, objective_students: :student).
where(seminars: seminar_search_params, students: student_search_params).
where('objective_seminars.priority > 0').
where(objective_students: { ready: true }).
where('objective_students.points_all_time <= 7').
select('objectives.*, objective_seminars.priority').
order('objective_seminars.priority ASC').
distinct
As for the database load, it all depends on your indexes and the size of your tables.
The above query will translate to the following SQL (or something similar):
SELECT DISTINCT objectives.*, objective_seminars.priority FROM objectives
INNER JOIN objective_students ON objective_students.objective_id = objectives.id
INNER JOIN students ON students.id = objective_students.student_id
INNER JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
INNER JOIN seminars ON seminars.id = objective_seminars.seminar_id
WHERE seminars_query AND
students_query AND
objective_seminars.priority > 0 AND
objective_students.ready = TRUE AND
objective_students.points_all_time <= 7
ORDER BY objective_seminars.priority ASC
So you'll need to add or extend your indexes so that the lookups on all 5 tables are helped by an index. The actual index design is up to you and depends on your application's specifics (read/write load, table sizes, cardinality, etc.).
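To check that the joined query really returns the "intersection" the question asks for (objectives that pass both the seminar filter and the student filter), here is a sketch with Python's sqlite3. Table and column names follow the SQL above; the students and seminars tables are left out and filtered by id instead, ready is stored as 0/1 because SQLite has no boolean type, and all rows are invented.

```python
import sqlite3

SETUP = """
CREATE TABLE objectives (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE objective_seminars (objective_id INTEGER, seminar_id INTEGER, priority INTEGER);
CREATE TABLE objective_students (objective_id INTEGER, student_id INTEGER,
                                 ready INTEGER, points_all_time INTEGER);
INSERT INTO objectives VALUES (1, 'o1'), (2, 'o2'), (3, 'o3'), (4, 'o4'), (5, 'o5');
INSERT INTO objective_seminars VALUES
  (1, 10, 2), (2, 10, 0), (3, 10, 1), (4, 10, 3), (5, 10, 5), (2, 20, 4);
INSERT INTO objective_students VALUES
  (1, 100, 1, 5), (2, 100, 1, 3), (3, 100, 1, 7), (4, 100, 0, 2), (5, 100, 1, 9),
  (4, 200, 1, 1);
"""

QUERY = """
SELECT DISTINCT objectives.id, objectives.name, objective_seminars.priority
FROM objectives
JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
JOIN objective_students ON objective_students.objective_id = objectives.id
WHERE objective_seminars.seminar_id = :seminar_id
  AND objective_students.student_id = :student_id
  AND objective_seminars.priority > 0
  AND objective_students.ready = 1
  AND objective_students.points_all_time <= 7
ORDER BY objective_seminars.priority ASC
"""


def objectives_for(seminar_id, student_id):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY, {"seminar_id": seminar_id, "student_id": student_id}).fetchall()
    con.close()
    # Objective ids, lowest seminar priority first.
    return [row[0] for row in rows]
```

For seminar 10 and student 100, objective 2 drops out (priority 0), 4 (not ready) and 5 (9 points), leaving 3 and 1 in priority order.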
I'm still a novice at SQL and I need to run a report which JOINs 3 tables. The third table has duplicates of fields I need, so I tried to join with a DISTINCT option, but that didn't work. Can anyone suggest the right code I could use?
My code looks like this:
SELECT
C.CUSTOMER_CODE
, MS.SALESMAN_NAME
, SUM(C.REVENUE_AMT)
FROM C_REVENUE_ANALYSIS C
JOIN M_CUSTOMER MC ON C.CUSTOMER_CODE = MC.CUSTOMER_CODE
/* This following JOIN is the issue. */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE = (SELECT SALESMAN_CODE FROM M_SALESMAN WHERE COMP_CODE = '00')
WHERE REVENUE_DATE >= :from_date
AND REVENUE_DATE <= :to_date
GROUP BY C.CUSTOMER_CODE, MS.SALESMAN_NAME
I also tried a different variation to get a DISTINCT.
/* I also tried this variation to get a distinct */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE =
(SELECT distinct(SALESMAN_CODE) FROM M_SALESMAN)
Please can anyone help? I would truly appreciate it.
Thanks in advance.
select
c.customer_code,
ms.salesman_name,
SUM(c.revenue_amt) AS revenue
FROM
c_revenue_analysis c,
m_customer mc,
m_salesman ms
where
c.customer_code = mc.customer_code
AND mc.salesman_code = ms.salesman_code
AND ms.comp_code = '00'
AND c.revenue_date BETWEEN :from_date AND :to_date
group by
c.customer_code, ms.salesman_name
The above returns one row per combination of customer code and salesman name, with the summed revenue amount, where c.customer_code matches an mc.customer_code AND that same mc record matches an ms.salesman_code AND that ms record has a comp_code of '00' AND the revenue date is between the from and to dates. Because of the GROUP BY, each customer/salesman pair appears only once, so DISTINCT is not needed. The key difference from your version is that M_SALESMAN is joined on mc.salesman_code = ms.salesman_code; your ON clause compared MC.SALESMAN_CODE with a subquery and never referenced MS, so every M_SALESMAN row was joined to every matching customer row.
To explain, if you're just doing a straight (inner) JOIN, you don't need the JOIN keyword: list the tables with commas and put the join conditions in the WHERE clause. I find the JOIN syntax tends to convolute things; you only need it if you're doing an "odd" join, like a LEFT/RIGHT join. I don't know your data model, so the above MIGHT still double-count: if M_SALESMAN has more than one row per salesman_code with comp_code '00', each revenue row is summed once per matching salesman row. If so, let me know.
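A sketch with Python's sqlite3 (invented rows; dates stored as ISO-8601 text so BETWEEN compares them correctly; lower-case table names taken from the question) runs the grouped query with named :from_date/:to_date parameters. It also shows how a second M_SALESMAN row for the same salesman code doubles that customer's total, which is the kind of duplication to look for in the real data.

```python
import sqlite3

SETUP = """
CREATE TABLE c_revenue_analysis (customer_code TEXT, revenue_amt INTEGER, revenue_date TEXT);
CREATE TABLE m_customer (customer_code TEXT, salesman_code TEXT);
CREATE TABLE m_salesman (salesman_code TEXT, salesman_name TEXT, comp_code TEXT);
INSERT INTO c_revenue_analysis VALUES
  ('C1', 100, '2024-01-05'), ('C1', 50, '2024-01-20'),
  ('C2', 70, '2024-01-10'), ('C2', 30, '2024-03-01'),
  ('C3', 40, '2024-01-15');
INSERT INTO m_customer VALUES ('C1', 'S1'), ('C2', 'S2'), ('C3', 'S3');
INSERT INTO m_salesman VALUES ('S1', 'Alice', '00'), ('S2', 'Bob', '00'), ('S3', 'Carol', '01');
"""

QUERY = """
SELECT c.customer_code, ms.salesman_name, SUM(c.revenue_amt) AS revenue
FROM c_revenue_analysis c, m_customer mc, m_salesman ms
WHERE c.customer_code = mc.customer_code
  AND mc.salesman_code = ms.salesman_code
  AND ms.comp_code = '00'
  AND c.revenue_date BETWEEN :from_date AND :to_date
GROUP BY c.customer_code, ms.salesman_name
ORDER BY c.customer_code
"""


def revenue_report(duplicate_salesman=False):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    if duplicate_salesman:
        # A second M_SALESMAN row for S1: every C1 revenue row now joins twice.
        con.execute("INSERT INTO m_salesman VALUES ('S1', 'Alice', '00')")
    rows = con.execute(QUERY, {"from_date": "2024-01-01", "to_date": "2024-01-31"}).fetchall()
    con.close()
    return rows
```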
I'd like to have a basic table summing up the number of occurrences of values inside arrays.
My app is a daily deal app built to learn more Ruby on Rails.
I have a model Deal, which has an attribute called deal_goal. It's a multiple select which is serialized in an array.
Here is the deal_goal column taken from schema.rb:
t.string "deal_goal", :array => true
So deal A can have deal_goal = [traffic, qualification] and another deal can have deal_goal = [branding, traffic, acquisition].
What I'd like to build is a table in my dashboard that takes each type of goal (each value in the arrays) and counts the number of deals whose deal_goal array contains that goal.
My objective is to have a table with the columns Goal, number of deals with this goal, and share of deals with this goal.
How can I achieve this? I think I would need to group each deal_goal array for each type of value and then count the number of times where this goals appears in the arrays. I'm quite new to RoR and can't manage to do it.
Here is my code so far:
column do
  panel "Top of Goals" do
    table_for Deal.limit(10) do
      column "Goal", :deal_goal # ???? (I want one row per goal, not per deal)
      # add 2 columns:
      #   'nb of deals with this goal'
      #   'Share of deals with this goal'
    end
  end
end
Any help would be much appreciated!
I can't think of any clean way to get the results you're after through ActiveRecord but it is pretty easy in SQL.
All you're really trying to do is open up the deal_goal arrays and build a histogram based on the opened arrays. You can express that directly in SQL this way:
with expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select goal, count(*) n
from expanded_deals
group by goal
And if you want to include all four goals even if they don't appear in any of the deal_goals then just toss in a LEFT JOIN to say so:
with
all_goals(goal) as (
values ('traffic'),
('acquisition'),
('branding'),
('qualification')
),
expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select all_goals.goal goal,
count(expanded_deals.id) n
from all_goals
left join expanded_deals using (goal)
group by all_goals.goal
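If you would rather compute the table in application code, the same "unnest, then histogram" idea fits in a few lines. The sketch below is in Python (standing in for the Ruby you would write in the dashboard; deal_goals is a hypothetical list of each deal's goal array) and also adds the share column the question asks for. One deliberate difference: each deal's goals go through set(), so a deal that lists a goal twice is still counted once, whereas count(*) in the SQL would count it twice. Loading every deal into memory is only reasonable for small tables; otherwise the SQL version is the better fit.

```python
from collections import Counter

ALL_GOALS = ["traffic", "acquisition", "branding", "qualification"]


def goal_histogram(deal_goals, all_goals=ALL_GOALS):
    """Return (goal, number of deals with goal, share of deals) per goal."""
    # "Unnest" every deal's array into one stream of goals and count them,
    # counting each goal at most once per deal.
    counts = Counter(goal for goals in deal_goals for goal in set(goals))
    total_deals = len(deal_goals)
    rows = []
    # Like the LEFT JOIN from all_goals: goals no deal uses still get a 0 row.
    for goal in all_goals:
        n = counts.get(goal, 0)
        share = n / total_deals if total_deals else 0.0
        rows.append((goal, n, share))
    return rows
```

For example, `goal_histogram([["traffic", "qualification"], ["branding", "traffic", "acquisition"]])` gives traffic in both deals (share 1.0) and each other goal in one (share 0.5).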
SQL Demo: http://sqlfiddle.com/#!15/3f0af/20
Throw one of those into a select_rows call and you'll get your data:
Deal.connection.select_rows(%q{ SQL goes here }).each do |row|
goal = row.first
n = row.last.to_i
#....
end
There's probably a lot going on here that you're not familiar with so I'll explain a little.
First of all, I'm using WITH and Common Table Expressions (CTE) to simplify the SELECTs. WITH is a standard SQL feature that allows you to produce SQL macros or inlined temporary tables of a sort. For the most part, you can take the CTE and drop it right in the query where its name is:
with some_cte(colname1, colname2, ...) as ( some_pile_of_complexity )
select * from some_cte
is like this:
select * from ( some_pile_of_complexity ) as some_cte(colname1, colname2, ...)
CTEs are the SQL way of refactoring an overly complex query/method into smaller and easier to understand pieces.
unnest is an array function which unpacks an array into individual rows. So if you say unnest(ARRAY[1,2]), you get two rows back: 1 and 2.
VALUES in PostgreSQL is used to, more or less, generate inlined constant tables. You can use VALUES anywhere you could use a normal table; it isn't just some syntax that you throw into an INSERT to tell the database what values to insert. That means that you can say things like this:
select * from (values (1), (2)) as dt
and get the rows 1 and 2 out. Throwing that VALUES into a CTE makes things nice and readable and makes it look like any old table in the final query.
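SQLite accepts the same WITH ... AS (VALUES ...) construct, so the all_goals part of the query can be tried out from Python's sqlite3. SQLite has no array type or unnest(), so this sketch assumes the deals have already been unnested into an ordinary expanded_deals table of (id, goal) rows; the rows are invented, and no deal uses 'acquisition' so the LEFT JOIN's zero count is visible.

```python
import sqlite3


def goal_counts():
    con = sqlite3.connect(":memory:")
    # Stand-in for the expanded_deals CTE: one row per (deal id, goal).
    con.executescript("""
        CREATE TABLE expanded_deals (id INTEGER, goal TEXT);
        INSERT INTO expanded_deals VALUES
            (1, 'traffic'), (1, 'qualification'),
            (2, 'branding'), (2, 'traffic');
    """)
    rows = con.execute("""
        WITH all_goals(goal) AS (
            VALUES ('traffic'), ('acquisition'), ('branding'), ('qualification')
        )
        SELECT all_goals.goal, count(expanded_deals.id) AS n
        FROM all_goals
        LEFT JOIN expanded_deals USING (goal)
        GROUP BY all_goals.goal
        ORDER BY all_goals.goal
    """).fetchall()
    # VALUES used directly as a table in FROM.
    constant_table = con.execute("SELECT * FROM (VALUES (1), (2))").fetchall()
    con.close()
    return rows, constant_table
```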
I have two tables - tool_downloads and tool_configurations. I am trying to retrieve the most recent build date for each tool in my database. The layout of the DB is simple. One table called tool_downloads keeps track of when a tool is downloaded. Another table is called tool_configurations and stores the actual data about the tool. They are linked together by the tool_conf_id.
If I run the following query which omits dates, I get back 200 records.
SELECT DISTINCT a.tool_conf_id, b.tool_conf_id
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
When I try to add in date information I get back hundreds of thousands of records! Here is the query that fails horribly.
SELECT DISTINCT a.tool_conf_id, max(a.configured_date) as config_date, b.configuration_name
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
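I know the problem has something to do with GROUP BYs/aggregate data and joins. I can't really search Google since I don't know the name of the problem I'm encountering. Any help would be appreciated.
The solution is to add a GROUP BY: max() is an aggregate, so every non-aggregated column in the SELECT list must appear in the GROUP BY (and DISTINCT is no longer needed):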
SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) as config_date
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
GROUP BY b.tool_conf_id, b.configuration_name
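A small sketch with Python's sqlite3 and invented rows checks that the GROUP BY version returns exactly one row per tool with its latest configured_date. Dates are stored as ISO-8601 text (an assumption) so that max() on text picks the most recent one; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def latest_download_per_tool():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tool_configurations (tool_conf_id INTEGER PRIMARY KEY, configuration_name TEXT);
        CREATE TABLE tool_downloads (tool_conf_id INTEGER, configured_date TEXT);
        INSERT INTO tool_configurations VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO tool_downloads VALUES
            (1, '2024-01-01'), (1, '2024-02-01'),
            (2, '2024-01-15'), (2, '2023-12-31');
    """)
    rows = con.execute("""
        SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) AS config_date
        FROM tool_downloads a
        JOIN tool_configurations b ON a.tool_conf_id = b.tool_conf_id
        GROUP BY b.tool_conf_id, b.configuration_name
        ORDER BY b.tool_conf_id
    """).fetchall()
    con.close()
    return rows
```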