Using distinct in a join - join

I'm still a novice at SQL and I need to run a report which JOINs 3 tables. The third table has duplicates of fields I need. So I tried to join with a distinct option but hat didn't work. Can anyone suggest the right code I could use?
My Code looks like this:
SELECT
C.CUSTOMER_CODE
, MS.SALESMAN_NAME
, SUM(C.REVENUE_AMT)
FROM C_REVENUE_ANALYSIS C
JOIN M_CUSTOMER MC ON C.CUSTOMER_CODE = MC.CUSTOMER_CODE
/* This following JOIN is the issue. */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE = (SELECT SALESMAN_CODE FROM M_SALESMAN WHERE COMP_CODE = '00')
WHERE REVENUE_DATE >= :from_date
AND REVENUE_DATE <= :to_date
GROUP BY C.CUSTOMER_CODE, MS.SALESMAN_NAME
I also tried a different variation to get a DISTINCT.
/* I also tried this variation to get a distinct */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE =
(SELECT distinct(SALESMAN_CODE) FROM M_SALESMAN)
Please can anyone help? I would truly appreciate it.
Thanks in advance.

select distinct
c.customer_code,
ms.salesman_code,
SUM(c.revenue_amt)
FROM
c_revenue c,
m_customer mc,
m_salesman ms
where
c.customer_code = mc.customer_code
AND mc.salesman_code = ms.salesman_code
AND ms.comp_code = '00'
AND Revenue_Date BETWEEN (from_date AND to_date)
group by
c.customer_code, ms.salesman_name
The above will return you any distinct combination of Customer Code, Salesman Code and SUM of Revenue Amount where the c.CustomerCode matches an mc.customer_code AND that same mc record matches an ms.salesman_code AND that ms record has a comp_code of '00' AND the Revenue_Date is between the from and to variables. Then, the whole result will be grouped by customer code and salesman name; the only thing that will cause duplicates to appear is if the SUM(revenue) is somehow different.
To explain, if you're just doing a straight JOIN, you don't need the JOIN keywords. I find it tends to convolute things; you only need them if you're doing an "odd" join, like an LEFT/RIGHT join. I don't know your data model so the above MIGHT still return duplicates but, if so, let me know.

Related

Rails: Collect records whose join tables appear in two queries

There are three models that matter here: Objective, Student, and Seminar. All are associated with has_and_belongs_to_many.
There is an ObjectiveStudent join model that includes columns "ready" and "points_all_time". There is an ObjectiveSeminar join model that includes column "priority".
I need to collect all of the objectives that are associated with a given student and also with a given seminar.
They need to also be marked with a "priority" above zero in the seminar. So I think I need this line:
obj_sems = ObjectiveSeminar.where(:seminar => given_seminar).where("priority > ?", 0)
Finally, they need to also be objectives where the student is ready, but has not scored above 7. So I think I need this line:
obj_studs = ObjectiveStudent.where(:user => given_student, :ready => true).where("points_all_time <= ?", 7)
Is there a way to gather all the objectives whose join table records appear in both of the above queries? Note that neither of the lists return objectives; they return objective_seminars, and objective_students, respectively. My end goal is to collect the objectives that meet all of the above criteria.
Or am I approaching this all wrong?
Bonus question: I would also love to sort the objectives by their priority in the given seminar. But I'm afraid that would add too much to the database load. What are your thoughts on this?
Thank you in advance for any insight.
In order to get Objectives you'll need to start your query from that.
In order to query with an AND condition the associated tables, you'll need inner joins with these tables.
Finally you'll need a distinct operator to only fetch each objective once.
The extended version of what (I think) you need is:
Objective.joins(objective_seminars: :seminar, objective_student: :student).
where(seminars: seminar_search_params, strudents: student_search_params).
where('objective_seminars.priority > 0').
where('objective_students.ready = 1 AND points_all_time <= 7').
order('objective_seminars.priority ASC').
distinct
Now for the database load it all depends on your indexes and the size of your tables.
The above query will translate to the following SQL (or something similar).
SELECT DISTINCT objectives.* FROM objectives
INNER JOIN objective_students ON objective_students.objective_id = objectives.id
INNER JOIN students ON students.id = objective_students.student_id
INNER JOIN objective_seminars ON objective_seminars.objective_id = objectives.id
INNER JOIN seminars ON seminars.id = objective_seminars.seminar_id
WHERE seminars_query AND
students_query AND
objective_seminars.priority > 0 AND
objective_students.ready = 1 AND points_all_time <= 7 AND
objective_seminars.priority ASC
So you'll need to add or extend your indexes so that all 5 tables queries can have an index helping out. The actual index implementation is up to you and depends on your application's specific (read - write load, tables size, cardinality etc)

Second foreign_key to speed up query

Say you creating an imdb type site for TV Shows. You have a Show with many attached episodes and a bunch of people
Right now I link people to episodes though a contribution table - but if I want to make a list of all the shows they are on, I have to go through episodes.
Since this query takes a long time I was thinking about adding show_id to the contributions table. Is this common practice to increase performance or is there another way I haven't thought of?
Since this query takes a long time
Have you run a SQL explain plan to show why this is the case? What is the actual SQL query that is being run, and are you doing things like ordering or running subqueries within it?
If I understand your structure it is something like this:
|people| n---1 |contribution| 1---n |episodes| n---1 |shows|
A sql select of the sort:
select distinct s.name
from shows s,
episodes e,
contribution c
where c.people_id = <id>
and c.episode_id = e.id
and e.show_id = s.id
should really not have performance issues unless there are no indexes on the tables or the tables are massive.
Here's a way using where id in ( ... ) to select all shows a specific person appeared in
Shows.where(id: Contribution.select("show_id")
.join(:episodes)
.where(person_id: personId)
.group("episodes.show_id"))
You may also want to try exists
Shows.where("EXISTS(SELECT 1 from contributions c
join episodes e on e.id = c.episode_id
where c.person_id = ? and e.show_id = shows.id)")

Issues with joining multiple tables

I'm really struggling at the moment trying to work out how to join multiple tables without duplicating data.
At the moment I have 8 tables that I was wanted to get various information from per member of staff like the below:
SDQ score, Goal scores, CHI score, number of appointments, number of dna appointments
The tables and field I can see to join are as follows
tblSDQ - Assessed_By_Staff_ID
tblGoals - Recorded_By_Staff_ID
tblCHI - Recorded_By_Staff_ID
tblReferral - Staff_ID
tblStaff - Staff_ID
tblDiaryAppointment - needs to connect to tblDiaryAppointmentClinician using Clinician_Invitee_Staff_ID
I hope someone can help or advice. I just don't know if it's even possible to join all these tables using the same field, or if its possible to join them but then return a number of entries but then just count others?
Syntax depends on a rdbms you are using.
You could use join with specified join fields from both tables:
select bla-bla
from table1
join table2 on ( table1.fileld_name1 = table2.fileld_name2 )
https://dev.mysql.com/doc/refman/5.0/en/join.html
if you need outer join (to show nulls for optional tables data) you could use this:
join table2 on ( table1.fileld_name1 = table2.fileld_name2 or table2.field_name2 is null )
to join with couns you could use subqueries like this
join ( select field_name3, coint(*) as cnt from table3 goup by field_name3 ) AS table3_counts
...
where ( table3_counts.field_name3 = ... or table3_counts.field_name3 is null )
https://dev.mysql.com/doc/refman/5.0/en/from-clause-subqueries.html
PS: Joins are often slow. It's better to denormalize tables to eliminate joins and gain performance. Or do simple selects and join in backend code.

SQL Multiple Joins on same table with where clause

Folks,
I've had a pretty thorough search before posting and couldn't see this answered anywhere previously. Perhaps it isn't possible.... I'm using SQL server 2008 R2
Anyway, thanks in advance for looking/helping.
I have two tables that I'd like to join.
Table1 (t1):
Account------Name--------Amount
12345-------account1-----10000.00
12346-------account2-----20000.00
Table2 (t2):
ID-----Account---extraData
10-----12345-----ZZ100
20-----12345-----ZZ250
30-----12345-----ZZ400
10-----12346-----ZZ150
20-----12346-----ZZ200
I'm trying to return the following from the above tables:
t1.Account---t1.Name------ID1(t2.ID=10)---ID2(td.ID=20)----SUM(Amount)
12345--------account1-------ZZ100------------ZZ250-------------10000.00
12346--------account2-------ZZ150------------ZZ200-------------20000.00
I have tried various joins of sorts and a union, but can't seem to get the results above. Most result in either nothing, or the Amount column returning as double the required result.
My starting point is:
Select t1.Account, t1.Name, t2A.extraData, t2B.extraData, SUM(t1.AMOUNT)
from table1 t1
join table2 t2A on t1.Account = t2A.Account and t2A.ID = '10'
join table2 t2B on t1.Account = t2B.Account and t2B.ID = '20'
Group by t1.Account, t1.Name, t2A.extraData, t2B.extraData
I've reduced the code and complexity of the query for this thread, but the problem is as above. I have no control over the table structure as they form part of an accounting system that I can't amend (I could, but I'd upset one or two people!).
Hopefully I've explained the issue clearly enough. It seems like it should be simple, but I can't seem to fathom it - perhaps I've just been staring too long. Anyway, thanks in advance for your assistance.
Edit: to change the code to reflect the first response highlighting a mistake in my posting.
Please try this. I think this helps you to achieve your result.
DECLARE #ids varchar(max)
SELECT #ids=STUFF((SELECT DISTINCT ', [' + CAST(ID AS VARCHAR(10))+']'
FROM t2
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,2,' ')
SELECT #ids
EXECUTE ('SELECT
Account,Name,'+#ids+',Amount
FROM
(SELECT t1.Account,Name,ID,ExtraData,SUM(Amount) AS Amount
FROM t1 t1 INNER JOIN t2 t2 ON t1.Account=t2.Account
GROUP BY t1.Account,Name,ID,ExtraData) AS SourceTable
PIVOT
(
MAX(ExtraData)
FOR ID IN ('+#ids+')
) AS PivotTable;')

Trying to create one table from four

I'm stuck trying to create a query that pulls results from at least three different tables with many to many relationships.
I want to end up with a table that lists cases, the outcomes and complaints.
All cases may have none, one or multiple outcomes, same relationship applies to the complaints. I want to be able to have the case listed once, then subsequent columns to list all the outcomes and complaints related to that case.
I have tried GROUP_CONCAT to get the outcomes in one column instead of repeating the cases but when I use UNION to combine the outcomes and complaints one column header overwrites the other.
Any help appreciated and here's the link to the fiddle http://sqlfiddle.com/#!2/d111e/2/0
I suggest you START with this this query structure:
SELECT
c.caseID, c.caseTitle, c.caseSynopsis /* if more columns ... add to group by also */
, group_concat(co.concern)
, group_concat(re.resultText)
FROM caseSummaries AS c
LEFT JOIN JNCT_CONCERNS_CASESUMMARY AS JCC ON c.caseID = JCC.caseSummary_FK
LEFT JOIN CONCERNS AS co ON JCC.concerns_FK = co.concernsID
LEFT JOIN JNCT_RESULT_CASESUMMARY AS JRC ON c.caseID = JRC.caseSummary_FK
LEFT JOIN RESULTS AS re ON JRC.result_FK = re.result_ID
GROUP BY
c.caseID, c.caseTitle, c.caseSynopsis /* add more ... here also */
;
Treat the table caseSummaries as the most important and then everything else "hangs off" that.
Please note that although MySQL will allow it, you should place EVERY non-aggregating column that you include in the select clause into the group by clause also.
also see: http://sqlfiddle.com/#!2/2d1a79/7

Resources