Preventing duplicate values in multiple joins of the same table


I have two tables in my project that I need to join together in a somewhat complicated way, and it is giving me very strange issues.
I have a concept of teams and a concept of FeedItems. A FeedItem means the team has solved a challenge. I need to know the last time each team solved a challenge, and I also need to calculate the sum of its point-based FeedItems.
SELECT COALESCE(sum(challenges.point_value), 0) + COALESCE(sum(point_feed_items.point_value), 0) as team_score,
       GREATEST(MAX(pentest_feed_items.created_at), MAX(point_feed_items.created_at)) as last_solve_time,
       teams.*
FROM "teams"
LEFT JOIN feed_items AS point_feed_items
  ON point_feed_items.team_id = teams.id
  AND point_feed_items.type IN ('StandardSolvedChallenge', 'ScoreAdjustment')
LEFT JOIN feed_items AS pentest_feed_items
  ON pentest_feed_items.team_id = teams.id
  AND pentest_feed_items.type IN ('PentestSolvedChallenge')
LEFT JOIN challenges
  ON challenges.id = point_feed_items.challenge_id
  AND challenges.type IN ('StandardChallenge')
WHERE "teams"."division_id" = $1
GROUP BY teams.id
ORDER BY "teams"."created_at" ASC
This works nearly all the time. I am just running into an edge case where the same ScoreAdjustment is sometimes counted more than once in the point_feed_items.point_value sum. I called COUNT(point_feed_items.point_value) and verified that I somehow had 3 elements coming back even though there should only be 1. So far I have been unable to figure out either why the same element sometimes comes back multiple times, or how to call DISTINCT as part of the LEFT JOIN to avoid the problem completely.
I did find that removing the 2nd LEFT JOIN fixes the issue; however, I need the data from that LEFT JOIN.
To put the issue another way, I replaced COALESCE(sum(point_feed_items.point_value), 0) with COALESCE(COUNT(point_feed_items.point_value), 0) and verified that, with no ScoreAdjustments in the database, it returned 0. I then created one ScoreAdjustment for the correct team, and COALESCE(COUNT(point_feed_items.point_value), 0) returned 3 instead of 1. Am I misunderstanding how LEFT JOIN AS works?
This is part of a Rails app; however, it is mostly written as a manual query for better performance.

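It turns out this was all due to a misunderstanding of how LEFT JOIN works when you do it multiple times: each additional join is applied to the rows produced so far, so every matching point_feed_items row is repeated once for every matching pentest_feed_items row.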
Given the following (simplified example) data:
I was thinking of the LEFT JOIN as looking something like this:
And it actually looked something like this:
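To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the PostgreSQL database from the question. The schema is cut down to the columns that matter, and the data (one team, one ScoreAdjustment, three PentestSolvedChallenge rows) is made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema and data: one team with a single
# ScoreAdjustment and three PentestSolvedChallenge feed items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY);
    CREATE TABLE feed_items (
        id INTEGER PRIMARY KEY,
        team_id INTEGER,
        type TEXT,
        point_value INTEGER
    );
    INSERT INTO teams (id) VALUES (1);
    INSERT INTO feed_items (team_id, type, point_value) VALUES
        (1, 'ScoreAdjustment', 10),
        (1, 'PentestSolvedChallenge', NULL),
        (1, 'PentestSolvedChallenge', NULL),
        (1, 'PentestSolvedChallenge', NULL);
""")

# Same join shape as the query in the question (the challenges join is left out).
FROM_CLAUSE = """
    FROM teams
    LEFT JOIN feed_items AS point_feed_items
      ON point_feed_items.team_id = teams.id
      AND point_feed_items.type IN ('StandardSolvedChallenge', 'ScoreAdjustment')
    LEFT JOIN feed_items AS pentest_feed_items
      ON pentest_feed_items.team_id = teams.id
      AND pentest_feed_items.type IN ('PentestSolvedChallenge')
"""

# Without aggregation, the intermediate rows are visible:
# (team id, point feed item id, pentest feed item id).
joined_rows = conn.execute(
    "SELECT teams.id, point_feed_items.id, pentest_feed_items.id"
    + FROM_CLAUSE
    + "ORDER BY pentest_feed_items.id"
).fetchall()
print(joined_rows)  # [(1, 1, 2), (1, 1, 3), (1, 1, 4)]

# Aggregating over those rows counts the single adjustment once per pentest row.
adjustment_count, adjustment_sum = conn.execute(
    "SELECT COUNT(point_feed_items.point_value), SUM(point_feed_items.point_value)"
    + FROM_CLAUSE
    + "GROUP BY teams.id"
).fetchone()
print(adjustment_count, adjustment_sum)  # 3 30
```

The one adjustment row is paired with each of the three pentest rows, which is consistent with the count of 3 reported in the question.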
I went ahead and switched my query over to look as follows:
SELECT COALESCE(sum(point_feed_items.team_score), 0) as team_score,
       GREATEST(MAX(pentest_feed_items.last_solve_time),
                MAX(point_feed_items.last_solve_time)) as last_solve_time,
       teams.*
FROM "teams"
LEFT JOIN LATERAL
(
  SELECT
    COALESCE(sum(challenges.point_value), 0) + COALESCE(sum(feed_items.point_value), 0) as team_score,
    MAX(feed_items.created_at) as last_solve_time
  FROM feed_items
  LEFT JOIN challenges ON challenges.id = feed_items.challenge_id AND challenges.type IN ('StandardChallenge')
  WHERE feed_items.team_id = teams.id
    AND feed_items.type IN ('StandardSolvedChallenge', 'ScoreAdjustment')
) AS point_feed_items ON true
LEFT JOIN LATERAL
(
  SELECT MAX(feed_items.created_at) as last_solve_time
  FROM feed_items
  WHERE feed_items.team_id = teams.id
    AND feed_items.type IN ('PentestSolvedChallenge')
) AS pentest_feed_items ON true
WHERE "teams"."division_id" = $1
GROUP BY teams.id
And everything works fine now.
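The principle behind the fix can also be sketched outside PostgreSQL. SQLite has no LATERAL, so the example below swaps in derived tables that are aggregated per team before they are joined. That is a different technique from the query above, but it rests on the same idea: each join contributes at most one row per team, so nothing gets multiplied. The schema, data and dates are hypothetical, and only the pentest solve time is returned to keep the sketch short.

```python
import sqlite3

# Hypothetical data: team 1 has one adjustment and three pentest solves,
# team 2 has no feed items at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY);
    CREATE TABLE feed_items (
        id INTEGER PRIMARY KEY,
        team_id INTEGER,
        type TEXT,
        point_value INTEGER,
        created_at TEXT
    );
    INSERT INTO teams (id) VALUES (1), (2);
    INSERT INTO feed_items (team_id, type, point_value, created_at) VALUES
        (1, 'ScoreAdjustment', 10, '2020-01-01'),
        (1, 'PentestSolvedChallenge', NULL, '2020-01-02'),
        (1, 'PentestSolvedChallenge', NULL, '2020-01-03'),
        (1, 'PentestSolvedChallenge', NULL, '2020-01-04');
""")

# Each derived table is grouped by team_id first, so it yields at most
# one row per team and the outer joins cannot multiply rows.
rows = conn.execute("""
    SELECT teams.id,
           COALESCE(point_feed_items.team_score, 0) AS team_score,
           pentest_feed_items.last_solve_time AS last_pentest_solve
    FROM teams
    LEFT JOIN (
        SELECT team_id, SUM(point_value) AS team_score
        FROM feed_items
        WHERE type IN ('StandardSolvedChallenge', 'ScoreAdjustment')
        GROUP BY team_id
    ) AS point_feed_items ON point_feed_items.team_id = teams.id
    LEFT JOIN (
        SELECT team_id, MAX(created_at) AS last_solve_time
        FROM feed_items
        WHERE type IN ('PentestSolvedChallenge')
        GROUP BY team_id
    ) AS pentest_feed_items ON pentest_feed_items.team_id = teams.id
    ORDER BY teams.id
""").fetchall()
print(rows)  # [(1, 10, '2020-01-04'), (2, 0, None)]
```

The adjustment is now counted once (score 10) even though the team has three pentest rows.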

Related

How to use the exceptjoin in Cognos-11?

I don't get an except join to work in Cognos-11. Where or what am I missing?
Some understanding for a beginner in this branch would be nice ;-)
What I've tried so far is making two queries. The first one holds data items like "customer", "BeginningDate" and "Purpose". The second query holds data items like "customer", "Adress" and "Community".
What I'd like to accomplish is to get in query3: the "customers" from query1 that are not available in query2. To me it sounds like an except-join.
I went to the query work area, created a query3 and dragged an "except-join" icon on it. Then I dragged query1 into the upper space and query2 into the lower. What I'm used to getting with other joins, is a possibility to set a new link, cardinality and so on. Now double clicking the join isn't opening any pop-up. The properties of the except-join show "Set operation = Except", "Duplicates = remove", "Projection list = Manual".
How do I get query3 filled with the data item "customer" that only holds a list of customers which are solely appearing in query1?

In SQL terms, you want
select T1.C1
from T1
left outer join T2 on T1.C1 = T2.C1
where T2.C1 is null
So, in the query pane of a Cognos report...
Use a regular join.
Join using customer from both queries.
Change the cardinality to 1..1 on the query1 side and 0..1 on the query2 side.
In the filters for query3, add a filter for query2.customer is null.
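The outer join plus "is null" filter can be checked on a small dataset. This is only a sketch of the SQL pattern using Python's sqlite3 module, not of Cognos itself; the table names query1 and query2 and the customer values are made up to mirror the question.

```python
import sqlite3

# Hypothetical stand-ins for the two Cognos queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE query1 (customer TEXT);
    CREATE TABLE query2 (customer TEXT);
    INSERT INTO query1 (customer) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO query2 (customer) VALUES ('bob');
""")

# Outer join, then keep only the rows that found no partner in query2.
only_in_query1 = [row[0] for row in conn.execute("""
    SELECT query1.customer
    FROM query1
    LEFT OUTER JOIN query2 ON query1.customer = query2.customer
    WHERE query2.customer IS NULL
    ORDER BY query1.customer
""")]
print(only_in_query1)  # ['alice', 'carol']

# In plain SQL, the EXCEPT set operation gives the same list when both
# sides project only the customer column.
via_except = [row[0] for row in conn.execute("""
    SELECT customer FROM query1
    EXCEPT
    SELECT customer FROM query2
    ORDER BY customer
""")]
print(via_except)  # ['alice', 'carol']
```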

EXCEPT is not a join. It is used to compare two data sets.
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
What you need is an INNER JOIN. That would be the join tool in the Toolbox in Cognos.

Using distinct in a join

I'm still a novice at SQL and I need to run a report which JOINs 3 tables. The third table has duplicates of fields I need. So I tried to join with a distinct option, but that didn't work. Can anyone suggest the right code I could use?
My Code looks like this:
SELECT
C.CUSTOMER_CODE
, MS.SALESMAN_NAME
, SUM(C.REVENUE_AMT)
FROM C_REVENUE_ANALYSIS C
JOIN M_CUSTOMER MC ON C.CUSTOMER_CODE = MC.CUSTOMER_CODE
/* This following JOIN is the issue. */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE = (SELECT SALESMAN_CODE FROM M_SALESMAN WHERE COMP_CODE = '00')
WHERE REVENUE_DATE >= :from_date
AND REVENUE_DATE <= :to_date
GROUP BY C.CUSTOMER_CODE, MS.SALESMAN_NAME
I also tried a different variation to get a DISTINCT.
/* I also tried this variation to get a distinct */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE =
(SELECT distinct(SALESMAN_CODE) FROM M_SALESMAN)
Please can anyone help? I would truly appreciate it.
Thanks in advance.

select distinct
  c.customer_code,
  ms.salesman_name,
  SUM(c.revenue_amt)
FROM
  c_revenue_analysis c,
  m_customer mc,
  m_salesman ms
where
  c.customer_code = mc.customer_code
  AND mc.salesman_code = ms.salesman_code
  AND ms.comp_code = '00'
  AND revenue_date BETWEEN :from_date AND :to_date
group by
  c.customer_code, ms.salesman_name
The above returns each distinct combination of customer code, salesman name and summed revenue amount where the c.customer_code matches an mc.customer_code, that same mc record matches an ms.salesman_code, that ms record has a comp_code of '00', and the revenue_date is between the from and to variables. The whole result is then grouped by customer code and salesman name; the only thing that will cause duplicates to appear is if the SUM(revenue) is somehow different.
To explain: if you're just doing a straight JOIN, you don't need the JOIN keywords. I find they tend to convolute things; you only need them if you're doing an "odd" join, like a LEFT/RIGHT join. I don't know your data model, so the above MIGHT still return duplicates; if so, let me know.
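A small experiment shows why the comp_code filter matters. This is a sketch in Python with sqlite3, under the assumption (not confirmed in the question) that M_SALESMAN holds one row per salesman per company; all column types and values are made up, and the explicit JOIN syntax from the question is used.

```python
import sqlite3

# Hypothetical data: every salesman appears once per comp_code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_revenue_analysis (customer_code TEXT, revenue_amt INTEGER, revenue_date TEXT);
    CREATE TABLE m_customer (customer_code TEXT, salesman_code TEXT);
    CREATE TABLE m_salesman (salesman_code TEXT, comp_code TEXT, salesman_name TEXT);
    INSERT INTO c_revenue_analysis VALUES
        ('C1', 100, '2020-01-10'), ('C1', 50, '2020-01-20'), ('C2', 70, '2020-01-15');
    INSERT INTO m_customer VALUES ('C1', 'S1'), ('C2', 'S2');
    INSERT INTO m_salesman VALUES
        ('S1', '00', 'Ann'), ('S1', '01', 'Ann'),
        ('S2', '00', 'Ben'), ('S2', '01', 'Ben');
""")

# The placeholder lets the same query run with and without the company filter.
query = """
    SELECT c.customer_code, ms.salesman_name, SUM(c.revenue_amt)
    FROM c_revenue_analysis c
    JOIN m_customer mc ON c.customer_code = mc.customer_code
    JOIN m_salesman ms ON mc.salesman_code = ms.salesman_code {company_filter}
    WHERE c.revenue_date BETWEEN :from_date AND :to_date
    GROUP BY c.customer_code, ms.salesman_name
    ORDER BY c.customer_code
"""
params = {"from_date": "2020-01-01", "to_date": "2020-01-31"}

# Without the filter, each revenue row matches two salesman rows and the sums double.
unfiltered = conn.execute(query.format(company_filter=""), params).fetchall()
print(unfiltered)  # [('C1', 'Ann', 300), ('C2', 'Ben', 140)]

# Joining on the key and restricting to one company gives the real totals.
filtered = conn.execute(
    query.format(company_filter="AND ms.comp_code = '00'"), params
).fetchall()
print(filtered)  # [('C1', 'Ann', 150), ('C2', 'Ben', 70)]
```

Note that GROUP BY hides the duplicated salesman rows but not their effect: the output has one row per customer either way, and only the sums reveal the problem.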

Get incremental changes between Hive partitions

I have a nightly job that runs and computes some data in Hive. The output is partitioned by day.
Fields:
id bigint
rank bigint
Yesterday
output/dt=2013-10-31
Today
output/dt=2013-11-01
I am trying to figure out if there is an easy way to get the incremental changes between today and yesterday.
I was thinking about doing a left outer join, but I'm not sure what that looks like since it's the same table.
This is what it might look like when there are two different tables:
SELECT * FROM a LEFT OUTER JOIN b
ON (a.id=b.id AND a.dt='2013-11-01' and b.dt='2013-10-31' ) WHERE a.rank!=b.rank
But on the same table it is
SELECT * FROM a LEFT OUTER JOIN a
ON (a.id=a.id AND a.dt='2013-11-01' and a.dt='2013-10-31' ) WHERE a.rank!=a.rank
Suggestions?

This would work
SELECT a.*
FROM A a LEFT OUTER JOIN A b ON a.id = b.id
WHERE a.dt='2013-11-01' AND b.dt='2013-10-31' AND <your-rank-conditions>;
Efficiently, this would span 1 MapReduce job only.

So I figured it out... Using Subqueries and Joins
select * from (select * from table where dt='2013-11-01') a
FULL OUTER JOIN
(select * from table where dt='2013-10-31') b
on (a.id=b.id)
where a.rank!=b.rank or a.rank is null or b.rank is null
The above will give you the diff.
You can take the diff and figure out what you need to ADD/UPDATE/REMOVE:
UPDATE if a.rank is not null and b.rank is not null, i.e. the rank changed
DELETE if a.rank is null and b.rank is not null, i.e. the user is no longer ranked
ADD if a.rank is not null and b.rank is null, i.e. this is a new user
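The three rules can be sketched in plain Python to check the logic before running anything on the cluster. This is an illustration only, not Hive code: each partition is modelled as a hypothetical {id: rank} dictionary, and a missing key stands in for the null side of the full outer join.

```python
def diff_partitions(today, yesterday):
    """Classify ids by comparing two {id: rank} snapshots."""
    changes = {"ADD": [], "UPDATE": [], "DELETE": []}
    for key in sorted(today.keys() | yesterday.keys()):
        if key not in yesterday:
            # Present today only: a new user.
            changes["ADD"].append(key)
        elif key not in today:
            # Present yesterday only: the user is no longer ranked.
            changes["DELETE"].append(key)
        elif today[key] != yesterday[key]:
            # Present on both days with a different rank.
            changes["UPDATE"].append(key)
    return changes


# Made-up sample partitions.
yesterday = {1: 10, 2: 20, 3: 30}
today = {1: 10, 2: 25, 4: 40}

changes = diff_partitions(today, yesterday)
print(changes)  # {'ADD': [4], 'UPDATE': [2], 'DELETE': [3]}
```

Id 1 has the same rank on both days, so it appears in none of the lists, just as it is filtered out by the where clause of the query.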

Join Issue - How to bring back only 1 outer join per row

I don't think my title explained it very well :)
I have a query that has quite a few joins on it.
SELECT
sof_slot_games.launch_date,
sof_slot_games.game_name,
sof_reviews.review_content,
sof_slot_games.slot_game_id,
sof_slot_game_details.no_of_reels,
sof_slot_game_details.paylines,
sof_reviews.reg_timestamp,
sof_developers.developer_name,
sof_slot_games.game_slug,
sof_slot_game_images.game_image
FROM
sof_reviews
Inner Join sof_slot_games ON sof_slot_games.slot_game_id = sof_reviews.slot_game_id
Inner Join sof_slot_game_details ON sof_slot_games.slot_game_id = sof_slot_game_details.slot_game_id
Inner Join sof_developers ON sof_slot_games.developer_id = sof_developers.developer_id
left outer Join sof_slot_game_images ON sof_reviews.slot_game_id = sof_slot_game_images.slot_game_id
WHERE
sof_slot_game_images.image_type_id = '3'
ORDER BY
sof_slot_games.launch_date DESC
limit 0,20
The problem is that I want to return just 1 game_image per row. The games themselves are basically the most recent 20 games by launch_date. But if I join with the game_image (type=3), then it will bring back multiple rows for the same game if that game has multiple images.
I want to really just pick the most recent 20 games and then pull back the 1st image for each. This is to go into an RSS Feed for the top 20 games which is why I want to do it this way (in case anyone is wondering) :)
I've been trying to figure this out... I know I have done it before, but my brain is not reminding me what I did :)
Thanks!

Number of joins in select query

I have a business table with 50 foreign key columns that refer to other master data tables.
To fetch all the data, my query has to join all 50 reference tables, like this:
select ct.id , ct.name , ct.description , st.value , pr.value , sv.value , ....
from
core_table ct
left outer join domain_value st on ct.status_fk = st.id
left outer join domain_value pr on ct.priority_fk = pr.id
left outer join domain_value sv on ct.severity_fk = sv.id
.......
.......
So I need to make 50 left outer joins like this.
Is it right to do 50 left outer joins like this, or is there a more optimized way to achieve this?

Is too many Left Joins a code smell?
It's a perfectly legitimate solution for some designs.
