Hive join over partition - join

I have these 2 tables :
table products
(
product_id bigint,
product_name string
)
partitioned by (product_category as string)
table place_of_sale
(
product_id bigint,
city string
)
partitioned by (country as string)
How can I left join the 2 tables on 'product_id', but over the partition 'country' of the table place_of_sale?
This is an example with the desired result :
table products
product_id product_name product_category
1000 banana fruit
1001 coconut fruit
1002 ananas fruit
2002 cow animal
2003 beef animal
table place_of_sale
product_id city country
1000 Texas USA
1002 Miami USA
2003 Sydney Australia
desired result for a left join between table products and table place_of_sale over the partition country :
product_id product_name product_category city country
1000 banana fruit Texas USA
1001 coconut fruit null null
1002 ananas fruit Miami USA
2002 cow animal null null
2003 beef animal Sydney Australia
The example has only 2 different countries, but imagine many more.
It's like a left join performed for each country, followed by a union of the results for all countries.

If a product counts as sold whenever it appears in the table place_of_sale, use a LEFT JOIN:
select s.country, p.product_id, p.product_name, p.product_category,
case when s.product_id is NULL then "not sold" else "sold" end as sold
from products p
left join place_of_sale s on p.product_id = s.product_id
order by country, product_id -- order if necessary
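To make the left-join semantics concrete, here is a minimal plain-Ruby sketch (not Hive) that reproduces the desired result from the question using in-memory rows. The variable names are illustrative assumptions; only the sample data comes from the question.

```ruby
# In-memory stand-ins for the two tables from the question.
products = [
  { product_id: 1000, product_name: "banana",  product_category: "fruit" },
  { product_id: 1001, product_name: "coconut", product_category: "fruit" },
  { product_id: 1002, product_name: "ananas",  product_category: "fruit" },
  { product_id: 2002, product_name: "cow",     product_category: "animal" },
  { product_id: 2003, product_name: "beef",    product_category: "animal" }
]
place_of_sale = [
  { product_id: 1000, city: "Texas",  country: "USA" },
  { product_id: 1002, city: "Miami",  country: "USA" },
  { product_id: 2003, city: "Sydney", country: "Australia" }
]

# Index the right-hand table by the join key (a product may be sold in several places).
sales_by_product = place_of_sale.group_by { |sale| sale[:product_id] }

# Left join: every product yields at least one row; unmatched products get nil city/country.
left_join_rows = products.flat_map do |product|
  matches = sales_by_product.fetch(product[:product_id], [{ city: nil, country: nil }])
  matches.map { |sale| product.merge(city: sale[:city], country: sale[:country]) }
end

left_join_rows.each do |row|
  puts row.values.map { |value| value.nil? ? "null" : value }.join(" ")
end
```

Products without a sale keep their row and simply carry null for city and country, which is the behaviour the LEFT JOIN in the answer relies on.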

Related

Order scope results by closest match

I have three models: Doctor, Appointment, Patient. A Doctor has many patients through appointments and a Patient has many doctors, also through appointments.
I have a search query where you can filter doctors by patients, so if you input the names of the patients it will show you any doctor that has assigned any of those patients.
Here is the scope I use to search within the Doctor model:
scope :by_patient, lambda { |*names|
names_array = names.map { |name| "%#{name}%" }
joins(appointments: :patient).where('lower(patients.name) LIKE ANY (array[?])', names_array)
}
E.g.
Dr. Smith -> Patients: Karen, Joe, Kate, Mary
Dr. Johnson -> Patients: Kate, Mary
Dr. Spears -> Patients: John
If you look for Patients Kate and Mary the query will return both
Dr. Smith and Dr. Johnson, however, Dr. Smith should be shown first
because he has both patients while Dr. Johnson only has one.
How could I order the search results so that the doctors with the most matching patients come first?
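The ranking being asked for can be sketched in plain Ruby, without ActiveRecord: count how many of each doctor's patients match any search term, then sort by that count in descending order. The search terms below are hypothetical and chosen so the doctors match different numbers of patients; `rank_by_matches` is an illustrative helper, not part of the original scope. In ActiveRecord the analogous step would presumably be to group by doctor and order by a count of the joined patients, which is an untested assumption here.

```ruby
# Doctors and their patients, taken from the example in the question.
doctors = {
  "Dr. Smith"   => ["Karen", "Joe", "Kate", "Mary"],
  "Dr. Johnson" => ["Kate", "Mary"],
  "Dr. Spears"  => ["John"]
}

# Hypothetical search terms (not from the question).
search_names = ["Karen", "Kate"]

# Returns [doctor, match_count] pairs, best match first; doctors with no match are dropped.
def rank_by_matches(doctors, names)
  terms = names.map(&:downcase)
  counts = doctors.map do |doctor, patients|
    # Mimic lower(name) LIKE '%term%' with a case-insensitive substring test.
    matched = patients.count { |patient| terms.any? { |term| patient.downcase.include?(term) } }
    [doctor, matched]
  end
  counts.select { |_, matched| matched > 0 }.sort_by { |_, matched| -matched }
end

ranked = rank_by_matches(doctors, search_names)
ranked.each { |doctor, matched| puts "#{doctor}: #{matched} matching patient(s)" }
```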

MySQL self-referencing JOIN

I have two tables:
Customers (
int(11) Id,
varchar(255) Name,
int(11) Referred_ID -- references an Id in this same table
-- (no key on that field)
)
the other table being:
Invoices (
int(11) Id,
date Billing_date,
int(11) Customer_ID
)
I want to select the Id and Billing_date of each invoice and, most importantly, the Name of the customer that the invoice's customer points to through Referred_ID.
Now I'm only able to select his referrer's ID by a query like this one:
SELECT Invoices.Id, Invoices.Billing_date, Customers.Name, Referred_ID
FROM Invoices
INNER JOIN Customers ON Invoices.Customer_Id = Customers.Id;
How should I modify my query to replace Referred_ID with the Name of the customer it points to?
It's a MySQL version from around 2015, by the way.
You can join the Customers table twice, using an alias for the referring customer:
SELECT Invoices.Id, Invoices.Billing_date, Customers.Name, Referred.Name
FROM Invoices
INNER JOIN Customers ON Invoices.Customer_Id = Customers.Id
INNER JOIN Customers Referred on Referred.id = Customers.Referred_ID;
Use the Customers table twice in the join:
SELECT Invoices.Id, Invoices.Billing_date,
c1.Name as customername,
c1.Referred_ID,
c2.Name as refername
FROM Invoices INNER JOIN Customers c1 ON Invoices.Customer_Id = c1.Id
JOIN Customers c2 ON c2.Id = c1.Referred_ID
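As an illustration of what the self-join resolves, here is a small plain-Ruby sketch that looks up each invoice's customer and then that customer's referrer in the same collection. All rows and names below are hypothetical sample data, not from the question.

```ruby
# Hypothetical sample rows; Alice has no referrer.
customers = [
  { id: 1, name: "Alice", referred_id: nil },
  { id: 2, name: "Bob",   referred_id: 1 },
  { id: 3, name: "Carol", referred_id: 2 }
]
invoices = [
  { id: 10, billing_date: "2015-03-01", customer_id: 2 },
  { id: 11, billing_date: "2015-03-05", customer_id: 3 },
  { id: 12, billing_date: "2015-03-09", customer_id: 1 }
]

# One lookup table serves both roles, just as the SQL joins Customers twice under two aliases.
customers_by_id = customers.to_h { |customer| [customer[:id], customer] }

rows = invoices.map do |invoice|
  customer = customers_by_id.fetch(invoice[:customer_id])
  referrer = customers_by_id[customer[:referred_id]] # nil when there is no referrer
  {
    invoice_id: invoice[:id],
    billing_date: invoice[:billing_date],
    customer_name: customer[:name],
    referrer_name: referrer && referrer[:name]
  }
end

rows.each { |row| puts row.inspect }
```

This sketch keeps invoice 12 with a nil referrer, which corresponds to a LEFT JOIN on the second join; with an INNER JOIN, as in the queries above, that invoice would be dropped from the result.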

RAILS: How to select fields from associated table with grouping and create a hash from result?

I would like to create an active record query and store the result in a hash which includes summary information from associated tables as follows.
Here is the tables and associations:
Post belongs_to :category
Category has_many :posts
Here I would like to count the # of posts in each category and create a summary table as follows (with the SQL query for the desired table):
select c.name, count(p.id) from posts p left join categories c on p.category_id = c.id where p.status = 'Approved' group by (c.name) order by (c.name);
Category | count
---------------+-------
Basketball | 2
Football | 3
Hockey | 4
(3 rows)
Lastly I would like to store the result in a hash as follows:
summary_hash = { 'Basketball' => 2, 'Football' => 3, 'Hockey' => 4 }
I will appreciate if you can guide me how to write the active record query and store the result in the hash.
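Try grouping and counting. Calling count on a grouped relation returns a hash keyed by the grouped column, so no explicit select is needed:
Post.where(status: 'Approved').joins(:category).
group("categories.name").order("categories.name").count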
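For reference, the shape of the resulting hash can be reproduced in plain Ruby with `tally` (Ruby 2.7+). This is only a sketch of the filter, group, and count steps over hypothetical in-memory rows; it does not touch the database.

```ruby
# Hypothetical in-memory data standing in for the categories and posts tables.
categories = { 1 => "Basketball", 2 => "Football", 3 => "Hockey" }
posts = [
  [1, "Approved"], [1, "Approved"], [1, "Pending"],
  [2, "Approved"], [2, "Approved"], [2, "Approved"],
  [3, "Approved"], [3, "Approved"], [3, "Approved"], [3, "Approved"], [3, "Rejected"]
].map { |category_id, status| { category_id: category_id, status: status } }

# Filter (WHERE), map to the category name (JOIN), count per name (GROUP BY + COUNT),
# then sort by name (ORDER BY) and turn the pairs back into a hash.
summary_hash = posts
  .select { |post| post[:status] == "Approved" }
  .map { |post| categories.fetch(post[:category_id]) }
  .tally
  .sort
  .to_h

puts summary_hash.inspect
```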

customize sql for has_many relation

I have two Tables:
Locations, which is self-referential:
int id,
string name,
int location_id
and Nodes:
int id,
string name,
int location_id
The Relations are:
class Node
belongs_to :location
end
class Location
has_many :nodes
end
That works, but I want not only the Nodes directly associated with a Location but also the Nodes that are associated with any child of that Location. I have a select statement with a CTE which achieves exactly this:
with sublocations (name, id, lvl) as
(
select
l.name,
l.id,
1 as lvl
from locations l
where l.id = 10003
union all
select
sl.name,
sl.id,
lvl + 1 as lvl
from sublocations inner join locations sl
on (sublocations.id = sl.location_id)
)
select
sl.name as location,
sl.id as location_code,
n.name
from sublocations sl join nodes n on n.LOCATION_ID = sl.ID;
But how can I bring this into the has_many relation?
Thanks, Jan
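What the recursive CTE computes can be sketched in plain Ruby as a breadth-first walk over the location tree, followed by a filter on nodes. The rows and the `subtree_ids` helper are hypothetical, and the sketch assumes the location hierarchy contains no cycles. In a Rails model, logic like this would more likely live in an instance method on Location than in the has_many declaration itself, though that is an assumption rather than a tested recipe.

```ruby
# Hypothetical rows; 10003 is the root used in the CTE from the question.
locations = [
  { id: 10003, name: "HQ",         location_id: nil },
  { id: 10004, name: "Floor 1",    location_id: 10003 },
  { id: 10005, name: "Room 101",   location_id: 10004 },
  { id: 20000, name: "Other site", location_id: nil }
]
nodes = [
  { id: 1, name: "router-a", location_id: 10003 },
  { id: 2, name: "switch-b", location_id: 10005 },
  { id: 3, name: "ap-c",     location_id: 20000 }
]

# Collects the root id plus the ids of all descendants (assumes the tree has no cycles).
def subtree_ids(locations, root_id)
  children = locations.group_by { |location| location[:location_id] }
  ids = []
  queue = [root_id]
  until queue.empty?
    current = queue.shift
    ids << current
    queue.concat(children.fetch(current, []).map { |location| location[:id] })
  end
  ids
end

ids = subtree_ids(locations, 10003)
subtree_nodes = nodes.select { |node| ids.include?(node[:location_id]) }
subtree_nodes.each { |node| puts "#{node[:name]} (location #{node[:location_id]})" }
```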

Order by the sum of an associations property

I have a Department model with an expenses association:
class Department < ActiveRecord::Base
has_many :expenses
end
class Expense < ActiveRecord::Base
belongs_to :department
end
An expense has an amount property:
e = Expense.new
e.amount = 119.50
I want 2 queries now:
1. List all departments, ordered by SUM of expenses.
2. Same as #1, but grouped by month, i.e. Jan, Feb, March, ...
For #1, the following code will get you the department ids sorted by sum of expenses:
Expense.select('department_id, sum(amount) as total').group('department_id').order('total desc')
Here is some sample code showing how to use the returned objects:
Expense.select('department_id, sum(amount) as total').group('department_id').order('total desc').each { |dep| print "Department ID: #{dep.department_id} | Total expense: #{dep.total}\n" }
This will print something like:
Department ID: 2 | Total expense: 119.50
Department ID: 1 | Total expense: 54.34
Department ID: 10 | Total expense: 23.43
For #2, you can similarly add the month grouping along with the sum:
Expense.select('department_id, extract(month from created_at) as month, sum(amount) as total').group('department_id, month').order('month asc, total desc')
Again, some sample code to demonstrate how to use it:
Expense.select('department_id, extract(month from created_at) as month, sum(amount) as total').group('department_id, month').order('month asc, total desc').each { |dep| print "Department ID: #{dep.department_id} | Month: #{dep.month} | Total expense: #{dep.total}\n" }
This will print something like:
Department ID: 2 | Month: 1 | Total expense: 119.50
Department ID: 1 | Month: 1 | Total expense: 54.34
Department ID: 10 | Month: 1 | Total expense: 23.43
Department ID: 1 | Month: 2 | Total expense: 123.45
Department ID: 2 | Month: 2 | Total expense: 76.54
Department ID: 10 | Month: 2 | Total expense: 23.43
... and so on.
Of course, once you have the department IDs, you can use Department.find() to get the rest of the information. I believe ActiveRecord does not support fetching all the Department fields in the same query without using raw SQL.
EDIT ----
If you want to include the department fields you can either:
1 - Load them in separate queries like:
Expense.select('department_id, sum(amount) as total').group('department_id').order('total desc').each do |department_expense|
# In department_expense you have :department_id and :total
department = Department.find(department_expense.department_id)
# In department now you have the rest of fields
# Do whatever you have to do with this row of department + expense
# Example
puts "Department #{department.name} from #{department.company}: $#{department_expense.total}"
end
Advantage: Using ActiveRecord SQL abstractions is nice and clean.
Drawback: You are doing a total of N+1 queries, where N is the number of departments, instead of a single query.
2 - Load them using raw SQL:
Department.select('*, (select sum(amount) from expenses where department_id = departments.id) as total').order('total desc').each do |department|
# Now in department you have all department fields + :total which has the sum of expenses
# Do whatever you have to do with this row of department + expense
# Example
puts "Department #{department.name} from #{department.company}: $#{department.total}"
end
Advantage: You are doing a single query.
Drawback: You are losing the abstraction that ActiveRecord is providing to you from SQL.
Both will print:
Department R&D from Microsoft: $119.50
Department Finance from Yahoo: $54.34
Department Facilities from Google: $23.43
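The two aggregations in this answer (total per department, and total per department per month) can also be sketched in plain Ruby over in-memory rows, which may help when checking what the SQL should return. The rows below are hypothetical: the amounts mirror the sample output above, while the dates are assumed for illustration.

```ruby
require "date"

# Hypothetical expense rows; the dates are assumed for illustration.
expenses = [
  { department_id: 2,  amount: 119.50, created_at: Date.new(2015, 1, 10) },
  { department_id: 1,  amount: 54.34,  created_at: Date.new(2015, 1, 12) },
  { department_id: 10, amount: 23.43,  created_at: Date.new(2015, 1, 20) },
  { department_id: 1,  amount: 123.45, created_at: Date.new(2015, 2, 3) },
  { department_id: 2,  amount: 76.54,  created_at: Date.new(2015, 2, 8) }
]

# Query 1: sum per department, largest total first.
totals = expenses
  .group_by { |expense| expense[:department_id] }
  .map { |department_id, rows| [department_id, rows.sum { |expense| expense[:amount] }] }
  .sort_by { |_, total| -total }

# Query 2: sum per (month, department), ordered by month ascending, then total descending.
monthly = expenses
  .group_by { |expense| [expense[:created_at].month, expense[:department_id]] }
  .map { |(month, department_id), rows| [month, department_id, rows.sum { |expense| expense[:amount] }] }
  .sort_by { |month, _, total| [month, -total] }

totals.each { |department_id, total| puts format("Department ID: %d | Total expense: %.2f", department_id, total) }
monthly.each do |month, department_id, total|
  puts format("Department ID: %d | Month: %d | Total expense: %.2f", department_id, month, total)
end
```

Note that grouping by month number alone, as in the SQL above, merges the same month from different years; grouping by year and month avoids that if the data spans more than one year.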
