Hive does not support non-equi joins. The common workaround is to move the join condition to the WHERE clause, which works fine when you want an inner join. But what about a left join?
Contrived example: let's say we have an OrderLineItem table, and we need to join it to a ProductPrice table that has a ProductID, a price, and a date range for which the price applies. We want to join on ProductID = ProductID and OrderDate between the start and end dates. If a ProductID or a valid date range does not match, I'd still want to see all OrderLineItems.
This SQL fiddle is an example of how we'd do this in MSSQL:
http://sqlfiddle.com/#!6/fb877/7
Problem
If I apply the typical workaround and move the non-equi filter to the WHERE clause, it becomes an inner join. In the example above (in the SQL Fiddle and below), I have a product ID that is not in the lookup table.
Question:
Given that Hive does not support non-equi joins, how can a left non-equi join be achieved?
[SQLFiddle Content]
Tables:
CREATE TABLE OrderLineItem(
LineItemIDId int IDENTITY(1,1),
OrderID int NOT NULL,
ProductID int NOT NULL,
OrderDate Date
);
CREATE TABLE ProductPrice(
ProductID int,
Cost float,
startDate Date,
EndDate Date
);
Loading the data, and how we'd join in MSSQL:
--Old Price. Should be ignored
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES (1, 50,'12/1/2012','1/1/2013');
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES (2, 55,'12/1/2012','1/1/2013');
--Price for Order 1. Should be applied to Order 1
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 20,'12/1/2013','1/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 25,'12/1/2013','1/1/2014');
--Price for Order 2. Should be applied to Order 2
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 15,'1/2/2014','3/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 20,'1/2/2014','3/1/2014');
--January 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 1,'1/1/2014') ;
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 2,'1/1/2014');
--Feb 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (2, 1,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 2,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 3,'2/1/2014'); -- no price
SELECT * FROM OrderLineItem;
SELECT * FROM OrderLineItem li LEFT OUTER JOIN ProductPrice p on
p.ProductID=li.ProductID AND OrderDate BETWEEN startDate AND EndDate;
Create a copy of the left table with added serial row numbers:
CREATE TABLE OrderLineItem_serial AS
SELECT ROW_NUMBER() OVER() AS serial, * FROM OrderLineItem;
Remark: This may work better for some table formats (must be WITHOUT COMPRESSION):
CONCAT(INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE) AS serial
Do an inner join:
CREATE TABLE OrderLineItem_inner AS
SELECT * FROM OrderLineItem_serial li JOIN ProductPrice p
on p.ProductID = li.ProductID WHERE OrderDate BETWEEN startDate AND EndDate;
Left join by serial:
SELECT * FROM OrderLineItem_serial li
LEFT OUTER JOIN OrderLineItem_inner i on li.serial = i.serial;
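The three steps above can be tried end to end outside Hive. The following is only a sketch: it assumes SQLite 3.25 or later (for ROW_NUMBER) driven from Python's sqlite3 module, uses ISO date strings instead of the fiddle's date literals, and selects explicit columns in step 2 so the intermediate table has no duplicate column names. Only equality conditions appear in the ON clauses, mirroring the Hive restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderLineItem (OrderID INT, ProductID INT, OrderDate TEXT);
CREATE TABLE ProductPrice (ProductID INT, Cost REAL, startDate TEXT, EndDate TEXT);

INSERT INTO ProductPrice VALUES
  (1, 50, '2012-12-01', '2013-01-01'), (2, 55, '2012-12-01', '2013-01-01'),
  (1, 20, '2013-12-01', '2014-01-01'), (2, 25, '2013-12-01', '2014-01-01'),
  (1, 15, '2014-01-02', '2014-03-01'), (2, 20, '2014-01-02', '2014-03-01');
INSERT INTO OrderLineItem VALUES
  (1, 1, '2014-01-01'), (1, 2, '2014-01-01'),
  (2, 1, '2014-02-01'), (2, 2, '2014-02-01'),
  (2, 3, '2014-02-01');  -- product 3 has no price

-- Step 1: copy of the left table with a serial row number
CREATE TABLE OrderLineItem_serial AS
SELECT ROW_NUMBER() OVER (ORDER BY OrderID, ProductID) AS serial,
       OrderID, ProductID, OrderDate
FROM OrderLineItem;

-- Step 2: equi-join, with the range condition moved to WHERE
CREATE TABLE OrderLineItem_inner AS
SELECT li.serial AS serial, p.Cost AS Cost
FROM OrderLineItem_serial li
JOIN ProductPrice p ON p.ProductID = li.ProductID
WHERE li.OrderDate BETWEEN p.startDate AND p.EndDate;
""")

# Step 3: left join back on the serial number, an equality condition
rows = conn.execute("""
SELECT li.OrderID, li.ProductID, i.Cost
FROM OrderLineItem_serial li
LEFT OUTER JOIN OrderLineItem_inner i ON li.serial = i.serial
ORDER BY li.OrderID, li.ProductID
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, every line item is kept, and the product 3 line comes back with a NULL cost.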
Why not use a WHERE clause that allows for NULL cases separately?
SELECT * FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p
ON p.ProductID=li.ProductID
WHERE ( StartDate IS NULL OR OrderDate BETWEEN startDate AND EndDate);
That should take care of it: if the left join matches, it uses the date logic; if it doesn't, it keeps the NULL values intact, as a left join should.
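A quick way to check how this query behaves is to run it against the question's data. The sketch below assumes SQLite via Python's sqlite3 module and ISO date strings, and it adds one hypothetical line item (order 3) that is not in the original fiddle: its product has prices, but none of them cover the order date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderLineItem (OrderID INT, ProductID INT, OrderDate TEXT);
CREATE TABLE ProductPrice (ProductID INT, Cost REAL, startDate TEXT, EndDate TEXT);

INSERT INTO ProductPrice VALUES
  (1, 50, '2012-12-01', '2013-01-01'), (2, 55, '2012-12-01', '2013-01-01'),
  (1, 20, '2013-12-01', '2014-01-01'), (2, 25, '2013-12-01', '2014-01-01'),
  (1, 15, '2014-01-02', '2014-03-01'), (2, 20, '2014-01-02', '2014-03-01');
INSERT INTO OrderLineItem VALUES
  (1, 1, '2014-01-01'), (1, 2, '2014-01-01'),
  (2, 1, '2014-02-01'), (2, 2, '2014-02-01'),
  (2, 3, '2014-02-01'),   -- product without any price
  (3, 1, '2015-01-01');   -- hypothetical: priced product, date outside every range
""")

where_rows = conn.execute("""
SELECT li.OrderID, li.ProductID, p.Cost
FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p ON p.ProductID = li.ProductID
WHERE (p.startDate IS NULL OR li.OrderDate BETWEEN p.startDate AND p.EndDate)
ORDER BY li.OrderID, li.ProductID
""").fetchall()

kept_orders = {order_id for order_id, _, _ in where_rows}
print(where_rows)
print("order 3 kept:", 3 in kept_orders)
```

The product 3 line survives because its startDate is NULL after the join. The hypothetical order 3 line does not: each of its joined rows has a non-NULL startDate that fails the BETWEEN test, so the WHERE clause removes it. Whether that matters depends on whether such rows can occur in the real data.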
Not sure if you can avoid using a double join:
SELECT *
FROM OrderLineItem li
LEFT OUTER JOIN (
SELECT p.*
FROM ProductPrice p
JOIN OrderLineItem li
ON p.ProductID=li.ProductID
WHERE OrderDate BETWEEN StartDate AND EndDate ) p
ON p.ProductId = li.ProductID
WHERE StartDate IS NULL OR
OrderDate BETWEEN StartDate AND EndDate;
This way if there is a match and StartDate is not null, there has to be a valid start/end date match.
Hive 0.10 supports cross joins, so you could handle all your "theta join" (non-equijoin) conditions in the WHERE clause.
Related
I am looking for a way to perform this query in DQL :
SELECT
t0_.product, t0_.creation_date, t0_.id pvId,
t0_.product AS product,
t0_.file_path AS file_path_3,
t0_.product AS product_11
FROM table0 t0_
join (
select max(creation_date) as createdDate, id, product
from table0
group by product
) t0b on (t0b.createdDate = t0_.creation_date and t0_.product = t0b.product)
LEFT JOIN table1 t1_ ON t0_.product = t1_.id
LEFT JOIN table2 t2_ ON t1_.id = t2_.otherId
LEFT JOIN table3 t3_ ON t2_.id = t3_.otherId
LEFT JOIN table4 t4_ ON t2_.id = t4_.otherId
LEFT JOIN table5 t5_ ON t4_.id = t5_.otherId
LEFT JOIN table6 t6_ ON t4_.id = t6_.otherId
LEFT JOIN table7 t7_ ON t6_.id = t7_.otherId
WHERE t6_.id = :identifier
GROUP BY t0_.product;
Explanation: table 'table0' has several rows linked to table 'table1', and I want to keep only the most recent one for each (by the creation_date column).
tableA.ID | tableA.propertyTableB.ID | tableA.createdAt
1         | 1                        | -4 days
2         | 3                        | -3 days
3         | 1                        | yesterday
4         | 3                        | today
5         | 1                        | today
6         | 1                        | -5 days
7         | 2                        | yesterday
8         | 2                        | today
I need to keep rows 4, 5, and 8.
I need to get a QueryBuilder object so that I can do other operations on it afterwards.
Has anyone already managed this?
Thanks in advance <3
I need to keep the most recent rows from tableA for each related tableB row.
EDIT: I may have found another way:
no SELECT in the JOIN, just another JOIN like this:
FROM table0 t0
LEFT JOIN table0Bis AS t0B ON t0B.identifier = t0.identifier AND t0B.creation_date > t0.creation_date
LEFT JOIN table1 t1 ON t1.id = t0.foo
LEFT JOIN table2 t2 ON t1.id = t2.bar
LEFT JOIN table3 t3 ON t2.id = t3.rab
LEFT JOIN table4 t4 ON t2.id = t4.oof
LEFT JOIN table5 t5 ON t5.id = t4.toto
LEFT JOIN table6 t6 ON t5.id = t6.lorem
LEFT JOIN table7 t7 ON t6.id = t7.ipsum
where t7.id = :id AND t0B.creation_date IS NULL
group by t0.product;
$queryBuilder = $this->createQueryBuilder('t0');
$queryBuilder->leftJoin(
'App\Entity\Table0',
't0b',
Join::WITH,
't0b.product = t0.product AND t0b.creationDate > t0.creationDate')
->leftJoin('t0.foo', 't1')
->leftJoin('t1.bar', 't2')
->leftJoin('t2.bar', 't3')
->leftJoin('t3.rab', 't4')
->andWhere('t4.id = :ident')->setParameter('ident', $user->getId())
->andWhere('t0b.creationDate IS NULL')
->groupBy('t0.id', 't0.foobar')
->orderBy('t0.foobar', 'DESC')
;
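The self-join trick used above (left join the table to itself on "same group, newer date" and keep the rows where nothing newer was found) can be checked against the sample table from the question. This is a sketch only: it assumes SQLite via Python's sqlite3 module, the table and column names are made up for the example, and createdAt is stored as a day offset (0 = today, -1 = yesterday).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, b_id INTEGER, created_at INTEGER);
-- created_at is a day offset relative to today
INSERT INTO table_a VALUES
  (1, 1, -4), (2, 3, -3), (3, 1, -1), (4, 3, 0),
  (5, 1, 0), (6, 1, -5), (7, 2, -1), (8, 2, 0);
""")

# For each row, look for a newer row with the same b_id.
# Rows for which no newer row exists are the most recent ones.
latest_ids = [row[0] for row in conn.execute("""
SELECT a.id
FROM table_a a
LEFT JOIN table_a newer
  ON newer.b_id = a.b_id AND newer.created_at > a.created_at
WHERE newer.id IS NULL
ORDER BY a.id
""")]

print(latest_ids)
```

One design point to keep in mind: if two rows in the same group share the latest date, both are returned, so a tie-breaker (for example on id) would be needed to guarantee one row per group.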
Excuse my English (it is not my native language).
I want to build this query. I have built the temp table EMPL, and the first temp table VEN is very easy, but I don't see how to do the join:
SELECT 'CADORE_MP',EMPL.PERSONNELNUMBER, EMPL.NOMBRE, 0 AS CANTIDAD, VEN.MONTO FROM
(
SELECT VATNUM, SUM(INVOICEAMOUNT) AS MONTO FROM CUSTINVOICEJOUR
WHERE INVOICEDATE>=@FECHAI and INVOICEDATE<=@FECHAF
AND PAYMENT LIKE '%DIAS%'
AND CUSTGROUP LIKE 'EMP%'
AND REVERSE_GT=0 AND TAXTYPEDOCUMENTID='FC'
GROUP BY VATNUM) VEN
INNER JOIN
(
SELECT HW.PERSONNELNUMBER, HPIN.IDENTIFICATIONNUMBER, PRE.FIRSTNAME+' '+PRE.SECONDNAME+' '+PRE.FIRSTLASTNAME+' '+PRE.SECONDLASTNAME AS NOMBRE
FROM HcmPersonIdentificationNumber HPIN INNER JOIN HCMWORKER HW ON HPIN.PERSON=HW.PERSON AND HPIN.IDENTIFICATIONTYPE=5637146829
INNER JOIN PAYROLLEMPL PRE ON HW.PERSON=PRE.PERSON
INNER JOIN HCMEMPLOYMENT HE ON HW.RECID=HE.WORKER
LEFT JOIN PAYROLLLIQUIDATION PL ON HW.PERSONNELNUMBER=PL.EMPLID AND PL.DATELOW<@FECHAF
LEFT JOIN PAYROLLGROUPEMPLOYEES PRGE ON HW.PERSONNELNUMBER=PRGE.EMPLID AND GROUPSEMPLYEESID='CADORE_PQ'
WHERE HE.VALIDTO>='21541231' AND PL.EMPLID IS NULL AND PRGE.EMPLID IS NULL) EMPL ON VEN.VATNUM=EMPL.IDENTIFICATIONNUMBER
It has some parameters; the other values are constants.
So I can do the two inner selects, but then I can't join these two temp tables.
I would also like someone to explain the difference between join, exists join, and notexists join,
and whether it is possible to get only the data from A that does not intersect with B.
This is what I have:
Query query;
QueryRun Run;
QueryBuildDataSource dataSourceHW;
QueryBuildDataSource dataSourceHPIN;
QueryBuildDataSource dataSourcePRE;
QueryBuildDataSource dataSourceHE;
QueryBuildDataSource dataSourcePL;
QueryBuildDataSource dataSourcePRGE;
str textDesc = "";
date FechaF;
FechaF = str2Date('21/03/2016',123);
query = new Query();
dataSourceHW = query.addDataSource(tableNum(HcmWorker));
dataSourceHPIN = dataSourceHW.addDataSource(tableNum(HcmPersonIdentificationNumber));
dataSourceHPIN.addLink(fieldNum(HcmWorker, Person), fieldNum(HcmPersonIdentificationNumber, Person));
dataSourceHPIN.addRange(fieldnum(HcmPersonIdentificationNumber,IdentificationType)).value('5637146829');
dataSourceHPIN.fetchMode(QueryFetchMode::One2One);
dataSourceHPIN.joinMode(JoinMode::InnerJoin);
dataSourcePRE = dataSourceHW.addDataSource(tableNum(PayRollEmpl), "PayRollEmpl");
dataSourcePRE.addLink(fieldNum(HcmWorker, Person), fieldNum(PayRollEmpl, Person));
dataSourcePRE.fetchMode(QueryFetchMode::One2One);
dataSourcePRE.joinMode(JoinMode::InnerJoin);
dataSourceHE = dataSourceHW.addDataSource(tableNum(HcmEmployment), "HcmEmployment");
dataSourceHE.addLink(fieldNum(HcmWorker, RecId), fieldNum(HcmEmployment, Worker));
dataSourceHE.addRange(fieldnum(HcmEmployment,ValidTo)).value(strFmt('%1' ,DateTimeUtil::maxValue()));
dataSourceHE.fetchMode(QueryFetchMode::One2One);
dataSourceHE.joinMode(JoinMode::InnerJoin);
dataSourcePL = dataSourceHW.addDataSource(tableNum(PayRollLiquidation));
dataSourcePL.addLink(fieldNum(HcmWorker, PersonnelNumber), fieldNum(PayRollLiquidation, EmplId));
dataSourcePL.addRange(fieldnum(PayRollLiquidation,DateLow)).value('21/03/2016');
dataSourcePL.fetchMode(QueryFetchMode::One2One);
dataSourcePL.joinMode(JoinMode::OuterJoin);
// The join below does not work: whether or not I include it, the result is always the same.
// This is why I asked above about getting only the data from table A.
dataSourcePRGE = dataSourceHW.addDataSource(tableNum(PayRollGroupEmployees));
dataSourcePRGE.addLink(fieldNum(HcmWorker, PersonnelNumber), fieldNum(PayRollGroupEmployees, EmplId));
dataSourcePRGE.addRange(fieldnum(PayRollGroupEmployees,GroupsEmplyeesId)).value('CADORE_PQ');
//dataSourcePRGE.addRange(fieldnum(PayRollGroupEmployees,EmplId)).value(SysQuery::valueEmptyString());
dataSourcePRGE.fetchMode(QueryFetchMode::One2One);
dataSourcePRGE.joinMode(JoinMode::NoExistsJoin);
I would be grateful if someone could help me.
I am trying to get an employee's manager's first and last name if the employee has a manager (some employees, e.g. the CEO, do not). Currently the query returns the employee's own name as the manager, and if there is no ManagerID in Contact it returns no rows at all.
Here is my general table structure for the tables I'm trying to access:
Employee
    EmpID
    EmployeeNumber
    StartDate
    isManager
    Status ( full time / part time )
    ContactID
Contact
    ContactID
    Fname
    Lname
    ManagerID
Department
    DeptID
    Name
DeptHistory
    DeptHistID
    DeptID
    EmpID
    PosTitle
    StartDate
    EndDate
    ModifiedDate
And here is the query I have been manipulating:
SELECT
dh.StartDate, dh.PositionTitle, d.Name,
e.EmployeeNumber, e.Classification, e.Status,
c1.FirstName, c1.LastName, c1.SIN,
c1.DateOfBirth, c1.PhoneNumber, c1.EmailAddress, c1.AddressLine1,
c1.AddressLine2, c1.PostalCode, c1.City, c1.Province,
(c2.FirstName+ ' ' + c2.LastName) AS Manager
FROM
Person.Contact c1
JOIN
HumanResources.Employee e ON c1.ContactID = e.ContactID
JOIN
HumanResources.EmployeeDepartmentHistory dh ON e.EmpID = dh.EmpID
JOIN
HumanResources.Department d ON dh.DepartmentID = d.DepartmentID
JOIN
Person.Contact c2 ON c2.ManagerID = e.EmpID
WHERE
e.EmpID = @empID
Use LEFT OUTER JOIN instead of JOIN, so the query would look like this:
LEFT OUTER JOIN Person.Contact c2 ON c2.ManagerID = e.EmpID
The OUTER keyword is optional, so you can use just LEFT JOIN.
See Using Left Outer Joins on Microsoft's TechNet.
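The difference between the two join types is easy to see on a tiny self-referencing table. The sketch below assumes SQLite via Python's sqlite3 module and a simplified, hypothetical employee table (emp_id, name, manager_id) rather than the Contact/Employee schema from the question; the join pairs the manager's key with the employee's manager_id, and the matching column pairing in the original schema is left as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employee VALUES
  (1, 'Ada', NULL),  -- top of the hierarchy, no manager
  (2, 'Ben', 1),
  (3, 'Cy', 2);
""")

# Inner join: employees without a manager disappear from the result.
inner_rows = conn.execute("""
SELECT e.name, m.name
FROM employee e
JOIN employee m ON m.emp_id = e.manager_id
ORDER BY e.emp_id
""").fetchall()

# Left outer join: every employee is kept; the manager column is NULL when absent.
left_rows = conn.execute("""
SELECT e.name, m.name
FROM employee e
LEFT OUTER JOIN employee m ON m.emp_id = e.manager_id
ORDER BY e.emp_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join returns only Ben and Cy, while the left outer join also returns Ada with a NULL manager.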
I have two tables table1 and table2. Each table contains a column with itemPrice. I need to add the two columns together.
The SQL query below returns the correct SUM.
SELECT SUM(item1 + item2) FROM
(select SUM(t1.itemPrice) item1 from table1 t1 WHERE t1.userid = 'jonh') tableA
CROSS JOIN
(select SUM(t2.itemPrice) item2 from table2 t2 WHERE t2.userid = 'jonh') tableB
I am not being lazy, but the above query has so many SUM functions that I don't know where to start writing the LINQ query.
Can anyone help?
Ceci,
Hopefully this will give you what you want...
from f in (
from x in ( from t1 in Table1
where t1.Userid.Equals("John")
select new { Userid = t1.Userid }
).Distinct()
select new { item1 = ( from z in Table1
where z.Userid.Equals("John")
select z.ItemPrice ).Sum() ??0 ,
item2 = ( from z in Table2
where z.Userid.Equals("John")
select z.ItemPrice ).Sum() ??0 }
) select new { total = f.item1 + f.item2 }
If there are no records for "John" in one table, that table's sum comes back as 0 and the other table's sum is still added.
Hope this helps.
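The "treat a missing sum as 0" behaviour described above can also be reproduced directly in SQL, which may help when checking the LINQ result. This is a sketch under assumptions: SQLite via Python's sqlite3 module, made-up sample rows, and a hypothetical helper named total_for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (userid TEXT, itemPrice REAL);
CREATE TABLE table2 (userid TEXT, itemPrice REAL);
INSERT INTO table1 VALUES ('John', 10.0), ('John', 5.5), ('Mary', 3.0);
INSERT INTO table2 VALUES ('Mary', 7.0);  -- no rows for John in table2
""")

def total_for(user):
    """Sum itemPrice over both tables for one user; an empty table side counts as 0."""
    row = conn.execute("""
    SELECT a.item1 + b.item2
    FROM (SELECT COALESCE(SUM(itemPrice), 0) AS item1
          FROM table1 WHERE userid = ?) a
    CROSS JOIN
         (SELECT COALESCE(SUM(itemPrice), 0) AS item2
          FROM table2 WHERE userid = ?) b
    """, (user, user)).fetchone()
    return row[0]

print(total_for("John"), total_for("Mary"), total_for("Nobody"))
```

Without COALESCE, SUM over zero rows is NULL, and NULL plus anything is NULL, which is why the default of 0 is needed on each side.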
I am trying to understand left outer joins in LINQ to Entity. For example I have the following 3 tables:
Company, CompanyProduct, Product
The CompanyProduct is linked to its two parent tables, Company and Product.
I want to return all of the Company records and the associated CompanyProduct whether the CompanyProduct exists or not for a given product. In Transact SQL I would go from the Company table using left outer joins as follows:
SELECT * FROM Company AS C
LEFT OUTER JOIN CompanyProduct AS CP ON C.CompanyID=CP.CompanyID
LEFT OUTER JOIN Product AS P ON CP.ProductID=P.ProductID
WHERE P.ProductID = 14 OR P.ProductID IS NULL
My database has 3 companies, and 2 CompanyProduct records associated with the ProductID of 14. So the results from the SQL query are the expected 3 rows: 2 that are connected to a CompanyProduct and Product, and 1 that simply has the Company columns and nulls in the CompanyProduct and Product columns.
So how do you write the same kind of join in LINQ to Entities to achieve a similar result?
I have tried a few different things but can't get the syntax right.
Thanks.
Solved it!
Final Output:
theCompany.id: 1
theProduct.id: 14
theCompany.id: 2
theProduct.id: 14
theCompany.id: 3
Here is the Scenario
1 - The Database
--Company Table
CREATE TABLE [theCompany](
[id] [int] IDENTITY(1,1) NOT NULL,
[value] [nvarchar](50) NULL,
CONSTRAINT [PK_theCompany] PRIMARY KEY CLUSTERED
( [id] ASC ) WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
--Products Table
CREATE TABLE [theProduct](
[id] [int] IDENTITY(1,1) NOT NULL,
[value] [nvarchar](50) NULL,
CONSTRAINT [PK_theProduct] PRIMARY KEY CLUSTERED
( [id] ASC
) WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
--CompanyProduct Table
CREATE TABLE [dbo].[CompanyProduct](
[fk_company] [int] NOT NULL,
[fk_product] [int] NOT NULL
) ON [PRIMARY];
GO
ALTER TABLE [CompanyProduct] WITH CHECK ADD CONSTRAINT
[FK_CompanyProduct_theCompany] FOREIGN KEY([fk_company])
REFERENCES [theCompany] ([id]);
GO
ALTER TABLE [dbo].[CompanyProduct] CHECK CONSTRAINT
[FK_CompanyProduct_theCompany];
GO
ALTER TABLE [CompanyProduct] WITH CHECK ADD CONSTRAINT
[FK_CompanyProduct_theProduct] FOREIGN KEY([fk_product])
REFERENCES [dbo].[theProduct] ([id]);
GO
ALTER TABLE [dbo].[CompanyProduct] CHECK CONSTRAINT
[FK_CompanyProduct_theProduct];
2 - The Data
SELECT [id] ,[value] FROM theCompany
id value
----------- --------------------------------------------------
1 company1
2 company2
3 company3
SELECT [id] ,[value] FROM theProduct
id value
----------- --------------------------------------------------
14 Product 1
SELECT [fk_company],[fk_product] FROM CompanyProduct;
fk_company fk_product
----------- -----------
1 14
2 14
3 - The Entity in VS.NET 2008
(Image: entity model diagram, http://i478.photobucket.com/albums/rr148/KyleLanser/companyproduct.png)
The Entity Container Name is 'testEntities' (as seen in model Properties window)
4 - The Code (FINALLY!)
testEntities entity = new testEntities();
var theResultSet = from c in entity.theCompany
select new { company_id = c.id, product_id = c.theProduct.Select(e=>e) };
foreach(var oneCompany in theResultSet)
{
Debug.WriteLine("theCompany.id: " + oneCompany.company_id);
foreach(var allProducts in oneCompany.product_id)
{
Debug.WriteLine("theProduct.id: " + allProducts.id);
}
}
5 - The Final Output
theCompany.id: 1
theProduct.id: 14
theCompany.id: 2
theProduct.id: 14
theCompany.id: 3
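For comparison, the same three-row shape can be reproduced at the SQL level with the sample data from this answer. The sketch assumes SQLite via Python's sqlite3 module and simplified table definitions; it puts the product filter in the ON clause rather than in WHERE, which is a choice made for this example and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE theCompany (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE theProduct (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE CompanyProduct (fk_company INTEGER, fk_product INTEGER);
INSERT INTO theCompany VALUES (1, 'company1'), (2, 'company2'), (3, 'company3');
INSERT INTO theProduct VALUES (14, 'Product 1');
INSERT INTO CompanyProduct VALUES (1, 14), (2, 14);
""")

# Every company is kept; product columns are NULL where no link to product 14 exists.
company_rows = conn.execute("""
SELECT c.id, p.id
FROM theCompany c
LEFT OUTER JOIN CompanyProduct cp
  ON cp.fk_company = c.id AND cp.fk_product = 14
LEFT OUTER JOIN theProduct p ON p.id = cp.fk_product
ORDER BY c.id
""").fetchall()

for company_id, product_id in company_rows:
    print("theCompany.id:", company_id, "theProduct.id:", product_id)
```

Filtering in the ON clause keeps a company that is linked only to other products (it shows up with a NULL product), whereas a WHERE filter of the form "ProductID = 14 OR ProductID IS NULL" would drop such a company.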
It should be something like this:
var query = from t1 in db.table1
join t2 in db.table2
on t1.Field1 equals t2.field1 into T1andT2
from t2Join in T1andT2.DefaultIfEmpty()
join t3 in db.table3
on t2Join.Field2 equals t3.Field3 into T2andT3
from t3Join in T2andT3.DefaultIfEmpty()
where t1.someField == "Some value"
select new
{
t2Join.FieldXXX,
t3Join.FieldYYY
};
This is how I did it.
You'll want to use the Entity Framework to set up a many-to-many mapping from Company to Product. This will use the CompanyProduct table, but will make it unnecessary to have a CompanyProduct entity set in your entity model. Once you've done that, the query will be very simple, and it will depend on personal preference and how you want to represent the data. For example, if you just want all the companies who have a given product, you could say:
var query = from p in Database.ProductSet
where p.ProductId == 14
from c in p.Companies
select c;
or
var query = Database.CompanySet
.Where(c => c.Products.Any(p => p.ProductId == 14));
Your SQL query returns the product information along with the companies. If that's what you're going for, you might try:
var query = from p in Database.ProductSet
where p.ProductId == 14
select new
{
Product = p,
Companies = p.Companies
};
Please use the "Add Comment" button if you would like to provide more information, rather than creating another answer.
LEFT OUTER JOINs are done by using the GroupJoin in Entity Framework:
http://msdn.microsoft.com/en-us/library/bb896266.aspx
The normal group join represents a left outer join. Try this:
var list = from a in _datasource.table1
join b in _datasource.table2
on a.id equals b.table1.id
into ab
where ab.Count()==0
select new { table1 = a,
table2Count = ab.Count() };
That example gives you all records from table1 which don't have a reference to table2.
If you omit the where clause, you get all records of table1.
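The idea behind a group join can be sketched outside LINQ: each left row is paired with the (possibly empty) list of matching right rows, so no left row is ever lost. The following is an illustration only, in plain Python with made-up data and a hypothetical group_join helper; it is not how Entity Framework implements the operator.

```python
from collections import defaultdict

# Made-up sample data: table2 rows reference table1 rows through table1_id.
table1 = [{"id": 1}, {"id": 2}, {"id": 3}]
table2 = [
    {"id": 10, "table1_id": 1},
    {"id": 11, "table1_id": 1},
    {"id": 12, "table1_id": 3},
]

def group_join(left, right):
    """Pair every left row with the list of right rows that reference it."""
    groups = defaultdict(list)
    for b in right:
        groups[b["table1_id"]].append(b)
    # Left rows without matches get an empty list, as in a left outer join.
    return [(a, groups.get(a["id"], [])) for a in left]

joined = group_join(table1, table2)

# Without a filter: all left rows, each with its match count.
counts = [(a["id"], len(ab)) for a, ab in joined]

# With the "count == 0" filter: only left rows that nothing references.
unreferenced = [a["id"] for a, ab in joined if len(ab) == 0]

print(counts)
print(unreferenced)
```

Here row 2 of table1 has no references, so it is the only one returned by the filtered version, while the unfiltered version lists all three rows.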
Please try something like this:
from s in db.Employees
join e in db.Employees on s.ReportsTo equals e.EmployeeId
join er in EmployeeRoles on s.EmployeeId equals er.EmployeeId
join r in Roles on er.RoleId equals r.RoleId
where e.EmployeeId == employeeId &&
er.Status == (int)DocumentStatus.Draft
select s;
Cheers!
What about this one (you do have a many-to-many relationship between Company and Product in your Entity Designer, don't you?):
from s in db.Employees
where s.Product == null || s.Product.ProductID == 14
select s;
Entity Framework should be able to figure out the type of join to use.