I am trying to understand whether I can combine a merge with a join to load data into a table.
In my scenario, I used a MERGE statement to load data into my PRODUCTSDETAIL table from a staging table, but I was not able to populate the foreign key column (PRODUCTID), since it is the autoincremented primary key of the DIM_PRODUCT table.
The column shows NULL after executing the MERGE statement. Can I do a join inside a MERGE statement in Snowflake?
If I can, could someone please help me accomplish this? I was also not sure whether Snowflake can automatically populate the foreign key based on the PRODUCTNAME.
Below is the CREATE statement for the PRODUCTSDETAIL table:
Create table PRODUCTSDETAIL(
ProductsdetailID NUMBER(38,0) NOT NULL autoincrement,
CUSTOMERID NUMBER(38,0),
PRODUCTID NUMBER(38, 0),
PRODUCTunit varchar(255),
PRODUCTname varchar(255),
PRODUCTcode varchar(255),
PRODUCTquantity varchar(255),
PURCHASEdate TIMESTAMP_NTZ(9),
foreign key (PRODUCTID) references test."manufacturer".dim_product(PRODUCTID))
Below is the MERGE statement I am using to load the data; it produced NULL values for PRODUCTID:
merge into PRODUCTDETAILS as a using (
select
CUSTOMERID,
f.value as Productunit,
Productname[f.index]::Varchar as Productname,
Productcode [f.index]::Varchar as Productcode,
Productquantity [f.index]::Varchar as Productquantity,
PURCHASEDate [f.index]::Timestamp as PURCHASEDate
from staging_json b
LATERAL FLATTEN(Productunit, RECURSIVE=>true)f) as b on a.PURCHASEDate = b.PURCHASEDate
and a.Productcode = b.Productcode and a.CUSTOMERID = b.CUSTOMERID
and a.Productquantity = b.Productquantity
when not matched then insert (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
values (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
When I ran an INSERT with a join after executing the above MERGE statement, the PRODUCTID values were added as new rows below the existing records, with NULL values in all the other columns, as in the screenshot below.
Below is the INSERT statement I ran after the above MERGE statement:
insert into PRODUCTSDETAIL(PRODUCTID)
select a.PRODUCTID
from DIM_PRODUCT a
inner join PRODUCTSDETAIL b on a.PRODUCTNAME = b.PRODUCTNAME
where b.PRODUCTID is null;
I tried a LEFT JOIN inside the MERGE statement to fetch PRODUCTID from DIM_PRODUCT, but it throws the error UNKNOWN LEFT JOIN.
Below is the statement I used:
merge into PRODUCTDETAILS as a using (
select
CUSTOMERID,
f.value as Productunit,
Productname[f.index]::Varchar as Productname,
Productcode [f.index]::Varchar as Productcode,
Productquantity [f.index]::Varchar as Productquantity,
PURCHASEDate [f.index]::Timestamp as PURCHASEDate
from staging_json b
LEFT JOIN DIM_PRODUCTS c ON c.PRODUCTNAME = a.PRODUCTNAME,
LATERAL FLATTEN(Productunit, RECURSIVE=>true)f) as b on a.PURCHASEDate = b.PURCHASEDate
and a.Productcode = b.Productcode and a.CUSTOMERID = b.CUSTOMERID
and a.Productquantity = b.Productquantity
when not matched then insert (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
values (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
Related
I have two tables with the structure below in a Snowflake database, and I am trying to merge data from a staging table into them.
The primary key of the DIM_PRODUCT table is PRODUCTID.
The primary key of the PRODUCTSDETAIL table is PRODUCTSDETAILID, and CUSTOMERID is generated in the staging table using nextval.
My understanding is that if I create a table with a foreign key reference, the values for that column should be populated automatically when matching names are found, but in my case the column shows NULL values.
Below are the statements I used to create the DIM_PRODUCT and PRODUCTSDETAIL tables:
create or replace TABLE DIM_PRODUCT cluster by (PRODUCTID)(
PRODUCTID NUMBER(38,0) NOT NULL autoincrement,
PRODUCTName VARCHAR(),
primary key (PRODUCTID)
);
Create table PRODUCTSDETAIL(
ProductsdetailID NUMBER(38,0) NOT NULL autoincrement,
CUSTOMERID NUMBER(38,0),
PRODUCTID NUMBER(38, 0),
PRODUCTunit varchar(255),
PRODUCTname varchar(255),
PRODUCTcode varchar(255),
PRODUCTquantity varchar(255),
PURCHASEdate TIMESTAMP_NTZ(9),
foreign key (PRODUCTID) references test."manufacturer".dim_product(PRODUCTID))
Below is the MERGE statement I used to load data into the details table; it produced NULL values for PRODUCTID:
merge into PRODUCTDETAILS as a using (
select
CUSTOMERID,
f.value as Productunit,
Productname[f.index]::Varchar as Productname,
Productcode [f.index]::Varchar as Productcode,
Productquantity [f.index]::Varchar as Productquantity,
PURCHASEDate [f.index]::Timestamp as PURCHASEDate
from staging_json b
LATERAL FLATTEN(Productunit, RECURSIVE=>true)f) as b on a.PURCHASEDate = b.PURCHASEDate
and a.Productcode = b.Productcode and a.CUSTOMERID = b.CUSTOMERID
and a.Productquantity = b.Productquantity
when not matched then insert (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
values (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
I have also created sequences to autoincrement ProductID and ProductsdetailID.
I also tried a join to populate the ProductID, but it did not work. The code for that is below:
merge into PRODUCTDETAILS as a using (
select
CUSTOMERID,
f.value as Productunit,
Productname[f.index]::Varchar as Productname,
Productcode [f.index]::Varchar as Productcode,
Productquantity [f.index]::Varchar as Productquantity,
PURCHASEDate [f.index]::Timestamp as PURCHASEDate
from staging_json b
LEFT JOIN DIM_PRODUCTS c ON c.PRODUCTNAME = a.PRODUCTNAME,
LATERAL FLATTEN(Productunit, RECURSIVE=>true)f) as b on a.PURCHASEDate = b.PURCHASEDate
and a.Productcode = b.Productcode and a.CUSTOMERID = b.CUSTOMERID
and a.Productquantity = b.Productquantity
when not matched then insert (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
values (CUSTOMERID, PRODUCTID, Productunit, Productname, Productcode,Productquantity, PURCHASEDate)
I am trying to find a good practice for loading the PRODUCTID into the PRODUCTSDETAIL table so that it is consistent for querying. Could someone please help me?
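A foreign key constraint only declares a relationship; it never fills in a value by itself, so the key has to be looked up in the query that feeds the load. The following is a minimal sketch of that lookup, not a Snowflake statement: it uses Python's built-in sqlite3 module as a stand-in, the tables are cut down to a few columns, and `staging_flat` is a hypothetical staging table that is assumed to be flattened already. SQLite has no MERGE, so the "when not matched then insert" part is approximated with NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    productid   INTEGER PRIMARY KEY AUTOINCREMENT,
    productname TEXT UNIQUE
);
CREATE TABLE productsdetail (
    productsdetailid INTEGER PRIMARY KEY AUTOINCREMENT,
    customerid  INTEGER,
    productid   INTEGER REFERENCES dim_product(productid),
    productname TEXT,
    productcode TEXT
);
-- Hypothetical staging table, assumed to be flattened to one row per product.
CREATE TABLE staging_flat (customerid INTEGER, productname TEXT, productcode TEXT);

INSERT INTO dim_product (productname) VALUES ('widget'), ('gadget');
INSERT INTO staging_flat VALUES
    (1, 'widget', 'W-1'), (1, 'gadget', 'G-1'), (2, 'gizmo', 'Z-1');
""")

# The source query joins the staging rows to the dimension table, so the
# surrogate key arrives together with the other columns in a single insert.
load_sql = """
INSERT INTO productsdetail (customerid, productid, productname, productcode)
SELECT s.customerid, d.productid, s.productname, s.productcode
FROM staging_flat AS s
LEFT JOIN dim_product AS d ON d.productname = s.productname
WHERE NOT EXISTS (            -- stand-in for "when not matched"
    SELECT 1 FROM productsdetail AS t
    WHERE t.customerid = s.customerid AND t.productcode = s.productcode
)
"""
conn.execute(load_sql)
conn.execute(load_sql)  # a second run finds every row already present

rows = conn.execute(
    "SELECT customerid, productid, productname FROM productsdetail "
    "ORDER BY customerid, productname"
).fetchall()
print(rows)
```

In this sketch 'gizmo' has no row in the dimension table, so its PRODUCTID stays NULL; loading the dimension before the detail table would avoid that. The same shape (join inside the source subquery, alias scoped to that subquery) is what a MERGE source would need.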
I am trying to get an employee's manager's first and last name if the employee has a manager (some employees, e.g. the CEO, do not). Currently the query returns the employee's own name as the manager, and if there is no ManagerID in Contact it returns no rows at all.
Here is my general table structure for the tables I'm trying to access:
Employee: EmpID, EmployeeNumber, StartDate, isManager, Status (full time / part time), ContactID
Contact: ContactID, Fname, Lname, ManagerID
Department: DeptID, Name
DeptHistory: DeptHistID, DeptID, EmpID, PosTitle, StartDate, EndDate, ModifiedDate
And here is the query I have been manipulating:
SELECT
dh.StartDate, dh.PositionTitle, d.Name,
e.EmployeeNumber, e.Classification, e.Status,
c1.FirstName, c1.LastName, c1.SIN,
c1.DateOfBirth, c1.PhoneNumber, c1.EmailAddress, c1.AddressLine1,
c1.AddressLine2, c1.PostalCode, c1.City, c1.Province,
(c2.FirstName+ ' ' + c2.LastName) AS Manager
FROM
Person.Contact c1
JOIN
HumanResources.Employee e ON c1.ContactID = e.ContactID
JOIN
HumanResources.EmployeeDepartmentHistory dh ON e.EmpID = dh.EmpID
JOIN
HumanResources.Department d ON dh.DepartmentID = d.DepartmentID
JOIN
Person.Contact c2 ON c2.ManagerID = e.EmpID
WHERE
e.EmpID = @empID
Use LEFT OUTER JOIN instead of JOIN for the manager lookup, so the last join would look like this:
LEFT OUTER JOIN Person.Contact c2 ON c2.ManagerID = e.EmpID
The OUTER keyword is optional, so you can write just LEFT JOIN.
See "Using Left Outer Joins" on Microsoft's TechNet.
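To see the difference between the two join types on a self-referencing lookup, here is a small sketch using Python's sqlite3 module. The single `employee` table with a `manager_id` column is a simplified, hypothetical schema, not the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employee VALUES (1, 'Ceo', NULL), (2, 'Alice', 1), (3, 'Bob', 2);
""")

# e is the employee row, m is the manager row: the manager's key is matched
# against the employee's manager reference.
query = """
SELECT e.name, m.name
FROM employee AS e
{join} employee AS m ON m.emp_id = e.manager_id
ORDER BY e.emp_id
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # the employee without a manager is missing
print(left_rows)   # the employee without a manager is kept, manager is None
```

Note the direction of the join condition in the sketch. If the condition is written the other way round, the second alias ends up pointing at the wrong row, which may be related to the "returns the employee name for manager" symptom described in the question.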
Hive does not support non-equi joins. The common workaround is to move the join condition to the WHERE clause, which works fine when you want an inner join. But what about a left join?
A contrived example: let's say we have an OrderLineItem table, and we need to join it to a ProductPrice table that has a ProductID, a price, and a date range for which the price applies. We want to join where the ProductIDs match and OrderDate is between the start and end dates. If a ProductID or a valid date range does not match, I'd still want to see all OrderLineItem rows.
This SQL Fiddle is an example of how we'd do this in MSSQL:
http://sqlfiddle.com/#!6/fb877/7
Problem
If I apply the typical workaround and move the non-equi filter to the WHERE clause, it becomes an inner join. In the case above (in the SQL Fiddle and below), I have a product ID that is not in the lookup table.
Question:
Given that Hive does not support non-equi joins, how can a non-equi left join be achieved?
[SQLFiddle Content]
Tables:
CREATE TABLE OrderLineItem(
LineItemIDId int IDENTITY(1,1),
OrderID int NOT NULL,
ProductID int NOT NULL,
OrderDate Date
);
CREATE TABLE ProductPrice(
ProductID int,
Cost float,
startDate Date,
EndDate Date
);
Loading the data, and how we'd join in MSSQL:
--Old Price. Should be ignored
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES (1, 50,'12/1/2012','1/1/2013');
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES (2, 55,'12/1/2012','1/1/2013');
--Price for Order 1. Should be applied to Order 1
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 20,'12/1/2013','1/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 25,'12/1/2013','1/1/2014');
--Price for Order 2. Should be applied to Order 2
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 15,'1/2/2014','3/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 20,'1/2/2014','3/1/2014');
--January 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 1,'1/1/2014') ;
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 2,'1/1/2014');
--Feb 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (2, 1,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 2,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 3,'2/1/2014'); -- no price
SELECT * FROM OrderLineItem;
SELECT * FROM OrderLineItem li LEFT OUTER JOIN ProductPrice p on
p.ProductID=li.ProductID AND OrderDate BETWEEN startDate AND EndDate;
Create a copy of the left table with added serial row numbers:
CREATE TABLE OrderLineItem_serial AS
SELECT ROW_NUMBER() OVER() AS serial, * FROM OrderLineItem;
Remark: This may work better for some table formats (must be WITHOUT COMPRESSION):
CONCAT(INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE) AS serial
Do an inner join:
CREATE TABLE OrderLineItem_inner AS
SELECT * FROM OrderLineItem_serial li JOIN ProductPrice p
on p.ProductID = li.ProductID WHERE OrderDate BETWEEN startDate AND EndDate;
Left join by serial:
SELECT * FROM OrderLineItem_serial li
LEFT OUTER JOIN OrderLineItem_inner i on li.serial = i.serial;
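The steps above can be sketched end to end with Python's sqlite3 module, using the data from the question. This is an illustration of the idea rather than Hive code: the snake_case table names are made up for the sketch, the dates are rewritten as ISO strings so that BETWEEN compares correctly on text, an integer primary key plays the role of the serial column, and the intermediate table is folded into a subquery. Both joins use only equality conditions; the date range is checked in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line_item (
    serial INTEGER PRIMARY KEY,   -- unique row id, stands in for ROW_NUMBER()
    order_id INT, product_id INT, order_date TEXT
);
CREATE TABLE product_price (product_id INT, cost REAL, start_date TEXT, end_date TEXT);

INSERT INTO product_price VALUES
    (1, 50, '2012-12-01', '2013-01-01'), (2, 55, '2012-12-01', '2013-01-01'),
    (1, 20, '2013-12-01', '2014-01-01'), (2, 25, '2013-12-01', '2014-01-01'),
    (1, 15, '2014-01-02', '2014-03-01'), (2, 20, '2014-01-02', '2014-03-01');
INSERT INTO order_line_item (order_id, product_id, order_date) VALUES
    (1, 1, '2014-01-01'), (1, 2, '2014-01-01'),
    (2, 1, '2014-02-01'), (2, 2, '2014-02-01'),
    (2, 3, '2014-02-01');         -- product 3 has no price
""")

rows = conn.execute("""
SELECT li.order_id, li.product_id, i.cost
FROM order_line_item AS li
LEFT JOIN (
    -- inner equi-join; the non-equi date filter lives in WHERE
    SELECT li2.serial, p.cost
    FROM order_line_item AS li2
    JOIN product_price AS p ON p.product_id = li2.product_id
    WHERE li2.order_date BETWEEN p.start_date AND p.end_date
) AS i ON i.serial = li.serial    -- equi-join back on the serial
ORDER BY li.serial
""").fetchall()

for row in rows:
    print(row)
```

Every line item appears once, and the line for product 3 keeps a NULL cost, which is the left-join behaviour the question asks for.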
Why not use a WHERE clause that allows for NULL cases separately?
SELECT * FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p
ON p.ProductID=li.ProductID
WHERE ( StartDate IS NULL OR OrderDate BETWEEN startDate AND EndDate);
That should take care of it: if the left join matches, it applies the date logic; if it doesn't, it keeps the NULL values intact, as a left join should.
Not sure if you can avoid using a double join:
SELECT *
FROM OrderLineItem li
LEFT OUTER JOIN (
SELECT p.*
FROM ProductPrice p
JOIN OrderLineItem li
ON p.ProductID=li.ProductID
WHERE OrderDate BETWEEN StartDate AND EndDate ) p
ON p.ProductId = li.ProductID
WHERE StartDate IS NULL OR
OrderDate BETWEEN StartDate AND EndDate;
This way if there is a match and StartDate is not null, there has to be a valid start/end date match.
Hive 0.10 supports cross joins, so you could handle all your "theta join" (non-equijoin) conditions in the WHERE clause.
I would like to ask whether my LINQ solution below is a good one or whether there is a better way. I am new to LINQ and am most familiar with MySQL. I've been converting one of my past projects from PHP to .NET MVC and am trying to learn LINQ along the way.
I have the following table structures:
CREATE TABLE maplocations (
ID int NOT NULL AUTO_INCREMENT,
name varchar(35) NOT NULL,
Lat double NOT NULL,
Lng double NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY name (name)
);
CREATE TABLE reservations (
ID INT NOT NULL AUTO_INCREMENT,
loc_ID INT NOT NULL,
resDate DATE NOT NULL,
user_ID INT NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY one_per (loc_ID, resDate),
FOREIGN KEY (user_ID) REFERENCES Users (ID),
FOREIGN KEY (loc_ID) REFERENCES MapLocations (ID)
);
CREATE TABLE Users (
ID INT NOT NULL AUTO_INCREMENT,
name VARCHAR(20) NOT NULL,
email VARCHAR(50) NOT NULL,
pass VARCHAR(128) NOT NULL,
salt VARCHAR(5) NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY unique_names (name),
UNIQUE KEY unique_email (email)
);
In MySQL, I use the following query to get the earliest reservation at each map location, with a non-null placeholder date for any location that doesn't have a reservation.
SELECT locs.*, if(res.resDate,res.resDate,'0001-01-01') as resDate, res.Name as User
FROM MapLocations locs
LEFT JOIN (
SELECT loc_ID, resDate, Name
FROM Reservations, Users
WHERE resDate >= Date(Now())
AND user_ID = Users.ID
ORDER BY resDate
) res on locs.ID = res.loc_ID
group by locs.ID
ORDER BY locs.Name;
In LINQ, with Visual Studio automatically generating much of the structure after connecting to the database, I have come up with the following equivalent of that SQL query:
var resList = (from res in Reservations
where res.ResDate >= DateTime.Today
select res);
var locAndRes =
(from loc in Maplocations
join res in resList on loc.ID equals res.Loc_ID into join1
from res2 in join1.DefaultIfEmpty()
join usr in Users on res2.User_ID equals usr.ID into join2
from usr2 in join2.DefaultIfEmpty()
orderby loc.ID,res2.ResDate
select new {
ID = (int)loc.ID,
Name = (string)loc.Name,
Lat = (double)loc.Lat,
Lng = (double)loc.Lng,
resDate = res2 != null ?(DateTime)res2.ResDate : DateTime.MinValue,
user = usr2 != null ? usr2.Name : null
}).GroupBy(a => a.ID).Select(b => b.FirstOrDefault());
So, I'm wondering is there a better way to perform this query?
Are these equivalent?
Are there any good practices I should be following?
One more question: I'm having trouble getting this from the var to a List. Doing something like this doesn't work:
List<locAndResModel> locList = locAndRes.AsQueryable().ToList<locAndResModel>();
In the above snippet, locAndResModel is just a class with properties that match the int, string, double, double, DateTime, string results of the query. Is there an easy way to get a list without having to do a foreach and pass the results to a constructor overload? Or should I just add it to ViewData and return the View?
You'll want to take advantage of the automatic joins performed by the Entity Framework. Give this a try and let me know if it does what you want:
var locAndRes = from maplocation in MapLocations
let earliestReservationDate = maplocation.Reservations.Min(res => res.resDate)
let earliestReservation = (from reservation in maplocation.Reservations
where reservation.resDate == earliestReservationDate && reservation.resDate >= DateTime.Today
select reservation).FirstOrDefault()
select new locAndResModel( maplocation.ID, maplocation.name, maplocation.Lat, maplocation.Lng, earliestReservation != null ? earliestReservation.resDate : DateTime.MinValue, earliestReservation != null ?earliestReservation.User.name : null)
I've run into a strange problem while writing an ASP.NET MVC site. I have a view in my SQL Server database that returns a few date ranges. The view works fine when running the query in SSMS.
When the view data is returned by the Entity Framework model, it returns the correct number of rows, but some of the rows are duplicated.
Here is an example of what I have done:
SQL Server code:
EDITED: (table A)
CREATE TABLE [dbo].[A](
[ID] [int] NOT NULL,
[PhID] [int] NULL,
[FromDate] [datetime] NOT NULL,
[ToDate] [datetime] NULL,
CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED
( [ID] ASC,
[FromDate] ASC
)) ON [PRIMARY]
CREATE TABLE [dbo].[B](
[PhID] [int] NOT NULL,
[FromDate] [datetime] NULL,
[ToDate] [datetime] NULL,
CONSTRAINT [PK_B] PRIMARY KEY CLUSTERED
( [PhID] ASC )) ON [PRIMARY]
go
CREATE VIEW C as
SELECT A.ID,
CASE WHEN A.PhID IS NULL THEN A.FromDate ELSE B.FromDate END AS FromDate,
CASE WHEN A.PhID IS NULL THEN A.ToDate ELSE B.ToDate END AS ToDate
FROM A
LEFT OUTER JOIN B ON A.PhID = B.PhID
go
INSERT INTO B (PhID, FromDate, ToDate) VALUES (100, '20100615', '20100715')
INSERT INTO A (ID, PhID, FromDate, ToDate) VALUES (1, NULL, '20100101', '20100201')
INSERT INTO A (ID, PhID, FromDate, ToDate) VALUES (1, 100, '20100615', '20100715')
INSERT INTO B (PhID, FromDate, ToDate) VALUES (101, '20101201', '20101231')
INSERT INTO A (ID, PhID, FromDate, ToDate) VALUES (2, NULL, '20100801', '20100901')
INSERT INTO A (ID, PhID, FromDate, ToDate) VALUES (2, 101, '20101201', '20101231')
So now, if you select everything from C, you get 4 separate date ranges.
In the Entity Framework model (which I call 'Core'), the view 'C' is added.
In the MVC controller:
public class HomeController : Controller
{
public ActionResult Index()
{
CoreEntities db = new CoreEntities();
var clist = from c in db.C
select c;
return View(clist.ToList());
}
}
in MVC View:
@model List<RM.Models.C>
@{
    foreach (RM.Models.C c in Model)
    {
        @String.Format("{0:dd-MMM-yyyy}", c.FromDate)
        <span>-</span>
        @String.Format("{0:dd-MMM-yyyy}", c.ToDate)
        <br />
    }
}
When I run all this, it outputs this:
01-Jan-2010 - 01-Feb-2010
01-Jan-2010 - 01-Feb-2010
01-Aug-2010 - 01-Sep-2010
01-Aug-2010 - 01-Sep-2010
When it should do this (this is what the view returns):
01-Jan-2010 - 01-Feb-2010
15-Jun-2010 - 15-Jul-2010
01-Aug-2010 - 01-Sep-2010
01-Dec-2010 - 31-Dec-2010
Also, I've run the SQL profiler over it and according to that, the query being executed is:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[FromDate] AS [FromDate],
[Extent1].[ToDate] AS [ToDate]
FROM (SELECT
[C].[ID] AS [ID],
[C].[FromDate] AS [FromDate],
[C].[ToDate] AS [ToDate]
FROM [dbo].[C] AS [C]) AS [Extent1]
Which returns the correct data
So it seems that the entity framework is doing something to the data in the meantime.
To me, everything looks fine! Have I missed something?
Cheers,
Ben
I figured it out myself.
The problem was with the way the view was mapped in the entity model.
When the view was added, the entity key was set to just ID. It needed to span both ID and FromDate, so I included FromDate in the entity key and now it works fine.
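Why a too-narrow entity key produces duplicates rather than missing rows can be illustrated with a toy model. The sketch below is plain Python and is only an assumption about the general mechanism (one tracked object per distinct key, reused for every later row with the same key); it is not Entity Framework's actual implementation, and `materialize` is a made-up helper name.

```python
from datetime import date

# The four rows the view C returns in the question.
view_rows = [
    {"ID": 1, "FromDate": date(2010, 1, 1), "ToDate": date(2010, 2, 1)},
    {"ID": 1, "FromDate": date(2010, 6, 15), "ToDate": date(2010, 7, 15)},
    {"ID": 2, "FromDate": date(2010, 8, 1), "ToDate": date(2010, 9, 1)},
    {"ID": 2, "FromDate": date(2010, 12, 1), "ToDate": date(2010, 12, 31)},
]

def materialize(rows, key_columns):
    """Toy identity map: keep one object per distinct key and hand it back
    for every later row that carries the same key."""
    cache = {}
    result = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        entity = cache.setdefault(key, row)  # first row seen for a key wins
        result.append(entity)
    return result

by_id = materialize(view_rows, ["ID"])
by_id_and_date = materialize(view_rows, ["ID", "FromDate"])

print([r["FromDate"].isoformat() for r in by_id])
print([r["FromDate"].isoformat() for r in by_id_and_date])
```

With the key on ID alone, the model still yields four results, but the second row for each ID is replaced by the first one, which matches the duplicated output in the question. With ID and FromDate as the key, all four rows are distinct.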