T-SQL Matching process using only provided fields - stored-procedures

I am trying to write a stored procedure to match lists of physicians with existing records in our database based off of the information provided to us by our clients. Currently we use MS Access to join manually based on the given identifiers, but this process tends to be tedious and overly time consuming, hence the desire to automate it.
What I am trying to do is create a temporary table that contains all columns that could potentially be matched on, and then run through a series of matching queries using the fields as join conditions to get our identifier to pass back.
For instance, the available matching fields are Name, NPI, MedicaidNum, and DOB so I would write something like:
UPDATE Temp
SET Temp.RECID = Phy.RECID
FROM TempTable Temp
INNER JOIN Physicians Phy
ON Phy.Name = Temp.Name
AND Phy.NPI = Temp.NPI
AND Phy.MedicaidNum = Temp.MedicaidNum
AND Phy.DOB = Temp.DOB
UPDATE Temp
SET Temp.RECID = Phy.RECID
FROM TempTable Temp
INNER JOIN Physicians Phy
ON Phy.Name = Temp.Name
AND Phy.NPI = Temp.NPI
AND Phy.MedicaidNum = Temp.MedicaidNum
WHERE Temp.RECID IS NULL
...etc
The problem lies in the fact that there are about 15 different identifiers which could potentially be provided, and clients usually only provide three or four per record set. So by the time null values are accounted for, there are potentially over a hundred different queries that need to be written to match on only half a dozen provided fields.
I am thinking that there may be a way to pass in a variable (or variables) which indicate which columns are actually provided with the data set, and then write a dynamic join statement and/or where clause, but I do not know if this will work in T-SQL. Something like:
DECLARE @Field1
DECLARE @Field2
....
UPDATE Temp
SET Temp.RECID = Phy.RECID
FROM TempTable Temp
INNER JOIN Physicians Phy
ON Phy.@Field1 = Temp.@Field1
AND Phy.@Field2 = Temp.@Field2
This way I would limit the number of queries I need to write, and only need to worry about the number of fields I am matching, rather than which specific ones. Perhaps there is a better approach to this problem, however?

You can do something like this, but be warned this method is super prone to SQL injection. It's just to illustrate the principle of how to do something like this. I leave it up to you what you want to do with it. For this code, I made the proc take three fields:
CREATE PROC DynamicUpdateSQLFromFieldList @Field1 VARCHAR(50) = NULL,
@Field2 VARCHAR(50) = NULL,
@Field3 VARCHAR(50) = NULL,
@RunMe BIT = 0
AS
BEGIN
DECLARE @SQL AS VARCHAR(1000);
-- Each COALESCE adds a join predicate only when its field name is not NULL.
-- @Field3 must be supplied: when it is NULL the text ends in a dangling AND or an empty ON clause.
SELECT @SQL = 'UPDATE Temp
SET Temp.RECID = Phy.RECID
FROM TempTable Temp
INNER JOIN Physicians Phy ON ' +
COALESCE('Phy.' + @Field1 + ' = Temp.' + @Field1 + ' AND ', '') +
COALESCE('Phy.' + @Field2 + ' = Temp.' + @Field2 + ' AND ', '') +
COALESCE('Phy.' + @Field3 + ' = Temp.' + @Field3, '') + ';';
IF @RunMe = 0
SELECT @SQL AS SQL;
ELSE
EXEC(@SQL)
END
I've added a debug mode flag just so you can see the SQL if you don't want to run it. So, for example, if you run:
EXEC DynamicUpdateSQLFromFieldList @field1='col1', @field2='col2', @field3='col3'
or
EXEC DynamicUpdateSQLFromFieldList @field1='col1', @field2='col2', @field3='col3', @RunMe=0
the SQL produced will be:
UPDATE Temp
SET Temp.RECID = Phy.RECID
FROM TempTable Temp INNER JOIN Physicians Phy
ON Phy.col1 = Temp.col1 AND
Phy.col2 = Temp.col2 AND
Phy.col3 = Temp.col3;
If you run this line:
EXEC DynamicUpdateSQLFromFieldList @field1='col1', @field2='col2', @field3='col3', @RunMe=1
It will perform the update. If you wanted it to be more secure, you could whitelist the incoming field names against the sys tables to make sure the columns actually exist in each table before you execute any code.
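The whitelisting idea can also be sketched outside the database. The Python below is only an illustration and not part of the original answer: `build_update_sql` and `ALLOWED_COLUMNS` are hypothetical names, and the allowed set is hard-coded here, whereas a real implementation would probably load it from the system catalog for both tables. It builds the statement text and does not execute anything.

```python
# Hypothetical whitelist; in practice this could be loaded from the system
# catalog for both TempTable and Physicians.
ALLOWED_COLUMNS = {"Name", "NPI", "MedicaidNum", "DOB"}


def build_update_sql(fields):
    """Build the UPDATE ... JOIN text from the provided field names only."""
    provided = [f for f in fields if f is not None]
    if not provided:
        raise ValueError("at least one matching field is required")
    unknown = [f for f in provided if f not in ALLOWED_COLUMNS]
    if unknown:
        raise ValueError(f"unknown column(s): {unknown}")
    # Bracket-quote each identifier; safe here because every name was
    # already checked against the whitelist.
    predicates = [f"Phy.[{f}] = Temp.[{f}]" for f in provided]
    return (
        "UPDATE Temp SET Temp.RECID = Phy.RECID "
        "FROM TempTable Temp "
        "INNER JOIN Physicians Phy ON "
        + " AND ".join(predicates)
        + " WHERE Temp.RECID IS NULL;"
    )


print(build_update_sql(["NPI", None, "DOB"]))
```

Joining the predicates with " AND " avoids the dangling AND that appears when a trailing field is missing, and anything not in the whitelist is rejected before any SQL text is produced.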

Related

Create dynamic SQL based on column names passed through a string

I need to find out rows that are present in table A and missing from table B (using LEFT JOIN) wherein table A and table B are two tables with same structure but within different schema.
But the query has to be constructed using Dynamic SQL and the columns that need to be used for performing JOIN are stored in a string. How to extract the column names from string and use them to dynamically construct below query :
Database is Azure SQL Server
eg :
DECLARE @ColNames NVARCHAR(150) = 'col1,col2'
Query to be constructed based on columns defined in ColNames :-
SELECT *
FROM Table A
Left Join
Table B
ON A.col1 = B.col1
AND A.col2 = B.col2
WHERE B.col1 IS NULL AND B.col2 IS NULL
If @ColNames contains more columns, the SELECT statement needs to cater for all of them.
Without knowing the full context, try this:
DECLARE @ColNames NVARCHAR(150) = 'col1,col2'
DECLARE @JoinCondition NVARCHAR(MAX) = ''
DECLARE @WhereCondition NVARCHAR(MAX) = ''
-- The IS NULL test targets [b], the side that has no row after the LEFT JOIN.
SELECT @JoinCondition += CONCAT('[a].', QUOTENAME(Value), ' = ', '[b].', QUOTENAME(Value), (CASE WHEN LEAD(Value) OVER(ORDER BY Value) IS NOT NULL THEN ' AND ' ELSE '' END))
,@WhereCondition += CONCAT('[b].', QUOTENAME(Value), ' IS NULL', (CASE WHEN LEAD(Value) OVER(ORDER BY Value) IS NOT NULL THEN ' AND ' ELSE '' END))
FROM STRING_SPLIT(@ColNames,N',')
SELECT @JoinCondition, @WhereCondition
String_Split: splits the input string into one row per column name.
Lead: determines whether the AND keyword is needed, i.e. when the current row is not the last one.
Be aware that NOT EXISTS is probably a better solution than LEFT JOIN.
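To make the NOT EXISTS suggestion concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely so the example is runnable. The tables `a` and `b`, their sample rows, and the helper `anti_join_sql` are assumptions for illustration; on Azure SQL the same query shape would be built with QUOTENAME and executed as dynamic SQL.

```python
import sqlite3


def anti_join_sql(col_names, allowed):
    """Return a NOT EXISTS query for rows of table a that have no match in b."""
    cols = [c.strip() for c in col_names.split(",") if c.strip()]
    if not cols or any(c not in allowed for c in cols):
        raise ValueError("column list is empty or contains an unknown column")
    # One equality predicate per join column, combined with AND.
    match = " AND ".join(f'b."{c}" = a."{c}"' for c in cols)
    return f"SELECT a.* FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE {match})"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (col1 INTEGER, col2 TEXT);
    CREATE TABLE b (col1 INTEGER, col2 TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1, 'x'), (3, 'z');
    """
)
rows = conn.execute(anti_join_sql("col1,col2", {"col1", "col2"})).fetchall()
print(rows)  # rows of a with no matching (col1, col2) pair in b
```

Unlike the LEFT JOIN form, this needs no separate IS NULL condition, so only one dynamic fragment has to be generated from the column list.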

Entity Framework 6 vs Entity Framework Core Raw Sql

Entity Framework 6 example writing SQL queries for non-entity types:
context.Database.SqlQuery<string>(" ; with tempSet as " +
"(select " +
In Entity Framework 6, I can also write the following query with SqlQuery. How can I run the following query with Entity Framework Core?
; with tempSet as
(
select
transitionDatetime = l.transitionDate,
gateName = g.gateName,
staffid = l.staffid,
idx = row_number() over(partition by l.staffid order by l.transitionDate) -
row_number() over(partition by l.staffid, cast(l.transitionDate as date) order by l.transitionDate),
transitionDate = cast(l.transitionDate as date)
from
logs l
inner join
staff s on l.staffid = s.staffid and staffType = 'Student'
join
gate g on g.gateid = l.gateid
), groupedSet as
(
select
t1.*,
FirstGateName = t2.gatename,
lastGateName = t3.gatename
from
(select
staffid,
mintransitionDate = min(transitionDatetime),
maxtransitionDate = case when count(1) > 1 then max(transitionDatetime) else null end,
transitionDate = max(transitionDate),
idx
from
tempSet
group by
staffid, idx) t1
left join
tempSet t2 on t1.idx = t2.idx
and t1.staffid = t2.staffid
and t1.mintransitionDate = t2.transitionDatetime
left join
tempSet t3 on t1.idx = t3.idx
and t1.staffid = t3.staffid
and t1.maxtransitionDate = t3.transitionDatetime
where
t1.transitionDate between @startdate and @enddate
)
select
t.*,
g.mintransitionDate,
g.maxtransitionDate,
g.FirstGateName,
g.LastGateName
from
groupedSet g
right join
(select
d,
staffid
from
(select top (select datediff(d, @startdate, @endDate))
d = dateadd(d, row_number() over(order by (select null)) - 1, @startDate)
from
sys.objects o1
cross join
sys.objects o2) tally
cross join
staff
where
staff.stafftype = 'Student') t on cast(t.d as date) = cast(g.transitionDate as date)
and t.staffid = g.staffid
order by
t.d asc, t.staffid asc
How can I do with Entity Framework Core? Writing SQL queries for non-entity types?
I have used FromSql directly off the context when the query targets a single table. I realize this is not what you want, but the approach below builds on it.
var blogs = context.Blogs
.FromSql("SELECT * FROM dbo.Blogs")
.ToList();
However in a case like yours it is complex and a joining of multiple tables and CTEs. I would suggest you create a custom object, POCO C# in code, and assign it a DbSet<> in your model builder. Then you can do something like this:
var custom = context.YOURCUSTOMOBJECT.FromSql("(crazy long SQL)").ToList();
If the returned columns match the type, it may work. I did something similar and just wrapped my whole query in a stored procedure. With EF Core, however, you need to create a migration and then manually add the creation of the proc to its 'Up' method if you wish to deploy it that way. Whichever route you take, the proc must exist on the server, either already or deployed as described above, and you can then call it like this:
context.pGetResult.FromSql("pGetResult @p0, @p1, @p2", parameters: new[] { "Flight", null, null }).ToList()
The important thing to note is you need to create a DBSet object first in your model context so the context you are calling knows the well typed object it is returning from direct SQL. It must match EXACTLY the columns and types being returned.
EDIT 3-8
To be sure you need to do a few steps I will write out:
A POCO class that has a Data Annotation of [Key] above a distinct property. This class matches your columns of what a procedure returns exactly.
A DBSet<(POCO)> in your context.
Create a new migration with: "dotnet ef migrations add 'yourname'"
Observe the new migration scripts. If anything generating a table for the POCO gets created, erase it. You don't need it. This is for a result set not storage in the database.
Change the 'Up' section to manually script your SQL to the database, something like below. Also ensure you drop the proc in the 'Down' section in case you ever want to revert.
protected override void Up(MigrationBuilder migrationBuilder)
{
// Create the procedure as part of the migration.
migrationBuilder.Sql(
"create proc POCONameAbove" +
"( @param1 varchar(16), @Param2 int) as " +
"BEGIN " +
"Select * " +
"From Table " +
"Where param1 = @param1 " +
" AND param2 = @param2 " +
"END"
);
}
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.Sql("drop proc POCONameAbove");
}
So now you essentially hijacked the migration to do explicitly what you want. Test it out by deploying the changes to the database with "dotnet ef database update 'yourmigrationname'".
Observe the database, it should have your proc if the database update succeeded and you did not accidentally create a table in your migration.
The section you said you didn't understand is what gets the data in EF Core. Let's break it up:
context.pGetResult.FromSql("pGetResult @p0, @p1, @p2", parameters: new[] { "Flight", null, null }).ToList()
context.pGetResult = is using the DbSet you made up. It keeps you well typed to your proc.
.FromSql( = telling the context you are going to do some SQL directly in the string.
"pGetResult @p0, @p1, @p2" = I am naming a procedure in the database that has three params.
, parameters: new[] { "Flight", null, null }) = I am just passing an array of objects in the order the parameters are needed. You need to match the SQL types of course, but provided that is okay it will be fine.
.ToList() = I want a collection, and my go-to is always ToList when debugging something.
Hope that helps. Once I learned this would work it opened up a whole other world of what I could do. You can take a look at a project I have done that is unfinished for reference. I hard coded a controller to show the proc with preset values. But it could be changed easily to just inject them in the api.
https://github.com/djangojazz/EFCoreTest/tree/master/EFCoreCodeFirstScaffolding

Self reference update on insert trigger in Informix

I'm extracting data from various sources into one table. In this new table, there's a field called lineno. This field's value should be in sequence based on company code and batch number. I've written the following procedure:
CREATE PROCEDURE update_line(company CHAR(4), batch CHAR(8), rcptid CHAR(12));
DEFINE lineno INT;
SELECT Count(*)
INTO lineno
FROM tmp_cb_rcpthdr
WHERE cbrh_company = company
AND cbrh_batchid = batch;
UPDATE tmp_cb_rcpthdr
SET cbrh_lineno = lineno + 1
WHERE cbrh_company = company
AND cbrh_batchid = batch
AND cbrh_rcptid = rcptid;
END PROCEDURE;
This procedure will be called using the following trigger
CREATE TRIGGER tmp_cb_rcpthdr_ins INSERT ON tmp_cb_rcpthdr
REFERENCING NEW AS n
FOR EACH ROW
(
EXECUTE PROCEDURE update_line(n.company, cbrh_batchid, cbrh_rcptid)
);
However, I got the following error
SQL Error = -747 Table or column matches object referenced in
triggering statement.
From oninit.com, I learned that the error is caused by a triggered SQL statement acting on the triggering table, which in this case is the UPDATE statement.
So my question is, how do I solve this problem? Is there any work around or better solution?
I think the design needs to be reconsidered. For a start, what happens if some rows get deleted from tmp_cb_rcpthdr ? The COUNT(*) query will result in duplicate lineno values.
Even if this is an ETL only process, and you can be confident the data won't be manipulated from elsewhere, performance will be an issue, and will only get worse the more data you have for any one combination of company and batch_id.
Is it necessary for the lineno to increment from zero, or is it just to maintain the original load order? Because if it's the latter, a SEQUENCE or a SERIAL field on the table will achieve the same end, and be a lot more efficient.
If you must generate lineno in this way, I would suggest you create a second control table, keyed on company and batch_id, that tracks the current lineno value, ie: (untested)
CREATE PROCEDURE update_line(company CHAR(4), batch CHAR(8)) RETURNING INT;
DEFINE lineno INT;
-- Read the current counter for this company/batch pair
SELECT cbrh_lineno INTO lineno
FROM linenoctl
WHERE cbrh_company = company
AND cbrh_batchid = batch;
UPDATE linenoctl
SET cbrh_lineno = lineno + 1
WHERE cbrh_company = company
AND cbrh_batchid = batch;
-- A test that no other process has grabbed this record
-- might need to be considered here, ie cbrh_lineno = lineno
RETURN lineno + 1;
END PROCEDURE;
Then use it as follows:
CREATE TRIGGER tmp_cb_rcpthdr_ins INSERT ON tmp_cb_rcpthdr
REFERENCING NEW AS n
FOR EACH ROW
(
EXECUTE PROCEDURE update_line(n.company, cbrh_batchid) INTO cbrh_lineno
);
See the IDS documentation for more on using calculated values with triggers.
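The control-table idea can be tried out in isolation. The following Python sketch uses an in-memory SQLite database only so that it runs anywhere; it is not Informix SPL. The `linenoctl` table and its columns follow the answer, while the helper `next_lineno` and the sample company and batch values are hypothetical. Unlike the untested procedure above, it also creates the control row the first time a company/batch pair is seen, which is an assumption about the desired behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE linenoctl ("
    " cbrh_company TEXT, cbrh_batchid TEXT, cbrh_lineno INTEGER,"
    " PRIMARY KEY (cbrh_company, cbrh_batchid))"
)


def next_lineno(company, batch):
    """Increment and return the counter for one (company, batch) pair."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE linenoctl SET cbrh_lineno = cbrh_lineno + 1"
            " WHERE cbrh_company = ? AND cbrh_batchid = ?",
            (company, batch),
        )
        if cur.rowcount == 0:
            # First row for this pair: start the counter at 1.
            conn.execute(
                "INSERT INTO linenoctl VALUES (?, ?, 1)", (company, batch)
            )
            return 1
        row = conn.execute(
            "SELECT cbrh_lineno FROM linenoctl"
            " WHERE cbrh_company = ? AND cbrh_batchid = ?",
            (company, batch),
        ).fetchone()
        return row[0]


print([next_lineno("ACME", "B001") for _ in range(3)])
print(next_lineno("ACME", "B002"))
```

Because the counter lives in its own table, the trigger no longer has to read or update the table that fired it, and deleting rows from the data table cannot produce duplicate line numbers.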

MySql syntax question regarding CONCAT and strings

Environment is MySql 5.1.5.
This is a snippet from a larger stored procedure. Assume the variables are properly declared and set before this code is reached.
When I run the following loop, something seems to be failing in the CONCAT. @columnName and @xmlQuery are both VARCHARs. When I "SELECT @xmlQuery" at the end of the procedure, it is {null}.
If I simply replace:
SET @xmlQuery = CONCAT(@xmlQuery, @columnName);
with:
SET @xmlQuery = CONCAT(@xmlQuery, 'test');
then I get a nice string back like:
select xml_tag('result',null,null,concat(testtesttesttesttesttest
as one would expect.
WHY doesn't the CONCAT work with the local VARCHAR variable?
SET @xmlQuery = 'select xml_tag(''result'',null,null,concat(';
SET @columnCount = (SELECT COUNT(*) FROM ColumnNames);
WHILE (@rowIndex <= @columnCount) DO
SELECT @columnName = ColumnName FROM ColumnNames WHERE ID = @rowIndex;
SET @xmlQuery = CONCAT(@xmlQuery, @columnName);
SET @rowIndex = @rowIndex + 1;
END WHILE;
The problem turned out to be a conflict between the local variable @columnName and the column ColumnName in my temporary table.

Why does stored procedure invalidate SQL Cache Dependency?

After many hours, I finally realized that I am working correctly with the Cache object in my ASP.NET application, but my stored procedure stops it from working correctly.
This stored procedure works correctly:
CREATE PROCEDURE [dbo].[ListLanguages]
@Page INT = 1,
@ItemsPerPage INT = 10,
@OrderBy NVARCHAR (100) = 'ID',
@OrderDirection NVARCHAR(4) = 'DESC'
AS
BEGIN
SELECT ID, [Name], Flag, IsDefault FROM dbo.Languages
END
But this (the one I wanted) doesn't:
CREATE PROCEDURE [dbo].[ListLanguages]
@Page INT = 1,
@ItemsPerPage INT = 10,
@OrderBy NVARCHAR (100) = 'ID',
@OrderDirection NVARCHAR(4) = 'DESC',
@TotalRecords INT OUTPUT
AS
BEGIN
SET @TotalRecords = 10
EXEC('SELECT ID, Name, Flag, IsDefault FROM (
SELECT ROW_NUMBER() OVER (ORDER BY ' + @OrderBy + ' ' + @OrderDirection + ') as Row, ID, Name, Flag, IsDefault
FROM dbo.Languages) results
WHERE Row BETWEEN ((' + @Page + '-1)*' + @ItemsPerPage + '+1) AND (' + @Page + '*' + @ItemsPerPage + ')')
END
I gave the @TotalRecords parameter the value 10 so you can be sure that the problem is not from the COUNT(*) function, which I know is not supported well.
Also, when I run it from SQL Server Management Studio, it does exactly what it should do. In the ASP.NET application the results are retrieved correctly, only the cache is somehow unable to work!
Can you please help?
Maybe a hint
I believe the reason the dependency's HasChanged property is always true is that the Row column generated by ROW_NUMBER is only temporary, so SQL Server is not able to say whether the results have changed or not. That's why HasChanged is always set to true.
Does anyone know how to paginate results from SQL SERVER without using COUNT or ROW_NUMBER functions?
not enough cache size.
Sql cache dependency for .NET 3.5 only works for simple queries. Maybe .NET 4 will surprise me.
1 - Can you copy & paste the code you actually use to cache the results of that sproc ?
2 - Have you tried a sproc where you use straight query instead of EXEC-ing a string ?
Yes, #2 means that you can't change the structure of the query on the fly :-) but unless you are calculating your own caching criteria in #1, that's the rule of caching you have to abide by in general. No caching mechanism is ever going to parse a string from EXEC for you.
EXEC-ing a string in a sproc makes that sproc a total toss of a coin on each and every run even for SQL Server itself. It also leaves you open to script injection attacks since your query is still being composed by strings at run time - it's not any different from composing the whole string in C# and passing it to sproc to "just EXEC whatever is inside"
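On the injection point specifically, one way to limit the exposure is to interpolate only whitelisted sort columns and to pass the row range as parameters. The Python sketch below illustrates that idea under stated assumptions: `paged_query` and both whitelists are hypothetical, and the `?` placeholders assume an ODBC-style driver. It addresses only how the query text is composed and says nothing about whether the cache dependency would then work.

```python
# Hypothetical whitelists; the column names come from the Languages query above.
ALLOWED_ORDER_COLUMNS = {"ID", "Name", "Flag", "IsDefault"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def paged_query(order_by, direction, page, items_per_page):
    """Return (sql, params) for one page; only whitelisted text is interpolated."""
    if order_by not in ALLOWED_ORDER_COLUMNS:
        raise ValueError(f"unsupported sort column: {order_by!r}")
    direction = direction.upper()
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Same row range as the procedure: (page-1)*size+1 through page*size.
    first_row = (page - 1) * items_per_page + 1
    last_row = page * items_per_page
    sql = (
        "SELECT ID, Name, Flag, IsDefault FROM ("
        f"SELECT ROW_NUMBER() OVER (ORDER BY [{order_by}] {direction}) AS Row, "
        "ID, Name, Flag, IsDefault FROM dbo.Languages) results "
        "WHERE Row BETWEEN ? AND ?"
    )
    return sql, (first_row, last_row)


sql, params = paged_query("Name", "asc", page=3, items_per_page=10)
print(sql)
print(params)
```

The page numbers never become part of the SQL text, and the only caller-supplied text that does is checked against a fixed set first.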
