I have an sql query that retrieves about 2500 rows in 1 second. When i put this query in a table valued function it took 20 minutes and only retrieved 450-500 rows and it was still running.
select g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName,g.GoalAmount,s.SalesAmount
from
(
select g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName,sum(g.GoalAmount) GoalAmount
from RucoNetApiBase.dbo.Goals g with(nolock)
where ((#month <> 0 and g.Month_=#month) or (#month=0 and 1=1)) and (g.Year_=year(getdate()))
group by g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName
) g left outer join
(
select s.ClientCode,left(s.ItemCode,5) ItemCode,s.SalesmanCode,sum(s.Amount) SalesAmount
from dbo.V_NetSalesBase s
where ((#month <> 0 and s.Month_=#month) or (#month=0 and 1=1)) and (s.Year_=year(getdate())) and (s.FicheSpeCode='')
group by s.ClientCode,left(s.ItemCode,5),s.SalesmanCode
) s on (g.AccountNumber=s.ClientCode) and (g.ProductNumber=s.ItemCode) and (g.SlsNumber=s.SalesmanCode)
And the function;
ALTER FUNCTION [dbo].[F_GoalSummary]
(
#month int
)
RETURNS TABLE
AS
RETURN
(
select g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName,g.GoalAmount,s.SalesAmount
from
(
select g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName,sum(g.GoalAmount) GoalAmount
from RucoNetApiBase.dbo.Goals g with(nolock)
where ((#month <> 0 and g.Month_=#month) or (#month=0 and 1=1)) and (g.Year_=year(getdate()))
group by g.AccountNumber,g.AccountName,g.ProductNumber,g.ProductName,g.SlsNumber,g.SlsName
) g left outer join
(
select s.ClientCode,left(s.ItemCode,5) ItemCode,s.SalesmanCode,sum(s.Amount) SalesAmount
from dbo.V_NetSalesBase s
where ((#month <> 0 and s.Month_=#month) or (#month=0 and 1=1)) and (s.Year_=year(getdate())) and (s.FicheSpeCode='')
group by s.ClientCode,left(s.ItemCode,5),s.SalesmanCode
) s on (g.AccountNumber=s.ClientCode) and (g.ProductNumber=s.ItemCode) and (g.SlsNumber=s.SalesmanCode)
)
and use of function;
SELECT [t0].[AccountNumber], [t0].[AccountName], [t0].[ProductNumber], [t0].[ProductName], [t0].[SlsNumber], [t0].[SlsName], [t0].[GoalAmount], [t0].[SalesAmount]
FROM [dbo].[F_GoalSummary](5) AS [t0]
ORDER BY [t0].[ProductName]
if a execute query itself in management studio every thing is fine(2500 rows in 1 second) but when i put the query text in a table valued function and use it in a query i get only 450 rows in 20 minutes. It was still running. That function was running fine yesterday.
i am using sql server 2008 r2
Microsoft SQL Server Management Studio 10.50.1600.1
Microsoft Analysis Services Client Tools 10.50.1600.1
Microsoft Data Access Components (MDAC) 6.1.7600.16385
Microsoft MSXML 3.0 6.0
Microsoft Internet Explorer 8.0.7600.16385
Microsoft .NET Framework 2.0.50727.4963
Operating System 6.1.7600
A simple work around can be, not to using functions. but a have more complicated functions that i used clr stored procedures in and can't easily copy paste.
i have no idea what is difference between those two.
please help me!!
Have you tried to put the results of the two queries you try to join into a table variable and join those two variables?
What happens if you don't order the results?
Related
We are facing this strange issue with Hive on HDInsight 4.0 - Hive 3.1.0
By default, Hive is set to handle all tables as transactional.
We have 3 tables which join together:
a
b
c
In initial phase 2 of those tables were partitioned by year/month (b,c).
Now, we have repartitioned them by year/month/day (b,c).
Which generates a number of around 200 partitions for each table(b,c).
Now if we do a select from a join b join c we get a transaction lock error.
However, if I do Select a join b - works fine, if I do Select a join c works fine.
Also, if I restrict in the join clause for one of the newly partitioned tables to scan only one partition like
Select a join b on 1=1 join c on 1=1 and c.YEAR=2019 AND c.MONTH=1
it also works fine.
It seems that the increased number of partitions which the join has to scan, or something like that is preventing Hive to take a read lock on those tables... which is very strange, since the tables do not share anything, except the same database.
Any ideas?
Full error:
java.sql.SQLException: Error while processing statement: FAILED: Error in acquiring locks: Error communicating with the metastore
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
at com.hortonworks.hivestudio.hive.HiveJdbcConnectionDelegate.execute(HiveJdbcConnectionDelegate.java:56)
at com.hortonworks.hivestudio.hive.actor.StatementExecutor.runStatement(StatementExecutor.java:93)
at com.hortonworks.hivestudio.hive.actor.StatementExecutor.handleMessage(StatementExecutor.java:74)
at com.hortonworks.hivestudio.hive.actor.HiveActor.onReceive(HiveActor.java:45)
at akka.actor.UntypedAbstractActor$$anonfun$receive$1.applyOrElse(AbstractActor.scala:243)
at akka.actor.Actor.aroundReceive(Actor.scala:514)
at akka.actor.Actor.aroundReceive$(Actor.scala:512)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:132)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
at akka.actor.ActorCell.invoke(ActorCell.scala:496)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I recently started using SAS, only receiving a basic training that didn't cover proc sql. I'd like to read up a bit more on SAS sql when I have the time.
For now, I found a solution to what I wanted to do, but I'm having difficulties understanding what is happening.
My issue started when I wanted to find out which subjects in my dataset have a certain value for all their records. I made use of my previously written snippet of code that I thought I understood. I just tried adding a couple more variables and group by statements:
data have;
input subject:$1. myvar:1. mycount:1.;
datalines;
a 1 1
a 0 2
a 0 3
b 1 1
b 0 2
b 1 3
c 1 1
c 1 2 /*This subject has myvar = 1 for all its observations*/
;
run;
*find subjects;
proc sql;
create table want as
/* select*/
/* distinct x.subject */
/* from */
(select distinct subject, count(myvar) as myvar_c
from have where myvar = 1 group by subject) x,
(select distinct subject, max(mycount) as max_c
from have group by subject) y
where x.subject = y.subject and x.myvar_c = y.max_c;
quit;
When removing the commented 'select distinct x.subject from' in the create table statement, the above code works as should.
However, I've previously also created another piece of code, to select all subjects in my dataset that have two types of records:
data have2;
input subject:$1. mytype:1.;
datalines;
a 1
a 0
a 0
b 1
b 0
b 1
c 1
c 1 /*This subject doesn't have two types of records in all its observations*/
;
run;
*Find subjects;
proc sql;
create table want2 as select
distinct x.subject from
have2 x,
(select distinct subject, count(distinct mytype) as mytype_c from have2 group by subject) y
where y.mytype_c = 2 and x.subject = y.subject;
quit;
Which is similar, but didn't require the additional select statement. The first code has 3 select statements, the second code only requires two select statements.
Can someone inform me why this is exactly required?
Or link me some good documentation that lists the specifications of these types of joins - can anyone also inform me of the specific name of this type of join where you only use a comma?
while I'm writing, also see that could've used my code I initially wrote to find subjects that have only 1 type of record and tweak it for my current issue >.< but still would like to know what is happening in the first example.
The SQL join construct
FROM ONE, TWO, THREE, …
is known as a CROSS JOIN and is a join without criteria. The comma (,) syntax is less prevalent today and the following construct is recommended
FROM ONE
CROSS JOIN TWO
CROSS JOIN THREE
The result set is a cartesian product and the number of rows is the product of the number of rows in the cross joined tables.
When the query has criteria (WHERE clause) the join is an INNER JOIN.
The SAS documentation for Proc SQL is a good starting point and includes examples.
joined-table Component
Joins a table with itself or with other tables or views.
…
Table of Contents
Syntax
Required Arguments
Optional Argument
Details
Types of Joins
Joining Tables
Table Limit
Specifying the Rows to Be Returned
Table Aliases
Joining a Table with Itself
Inner Joins
Outer Joins
Cross Joins
Union Joins
Natural Joins
Joining More Than Two Tables
Comparison of Joins and Subqueries
General tip:
If you want to fool around (fiddle) with SQL queries in a browser, try visiting
SQL Fiddle web site.
We are currently running a number of hand-crafted and optimized OData queries on Exact Online using Python. This runs on several thousand of divisions. However, I want to migrate them to Invantive SQL for ease of maintenance.
But some of the optimizations like explicit orderby in the OData query are not forwarded to Exact Online by Invantive SQL; they just retrieve all data or the top x and then do an orderby.
Especially for maximum value determination that can be a lot slower.
Simple sample on small table:
https://start.exactonline.nl/api/v1/<<division>>/financial/Journals?$select=BankAccountIBAN,BankAccountDescription&$orderby=BankAccountIBAN desc&$top=5
Is there an alternative to optimize the actual OData queries executed by Invantive SQL?
You can either use the Data Replicator or send the hand-craft OData query through a native platform request, such as:
insert into NativePlatformScalarRequests
( url
, orig_system_group
)
select replace('https://start.exactonline.nl/api/v1/{division}/financial/Journals?$select=BankAccountIBAN,BankAccountDescription&$orderby=BankAccountIBAN desc&$top=5', '{division}', code)
, 'MYSTUFF-' || code
from systempartitions#datadictionary
limit 100 /* First 100 divisions. */
create or replace table exact_online_download_journal_top5#inmemorystorage
as
select jte.*
from ( select npt.result
from NativePlatformScalarRequests npt
where npt.orig_system_group like 'MYSTUFF-%'
and npt.result is not null
) npt
join jsontable
( null
passing npt.result
columns BankAccountDescription varchar2 path 'd[0].BankAccountDescription'
, BankAccountIBAN varchar2 path 'd[0].BankAccountIBAN'
) jte
From here on you can use the in memory table, such as:
select * from exact_online_download_journal_top5#inmemorystorage
But of course you can also 'insert into sqlserver'.
I made a procedure that uses a cursor to generate a report. It is designed to return products with p_qoh > avg(p_qoh).
When i run the cursor on its own outside of the procedure, APEX tells me
PLS-00204: function or pseudo-column 'AVG' may be used inside a SQL
statement only
How is this not inside a SQL statement? Im new to SQL just a required class as a comp sci major.
Heres the whole block. If you run just the cursor you will see what I mean.
CREATE OR REPLACE PROCEDURE prod_rep
IS
CURSOR cur_qoh IS
SELECT p_qoh, p_descript, p_code
FROM xx_product;
TYPE type_prod IS RECORD(
prod_qoh xx_product.p_qoh%TYPE,
prod_code xx_product.p_code%TYPE,
prod_descr xx_product.p_descript%TYPE);
rec_prod type_prod;
BEGIN
OPEN cur_qoh;
LOOP
FETCH cur_qoh INTO rec_prod;
EXIT WHEN cur_qoh%NOTFOUND;
IF rec_prod.prod_qoh > avg(rec_prod.prod_qoh) THEN
DBMS_OUTPUT.PUT_LINE(rec_prod.prod_code||' -> '||rec_prod.prod_desc);
END IF;
END LOOP;
CLOSE cur_qoh;
END;
UPDATE: working block
BEGIN
FOR cur_r IN (SELECT p_qoh, p_descript, p_code FROM xx_product
WHERE p_qoh > (SELECT avg(p_qoh) FROM xx_product))
LOOP
DBMS_OUTPUT.PUT_LINE(cur_r.p_code||' -> '|| cur_r.p_descript);
END LOOP;
END;
Well, yes - this is (PL/)SQL, but you just can't use AVG that way. Have a look: this is an example based on Scott's schema.
Average salary is
SQL> select avg(sal) from emp;
AVG(SAL)
----------
2077,08333
In order to select salaries higher than the average one, you'd use a subquery. Let's call this query the "A" query (for future reference):
SQL> select ename, sal
2 from emp
3 where sal > (select avg(sal) from emp);
ENAME SAL
---------- ----------
JONES 2975
BLAKE 2850
CLARK 2450
KING 5000
FORD 3000
SQL>
Also, no need to declare that much things - a simple cursor FOR loop is easier to maintain:
SQL> create or replace procedure prod_rep as
2 begin
3 for cur_r in (select ename, sal from emp
4 where sal > (select avg(sal) from emp))
5 loop
6 dbms_output.put_line(cur_r.ename ||' '|| to_char(cur_r.sal, '9990'));
7 end loop;
8 end;
9 /
Procedure created.
SQL> begin prod_rep; end;
2 /
JONES 2975
BLAKE 2850
CLARK 2450
KING 5000
FORD 3000
PL/SQL procedure successfully completed.
SQL>
See? No need to declare a cursor (OK, you do use that SELECT in a loop), type & record of that type, open the cursor, worry when to exit a loop, close the cursor.
Your code, however, doesn't make much sense in Apex environment. There's no DBMS_OUTPUT there, and it is rather unusual to create a report (either a classic or interactive one) using a procedure; I've never done that. I've used a function (which is a PL/SQL code) that returns a SQL query and based reports on that.
Your problem is a simple one, so - use the Wizard, create a report and - as its source - use the "A" query (I mentioned earlier). That's all you should do.
At that point you're into PL/SQL, not SQL.
You could use the analytic function AVG into your SQL query to get the average:
CURSOR cur_qoh IS
SELECT p_qoh, p_descript, p_code, AVG(p_goh) OVER () as p_goh_avg
FROM xx_product;
You can add your new field to type_prod:
TYPE type_prod IS RECORD(
prod_qoh xx_product.p_qoh%TYPE,
prod_code xx_product.p_code%TYPE,
prod_descr xx_product.p_descript%TYPE,
prod_avg xx_product.p_qoh%TYPE)
;
And then you can use your average into the LOOP:
IF rec_prod.prod_qoh > rec_prod.prod_avg THEN
It doesn't make sense to use AVG inside a PL/SQL loop. PL/SQL is procedural, at that point you're not dealing with the whole set of data: you manage only one row at a time. Whereas SQL deals with sets of data.
Using the analytic function, you get the average of the column on every row, so you can manage it inside the PL/SQL loop.
I have an MVC web site the presents a paged list of data records from a SQL Server database. The UI allows the user to filter the returned data on a number of different criteria, e.g. email address. Here is a snippet of code:
Stopwatch stopwatch = new Stopwatch();
var temp = SubscriberDB
.GetSubscribers(model.Filter, model.PagingInfo);
// Inspect SQL expression here
stopwatch.Start();
model.Subscribers = temp.ToList();
stopwatch.Stop(); // 9 seconds plus compared to < 1 second in Query Analyzer
When this code is run, the StopWatch shows an execution time of around 9 seconds. If I capture the generated SQL expression (just before it is evaluated with the .ToList() method) and cut'n'paste that as a query into SQL Server Management Studio, the execution times drops to less than 1 second. For reference here is the generated SQL expression:
SELECT [t2].[SubscriberId], [t2].[Email], [t3].[Reference] AS [DataSet], [t4].[Reference] AS [DataSource], [t2].[Created]
FROM (
SELECT [t1].[SubscriberId], [t1].[SubscriberDataSetId], [t1].[SubscriberDataSourceId], [t1].[Email], [t1].[Created], [t1].[ROW_NUMBER]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[Email], [t0].[SubscriberDataSetId]) AS [ROW_NUMBER], [t0].[SubscriberId], [t0].[SubscriberDataSetId], [t0].[SubscriberDataSourceId], [t0].[Email], [t0].[Created]
FROM [dbo].[inbox_Subscriber] AS [t0]
WHERE [t0].[Email] LIKE '%_EMAIL_ADDRESS_%'
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 0 + 1 AND 0 + 20
) AS [t2]
INNER JOIN [dbo].[inbox_SubscriberDataSet] AS [t3] ON [t3].[SubscriberDataSetId] = [t2].[SubscriberDataSetId]
INNER JOIN [dbo].[inbox_SubscriberDataSource] AS [t4] ON [t4].[SubscriberDataSourceId] = [t2].[SubscriberDataSourceId]
ORDER BY [t2].[ROW_NUMBER]
If I remove the email filter clause, then the controller's StopWatch returns a similar response time to the SQL Management Studio query, less than 1 second - so I am assuming that the basic interface to SQL plumbing is working correctly and that the problem lies with the evaluation of the Linq expression. I should also mention that this is quite a large database with upwards of 1M rows in the subscriber table.
Can anyone throw any light on why there should be such a high (x10) performance differential and what, if anything can be done to address this?
Well not sure about that. 1M rows with a full like can take quiet time. Is Email indexed? Can you run the query with Email% instead of %Email% and see what happen?