I have a query that returns two or more results.
I want to pass ALL of the results, one by one, to power another query, so I decided to nest the queries.
=IFERROR(QUERY('Sheet1'!A1:J36,"SELECT J,B,C,D,E,F,G,H WHERE J Contains '" & B11 & "' AND B='" & QUERY('Sheet2'!$A$3:$AR$103,"SELECT D WHERE " & VLOOKUP($B11,'Test Sheet'!$A$33:$B$43,2,FALSE) & "='Yes' AND (AR='High' OR AR='Low') ORDER BY AH desc" ,0) & "' ORDER BY G desc" ,1),"No Results")
The formula as a whole runs successfully; however, only the first result of the inner query is passed to the outer query, so I don't get all the matches I am expecting.
Is there a way of accomplishing this?
I think this is what you're trying to do; see the MK.Help tab, cell A74.
=IFERROR(QUERY('Interventions V0.2'!A1:J39,"SELECT J,B,C,D,E,F,G,H WHERE J Contains '" & B11 & "' AND B matches '" & JOIN("|",QUERY('Biomarker Ref. Sheet (Static)'!$A$3:$AR$103,"SELECT D WHERE " & VLOOKUP($B11,'Test Sheet'!$A$33:$B$43,2,FALSE) & "='Yes' AND (AR='High' OR AR='Low')",0)) & "' ORDER BY G desc" ,1),"No Results")
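The key change is the matches operator: JOIN("|", …) collapses the inner query's whole result column into a single regex alternation (e.g. A|B|C), so the outer query tests B against every inner result at once instead of only against the first one.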
I am running 6 Ignite servers on version 2.7.5. The problem is that when I run queries through my client API, I do not get all the records; only some of them come back. I am using a partitioned cache, and I don't want to switch to replicated mode. When I query with DBeaver, all records are fetched.
The following code is used to fetch the data:
public List<Long> getGroupIdsByUserId(Long createdBy) {
    // Fetch the ids of all active groups created by the given user.
    final String query = "select g.groupId from groups g where g.createdBy = ? and g.isActive = 1";
    SqlFieldsQuery sql = new SqlFieldsQuery(query);
    sql.setArgs(createdBy);

    // Run the query against the groups cache and collect the first column.
    List<List<?>> rsList = groupsCache.query(sql).getAll();
    List<Long> ids = new ArrayList<>();
    for (List<?> l : rsList) {
        ids.add((Long) l.get(0));
    }
    return ids;
}
And the join query, also executed through the client query API on Ignite 2.7.5, is:
final String query = "select distinct u.userId from
groupusers gu "
+ "inner join \"GroupsCache\".groups g on gu.groupId = g.groupId
"
+ "inner join \"OrganizationsCache\".organizations o on
gu.organizationId = o.organizationId "
+ "inner join \"UsersCache\".users u on gu.userId = u.userId
where " + "g.groupId = ? and "
+ "g.isActive = 1 and " + "gu.isActive = 1 and " +
"gu.createdBy
= ? and " + "o.organizationId = ? and "
+ "o.isActive = 1 and " + "u.isActive = 1";
For the join query, the actual number of matching records in the database is 120, but through the Ignite client only 3-4 records come back, and the results are not consistent: sometimes 3 records, sometimes 4. And for the query
select g.groupId from groups g where g.createdBy = ? and g.isActive = 1
the actual count is 27, but I sometimes get 20, sometimes 19, and sometimes the complete set. Please help me with this, and with collocated joins.
Most likely this means that your affinity is incorrect.
Apache Ignite assumes that your data has proper affinity, i.e., when joining two tables, the rows to be joined are always available on the same node. This works when you either join by primary key, or by a part of the primary key that is marked as the affinity column (e.g. with the @AffinityKeyMapped annotation). There's a documentation page about affinity.
You can check this by setting the distributedJoins connection setting to true. If you see all the records after that, it means you need to fix your affinity.
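For example, if groupusers is partitioned only by its own primary key, the groupusers rows for a given group can live on different nodes than the matching groups row, and a non-distributed join silently drops them. A minimal DDL sketch of fixed affinity (column names are adapted from the queries above; the rest is assumed, not your actual schema):

CREATE TABLE groupusers (
  groupUserId    BIGINT,
  groupId        BIGINT,
  userId         BIGINT,
  organizationId BIGINT,
  createdBy      BIGINT,
  isActive       INT,
  -- The affinity column must be part of the primary key.
  PRIMARY KEY (groupUserId, groupId)
) WITH "template=partitioned, affinity_key=groupId";
-- affinity_key=groupId keeps all groupusers rows of one group on the same
-- node as the matching groups row, so a colocated join sees all of them.

For the diagnostic step, the thin JDBC driver accepts the flag directly in the URL, e.g. jdbc:ignite:thin://127.0.0.1?distributedJoins=true.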
I've written a SQL query that uses six tables to construct the output for a separate C# program, and I'm looking for a way to speed up the search.
I walked through the execution plan and noticed that one spot in particular is taking up 85% of the execution time; it is labeled in the code block below with the comment --This spot right here.
select distinct
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
from
tm_invoice_load tl
inner join
tm_client c on tl.client_id = c.client_id
inner join
tm_client_account ta on (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
inner join
provider p on p.id_provider = ta.id_provider
inner join
tm_calendar cal on DATEPART(DAY, d) = DATEPART(DAY, entry_dt)
and DATEPART(MONTH, d) = DATEPART(MONTH, entry_dt)
and DATEPART(YEAR, d) = DATEPART(YEAR, entry_dt)
inner join
tm_invoice_load_type tip on tip.invoice_load_type_id = ta.invoice_load_type_id
where
not exists (select top 1 id
from tm_invoice_load
where billing_year = tl.billing_year
and billing_month = tl.billing_month
and status_id = 1
and (account_num = ta.account_num or account_num = ta.pilot))
and ta.status_id = 1
--and ta.invoice_load_type_id = 2
and tl.status_id = 2
and (ta.pilot is null or ta.account_num <> ta.pilot)
order by
c.client, p.provider_name, ta.account_num, tl.billing_year, tl.billing_month
The hot spot is the join on tm_client_account, which has an account number column plus a pilot column for when an account is a child of another account. When that is the case, the parent account is NOT selected (ta.pilot is null or ta.account_num <> ta.pilot); the child accounts are shown instead.
The query works exactly as intended, but it's kinda slow, and as these tables grow (and they are growing on a nearly exponential curve) it will only get worse.
Is there some way that I can accomplish this join in a faster way? Even small gains would be great!
If it helps, I'm running this on SQL Server 2008 R2.
Here is a screenshot of the execution plan. If needed, I can provide more/different information.
I don't think there is anything really wrong with the query. I tend to keep things like ta.status_id = 1 inside the relevant (ta) JOIN clause rather than in the WHERE, but the query optimizer is smart enough to handle that.
The one thing that I'd suggest to change is this one:
inner join
tm_calendar cal on DATEPART(DAY, d) = DATEPART(DAY, entry_dt)
and DATEPART(MONTH, d) = DATEPART(MONTH, entry_dt)
and DATEPART(YEAR, d) = DATEPART(YEAR, entry_dt)
Try to replace it with this and see what happens:
inner join
tm_calendar cal on Convert(date, d) = Convert(date, entry_dt)
The result should be identical but this way the system can still use the index (and statistics) on the d and/or entry_dt fields. Convert(date,...) is one of the few SARG-able converts there are.
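If the conversion itself still shows up as a cost, a further option is a persisted computed column holding just the date part, which can then be indexed and compared directly. A sketch, assuming entry_dt lives on tm_invoice_load; the new column and index names are made up here:

-- Persist the date part once, then index it, so the join compares
-- plain date columns with no per-row conversion at all.
alter table tm_invoice_load
    add entry_date as convert(date, entry_dt) persisted;
create index ix_invoice_load_entry_date on tm_invoice_load (entry_date);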
Apart from that it all depends on the indexes available, the amount and distribution of the data.
UPDATE: since you mention that the heavy cost (85%) seems to come from the ta join, here are some thoughts.
I've taken the liberty of reformatting the query a bit so it makes more sense to me. Mainly I've grouped the relevant parts together so it's clearer (to me) what's being done on which table.
SELECT DISTINCT
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
FROM tm_invoice_load tl
JOIN tm_client c
ON tl.client_id = c.client_id
JOIN tm_client_account ta
ON (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
AND ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
JOIN provider p
ON p.id_provider = ta.id_provider
JOIN tm_invoice_load_type tip
ON tip.invoice_load_type_id = ta.invoice_load_type_id
JOIN tm_calendar cal
ON Convert(date, cal.d) = Convert(date, tl.entry_dt)
WHERE tl.status_id = 2
AND NOT EXISTS (SELECT *
FROM tm_invoice_load xx
WHERE xx.billing_year = tl.billing_year
AND xx.billing_month = tl.billing_month
AND xx.status_id = 1
AND (xx.account_num = ta.account_num OR xx.account_num = ta.pilot))
ORDER BY c.client, p.provider_name, ta.account_num, tl.billing_year, tl.billing_month
Could you give the query as it is here a try, time it, and check the query plan?
Focusing on tm_client_account, the next logical step would be to add a specific index on this table for this query.
The problem is that the filtered index capabilities of MSSQL are ... well, challenging =)
The closest I get is this:
CREATE INDEX idx_test ON tm_client_account (account_num) INCLUDE (pilot, bill_date, id_provider, invoice_load_type_id) WHERE (status_id = 1)
Could you create the index and see if that helps?
If the query plan shows that it still uses a table scan on tm_client_account, add WITH (INDEX = idx_test) to the JOIN clause and try again.
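That hint would look roughly like this, spliced into the join from the reformatted query above:

JOIN tm_client_account ta WITH (INDEX = idx_test)  -- force idx_test only if the plan keeps scanning
  ON (ta.account_num = tl.account_num OR ta.pilot = tl.account_num)
 AND ta.status_id = 1
 AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)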
If all that doesn't really cut down on the time the query takes, you could try pre-filtering things by means of an indexed view.
WARNING: Keep in mind that adding indexes to a table causes some performance degradation on INSERT/UPDATE/DELETE, and an indexed view on top of it even more so! If the data in this table is (very) volatile, the indexes and/or view might make other parts of your system noticeably slower. Then again, SQL Server is pretty good at this, so the only way to know for sure is by testing.
Step 1: create the view
CREATE VIEW v_test_tm_client_account_filtered
WITH SCHEMABINDING
AS
SELECT id_client_account, -- I am assuming that there is an identity-field like this on the table, adapt as needed!
account_num,
pilot,
bill_date,
id_provider,
invoice_load_type_id
FROM tm_client_account ta
WHERE ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
Step 2: index the view
CREATE UNIQUE CLUSTERED INDEX idx_test ON v_test_tm_client_account_filtered (account_num, pilot, id_client_account)
Step 3: adapt the query to use the view rather than the table directly
JOIN tm_client_account ta
ON (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
AND ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
becomes
JOIN v_test_tm_client_account_filtered ta
ON ta.account_num = tl.account_num
OR ta.pilot = tl.account_num
And then run the query once more...
In case someone comes across this with a similar issue: the only way I've been able to make this any faster has been to tell the query not to lock the data. I don't know why this helped, but it cut the query time to less than half. It has had no effect on the returned data, and since the results are explicitly sorted, any difference in row order is not apparent.
select distinct
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
from
tm_invoice_load tl
with (nolock) --This accelerates the select statement!
inner join
tm_client c on tl.client_id = c.client_id
[...]
Meanwhile a computed column was added, and this query is still down from 7-8 seconds to 2.5-3.5 seconds. Again, I don't know why, but it helped. It might help someone else too.
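A likely reason for the NOLOCK effect: NOLOCK reads without taking shared locks (it is the table-level form of the READ UNCOMMITTED isolation level), so the select no longer waits on concurrent writers, at the risk of dirty reads. The same behavior can be set for the whole session:

-- Session-wide equivalent of per-table NOLOCK hints; carries the
-- same dirty-read caveats, so only use it where that is acceptable.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;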
Thanks in advance for any help with this; it is highly appreciated.
So, basically, I have a Greenplum database and I want to select the table size for the top 10 largest tables. This isn't a problem using the query below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the SQL above these show up as all of their 'child tables', split up into small fragments (though I know they accumulate to make up the two largest tables). Is there a way of writing a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table name explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned table(s) are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friends here are the pg_relation_size() function for getting a relation's size, and the pg_class, pg_namespace, and pg_partitions tables, which you can join together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
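An alternative that avoids gp_toolkit: take non-partitioned tables directly from pg_tables and, for partitioned tables, sum pg_relation_size() over their children from pg_partitions: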
select *
from (
    select schemaname,
           tablename,
           pg_relation_size(schemaname || '.' || tablename) as size_in_bytes
    from pg_tables
    where schemaname || '.' || tablename not in (select schemaname || '.' || partitiontablename from pg_partitions)
      and schemaname || '.' || tablename not in (select distinct schemaname || '.' || tablename from pg_partitions)
    union all
    select schemaname,
           tablename,
           sum(pg_relation_size(schemaname || '.' || partitiontablename)) as size_in_bytes
    from pg_partitions
    group by 1, 2
) as foo
where size_in_bytes >= 0
order by 3 desc;
Can I do a JOIN across two queries with YQL? I have two queries:
select *
from yahoo.finance.historicaldata
where symbol in ('YHOO')
and startDate='" + startDate + "'
and endDate='" + endDate + "'&format=json&diagnostics=true&env=store://datatables.org/alltableswithkeys&callback="
and
select symbol,
Earnings_per_Share,
Dividend_Yield,
week_Low,
Week_High,
Last_Trade_Date,
open,
low,
high,
volume,
Last_Trade
from csv where url="http://download.finance.yahoo.com/d/quotes.csv?s=YHOO,GOOG&f=seyjkd1oghvl1&e=.csv"
and columns="symbol,Earnings_per_Share,Dividend_Yield,Last_Trade_Date,week_Low,Week_High,open,low,high,volume,Last_Trade"
I need to combine these two queries into one. How can I do this?