Crystal Reports External Join

I have 2 data sources that I am querying, then joining in Crystal Reports on a key string with a Left Outer Join. The intent of the report is to identify purchases made that were not processed. The issue is that CR refuses to show the matching right query records.
Data Source 1: Excel worksheet on my local machine containing raw credit card purchases. "Left table"
Data Source 2: 2 subqueries from a hosted Oracle database with a Union join containing processed credit card transactions. "Right table"
Key String: The last 4 digits of a credit card number concatenated with the date-time of the transaction, e.g. "223402-06-2019 04:15:00"
The queries return proper values when executed separately. I have verified that many records returned for the Left table actually do have matching Right table records that are not displayed. I did this using a separate report showing only the Right table query results and manually searching for different key strings.
I'm completely buffaloed and any assistance would be appreciated.
The SQL from Crystal Reports:
I:\Dept\DCS\MPOOL\Fleet Management Data\M5\M5 Automation Data Tables\ComData Transaction Data.xls
SELECT DISTINCT CD.`First Name` AS UNIT_NO,
CD.`HIERARCHY LEVEL3` AS USE_DEPT,
DATEVALUE(MONTH(CD.`Transaction Date`) & "/" & DAY(CD.`Transaction Date`) & "/" & YEAR(CD.`Transaction Date`)) + TIMEVALUE(HOUR(CD.`Transaction Time`) & ":" & MINUTE(CD.`Transaction Time`) & ":" & SECOND(CD.`Transaction Time`)) AS TRANS_DT,
CD.`Odometer` AS ODOMETER,
CD.`Card Number` AS CARD_NO,
RIGHT(CD.`Card Number`, 4) & FORMAT(DATEVALUE(MONTH(CD.`Transaction Date`) & "/" & DAY(CD.`Transaction Date`) & "/" & YEAR(CD.`Transaction Date`)) + TIMEVALUE(HOUR(CD.`Transaction Time`) & ":" & MINUTE(CD.`Transaction Time`) & ":" & SECOND(CD.`Transaction Time`)), "mm-dd-yyyy hh:mm:ss") AS KEYSTRING
FROM `Sheet1$` CD
WHERE ISDATE(CD.`Transaction Date`) AND CD.`Transaction Date` >= FORMAT('02/01/2019', 'mm-dd-yyyy') AND CD.`Transaction Date` <= FORMAT('02/15/2019', 'mm-dd-yyyy')
EXTERNAL JOIN Command.KEYSTRING={?m5oksr: Command_1.KEYSTRING}
m5oksr
SELECT DISTINCT TCC.UNIT_NO,
VUDC.USING_DEPT_NO AS USE_DEPT,
TCC.ISSUE_DT + 2/24 AS TRANS_DT,
TCC.NEW_METER AS ODOMETER,
'COMP' AS STATUS,
TCC.CARD_NO AS CARD_NO,
SUBSTR(TCC.CARD_NO, 16, 4) || TO_CHAR(TCC.ISSUE_DT + 2/24, 'MM-DD-YYYY HH24:MI:SS') AS KEYSTRING
FROM MFIVE.VIEW_TRIPCARD_COMPLETED_TRANS TCC
LEFT OUTER JOIN VIEW_UNIT_DEPT_COMP VUDC ON TCC.COMPANY = VUDC.COMPANY and TCC.UNIT_NO = VUDC.UNIT_NO
WHERE TCC.ISSUE_DT + 2/24 >= TO_DATE('02/01/2019 00:00:00', 'MM/DD/YYYY HH24:MI:SS') AND TCC.ISSUE_DT + 2/24 <= TO_DATE('02/15/2019 11:59:59', 'MM/DD/YYYY HH24:MI:SS')
UNION
SELECT DISTINCT IR.FIELD2 as UNIT_NO,
VUDC.USING_DEPT_NO AS USE_DEPT,
TO_DATE(IR.FIELD1, 'MM/DD/YYYY HH24:MI:SS') + 2/24 AS TRANS_DT,
IR.METER as ODOMETER,
'FAIL' AS STATUS,
NVL2(IR.FIELD27, CONCAT('XXXX-XXXX-XXXX-', SUBSTR(IR.FIELD27,-4)),'') as CARD_NO,
SUBSTR(NVL2(IR.FIELD27, CONCAT('XXXX-XXXX-XXXX-', SUBSTR(IR.FIELD27,-4)),''), 16, 4) || TO_CHAR(TO_DATE(IR.FIELD1, 'MM/DD/YYYY HH24:MI:SS') + 2/24, 'MM-DD-YYYY HH24:MI:SS') AS KEYSTRING
FROM INTERFACE_REJECT IR
INNER JOIN INTERFACE_STAT ST ON IR.COMPANY = ST.COMPANY and IR.STAT_ID = ST.STAT_ID
LEFT OUTER JOIN EMP_MAIN E ON IR.COMPANY = E.COMPANY AND IR.FIELD29 = E.TRIPCARD_PIN
LEFT OUTER JOIN VIEW_UNIT_DEPT_COMP VUDC ON IR.COMPANY = VUDC.COMPANY and IR.FIELD2 = VUDC.UNIT_NO
WHERE LENGTH(IR.FIELD1) = 19 AND ST.INTERFACE_NAME = 'M5-TRIP-CARD-INTF' AND TO_DATE(IR.FIELD1, 'MM/DD/YYYY HH24:MI:SS') + 2/24 >=TO_DATE('02/01/2019 00:00:00', 'MM/DD/YYYY HH24:MI:SS') AND TO_DATE(IR.FIELD1, 'MM/DD/YYYY HH24:MI:SS') + 2/24 <= TO_DATE('02/15/2019 11:59:59', 'MM/DD/YYYY HH24:MI:SS')
EXTERNAL JOIN Command_1.KEYSTRING={?I:\Dept\DCS\MPOOL\Fleet Management Data\M5\M5 Automation Data Tables\ComData Transaction Data.xls: Command.KEYSTRING}

Are you sure the join works? If the join doesn't work you will get nulls, and my guess is that this is what is happening. Try using an INNER JOIN instead of the LEFT join and check whether any rows are returned. If records are returned, you may need to cast the values to the same type and trim them. It is possible that the value returned by Excel has stray spaces or a different data type, which Crystal converts incorrectly.
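To see how an invisible whitespace or type difference defeats a string-key join, here is a small Python sketch; the key values are made up for illustration:

```python
# Hypothetical keys: Excel text cells often carry stray padding, and a
# numeric card column may render as "2234.0" instead of "2234", so the
# concatenated keys differ even when the records match.
left_key = "2234" + "02-06-2019 04:15:00 "   # trailing space from Excel
right_key = "2234" + "02-06-2019 04:15:00"   # Oracle side

print(left_key == right_key)  # False -> no join match in Crystal

# Trimming/normalizing whitespace on both sides restores the match.
normalize = lambda s: " ".join(s.split())
print(normalize(left_key) == normalize(right_key))  # True
```

In Crystal the equivalent fix is to trim and cast both KEYSTRING expressions the same way on each side before linking them.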

Related

Using a query with multiple results in a subsequent query - Google Sheets

I have a query that returns 2 or more results.
I want to be able to pass ALL the results, one-by-one to power another query. I decided to embed the queries.
=IFERROR(QUERY('Sheet1'!A1:J36,"SELECT J,B,C,D,E,F,G,H WHERE J Contains '" & B11 & "' AND B='" & QUERY('Sheet2'!$A$3:$AR$103,"SELECT D WHERE " & VLOOKUP($B11,'Test Sheet'!$A$33:$B$43,2,FALSE) & "='Yes' AND (AR='High' OR AR='Low') ORDER BY AH desc" ,0) & "' ORDER BY G desc" ,1),"No Results")
The query as a whole runs successfully, however only the first result of the initial query is passed to the outer query. This means I don't get all the matches I am expecting.
Is there a way of accomplishing this?
I think this is what you're trying to do, on the MK.Help tab in cell A74.
=IFERROR(QUERY('Interventions V0.2'!A1:J39,"SELECT J,B,C,D,E,F,G,H WHERE J Contains '" & B11 & "' AND B matches'" & JOIN("|",QUERY('Biomarker Ref. Sheet (Static)'!$A$3:$AR$103,"SELECT D WHERE " & VLOOKUP($B11,'Test Sheet'!$A$33:$B$43,2,FALSE) & "='Yes' AND (AR='High' OR AR='Low')",0)) & "' ORDER BY G desc" ,1),"No Results")
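The key change is `B matches '...'` fed by JOIN("|", ...): the inner query's whole result column is collapsed into one regex alternation instead of only its first cell. A rough Python sketch of the same idea (the biomarker values are invented):

```python
import re

# The inner QUERY can return several values (invented here);
# JOIN("|", ...) collapses them into one regex alternation, and
# `matches` then tests column B against the whole pattern.
inner_results = ["CRP", "HbA1c", "LDL"]
pattern = "|".join(map(re.escape, inner_results))  # "CRP|HbA1c|LDL"

column_b = ["CRP", "TSH", "LDL"]
matched = [v for v in column_b if re.fullmatch(pattern, v)]
print(matched)  # ['CRP', 'LDL']
```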

Getting incomplete data when running 6 ignite servers

I am running 6 Ignite servers on version 2.7.5. The problem is that when I hit queries through my client API, I do not get all the records; only some come back. I am using a partitioned cache and I don't want to use replicated mode. When queried with DBeaver, all records are fetched.
The following code is used to fetch the data:
public List<Long> getGroupIdsByUserId(Long createdBy) {
    final String query = "select g.groupId from groups g where g.createdBy = ? and g.isActive = 1";
    SqlFieldsQuery sql = new SqlFieldsQuery(query);
    sql.setArgs(createdBy);
    List<List<?>> rsList = groupsCache.query(sql).getAll();
    List<Long> ids = new ArrayList<>();
    for (List<?> l : rsList) {
        ids.add((Long) l.get(0));
    }
    return ids;
}
And the join query is:
final String query = "select distinct u.userId from groupusers gu "
    + "inner join \"GroupsCache\".groups g on gu.groupId = g.groupId "
    + "inner join \"OrganizationsCache\".organizations o on gu.organizationId = o.organizationId "
    + "inner join \"UsersCache\".users u on gu.userId = u.userId where "
    + "g.groupId = ? and g.isActive = 1 and gu.isActive = 1 and gu.createdBy = ? and "
    + "o.organizationId = ? and o.isActive = 1 and u.isActive = 1";
For the join query, the actual number of records in the DB is 120, but with the Ignite client only 3-4 records come back, and they are not consistent: sometimes it is 3 records, sometimes 4. And for the query
select g.groupId from groups g where g.createdBy = ? and g.isActive = 1
the actual count is 27, but I get sometimes 20, sometimes 19, and sometimes the complete set. Please help me with this and with collocated joins.
Most likely this means that your affinity is incorrect.
Apache Ignite assumes that your data has proper affinity, i.e. when joining two tables, the rows to join will always be available on the same node. This works when you either join by primary key, or by a part of the primary key which is marked as the affinity column (e.g. by the @AffinityKeyMapped annotation). There's a documentation page about affinity.
You can check this by setting the distributedJoins connection setting to true. If you see all the records after that, it means you need to fix your affinity.
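A toy sketch of the failure mode, with made-up node counts and keys: if the two caches are partitioned on different columns, a node-local join only sees part of each table.

```python
NODES = 3
node_of = lambda key: key % NODES  # stand-in for Ignite's affinity function

groups = list(range(10))                      # groups cache, keyed by groupId
groupusers = [(u, 9 - u) for u in range(10)]  # groupusers cache, keyed by userId

# Node-local (colocated) join: each node can only match against the
# group rows that happen to live on the same node.
local = [(u, g) for u, g in groupusers
         if g in {x for x in groups if node_of(x) == node_of(u)}]

# Full join, i.e. what DBeaver or a distributed join computes.
full = [(u, g) for u, g in groupusers if g in set(groups)]

print(len(full), len(local))  # 10 4 -> the local join silently drops rows
```

With distributedJoins=true Ignite ships the missing partitions across the network, which is why the counts become correct at extra cost; colocating the data by affinity key avoids that cost entirely.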

How do I optimize this SQL query comparing a value in two possible columns?

I've written a SQL query that uses six tables to construct the output for a separate C# program, and I'm looking for a way to speed up the search.
I walked through the execution plan and I notice that one spot in particular is taking up 85% of the execution time, labeled with a comment in the code block below with --This spot right here.
select distinct
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
from
tm_invoice_load tl
inner join
tm_client c on tl.client_id = c.client_id
inner join
tm_client_account ta on (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
inner join
provider p on p.id_provider = ta.id_provider
inner join
tm_calendar cal on DATEPART(DAY, d) = DATEPART(DAY, entry_dt)
and DATEPART(MONTH, d) = DATEPART(MONTH, entry_dt)
and DATEPART(YEAR, d) = DATEPART(YEAR, entry_dt)
inner join
tm_invoice_load_type tip on tip.invoice_load_type_id = ta.invoice_load_type_id
where
not exists (select top 1 id
from tm_invoice_load
where billing_year = tl.billing_year
and billing_month = tl.billing_month
and status_id = 1
and (account_num = ta.account_num or account_num = ta.pilot))
and ta.status_id = 1
--and ta.invoice_load_type_id = 2
and tl.status_id = 2
and (ta.pilot is null or ta.account_num <> ta.pilot)
order by
c.client, p.provider_name, ta.account_num, tl.billing_year, tl.billing_month
The hot spot is the join on tm_client_account, which has an account number column plus a pilot column in case the account is a child of another account. When that happens, the parent account is NOT selected (ta.pilot is null or ta.account_num <> ta.pilot); the child accounts are shown instead.
The query works exactly as intended, but it's kinda slow, and as these tables grow (and they are doing so on a nearly exponential curve) it will only get worse.
Is there some way that I can accomplish this join in a faster way? Even small gains would be great!
If it helps, I'm running this on SQL Server 2008 R2.
Here is a screenshot of the execution plan. If needed, I can provide more/different information.
I don't think there is anything really wrong with the query. I tend to keep things like ta.status_id = 1 inside the relevant (ta) JOIN clause rather than in the WHERE, but the query optimizer is smart enough to handle that.
The one thing that I'd suggest to change is this one:
inner join
tm_calendar cal on DATEPART(DAY, d) = DATEPART(DAY, entry_dt)
and DATEPART(MONTH, d) = DATEPART(MONTH, entry_dt)
and DATEPART(YEAR, d) = DATEPART(YEAR, entry_dt)
Try to replace it with this and see what happens:
inner join
tm_calendar cal on Convert(date, d) = Convert(date, entry_dt)
The result should be identical but this way the system can still use the index (and statistics) on the d and/or entry_dt fields. Convert(date,...) is one of the few SARG-able converts there are.
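A quick sanity check, in Python with made-up timestamps, that the single cast really is the same test as the three DATEPART comparisons:

```python
from datetime import datetime

d = datetime(2019, 2, 6, 4, 15)          # cal.d (made up)
entry_dt = datetime(2019, 2, 6, 23, 59)  # tl.entry_dt, same day, later time

# The three DATEPART comparisons from the original join condition...
triple = (d.day == entry_dt.day and d.month == entry_dt.month
          and d.year == entry_dt.year)

# ...reduce to a single truncate-to-date comparison.
single = d.date() == entry_dt.date()

print(triple, single)  # True True
```

The difference for SQL Server is not the result but the shape of the predicate: one cast on each column keeps it SARGable, while three function calls per column do not.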
Apart from that it all depends on the indexes available, the amount and distribution of the data.
UPDATE: as you mention that the heavy cost (85%) seems to come from the ta part, here are some thoughts.
I've taken the liberty of reformatting the query a bit so it makes more sense to me. Mainly I've grouped the relevant parts together so it's more clear (to me) what's being done on what table.
SELECT DISTINCT
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
FROM tm_invoice_load tl
JOIN tm_client c
ON tl.client_id = c.client_id
JOIN tm_client_account ta
ON (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
AND ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
JOIN provider p
ON p.id_provider = ta.id_provider
JOIN tm_invoice_load_type tip
ON tip.invoice_load_type_id = ta.invoice_load_type_id
JOIN tm_calendar cal
ON Convert(date, cal.d) = Convert(date, tl.entry_dt)
WHERE tl.status_id = 2
AND NOT EXISTS (SELECT *
FROM tm_invoice_load xx
WHERE xx.billing_year = tl.billing_year
AND xx.billing_month = tl.billing_month
AND xx.status_id = 1
AND (xx.account_num = ta.account_num OR xx.account_num = ta.pilot))
ORDER BY c.client, p.provider_name, ta.account_num, tl.billing_year, tl.billing_month
Could you give the query as it is here a try and time it + check the query plan?
Focussing on tm_client_account the next logical step would be to add a specific index on this table for this query.
The problem is that the filtered index capabilities of MSSQL are ... well, challenging =)
The closest I get is this:
CREATE INDEX idx_test ON tm_client_account (account_num) INCLUDE (pilot, bill_date, id_provider, invoice_load_type_id) WHERE (status_id = 1)
Could you create the index and see if that helps?
If the query plan shows that it still uses a table scan on tm_client_account, then add WITH (INDEX = idx_test) to the JOIN clause and try again.
If all that doesn't really cut down on the time the query takes you could try pre-filtering things by means of an indexed view.
WARNING: Keep in mind though that adding indexes to the table will cause some performance degradation when doing INSERT/UPDATE/DELETE on the table; adding an indexed view on it doubly so! If the data in this table is (very) volatile, the indexes and/or view might make other parts of your system noticeably slower. Then again, SQL Server is pretty good at this, so the only way to know for sure is by testing.
Step 1: create the view
CREATE VIEW v_test_tm_client_account_filtered
WITH SCHEMABINDING
AS
SELECT id_client_account, -- I am assuming that there is an identity-field like this on the table, adapt as needed!
account_num,
pilot,
bill_date,
id_provider,
invoice_load_type_id
FROM tm_client_account ta
WHERE ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
Step 2: index the view
CREATE UNIQUE CLUSTERED INDEX idx_test ON v_test_tm_client_account_filtered (account_num, pilot, id_client_account)
Step 3: adapt the query to use the view rather than the table directly
JOIN tm_client_account ta
ON (ta.account_num = tl.account_num or ta.pilot = tl.account_num) --This spot right here
AND ta.status_id = 1
AND (ta.pilot IS NULL OR ta.account_num <> ta.pilot)
--AND ta.invoice_load_type_id = 2
becomes
JOIN v_test_tm_client_account_filtered ta
ON ta.account_num = tl.account_num
OR ta.pilot = tl.account_num
And then run the query once more...
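SQL Server's filtered index is what other engines call a partial index. As a rough, engine-agnostic illustration of why the filter matters, here is the same idea in SQLite (which has partial indexes but no INCLUDE columns; table and column names follow the ones above): the planner can only use the index when the query's predicate implies the index's WHERE clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tm_client_account "
            "(account_num TEXT, pilot TEXT, status_id INT)")
# A partial ("filtered") index: only rows with status_id = 1 are indexed.
con.execute("CREATE INDEX idx_test ON tm_client_account (account_num) "
            "WHERE status_id = 1")

# The predicate implies the index's WHERE clause, so the index is usable.
hit = con.execute("EXPLAIN QUERY PLAN SELECT * FROM tm_client_account "
                  "WHERE status_id = 1 AND account_num = 'A1'").fetchall()
# Without the status_id filter the partial index cannot be used.
miss = con.execute("EXPLAIN QUERY PLAN SELECT * FROM tm_client_account "
                   "WHERE account_num = 'A1'").fetchall()

print(hit[0][3])   # plan detail mentioning idx_test
print(miss[0][3])  # full table scan
```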
In case someone comes across this with a similar issue: the only way I've been able to make this any faster has been to tell the query not to lock the data. I don't know why this helped, but it cut the query time by more than half. It has had no effect on the returned data, and since the output is explicitly sorted, any difference isn't apparent.
select distinct
ta.account_num as 'Account',
tl.billing_year as 'Year',
tl.billing_month as 'Month',
ta.bill_date as 'Bill Date',
DATEDIFF(DD, cast(cast(tl.billing_year as varchar(4)) + right('0' + cast(tl.billing_month as varchar(2)), 2) + right('0' + (case when billing_month in (4,6,9,11) and bill_date > 30 then '30' when billing_month = 2 and bill_date > 28 then '28' else cast(bill_date as varchar(2)) end), 2) as datetime), GETDATE()) as 'Past',
DATEADD(Day,10,d) as 'To be Loaded Before',
p.provider_name as 'Provider',
c.client as 'Client',
tip.invoice_load_type as 'Load Type'
from
tm_invoice_load tl
with (nolock) --This accelerates the select statement!
inner join
tm_client c on tl.client_id = c.client_id
[...]
Meanwhile a computed column was added, and this query is still down from 7-8 seconds to 2.5-3.5 seconds. Again, I don't know why, but it helped. It might help someone else too.

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this, it is highly appreciated.
So, basically, I have a Greenplum database and I am wanting to select the table size for the top 10 largest tables. This isn't a problem using the below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the SQL above these show up as all their 'child tables' split into small fragments (though I know they accumulate to make the 2 largest tables). Is there a way of making a script that selects tables (partitioned or otherwise) and their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table name specifically, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned table(s) are up there), and I cannot specify any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friends here are the pg_relation_size() function for getting a relation's size, and the pg_class, pg_namespace and pg_partitions catalogs, joined together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
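The roll-up this query performs can be pictured in a few lines of Python (table names and sizes are invented): every partition reports under its parent table's name via the coalesce, and the sizes are then summed per table.

```python
from collections import defaultdict

# Each partition row carries its parent table's name; standalone tables
# carry their own. coalesce(parent, relname) picks the reporting name,
# then sizes are summed per table. Names and sizes are invented.
relations = [
    # (relname, parent_table_or_None, size_mb)
    ("sales_1_prt_jan", "sales", 120.0),
    ("sales_1_prt_feb", "sales", 95.5),
    ("customers", None, 40.0),
]

totals = defaultdict(float)
for relname, parent, size_mb in relations:
    totals[parent or relname] += size_mb  # the coalesce

top = sorted(totals.items(), key=lambda kv: -kv[1])
print(top)  # [('sales', 215.5), ('customers', 40.0)]
```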
select * from
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;

Convert two queries to one in YQL

Can I do a JOIN between two queries in YQL? I have two queries:
select *
from yahoo.finance.historicaldata
where symbol in ('YHOO')
and startDate='" + startDate + "'
and endDate='" + endDate + "'&format=json&diagnostics=true&env=store://datatables.org/alltableswithkeys&callback="
and
select symbol,
Earnings_per_Share,
Dividend_Yield,
week_Low,
Week_High,
Last_Trade_Date,
open,
low,
high,
volume,
Last_Trade
from csv where url="http://download.finance.yahoo.com/d/quotes.csv?s=YHOO,GOOG&f=seyjkd1oghvl1&e=.csv"
and columns="symbol,Earnings_per_Share,Dividend_Yield,Last_Trade_Date,week_Low,Week_High,open,low,high,volume,Last_Trade"
I need to combine these two queries into one. How can I do this?
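YQL itself has no JOIN across two tables, so one common workaround is to run both queries and merge the rows client-side on the shared symbol key. A sketch in Python, where the row shapes are assumptions about what the two result sets above return:

```python
# Hypothetical shapes of the two result sets; field names follow the
# queries above, but the actual YQL response envelope may differ.
historical = [{"Symbol": "YHOO", "Date": "2013-01-02", "Close": "20.08"}]
quotes = [
    {"symbol": "YHOO", "Earnings_per_Share": "3.28", "Dividend_Yield": "N/A"},
    {"symbol": "GOOG", "Earnings_per_Share": "38.13", "Dividend_Yield": "N/A"},
]

# Index the quote rows by symbol, then widen each historical row with
# the matching quote fields (a hash join, in effect).
by_symbol = {q["symbol"]: q for q in quotes}
merged = [{**row, **by_symbol.get(row["Symbol"], {})} for row in historical]

print(merged[0]["Close"], merged[0]["Earnings_per_Share"])  # 20.08 3.28
```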