Resulting KSQL join stream shows no data - ksqldb

I am joining a KSQL stream and a KSQL table. Both are keyed on the same field.
But no data is coming to the resulting stream.
create stream kz_yp_loan_join_by_bandid WITH (KAFKA_TOPIC='kz_yp_loan_join_by_bandid',VALUE_FORMAT='AVRO') AS
select ypl.loan_id, ypl.userid ,ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
No data is in stream kz_yp_loan_join_by_bandid
But if I do simply :
select ypl.loan_id, ypl.userid ,ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
There is data present.
This suggests that the stream is not being written to, but why?
I have tried doing entire setup again.

A few things to check:
If you want to process all the existing data as well as new data, make sure that before you run your CREATE STREAM … AS SELECT ("CSAS") you have run SET 'auto.offset.reset' = 'earliest';
If the join is returning data when run outside of the CSAS then this may not be relevant, but it is always good to check that your join meets all of the join requirements
Check the KSQL server log in case there's an issue with writing to the target topic, creating the schema on the Schema Registry, etc.
These references will be useful:
https://www.confluent.io/blog/troubleshooting-ksql-part-1
https://www.confluent.io/blog/troubleshooting-ksql-part-2

Related

TFDQuery failing to update?

I'm having a problem with a synchronisation issue... I have a source table (mtAllowanceCategory) which I want to update to a copy (qryAllowanceCategory) of it. To make sure records in the copy are deleted if they are no longer present in the source, the copy has a "StillHere" boolean field, which is set to on when the record is added or updated and otherwise stays off. Afterwards, all records with StillHere=false are deleted.
That's the idea, anyway... in practice, the flag field isn't turned on when posting updates. When I trace the code, the statement is executed; when I look in Access, it stays off. Hence the delete SQL afterwards clears the entire table.
Been trying to figure this for hours now; what am I missing??
mtAllowanceCategory:TFDMemTable (filled from an API call, this works fine)
qryAllowanceCategory:TFDQuery
conn:TFDConnection to a local Access database (also used for qryAllowanceCategory)
conn.ExecSQL('UPDATE AllowanceCategory SET StillHere=false;');
while not mtAllowanceCategory.eof do
begin
if qryAllowanceCategory.locate('WLPid',mtAllowanceCategory.FieldByName('Id').AsString,[loCaseInsensitive]) then
begin
Updating:=true;
qryAllowanceCategory.Edit;
end
else
begin
Updating:=false;
qryAllowanceCategory.Insert;
end;
qryAllowanceCategory.fieldbyname('createdBy').AsString:=mtAllowanceCategory.FieldByName('createdBy').AsString;
qryAllowanceCategory.fieldbyname('createdOn').AsString:=mtAllowanceCategory.FieldByName('createdOn').AsString;
qryAllowanceCategory.fieldbyname('description').AsString:=mtAllowanceCategory.FieldByName('description').AsString;
qryAllowanceCategory.fieldbyname('WLPid').AsString:=mtAllowanceCategory.FieldByName('id').AsString;
qryAllowanceCategory.fieldbyname('isDeleted').Asboolean:=mtAllowanceCategory.FieldByName('isDeleted').Asboolean;
qryAllowanceCategory.fieldbyname('isInUse').Asboolean:=mtAllowanceCategory.FieldByName('isInUse').Asboolean;
qryAllowanceCategory.fieldbyname('modifiedBy').AsString:=mtAllowanceCategory.FieldByName('modifiedBy').AsString;
qryAllowanceCategory.fieldbyname('modifiedOn').AsString:=mtAllowanceCategory.FieldByName('modifiedOn').AsString;
qryAllowanceCategory.fieldbyname('WLPname').AsString:=mtAllowanceCategory.FieldByName('name').AsString;
qryAllowanceCategory.fieldbyname('number').AsInteger:=mtAllowanceCategory.FieldByName('number').AsInteger;
qryAllowanceCategory.fieldbyname('percentage').AsFloat:=mtAllowanceCategory.FieldByName('number').AsFloat;
qryAllowanceCategory.fieldbyname('remark').AsString:=mtAllowanceCategory.FieldByName('remark').AsString;
qryAllowanceCategory.fieldbyname('LocalEdited').AsBoolean:=false;
qryAllowanceCategory.fieldbyname('LocalInserted').AsBoolean:=false;
qryAllowanceCategory.fieldbyname('LocalDeleted').AsBoolean:=false;
qryAllowanceCategory.fieldbyname('StillHere').AsBoolean:=true;
qryAllowanceCategory.Post;
mtAllowanceCategory.next;
end;
conn.commit;
conn.ExecSQL('DELETE FROM AllowanceCategory WHERE StillHere=false;');
When I read your question, I was struck by two thoughts. One was that I couldn't immediately see the cause of your problem, and the other was that you could probably avoid the problem anyway if you used SQL rather than table traversals in code.
It seemed to me that you might be able to do most, if not all, of what you need to synchronise the two tables using Access SQL rather than traversing the qryAllowanceCategory table with a while not EOF loop.
(btw, in the following I'm going to use 'mtAC' and qryAC to reduce typing & typos)
Using Access SQL
Initially, I did not have much luck, as Access rejected my attempts to
refer to both tables in an Update statement against the qryAC one using a Join
or Outer Join, but then I came across a reference that showed that Access does
support an Inner Join syntax. These SQL statements execute successfully by calling
ExecSQL on the FireDAC connection to the database:
update qryAC set qryAC.StillHere = True
where exists(select mtAC.* from mtAC inner join qryAC on mtAC.WLPid = qryAC.WLPid)
and
update qryAC inner join mtAC on mtAC.WLPid = qryAC.WLPid set qryAC.AValue = mtAC.AValue
The first of these obviously provides a way to update the StillHere field to set it to True,
or False with a trivial modification.
The second shows a way to update a set of fields in qryAC from the matching rows in mtAC
and this could, of course, be limited to a subset of rows with a suitable Where clause.
Access Sql also supports checking whether a row in one table exists in the other, as in
select * from qryAC q where exists (select * from mtac m where q.wlpid = m.wlpid)
and for deleting rows in one table which do not exist in the other
delete from qryAC q where not exists (select * from mtac m where q.wlpid = m.wlpid)
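The set-based synchronisation described above can be tried out without Access. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in database, so the syntax differs slightly from Access SQL: it uses correlated subqueries instead of an UPDATE with a join, and 1/0 instead of True/False. The table and column names follow the answer; the sample rows are made up.

```python
import sqlite3

# SQLite stands in for Access here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mtAC  (WLPid TEXT PRIMARY KEY, AValue TEXT);
    CREATE TABLE qryAC (WLPid TEXT PRIMARY KEY, AValue TEXT, StillHere INTEGER);
    INSERT INTO mtAC  VALUES ('a', 'new-a'), ('b', 'new-b');
    INSERT INTO qryAC VALUES ('a', 'old-a', 0), ('b', 'old-b', 0), ('c', 'old-c', 0);
""")

# Copy values and set the flag only for rows that still exist in the source.
conn.execute("""
    UPDATE qryAC
       SET AValue = (SELECT m.AValue FROM mtAC m WHERE m.WLPid = qryAC.WLPid),
           StillHere = 1
     WHERE EXISTS (SELECT 1 FROM mtAC m WHERE m.WLPid = qryAC.WLPid)
""")

# Remove rows that no longer exist in the source.
conn.execute("""
    DELETE FROM qryAC
     WHERE NOT EXISTS (SELECT 1 FROM mtAC m WHERE m.WLPid = qryAC.WLPid)
""")
conn.commit()

rows = conn.execute(
    "SELECT WLPid, AValue, StillHere FROM qryAC ORDER BY WLPid"
).fetchall()
print(rows)
```

Because each statement runs entirely inside the database, there is no client-side copy of the rows that could go stale between statements.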
Using FireDAC's LocalSQL
I also mentioned LocalSQL in a comment. This supports a far broader range
of SQL statements than native Access SQL and can operate on any TDataSet descendant,
so if you find something that Access SQL syntax doesn't support, it is worth considering
using LocalSQL instead. Its main downside is that it operates on the datasets using
traversals, so it is not quite as "instant" as native SQL. It can be a bit tricky to set up,
so here are the settings from the DFM which show how the components need connecting up. You would use it by feeding the SQL you want to run to FDQuery1.
object AccessConnection: TFDConnection
Params.Strings = (
'Database=D:\Delphi\Code\FireDAC\LocalSQL\Allowance.accdb'
'DriverID=MSAcc')
Connected = True
LoginPrompt = False
end
object mtAC: TFDQuery
AfterOpen = mtACAfterOpen
Connection = AccessConnection
SQL.Strings = (
'select * from mtAC')
end
object qryAC: TFDQuery
Connection = AccessConnection
end
object LocalSqlConnection: TFDConnection
Params.Strings = (
'DriverID=SQLite')
Connected = True
LoginPrompt = False
end
object FDLocalSQL1: TFDLocalSQL
Connection = LocalSqlConnection
DataSets = <
item
DataSet = mtAC
end
item
DataSet = qryAC
end>
end
object FDGUIxWaitCursor1: TFDGUIxWaitCursor
Provider = 'Forms'
end
object FDPhysSQLiteDriverLink1: TFDPhysSQLiteDriverLink
end
object FDQuery1: TFDQuery
Connection = LocalSqlConnection
end
If anyone is interested:
The problem was in not refreshing qryAllowanceCategory after the initial SQL setting StillHere to false. The in-memory copy of the record (in qryAllowanceCategory) didn't get that update, so as far as the query was concerned the flag was still on; after the field updates it appeared there were no changes (all the other fields were unchanged as well), so the post was ignored. In the actual table the flag was off though, so the final delete SQL removed the record.
The problem was solved by adding a refresh after the first UPDATE SQL statement.

Can you aggregate denormalized parse-server query results in one statement using Swift?

My experience is with SQL but I am working on learning parse server data management and in the example below I demonstrate how I would use SQL to represent the data I currently have stored in my parse server classes. I am trying to present all the users, the count of how many images they have uploaded, and a count of how many images they have liked for an app where users can upload images and they can also scroll through and like other people's images. I store the id of the user who uploads the image on the image table and I store an array column in the image table of all the ids that have liked it.
Using SQL I would have normalized this into 3 tables (user, image, user_x_image), joined the tables, and then aggregated that result. But I am trying to learn the right way to do this using parse server, where my understanding is that the best practice is to structure the data the way I have below. What I want to do is produce a "leader board" that presents which users have uploaded the most images or liked the most images to inspire engagement. Even just links to examples of how to join/aggregate parse data sets would be very helpful. If I wasn't clear about what I am trying to achieve, please let me know in the comments and I will add updates.
-- SQL approximation of data structured in parse
create volatile table users
( user_id char(10)
, user_name char(50)
) on commit preserve rows;
insert into users values('1a','Tom');
insert into users values('2b','Dick');
insert into users values('3c','Harry');
insert into users values('4d','Simon');
insert into users values('5e','Garfunkel');
insert into users values('6f','Jerry');
create volatile table images
( image_id char(10)
, user_id_owner char(10) -- The object Id for the parse user that uploaded
, UsersWhoLiked varchar(100) -- in Parse class this is array of user ids that clicked like
) on commit preserve rows;
insert into images values('img01','1a','["4d","5e"]');
insert into images values('img02','6f','["1a","2b","3c"]');
insert into images values('img03','6f','["1a","6f"]');
-----------------------------
-- DESIRED RESULTS
-- Tom has 1 uploads and 2 likes
-- Dick has 0 uploads and 1 likes
-- Harry has 0 uploads and 1 likes
-- Simon has 0 uploads and 1 likes
-- Garfunkel has 0 uploads and 1 likes
-- Jerry has 2 uploads and 1 likes
-- How to do with normalized data structure
create volatile table user_x_image
( user_id char(10)
, image_id char(10)
, relationship char(10)
) on commit preserve rows;
insert into user_x_image values('4d','img01','liker');
insert into user_x_image values('5e','img01','liker');
insert into user_x_image values('1a','img02','liker');
insert into user_x_image values('2b','img02','liker');
insert into user_x_image values('3c','img02','liker');
insert into user_x_image values('1a','img03','liker');
insert into user_x_image values('6f','img03','liker');
-- Return the image likers/owners
sel
a.user_name
, a.user_id
, coalesce(c.cnt_owned,0) cnt_owned
, sum(case when b.relationship='liker' then 1 else 0 end) cnt_liked
from
users A
left join
user_x_image B
on a.user_id = b.user_id
left join (
sel user_id_owner, count(*) as cnt_owned
from images
group by 1) C
on a.user_id = c.user_id_owner
group by 1,2,3 order by 2
-- Returns desired results
First, I am assuming you are running Parse Server with a MongoDB database (Parse Server also supports Postgres, which can make relational queries a little easier). Because of this, it is important to note that, although Parse Server implements relational capabilities in its API, there is in fact a NoSQL database behind the scenes. So, let's go through the options.
Option 1 - Denormalized Data
Since it is a NoSQL database, I'd prefer to have a third collection called LeaderBoard. You could add an afterSave trigger to the UserImage class to keep LeaderBoard always up to date. When you need the data, you can run a very simple and fast query. I know it sounds kind of strange to an experienced SQL developer to keep denormalized data, but it is the best option in terms of performance if you have more reads than writes in this collection.
Option 2 - Aggregate
MongoDB supports aggregates (https://docs.mongodb.com/manual/aggregation/) and it has a pipeline stage called $lookup (https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) that you can use to perform your query in a single API call/database operation. Parse Server supports aggregates as well in its API and JS SDK (https://docs.parseplatform.org/js/guide/#aggregate), but unfortunately not directly from client code in Swift, because this operation requires a master key in Parse Server. Therefore, you will need to write a cloud code function that performs the aggregate query for you and then call this cloud function from your Swift client code.
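Whichever option is used, the computation behind the leader board is the same: count images per owner, and count appearances of each user id across the like arrays. The following is a plain-Python sketch of that logic only, using the sample data from the question. It does not use the Parse or MongoDB APIs, and the field names `owner` and `users_who_liked` are illustrative assumptions rather than the real class schema.

```python
from collections import Counter

# Sample data from the question; the dict keys are assumed names.
users = {
    "1a": "Tom", "2b": "Dick", "3c": "Harry",
    "4d": "Simon", "5e": "Garfunkel", "6f": "Jerry",
}
images = [
    {"image_id": "img01", "owner": "1a", "users_who_liked": ["4d", "5e"]},
    {"image_id": "img02", "owner": "6f", "users_who_liked": ["1a", "2b", "3c"]},
    {"image_id": "img03", "owner": "6f", "users_who_liked": ["1a", "6f"]},
]


def leaderboard(users, images):
    """Return (name, uploads, likes) for every user, including users with none."""
    uploads = Counter(img["owner"] for img in images)
    # Flatten the like arrays, which is what an unwind stage would do.
    likes = Counter(uid for img in images for uid in img["users_who_liked"])
    return [(name, uploads[uid], likes[uid]) for uid, name in users.items()]


board = leaderboard(users, images)
for name, n_uploads, n_likes in board:
    print(f"{name} has {n_uploads} uploads and {n_likes} likes")
```

A cloud code function would perform the equivalent grouping on the server and return only the summarised rows to the Swift client.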

Left join table on multiple tables in SAS

I've got multiple master tables in the same format with the same variables. I now want to left join another variable but I can't combine the master tables due to limited storage on my computer. Is there a way that I can left join a variable onto multiple master tables within one PROC SQL? Maybe with the help of a macro?
The LEFT JOIN code looks like this for one join, but I'm looking for an alternative to copying and pasting this 5 times:
PROC SQL;
CREATE TABLE New AS
SELECT a.*, b.Value
FROM Old a LEFT JOIN Additional b
ON a.ID = b.ID;
QUIT;
You can't do it in one create table statement, as it only creates one table at a time. But you can do a few things, depending on what your actual limiting factor is (you mention a few).
If you simply want to avoid writing the same code five times, but otherwise don't care how it executes, then just write the code in a macro, as you reference.
%macro update_table(old=, new=);
PROC SQL;
CREATE TABLE &new. AS
SELECT a.*, b.Value
FROM &old. a LEFT JOIN Additional b
ON a.ID = b.ID;
QUIT;
%mend update_table;
%update_table(old=old1, new=new1)
%update_table(old=old2, new=new2)
%update_table(old=old3, new=new3)
Of course, if the names of the five tables are in a pattern, you can perhaps automate this further based on that pattern, but you don't give sufficient information to figure that out.
If you on the other hand need to do this more efficiently in terms of processing than running the SQL query five times, it can be done a number of ways, depending on the specifics of your additional table and your specific limitations. It looks to me that you have a good use case for a format lookup here, for example; see for example Jenine Eason's paper, Proc Format, a Speedy Alternative to Sort/Merge. If you're just merging on the ID, this is very easy.
data for_format;
set additional;
start = ID;
label = value;
fmtname='AdditionalF'; *or '$AdditionalF' if ID is character-valued;
output;
if _n_=1 then do; *creating an "other" option so it returns missing if not found;
hlo='o';
label = ' ';
output;
end;
run;
And then you just have five data steps with a PUT statement adding the value, or you could even simply format the ID variable with that format and it would have that value whenever you ran most PROCs (if this is something like a classifier that you don't truly need "in" the data).
You can do this in a single pass through the data in a Data Step using a hash table to lookup values.
data new1 new2 new3;
set old1(in=a) old2(in=b) old3(in=c);
format value best.;
if _n_=1 then do;
%create_hash(lk,id,value,"Additional");
end;
value = .;
rc = lk.find();
drop rc;
if a then
output new1;
else if b then
output new2;
else if c then
output new3;
run;
%create_hash() macro available here.
You could, alternatively, use Joe's format with the same Data Step syntax.
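For readers less familiar with SAS hash objects, the idea behind both the format lookup and the hash lookup is the same: load the small Additional table into an in-memory key/value map once, then stream through each master table and look up the value by ID. A rough Python illustration of that idea follows, with made-up sample rows; it is not a translation of the %create_hash() macro.

```python
def left_join_lookup(masters, additional):
    """Add 'Value' to every row of every master table via one shared lookup.

    masters:    dict mapping table name -> list of row dicts with an 'ID' key
    additional: list of row dicts with 'ID' and 'Value' keys
    Unmatched IDs get Value=None, like a missing value after a left join.
    """
    # Built once, reused for all master tables.
    lookup = {row["ID"]: row["Value"] for row in additional}
    return {
        name: [{**row, "Value": lookup.get(row["ID"])} for row in rows]
        for name, rows in masters.items()
    }


# Hypothetical sample data.
additional = [{"ID": 1, "Value": 10.5}, {"ID": 2, "Value": 20.0}]
masters = {
    "old1": [{"ID": 1, "x": "a"}, {"ID": 3, "x": "b"}],
    "old2": [{"ID": 2, "x": "c"}],
}
new_tables = left_join_lookup(masters, additional)
print(new_tables)
```

Only the lookup table has to fit in memory; the master tables are read one row at a time and never combined.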

Joining two database sets with different datatypes

Trying to join two datasets, but the join is based on two different data types (numeric and text)
SELECT *
FROM D1.T1 c
INNER JOIN
D1.T2 d
on c.CNUMBER=INPUT(d.CNUMBER, 8.) ;
This does not work.
I can create a new dataset (copy existing one and add a numerical column) like this:
CNUMBER1=CNUMBER*1;
run;
Then when I join using this copy, it works... but I actually want to try to figure out the way to do it with direct Oracle connection.
In Oracle I would do:
on to_char(c.CNUMBER)=to_char(d.CNUMBER)
Taking a wild guess at what you actually want:
PROC SQL;
CONNECT TO ORACLE (...);
CREATE TABLE oracle_results AS
SELECT * FROM CONNECTION TO ORACLE (
SELECT *
FROM D1.T1 c
INNER JOIN
D1.T2 d
on to_char(c.CNUMBER)=d.CNUMBER);
DISCONNECT FROM ORACLE;
QUIT;
This connects your SAS session to Oracle, performs the explicit pass-through SQL query, and passes the results back to the SAS table oracle_results. Replace the dots with your Oracle connection credentials.
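The underlying technique, converting one side of the join so both sides have the same type, can be tried outside SAS and Oracle. Here is a small sketch that assumes SQLite via Python's sqlite3 module, with CAST(... AS TEXT) playing the role of Oracle's to_char. The extra columns and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (CNUMBER INTEGER, label TEXT);   -- numeric key
    CREATE TABLE T2 (CNUMBER TEXT, detail TEXT);     -- same key stored as text
    INSERT INTO T1 VALUES (101, 'first'), (102, 'second');
    INSERT INTO T2 VALUES ('101', 'match'), ('999', 'no match');
""")

# Convert the numeric side to text so both operands have the same type.
joined = conn.execute("""
    SELECT c.CNUMBER, c.label, d.detail
      FROM T1 c
      JOIN T2 d ON CAST(c.CNUMBER AS TEXT) = d.CNUMBER
""").fetchall()
print(joined)
```

One caveat with comparing as text: values such as '0101' or ' 101' in the text column will not match 101, so it is worth checking how the text keys are stored.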

using SQL aggregate functions with JOINs

I have two tables - tool_downloads and tool_configurations. I am trying to retrieve the most recent build date for each tool in my database. The layout of the DB is simple. One table called tool_downloads keeps track of when a tool is downloaded. Another table is called tool_configurations and stores the actual data about the tool. They are linked together by the tool_conf_id.
If I run the following query which omits dates, I get back 200 records.
SELECT DISTINCT a.tool_conf_id, b.tool_conf_id
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
When I try to add in date information I get back hundreds of thousands of records! Here is the query that fails horribly.
SELECT DISTINCT a.tool_conf_id, max(a.configured_date) as config_date, b.configuration_name
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
I know the problem has something to do with group-bys/aggregate data and joins. I can't really search google since I don't know the name of the problem I'm encountering. Any help would be appreciated.
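The solution is to drop DISTINCT and group by every non-aggregated column in the select list, so that max() is evaluated once per tool: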
SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) as config_date
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
GROUP BY b.tool_conf_id, b.configuration_name
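To see the grouping behaviour on a small data set, here is a sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The table and column names come from the question; the sample rows, the ISO-formatted text dates, and the added ORDER BY are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tool_configurations (
        tool_conf_id INTEGER PRIMARY KEY,
        configuration_name TEXT
    );
    CREATE TABLE tool_downloads (tool_conf_id INTEGER, configured_date TEXT);
    INSERT INTO tool_configurations VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO tool_downloads VALUES
        (1, '2020-01-01'), (1, '2020-03-15'), (2, '2019-12-31');
""")

# One output row per tool: the join yields one row per download,
# and GROUP BY collapses those rows so MAX() picks the latest date.
latest = conn.execute("""
    SELECT b.tool_conf_id, b.configuration_name,
           MAX(a.configured_date) AS config_date
      FROM tool_downloads a
      JOIN tool_configurations b ON a.tool_conf_id = b.tool_conf_id
     GROUP BY b.tool_conf_id, b.configuration_name
     ORDER BY b.tool_conf_id
""").fetchall()
print(latest)
```

Three download rows go in and two rows come out, one per tool, each carrying the most recent configured_date.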
