SymmetricDS Two Way Synchronization Double Insert - data-synchronization

I have a server node and a client node. They do two-way synchronization on a table, and both have sync_on_incoming_batch = 1.
Let's say the table structure is (id, name).
The scenario is:
server inserts (1, 'a')
client inserts (1, 'b')
server sends a batch with (1, 'a') to the client
client sends a batch with (1, 'b') to the server
Now the server has (1, 'b') and the client has (1, 'a')
My questions are:
After the server receives (1, 'b'), why can't the server route that data back to the client? It gets marked with node_id = -1 in sym_outgoing_batch on the server, and vice versa on the client.
How can I sync based on the most recent data, so that in this case the result is (1, 'b') on every node?

sync_on_incoming_batch tells SymmetricDS not to route incoming data back to the node it came from, which is why each node refuses to send the row it just received back to its source (that is what the node_id = -1 entry in sym_outgoing_batch reflects).
To make the most recent change win on every node, add conflict detection with a "newer wins" resolution strategy, for example as sketched below.
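A minimal sketch of that configuration, assuming node group ids of 'server' and 'client' and that the table carries a last_update_time timestamp column the detector can compare; the column names follow the sym_conflict table in the SymmetricDS documentation, so adjust them to your setup:
insert into sym_conflict
(conflict_id, source_node_group_id, target_node_group_id,
detect_type, detect_expression, resolve_type, ping_back,
create_time, last_update_time)
values
('newer-wins', 'server', 'client',
'USE_TIMESTAMP', 'last_update_time', 'NEWER_WINS', 'SINGLE_ROW',
current_timestamp, current_timestamp);
-- repeat with source and target swapped so both directions resolve the same way
insert into sym_conflict
(conflict_id, source_node_group_id, target_node_group_id,
detect_type, detect_expression, resolve_type, ping_back,
create_time, last_update_time)
values
('newer-wins-rev', 'client', 'server',
'USE_TIMESTAMP', 'last_update_time', 'NEWER_WINS', 'SINGLE_ROW',
current_timestamp, current_timestamp);
With ping_back set to SINGLE_ROW, the winning row should be sent back to the node that lost the conflict, so both nodes converge on (1, 'b'), which addresses question 2.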

Related

DB2: Procedure with a simple select returns more than 15x slower than the same select run as a query

I have a select that returns a single column, filtering on 4 columns in the where clause, and I created a specific index for this query. When I run this select using DBeaver, the result comes back in 30~50 milliseconds.
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
Now I have created a procedure with this same simple select. The proc declares a block P1 and a 'cursor with return', does the select, opens the cursor, and closes P1. Using the same DBeaver connection, the result now takes between 1.2~1.6 seconds.
CREATE PROCEDURE myProc (
IN a DECIMAL(10),
IN b DECIMAL(10),
IN c DECIMAL(10),
IN d DECIMAL(10)
)
LANGUAGE SQL
DYNAMIC RESULT SETS 1
SPECIFIC myProc
P1: BEGIN
DECLARE cursor1 CURSOR WITH RETURN for
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
OPEN cursor1;
END P1
Is this huge difference in response time expected? If not, is there something wrong in my procedure that explains it? Or could it be something in the DB configuration or the server, such as too few resources on the DB server, or the proc not using the index (I don't know whether procs in DB2 use indexes by default the way plain queries do)?
I am new to DB2 and to writing procedures. Thank you in advance.
Best regards,
Luis
I don't know if it is the best way, but I solved the problem using a solution that I read about for SQL Server. The solution is to create local variables that receive the parameter values, and to use those variables in the queries, so a good execution plan is always chosen. I didn't know whether this was valid for DB2, but my proc now has almost the same response time as the plain query. It worked!
Link of SQL Server post: SQL Server: Query fast, but slow from procedure
In the link above, a user called Jake gives this explanation:
"The reason this happens is because the procedures query plan is being cached, along with the parameters that were passed to it. On subsequent calls, this query plan generated will be reused with new parameters. This can cause problems because if the data is unevenly distributed, one parameter can generate a sub-optimal plan vs. another. Using local variables essentially does the same as OPTIMIZE FOR UNKNOWN because local variables cannot be sniffed."
I think it is the same for DB2, because it worked. After I changed these old procedures to use local variables, the execution plans began to use the recently created indexes.
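For reference, this is roughly what the change looks like (a sketch only, reusing the signature of the procedure above; the v_ variable names are just illustrative):
CREATE PROCEDURE myProc (
IN a DECIMAL(10),
IN b DECIMAL(10),
IN c DECIMAL(10),
IN d DECIMAL(10)
)
LANGUAGE SQL
DYNAMIC RESULT SETS 1
SPECIFIC myProc
P1: BEGIN
-- local variables that will stand in for the parameters in the query
DECLARE v_a DECIMAL(10);
DECLARE v_b DECIMAL(10);
DECLARE v_c DECIMAL(10);
DECLARE v_d DECIMAL(10);
DECLARE cursor1 CURSOR WITH RETURN FOR
SELECT column1
FROM myTable
WHERE column2 = v_a
AND column3 = v_b
AND (column4 = v_c AND column5 = v_d)
FOR READ ONLY WITH UR;
-- copy the parameters into the locals before opening the cursor
SET v_a = a;
SET v_b = b;
SET v_c = c;
SET v_d = d;
OPEN cursor1;
END P1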

Auto-incrementing column in ksqlDB?

I'm currently using this process (see below) to generate an auto-incrementing column in ksqlDB. But now I'm wondering if there are race conditions or other synchronization problems with this approach. Is this a good way to generate an auto-incrementing column in ksqlDB? If not, is there a better way?
Suppose you want to insert values from one ksqlDB stream into another while auto-incrementing some integer value in the destination stream.
First, create the two streams:
CREATE STREAM dest (ROWKEY INT KEY, i INT, x INT) WITH (kafka_topic='test_dest', value_format='json', partitions=1);
CREATE STREAM src (x INT) WITH (kafka_topic='test_src', value_format='json', partitions=1);
Next, create a materialized view that will contain the maximum value of the destination stream.
CREATE TABLE dest_maxi AS SELECT MAX(i) AS i FROM dest GROUP BY 1;
We need to be able to join the source stream to the materialized view. To do so, we'll create another intermediate stream with a dummy column, one, that's always set to 1, which is what we grouped the materialized view on:
CREATE STREAM src_one AS SELECT x, 1 AS one FROM src;
INSERT INTO dest SELECT COALESCE(dest_maxi.i,0)+1 AS i, src_one.x AS x FROM src_one LEFT JOIN dest_maxi ON src_one.one = dest_maxi.ROWKEY PARTITION BY COALESCE(dest_maxi.i,0)+1 EMIT CHANGES;
Now you can insert values into stream src and watch them come up in stream dest with auto-incrementing IDs.
I don't think your approach will work. ksqlDB offers no guarantees about the order in which it processes records across two different queries. In your case, that means there is no guarantee that
CREATE TABLE dest_maxi AS <query>;
will run and update dest_maxi before
INSERT INTO dest <query>;
runs. Hence, I think you'll run into issues.
It looks like you're trying to take a stream of numbers, e.g.
1234
24746
24848
4947
34
And add an auto-incrementing id column so that the result looks like:
1, 1234
2, 24746
3, 24848
4, 4947
5, 34
Something like this should give you what you want:
-- source stream of numbers:
CREATE STREAM src (
x INT
) WITH (
kafka_topic='test_src',
value_format='json'
);
-- intermediate 'table' of numbers and current count:
CREATE TABLE with_counter
WITH (partitions = 1) AS
SELECT
1 as k,
LATEST_BY_OFFSET(x) as x,
COUNT(1) AS id
FROM src
GROUP BY 1;
-- if you need this back as a stream in ksqlDB you can run:
CREATE STREAM dest (
x INT,
id BIGINT
) WITH (
kafka_topic='WITH_COUNTER',
value_format='json'
);
UDAFs calculate values per key, hence we group by a constant, ensuring all input rows are funnelled into a single key (and partition, so this doesn't scale well!).
We use COUNT to count the number of rows seen, so its output is auto-incrementing, and we use LATEST_BY_OFFSET to grab the current value of x into our table.
The changelog of the with_counter table will then contain the output you want, only with a constant key of 1:
1 -> 1, 1234
1 -> 2, 24746
1 -> 3, 24848
1 -> 4, 4947
1 -> 5, 34
We re-import this into ksqlDB as the dest stream, which you can use as normal. If you want a topic without the key, you can just run:
CREATE STREAM without_key AS SELECT * FROM dest;
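For example, a quick test from the ksqlDB CLI (the values are just illustrative, and the statements above are assumed to have been run already):
-- read results from the beginning of the topics
SET 'auto.offset.reset' = 'earliest';
-- feed a couple of values into the source stream
INSERT INTO src (x) VALUES (1234);
INSERT INTO src (x) VALUES (24746);
-- the destination stream should show them with incrementing ids: (1, 1234), (2, 24746)
SELECT id, x FROM dest EMIT CHANGES;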

Resulting KSQL join stream shows no data

I am joining a KSQL stream and a KSQL table. Both are mapped to the same key.
But no data is coming into the resulting stream.
create stream kz_yp_loan_join_by_bandid WITH (KAFKA_TOPIC='kz_yp_loan_join_by_bandid',VALUE_FORMAT='AVRO') AS
select ypl.loan_id, ypl.userid ,ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
No data ends up in stream kz_yp_loan_join_by_bandid.
But if I simply run:
select ypl.loan_id, ypl.userid ,ypk.name as user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;
There is data present.
This shows that the stream is not being written to, but why is that?
I have tried doing the entire setup again.
A few things to check:
If you want to process all the existing data as well as new data, make sure that before you run your CREATE STREAM … AS SELECT ("CSAS") you have run SET 'auto.offset.reset' = 'earliest'; (see the sketch after this list)
If the join is returning data when run outside of the CSAS then this may not be relevant, but it is always good to check that your join meets all the join requirements (for example, that the stream and table are co-partitioned on the join key)
Check the KSQL server log in case there's an issue with writing to the target topic, creating the schema on the Schema Registry, etc.
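For the first point, the sequence would look something like this, just re-running the statement from your question after setting the offset reset (assuming the KSQL CLI; drop the existing stream first if it is already registered):
SET 'auto.offset.reset' = 'earliest';
-- DROP STREAM kz_yp_loan_join_by_bandid;  (only if it already exists)
CREATE STREAM kz_yp_loan_join_by_bandid WITH (KAFKA_TOPIC='kz_yp_loan_join_by_bandid', VALUE_FORMAT='AVRO') AS
SELECT ypl.loan_id, ypl.userid, ypk.name AS user_band_id_name
FROM kz_yp_loan_stream_partition_by_bandid ypl
INNER JOIN kz_yp_key_table ypk
ON ypl.user_band_id = ypk.id;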
These references will be useful:
https://www.confluent.io/blog/troubleshooting-ksql-part-1
https://www.confluent.io/blog/troubleshooting-ksql-part-2

Can you aggregate denormalized parse-server query results in one statement using Swift?

My experience is with SQL, but I am working on learning parse server data management, and in the example below I demonstrate how I would use SQL to represent the data I currently have stored in my parse server classes. I am trying to present all the users, a count of how many images they have uploaded, and a count of how many images they have liked, for an app where users can upload images and can also scroll through and like other people's images. I store the id of the user who uploaded the image in the image table, and I store an array column in the image table of all the user ids that have liked it.
Using SQL I would have normalized this into 3 tables (user, image, user_x_image), joined the tables, and then aggregated that result. But I am trying to learn the right way to do this using parse server, where my understanding is that the best practice is to structure the data the way I have below. What I want to do is produce a "leader board" that shows which users have uploaded the most images or liked the most images, to inspire engagement. Even just links to examples of how to join/aggregate parse data sets would be very helpful. If I wasn't clear about what I am trying to achieve, please let me know in the comments and I will add updates.
-- SQL approximation of data structured in parse
create volatile table users
( user_id char(10)
, user_name char(50)
) on commit preserve rows;
insert into users values('1a','Tom');
insert into users values('2b','Dick');
insert into users values('3c','Harry');
insert into users values('4d','Simon');
insert into users values('5e','Garfunkel');
insert into users values('6f','Jerry');
create volatile table images
( image_id char(10)
, user_id_owner char(10) -- The object Id for the parse user that uploaded
, UsersWhoLiked varchar(100) -- in Parse class this is array of user ids that clicked like
) on commit preserve rows;
insert into images values('img01','1a','["4d","5e"]');
insert into images values('img02','6f','["1a","2b","3c"]');
insert into images values('img03','6f','["1a","6f"]');
-----------------------------
-- DESIRED RESULTS
-- Tom has 1 uploads and 2 likes
-- Dick has 0 uploads and 1 likes
-- Harry has 0 uploads and 1 likes
-- Simon has 0 uploads and 1 likes
-- Garfunkel has 0 uploads and 1 likes
-- Jerry has 2 uploads and 1 likes
-- How to do with normalized data structure
create volatile table user_x_image
( user_id char(10)
, image_id char(10)
, relationship char(10)
) on commit preserve rows;
insert into user_x_image values('4d','img01','liker');
insert into user_x_image values('5e','img01','liker');
insert into user_x_image values('1a','img02','liker');
insert into user_x_image values('2b','img02','liker');
insert into user_x_image values('3c','img02','liker');
insert into user_x_image values('1a','img03','liker');
insert into user_x_image values('6f','img03','liker');
-- Return the image likers/owners
sel
a.user_name
, a.user_id
, coalesce(c.cnt_owned,0) cnt_owned
, sum(case when b.relationship='liker' then 1 else 0 end) cnt_liked
from
users A
left join
user_x_image B
on a.user_id = b.user_id
left join (
sel user_id_owner, count(*) as cnt_owned
from images
group by 1) C
on a.user_id = c.user_id_owner
group by 1,2,3 order by 2
-- Returns desired results
First, I am assuming you are running Parse Server with a MongoDB database (Parse Server also supports Postgres, which can make relational queries a little easier). Because of this, it is important to note that, although Parse Server implements relational capabilities in its API, behind the scenes we are in fact talking about a NoSQL database. So, let's go through the options.
Option 1 - Denormalized Data
Since it is a NoSQL database, I'd prefer to have a third collection called LeaderBoard. You could add an afterSave trigger to the UserImage class and keep LeaderBoard always up to date. When you need the data, you can run a very simple and fast query. I know it sounds kind of strange to an experienced SQL developer to keep denormalized data, but it is the best option in terms of performance if you have more reads than writes on this collection.
Option 2 - Aggregate
MongoDB supports aggregates (https://docs.mongodb.com/manual/aggregation/) and has a pipeline stage called $lookup (https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) that you can use to perform your query in a single API call/database operation. Parse Server supports aggregates as well in its API and JS SDK (https://docs.parseplatform.org/js/guide/#aggregate), but unfortunately not directly from client code in Swift, because this operation requires the master key in Parse Server. Therefore, you will need to write a Cloud Code function that performs the aggregate query for you and then call this Cloud Code function from your Swift client code.

Joining two database sets with different datatypes

Trying to join two datasets, but the join is based on two different data types (numeric and text)
SELECT *
FROM D1.T1 c
INNER JOIN
D1.T2 d
on c.CNUMBER=INPUT(d.CNUMBER, 8.) ;
This does not work.
I can create a new dataset (copy the existing one and add a numeric column) like this:
data t2_copy; /* dataset names here are illustrative */
set d1.t2;
CNUMBER1=CNUMBER*1; /* numeric copy of the character column */
run;
Then when I join using this copy, it works... but I actually want to figure out how to do it with a direct Oracle connection.
In Oracle I would do:
on to_char(c.CNUMBER)=d.CNUMBER
Taking a wild guess at what you actually want:
PROC SQL;
CONNECT TO ORACLE (...);
CREATE TABLE oracle_results AS
SELECT * FROM CONNECTION TO ORACLE (
SELECT *
FROM D1.T1 c
INNER JOIN
D1.T2 d
on to_char(c.CNUMBER)=d.CNUMBER);
DISCONNECT FROM ORACLE;
QUIT;
This will connect your SAS session to Oracle, run the explicit pass-through SQL query, and return the results to the SAS table oracle_results. Replace the dots with your Oracle connection credentials.
