ksqlDB: create a stream from multiple streams without a join

Is it possible to merge multiple streams into one stream without a JOIN / WINDOW clause? I just want something similar to a combined Kafka topic, where all messages can be found for further processing.
kafka topic source1 ---> transformationOnValueX stream ---\
kafka topic source1 ---> transformationOnValueY stream ----\
kafka topic source2 ---> TransformationOnValueW stream -----\
kafka topic source2 ---> TransformationOnValueZ stream ------\---> combined_stream_all_messages_transformed_multiple_sources
Any idea how to do that?
Edit:
Found https://kafka-tutorials.confluent.io/merge-many-streams-into-one-stream/ksql.html but that solution requires a manual INSERT INTO statement after the new stream is created. Is there a way to do it in one statement?
As a side effect, is the event-driven nature broken, or am I wrong? What happens if an event is published to the source topic after the INSERT INTO statement? Is it lost?
Edit:
To inform all fellows: if you use an INSERT INTO ... SELECT statement, a running (persistent) query is created and the final stream will receive further updates.

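A minimal ksqlDB sketch of the tutorial's pattern (stream and column names are placeholders): the combined stream is created from one transformed stream, and every additional transformed stream is added with INSERT INTO ... SELECT. Each INSERT INTO ... SELECT starts its own persistent query, so events published later are still picked up.

CREATE STREAM combined_all_messages AS
  SELECT id, payload FROM transformation_on_value_x;

INSERT INTO combined_all_messages SELECT id, payload FROM transformation_on_value_y;
INSERT INTO combined_all_messages SELECT id, payload FROM transformation_on_value_w;
INSERT INTO combined_all_messages SELECT id, payload FROM transformation_on_value_z;

The tutorial does not offer a single-statement form; the CREATE STREAM plus the INSERT INTO statements together form the merge.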

Related

Does AVRO support schema evolution?

I am trying to understand whether AVRO supports schema evolution for the following case.
Kafka producer writing using schema1
Then the producer writing again using schema2 - a new field added with a default value
Kafka consumer consuming both of the above messages using schema1?
I am able to read the first message successfully from Kafka, but for the second message I am getting an ArrayIndexOutOfBoundsException. I.e. I am reading the second message (written using schema2) using schema1. Is this expected not to work? Is the consumer always expected to be updated first?
The other option is to use a schema registry, but I don't want to opt for that. So I would like to know whether schema evolution is possible for the above case.
When reading Avro data, you always need two schemata: the writer schema and the reader schema (they may be the same).
I'm assuming you're writing the data to Kafka using the BinaryMessageEncoder. This adds a 10-byte header describing the writer schema.
To read the message (using the BinaryMessageDecoder), you'll need to give it the read schema (schema1) and a SchemaStore. The latter can be connected to a schema registry, but it need not be. You can also use the SchemaStore.Cache implementation and add schema2 to it.
When reading the data, the BinaryMessageDecoder first reads the header, resolves the writer schema via the SchemaStore, and then reads the data as schema1 data.
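A minimal sketch of that setup, assuming schema1 and schema2 have been parsed elsewhere and messageBytes holds one encoded message (variable names are illustrative, exception handling omitted):

import java.nio.ByteBuffer;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.SchemaStore;

// Cache of known writer schemas; the decoder uses it to resolve the fingerprint
// carried in the 10-byte header.
SchemaStore.Cache schemaCache = new SchemaStore.Cache();
schemaCache.addSchema(schema2);

// schema1 is the reader schema: every decoded record is shaped like schema1,
// regardless of whether it was written with schema1 or schema2.
BinaryMessageDecoder<GenericRecord> decoder =
    new BinaryMessageDecoder<>(GenericData.get(), schema1, schemaCache);

GenericRecord record = decoder.decode(ByteBuffer.wrap(messageBytes));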

select from system$stream_has_data returns error - parameter must be a valid stream name... hmm?

I'm trying to see if there is data in a stream, and I provided the exact stream name as follows:
Select SYSTEM$STREAM_HAS_DATA('STRM_EXACT_STREAM_NAME_GIVEN');
But I get an error:
SQL compilation error: Invalid value ['STRM_EXACT_STREAM_NAME_GIVEN'] for function 'SYSTEM$STREAM_HAS_DATA', parameter 1: must be a valid stream name
1) Any idea why? How can this error be resolved?
2) Would it hurt to resume a set of tasks (alter task resume;) without knowing whether the corresponding stream has data in it or not? I believe that if there is (delta) data in the stream, the task will load it; if not, the task won't do anything.
3) Any idea how to modify / update a stream that shows up as 'STALE'? Or should just loading fresh data into the table associated with the stream set the stream to 'NOT STALE', i.e. stale = false? What if loading the associated table does not update the state of the task? (That is what appears to be happening in my case.)
1) It doesn't look like you have a stream by that name. Try running SHOW STREAMS; to see which streams are active in the database/schema you are currently using (see the sketch after this answer).
2) If your task has a WHEN clause that validates against the SYSTEM$STREAM_HAS_DATA result, then resuming a task and letting it run on schedule only hits against your global services layer (no warehouse credits), so there is no harm there.
3) STALE means that the stream data wasn't consumed by a DML statement for a long time (I think it's 14 days by default, or, if data retention is longer than 14 days, the longer of the two). Loading more data into the stream's source table doesn't help that. Running a DML statement will, but since the stream is stale, doing so may have bad consequences. Streams are meant to be used for frequent DML, so not running DML against a stream for longer than 14 days is very uncommon.
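A short SQL sketch of the checks in 1) and 2) above (database, schema, warehouse, and target names are placeholders):

-- 1) Confirm the stream exists and is visible from your current context
SHOW STREAMS LIKE 'STRM_EXACT_STREAM_NAME_GIVEN';

-- Fully qualify the name if the stream lives in another database/schema
SELECT SYSTEM$STREAM_HAS_DATA('MY_DB.MY_SCHEMA.STRM_EXACT_STREAM_NAME_GIVEN');

-- 2) A task gated on the stream only does work when the stream has data
CREATE OR REPLACE TASK MY_DB.MY_SCHEMA.MY_TASK
  WAREHOUSE = MY_WH
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('MY_DB.MY_SCHEMA.STRM_EXACT_STREAM_NAME_GIVEN')
AS
  INSERT INTO MY_DB.MY_SCHEMA.MY_TARGET
  SELECT * FROM MY_DB.MY_SCHEMA.STRM_EXACT_STREAM_NAME_GIVEN;

ALTER TASK MY_DB.MY_SCHEMA.MY_TASK RESUME;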

Session window behavior with Kafka Streams is not as expected

I am a bit of a newbie working with Kafka Streams, but I have noticed a behavior I was not expecting. I have developed an app which is consuming from 6 topics. My goal is to group (or join) an event on every topic by an internal field. That is working fine. But my issue is with the window time: it looks like the end time of every cycle affects all the aggregations that are in progress at that time. Is there only one timer for all aggregations running at the same time? I was expecting each aggregation to leave the aggregation process once the 30 seconds I configured had passed for that stream. I think this should be possible, because I have seen data in the Windowed windowedRegion variable, and the windowedRegion.window().start() and windowedRegion.window().end() values are different for every stream.
This is my code:
streamsBuilder
    .stream(topicList, Consumed.with(Serdes.String(), Serdes.String()))
    // Re-key every event by the internal field so events from all topics group together
    .groupBy(new MyGroupByKeyValueMapper(), Serialized.with(Serdes.String(), Serdes.String()))
    // Session windows: windowInactivity is the inactivity gap, windowDuration the retention
    .windowedBy(SessionWindows.with(windowInactivity).until(windowDuration))
    .aggregate(
        new MyInitializer(),
        new MyAggregator(),
        new MyMerger(),
        Materialized.with(new Serdes.StringSerde(), new PaymentListSerde())
    )
    .mapValues(new MyMapper())
    .toStream(new MyKeyValueMapper())
    .to(consolidationTopic, Produced.with(Serdes.String(), Serdes.String()));
I'm not sure if this is what you're asking but every aggregation (every per-key session window) may indeed be updated multiple times. You will not generally get just one message per window with the final result for that session window on your "consolidation" topic. This is explained in more detail here:
https://stackoverflow.com/a/38945277/7897191
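If you only want a single final result per closed session window, newer Kafka Streams versions (2.1+) offer suppress(). A rough sketch under that assumption, reusing the class names from the question (the window and grace settings are illustrative, and the deprecated Serialized/until calls are swapped for the newer Grouped and grace() API):

import java.time.Duration;

streamsBuilder
    .stream(topicList, Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy(new MyGroupByKeyValueMapper(), Grouped.with(Serdes.String(), Serdes.String()))
    // 30-second inactivity gap, no grace period, so windows close as soon as they end
    .windowedBy(SessionWindows.with(Duration.ofSeconds(30)).grace(Duration.ZERO))
    .aggregate(
        new MyInitializer(),
        new MyAggregator(),
        new MyMerger(),
        Materialized.with(Serdes.String(), new PaymentListSerde())
    )
    // Buffer intermediate updates and emit one record per session window once it has closed
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .mapValues(new MyMapper())
    .toStream(new MyKeyValueMapper())
    .to(consolidationTopic, Produced.with(Serdes.String(), Serdes.String()));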

Is there a way to create circular file storage? Like syslog in Linux

In my iOS application, I want to store some messages that I obtain from my remote server. However, instead of storing these messages forever, I want to purge the oldest ones once I have N messages; i.e., if N is configured to be 10, I want to store 10 messages, and on the arrival of the 11th message, I want to delete the 1st message.
Is there a standard way to do this in iOS? I am yet to write the code for saving the messages, so choosing any method of saving is fine for me.
Store your messages in a file. After you get a new message, read your file's messages into an NSMutableArray, replace the oldest message with the new one, and overwrite your file with the new array data.
I don't think there is a straightforward way.
The way I would do it is to have a table using SQLite, with 2 columns: id (int, autoincrement) and value (string). When inserting, if max(id) >= 10, delete the row with min(id) and insert the new value (see the sketch below).
Of course, this would fail once id reaches MAX_INT_VALUE, so if you think you will never get to that value you are good.
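A minimal SQLite sketch of that rolling table (table and column names are made up for illustration):

-- one-time setup
CREATE TABLE messages (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT);

-- on arrival of each new message: insert it, then trim the oldest row
-- once more than 10 rows exist
INSERT INTO messages (value) VALUES ('new message');
DELETE FROM messages
 WHERE id = (SELECT MIN(id) FROM messages)
   AND (SELECT COUNT(*) FROM messages) > 10;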

How to send an HL7 message using Mirth by reading data from my database

I'm having a problem sending (creating) an HL7 message using Mirth.
I want to read data from my patient table in SQL Server 2008 and, using that data,
I want to send a message to my destination connector, a File Writer. I want my messages to get saved in the File Writer's output directory.
So far I'm able to generate the message, but the size of the output file in my destination directory keeps increasing as the channel keeps polling.
Have I done something wrong in the transformer mapping?
UPDATE:
The size of the output file in my destination directory IS increasing (my .txt file starts at 1 KB and grows to 900 KB and so on). This is happening because the same data is getting generated again and again, multiple times. For example, my generated message has one MSH, PID, PV1, ORM group for one row of data in my database, and the same MSH, PID, PV1 and ORM are getting generated multiple times.
If you are seeing the same data generated in your output directory multiple times, the most likely cause is that you are not doing anything to indicate to your database that a given record has been processed.
For example, if you have 1 record in your database: ["John", "Smith", "12134" ...] on the first poll, you will generate 1 message. If on the second poll you also have a second record ["Fred", "Jones", "98371" ...], you will generate TWO messages - one for John Smith and one for Fred Jones. And so on.
The key is to use the "Run On-Update Statement" of your Database Reader (Source) connector to update the database table you are polling with an indication that a given record has been processed. This ensures that the same record is not processed multiple times.
This requires that your source table have some kind of column to indicate the record has been processed. Mirth will not keep track of this for you - you must do it manually.
You can't have a file reader as a destination, so I assume you mean file writer. You say that "the size of my file in my destination is increasing." Is that a typo? Do you mean NOT increasing?
If it is increasing, then your messages are getting generated and you can view them to start your next round of troubleshooting...
If not, then you should look at the message log in the dashboard to see what is happening on a message-by-message basis - that would be the next place to troubleshoot.
You have to have a way of distinguishing which records to pull from the database by filtering on some sort of status flag or possibly a time-stamp. Then, you have to use some sort of On-Update statement to mark those same records as processed.
i.e.
Select id, patient, result from results where status_flag='N'
or
Select * from results where status_flag = 'N' and created_date >= '9/25/2012'
Then, in either a transformer step or the On-Update section of your Source, you would do something like:
Update results
set status_flag = 'Y' where id=$(id)
If you do not do something like this and you have Mirth polling at a certain interval, it will just keep pulling the same records over and over.
You have to set your connector type to Database Reader in the source.
You have to set your connector type to File Writer in the destination.
Then you can write your data to a file in a directory you have write access to.
While creating the HL7 template, you have to use the following code in the outbound message template:
MSH|^~\&|||
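For illustration only, a slightly fuller skeleton of an ORM-style outbound template; the segment layout follows HL7 v2, and the ${...} placeholders are made-up Velocity variable names that you would populate from your transformer mappings:

MSH|^~\&|MIRTH|${sendingFacility}|${receivingApp}|${receivingFacility}|${timestamp}||ORM^O01|${messageControlId}|P|2.3
PID|1||${patientId}||${lastName}^${firstName}
PV1|1|${patientClass}
ORC|NW|${orderId}
OBR|1|${orderId}||${serviceCode}^${serviceName}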
