Empty KSQL Stream - ksqldb

I'm having a problem fetching data from a Kafka topic.
The topic contains nested objects, and I don't really know how to pull a single field out of them into a stream. I'm sure the topic exists.
I have a Ticket object:
Ticket {
  header { storeID, storename, ... }
  body { ... }
}
I want to put storeID in the stream:
CREATE STREAM test (StoreID VARCHAR) WITH (KAFKA_TOPIC='output__tfrema', VALUE_FORMAT='AVRO');
I tried that, but it returns 0 rows; I expected at least 10,000 records.
Thanks anyway.

If no results are returned, it can be for several reasons.
Make sure you've run SET 'auto.offset.reset' = 'earliest'; so that you read all messages from the beginning of the topic.
Are there deserialisation errors in your KSQL server log?
You can read more here: https://www.confluent.io/blog/troubleshooting-ksql-part-1
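To actually get storeID out of the nested Ticket value, one approach is to declare the nested header as a STRUCT and project the field into a derived stream. This is only a sketch: the column names inside the STRUCT are assumptions based on the Ticket structure shown above, and the Avro schema registered for the topic must actually contain them.

SET 'auto.offset.reset' = 'earliest';

-- Declare the nested header record as a STRUCT (field names assumed from the question).
CREATE STREAM ticket_raw (
  header STRUCT<storeID VARCHAR, storename VARCHAR>
) WITH (KAFKA_TOPIC='output__tfrema', VALUE_FORMAT='AVRO');

-- Project the nested field into its own stream.
CREATE STREAM store_ids AS
  SELECT header->storeID AS storeID
  FROM ticket_raw;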


problems with smart contract address

I'm following this Elrond NFTs smart contracts (from scratch part 2) tutorial: https://www.youtube.com/watch?v=jpJQ-YB4NnQ
I successfully compiled the smart contract, but when I run . interaction/devnet.snippets.sh I do not get the smart contract address, as can be seen in the log below.
Because of this, I cannot run the issueToken function.
Can anyone help?
INFO:accounts:Account.sync_nonce()
INFO:accounts:Account.sync_nonce() done: 10733
INFO:cli.contracts:Contract address: erd1qqqqqqqqqqqqqpgq093ggau3mcjq4p5ln7skvtrk4wjhyfpwd8ssjj45qw
INFO:utils:View this contract address in the Elrond Devnet Explorer: https://devnet-explorer.elrond.com/accounts/erd1qqqqqqqqqqqqqpgq093ggau3mcjq4p5ln7skvtrk4wjhyfpwd8ssjj45qw
INFO:transactions:Transaction.send: nonce=10733
INFO:transactions:Hash: 4f25756f9246985732038eccc0cbc4fda480b8409fcc70dff089c1d59684e652
INFO:utils:View this transaction in the Elrond Devnet Explorer: https://devnet-explorer.elrond.com/transactions/4f25756f9246985732038eccc0cbc4fda480b8409fcc70dff089c1d59684e652
WARNING:cli.data:Always review --expression parameters before executing this command!
WARNING:cli.data:Always review --expression parameters before executing this command!
WARNING:cli.data:Never use this command to store sensitive information! Data is unencrypted.
INFO:cli.data:Data has been stored at key = 'address-devnet', in partition = '*'.
WARNING:cli.data:Never use this command to store sensitive information! Data is unencrypted.
INFO:cli.data:Data has been stored at key = 'deployTransaction-devnet', in partition = '*'.
Smart contract address:
The interaction outfile structure changed in the meantime, and you're most likely still looking for data in the old structure.
In the new structure, the contract address is stored under the ['contractAddress'] key instead of ['emitted_tx']['address'], and the transaction hash under ['emittedTransactionHash'] instead of ['emitted_tx']['hash'].
Therefore, you have to change these lines:
TRANSACTION=$(erdpy data parse --file="${MY_LOGS}/deploy-devnet.interaction.json" --expression="data['emitted_tx']['hash']")
ADDRESS=$(erdpy data parse --file="${MY_LOGS}/deploy-devnet.interaction.json" --expression="data['emitted_tx']['address']")
to these:
TRANSACTION=$(erdpy data parse --file="${MY_LOGS}/deploy-devnet.interaction.json" --expression="data['emittedTransactionHash']")
ADDRESS=$(erdpy data parse --file="${MY_LOGS}/deploy-devnet.interaction.json" --expression="data['contractAddress']")
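If you are not sure which structure your erdpy version actually writes, you can list the top-level keys of the interaction file directly. A quick check, assuming jq is installed (MY_LOGS is the same variable used above):
jq 'keys' "${MY_LOGS}/deploy-devnet.interaction.json"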

GCP Dataflow processes invalid data

We have an API acting as a proxy between clients and Google Pub/Sub: it basically receives a JSON body and publishes it to the topic. The messages are then processed by Dataflow, which stores them in BigQuery. We also use a transform UDF that, for instance, converts a field value to upper case; it parses the incoming JSON and produces a new one.
The problem is the following: far fewer bytes end up in the destination table than in the dead-letter table, and 99% of the error messages say that the JSON sent is invalid. And that's true: the payloadString column contains distorted JSONs; they can be truncated, concatenated with other messages, or both. I've added logs on the API side to see where the messages get corrupted, but neither the JSON bodies received by the API nor the ones it sends are invalid.
How can I debug this problem? Is there any chance of Pub/Sub or Dataflow corrupting messages? If so, what can I do to fix it?
UPD: By the way, we use the Google-provided template "Pub/Sub Topic to BigQuery".
UPD2: The API is written in Go, and we send the message simply by calling
res := p.topic.Publish(ctx, &pubsub.Message{Data: msg})
The res variable is then used for error logging; p here is a custom struct.
The message we send is a JSON document with 15 fields; to be concise, I'll mock both it and the UDF.
Message:
{"MessageName":"Name","MessageTimestamp":"123123123",...}
UDF:
function transform(inJson) {
  var obj;
  try {
    obj = JSON.parse(inJson);
  } catch (error) {
    throw 'parse JSON error: ' + error;
  }
  if (Object.keys(obj).length !== 15) {
    throw "Message is invalid";
  }
  if (!(obj.hasOwnProperty('MessageName') && typeof obj.MessageName === 'string' && obj.MessageName.length > 0)) {
    throw "MessageName is absent or invalid";
  }
  /*
    other fields check
  */
  obj.MessageName = obj.MessageName.toUpperCase();
  /*
    other fields transform
  */
  return JSON.stringify(obj);
}
UPD3:
Besides being corrupted, I've noticed that every single message is duplicated at least once, and the duplicates are often truncated.
The problem started several days ago when there was a massive increase in the number of messages; the volume is back to normal now, but the error is still there. The problem had been seen before, but much more rarely.
The behavior you describe suggests that the data is corrupt before it gets to Pub/Sub or Dataflow.
I have performed a test, sending JSON messages containing 15 fields. Your UDF function as well as the Dataflow template work fine, since I was able to insert the data into BigQuery.
Based on that, it seems your messages are already corrupted before getting to Pub/Sub, so I suggest you check the messages once they arrive in Pub/Sub and see whether they have the correct format.
Please note that the message schema is required to match the BigQuery table schema.
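One way to see exactly what arrives in Pub/Sub, as suggested above, is to attach a separate debug subscription to the same topic and log the raw payloads before Dataflow touches them. This is only a sketch; the project ID and subscription name are placeholders:
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()
	// "my-project" and "debug-sub" are placeholders. Use a dedicated debug
	// subscription on the same topic so the Dataflow subscription still
	// receives its own copy of every message.
	client, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	sub := client.Subscription("debug-sub")
	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		// Log the payload exactly as Pub/Sub delivered it.
		log.Printf("message %s: %s", m.ID, m.Data)
		m.Ack()
	})
	if err != nil {
		log.Fatal(err)
	}
}
If the payloads logged here are already truncated or concatenated, the corruption happens before Pub/Sub delivers the messages; if they look fine, the problem is further down the pipeline.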

Stream PubSub to Spanner - Wait.on Step

The requirement is to delete the data in Spanner tables before inserting the data from Pub/Sub messages. As MutationGroup does not guarantee the order of execution, I separated the delete mutations into their own set, so I have two sets: one for the Delete mutations and one for the AddReplace mutations.
PCollection<Data> dataJson =
    pipeLine
        .apply(PubsubIO.readStrings().fromSubscription(options.getInputSubscription()))
        .apply("ParsePubSubMessage", ParDo.of(new PubSubToDataFn()))
        .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10))));

SpannerWriteResult deleteResult = dataJson
    .apply("DeleteDataMutation", MapElements.via(......))
    .apply("DeleteData", SpannerIO.write().withSpannerConfig(spannerConfig).grouped());

dataJson
    .apply("WaitOnDeleteMutation", Wait.on(deleteResult.getOutput()))
    .apply("AddReplaceMutation", MapElements.via(...))
    .apply("UpsertInfoToSpanner", SpannerIO.write().withSpannerConfig(spannerConfig).grouped());
This is a streaming Dataflow job, and I tried multiple windowing strategies, but it never executes the "UpsertInfoToSpanner" step.
How can I fix this issue? Can someone suggest a path forward?
Update:
The requirement is to apply two mutation groups sequentially on the same input data, i.e. read the JSON from the Pub/Sub message, delete the existing data from multiple tables with one mutation group, and then insert the data from that same JSON message.
Re-posting the earlier comment for better visibility:
The Mutation operations within a single MutationGroup are guaranteed to be executed in order within a single transaction, so I don't see what the issue is there. The reason why Wait.on() never releases is that the output stream being waited on is in the global window, so it will never be closed in a streaming pipeline.
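To illustrate the windowing point, one direction to experiment with is to move the Wait.on() signal out of the global window before waiting on it. This is an untested sketch, not a verified fix; deleteResult, dataJson and spannerConfig are the names from the pipeline above, and whether the signal actually releases depends on the timestamps SpannerIO assigns to its output:
// Re-window the SpannerIO result so the Wait.on() signal is no longer in the global window.
PCollection<Void> deleteSignal = deleteResult
    .getOutput()
    .apply("RewindowDeleteSignal",
        Window.<Void>into(FixedWindows.of(Duration.standardSeconds(10))));

dataJson
    .apply("WaitOnDeleteMutation", Wait.on(deleteSignal))
    .apply("AddReplaceMutation", MapElements.via(...))
    .apply("UpsertInfoToSpanner", SpannerIO.write().withSpannerConfig(spannerConfig).grouped());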

Unrecognised header byte error when trying to decode an Avro message in Spring Cloud Stream

I am trying to write a test case for my Spring Cloud Stream application. I am using Confluent Schema Registry with Avro, so I need to decode the message after polling from the channel. Here is my code:
processor.input()
.send(MessageBuilder.withPayload(InputData).build());
Message<?> message = messageCollector.forChannel(processor.output()).poll();
BinaryMessageDecoder<OutputData> decoder = OutputData.getDecoder();
OutputData outputObject = decoder.decode((byte[]) message.getPayload());
For some reason this code throws:
org.apache.avro.message.BadHeaderException: Unrecognized header bytes: 0x00 0x08
I am not sure whether this is some sort of bug or I am simply not decoding the received Avro message the proper way. I suspect I need to set a header of some kind, but I am not quite sure how and with what exactly. I would appreciate it if someone could help me with this.
P.S.: I am using spring-cloud-stream-test-support for the purpose of this test.
The data won't be Avro-encoded when using the test binder; the test binder is very limited.
To properly test end-to-end with Avro, you should remove the test binder and use the real Kafka binder with an embedded Kafka broker.
One of the sample apps shows how to do it.
It turns out that the issue was related to how I was trying to decode the Avro message: BinaryMessageDecoder expects Avro's single-object encoding header, which this payload does not carry. Using the plain binary decoder from the official Avro libraries, the following code worked for me:
Decoder decoder = DecoderFactory.get().binaryDecoder((byte[]) message.getPayload(), null);
DatumReader<OutputData> reader = new SpecificDatumReader<>(OutputData.getClassSchema());
OutputData outputData = reader.read(null, decoder);

change trace log format in emqtt message broker

I am using the emqtt message broker for MQTT.
I am not an Erlang developer and have zero knowledge of the language.
I picked this Erlang-based broker after looking at many open-source brokers online and hearing from people about the advantages of an Erlang-based server.
Now I am stuck with the output of the emqttd_cli trace command.
It is not JSON, and if I use a Perl parser to convert it to JSON the output is delayed.
I want to know in which file I can change the trace log output format.
I looked at the trace code of the broker and found the file src/emqttd_protocol.erl. An exported function named trace/3 has the code that you need.
The second argument of this function, named Packet, carries the data received and sent via the broker. You can fetch the required fields from it and format them however you want them printed.
Edit: sample modified code added
trace(recv, Packet, ProtoState) ->
    PacketHeader = Packet#mqtt_packet.header,
    HostInfo = esockd_net:format(ProtoState#proto_state.peername),
    %% PacketInfo = {ClientId, Username, ClientIP, ClientPort, Payload, QoS, Retain}
    PacketInfo = {ProtoState#proto_state.client_id, ProtoState#proto_state.username, lists:nth(1, HostInfo), lists:nth(3, HostInfo), Packet#mqtt_packet.payload, PacketHeader#mqtt_packet_header.qos, PacketHeader#mqtt_packet_header.retain},
    ?LOG(info, "Data Received ~p", [PacketInfo], ProtoState);
