Unrecognised header byte error when try to decode an Avro message in Spring Cloud Stream - avro

I am trying to write a test case for my Spring Cloud Stream application. I am using Confluent Schema Registry with Avro, so I need to decode the message after polling from the channel. Here is my code:
processor.input()
.send(MessageBuilder.withPayload(InputData).build());
Message<?> message = messageCollector.forChannel(processor.output()).poll();
BinaryMessageDecoder<OutputData> decoder = OutputData.getDecoder();
OutputData outputObject = decoder.decode((byte[]) message.getPayload());
For some reason this code throws
org.apache.avro.message.BadHeaderException: Unrecognized header bytes: 0x00 0x08
I am not sure if this is some sort of bug I am facing or I am not following a proper way to decode the received avro message. I suspect I need to set header with something, but I am not quite sure how and with what exactly. I would appreciate it if someone could help me with this matter.
P.S: I am using spring-cloud-stream-test-support for the purpose of this test.

The data won't be avro-encoded when using the test binder.
The test binder is very limited.
To properly test end-to-end with avro, you should remove the test binder and use the real kafka binder with an embedded kafka broker.
One of the sample apps shows how to do it.

It turns out that the issue was related to how I was trying to decode the Avro message. By using the official Avro libraries, the following code worked for me:
Decoder decoder = DecoderFactory.get().binaryDecoder((byte[]) message.getPayload(), null);
DatumReader<OutputData> reader = new SpecificDatumReader<>(OutputData.getClassSchema());
RawDataCapsule rawDataCapsule = reader.read(null , decoder);

Related

How to create a custom JMeter Pre-Processor to convert a json payload into MQTT Binary packet

I need a JMeter Pre Processor which will convert a JSON String into a MQTT packet i.e binary data. The binary data will then be sent over the web socket using the JMeter web socket plugin - https://github.com/ptrd/jmeter-websocket-samplers
I am relatively new to JMeter and would appreciate any help on this. Please refer to my earlier question for the project scenario :
Testing a MQTT Client using JMeter
You kindly provided the link to the documentation which says:
The request-response sampler, as well as the single-read and single-write samplers, support both text and binary frames. For binary frames, enter the payload in hexadecimal format, e.g. 0xca 0xfe or ba be
So
Text data can be sent as they are (JSON is more or less plain text)
Binary (non-text) data needs to be converted to hex first
So you need to determine what exactly you need because your current requirement is a little bit vague and contraversial.
If you're looking for a function to convert string to hex - you can go for JSR223 PreProcessor and something like:
def hex(byte[] data) {
def rv = new StringBuilder();
data.each { aByte ->
rv.append('0x').append(String.format("%02x", aByte)).append(' ');
}
return rv.toString();
}
Demo:
More information on Groovy scripting in JMeter: Apache Groovy - Why and How You Should Use It

Gcp Dataflow processes invalid data

We have an API as a proxy between clients and google Pub/Sub, so it basically retrieves a JSON body and publishes it to the topic. Then, it is processed by DataFlow, which stores it in BigQuery. Also, we use transform UDF to, for instance, convert a field value to upper case; it parses JSON sent and produces a new one.
The problem is the following. The number of bytes sent to the destination table is much less than to the deadletter, and the error message is 99% percent contains the error saying that the sent JSON is invalid. And that's true, the payloadstring column contains distorted JSONs: they could be truncated, concatenated with other ones, or even both. I've added logs on the API side to see where did the message set corrupted, but neither received or sent by the API JSON bodies are invalid.
How can I debug this problem? Is it any chance of pub/sub or dataflow to corrupt messages? If so, what can I do to fix it?
UPD. By the way, we use a Google-provided template called "pubsub topic to bigquery"
UPD2. API is written in Go, and the way we send the message is simply by calling
res := p.topic.Publish(ctx, &pubsub.Message{Data: msg})
The res variable is then used for error logging. p here is a custom struct.
The message we sent is a JSON with 15 fields, and just to be concise I'll mock it and UDF.
Message:
{"MessageName":"Name","MessageTimestamp":123123123",...}
UDF:
function transform(inJson) {
var obj;
try {
obj = JSON.parse(inJson);
} catch (error){
throw 'parse JSON error: '+error;
}
if (Object.keys(obj).length !== 15){
throw "Message is invalid";
}
if (!(obj.hasOwnProperty('EventSource') && typeof obj.EventSource === 'string' && obj.MessageName.length>0)) {
throw "MessageName is absent or invalid";
}
/*
other fields check
*/
obj.MessageName = obj.MessageName.toUpperCase()
/*
other fields transform
*/
return JSON.stringify(obj);
}
UPD3:
Besides being corrupted, I've noticed that every single message is duplicated at least once, and the duplicates are often truncated.
The problem occurred several days ago when it was a massive increase in the number of messages, but now it got back to normal, and the error is still there. The problem was seeing before, but it was a much more rare case.
The behavior you describe suggests that the data is corrupt before it gets to Pubsub or Dataflow.
I have performed a test, sending JSON messages containing 15 fields. Your UDF function as well as the Dataflow template work fine since I was able to insert the data to BigQuery.
Based on that, it seems your messages are already corrupted before getting to Pub/Sub, I suggest you to check your messages once they arrived to Pub/Sub and see if they have the correct format.
Please notice that it's required for the messages schema match with the BigQuery table schema.

change trace log format in emqtt message broker

I am using emqtt message broker for mqtt.
I am not a erlang developer and has zero knowledge on that.
I have used this erlang based broker, because after searching many open source broker online and suggestions from people about the advantage of erlang based server.
Now i am kind of stuck with the out put of the emqttd_cli trace command.
Its not json type and if i use a perl parser to convert to json type i am getting delayed output.
I want to know, in which file i could change the trace log output format.
I looked on the trace code of the broker and found a file src/emqttd_protocol.erl. An exported function named trace/3 has the code that you need.
Second argument of this function, named Packet, has the information of receive & send data via broker. You can fetch required data from it and format according to how you want to print.
Edit : Sample modified code added
trace(recv, Packet, ProtoState) ->
PacketHeader = Packet#mqtt_packet.header,
HostInfo = esockd_net:format(ProtoState#proto_state.peername),
%% PacketInfo = {ClientId, Username, ClientIP, ClientPort, Payload, QoS, Retain}
PacketInfo = {ProtoState#proto_state.client_id, ProtoState#proto_state.username, lists:nth(1, HostInfo), lists:nth(3, HostInfo), Packet#mqtt_packet.payload, PacketHeader#mqtt_packet_header.qos, PacketHeader#mqtt_packet_header.retain},
?LOG(info, "Data Received ~s", [PacketInfo], ProtoState);

DocuSign Connect update XML desserialization error

I have been using DocuSign SOAP and REST based API calls to create envelope and am also using their Connect feature to update the recipient and envelope statuses for my clients.
I am getting a strange error parsing DocuSign Connect update for one client.
The error says "There is an error in XML document (1, 16174)".
Here is my code...
Dim sr As New StreamReader(Request.InputStream)
Dim reader As XmlReader = New XmlTextReader(New StringReader(xml))
Dim serializer As New XmlSerializer(GetType(DocuSignEnvelopeInformation), "http://www.docusign.net/API/3.0")
If Not serializer Is Nothing Then
envelopeInfo = TryCast(serializer.Deserialize(reader), DocuSignEnvelopeInformation)
Dim envid As String = envelopeInfo.EnvelopeStatus.EnvelopeID.ToString
I have tried bunch of things such as removing the XML definition from the XML document but did not work. The strange thing is that the same code works for all of my other clients. This is the only client that is having issues. They have added closed 65 tags in the document to be signed but I don't think that the tags are causing issues on their end since I also tried removing them.
Please advise.
Minal
I have run into this issue before when there are unsupported characters in the tab values or in the PDF byte stream itself when it is decoded. I suspect that copying and pasting values into tabs from external programs like Word introduce some invisible weird characters like 
 - carriage returns and the like. You should validate your XML in its entirety.

Encoding a JMS TextMessage

I'm receiving messages from a JMS MQ queue which are supposedly utf-8 encoded. However on reading the out using msgText = ((TextMessage)msg).getText();
I get question marks where non standard characters were present. It seems possible to specify the encoding when using a bytemessage, but I cant find a way to specify encoding while reading out the TextMessage. Is there a way to solve this, or should I press for bytemessages?
We tried adding Dfile.encoding="UTF-8" to Websphere's jvm and we added
source = new StreamSource(new ByteArrayInputStream(
((TextMessage) msg).getText().getBytes("UTF-8")));
In our MessageListener. This worked for us, so then we took out the Dfile.encoding bit away and it still works for us.
Due to preferred minimum configuration for Websphere we decided to leave it this way, also taking into account that we may easier switch the UTF-8 string by a setting from file or database.
If the text is not decoded correctly, then probably the client is not sending the message with the utf-8 codec; this should work:
byte[] by = ((TextMessage) msg).getText().getBytes("ISO-8859-1");
String text = new String(by,"UTF-8");

Resources