Siddhi stream with schemaless data

I have a data source from ActiveMQ. The problem is that the data does not have a fixed structure, so when I define the stream it throws an incompatible datatype error. Is there any way to condition the source stream on the incoming data?
Thanks in advance.
/*
 * Origin of data.
 */
@source(type='jms',
        @map(type='csv', delimiter=',', fail.on.unknown.attribute='false'),
        factory.initial='org.apache.activemq.jndi.ActiveMQInitialContextFactory',
        provider.url='tcp://127.0.0.1:61616',
        destination='simulatedData',
        connection.factory.type='queue',
        connection.factory.jndi.name='QueueConnectionFactory',
        transport.jms.SubscriptionDurable='true',
        transport.jms.DurableSubscriberClientID='wso2SPclient1')
define stream FileSourceProductionStream(type string, time long, studentId string, fileId string, totalAccesses float); /* totalAccesses : float -> Incompatible DataType */
define stream TaskSourceProductionStream(type string, time long, studentId string, taskId string, deadline long); /* deadline : long -> Incompatible DataType */

In Siddhi, a source stream has to be defined with the schema of the input data, so truly schemaless data cannot be received into a defined stream.
One possible solution for your scenario is to define a single stream with all the possible input attributes and pre-format the input into JSON [1], XML [2], or TEXT [3], formats supported by the siddhi-map extensions, with attribute names and values. For missing attributes, there simply won't be a key or a tag in the JSON/XML/text payload.
Then use the @map(..., fail.on.missing.attribute='false') configuration in the source config.
Then, for every attribute missing from the input payload, NULL will be assigned to the corresponding attribute of the input stream (see the sketch after the links below).
[1] https://wso2-extensions.github.io/siddhi-map-json/api/4.0.20/
[2] https://wso2-extensions.github.io/siddhi-map-xml/api/4.0.12/
[3] https://wso2-extensions.github.io/siddhi-map-text/api/1.0.16/
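As a minimal sketch of that approach, assuming the JSON mapper and a single unified stream covering both of your attribute sets (the stream name is illustrative, and the fail.on.missing.attribute parameter name should be double-checked against [1]):

@source(type='jms',
        factory.initial='org.apache.activemq.jndi.ActiveMQInitialContextFactory',
        provider.url='tcp://127.0.0.1:61616',
        destination='simulatedData',
        connection.factory.type='queue',
        connection.factory.jndi.name='QueueConnectionFactory',
        @map(type='json', fail.on.missing.attribute='false'))
define stream UnifiedSourceProductionStream(type string, time long, studentId string, fileId string, totalAccesses float, taskId string, deadline long);

Events that carry only the file-related keys then arrive with taskId and deadline set to NULL (and vice versa), so you can separate them downstream, for example by filtering on the type attribute or on null checks.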

Related

Failure in serializing optional date type field to Avro regardless of null value or non-null value

We are using Avro 1.8.2 to serialize data with an optional date type field, to be published to a topic.
record aRecord {
    /** Variable: lastUpdate
     * lastUpdate indicates the latest date and time the reference asset was updated
     */
    union {null, date} lastUpdate = null;
    /** Variable: businessDate
     * businessDate indicates the business date of the reference asset price
     */
    union {null, date} businessDate = null;
}
We ran into the following exception while using the Avro-tool-generated Java class to serialize the data:
Error serializing avro message
Caused by: org.apache.avro.AvroRuntimeException: Unknown datum type org.joda.time.LocalDate: 2021-09-17
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:772)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:302)
at org.apache.avro.generic.GenericData.resolveUnion
Please note that this happens regardless of whether the value is null or non-null (as shown, the value 2021-09-17 also caused the exception).
We did the following investigation and experiments but could not figure out why:
Making the date field mandatory resolves the issue.
This is because DATE_CONVERSION is added to the corresponding field in the Java class generated by the Avro tool.
If the field is defined as optional with a default value of null, DATE_CONVERSION is not added to the Java file generated by the Avro tool.
Using Avro 1.9.1 resolved the issue; unfortunately, we must use Avro 1.8.2.
We also tried a few other versions of kafka-avro-serializer and the spring-boot Kafka framework. Nothing works for us.
Other projects that depend on Avro 1.8.2 seem to be able to handle this. We checked all the places we considered relevant,
and all the code is the same, except that somehow they have DATE_CONVERSION in place in the Java file
generated by the Avro tool (although the fields are defined in the .avdl file in exactly the same way).
Debugging into GenericData.java, we found that if DATE_CONVERSION is in place for the optional date field, getSchemaName is not called at all.
getSchemaName basically checks the type of the object: whether it's an Int, Record, String, etc.
The date is a Joda logical type; its underlying type is int, as far as we understand.
So our questions are:
How do we make the Avro tool enable DATE_CONVERSION for an optional date type field using Avro 1.8.2?
If DATE_CONVERSION is not the key to resolving the issue, what is the best practice for serializing a date type field using Avro 1.8.2,
given that this field could be null (the default) or non-null?
Thanks.
// Register the Joda date conversion explicitly on a SpecificData instance.
SpecificData specificData = SpecificData.get();
specificData.addLogicalTypeConversion(new DateConversion());
// Build the writer with that SpecificData so unions containing the date logical type are converted correctly.
DatumWriter<MessageClass> dw = new SpecificDatumWriter<MessageClass>(message.getSchema(), specificData);
DataFileWriter<MessageClass> dfw = new DataFileWriter<MessageClass>(dw);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
dfw.create(message.getSchema(), outputStream);
dfw.append(message);
dfw.close();
// Publish the serialized bytes to Kafka (Callback() here is the caller's own implementation).
ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, key, outputStream.toByteArray());
return kafkaProducer.send(record, new Callback());
The above code fixed the issue. MessageClass is the Java class generated by the Avro tool.
The writer for message is constructed with a specificData instance on which new DateConversion() has been registered;
that DATE_CONVERSION is exactly what is needed for the optional date field during serialization.
Note that this solution is only needed as a workaround for Avro 1.8.

How to have Default value in Byte format (string type) in REST definition Swagger file

Help needed with a REST Swagger file.
How can I have a default value for a byte format parameter, or is it even possible? I tried this, but it doesn't work:
default: "MA=="
It gives me this error at compile time:
incompatible types: java.lang.String cannot be converted to byte[]
Example Swagger below:
someDVO:
  type: object
  description: Contains DocumentID Information.
  properties:
    specificDocID:
      type: string
      format: byte
      description: The SiteSpecific ID representing the Document.
Any pointers? Thanks.

Azure WebJobs QueueTrigger attempts (and fails) to convert message's byte[] body to string

I have a storage queue to which I post messages constructed using the CloudQueueMessage(byte[]) constructor. I then tried to process the messages in a webjob function with the following signature:
public static void ConsolidateDomainAuditItem([QueueTrigger("foo")] CloudQueueMessage msg)
I get a consistent failure with this exception:
Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Program.ConsolidateDomainAuditItem ---> System.InvalidOperationException: Exception binding parameter 'msg' ---> System.Text.DecoderFallbackException: Unable to translate bytes [FF] at index -1 from specified code page to Unicode.
at System.Text.DecoderExceptionFallbackBuffer.Throw(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderExceptionFallbackBuffer.Fallback(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderFallbackBuffer.InternalFallback(Byte[] bytes, Byte* pBytes)
at System.Text.UTF8Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
at System.String.CreateStringFromEncoding(Byte* bytes, Int32 byteLength, Encoding encoding)
at System.Text.UTF8Encoding.GetString(Byte[] bytes, Int32 index, Int32 count)
at Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage.get_AsString()
at Microsoft.Azure.WebJobs.Host.Storage.Queue.StorageQueueMessage.get_AsString()
at Microsoft.Azure.WebJobs.Host.Queues.Triggers.UserTypeArgumentBindingProvider.UserTypeArgumentBinding.BindAsync(IStorageQueueMessage value, ValueBindingContext context)
at Microsoft.Azure.WebJobs.Host.Queues.Triggers.QueueTriggerBinding.<BindAsync>d__0.MoveNext()
Looking at the code of UserTypeArgumentBindingProvider.BindAsync, it clearly expects to be passed a message whose body is a JSON object. The UserType... prefix of the name also implies that it expects to bind a POCO.
Yet the MSDN article How to use Azure queue storage with the WebJobs SDK clearly states that
You can use QueueTrigger with the following types:
string
A POCO type serialized as JSON
byte[]
CloudQueueMessage
So why is it not binding to my message?
The WebJobs SDK parameter binding relies heavily on magic parameter names. Although [QueueTrigger(...)] string seems to permit any parameter name (and the MSDN article includes as examples logMessage, inputText, queueMessage, blobName), [QueueTrigger(...)] CloudQueueMessage requires that the parameter be named message. Changing the name of the parameter from msg to message fixes the binding.
Unfortunately, I'm not aware of any documentation which states this explicitly.
Try this instead:
public static void ConsolidateDomainAuditItem([QueueTrigger("foo")] byte[] message)
CloudQueueMessage is a wrapper, usually the bindings get rid of the wrapper and allow you to deal with the content instead.
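For illustration, a minimal sketch of the two signatures that do bind under those rules (the method bodies and the second method name are placeholders):

// Binds the full CloudQueueMessage; the parameter must be named "message".
public static void ConsolidateDomainAuditItem([QueueTrigger("foo")] CloudQueueMessage message)
{
    byte[] body = message.AsBytes; // read the raw body without the UTF-8 AsString conversion
    // process body...
}

// Or bind directly to the raw bytes; any parameter name is accepted here.
public static void ConsolidateDomainAuditItemRaw([QueueTrigger("foo")] byte[] payload)
{
    // process payload...
}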

System.IO.Stream in favor of HttpPostedFileBase

I have a site where I allow members to upload photos. In the MVC Controller I take the FormCollection as the parameter to the Action. I then read the first file as type HttpPostedFileBase. I use this to generate thumbnails. This all works fine.
In addition to allowing members to upload their own photos, I would like to use the System.Net.WebClient to import photos myself.
I am trying to generalize the method that processes the uploaded photo (file) so that it can take a general Stream object instead of the specific HttpPostedFileBase.
I am trying to base everything off of Stream since the HttpPostedFileBase has an InputStream property that contains the stream of the file and the WebClient has an OpenRead method that returns Stream.
However, by going with Stream over HttpPostedFileBase, it looks like I am losing the ContentType and ContentLength properties, which I use for validating the file.
Not having worked with binary streams before, is there a way to get the ContentType and ContentLength from a Stream? Or is there a way to create an HttpPostedFileBase object using the Stream?
You're right to look at it from a raw stream perspective because then you can create one method that handles streams and therefore many scenarios from which they come.
In the file upload scenario, the stream you're acquiring is on a separate property from the content-type. Sometimes magic numbers (also a great source here) can be used to detect the data type by the stream header bytes but this might be overkill since the data is already available to you through other means (i.e. the Content-Type header, or the .ext file extension, etc).
You can measure the byte length of the stream just by virtue of reading it so you don't really need the Content-Length header: the browser just finds it useful to know what size of file to expect in advance.
If your WebClient is accessing a resource URI on the Internet, it will know the file extension like http://www.example.com/image.gif and that can be a good file type identifier.
Since the file info is already available to you, why not open up one more argument on your custom processing method to accept a content type string identifier like:
public static class Custom {
    // Works with a stream from any source and a content type string identifier.
    static public void SavePicture(Stream inStream, string contentIdentifier) {
        // Parse and recognize contentIdentifier to know the kind of file.
        // Read the bytes of the file in the stream (while counting them).
        // Write the bytes to wherever the destination is (e.g. disk).
        // Example:
        long totalBytesSeen = 0L;
        byte[] bytes = new byte[1024]; // 1K buffer to store bytes.
        // Read one chunk of bytes at a time.
        do
        {
            int num = inStream.Read(bytes, 0, 1024); // read up to 1024 bytes
            // No bytes read means end of file.
            if (num == 0)
                break; // good bye
            totalBytesSeen += num; // Actual length is accumulating.
            /* Can check for a "magic number" here, while reading the stream,
             * in case the file extension or content type cannot be trusted.
             */
            /* Write logic here to write the byte buffer to
             * disk or do whatever you want with the bytes.
             */
        } while (true);
    }
}
Some useful filename parsing features are in the IO namespace:
using System.IO;
Use your custom method in the scenarios you mentioned like so:
From an HttpPostedFileBase instance named myPostedFile
Custom.SavePicture(myPostedFile.InputStream, myPostedFile.ContentType);
When using a WebClient instance named webClient1:
var imageFilename = "pic.gif";
var stream = webClient1.OpenRead("http://www.example.com/images/" + imageFilename);
//...
Custom.SavePicture(stream, Path.GetExtension(imageFilename));
Or even when processing a file from disk:
Custom.SavePicture(File.OpenRead(pathToFile), Path.GetExtension(pathToFile));
Call the same custom method for any stream, with a content identifier that you can parse and recognize.

Transferring large amounts of XML to a CLR stored procedure

I'm writing a CLR stored procedure to take XML data in the form of a string, then use the data to execute certain commands etc.
The problem I'm running into is that whenever I try to send XML longer than 4000 characters, I get an error: the XmlDocument object can't load the XML because many of the closing tags are missing, since the text is truncated after 4000 characters.
I think this problem boils down to the CLR stored procedure mapping the string parameter onto nvarchar(4000), when I'm thinking something like nvarchar(max) or ntext would be what I need.
Unfortunately, I can't find a mapping from a .NET type onto ntext, and the string type automatically goes to nvarchar(4000).
Does anyone know of a solution to my problem?
Thanks for any help
I think you want the System.Data.SqlTypes.SqlXml type.
For example:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Xml;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void StoredProcedure1(SqlXml data)
    {
        using (XmlReader reader = data.CreateReader())
        {
            reader.MoveToContent();
            // Do stuff here.
        }
    }
}
For CLR stored procedures, char, varchar, text, ntext, image, cursor,
user-defined table types and table cannot be specified as parameters.
You should be able to use the nvarchar(max) type instead of the ntext type.
ntext will disappear in future versions of SQL Server so you should use nvarchar(MAX) instead.
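For completeness, a minimal caller-side sketch (the connectionString and largeXmlString variables are assumed, and System.IO is needed for StringReader in addition to the usings above) that passes XML well beyond 4000 characters without truncation by using SqlDbType.Xml, which lines up with the SqlXml parameter of the procedure; SqlDbType.NVarChar with Size = -1 would similarly map to nvarchar(max):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("StoredProcedure1", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // SqlDbType.Xml avoids the nvarchar(4000) truncation seen with a plain string parameter.
    cmd.Parameters.Add("@data", SqlDbType.Xml).Value =
        new SqlXml(XmlReader.Create(new StringReader(largeXmlString)));
    conn.Open();
    cmd.ExecuteNonQuery();
}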
