How can I configure ksqlDB to understand binary Protobuf messages in Kafka? - ksqldb

I want to use ksqlDB to run queries on streamed data that is encoded in Protobuf format, but I don't have a clue how to achieve it. What if the binary message data is a plain C struct instead? How would I decode such C-struct messages and apply queries to the stream data?

ksqlDB supports Protobuf that has been serialised using the Schema Registry wire format. To declare your data as Protobuf, use FORMAT='PROTOBUF', e.g.
CREATE STREAM my_stream
  WITH (KAFKA_TOPIC='my_topic',
        FORMAT='PROTOBUF');
The schema itself is fetched from the Schema Registry.
For more details see https://docs.ksqldb.io/en/latest/reference/serialization/
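If you need to produce messages in that format for testing, here is a minimal Java sketch of a producer using Confluent's KafkaProtobufSerializer, which writes the Schema Registry wire format that ksqlDB expects. The MyMessage class (generated by protoc from some .proto file), the topic name, and the localhost URLs are all assumptions:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer;

public class ProtobufProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");             // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", KafkaProtobufSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");    // the serializer registers the .proto schema here

        // MyMessage is a hypothetical class generated by protoc from your .proto file.
        try (KafkaProducer<String, MyMessage> producer = new KafkaProducer<>(props)) {
            MyMessage value = MyMessage.newBuilder().setId("1").build();
            producer.send(new ProducerRecord<>("my_topic", "key-1", value));
        }
    }
}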

Related

Data parsing and sending from DICOM image in .net core

I am currently working on a complete DICOM web application based on .NET Core + PostgreSQL and the OHIF viewer (to render DICOM images).
I've built a database with tables such as Patient, Study, etc., and I store attributes such as PatientName, PatientDOB, etc. Right now the JSON I return has the same shape:
"PatientName" : "temp"
"PatientDOB" : "2332"
..
but DICOM viewers follow a standard in which they receive JSON objects as
{
  "0020000D": {
    "vr": "UI",
    "Value": [ "1.2.392.200036.9116.2.2.2.1762893313.1029997326.945873" ]
  }
}
So I want to map my JSON input/output in such a way that I return values in the above DICOM format, while on ingest I store the data as attributes (column names) rather than as tags.
I am pretty new to .NET Core and DICOMweb, so how should I proceed? Also, I am using fo-dicom to read the data from the DICOM image.
Please provide some hint/code that I can use.
You will probably store only a few DicomTags in your database (the tags you need for querying against your database), but the viewer may want to have all the tags as JSON. So I would not try to map your database JSON into DICOM JSON; instead I would use fo-dicom to generate the JSON directly from the DICOM file:
You need to add the NuGet package fo-dicom.json and then you can call
DicomDataset dataset = ... // wherever you get your DICOM file
string json = JsonConvert.SerializeObject(dataset, new JsonDicomConverter());
or, the other way round, if you want to convert such a DICOM-conformant JSON into a DicomDataset:
string json = ... // wherever you get the json from
DicomDataset dataset = JsonConvert.DeserializeObject<DicomDataset>(json, new JsonDicomConverter());
OHIF Viewer supports the standard DICOMweb WADO-RS JSON metadata format in addition to the custom format you mentioned in your question. This means you can use any DICOMweb server, such as Orthanc, DCM4CHE or DICOMcloud.
DICOMcloud may fit your scenario better as it uses fo-dicom. However, it currently only supports MS SQL Server and .NET 4.6 (there is an effort to support MySQL, but it is not 100% complete).
If you still want to write your own, you can look at how it is implemented and adapt it to your own solution.
[Disclosure] I am the author of DICOMcloud

Deserialize avro to generic record without schema

Is it possible to deserialize a byte array/buffer into a GenericRecord without having any schema available, besides what's encoded in the message?
I'm writing a component that takes an incoming encoded message, and I want to turn it into a GenericRecord without having any schema on my side.
I had assumed this was possible, since the schema is part of the encoded message, but I'm not sure anymore: I get an NPE if I don't specify a schema in GenericDatumReader.
If the schema is embedded in the header, it should be possible, since that is the same situation as reading an .avro file, where the schema is written at the start. If the Avro was serialised without the schema in the header, I don't think it can be deserialised unless you fetch the schema from a central service such as Schema Registry, or you have the schema beforehand.
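To illustrate the first case: when the data is an Avro data file (Object Container File), the writer schema sits in the file header and the reader does not have to supply one. A minimal Java sketch, with a made-up file name:

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadWithoutSchema {
    public static void main(String[] args) throws IOException {
        // No schema is passed to the reader: DataFileReader pulls the writer schema
        // out of the file header and hands it to the GenericDatumReader.
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileReader<GenericRecord> fileReader =
                 new DataFileReader<>(new File("events.avro"), datumReader)) {   // hypothetical file name
            while (fileReader.hasNext()) {
                GenericRecord record = fileReader.next();
                System.out.println(record.getSchema().getName() + " -> " + record);
            }
        }
    }
}

For bare binary messages with no such header, some external source of the writer schema (such as Schema Registry) is unavoidable, as the answer above says.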

NiFi and Avro: Convert the data and metadata into an Avro file using a specified Avro schema?

This is related to Apache Nifi.
I have a fixed schema, which I need to use.
AVSC file (schema) (Sample only):
{
  "name": "person",
  "type": "record",
  "fields": [
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "AddressUSRecord",
        "fields": [
          { "name": "streetaddress", "type": "string" },
          { "name": "city", "type": "string" }
        ]
      }
    }
  ]
}
Basically this schema has two parts, i.e. one for metadata and the other for the actual data.
I have metadata created in a csv format and actual data in another csv.
I can use the flow
GetFile --> InferAvroSchema --> ConvertCSVtoAvro
to convert them separately into Avro.
But that will not be in the format defined in the schema.
I am looking for a flow or processor where I can use the two different CSVs as input and convert them into Avro according to the schema provided.
Not sure I understand your use case well enough, but you should be able to use LookupRecord with a CSVRecordLookupService to get the data from the "actual data" CSV into a record that already has its metadata fields in it. You just need to specify, via a RecordPath expression, which field in the metadata corresponds to which field in the data CSV file.

Are there any tools available for converting JSON data to Avro on the fly?

I need a utility to convert my incoming JSON data (over which I have no control) to Avro format on the fly, i.e. without using any Hive query, MR job, etc.
I am able to generate the Avro schema perfectly.
I am aware that there are Avro tools for this purpose:
From JSON to Avro: DataFileWriteTool
From Avro to JSON: DataFileReadTool
But the challenge is that these tools only accept JSON data in Avro's type-safe JSON encoding (union values wrapped with their branch type). I am facing the exact same issue described here when I try to use them:
How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?
Any pointers/help?
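For reference, this is roughly what the conversion looks like with Avro's own JsonDecoder, which (as far as I know) is what the fromjson tooling is built on and why it insists on the type-wrapped JSON. A rough Java sketch; the schema file name and the age field are made up:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonToAvroSketch {
    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(new File("record.avsc"));   // hypothetical schema file

        // With a union field such as ["null", "int"], the JSON must wrap the value:
        // {"age": {"int": 42}} is accepted, while {"age": 42} fails with "Expected start-union".
        String json = "{\"age\": {\"int\": 42}}";

        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord record = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);

        // Re-encode the record as Avro binary.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        System.out.println("Encoded " + out.size() + " bytes");
    }
}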

How to put tweets in avro files and save them in HDFS using Spring XD?

How can I put tweets into Avro files and save them in HDFS using Spring XD? The documentation only tells me to do the following:
xd:>stream create --name mydataset --definition "time | hdfs-dataset --batchSize=20" --deploy
This works fine for the source "time", but if I want to store tweets as Avro it only puts the raw JSON strings into the Avro files, which is pretty dumb.
I could not find any detailed information about how to tell Spring XD to apply a specific Avro schema (.avsc) or how to convert the JSON string to a Tweet object.
Do I have to build a custom converter?
Can somebody please help? This is driving me insane...
Thanks.
According to the hdfs-dataset documentation, the Kite SDK is used to infer the Avro schema based on the object you pass into it. From its perspective, you passed in a String, which is why it behaves as it does. Since there is no mechanism to explicitly pick a schema for hdfs-dataset to use, you'll have to create a Java class representative of the tweet (or use the Twitter4J API), turn the tweet JSON into a Java object (a custom processor will be necessary, as in the sketch below), and output that to your sink. hdfs-dataset will then use a schema based on your class.
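As a rough illustration of that custom-processor step (not code from the answer), a Spring Integration transformer that maps the tweet JSON onto a small POJO with Jackson could look like the following; the Tweet fields are made up and a real tweet payload has many more:

import java.io.IOException;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.integration.annotation.Transformer;

// Minimal tweet representation; hdfs-dataset / Kite SDK can infer an Avro schema from this class.
class Tweet {
    public String id;
    public String text;
    public String createdAt;
}

public class TweetJsonToObjectTransformer {
    private final ObjectMapper mapper = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

    // Wired into a custom Spring XD processor module; turns the raw JSON payload
    // into a Tweet object before it reaches the hdfs-dataset sink.
    @Transformer
    public Tweet transform(String tweetJson) throws IOException {
        return mapper.readValue(tweetJson, Tweet.class);
    }
}

The processor would then sit between the twitterstream source and the hdfs-dataset sink in the stream definition, so the sink receives Tweet objects rather than raw strings.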
