Does Avro support schema evolution? - avro

I am trying to understand whether Avro supports schema evolution for the following case:
A Kafka producer writes messages using schema1.
The producer then writes messages using schema2, which adds a new field with a default value.
A Kafka consumer consumes both of the above messages using schema1.
I am able to read the first message from Kafka successfully, but for the second message I get an ArrayIndexOutOfBoundsException. That is, I am reading the second message (written using schema2) using schema1. Is this expected not to work? Is the consumer always expected to be updated first?
The other option is to use a schema registry, but I don't want to go that route. So I would like to know whether schema evolution is possible for the case above.

When reading Avro data, you always need two schemata: the writer schema and the reader schema (they may be the same).
I'm assuming you're writing the data to Kafka using the BinaryMessageEncoder. This adds a 10-byte header describing the writer schema.
To read the message (using the BinaryMessageDecoder), you'll need to give it the reader schema (schema1) and a SchemaStore. The latter can be connected to a schema registry, but it doesn't have to be. You can also use the SchemaStore.Cache implementation and add schema2 to it.
When reading the data, the BinaryMessageDecoder first reads the header, resolves the writer schema, and then reads the data using schema1 as the reader schema.
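
To make this concrete, here is a minimal sketch in Java (the record name, fields and values are made up for the example, so treat it as an outline rather than your exact setup): a message is written with schema2 and then decoded with schema1 as the reader schema, backed by a local SchemaStore.Cache instead of a schema registry.

import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.SchemaStore;

public class EvolutionExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical schemas: schema2 adds an "age" field with a default value.
        Schema schema1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema schema2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}");

        // Producer side: write a record using schema2 (the encoder prepends the 10-byte header).
        GenericRecord written = new GenericData.Record(schema2);
        written.put("name", "Alice");
        written.put("age", 42);
        BinaryMessageEncoder<GenericRecord> encoder =
            new BinaryMessageEncoder<>(GenericData.get(), schema2);
        ByteBuffer payload = encoder.encode(written);

        // Consumer side: the cache must contain every writer schema the decoder may encounter.
        SchemaStore.Cache schemaStore = new SchemaStore.Cache();
        schemaStore.addSchema(schema2);

        // Decode with schema1 as the reader schema; the header fingerprint resolves schema2 as the writer schema.
        BinaryMessageDecoder<GenericRecord> decoder =
            new BinaryMessageDecoder<>(GenericData.get(), schema1, schemaStore);
        GenericRecord read = decoder.decode(payload);
        System.out.println(read.get("name")); // the extra "age" field is simply dropped
    }
}

In a Kafka consumer you would pass the record value bytes (wrapped in a ByteBuffer) to decoder.decode instead of the locally encoded payload.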

Related

Azure Data Factory read url values from csv and copy to sql database

I am quite new to ADF, so I am asking for suggestions.
The use case:
I have a CSV file which contains a unique id and URLs (see the image below). I would like to use this file to export the values from the various URLs. In the second image you can see an example of the data from a URL.
In the current situation I take each URL and insert it manually as the source of an ADF Copy activity to export the data to a SQL DB. This is a very time-consuming method.
How can I create an ADF pipeline that uses the CSV file as the source, so that a Copy activity takes the URL from each row and copies the data to an Azure SQL DB? Do I need to add a Get Metadata activity, for example? If so, how?
Many thanks.
Use a Lookup activity that reads all the data. Then use a ForEach loop which iterates over it row by row. Inside the ForEach, use a Copy activity where you can copy the response to the sink.
In order to copy the XML response of a URL, we can use an HTTP linked service with an XML dataset. As @BeingReal said, a Lookup activity should be used to refer to the table which contains all the URLs, and inside a ForEach activity, add the Copy activity with HTTP as the source and a sink as per the requirement. I tried to repro the same in my environment. Below are the steps.
A lookup table with 3 URLs is taken, as in the image below.
A ForEach activity is added in sequence with the Lookup activity.
Inside the ForEach, a Copy activity is added. The source is given as the HTTP linked service.
In the HTTP linked service, the base URL is given as @item().name. Here name is the column that stores the URLs in the lookup table; replace it with the column name that you used in your lookup table.
In the sink, an Azure SQL database is given (any sink that fits your requirement can be used). The data is copied to the SQL database.
This is the HTTP dataset inside the Copy activity.
This is the input of the Copy activity inside the ForEach.
This is the output of the Copy activity.
My sink is an Azure SQL Database without any tables yet. I would like to auto-create the table on the fly from ADF. I don't understand why this error came up.

Delphi and attachment files in MS access database

I have searched many times on Google and SO, and I can't find anything about working with attachments via Delphi, so I decided to write this question.
I have a table called Files in an .accdb database with these fields:
IDFile PK AutoIncField,
FileName WideStringField,
FilesAttached WideMemoField.
How can I save/load files to/from attachment fields using delphi?
Attach files and graphics to the records in your database
The problem here is that in Delphi the data type of FilesAttached is TWideMemoField,
and when I write ShowMessage(FDTable1FilesAttached.Value); it gives just the name of the attachment.
I don't know how to insert/save files to/from that field using Delphi.
It didn't seem that hard to find VBA/C# examples of working with .accdb Attachment fields which should translate fairly easily into Delphi. However, it turned out to be more difficult than I imagined to find something that a) hadn't misunderstood what Attachment fields actually are and b) actually works. Skip to the update section below.
For example, googling
accdb create attachment in vba
gives numerous hits including this one
http://sourcedaddy.com/ms-access/working-with-attachment-fields.html
which you might try as a starting point. It uses MS DAO objects, and includes straightforward code for storing files to Attachment fields and for accessing them. You would need to create a Delphi wrapper unit for the DAO type library, if you don't already have one, using the IDE's Import Type Library.
If you would prefer something ADO-based, you might take a look at
https://www.codeproject.com/Questions/843001/Handling-fields-of-Attachment-type-in-MS-Access-us
Update: See the function OpenFirstAttachmentAsTempFile in the post by "aspen" (dated 4/11/2012, 07:18 am) in this thread
https://access-programmers.co.uk/forums/showthread.php?t=224112&page=2
which shows an apparently successful attempt to extract a file from an attachment field (the thread also contains several other attempts at coding this function).
Note in particular this line
Set rstChild = rstCurrent.Fields(strFieldName).Value ' the .Value for a complex field returns the underlying recordset
which implies that the Value of the attachment field can return a recordset which contains the attached file(s).
Presumably, importing a recent version of the DAO type library into Delphi would allow a Delphi app to do the same thing, and then one could reverse-engineer the rstChild recordset to see how to populate this field in code. I haven't done that yet, though.

cumulocity mqtt measurement

I am pretty new to Cumulocity and I am trying to get data into the platform from my own device using MQTT and the SmartREST templates. I can get data in using the static templates, but they only support certain data types. I am struggling to create the appropriate SmartREST template in the UI, and the documentation doesn't go into much detail.
I get that the template name goes in the MQTT topic (or is selected on login as part of the username) in s/ut/template_name, and that the messageId of the messages in the template gets matched to the first CSV field of the MQTT publish payload. What I don't get is the template terminology. In the UI I choose API->Measurement and Method->POST and I am presented with the required values $.type and $.time. My questions:
Is $.type the "measurement fragment type" name or do I have to make it "c8y_CustomMeasurement"? Can I call it whatever I want?
$.time has a value field. Is this the default value if one is not supplied in the publish?
I assume I need to add a numerical value in the optional API values. To link it to the value of the data point should I make the key "c8y_CustomMeasurement.custom.value"?
Am I way off base here?
Every time I publish to my own SmartREST template the server drops the connection, so I assume it's an error in my template setup, but I don't see a way of accessing debug messages (also, nothing is published back to me on s/e or s/dt).
For the sake of an example, let's say I wish to publish a unitless, timestamped pulse count with payload format "mId,ts,value" and example data "p01,'2017-07-17 12:34:00',1234".
What you wrote so far is mostly correct; just to be a bit more precise:
The topic is s/uc/template_id (not the template name, which is just a label).
The $.type refers to the 'type' fragment in the measurement JSON. It is a free-text field.
In 99% of cases you want to leave $.time empty. If you set something here, it is not a default but is fixed to that timestamp, and you cannot change it when using the template. If you leave it empty in the template and also send an empty value in the payload, the server time is used (see the second example below).
Example: p01,2017-07-17T12:34:00,1234 (no quotes around the timestamp, and ISO 8601 format)
Example without sending time: p01,,1234 (sending an empty string as the time results in the server time being set; the template is the same)
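
If it helps to see the whole flow in code, here is a minimal sketch in Java using the Eclipse Paho MQTT client; the broker URL, credentials, client id and template id are placeholders, not values from your tenant. It subscribes to s/e first so any SmartREST error response becomes visible, then publishes the example payload to the custom template topic.

import java.nio.charset.StandardCharsets;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class SmartRestPublish {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient(
            "tcp://mqtt.cumulocity.com:1883", // placeholder broker URL
            "my-device-client-id",            // placeholder client id
            new MemoryPersistence());

        MqttConnectOptions options = new MqttConnectOptions();
        options.setUserName("tenant/username"); // placeholder credentials
        options.setPassword("password".toCharArray());
        client.connect(options);

        // Listen for SmartREST error messages so template problems show up somewhere.
        client.subscribe("s/e", (topic, msg) ->
            System.out.println(topic + ": " + new String(msg.getPayload(), StandardCharsets.UTF_8)));

        // Publish the example payload to the custom template topic (placeholder template id).
        MqttMessage message = new MqttMessage(
            "p01,2017-07-17T12:34:00,1234".getBytes(StandardCharsets.UTF_8));
        message.setQos(1);
        client.publish("s/uc/myTemplateId", message);

        Thread.sleep(2000); // give the broker a moment to deliver any error response
        client.disconnect();
    }
}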
Hope these points help you to find your issue.

Apache Solr: Merging documents from two sources before indexing

I need to index data from a custom application in Solr. The custom app stores metadata in an Oracle RDBMS and documents (PDF, MS Word, etc.) in a file store. The two are linked in the sense that the metadata in the database refers to a physical document (PDF) in the file store.
I am able to index the metadata from the RDBMS without issues. Now I would like to update the indexed documents with an additional field in which I can store the parsed content from the PDFs.
I have considered and tried the following:
1. Using the Update RequestHandler to try to update the indexed document with the parsed content. This didn't work; the original document indexed from the RDBMS was overwritten.
2. Using SolrJ to do atomic updates, but I am not sure if this is a good approach for something like this.
Has anyone come across this issue before, and what would be the recommended approach?
You can update the document, but it requires that you know the id of the existing document. For example:
{
"id": "5",
"parsed_content":{"set": "long text field with parsed content"}
}
Instead of just saying "parsed_content":"something" you have to wrap the value in "parsed_content":{"set":"something"} to trigger adding it to the existing document.
See https://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22 for documentation on how to work with multivalued fields etc.
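
Since you mention SolrJ: the same atomic update can be expressed there by wrapping the value in a map with the "set" modifier. A minimal sketch (the core URL, document id and field name are placeholders):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/mycore").build(); // placeholder core URL

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "5"); // id of the existing document
        // The map with the "set" key turns this into an atomic update instead of a replacement.
        doc.addField("parsed_content",
            Collections.singletonMap("set", "long text field with parsed content"));

        client.add(doc);
        client.commit();
        client.close();
    }
}

Note that atomic updates require the other fields in the document to be stored (or have docValues), so that Solr can reconstruct the full document when applying the update.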

how to send HL7 message using mirth by reading data from my database

I'm having a problem sending (creating) an HL7 message using Mirth.
I want to read data from my patient table in SQL Server 2008 and, using that data,
I want to send a message to my destination connector, a file writer. I want my messages to get saved in the file writer's output directory.
So far I'm able to generate the message, but the size of the output file in my destination directory keeps increasing as the channel keeps polling.
Have I done something wrong in the transformer mapping?
UPDATE:
The size of the output file in my destination directory IS increasing (my .txt file starts at 1 KB and grows to 900 KB and so on). This is happening because the same data is getting generated again and again, multiple times. For example, my generated message has one set of segments (MSH, PID, PV1, ORM) for one row of data in my database, but the same MSH, PID, PV1 and ORM are getting generated multiple times.
If you are seeing the same data generated in your output directory multiple time, the most likely cause is that you are not doing anything to indicate to your database that a given record has been processed.
For example, if you have 1 record in your database: ["John", "Smith", "12134" ...] on the first poll, you will generate 1 message. If on the second poll you also have a second record ["Fred", "Jones", "98371" ...], you will generate TWO messages - one for John Smith and one for Fred Jones. And so on.
The key is to use the "Run On-Update Statement" of your Database Reader (Source) connector to update the database table you are polling with an indication that a given record has been processed. This ensures that the same record is not processed multiple times.
This requires that your source table have some kind of column to indicate the record has been processed. Mirth will not keep track of this for you - you must do it manually.
You can't have a file reader as a destination, so I assume you mean file writer. You say that "the size of my file in my destination is increasing." Is that a typo? Do you mean NOT increasing?
If it is increasing, then your messages are getting generated and you can view them to start your next round of troubleshooting...
If not, then you should look at the message log in the dashboard to see what is happening on a message-by-message basis - that would be the next place to troubleshoot.
You have to have a way of distinguishing which records to pull from the database by filtering on some sort of status flag or possibly a timestamp. Then you have to use some sort of On-Update statement to mark those same records as processed.
For example:
Select id, patient, result from results where status_flag='N'
or
Select * from results where status_flag = 'N' and created_date >= '9/25/2012'
Then, in either a transformer step or the On-Update section of your Source, you would do something like:
Update results
set status_flag = 'Y' where id = ${id}
If you do not do something like this and you have Mirth polling at a certain interval, it will just keep pulling the same records over and over.
You have to set your connector type to Database Reader in the source.
You have to set your connector type to File Writer in the destination.
Then you can write your data to a file to which you have write access.
While creating the HL7 template, you have to use the following code in the outbound message template:
MSH|^~\&|||
Thanks
Krishna
