Azure Data Factory Stored Procedure Multiple Inputs

In my data factory I've got a stored procedure which manipulates 2 tables for a single output. I need to pass 2 sqlWriterTableType values, but I can't see how this is possible. Has anyone had experience of doing this?
    "sink": {
        "type": "SqlSink",
        "sqlWriterStoredProcedureName": "spDashStat",
        "sqlWriterTableType": "UserType",
        "sqlWriterTableType": "StatsType",
        "writeBatchSize": 0,
        "writeBatchTimeout": "00:00:00"
    }
},
"inputs": [
    {
        "name": "InputDataset-kpx"
    },
    {
        "name": "InputDataset-kpx"
    },

I don't think you have explained it clearly. Do you want to call a stored procedure and pass parameters to it? That is explained here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure
Remember that it's not mandatory for a pipeline to have an input dataset. Use an output dataset to save the output of your stored procedure.
Please give more info for a more detailed answer :)
Cheers!


Exclude empty fields from Log4J2 JsonTemplateLayout output

The log4j2 PatternLayout offers a %notEmpty conversion pattern that allows you to skip sections of the pattern that refer to empty variables.
Is there any way to do something similar for JsonTemplateLayout, specifically for thread context data (MDC)? It correctly (IMO) suppresses null fields, but it doesn't do the same with empty ones.
E.g., given the following in my JSON template:
"application": {
  "name": { "key": "x-app", "$resolver": "mdc" },
  "context": { "key": "x-app-context", "$resolver": "mdc" },
  "instance": {
    "name": { "key": "x-appinst", "$resolver": "mdc" },
    "context": { "key": "x-appinst-context", "$resolver": "mdc" }
  }
}
is there a way to prevent blocks like this from being logged, where the only data in the subtree is the empty string values for context?
"application":{"context":"","instance":{"context":""}}
(Yes, ideally I'd prevent those empty strings being put into the context in the first place, but this isn't my app, I'm just configuring it.)
JsonTemplateLayout author speaking here. Currently, JsonTemplateLayout doesn't support blank property exclusion for the following reasons:
The definition of empty/blank is ambiguous. One might have null, {}, "\s*", [], [[]], [{}], etc. as valid JSON values. Which of these are empty/blank? Let's assume we have agreed on a certain behavior. Will it suit the rest of the users?
Checking if a value is empty/blank incurs an extra runtime cost.
Most of the time you don't care. You persist logs in a storage system, e.g., ELK stack, and there blank value elimination is provided out of the box by the storage engine in the most efficient way.
Would you mind sharing your use case, please? Why do you want to prevent the emission of "context": "" properties? If you deliver your logs to Elasticsearch, there you can easily exclude such fields via appropriate index mappings.
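If the blank values have to be removed outside the layout, one option is to prune them in whatever consumes the log lines. The following is a minimal Python sketch of that idea, not a JsonTemplateLayout feature; the function name prune and the default definition of "empty" (None or the empty string) are assumptions, and the predicate is a parameter precisely because the definition is ambiguous.

```python
import json


def prune(value, is_empty=lambda v: v is None or v == ""):
    """Recursively drop dict entries that are empty, then drop dicts left with no entries."""
    if isinstance(value, dict):
        pruned = {}
        for key, child in value.items():
            child = prune(child, is_empty)
            # A nested dict that lost all its entries is treated as empty too.
            if is_empty(child) or child == {}:
                continue
            pruned[key] = child
        return pruned
    # Lists and scalars are returned unchanged in this sketch.
    return value


# Hypothetical log event containing the subtree from the question.
event = json.loads(
    '{"message": "started", "application": {"context": "", "instance": {"context": ""}}}'
)
cleaned = prune(event)
print(json.dumps(cleaned))
```

Passing a different is_empty predicate changes what counts as blank without touching the traversal.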
Near as I can tell, no. I would suggest you create a Jira issue to get that addressed.

Azure Data Factory get data for "For Each" component from query

The situation is as follows: I have a table in my database that receives about 3 million rows each day. We want to archive this table on a regular basis, so that only the 8 most recent weeks are in the table. The rest of the data can be archived to Azure Data Lake.
I already found out how to do this one day at a time. But now I want to run this pipeline each week for the first seven days in the table. I assume I should do this with the "For Each" component. It should iterate over the seven distinct dates that are present in the dataset I want to back up. This dataset is copied from the source table to an archive table beforehand.
It's not difficult to get the distinct dates with a SQL query, but how do I get the result of this query into an array that is used by the "For Each" component?
The issue is solved, thanks to a co-worker.
What we have to do is assign a parameter to the dataset of the sink. It does not matter what you name it, and you do not have to assign a value to it. Let's assume this parameter is called "Date".
After that you can use this parameter in the file name of the sink (also in the dataset) by using "@dataset().Date".
After that you go back to the copy activity and, in the sink, assign @item().DateSelect to that dataset property. (DateSelect is the field name from the array that is passed to the For Each activity.)
See also the answer from Bo Xioa, which is part of the solution.
This way it works perfectly. It's just a shame that this is not well documented.
You can use the Lookup activity to fetch the column content, and the output will look like this:
{
  "count": "2",
  "value": [
    {
      "Id": "1",
      "TableName" : "Table1"
    },
    {
      "Id": "2",
      "TableName" : "Table2"
    }
  ]
}
Then you can pass the value array to the items field of the ForEach activity by using the pattern @activity('MyLookupActivity').output.value
ref doc: Use the Lookup activity result in a subsequent activity
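To make the data flow concrete, here is a small Python simulation of what happens to the Lookup result: the value array is handed to the loop, and each iteration sees one element as its item. This is only an illustration of the shape of the data, not Data Factory code; the DateSelect column, the sample dates, and the sink_file_name helper are assumptions.

```python
# Hypothetical Lookup output, shaped like the example above but with a DateSelect column.
lookup_output = {
    "count": 2,
    "value": [
        {"DateSelect": "2018-06-01"},
        {"DateSelect": "2018-06-02"},
    ],
}


def sink_file_name(date_param: str) -> str:
    # Stands in for a sink dataset whose file name is built from a dataset parameter.
    return f"archive_{date_param}.csv"


# The loop receives the "value" array; each element plays the role of item().
file_names = [sink_file_name(item["DateSelect"]) for item in lookup_output["value"]]
print(file_names)
```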
I post this as an answer because the error does not fit into a comment :D
I have seen another option to accomplish this: executing a pipeline from another pipeline. That way I can define the dates that I should iterate over as a parameter in the second pipeline (learn.microsoft.com/en-us/azure/data-factory/…). Unfortunately, this leads to the same result as just using the foreach parameter, because in the file name of my data lake file I have to use @{item().columname}. I can see in the monitoring view that the right values are passed in the iteration steps, but I keep getting an error:
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The request to 'Unknown' failed and the status code is 'BadRequest', request id is ''. {\"error\":{\"code\":\"BadRequest\",\"message\":\"A potentially dangerous Request.Path value was detected from the client (:). Trace: cf3b4c3f-1681-4073-b225-17e1c07ec76d Time: 2018-08-02T05:16:13.2141897-07:00\"}} ,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=System,'",
"failureType": "UserError",
"target": "CopyDancerDatatoADL"
}
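The "(:)" in the message suggests that the generated path contains a colon. That would happen if the iterated column holds a date-time value such as 2018-07-01T00:00:00, which is an assumption here because the actual values are not shown. Under that assumption, formatting the value as a date only before it goes into the file name avoids the problem. The Python sketch below shows the idea with a hypothetical safe_file_name helper; inside the pipeline the equivalent would probably be a formatDateTime expression around the item value.

```python
from datetime import datetime


def safe_file_name(raw_value: str) -> str:
    """Turn an ISO-like timestamp into a date-only file name without ':' characters."""
    parsed = datetime.fromisoformat(raw_value)
    return f"archive_{parsed:%Y-%m-%d}.csv"


# Assumed sample value; the real column content is not shown in the post.
name = safe_file_name("2018-07-01T00:00:00")
print(name)
```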

How to concatenate 2 fields into one during query time in solr

I have a document in solr which is already indexed and stored like
{
  "title":"Harry potter",
  "url":"http://harrypotter.com",
  "series":[
    "sorcer's stone",
    "Goblin of fire"
  ]
}
My requirement is that, at query time, when I retrieve the document,
it should concatenate 2 fields into one and give output like this:
{
  "title":"Harry potter",
  "url":"http://harrypotter.com",
  "series":[
    "sorcer's stone",
    "Goblin of fire"
  ],
  "title_url":"Harry potter,http://harrypotter.com"
}
I know how to do it at index time by using a URP, but I don't understand how to achieve this at query time. Could anyone please help me with this? Any sample code for reference would be a great help. Thanks for your time.
The concat function is available in Solr 7:
http://localhost:8983/solr/col/query?...&fl=title,url,concat(title,url)
If you are on an older Solr, how difficult is it to do this on the client side?
To concat you can use concat(field1, field2).
There are many other functions to manipulate data while retrieving.
You can see that here.
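For the client-side route mentioned above, a minimal Python sketch might look like this. It assumes the result document has already been parsed into a dict; the add_title_url helper and the comma separator are illustrative choices taken from the desired output in the question.

```python
def add_title_url(doc: dict, separator: str = ",") -> dict:
    """Return a copy of a result document with title and url joined into title_url."""
    merged = dict(doc)
    merged["title_url"] = separator.join([doc["title"], doc["url"]])
    return merged


# Document as returned by the query in the question.
doc = {
    "title": "Harry potter",
    "url": "http://harrypotter.com",
    "series": ["sorcer's stone", "Goblin of fire"],
}
result = add_title_url(doc)
print(result["title_url"])
```

The original dict is left untouched, so the same helper can be mapped over a whole result list.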

Livebinding JSON objects and arrays

Good evening all.
I'm currently trying to get to grips with livebindings in Delphi as I'd like to refresh one of my current projects (complete rework from the base for the purpose of pushing to other platforms, optimizing performance and minimizing the code). I'm working with a web API which returns JSON data. The returned JSON format for one example call would look like this;
{
  "response": {
    "ips": [
      {
        "ip": "111.222.333.444",
        "classification": "regular",
        "hits": 134,
        "latitude": 0.0000,
        "longitude": 0.0000,
        "zone_name": "example.com"
      },
      {
        "ip": "555.666.777.888",
        "classification": "regular",
        "hits": 134,
        "latitude": 50.0000,
        "longitude": 50.0000,
        "zone_name": "example-2.com"
      }
    ]
  },
  "result": "success",
  "msg": null
}
As you can see, it's a JSON object with an array and some data fields of various types (string, float, integer, etc).
Within my application, I've got the TRESTClient, TRESTRequest, TRESTResponse, TRESTResponseDataSetAdapter, TClientDataSet, and TBindSourceDB components. I've also got a TButton, a TMemo, and a TListView. I've managed to hook all the components up via livebindings and the entire data returned from the call is displayed in the memo when I click the button (which executes the request).
Where I'm struggling is with linking the data to the ListView. I've created the FieldDefs for the TClientDataSet as follows (this is the literal tree view in relation to ChildDefs);
|--result (Type: ftString)
|--response (Type: ftObject)
|--|--ips (Type: ftArray, Size: 6)
|--|--|-- ip (Type: ftString)
|--|--|-- classification (Type: ftString)
|--|--|-- hits (Type: ftInteger)
|--|--|-- latitude (Type: ftFloat)
|--|--|-- longitude (Type: ftFloat)
|--|--|-- zone_name (Type: ftString)
I've then livebinded/livebound BindSourceDB1's response.ips[0] to the TListView's Item.Text field. Unfortunately, when I run the application and execute the request, I get an error;
ClientDataSet1: Field 'response.ips[0]' not found
In this instance, I'm trying to retrieve the response.ips[index].ip field of each item in the array and output it as an individual item in the TListView. Unfortunately, even livebinding the response.ips field without an index still presents a similar error. However, if I link it to the result field, then it returns the 'success' message inside the listview as expected.
I did take a look at Jim McKeeth's REST client example and that got me to the current point, but working out how to adapt it for my own data is proving a little challenging. I've noticed that the TRESTResponseDataSetAdapter also has its own FieldDefs property, so I'm not sure whether I should define my fields there or not.
I imagine I've just got the data types set up incorrectly or missed something minor, but I'd appreciate any assistance.
I figured it out;
Set up your REST components
For the TRESTResponseDataSetAdapter, set its RootElement property to response.ips
Then, add the fields ip, classification, hits, latitude, longitude, and zone_name as its FieldDefs
Right-click the TRESTResponseDataSetAdapter and select 'Update DataSet'
Livebind one of the fields from the TRESTResponseDataSetAdapter to the Item.Text property of the TListView
The application then worked correctly and reflects the data properly.
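Conceptually, setting RootElement to response.ips tells the adapter to treat the nested array as the row source, so each array element becomes one record with flat fields. The Python sketch below only illustrates that idea and is not Delphi code; the rows_at helper is hypothetical and the payload is a shortened copy of the sample response.

```python
import json


def rows_at(payload: dict, root_element: str) -> list:
    """Follow a dotted path such as 'response.ips' and return the list found there."""
    node = payload
    for part in root_element.split("."):
        node = node[part]
    return node


payload = json.loads("""
{
  "response": {"ips": [
    {"ip": "111.222.333.444", "hits": 134, "zone_name": "example.com"},
    {"ip": "555.666.777.888", "hits": 134, "zone_name": "example-2.com"}
  ]},
  "result": "success",
  "msg": null
}
""")

# Each element of the array is one row; the list view shows one field per row.
rows = rows_at(payload, "response.ips")
list_items = [row["ip"] for row in rows]
print(list_items)
```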

Storing graph-like structure in CouchDB or do include_docs yourself

I am trying to store a network layout in CouchDB, but my solution produces a rather randomized graph.
I store nodes as documents:
{_id ,
nodeName,
group}
and store links in the traditional way:
{_id, source_id, target_id, value}
Following multiple tutorials on handling joins and multiple relationships in CouchDB, I created a view:
function(doc) {
  if (doc.type == 'connection') {
    if (doc.source_id)
      emit("source", {'_id': doc.source_id});
    if (doc.target_id)
      emit("target", {'_id': doc.target_id});
  }
}
which should emit a sequence of source and target ids. I then pass it to a list function with include_docs=true, which assumes that source and target come in pairs and stitches everything back into a structure like this:
{
  "nodes":[
    {"nodeName":"Name 1","group":"1"},
    {"nodeName":"Name 2","group":"1"}
  ],
  "links": [
    {"source":7,"target":0,"value":1},
    {"source":7,"target":5,"value":1}
  ]
}
Although my list function produces proper JSON, the view map returns all the rows of source docs and then all the rows of target docs.
So far I don't have any ideas on how to make this work properly. I am happy to fetch additional values by document _id in the list function, but so far I haven't found any good examples.
Alternative ways of achieving the same goal are welcome. _id values are standard for CouchDB so far.
Update: while writing the question I came up with a different view which solved my immediate problem, but I would still like to see other options.
Updated map:
function(doc) {
  if (doc.type == 'connection') {
    if (doc.source_id)
      emit([doc._id, 0, "source"], {'_id': doc.source_id});
    if (doc.target_id)
      emit([doc._id, 1, "target"], {'_id': doc.target_id});
  }
}
Your updated map function makes more sense. However, you don't need 0 and 1 in your key, since you already have "source" and "target", which sort in that order anyway.
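The effect of the compound key can be checked with a small Python simulation. It mimics the map function with the key reduced to [doc._id, "source"] and [doc._id, "target"], uses sorted() as a stand-in for the view's key ordering, and stitches adjacent rows into links. The sample documents and ids are made up, and the pairing assumes every connection has both a source and a target.

```python
# Hypothetical connection documents, deliberately out of order.
connections = [
    {"_id": "c2", "type": "connection", "source_id": "n7", "target_id": "n5", "value": 1},
    {"_id": "c1", "type": "connection", "source_id": "n7", "target_id": "n0", "value": 1},
]


def map_doc(doc):
    """Mimic the map function: yield (key, value) rows for one document."""
    if doc.get("type") == "connection":
        if doc.get("source_id"):
            yield ([doc["_id"], "source"], {"_id": doc["source_id"]})
        if doc.get("target_id"):
            yield ([doc["_id"], "target"], {"_id": doc["target_id"]})


# A view returns rows ordered by key; sorted() stands in for that here.
rows = sorted((row for doc in connections for row in map_doc(doc)), key=lambda r: r[0])

# Each connection's two rows are now adjacent, source first, so they pair up.
links = [
    {"source": rows[i][1]["_id"], "target": rows[i + 1][1]["_id"]}
    for i in range(0, len(rows), 2)
]
print(links)
```

With the original string keys "source" and "target" alone, all source rows would sort before all target rows, which is the behaviour described in the question.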
