Azure Data Factory: get data for "For Each" component from query

The situation is as follows: I have a table in my database that receives about 3 million rows each day. We want to archive this table on a regular basis, so that only the eight most recent weeks remain in the table. The rest of the data can be archived to Azure Data Lake.
I already found out how to do this for one day at a time. But now I want to run this pipeline each week for the first seven days in the table. I assume I should do this with the "For Each" component. It should iterate over the seven distinct dates that are present in the dataset I want to back up. This dataset is copied from the source table to an archive table beforehand.
It's not difficult to get the distinct dates with a SQL query, but how do I get the result of this query into an array that can be used by the "For Each" component?

The issue was solved thanks to a co-worker.
What we have to do is assign a parameter to the dataset of the sink. It does not matter how you name it, and you do not have to assign a value to it. But let's assume this parameter is called "Date".
After that, you can use this parameter in the filename of the sink (also in the dataset) by using "@dataset().Date".
After that, you go back to the copy activity, and in the sink you assign the dataset property "Date" to @item().DateSelect. (DateSelect is the field name from the array that is passed to the For Each activity.)
See also the answer from Bo Xioa below, which is part of the solution.
This way it works perfectly. It's just a shame that this is not well documented.
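A minimal sketch of what those two pieces might look like in the underlying JSON; the dataset name, folder path, and file name pattern here are assumptions, not taken from the question, and only the relevant fragments are shown.
Sink dataset with a "Date" parameter used in the file name:
{
    "name": "ArchiveSinkDataset",
    "properties": {
        "parameters": {
            "Date": { "type": "String" }
        },
        "typeProperties": {
            "folderPath": "archive",
            "fileName": {
                "value": "@{dataset().Date}.csv",
                "type": "Expression"
            }
        }
    }
}
Copy activity sink inside the For Each, passing the current item into that parameter:
"outputs": [
    {
        "referenceName": "ArchiveSinkDataset",
        "type": "DatasetReference",
        "parameters": {
            "Date": "@item().DateSelect"
        }
    }
]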

You can use a Lookup activity to fetch the column content, and the output will look like this:
{
    "count": "2",
    "value": [
        {
            "Id": "1",
            "TableName": "Table1"
        },
        {
            "Id": "2",
            "TableName": "Table2"
        }
    ]
}
Then you can pass the value array to the ForEach activity's items field by using the expression @activity('MyLookupActivity').output.value
Reference doc: Use the Lookup activity result in a subsequent activity
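A hedged sketch of how the For Each activity might reference that Lookup output in the pipeline JSON; the activity names are assumptions, not taken from the question:
{
    "name": "ForEachDate",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "MyLookupActivity", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('MyLookupActivity').output.value",
            "type": "Expression"
        },
        "activities": [ ]
    }
}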

I post this as an answer, because the error does not fit into a comment :D
I have seen another option to accomplish this: executing a pipeline from another pipeline. That way I can define the dates that I should iterate over as a parameter in the second pipeline (learn.microsoft.com/en-us/azure/data-factory/…). But unfortunately this leads to the same result as when just using the For Each parameter, because in the filename of my Data Lake file I have to use @{item().columnname}. I can see in the monitoring view that the right values are passed in the iteration steps, but I keep getting an error:
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The request to 'Unknown' failed and the status code is 'BadRequest', request id is ''. {\"error\":{\"code\":\"BadRequest\",\"message\":\"A potentially dangerous Request.Path value was detected from the client (:). Trace: cf3b4c3f-1681-4073-b225-17e1c07ec76d Time: 2018-08-02T05:16:13.2141897-07:00\"}} ,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=System,'",
"failureType": "UserError",
"target": "CopyDancerDatatoADL"
}
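One thing worth checking, as an assumption based on the "(:)" in that message rather than anything confirmed in the thread: if the date column is a datetime, the value interpolated into the file name will contain colons, which are not valid in the path. A minimal sketch of formatting the value first with the Data Factory formatDateTime function, assuming the column is called DateSelect:
@{formatDateTime(item().DateSelect, 'yyyy-MM-dd')}.csv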

Related

Add term to listItem in Microsoft Graph API

How do I add a term to a listItem in Microsoft Graph API?
For simple String types (ProductSegment in the example) I do the following:
PATCH https://graph.microsoft.com/v1.0/sites/{{sharepoint_site_id}}/lists/{{sharepoint_list_id}}/items/{{num}}/fields
{
    "DisplayedName": "asdasfsvsvdvsdbvdfb",
    "DocumentType": "FLYER",
    "ProductSegment": ["SEG1"],
    "TEST_x0020_2_x0020_ProductSegment": [{
        "TermGuid": "c252c37d-1fa3-4860-8d3e-ff2cdde1f673"
    }],
    "Active": true,
    "ProductSegment#odata.type": "Collection(Edm.String)",
    "TEST_x0020_2_x0020_ProductSegment#odata.type": "Collection(Edm.Term)"
}
Obviously it won't work for TEST_x0020_2_x0020_ProductSegment. But I just cannot find any hints in the documentation.
I got one step closer thanks to the linked duplicate issue. First I found the name (not the id) of the hidden field TEST 2 ProductSegment_0 (notice the _0 suffix). Then I assembled the term value to send: -1;#MyLabel|c352c37d-1fa3-4860-8d3e-ff2cdde1f673.
PATCH https://graph.microsoft.com/v1.0/sites/{{sharepoint_site_id}}/lists/{{sharepoint_list_id}}/items/{{num}}/fields
{
    "DisplayedName": "asdasfsvsvdvsdbvdfb",
    "DocumentType": "FLYER",
    "ProductSegment": ["SEG1"],
    "i9da5ea20ec548bfb2097f0aefe49df8": "-1;#MyLabel|c352c37d-1fa3-4860-8d3e-ff2cdde1f673",
    "Active": true,
    "ProductSegment#odata.type": "Collection(Edm.String)"
}
That way I can add one item. However, I need to add multiple, so I wanted to put the values into an array and set the field type (i9da5ea20ec548bfb2097f0aefe49df8#odata.type) to Collection(Edm.String).
Now I get an error with the code generalException as opposed to an invalidRequest.
As far as I know, the Graph API does not support updating SharePoint taxonomy. For now, you can fall back to the classic SharePoint REST API, for example, to accomplish "advanced" things like updating taxonomy terms. Probably a duplicate of: Can't Update Sharepoint Managed Meta Data Field from Microsoft Graph Explorer
Finally I got it.
Thanks @Nikolay for the linked issue.
As I also added to the end of the question, first you need the name (not the id!) of the hidden field TEST 2 ProductSegment_0 (notice the _0 suffix). Then assemble the term values to send, -1;#MyLabel|c352c37d-1fa3-4860-8d3e-ff2cdde1f673 and -1;#SecondLabel|1ef2af46-1fa3-4860-8d3e-ff2cdde1f673, and separate them with ;# (the content of the label is actually irrelevant, but some string needs to be there).
It looks utterly ridiculous, but it works.
PATCH https://graph.microsoft.com/v1.0/sites/{{sharepoint_site_id}}/lists/{{sharepoint_list_id}}/items/{{num}}/fields
{
    "DisplayedName": "asdasfsvsvdvsdbvdfb",
    "DocumentType": "FLYER",
    "ProductSegment": ["SEG1"],
    "i9da5ea20ec548bfb2097f0aefe49df8": "-1;#MyLabel|c352c37d-1fa3-4860-8d3e-ff2cdde1f673;#-1;#SecondLabel|1ef2af46-1fa3-4860-8d3e-ff2cdde1f673",
    "Active": true,
    "ProductSegment#odata.type": "Collection(Edm.String)"
}

Firebase Firestore Swift, Timestamp but server time?

With Firestore, I add a timestamp field like this
var ref: DocumentReference? = nil
ref = Firestore.firestore()
    .collection("something")
    .addDocument(data: [
        "name": name,
        "words": words,
        "created": Timestamp(date: Date())
    ]) { ...
        let theNewId = ref!.documentID
        ...
    }
That's fine and works great, but it's not really correct. I should be using the "server timestamp" which Firestore supplies.
Please note this is on iOS (Swift) with Firestore, not the Firebase Realtime Database.
What is the syntax to get a server timestamp?
The syntax you're looking for is:
"created": FieldValue.serverTimestamp()
This creates a token which itself has no date value. The value is assigned by the server when the write is actually written to the database, which could be much later if there are network issues, so keep that in mind.
Also keep in mind that, because they are tokens, they can present different values when you read them, and we can configure how they should be interpreted:
doc.get("created", serverTimestampBehavior: .none)
doc.get("created", serverTimestampBehavior: .previous)
doc.get("created", serverTimestampBehavior: .estimate)
none will give you a nil value if the value hasn't yet been set by the server. For example, if you're writing a document that relies on latency-compensated returns, you'll get nil on that latency-compensated return until the server eventually executes the write.
previous will give you any previous values, if they exist.
estimate will give you a value, but it will be an estimate of what the value is likely to be. For example, if you're writing a document that relies on latency-compensated returns, estimate will give you a date value on that latency-compensated return even though the server has yet to execute the write and set its value.
It is for these reasons that dealing with Firestore's timestamps may require handling more returns by your snapshot listeners (to update tokens). A Swift alternative to these tokens is the Unix timestamp:
extension Date {
    var unixTimestamp: Int {
        return Int(self.timeIntervalSince1970 * 1_000) // millisecond precision
    }
}
"created": Date().unixTimestamp
This is definitely the best explanation of how the timestamps work (written by the same Doug Stevenson who actually posted an answer): https://medium.com/firebase-developers/the-secrets-of-firestore-fieldvalue-servertimestamp-revealed-29dd7a38a82b
If you want a server timestamp for a field's value, use FieldValue.serverTimestamp(). This will return a token value that gets interpreted on the server after the write completes.

Google AutoML Ruby Gem Tables: Invalid String to assign to submessage field ''

I'm trying to use the AutoML prediction service from the Ruby gem google-cloud-automl and I keep getting errors. I already have a deployed model working with the online predictions.
Here is my current code:
payload = {
    row: {
        column_spec_ids: %w(COLUMN_NUMBER_1 COLUMN_NUMBER_2 COLUMN_NUMBER_3 COLUMN_NUMBER_4),
        values: [
            DATA_1,
            DATA_2,
            DATA_3,
            DATA_4
        ]
    }
}
client = Google::Cloud::AutoML::Prediction.new(version: :v1beta1)
response = client.predict(formatted_model_path, payload)
and I receive this error:
Google::Protobuf::TypeError: Invalid type String to assign to submessage field ''.
from path/to/my/vendor/bundle/ruby/2.5.0/gems/google-gax-1.8.1/lib/google/gax/util.rb:65:in `initialize'
(In my code, COLUMN_NUMBER_1 to _4 are actually real IDs and DATA_1 to _4 are strings.)
formatted_model_path is the path of my model. I was able to access an NLP model earlier with this code (I only updated the payload format).
I've already tried to generate a Google::Cloud::AutoML::V1beta1::Row. I'm able to fill the column_spec_ids, but every time I try this code
request = Google::Cloud::AutoML::V1beta1::Row.new
request.values = payload[:row][:values]
I get this error
Google::Protobuf::TypeError: Expected repeated field array
from (pry):4:in `method_missing'
I actually found the solution...
You need to provide the kind of data you are passing.
Instead of
values: [
    DATA_1,
    DATA_2,
    DATA_3,
    DATA_4
]
I should have
values: [
    { string_value: DATA_1 },
    { string_value: DATA_2 },
    { string_value: DATA_3 },
    { string_value: DATA_4 }
]
(You can have string_value, number_value, and some other kinds; I think the full list is here.)
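Putting it together, a sketch of the full payload as it now works for me (the column IDs and values are placeholders, as in the question):
payload = {
    row: {
        column_spec_ids: %w(COLUMN_NUMBER_1 COLUMN_NUMBER_2 COLUMN_NUMBER_3 COLUMN_NUMBER_4),
        values: [
            { string_value: DATA_1 },
            { string_value: DATA_2 },
            { string_value: DATA_3 },
            { string_value: DATA_4 }
        ]
    }
}
response = client.predict(formatted_model_path, payload)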

Livebinding JSON objects and arrays

Good evening all.
I'm currently trying to get to grips with LiveBindings in Delphi, as I'd like to refresh one of my current projects (a complete rework from the base, for the purpose of pushing to other platforms, optimizing performance, and minimizing the code). I'm working with a web API which returns JSON data. The returned JSON format for one example call looks like this:
{
    "response": {
        "ips": [
            {
                "ip": "111.222.333.444",
                "classification": "regular",
                "hits": 134,
                "latitude": 0.0000,
                "longitude": 0.0000,
                "zone_name": "example.com"
            },
            {
                "ip": "555.666.777.888",
                "classification": "regular",
                "hits": 134,
                "latitude": 50.0000,
                "longitude": 50.0000,
                "zone_name": "example-2.com"
            }
        ]
    },
    "result": "success",
    "msg": null
}
As you can see, it's a JSON object with an array and some data fields of various types (string, float, integer, etc).
Within my application, I've got the TRESTClient, TRESTRequest, TRESTResponse, TRESTResponseDataSetAdapter, TClientDataSet, and TBindSourceDB components. I've also got a TButton, a TMemo, and a TListView. I've managed to hook all the components up via livebindings and the entire data returned from the call is displayed in the memo when I click the button (which executes the request).
Where I'm struggling is with linking the data to the ListView. I've created the FieldDefs for the TClientDataSet as such (this is the literal tree view in relation to ChildDefs):
|--result (Type: ftString)
|--response (Type: ftObject)
|--|--ips (Type: ftArray, Size: 6)
|--|--|-- ip (Type: ftString)
|--|--|-- classification (Type: ftString)
|--|--|-- hits (Type: ftInteger)
|--|--|-- latitude (Type: ftFloat)
|--|--|-- longitude (Type: ftFloat)
|--|--|-- zone_name (Type: ftString)
I've then live-bound BindSourceDB1's response.ips[0] to the TListView's Item.Text field. Unfortunately, when I run the application and execute the request, I get an error:
ClientDataSet1: Field 'response.ips[0]' not found
In this instance, I'm trying to retrieve the response.ips[index].ip field of each item in the array and output it as an individual item in the TListView. Unfortunately, even live-binding the response.ips field without an index still presents a similar error. However, if I link it to the result field, then it returns the 'success' message inside the ListView as expected.
I did take a look at Jim McKeeth's REST client example and that got me to the current point, but working out how to adapt it for my own data is proving a little challenging. I've noticed that the TRESTResponseDataSetAdapter also has its own FieldDefs property, so I'm not sure whether I should define my fields there or not.
I imagine I've just got the data types set up incorrectly or missed something minor, but I'd appreciate any assistance.
I figured it out:
Set up your REST components.
For the TRESTResponseDataSetAdapter, set its RootElement property to response.ips.
Then add the fields ip, classification, hits, latitude, longitude, and zone_name as its FieldDefs.
Right-click the TRESTResponseDataSetAdapter and select 'Update DataSet'.
Live-bind one of the fields from the TRESTResponseDataSetAdapter to the Item.Text property of the TListView (a sketch of the same setup in code follows below).
The application then works correctly and reflects the data properly.
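For reference, a minimal sketch of doing the adapter part of that setup in code rather than in the Object Inspector; the component names (RESTRequest1, RESTResponseDataSetAdapter1) are assumptions, not taken from the question:
// Point the adapter at the array inside the response and predefine the fields.
RESTResponseDataSetAdapter1.RootElement := 'response.ips';
RESTResponseDataSetAdapter1.FieldDefs.Clear;
RESTResponseDataSetAdapter1.FieldDefs.Add('ip', ftString, 50);
RESTResponseDataSetAdapter1.FieldDefs.Add('classification', ftString, 50);
RESTResponseDataSetAdapter1.FieldDefs.Add('hits', ftInteger);
RESTResponseDataSetAdapter1.FieldDefs.Add('latitude', ftFloat);
RESTResponseDataSetAdapter1.FieldDefs.Add('longitude', ftFloat);
RESTResponseDataSetAdapter1.FieldDefs.Add('zone_name', ftString, 100);
// Execute the request; the adapter then fills the dataset the TListView is bound to.
RESTRequest1.Execute;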

Storing graph-like structure in Couch DB or do include_docs yourself

I am trying to store a network layout in CouchDB, but my solution produces a rather randomized graph.
I store nodes as documents:
{_id,
nodeName,
group}
and store links in the traditional way:
{_id, source_id, target_id, value}
Following multiple tutorials on handling joins and multiple relationships in CouchDB, I created this view:
function(doc) {
    if (doc.type == 'connection') {
        if (doc.source_id)
            emit("source", {'_id': doc.source_id});
        if (doc.target_id)
            emit("target", {'_id': doc.target_id});
    }
}
which should emit a sequence of source and target ids. I then pass it to a list function with include_docs=true, which assumes that source and target come in pairs and stitches everything back into a structure like this:
{
    "nodes": [
        {"nodeName": "Name 1", "group": "1"},
        {"nodeName": "Name 2", "group": "1"}
    ],
    "links": [
        {"source": 7, "target": 0, "value": 1},
        {"source": 7, "target": 5, "value": 1}
    ]
}
Although my list produces proper JSON, the view map returns a number of rows of source docs followed by the target docs.
So far I don't have any idea how to make this work properly. I am happy to fetch additional values from the document _id in the list, but so far I haven't found any good examples.
Alternative ways of achieving the same goal are welcome. The _id values are standard CouchDB ones so far.
Update: while writing this question I came up with a different view which sorted my immediate problem, but I'd still like to see other options.
Updated map:
function(doc) {
    if (doc.type == 'connection') {
        if (doc.source_id)
            emit([doc._id, 0, "source"], {'_id': doc.source_id});
        if (doc.target_id)
            emit([doc._id, 1, "target"], {'_id': doc.target_id});
    }
}
Your updated map function makes more sense. However, you don't need 0 and 1 in your key, since you already have "source" and "target".
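A minimal sketch of the map with that simplification applied (same doc.type and field names as in the question); "source" still sorts before "target" within each _id, so the pairing is preserved:
function(doc) {
    if (doc.type == 'connection') {
        if (doc.source_id)
            emit([doc._id, "source"], {'_id': doc.source_id});
        if (doc.target_id)
            emit([doc._id, "target"], {'_id': doc.target_id});
    }
}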
