Get Metadata Multiple Source File System Into Azure SQL Table - stored-procedures

I have multiple folders and files coming from a FileSystem (linked service) in Azure Data Factory, and my pipeline is based on this article: https://www.sqlservercentral.com/articles/working-with-get-metadata-activity-in-azure-data-factory
For now I'm processing the FileName and LastModified metadata per file, and then I'm calling this stored procedure from ADF:
ALTER PROCEDURE [dbo].[SP_FileSystemMonitoring]
(
    -- Add the parameters for the stored procedure here
    @FLAG int,
    @FILE_NAME nvarchar(100),
    @LAST_MODIFIED datetime
)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON

    -- Insert statements for procedure here
    IF ( @FILE_NAME IS NOT NULL )
    BEGIN
        UPDATE [dwh].[FileSystemMonitoring]
        SET    STATUS        = @FLAG,
               PROCESS_DATE  = DATEADD(HH, 7, GETDATE()),
               REPORT_DATE   = DATEADD(HH, 7, (DATEADD(DD, -1, GETDATE()))),
               LAST_MODIFIED = @LAST_MODIFIED
        WHERE  FILE_NAME = @FILE_NAME
    END
END
But I want one activity to get the metadata of one folder and then insert the files in that folder into an Azure SQL Database, for example:
folderA/file1.txt
folderA/file2.txt
and the Azure SQL table should then look like this:
--------------------------
File_Name | Last_Modified
--------------------------
file1.txt | 2021-12-19 13:45:56
file2.txt | 2021-12-18 10:23:32
I have no idea how to do this, because I'm confused about how to map the sink to the Azure SQL table. Thanks in advance.

I'm a little confused by your question: do you want to get the details of the file or folder from the Get Metadata activity, or do you want to enumerate/store the child items of a root folder?
If you simply want to reference the items from Get Metadata, add a dynamic expression that navigates the output value to the JSON property you seek. For example:
@activity('Get Metadata Activity Name').output.lastModified
@activity('Get Metadata Activity Name').output.itemName
You can pass each of the above expressions as values to your stored procedure parameters. NOTE: 'Get Metadata Activity Name' should be replaced with the name of your activity.
The output JSON of this activity is like so and will grow depending on what you select to return in the Get Metadata activity. In my example I'm also including childItems.
{
"exists": true,
"lastModified": "2021-03-04T14:00:01Z",
"itemName": "some-container-name",
"itemType": "Folder",
"childItems": [{
"name": "someFilePrefix_1640264640062_24_12_2021_1640264640.csv",
"type": "File"
}, {
"name": "someFilePrefix_1640286000083_24_12_2021_1640286000.csv",
"type": "File"
}
],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Australia Southeast)",
"executionDuration": 0,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
If you want to store the child files, you can pass childItems as an nvarchar JSON value to your stored procedure and then enumerate the JSON array in SQL (for example with OPENJSON).
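Here is a minimal sketch of what that stored procedure could look like, assuming a @CHILD_ITEMS nvarchar(max) parameter that is fed with @string(activity('Get Metadata Activity Name').output.childItems) from the Stored Procedure activity; the procedure name, parameter names and target table below are illustrative. Note that childItems only exposes name and type, so a per-file Last_Modified still requires the ForEach approach described next.
CREATE OR ALTER PROCEDURE [dbo].[SP_FileSystemMonitoring_Folder]
(
    @LAST_MODIFIED datetime,       -- folder-level lastModified from Get Metadata
    @CHILD_ITEMS   nvarchar(max)   -- JSON array, e.g. [{"name":"file1.txt","type":"File"}, ...]
)
AS
BEGIN
    SET NOCOUNT ON

    -- Shred the childItems JSON array and insert one row per file
    INSERT INTO [dwh].[FileSystemMonitoring] (FILE_NAME, LAST_MODIFIED)
    SELECT j.[name],
           @LAST_MODIFIED
    FROM   OPENJSON(@CHILD_ITEMS)
           WITH (
               [name] nvarchar(100) '$.name',
               [type] nvarchar(20)  '$.type'
           ) AS j
    WHERE  j.[type] = 'File'
END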
You could also use ADF and enumerate the same childItems property using a ForEach activity, one iteration per file. You simply enumerate over:
@activity('Get Metadata Activity Name').output.childItems
You can then call the SP for each file referencing the nested item as:
@item().name
You'll also still be able to reference any of the root parameters from the original get metadata activity within the ForEach activity.

Related

Jmeter ForEach Controller failing to write variables to file in order retrieved

I am executing a http request retrieving a json payload with an array of employees. For each record (employee) I need to parse the record for specific fields e.g. firstName, lastName, PersonId and write to a single csv file, incrementing a new row per record.
Unfortunately, the file created has two issues. The PersonId never gets written, and secondly the sequence of the values is not consistent with the original returned values. Sometimes I get the same record's lastName with the wrong firstName and vice versa. I'm not sure if the two issues are related; I suspect my regular expression extract for the number is wrong.
Jmeter setup. (5.2.1)
Thread group
+ HTTP Request
++ JSON JMESPath Extractor
+ ForEach Controller
++ Regular Expression Extractor: PersonId
++ Regular Expression Extractor: firstName
++ Regular Expression Extractor: lastName
++ BeanShell PostProcessor
getWorker returns the following payload, which I handle with a JSON JMESPath Extractor:
{
"items" : [
{
"PersonId" : 398378,
"firstName" : "Sam",
"lastName" : "Shed"
},
{
"PersonId" : 398379,
"firstName" : "Bob",
"lastName" : "House"
}
],
"count" : 2,
"hasMore" : true,
"limit" : 2,
"offset" : 0,
"links" : [
{
"rel" : "self",
"href" : "https://a.site.on.the.internet.com/employees",
"name" : "employees",
"kind" : "collection"
}
]
}
JSON JMESPath Extractor Configuration
Name of created variables: items
JMESPath expressions: items
Match No.: -1
Default Values: Not Found
ForEach Controller Configuration
Input variable prefix: items
Start Index: Empty
End Index: Empty
Output variable name: items
Add "_"? Checked
Each of the Regular Expression Extractors follows the same pattern as below.
Extract PersonId with Regular Expression
Apply to: Main Sample Only
Field to check: Body
Name of created variable: PersonId
Regular Expression: "PersonId":"(.+?)"
Template: $1$
Match No.: Empty
Default Value: PersonId
The final step in the thread is where I write out the parsed results.
BeanShell PostProcessor
PersonNumber = vars.get("PersonNumber");
DisplayName = vars.get("DisplayName");
f = new FileOutputStream("/Applications/apache-jmeter-5.2.1/bin/scripts/getWorker/responses/myText.csv", true);
p = new PrintStream(f);
this.interpreter.setOut(p);
print(PersonId+", "+ PersonNumber+ ", " + DisplayName);
f.close();
I am new to this and looking either for someone to tell me where I screwed up or direct me to a place I can read up on the appropriate topics. (Both are fine). Thank you.
The ForEach Controller doesn't know the structure of the items variable, since it is in JSON format; it is only capable of understanding an array of variables and traversing through them. I would suggest moving away from the ForEach Controller in your case and using the JSON extractor itself for all of the values, like below:
Three JSON extractors, one each for Person ID, First Name and Last Name (screenshots of the extractor configuration omitted), creating the variables personId_C, firstName_C and lastName_C with Match No. -1 so that every occurrence is captured.
Beanshell Sampler Code
import java.io.FileOutputStream; // stream classes actually used below
import java.io.PrintStream;
int matchNr = Integer.parseInt(vars.get("personId_C_matchNr"));
log.info("Match number is "+matchNr);
f = new FileOutputStream("myText.csv", true);
p = new PrintStream(f);
for (int i=1; i<=matchNr; i++){
PersonId = vars.get("personId_C_"+i);
FirstName = vars.get("firstName_C_"+i);
LastName = vars.get("lastName_C_"+i);
log.info("Iteration is "+i);
log.info("Person ID is "+PersonId);
log.info("First Name is "+FirstName);
log.info("Last Name is "+LastName);
p.println(PersonId+", "+FirstName+", "+LastName);
}
p.close();
f.close();
HOW THE ABOVE ACTUALLY WORKS
When you extract values using matchNr, they are captured in the sequential order in which they appear in the response. For example, in your case, Sam & Shed appear as the first occurrences and Bob & House appear as the subsequent occurrences. Hence JMeter captures them with the corresponding match number and stores them as 1st First Name = Sam, 2nd First Name = Bob and so on.
GENERIC STUFF
The regex you have used for capturing Person ID seems to be inaccurate. The appropriate one would be
"PersonId" :(.+?),
and not
"PersonId":"(.+?)"
Move to JSR223 processors instead of Beanshell as they are more performant. Source: Which one is efficient : Java Request, JSR223 or BeanShell Sampler for my script. The migration is pretty simple. Just copy the code that you have in Beanshell and paste it in JSR223.
Close any stream or writer that you open, otherwise it might cause issues when other users (threads) are trying to write to the file during the load test.
In case you are planning to use this file as a subsequent input within JMeter, please note that there is a space between the comma and the next element. For example, it is "Sam, Shed" and not "Sam,Shed". JMeter by default does not trim any spaces and will use the value just like that. Hence you might want to take a judicious call regarding that space.
Hope this helps!
Since JMeter 3.1 you shouldn't be using Beanshell, go for JSR223 Test Elements and Groovy language for scripting.
Given Groovy has built-in JSON support you shouldn't need any extractors, you can write the data into a file in a single shot like:
new groovy.json.JsonSlurper().parse(prev.getResponseData()).items.each { item ->
new File('myText.csv') << item.get('PersonId') << ',' << item.get('firstName') << ',' << item.get('lastName') << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It

Azure Data Factory get data for "For Each"component from query

The situation is as follows: I have a table in my database that receives about 3 million rows each day. We want to archive this table on a regular basis, so that only the 8 most recent weeks are in the table. The rest of the data can be archived to Azure Data Lake.
I already found out how to do this one day at a time. But now I want to run this pipeline each week for the first seven days in the table. I assume I should do this with the "For Each" component. It should iterate over the seven distinct dates that are present in the dataset I want to back up. This dataset is copied from the source table to an archive table beforehand.
It's not difficult to get the distinct dates with a SQL query, but how do I get the result of this query into an array that can be used by the "For Each" component?
The issue is solved thanks to a co-worker.
What we have to do is assign a parameter to the dataset of the sink. It does not matter what you name it and you do not have to assign a value to it. But let's assume this parameter is called "Date".
After that you can use this parameter in the filename of the sink (also in the dataset) by using "@dataset().Date".
After that you go back to the copy activity, and in the sink you assign the dataset property to @item().DateSelect. (DateSelect is the field name from the array that is passed to the For Each activity.)
See also the answer from Bo Xioa below, which is part of the solution.
This way it works perfectly. It's just a shame that this is not well documented.
You can use a Lookup activity to fetch the column content, and the output will look like:
{
"count": "2",
"value": [
{
"Id": "1",
"TableName" : "Table1"
},
{
"Id": "2",
"TableName" : "Table2"
}
]
}
Then you can pass the value array to the ForEach activity's items field by using the pattern @activity('MyLookupActivity').output.value
ref doc: Use the Lookup activity result in a subsequent activity
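For example, the Lookup query could be something like the following (the table and column names are illustrative, and SQL Server syntax is assumed). With "First row only" unchecked, the Lookup's output.value becomes the array for the ForEach items, and each iteration then exposes @item().DateSelect:
-- Query used in the Lookup activity: one row per distinct date to iterate over
SELECT DISTINCT TOP (7)
       CONVERT(date, LoadDate) AS DateSelect
FROM   dbo.SourceTable
ORDER  BY DateSelect;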
I post this as an answer, because the error does not fit into a comment :D
I have seen another option to accomplish this, which is executing a pipeline from another pipeline. That way I can define the dates that I should iterate over as a parameter in the second pipeline (learn.microsoft.com/en-us/azure/data-factory/…). But unfortunately this leads to the same result as when just using the foreach parameter, because in the filename of my data lake file I have to use: @{item().columname}. I can see in the monitoring view that the right values are passed in the iteration steps, but I keep getting an error:
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The request to 'Unknown' failed and the status code is 'BadRequest', request id is ''. {\"error\":{\"code\":\"BadRequest\",\"message\":\"A potentially dangerous Request.Path value was detected from the client (:). Trace: cf3b4c3f-1681-4073-b225-17e1c07ec76d Time: 2018-08-02T05:16:13.2141897-07:00\"}} ,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=System,'",
"failureType": "UserError",
"target": "CopyDancerDatatoADL"
}

set arrary data to Avro array type with C language

I am programming in Objective-C. I am using Apache Avro for data serialization.
My avro schema is this:
{
"name": "School",
"type":"record",
"fields":[
{
"name":"Employees",
"type":["null", {"type": "array",
"items":{
"name":"Teacher",
"type":"record",
"fields":[
{"name":"name", "type":"string"}
{"name":"age", "type":"int"}
]
}
}
],
"default":null
}
]
}
In my Objective-C code, I have an array of Teacher objects; each teacher object contains a name and an age value.
I want to write the teacher array data to a file using Avro with the schema shown above. I am mainly concerned about how to write data to the Employees array defined in the above schema.
Here is my code (I have to use C-style code to do it; I am following the Avro C documentation):
// I don't show this function; it constructs an `avro_value_t` based on the schema. No problem here.
avro_value_t school = [self constructSchoolValueForSchema];
// get "Employees" field
avro_value_t employees;
avro_value_get_by_name(school, "employees", &employees, 0);
int idx = 0;
for (Teacher *teacher in teacherArray) {
// get name and age
NSString *name = teacher.name;
int age = teacher.age;
// set value to avro data type.
// here 'unionField' is the field of 'Employees'; it is an Avro union type which is either null or an array, as defined in the schema above
avro_value_t field, unionField;
avro_value_set_branch(&employees, 1, &unionField);
// based on documentation, I should use 'avro_value_append'
avro_value_append(&employees, name, idx);
// I get confused here!!!!
// in the above line of code, I append 'name' to 'employees',
// which looks incorrect,
// because the 'Employees' array is an array of 'Teacher', not an array of 'name'
// What is the correct way to add a teacher to 'employees'?
idx ++;
}
The question I want to ask is actually in the code comment above.
I am following the Avro C documentation, but I am lost as to how I can add each teacher to employees. In my code above, I only added each teacher's name to the employees array.
I think there are two things wrong with your code, but I am not familiar with Avro, so I can't guarantee either of them. I just quickly peeked at the documentation you linked and here's how I understood avro_value_append:
It creates a new element, i.e. a Teacher, and returns that via the reference in the second parameter (so it returns by reference). My guess is that you then need to use the other avro_... methods to fill that element (i.e. set the teacher's name and so forth). In the end, do this:
avro_value_t teacher;
size_t idx;
avro_value_append(&employees, &teacher, &idx); // note that idx is also returned by reference and now contains the new elements index
I'm not sure if you set up employees correctly, btw, I didn't have the time to look into that.
The second problem will arise with your usage of name at some point. I assume Avro expects C strings, but you're using an NSString here. You'll have to use the getCString:maxLength:encoding: method on it to fill a prepared buffer to create a C string that you can pass around within Avro. You can probably also use UTF8String, but read up its documentation: You'll likely have to copy the memory (memcpy shenanigans), otherwise your Avro container will get its data swiped away from under its feet.

Livebinding JSON objects and arrays

Good evening all.
I'm currently trying to get to grips with livebindings in Delphi as I'd like to refresh one of my current projects (complete rework from the base for the purpose of pushing to other platforms, optimizing performance and minimizing the code). I'm working with a web API which returns JSON data. The returned JSON format for one example call would look like this;
{
"response": {
"ips": [
{
"ip": "111.222.333.444",
"classification": "regular",
"hits": 134,
"latitude": 0.0000,
"longitude": 0.0000,
"zone_name": "example.com"
},
{
"ip": "555.666.777.888",
"classification": "regular",
"hits": 134,
"latitude": 50.0000,
"longitude: 50.0000,
"zone_name": "example-2.com"
},
]
},
"result": "success",
"msg": null
}
As you can see, it's a JSON object with an array and some data fields of various types (string, float, integer, etc).
Within my application, I've got the TRESTClient, TRESTRequest, TRESTResponse, TRESTResponseDataSetAdapter, TClientDataSet, and TBindSourceDB components. I've also got a TButton, a TMemo, and a TListView. I've managed to hook all the components up via livebindings and the entire data returned from the call is displayed in the memo when I click the button (which executes the request).
Where I'm struggling is with linking the data to the ListView. I've created the FieldDefs for the TClientDataSet as such (this is the literal tree view in relation to ChildDefs):
|--result (Type: ftString)
|--response (Type: ftObject)
|--|--ips (Type: ftArray, Size: 6)
|--|--|-- ip (Type: ftString)
|--|--|-- classification (Type: ftString)
|--|--|-- hits (Type: ftInteger)
|--|--|-- latitude (Type: ftFloat)
|--|--|-- longitude (Type: ftFloat)
|--|--|-- zone_name (Type: ftString)
I've then live-bound BindSourceDB1's response.ips[0] to the TListView's Item.Text field. Unfortunately, when I run the application and execute the request, I get an error;
ClientDataSet1: Field 'response.ips[0]' not found
In this instance, I'm trying to retrieve the response.ips[index].ip field of each item in the array and output it as an individual item in the TListView. Unfortunately, even livebinding the response.ips field without an index still presents a similar error. However, if I link it to the result field, then it returns the 'success' message inside the listview as expected.
I did take a look at Jim McKeeth's REST client example and that got me to the current point, but working out how to adapt it for my own data is proving a little challenging. I've noticed that the TRESTResponseDataSetAdapter also has its own FieldDefs property, so I'm not sure whether I should define my fields there or not.
I imagine I've just got the data types setup incorrectly or missed something minor, but I'd appreciate any assistance.
I figured it out;
Set up your REST components
For the TRESTResponseDataSetAdapter, set its RootElement property to response.ips
Then, add the fields ip, classification, hits, latitude, longitude, and zone_name as its FieldDefs
Right-click the TRESTResponseDataSetAdapter and select 'Update DataSet'
Livebind one of the fields from the TRESTResponseDataSetAdapter to the item.text property of the TListView
The application then worked correctly and reflects the data properly.

Detecting and Using CURRENT SCHEMA in DB2 v8

I have a very big Stored Procedure in iSeries DB2 v8 which does the following:
Calls other stored procedures inside the same schema
Prepares dynamic SQL statements from strings and runs them
Calls other functions from the same schema
Uses various tables from the same schema
My problem is that this Stored Procedure and the accompanying functions may change from that schema into another (i.e. from 'superlib' to 'restorelib'), and the whole code is currently hardcoded to run against the named schema.
What I want is to be able to do one of the two: either pass the name of the schema where everything is located via a parameter to the stored procedure, or have the stored procedure detect the name of the schema and use it to run itself.
This is a sample of my current code:
SELECT COUNT(*) INTO TotalNotDone FROM superlib.PROCESSTABLES WHERE PROCESS_FLAG < 1;
WHILE TotalNotDone > 0 DO
SET SQLLOOPSTMT = 'select name_to_proces from ' CONCAT SOURCELIBRARY CONCAT '.processtables where process_flag = 0' ;
PREPARE LOOPSTMT FROM SQLLOOPSTMT ;
OPEN LOOPCUR ;
FETCH LOOPCUR INTO TABLETOPROCESS ;
CALL superlib.SP_RESTORE_INSERTS ( SOURCELIBRARY , DESTLIBRARY , TABLETOPROCESS, P_STARTTIME ) ;
CLOSE LOOPCUR;
SELECT COUNT(*) INTO TotalNotDone FROM superlib.PROCESSTABLES WHERE PROCESS_FLAG < 1;
END WHILE ;
What I want is to NOT have to write superlib inside the stored procedure when calling or referencing the tables I'm using, and to just have the stored procedure recognize that it currently IS living in the schema superlib.
I've tried SET CURRENT SCHEMA = 'SUPERLIB'; and SET SCHEMA = 'SUPERLIB'; but neither works for the table references.
I've also changed the path when creating the Stored Procedure from:
SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","PROGUSER1" ;
to
SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","SUPERLIB" ;
but that apparently does nothing.
I believe you'll need to set the current path on the connection that calls the stored proc, not just when creating it.
see this: Weblogic: Call DB2 stored procedure without schema name (property currentSchema)
current path documentation here: http://publib.boulder.ibm.com/infocenter/db2luw/v8//topic/com.ibm.db2.udb.doc/admin/r0005877.htm
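If you do end up passing the schema in as a parameter instead, another option is to build every statement dynamically from that parameter, the same way you already build SQLLOOPSTMT. Below is a rough, untested sketch: P_SCHEMA and the statement/cursor names are made up, the DECLARE for the cursor belongs in the declaration section at the top of the procedure body, and as far as I know the CURRENT SCHEMA special register only affects dynamic SQL, which would explain why your SET CURRENT SCHEMA attempt did not change the static table references.
-- Count the remaining tables using the schema passed in as a parameter
SET SQLCOUNTSTMT = 'SELECT COUNT(*) FROM ' CONCAT P_SCHEMA CONCAT
    '.PROCESSTABLES WHERE PROCESS_FLAG < 1' ;
PREPARE COUNTSTMT FROM SQLCOUNTSTMT ;
OPEN COUNTCUR ;   -- declared earlier as: DECLARE COUNTCUR CURSOR FOR COUNTSTMT ;
FETCH COUNTCUR INTO TotalNotDone ;
CLOSE COUNTCUR ;

-- Call the companion procedure in the same schema dynamically
SET SQLCALLSTMT = 'CALL ' CONCAT P_SCHEMA CONCAT '.SP_RESTORE_INSERTS ( ?, ?, ?, ? )' ;
PREPARE CALLSTMT FROM SQLCALLSTMT ;
EXECUTE CALLSTMT USING P_SCHEMA , DESTLIBRARY , TABLETOPROCESS , P_STARTTIME ;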
