I have a Data Factory pipeline with a ForEach loop where I have two activities: one to call an HTTP endpoint to retrieve a file, one to store this file into an Azure storage account.
I have set the Batch Count to 5 so I can speed up the process.
I use the item() property in the activities inside the ForEach. As far as I can tell, when iterations run in parallel the item() property does not seem reliable: several branches of the loop execute at the same time, and the value could be modified by another branch of the ForEach loop.
What I'm looking for is the ability to read the item() value in the first step of the ForEach loop, store it in a variable scoped to the current iteration, and then use the content of that variable in the later stages of the loop.
Or maybe there is a better way to handle my use case; any ideas?
I think the only solution is to run the iterations sequentially. In ADF, variables are shared between the parallel threads, and the multi-threaded execution of activities cannot be controlled manually, so we need to check the Sequential option on the ForEach activity.
I got the same issue today and was going through answers here on SO.
Took me some time but I was able to fix the issue.
Assume Table_Name is the property of the item I want to use inside the ForEach.
If I use @item().Table_Name directly in dynamic content, I run into an issue where different instances of the ForEach loop share the same value, which is wrong and leads to failures afterwards.
If I add a Set Variable activity as the first step in the ForEach loop to capture the @item().Table_Name value, then it works as expected. Later, in the other steps, I was able to use the Table_Name variable everywhere a dynamic input was needed.
As shown in the snapshot, add a Set Variable activity and it works.
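For reference, the Set Variable step inside the ForEach looks roughly like this in the pipeline JSON (the activity and variable names here are just placeholders):

{
    "name": "Set TableName",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "TableName",
        "value": {
            "value": "@item().Table_Name",
            "type": "Expression"
        }
    }
}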
I have a function that updates different @Published variables within an ObservableObject. They aren't all updated at the same time because of the processing time my algorithm needs between assignments.
Is there any clever way to delay the publishing of updates of variables to observers of my class? Something like manually blocking the publishing and then manually publishing when the function has finished?
Another way could be to do all the calculations first and assign the values to the variables at the end, but even then I assume the update probably won't be exactly synchronised?
Look at using zip. It will wait until all the attached publishers emit a value before emitting a tuple that contains all of them.
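A minimal sketch of the idea, e.g. in a playground (the property names here are invented for illustration):

import Combine

final class Results: ObservableObject {
    @Published var score: Double = 0
    @Published var label: String = ""
}

let results = Results()
var cancellables = Set<AnyCancellable>()

// zip waits for a fresh value from each publisher before emitting,
// so observers receive the pair as a single combined update
results.$score.zip(results.$label)
    .dropFirst()                       // skip the pair built from the initial values
    .sink { pair in
        print("combined update:", pair.0, pair.1)
    }
    .store(in: &cancellables)

results.score = 42        // nothing emitted yet
results.label = "done"    // now the zipped pair (42, "done") is emitted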
Consider the following 4 lines of code:
Mono<Void> result = personRepository.findByNameStartingWith("Alice")
.map(...)
.flatMap(...)
.subscriberContext()
A fictional use case which I hope you will immediately map to your real task requirement:
How does one add "Alice" to the context, so that after .map(), where "Alice" is no longer a Person.class but a Cyborg.class (assuming an irreversible transformation), I can still access the original "Alice" Person.class in .flatMap()? We want to compare the strength of "Alice" the person versus "Alice" the cyborg inside .flatMap() and then send them both to the moon on a ship to build a colony.
I've read about 3 times:
https://projectreactor.io/docs/core/release/reference/#context
I've read a dozen articles on subscriberContext.
I've looked at a colleague's code that uses subscriberContext, but only for the tracing context and MDC, which are statically initialised outside of the pipelines at the top of the code.
So the conclusion I am coming to is that something else was given the name "context", something the majority cannot use for the overwhelmingly common use case above.
Do I have to stick to tuples and wrappers? Or am I totally a dummy and there is a way? I need this context to work in the entirely opposite direction :-), unless "this" context is not the context I need.
I will wait for the Reactor developers' attention (or, failing that, go to GitHub and raise an issue about the conceptual naming error, if I am correct), but in the meantime: I believed that the Reactor Context could solve this:
What is the efficient/proper way to flow multiple objects in reactor
But what it actually resembles is some kind of mega-closure over the reactive pipeline, propagating down->up and accepting values from the outside in an imperative way, which IMO is too narrow and limited a use case to be called a "context", and which will confuse more people to come.
Context and subscriberContext in the posts you refer to are indeed one and the same...
The goal of the Context is more along the lines of attaching some information to a given subscription.
This works because, upon subscription, a chain of Subscribers is constructed to "materialize" the processing, and by nature each operator (or step) has a reference to its downstream in order to be able to push data to it.
As a result, each operator can also query its downstream for its view of what the current subscription Context is, hence the down-to-up approach.
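In practice that means the write has to sit downstream of (i.e. after) the read in the chain. A minimal sketch, using the pre-3.4 subscriberContext API referenced in the question and an invented key name:

import reactor.core.publisher.Mono;

public class ContextSketch {
    public static void main(String[] args) {
        String out = Mono.just("Alice")
            .map(String::toUpperCase)                        // the "irreversible transformation"
            .flatMap(cyborg ->
                Mono.subscriberContext()                     // read the Context inside the chain
                    .map(ctx -> cyborg + " vs " + ctx.get("originalPerson")))
            .subscriberContext(ctx -> ctx.put("originalPerson", "Alice")) // write below the read
            .block();
        System.out.println(out); // ALICE vs Alice
    }
}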
I am creating a Google Dataflow pipeline using the Apache Beam Java SDK. I have a few transforms there, and I finally create a collection of entities (PCollection<Entity>). I need to write this into Google Datastore and then perform another transform AFTER all entities have been written (such as broadcasting the IDs of the saved objects through a Pub/Sub message to multiple subscribers).
Now, the way to store a PCollection is by:
entities.apply(DatastoreIO.v1().write().withProjectId("abc"))
This returns a PDone object, and I am not sure how to chain another transform to run after this write has completed. Since the DatastoreIO.write() call does not return a PCollection, I cannot extend the pipeline further. I have two questions:
How can I get the Ids of the objects written to datastore?
How can I attach another transform that will act after all entities are saved?
We don't have a good way to do either of these things (returning IDs of written Datastore entities, or waiting until entities have been written), though this is far from the first similar request (people have asked for this for BigQuery, for example) and we're thinking about it.
Right now your only option is to wait until the entire pipeline finishes, e.g. via pipeline.run().waitUntilFinish(), and then do what you wanted in your main program (e.g. you can run another pipeline).
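A minimal sketch of that approach, assuming the DatastoreIO v1 connector and a made-up project ID:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;

// ... build the pipeline and the PCollection<Entity> "entities" as before ...
entities.apply(DatastoreIO.v1().write().withProjectId("abc"));

// block until every entity has been written
pipeline.run().waitUntilFinish();

// only now, back in the main program, do the follow-up work,
// e.g. publish the IDs to Pub/Sub or build and run a second pipeline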
I currently have about 15 scenarios in one feature file and want to share data between them. I thought context injection would work and it is working between steps within a single scenario but I can't get it to pass data from one scenario to another. How does everyone else achieve this ?
Short answer:
No one does this, as it's a Bad Idea™
Long answer:
If you have data valid for the whole feature, place it in the feature context. But this data can't be modified in one scenario and accessed in another.
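A minimal sketch of holding feature-wide data via the injected FeatureContext (the binding class, step text and key name are invented for illustration):

using TechTalk.SpecFlow;

[Binding]
public class FeatureDataSteps
{
    private readonly FeatureContext _featureContext;

    // SpecFlow injects the FeatureContext into the binding's constructor
    public FeatureDataSteps(FeatureContext featureContext)
    {
        _featureContext = featureContext;
    }

    [Given(@"the shared catalogue is loaded")]
    public void GivenTheSharedCatalogueIsLoaded()
    {
        // FeatureContext behaves like a dictionary keyed by string
        _featureContext["catalogue"] = new[] { "foo", "bar" };
    }
}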
The tests will be executed in an order determined by your test runner. Different runners may choose different orders, and the execution order may change from one release of a runner to the next. Temporal coupling or implicit dependencies between your tests cause other problems as well: what happens if I want to run a test on its own? Now it fails, because the previous tests have not been run first. What if I want to run the tests in parallel? Now I can't, as the tests have dependencies which need to be run first.
So what can I do?
My suggestion would be to use background steps (or explicit steps in your Givens) to set up the data each individual scenario requires. SpecFlow makes reusing these steps, or having these steps reuse other steps, fairly simple. So if you need a customer and a product to create an order, you can have scenarios like this:
Scenario: Creating a customer
Given I create a new customer called 'bob'
When I query for customers called 'bob'
Then I should get back a customer
Scenario: Creating a product
Given I create a new product called 'foo'
And 'foo' has a price of £100
When I query for products called 'foo'
Then I should get back a product
And the price should be £100
Scenario: customer places an order
Given I have a customer called 'bob'
And I have a product called 'foo' with a price £100
When 'bob' places an order for a 'foo'
Then an order for 1 'foo' should be created
Here the last scenario creates all the data it needs. The step Given I have a customer called 'bob' can reuse the same step implementation (with a different Given attribute) as Given I create a new customer called 'bob', and the new step And I have a product called 'foo' with a price £100 can simply call the two existing steps Given I create a new product called 'foo' and And 'foo' has a price of £100.
This ensures that the test is isolated and does not have any dependencies.
You can create a static IDictionary<string, object> called globalData in another class, say Global.cs.
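A rough sketch of what that class could look like (following the naming in this answer):

using System.Collections.Generic;

public static class Global
{
    // shared across scenarios because it is static
    public static readonly IDictionary<string, object> globalData =
        new Dictionary<string, object>();
}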
Now, in scenario 1, save any object:
Global.globalData["Key"] = myObject;
In scenario 2, retrieve the object by its key and cast it back to its previous type:
var dataFromScen1 = Global.globalData["Key"]; // cast back to the original type before use
In this way you can use data from scenario 1 in scenario 2, but you will face issues during parallel execution.
In one unit I'm running a query which returns one user's details from the database. Right now I'm thinking of creating a user object, assigning the results of the query to its properties, and then setting that as a global variable. I wanted to know if there is a way to pass the data between the units without having to use global variables.
Avoiding global variables is actually a good idea. Also, storing the query result as properties of a (database-independent) object makes sense, because the application might need the information also when the connection is not active.
To avoid a global variable, the easiest way would be to make the object a field of a main form (or data module) and use getter methods to make it (and its fields) read-only. I would also implement the procedure that loads the dataset values into the object properties as a separate class.
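A rough sketch of that idea in a single unit (the unit, class and field names are invented for illustration; the loader can fill the private fields because it lives in the same unit):

unit UserData;

interface

uses
  Data.DB;

type
  TUser = class
  private
    FUserName: string;
    FEmail: string;
  public
    // read-only outside this unit
    property UserName: string read FUserName;
    property Email: string read FEmail;
  end;

  TUserLoader = class
  public
    class procedure LoadFromDataSet(AUser: TUser; ADataSet: TDataSet);
  end;

implementation

class procedure TUserLoader.LoadFromDataSet(AUser: TUser; ADataSet: TDataSet);
begin
  // same-unit visibility lets the loader fill the private fields
  AUser.FUserName := ADataSet.FieldByName('UserName').AsString;
  AUser.FEmail := ADataSet.FieldByName('Email').AsString;
end;

end.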