Where do you do CallActivityAsync in orchestration method - azure-durable-functions

I have just started using durable functions and need some advice on how to do the fan-out pattern correctly. I have an FTP server from which I read all the files, and I want to start an activity function for each file. As I understand it, the orchestrator function is called every time an activity function is executed, but I want to read the files only once. What is the recommended approach to avoid running the code that reads the files and starts the activity functions multiple times? Is it to have an activity function that adds all the activity functions, to use the IsReplaying property, or something different?
[FunctionName("OrchestrationMoveFilesToBlob")]
public static async Task<List<string>> RunOrchestrator(
[OrchestrationTrigger] DurableOrchestrationContext context)
{
var outputs = new List<string>();
if (!context.IsReplaying)
{
// Do you call your database here and make a call to CallActivityAsync for each row?
}
// doing it here is probably very wrong as it will be called multiple times
var tasks = new Task<string>[7];
for (int i = 0; i < 7; i++)
{
tasks[i] = context.CallActivityAsync<string>("E2_CopyFileToBlob","");
}
await Task.WhenAll(tasks);
return outputs;
}
The sample in the link below calls it directly in the orchestrator function. Is this not really bad? Won't it keep adding the same activities again and again?
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-cloud-backup

I'm not sure I understand what you are trying to achieve, but your code doesn't look bad so far. An orchestration is started just once; the orchestrator function may run several more times for replay, but that is not a problem here, because during replay the calls that have already completed are answered from the orchestration history instead of being scheduled again. From your orchestration you can fan out to all your activity functions, with each activity function handling one file from the FTP server. await Task.WhenAll(tasks) is your fan-in. (You can use a List<Task> instead of the array and call .Add(task) on it if you want.) So as not to edit your code, I copied it here and added some comments and questions (feel free to edit here):
[FunctionName("OrchestrationMoveFilesToBlob")]
public static async Task<List<string>> RunOrchestrator(
[OrchestrationTrigger] DurableOrchestrationContext context)
{
var outputs = new List<string>();
if (!context.IsReplaying)
{
// just needed for things that should not happen twice like logging....
}
// if your work isn't a fixed list just call an activity
// which replies with the list of work here (e.g. list of filenames)
var tasks = new Task<string>[7]; // can be a List<Task> too
for (int i = 0; i < 7; i++)
{
tasks[i] = context.CallActivityAsync<string>("E2_CopyFileToBlob","");
}
await Task.WhenAll(tasks);
return outputs; // currently an empty list. What do you want to give back?
}
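The idea in the code comments above, calling one activity that returns the list of work and then starting one activity per item, can be sketched in a language-neutral way. This is a minimal sketch in Python with asyncio, not the Durable Functions API; list_files and copy_file_to_blob are hypothetical stand-ins for activity functions, and the file names are made up.

```python
import asyncio


async def list_files():
    # Hypothetical stand-in for an activity that reads the file list once.
    await asyncio.sleep(0)
    return ["a.csv", "b.csv", "c.csv"]


async def copy_file_to_blob(name):
    # Hypothetical stand-in for the per-file activity.
    await asyncio.sleep(0)
    return f"copied {name}"


async def orchestrate():
    files = await list_files()                         # get the work list
    tasks = [copy_file_to_blob(name) for name in files]  # fan out
    return await asyncio.gather(*tasks)                # fan in


outputs = asyncio.run(orchestrate())
print(outputs)
```

The results come back in the same order as the file list, so they can be returned directly instead of an empty list.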

Related

Send future through stream

I was wondering, is it possible to send a Future through a stream that can get resolved after "reception" by a listener?
Cheers!
Futures are just values, so you can make a Stream<Future>. The receiver of the future can then wait on the future normally, and you can complete it at any point in between.
It's generally frowned upon, though, because of the double asynchrony.
What it does is to make the receiver wait for a stream event, which you emit at one point, then have the receiver wait again for the actual result which may come at an even later point.
The most urgent issue with that is that you don't always know whether the future has been received yet when you complete it. Maybe the receiver paused the stream, maybe you are quicker than you expected. In any case, if you complete the future with an error before it has been received, then that error is probably going to end up uncaught, which may crash your entire program.
It also has bad usability. If you instead waited for the future on the sending side and only sent the event when the result was ready, it's easier and simpler for the receiver (they just get the result as normal), and it's usually just as good at achieving what you want to achieve.
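The alternative described here, waiting on the sending side and emitting only finished results, can be illustrated with a small language-neutral sketch. It is written in Python with asyncio rather than Dart, and the names delayed and results are hypothetical.

```python
import asyncio


async def delayed(value, delay):
    # Simulates an asynchronous result that becomes ready after `delay` seconds.
    await asyncio.sleep(delay)
    return value


async def results():
    # The sender awaits each pending result itself and emits plain values,
    # so the receiver never has to wait a second time.
    tasks = [asyncio.create_task(delayed(i, 0.01 * (3 - i))) for i in range(3)]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main():
    return [value async for value in results()]


received = asyncio.run(main())
print(received)
```

The receiver sees values in completion order, which is the trade-off compared with a stream of futures: the original ordering is lost unless it is sent along with each value.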
If you really have a situation where a number of asynchronous results (futures) can complete in any order, but the receiver needs to know the original order of the futures themselves, then I guess a Stream<Future<X>> can be the answer (but do consider whether your solution is just needlessly complicated).
Example (in full generality):
import 'dart:async';
import 'dart:math';
Stream<Future<int>> randomDelays() {
var controller = StreamController<Future<int>>();
controller.onListen = () {
var rng = Random();
for (int i = 0; i < 10; i++) {
var delay = rng.nextInt(10);
var completer = Completer<int>();
controller.add(completer.future);
// Complete each future after its own random delay.
Timer(Duration(seconds: delay), () {
completer.complete(i);
});
}
controller.close();
};
return controller.stream;
}
or simpler:
Stream<Future<int>> randomDelays() async* {
var rng = Random();
for (int i = 0; i < 10; i++) {
var delay = rng.nextInt(10);
yield Future.delayed(Duration(seconds: delay), () => i);
}
}
I am not sure why you want that, but sure, you can do that:
import 'dart:async';
void main() {
final controller = StreamController<FutureOr<int>>();
controller.sink.add(Future.delayed(Duration(seconds: 1), () => 5));
print(controller.stream.first.runtimeType); // _Future<FutureOr<int>>
}
When you add a Future to the sink it will not be automatically awaited. So what you get out from the Stream are Future objects if you put Future objects in the sink.

Async Future, How to send an event back to caller that a single item has been fetched?

I'm making a Future method that lives inside a separate class; it fetches a bunch of XKCD comics, puts them in a List, and returns it.
That is all fine and dandy, but I would like to be notified when a single comic has been fetched, so I can show a progress dialog indicating how far along we are.
This is my code:
// This is inside my class ComicManager
Future<List<ComicModel>> generateComicList() async {
List<ComicModel> comicList = new List<ComicModel>();
ComicModel latestComic = await getLatestComic();
for (var i = 1; i <= latestComic.num; i++) {
try {
http.Response response =
await http.get('https://xkcd.com/${i}/info.0.json');
Map comicmap = json.decode(response.body);
var comic = new ComicModel.fromJson(comicmap);
comicList.add(comic);
print(comic.num);
// Notify here that we have fetched a comic
} catch (ex) {
// Comic could apparently not be parsed, skip it.
}
}
return comicList;
}
How should I solve this?
There seems to be no particularly elegant way to do this. From some Flutter code samples, it seems that using VoidCallback listeners is an accepted way.
First, keep the registered callback functions in a Set:
final Set<VoidCallback> listeners = {};
Then define the callback functions you need and add them to the set:
void fun() {
// ...
}
listeners.add(fun); // Or define a method to do this, or simply pass the function through the constructor of this class.
Finally, write a notifyListeners function or its equivalent and call it wherever you want:
void notifyListeners() {
for (final listener in listeners) {
listener();
}
}
If you want the callback functions to carry an argument, just change VoidCallback to whatever function type you need.
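As a language-neutral illustration of listeners that carry an argument, here is a minimal sketch in Python. ComicFetcher, add_listener, and fetch_all are hypothetical names, and the HTTP request is replaced by a placeholder string.

```python
from typing import Callable, List


class ComicFetcher:
    def __init__(self) -> None:
        # Each listener receives (fetched_so_far, total).
        self._listeners: List[Callable[[int, int], None]] = []

    def add_listener(self, listener: Callable[[int, int], None]) -> None:
        self._listeners.append(listener)

    def _notify_listeners(self, done: int, total: int) -> None:
        for listener in self._listeners:
            listener(done, total)

    def fetch_all(self, total: int) -> List[str]:
        comics = []
        for num in range(1, total + 1):
            comics.append(f"comic {num}")  # placeholder for the real fetch
            self._notify_listeners(num, total)
        return comics


progress = []
fetcher = ComicFetcher()
fetcher.add_listener(lambda done, total: progress.append((done, total)))
comics = fetcher.fetch_all(3)
print(progress)
```

A progress dialog would register a listener like the lambda here and update itself on every call.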
Found a solution.
I just used Streams like so:
Stream<ComicProgressModel> getAllComicsStream() async* {
// Do what you need to do here.
// Each yield delivers an event to whoever is listening to the stream.
yield stuffToYield; // Can be anything, and you can yield as many times as you want.
// When the end of the method is reached, the listener's onDone callback is called.
// So if you yield multiple times in a for loop, onDone is only called once, when this method ends.
}
Then I can just listen to events like so:
Stream comicStream =
ComicManager().getAllComicsStream().asBroadcastStream();
StreamSubscription comicsub = comicStream.listen((onData) {
// Do what i need
});
Super easy to be honest.

Create a new stream from a stream in Dart

I suspect that my understanding of Streams in Dart might have a few holes in it...
I have a situation in which I'd like a Dart app to respond to intermittent input (which immediately suggests the use of Streams, or maybe Futures). I can implement the behavior I want with listener functions, but I was wondering how to do this in a better, more Dartesque way.
As a simple example, the following (working) program listens to keyboard input from the user and adds a div element to the document containing what has been typed since the previous space, whenever the space bar is hit.
import 'dart:html';
main() {
listenForSpaces(showInput);
}
void listenForSpaces(void Function(String) listener) {
var input = <String>[];
document.onKeyDown.listen((keyboardEvent) {
var key = keyboardEvent.key;
if (key == " ") {
listener(input.join());
input.clear();
} else {
input.add(key.length > 1 ? "[$key]" : key);
}
});
}
void showInput(String message) {
document.body.children.add(DivElement()..text = message);
}
What I'd like to be able to do is to create a new Stream from the Stream that I'm listening to (in the example above, to create a new Stream from onKeyDown). In other words, I might set the program above out as:
var myStream = ...
myStream.listen(showInput);
I suspect that there is a way to create a Stream and then, at different times and places, insert elements to it or call for it to emit a value: it feels as though I am missing something simple. In any case, any help or direction to documentation would be appreciated.
Creating a new stream from an existing stream is fairly easy with an async* function.
For a normal stream, I would just do:
Stream<String> listenForSpaces() async* {
var input = <String>[];
await for (var keyboardEvent in document.onKeyDown) {
var key = keyboardEvent.key;
if (key == " ") {
yield input.join();
input.clear();
} else {
input.add(key.length > 1 ? "[$key]" : key);
}
}
}
The async* function will propagate pauses through to the underlying stream, and it may pause the source during the yield.
That may or may not be what you want, since pausing a DOM event stream can cause you to miss events. For a DOM stream, I'd probably prefer the StreamController-based solution from the other answer.
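The same shape, a generator that consumes one stream and yields a derived one, can be sketched outside Dart. This is a minimal Python sketch using an async generator; key_events is a hypothetical stand-in for the keyboard event stream, and the key names are made up.

```python
import asyncio


async def key_events(keys):
    # Hypothetical stand-in for a stream of key presses.
    for key in keys:
        await asyncio.sleep(0)
        yield key


async def listen_for_spaces(events):
    # Collect keys and emit the joined text each time a space arrives.
    buffer = []
    async for key in events:
        if key == " ":
            yield "".join(buffer)
            buffer.clear()
        else:
            buffer.append(f"[{key}]" if len(key) > 1 else key)


async def main():
    events = key_events(["h", "i", " ", "Enter", "x", " "])
    return [word async for word in listen_for_spaces(events)]


words = asyncio.run(main())
print(words)
```

As in the async* version, the derived stream only pulls from the source while it is being consumed.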
There are several ways to do this, and there is a whole package, rxdart, that allows all kinds of things.
Only the final consumer should use listen, and only if it needs to unsubscribe explicitly; otherwise use forEach.
If you want to manipulate events as in your example, use map.
I wasn't originally planning to answer my own question, but I have since found a very simple answer in the dartlang article on creating streams, in case it's helpful to others:
Specifically, if we'd like to create a stream that we can insert elements into at arbitrary times and places in the code, we can do so via the StreamController class. Instances of this class have an add method; we can simply use the instance's stream property as our stream.
As an example, the code in my question could be rewritten as:
import 'dart:html';
import 'dart:async';
main() async {
// The desired implementation stated in the question:
var myStream = listenForSpaces();
myStream.listen(showInput);
}
Stream<String> listenForSpaces() {
// Use the StreamController class.
var controller = StreamController<String>();
var input = <String>[];
document.onKeyDown.listen((keyboardEvent) {
var key = keyboardEvent.key;
if (key == " ") {
// Add items to the controller's stream.
controller.add(input.join());
input.clear();
} else {
input.add(key.length > 1 ? "[$key]" : key);
}
});
// Return the controller's stream so that callers can listen to it.
return controller.stream;
}
void showInput(String message) {
document.body.children.add(DivElement()..text = message);
}
(As mentioned in the article, we need to be careful if we want to set up a stream from scratch like this because there is nothing to stop us from inserting items to streams that don't have associated, active subscribers; inserted items would in that case be buffered, which could result in a memory leak.)

Scheduling the same activity with different arguments in Amazon SWF

I am trying to schedule an activity in Amazon SWF. Initially, I used to loop through a list and schedule the activity for each value of the list. But this would invoke the activities in parallel which I did not want. So, I modified my code to do something like this:
Promise<Void> promiseArg = null;
for(Integer i : IntegerList){
Promise<Void> nextArg = activityClient.activity1(i);
promiseArg = nextArg;
}
Though the code is working, I am not sure if this is the right way to do it. Any comments would be helpful.
What is the point of promiseArg if it is never used?
If you want each call to depend on the previous method call, create an @Asynchronous method and call it with the promise variable.
// Main method of the decider.
// Seed value passed to the first activity call.
Promise<Integer> nextArg = Promise.asPromise(1);
for (Integer i : IntegerList) {
nextArg = fun(nextArg, i);
}
@Asynchronous
public Promise<Integer> fun(Promise<Integer> nextArg, int i) {
// The body runs only once nextArg is ready, so each activity waits for the previous one.
System.out.println("Testing with current value: " + nextArg.get());
return activityClient.activity1(i, nextArg.get());
}
I haven't tested it but it should work.
Apart from this, you can also try passing the previous Promise variable to the activity itself, with the @Wait annotation in the activity declaration.
Something like this:
prevArgs = activityClient.activity1(i, prevArgs);
with an activity like:
XYZ activity1(int i, @Wait Promise<Integer> prevArgs) {
// Please check whether Integer should be used instead of Promise<Integer>
}
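The underlying idea in this answer, that each call receives the previous call's result and therefore cannot start before it finishes, can be shown in a language-neutral sketch. This is Python with asyncio, not the SWF Flow Framework; activity1 here is a hypothetical stand-in, and the seed value 1 is an assumption.

```python
import asyncio

started = []


async def activity1(value, previous):
    # Hypothetical stand-in for the activity; records the order of invocation.
    started.append(value)
    await asyncio.sleep(0.01)
    return previous + value


async def run_in_sequence(values):
    previous = 1  # assumed seed for the first call
    for value in values:
        # Awaiting the previous result before the next call forces sequential execution.
        previous = await activity1(value, previous)
    return previous


total = asyncio.run(run_in_sequence([10, 20, 30]))
print(started, total)
```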

Method not called when using yield return

I'm having a little trouble with a method in which I use yield return. This doesn't work...
public IEnumerable<MyClass> SomeMethod(int aParam)
{
foreach(DataRow row in GetClassesFromDB(aParam).Rows)
{
yield return new MyClass((int)row["Id"], (string)row["SomeString"]);
}
}
The above code never runs; when the call is made to this method, it just steps over it.
However if I change to...
public IEnumerable<MyClass> SomeMethod(int aParam)
{
IList<MyClass> classes = new List<MyClass>();
foreach(DataRow row in GetClassesFromDB(aParam).Rows)
{
classes.Add(new MyClass((int)row["Id"], (string)row["SomeString"]));
}
return classes;
}
It works just fine.
I don't understand why the first method never runs, could you help me in understanding what is happening here?
The "yield" version is only "run" when the caller actually starts to enumerate the returned collection.
If, for instance, you only get the collection:
var results = SomeObject.SomeMethod (5);
and don't do anything with it, the SomeMethod will not execute.
Only when you start enumerating the results collection, it will hit.
foreach (MyClass c in results)
{
/* Now it strikes */
}
yield return methods are actually converted into state machine classes that retrieve information lazily - only when you actually ask for it. That means that in order to actually pull data, you have to iterate over the result of your method.
// Gives you an iterator object that hasn't done anything yet
IEnumerable<MyClass> list = SomeMethod();
// Enumerate over the object
foreach (var item in list ) {
// Only here will the data be retrieved.
// The method will stop on yield return every time the foreach loops.
}
The reason it runs in the second case is because there's no yield block, and thus the entire method runs in one go.
In this specific case, it's unlikely that you'll gain any advantage from using an iterator block over a regular method, because your GetClassesFromDB() isn't one either. This means that it will retrieve all the data at once the first time it runs. Iterator blocks are best used when you can access items one at a time, because that way you can stop if you don't need them anymore.
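The deferred execution described in these answers is not specific to C#. As a language-neutral illustration, here is a minimal sketch in Python, whose generator functions behave analogously; lazy_numbers and the calls list are hypothetical names used only to observe when the body runs.

```python
calls = []


def lazy_numbers():
    # Nothing in this body runs until the caller starts iterating.
    calls.append("started")
    for i in range(3):
        yield i


gen = lazy_numbers()
after_call = list(calls)   # snapshot taken before any iteration

items = list(gen)          # iterating is what actually runs the body
after_iteration = list(calls)

print(after_call, items, after_iteration)
```

The first snapshot is empty because calling the function only creates the iterator object; the body executes once something enumerates it.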
I had to learn in a near disastrous way how cool/dangerous yield is when I decided to make our company's parser read incoming data lazily. Fortunately only one of the handful of our implementing functions actually used the yield keyword. Took a few days to realize it was quietly not doing any work at all.
The yield keyword will be as lazy as it possibly can, including skipping over the method altogether if you don't put it to work with something like .ToList(), .FirstOrDefault(), or .Any().
Below are two variations, one using the keyword and one returning a straight-up list. One won't even bother to execute, while the other will, even though they seem the same.
public class WhatDoesYieldDo
{
public List<string> YieldTestResults;
public List<string> ListTestResults;
[TestMethod]
public void TestMethod1()
{
ListTest();
Assert.IsTrue(ListTestResults.Any());
YieldTest();
Assert.IsTrue(YieldTestResults.Any());
}
public IEnumerable<string> YieldTest()
{
YieldTestResults = new List<string>();
for (var i = 0; i < 10; i++)
{
YieldTestResults.Add(i.ToString(CultureInfo.InvariantCulture));
yield return i.ToString(CultureInfo.InvariantCulture);
}
}
public IEnumerable<string> ListTest()
{
ListTestResults = new List<string>();
for (var i = 0; i < 10; i++)
{
ListTestResults.Add(i.ToString(CultureInfo.InvariantCulture));
}
return ListTestResults;
}
}
Moral of the story: if you have a method that returns IEnumerable and you use yield in that method, make sure you have something that will iterate over the results, or the method won't execute at all.

Resources