Determinism with Orchestrator/Sub-orchestrator & Durable Entities - azure-durable-functions

If I have an Orchestrator that calls multiple sub-orchestrators, can I safely use a single Durable Entity to share common data across the primary and sub-orchestrators without violating Durable Function determinism rules? I think this is legit, but want to make sure I'm not missing something. Thoughts? Thanks.

Yes, durable entities are safe to use across multiple orchestrators when accessed via the OrchestrationTrigger binding. With this binding, the entity is read once and the result is recorded in the orchestration history table, so subsequent replays reuse the recorded value and the call remains deterministic. Entities also guarantee that operations are processed in order across multiple orchestrators, because each entity is backed by a queue and processes only one operation at a time.
But as with any distributed system where multiple actors work on the same data, it is prone to race conditions between interleaved asynchronous operations. This must be considered when developing.
ex. a counter with an initial value of 5
Orch1 -> Get -> returns 5, committed value is still 5
Orch2 -> Get -> returns 5, committed value is still 5
Orch1 -> Set 5 + 1 -> committed value is now 6
Orch2 -> Set 5 + 1 -> committed value is now 6 (one increment is lost)
The fix is to get and increment in one operation (see the sketch below):
Orch1 -> GetAndIncrement 1 -> returns 5, committed value is now 6
Orch2 -> GetAndIncrement 1 -> returns 6, committed value is now 7
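The same lost-update hazard is easy to reproduce outside Durable Functions. Below is a minimal, generic Java sketch (plain threads and AtomicLong, not Durable Functions APIs) contrasting a racy get-then-set with a single atomic get-and-increment:

import java.util.concurrent.atomic.AtomicLong;

public class CounterRace {
    public static void main(String[] args) throws InterruptedException {
        // Racy: each "orchestrator" reads the value, then writes back value + 1.
        AtomicLong racy = new AtomicLong(5);
        Runnable getThenSet = () -> {
            long seen = racy.get(); // both threads may see 5...
            racy.set(seen + 1);     // ...and both then commit 6 (lost update)
        };

        // Safe: read and increment happen as one atomic operation.
        AtomicLong safe = new AtomicLong(5);
        Runnable atomic = safe::getAndIncrement;

        runPair(getThenSet);
        runPair(atomic);
        // racy may print 6 or 7; safe always prints 7
        System.out.println("racy = " + racy.get() + ", safe = " + safe.get());
    }

    private static void runPair(Runnable op) throws InterruptedException {
        Thread t1 = new Thread(op), t2 = new Thread(op);
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}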
Note: if this entity is also accessed from a regular function with ReadEntityStateAsync, the read can return the currently committed state out of sequence with in-flight operations, because it reads directly from the storage table instead of going through the entity's operation queue.
ex. value is 5
Orch1 -> GetAndIncrement 1 -> returns 5, committed value is now 6
Func1 -> ReadState -> depending on how soon after the preceding operation it runs, it may return either 5 or 6.

Related

Flux.switchOnNext variant that switches when the next publisher _emits_ rather than when it _is emitted_

Reactor has the switchOnNext operator, which mirrors a sequence of publishers, cancelling the previous subscription whenever a new publisher becomes available.
For my use case I need a variation on this theme: instead of cancelling the first publisher before subscribing to the next, I keep mirroring publisher 1 until the point when publisher 2 emits its first item, and only then make the switch. (For anyone who finds this question later: the marble diagram I drew for this is not of an existing operator from the Reactor docs; it's one I sketched myself.)
I appreciate that in the general case this could potentially involve the operator maintaining an unbounded number of subscriptions waiting for any one of them to emit before cancelling the others, but for my use case I know that the initial flux-of-fluxes is finite (so I don't necessarily need the fully general publisher-of-publishers solution, something that works for a finite list of N publishers would be sufficient).
Can anyone see a clever combination of the existing operators that would implement this behaviour or do I need to write it from first principles?
Interesting problem! I think something like this might work:
@Test
void switchOnNextEmit() {
    Duration gracePeriod = Duration.ofSeconds(2);
    Flux.concat(
            Mono.just(sequence("a", 1)),
            Mono.just(sequence("b", 3)).delaySubscription(Duration.ofSeconds(5)),
            Mono.just(sequence("c", 10)).delaySubscription(Duration.ofSeconds(10)))
        .map(seq -> seq.publish().refCount(1, gracePeriod))
        .scan(
            Tuples.of(Flux.<String>never(), Flux.<String>never()),
            (acc, next) -> Tuples.of(acc.getT2().takeUntilOther(next), next))
        .switchMap(t -> Flux.merge(t.getT1(), t.getT2()))
        .doOnNext(it -> System.out.println("Result: " + it))
        .then()
        .block();
}

private static Flux<String> sequence(String name, int interval) {
    return Flux.interval(Duration.ofSeconds(interval))
        .map(i -> name + i)
        .doOnSubscribe(__ -> System.out.println("Subscribe: " + name))
        .doOnCancel(() -> System.out.println("Cancel: " + name));
}
The important part is that we convert the sequences into hot publishers (meaning that resubscribing doesn't trigger another upstream subscription; rather, the initial subscription is shared). Then we use scan to emit a tuple containing the previous and the next publisher, and finally we use a regular switchMap to observe both (note how the first stops when the second emits, due to takeUntilOther).
Note that the grace period is important: switchMap first cancels the current publisher and then subscribes to the next, so without a grace period the current hot publisher would fully stop and restart from scratch, which is not what we want.

doAfterSuccessOrError and doOnSuccess not called in order if there are intermediate operators

I'm using reactor-core 3.2.10.RELEASE. By default, doAfterSuccessOrError should be called after doOnSuccess. But if I add a then or publishOn operator, it seems an inner Mono is created and the order of the doXXX callbacks changes.
Is this an intended behavior?
Mono.just(1)
    .doAfterTerminate(() -> System.out.println("Terminated"))
    .doAfterSuccessOrError((i, e) -> System.out.println("AfterSuccessOrError: " + i))
    // Uncommenting any of these will change the order (see below)
    // .then(Mono.empty())
    // .then()
    // .publishOn(Schedulers.elastic())
    .doFinally(s -> System.out.println("Finally called"))
    .doOnSuccess(s -> System.out.println("Success"))
    .subscribe(i -> System.out.println("Result: " + i));
Expected output:
Success
Result: 1
AfterSuccessOrError: 1
Terminated
Finally called
After uncommenting then or publishOn, the order changes:
AfterSuccessOrError: 1
Terminated
Success
Result: 1
Finally called
Is this an intended behavior?
Yes. As per the docs for then():
Let this Mono complete then play another Mono.
...so in this case, it's because the Mono up until that point completes (hence the first two operators print), and then the output from your next Mono prints (the final 3 operators).
For publishOn(), it's slightly different:
This operator influences the threading context where the rest of the operators in the chain below it will execute, up to a new occurrence of publishOn.
This means that your first two operators execute on one thread, and the rest of your operators execute on a separate thread (defined by the elastic scheduler), which is why you see the output in that order. The mechanism is different here, but the end result happens to be exactly the same.
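To make the then() case concrete, here is a small standalone Reactor sketch (not the asker's exact chain): a callback placed before then() observes the first Mono, while a callback placed after it observes only the replacement Mono:

import reactor.core.publisher.Mono;

public class ThenOrdering {
    public static void main(String[] args) {
        Mono.just(1)
            // attached to the first Mono: sees 1
            .doOnSuccess(i -> System.out.println("before then(): " + i))
            .then(Mono.just(2))
            // attached to the replacement Mono: sees 2
            .doOnSuccess(i -> System.out.println("after then(): " + i))
            .block();
    }
}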

RocksDB: too many comparisons when using custom comparator

I am using RocksDB for storing and indexing encrypted data of a fixed size of about 3KB (key, value = encrypted_data, insertion_sequence_number), and for that purpose I'm providing a custom comparator to be used by RocksDB when inserting and searching. It compares the keys by decrypting them, so it is time-consuming. This works well, but I noticed that the number of comparisons is much bigger than I would expect for a given number of values. I also noticed that the number of stored values is bigger than expected.
Here is an example: I insert the values 1,2,3,7,14,22. (Each value is encrypted on insertion). When I print all contents I have:
RocksDB contents:
[row000]<[00007f1ebc0079d00d0000000000000068cd01bc1e7f0000]> -> {}
[row001]<[00007f1ebc0019900d0000000000000068cd01bc1e7f0000]> -> {}
[row002]<[c750e38063a871001af286720d7943095111a0dba808d00d]> -> {00000000}
[row003]<[f146398078a31c00de0aa3026f855101c36592eae1324905]> -> {01000000}
[row004]<[d687a251eb43d1081bb2ebaf0dd9f5077029f213b814f909]> -> {02000000}
[row005]<[571050962b08280b9f5b1ca9cf2d8e04959cd6d14a4eeb08]> -> {03000000}
[row006]<[4d4f01d6a6a7000b8800af4b4480d705bd9987efdeccc307]> -> {04000000}
[row007]<[5ee699152ff3cc0a21e3e8da9071740d7559b1b5baacd80a]> -> {05000000}
[row008]<[00007f1ebc0525000d0000000000000068cd01bc1e7f0000]> -> {}
[row009]<[00007f1ebc02ef500d0000000000000068cd01bc1e7f0000]> -> {}
[row010]<[00007f1ebc0191f00d0000000000000068cd01bc1e7f0000]> -> {}
Instead of 6, I have 11, so 5 are not coming from me. The number of comparisons for the last insertion is above 100. If I perform a search, my comparator function is also called over 100 times. The keys being compared are both mine and not mine, which doesn't make sense, because invalid encrypted values cannot decrypt to anything valid.
What could explain such behavior? Perhaps the fake values are leaves of a tree used internally for storage, but what explains the huge number of comparisons?
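For readers unfamiliar with the setup being described, here is roughly its shape sketched with the RocksJava bindings (the question's own code isn't shown; API names follow recent RocksJava versions, and decrypt() is a hypothetical stand-in for the question's expensive per-key decryption):

import java.nio.ByteBuffer;
import org.rocksdb.*;

public class DecryptingComparatorSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (ComparatorOptions copt = new ComparatorOptions();
             AbstractComparator cmp = new AbstractComparator(copt) {
                 @Override public String name() { return "decrypting-comparator"; }
                 @Override public int compare(ByteBuffer a, ByteBuffer b) {
                     // decrypt() is hypothetical: it stands in for the
                     // expensive per-comparison decryption of each key.
                     return decrypt(a).compareTo(decrypt(b));
                 }
             };
             Options opts = new Options()
                     .setCreateIfMissing(true)
                     .setComparator(cmp);
             RocksDB db = RocksDB.open(opts, "/tmp/cmp-test")) {
            db.put(new byte[]{1}, new byte[]{0});
        }
    }

    // Hypothetical placeholder for the real decryption routine.
    private static ByteBuffer decrypt(ByteBuffer encryptedKey) {
        return encryptedKey; // identity here; real code would decrypt
    }
}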

Restricting number of function iterations

I'm writing code in Erlang which is supposed to generate a random number for a random amount of time and add each number to a list. I managed a function which can generate random numbers, and I have a rough method for adding them to a list, but my main problem is restricting the number of iterations of the function. I'd like the function to produce several numbers, add them to the list, and then kill that process or something like that.
Here is my code so far:
generator(L1) ->
    random:seed(now()),
    A = random:uniform(100),
    L2 = lists:append(L1, A),
    generator(L2).

producer(B, L) ->
    receive
        {last_element} ->
            consumer ! {lists:droplast(B)}
    end.

consumer() ->
    timer:send_after(random:uniform(1000), producer, {last_element, self()}),
    receive
        {Answer, Producer_PID} ->
            io:format("the last item is:~w~n", [Answer])
    end,
    consumer().

start() ->
    register(consumer, spawn(lis, consumer, [])),
    register(producer, spawn(lis, producer, [])),
    register(generator, spawn(lis, generator, [random:uniform(10)])).
I know it's a little bit sloppy and incomplete, but that's not the point.
First, you should use rand to generate random numbers instead of random; it is an improved module.
In addition, when using rand:uniform/1 you won't need to change the seed every time you run your program. From erlang documentation:
If a process calls uniform/0 or uniform/1 without setting a seed
first, seed/1 is called automatically with the default algorithm and
creates a non-constant seed.
Finally, in order to create a list of random numbers, take a look at How to create a list of 1000 random number in erlang.
If I conclude all this, you can just do:
[rand:uniform(100) || _ <- lists:seq(1, 1000)].
There are some issues in your code:
In timer:send_after(random:uniform(1000), producer, {last_element, self()}), you send {last_element, self()} to the producer process, but in producer you only receive {last_element}; these messages do not match.
you can change

producer(B, L) ->
    receive
        {last_element} ->
            consumer ! {lists:droplast(B)}
    end.

to

producer(B, L) ->
    receive
        {last_element, FromPid} ->
            FromPid ! {lists:droplast(B)}
    end.
The same mismatch applies to consumer ! {lists:droplast(B)} and the receive pattern {Answer, Producer_PID} ->: the message sent is a 1-tuple, but the pattern expects a 2-tuple.

Spawn many processes erlang

I want to measure the performance of my database by measuring the time taken to do something as the number of processes increases. The intention is to plot a graph of performance vs. number of processes afterwards. Does anyone have an idea how? I am a beginner in Erlang, please help.
Assuming your database is mnesia, this should not be hard. One way would be to have a write function and a read function. However, note that there are several activity access contexts with mnesia. To test write times, you should NOT use the transaction context, because it returns immediately to the calling process, even before a disc write has occurred. For disc writes, it's important that you use the context called sync_transaction. Here is an example:
write(Record) ->
    Fun = fun(R) -> mnesia:write(R) end,
    mnesia:activity(sync_transaction, Fun, [Record], mnesia_frag).
The function above will return only when all active replicas of the mnesia table have committed the record onto the disc file. Hence, to test the speed as processes increase, you need a record generator, a process spawner, the write function, and finally a timing mechanism. For timing, we have the built-in functions timer:tc/1, timer:tc/2 and timer:tc/3, which return the exact time it took to completely execute a given function. To cut the story short, this is how I would do it:
-module(stress_test).
-compile(export_all).

-define(LIMIT, 10000).

-record(book, {
    isbn,
    title,
    price,
    version}).

%% ensure this table is {type, bag}
-record(write_time, {
    isbn,
    num_of_processes,
    write_time
}).

%% Assuming table (book) already exists
%% Assuming mnesia running already

start() ->
    ensure_gproc(),
    tv:start(),
    spawn_many(?LIMIT).

spawn_many(0) -> ok;
spawn_many(N) ->
    spawn(?MODULE, process, []),
    spawn_many(N - 1).

process() ->
    gproc:reg({n, l, guid()}, ignored),
    timer:apply_interval(timer:seconds(2), ?MODULE, write, []),
    receive
        <<"stop">> -> exit(normal)
    end.

total_processes() ->
    proplists:get_value(size, ets:info(gproc)) div 3.

ensure_gproc() ->
    case lists:keymember(gproc, 1, application:which_applications()) of
        true -> ok;
        false -> application:start(gproc)
    end.

guid() ->
    random:seed(now()),
    MD5 = erlang:md5(term_to_binary([random:uniform(152629977), {node(), now(), make_ref()}])),
    MD5List = lists:nthtail(3, binary_to_list(MD5)),
    F = fun(N) -> f("~2.16.0B", [N]) end,
    L = [F(N) || N <- MD5List],
    lists:flatten(L).

generate_record() ->
    #book{isbn = guid(), title = guid(), price = guid()}.

write() ->
    Record = generate_record(),
    Fun = fun(R) -> ok = mnesia:write(R), ok end,
    %% Here is now the actual write we measure
    {Time, ok} = timer:tc(mnesia, activity, [sync_transaction, Fun, [Record], mnesia_frag]),
    %% Then we save that time and the number of processes
    %% at that instant
    NoteTime = #write_time{
        isbn = Record#book.isbn,
        num_of_processes = total_processes(),
        write_time = Time
    },
    mnesia:activity(transaction, Fun, [NoteTime], mnesia_frag).
Now there are dependencies here, especially gproc: download and build it into your Erlang lib path from here: Download Gproc. To run this, just call stress_test:start(). The write_time table will help you draw a graph of number of processes against time taken to write. As the number of processes increases from 0 to the upper limit (?LIMIT), we note the time taken to write a given record at that instant, and we also note the number of processes at that time.
UPDATE
f(S)-> f(S,[]).
f(S,Args) -> lists:flatten(io_lib:format(S, Args)).
That is the missing function, apologies. Remember to study the write_time table: using the tv application, a window opens in which you can examine the mnesia tables. Use this table to see write times increase (i.e. performance decrease) as the number of processes grows over time. One element I have left out is noting the actual time of the write using time(), which may be an important parameter; you may add it to the definition of the write_time table.
Also look at http://wiki.basho.com/Benchmarking.html
You might also look at Tsung: http://tsung.erlang-projects.org/
