Is the following the proper/idiomatic way to split a Flux into different processing paths and join them back? For the purposes of the question, events shouldn't be discarded, ordering is unimportant, and memory is unlimited.
Flux<Integer> beforeFork = Flux.range(1, 10);
ConnectableFlux<Integer> forkPoint = beforeFork
.publish()
;
Flux<String> slowPath = forkPoint
.filter(i -> i % 2 == 0)
.map(i -> "slow"+"_"+i)
.delayElements(Duration.ofSeconds(1))
;
Flux<String> fastPath = forkPoint
.filter(i -> i % 2 != 0)
.map(i -> "fast"+"_"+i)
;
// merge vs concat since we need to eagerly subscribe to
// the ConnectableFlux before the connect()
Flux.merge(fastPath, slowPath)
.map(s -> s.toUpperCase()) // pretend this is a more complex sequence
.subscribe(System.out::println)
;
forkPoint.connect();
I suppose I could also groupBy() then filter() on key() if the filter() function were slower than %.
NOTE that I do want the slowPath and fastPath to consume the same events from the beforeFork point, since beforeFork is slow to produce.
NOTE that I do have a more complex follow-up (i.e. change to range(1, 100) and the behavior around the prefetch boundary is confusing to me), but it'd only make sense if the above snippet is legal.
I believe that it is more common to see this written this way:
Flux<Integer> beforeFork = Flux.range(1, 10).publish().autoConnect(2);
Flux<String> slowPath = beforeFork
.filter(i -> i % 2 == 0)
.map(i -> "slow"+"_"+i)
.delayElements(Duration.ofSeconds(1));
Flux<String> fastPath = beforeFork
.filter(i -> i % 2 != 0)
.map(i -> "fast"+"_"+i);
Flux.merge(fastPath, slowPath)
.map(s -> s.toUpperCase())
.doOnNext(System.out::println)
.blockLast();
A summary of the changes:
autoConnect(N) - allows us to specify that the beforeFork publisher is connected to after N subscribers. If we know the number of expected paths upfront, we can specify this and prevent caching or duplicate execution of the publisher.
blockLast() - we block on the joining Flux itself. You might have noticed that if you ran your current code, only the fast results appear to be logged. This is because we were not actually waiting for the slow results to complete.
This is assuming that your original Publisher is finite with a fixed number of elements. Other changes would need to be made for something like Flux.interval or an ongoing stream.
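As a mental model, the autoConnect(N) behavior can be sketched with a small Python analogy (hypothetical names, not the Reactor API): the source holds back until N subscribers have attached, and every subscriber then sees the same elements.

```python
class AutoConnectSource:
    """Toy stand-in for publish().autoConnect(n); not the Reactor API."""

    def __init__(self, items, min_subscribers):
        self.items = items
        self.min_subscribers = min_subscribers
        self.callbacks = []

    def subscribe(self, callback):
        self.callbacks.append(callback)
        # "Connect" (start emitting) only once enough subscribers attached;
        # every subscriber then sees the same elements.
        if len(self.callbacks) == self.min_subscribers:
            for item in self.items:
                for cb in self.callbacks:
                    cb(item)

slow, fast = [], []
source = AutoConnectSource(range(1, 11), min_subscribers=2)
source.subscribe(lambda i: slow.append(f"slow_{i}") if i % 2 == 0 else None)
# Nothing emitted yet; the second subscription triggers the "connect".
source.subscribe(lambda i: fast.append(f"fast_{i}") if i % 2 != 0 else None)
print(slow)  # ['slow_2', 'slow_4', 'slow_6', 'slow_8', 'slow_10']
print(fast)  # ['fast_1', 'fast_3', 'fast_5', 'fast_7', 'fast_9']
```

With min_subscribers=2, subscribing only one path would emit nothing, which is exactly why autoConnect(N) avoids the fast path racing ahead of the slow one.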
For prefetch I can refer you to this question:
What does prefetch mean in Project Reactor?
I'm trying to divide a long chain into different short chains according to the head and tail I want, and find the max duration in each short chain.
E.g.:
Long chain:
NA1 -> NA2 -> NA3 -> NA4 -> NB1 -> NB2 -> NB3 -> NB4 ->...
I want to check whether the node with the max duration in each chain is the second node.
NA1 -> NA2 -> NA3 -> NA4
NB1 -> NB2 -> NB3 -> NB4
...
(N means node; A, B and the number are attributes; and each node has its own duration)
MATCH p = (A:Task{FROMLOCTYPE:"1"})-[:path*]->(b:Task{TOLOCTYPE:"4"})
WITH reduce(output = [], n IN nodes(p) | output + n ) as tasks
But I'm stuck here and don't know how to check the maximum duration in each list, or how to do any operation on each list.
[image: sample data]
MATCH p = (A:Task{FROMLOCTYPE:"LOAD_PORT"})-[:wafer_path*]->(b:Task{TOLOCTYPE:"LOAD_PORT"})
return p limit 3
The photo is my sample data.
Sorry, my previous description was not clear enough.
I think my problem is what to do once I have found a bunch of lists:
How do I search these nodes at the same time?
It is a bit like finding the maximum value of each array in a two-dimensional array.
I can group nodes according to the classification I want.
MATCH p = (A:Task{FROMLOCTYPE:"LOAD_PORT"})-[:wafer_path*]->(b:Task{TOLOCTYPE:"LOAD_PORT"})
return p
This will return all the paths p that Neo4j finds.
And the nodes in each p have their own durations, e.g.:
[12,18,14,15]
[15,19,12,11]
[12,15,13,14]
But I'm not sure how to query for the max duration in each group with Cypher.
Or check whether the MAX duration node is (Task {position:"2"}).
Thanks again for your reply.
I don't know if it's what you're looking for, but here is an example of how to find the item with the max value plus its position in the array:
MATCH p = (A:Task{FROMLOCTYPE:"1"})-[:path*]->(b:Task{TOLOCTYPE:"4"})
WITH
reduce(
output = [0, 0, 0],
n IN nodes(p) |
CASE
WHEN output[0] < n.duration THEN [n.duration, output[2], output[2]+1]
ELSE [output[0], output[1], output[2]+1]
END
) as tasks
RETURN
tasks[0] AS max,
tasks[1] AS positionInArray
In the reduce we create an array with three elements:
in the first one we will store the maximum value found
in the second, the position in the array of the maximum value
in the third, it's just an increment to track the position of where we are in the array
A good read about that: https://blog.armbruster-it.de/2015/03/cypher-fun-finding-the-position-of-an-element-in-an-array/
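The same fold can be sketched in Python (illustrative only), threading the three-element state through exactly as the Cypher reduce does:

```python
from functools import reduce

def max_with_position(durations):
    # State is [max_so_far, position_of_max, current_index],
    # mirroring the Cypher reduce(output = [0, 0, 0], ...).
    def step(output, duration):
        if output[0] < duration:
            return [duration, output[2], output[2] + 1]
        return [output[0], output[1], output[2] + 1]
    result = reduce(step, durations, [0, 0, 0])
    return result[0], result[1]  # (max, positionInArray)

print(max_with_position([12, 18, 14, 15]))  # (18, 1)
print(max_with_position([15, 19, 12, 11]))  # (19, 1)
```

Positions are 0-based here, just as in the Cypher version, since the index counter starts at 0.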
We use a distributed Erlang cluster and I am now testing it in the case of net splits.
To get information from all nodes of the cluster I use gen_server:multi_call/4 with a defined timeout. What I need is to get information from the available nodes as soon as possible, so the timeout is not too big (about 3000 ms).
Here is an example call:
Timeout = 3000,
Nodes = AllConfiguredNodes,
gen_server:multi_call(Nodes, broker, get_score, Timeout)
I expect this call to return a result within Timeout ms, but in the case of a net split it does not: it waits approx. 8 seconds.
What I found is that the multi_call request is halted for an additional 5 seconds in the call erlang:monitor(process, {Name, Node}) before sending the request.
I really do not care that some nodes do not reply or are busy or unavailable; I can use any other node. But with this halting I am forced to wait until the Erlang VM tries to establish a new connection to the dead/unavailable node.
The question is: do you know a solution that can prevent this halting? Or maybe another RPC mechanism that is suitable for my situation.
I'm not sure if I totally understand the problem you are trying to solve, but if it is to get all the answers that can be retrieved in X amount of time and ignore the rest, you might try a combination of async_call and nb_yield.
Maybe something like
somefun() ->
SmallTimeMs = 50,
Nodes = AllConfiguredNodes,
Promises = [rpc:async_call(N, some_mod, some_fun, ArgList) || N <- Nodes],
get_results([], Promises, SmallTimeMs).
get_results(Results, _Promises, _SmallTimeMs) when length(Results) > 1 -> % Replace 1 with whatever is the minimum acceptable number of results
lists:flatten(Results);
get_results(Results, Promises, SmallTimeMs) ->
Rs = get_promises(Promises, SmallTimeMs),
get_results([Rs | Results], Promises, SmallTimeMs).
get_promises(Promises, WaitMs) ->
[rpc:nb_yield(Key, WaitMs) || Key <- Promises].
See: http://erlang.org/doc/man/rpc.html#async_call-4 for more details.
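For illustration, the async_call/nb_yield pattern (fire off every request at once, then collect each promise with a bounded wait) can be sketched in Python with concurrent.futures; the node names and delays here are made up:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def query_node(node, delay_s):
    time.sleep(delay_s)  # stand-in for a fast node or an unreachable one
    return (node, "score")

with ThreadPoolExecutor() as pool:
    # Fire all requests at once (like rpc:async_call per node)...
    futures = {pool.submit(query_node, node, delay_s): node
               for node, delay_s in [("a", 0.01), ("b", 0.01), ("c", 1.0)]}
    results, bad_nodes = [], []
    # ...then collect each promise with a bounded wait (like rpc:nb_yield).
    for fut, node in futures.items():
        try:
            results.append(fut.result(timeout=0.2))
        except TimeoutError:
            bad_nodes.append(node)

print(results)    # answers from the nodes that replied in time
print(bad_nodes)  # ['c']
```

The key property is the same: a slow or partitioned node costs at most the per-promise wait, not an open-ended connection-setup delay.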
My solution to the problem:
I've made my own implementation of multi_call that uses gen_server:call.
The basic idea is to call all nodes with gen_server:call() in separate processes and collect the results of these calls. Collection is done by receiving messages from the mailbox of the calling process.
To control the timeout, I calculate the deadline at which the timeout expires and then use it as the reference point for calculating the timeout for after in receive.
Implementation
Main function is:
multicall(Nodes, Name, Req, Timeout) ->
Refs = lists:map(fun(Node) -> call_node(Node, Name, Req, Timeout) end, Nodes),
Results = read_all(Refs, Timeout),
PosResults = [ { Node, Result } || { ok, { ok, { Node, Result } } } <- Results ],
{ PosResults, calc_bad_nodes(Nodes, PosResults) }.
The idea here is to call all nodes and wait for all results within one Timeout.
Calling one node is performed from a spawned process. It catches the exits that gen_server:call uses in case of error.
call_node(Node, Name, Req, Timeout) ->
Ref = make_ref(),
Self = self(),
spawn_link(fun() ->
try
Result = gen_server:call({Name,Node},Req,Timeout),
Self ! { Ref, { ok, { Node, Result } } }
catch
exit:Exit ->
Self ! { Ref, { error, { 'EXIT', Exit } } }
end
end),
Ref.
Bad nodes are calculated as those that did not respond within Timeout:
calc_bad_nodes(Nodes, PosResults) ->
{ GoodNodes, _ } = lists:unzip(PosResults),
[ BadNode || BadNode <- Nodes, not lists:member(BadNode, GoodNodes) ].
Results are collected by reading the mailbox with Timeout:
read_all(ReadList, Timeout) ->
Now = erlang:monotonic_time(millisecond),
Deadline = Now + Timeout,
read_all_impl(ReadList, Deadline, []).
The implementation reads until the Deadline occurs:
read_all_impl([], _, Results) ->
lists:reverse(Results);
read_all_impl([ W | Rest ], expired, Results) ->
R = read(0, W),
read_all_impl(Rest, expired, [R | Results ]);
read_all_impl([ W | Rest ] = L, Deadline, Results) ->
Now = erlang:monotonic_time(millisecond),
case Deadline - Now of
Timeout when Timeout > 0 ->
R = read(Timeout, W),
case R of
{ ok, _ } ->
read_all_impl(Rest, Deadline, [ R | Results ]);
{ error, { read_timeout, _ } } ->
read_all_impl(Rest, expired, [ R | Results ])
end;
Timeout when Timeout =< 0 ->
read_all_impl(L, expired, Results)
end.
A single read is just a receive from the mailbox with Timeout:
read(Timeout, Ref) ->
receive
{ Ref, Result } ->
{ ok, Result }
after Timeout ->
{ error, { read_timeout, Timeout } }
end.
Further improvements:
The rpc module spawns a separate process to avoid garbage from late answers, so it would be useful to do the same in this multicall function.
An infinity timeout may be handled in the obvious way.
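The deadline bookkeeping at the heart of read_all can be sketched in Python (queues standing in for process mailboxes): compute the deadline once, then derive each per-read timeout from it, so the whole collection stays within one overall timeout.

```python
import time
import queue

def read_all(mailboxes, timeout_s):
    # Compute the deadline once; every individual read gets only the time
    # remaining until that deadline (and 0 once it has expired).
    deadline = time.monotonic() + timeout_s
    results = []
    for mbox in mailboxes:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results.append(("ok", mbox.get(timeout=remaining)))
        except queue.Empty:
            results.append(("error", "read_timeout"))
    return results

replied, silent = queue.Queue(), queue.Queue()
replied.put("reply")  # this "process" answered already
results = read_all([replied, silent], timeout_s=0.1)
print(results)  # [('ok', 'reply'), ('error', 'read_timeout')]
```

Recomputing the remaining time before each read is what prevents N slow nodes from costing N whole timeouts.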
The Stats.expandingXXXX functions are pretty fast. However, there is no public API to do an expanding-window apply. The following solution I created is really slow when it comes to large datasets, like 100k. Any suggestions are appreciated.
let ExpWindowApply f minSize (data : Series<_, float>) =
    let keys = data.Keys
    let startKey = data.FirstKey()
    let values =
        keys
        |> Seq.map (fun k ->
            let ds = data.Between(startKey, k)
            match ds with
            | _ when ds.ValueCount >= minSize -> f ds.Values
            | _ -> Double.NaN)
    Series(keys, values)
I understand that the Stats.expandingXXX functions are actually special cases where the applied function can be calculated iteratively based on the previous iteration's state, and that not every function can take advantage of state from a previous calculation. Is there a better way than Series.Between for creating a window of data?
Update
For those who are also interested in a similar issue: the answer provides an alternative implementation and insight into the rarely documented series vector and index operations, but it doesn't improve performance.
The expanding functions in Deedle are fast because they are using an efficient online algorithm that makes it possible to calculate the statistics on the fly with just one pass - rather than actually building the intermediate series for the sub-ranges.
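To illustrate the difference, here is a minimal Python sketch: the online version carries its running state forward, so each new point costs O(1), while recomputing every prefix (which is roughly what calling Between for every key amounts to) is quadratic overall.

```python
def expanding_sum(values):
    # Online: carry the running state forward; one pass, O(n) total.
    state, out = 0.0, []
    for v in values:
        state += v
        out.append(state)
    return out

def expanding_sum_naive(values):
    # Recompute each prefix from scratch: O(n^2) total.
    return [sum(values[:i + 1]) for i in range(len(values))]

data = [1.0, 2.0, 3.0, 4.0]
print(expanding_sum(data))        # [1.0, 3.0, 6.0, 10.0]
print(expanding_sum_naive(data))  # same values, quadratic cost
```

This is also why an expanding apply for an arbitrary f cannot be fast in general: f would need a way to fold the next point into its previous state.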
There is a built-in function aggregate that lets you do something like this, though it works in the reversed way. For example, if you want to sum all elements starting from the current one to the end, you can write:
let s = series [ for i in 1 .. 10 -> i, float i ]
s |> Series.aggregateInto
(Aggregation.WindowWhile(fun _ _ -> true))
(fun seg -> seg.Data.FirstKey())
(fun seg -> OptionalValue(Stats.sum seg.Data))
If you want to do the same thing using the underlying representation, you can directly use the addressing scheme that Deedle uses to link the keys (in the index) with values (in the data vector). This is an ugly mutable sample, but you can encapsulate it into something nicer:
[ let firstAddr = s.Index.Locate(s.FirstKey())
for k in s.Index.KeySequence ->
let lastAddr = s.Index.Locate(k)
seq {
let a = ref firstAddr
while !a <> lastAddr do
yield s.Vector.GetValue(!a).Value
a := s.Index.AddressOperations.AdjustBy(!a, +1L) } |> Seq.sum ]
I've implemented a recursive mergesort algorithm:
-module(ms).
-import(lists,[sublist/3,delete/2,min/1,reverse/1]).
-export([mergesort/1]).
mergesort([])->
[];
mergesort([N])->
[N];
mergesort(L)->
mergesort(split(1,L),split(2,L),[]).
mergesort(L1,L2,[])->
case {sorted(L1),sorted(L2)} of
{true,true}->
merge(L1,L2,[]);
{true,false}->
merge(L1,mergesort(split(1,L2),split(2,L2),[]),[]);
{false,true}->
merge(mergesort(split(1,L1),split(2,L1),[]),L2,[]);
{false,false}->
merge(mergesort(split(1,L1),split(2,L1),[]),mergesort(split(1,L2),split(2,L2),[]),[])
end.
% helper assumed by mergesort/3: checks whether a list is already sorted
sorted([]) -> true;
sorted([_]) -> true;
sorted([A,B|T]) when A =< B -> sorted([B|T]);
sorted(_) -> false.
merge([],[],R)->
reverse(R);
merge(L,[],R)->
merge(delete(min(L),L),[],[min(L)|R]);
merge([],L,R)->
merge([],delete(min(L),L),[min(L)|R]);
merge([H1|T1],[H2|T2],R) when H1 < H2 ->
merge(T1,[H2|T2],[H1|R]);
merge([H1|T1],[H2|T2],R) when H1 >= H2 ->
merge([H1|T1],T2,[H2|R]).
split(1,L)->
sublist(L,1,ceiling(length(L)/2));
split(2,L)->
sublist(L,ceiling(length(L)/2+1),length(L)).
ceiling(X) when X < 0 ->
trunc(X);
ceiling(X) ->
T = trunc(X),
case X - T == 0 of
true -> T;
false -> T + 1
end.
However, I'm irked by the fact that mergesort/3 is not tail recursive (TR), and it is verbose.
I guess the problem here is that I'm not particularly aware of the TR 'template' I would use here. I understand how I would implement a TR function that can be defined in terms of a series, for example; that would just move the arguments to the function up the series. But for the case in which we merge a sublist conditionally on the natural recursion of the rest of the list, I'm ignorant.
Therefore, I would like to ask:
1) How can I make mergesort/3 TR?
2) What resources can I use to understand erlang tail recursion in-depth?
Your merge sort is not tail recursive because the last function called in mergesort/3 is merge/3. You call mergesort as an argument of merge, so the stack has to grow: the outer mergesort/3 call is not yet finished and its stack frame can't be reused.
To write it in a TR style you need to think of it as imperatively as you can. Every TR function is easily rewritten as an iterative while loop. Consider:
loop(Arg) ->
NewArg = something_happens_to(Arg),
loop(NewArg) or return NewArg.
And:
data = something;
while(1){
...
break loop or modify data block
...
} // data equals to NewArg at the end of iteration
Here is my TR merge-sort example. It's a bottom-up way of thinking. I used the merge/3 function from your module.
ms(L) ->
ms_iteration([[N] || N <- L], []).
ms_iteration([], []) -> % nothing to do
[];
ms_iteration([], [OneSortedList]) -> % nothing left to do
OneSortedList;
ms_iteration([], MergedLists) ->
ms_iteration(MergedLists, []); % next merging iteration
ms_iteration([L], MergedLists) -> % can't be merged yet but it's sorted
ms_iteration([], [L | MergedLists]);
ms_iteration([L1, L2 | ToMergeTail], MergedLists) -> % merging two sorted lists
ms_iteration(ToMergeTail, [merge(L1, L2, []) | MergedLists]).
It's nicely explained here: http://learnyousomeerlang.com/recursion
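The bottom-up iteration translates almost mechanically into a while loop; here is a Python sketch of the same idea, using a library merge in place of merge/3:

```python
from heapq import merge  # merges two already-sorted iterables

def mergesort_bottom_up(xs):
    # ms_iteration([[N] || N <- L], []): start from single-element runs.
    runs = [[x] for x in xs]
    while len(runs) > 1:
        merged = []
        # One iteration: merge adjacent pairs of sorted runs.
        for i in range(0, len(runs) - 1, 2):
            merged.append(list(merge(runs[i], runs[i + 1])))
        if len(runs) % 2 == 1:
            merged.append(runs[-1])  # odd run out: already sorted, carry over
        runs = merged
    return runs[0] if runs else []

print(mergesort_bottom_up([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Each pass halves the number of runs, so the loop body is the direct analogue of one ms_iteration sweep.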
I'm trying to wrap my head around functional programming using F#. I'm working my way through the Project Euler problems, and I feel like I am just writing procedural code in F#. For instance, this is my solution to #3.
let Calc() =
let mutable limit = 600851475143L
let mutable factor = 2L // Start with the lowest prime
while factor < limit do
if limit % factor = 0L then
begin
limit <- limit / factor
end
else factor <- factor + 1L
limit
This works just fine, but all I've really done is take how I would solve this problem in C# and convert it to F# syntax. Looking back over several of my solutions, this is becoming a pattern. I think I should be able to solve this problem without using mutable, but I'm having trouble not thinking about the problem procedurally.
Why not with recursion?
let Calc() =
let rec calcinner factor limit =
if factor < limit then
if limit % factor = 0L then
calcinner factor (limit/factor)
else
calcinner (factor + 1L) limit
else limit
let limit = 600851475143L
let factor = 2L // Start with the lowest prime
calcinner factor limit
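As a sketch of the same recursive shape in Python (note the recursion limit has to be raised, since the factor search recurses once per candidate):

```python
import sys
sys.setrecursionlimit(20000)  # the factor search recurses once per candidate

def largest_prime_factor(limit, factor=2):
    # Mirrors calcinner: divide the factor out while it divides evenly,
    # otherwise try the next candidate; once factor >= limit, limit is prime.
    if factor >= limit:
        return limit
    if limit % factor == 0:
        return largest_prime_factor(limit // factor, factor)
    return largest_prime_factor(limit, factor + 1)

print(largest_prime_factor(600851475143))  # 6857
```

The F# version has no such limit concern because both recursive calls are in tail position, so the compiler turns them into a loop.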
For algorithmic problems (like Project Euler), you'll probably want to write most iterations using recursion (as John suggests). However, even mutable imperative code sometimes makes sense if you are using e.g. hashtables or arrays and care about performance.
One area where F# works really well, and which is (sadly) not really covered by the Project Euler exercises, is designing data types. So if you're interested in learning F# from another perspective, have a look at Designing with types at F# for Fun and Profit.
In this case, you could also use Seq.unfold to implement the solution (in general, you can often compose solutions to sequence processing problems using Seq functions - though it does not look as elegant here).
let Calc() =
// Start with initial state (600851475143L, 2L) and generate a sequence
// of "limits" by generating new states & returning limit in each step
(600851475143L, 2L)
|> Seq.unfold (fun (limit, factor) ->
// End the sequence when factor is greater than limit
if factor >= limit then None
// Update limit when divisible by factor
elif limit % factor = 0L then
let limit = limit / factor
Some(limit, (limit, factor))
// Update factor
else
Some(limit, (limit, factor + 1L)) )
// Take the last generated limit value
|> Seq.last
In functional programming, when I think mutable I think heap; when trying to write code that is more functional, you should use the stack instead of the heap.
So how do you get values on to the stack for use with a function?
Place the value in the function's parameters.
let result01 = List.filter (fun x -> x % 2 = 0) [0;1;2;3;4;5]
Here both a function and a list of values are hard-coded into List.filter's parameters.
Bind the value to a name and then reference the name.
let divisibleBy2 = fun x -> x % 2 = 0
let values = [0;1;2;3;4;5]
let result02 = List.filter divisibleBy2 values
Here the function parameter for List.filter is bound to divisibleBy2 and the list parameter for List.filter is bound to values.
Create a nameless data structure and pipe it into the function.
let result03 =
[0;1;2;3;4;5]
|> List.filter divisibleBy2
Here the list parameter for List.filter is forward-piped into the List.filter function.
Pass the result of one function into another function.
let result04 =
[ for i in 1 .. 5 -> i]
|> List.filter divisibleBy2
Now that we have all of the data on the stack, how do we process the data using only the stack?
One of the patterns often used with functional programming is to put data into a structure and then process the items one at a time using a recursive function. The structure can be a list, tree, graph, etc. and is usually defined using a discriminated union. Data structures that have one or more self references are typically used with recursive functions.
So here is an example where we take a list and multiply all the values by 2 and put the result back onto the stack as we progress. The variable on the stack holding the new values is accumulator.
let mult2 values =
let rec mult2withAccumulator values accumulator =
match values with
| headValue::tailValues ->
let newValue = headValue * 2
let accumulator = newValue :: accumulator
mult2withAccumulator tailValues accumulator
| [] ->
List.rev accumulator
mult2withAccumulator values []
We use an accumulator for this, which, being a parameter to a function and not defined mutable, is stored on the stack. This method also uses pattern matching and the list discriminated union. The accumulator holds the new values as we process the items in the input list, and when there are no more items in the list ([]) we just reverse the accumulator to get the new list in the correct order, because new items are prepended to the head of the accumulator.
To understand the data structure (discriminated union) for a list you need to see it, so here it is
type List<'a> =
| Item of 'a * List<'a>
| Empty
Notice how the definition of an Item ends with List<'a>, referring back to itself, and that a list can be empty, which when used with a pattern match is [].
A quick example of how lists are built:
empty list - []
list with one int value - 1::[]
list with two int values - 1::2::[]
list with three int values - 1::2::3::[]
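For comparison, the same self-referencing structure can be sketched in Python with a recursive dataclass (the names Item and Empty are chosen to match), where Empty plays the role of [] and Item plays the role of the :: constructor:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Empty:
    """The empty list, [] in F#."""

@dataclass
class Item:
    """One value plus the rest of the list: head :: tail in F#."""
    head: int
    tail: "ConsList"

ConsList = Union[Item, Empty]

def cons_to_list(lst):
    # Walk the self-referencing structure until we hit Empty.
    out = []
    while isinstance(lst, Item):
        out.append(lst.head)
        lst = lst.tail
    return out

three = Item(1, Item(2, Item(3, Empty())))  # 1::2::3::[]
print(cons_to_list(three))  # [1, 2, 3]
```

The self-reference in Item.tail is exactly what makes a recursive function with an accumulator the natural way to process such a structure.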
Here is the same function with all of the types defined.
let mult2 (values : int list) =
let rec mult2withAccumulator (values : int list) (accumulator : int list) =
match (values : int list) with
| (headValue : int)::(tailValues : int list) ->
let (newValue : int) = headValue * 2
let (accumulator : int list) =
(((newValue : int) :: (accumulator : int list)) : int list)
mult2withAccumulator tailValues accumulator
| [] ->
((List.rev accumulator) : int list)
mult2withAccumulator values []
So putting values onto the stack and using self referencing discriminated unions with pattern matching will help to solve a lot of problems with functional programming.