Speeding up Erlang indexation function - erlang

So following on from this question:
Erlang lists:index_of function?
I have the following code which works just fine:
-module(test_index_of).
-compile(export_all).
index_of(Q) ->
    N = length(Q),
    Qs = lists:zip(lists:sort(Q), lists:seq(1, N)),
    IndexFn = fun(X) ->
        {_, {_, I}} = lists:keysearch(X, 1, Qs),
        I
    end,
    [IndexFn(X) || X <- Q].
test() ->
    Q = [random:uniform() || _X <- lists:seq(1, 20)],
    {T1, _} = timer:tc(test_index_of, index_of, [Q]),
    io:format("~p~n", [T1]).
Problem is, I need to run the index_of function a very large number of times [10,000] on lists of 20-30 elements; the index_of function is the performance bottleneck in my code. So although it looks reasonably efficient to me, I'm not convinced it's the fastest solution.
Can anyone out there improve [performance-wise] on the current implementation of index_of ? [Zed mentioned gb_trees]
Thanks!

You are optimizing an operation on the wrong data type.
If you are going to make 10,000 lookups on the same list of 20-30 items, then it really pays off to do some pre-computation to speed up those lookups. For example, let's build a tuple of {Key, Index} pairs sorted by key.
1> Ls = [x,y,z,f,o,o].
[x,y,z,f,o,o]
2> Ls2 = lists:zip(Ls, lists:seq(1, length(Ls))).
[{x,1},{y,2},{z,3},{f,4},{o,5},{o,6}]
3> Ts = list_to_tuple(lists:keysort(1, Ls2)).
{{f,4},{o,5},{o,6},{x,1},{y,2},{z,3}}
A recursive binary search for a key on this tuple will very quickly home in on the right index.
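As a rough illustration of that idea, here is a minimal Python sketch (not the answerer's Erlang code). It assumes the keys are mutually comparable, and the names build_index and index_of are made up for this example. Positions are 1-based to match the shell session above.

```python
from bisect import bisect_left

def build_index(items):
    """Return parallel tuples (keys, positions), sorted by key.

    Positions are 1-based. The sort is stable, so among duplicate keys
    the earliest original position comes first.
    """
    pairs = sorted(zip(items, range(1, len(items) + 1)), key=lambda p: p[0])
    keys = tuple(k for k, _ in pairs)
    positions = tuple(i for _, i in pairs)
    return keys, positions

def index_of(key, index):
    """Binary-search the precomputed index; return None if key is absent."""
    keys, positions = index
    pos = bisect_left(keys, key)  # leftmost slot where key could be inserted
    if pos < len(keys) and keys[pos] == key:
        return positions[pos]
    return None

# Precompute once, then reuse the index for every lookup.
index = build_index(["x", "y", "z", "f", "o", "o"])
print(index_of("o", index))
```

Because the search takes the leftmost match, a duplicated key resolves to its first occurrence without a separate de-duplication step.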
If it is wrong to return 6 instead of 5 when looking up 'o', remove the duplicates first. One option is lists:ukeysort/2 in place of lists:keysort/2, which keeps only the first tuple for each key. Or use folding and sets to implement your own filter that removes duplicates.
Try building a dict with dict:from_list/1 and make lookups on that dict instead.
But this still begs the question: Why do you want the index into a list of something? Lookups with lists:nth/2 have O(n) complexity.

Not sure if I understand this completely, but if the above is your actual use case, then...
First of all, you could generate Q as follows, which already saves the zipping step.
Q=[{N,random:uniform()} || N <- lists:seq(1, 20)]
Taking this further, you could generate a tree indexed by the values from the beginning:
Tree = lists:foldl(
    % lists:foldl/3 calls the fun with (Element, Accumulator)
    fun(N, T) -> gb_trees:enter(random:uniform(), N, T) end,
    gb_trees:empty(),
    lists:seq(1, 20)
).
Then looking up your index becomes:
index_of(Item, Tree) ->
    case gb_trees:lookup(Item, Tree) of
        {value, Index} -> Index;
        _ -> not_found
    end.

I think you need a custom sort function that records the permutation it applies to the input list. For example, you could adapt the lists:sort source. This should give you O(N*log N) performance.
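One way to read that suggestion, sketched in Python rather than Erlang: sort the positions instead of the values, so the sort itself yields the permutation, then invert it. The function name ranks is made up for this example. Note one assumption that differs from the question's code: duplicate values get distinct consecutive ranks here, whereas the keysearch version gives every duplicate the rank of the first match.

```python
def ranks(values):
    """For each element, return its 1-based position in sorted order."""
    # order[r] is the original position of the element that sorts r-th.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    # Invert the permutation: one O(N) pass after the O(N log N) sort.
    for rank, original_pos in enumerate(order, start=1):
        result[original_pos] = rank
    return result

print(ranks([0.3, 0.1, 0.2]))
```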

Just one question: WTF are you trying to do?
I just can't work out what the practical purpose of this function is. I think you are doing something odd. It seems that you have just improved from O(NM^2) to O(NM*logM), but that is still very bad.
EDIT:
Having pieced together what the goal is, it seems you are trying to use a Monte Carlo method to determine the probabilities of teams' 'finishing positions' in the English Premier League. But I'm still not sure. You can determine the most probable position ([1,1,2] -> 1) or a fractional number as some sort of average (1.33); the latter can be achieved with less effort than the others.
In functional programming languages, data structures are more important than in procedural or OO ones. Those are more about workflow: you do this, and then this, and then ... In a functional language such as Erlang you should think along the lines of: I have this input and I want that output; the required output I can determine from this and this, and so on. It may not be necessary to have a list of things the way you are used to in procedural approaches.
In procedural approaches you are used to using arrays for storage with constant-time random access. A list is not such a thing. There are no arrays in Erlang that you can write to (even the array module is in reality a balanced tree). You can use a tuple or a binary as a read-only array, but there is no read-write one. I could write a lot about how no data structure really has constant access time (from RAM, through arrays in procedural languages, to hash maps), but there is not enough space here to explain it in detail (from RAM technology, through L{1,2,3} CPU caches, to the need to increase the hash length as the number of keys grows and the dependence of hash computation on key length).
A list is a data structure with O(N) random access time. It is the best structure for storing data that you want to take one by one, in the same order as stored. For small N it can be a capable structure for random access, provided the corresponding constant is small. For example, when N is the number of teams (20) in your problem, it can be faster than O(logN) access to some sort of tree. But you must take care how big your constant is.
One common component of algorithms is the key-value lookup. In the procedural world, arrays can serve as the supporting data structure in some circumstances: the key must be an integer, the space of possible keys must not be too sparse, and so on. A list is not a good substitute for this purpose, except for very small N. I have learned that the best way to write functional code is to avoid key-value lookups where they are unnecessary. This often requires rearranging the workflow or refactoring data structures. Sometimes it looks like turning the solution inside out like a glove.
I will ignore that your probability model is wrong. From the information you provide, it seems that in your model the teams' season points are independent random events, which is of course not true. It is impossible for all teams to have some high number of points, 82 for example, simply because there is a limit on the total points taken by all teams in one season. So forget this for now. I would then simulate one 'path' (a season) and take the result in the form [{78,'Liverpool'}, {81,'Man Utd'}, ...]; then I can sort it using lists:sort without losing the information about which team is where. I would collect the results by iterating over paths. For each path I would iterate over the sorted simulation result and collect it in a dictionary where the team is the key (constant and very cheap hash computation from an atom, and constant storage because the key set is fixed; tuples/records are possible but seem like premature optimization). The value can be a tuple of size 20, where the position in the tuple is the final position and the value is its count.
Something like:
% Process simulation results.
% Results = [[{Points, Team}]]
process(Results) ->
    lists:foldl(fun process_path/2,
                dict:from_list([{Team, {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}} ||
                                Team <- ['Liverpool', 'Man Utd', ...]]),
                Results).
% process simulation path result
process_path(R, D) ->
    process_path(lists:reverse(lists:sort(R)), D, 1).
% process_path/3 takes (SortedPath, Dict, Pos); return the dict when the path is empty
process_path([], D, _) -> D;
process_path([{_, Team}|R], D, Pos) ->
    process_path(R, update_team(Team, Pos, D), Pos + 1).
% update team position count in dictionary
update_team(Team, Pos, D) ->
    dict:update(Team, fun(T) -> add_count(T, Pos) end, D).
% Add final position Pos to tuple T of counts
add_count(T, P) -> setelement(P, T, element(P, T) + 1).
Notice that there is nothing like a lists:index_of or lists:nth call. The resulting complexity looks like O(NM) or O(NMlogM) for a small number M of teams, but the real complexity is O(NM^2), because setelement/3 in add_count/2 is O(M). For bigger M you should change add_count/2 to something more reasonable.
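For readers who want to try the same aggregation outside Erlang, here is a minimal Python sketch of the approach. It is an analogue, not a translation: the function name tally_positions, the third team and the second season's points are hypothetical sample data, and positions are 0-based. Since Python lists are mutable, each count update is O(1), so the O(M) cost of setelement/3 does not arise in this version.

```python
def tally_positions(results, teams):
    """Count how often each team finishes in each position.

    results: iterable of simulated seasons, each a list of (points, team).
    teams:   the fixed collection of team names.
    """
    counts = {team: [0] * len(teams) for team in teams}
    for season in results:
        # Highest points first; index 0 is first place.
        ranked = sorted(season, reverse=True)
        for pos, (_points, team) in enumerate(ranked):
            counts[team][pos] += 1
    return counts

# Hypothetical sample data: two simulated seasons with three teams.
teams = ["Liverpool", "Man Utd", "Arsenal"]
results = [
    [(78, "Liverpool"), (81, "Man Utd"), (70, "Arsenal")],
    [(85, "Liverpool"), (80, "Man Utd"), (72, "Arsenal")],
]
counts = tally_positions(results, teams)
print(counts)
```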

Related

What is the time complexity of sorting in the Stream API with the sorted() and thenComparing() methods (sorting by multiple fields)?

I have read that the sorted() method in the Stream API may use merge sort.
The time complexity for that kind of sort would then be:
Big Θ(n log n) - best
Big Ω(n log n) - average
Big O(n log n) - worst
space complexity - O(n) - worst
And what is the time complexity if we sort by multiple fields of a custom object, using thenComparing() to build a chain of comparisons?
How would you calculate the time complexity in such a case?
While the actual algorithm used in Stream.sorted is intentionally unspecified, there are obvious reasons not to implement another sorting algorithm, but use the existing implementation of Arrays.sort.
The current implementation uses TimSort, a variation of merge sort that can exploit ranges of pre-sorted elements within the input, with a best case of being entirely linear, when the input is already sorted, which also applies to the possible case that the input is sorted backwards. In these cases, no additional memory is needed. The average case is somewhere between that best case and the unchanged worst case of O(n log n).
As explained in this answer, general statements about the algorithms used in Arrays.sort are tricky, because all of them are hybrid sort algorithms and constantly improved.
Normally, a comparison function does not depend on the size of the input (the array or collection to sort), which doesn’t change when using Comparator.comparing(…).thenComparing(…), as more expensive comparison functions only add a constant factor that doesn’t affect the overall time complexity, as long as the comparator still doesn’t depend on the input size.
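The argument can be checked empirically by counting comparator calls. The sketch below uses Python rather than Java, on the assumption that CPython's Timsort-derived sorted behaves closely enough to illustrate the point. The helper names are made up for this example, and by_last_then_first plays the role of comparing(...).thenComparing(...).

```python
import random
from functools import cmp_to_key

def count_comparisons(records, compare):
    """Sort a copy of records; return (sorted_list, number_of_comparator_calls)."""
    calls = 0

    def counting(a, b):
        nonlocal calls
        calls += 1
        return compare(a, b)

    return sorted(records, key=cmp_to_key(counting)), calls

def by_last(a, b):
    """Compare on the first field only."""
    return (a[0] > b[0]) - (a[0] < b[0])

def by_last_then_first(a, b):
    """Chained comparator: the second field is consulted only on a tie."""
    return by_last(a, b) or (a[1] > b[1]) - (a[1] < b[1])

n = 1024
rng = random.Random(0)
records = [(rng.randrange(50), rng.randrange(50)) for _ in range(n)]

out, shuffled_calls = count_comparisons(records, by_last_then_first)
_, presorted_calls = count_comparisons(sorted(records), by_last_then_first)
print(shuffled_calls, presorted_calls)
```

Each call to the chained comparator does a bounded amount of work regardless of n, so the total cost is the number of calls (governed by the sort algorithm) times a constant.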

Why is splitting Rust's std::collections::LinkedList O(n)?

The .split_off method on std::collections::LinkedList is described as having an O(n) time complexity. From the (docs):
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
Why not O(1)?
I know that linked lists are not trivial in Rust. There are several resources going into the how's and why's like this book and this article among several others, but I haven't got the chance to dive into those or the standard library's source code yet.
Is there a concise explanation about the extra work needed when splitting a linked list in (safe) Rust?
Is this the only way? And if not why was this implementation chosen?
The method LinkedList::split_off(&mut self, at: usize) first has to traverse the list from the start (or the end) to the position at, which takes O(min(at, n - at)) time. The actual split off is a constant time operation (as you said). And since this min() expression is confusing, we just replace it by n which is legal. Thus: O(n).
Why was the method designed like that? The problem goes deeper than this particular method: most of the LinkedList API in the standard library is not really useful.
Due to its cache unfriendliness, a linked list is often a bad choice to store sequential data. But linked lists have a few nice properties which make them the best data structure for a few, rare situations. These nice properties include:
Inserting an element in the middle in O(1), if you already have a pointer to that position
Removing an element from the middle in O(1), if you already have a pointer to that position
Splitting the list into two lists at an arbitrary position in O(1), if you already have a pointer to that position
Notice anything? The linked list is designed for situations where you already have a pointer to the position that you want to do stuff at.
Rust's LinkedList, like many others, just stores a pointer to the start and end. To have a pointer to an element inside the linked list, you need something like an Iterator. In our case, that's IterMut. An iterator over a collection can function like a pointer to a specific element and can be advanced carefully (i.e. not with a for loop). And in fact, there is IterMut::insert_next which allows you to insert an element in the middle of the list in O(1). Hurray!
But this method is unstable. And methods to remove the current element or to split the list off at that position are missing. Why? Because of the vicious circle that is:
1. LinkedList lacks almost all features that make linked lists useful at all
2. Thus (nearly) everyone recommends not to use it
3. Thus (nearly) no one uses LinkedList
4. Thus (nearly) no one cares about improving it
5. Goto 1
Please note that there are a few brave souls occasionally trying to improve the situation. There is the tracking issue about insert_next, where people argue that Iterator might be the wrong concept to perform these O(1) operations and that we want something like a "cursor" instead. And here someone suggested a bunch of methods to be added to IterMut (including cut!).
Now someone just has to write a nice RFC and someone needs to implement it. Maybe then LinkedList won't be nearly useless anymore.
Edit 2018-10-25: someone did write an RFC. Let's hope for the best!
Edit 2019-02-21: the RFC was accepted! Tracking issue.
Maybe I'm misunderstanding your question, but in a linked list, the links of each node have to be followed to proceed to the next node. If you want to get to the third node, you start at the first, follow its link to the second, then finally arrive at the third.
This traversal's complexity is proportional to the target node index n because n nodes are processed/traversed, so it's a linear O(n) operation, not a constant time O(1) operation. The part where the list is "split off" is of course constant time, but the overall split operation is dominated by the O(n) cost of getting to the split-off node before the split can even be made.
One way in which it could be O(1) would be if a pointer existed to the node after which the list is split off, but that is different from specifying a target node index. Alternatively, an index could be kept mapping the node index to the corresponding node pointer, but it would be extra space and processing overhead in keeping the index updated in sync with list operations.
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
The documentation is either:
unclear, if n is supposed to be the index,
pessimistic, if n is supposed to be the length of the list (the usual meaning).
The proper complexity, as can be seen in the implementation, is O(min(at, n - at)) (whichever is smaller). Since at can be at most n, the documentation is correct that O(n) is a bound on the complexity (reached for at = n / 2); however, such a large bound is unhelpful.
That is, the fact that list.split_off(5) takes the same time if list.len() is 10 or 1,000,000 is quite important!
As to why this complexity, this is an inherent consequence of the structure of doubly-linked list. There is no O(1) indexing operation in a linked-list, after all. The operation implemented in C, C++, C#, D, F#, ... would have the exact same complexity.
Note: I encourage you to write a pseudo-code implementation of a linked-list with the split_off operation; you'll realize this is the best you can get without altering the data-structure to be something else.
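Taking up that suggestion, here is a minimal Python sketch of a doubly linked list with split_off. It is not the Rust standard library's implementation: the class names are made up, and split_off additionally returns the number of links followed (which the Rust method does not) purely to make the cost visible.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self, values=()):
        self.head = None
        self.tail = None
        self.length = 0
        for v in values:
            self.push_back(v)

    def push_back(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.length += 1

    def split_off(self, at):
        """Detach the nodes from index `at` onward; return (new_list, steps)."""
        if at > self.length:
            raise IndexError("split index out of range")
        rest = DoublyLinkedList()
        if at == self.length:
            return rest, 0
        steps = 0
        if at <= self.length - at:
            # Closer to the head: walk forward.
            node = self.head
            for _ in range(at):
                node = node.next
                steps += 1
        else:
            # Closer to the tail: walk backward.
            node = self.tail
            for _ in range(self.length - 1 - at):
                node = node.prev
                steps += 1
        # From here on, everything is constant-time relinking.
        rest.head, rest.tail = node, self.tail
        rest.length = self.length - at
        self.tail = node.prev
        self.length = at
        if node.prev is None:
            self.head = None
        else:
            node.prev.next = None
            node.prev = None
        return rest, steps

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

lst = DoublyLinkedList(range(10))
rest, steps = lst.split_off(5)
print(lst.to_list(), rest.to_list(), steps)
```

The only non-constant part is the walk to the split node, which is why the cost depends on the distance to the nearer end rather than on the list length.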

working on sequences with lagged operators

A machine is turning on and off.
seqStartStop is a seq<DateTime*DateTime> that collects the start and end time of the execution of the machine tasks.
I would like to produce the sequence of periods where the machine is idle. In order to do so, I would like to build a sequence of tuples (beginIdle, endIdle).
beginIdle corresponds to the stopping time of the machine during the previous cycle.
endIdle corresponds to the start time of the current production cycle.
In practice, I have to build (beginIdle, endIdle) by taking the second element of the tuple for i-1 and the first element of the following tuple i.
I was wondering how I could get this task done without converting seqStartStop to an array and then looping through the array in an imperative fashion.
Another idea is to create two copies of seqStartStop, one with the last element removed and one with the head removed (shifting the elements backwards), and then apply map2.
I could use skip and take as described here.
All of these seem rather cumbersome. Is there something more straightforward?
In general, I was wondering how to execute calculations on elements with different lags in a sequence.
You can implement this pretty easily with Seq.pairwise and Seq.map:
open System
let idleTimes (startStopTimes : seq<DateTime * DateTime>) =
    startStopTimes
    |> Seq.pairwise
    // Seq.pairwise yields a tuple of two consecutive elements
    |> Seq.map (fun ((_, stop), (start, _)) ->
        stop, start)
As for the more general question of executing on sequences with different lag periods, you could implement that by using Seq.skip and Seq.zip to produce a combined sequence with whatever lag period you require.
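The skip-and-zip idea for an arbitrary lag can be sketched as follows. This is Python rather than F#, with islice standing in for Seq.skip and zip for Seq.zip. The function names are made up for this example, and plain integers stand in for DateTime values.

```python
from itertools import islice, tee

def lagged_pairs(iterable, lag=1):
    """Yield (x[i - lag], x[i]) for every i >= lag."""
    # tee lets both views share one pass over the source; it buffers
    # roughly `lag` items between the two iterators.
    earlier, later = tee(iterable)
    return zip(earlier, islice(later, lag, None))

def idle_times(start_stop_times):
    """Pair each stop time with the start time of the following cycle."""
    return [(prev_stop, next_start)
            for (_, prev_stop), (next_start, _) in lagged_pairs(start_stop_times)]

# Integers stand in for timestamps: (start, stop) per production cycle.
print(idle_times([(0, 5), (8, 12), (15, 20)]))
print(list(lagged_pairs(range(5), lag=2)))
```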
The idea of using map2 with two copies of the sequence, one slightly shifted by taking the tail of the original sequence, is quite a standard one in functional programming, so I would recommend that route.
The Seq.map2 function is fine with working with lists with different lengths - it just stops when you reach the end of the shorter list - so you don't need to chop the last element of the original copy.
One thing to be careful of is how your original seq<DateTime*DateTime> is calculated. It will be recalculated each time it is enumerated, so with the map2 idea it will be calculated twice. If it's cheap to calculate and doesn't involve side-effects, this is fine. Otherwise, convert it to a list first with List.ofSeq.
You can still use Seq.map2 on lists as a list is an IEnumerable (i.e. a seq). Don't use List.map2 unless the lists are the same length though as it is more picky than Seq.map2.

Splitting and runtime of log n

Sorry, I made a mistake in my earlier question. Because of that I didn't get the answer I wanted.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n. However, we may run into a case with LinkedList where we may be easily misled. For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n. Logically, we may think that the run time would be log n, but it's not. Why is that? And how do you determine that?
Do we need to absolutely have splitting to get a run-time of log n? I don't think it makes any logical sense to say the run-time of n when the maximum run-time of the loop is n/2.
I think some concepts need a bit of refining here, because the time complexity is only related to the algorithm, not to the size of the data structure you're operating on.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n.
Now, traversing an array, like
for (int i = 0; i < array.size; i++) {
    variable = array[i];
}
runs in O(n): the time needed to perform such an operation varies linearly with the size of the array. You will have O(log n) for operations like a binary search on an array, but you cannot generalize this concept to all array operations, and especially not to those that need to iterate over the array.
Now, this sentence
For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n.
leads me to believe that you think that the n as used in big O and what you call the "nth element" are directly related. They aren't. On a linked list your only option to go to element n is to go to the start of the list and follow the links down to the element you're looking for (or, in the case of a doubly linked list, go to the start or end depending on the position of the element you're looking for), so this operation has a time complexity of O(n), i.e. linearly related to the length of the collection.
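To make the difference concrete, this small Python sketch counts steps instead of measuring time. The function names are made up for this example. It compares a full traversal, reaching the middle of a doubly linked list from the nearer end, and repeated halving as in binary search.

```python
def full_traversal_steps(n):
    """Visit every one of n elements once."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def nearer_end_steps(n, k):
    """Links followed to reach index k from whichever end is closer."""
    return min(k, n - 1 - k)

def halving_steps(n):
    """How many times a range of size n can be halved before it is empty."""
    steps = 0
    while n > 0:
        n //= 2
        steps += 1
    return steps

for n in (1024, 2048):
    print(n, full_traversal_steps(n), nearer_end_steps(n, n // 2), halving_steps(n))
```

Doubling n roughly doubles the first two counts, so both are O(n) even though one of them never exceeds n/2. The halving count only grows by one, which is what O(log n) means.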

find the last item but N in a stream w/o storing n items

Suppose there is a stream of data arriving, D(0), D(1), D(2), .... When D(i) comes, I want to know D(i - N). The most straightforward way is to store the most recent N items and keep updating them upon arrival of new data. But the problem is that N can be so large that there is not enough memory to store them. Is there any way to achieve this by storing far fewer than N items? Constant space for M << N items would be preferred. Thanks in advance.
Not as far as I can see, unless there is some regularity in the data that you can exploit. If the data are completely random (such that no element can be inferred from the others), then a choice of not saving element k will make it impossible to reproduce that element in iteration k + N.
Instead, consider:
Can you reduce N?
Can you store information on disk or (if you are in an embedded environment) on a slower, cheaper form of memory?
Is there some pattern in the data? If there is e.g. a repeating pattern, you can utilize that, or if there is some mathematical relationship between the numbers, perhaps some formula can aid in reconstructing one number from others. Even if there is no perceptible pattern, perhaps you could use some compression algorithm to reduce the data size?
Is there some limitation to the data, e.g. every number is between 0 and 255? If so, you could perhaps reduce the storage requirements.
(What is the application of this, by the way?)
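For reference, the straightforward approach from the question combined with the last suggestion can be sketched as a fixed-size ring buffer. This Python sketch assumes every item is an integer between 0 and 255, so one byte per item suffices. The class name DelayLine is made up for this example. It still stores N items, only more compactly.

```python
class DelayLine:
    """Return D(i - N) when D(i) arrives, using N bytes of storage.

    Assumption: every item is an integer in the range 0..255.
    """

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be positive")
        self.buf = bytearray(n)  # one byte per remembered item
        self.pos = 0             # slot that holds the oldest item
        self.seen = 0            # how many items have arrived so far

    def push(self, value):
        """Store value; return the item from n steps ago, or None if too early."""
        old = self.buf[self.pos] if self.seen >= len(self.buf) else None
        self.buf[self.pos] = value  # overwrite the oldest slot
        self.pos = (self.pos + 1) % len(self.buf)
        self.seen += 1
        return old

line = DelayLine(3)
print([line.push(v) for v in [10, 20, 30, 40, 50]])
```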
