Comparing arrays elements in Erlang - erlang

I'm trying to learn how to think in a functional programming way, for this, I'm trying to learn Erlang and solving easy problems from codingbat. I came with the common problem of comparing elements inside a list. For example, compare a value of the i-th position element with the value of the i+1-th position of the list. So, I have been thinking and searching how to do this in a functional way in Erlang (or any functional language).
Please, be gentle with me, I'm very newb in this functional world, but I want to learn
Thanks in advance

Define a list:
L = [1,2,3,4,4,5,6]
Define a function f, which takes a list
If it matches a list of one element or an empty list, return the empty list
If it matches the first element and the second element then take the first element and construct a new list by calling the rest of the list recursivly
Otherwise skip the first element of the list.
In Erlang code
f ([]) -> [];
f ([_]) -> [];
f ([X, X|Rest]) -> [X | f(Rest)];
f ([_|Rest]) -> f(Rest).
Apply function
f(L)
This should work... haven't compiled and run it but it should get you started. Also in case you need to do modifications to it to behave differently.
Welcome to Erlang ;)

I try to be gentle ;-) So main thing in functional approach is thinking in terms: What is input? What should be output? There is nothing like comparing the i-th element with the i+1-th element alone. There have to be always purpose of it which will lead to data transformation. Even Mazen Harake's example doing it. In this example there is function which returns only elements which are followed by same value i.e. filters given list. Typically there are very different ways how do similar thing which depends of purpose of it. List is basic functional structure and you can do amazing things with it as Lisp shows us but you have to remember it is not array.
Each time you need access i-th element repeatable it indicates you are using wrong data structure. You can build up different data structures form lists and tuples in Erlang which can serve your purposes better. So when you face problem to compare i-th with i+1-th element you should stop and think. What is purpose of it? Do you need perform some stream data transformation as Mazen Harake does or You need random access? If second you should use different data structure (array for example). Even then you should think about your task characteristics. If you will be mostly read and almost never write then you can use list_to_tuple(L) and then read using element/2. When you need write occasionally you will start thinking about partition it to several tuples and as your write ratio will grow you will end up with array implementation.
So you can use lists:nth/2 if you will do it only once or several times but on short list and you are not performance freak as I'm. You can improve it using [X1,X2|_] = lists:nthtail(I-1, L) (L = lists:nthtail(0,L) works as expected). If you are facing bigger lists and you want call it many times you have to rethink your approach.
P.S.: There are many other fascinating data structures except lists and trees. Zippers for example.

Related

What are the steps in doing incrementation in erlang?

increment([]) -> [];
increment([H|T]) -> [H+1|increment(T)].
decrement([]) -> [];
decrement([H|T]) -> [H-1|decrement(T)].
So I have this code but I don't know how they properly work like in java.
Java and Erlang are different beasts. I don't recommend trying to make comparisons to Java when learning Erlang, especially if Java is the only language you know so far. The code you've posted is a good example of the paradigm known as "functional programming". I'd suggest doing some reading on that subject to help you understand what's going on. To try to break this down as far as Erlang goes, you need to understand that an Erlang function is completely different from a Java method.
In Java, your method signature is composed of the method name and the types of its arguments. The return type can also be significant. A Java increment method like the function you wrote might be written like List<Integer> increment(List<Integer> input). The body of the Java method would probably iterate through the list an element at a time and set each element to itself plus one:
List<Integer> increment(List<Integer> input) {
for (int i = 0; i < input.size; i++) {
input.set(i, input.get(i) + 1);
}
}
Erlang has almost nothing in common with this. To begin with, an erlang function's "signature" is the name and arity of the function. Arity means how many arguments the function accepts. So your increment function is known as increment/1, and that's its unique signature. The way you write the argument list inside the parentheses after the function name has less to do with argument types than with the pattern of the data passed to it. A function like increment([]) -> ... can only successfully be called by passing it [], the empty list. Likewise, the function increment([Item]) -> ... can only be successfully called by passing it a list with one item in it, and increment([Item1, Item2]) -> ... must be passed a list with two items in it. This concept of matching data to patterns is quite aptly known as "pattern matching", and you'll find it in many functional languages. In Erlang functions, it's used to select which head of the function to execute. This bears a rough similarity to Java's method overloading, where you can have many methods with the same name but different argument types; however a pattern in an Erlang function head can bind variables to different pieces of the arguments that match the pattern.
In your code example, the function increment/1 has two heads. The first head is executed only if you pass an empty list to the function. The second head is executed only if you pass a non-empty list to the function. When that happens, two variables, H and T, are bound. H is bound to the first item of the list, and T is bound to the rest of the list, meaning all but the first item. That's because the pattern [H|T] matches a non-empty list, including a list with one element, in which case T would be bound to the empty list. The variables thus bound can be used in the body of the function.
The bodies of your functions are a very typical form of iterating a list in Erlang to produce a new list. It's typical because of another important difference from Java, which is that Erlang data is immutable. That means there's no such concept as "setting an element of a list" like I did in the Java code above. If you want to change a list, you have to build a new one, which is what your code does. It effectively says:
The result of incrementing the empty list is the empty list.
The result of incrementing a non-empty list is:
Take the first element of the list: H.
Increment the rest of the list: increment(T).
Prepend H+1 to the result of incrementing the rest of the list.
Note that you want to be careful about how you build lists in Erlang, or you can end up wasting a lot of resources. The List Handling User's Guide is a good place to learn about that. Also note that this code uses a concept known as "recursion", meaning that the function calls itself. In many popular languages, including Java, recursion is of limited usefulness because each new function call adds a stack frame, and your available memory space for stack frames is relatively limited. Erlang and many functional languages support a thing known as "tail call elimination", which is a feature that allows properly written code to recurse indefinitely without exhausting any resources.
Hopefully this helps explain things. If you can ask a more specific question, you might get a better answer.

How do I efficiently search through an ordered list?

I have a function that predicts a words being typed and returns the possibilities in an array. Unfortunately those aren’t sorted by frequency used. So I have a list of 10K ordered words listed by most frequent to less frequent. What would be an efficient way to compare the words in the array and the ordered list to return the most frequent one? (i.e the one it encounters first?)
I was tipped off by a friend to use a binary search tree but I really don't see how that helps me. From what I understood from the following website, only numerical values can be used.. Am I wrong in thinking so? Is there a better way of doing the aforementioned task?
Thanks in advance
You could create a dictionary with words as keys and frequencies as values. Then iterate over your result array, use the dictionary to obtain the frequency value for each item, and predict the item with the highest frequency.
I wouldn't use a vanilla binary search tree here. It would be possible - as Taylor Kirkpatrick says, you could just create a tree with words as keys and frequencies and use that to find the frequency for each result word, in much the same way as the dictionary solution.
The problem is that you cannot guarantee that a simple binary tree will be balanced. From the sound of it your data would probably be OK, since your words are in frequency order. The worst case would be if the words were in alphabetic order - then your binary tree would end up being identical to a linked list - it would never branch, since every node would attach to the right of the previous one. So the computational complexity of a search would be the same as iterating over the array of words - O(n) instead of O(log2N) (which is the best case for binary trees).
Of course, you could guard against this by randomising the list of words before doing the insert. But to my mind it's just easier to use a dictionary. I don't know what the actual implementation of Swift dictionaries is (and we won't until they open source it in a couple of months), but you can take it as read that it will out perform a vanilla BT for value retrieval.
I don't know what the background to this problem is - if you are learning CS it might be worth implementing the BST just for intellectual growth - in this case, with only 10,000 items you might find the performance differences are ultimately quite small. But if you are a working programmer trying to solve a problem, go with the dictionary approach.
You put all your words into a dictionary or a set. That's it. Dictionary if you have data associated with the words, set if you have no data and just want to know if the word is in the list or not.
You might want to use a Trie.
Put your word list into it. For every character entered, you traverse the Trie as deeply as you can and then show all paths to leaf nodes as possible completions.
Since the world like you have is likely static, you can precompute the Trie and load from disk/network/whatever at startup if performance is a concern.
You can use a binary search tree with anything as the actual value. To actually make use of the tree, use the frequency of the words as the numerical value. This is actually a pretty good solution to your problem. Each node of the tree will contain this word and a numeric value that represents the frequency of the word.
Here are a few links to help you out with making it.
Hope that helps.

How to create a string outside of Erlang that represents a DICT Term?

I want to construct a string in Java that represents a DICT term and that will be passed to an Erlang process for being reflected back as an erlang term ( string-to-term ).
I can achieve this easily for ORDDICT 's, since they are structured as a simple sorted key / value pair in a list of tuples such as : [ {field1 , "value1"} , {field2 , "value2} ]
But, for DICTS, they are compiled into a specific term that I want to find how to reverse-engineer it. I am aware this structure can change over new releases, but the benefits for performance and ease of integration to Java would overcome this. Unfortunately Erlang's JInterface is based on simple data structures. An efficient DICT type would be of great use.
A simple dict gets defined as follows:
D1 = dict:store("field1","AAA",dict:new()).
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],
[["field1",65,65,65]],
[],[],[],[],[],[],[]}}}
As it can be seen above, there are some coordinates which I do not understand what they mean ( the numbers 1,16,16,8,80,48 and a set of empty lists, which likely represent something as well.
Adding two other rows (key-value pairs) causes the data to look like:
D3 = dict:store("field3","CCC",D2).
{dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],
[["field3",67,67,67]],
[],[],[],[],[],
[["field1",65,65,65]],
[],[],[],[],
[["field2",66,66,66]],
[],[]}}}
From the above I can notice that:
the first number (3) reppresets the number of items in the DICT.
the second number (16) shows the number of list slots in the first tuple of lists
the third number (16) shows the number of list slots in the second typle of lists, of which the values ended up being placed on ( in the middle ).
the fourth number (8) appears to be the number of slots in the second row of tuples from where the values are placed ( a sort of index-pointer )
the remaining numbers (80 and 48)... no idea...
adding a key "field0" gets placed not in the end but just after "field1"'s data. This indicates the indexing approach.
So the question, is there a way (algorithm) to reliably directly create a DICT string from outside of Erlang ?
The comprehensive specification how dict is implemented can be found simply in the dict.erl sourcecode.
But I'm not sure replicating dict.erl's implementation in Java is worthwhile. This would only make sense if you want a fast dict like data structure that you need to pass often between Java and Erlang code. It might make more sense to use a Key-Value store both from Erlang and Java without passing it directly around. Depending on your application this could be e.g. riak or maybe even connect your different language worlds with RabbitMQ. Both examples are implemented in Erlang and are easily accessible from both worlds.

Why don't F# lists have a tail pointer

Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?). What advantages does dropping the tail pointer have?
F#, like many other functional[-ish] languages, has a cons-list (the terminology originally comes from LISP, but the concept is the same). In F# the :: operator (or List.Cons) is used for cons'ing: note the signature is ‘a –> ‘a list –> ‘a list (see Mastering F# Lists).
Do not confuse a cons-list with an opaque Linked List implementation which contains a discrete first[/last] node - every cell in a cons-list is the start of a [different] list! That is, a "list" is simply the chain of cells that starts at a given cons-cell.
This offers some advantages when used in a functional-like manner: one is that all the "tail" cells are shared and since each cons-cell is immutable (the "data" might be mutable, but that's a different issue) there is no way to make a change to a "tail" cell and flub up all the other lists which contain said cell.
Because of this property, [new] lists can be efficiently built - that is, they do not require a copy - simply by cons'ing to the front. In addition, it is also very efficient to deconstruct a list to head :: tail - once again, no copy - which is often very useful in recursive functions.
This immutable property generally does not exist in a [standard mutable] Double Linked List implementation in that appending would add side-effects: the internal 'last' node (the type is now opaque) and one of the "tail" cells are changed. (There are immutable sequence types that allow an "effectively constant time" append/update such as immutable.Vector in Scala -- however, these are heavy-weight objects compared to a cons-list that is nothing more than a series of cells cons'ed together.)
As mentioned, there are also disadvantages a cons-list is not appropriate for all tasks - in particular, creating a new list except by cons'ing to the head is an O(n) operation, fsvo n, and for better (or worse) the list is immutable.
I would recommend creating your own version of concat to see how this operation is really done. (The article Why I love F#: Lists - The Basics covers this.)
Happy coding.
Also see related post: Why can you only prepend to lists in functional languages?
F# lists are immutable, there's no such thing as "append/concat", rather there's just creating new lists (that may reuse some suffixes of old lists). Immutability has many advantages, outside the scope of this question. (All pure languages, and most functional languages have this data structure, it is not an F#-ism.)
See also
http://diditwith.net/2008/03/03/WhyILoveFListsTheBasics.aspx
which has good picture diagrams to explain things (e.g. why consing on the front is cheaper than at the end for an immutable list).
In addition to what the others said: if you need efficient, but yet immutable data structures (which should be an idiomatic F# way), you have to consider reading Chris Okasaki, Purely Functional Data Structures. There is also a thesis available (on which the book is based).
In addition to what has been already said, the Introducing Functional Programming section on MSDN has an article about Working with Functional Lists that explains how lists work and also implements them in C#, so it may be a good way to understand how they work (and why adding reference to the last element would not allow efficient implementation of append).
If you need to append things to the end of the list, as well as to the front, then you need a different data structure. For example, Norman Ramsey posted source code for DList which has these properties here (The implementation is not idiomatic F#, but it should be easy to fix).
If you find you want a list with better performance for append operations, have a look at the QueueList in the F# PowerPack and the JoinList in the FSharpx extension libraries.
QueueList encapsulates two lists. When you prepend using the cons, it prepends an element to the first list, just as a cons-list. However, if you want to append a single element, it can be pushed to the top of the second list. When the first list runs out of elements, List.rev is run on the second list, and the two are swapped putting your list back in order and freeing the second list to append new elements.
JoinList uses a discriminated union to more efficiently append whole lists and is a bit more involved.
Both are obviously less performant for standard cons-list operations but offer better performance for other scenarios.
You can read more about these structures in the article Refactoring Pattern Matching.
As others have pointed out, an F# list could be represented by a data structure:
List<T> { T Value; List<T> Tail; }
From here, the convention is that a list goes from the List you have a reference to until Tail is null. Based on that definition, the benefits/features/limitations in the other answers come naturally.
However, your original question seems to be why the list is not defined more like:
List<T> { Node<T> Head; Node<T> Tail; }
Node<T> { T Value; Node<T> Next; }
Such a structure would allow both appending and prepending to the list without any visible effects to the a reference to the original list, since it still only sees a "window" of the now expanded list. Although this would (sometimes) allow O(1) concatenation, there are several issues such a feature would face:
The concatenation only works once. This can lead to unexpected performance behavior where one concatenation is O(1), but the next is O(n). Say for example:
listA = makeList1 ()
listB = makeList2 ()
listC = makeList3 ()
listD = listA + listB //modified Node at tail of A for O(1)
listE = listA + listC //must now make copy of A to concat with C
You could argue that the time savings for the cases where possible are worth it, but the surprise of not knowing when it will be O(1) and when O(n) are strong arguments against the feature.
All lists now take up twice as much space, even if you never plan to concatenate them.
You now have a separate List and Node type. In the current implementation, I believe F# only uses a single type like the beginning of my answer. There may be a way to do what you are suggesting with only one type, but it is not obvious to me.
The concatenation requires mutating the original "tail" node instance. While this shouldn't affect programs, it is a point of mutation, which most functional languages tend to avoid.
Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?).
If by "tail pointer" you mean a pointer from every list to the last element in the list, that alone cannot be used to provide either of the benefits you cite. Although you could then get at the last element in the list quickly, you cannot do anything with it because it is immutable.
You could write a mutable doubly-linked list as you say but the mutability would make programs using it significantly harder to reason about because every function you call with one might change it.
As Brian said, there are purely functional catenable lists. However, they are many times slower at common operations than the simple singly-linked list that F# uses.
What advantages does dropping the tail pointer have?
30% less space usage and better performance on virtually all list operations.

Binary search in Erlang in lg(n) time

I was searching through the possible work arounds for doing Binary search in Erlang and I found http://ruslanspivak.com/2007/08/15/my-erlang-binary-search/ But I was wondering if the solution in blog runs in O(lg n). Now since the recurrence for Binary search is:T(n) = T(n/2) + c which gives me an execution time of O(lg n).
Since in a C array you have the power of accessing any element in O(1) time. But in erlang if accessing the middle of list takes cn time, then binary search runs in linear overall time as poor as linear search.
I came across lists:nth/2 BIF for finding the nth item in a list but I am not sure about its execution time.
Any comments ?
There are a few data structures that allow O(1) access in Erlang: ETS tables, tuples and binaries.
Now, none of them would really be appropriate for a binary search. The ETS table supports searching from the start, and otherwise, data is copied to your process when returning the result, which is likely not going to be optimal for your use case.
Tuples allow O(1) access with element/2, but modifying them has a certain overhead (which is why the array module uses trees of tuples).
Then you have binaries (<<1,2,3,4,5>>), which allow for something similar to pointer arithmetic, like in the following example:
1> Sorted = <<$a,$b,$c,$d,$e,$f,$g,$h>>.
<<"abcdefgh">>
2> <<_:3/binary, X:1/binary, _/binary>> = Sorted.
<<"abcdefgh">>
3> X.
<<"d">>
However, predicting the performance when building the binary is a bit sketchy, and this kind of pointer arithmetic is harder to do if your values have different types and different sizes when represented in a binary.
Your best bet would likely be to use a list of values, sort it, then use list_to_tuple/1 to navigate around it with element/2.
I would however strongly recommend using a tree to do your searching; it would likely be much much simpler to use the gb_tree module to build a balanced tree and still get O(log N) search.
nth is O(n). Use array module for constant access data structure (array as in C - almost).

Resources