Splitting a list in OCaml - linked-list

I am wondering whether there is an option for splitting a list in half (or in general at specified element).
To be precise I would like to do something like this:
Having a list (of integers for example):
let ml = [1;2;3;4;5]
I would like to make two lists out of it with lengths specified with an argument of first list.
It would look like something like this:
let msl1, msl2 = split_at_point ml 3
(* msl1 = [1;2;3], msl2=[4;5] *)
To be honest I don't really care if the splitting occurs at the specific point inclusively or not. All I care is it would be fast and memory saving (no copying would be great). And I know that best in terms of efficiency is something like O(n) where n the length of the first list (of the results).

You can't avoid copying the first half of the list. List are immutable, so the only way to get the new list is to copy elements from the original list.
The second half of the list doesn't require a copy, because the tail of a list is a list.
There's no built-in function for doing this (in the OCaml standard library anyway), so I'd suggest you write your own.
This seems very much like something that would come up a homework assignment, so I'll just say that the way to write most list functions in a functional language is to ask yourself how you could use a function that works for a smaller input (generally, the tail of the list) to calculate the value for your original input.
In your case you have something like this:
let rec split_at_point l n =
if n = 0 then
(* Answer is obvious *)
else
match l with
| [] -> (* Answer is obvious *)
| head :: tail ->
(* Call split_at_point on the tail and
* construct your answer
*)

Related

Erlang flatten function time complexity

I need a help with following:
flatten ([]) -> [];
flatten([H|T]) -> H ++ flatten(T).
Input List contain other lists with a different length
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]).
What is the time complexity of this function?
And why?
I got it to O(n) where n is a number of elements in the Input list.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]) n=3
flatten([[1,2,3],[4,7],[9,9,9,9,9,9],[3,2,4],[1,4,6]]) n=5
Thanks for help.
First of all I'm not sure your code will work, at least not in the way standard library works. You could compare your function with lists:flatten/1 and maybe improve on your implementation. Try lists such as [a, [b, c]] and [[a], [b, [c]], [d]] as input and verify if you return what you expected.
Regarding complexity it is little tricky due to ++ operator and functional (immutable) nature of the language. All lists in Erlang are linked lists (not arrays like in C++), and you can not just add something to end of one without modifying it; before it was pointing to end of list, now you would like it to link to something else. And again, since it is not mutable language you have to make copy of whole list left of ++ operator, which increases complexity of this operator.
You could say that complexity of A ++ B is length(A), and it makes complexity of your function little bit greater. It would go something like length(FirstElement) + (lenght(FirstElement) + length(SecondElement)) + .... up to (without) last, which after some math magic could be simplified to (n -1) * 1/2 * k * k where n is number of elements, and k is average length of element. Or O(n^3).
If you are new to this it might seem little bit odd, but with some practice you can get hang of it. I would recommend going through few resources:
Good explanation of lists and how they are created
Documentation on list handling with DO and DO NOT parts
Short description of ++ operator myths and best practices
Chapter about recursion and tail-recursion with examples using ++ operator

maps,filter,folds and more? Do we really need these in Erlang?

Maps, filters, folds and more : http://learnyousomeerlang.com/higher-order-functions#maps-filters-folds
The more I read ,the more i get confused.
Can any body help simplify these concepts?
I am not able to understand the significance of these concepts.In what use cases will these be needed?
I think it is majorly because of the syntax,diff to find the flow.
The concepts of mapping, filtering and folding prevalent in functional programming actually are simplifications - or stereotypes - of different operations you perform on collections of data. In imperative languages you usually do these operations with loops.
Let's take map for an example. These three loops all take a sequence of elements and return a sequence of squares of the elements:
// C - a lot of bookkeeping
int data[] = {1,2,3,4,5};
int squares_1_to_5[sizeof(data) / sizeof(data[0])];
for (int i = 0; i < sizeof(data) / sizeof(data[0]); ++i)
squares_1_to_5[i] = data[i] * data[i];
// C++11 - less bookkeeping, still not obvious
std::vec<int> data{1,2,3,4,5};
std::vec<int> squares_1_to_5;
for (auto i = begin(data); i < end(data); i++)
squares_1_to_5.push_back((*i) * (*i));
// Python - quite readable, though still not obvious
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
squares_1_to_5.append(x * x)
The property of a map is that it takes a collection of elements and returns the same number of somehow modified elements. No more, no less. Is it obvious at first sight in the above snippets? No, at least not until we read loop bodies. What if there were some ifs inside the loops? Let's take the last example and modify it a bit:
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
if x % 2 == 0:
squares_1_to_5.append(x * x)
This is no longer a map, though it's not obvious before reading the body of the loop. It's not clearly visible that the resulting collection might have less elements (maybe none?) than the input collection.
We filtered the input collection, performing the action only on some elements from the input. This loop is actually a map combined with a filter.
Tackling this in C would be even more noisy due to allocation details (how much space to allocate for the output array?) - the core idea of the operation on data would be drowned in all the bookkeeping.
A fold is the most generic one, where the result doesn't have to contain any of the input elements, but somehow depends on (possibly only some of) them.
Let's rewrite the first Python loop in Erlang:
lists:map(fun (E) -> E * E end, [1,2,3,4,5]).
It's explicit. We see a map, so we know that this call will return a list as long as the input.
And the second one:
lists:map(fun (E) -> E * E end,
lists:filter(fun (E) when E rem 2 == 0 -> true;
(_) -> false end,
[1,2,3,4,5])).
Again, filter will return a list at most as long as the input, map will modify each element in some way.
The latter of the Erlang examples also shows another useful property - the ability to compose maps, filters and folds to express more complicated data transformations. It's not possible with imperative loops.
They are used in almost every application, because they abstract different kinds of iteration over lists.
map is used to transform one list into another. Lets say, you have list of key value tuples and you want just the keys. You could write:
keys([]) -> [];
keys([{Key, _Value} | T]) ->
[Key | keys(T)].
Then you want to have values:
values([]) -> [];
values([{_Key, Value} | T}]) ->
[Value | values(T)].
Or list of only third element of tuple:
third([]) -> [];
third([{_First, _Second, Third} | T]) ->
[Third | third(T)].
Can you see the pattern? The only difference is what you take from the element, so instead of repeating the code, you can simply write what you do for one element and use map.
Third = fun({_First, _Second, Third}) -> Third end,
map(Third, List).
This is much shorter and the shorter your code is, the less bugs it has. Simple as that.
You don't have to think about corner cases (what if the list is empty?) and for experienced developer it is much easier to read.
filter searches lists. You give it function, that takes element, if it returns true, the element will be on the returned list, if it returns false, the element will not be there. For example filter logged in users from list.
foldl and foldr are used, when you have to do additional bookkeeping while iterating over the list - for example summing all the elements or counting something.
The best explanations, I've found about those functions are in books about Lisp: "Structure and Interpretation of Computer Programs" and "On Lisp" Chapter 4..

Immutable members on objects

I have an object that can be neatly described by a discriminated union. The tree that it represents has some properties that can be easily updated when the tree is modified (but remaining immutable) but that are relatively expensive to recalculate.
I would like to store those properties along with the object as cached values but I don't want to put them into each of the discriminated union cases so I figured a member variable would fit here.
The question is then, how do I change the member value (when I modify the tree) without mutating the actual object? I know I could modify the tree and then mutate that copy without ruining purity but that seems like a wrong way to go about it to me. It would make sense to me if there was some predefined way to change a property but so that the result of the operation is a new object with that property changed.
To clarify, when I say modify I mean doing it in a functional way. Like (::) "appends" to the beginning of a list. I'm not sure what the correct terminology is here.
F# actually has syntax for copy and update records.
The syntax looks like this:
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
(example from the MSDN records page - http://msdn.microsoft.com/en-us/library/dd233184.aspx).
This allows the record type to be immutable, and for large parts of it to be preserved, whilst only a small part is updated.
The cleanest way to go about it would really be to carry the 'cached' value attached to the DU (as part of the case) in one way or another. I could think of several ways to implement this, I'll just give you one, where there are separate cases for the cached and non-cached modes:
type Fraction =
| Frac of int * int
| CachedFrac of (int * int) * decimal
member this.AsFrac =
match this with
| Frac _ -> this
| CachedFrac (tup, _) -> Frac tup
An entirely different option would be to keep the cached values in a separate dictionary, this is something that makes sense if all you want to do is save some time recalculating them.
module FracCache =
let cache = System.Collections.Generic.Dictionary<Fraction, decimal>()
let modify (oldFrac: Fraction) (newFrac: Fraction) =
cache.[newFrac] <- cache.[oldFrac] + 1 // need to check if oldFrac has a cached value as well.
Basically what memoize would give you plus you have more control over it.

Writing F# code to parse "2 + 2" into code

Extremely just-started-yesterday new to F#.
What I want: To write code that parses the string "2 + 2" into (using as an example code from the tutorial project) Expr.Add(Expr.Num 2, Expr.Num 2) for evaluation. Some help to at least point me in the right direction or tell me it's too complex for my first F# project. (This is how I learn things: By bashing my head against stuff that's hard)
What I have: My best guess at code to extract the numbers. Probably horribly off base. Also, a lack of clue.
let script = "2 + 2";
let rec scriptParse xs =
match xs with
| [] -> (double)0
| y::ys -> (double)y
let split = (script.Split([|' '|]))
let f x = (split[x]) // "This code is not a function and cannot be applied."
let list = [ for x in 0..script.Length -> f x ]
let result = scriptParse
Thanks.
The immediate issue that you're running into is that split is an array of strings. To access an element of this array, the syntax is split.[x], not split[x] (which would apply split to the singleton list [x], assuming it were a function).
Here are a few other issues:
Your definition of list is probably wrong: x ranges up to the length of script, not the length of the array split. If you want to convert an array or other sequence to a list you can just use List.ofSeq or Seq.toList instead of an explicit list comprehension [...].
Your "casts" to double are a bit odd - that's not the right syntax for performing conversions in F#, although it will work in this case. double is a function, so the parentheses are unnecessary and what you are doing is really calling double 0 and double y. You should just use 0.0 for the first case, and in the second case, it's unclear what you are converting from.
In general, it would probably be better to do a bit more design up front to decide what your overall strategy will be, since it's not clear to me that you'll be able to piece together a working parser based on your current approach. There are several well known techniques for writing a parser - are you trying to use a particular approach?

Creating a valid function declaration from a complex tuple/list structure

Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and it's proving to be error prone doing it manually.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell. I'll iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an erlang shell inside an emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit alt-p (forget what emacs calls alt, but it's alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]].
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly you want to pattern match some large datastructures of unknown formatting.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that it is undecidable from just data to know how to categorize that data.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
Flat = lists:map(fun flatten_io/1, List),
maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)];
flatten_io(Other) -> Other.
maybe_flatten(L) when is_list(L) ->
case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
(List) when is_list(List) ->
lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
(_) -> false
end, L) of
true -> lists:flatten(L);
false -> L
end.
(Caveat: completely untested and quite inefficient. Will also crash for inproper lists, but you shouldn't have those in your data structures anyway.)
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
Its hard to recommend something for handling this.
Transforming all the structures in a more sane and also more minimal format looks like its worth it. This depends mainly on the similarities in these structures.
Rather than having a special function for each of the 100 there must be some automatic reformatting that can be done, maybe even put the parts in records.
Once you have records its much easier to write functions for it since you don't need to know the actual number of elements in the record. More important: your code won't break when the number of elements changes.
To summarize: make a barrier between your code and the insanity of these structures by somehow sanitizing them by the most generic code possible. It will be probably a mix of generic reformatting with structure speicific stuff.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists) and match for "things" that have a common structure like 'name-addr' and replace these with nice records.
In order to help you eyeballing you can write yourself helper functions along this example:
eyeball(List) when is_list(List) ->
io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
expansion of this in a useful tool for your use is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...

Resources