Why is the following saying variable unbound?
9> {<<A:Length/binary, Rest/binary>>, Length} = {<<1,2,3,4,5>>, 3}.
* 1: variable 'Length' is unbound
It's pretty clear that Length should be 3.
I am trying to have a function with similar pattern matching, ie.:
parse(<<Body:Length/binary, Rest/binary>>, Length) ->
But it fails for the same reason. How can I achieve the pattern matching I want?
What I am really trying to achieve is to parse incoming TCP stream packets as LTV (Length, Type, Value).
At some point after I parse the Length and the Type, I want to read only up to Length bytes as the value, as the rest will probably belong to the next LTV.
So my parse_value function is like this:
parse_value(Value0, Left, Callback = {Module, Function},
{length, Length, type, Type, value, Value1}) when byte_size(Value0) >= Left ->
<<Value2:Left/binary, Rest/binary>> = Value0,
Module:Function({length, Length, type, Type, value, lists:reverse([Value2 | Value1])}),
if
Rest =:= <<>> ->
{?MODULE, parse, {}};
true ->
parse(Rest, Callback, {})
end;
parse_value(Value0, Left, _, {length, Length, type, Type, value, Value1}) ->
{?MODULE, parse_value, Left - byte_size(Value0), {length, Length, type, Type, value, [Value0 | Value1]}}.
If I could do the pattern matching, I could break it up to something more pleasant to the eye.
The rules for pattern matching are that if a variable X occurs in two subpatterns, as in {X, X}, or {X, [X]}, or similar, then they have to have the same value in both positions, but the matching of each subpattern is still done in the same input environment - bindings from one side do not carry over to the other. The equality check is conceptually done afterwards, as if you had matched on {X, X2} and added a guard X =:= X2. This means that your Length field in the tuple cannot be used as input to the binary pattern, not even if you make it the leftmost element.
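For illustration (using hypothetical function names), a clause with a repeated variable behaves like a clause with two distinct variables plus an equality guard:
same_pair({X, X}) -> X.
%% behaves as if it were written:
same_pair_expanded({X, X2}) when X =:= X2 -> X.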
However, within a binary pattern, variables bound in a field can be used in other fields following it, left-to-right. Therefore, the following works (using a leading 32-bit size field in the binary):
1> <<Length:32, A:Length/binary, Rest/binary>> = <<0,0,0,3,1,2,3,4,5>>.
<<0,0,0,3,1,2,3,4,5>>
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>
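Applied to the LTV stream described in the question, the same length-prefix idea looks roughly like this (a sketch only: the 16-bit length and 8-bit type widths are assumptions, adjust them to the real protocol):
parse_ltv(<<Length:16, Type:8, Rest0/binary>>) when byte_size(Rest0) >= Length ->
    %% The whole value is already in the buffer; split it off.
    <<Value:Length/binary, Rest/binary>> = Rest0,
    {{ltv, Length, Type, Value}, Rest};
parse_ltv(Incomplete) ->
    %% Not enough bytes yet; keep the buffer and wait for more data from the socket.
    {more, Incomplete}.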
I've run into this before. There is some weirdness between what is happening inside binary syntax and what happens during unification (matching). I suspect that it is just that binary syntax and matching occur at different times in the VM somewhere (we don't know which Length is failing to get assigned -- maybe binary matching is always first in evaluation, so Length is still meaningless). I was once going to dig in and find out, but then I realized that I never really needed to solve this problem -- which might be why it was never "solved".
Fortunately, this won't stop you with whatever you are doing.
Unfortunately, we can't really help further unless you explain the context in which you think this kind of a match is a good idea (you are having an X-Y problem).
In binary parsing you can always force the situation to be one of the following:
Have a fixed-sized header at the beginning of the binary message that tells you the next size element you need (and from there that can continue as a chain of associations endlessly)
Inspect the binary once on entry to determine the size you are looking for, pull that one value, and then begin the real parsing task
Have a set of fields, all of predetermined sizes that conform to a binary schema standard
Convert the binary to a list and iterate through it with any arbitrary amount of look-ahead and backtracking you might need
Quick Solution
Without knowing anything else about your general problem, a typical solution would look like:
parse(Length, Bin) ->
<<Body:Length/binary, Rest/binary>> = Bin,
ok = do_something(Body),
do_other_stuff(Rest).
But I smell something funky here.
Having things like this in your code is almost always a sign that a more fundamental aspect of the code structure is not in agreement with the data that you are handling.
But deadlines.
Erlang is all about practical code that satisfies your goals in the real world. With that in mind, I suggest that you do something like the above for now, and then return to this problem domain and rethink it. Then refactor it. This will gain you three benefits:
Something will work right away.
You will later learn something fundamental about parsing in general.
Your code will almost certainly run faster if it fits your data better.
Example
Here is an example in the shell:
1> Parse =
1> fun
1> (Length, Bin) when Length =< byte_size(Bin) ->
1> <<Body:Length/binary, Rest/binary>> = Bin,
1> ok = io:format("Chopped off ~p bytes: ~p~n", [Length, Body]),
1> Rest;
1> (Length, Bin) ->
1> ok = io:format("Binary shorter than ~p~n", [Length]),
1> Bin
1> end.
#Fun<erl_eval.12.87737649>
2> Parse(3, <<1,2,3,4,5>>).
Chopped off 3 bytes: <<1,2,3>>
<<4,5>>
3> Parse(8, <<1,2,3,4,5>>).
Binary shorter than 8
<<1,2,3,4,5>>
Note that this version is a little safer, in that we avoid a crash in the case that Length is longer than the binary. This is yet another good reason why maybe we can't do that match in the function head.
Try with the code below:
{<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
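This works because = is right-associative: the inner {_, Length} match binds Length first, and the outer binary pattern can then use it. In the shell you should see something like:
1> {<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
{<<1,2,3,4,5>>,3}
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>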
This question is mentioned a bit in EEP-52:
Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
And explained a bit more in the following e-mail: http://erlang.org/pipermail/eeps/2020-January/000636.html
Illegal. With one exception, matching is not done in a left-to-right order, but all variables in the pattern will be bound at the same time. That means that the variables must be bound before the match starts. For maps, that means that the variables referenced in key expressions must be bound before the case (or receive) that matches the map. In a function head, all map keys must be literals.
The exception to this general rule is that within a binary pattern, the segments are matched from left to right, and a variable bound in a previous segment can be used in the size expression for a segment later in the binary pattern.
Also, one of the members of the OTP team mentioned that they made a prototype that could do this, but it was never finished: http://erlang.org/pipermail/erlang-questions/2020-May/099538.html
We actually tried to make your example legal. The transformation of the code that we did was not to rewrite to guards, but to match arguments or parts of arguments in the right order so that input variables would be bound before being used. (We would do a topological sort to find the correct order.) For your example, the transformation would look similar to this:
legal_example(Key, Map) ->
case Map of
#{Key := Value} -> Value;
_ -> error(function_clause, [Key, Map])
end.
In the prototype implementation, the compiler could compile the following example:
convoluted(Ref,
#{ node(Ref) := NodeId, Loop := universal_answer},
[{NodeId, Size} | T],
<<Int:(Size*8+length(T)),Loop>>) when is_reference(Ref) ->
Int.
Things started to fall apart when variables are repeated. Repeated variables in patterns already have a meaning in Erlang (they should be the same), so it became tricky to distinguish between variables being bound and variables being used as a binary size or map key. Here is an example that the prototype couldn't handle:
foo(#{K := K}, K) -> ok.
A human can see that it should be transformed similarly to this:
foo(Map, K) ->
    case Map of
        #{K := V} when K =:= V -> ok
    end.
Here are a few other examples that should work, but that the prototype would refuse to compile (often emitting an incomprehensible error message):
bin2(<<Sz:8,X:Sz>>, <<Y:Sz>>) -> {X,Y}.
repeated_vars(#{K := #{K := K}}, K) -> K.
match_map_bs(#{K1 := {bin,<<Int:Sz>>}, K2 := <<Sz:8>>}, {K1,K2}) ->
Int.
Another problem was that when an example was correctly rejected, the error message would be confusing.
Because much more work would clearly be needed, we have shelved the idea for now. Personally, I am not sure that the idea is sound in the first place. But I am sure of one thing: the implementation would be very complicated.
UPD: latest news from 2020-05-14
If I declare a function
test(A) -> 3.
Erlang generates a warning about variable A not being used. However, the definition
isEqual(X,X) -> 1.
Doesn't produce any warning but
isEqual(X,X) -> 1;
isEqual(X,Y) -> 0.
again produces a warning but only for the second line.
The reason why that doesn't generate a warning is because in the second case you are asserting (through pattern matching), by using the same variable name, that the first and second arguments to isEqual/2 have the same value. So you are actually using the value of the argument.
It might help to understand this better if we look at the Core Erlang code produced for is_equal/2. You can get .core source files by compiling your .erl file like this: erlc +to_core pattern.erl.
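The pattern.erl module itself is not reproduced here, but judging from the Core Erlang output below it is presumably just:
-module(pattern).
-export([is_equal/2]).
is_equal(X, X) -> 1;
is_equal(X, Y) -> 0.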
This will produce a pattern.core file that will look something like this (module_info/[0,1] functions removed):
module 'pattern' ['is_equal'/2]
attributes []
'is_equal'/2 = fun (_cor1,_cor0) ->
case <_cor1,_cor0> of
%% Line 5
<X,_cor4> when call 'erlang':'=:=' (_cor4, X) ->
1
%% Line 6
<X,Y> when 'true' ->
0
end
As you can see, each function clause from is_equal/2 in the .erl source code gets translated to a case clause in Core Erlang. X does get used in the first clause, since it needs to be compared to the other argument. On the other hand, neither X nor Y is used in the second clause.
I've created the snippet below based on this tutorial. The last two lines (feed_squid(FeederRP) and feed_red_panda(FeederSquid)) are obviously violating the defined constraints, yet Dialyzer finds them okay. This is quite disappointing, because this is exactly the type of error I want to catch with a tool performing static analysis.
There is an explanation provided in the tutorial:
Before the functions are called with the wrong kind of feeder, they're
first called with the right kind. As of R15B01, Dialyzer would not
find an error with this code. The observed behaviour is that as soon
as a call to a given function succeeds within the function's body,
Dialyzer will ignore later errors within the same unit of code.
What is the rationale for this behavior? I understand that the philosophy behind success typing is "to never cry wolf", but in the current scenario Dialyzer plainly ignores the intentionally defined function specifications (after it sees that the functions have been called correctly earlier). I understand that the code does not result in a runtime crash. Can I somehow force Dialyzer to always take my function specifications seriously? If not, is there a tool that can do it?
-module(zoo).
-export([main/0]).
-type red_panda() :: bamboo | birds | eggs | berries.
-type squid() :: sperm_whale.
-type food(A) :: fun(() -> A).
-spec feeder(red_panda) -> food(red_panda());
(squid) -> food(squid()).
feeder(red_panda) ->
fun() ->
element(random:uniform(4), {bamboo, birds, eggs, berries})
end;
feeder(squid) ->
fun() -> sperm_whale end.
-spec feed_red_panda(food(red_panda())) -> red_panda().
feed_red_panda(Generator) ->
Food = Generator(),
io:format("feeding ~p to the red panda~n", [Food]),
Food.
-spec feed_squid(food(squid())) -> squid().
feed_squid(Generator) ->
Food = Generator(),
io:format("throwing ~p in the squid's aquarium~n", [Food]),
Food.
main() ->
%% Random seeding
<<A:32, B:32, C:32>> = crypto:rand_bytes(12),
random:seed(A, B, C),
%% The zoo buys a feeder for both the red panda and squid
FeederRP = feeder(red_panda),
FeederSquid = feeder(squid),
%% Time to feed them!
feed_squid(FeederSquid),
feed_red_panda(FeederRP),
%% This should not be right!
feed_squid(FeederRP),
feed_red_panda(FeederSquid).
Minimizing the example quite a bit, I have these two versions:
First one that Dialyzer can catch:
-module(zoo).
-export([main/0]).
-type red_panda_food() :: bamboo.
-type squid_food() :: sperm_whale.
-spec feed_squid(fun(() -> squid_food())) -> squid_food().
feed_squid(Generator) -> Generator().
main() ->
%% The zoo buys a feeder for both the red panda and squid
FeederRP = fun() -> bamboo end,
FeederSquid = fun() -> sperm_whale end,
%% CRITICAL POINT %%
%% This should not be right!
feed_squid(FeederRP),
%% Time to feed them!
    feed_squid(FeederSquid).
Then the one with no warnings:
[...]
%% CRITICAL POINT %%
%% Time to feed them!
    feed_squid(FeederSquid),
%% This should not be right!
feed_squid(FeederRP).
Dialyzer's warnings for the version it can catch are:
zoo.erl:7: The contract zoo:feed_squid(fun(() -> squid_food())) -> squid_food() cannot be right because the inferred return for feed_squid(FeederRP::fun(() -> 'bamboo')) on line 15 is 'bamboo'
zoo.erl:10: Function main/0 has no local return
... and this is a case of Dialyzer preferring to trust its own judgement over the user's tighter spec.
For the version it doesn't catch, Dialyzer assumes that the feed_squid/1 argument's type fun() -> bamboo is a supertype of fun() -> none() (a closure that will crash, which, if not called within feed_squid/1, is still a valid argument). After the types have been inferred, Dialyzer cannot know if a passed closure is actually called within a function or not.
Dialyzer still gives a warning if the option -Woverspecs is used:
zoo.erl:7: Type specification zoo:feed_squid(fun(() -> squid_food())) -> squid_food() is a subtype of the success typing: zoo:feed_squid(fun(() -> 'bamboo' | 'sperm_whale')) -> 'bamboo' | 'sperm_whale'
... warning that nothing prevents this function from handling the other feeder, or any feeder at all! If that code did check for the closure's expected input/output, instead of being generic, then I am pretty sure that Dialyzer would catch the abuse. From my point of view, it is much better if your actual code checks for erroneous input instead of relying on type specs and Dialyzer (which never promised to find all the errors anyway).
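A sketch of that idea (not verified against this exact module): if feed_squid/1 itself insists on squid food, the closure's return value becomes relevant to the analysis, and I would expect the bad call to be flagged.
-spec feed_squid(food(squid())) -> squid().
feed_squid(Generator) ->
    Food = Generator(),
    sperm_whale = Food,   %% reject anything that is not squid food
    io:format("throwing ~p in the squid's aquarium~n", [Food]),
    Food.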
WARNING: DEEP ESOTERIC PART FOLLOWS!
The reason why the error is reported in the first case but not the second has to do with the progress of module-local refinement. Initially the function feed_squid/1 has success typing (fun() -> any()) -> any().
In the first case, feed_squid/1 will first be refined with just the FeederRP and will definitely return bamboo, immediately falsifying the spec and stopping further analysis of main/0.
In the second case, feed_squid/1 will first be refined with just the FeederSquid and will definitely return sperm_whale, then refined with both FeederSquid and FeederRP and return sperm_whale OR bamboo. When then called with FeederRP, the expected return value success-typing-wise is sperm_whale OR bamboo. The spec then promises that it will be sperm_whale, and Dialyzer accepts it. On the other hand, the argument should be fun() -> bamboo | sperm_whale success-typing-wise, but it is fun() -> bamboo, so that leaves it with just fun() -> bamboo. When that is checked against the spec (fun() -> sperm_whale), Dialyzer assumes that the argument could also be fun() -> none(). If you never call the passed function within feed_squid/1 (something that Dialyzer's type system doesn't keep as information), and you promise in the spec that you will always return sperm_whale, everything is fine!
What can be 'fixed' is for the type system to be extended to note when a closure that is passed as an argument is actually used in a call and warn in cases where the only way to 'survive' some part of the type inference is to be fun(...) -> none().
(Note, I am speculating a bit here. I have not read the dialyzer code in detail).
A "Normal" full-fledged type checker has the advantage that type checking is decidable. We can ask "Is this program well-typed?" and get either a Yes or a No back when the type checker terminates. Not so for the dialyzer. It is essentially in the business of solving the halting problem. The consequence is that there will be programs which are blatantly wrong but still slip through the grip of the dialyzer.
However, this is not one of those cases :)
The problem is two-fold. A success type says "If this function terminates normally, what is its type?". In the above, our feed_red_panda/1 function terminates for any argument matching fun (() -> A) for an arbitrary type A. We could call feed_red_panda(fun erlang:now/0) and it should also work. Thus our two calls to the function in main/0 do not give rise to a problem. They both terminate.
The second part of the problem is "Did you violate the spec?". Note that specs are often not used by the dialyzer as fact. It infers the types itself and uses the inferred types instead of your spec. Whenever a function is called, it is annotated with the parameters. In our case, it will be annotated with the two generator types: food(red_panda()) and food(squid()). Then a function-local analysis is made based on these annotations in order to figure out if you violated the spec. Since the correct parameters are present in the annotations, we must assume the function is used correctly in some part of the code. That it is also called with squids could be an artifact of code which is never called due to other circumstances. Since we are function-local, we don't know, and we give the benefit of the doubt to the programmer.
If you change the code to only make the wrong call with a squid generator, then we find the spec discrepancy, because we know the only possible call site violates the spec. If you move the wrong call to another function, it is not found either, because the annotation is still on the function and not on the call site.
One could imagine a future variant of the dialyzer which accounted for the fact that each call-site can be handled individually. Since the dialyzer is changing as well over time, it may be that it will be able to handle this situation in the future. But currently, it is one of the errors that will slip through.
The key is to notice that the dialyzer cannot be used as a "Checker of well-typedness". You can't use it to enforce structure on your programs. You need to do that yourself. If you would like more static checking, it would probably be possible to write a type checker for Erlang and run it on parts of your code base. But you will run into trouble with code upgrades and distribution, which are not easy to handle.
Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and it's proving to be error prone doing it manually.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
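For what it's worth, kvc does path-style lookups via kvc:path/2; whether it copes with this particular structure is untested, so treat this as an assumption:
%% Untested sketch: Headers is assumed to be the header list (second element) of the message above.
CallId = kvc:path(['Call-ID'], Headers).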
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell. I'll iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an erlang shell inside an emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit alt-p (forget what emacs calls alt, but it's alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]] = A.
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly, you want to pattern match some large data structures of unknown format.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that, from the data alone, it is undecidable how that data should be categorized.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
Flat = lists:map(fun flatten_io/1, List),
maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
    list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)]);
flatten_io(Other) -> Other.
maybe_flatten(L) when is_list(L) ->
case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
(List) when is_list(List) ->
lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
(_) -> false
end, L) of
true -> lists:flatten(L);
false -> L
end.
(Caveat: completely untested and quite inefficient. It will also crash for improper lists, but you shouldn't have those in your data structures anyway.)
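If it behaves as intended, the status line from the question should come out flat, something like:
1> flatten_io([["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]).
"SIP/2.0 407 Proxy Authentication Required\r\n"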
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
It's hard to recommend something for handling this.
Transforming all the structures into a saner and more minimal format looks like it's worth it. This depends mainly on the similarities between these structures.
Rather than having a special function for each of the 100 structures, there must be some automatic reformatting that can be done, maybe even putting the parts into records.
Once you have records it's much easier to write functions for them, since you don't need to know the actual number of elements in the record. More importantly: your code won't break when the number of elements changes.
To summarize: put a barrier between your code and the insanity of these structures by sanitizing them with the most generic code possible. It will probably be a mix of generic reformatting and structure-specific handling.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists) and match for "things" that have a common structure like 'name-addr' and replace these with nice records.
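Here is a sketch of that idea (the #name_addr{} record name and fields are made up, shaped after the 3-element 'name-addr' tuples above):
-record(name_addr, {display, uri}).
sanitize({'name-addr', Display, URI}) ->
    %% Replace recognised tuples with a record, recursing into their contents.
    #name_addr{display = Display, uri = sanitize(URI)};
sanitize(Tuple) when is_tuple(Tuple) ->
    list_to_tuple([sanitize(E) || E <- tuple_to_list(Tuple)]);
sanitize(List) when is_list(List) ->
    [sanitize(E) || E <- List];
sanitize(Other) ->
    Other.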
To help with the eyeballing, you can write yourself helper functions along the lines of this example:
eyeball(List) when is_list(List) ->
io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
Expanding this into a useful tool for your purposes is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
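One possible extension (an untested sketch, with new names so it does not clash with eyeball/1 above) recurses into every element and indents by nesting depth:
eyeball_deep(Term) ->
    eyeball_deep(Term, 0).
eyeball_deep(List, Depth) when is_list(List) ->
    io:format("~slist with length ~b~n", [indent(Depth), length(List)]),
    lists:foreach(fun(E) -> eyeball_deep(E, Depth + 1) end, List);
eyeball_deep(Tuple, Depth) when is_tuple(Tuple) ->
    io:format("~stuple with ~b elements~n", [indent(Depth), tuple_size(Tuple)]),
    lists:foreach(fun(E) -> eyeball_deep(E, Depth + 1) end, tuple_to_list(Tuple));
eyeball_deep(Other, Depth) ->
    io:format("~s~p~n", [indent(Depth), Other]).
indent(Depth) ->
    lists:duplicate(Depth * 2, $\s).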
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...
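For example, assuming Msg is bound to the parsed message shown above (so its second element is the header list):
[_StatusLine, Headers | _] = Msg,
{'CSeq', Seq, Method} = lists:keyfind('CSeq', 1, Headers),
{'Content-Length', Len} = lists:keyfind('Content-Length', 1, Headers).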
I'm trying to learn a little of the mindset of functional programming in F#, so any tips are appreciated. Right now I'm making a simple recursive function which takes a list and returns the i:th element.
let rec nth(list, i) =
match (list, i) with
| (x::xs, 0) -> x
| (x::xs, i) -> nth(xs, i-1)
The function itself seems to work, but it warns me about an incomplete pattern. I'm not sure what to return when I match the empty list in this case, since if I for example do the following:
| ([], _) -> ()
The whole function is treated like a function that takes a unit as argument. I want it to be treated as a polymorphic function.
While I'm at it, I may as well ask how far it is reasonable to go in checking for valid arguments when designing a function for serious development. Should I check for everything, so that "misuse" of the function is prevented? In the above example I could, for instance, try to access an element at an index larger than the size of the list. I hope my question isn't too confusing :)
You can learn a lot about the "usual" library design by looking at the standard F# libraries. There is already a function that does what you want called List.nth, but even if you're implementing this as an exercise, you can check how the function behaves:
> List.nth [ 1 .. 3 ] 10;;
System.ArgumentException: The index was outside the range
of elements in the list. Parameter name: index
The function throws System.ArgumentException with some additional information about the exception, so that users can easily find out what went wrong. To implement the same functionality, you can use the invalidArg function:
| _ -> invalidArg "index" "Index is out of range."
This is probably better than just using failwith, which throws a more general exception. When using invalidArg, users can check for a specific type of exception.
As kvb noted, another option is to return option 'a. Many standard library functions provide both a version that returns option and a version that throws an exception. For example List.pick and List.tryPick. So, maybe a good design in your case would be to have two functions - nth and tryNth.
If you want your function to return a meaningful result and to have the same type as it has now, then you have no alternative but to throw an exception in the remaining case. A matching failure will throw an exception, so you don't need to change it, but you may find it preferable to throw an exception with more relevant information:
| _ -> failwith "Invalid list index"
If you expect invalid list indices to be rare, then this is probably good enough. However, another alternative would be to change your function so that it returns an 'a option:
let rec nth = function
| x::xs, 0 -> Some(x)
| [],_ -> None
| _::xs, i -> nth(xs, i-1)
This places an additional burden on the caller, who must now explicitly deal with the possibility of failure.
Presumably, if taking an empty list is invalid, you're best off just throwing an exception?
Generally the rules for how defensive you should be don't really change from language to language - I always go by the guideline that if it's public be paranoid about validating input, but if it's private code, you can be less strict. (Actually if it's a large project, and it's private code, be a little strict... basically strictness is proportional to the number of developers who might call your code.)