Binary Protocol Parsing in Erlang - parsing

I am a bit struggling with extracting fields from a binary message. Raw message looks like the following:
<<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>
I know the order, type and static sizes of fields, some have arbitary sizes thought, so I am trying to do something like the following:
newobj(Data) ->
io:fwrite("NewObj RAW ~p~n",[Data]),
NewObj = {obj,rest(uint16(string(uint16({[],Data},id),type),parent),unparsed)},
io:fwrite("NewObj ~p~n",[NewObj]),
NewObj.
uint16/2, string/2, and rest/2 are actually extraction functions and look like this:
uint16(ListData, Name) ->
{List, Data} = ListData,
case Data of
<<Int:2/little-unsigned-unit:8, Rest/binary>> ->
{List ++ [{Name,Int}], Rest};
<<Int:2/little-unsigned-unit:8>> ->
List ++ [{Name,Int}]
end.
string(ListData, Name) ->
{List, Data} = ListData,
Split = binary:split(Data,<<0>>),
String = lists:nth(1, Split),
if
length(Split) == 2 ->
{List ++ [{Name, String}], lists:nth(2, Split)};
true ->
List ++ [{Name, String}]
end.
rest(ListData, Name) ->
{List, Data} = ListData,
List ++ [{Name, Data}].
This works and looks like:
NewObj RAW <<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>
NewObj {obj,[{id,1},
{type,<<"abcd">>},
{parent,0},
{unparsed,<<3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>}]}
The reason for this question though is that passing {List, Data} as ListData and then splitting it within the function with {List, Data} = ListData feels clumsy - so is there a better way? I think I can't use static matching because "unparsed" and "type" parts are of arbitary length, so it's not possible to define their respective sizes.
Thanks!
---------------Update-----------------
Trying to take comments below into account - code now looks like the following:
newobj(Data) ->
io:fwrite("NewObj RAW ~p~n",[Data]),
NewObj = {obj,field(
field(
field({[], Data},id,fun uint16/1),
type, fun string/1),
unparsed,fun rest/1)},
io:fwrite("NewObj ~p~n",[NewObj]).
field({List, Data}, Name, Func) ->
{Value,Size} = Func(Data),
case Data of
<<_:Size/binary-unit:8>> ->
[{Name,Value}|List];
<<_:Size/binary-unit:8, Rest/binary>> ->
{[{Name,Value}|List], Rest}
end.
uint16(Data) ->
case Data of
<<UInt16:2/little-unsigned-unit:8, _/binary>> ->
{UInt16,2};
<<UInt16:2/little-unsigned-unit:8>> ->
{UInt16,2}
end.
string(Data) ->
Split = binary:split(Data,<<0>>),
case Split of
[String, Rest] ->
{String,byte_size(String)+1};
[String] ->
{String,byte_size(String)+1}
end.
rest(Data) ->
{Data,byte_size(Data)}.

The code is non idiomatic and some pieces cannot compile as is :-) Here are some comments:
The newobj/1 function makes a reference to a NewObj variable that is unbound. Probably the real code is something like NewObj = {obj,rest(... ?
The code uses list append (++) multiple times. This should be avoided if possible because it performs too much memory copies. The idiomatic way is to add to the head of the list as many times as needed (that is: L2 = [NewThing | L1]) and call lists:reverse/1 at the very end. See any Erlang book or the free Learn Yourself some Erlang for the details.
In a similar vein, lists:nth/2 should be avoided and replaced by pattern matching or a different way to construct the list or parse the binary
Dogbert's suggestion about doing the pattern matching directly in the function argument is a good idiomatic approach and allows to remove some lines from the code.
As last suggestion regarding the approach to debug, consider replacing the fwrite functions with proper unit tests.
Hope this gives some hints for what to look at. Feel free to append to your question the code changes, we can proceed from there.
EDIT
It's looking better. Let's see if we can simplify. Please note that we are doing the work backwards, because we are adding tests after the production code has been written, instead of doing test-driven development.
Step 1: add test.
I also reversed the order of the list because it looks more natural.
-include_lib("eunit/include/eunit.hrl").
happy_input_test() ->
Rest = <<3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 32, 3, 0, 0, 88, 2, 0, 0>>,
Input = <<1, 0,
97, 98, 99, 100, 0,
0, 0,
Rest/binary>>,
Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}]},
?assertEqual(Expected, binparse:newobj(Input)).
We can run this, among other ways, with rebar3 eunit (see the rebar3 documentation; I suggest to start with rebar3 new lib mylib to create a skeleton).
Step 2: the absolute minimum
Your description is not enough to understand which fields are mandatory and which are optional and whether there is always something more after the obj.
In the simplest possible case, all your code can be reduced to:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
[Type, Rest2] = binary:split(Rest, <<0>>),
<<Parent:16/little-unsigned, Rest3/binary>> = Rest2,
{obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]}.
Quite compact :-)
I find the encoding of the string very bizarre: a binary encoding where the string is NUL-terminated (so forces to walk the binary) instead of being encoded with, say, 2 or 4 bytes to represent the length and then the string itself.
Step 3: input validation
Since we are parsing a binary, this is probably coming from the outside of our system. As such, the let it crash philosophy doesn't apply and we have to perform full input validation.
I make the assumption that all fields are mandatory except unparsed, that can be empty.
missing_unparsed_is_ok_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0,
0, 0>>,
Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, <<>>}]},
?assertEqual(Expected, binparse:newobj(Input)).
The simple implementation above passes it.
Step 4: malformed parent
We add the tests and we make a API decision: the function will return an error tuple.
missing_parent_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0>>,
?assertEqual({error, bad_parent}, binparse:newobj(Input)).
malformed_parent_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100, 0,
0>>,
?assertEqual({error, bad_parent}, binparse:newobj(Input)).
We change the implementation to pass the tests:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
[Type, Rest2] = binary:split(Rest, <<0>>),
case Rest2 of
<<Parent:16/little-unsigned, Rest3/binary>> ->
{obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]};
Rest2 ->
{error, bad_parent}
end.
Step 5: malformed type
The new tests:
missing_type_is_error_test() ->
Input = <<1, 0>>,
?assertEqual({error, bad_type}, binparse:newobj(Input)).
malformed_type_is_error_test() ->
Input = <<1, 0,
97, 98, 99, 100>>,
?assertEqual({error, bad_type}, binparse:newobj(Input)).
We could be tempted to change the implementation as follows:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
case binary:split(Rest, <<0>>) of
[Type, Rest2] ->
case Rest2 of
<<Parent:16/little-unsigned, Rest3/binary>> ->
{obj, [
{id, Id}, {type, Type},
{parent, Parent}, {unparsed, Rest3}
]};
Rest2 ->
{error, bad_parent}
end;
[Rest] -> {error, bad_type}
end.
Which is an unreadable mess. Just adding functions doesn't help us:
newobj(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
case parse_type(Rest) of
{ok, {Type, Rest2}} ->
case parse_parent(Rest2) of
{ok, Parent, Rest3} ->
{obj, [
{id, Id}, {type, Type},
{parent, Parent}, {unparsed, Rest3}
]};
{error, Reason} -> {error, Reason}
end;
{error, Reason} -> {error, Reason}
end.
parse_type(Bin) ->
case binary:split(Bin, <<0>>) of
[Type, Rest] -> {ok, {Type, Rest}};
[Bin] -> {error, bad_type}
end.
parse_parent(Bin) ->
case Bin of
<<Parent:16/little-unsigned, Rest/binary>> -> {ok, Parent, Rest};
Bin -> {error, bad_parent}
end.
This is a classic problem in Erlang with nested conditionals.
Step 6: regaining sanity
Here is my approach, quite generic so applicable (I think) to many domains. The overall idea is taken from backtracking, as explained in http://rvirding.blogspot.com/2009/03/backtracking-in-erlang-part-1-control.html
We create one function per parse step and pass them, as a list, to call_while_ok/3:
newobj(Bin) ->
Parsers = [fun parse_id/1,
fun parse_type/1,
fun parse_parent/1,
fun(X) -> {ok, {unparsed, X}, <<>>} end
],
case call_while_ok(Parsers, Bin, []) of
{error, Reason} -> {error, Reason};
PropList -> {obj, PropList}
end.
Function call_while_ok/3 is somehow related to lists:foldl and lists:filter:
call_while_ok([F], Seed, Acc) ->
case F(Seed) of
{ok, Value, _NextSeed} -> lists:reverse([Value | Acc]);
{error, Reason} -> {error, Reason}
end;
call_while_ok([F | Fs], Seed, Acc) ->
case F(Seed) of
{ok, Value, NextSeed} -> call_while_ok(Fs, NextSeed, [Value | Acc]);
{error, Reason} -> {error, Reason}
end.
And here are the parsing functions. Note that their signature is always the same:
parse_id(Bin) ->
<<Id:16/little-unsigned, Rest/binary>> = Bin,
{ok, {id, Id}, Rest}.
parse_type(Bin) ->
case binary:split(Bin, <<0>>) of
[Type, Rest] -> {ok, {type, Type}, Rest};
[Bin] -> {error, bad_type}
end.
parse_parent(Bin) ->
case Bin of
<<Parent:16/little-unsigned, Rest/binary>> ->
{ok, {parent, Parent}, Rest};
Bin -> {error, bad_parent}
end.
Step 7: homework
The list [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}] is a proplist (see Erlang documentation), which predates Erlang maps.
Have a look at the documentation for maps and see if it makes sense to return a map instead.

Related

How can I repair an improper list in Erlang?

I accidentally did (the equivalent of) the following:
lists:foldl(fun(X, Acc) -> [X|Acc] end, 0, List).
Note the not-a-list initial value for the accumulator.
This resulted in an improper list. This means that length, etc., don't work on it.
Given that my "equivalent of" took an hour to run, and I don't want to run it again, how do I repair my improper list?
For a simpler example of an improper list and the problem that it causes:
1> L = [1|[2|[3|4]]].
[1,2,3|4]
2> length(L).
** exception error: bad argument
in function length/1
called as length([1,2,3|4])
If you want to preserve the "improper tail", this would be enough:
Fix = fun Fix([H | T]) -> [H | Fix(T)];
Fix(T) -> [T]
end.
Here is a possible approach:
Lister = fun L([], Acc) -> lists:reverse(Acc);
L([[_ | _] = H | T], Acc) -> L(T, [L(H, []) | Acc]);
L([[] | T], Acc) -> L(T, Acc);
L([H | T], Acc) -> L(T, [H | Acc]);
L(X, Acc) -> L([], [X | Acc])
end.
L = [[[1,[1|2]],1|2],1|[2|[3|4]]].
Lister(L, []).
% output [[[1,[1,2]],1,2],1,2,3,4]
For the simple case I had, with non-nested improper list, where I don't want the extra item (because it should have been an empty list and doesn't mean anything), this'll do it:
Fix = fun F([H|T], A) when is_list(T) -> F(T, [H|A]);
F([H|_], A) -> F([], [H|A]);
F([], A) -> lists:reverse(A)
end.
Fix(L, []).
you must use list for ACC
lists:foldl(fun(X, Acc) -> [X|Acc] end, 0, [1,2,3]).
result => [3,2,1|0]
but if you use [0] for ACC argument in lists:foldl/3 function like bellow
lists:foldl(fun(X, Acc) -> [X|Acc] end, [0], [1,2,3]).
result => [3,2,1,0]

Erlang gen_tcp is accumulates received data

this my code to receive data
-module(t).
-compile(export_all).
start() ->
{ok, LSock} = gen_tcp:listen(5555, [binary, {packet, 0},
{active, false}]),
{ok, Sock} = gen_tcp:accept(LSock),
{ok, Bin} = do_recv(Sock, []),
ok = gen_tcp:close(Sock),
Bin.
do_recv(Sock, Bs) ->
io:format("(="), io:format(Bs),io:format("=)~n"),
case gen_tcp:recv(Sock, 0) of
{ok, B} ->
do_recv(Sock, [Bs, B]);
{error, closed} ->
{ok, list_to_binary(Bs)}
end.
i send to socket in series - 1, then 2, then 3, then 4, then 5
code is accumulates received data
it's print to screen
(=12345=)
how to modify the code to the code printed
(=1=)
(=2=)
(=3=)
(=4=)
(=5=)
that data is not accumulated
TCP represents data as a stream with no message structure. This has nothing to do with Erlangs implementation of it.
If you need a message structure you have to encode it in band in the data stream.
Erlang helps you with a simple builtin packet structure with 1, 2 or 4 byte length followed by the data. This is what {packet N} does for N equal to 1,2 or 4
But you need to also send data conforming to this structure.

In Erlang, how to return a string when you use recursion?

I really could'n formulate the question better, but here's my problem:
I want to use this code to convert infix expression to postfix expression in Erlang, but it only writes to the console output. The problem is, I need a list or a string returned, so I can use it as an argument in an other function.
-module(foo).
-compile(export_all).
parse(Str) ->
{ok, Tokens, _} = erl_scan:string(Str ++ "."),
{ok, [E]} = erl_parse:parse_exprs(Tokens),
E.
rpn({op, _, What, LS, RS}) ->
rpn(LS),
rpn(RS),
io:format(" ~s ", [atom_to_list(What)]);
rpn({integer, _, N}) ->
io:format(" ~B ", [N]).
p(Str) ->
Tree = parse(Str),
rpn(Tree),
io:format("~n").
For example, I want someting like this:
Str = "2 * (3 + 4)".
module:p(Str) =:= "2 3 4 + *".
module:anotherFunction(p(Str)).
You just need to io_lib:format/2 instead of io:format/2 and lists:flatten/1 in the end.
-module(foo).
-compile(export_all).
parse(Str) ->
{ok, Tokens, _} = erl_scan:string(Str ++ "."),
{ok, [E]} = erl_parse:parse_exprs(Tokens),
E.
rpn({op, _, What, LS, RS}) ->
io_lib:format("~s ~s ~s", [rpn(LS), rpn(RS), atom_to_list(What)]);
rpn({integer, _, N}) ->
io_lib:format("~b", [N]).
p(Str) ->
Tree = parse(Str),
lists:flatten(rpn(Tree)).

test with loop in erlang

I have a list of values ​​ "Z0010", "Z0011", "Z0012", "Z0013" "Z0014", "Z0015", "Z0016", "Z0017", "Z0018", "Z0019"
I want to develop a function that takes a value in parameter
and I want to do a test in my function if the value passed as a parameter is equal to a value in the list in this cases it will show "existe" if not it displays "not existe"
I try with :
test(Token)->
case get_user_formid_by_token(Token) of
{ok, H} ->
FormId=string:substr(H, 2, length(H)),
Form=string:to_integer(FormId),
case verify (0019,1,Form) of
{ok}->io:format("existe");
{notok}->io:format("not existe")
end;
{error, notfound}-> io:format("qq")
end.
verify(VariableLength,VariableIncrement,FormId)->
lists:map(fun(I) -> if I =:= FormId ->
{ok};
I =/= FormId ->
{notok}
end
end,
lists:seq(0010, VariableLength, VariableIncrement)).
but when I execute this code it displays :
1> model:test("21137900").
** exception error: no case clause matching [{notok},
{notok},
{notok},
{notok},
{notok},
{notok},
{notok},
{notok},
{notok},
{notok}]
in function model:test/1
I try now with this solution :
get_user_formid_by_token(Token) ->
Q = qlc:q([{X#person.formid} || X <- mnesia:table(person),
X#person.token =:= Token]),
case do(Q) of
[{H}] ->
{ok, H};
[] ->
{error, notfound}
end.
test(Token)->
case get_user_formid_by_token(Token) of
{ok, H} ->
io:format("~s~n",[H]),
FormId=string:substr(H, 5, length(H)),
io:format("~s~n",[FormId]),
Form=string:to_integer(FormId),
io:format("~p~n",[Form]),
lists:member(Form, lists:seq(313, 320, 1));
{error, notfound}-> io:format("qq")
end.
but when I test I have this message in the console:
1> model:test("21137900").
Z000313
313
{313,[]}
false
the result should be true and not false
I think that Form=string:to_integer(FormId), it not return in this case 313
and another thing I want to add in my code
for example if H equal "Z000010" FormId=string:substr(H, 2, length(H)),
it return "000010"
Now I want to eliminate the first zero before the first integer not null so extarct 0000
before 1
lists:map/2 takes one list and creates a new list with the same number of values, so your list of 10 values is transformed into a list of 10 {ok} or {notok} tuples.
You probably want lists:member/2 instead.
5> lists:member(0, lists:seq(1, 3, 1)).
false
6> lists:member(3, lists:seq(1, 3, 1)).
true
7> lists:map(fun(X) -> ok end, lists:seq(1, 3, 1)).
[ok,ok,ok]
Have a look at the documentation (http://www.erlang.org/doc/man/string.html#to_integer-1):
to_integer(String) -> {Int, Rest} | {error, Reason}
Types:
String = string()
Int = integer()
Rest = string()
Reason = no_integer | not_a_list
So to_integer returns a tuple containing the number that was consumed from the string and the rest of the string. You can even tell from your test output where it says {313,[]}. In order to get the value of the number bound to your Formvariable, you need to decompose the tuple, which is typically done by pattern matching:
{Form,_Rest}=string:to_integer(FormId)
Now your Form will contain only the number 313.
The string:to_integerfunction will also happily eat the leading zeroes:
1> {Form, _} = string:to_integer("000010"), Form.
10

The most efficient way to read a file into a list of strings

What is the most efficient way from the time consumed to read a text file into a list of binary strings in erlang ? The obvious solution
-module(test).
-export([run/1]).
open_file(FileName, Mode) ->
{ok, Device} = file:open(FileName, [Mode, binary]),
Device.
close_file(Device) ->
ok = file:close(Device).
read_lines(Device, L) ->
case io:get_line(Device, L) of
eof ->
lists:reverse(L);
String ->
read_lines(Device, [String | L])
end.
run(InputFileName) ->
Device = open_file(InputFileName, read),
Data = read_lines(Device, []),
close_file(Device),
io:format("Read ~p lines~n", [length(Data)]).
becomes too slow when the file contains more than 100000 lines.
{ok, Bin} = file:read_file(Filename).
or if you need the contents line by line
read(File) ->
case file:read_line(File) of
{ok, Data} -> [Data | read(File)];
eof -> []
end.
read the entire file in into a binary. Convert to a list and rip out the lines.
This is far more efficient than any other method. If you don't believe me time
it.
file2lines(File) ->
{ok, Bin} = file:read_file(File),
string2lines(binary_to_list(bin), []).
string2lines("\n" ++ Str, Acc) -> [reverse([$\n|Acc]) | string2lines(Str,[])];
string2lines([H|T], Acc) -> string2lines(T, [H|Acc]);
string2lines([], Acc) -> [reverse(Acc)].

Resources