Possible xml parsing - erlang

I've got a newbie question.
I'm trying to parse an XML message with pattern matching in functions.
A sample of a message is:
<msg> <action type="xxx"... /> </msg>
What I would like to be able to do is (sort of):
decode_msg_in( << $<,$m,$s,$g,$>, Message/binary, $<,$/,$m,$s,$g,$> >>, R ) ->
The decode does not work (obviously; it's only an indication of what I'd like to do).
Is this even possible?
Does anyone have an idea? Or do I need to "iterate" the whole message as a list, building new "words" ?
Regards
/P

You probably need to read about bit syntax expressions and binary comprehensions, and about an XML parser library called erlsom. That will bring you up to speed on what you want to do.
EDIT: The XML message may reach your server as a binary or as a string. Either way, the parser can turn the XML data into Erlang terms. Here is an example for your XML structure using the erlsom library (I have erlsom in my code path).
C:\Windows\System32>erl
Eshell V5.9 (abort with ^G)
1> XML = "<msg><action type=\"xxx\"/>message</msg>".
"<msg><action type=\"xxx\"/>message</msg>"
2> erlsom:simple_form(XML).
{ok,{"msg",[],[{"action",[{"type","xxx"}],[]},"message"]},
[]}
3> {_,Parsed,_} = erlsom:simple_form(XML).
{ok,{"msg",[],[{"action",[{"type","xxx"}],[]},"message"]},
[]}
4> Parsed.
{"msg",[],[{"action",[{"type","xxx"}],[]},"message"]}
5> {_,_,[{_,[{_,ActionType}],_},Message]} = Parsed.
{"msg",[],[{"action",[{"type","xxx"}],[]},"message"]}
6> ActionType.
"xxx"
7> Message.
"message"
8>
You can see above that it comes down to easy pattern matching. The resulting structure will be clean as long as the senders send well-formed XML. If you suspect that malformed XML may hit your server, wrap the parser call in try [CALL] of [GoodResult] -> [Action1] catch _Error:_Reason -> [Action2] end. Note that if the XML body is very large, you should use the SAX method to parse it, to avoid a big memory footprint. SAX examples are included in the library documentation.
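Coming back to the idea in the question: a binary pattern cannot contain an unsized segment in the middle, but it can once the length of that segment is computed first. The following is only a minimal sketch of that trick, assuming the message is a binary that starts with exactly <msg> and ends with exactly </msg>; the module name msg_strip and the function inner/1 are made up for illustration, and this is not a substitute for a real XML parser.

```erlang
-module(msg_strip).
-export([inner/1]).

%% Return the bytes between a leading "<msg>" and a trailing "</msg>".
%% "<msg>" is 5 bytes and "</msg>" is 6 bytes, so the body is the
%% remaining byte_size(Bin) - 11 bytes.
inner(Bin) when is_binary(Bin), byte_size(Bin) >= 11 ->
    Len = byte_size(Bin) - 11,
    case Bin of
        %% Len is already bound, so the middle segment has a known size.
        <<"<msg>", Body:Len/binary, "</msg>">> -> {ok, Body};
        _ -> error
    end;
inner(_) ->
    error.
```

Calling msg_strip:inner(<<"<msg>abc</msg>">>) returns {ok, <<"abc">>}; anything without the exact wrapper returns error. The body still has to be parsed, which is where a library like erlsom comes in.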

Related

In Erlang's PropEr, how to get a sample of a generator?

I'm using PropEr to write my property based test.
How can I see what kind of data my generator produces?
Let's say I have the following generator:
-module(my).
-include_lib("proper/include/proper.hrl").
-export([valid_type_gen/0]).
valid_type_gen() -> non_empty(list(any())).
I would like to examine what kind of data it generates, i.e. something like:
$ erl
1> my:valid_type_gen().sample() %???
[1,b,"blah"]
For the same question in Triq, look here.
The relevant function is proper_gen:pick/1. It returns a tuple {ok, V}.
$ erl
1> proper_gen:pick(my:valid_type_gen()).
{ok,[{{},<<>>,2},
[{},11.690292064109402,
{{}},
{},18.096053885231132,u,')[\2064Wue¢±'],
[{},-5.041761022794527,-13,
{[],-0.9553811124968509},
-5,'õ\232zc}:Ì'],
<<47,5,113,69,86,216,20,142,173,57:6>>,
'',
{2.710196163900066,0.47155396154628,{},[],
{[]},
8.42398680461108},
{[[25,
[-10.073999184421432,5.734631070941083,
{'æ\2367Ò§ü\233"',[30.925337851024143]}]],
'']},
'\031Þ\037\'\v','\214b\236']}

Any lib or formal method handling TLV in Erlang?

I'm working on a protocol for data exchange that is somewhat complex, and I found that TLV is what I need. Is there a formal way to read and write TLV in Erlang, or some lib / code example that handles this? Thanks.
The "default" in Erlang is LTV rather than TLV, but it is rather easy to handle:
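%% Socket must be opened in binary mode with {active, false}.
case gen_tcp:recv(Socket, 8) of
    {ok, <<Type:32/integer, Len:32/integer>>} ->
        %% gen_tcp:recv/2 returns {ok, Packet}, so unwrap the payload too.
        {ok, Payload} = gen_tcp:recv(Socket, Len),
        {type_of(Type), Payload};
    ...
end,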
You will need a passive socket to get this to work, but it is rather easy to do. If you have the freedom to pick your format, the LTV encoding is better because you can then set the {packet, N} socket option and put the socket in {active, once} mode, which means the underlying layer decodes the framing for you.
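Once the bytes are in memory, reading and writing records is plain bit syntax. Here is a rough sketch, assuming the same layout as the snippet above (a 32-bit type followed by a 32-bit length, both big-endian, then the value); the module name tlv_sketch and its functions are hypothetical, not part of any library.

```erlang
-module(tlv_sketch).
-export([encode/2, decode/1, decode_all/1]).

%% Build one record: 32-bit type, 32-bit length, then the value bytes.
encode(Type, Value) when is_integer(Type), is_binary(Value) ->
    <<Type:32, (byte_size(Value)):32, Value/binary>>.

%% Decode one record from the front of a binary.
%% Len is bound by the earlier segment and reused as the size of Value.
decode(<<Type:32, Len:32, Value:Len/binary, Rest/binary>>) ->
    {ok, {Type, Value}, Rest};
decode(Bin) when is_binary(Bin) ->
    %% Not enough bytes for a complete record yet.
    {more, Bin}.

%% Decode as many complete records as possible; return them in order
%% together with any leftover bytes.
decode_all(Bin) ->
    decode_all(Bin, []).

decode_all(Bin, Acc) ->
    case decode(Bin) of
        {ok, Record, Rest} -> decode_all(Rest, [Record | Acc]);
        {more, Rest} -> {lists:reverse(Acc), Rest}
    end.
```

The leftover binary returned by decode_all/1 can be prepended to the next chunk read from the socket, so partial records are handled without extra bookkeeping.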
I haven't actually used it, but how about this one: https://github.com/essiene/smpp34pdu/blob/master/src/tlv.erl

What kind of types can be sent on an Erlang message?

Mainly I want to know if I can send a function in a message in a distributed Erlang setup.
On Machine 1:
F1 = fun() -> hey end,
gen_server:call(on_other_machine, F1)
On Machine 2:
handle_call(Function, From, State) ->
    {reply, Function(), State}.
Does it make sense?
Here's an interesting article about "passing fun's to other Erlang nodes". To summarize it briefly:
[...] As you might know, Erlang distribution works by sending the binary encoding of terms; and so sending a fun is also essentially done by encoding it using erlang:term_to_binary/1; passing the resulting binary to another node, and then decoding it again using erlang:binary_to_term/1. [...] This is pretty obvious for most data types; but how does it work for function objects?
When you encode a fun, what is encoded is just a reference to the function, not the function implementation. [...]
[...]the definition of the function is not passed along; just exactly enough information to recreate the fun at an other node if the module is there.
[...] If the module containing the fun has not yet been loaded, and the target node is running in interactive mode; then the module is attempted loaded using the regular module loading mechanism (contained in the module error_handler); and then it tries to see if a fun with the given id is available in said module. However, this only happens lazily when you try to apply the function.
[...] If you never attempt to apply the function, then nothing bad happens. The fun can be passed to another node (which has the module/fun in question) and then everybody is happy.
Maybe the target node has a module loaded of said name, but perhaps in a different version; which would then be very likely to have a different MD5 checksum, then you get the error badfun if you try to apply it.
I suggest you read the whole article, because it's extremely interesting.
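The encode/decode round trip the article describes can be tried on a single node. This is just an illustrative sketch (the module name fun_roundtrip is made up): it serializes an external fun with term_to_binary/1, rebuilds it with binary_to_term/1, and uses erlang:fun_info/2 to show that a fun carries a reference to its module rather than the code itself.

```erlang
-module(fun_roundtrip).
-export([demo/0]).

demo() ->
    %% An external fun is only a module/function/arity reference.
    F = fun lists:reverse/1,
    {type, external} = erlang:fun_info(F, type),

    %% Same encoding that distribution uses when a term crosses nodes.
    Bin = term_to_binary(F),
    G = binary_to_term(Bin),

    %% A local (anonymous) fun remembers the module that defined it;
    %% that module must be loadable wherever the fun is applied.
    L = fun(X) -> X end,
    {type, local} = erlang:fun_info(L, type),

    {G([1, 2, 3]), erlang:fun_info(L, module)}.
```

Here the decoded fun works because lists is available locally; on a remote node the same decode would succeed, and the apply would fail only if the referenced module were missing or differed.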
You can send any valid Erlang term, although you have to be careful when sending funs. Any fun referencing a function inside a module needs that module to exist on the target node to work:
(first@host)9> rpc:call(second@host, erlang, apply,
                        [fun io:format/1, ["Hey!~n"]]).
Hey!
ok
(first@host)10> mymodule:func("Hey!~n").
5
(first@host)11> rpc:call(second@host, erlang, apply,
                         [fun mymodule:func/1, ["Hey!~n"]]).
{badrpc,{'EXIT',{undef,[{mymodule,func,["Hey!~n"]},
                        {rpc,'-handle_call_call/6-fun-0-',5}]}}}
In this example, io exists on both nodes and it works to send a function from io as a fun. However, mymodule exists only on the first node and the fun generates an undef exception when called on the other node.
As for anonymous functions, it seems they can be sent and work as expected.
t1@localhost:
(t1@localhost)7> register(shell, self()).
true
(t1@localhost)10> A = me, receive Fun when is_function(Fun) -> Fun(A) end.
hello me you
ok
t2@localhost:
(t2@localhost)11> B = you.
you
(t2@localhost)12> Fn2 = fun (A) -> io:format("hello ~p ~p~n", [A, B]) end.
#Fun<erl_eval.6.54118792>
(t2@localhost)13> {shell, 't1@localhost'} ! Fn2.
I am adding coverage logic to an app built on riak-core, and the merge of results gathered can be tricky if anonymous functions cannot be used in messages.
Also check out riak_kv/src/riak_kv_coverage_filter.erl
riak_kv might be using it to filter result, I guess.

How to refine the debugging?

Crash report (SASL) gives more or less where and why a bug happens.
But is it possible to refine this (the function, the line of code, etc.)?
If you can reproduce the fault, the best way to get more information is to put a dbg trace on sections in question and review that output.
dbg:tracer(),dbg:p(all,c),dbg:tpl(Mod,Func,x).
This usually does the trick for me. Replace Mod and Func with whatever module and function you want to debug.
If you are looking for more detailed post-mortem logging, then SASL and the error_logger are your friends. There are of course times when SASL does not give you enough information. If this happens a lot in your system, you should probably either learn to understand the SASL output better or write your own log handler. It is quite easy to plug your own error handler into SASL and output things as you want.
You will, however, never get a line number, as that information is destroyed at compilation time and there is no way for the VM to know which line crashed. It does, however, know which function crashed and possibly with which arguments; given this, it is usually possible to find out where things went wrong. Unless you write very long functions, which IMO is a bad code smell and a sign that you should refactor your code into smaller functions.
In general, no. The Erlang .beam files do not contain the line numbers from the original code, so it is hard to know at what line the problem occurred. I do have a number of macros I use in my project, included as "log.hrl":
-define(INFO(T), error_logger:info_report(T)).
-define(WARN(T), error_logger:warning_report(
[process_info(self(), current_function), {line, ?LINE} | T])).
-define(ERR(T), error_logger:error_report(
[process_info(self(), current_function), {line, ?LINE} | T])).
-define(DEBUG(Format, Args), io:format("D(~p:~p:~p) : "++Format++"~n",
[self(),?MODULE,?LINE]++Args)).
-define(DEBUGP(Args), io:format("D(~p:~p:~p) : ~p~n",
[self(),?MODULE,?LINE, Args])).
and this does give you some log lines in the program to hunt for. For debugging I often also use the redbug tool from the eper suite:
https://github.com/massemanet/eper
It allows you to trace in realtime whenever a call happens:
Eshell V5.8.3 (abort with ^G)
1> redbug:start("erlang:now() -> stack;return", [{time, 60*1000}]).
ok
2> erlang:now().
{1297,183814,756227}
17:50:14 <{erlang,apply,2}> {erlang,now,[]}
shell:eval_loop/3
shell:eval_exprs/7
shell:exprs/7
17:50:14 <{erlang,apply,2}> {erlang,now,0} -> {1297,183814,756227}
3>
I hope this helps.

Query an Erlang process for its state?

A common pattern in Erlang is the recursive loop that maintains state:
loop(State) ->
    receive
        Msg ->
            NewState = whatever(Msg),
            loop(NewState)
    end.
Is there any way to query the state of a running process with a bif or tracing or something? Since crash messages say "...when state was..." and show the crashed process's state, I thought this would be easy, but I was disappointed that I haven't been able to find a bif to do this.
So, then, I figured using the dbg module's tracing would do it. Unfortunately, I believe because these loops are tail call optimized, dbg will only capture the first call to the function.
Any solution?
If your process is using OTP, it is enough to do sys:get_status(Pid).
The error message you mention is displayed by SASL. SASL is an error reporting daemon in OTP.
The state you are referring to in your example code is just an argument of a tail-recursive function. There is no way to extract it using anything except the tracing BIFs. I guess this would not be a proper solution in production code, since tracing is intended to be used only for debugging purposes.
The proper, industry-tested solution would be to make extensive use of OTP in your project. Then you can take full advantage of SASL error reporting, the rb module to collect these reports, sys to inspect the state of a running OTP-compatible process, proc_lib to make short-lived processes OTP-compliant, etc.
It turns out there's a better answer than all of these, if you're using OTP:
sys:get_state/1
Probably it didn't exist at the time.
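For illustration, here is a tiny gen_server whose state is read with sys:get_state/1. The module counter_srv and its bump message are invented for this sketch; only gen_server and sys are real OTP modules.

```erlang
-module(counter_srv).
-behaviour(gen_server).

-export([start/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start an unregistered counter whose state is a plain integer.
start() ->
    gen_server:start(?MODULE, 0, []).

init(Count) ->
    {ok, Count}.

handle_call(_Request, _From, Count) ->
    {reply, ok, Count}.

%% Each bump cast increments the counter held in the server state.
handle_cast(bump, Count) ->
    {noreply, Count + 1};
handle_cast(_Other, Count) ->
    {noreply, Count}.
```

In the shell, {ok, Pid} = counter_srv:start(), gen_server:cast(Pid, bump), sys:get_state(Pid). returns 1. The cast and the sys request come from the same process, so the server handles them in order.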
It looks like you're making a problem out of nothing. erlang:process_info/1 gives enough information for debugging purposes. If you REALLY need the loop function's arguments, why don't you give them back to the caller in response to a special message that you define yourself?
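A plain receive loop can answer such a message itself. The sketch below is one possible shape for it, with made-up names (state_loop, get_state); the reference returned by make_ref() is there so the caller only accepts the reply to its own request.

```erlang
-module(state_loop).
-export([start/1, get_state/1, loop/1]).

start(InitialState) ->
    spawn(?MODULE, loop, [InitialState]).

%% Ask the process for its current state; give up after one second.
get_state(Pid) ->
    Ref = make_ref(),
    Pid ! {get_state, self(), Ref},
    receive
        {Ref, State} -> {ok, State}
    after 1000 ->
        {error, timeout}
    end.

loop(State) ->
    receive
        %% The special message: reply with the state and carry on unchanged.
        {get_state, From, Ref} ->
            From ! {Ref, State},
            loop(State);
        %% Any other message is accumulated, newest first.
        Msg ->
            loop([Msg | State])
    end.
```

After Pid = state_loop:start([]), Pid ! a, Pid ! b, the call state_loop:get_state(Pid) returns {ok, [b, a]}.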
UPDATE:
Just to clarify terminology. The closest thing to the 'state of the process' at the language level is the process dictionary, usage of which is highly discouraged. It can be queried by erlang:process_info/1 or erlang:process_info/2.
What you actually need is to trace the process's local function calls along with their arguments:
-module(ping).
-export([start/0, send/1, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

send(Pid) ->
    Pid ! {self(), ping},
    receive
        pong ->
            pong
    end.

loop(S) ->
    receive
        {Pid, ping} ->
            Pid ! pong,
            loop(S + 1)
    end.
Console:
Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [kernel-poll:false]
Eshell V5.6.5 (abort with ^G)
1> l(ping).
{module,ping}
2> erlang:trace(all, true, [call]).
23
3> erlang:trace_pattern({ping, '_', '_'}, true, [local]).
5
4> Pid = ping:start().
<0.36.0>
5> ping:send(Pid).
pong
6> flush().
Shell got {trace,<0.36.0>,call,{ping,loop,[0]}}
Shell got {trace,<0.36.0>,call,{ping,loop,[1]}}
ok
7>
{status,Pid,_,[_,_,_,_,[_,_,{data,[{_,State}]}]]} = sys:get_status(Pid).
That's what I use to get the state of a gen_server. (Tried to add it as a comment to the reply above, but couldn't get formatting right.)
As far as I know, you can't get the arguments passed to a locally called function. I would love for someone to prove me wrong.
-module(loop).
-export([start/0, loop/1]).

start() ->
    spawn_link(fun () -> loop([]) end).

loop(State) ->
    receive
        Msg ->
            loop([Msg|State])
    end.
To trace this module, do the following in the shell.
dbg:tracer().
dbg:p(new,[c]).
dbg:tpl(loop, []).
With this trace setting you get to see local calls (the 'l' in tpl means that local calls are traced as well, not only global ones).
5> Pid = loop:start().
(<0.39.0>) call loop:'-start/0-fun-0-'/0
(<0.39.0>) call loop:loop/1
<0.39.0>
6> Pid ! foo.
(<0.39.0>) call loop:loop/1
foo
As you see, just the calls are included. No arguments in sight.
My recommendation is to base correctness in debugging and testing on the messages sent rather than on the state kept in processes. That is, if you send the process a bunch of messages, assert that it does the right thing, not that it holds a certain set of values.
But of course, you could also sprinkle some erlang:display(State) calls in your code temporarily. Poor man's debugging.
This is a one-liner that can be used in the shell.
sys:get_status(list_to_pid("<0.1012.0>")).
It helps you convert a pid string into a Pid.

Resources