String splitting problems in Erlang - erlang

I've been playing around with the splitting of atoms and have a problem with strings. The input data will always be an atom that consists of some letters and then some numbers, for instance ms444, r64 or min1. Since the function lists:splitwith/2 takes a list the atom is first converted into a list:
24> lists:splitwith(fun (C) -> is_atom(C) end, [m,s,4,4,4]).
{[m,s],[4,4,4]}
25> lists:splitwith(fun (C) -> is_atom(C) end, atom_to_list(ms444)).
{[],"ms444"}
26> atom_to_list(ms444).
"ms444"
I want to separate the letters from the numbers and I've succeeded in doing that when using a list, but since I start out with an atom I get a "string" as result to put into my splitwith function...
Is it interpreting each item in the list as a string or what is going on?

You might want to have a look at the string module documentation:
http://www.erlang.org/doc/man/string.html
The following function might interest you:
tokens(String, SeparatorList) -> Tokens

Since strings in Erlang are just a list() of integer() the test in the fun will be made if the item is an atom() when it is in fact an integer(). If the test is changed to look for letters it works:
29> lists:splitwith(fun (C) -> (C >= $a) and (C =< $Z) end, atom_to_list(ms444)).
{"ms","444"}

An atom in erlang is a named constant and not a variable (or not like a variable is in an imperative language).
You should really not create atoms in dynamic fashion (that is, don't convert things to atoms at runtime)
They are used more in pattern matching and send recive code.
Pid ! {matchthis, X}
recive
{foobar,Y} -> doY(Y);
{matchthis,X} -> doX(X);
Other -> doother(Other)
end
A variable, like X could be set to an atom. For example X=if 1==1 -> ok; true -> fail end. I could suffer from poor imagination but I can't think of a way why you would like to parse atom. You should be in charge of what atoms you write and not use list_to_atom(CharIntegerList).
Can you perhaps give a more overview of what you like to accomplish?

A "string" in Erlang is not a primitive type: it is just a list() of integers(). So if you want to "separate" the letters from the digits, you'll have to do comparison with the integer representation of the characters.

Related

Using Alex in Haskell to make a lexer that parses Dice Rolls

I'm making a parser for a DSL in Haskell using Alex + Happy.
My DSL uses dice rolls as part of the possible expressions.
Sometimes I have an expression that I want to parse that looks like:
[some code...] 3D6 [... rest of the code]
Which should translate roughly to:
TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}
My DSL also uses variables (basically, Strings), so I have a special token that handle variable names.
So, with this tokens:
"D" { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']* { \pos str -> TokenName pos str}
$digit+ { \pos str -> TokenInt pos (read str) }
The result I'm getting when using my parse now is:
TokenInt {... value = 3}, TokenName { ... , name = "D6"}
Which means that my lexer "reads" an Integer and a Variable named "D6".
I have tried many things, for example, i changed the token D to:
$digit "D" $digit { \pos str -> TokenD pos }
But that just consumes the digits :(
Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?
PS: I'm using PosN as a wrapper, not sure if relevant.
The way I'd go about it would be to extend the TokenD type to TokenD Int Int so using the basic wrapper for convenience I would do
$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
where ls = split 'D' s
split can be found here.
This is an extra step that'd usually be done in during syntactic analysis but doesn't hurt much here.
Also I can't make Alex parse $alpha for TokenD instead of TokenName. If we had Di instead of D that'd be no problem. From Alex's docs:
When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.
But then your code should work. I don't know if this is an issue with Alex.
I decided that I could survive with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to parse variables only if they start with a lowercase letter.
That also solved some possible problems with some other reserved words.
I'm still curious to know if there were other solutions, but the problem in itself was solved.
Thank you all!

Can I match against a string that contains non-ASCII characters?

I am writing an program in which I am dealing with strings in the form, e.g., of "\001SOURCE\001". That is, the strings contained alphanumeric text with an ASCII character of value 1 at each end. I am trying to write a function to match strings like these. I have tried a match like this:
handle(<<1,"SOURCE",1>>) -> ok.
But the match does not succeed. I have tried a few variations on this theme, but all have failed.
Is there a way to match a string that contains mostly alphanumeric text, with the exception of a non-alpha character at each end?
You can also do the following
[1] ++ "SOURCE" ++ [1] == "\001SOURCE\001".
Or convert to binary using list_to_binary and pattern match as
<<1,"SOURCE",1>> == <<"\001SOURCE\001">>.
Strings are syntactic sugar for lists. Lists are a type and binaries are a different type, so your match isn't working out because you're trying to match a list against a binary (same problem if you tried to match {1, "STRING", 1} to it, tuples aren't lists).
Remembering that strings are lists, we have a few options:
handle([1,83,84,82,73,78,71,1]) -> ok.
This will work just fine. Another, more readable (but uglier, sort of) way is to use character literals:
handle([1, $S,$T,$R,$I,$N,$G, 1]) -> ok.
Yet another way would be to strip the non-character values, and then pass that on to a handler:
handle(String) -> dispatch(string:strip(String, both, 1)).
dispatch("STRING") -> do_stuff();
dispatch("OTHER") -> do_other_stuff().
And, if at all possible, the best case is if you just stop using strings for text values entirely (if that's feasible) and process binaries directly instead. The syntax of binaries is much friendlier, they take up way fewer resources, and quite a few binary operations are significantly more efficient than their string/list counterparts. But that doesn't fit every case! (But its awesome when dealing with sockets...)

Simple explanation of Erlang atom

I am learning Erlang and stuck trying to understand the concept of atoms. I know Python: What is a good explanation of these "atoms" in simple terms, or analogously with Python. So far, my understanding is that the type is like a string but without string operations?
Docs say that:
An atom is a literal, a constant with name.
Sometimes you have couple of options, that you would like to choose from. In C for example, you have enum:
enum Weekday { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
In C, it is really an integer, but you can use it in code as one of options. Atoms in Erlang are very useful in pattern matching. Lets consider very simple server:
loop() ->
receive
{request_type_1, Request} ->
handle_request_1(Request),
loop();
{request_type_2, Request} ->
handle_request_2(Request),
loop();
{stop, Reason} ->
{ok, Reason};
_ ->
{error, bad_request}
end.
Your server receives messages, that are two element tuples and uses atoms to differentiate between different types of requests: request_type_1, request_type_2 and stop. It is called pattern matching.
Server also uses atoms as return values. ok atom means, that everything went ok. _ matches everything, so in case, that simple server receives something unexpected, it quits with tuple {error, Reason}, where the reason is also atom bad_request.
Boolean values true and false are also atoms. You can build logical functions using function clauses like this:
and(true, true) ->
true;
and(_, _) ->
false.
or(false, false) ->
false;
or(_, _) ->
true.
(It is a little bit oversimplified, because you can call it like this: or(atom1, atom2) and it will return true, but it is only for illustration.)
Module names in Erlang are also atoms, so you can bind module name to variable and call it, for example type this in Erlang shell:
io:format("asdf").
Variable = io.
Variable:format("asdf").
You should not use atoms as strings, because they are not garbage collected. If you start creating them dynamically, you can run out of memory. They should be only used, when there is fixed amount of options, that you type in code by hand. Of course, you can use the same atom as many times as you want, because it always points to the same point in memory (an atom table).
They are better than C enums, because the value is known at runtime. So while debugging C code, you would see 1 instead of Tuesday in debugger. Using atoms doesn't have that disadvantage, you will see tuesday in your both in your code and Erlang shell.
Also, they're often used to tag a tuple, for descriptiveness. For example:
{age, 42}
Rather than just
42
Atom is a literal constant. Has no value but can be used as a value. Examples are: true, false, undefined. If you want to use it as a string, you need to apply atom_to_list(atom) to get a string (list) to work with. Module names are also atoms.
Take a look at http://www.erlang.org/doc/reference_manual/data_types.html

Erlang matchspecs with tuple comparison

I want to use erlang datetime values in the standard format {{Y,M,D},{H,Min,Sec}} in a MNESIA table for logging purposes and be able to select log entries by comparing with constant start and end time tuples.
It seems that the matchspec guard compiler somehow confuses tuple values with guard sub-expressions. Evaluating ets:match_spec_compile(MatchSpec) fails for
MatchSpec = [
{
{'_','$1','$2'}
,
[
{'==','$2',{1,2}}
]
,
['$_']
}
]
but succeeds when I compare $2 with any non-tuple value.
Is there a restriction that match guards cannot compare tuple values?
I believe the answer is to use double braces when using tuples (see Variables and Literals section of http://www.erlang.org/doc/apps/erts/match_spec.html#id69408). So to use a tuple in a matchspec expression, surround that tuple with braces, as in,
{'==','$2',{{1,2}}}
So, if I understand your example correctly, you would have
22> M=[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}].
[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}]
23> ets:match_spec_run([{1,1,{1,2}}],ets:match_spec_compile(M)).
[{1,1,{1,2}}]
24> ets:match_spec_run([{1,1,{2,2}}],ets:match_spec_compile(M)).
[]
EDIT: (sorry to edit your answer but this was the easiest way to get my comment in a readable form)
Yes, this is how it must be done. An easier way to get the match-spec is to use the (pseudo) function ets:fun2ms/1 which takes a literal fun as an argument and returns the match-spec. So
10> ets:fun2ms(fun ({A,B,C}=X) when C == {1,2} -> X end).
[{{'$1','$2','$3'},[{'==','$3',{{1,2}}}],['$_']}]
The shell recognises ets:fun2ms/1. For more information see ETS documentation. Mnesia uses the same match-specs as ETS.

Can i set particular byte in string?

I have very long string returned from os:cmd. My exe-file output contains some symbols with code 4, so i replaced them with other symbol and put meta in the beginning of the output. Now i want to replace symbols back. How i can do it in quickest way?
I'm an Erlang noob, so this answer is most likely not the best answer. There's probably a function that does this in a chapter I haven't reached yet in the Erlang Programming book. However, I think this does what you want:
-module(replace).
-export([replace/3]).
replace([], _, _) -> [];
replace([OldChar | T], OldChar, NewChar) -> [NewChar | replace(T, OldChar, NewChar)];
replace([H | T], OldChar, NewChar) -> [H | replace(T, OldChar, NewChar)].
It just goes through list (your string) and replaces the old character with the new one. It doesn't handle I18N. There are probably faster ways to do this. It will let you do this:
24> replace:replace([48,49,50,51,52,53,54,55,56,57], 53, 45).
"01234-6789"
or this:
28> replace:replace("39582049867", 57, 45).
"3-58204-867"
In terms of the quickest way - I'm going to guess that would be a provided function. If not, you'll have to code it up different ways and run the numbers.
Erlang strings are lists. Erlang lists are immutable. So you can't change particular bytes within a string, you can only generate another string with these bytes replaced.
Either replace the characters again (using map), or pass the original string around.

Resources