Can I match against a string that contains non-ASCII characters? - erlang

I am writing a program in which I am dealing with strings of the form, e.g., "\001SOURCE\001". That is, the strings contain alphanumeric text with an ASCII character of value 1 at each end. I am trying to write a function to match strings like these. I have tried a match like this:
handle(<<1,"SOURCE",1>>) -> ok.
But the match does not succeed. I have tried a few variations on this theme, but all have failed.
Is there a way to match a string that contains mostly alphanumeric text, with the exception of a non-alpha character at each end?

You can also do the following (both comparisons evaluate to true):
[1] ++ "SOURCE" ++ [1] == "\001SOURCE\001".
Or convert the list to a binary with list_to_binary/1 and match on binaries:
<<1,"SOURCE",1>> == <<"\001SOURCE\001">>.
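To make the list/binary distinction concrete, here is a minimal sketch (the module name source_match and the returned tuples are made up for illustration) of a handle/1 that accepts the value both as a string, which is a list, and as a binary. The question's single binary clause only covers the second case, which is why calling it with "\001SOURCE\001" fails.

```erlang
-module(source_match).
-export([handle/1]).

%% "\001SOURCE\001" is a list (a string), so it needs a list pattern.
%% The escape \001 inside the string literal is the integer 1.
handle("\001SOURCE\001") ->
    {list, ok};
%% <<1, "SOURCE", 1>> is a binary, so it only matches binary arguments.
handle(<<1, "SOURCE", 1>>) ->
    {binary, ok};
handle(_) ->
    nomatch.
```

Which clause fires depends only on the argument's type: handle(list_to_binary("\001SOURCE\001")) takes the binary clause.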

Strings are syntactic sugar for lists. Lists and binaries are different types, so your match fails because you're trying to match a list against a binary (you'd have the same problem matching {1, "STRING", 1} against it, since tuples aren't lists either).
Remembering that strings are lists, we have a few options:
handle([1,83,84,82,73,78,71,1]) -> ok.
This will work just fine. Another, more readable (but uglier, sort of) way is to use character literals:
handle([1, $S,$T,$R,$I,$N,$G, 1]) -> ok.
Yet another way would be to strip the non-character values, and then pass that on to a handler:
handle(String) -> dispatch(string:strip(String, both, 1)).
dispatch("STRING") -> do_stuff();
dispatch("OTHER") -> do_other_stuff().
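A runnable version of the strip-then-dispatch idea might look like the sketch below. It rests on a few assumptions: the module name and the atoms returned by dispatch/1 are hypothetical stand-ins for do_stuff/0 and do_other_stuff/0, and the stripping is done with lists:dropwhile/2 rather than string:strip/3, which recent OTP releases document as part of the older, obsolete string API.

```erlang
-module(strip_dispatch).
-export([handle/1]).

%% Drop every leading and trailing character with value 1, then dispatch
%% on what is left.
handle(String) when is_list(String) ->
    dispatch(strip_ones(String)).

%% lists:dropwhile/2 removes the leading 1s; reversing the list lets the
%% same call remove the trailing ones.
strip_ones(String) ->
    IsOne = fun(C) -> C =:= 1 end,
    Front = lists:dropwhile(IsOne, String),
    lists:reverse(lists:dropwhile(IsOne, lists:reverse(Front))).

%% Hypothetical handlers: return tags instead of calling do_stuff/0 etc.
dispatch("SOURCE") -> source;
dispatch("OTHER") -> other;
dispatch(_) -> unknown.
```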
And, if at all possible, the best option is to stop using strings for text values entirely and process binaries directly instead. The binary syntax is much friendlier, binaries take up far fewer resources, and quite a few binary operations are significantly more efficient than their string/list counterparts. But that doesn't fit every case! (It's great when dealing with sockets, though...)
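If the data arrives as binaries (for example straight from a socket), a general version of the question's match, one byte of value 1 at each end with arbitrary text in between, can be written with the bit syntax. This is a sketch; the module and function names (delimited, unwrap/1) are hypothetical. The body length is computed first because a segment size used in a pattern must already be bound.

```erlang
-module(delimited).
-export([unwrap/1]).

%% Return the text between a leading and a trailing byte of value 1,
%% e.g. <<1, "SOURCE", 1>> gives {ok, <<"SOURCE">>}.
unwrap(Bin) when is_binary(Bin), byte_size(Bin) >= 2 ->
    Size = byte_size(Bin) - 2,
    %% Size is already bound here, so it can be used as a segment size.
    case Bin of
        <<1, Body:Size/binary, 1>> -> {ok, Body};
        _ -> error
    end;
unwrap(_) ->
    error.
```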

Related

wxMaxima: how to use texput to tell tex1 how to handle strings?

tex1() seems to return all strings as follows:
tex1(hello);
{\it hello}
tex1("hello");
\mbox{ hello }
What variable must one use to change this handling via texput, e.g. if I would just like it to print strings literally? I'm using other Maxima commands (like printf and concat) to produce strings that are then passed to tex1, and occasionally the default handling causes issues.
I tried texput(""", ...) and texput("''", ...); the first wasn't accepted, and the second was, but it did not change the output. I really have no clue what to do for the non-quoted strings.
Let's be careful to distinguish symbols from strings. When you enter tex1(hello), hello is a symbol, and when you enter tex1("hello"), "hello" is a string. Symbols are essentially names for items in a lookup table, which can store additional info (symbol properties) for each. Strings, on the other hand, are (from Maxima's point of view) just a sequence of characters.
Anyway, changing the output for all symbols or all strings is unfortunately not possible via texput. But with a one-line Lisp function, one can accomplish it. Try this for symbols:
:lisp (defun tex-stripdollar (sym) (maybe-invert-string-case (symbol-name (stripdollar sym))))
and this for strings:
:lisp (defun tex-string (str) str)
These are going to change some existing outputs, so you'll want to try it and see if it works for you.

How can I detect a list [1,2,3] (not a string "example string")? [duplicate]

In Erlang, when you have a list of printable characters, it's a string, but a string is also a list of items, and all list functions can be applied to a string. Really, a string data structure doesn't exist in Erlang.
Part of my code needs to be sure that something is not only a list, but a string (a real string). It needs to separate lists, e.g. [1,2,3,a,b,"josh"], from strings, e.g. "Muzaaya".
The guard expression is_list/1 returns true for both strings and lists. There is no such guard as is_string/1, so I need a code snippet that will make sure my data is a string.
A string in this case is a list of only printable characters (letters, both upper and lower case), which may also contain digits and spaces, e.g. "Muzaaya2536 618 Joshua". I need a code snippet (Erlang) that will check this for me and ensure that the variable is a string, not just a list. Thanks.
You have two functions in the module io_lib which can be helpful: io_lib:printable_list/1 and io_lib:printable_unicode_list/1 which test if the argument is a list of printable latin1 or unicode characters respectively.
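A minimal sketch of wrappers around those predicates (the module and function names are made up). It uses io_lib:printable_latin1_list/1 rather than io_lib:printable_list/1 because, as far as I know, the latter's notion of "printable" follows the VM's +pc startup flag, so the explicit Latin-1 variant gives the same answer on every node.

```erlang
-module(printable_check).
-export([is_latin1_string/1, is_unicode_string/1]).

%% true only for flat lists of printable Latin-1 characters.
is_latin1_string(Term) ->
    is_list(Term) andalso io_lib:printable_latin1_list(Term).

%% true only for flat lists of printable Unicode code points.
is_unicode_string(Term) ->
    is_list(Term) andalso io_lib:printable_unicode_list(Term).
```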
Using the isprint(3) definition of printable characters:
isprint(X) when X >= 32, X < 127 -> true;
isprint(_) -> false.
is_string(List) when is_list(List) -> lists:all(fun isprint/1, List);
is_string(_) -> false.
You won't be able to use it as a guard, though.
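The question's own definition (letters, digits and spaces only) is stricter than isprint(3). Below is a sketch of that stricter check; the module and function names are hypothetical. The is_integer/1 guards matter because atoms compare greater than all numbers in Erlang's term order, so an element like a would otherwise satisfy C >= $a.

```erlang
-module(alnum_string).
-export([is_alnum_string/1]).

%% The asker's notion of a "real string": a flat list made up only of
%% ASCII letters, digits and spaces. Non-lists are rejected up front.
is_alnum_string(List) when is_list(List) ->
    lists:all(fun is_alnum_or_space/1, List);
is_alnum_string(_) ->
    false.

%% Guards keep non-integers (atoms, nested lists) out of the true clauses.
is_alnum_or_space(C) when is_integer(C), C >= $a, C =< $z -> true;
is_alnum_or_space(C) when is_integer(C), C >= $A, C =< $Z -> true;
is_alnum_or_space(C) when is_integer(C), C >= $0, C =< $9 -> true;
is_alnum_or_space($\s) -> true;
is_alnum_or_space(_) -> false.
```

Because it is an ordinary function, use it in a case expression, e.g. case alnum_string:is_alnum_string(Data) of true -> ...; false -> ... end, rather than in a when clause.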

Matching function in erlang based on string format

I have user information coming in from an outside source and I need to check whether that user is active. Sometimes I have a User and a Server and other times I have User@Server. The former case is no problem; I just have:
active(User, Server) ->
    %% do whatever
    ok.
What I would like to do with the User@Server case is something like:
active([User, "@", Server]) ->
    active(User, Server).
It doesn't seem to work. When calling active in the Erlang shell with a@b, for example, I get an error saying there is no match. Any help would be appreciated!
You can tokenize the string to get the result:
active(UserString) ->
    [User, Server] = string:tokens(UserString, "@"),
    active(User, Server).
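A sketch of the tokenizing approach that also copes with malformed input, assuming the separator is the at sign ($@, code 64) and using a hypothetical active/2 that just returns its arguments. The one-liner above crashes with badmatch when string:tokens/2 does not return exactly two parts; here that case becomes {error, bad_format}.

```erlang
-module(user_server).
-export([active/1, active/2]).

%% active/1 accepts "user@server" as a single string and forwards the two
%% parts to active/2, or returns {error, bad_format} if the string does
%% not split into exactly two tokens.
active(UserString) when is_list(UserString) ->
    case string:tokens(UserString, "@") of
        [User, Server] -> active(User, Server);
        _ -> {error, bad_format}
    end.

%% Hypothetical stand-in for the real activity check.
active(User, Server) ->
    {ok, User, Server}.
```

Note that string:tokens/2 treats runs of separators as one, so "a@@b" is accepted as well.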
If you need something more elaborate, or with better handling of something like email addresses, it might then be time to delve into using regular expressions with the re module.
active(UserString) ->
    RegEx = "^([\\w\\.-]+)@([\\w\\.-]+)$",
    {match, [User, Server]} = re:run(UserString, RegEx, [{capture, all_but_first, list}]),
    active(User, Server).
Note: the supplied regex is hardly sufficient for email address validation; it's just an example that allows alphanumeric characters including underscores (\\w), dots (\\.), and dashes (-), separated by an at symbol. It will also fail if the match doesn't span the whole string (^ to $).
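The same regex with the nomatch result handled instead of letting the {match, ...} pattern crash. This is a sketch; the module and function names are hypothetical and the separator is assumed to be the at sign.

```erlang
-module(user_server_re).
-export([parse/1]).

%% re:run/3 returns nomatch when the pattern fails, so both outcomes are
%% handled explicitly.
parse(UserString) ->
    RegEx = "^([\\w\\.-]+)@([\\w\\.-]+)$",
    case re:run(UserString, RegEx, [{capture, all_but_first, list}]) of
        {match, [User, Server]} -> {ok, User, Server};
        nomatch -> {error, nomatch}
    end.
```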
A note on the pattern matching: for the real solution to your problem, I think @chops' suggestions should be used.
When matching patterns against strings, it's useful to keep in mind that Erlang strings are really lists of integers. So the string "@" is actually the same as [64] (64 being the ASCII code for @).
This means that your match pattern [User, "@", Server] will match lists like [97,[64],98], but not "a@b" (which in list form is [97,64,98]).
To match the string you need [User, $@, Server]. The $ notation gives you the integer (ASCII) value of the character.
However, this pattern limits the matching string to exactly one character, followed by @, followed by exactly one more character.
It can be improved with [User, $@ | Server], which allows the server part to have arbitrary length, but the User variable will still only match a single character (and I don't see a way around that).
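The limitation can be checked directly. The sketch below (hypothetical module at_split) contrasts the head pattern [User, $@ | Server] with splitting inside the function body via lists:splitwith/2, which is not a pattern but does let the user part have any length.

```erlang
-module(at_split).
-export([one_char/1, split_at/1]).

%% Head pattern from the answer: User binds to exactly one character code,
%% Server binds to the (possibly empty) rest of the list.
one_char([User, $@ | Server]) -> {User, Server};
one_char(_) -> nomatch.

%% Splitting in the body: take everything up to the first $@ as the user.
split_at(String) ->
    case lists:splitwith(fun(C) -> C =/= $@ end, String) of
        {User, [$@ | Server]} -> {User, Server};
        _ -> nomatch
    end.
```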

String splitting problems in Erlang

I've been playing around with splitting atoms and have a problem with strings. The input data will always be an atom that consists of some letters followed by some numbers, for instance ms444, r64 or min1. Since the function lists:splitwith/2 takes a list, the atom is first converted into a list:
24> lists:splitwith(fun (C) -> is_atom(C) end, [m,s,4,4,4]).
{[m,s],[4,4,4]}
25> lists:splitwith(fun (C) -> is_atom(C) end, atom_to_list(ms444)).
{[],"ms444"}
26> atom_to_list(ms444).
"ms444"
I want to separate the letters from the numbers and I've succeeded in doing that when using a list, but since I start out with an atom I get a "string" as result to put into my splitwith function...
Is it interpreting each item in the list as a string or what is going on?
You might want to have a look at the string module documentation:
http://www.erlang.org/doc/man/string.html
The following function might interest you:
tokens(String, SeparatorList) -> Tokens
Since strings in Erlang are just a list() of integer(), the fun tests whether each item is an atom() when it is in fact an integer(). If the test is changed to look for letters, it works:
29> lists:splitwith(fun (C) -> (C >= $a) andalso (C =< $z) end, atom_to_list(ms444)).
{"ms","444"}
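Putting the corrected comparison into a small function gives a sketch (module and function names hypothetical) that splits ms444, r64 or min1 into its letters and a number. It assumes, as the question states, that every atom is letters followed by at least one digit.

```erlang
-module(atom_split).
-export([split/1]).

%% Split an atom such as ms444 into {"ms", 444}. list_to_integer/1 raises
%% badarg if there are no trailing digits.
split(Atom) when is_atom(Atom) ->
    {Letters, Digits} = lists:splitwith(fun is_letter/1, atom_to_list(Atom)),
    {Letters, list_to_integer(Digits)}.

%% Characters are integers, so compare against character literals.
is_letter(C) when C >= $a, C =< $z -> true;
is_letter(C) when C >= $A, C =< $Z -> true;
is_letter(_) -> false.
```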
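An atom in Erlang is a named constant, not a variable (or at least not a variable in the sense of an imperative language).
You should really not create atoms dynamically (that is, don't convert things to atoms at runtime): atoms are not garbage-collected and the atom table has a fixed size limit, so generating atoms from outside data can eventually crash the VM.
They are used more in pattern matching and send/receive code.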
Pid ! {matchthis, X},
receive
    {foobar, Y} -> doY(Y);
    {matchthis, X} -> doX(X);
    Other -> doother(Other)
end
A variable like X can be bound to an atom, for example X = if 1 == 1 -> ok; true -> fail end. I may be suffering from a poor imagination, but I can't think of a reason why you would want to parse an atom. You should be in charge of which atoms you write, and not use list_to_atom(CharIntegerList).
Can you perhaps give more of an overview of what you'd like to accomplish?
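If text from outside really has to become an atom, two common ways to avoid creating new atoms at runtime are an explicit mapping and list_to_existing_atom/1, which raises badarg instead of creating an atom. The sketch below shows both; the module name and the start/stop commands are hypothetical.

```erlang
-module(safe_atoms).
-export([to_command/1, existing/1]).

%% Explicit clauses: only the atoms written here can ever be produced.
to_command("start") -> {ok, start};
to_command("stop") -> {ok, stop};
to_command(_) -> {error, unknown}.

%% list_to_existing_atom/1 never creates an atom; it raises badarg when no
%% atom with that name exists yet.
existing(Text) when is_list(Text) ->
    try
        {ok, list_to_existing_atom(Text)}
    catch
        error:badarg -> {error, not_an_atom}
    end.
```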
A "string" in Erlang is not a primitive type: it is just a list() of integer(). So if you want to separate the letters from the digits, you'll have to compare against the integer representation of the characters.
