Converting string to binary - erlang

I have the following issue.
I have a file which is used for storing an array of records of unknown structure. All I know is that the records are separated by "." (dot). One of the "fields" of each record is a binary value.
So the structure is:
multiline_text <<binary_value>> multiline_text .
I can read the file chunk by chunk (because it is pretty large) and parse it to get the actual data "<<binary_value>>", but what I get is a string, not a binary value. I'm trying to convert it to a binary (to convert it to a term later), but I have had no success.
I tried the BIF list_to_binary, but it won't work because the data is not a list; it's already a binary. I tried converting it to a list of integers, folding over them and converting, and it still doesn't work.
I suppose I'm missing something basic (I'm a newbie in Erlang).
Is there any advice?

If you get the binary you're interested in into a string in this format, for example:
S = "<< 1,2,3 >>".
then you can do something like this:
> {ok, T, _} = erl_scan:string(S ++ ".").
> {ok, Term} = erl_parse:parse_term(T).
{ok,<<1,2,3>>}
and then you can use Term, which holds the binary that you just read as a string.
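The two calls above can be wrapped into one function. This is only a minimal sketch: the module name bin_literal and the function parse/1 are made up for illustration, and the binary clause assumes the text was cut out of a file chunk as a binary, as the question describes.

```erlang
-module(bin_literal).
-export([parse/1]).

%% Hypothetical helper: turn the text of a binary literal, e.g. "<<1,2,3>>",
%% into the binary it denotes, using erl_scan and erl_parse from stdlib.

%% Text cut out of a file chunk is itself a binary, so convert it first.
parse(Text) when is_binary(Text) ->
    parse(binary_to_list(Text));
parse(Text) when is_list(Text) ->
    %% erl_parse:parse_term/1 needs the token list to end with a dot.
    case erl_scan:string(Text ++ ".") of
        {ok, Tokens, _EndLocation} ->
            case erl_parse:parse_term(Tokens) of
                {ok, Bin} when is_binary(Bin) -> {ok, Bin};
                {ok, _Other} -> {error, not_a_binary};
                {error, Reason} -> {error, Reason}
            end;
        {error, Reason, _ErrorLocation} ->
            {error, Reason}
    end.
```

If the resulting binary was produced by term_to_binary/1, it can then be passed to binary_to_term/1 to get the original term back.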

Here is a version without erl_parse, just for study:
%% Strip the surrounding "<<" and ">>", then convert each comma-separated token.
str2bin(Bin) ->
    Bin1 = string:strip(Bin, left, $<),
    Bin2 = string:strip(Bin1, right, $>),
    ToInt = fun(Str) -> {Int, _Rest} = string:to_integer(string:strip(Str)), Int end,
    list_to_binary(lists:map(ToInt, string:tokens(Bin2, ","))).

Related

How to also get how many characters read in parse?

I'm using Numeric.readDec to parse numbers and reads to parse Strings. But I also need to know how many characters were read.
For example, readDec "52 rest" returns [(52," rest")], having read 2 characters. But I can't find a good way to find out that it read 2 characters.
You could check the string length of show 52, but if the input was 052 that would give you the wrong answer (this solution also wouldn't work for string parsing, where the input has escape characters). You could also subtract the length of the remaining string from the length of the input string, but this is very inefficient for long strings with many parses.
How can this be done correctly and efficiently (preferably without just writing your own parse)?
With just base, instead of readDec, you can use readDecP from Text.Read.Lex, which uses a ReadP parser:
readDecP :: (Eq a, Num a) => ReadP a
The gather combinator in Text.ParserCombinators.ReadP returns the parse result along with the actual characters parsed:
gather :: ReadP a -> ReadP (String, a)
You can run the parser with readP_to_S, which gives back a ReadS parser, which is a function that accepts a string and produces a list of possible parses with the remainder of the string.
readP_to_S :: ReadP a -> ReadS a
type ReadS a = String -> [(a, String)]
An example in GHCi:
> import Text.ParserCombinators.ReadP (gather, readP_to_S)
> import Text.Read.Lex (readDecP)
> readP_to_S (gather readDecP) "52 rest"
[(("52",52)," rest")]
> readP_to_S (gather readDecP) "0644 permissions"
[(("0644",644)," permissions")]
You can simply check that there is only one valid parse if you want the result to be unambiguous, and then take the length of the first component to find the number of Char code points parsed.
These parsers are fairly limited, however; if you want something easier to use, faster, or able to produce more detailed error messages, then you should check out a more fully featured parsing package such as regex-applicative (regular grammars) or megaparsec (context-sensitive grammars).

Can I match against a string that contains non-ASCII characters?

I am writing a program in which I am dealing with strings of the form, e.g., "\001SOURCE\001". That is, the strings contain alphanumeric text with an ASCII character of value 1 at each end. I am trying to write a function to match strings like these. I have tried a match like this:
handle(<<1,"SOURCE",1>>) -> ok.
But the match does not succeed. I have tried a few variations on this theme, but all have failed.
Is there a way to match a string that contains mostly alphanumeric text, with the exception of a non-alpha character at each end?
You can also do the following
[1] ++ "SOURCE" ++ [1] == "\001SOURCE\001".
Or convert to binary using list_to_binary and pattern match as
<<1,"SOURCE",1>> == <<"\001SOURCE\001">>.
Strings are syntactic sugar for lists. Lists are a type and binaries are a different type, so your match isn't working out because you're trying to match a list against a binary (same problem if you tried to match {1, "STRING", 1} to it, tuples aren't lists).
Remembering that strings are lists, we have a few options:
handle([1,83,84,82,73,78,71,1]) -> ok.
This will work just fine. Another, more readable (but uglier, sort of) way is to use character literals:
handle([1, $S,$T,$R,$I,$N,$G, 1]) -> ok.
Yet another way would be to strip the non-character values, and then pass that on to a handler:
handle(String) -> dispatch(string:strip(String, both, 1)).
dispatch("STRING") -> do_stuff();
dispatch("OTHER") -> do_other_stuff().
And, if at all possible, the best case is to stop using strings for text values entirely and process binaries directly instead. The syntax of binaries is much friendlier, they take up far fewer resources, and quite a few binary operations are significantly more efficient than their string/list counterparts. But that doesn't fit every case! (But it's awesome when dealing with sockets...)
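As a rough sketch of the binary approach, the original handle(<<1,"SOURCE",1>>) clause does match once the argument really is a binary. The module name framed and the payload/1 helper are hypothetical names for this example only.

```erlang
-module(framed).
-export([handle/1, payload/1]).

%% Strings (lists) are converted to a binary before matching.
handle(String) when is_list(String) ->
    handle(list_to_binary(String));
handle(<<1, "SOURCE", 1>>) ->
    ok;
handle(Bin) when is_binary(Bin) ->
    {error, unknown}.

%% Extract whatever lies between the leading and trailing byte 1.
payload(<<1, Rest/binary>>) when byte_size(Rest) >= 1 ->
    Size = byte_size(Rest) - 1,
    case Rest of
        <<Payload:Size/binary, 1>> -> {ok, Payload};
        _ -> error
    end;
payload(_) ->
    error.
```

With payload/1, the text between the delimiters can be dispatched on without listing every possible message in a separate handle/1 clause.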

Can I set a particular byte in a string?

I have a very long string returned from os:cmd. The output of my exe file contains some symbols with code 4, so I replaced them with another symbol and put meta information at the beginning of the output. Now I want to replace the symbols back. What is the quickest way to do it?
I'm an Erlang noob, so this answer is most likely not the best answer. There's probably a function that does this in a chapter I haven't reached yet in the Erlang Programming book. However, I think this does what you want:
-module(replace).
-export([replace/3]).
replace([], _, _) -> [];
replace([OldChar | T], OldChar, NewChar) -> [NewChar | replace(T, OldChar, NewChar)];
replace([H | T], OldChar, NewChar) -> [H | replace(T, OldChar, NewChar)].
It just goes through the list (your string) and replaces the old character with the new one. It doesn't handle I18N. There are probably faster ways to do this. It will let you do this:
24> replace:replace([48,49,50,51,52,53,54,55,56,57], 53, 45).
"01234-6789"
or this:
28> replace:replace("39582049867", 57, 45).
"3-58204-867"
In terms of the quickest way - I'm going to guess that would be a provided function. If not, you'll have to code it up different ways and run the numbers.
Erlang strings are lists. Erlang lists are immutable. So you can't change particular bytes within a string, you can only generate another string with these bytes replaced.
Either replace the characters again (using map), or pass the original string around.
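The map-based option could look roughly like this. It is a sketch only: the module name restore and the Placeholder argument are assumptions, standing in for whatever symbol was substituted for code 4.

```erlang
-module(restore).
-export([restore/2]).

%% Build a new string in which every Placeholder character is mapped back
%% to the character with code 4. The input list itself is never modified.
restore(String, Placeholder) ->
    lists:map(fun(C) when C =:= Placeholder -> 4;
                 (C) -> C
              end,
              String).
```

For example, restore:restore("a#b", $#) returns the list [97,4,98].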

differentiate a string from a list in Erlang

In Erlang, when you have a list of printable characters, it's a string, but a string is also a list of items and all list functions can be applied to a string. Really, a string data structure doesn't exist in Erlang.
Part of my code needs to be sure that something is not only a list, but a string (a real string). It needs to separate lists, e.g. [1,2,3,a,b,"josh"], from strings, e.g. "Muzaaya".
The guard expression is_list/1 will say true for both strings and lists. There is no such guard as is_string/1, so I need a code snippet that will make sure that my data is a string.
A string in this case is a list of only printable characters (alphabetical, both upper and lower case) and may contain numbers, e.g. "Muzaaya2536 618 Joshua". I need an Erlang code snippet that will check this for me and ensure that the variable is a string, not just a list. Thanks.
You have two functions in the module io_lib which can be helpful: io_lib:printable_list/1 and io_lib:printable_unicode_list/1 which test if the argument is a list of printable latin1 or unicode characters respectively.
Using the isprint(3) definition of printable characters:
isprint(X) when X >= 32, X < 127 -> true;
isprint(_) -> false.
is_string(List) when is_list(List) -> lists:all(fun isprint/1, List);
is_string(_) -> false.
You won't be able to use it as a guard, though.
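Because such a check cannot appear in a guard, it has to be called from the function body. A minimal sketch using io_lib:printable_list/1, mentioned earlier, follows; the module name strcheck and the function classify/1 are hypothetical.

```erlang
-module(strcheck).
-export([classify/1]).

%% is_list/1 is allowed in the guard; the printable check runs in the body.
classify(Term) when is_list(Term) ->
    case io_lib:printable_list(Term) of
        true -> string;
        false -> list
    end;
classify(_) ->
    other.
```

Note that io_lib:printable_list/1 also returns true for the empty list, so [] is classified as a string here, and it accepts a wider set of characters (for instance newlines and tabs) than the isprint/1 version above.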

String splitting problems in Erlang

I've been playing around with the splitting of atoms and have a problem with strings. The input data will always be an atom that consists of some letters and then some numbers, for instance ms444, r64 or min1. Since the function lists:splitwith/2 takes a list the atom is first converted into a list:
24> lists:splitwith(fun (C) -> is_atom(C) end, [m,s,4,4,4]).
{[m,s],[4,4,4]}
25> lists:splitwith(fun (C) -> is_atom(C) end, atom_to_list(ms444)).
{[],"ms444"}
26> atom_to_list(ms444).
"ms444"
I want to separate the letters from the numbers and I've succeeded in doing that when using a list, but since I start out with an atom I get a "string" as result to put into my splitwith function...
Is it interpreting each item in the list as a string or what is going on?
You might want to have a look at the string module documentation:
http://www.erlang.org/doc/man/string.html
The following function might interest you:
tokens(String, SeparatorList) -> Tokens
Since strings in Erlang are just a list() of integer(), the fun tests whether each item is an atom() when it is in fact an integer(), so the test always fails. If the test is changed to check for lowercase letters instead, it works:
29> lists:splitwith(fun (C) -> (C >= $a) and (C =< $z) end, atom_to_list(ms444)).
{"ms","444"}
An atom in Erlang is a named constant and not a variable (or at least not like a variable in an imperative language).
You should really not create atoms dynamically (that is, don't convert things to atoms at runtime).
They are used more in pattern matching and in send/receive code.
Pid ! {matchthis, X}
receive
    {foobar,Y} -> doY(Y);
    {matchthis,X} -> doX(X);
    Other -> doother(Other)
end
A variable like X could be bound to an atom, for example X = if 1==1 -> ok; true -> fail end. I could be suffering from poor imagination, but I can't think of a reason why you would want to parse an atom. You should be in charge of the atoms you write and not use list_to_atom(CharIntegerList).
Can you perhaps give more of an overview of what you would like to accomplish?
A "string" in Erlang is not a primitive type: it is just a list() of integer(). So if you want to separate the letters from the digits, you'll have to compare against the integer representation of the characters.
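Putting the integer comparison to work, a sketch for the inputs from the question (ms444, r64, min1) might look like this. The module name atom_split and the function split/1 are made up for the example, and it assumes lowercase letters followed by at least one digit.

```erlang
-module(atom_split).
-export([split/1]).

%% Split an atom such as ms444 into its leading letters and its number.
%% list_to_integer/1 raises badarg if no digits follow the letters.
split(Atom) when is_atom(Atom) ->
    IsLower = fun(C) -> C >= $a andalso C =< $z end,
    {Letters, Digits} = lists:splitwith(IsLower, atom_to_list(Atom)),
    {Letters, list_to_integer(Digits)}.
```

Here atom_split:split(ms444) returns {"ms",444}.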
