Is there an erlang equivalent of codePointAt from js? One that gets the code point starting at a byte offset, without modifying the underlying string/binary?
You can use bit syntax pattern matching to skip the first N bytes and decode the first character from the remaining bytes as UTF-8:
1> CodePointAt = fun(Binary, Offset) ->
<<_:Offset/binary, Char/utf8, _/binary>> = Binary,
Char
end.
Test:
2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>
As you can see, if the offset is not in a valid UTF-8 character boundary, the function will throw an error. You can handle that differently using a case expression if needed.
First, remember that only binary strings are using UTF-8 in Erlang. Plain double-quote strings are already just lists of code points (much like UTF-32). The unicode:chardata() type represents both of these kinds of strings, including mixed lists like ["Hello", $\s, [<<"Filip"/utf8>>, $!]]. You can use unicode:characters_to_list(Chardata) or unicode:characters_to_binary(Chardata) to get a flattened version to work with if needed.
Meanwhile, the JS codePointAt function works on UTF-16 encoded strings, which is what JavaScript uses. Note that the index in this case is not a byte position, but the index of the 16-bit units of the encoding. And UTF-16 is also a variable length encoding: code points that need more than 16 bits use a kind of escape sequence called "surrogate pairs" - for example emojis like 👍 - so if such characters can occur, the index is misleading: in "a👍z" (in JavaScript), the a is at 0, but the z is not at 2 but at 3.
What you want is probably what's called the "grapheme clusters" - those that look like a single thing when printed (see the docs for Erlang's string module: https://www.erlang.org/doc/man/string.html). And you can't really use numerical indexes to dig the grapheme clusters out from a string - you need to iterate over the string from the start, getting them out one at a time. This can be done with string:next_grapheme(Chardata) (see https://www.erlang.org/doc/man/string.html#next_grapheme-1) or if you for some reason really need to index them numerically, you could insert the individual cluster substrings in an array (see https://www.erlang.org/doc/man/array.html). For example: array:from_list(string:to_graphemes(Chardata)).
Tuple={<<"jid">>,Member},
Tuple_in_string=lists:flatten(io_lib:format("~p", [Tuple])),
it gives output as:
"{<<\"jid\">>,\"sdfs\"}"
But i want this output without these slashes like
"{<<"jid">>,Member}"
Any pointers?
I have tried all the answers but at the end with io:format("\"~s\"~n", [Tuple_in_string]). what am geeting is "{<<"jid">>,Member}" but it is not a string.it is a atom.I need string on which i can apply concat operation.Any pointers?
You can print it like this:
io:format("\"~s\"~n", [Tuple_in_string]).
It prints:
"{<<"jid">>,"sdfs"}"
The \ are here to denote that the following " is part of the string and not a string delimiter. they do not exist in the string itself. They appear because you use the pretty print format ~p. If you use the string format ~s they wont appear in the display.
1> io:format("~p~n",["a \"string\""]).
"a \"string\""
ok
2> io:format("~s~n",["a \"string\""]).
a "string"
ok
3> length("a \"string\""). % is 10 and not 12
10
Firstly, you don't need to flatten the list here:
Tuple_in_string=lists:flatten(io_lib:format("~p", [Tuple])),
Erlang has the concept of iodata(), which means that printable things can be in nested lists and most functions can handle them, so you should leave only:
Tuple_in_string = io_lib:format("~p", [Tuple]),
Secondly, when you use ~p, you tell Erlang to print the term in such way, that it can be copied and pasted into console. That is why all double quotes are escaped \". Use ~s, which means "treat as string".
1> 38> Tuple = {<<"jid">>,"asdf"}.
{<<"jid">>,"asdf"}
2> IODATA = io_lib:format("~p", [Tuple]).
[[123,[[60,60,"\"jid\"",62,62],44,"\"asdf\""],125]]
3> io:format("~s~n", [IODATA]).
{<<"jid">>,"asdf"}
ok
L = Packet_in_tuple_form={xmlel,<<"message">>,[{<<"id">>,<<"rkX6Q-8">>},{<<"to">>,<<"multicast.devlab">>}],[{xmlel,<<"body">>,[],[{xmlcdata,"Hello"}]},{xmlel,<<"addresses">>,[{<<"xmlns">>,<<"http://jabber.org/protocol/address">>}],[{xmlel,<<"address">>,[{<<"type">>,<<"to">>},"{<<\"jid\">>,\"sds\"}",{<<"desc">>,"Description"}],[]}]}]}.
Gives me:
{xmlel,<<"message">>,
[{<<"id">>,<<"rkX6Q-8">>},{<<"to">>,<<"multicast.devlab">>}],
[{xmlel,<<"body">>,[],[{xmlcdata,"Hello"}]},
{xmlel,<<"addresses">>,
[{<<"xmlns">>,<<"http://jabber.org/protocol/address">>}],
[{xmlel,<<"address">>,
[{<<"type">>,<<"to">>},
"{<<\"jid\">>,\"sds\"}",
{<<"desc">>,"Description"}],
[]}]}]}
The \ in the address field are escape characters.
You can verify the same by checking the length of string.
I tried to split two fields from a binary string:
-define(S,<<"M\0\0\0522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,0110,00,150,0,0,0\0">>).<<Message_length:4/binary,Msg/binary>> = S.
the first 4 bytes are the length of the following message, the other byte are the message,
a null byte terminates the string.
The result is:
** exception error: o match of right hand side value
EDIT
Just before the given code, there is:
[Sequence|Reste] = binary:split(T,<<"\0">>),
Does "Reste" bounded ?
Your code is ok, so either you dont have a binary string, or the length of Mystring does not comply with the pattern. Here's a quick test:
1> Mystring = <<"abcde">>.
<<"abcde">>
2> <<Message_length:4/binary,Msg/binary>> = Mystring.
<<"abcde">>
3> Message_length.
<<"abcd">>
4> Msg.
<<"e">>
If you have a string (a list of integers) instead of a binary string (<<"string">>), as Vincenzo suggested, call erlang:list_to_binary/1 first.
Hope it helps
EDIT: I've checked the example string you left in a comment of Vincenzo's answer. I've tried it with your code and still works. Is it possible that Message_length and/or Msg are already bound (and different to Mystring) when reaching that line of code? That would make the pattern matching fail.
EDIT2: Tested with the updated data in the question:
1> S = <<"M\0\0\0522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,\342\200\214\342\200\2130110,00,150,0,0,0\0">>.
<<77,0,0,42,50,48,51,57,51,53,53,48,57,57,44,48,49,48,49,
48,48,48,48,48,48,48,56,44,48,...>>
2> <<Message_length:4/binary,Msg/binary>> = S.
<<77,0,0,42,50,48,51,57,51,53,53,48,57,57,44,48,49,48,49,
48,48,48,48,48,48,48,56,44,48,...>>
3> Message_length.
<<77,0,0,42>>
4> Msg.
<<"2039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,\342"...>>
There is issue with erlang string escape interpolation. The fourth byte is not interpolated as "\0" but "\052".
1> Bin = <<"M\0\0\0522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,0110,00,150,0,0,0\0">>.
<<77,0,0,42,50,48,51,57,51,53,53,48,57,57,44,48,49,48,49,
48,48,48,48,48,48,48,56,44,48,...>>
So you have to write it in this manner.
2> f().
ok
3> Bin = <<"M\0\0\0","522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,0110,00,150,0,0,0\0">>.
<<77,0,0,0,53,50,50,48,51,57,51,53,53,48,57,57,44,48,49,
48,49,48,48,48,48,48,48,48,56,...>>
Then usual way to parse this form of messages is:
4> <<L:32/little,Rest/binary>> = Bin.
<<77,0,0,0,53,50,50,48,51,57,51,53,53,48,57,57,44,48,49,
48,49,48,48,48,48,48,48,48,56,...>>
5> L.
77
6> <<Msg:L/binary,R/binary>> = Rest.
<<"522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,0110,00,150,0,0,0"...>>
7> R.
<<0>>
8> Msg.
<<"522039355099,010100000008,0,010170000000,0,0,0,0,0,0,,,0,0,,0110,00,150,0,0,0">>
You have to call list_to_binary/1 on string to be matched.
If you have further problems, type example string please!