I've inherited a binary file format with the following specification:
| F | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
0:| Status bit | ------ 15 - bit unsigned integer -----------
1:| Status bit | ---- uint:10 ---- | ---- uint:5 ----
Bit matching in Erlang is awesome. So I'd love to do something like this:
<<StatBit1:1, ValA:15/unsigned>> = <<2#1000000000101010:16>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<2#0000001010100111:16>>.
The problem is that the file I need to process is saved in 8-bit-little-endian
convention. So the very first 8-bits of the file in the example above would be
00101010 then 1000000 e.t.c.
{ok, S} = file:open("datafile", [read, binary, raw]).
{ok, <<Byte1:8, Byte2:8, Byte3:8, Byte4:8>>} = file:read(S,4).
io:format(
" ~8.2.0B | ~8.2.0B | ~8.2.0B | ~8.2.0B ~n ",
[Byte1, Byte2, Byte3, Byte4]).
# 00101010 | 1000000 | 10100111 | 00000010
# ok
So I resort to reading and swapping the bytes:
<<StatBit1:1, ValA:15/unsigned>> = <<Byte2:8, Byte1:8>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<Byte4:8, Byte3:8>>.
Alternatively I can read 16 bit little-endian and then "parse" it:
{ok, S} = file:open("datafile", [read, binary, raw]).
{ok, <<DW1:16/little, DW2:16/little>>} = file:read(S,4).
<<StatBit1:1, ValA:15/unsigned>> = <<DW1:16>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<DW2:16>>.
Both solutions make me equally frustrated. I still suspect that there is a nice way of
dealing with that type of situations. Is there?
I'd first look into changing the application generating these files to write the data in network (big-endian) order. If that's not possible, then you're stuck with byte swapping like you're already doing. You could wrap the swapping into a function to keep it out of your decoding logic:
byteswap16(F) ->
case file:read(F, 2) of
{ok, <<B1:8,B2:8>>} -> {ok, <<B2:8,B1:8>>};
Else -> Else
end.
Alternatively, perhaps you could preprocess the file. You mentioned in your comment that the files are huge, so maybe this isn't practical for your case, but if each file fits comfortably in memory you could use file:read_file/1 to read the whole file and then preprocess the contents using a binary comprehension:
byteswap16(Filename) ->
{ok,Bin} = file:read_file(Filename),
<< <<B2:8,B1:8>> || <<B1:8,B2:8>> <= Bin >>.
Both these solutions assume the entire file is written in 16-bit little endian format.
As an explanation of why the binary syntax (as it is) can't solve your problem, consider that the bits in your file really is in order 7, ...0, F, E, ...8. The status bit is in F, but if you say "the next field is 15 bits long, and is a little-endian unsigned integer", you'll get bits 7,...0,F,E,...9 (the next 15 bits) which will then be interpreted as little-endian. You can't express the fact that you'd like to skip bit F and use E-8 instead, and then go back and pick up bit F for the status. If you could byte swap the file first, e.g. with "dd if=infile of=outfile conv=swab", you'd make your life a whole lot easier.
Did you try something like:
[edit] make some correction, but I can't test this on my tab.
decode(<<A:8, 1:1, B:7>>) -> {status1, B*256+A};
decode(<<A:3, C:5, 0:1, B:7>>) -> {status2, B*8+A, C}.
Related
(Solution) TL;DR: Google assumes the key string is base32 encoded; replacing any 1 with I and 0 with O. This must be decoded prior to hashing.
Original Question
I'm having difficulty having my code match up with GA. I even went chasing down counters +/- ~100,000 from the current time step and found nothing. I was very excited to see my function pass the SHA-1 tests in the RFC 6238 Appendix, however when applied to "real life" it seems to fail.
I went so far as to look at the open source code for Google Authenticator at Github (here). I used the key for testing: "qwertyuiopasdfgh". According to the Github code:
/*
* Return key entered by user, replacing visually similar characters 1 and 0.
*/
private String getEnteredKey() {
String enteredKey = keyEntryField.getText().toString();
return enteredKey.replace('1', 'I').replace('0', 'O');
}
I believe my key would not be modified. Tracing through the files it seems the key remains unchanged through calls: AuthenticatorActivity.saveSecret() -> AccountDb.add() -> AccountDb.newContentValuesWith().
I compared my time between three sources:
(erlang shell): now()
(bash): date "+%s"
(Google/bash): pattern="\s*date\:\s*"; curl -I https://www.google.com 2>/dev/null | grep -iE $pattern | sed -e "s/$pattern//g" | xargs -0 date "+%s" -d
They are all the same. Despite that, it appears my phone is a bit off from my computer. It will change steps not in sync with my computer. However me trying to chase down the proper time step by +/- thousands didn't find anything. According to the NetworkTimeProvider class, that is the time source for the app.
This code worked with all the SHA-1 tests in the RFC:
totp(Secret, Time) ->
% {M, S, _} = os:timestamp(),
Msg = binary:encode_unsigned(Time), %(M*1000000+S) div 30,
%% Create 0-left-padded 64-bit binary from Time
Bin = <<0:((8-size(Msg))*8),Msg/binary>>,
%% Create SHA-1 hash
Hash = crypto:hmac(sha, Secret, Bin),
%% Determine dynamic offset
Offset = 16#0f band binary:at(Hash,19),
%% Ignore that many bytes and store 4 bytes into THash
<<_:Offset/binary, THash:4/binary, _/binary>> = Hash,
%% Remove sign bit and create 6-digit code
Code = (binary:decode_unsigned(THash) band 16#7fffffff) rem 1000000,
%% Convert to text-string and 0-lead-pad if necessary
lists:flatten(string:pad(integer_to_list(Code),6,leading,$0)).
For it to truly match the RFC it would need to be modified for 8-digit numbers above. I modified it to try and chase down the proper step. The goal was to figure out how my time was wrong. Didn't work out:
totp(_,_,_,0) ->
{ok, not_found};
totp(Secret,Goal,Ctr,Stop) ->
Msg = binary:encode_unsigned(Ctr),
Bin = <<0:((8-size(Msg))*8),Msg/binary>>,
Hash = crypto:hmac(sha, Secret, Bin),
Offset = 16#0f band binary:at(Hash,19),
<<_:Offset/binary, THash:4/binary, _/binary>> = Hash,
Code = (binary:decode_unsigned(THash) band 16#7fffffff) rem 1000000,
if Code =:= Goal ->
{ok, {offset, 2880 - Stop}};
true ->
totp(Secret,Goal,Ctr+1,Stop-1) %% Did another run with Ctr-1
end.
Anything obvious stick out?
I was tempted to make my own Android application to implement TOTP for my project. I did continue looking at the Java code. With aid of downloading the git repository and grep -R to find function calls I discovered my problem. To get the same pin codes as Google Authenticator the key is assumed to be base32 encoded and must be decoded prior to passing it to the hash algorithm.
There was a hint of this in getEnteredKey() by replacing the 0 and 1 characters as these are not present in the base32 alphabet.
Seems there are a lot of half float questions in other languages, but could not find one for Erlang.
So, I have a 2-byte float as part of a longer binary pattern input. I tried to use pattern matching like <<AFloat:16/float>> and got warning/error in compiler, while using 32/float produced no warning.
Question is: What workaround is there in Erlang to convert the 2-byte binary into float?
I saw the other elaborate bit processing answer to "reading from binary file in Erlang", and do not know if it is required in this case.
* Thanks for the answers below. I shall try them out later *
-- two sets of sample input: EF401C3FEA3F, 1242C341C341
It looks like Erlang does not support float widths other than 32 and 64 (at least currently), but I can't seem to find any documentation that says this explicitly.
Update: I checked the implementation of the bit syntax, and it definitely only handles 32 and 64 bit floats. This should really be better documented.
Update again: added to documentation upstream now.
Final update: Just saw your note about hex input. If you solve that as a first step, transforming the hex string H to a binary B with the actual data bytes, you can use the following to repack the data as 32-bit floats (a simple transformation), and then extract those the normal way:
Floats = [F || <<F:32/float>> <- [<<S:1,(E+(127-15)):8,(M bsl 13):23>> || <<S:1,E:5,M:10>> <- B]]
There is no support for 16 bit floats. If you want collect and works on float coming from another system (via a file for example) then you can easily convert this float to the representation used in your VM.
First read the file and extract the float16 as a 2 bytes binary, and call this conversion function:
-module (float16).
-export([conv_16_to_vm/1]).
% conv_16_to_vm(binary) with binary is a 16 bit float representation
conv_16_to_vm(<<S:1,E:5,M:10>>)->
conv(S,E,M).
conv(_,0,0) -> 0.0;
conv(_,31,_) -> {error,nan_or_qnan_or_infinity};
% Sign management
conv(1,E,M) -> -conv(0,E,M);
% sub normal floats
conv(0,0,M) -> M/(1 bsl 24);
% normal floats
conv(0,E,M) when E < 25 -> (1024 + M)/(1 bsl (25-E));
conv(0,25,M) -> (1024 + M);
conv(0,E,M) -> (1024 + M)*(1 bsl (E-25)).
I used the definition provided by Wikipedia and you can test it:
1> c(float16).
{ok,float16}
2> float16:conv_16_to_vm(<<0,1>>). % 0 00000 0000000001
5.960464477539063e-8
3> float16:conv_16_to_vm(<<3,255>>). % 0 00011 1111111111
6.097555160522461e-5
4> float16:conv_16_to_vm(<<4,0>>). % 0 00100 0000000000
6.103515625e-5
5> float16:conv_16_to_vm(<<123,255>>). % 0 11110 1111111111
65504
6> float16:conv_16_to_vm(<<60,0>>). % 0 01111 0000000000
1.0
7> float16:conv_16_to_vm(<<60,1>>). % 0 01111 0000000001
1.0009765625
8> float16:conv_16_to_vm(<<59,255>>). % 0 01110 1111111111
0.99951171875
8> float16:conv_16_to_vm(<<53,85>>). % 0 01101 0101010101
0.333251953125
As you should expect, the traditional problem or "rounding" is much more visible.
I have a list of variables for which I want to create a list of numbered variables. The intent is to use these with the reshape command to create a stacked data set. How do I keep them in order? For instance, with this code
local ct = 1
foreach x in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
gen runs`ct' = `x'
local ct = `ct' + 1
}
when I use the reshape command it generates an order as
runs1 runs10 runs11 ... runs2 runs22 ...
rather than the desired
runs01 runs02 runs03 ... runs26
Preserving the order is necessary in this analysis. I'm trying to add a leading zero to all ct values less than 10 when assigning variable names.
Generating a series of identifiers with leading zeros is a documented and solved problem: see e.g. here.
local j = 1
foreach v in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
local J : di %02.0f `j'
rename `v' runs`J'
local ++j
}
Note that I used rename rather than generate. If you are going to reshape the variables afterwards, the labour of copying the contents is unnecessary. Indeed the default float type for numeric variables used by generate could in some circumstances result in loss of precision.
I note that there may also be a solution with rename groups.
All that said, it's hard to follow your complaint about what reshape does (or does not) do. If you have a series of variables like runs* the most obvious reshape is a reshape long and for example
clear
set obs 1
gen id = _n
foreach v in q61 q77 q99 q121 q143 {
gen `v' = 42
}
reshape long q, i(id) j(which)
list
+-----------------+
| id which q |
|-----------------|
1. | 1 61 42 |
2. | 1 77 42 |
3. | 1 99 42 |
4. | 1 121 42 |
5. | 1 143 42 |
+-----------------+
works fine for me; the column order information is preserved and no use of rename was needed at all. If I want to map the suffixes to 1 up, I can just use egen, group().
So, that's hard to discuss without a reproducible example. See
https://stackoverflow.com/help/mcve for how to post good code examples.
Using Google Sheets, I want to automatically number rows like so:
The key is that I want this to use built-in functions only.
I have an implementation working where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way that custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
* Calculate the Work Breakdown Structure id for this row.
*
* #param {range} priorIds IDs that precede this one.
* #param {range} names The names for this row.
* #return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
* #customfunction
*/
function WBS_ID(priorIds,names){
if (Array.isArray(names[0])) names = names[0];
if (!names.join("")) return "";
var lastId,pieces=[];
for (var i=priorIds.length;i-- && !lastId;) lastId=priorIds[i][0];
if (lastId) pieces = (lastId+"").split('.').map(function(s){ return s*1 });
for (var i=0;i<names.length;i++){
if (names[i]){
var s = pieces.concat();
pieces.length=i+1;
pieces[i] = (pieces[i]||0) + 1;
return pieces.join(".");
}
}
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this—where the ID is calculated determinstically from the ROW())—is acceptable (and possibly even desirable).
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out to me as making possible something that I didn't think of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().
Lol this is hilariously the longest (and very likely the most unnecessarily complicated way to combine formulas) but because I thought it was interesting that it does in fact work, so long as you just add a 1 in the first row then in the second row you add:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
in my defense ive had a very long day at work - complicating what should be a simple thing seems to be my thing today :)
Foreword
Spreadsheet built-in functions doesn't include an equivalent to JavaScript .map. The alternative is to use the spreadsheets array handling features and iteration patterns.
A "complete solution" could include the use of built-in functions to automatically transform the user input into a simple table and returning the Work Breakdown Structure number (WBS) . Some people refer to transforming the user input into a simple table as "normalization" but including this will make this post to be too long for the Stack Overflow format, so it will be focused in presenting a short formula to obtain the WBS.
It's worth to say that using formulas for doing the transformation of large data sets into a simple table as part of the continuous spreadsheet calculations, in this case, of WBS, will make the spreadsheet to slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Left blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
| | A | B | C |
+---+--------------+----+-----------+
| 1 | Task | ID | Parent ID |
| 2 | General task | 1 | |
| 3 | Substast 1 | 2 | 1 |
| 4 | Substast 2 | 3 | 1 |
| 5 | Subsubtask 1 | 4 | 2 |
| 6 | Subsubtask 2 | 5 | 2 |
+---+--------------+----+-----------+
Remark: This also could help to reduce of required processing time of a custom funcion.
Second, add the below formula to D2, then fill down as needed,
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+----------+
| | A | B | C | D |
+---+--------------+----+-----------+----------+
| 1 | Task | ID | Parent ID | WBS |
| 2 | General task | 1 | | 1 |
| 3 | Substast 1 | 2 | 1 | 1.1 |
| 4 | Substast 2 | 3 | 1 | 1.2 |
| 5 | Subsubtask 1 | 4 | 2 | 1.1.1 |
| 6 | Subsubtask 2 | 5 | 2 | 1.1.2 |
+---+--------------+----+-----------+----------+
Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID (or 0), plus 1
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""
I have a binary M such that 34= will always be present and the rest may vary between any number of digits but will always be an integer.
M = [<<"34=21">>]
When I run this command I get an answer like
hd([X || <<"34=", X/binary >> <- M])
Answer -> <<"21">>
How can I get this to be an integer with the most care taken to make it as efficient as possible?
[<<"34=",X/binary>>] = M,
list_to_integer(binary_to_list(X)).
That yields the integer 21
As of R16B, the BIF binary_to_integer/1 can be used:
OTP-10300
Added four new bifs, erlang:binary_to_integer/1,2,
erlang:integer_to_binary/1, erlang:binary_to_float/1 and
erlang:float_to_binary/1,2. These bifs work similarly to how
their list counterparts work, except they operate on
binaries. In most cases converting from and to binaries is
faster than converting from and to lists.
These bifs are auto-imported into erlang source files and can
therefore be used without the erlang prefix.
So that would look like:
[<<"34=",X/binary>>] = M,
binary_to_integer(X).
A string representation of a number can be converted by N-48. For multi-digit numbers you can fold over the binary, multiplying by the power of the position of the digit:
-spec to_int(binary()) -> integer().
to_int(Bin) when is_binary(Bin) ->
to_int(Bin, {size(Bin), 0}).
to_int(_, {0, Acc}) ->
erlang:trunc(Acc);
to_int(<<N/integer, Tail/binary>>, {Pos, Acc}) when N >= 48, N =< 57 ->
to_int(Tail, {Pos-1, Acc + ((N-48) * math:pow(10, Pos-1))}).
The performance of this is around 100 times slower than using the list_to_integer(binary_to_list(X)) option.