Matching TOTP implementation with Google Authenticator - erlang

(Solution) TL;DR: Google assumes the key string is base32 encoded; replacing any 1 with I and 0 with O. This must be decoded prior to hashing.
Original Question
I'm having difficulty having my code match up with GA. I even went chasing down counters +/- ~100,000 from the current time step and found nothing. I was very excited to see my function pass the SHA-1 tests in the RFC 6238 Appendix, however when applied to "real life" it seems to fail.
I went so far as to look at the open source code for Google Authenticator at Github (here). I used the key for testing: "qwertyuiopasdfgh". According to the Github code:
/*
* Return key entered by user, replacing visually similar characters 1 and 0.
*/
private String getEnteredKey() {
String enteredKey = keyEntryField.getText().toString();
return enteredKey.replace('1', 'I').replace('0', 'O');
}
I believe my key would not be modified. Tracing through the files it seems the key remains unchanged through calls: AuthenticatorActivity.saveSecret() -> AccountDb.add() -> AccountDb.newContentValuesWith().
I compared my time between three sources:
(erlang shell): now()
(bash): date "+%s"
(Google/bash): pattern="\s*date\:\s*"; curl -I https://www.google.com 2>/dev/null | grep -iE $pattern | sed -e "s/$pattern//g" | xargs -0 date "+%s" -d
They are all the same. Despite that, it appears my phone is a bit off from my computer. It will change steps not in sync with my computer. However me trying to chase down the proper time step by +/- thousands didn't find anything. According to the NetworkTimeProvider class, that is the time source for the app.
This code worked with all the SHA-1 tests in the RFC:
totp(Secret, Time) ->
% {M, S, _} = os:timestamp(),
Msg = binary:encode_unsigned(Time), %(M*1000000+S) div 30,
%% Create 0-left-padded 64-bit binary from Time
Bin = <<0:((8-size(Msg))*8),Msg/binary>>,
%% Create SHA-1 hash
Hash = crypto:hmac(sha, Secret, Bin),
%% Determine dynamic offset
Offset = 16#0f band binary:at(Hash,19),
%% Ignore that many bytes and store 4 bytes into THash
<<_:Offset/binary, THash:4/binary, _/binary>> = Hash,
%% Remove sign bit and create 6-digit code
Code = (binary:decode_unsigned(THash) band 16#7fffffff) rem 1000000,
%% Convert to text-string and 0-lead-pad if necessary
lists:flatten(string:pad(integer_to_list(Code),6,leading,$0)).
For it to truly match the RFC it would need to be modified for 8-digit numbers above. I modified it to try and chase down the proper step. The goal was to figure out how my time was wrong. Didn't work out:
totp(_,_,_,0) ->
{ok, not_found};
totp(Secret,Goal,Ctr,Stop) ->
Msg = binary:encode_unsigned(Ctr),
Bin = <<0:((8-size(Msg))*8),Msg/binary>>,
Hash = crypto:hmac(sha, Secret, Bin),
Offset = 16#0f band binary:at(Hash,19),
<<_:Offset/binary, THash:4/binary, _/binary>> = Hash,
Code = (binary:decode_unsigned(THash) band 16#7fffffff) rem 1000000,
if Code =:= Goal ->
{ok, {offset, 2880 - Stop}};
true ->
totp(Secret,Goal,Ctr+1,Stop-1) %% Did another run with Ctr-1
end.
Anything obvious stick out?

I was tempted to make my own Android application to implement TOTP for my project. I did continue looking at the Java code. With aid of downloading the git repository and grep -R to find function calls I discovered my problem. To get the same pin codes as Google Authenticator the key is assumed to be base32 encoded and must be decoded prior to passing it to the hash algorithm.
There was a hint of this in getEnteredKey() by replacing the 0 and 1 characters as these are not present in the base32 alphabet.

Related

Erlang: variable is unbound

Why is the following saying variable unbound?
9> {<<A:Length/binary, Rest/binary>>, Length} = {<<1,2,3,4,5>>, 3}.
* 1: variable 'Length' is unbound
It's pretty clear that Length should be 3.
I am trying to have a function with similar pattern matching, ie.:
parse(<<Body:Length/binary, Rest/binary>>, Length) ->
But if fails with the same reason. How can I achieve the pattern matching I want?
What I am really trying to achieve is parse in incoming tcp stream packets as LTV(Length, Type, Value).
At some point after I parse the the Length and the Type, I want to ready only up to Length number of bytes as the value, as the rest will probably be for the next LTV.
So my parse_value function is like this:
parse_value(Value0, Left, Callback = {Module, Function},
{length, Length, type, Type, value, Value1}) when byte_size(Value0) >= Left ->
<<Value2:Left/binary, Rest/binary>> = Value0,
Module:Function({length, Length, type, Type, value, lists:reverse([Value2 | Value1])}),
if
Rest =:= <<>> ->
{?MODULE, parse, {}};
true ->
parse(Rest, Callback, {})
end;
parse_value(Value0, Left, _, {length, Length, type, Type, value, Value1}) ->
{?MODULE, parse_value, Left - byte_size(Value0), {length, Length, type, Type, value, [Value0 | Value1]}}.
If I could do the pattern matching, I could break it up to something more pleasant to the eye.
The rules for pattern matching are that if a variable X occurs in two subpatterns, as in {X, X}, or {X, [X]}, or similar, then they have to have the same value in both positions, but the matching of each subpattern is still done in the same input environment - bindings from one side do not carry over to the other. The equality check is conceptually done afterwards, as if you had matched on {X, X2} and added a guard X =:= X2. This means that your Length field in the tuple cannot be used as input to the binary pattern, not even if you make it the leftmost element.
However, within a binary pattern, variables bound in a field can be used in other fields following it, left-to-right. Therefore, the following works (using a leading 32-bit size field in the binary):
1> <<Length:32, A:Length/binary, Rest/binary>> = <<0,0,0,3,1,2,3,4,5>>.
<<0,0,0,3,1,2,3,4,5>>
2> A.
<<1,2,3>>
3> Rest.
<<4,5>>
I've run into this before. There is some weirdness between what is happening inside binary syntax and what happens during unification (matching). I suspect that it is just that binary syntax and matching occur at different times in the VM somewhere (we don't know which Length is failing to get assigned -- maybe binary matching is always first in evaluation, so Length is still meaningless). I was once going to dig in and find out, but then I realized that I never really needed to solve this problem -- which might be why it was never "solved".
Fortunately, this won't stop you with whatever you are doing.
Unfortunately, we can't really help further unless you explain the context in which you think this kind of a match is a good idea (you are having an X-Y problem).
In binary parsing you can always force the situation to be one of the following:
Have a fixed-sized header at the beginning of the binary message that tells you the next size element you need (and from there that can continue as a chain of associations endlessly)
Inspect the binary once on entry to determine the size you are looking for, pull that one value, and then begin the real parsing task
Have a set of fields, all of predetermined sizes that conform to some a binary schema standard
Convert the binary to a list and iterate through it with any arbitrary amount of look-ahead and backtracking you might need
Quick Solution
Without knowing anything else about your general problem, a typical solution would look like:
parse(Length, Bin) ->
<<Body:Length/binary, Rest/binary>> = Bin,
ok = do_something(Body),
do_other_stuff(Rest).
But I smell something funky here.
Having things like this in your code is almost always a sign that a more fundamental aspect of the code structure is not in agreement with the data that you are handling.
But deadlines.
Erlang is all about practical code that satisfies your goals in the real world. With that in mind, I suggest that you do something like the above for now, and then return to this problem domain and rethink it. Then refactor it. This will gain you three benefits:
Something will work right away.
You will later learn something fundamental about parsing in general.
Your code will almost certainly run faster if it fits your data better.
Example
Here is an example in the shell:
1> Parse =
1> fun
1> (Length, Bin) when Length =< byte_size(Bin) ->
1> <<Body:Length/binary, Rest/binary>> = Bin,
1> ok = io:format("Chopped off ~p bytes: ~p~n", [Length, Body]),
1> Rest;
1> (Length, Bin) ->
1> ok = io:format("Binary shorter than ~p~n", [Length]),
1> Bin
1> end.
#Fun<erl_eval.12.87737649>
2> Parse(3, <<1,2,3,4,5>>).
Chopped off 3 bytes: <<1,2,3>>
<<4,5>>
3> Parse(8, <<1,2,3,4,5>>).
Binary shorter than 8
<<1,2,3,4,5>>
Note that this version is a little safer, in that we avoid a crash in the case that Length is longer than the binary. This is yet another good reason why maybe we can't do that match in the function head.
Try with below code:
{<<A:Length/binary, Rest/binary>>, _} = {_, Length} = {<<1,2,3,4,5>>, 3}.
This question is mentioned a bit in EEP-52:
Any variables used in the expression must have been previously bound, or become bound in the same binary pattern as the expression. That is, the following example is illegal:
illegal_example2(N, <<X:N,T/binary>>) ->
{X,T}.
And explained a bit more in the following e-mail: http://erlang.org/pipermail/eeps/2020-January/000636.html
Illegal. With one exception, matching is not done in a left-to-right
order, but all variables in the pattern will be bound at the same
time. That means that the variables must be bound before the match
starts. For maps, that means that the variables referenced in key
expressions must be bound before the case (or receive) that matches
the map. In a function head, all map keys must be literals.
The exception to this general rule is that within a binary pattern,
the segments are matched from left to right, and a variable bound in a
previous segment can be used in the size expression for a segment
later in the binary pattern.
Also one of the members of OTP team mentioned that they made a prototype that can do that, but it was never finished http://erlang.org/pipermail/erlang-questions/2020-May/099538.html
We actually tried to make your example legal. The transformation of
the code that we did was not to rewrite to guards, but to match
arguments or parts of argument in the right order so that variables
that input variables would be bound before being used. (We would do a
topological sort to find the correct order.) For your example, the
transformation would look similar to this:
legal_example(Key, Map) ->
case Map of
#{Key := Value} -> Value;
_ -> error(function_clause, [Key, Map])
end.
In the prototype implementation, the compiler could compile the
following example:
convoluted(Ref,
#{ node(Ref) := NodeId, Loop := universal_answer},
[{NodeId, Size} | T],
<<Int:(Size*8+length(T)),Loop>>) when is_reference(Ref) ->
Int.
Things started to fall apart when variables are repeated. Repeated
variables in patterns already have a meaning in Erlang (they should be
the same), so it become tricky to understand to distinguish between
variables being bound or variables being used a binary size or map
key. Here is an example that the prototype couldn't handle:
foo(#{K := K}, K) -> ok.
A human can see that it should be transformed similar to this:
foo(Map, K) -> case Map of
{K := V} when K =:= V -> ok end.
Here are few other examples that should work but the prototype would
refuse to compile (often emitting an incomprehensible error message):
bin2(<<Sz:8,X:Sz>>, <<Y:Sz>>) -> {X,Y}.
repeated_vars(#{K := #{K := K}}, K) -> K.
match_map_bs(#{K1 := {bin,<<Int:Sz>>}, K2 := <<Sz:8>>}, {K1,K2}) ->
Int.
Another problem was when example was correctly rejected, the error
message would be confusing.
Because much more work would clearly be needed, we have shelved the
idea for now. Personally, I am not sure that the idea is sound in the
first place. But I am sure of one thing: the implementation would be
very complicated.
UPD: latest news from 2020-05-14

Feed an intermediate digest to a sha-1 function?

I need to compute a hash of a text that is in two pieces, held by two different parties, who should not have access to each others' texts.
Since SHA-1 is incremental I have heard that this should be possible but cannot find any answers with Google of any libraries that implement this.
I'd like the first party to SHA-1 hash its part of the text, and then feed the hash (digest) to the 2nd party, and they will continue with hashing, and compute the total digest of the combined texts. If this is possible, does anyone know of any library that takes a text and a previous hash as input arguments? Preferably in javascript or python but I'm actually happy to accept any language.
Update: I am looking at the source code of the forge (javascript) implementation of SHA-1, and see this code in the update method:
// initialize hash value for this chunk
a = s.h0;
b = s.h1;
c = s.h2;
d = s.h3;
e = s.h4;
And at the end of that method:
s.h0 = (s.h0 + a) | 0;
s.h1 = (s.h1 + b) | 0;
s.h2 = (s.h2 + c) | 0;
s.h3 = (s.h3 + d) | 0;
s.h4 = (s.h4 + e) | 0;
So it seems that by carrying these values + any remaining data in the input buffer over, one should be able to re-create the state of the SHA-1 object at party number 2. So a bit of the buffer will leak over but I think that should be fine. Will investigate further to see if this works.

how to build complete certificate name, include serial number using openssl's index.txt

Problem Description:
I need to build a regular expression / pattern to find a value that can either be decimal or hex
Background Information:
I am trying to build a lua function that will lookup a cert in index.txt and return the serial number. Ultimately, I need to be able to take the full cert name and run the following command:
openssl x509 -noout -in
/etc/ssl/cert/myusername.6A756C65654063616E2E77746274732E6E6574.8F.crt
-dates
I have the logic to build the file name, all the way up to the serial number... which in the above example, is 8F.
Here's what the index.txt file looks like:
R 140320154649Z 150325040807Z 8E unknown /CN=test#gmail.com/emailAddress=test#gmail.com
V 160324050821Z 8F unknown /CN=test#yahoo.com/emailAddress=test#yahoo.com
V 160324051723Z 90 unknown /CN=test2#yahoo.com/emailAddress=test2#yahoo.com
The serial number is field 4 in the first record, and field 3 in the rest of the records.
According to the documentation https://www.openssl.org/docs/apps/x509.html, serial number can either be hex or decimal.
I'm not quite sure yet how / who determines whether it's hex or decimal (i'm modifying someone else's code that uses openssl)... but I'm wondering if there's a way to check for both. I'll only be checking the value for records that are not Revoked ...aka. ones that do not have "R" in the first column.
Thanks.
Lua unfortunately does not support grouping of patterns, so that you could make the pattern for the second timestamp optional. What you could do is check for the two-timestamp pattern first, and if no match was found (which means that match returns nil), repeat for the one-timestamp pattern:
sn = string.match(line, "^%a%s+%d+Z%s+%d+Z%s+(%x+)")
if not sn then
sn = string.match(line, "^%a%s+%d+Z%s+(%x+)")
end
Note that you could do this all in one line if you're eager:
sn = string.match(line, "^%a%s+%d+Z%s+%d+Z%s+(%x+)") or string.match(line, "^%a%s+%d+Z%s+(%x+)")
Each set of parentheses captures what is matched inside and adds a return value. For more information on patterns in Lua, see the reference manual.
local cert = {
'R 140320154649Z 150325040807Z 8E unknown /CN=test#gmail.com/emailAddress=test#gmail.com',
'V 160324050821Z 8F unknown /CN=test#yahoo.com/emailAddress=test#yahoo.com',
'V 160324051723Z 90 unknown /CN=test2#yahoo.com/emailAddress=test2#yahoo.com'
}
-- for Lua 5.1
for _, crt in ipairs(cert) do
local n3, n4 = crt:match'^%S+%s+%S+%s+(%S+)%s+(%S+)'
local serial = n3:match'^%x+$' or n4:match'^%x+$'
print(serial)
end
-- for Lua 5.2
for _, crt in ipairs(cert) do
local serial = crt:match'^%S+%s+%S+.-%f[%S](%x+)%f[%s]'
print(serial)
end

Can I get a list of all currently-registered atoms?

My project has blown through the max 1M atoms, we've cranked up the limit, but I need to apply some sanity to the code that people are submitting with regard to list_to_atom and its friends. I'd like to start by getting a list of all the registered atoms so I can see where the largest offenders are. Is there any way to do this. I'll have to be creative about how I do it so I don't end up trying to dump 1-2M lines in a live console.
You can get hold of all atoms by using an undocumented feature of the external term format.
TL;DR: Paste the following line into the Erlang shell of your running node. Read on for explanation and a non-terse version of the code.
(fun F(N)->try binary_to_term(<<131,75,N:24>>) of A->[A]++F(N+1) catch error:badarg->[]end end)(0).
Elixir version by Ivar Vong:
for i <- 0..:erlang.system_info(:atom_count)-1, do: :erlang.binary_to_term(<<131,75,i::24>>)
An Erlang term encoded in the external term format starts with the byte 131, then a byte identifying the type, and then the actual data. I found that EEP-43 mentions all the possible types, including ATOM_INTERNAL_REF3 with type byte 75, which isn't mentioned in the official documentation of the external term format.
For ATOM_INTERNAL_REF3, the data is an index into the atom table, encoded as a 24-bit integer. We can easily create such a binary: <<131,75,N:24>>
For example, in my Erlang VM, false seems to be the zeroth atom in the atom table:
> binary_to_term(<<131,75,0:24>>).
false
There's no simple way to find the number of atoms currently in the atom table*, but we can keep increasing the number until we get a badarg error.
So this little module gives you a list of all atoms:
-module(all_atoms).
-export([all_atoms/0]).
atom_by_number(N) ->
binary_to_term(<<131,75,N:24>>).
all_atoms() ->
atoms_starting_at(0).
atoms_starting_at(N) ->
try atom_by_number(N) of
Atom ->
[Atom] ++ atoms_starting_at(N + 1)
catch
error:badarg ->
[]
end.
The output looks like:
> all_atoms:all_atoms().
[false,true,'_',nonode#nohost,'$end_of_table','','fun',
infinity,timeout,normal,call,return,throw,error,exit,
undefined,nocatch,undefined_function,undefined_lambda,
'DOWN','UP','EXIT',aborted,abs_path,absoluteURI,ac,accessor,
active,all|...]
> length(v(-1)).
9821
* In Erlang/OTP 20.0, you can call erlang:system_info(atom_count):
> length(all_atoms:all_atoms()) == erlang:system_info(atom_count).
true
I'm not sure if there's a way to do it on a live system, but if you can run it in a test environment you should be able to get a list via crash dump. The atom table is near the end of the crash dump format. You can create a crash dump via erlang:halt/1, but that will bring down the whole runtime system.
I dare say that if you use more than 1M atoms, then you are doing something wrong. Atoms are intended to be static as soon as the application runs or at least upper bounded by some small number, 3000 or so for a medium sized application.
Be very careful when an enemy can generate atoms in your vm. especially calls like list_to_atom/1 is somewhat dangerous.
EDITED (wrong answer..)
You can adjust number of atoms with +t
http://www.erlang.org/doc/efficiency_guide/advanced.html
..but I know very few use cases when it is necessary.
You can track atom stats with erlang:memory()

Erlang: erl shell hangs after building a large data structure

As suggested in answers to a previous question, I tried using Erlang proplists to implement a prefix trie.
The code seems to work decently well... But, for some reason, it doesn't play well with the interactive shell. When I try to run it, the shell hangs:
> Trie = trie:from_dict(). % Creates a trie from a dictionary
% ... the trie is printed ...
% Then nothing happens
I see the new trie printed to the screen (ie, the call to trie:from_dict() has returned), then the shell just hangs. No new > prompt comes up and ^g doesn't do anything (but ^c will eventually kill it off).
With a very small dictionary (the first 50 lines of /usr/share/dict/words), the hang only lasts a second or two (and the trie is built almost instantly)... But it seems to grow exponentially with the size of the dictionary (100 words takes 5 or 10 seconds, I haven't had the patience to try larger wordlists). Also, as the shell is hanging, I notice that the beam.smp process starts eating up a lot of memory (somewhere between 1 and 2 gigs).
So, is there anything obvious that could be causing this shell hang and incredible memory usage?
Some various comments:
I have a hunch that the garbage collector is at fault, but I don't know how to profile or create an experiment to test that.
I've tried profiling with eprof and nothing obvious showed up.
Here is my "add string to trie" function:
add([], Trie) ->
[ stop | Trie ];
add([Ch|Rest], Trie) ->
SubTrie = proplists:get_value(Ch, Trie, []),
NewSubTrie = add(Rest, SubTrie),
NewTrie = [ { Ch, NewSubTrie } | Trie ],
% Arbitrarily decide to compress key/value list once it gets
% more than 60 pairs.
if length(NewTrie) > 60 ->
proplists:compact(NewTrie);
true ->
NewTrie
end.
The problem is (amongst others ? -- see my comment) that you are always adding a new {Ch, NewSubTrie} tuple to your proplist Tries, no matter if Ch already existed, or not.
Instead of
NewTrie = [ { Ch, NewSubTrie } | Trie ]
you need something like:
NewTrie = lists:keystore(Ch, 1, Trie, {Ch, NewSubTrie})
You're not really building a trie here. Your end result is effectively a randomly ordered proplist of proplists that requires full scans at each level when walking the list. Tries are typically implied ordering based on position in the array (or list).
Here's an implementation that uses tuples as the storage mechanism. Calling set only rebuilds the root and direct path tuples.
(note: would probably have to make the pair a triple (adding size) make delete work with any efficiency)
I believe erlang tuples are really just arrays (thought I read that somewhere), so lookup should be super fast, and modify is probably straight forward. Maybe this is faster with the array module, but I haven't really played with it much to know.
this version also stores an arbitrary value, so you can do things like:
1> c(trie).
{ok,trie}
2> trie:get("ab",trie:set("aa",bar,trie:new("ab",foo))).
foo
3> trie:get("abc",trie:set("aa",bar,trie:new("ab",foo))).
undefined
4>
code (entire module): note2: assumes lower case non empty string keys
-module(trie).
-compile(export_all).
-define(NEW,{ %% 26 pairs, to avoid cost of calculating a new level at runtime
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},{undefined,nodepth},
{undefined,nodepth},{undefined,nodepth}
}
).
-define(POS(Ch), Ch - $a + 1).
new(Key,V) -> set(Key,V,?NEW).
set([H],V,Trie) ->
Pos = ?POS(H),
{_,SubTrie} = element(Pos,Trie),
setelement(Pos,Trie,{V,SubTrie});
set([H|T],V,Trie) ->
Pos = ?POS(H),
{SubKey,SubTrie} = element(Pos,Trie),
case SubTrie of
nodepth -> setelement(Pos,Trie,{SubKey,set(T,V,?NEW)});
SubTrie -> setelement(Pos,Trie,{SubKey,set(T,V,SubTrie)})
end.
get([H],Trie) ->
{Val,_} = element(?POS(H),Trie),
Val;
get([H|T],Trie) ->
case element(?POS(H),Trie) of
{_,nodepth} -> undefined;
{_,SubTrie} -> get(T,SubTrie)
end.

Resources