Antlr3 - HIDDEN token in the parser - parsing

Can a token that the lexer puts on a hidden channel be used in a single parser rule as if it were a normal token?
The generated code is Java.
thanks

When you construct a CommonTokenStream, you tell it what channel to use. Tokens on other channels will not be visible to the parser.

Yes, you can use a hidden token in the parser.
We do this all the time. The only problem is that you need to know when to look for it.
ANTLR uses two relevant terms here.
A hidden token just travels on a separate stream. You can always check for hidden tokens by calling getHiddenAfter or getHiddenBefore on a currently matched token.
Note: there may be more than one hidden token before or after a matched token, so you should iterate through them.
A discarded token is actually removed when you tell the lexer to discard it. You will never see it again.
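To make the channel idea concrete, here is a small self-contained model in plain Java. It is not ANTLR's API (the exact accessors differ between ANTLR versions); the class, method names and channel numbers are hypothetical. The assumptions: every token stays in one buffer with a channel number, the parser only sees the channel the stream was created for (an ANTLR 3 lexer rule would typically move whitespace or comments off it with {$channel=HIDDEN;}), and "hidden before/after" means walking the buffer from a matched token's index until the next on-channel token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HiddenChannelModel {
    // Channel numbers used only in this model.
    static final int DEFAULT = 0;
    static final int HIDDEN = 1;

    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
        @Override public String toString() { return text; }
    }

    // The full buffer keeps every token; a token's index is its position in this list.
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("x", DEFAULT), new Tok(" ", HIDDEN), new Tok("/*c*/", HIDDEN),
            new Tok("=", DEFAULT), new Tok("1", DEFAULT), new Tok("\n", HIDDEN));
    }

    // What the parser sees: only tokens on the requested channel.
    static List<Tok> visible(List<Tok> all, int channel) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) out.add(t);
        }
        return out;
    }

    // All off-channel tokens directly before index i (there may be several, so loop).
    static List<Tok> hiddenBefore(List<Tok> all, int i) {
        List<Tok> out = new ArrayList<>();
        for (int j = i - 1; j >= 0 && all.get(j).channel != DEFAULT; j--) out.add(0, all.get(j));
        return out;
    }

    // All off-channel tokens directly after index i.
    static List<Tok> hiddenAfter(List<Tok> all, int i) {
        List<Tok> out = new ArrayList<>();
        for (int j = i + 1; j < all.size() && all.get(j).channel != DEFAULT; j++) out.add(all.get(j));
        return out;
    }

    public static void main(String[] args) {
        List<Tok> all = sample();
        System.out.println(visible(all, DEFAULT));  // [x, =, 1]
        System.out.println(hiddenBefore(all, 3));   // [ , /*c*/]
    }
}
```

The parser never sees the whitespace or comment, but a rule that knows the index of its matched token can still reach them.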

Related

What is an opaque token?

And what does it mean that they are in a "proprietary format"? I am reading about JWT refresh tokens and they are opaque tokens, but I don't understand the term.
A JWT has readable content, as you can see for example on https://jwt.io/.
Everyone can decode the token and read the information in it. The format is documented in RFC 7519.
An opaque token on the other hand has a format that is not intended to be read by you. Only the issuer knows the format.
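A minimal sketch of the "readable content" point, assuming a JWS in compact form (header.payload.signature): the payload is just base64url-encoded JSON, so anyone can decode it without a key. The class name JwtPeek is hypothetical, and the code deliberately does not verify the signature, so it must never be used to decide whether to trust a token.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtPeek {
    // Decodes the payload (second segment) of a compact JWS. Does NOT verify the signature.
    static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString("{\"sub\":\"1234567890\",\"name\":\"John Doe\"}".getBytes(StandardCharsets.UTF_8));
        // The signature segment is a placeholder; no key is needed to read the payload.
        String jwt = header + "." + payload + ".placeholder-signature";
        System.out.println(payloadJson(jwt)); // {"sub":"1234567890","name":"John Doe"}
    }
}
```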
The meaning of the word already gives a hint:
opaque
/ə(ʊ)ˈpeɪk/
adjective
not able to be seen through; not transparent.
Here's a quote from https://auth0.com/docs/tokens:
Opaque tokens: Tokens in a proprietary format that typically contain some identifier to information in a server’s persistent storage. To validate an opaque token, the recipient of the token needs to call the server that issued the token.
An "opaque JWT refresh token" is a contradiction, as per the definition above. What is actually meant here is that in some JWT frameworks only the authentication token is a JWT, while the refresh token is an opaque token.
Here, the term "opaque" means the string (that serves as the token) is like a reference (in OOP), a pointer (in C), or a foreign key (in relational DBs),
i.e. you need external content to resolve it.
Simple versus Composite:
The string is a "simple" string, as opposed to a JWS, which is "composite": it has parts "inside" it.
Inside versus Outside:
From a JWS you can extract a payload (with claims, etc.) without referring to an external server or storage "outside" the string; with an opaque token you cannot.
Since an opaque token is a simple string, it is just a reference; naturally, its format is entirely arbitrary and determined by the server that issues it (hence the term "proprietary format"). The token string is determined when the underlying (referred-to) content is created, i.e. when it is paired (associated) with the content that this token (as the reference or foreign key) refers to.
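By contrast, an opaque token can be sketched as a random reference into server-side storage. This assumes an in-memory map standing in for the issuer's persistent storage, and the class OpaqueTokenStore is hypothetical; the point is that the token string itself carries no information and only the issuer can resolve it.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OpaqueTokenStore {
    private final SecureRandom random = new SecureRandom();
    // Stand-in for the issuer's persistent storage: token -> the data it refers to.
    private final Map<String, String> storage = new HashMap<>();

    // The token is created together with the stored entry; its format is the issuer's choice.
    String issue(String subject) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        storage.put(token, subject);
        return token;
    }

    // Only the issuer can resolve the reference; the string itself reveals nothing.
    Optional<String> resolve(String token) {
        return Optional.ofNullable(storage.get(token));
    }

    public static void main(String[] args) {
        OpaqueTokenStore issuer = new OpaqueTokenStore();
        String token = issuer.issue("user-42");
        System.out.println(token);                 // 43 random-looking characters
        System.out.println(issuer.resolve(token)); // Optional[user-42]
    }
}
```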

Node JWT library njwt verifies a token even though it differs from original

Trying out Stormpath's njwt package for handling JWTs as per this answer by #robertjd.
While trying to see what the various error messages are when verify()ing a token, I changed a single character (the last one) of a token, expecting the verification to fail, but to my surprise it passed and showed the contents of the token correctly.
More precisely, I changed the last character from an A to a B. This does not seem to be the general case, since making other single-character changes leads to the expected JwsParseError with the message Signature verification failed. I tried this with both the default HS256 and with HS512.
Is that behavior legitimate for JWTs, i.e. is the last char redundant so that it doesn't affect the verification checksum? Or is it an issue in the njwt library?
Sub-question to njwt's maintainers: when getting back the token after verification, the header's algo property always has a value of none. I see in your source code that you explicitly set it so. Why is that?
Update: regarding the sub-question for the "algo": "none" in njwt's callback of verify(), it seems that "none" signifies that the digital signature is not included, which is the case when we get the token in the callback. Correct me if I'm wrong.
This is due to the base64 (technically, base64url) encoding, which is defined in RFC 4648. The low-order bits of the final (non-padding) character in the encoded data might not be used, so changing from A to B might not have a material effect on the decoded value.
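The effect can be reproduced with the JDK alone. An HS256 signature is 32 bytes (256 bits), which base64url encodes into 43 characters carrying 258 bits, so the low 2 bits of the last character are unused. The sketch below (hypothetical class name; the key and signed data are placeholders) flips the lowest bit of the last character, exactly what A to B does, and shows that the decoded bytes are unchanged, so a verifier that compares decoded signature bytes would accept it. It relies on java.util.Base64's decoder not rejecting non-zero unused bits.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Base64TailBits {
    // The base64url alphabet from RFC 4648, section 5 ('A' has value 0, 'B' has value 1, ...).
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    static byte[] hmacSha256(byte[] key, byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    // Replaces the last character with the one whose 6-bit value differs only in the lowest bit.
    static String flipLowBitOfLastChar(String s) {
        int value = ALPHABET.indexOf(s.charAt(s.length() - 1));
        return s.substring(0, s.length() - 1) + ALPHABET.charAt(value ^ 1);
    }

    public static void main(String[] args) throws Exception {
        byte[] sig = hmacSha256("secret".getBytes(StandardCharsets.UTF_8),
                                "header.payload".getBytes(StandardCharsets.UTF_8));
        String encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        String tampered = flipLowBitOfLastChar(encoded);
        byte[] decoded = Base64.getUrlDecoder().decode(tampered);
        System.out.println(encoded.length());           // 43
        System.out.println(encoded.equals(tampered));    // false
        System.out.println(Arrays.equals(sig, decoded)); // true
    }
}
```

Flipping any of the higher bits of that character, or changing an earlier character, does change the decoded bytes, which matches the "try changing any character but the last" advice.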
Try changing any character but the last :)

Hacked. How can a hacker get to the individual-url (token) of my users?

How do you think a hacker is doing the following, and how would you prevent it (I'm looking for some helpful links, keywords or an assessment of the situation)?
There is a website where users can register and get an invitation email. The invitation link (https) contains the token. It looks like 'https://www.example.com/token/123456' (123456 is the token).
It seems that a day after my users clicked on this link, someone else uses the same links too.
How is this possible and how can I prevent this sort of hack?
Thanks
EDIT:
Sorry, I should have given more information. I can rule out that this is just someone trying random token variations. Why? The exact token is used a day after one of the users had used the link. The token is a hash token of more than 20 characters.
They can just run a script to try any numerical value as the token.
It's easy. How long is your token? I would also suggest using a hash token rather than a simple numerical one to limit automated guessing: the "hack" is a script that tries a number, stores the result if it gets one, and then repeats with number = number + 1.
Edit: What evidence do you have that you've been hacked? What happens in your script once someone has clicked the token link?
A simple approach could be:
define a string pattern, like: secretconstant%email
hash the string, and now you have the token (save it)
create your invitation URL with the token
If someone calls your service with a random token, you can reject it because your information system hasn't saved that token.
Once a token has been used, you must discard it so the link is no longer valid.
You could also check whether the email used in the registration is the same one used to calculate the token, and block the registration if it isn't.
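The steps above can be sketched as follows. This follows the answer's recipe literally (SHA-256 over "secretconstant%email"); the class name, the in-memory map standing in for the database, and the secret value are hypothetical. Because the hash is deterministic, re-inviting the same address produces the same token; mixing in a random value would avoid that.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class InvitationTokens {
    private final String secretConstant; // hypothetical server-side secret, never sent to users
    private final Map<String, String> saved = new HashMap<>(); // token -> email (stand-in for a DB table)

    InvitationTokens(String secretConstant) {
        this.secretConstant = secretConstant;
    }

    // Build "secretconstant%email", hash it and save the token; the caller puts it in the URL.
    String issue(String email) {
        String token = sha256Hex(secretConstant + "%" + email);
        saved.put(token, email);
        return token;
    }

    // Unknown tokens are rejected, the email must match, and a used token is discarded.
    boolean redeem(String token, String email) {
        String expectedEmail = saved.get(token);
        if (expectedEmail == null || !expectedEmail.equals(email)) {
            return false;
        }
        saved.remove(token);
        return true;
    }

    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support SHA-256
        }
    }

    public static void main(String[] args) {
        InvitationTokens invites = new InvitationTokens("change-me");
        String token = invites.issue("alice@example.com");
        System.out.println("https://www.example.com/token/" + token);
        System.out.println(invites.redeem(token, "alice@example.com")); // true
        System.out.println(invites.redeem(token, "alice@example.com")); // false: already used
    }
}
```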

Java CUP and JFlex Interaction

I am considering using the CUP parser generator for a project. To correctly parse some constructs of the language I am going to compile, I need the lexer (generated by JFlex) to use information from the parser's symbol table (not the parse table; I mean the table in which I will store information about identifiers) to produce the correct token type when its next_token() method is invoked. Since the information in the symbol table depends on the program text parsed so far, this will only work if next_token() is invoked "in lockstep" with the parser. In other words, it will work if the parser calls the lexer whenever it needs another token, but not if (for example) a parallel thread invokes the lexer and buffers tokens in a queue.
The question is thus: How does CUP call the lexer? Does it call it whenever it needs the next token? I could of course just write a CUP grammar specification and inspect the generated parser's source file to see what's going on, but that may be more work than necessary. I couldn't find any information on this on relevant websites.
Thanks a lot for any help you can offer!
I finished implementing my parser and scanner a while ago. Here's what I found:
CUP does indeed invoke the scanner as and when needed. It always holds one token beyond what has been recognized so far (the lookahead token). There is no fancy buffering of tokens ahead of time.
That being said, it can be tricky to set lexer states during parsing, as this can give rise to many grammar conflicts. I guess this has to do with the way CUP represents semantic actions embedded within productions. This forced me to abandon my initial design anyway, but not for the reason I was dreading.
Hope this helps someone!
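The lockstep behaviour described above can be modelled in a few lines of plain Java. This is not CUP's generated code: the hypothetical ToyScanner stands in for a JFlex scanner (its next_token() name mirrors CUP's Scanner interface) and parse() for the parser's driver loop, which holds exactly one lookahead token and asks for the next one only after consuming the current one. The log shows scanning and consuming interleaved, with no read-ahead queue. In a real LR parser a reduction may run while the lookahead has already been scanned, so a symbol-table update made in that action can come one token too late for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LockstepDemo {
    // Stand-in for a generated scanner; records every call in a shared log.
    static final class ToyScanner {
        private final Iterator<String> it;
        private final List<String> log;

        ToyScanner(List<String> tokens, List<String> log) {
            this.it = tokens.iterator();
            this.log = log;
        }

        String next_token() {
            String t = it.hasNext() ? it.next() : "EOF";
            log.add("scan " + t);
            return t;
        }
    }

    // Stand-in for the parser loop: one lookahead token, fetched only after the previous one is consumed.
    static void parse(ToyScanner scanner, List<String> log) {
        String lookahead = scanner.next_token();
        while (!lookahead.equals("EOF")) {
            log.add("shift " + lookahead);
            lookahead = scanner.next_token();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        parse(new ToyScanner(Arrays.asList("int", "x", ";"), log), log);
        log.forEach(System.out::println);
    }
}
```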
Maybe this reply comes too late for you, but it could be useful for other users. The first thing to know is that a parser can't do anything without a scanner. As a matter of fact, the first parameter of the parser's constructor is the scanner.
After compiling the .cup file, you will have, as output, a .java file containing the generated parser class. Let's suppose its name is TmpParser.
So in the main class of your project you have to add the following lines:
// assuming s is the path of the input file and Scanner is your JFlex-generated scanner class
TmpParser p = new TmpParser(new Scanner(new FileReader(s)));
p.parse();
You should put this code in a try-catch block, since parse() is declared to throw Exception. With the parse method, the parser starts its work and calls the next_token method of the scanner whenever it needs a token, checking whether the input matches the grammar rules you wrote.
I don't know how late I am in answering this question, but I'm building a parser as part of my coursework.
I'm using Lex and CUP for the lexer and parser, respectively. Below is my main class, which calls the parser; the parser in turn calls the scanner as and when it needs the next token.
So my driver class will be:
// construct the lexer
Yylex lexer = new Yylex(new FileReader(filename));
// create the parser
Parser parser = new Parser(lexer);
// and parse
parser.parse();
Internally, the parser calls:
Parser.parse() {
    ...
    this.cur_token = this.scan();
    ...
}
public Symbol scan() throws Exception {
    Symbol sym = this.getScanner().next_token();
    return sym != null ? sym : this.getSymbolFactory().newSymbol("END_OF_FILE", this.EOF_sym());
}

What are the characteristics of an OAuth token?

How many characters long can an oauth access token and oauth access secret be and what are the allowed characters? I need to store them in a database.
I am not sure there are any explicit limits. The spec doesn't define any.
That said, OAuth tokens are often passed as URL parameters and so share some of the same limitations, i.e. they need to be properly encoded.
OAuth doesn't specify the format or content of a token. We simply use encrypted name-value pairs as the token. You can use any characters in a token, but it's much easier to handle if the token is URL-safe. We achieve this by encoding the ciphertext with URL-safe Base64.
As most people have already pointed out, the OAuth specification doesn't give you exact directions, but it does say the following
(cited from: https://datatracker.ietf.org/doc/html/draft-hammer-oauth-10#section-4.9):
"Servers should be careful to assign shared-secrets which are long enough, and random enough, to resist such attacks for at least the length of time that the shared-secrets are valid."
"Of course, servers are urged to err on the side of caution, and use the longest secrets reasonable."
On the other hand, you should consider the maximum URL length of browsers:
see: http://www.boutell.com/newfaq/misc/urllength.html
If you read the spec, it says:
The authorization server issues the registered client a client identifier - a unique string representing the registration information provided by the client. The client identifier is not a secret; it is exposed to the resource owner, and MUST NOT be used alone for client authentication. The client identifier is unique to the authorization server.
The client identifier string size is left undefined by this specification. The client should avoid making assumptions about the identifier size. The authorization server SHOULD document the size of any identifier it issues.
Second, the access token should be sent in a header, not as a URL parameter:
Authorization: Bearer <token>
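For the header form, a minimal sketch with the JDK's java.net.http client (Java 11+). The class name, URL and token value are placeholders, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BearerHeader {
    // Builds (does not send) a GET request that carries the token in the Authorization header
    // instead of a URL parameter.
    static HttpRequest withBearer(String url, String accessToken) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = withBearer("https://api.example.com/resource", "abc123");
        System.out.println(request.headers().firstValue("Authorization").orElse("")); // Bearer abc123
    }
}
```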
An OAuth token is conceptually an arbitrary-sized sequence of bytes, not characters. In URLs, it gets encoded using standard URL escaping mechanisms:
unreserved = ALPHA, DIGIT, '-', '.', '_', '~'
Everything not unreserved gets %-encoded.
I'm not sure whether you are only asking about the oauth_token parameter that gets passed around. Usually, additional parameters need to be stored and transmitted as well, such as oauth_token_secret, oauth_signature, etc. Some of them have different data types; for example, oauth_timestamp is an integer representing seconds since 1970 (encoded in decimal ASCII digits).
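The unreserved set above translates directly into an encoder. A minimal sketch (hypothetical class name) that leaves only ALPHA, DIGIT, '-', '.', '_' and '~' as-is and percent-encodes every other UTF-8 byte with uppercase hex. java.net.URLEncoder is not a drop-in replacement here, since it encodes spaces as '+', leaves '*' unencoded and encodes '~'.

```java
import java.nio.charset.StandardCharsets;

public class OAuthPercent {
    // Percent-encodes a value: unreserved characters stay, every other UTF-8 byte becomes %XX.
    static String encode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b~c/+"));  // a%20b~c%2F%2B
        System.out.println(encode("\u00e9"));   // %C3%A9
    }
}
```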
Valid characters for an OAuth token are limited by HTTP header value restrictions, as the OAuth token is frequently sent in the HTTP "Authorization" header.
Valid characters for HTTP headers are specified by https://www.rfc-editor.org/rfc/rfc7230#section-3.2.6. Alternatively, you may check the HTTP header validation code of some popular HTTP client libraries; for example, see the Headers.checkNameAndValue() utility of the OkHttp framework: https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/Headers.java
And this is not all. I wouldn't include HTTP header separators (; and many others), whitespace (' ' and '\t') or the double quote (") (see https://www.rfc-editor.org/rfc/rfc7230#section-3.2.6), as that would require escaping the OAuth token before using it in an HTTP header. Tokens are frequently used by humans in curl test requests, so good token generators don't add such characters. But you should check which characters the OAuth token generator your service works with may produce before making any assumptions.
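One way to apply the advice above is to check that a token consists only of RFC 7230 tchar characters, so it can go into the Authorization header without quoting or escaping. A sketch with hypothetical names; it is stricter than the Bearer token syntax of RFC 6750 (b64token), which additionally allows '/' and trailing '=', so treat it as a conservative filter rather than a validity test.

```java
public class HeaderTokenCheck {
    // tchar symbols from RFC 7230, section 3.2.6 (ALPHA and DIGIT are checked separately).
    private static final String TCHAR_SYMBOLS = "!#$%&'*+-.^_`|~";

    // True if s is non-empty and every char is a tchar: no separators, whitespace or double quotes.
    static boolean isHeaderSafeToken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (!alnum && TCHAR_SYMBOLS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isHeaderSafeToken("abc-123_~."));  // true
        System.out.println(isHeaderSafeToken("has space"));   // false
        System.out.println(isHeaderSafeToken("a;b"));         // false
    }
}
```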
To be specific: even though the OAuth spec doesn't say anything, if you are using Java and MySQL it can be 16 bytes, as we generally generate the tokens using UUID and store them as BINARY(16) in the database. I know these details as I have recently done development using OAuth.
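Storing a UUID-based token in a BINARY(16) column means converting it to its 16 raw bytes rather than its 36-character text form. A minimal sketch of that conversion with java.nio.ByteBuffer; the class name is hypothetical and the JDBC/MySQL side is not shown. Note that a random (version 4) UUID contains 122 random bits.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // 16 raw bytes: the most significant 64 bits, then the least significant 64 bits.
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] b) {
        ByteBuffer bb = ByteBuffer.wrap(b);
        return new UUID(bb.getLong(), bb.getLong()); // arguments are evaluated left to right
    }

    public static void main(String[] args) {
        UUID token = UUID.randomUUID();
        byte[] raw = toBytes(token);                      // what goes into BINARY(16)
        System.out.println(raw.length);                   // 16
        System.out.println(token.toString().length());    // 36
        System.out.println(fromBytes(raw).equals(token)); // true
    }
}
```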
