What is the double brace syntax in ASN.1? - parsing

I'm reading the PKCS #7 ASN.1 definition and came across this type. I can't seem to find out what {{Authenticated}} is doing in this code, or what this production would be called. I've also seen {{...}} in the PKCS #8 standard.
-- ATTRIBUTE information object class specification
ATTRIBUTE ::= CLASS {
&derivation ATTRIBUTE OPTIONAL,
&Type OPTIONAL, -- either &Type or &derivation required
&equality-match MATCHING-RULE OPTIONAL,
&ordering-match MATCHING-RULE OPTIONAL,
&substrings-match MATCHING-RULE OPTIONAL,
&single-valued BOOLEAN DEFAULT FALSE,
&collective BOOLEAN DEFAULT FALSE,
&dummy BOOLEAN DEFAULT FALSE,
-- operational extensions
&no-user-modification BOOLEAN DEFAULT FALSE,
&usage AttributeUsage DEFAULT userApplications,
&id OBJECT IDENTIFIER UNIQUE
}
WITH SYNTAX {
[SUBTYPE OF &derivation]
[WITH SYNTAX &Type]
[EQUALITY MATCHING RULE &equality-match]
[ORDERING MATCHING RULE &ordering-match]
[SUBSTRINGS MATCHING RULE &substrings-match]
[SINGLE VALUE &single-valued]
[COLLECTIVE &collective]
[DUMMY &dummy]
[NO USER MODIFICATION &no-user-modification]
[USAGE &usage]
ID &id
}
Authenticated ATTRIBUTE ::= {
contentType |
messageDigest |
-- begin added for VCE SCEP-support
transactionID |
messageType |
pkiStatus |
failInfo |
senderNonce |
recipientNonce,
-- end added for VCE SCEP-support
..., -- add application-specific attributes here
signingTime
}
SignerInfoAuthenticatedAttributes ::= CHOICE {
aaSet [0] IMPLICIT SET OF AttributePKCS-7 {{Authenticated}},
aaSequence [2] EXPLICIT SEQUENCE OF AttributePKCS-7 {{Authenticated}}
-- Explicit because easier to compute digest on sequence of attributes and then reuse
-- encoded sequence in aaSequence.
}
-- Also defined in X.501
-- Redeclared here as a parameterized type
AttributePKCS-7 { ATTRIBUTE:IOSet } ::= SEQUENCE {
type ATTRIBUTE.&id({IOSet}),
values SET SIZE (1..MAX) OF ATTRIBUTE.&Type({IOSet}{#type})
}
-- Inlined from PKCS5v2-0 since it is the only thing imported from that module
-- AlgorithmIdentifier { ALGORITHM-IDENTIFIER:InfoObjectSet } ::=
AlgorithmIdentifier { TYPE-IDENTIFIER:InfoObjectSet } ::=
SEQUENCE {
-- algorithm ALGORITHM-IDENTIFIER.&id({InfoObjectSet}),
algorithm TYPE-IDENTIFIER.&id({InfoObjectSet}),
-- parameters ALGORITHM-IDENTIFIER.&Type({InfoObjectSet}
parameters TYPE-IDENTIFIER.&Type({InfoObjectSet}
{#algorithm}) OPTIONAL }
-- Private-key information syntax
PrivateKeyInfo ::= SEQUENCE {
version Version,
-- privateKeyAlgorithm AlgorithmIdentifier {{PrivateKeyAlgorithms}},
privateKeyAlgorithm AlgorithmIdentifier {{...}},
privateKey PrivateKey,
attributes [0] Attributes OPTIONAL }

There is no ASN.1 construct called a double brace. Each single brace (even when nested) is a separate token. As the definition of AttributePKCS-7 shows, it is a parameterized definition that takes an Information Object Set as its parameter. The outer pair of braces indicates parameter substitution, while the inner pair indicates that Authenticated is an Information Object Set (which is being used as the parameter). The purpose of the information object set is to restrict the possible values of certain fields to those contained in the object set. You can see in the definition of AttributePKCS-7 which components are being restricted by the object set.
As for {{...}}, this is similar to the above, except that the object set is an empty extensible object set (indicated by the inner {...}) which is being used as a parameter (indicated by the outer pair of braces).
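The restricting role such an object set plays can be loosely imitated in ordinary code. A rough Python sketch (the two OIDs are the real contentType and messageDigest identifiers, but the value types are placeholders, not the actual ASN.1 types):

```python
# Loose analogy: an "object set" maps each permitted attribute OID to the
# value type it must carry, mirroring ATTRIBUTE.&id / ATTRIBUTE.&Type.
AUTHENTICATED = {
    "1.2.840.113549.1.9.3": str,    # contentType (placeholder value type)
    "1.2.840.113549.1.9.4": bytes,  # messageDigest (placeholder value type)
}

def check_attribute(oid, value, object_set):
    """Reject attributes whose OID is not in the set, or whose value has
    the wrong type for that OID -- the role {{Authenticated}} plays."""
    if oid not in object_set:
        raise ValueError(f"OID {oid} not permitted by the object set")
    if not isinstance(value, object_set[oid]):
        raise TypeError(f"wrong value type for {oid}")
    return True
```

With {{...}} the "dictionary" would start out empty but be allowed to grow, so nothing is rejected up front.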

Related

Make lexer consider parser before determining tokens?

I'm writing a lexer and parser in ocamllex and ocamlyacc as follows. function_name and table_name have the same regular expression, i.e., a string containing only English letters. The only way to determine whether a string is a function_name or a table_name is to check its surroundings. For example, if such a string is surrounded by [ and ], then we know that it is a table_name. Here is the current code:
In lexer.mll,
... ...
let function_name = ['a'-'z' 'A'-'Z']+
let table_name = ['a'-'z' 'A'-'Z']+
rule token = parse
| function_name as s { FUNCTIONNAME s }
| table_name as s { TABLENAME s }
... ...
In parser.mly:
... ...
main:
| LBRACKET TABLENAME RBRACKET { Table $2 }
... ...
As I wrote | function_name as s { FUNCTIONNAME s } before | table_name as s { TABLENAME s }, the above code failed to parse [haha]; it first treated haha as a function_name in the lexer, and then could not find any corresponding rule for it in the parser. If it could treat haha as a table_name in the lexer, it would match [haha] as a table in the parser.
One workaround for this is to be more precise in the lexer. For example, we could define let table_name_with_brackets = '[' ['a'-'z' 'A'-'Z']+ ']' and add | table_name_with_brackets as s { TABLENAMEWITHBRACKETS s } to the lexer. But I would like to know if there are any other options. Is it not possible to make the lexer and parser work together to determine the tokens and the reduction?
You should avoid trying to get the lexer to do the parser's work. The lexer should just identify lexemes; it should not try to figure out where a lexeme fits into the syntax. So in your (simplified) example, there should be only one lexical type, name. The parser will figure it out from there.
But it seems, from the comments, that in the unsimplified original, the two patterns are overlapping rather than identical. That's more annoying, although it's only slightly more complicated. Basically, you need to separate out the common pattern as one lexical type, and then add the additional matches as one or two other lexical types (depending on whether or not one pattern is a strict superset of the other).
That might not be too difficult, depending on the precise relationship between the two patterns. You might be able to find a very simple solution by writing the patterns in the correct order, for example, because of the longest match rule:
If several regular expressions match a prefix of the input, the “longest match” rule applies: the regular expression that matches the longest prefix of the input is selected. In case of tie, the regular expression that occurs earlier in the rule is selected.
Most of the time, that's all it takes: first define the intersection of the two patterns as a base lexeme, and then add the full lexical patterns of each contextual type to provide additional matches. Your parser will then have to match name | function_name in one context and name | table_name in the other context. But that's not too bad.
Where it will fail is when an input stream cannot be unambiguously divided into lexemes. For example, suppose that in a function context, a name could include a ? character, but in a table context the ? is a valid postfix operator. In that case, you have to actively prevent foo? from being analysed as a single token in the table context, which means that the lexer does have to be aware of parser context.
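The longest-match-then-first-rule behaviour quoted above can be demonstrated with a small hand-rolled tokenizer (a Python sketch, not ocamllex itself; the token names are invented for illustration):

```python
import re

# Patterns tried in order; the longest match wins, and on a tie in match
# length the earlier rule wins -- mirroring ocamllex's disambiguation.
RULES = [
    ("NAME_Q", re.compile(r"[a-zA-Z]+\?")),  # name including a trailing '?'
    ("NAME",   re.compile(r"[a-zA-Z]+")),
    ("LBRACK", re.compile(r"\[")),
    ("RBRACK", re.compile(r"\]")),
]

def tokenize(s):
    pos, out = 0, []
    while pos < len(s):
        best = None
        for kind, rx in RULES:
            m = rx.match(s, pos)
            # Strict '>' keeps the earlier rule on a length tie.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (kind, m.group())
        if best is None:
            raise SyntaxError(f"bad input at position {pos}")
        out.append(best)
        pos += len(best[1])
    return out
```

Here "foo?" lexes as one NAME_Q token because that rule matches a longer prefix than NAME, even though NAME alone would also match "foo".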

Is this double constraint syntax from PKCS#9 legal?

In the PKCS#9 standard they have the following assignment. The first line defines a PrintableString type that can only be two characters long and must be one of the two-letter country codes defined in ISO/IEC 3166. The constraint is written as two separate constraints that follow one another; however, looking at the ASN.1 standard, there can only be one "root" constraint. Is the syntax used in the PKCS#9 standard incorrect?
countryOfResidence ATTRIBUTE ::= {
WITH SYNTAX PrintableString (SIZE(2))(CONSTRAINED BY {
-- Must be a two-letter country acronym in accordance with
-- ISO/IEC 3166 --})
EQUALITY MATCHING RULE caseIgnoreMatch
ID pkcs-9-at-countryOfResidence
}
ATTRIBUTE ::= CLASS {
&derivation ATTRIBUTE OPTIONAL,
&Type OPTIONAL, -- either &Type or &derivation required
&equality-match MATCHING-RULE OPTIONAL,
&ordering-match MATCHING-RULE OPTIONAL,
&substrings-match MATCHING-RULE OPTIONAL,
&single-valued BOOLEAN DEFAULT FALSE,
&collective BOOLEAN DEFAULT FALSE,
&dummy BOOLEAN DEFAULT FALSE,
-- operational extensions
&no-user-modification BOOLEAN DEFAULT FALSE,
&usage AttributeUsage DEFAULT userApplications,
&id OBJECT IDENTIFIER UNIQUE
}
WITH SYNTAX {
[SUBTYPE OF &derivation]
[WITH SYNTAX &Type]
[EQUALITY MATCHING RULE &equality-match]
[ORDERING MATCHING RULE &ordering-match]
[SUBSTRINGS MATCHING RULE &substrings-match]
[SINGLE VALUE &single-valued]
[COLLECTIVE &collective]
[DUMMY &dummy]
[NO USER MODIFICATION &no-user-modification]
[USAGE &usage]
ID &id
}
ASN.1 Production (found in ISO/IEC 8824-1:2015 / Rec. ITU-T X.680 (08/2015), p. 87)
ConstrainedType ::=
Type Constraint
| TypeWithConstraint
Constraint ::= "(" ConstraintSpec ExceptionSpec ")"
ConstraintSpec ::=
SubtypeConstraint
| GeneralConstraint
Constraints can be serially applied. It is legitimate. The "Type" that you are adding a constraint to can itself be a "ConstrainedType".
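Serial application just means each constraint narrows an already-constrained type. A loose Python sketch of that composition (the validator names are invented for illustration, and str.isupper stands in for the real ISO/IEC 3166 membership check):

```python
def size(n):
    # SIZE(n) constraint: the value must be exactly n characters long.
    return lambda v: len(v) == n

def constrained_by(pred):
    # CONSTRAINED BY { ... }: an arbitrary user-defined restriction.
    return lambda v: pred(v)

def apply_serially(value, *constraints):
    # A ConstrainedType can itself be constrained, so each constraint
    # applies on top of the previous ones -- serial application.
    return all(c(value) for c in constraints)

# PrintableString (SIZE(2))(CONSTRAINED BY { ... }) becomes:
is_country = apply_serially(
    "SE",
    size(2),
    constrained_by(str.isupper),  # stand-in for the ISO/IEC 3166 check
)
```

The value must satisfy every constraint in the chain, exactly as if the two parenthesized constraints were intersected.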

Explain the syntax of reserved.get(t.value,'ID') in lex.py

Code taken from ply.lex documentation: http://www.dabeaz.com/ply/ply.html#ply_nn6
reserved = {
'if' : 'IF',
'then' : 'THEN',
'else' : 'ELSE',
'while' : 'WHILE',
...
}
tokens = ['LPAREN','RPAREN',...,'ID'] + list(reserved.values())
def t_ID(t):
r'[a-zA-Z_][a-zA-Z_0-9]*'
t.type = reserved.get(t.value,'ID') # Check for reserved words
return t
For the reserved words, we need to change the token type. Calling reserved.get() with t.value is understandable: it should return the entry from the second column of the reserved specification. But why are we also passing 'ID'? What does it mean, and what purpose does it serve?
The second parameter specifies the value to return should the key not exist in the dictionary. So in this case, if the value of t.value does not exist as a key in the reserved dictionary, the string 'ID' will be returned instead.
In other words, a.get(b, c) when a is a dict is roughly equivalent to a[b] if b in a else c (except it is presumably more efficient, as it would only look up the key once in the success case).
See the Python documentation for dict.get().
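A quick demonstration of the fallback behaviour:

```python
reserved = {'if': 'IF', 'then': 'THEN', 'else': 'ELSE', 'while': 'WHILE'}

# A reserved word maps to its own token type...
assert reserved.get('if', 'ID') == 'IF'

# ...while any other identifier falls back to the generic 'ID' type,
# because the key is not in the dictionary.
assert reserved.get('counter', 'ID') == 'ID'
```

Without the default, reserved.get('counter') would return None and the token would get no usable type, so 'ID' is the catch-all for ordinary identifiers.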

Let bindings support punctuation enclosed in double backticks, but types don't?

Using F# in Visual Studio 2012, this code compiles:
let ``foo.bar`` = 5
But this code does not:
type ``foo.bar`` = class end
Invalid namespace, module, type or union case name
According to section 3.4 of the F# language specification:
Any sequence of characters that is enclosed in double-backtick marks (````),
excluding newlines, tabs, and double-backtick pairs themselves, is treated
as an identifier.
token ident =
| ident-text
| `` [^ '\n' '\r' '\t']+ | [^ '\n' '\r' '\t'] ``
Section 5 defines type as:
type :=
( type )
type -> type -- function type
type * ... * type -- tuple type
typar -- variable type
long-ident -- named type, such as int
long-ident<types> -- named type, such as list<int>
long-ident< > -- named type, such as IEnumerable< >
type long-ident -- named type, such as int list
type[ , ... , ] -- array type
type lazy -- lazy type
type typar-defns -- type with constraints
typar :> type -- variable type with subtype constraint
#type -- anonymous type with subtype constraint
... and Section 4.2 defines long-ident as:
long-ident := ident '.' ... '.' ident
As far as I can tell from the spec, types are named with long-idents, and long-idents can be idents. Since idents support double-backtick-quoted punctuation, it therefore seems like types should too.
So am I misreading the spec? Or is this a compiler bug?
It definitely looks like the specification is not synchronized with the actual implementation, so there is a bug on one side or the other.
When you use an identifier in double backticks, the compiler treats it as a name and simply generates a type (or member) with the name you specified in backticks. It does not do any name mangling to make sure that the identifier is a valid type/member name.
This means that it is not too surprising that you cannot use identifiers that would clash with some standard meaning in the compiled code. In your example, it is dot, but here are a few other examples:
type ``Foo.Bar``() = // Dot is not allowed because it represents namespace
member x.Bar = 0
type ``Foo`1``() = // Single backtick is used to compile generic types
member x.Bar = 0
type ``Foo+Bar``() = // + is used in the name of a nested type
member x.Bar = 0
The above examples are not allowed as type names (because they clash with some standard meaning), but you can use them in let-bindings, because there are no such restrictions on variable names:
let ``foo`1`` = 0
let ``foo.bar`` = 2
let ``foo+bar`` = 1
This is definitely something that should be explained in the documentation & the specification, but I hope this helps to clarify what is going on.
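The same gap between "any string is an acceptable name" and "names must fit the compiled metadata" shows up in other languages too. As a loose Python analogy (not F#): dynamic attribute storage accepts arbitrary strings, but the structural syntax does not, because the dot already carries a meaning there:

```python
class Holder:
    pass

obj = Holder()

# Attribute storage accepts an arbitrary string, much like an F#
# let-binding in double backticks...
setattr(obj, "foo.bar", 5)

# ...but it can only be read back with getattr: writing obj.foo.bar
# would look up an attribute 'foo' first, because the dot is structural,
# just as it names a namespace in a compiled .NET type name.
value = getattr(obj, "foo.bar")
```

A `class foo.bar:` statement is likewise a syntax error, mirroring the F# restriction on type names.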

Lexing and Parsing CSS hierarchy

.someDiv { width:100%; height:100%; ... more properties ... }
How would I make up a rule in my parser that will match the string above?
It seems rather impossible to me, since you can't define an unlimited number of properties in the rule. Could someone please clarify how you would do such a thing with FsLex and FsYacc?
If you're using FsLex and FsYacc then you can parse the properties inside { ... } as a list of properties. Assuming you have a lexer that properly recognizes all the special characters and you have a rule that parses an individual property, you can write something like:
declaration:
| navigators LCURLY propertyList RCURLY { Declaration($1, $3) }
| navigators LCURLY RCURLY { Declaration($1, []) }
propertyList:
| property SEMICOLON propertyList { $1::$3 }
| property { [$1] }
property:
| IDENTIFIER COLON values { Property($1, $3) }
The declaration rule parses the entire declaration (you'll need to write a parser for the various navigators, i.e. CSS selectors such as div.foo or #id). The propertyList rule parses one property and then calls itself recursively to parse the remaining properties.
The values constructed on the right-hand side will be a list representing the individual properties. The property rule parses a single property assignment, e.g. width:100% (but you'll need to finish the parsing of values, because that can be a list or a more complex expression).
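The recursive shape of propertyList can be sketched outside FsYacc as well; here is a minimal hand-written Python parser over an already-lexed token list (the token layout is invented for illustration):

```python
def parse_property_list(tokens, pos=0):
    """propertyList := property (';' property)*  -- the recursive rule
    above, written as a loop over (name, value) property tokens."""
    props = []
    name, value = tokens[pos], tokens[pos + 1]  # first property is required
    props.append((name, value))
    pos += 2
    # Each SEMICOLON corresponds to one recursive propertyList step.
    while pos < len(tokens) and tokens[pos] == ";":
        pos += 1
        name, value = tokens[pos], tokens[pos + 1]
        props.append((name, value))
        pos += 2
    return props, pos
```

For the tokens of `width:100%; height:100%` (colons already consumed by the lexer in this sketch), parse_property_list(["width", "100%", ";", "height", "100%"]) yields the pair list [("width", "100%"), ("height", "100%")], which is exactly what the yacc rule builds with `::`.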
