I'm trying to send a message with metadata through the Erlang client, and I can't understand how I should set custom application headers in the message's basic properties record. I've tried all these options with no success:
#'P_basic'{headers = [{<<"key">>, <<"value">>}]}
#'P_basic'{headers = [{"key", <<"value">>}]}
#'P_basic'{headers = [{key, <<"value">>}]}
It seems that headers use some special data structure, an AMQP table - but I couldn't find any documentation or examples on this matter.
What is the correct way to send a message with headers?
Update: A stack trace (actually, it's not relevant - the cause of that error is the silently closed channel) and the source code.
Do you get any errors trying to send messages with headers?
Did you try to use string type for both key and value?
#'P_basic'{headers = [{"key", "value"}]}
Update: I investigated the source code of the rabbit_common package and found out something about the headers' type. There is a headers() type in rabbit_basic.erl:
-type(headers() :: rabbit_framing:amqp_table() | 'undefined').
And there are definitions of these types in the module rabbit_framing_amqp:
-type(amqp_field_type() ::
      'longstr' | 'signedint' | 'decimal' | 'timestamp' |
      'table' | 'byte' | 'double' | 'float' | 'long' |
      'short' | 'bool' | 'binary' | 'void' | 'array').
-type(amqp_property_type() ::
      'shortstr' | 'longstr' | 'octet' | 'shortint' | 'longint' |
      'longlongint' | 'timestamp' | 'bit' | 'table').
-type(amqp_table() :: [{binary(), amqp_field_type(), amqp_value()}]).
-type(amqp_array() :: [{amqp_field_type(), amqp_value()}]).
-type(amqp_value() :: binary() |                                 % longstr
                      integer() |                                % signedint
                      {non_neg_integer(), non_neg_integer()} |   % decimal
                      amqp_table() |
                      amqp_array() |
                      byte() |                                   % byte
                      float() |                                  % double
                      integer() |                                % long
                      integer() |                                % short
                      boolean() |                                % bool
                      binary() |                                 % binary
                      'undefined' |                              % void
                      non_neg_integer()                          % timestamp
                     ).
So each header is a tuple of three elements (not two): the key as a binary, the value's type, and the value itself. So you have to define each header like this:
BooleanHeader = {<<"my-boolean">>, bool, true}.
StringHeader = {<<"my-string">>, longstr, <<"value">>}.
IntHeader = {<<"my-int">>, long, 1000}.
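To tie it together, here is a minimal publishing sketch using the amqp_client library; the Channel variable, exchange name and routing key below are placeholders for your own setup:

%% assumes -include_lib("amqp_client/include/amqp_client.hrl") and an open Channel
Headers = [{<<"my-boolean">>, bool, true},
           {<<"my-string">>, longstr, <<"value">>},
           {<<"my-int">>, long, 1000}],
Props = #'P_basic'{headers = Headers},
Msg = #amqp_msg{props = Props, payload = <<"hello">>},
amqp_channel:cast(Channel,
                  #'basic.publish'{exchange = <<"my-exchange">>,
                                   routing_key = <<"my-key">>},
                  Msg).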
I created a discriminated union which has three possible options:
type tool =
| Hammer
| Screwdriver
| Nail
I would like to match a single character to one tool option. I wrote this function:
let getTool (letter: char) =
    match letter with
    | H -> Tool.Hammer
    | S -> Tool.Screwdriver
    | N -> Tool.Nail
Visual Studio Code now shows me a warning that only the first character will be matched and that the other rules will never be matched.
Can somebody please explain this behaviour and maybe provide an alternative?
That's not how characters are denoted in F#. What you wrote are variable names, not characters.
To denote a character, use single quotes:
let getTool (letter: char) =
    match letter with
    | 'H' -> Tool.Hammer
    | 'S' -> Tool.Screwdriver
    | 'N' -> Tool.Nail
Apart from the character syntax (inside single quotes - see Fyodor's response), you should handle the case when the letter is not H, S or N, either using the option type or throwing an exception (less functional but enough for an exercise):
type Tool =
    | Hammer
    | Screwdriver
    | Nail

module Tool =
    let ofLetter (letter: char) =
        match letter with
        | 'H' -> Hammer
        | 'S' -> Screwdriver
        | 'N' -> Nail
        | _ -> invalidArg (nameof letter) $"Unsupported letter '{letter}'"
Usage:
> Tool.ofLetter 'S';;
val it : Tool = Screwdriver
> Tool.ofLetter 'C';;
System.ArgumentException: Unsupported letter 'C' (Parameter 'letter')
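If you'd rather go the option route mentioned above, a minimal sketch (the tryOfLetter name is my own) could sit next to ofLetter in the Tool module:

    let tryOfLetter (letter: char) =
        match letter with
        | 'H' -> Some Hammer
        | 'S' -> Some Screwdriver
        | 'N' -> Some Nail
        | _ -> None

Usage:
> Tool.tryOfLetter 'C';;
val it : Tool option = None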
I'm looking for a way to prevent KEYWORDS matching at a place where those KEYWORDS are not expected.
Take a look at the following grammar. Both 'APPLY' and 'OUTPUT' are keywords.
'OUTPUT' has an argument that contains any characters.
Everything works fine but if this argument contains the word APPLY, an error is raised (extraneous input APPLY expecting RULE_END).
Is there a way to solve this issue?
Thanks.
Sample text
APPLY, 'an id' $
OUTPUT, A text $
OUTPUT, A text with the word APPLY $
DSL
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
    statement+=Statement*;
Statement:
    ApplyStatement | OutputStatement;
OutputStatement:
    'OUTPUT' ',' out+=EXTENDLABEL* end=END;
ApplyStatement:
    'APPLY' ',' id=LABELIDENTIFIER end=END;
terminal fragment LETTER:
    'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T'
    | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' |
    'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z';
terminal LABELIDENTIFIER:
    "'"->"'";
terminal EXTENDLABEL:
    (LETTER) (LETTER)*;
terminal END:
    '$' !('\n' | '\r')*;
I see a few different ways your issue could be handled. First of all, you could allow keywords to be escaped: e.g. the Xbase language uses the '^' character as an escape character, so if for any reason there is a problem with writing a keyword, you can prefix it with '^' and it will work. Similarly, wrapping the free-text argument in specific delimiters, e.g. apostrophes, would help a lot. Of course, these solutions require changing your language itself, which you may or may not want to do.
You might also replace your EXTENDLABEL terminal with a datatype rule. This allows greater flexibility with regard to conflict resolution; worst case, you could add the language keywords as options (see the sketch below). This route was suggested to me by a tangentially related case on the Eclipse forums.
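A rough, untested sketch of that idea (the ExtendLabelText datatype rule and its name are mine):

OutputStatement:
    'OUTPUT' ',' out+=ExtendLabelText* end=END;

// datatype rule: yields plain strings, so the keywords are accepted as ordinary words
ExtendLabelText:
    EXTENDLABEL | 'APPLY' | 'OUTPUT';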
Another solution is to change the type of a token before the parser uses it. Tokens are produced by the lexer, and the parser consumes them to build your AST. So the idea is to rewrite the tokens before they are passed to the parser.
To do this, you need to bind your own parser:
@Override
public Class<? extends IParser> bindIParser() {
    return ModelParser.class;
}
Note: your parser should extend the parser generated from your grammar.
Then you need to override the following method to introduce your own TokenSource:
override protected XtextTokenStream createTokenStream(TokenSource tokenSource) {
    return new TokenSource(tokenSource, getTokenDefProvider());
}
Your own token source needs to extend 'XtextTokenStream'.
After that, you need to override the method 'LT' as follows:
override LT(int k) {
    var Token token = super.LT(k)
    if (token != null && token.text != null) token.tokenOverride(k);
    token
}
Then you just need to change the token type:
def void tokenOverride(Token token, int index) {
    switch (token.text) {
        case "APPLY": {
            overrideType(token, InternalModelParser.RULE_ID);
        }
    }
}
def void overrideType(Token token, int i) {
    token.type = i
}
Note: don't forget to add your own condition before changing the token type; in this example every 'APPLY' token will become an ID.
And of course, inside the switch you can test the token type of 'APPLY' instead of the token's text.
Can the Xtext lexer emit whatever it can't recognize as a special token? Something like:
terminal USE: 'use';
terminal SELECT: 'select';
terminal OTHER_KEYWORDS: /* not 'use' nor 'select' */;
Currently I write the grammar like this:
terminal fragment A: 'a' | 'A';
...
terminal fragment Z: 'z' | 'Z';
terminal fragment LETTER: 'a'..'z' | 'A'..'Z';
terminal fragment A_: 'b'..'z' | 'B'..'Z';
...
terminal fragment Z_: 'a'..'y' | 'A'..'Y';
terminal fragment SU_: 'a'..'r' | 't' | 'v'..'z' | 'A'..'R' | 'T' | 'V'..'Z';
terminal OTHER_KEYWORDS:
SU_ LETTER* |
U S_ LETTER* |
U S E_ LETTER* |
S E_ LETTER* |
S E L_ LETTER* |
S E L E_ LETTER* |
S E L E C_ LETTER* |
S E L E C T_ LETTER*
;
The reason I want to do this is that ANTLR will fail on that kind of typo and then fail on all of the parsing after it. If there is another way to avoid failing the rest of the parse, I don't need this error-prone and rather clumsy workaround.
I found out that simply using ID to consume the other garbage in the input stream works:
terminal USE: 'use';
terminal SELECT: 'select';
...
terminal TYPO: ID;
So if I have us e, us will be parsed as an ID; if I have use, use will be parsed as a USE. The order of terminal tokens is important.
I begin with an otherwise well-formed (and working) grammar for a language: variables, binary operators, function calls, lists, loops, conditionals, etc. To this grammar I'd like to add what I'm calling the object construct:
object
: object_name ARROW more_objects
;
more_objects
: object_name
| object_name ARROW more_objects
;
object_name
: IDENTIFIER
;
The point is to be able to access scalars nested in objects. For example:
car->color
monster->weapon->damage
pc->tower->motherboard->socket_type
I'm adding object as a primary_expression:
primary_expression
: id_lookup
| constant_value
| '(' expression ')'
| list_initialization
| function_call
| object
;
Now here's a sample script:
const list = [ 1, 2, 3, 4 ];
for var x in list {
send "foo " + x + "!";
}
send "Done!";
Prior to adding the nonterminal object as a primary_expression everything is sunshine and puppies. Even after I add it, Bison doesn't complain. No shift and/or reduce conflicts reported. And the generated code compiles without a sound. But when I try to run the sample script above, I get told error on line 2: Attempting to use undefined symbol '{' on line 2.
If I change the script to:
var list = 0;
for var x in [ 1, 2, 3, 4 ] {
send "foo " + x + "!";
}
send "Done!";
Then I get error on line 3: Attempting to use undefined symbol '+' on line 3.
Clearly the presence of object in the grammar is messing up how the parser behaves [SOMEhow], and I feel like I'm ignoring a rather simple principle of language theory that would fix this in a jiff, but the fact that there aren't any shift/reduce conflicts has left me bewildered.
Is there a better way (grammatically) to write these rules? What am I missing? Why aren't there any conflicts?
(And here's the full grammar file in case it helps)
UPDATE: To clarify, this language, which compiles into code being run by a virtual machine, is embedded into another system - a game, specifically. It has scalars and lists, and there are no complex data types. When I say I want to add objects to the language, that's actually a misnomer. I am not adding support for user-defined types to my language.
The objects being accessed with the object construct are actually objects from the game which I'm allowing the language processor to access through an intermediate layer which connects the VM to the game engine. This layer is designed to decouple as much as possible the language definition and the virtual machine mechanics from the implementation and details of the game engine.
So when, in my language I write:
player->name
That only gets codified by the compiler. "player" and "name" are not traditional identifiers because they are not added to the symbol table, and nothing is done with them at compile time except to translate the request for the name of the player into 3-address code.
It seems you are making a classic mistake by using literal strings in the yacc source file. Since you are using a lexer, you can only use token names in yacc source files. More on this here.
So I spent a reasonable amount of time picking over the grammar (and the bison output) and can't see what is obviously wrong here. Without having the means to execute it, I can't easily figure out what is going on by experimentation. Therefore, here are some concrete steps I usually go through when debugging grammars. Hopefully you can do any of these you haven't already done and then perhaps post follow-ups (or edit your question) with any results that might be revealing:
Contrive the smallest (in terms of number of tokens) possible working input, and the smallest possible non-working inputs based on the rules you expect to be applied.
Create a copy of the grammar file including only the troublesome rules and as few other supporting rules as you can get away with (i.e. you want a language that only allows construction of sequences consisting of the object and more_objects rules, joined by ARROW). Does this work as you expect?
Does the rule in which it is nested work as you expect? Try replacing object with some other very simple rule (using some tokens not occurring elsewhere) and see if you can include those tokens without breaking everything else.
Run bison with --report=all. Inspect the output to try to trace the rules you've added and the states they affect. Try removing those rules and repeat the process - what has changed? This is often extremely time-consuming and a giant pain, but it's a good last resort. I recommend a pencil and some paper.
Looking at the structure of your error output - '+' is being recognised as an identifier token, and is therefore being looked up as a symbol. It might be worth checking your lexer to see how it is processing identifier tokens; you might just accidentally be grabbing too much. As a further debugging technique, you might consider turning some of those token literals (e.g. '+', '{', etc.) into real tokens so that bison's error reporting can help you out a little more.
EDIT: OK, the more I've dug into it, the more I'm convinced that the lexer is not necessarily working as it should be. I would double-check that the stream of tokens you are getting from yylex() matches your expectations before proceeding any further. In particular, it looks like a bunch of symbols that you consider special (e.g. '+' and '{') are being captured by some of your regular expressions, or at least are being allowed to pass for identifiers.
You don't get shift/reduce conflicts because your rules using object_name and more_objects are right-recursive - rather than the left-recursive rules that Yacc (Bison) handles most naturally.
On classic Yacc, you would find that you can run out of stack space with deep enough nesting of the 'object->name->what->not' notation. Bison extends its stack at runtime, so you have to run out of memory, which is a lot harder these days than it was when machines had a few megabytes of memory (or less).
One result of the right-recursion is that no reductions occur until you read the last of the object names in the chain (or, more accurately, one symbol beyond that). I see that you've used right-recursion with your statement_list rule - and in a number of other places as well.
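For reference, a left-recursive formulation of the same construct (a sketch only, with semantic actions omitted) would look like this:

object
: object_name
| object ARROW object_name
;
object_name
: IDENTIFIER
;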
I think your principal problem is that you failed to define a subtree constructor in your object subgrammar. (EDIT: OP says he left the semantic actions for object out of his example text. That doesn't change the following answer.) You probably have to look up the objects in the order encountered, too.
Maybe you intended:
primary_expression
: constant_value { $$ = $1; }
| '(' expression ')' { $$ = $2; }
| list_initialization { $$ = $1; }
| function_call { $$ = $1; }
| object { $$ = $1; }
;
object
: IDENTIFIER { $$ = LookupVariableOrObject( yytext ); }
| object ARROW IDENTIFIER { $$ = LookupSubobject( $1, yytext ); }
;
I assume that if one encounters an identifier X by itself, your default interpretation is that it is a variable name. But if you encounter X -> Y, then even if X is a variable name, you want the object X with subobject Y.
What LookupVariableOrObject does is look up the leftmost identifier encountered to see if it is a variable (and return essentially the same value as id_lookup, which must produce an AST node of type AST_VAR), or see if it is a valid object name and return an AST node marked as an AST_OBJ, or complain if the identifier is neither of these.
What LookupSubobject does is check its left operand to ensure it is an AST_OBJ (or an AST_VAR whose name happens to be the same as that of an object) and complain if it is not. If it is, it then looks up the yytext-child object of the named AST_OBJ.
EDIT: Based on discussion comments in another answer, right-recursion in the OP's original grammar might be problematic if the OP's semantic checks inspect global lexer state (yytext). This solution is left-recursive and won't run afoul of that particular trap.
id_lookup
: IDENTIFIER
is formally identical to
object_name
: IDENTIFIER
and object_name would accept everything that id_lookup wouldn't, so assertLookup( yytext ); probably runs on everything that may look like an IDENTIFIER and is not accepted by another rule, just to decide between the two - and then object_name can't accept it because the single-token lookahead forbids that.
As for the twilight zone: the two characters that you got errors for are not declared as tokens, which opens the zone of undefined behavior and could trip the parser into trying to treat them as potential identifiers when the grammar gets loose.
I just tried running muscle on Ubuntu 10.04 using bison 2.4.1 and I was able to run both of your examples with no syntax errors. My guess is that you have a bug in your version of bison. Let me know if I'm somehow running your parser wrong. Below is the output from the first example you gave.
./muscle < ./test1.m (this was your first test)
\-statement list
|-declaration (constant)
| |-symbol reference
| | \-list (constant)
| \-list
| |-value
| | \-1
| |-value
| | \-2
| |-value
| | \-3
| \-value
| \-4
|-loop (for-in)
| |-symbol reference
| | \-x (variable)
| |-symbol reference
| | \-list (constant)
| \-statement list
| \-send statement
| \-binary op (addition)
| |-binary op (addition)
| | |-value
| | | \-foo
| | \-symbol reference
| | \-x (variable)
| \-value
| \-!
\-send statement
\-value
\-Done!
+-----+----------+-----------------------+-----------------------+
| 1 | VALUE | 1 | |
| 2 | ELMT | #1 | |
| 3 | VALUE | 2 | |
| 4 | ELMT | #3 | |
| 5 | VALUE | 3 | |
| 6 | ELMT | #5 | |
| 7 | VALUE | 4 | |
| 8 | ELMT | #7 | |
| 9 | LIST | | |
| 10 | CONST | #10 | #9 |
| 11 | ITER_NEW | #11 | #10 |
| 12 | BRA | #14 | |
| 13 | ITER_INC | #11 | |
| 14 | ITER_END | #11 | |
| 15 | BRT | #22 | |
| 16 | VALUE | foo | |
| 17 | ADD | #16 | #11 |
| 18 | VALUE | ! | |
| 19 | ADD | #17 | #18 |
| 20 | SEND | #19 | |
| 21 | BRA | #13 | |
| 22 | VALUE | Done! | |
| 23 | SEND | #22 | |
| 24 | HALT | | |
+-----+----------+-----------------------+-----------------------+
foo 1!
foo 2!
foo 3!
foo 4!
Done!
I'm trying to create a discriminated union for part of speech tags and other labels returned by a natural language parser.
It's common to use either strings or enums for these in C#/Java, but discriminated unions seem more appropriate in F# because these are distinct, read-only values.
In the language reference, I found that double backticks ( ``...`` ) can be used to delimit keywords/reserved words. This works for:
type ArgumentType =
| A0 // subject
| A1 // indirect object
| A2 // direct object
| A3 //
| A4 //
| A5 //
| AA //
| ``AM-ADV``
However, the tags contain symbols like $, e.g.
type PosTag =
| CC // Coordinating conjunction
| CD // Cardinal Number
| DT // Determiner
| EX // Existential there
| FW // Foreign Word
| IN // Preposition or subordinating conjunction
| JJ // Adjective
| JJR // Adjective, comparative
| JJS // Adjective, superlative
| LS // List Item Marker
| MD // Modal
| NN // Noun, singular or mass
| NNP // Proper Noun, singular
| NNPS // Proper Noun, plural
| NNS // Noun, plural
| PDT // Predeterminer
| POS // Possessive Ending
| PRP // Personal Pronoun
| PRP$ //$ Possessive Pronoun
| RB // Adverb
| RBR // Adverb, comparative
| RBS // Adverb, superlative
| RP // Particle
| SYM // Symbol
| TO // to
| UH // Interjection
| VB // Verb, base form
| VBD // Verb, past tense
| VBG // Verb, gerund or present participle
| VBN // Verb, past participle
| VBP // Verb, non-3rd person singular present
| VBZ // Verb, 3rd person singular present
| WDT // Wh-determiner
| WP // Wh-pronoun
| WP$ //$ Possessive wh-pronoun
| WRB // Wh-adverb
| ``#``
| ``$``
| ``''``
| ``(``
| ``)``
| ``,``
| ``.``
| ``:``
| `` //not sure how to escape/delimit this
``...`` isn't working for WP$ or for symbols like (.
Also, I have the interesting problem that the parser returns `` as a meaningful symbol, so I need to escape it as well.
Is there some other way to do this, or is this just not possible with a discriminated union?
Right now I'm getting errors like
Invalid namespace, module, type or union case name
Discriminated union cases and exception labels must be uppercase identifiers
I suppose I could somehow override toString for these goofy cases and replace the symbols with some alphanumeric equivalent?
The spec doesn't seem clear about what characters are allowed to be escaped in double-backticks in what contexts.
I think your best bet is to use standard identifiers for the DU cases, and override ToString as you suggest.
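A rough sketch of that approach, with made-up case names (PRPDollar, WPDollar, Backticks) standing in for the problematic tags:

type PosTag =
    | PRP          // ordinary tags keep their own names
    | PRPDollar    // PRP$
    | WPDollar     // WP$
    | Backticks    // ``
    override this.ToString() =
        match this with
        | PRP -> "PRP"
        | PRPDollar -> "PRP$"
        | WPDollar -> "WP$"
        | Backticks -> "``"

Calling string (or .ToString()) on a case then yields the original tag text wherever it needs to be printed.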
From my experience, double-backtick-quoted identifiers are (or seem to be) fully supported only in let bindings or type members. That means you can put almost any sequence of characters inside them (except the # character, which is reserved for F# code generation).
When you want to use them as identifiers in module, type, or DU case definitions, it doesn't play as nicely, since some characters are not supported.
E.g. ., /, *, +, $, [, ], \ or & generate an "Invalid namespace, module, type or union case name" error.