Redefining Lexical Tokens With Javacc

I'm quite new to creating language syntax with JavaCC, and I need to find a way to let the user redefine the definition of a token in code.
For example, the line
REDEFINE IF FOO
should change the definition of "IF" from
< IF: "IF" >
to
< IF: "FOO" >
If this is not possible, what would be the best way of solving this problem?

I think you can do it with a token action that changes the kind field of the token.
Something like the following. [Untested code follows. If you use it, please correct any errors in this answer.]
Make a token manager declaration of a hash map:
TOKEN_MGR_DECLS: {
public java.util.HashMap<String,Integer> keywordMap = new java.util.HashMap<String,Integer>() ;
{ keywordMap.put( "IF", ...Constants.IF); }
}
Make a definition for identifiers.
TOKEN : { <ID : (["a"-"z","A"-"Z"])(["a"-"z","A"-"Z","0"-"9"])* >
{ if( keywordMap.containsKey( matchedToken.image ) ) {
matchedToken.kind = keywordMap.get( matchedToken.image ) ; }
}
}
Make definitions for the keywords. These need to come after the definition of ID. Really, they are only here so that the token kinds are created. They will be unreachable and may cause warnings.
TOKEN : { <IF : "A"> | ... }
In the parser you need to define a redefine production:
void redefine() :
{
Token oldToken;
Token newToken;
}
{
<REDEFINE> oldToken=redefinableToken() newToken=redefinableToken()
{
if( ...TokenManager.keywordMap.containsKey( oldToken.image ) ) {
...TokenManager.keywordMap.remove( oldToken.image ) ;
...TokenManager.keywordMap.put( newToken.image, oldToken.kind ) ; }
else {
/* report an error */ }
}
}
Token redefinableToken() :
{ Token t ; }
{
t=<ID> {return t ;}
| t=<IF> {return t ;}
| ...
}
See the FAQ (4.14) for warnings about trying to alter the behaviour of the lexer from the parser. Long story short: avoid lookahead.
Another approach is to simply have one token kind, say ID, and handle everything in the parser. See FAQ 4.19 on "Replacing keywords with semantic lookahead". Here lookahead will be less of a problem because semantic actions in the parser aren't executed during syntactic lookahead (FAQ 4.10).
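The keyword-map idea can be tried out in plain Java without generating a parser. The sketch below is only an illustration: the class name RedefinableKeywords and the kind constants ID and IF are hypothetical stand-ins for the generated ...Constants values, kindOf mirrors what the token action on ID does, and redefine mirrors the action in the redefine production.

```java
import java.util.HashMap;
import java.util.Map;

class RedefinableKeywords {
    // Hypothetical token kinds standing in for the generated ...Constants values.
    static final int ID = 1;
    static final int IF = 2;

    private final Map<String, Integer> keywordMap = new HashMap<>();

    RedefinableKeywords() {
        keywordMap.put("IF", IF);
    }

    /** Mirrors the token action on ID: use the mapped kind, else stay an ID. */
    int kindOf(String image) {
        return keywordMap.getOrDefault(image, ID);
    }

    /** Mirrors REDEFINE: move the kind from the old spelling to the new one. */
    void redefine(String oldImage, String newImage) {
        Integer kind = keywordMap.remove(oldImage);
        if (kind == null) {
            throw new IllegalArgumentException(oldImage + " is not a keyword");
        }
        keywordMap.put(newImage, kind);
    }

    public static void main(String[] args) {
        RedefinableKeywords table = new RedefinableKeywords();
        System.out.println(table.kindOf("IF"));   // kind of the IF keyword
        table.redefine("IF", "FOO");
        System.out.println(table.kindOf("FOO"));  // FOO now has the IF kind
        System.out.println(table.kindOf("IF"));   // IF is now an ordinary ID
    }
}
```

After REDEFINE IF FOO, the old spelling falls back to ID, which is why the redefinableToken production has to accept both ID and the keyword kinds.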

Related

Updating AST in jjtree/javacc

I have a small C-like language for which I have generated a parser and created a printVisitor using jjtree. Now I want to be able to change the AST and print the modified AST.
For example, I have a production for a construct called "taint" which wraps definitions inside it. While parsing, I want to change the AST to point directly to the declaration and ignore the taint construct.
That is, this
taint(int x) ====becomes after parsing===> int x
How is that possible? I am not sure whether I need a visitor for that or whether I can change the jjtree grammar directly to do it.
Below is the code for the production. Your help is much appreciated!
void Taint() #Taint: {Token t; String varname; }
{
t = <TAINT>< LPAREN> varname=Primitive()< RPAREN>
{
ST.put(varname, new STC(varname, true, true));
jjtThis.value=t.image;
}
}
String Primitive() #Primitive: { String type; String name;}
{
type=Type() name=Id()
{
ST.put(name, new STC(name, false, true) );
return name;
}
}

ANTLR4 : Island grammar, token matching / skipping

I'm fighting an island grammar with ANTLR4, and while I can make it work, I still have doubts about whether this is the "proper" way.
I need to parse :
Some random text
{ }
#if(condition) {
more random text
#foobar
#if (condition2) {
random text {}
}
}
The problem lies in the context: a "wild" {} isn't anything, but if the { } follows a language operator, it becomes meaningful (read: it opens and closes a block).
In the above case, it would return the following, assuming that condition and condition2 are both true :
Some random text
{}
more random text
random text {}
I'm confused about which route to pick. Any advice on the above?
The original implementation seems to be matching braces :
{ }
#if (true) {
{
foo
bar
} }
yields
{ }
{
foo
bar
}
while
{ }
#if (true) {
{
foo
bar
}
yields a parse error.

This can be solved with a context-specific lexer. By keeping track of the condition and block openings, we can determine whether a brace is template content or an actual block opening or closing.
See p. 219 of The Definitive ANTLR 4 Reference.
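The bookkeeping such a lexer needs can be sketched in plain Java, independent of ANTLR. This is only an illustration under stated assumptions: the class BraceClassifier and its labels are hypothetical, a block header is taken to be "#if", then everything up to the next ")", then optional whitespace and "{", and conditions with nested parentheses are not handled. For each open block it counts the unmatched "wild" braces inside it, so that only the brace that brings the count back to zero closes the block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BraceClassifier {
    /** Returns one label per "#if(...) {" header and per brace, in source order. */
    static List<String> classify(String src) {
        List<String> labels = new ArrayList<>();
        // One counter per open block: unmatched wild '{' seen inside that block.
        Deque<Integer> wildDepth = new ArrayDeque<>();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("#if", i)) {
                int close = src.indexOf(')', i);
                int brace = close < 0 ? -1 : src.indexOf('{', close);
                // Only whitespace may separate the condition from the block brace.
                if (brace >= 0 && src.substring(close + 1, brace).trim().isEmpty()) {
                    wildDepth.push(0);
                    labels.add("BLOCK_OPEN");
                    i = brace + 1;
                    continue;
                }
            }
            char c = src.charAt(i);
            if (c == '{') {
                if (!wildDepth.isEmpty()) {
                    wildDepth.push(wildDepth.pop() + 1);
                }
                labels.add("TEXT");
            } else if (c == '}') {
                if (wildDepth.isEmpty()) {
                    labels.add("TEXT");          // brace outside any block
                } else if (wildDepth.peek() > 0) {
                    wildDepth.push(wildDepth.pop() - 1);
                    labels.add("TEXT");          // closes a wild brace inside a block
                } else {
                    wildDepth.pop();
                    labels.add("BLOCK_CLOSE");   // closes the innermost #if block
                }
            }
            i++;
        }
        return labels;
    }
}
```

In an ANTLR4 lexer the same counters would typically live in lexer members and be consulted from semantic predicates or actions; the sketch only shows the state that has to be tracked.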

XText cross-reference to an non-DSL resource

Please consider this minimal Xtext grammar.
Model:
"As a" stakeholder=Stakeholder "I want" want=Want;
Stakeholder:
'client' | 'developer' | 'manager';
Want:
'everything' | 'cookies' | 'fame';
Now what I need to do is move the definition of stakeholders (let's forget about want) to SOME external data source. This "external data source" might be a CSV file, a DB or maybe a web service. It is highly unlikely to be an Xtext file or to come with an EMF model. Still, I want to cross-reference it just like you can cross-reference Java types in your DSL.
Issues like manual parsing and caching (for performance sake) aside: is this even doable?
I've dug a little into the topic of scopes and resource providers but everything I found required the external source to be part of at least another DSL.
I'd be very happy about a rough outline what would be needed to be done.

Sorry it took me so long to respond. I tried Christian's suggestion, was not very satisfied, and then priorities shifted. Now I'll have another go at the problem, and in order to document it for others (and to clear my head) I'll write down what I did so far, since it was not all that straightforward and required a fair amount of experimentation.
I will not post full classes but only the relevant parts. Feel free to ask for more detail if you need it.
My Syntax-Definition now looks like this:
Model:
stakeholders+=StakeholderDecl*
requirements+=Requirement*;
Requirement:
'As a' stakeholder=[Stakeholder] 'I want' want=('everything' | 'cookies' | 'results')
;
StakeholderDecl returns Stakeholder :
'Stakeholder' Stakeholder
;
Stakeholder:
name=ID
;
Note that everything below needed to be done in the .ui package.
First I created StakeholdersProvider.xtend:
class StakeholdersProvider extends AbstractResourceDescription {
// this is the dummy for an "external source". Just raw data.
val nameList = newArrayList( "buddy", "boss" )
val cache = nameList.map[it.toDescription]
private val uri = org.eclipse.emf.common.util.URI.createPlatformResourceURI("neverland", true)
def public List<IEObjectDescription> loadAdditionalStakeholders() {
cache
}
def private IEObjectDescription toDescription(String name) {
ExternalFactoryImpl.init()
val ExternalFactory factory = new ExternalFactoryImpl()
val Stakeholder obj = factory.createStakeholder as StakeholderImpl
obj.setName(name)
new StakeholderDescription(name, obj, uri)
}
. . .
override getURI() {
uri
}
def public boolean isProvided( EObject object ) {
if( object.eClass.classifierID != ExternalPackageImpl.STAKEHOLDER ) {
false
}
else {
val stakeholder = object as Stakeholder
nameList.exists[it == stakeholder.name]
}
}
}
Note that the provider is also a resource description, and its URI is of course nonsense.
With this provider I wrote a ScopeWrapper.xtend :
class ScopeWrapper implements IScope {
private var IScope scope;
private var StakeholdersProvider provider
new( IScope scopeParam, StakeholdersProvider providerParam ) {
scope=scopeParam
provider = providerParam
}
override getAllElements() {
val elements = scope.allElements.toList
val ret = provider.loadAdditionalStakeholders()
ret.addAll(elements)
ret
}
override getSingleElement(QualifiedName name) {
allElements.filter[it.name == name].head
}
. . .
}
and ResourceDescriptionsWrapper.xtend:
class ResourceDescriptionsWrapper implements IResourceDescriptions {
private StakeholdersProvider provider;
private IResourceDescriptions descriptions;
new(IResourceDescriptions descriptionsParam, StakeholdersProvider providerParam) {
descriptions = descriptionsParam
provider = providerParam
}
override getAllResourceDescriptions() {
val resources = descriptions.allResourceDescriptions.toList
resources.add(provider)
resources
}
override getResourceDescription(URI uri) {
if( uri == provider.URI ) provider
else descriptions.getResourceDescription(uri)
}
override getExportedObjects() {
val descriptions = descriptions.exportedObjects.toList
descriptions.addAll(provider.exportedObjects)
descriptions
}
. . . some overrides for getExportedObjects-functions
}
All of this is wired together in MyGlobalScopeProvider.xtend:
class MyGlobalScopeProvider extends TypesAwareDefaultGlobalScopeProvider {
val provider = new StakeholdersProvider()
override getScope(Resource context, EReference reference, Predicate<IEObjectDescription> filter) {
val scope = super.getScope(context, reference, filter)
return new ScopeWrapper(scope, provider)
}
override public IResourceDescriptions getResourceDescriptions(Resource resource) {
val superDescr = super.getResourceDescriptions(resource)
return new ResourceDescriptionsWrapper(superDescr, provider)
}
}
which is registered in MyDslUiModule.java
public Class<? extends IGlobalScopeProvider> bindIGlobalScopeProvider() {
return MyGlobalScopeProvider.class;
}
So far so good. I now get boss and buddy suggested as stakeholders. However, when I use one of those two I get an error in the editor complaining about a dangling reference, and an error logged in the console saying that a stakeholder cannot be exported as the target is not contained in a resource. Figuring those two might be related, I tried to fix the logged error and created MyResourcesDescriptionStrategy.xtend:
class MyResourcesDescriptionStrategy extends DefaultResourceDescriptionStrategy {
val provider = new StakeholdersProvider()
override isResolvedAndExternal(EObject from, EObject to) {
if (provider.isProvided(to)) {
// The object is a stakeholder that was originally provided by
// our StakeholdersProvider. So we mark it as resolved.
true
} else {
super.isResolvedAndExternal(from, to)
}
}
}
and also wire it in the UiModule:
public Class<? extends IDefaultResourceDescriptionStrategy> bindDefaultResourceDescriptionStrategy() {
return MyResourcesDescriptionStrategy.class;
}
This fixes the logged error, but the "dangling reference" problem remains. I searched for solutions, and the most prominent result suggests that defining an IResourceServiceProvider would have been the best way to solve my problem in the first place.
I'll spend a bit more time on the current approach and then try it with a ResourceProvider.
EDIT: I got the "dangling reference" problem fixed. The loadAdditionalStakeholders() function in StakeholdersProvider.xtend now looks like this:
override loadAdditionalStakeholders() {
val injector = Guice.createInjector(new ExternalRuntimeModule());
val rs = injector.getInstance(ResourceSet)
val resource = rs.createResource(uri)
nameList.map[it.toDescription(resource)]
}
def private IEObjectDescription toDescription(String name, Resource resource) {
ExternalFactoryImpl.init()
val ExternalFactory factory = new ExternalFactoryImpl()
val Stakeholder obj = factory.createStakeholder as StakeholderImpl
obj.setName(name)
// not sure why or how, but when adding obj to the resource, the
// resource will be set in obj . . . thus no more dangling ref
resource.contents += obj
new StakeholderDescription(name, obj, uri)
}
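One detail worth watching in a wrapper like ScopeWrapper: getAllElements appends the delegate's elements to the list returned by the provider, so if that list is a shared cache it may grow on every call. The plain-Java sketch below shows the same wrapping pattern with a fresh copy per call. It does not use the Xtext API; NameScope and ExternalNamesScope are hypothetical stand-ins that deal in plain names rather than IEObjectDescription instances.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a scope: something that can list and look up names. */
interface NameScope {
    List<String> allNames();

    default boolean contains(String name) {
        return allNames().contains(name);
    }
}

/** Wraps an existing scope and adds names loaded from an external source. */
class ExternalNamesScope implements NameScope {
    private final NameScope delegate;
    private final List<String> externalNames;

    ExternalNamesScope(NameScope delegate, List<String> externalNames) {
        this.delegate = delegate;
        // Defensive copy: later changes to the caller's list do not leak in.
        this.externalNames = Collections.unmodifiableList(new ArrayList<>(externalNames));
    }

    @Override
    public List<String> allNames() {
        // Build a fresh list on every call so the cached names are never mutated.
        List<String> merged = new ArrayList<>(externalNames);
        merged.addAll(delegate.allNames());
        return merged;
    }
}
```

Wrapping a scope that knows "client" with the external names "buddy" and "boss" yields all three names, and repeated calls return lists of the same size.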

How can I write Bison rule with 0 or 1+ tokens without S/R conflicts?

Suppose I define tokens for ordinary object access like this:
[$_a-zA-Z]+[.] { return ACCESS; }
[$_a-zA-Z]+ { return ID; }
[+] { return PLUS; }
And the Bison grammar rules:
Accesses
: Accesses ACCESS { /*do something...*/ }
| ACCESS { /*do something...*/ }
Expression
: Accesses ID PLUS Accesses ID { /*do something...*/ }
I want to allow input like this in source code:
moduleA.valueB.valueC + valueD
In this example, if I don't put an empty rule in Accesses, a single ID variable like valueD is illegal. But if I put the empty rule in, Accesses causes serious S/R conflicts, and the text it matches becomes strange.
Also, I don't think duplicating the rules in Expression is a good idea, e.g.:
Expression
: Accesses ID PLUS Accesses ID { /*do something...*/ }
| ID PLUS Accesses ID { /*do something...*/ }
| Accesses ID PLUS ID { /*do something...*/ }
| ID PLUS ID { /*do something...*/ }
Are there other ways to solve this problem?
EDIT: OK, thanks to your answers I noticed that this simple grammar has no conflicts. At least now I know that the real problem may be hiding somewhere else (what a mess for a compiler newbie!).

You can do it like this:
lex:
[$_a-zA-Z]+ {return WORD;}
"." {return DOT;}
"+" {return PLUS;}
bison:
Expression : Value PLUS Value;
Value : WORD|WORD AccessList;
AccessElement: DOT WORD;
AccessList : AccessElement|AccessList AccessElement;
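To see why lexing WORD, DOT and PLUS separately removes the need for an optional prefix, here is the same grammar as a small hand-written recursive-descent parser in Java. It is a sketch, not Bison output: the class AccessExpressionParser is hypothetical, and Value is written as WORD followed by zero or more DOT WORD pairs, which accepts the same strings as the Value and AccessList rules above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessExpressionParser {
    // WORD, DOT or PLUS, each optionally preceded by whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\s*([$_a-zA-Z]+|\\.|\\+)");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    AccessExpressionParser(String src) {
        Matcher m = TOKEN.matcher(src);
        int end = 0;
        while (end < src.length() && m.find(end) && m.start() == end) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!src.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected input at offset " + end);
        }
    }

    /** Expression : Value PLUS Value. Returns the two access paths. */
    List<List<String>> parseExpression() {
        List<List<String>> operands = new ArrayList<>();
        operands.add(parseValue());
        if (pos >= tokens.size() || !tokens.get(pos).equals("+")) {
            throw new IllegalStateException("PLUS expected at token " + pos);
        }
        pos++;
        operands.add(parseValue());
        if (pos != tokens.size()) {
            throw new IllegalStateException("trailing tokens at " + pos);
        }
        return operands;
    }

    /** Value : WORD (DOT WORD)* */
    private List<String> parseValue() {
        List<String> path = new ArrayList<>();
        path.add(word());
        while (pos < tokens.size() && tokens.get(pos).equals(".")) {
            pos++;
            path.add(word());
        }
        return path;
    }

    private String word() {
        if (pos >= tokens.size() || tokens.get(pos).equals(".") || tokens.get(pos).equals("+")) {
            throw new IllegalStateException("WORD expected at token " + pos);
        }
        return tokens.get(pos++);
    }
}
```

Calling new AccessExpressionParser("moduleA.valueB.valueC + valueD").parseExpression() returns the path moduleA, valueB, valueC for the left operand and the single-element path valueD for the right one.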

There's nothing wrong with just using an epsilon production in your example:
Expression
: Accesses ID PLUS Accesses ID { /*do something...*/ }
;
Accesses
: Accesses ACCESS { /*do something...*/ }
| { /*do something...*/ }
;
gives no conflicts....

Are parsers generated by FSYacc thread safe?

If I generate a parser using FSYacc will it be thread safe?
The only reason I ask is because the functions
Parsing.rhs_start_pos and Parsing.symbol_end_pos
don't appear to have any state passed into them, which would lead me to assume that they are getting the current NonTerminal/Symbols from a shared location, is this correct?
After decompiling the code with a reflector, I see that they get the position from a static property:
internal static IParseState parse_information
{
get
{
return parse_information;
}
set
{
parse_information = value;
}
}
Is this correct? If so what can I do about it?
Edit: I also see a static method called set_parse_state
public static void set_parse_state(IParseState x)
{
parse_information = x;
}
But that still won't solve my problem...

I really don't like to answer my own question, however since this could save someone else a world of grief someday I will.
It turns out that the functions provided in the parsing module are NOT thread safe.
What you can do however is access the parseState "variable", which is of type IParseState, in your nonterminal action.
For example (rough but work with me):
If you have a NonTerminal like
%token<string> NAME
%%
Person:
NAME NAME { $1 (* action *) }
The code that gets generated is:
(fun (parseState : Microsoft.FSharp.Text.Parsing.IParseState) ->
let _1 = (let data = parseState.GetInput(1) in
(Microsoft.FSharp.Core.Operators.unbox data : string)
) in
Microsoft.FSharp.Core.Operators.box((_1) : 'Person)
);
So you can interact with that parseState object in the same fashion.
%token<string> NAME
%%
Person:
NAME NAME { parseState.DoStuff(); }
The rhs_start_pos method basically does this:
let startPos,endPos = parseState.InputRange(n)
and the symbol_end_pos does this:
let startSymb,endSymb = parseState.ResultRange
I hope this helps
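The underlying point is general: an action that receives its parse state as an argument stays correct under concurrency, whereas one that reads a shared static does not. The Java sketch below illustrates only that principle. It is not FsYacc code, and PerParseState, ParseState and action are hypothetical names; each simulated parse owns its own state object and several parses run on a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerParseState {
    /** Hypothetical per-parse state; every parse owns exactly one instance. */
    static final class ParseState {
        private final int start;
        private final int end;

        ParseState(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int start() { return start; }
        int end() { return end; }
    }

    /** An action that takes its state as a parameter instead of reading a static. */
    static String action(ParseState state) {
        return state.start() + "-" + state.end();
    }

    /** Runs several independent "parses" concurrently and collects their results in order. */
    static List<String> runConcurrently(int parses) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < parses; i++) {
                final ParseState state = new ParseState(i, i + 10);
                Callable<String> task = () -> action(state);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no state is shared between tasks, each result depends only on the state object passed to that task, regardless of how the threads are scheduled.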
