Identifier terminal except certain keywords - parsing

I'm using the Irony framework and I have:
IdentifierTerminal variable = new IdentifierTerminal("variable");
a terminal for identifying an entry.
This variable terminal can hold any string, except for a predefined list of reserved strings.
The identifier does not start or end with single or double quotes.
I want something like:
IdentifierTerminal variable = any contiguous string EXCEPT "event", "delegate";
How can I enforce this rule for this terminal?

Have you declared your keywords explicitly? If not, this page: https://en.wikibooks.org/wiki/Irony_-_Language_Implementation_Kit/Grammar/Terminals#Keywords will show you how. It seems that you don't need to state explicitly that an identifier cannot be a keyword, as the parser is able to figure this out. I found the following quote:
Finally, in most cases, the Irony scanner does not need to distinguish between keywords and identifiers. It simply tokenizes all alpha-numeric words as identifiers, leaving it to the parser to differentiate them from each other. Some LEX-based solutions try to put too much responsibility on the scanner and make it recognize not only the token itself, but also its role in the surrounding context. This, in turn, requires extensive look-aheads, making the scanner code quite complex. In the author's opinion, identifying the token role is the responsibility of the parser, not the scanner. The scanner should recognize tokens without using any contextual information.
source: http://www.codeproject.com/Articles/22650/Irony-NET-Compiler-Construction-Kit

Related

What language is this Salesforce code that I need to wrap?

I'm working on a Salesforce coding issue. Let me preface this by saying I'm not a developer or Salesforce expert.
What language is this?
Data Type: Formula
This formula references multiple objects
IF (Fulfillment_Submission_Form_URL__c <> "" && CONTAINS(Fulfillment_Submission_Form_URL__c, "qualtrics"),
Fulfillment_Submission_Form_URL__c &
(IF (CONTAINS(Fulfillment_Submission_Form_URL__c,"?SID="), "&", "?")) &
(IF (CONTAINS(TEXT(Type__c), "Site Visit"),
"ContactId="&Statement_of_Work__r.Contractor_Contact__c&
"&CoachType="&SUBSTITUTE(Statement_of_Work__r.Work_Type__r.Name," ","%20")&
"&CoachName="&SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20")&
"&InitPartId="&Initiative_Participation__r.Id&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(","")&
"&AccountId="&Initiative_Participation__r.Participating_Institution__r.Id&
"&TodaysDate="&TEXT(TODAY())&
"&SOWLineItemId="&Id&
"&LeaderCollege="&Initiative_Participation__r.ATD_Leader_College_Status__c&
"&SVRCompleted="&TEXT(Count_of_Site_Visit_Fulfillments__c)&
"&SVRRequired="&TEXT(Number_of_Work_Units_Allocated__c),
IF (CONTAINS(TEXT(Type__c), "Feedback"),
"InitPartId="&Initiative_Participation__r.Id&
"&SOWLineItemId="&Id&
"&ReportYear="&Statement_of_Work__r.SOW_Year__c&
"&UserId="&Contractor_User_Id__c&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(",""),
"")
))
,"")
Essentially it's pulling a link from another product we've integrated it with. We then take the basic link and reformat it to add parameters.
The problem is when it pulls in some parameters (ex: CoachName) the Coach entered their name in strange formats like: John (Coach) Doe.
So when the script outputs a URL that includes parameters it breaks at the &CoachName=John%20(Coach)% portion of the URL. Any easy way to work around this by modifying the script? Unfortunately we DO need that (Coach) identifier because the system we push to grabs that as well.
It's formula syntax; I'd compare it to Excel-like formulas. There's self-paced training if you don't want to read the documentation. And as it's not exactly code-related, you may have more luck on the dedicated site, https://salesforce.stackexchange.com/. More admins lurk there.
So you do want that "(Coach)" to go through, but it breaks the link? It looks like ( is being treated as a special character. It's not technically wrong to have unescaped parentheses; if they break that other site, you might want to contact its owners and ask them to get their act together. The RFC doesn't force us to encode them, but it looks like you'll have to in order to solve this, at least in the short term: https://webmasters.stackexchange.com/questions/78110/is-it-bad-to-use-parentheses-in-a-url
Instead of the poor man's encoding, SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20"), try using the proper URLENCODE(Statement_of_Work__r.Contractor_Name__c).
Or there's a bit more "pro" function called URLFOR, but the documentation doesn't make it very clear how powerful the 3rd parameter is, with its square-bracket [key1 = value1, key2 = value2] syntax. Basically just pass the parameters and let SF worry about encoding special characters etc.
Read my answer https://salesforce.stackexchange.com/a/46445/799 and there are some examples on the net like https://support.docusign.com/s/articles/DFS-URL-buttons-for-Lightning-basic-setup-limitations?language=en_US&rsc_301
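To illustrate the difference between the space-only SUBSTITUTE approach and full percent-encoding, here is a small sketch in Ruby rather than Salesforce formula syntax, so it shows the general idea and not Salesforce's exact URLENCODE output. The sample name comes from the question; ERB::Util.url_encode is Ruby's standard-library encoder and is used here only as a stand-in.

```ruby
require "erb"

name = "John (Coach) Doe"

# Space-only substitution, as in the original formula:
# the parentheses pass through untouched.
spaces_only = name.gsub(" ", "%20")

# Full percent-encoding: spaces and parentheses are all escaped.
encoded = ERB::Util.url_encode(name)

puts spaces_only  # John%20(Coach)%20Doe
puts encoded      # John%20%28Coach%29%20Doe
```

A receiving system that decodes query parameters in the usual way would turn %28 and %29 back into parentheses, so the (Coach) marker should still arrive intact.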

Java CUP and JFlex Interaction

I am considering using the CUP parser generator for a project. In order to correctly parse some constructs of the language I am going to be compiling, I will need the lexer (generated by JFlex) to use information from the symbol table (not the parse table; I mean the table in which I will be storing information about identifiers) of the parser to generate the correct token type when its next_token() method is invoked. Since information in the symbol table depends statically on the program text, this will only work if the next_token() method is invoked "in lockstep" with the parser. In other words, this will work if the parser calls the lexer whenever it needs another token, but not if (for example) there is a parallel thread that is invoking the lexer and buffering tokens in a queue.
The question is thus: How does CUP call the lexer? Does it call it whenever it needs the next token? I could of course just write a CUP grammar specification and inspect the generated parser's source file to see what's going on, but that may be more work than necessary. I couldn't find any information on this on relevant websites.
Thanks a lot for any help you can offer!
I finished implementing my parser and scanner a while ago. Here's what I found:
CUP does indeed invoke the scanner as and when needed. It always keeps one token buffered beyond what has been recognized so far (the lookahead token). There is no fancy buffering of tokens ahead of time.
That being said, it can be tricky to set lexer states during parsing, as this can give rise to many grammar conflicts. I guess this has to do with the way CUP represents semantic actions embedded within productions. It forced me to abandon my initial design after all, though not for the reason I had been dreading.
Hope this helps someone!
This reply may be too late for you, but it could be useful for other users. The first thing to know is that a parser can't do anything without a scanner. As a matter of fact, the first parameter of the parser's constructor is the scanner.
After compiling the .cup file you will get, as output, a .java file with the same name as the .cup one. Let's suppose its name is TmpParser.
So in the main class of your project you have to add the following lines:
TmpParser p = new TmpParser (new Scanner (new Reader (s)));
p.parse();
You should put this code inside a try-catch block. When the parse method is called, the parser starts working and calls the scanner's next_token method in order to recognize each token and check whether the input conforms to the grammar rules you wrote.
I don't know how late I am in answering this question, but I'm building a parser as part of my coursework.
I'm using Lex and CUP for the lexer and parser, respectively. I'm also including my main class, which calls the parser; the parser in turn scans as and when required, each time it requests a token.
So my driver class will be:
// construct the lexer
Yylex lexer = new Yylex(new FileReader(filename));
// create the parser
Parser parser = new Parser(lexer);
// and parse
parser.parse();
Internally, the parser calls:
Parser.parse() {
...
this.cur_token = this.scan();
...
}
public Symbol scan() throws Exception {
Symbol sym = this.getScanner().next_token();
return sym != null ? sym : this.getSymbolFactory().newSymbol("END_OF_FILE", this.EOF_sym());
}

Problems getting data from XML using Nokogiri and Rails

I'm trying to get information from an XML file with Nokogiri. I can load the file using
f = File.open("/my/path/file.xml")
cac=Nokogiri::XML(f)
And what I get is a fancy Nokogiri document. My row tags are defined like
<z:row ...info..../>
and show up in the dump like
<Nokogiri::XML::Element:0x217e7b8 name="z:row" attributes=[#<Nokogiri::XML::Attr:0x217e754 name="ID_Poblacio" value="3">
and I cannot retrieve the rows using either:
s=cac.at_xpath("/*/z:row") or
s=cac.at_xpath("//z:row") or
s=cac.at_xpath("//row") or
s=cac.at_xpath("z:row")...
I'm probably being really foolish, but I cannot figure out what the issue is.
Has anyone faced this problem?
Thanks in advance.
P.S. I tried to paste my cac file directly from bash, but something weird happened with the formatting, so I removed it from the question. If anyone can explain how to do it, I will appreciate it.
Your XML element name contains a colon, but it is not in a namespace (otherwise the prefix and uri would show up in the dump of the node). Using element names with colons without using namespaces is valid, but can cause problems (like this case) so generally should be avoided. Your best solution, if possible, would be to either rename the elements in your xml to avoid the : character, or to properly use namespaces in your documents.
If you can’t do that, then you’ll need to be able to select such element names using XPath. A colon in the element name part of an XPath node test is always taken to indicate a namespace. This means you can’t directly specify a name with a colon that isn’t in a namespace. A way around this is to select all nodes and use an XPath function in a predicate to refine the selection to only those nodes you’re after. You can use a colon in an argument to name() and it won’t be interpreted as a namespace separator:
s=cac.at_xpath("//*[name()='z:row']")

How to get the string that caused parse error?

Suppose I have this code:
(handler-case (read ...)
(parse-error (condition)
(format t "What text was I reading last to get this error? ~s~&"
(how-to-get-this-text? condition))))
I can only see the parse-namestring accessors, but it gives the message of the error, not the text it was parsing.
EDIT
In my case the problem is less generic, so an alternative solution not involving the entire string that failed to parse can be good too.
Imagine this example code I'm trying to parse:
prefix(perhaps (nested (symbolic)) expressions))suffix
In some cases I need to stop on "suffix" and in others I need to continue. The suffix itself has no meaning other than being an indicator of the action the parser should take next.
READ parses from a stream, not a string. The s-expression can be arbitrarily long. Should READ keep a string of what's been read?
What you might need is a special stream. In standard Common Lisp there is no mechanism for user defined streams. But in real life every implementation has such extensible streams. See for example 'gray streams'.
http://www.sbcl.org/1.0/manual/Gray-Streams.html
There's no standard function to do it. You might be able to brute-force something with read-from-string, but whatever you do, it will require some extra work.

passing regular expressions as params into ruby

so i would like to expose regular expression queries on a field in my model, such that user could ask for
http://localhost:3000/myview.json?field=^hello, (there|world).*
so i know i'll have to change my routes to recognise the wildcard characters etc, and i can easily do a Regexp.new() inside my controller to convert this to a real regular expression (i'm using mongomapper in the back).
the issue is the potentially huge security hole with XSS.
should i be worried about this? how could i safely enable users to query with regular expression strings.
(i'm not too bothered about the user hammering the database... yet)
Regular expressions won't be able to perform arbitrary code execution unless there is something really wrong with Regexp.new. So if we assume that Regexp.new will either produce a valid regular expression, fail, or do something else sane, you are already safe without having to sanitize the incoming string.
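As a minimal sketch of the "either make a valid regular expression or fail" behaviour: Regexp.new raises RegexpError on malformed input, so a controller can rescue it and treat the parameter as invalid. The helper name compile_user_pattern is hypothetical and not part of Rails or MongoMapper; the sample pattern is the one from the question.

```ruby
# Hypothetical helper (not a Rails or MongoMapper API): turn a raw query
# parameter into a Regexp, or return nil when the pattern is malformed.
def compile_user_pattern(raw)
  Regexp.new(raw.to_s)
rescue RegexpError
  nil # invalid syntax, such as an unbalanced "(", ends up here
end

good = compile_user_pattern("^hello, (there|world).*")
bad  = compile_user_pattern("^hello, (there")

# The compiled pattern is only ever used for matching.
puts good.match?("hello, world!")
puts bad.inspect
```

Returning nil keeps the decision about how to respond (for example with a 400 status) in the caller; that choice is an assumption of this sketch, not something the answer prescribes.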
