What is the best way to capture ambiguous segments of text? - parsing

What would be the best way to capture the inner text in the following case?
inner_text = any*;
tag_cdata = '<![CDATA[' inner_text >cdata_start %cdata_end ']]>';
The problem is, it seems like the cdata_end action fires several times due to the fact that inner_text could match ].

I found the solution. You need to handle non-determinism. It wasn't clear initially, but the correct solution is something like this:
inner_text = any*;
tag_cdata = '<![CDATA[' inner_text >text_begin %text_end ']]>' %cdata_end;
action text_begin {
text_begin_at = p;
}
action text_end {
text_end_at = p;
}
action cdata_end {
delegate.cdata(data.byteslice(text_begin_at, text_end_at-text_begin_at))
}
Essentially, you wait until you are sure you parsed a complete CDATA tag before firing the callback, using information you previously captured.
In addition, I found that some forms of non-determinism in Ragel need to be explicitly handled using priorities. While this seems a bit ugly, it is the only solution in some cases.
When dealing with a pattern such as (a+ >a_begin %a_end | b)* you will find that the events are called for every single a encountered, rather than at the longest sub-sequence. This ambiguity, in some cases, can be solved using the longest match kleene star **. What this does is it prefers to match the existing pattern rather than wrapping around.
What was surprising to me, is that this actually modifies the way events are called, too. As an example, this produces a machine which is unable to buffer more than one character at a time when invoking callbacks:
%%{
machine example;
action a_begin {}
action a_end {}
main := ('a'+ >a_begin %a_end | 'b')*;
}%%
Produces:
You'll notice that it calls a_begin and a_end every time.
In contrast, we can make the inner loop and event handling greedy:
%%{
machine example;
action a_begin {}
action a_end {}
main := ('a'+ >a_begin %a_end | 'b')**;
}%%
which produces:

Related

firefox addon webrequest.addListener misbehaving

I want to examine http requests in an extension for firefox. To begin figuring out how to do what I want to do I figured I'd just log everything and see what comes up:
webRequest.onResponseStarted.addListener(
(stuff) => {console.log(stuff);},
{urls: [/^.*$/]}
);
The domain is insignificant, and I know the regex works, verified in the console. When running this code I get no logging. When I take out the filter parameter I get every request:
webRequest.onResponseStarted.addListener(
(stuff) => {console.log(stuff);}
);
Cool, I'm probably doing something wrong, but I can't see what.
Another approach is to manually filter on my own:
var webRequest = Components.utils.import("resource://gre/modules/WebRequest.jsm", {});
var makeRequest = function(type) {
webRequest[type].addListener(
(stuff) => {
console.log(!stuff.url.match(/google.com.*/));
if(!stuff.url.match(/google.com.*/))
return;
console.log(type);
console.log(stuff);
}
);
}
makeRequest("onBeforeRequest");
makeRequest("onBeforeSentHeaders");
makeRequest("onSendHeaders");
makeRequest("onHeadersReceived");
makeRequest("onResponseStarted");
makeRequest("onCompleted");
With the console.log above the if, I can see the regex returning true when I want it to and the code making it past the if. When I remove the console.log above the if the if no longer gets executed.
My question is then, how do I get the filtering parameter to work or if that is indeed broken, how can I get the code past the if to be executed? Obviously, this is a fire hose, and to begin searching for a solution I will need to reduce the data.
Thanks
urls must be a string or an array of match patterns. Regular expressions are not supported.
WebRequest.jsm uses resource://gre/modules/MatchPattern.jsm. Someone might get confused with the util/match-pattern add-on sdk api, which does support regular expressions.

ActionScript Unexpected Curly Braces/Semicolon?

I'm editing an ActionScript file and I've run into an issue.
When I put the following, everything is fine.
if (x=x) {
//blah
}
If I put this, it says unexpected ; for one line and } for the another:
for (x=x) {
//blah
}
Same with when I put this:
while (x=x) {
//blah
}
Of course I only put those there as examples to test it, because I thought something was wrong with my code. Is ActionScript, in this part of my file, only allowing IF statements or what? I need to do the same long series of steps to two different strings, but I don't want to put the code in there twice. Do I have to make a function?
Read up on the looping syntax here.
The For loop doesn't take a boolean (true/false), it needs a counter, a boolean check for the limit and an increment.
i.e.
for (counter; condition; action){
statements;
}
I've never used action script but I would suggest trying this with
x==x
Since once = is assignment, not a comparison.
if the for loop still does not function try
for(;x==x;){
}
the semicolons tell it that you want to only use the second statement in the for loop declaration, the condition; since for loops use three statements,
for (variable; condition; iterative action)
by placing semicolons before and after x==x you specify only the condition, which seems to be what you're trying to do.
Turns out using any IF or WHILE statements caused the error no matter what was inside.
I was able to accomplish what I wanted by making another function and sending each string though those.
Appreciate the help, voted up on both of y'all.
you have to write it like this:
if(a==x){
// do that
}
for (x=0; x<maxloops; x++){
// do that
}
while(a==x){
}
The = symbol is used to define values to varialbes, while == has to be used when you comparing / checking (i.e. whether this is equal to that). This both applies to IF and WHILE
the FOR LOOP. Let's say that you want to execute the action "do that" 10 times. then you write
for (x=0; x<10; x++){
// do that
}
the first part x=0 is the definition of the counting variable and its initial value
the second part is the condition (run the loop as long as x is less than 10)
the third part is the stepper. (how the counter will raise its value in each loop). x++ is a short way to write x = x +1;

Lua arguments passed to function in table are nil

I'm trying to get a handle on how OOP is done in Lua, and I thought I had a simple way to do it but it isn't working and I'm just not seeing the reason. Here's what I'm trying:
Person = { };
function Person:newPerson(inName)
print(inName);
p = { };
p.myName = inName;
function p:sayHello()
print ("Hello, my name is " .. self.myName);
end
return p;
end
Frank = Person.newPerson("Frank");
Frank:sayHello();
FYI, I'm working with the Corona SDK, although I am assuming that doesn't make a difference (except that's where print() comes from I believe). In any case, the part that's killing me is that inName is nil as reported by print(inName)... therefore, myName is obviously set to nil so calls to sayHello() fail (although they work fine if I hardcode a value for myName, which leads me to think the basic structure I'm trying is sound, but I've got to be missing something simple). It looks, as far as I can tell, like the value of inName is not being set when newPerson() is called, but I can't for the life of me figure out why; I don't see why it's not just like any other function call.
Any help would be appreciated. Thanks!
Remember that this:
function Person:newPerson(inName)
Is equivalent to this:
function Person.newPerson(self, inName)
Therefore, when you do this:
Person.newPerson("Frank");
You are passing one parameter to a function that expects two. You probably don't want newPerson to be created with :.
Try
Frank = Person:newPerson("Frank");

How do you accept a string in Erlang?

I'm writing a program where the user must enter a 'yes' or 'no' value. The following is the code that will be executed when the message {createPatient,PatientName} is received.
{createPatient, PatientName} ->
Pid = spawn(patient,newPatient,[PatientName]),
register(PatientName,Pid),
io:format("*--- New Patient with name - ~w~n", [PatientName]),
Result = io:read("Yes/No> "),
{_,Input} = Result,
if(Input==yes) ->
io:format("OK")
end,
loop(PatientRecords, DoctorPatientLinks, DoctorsOnline, CurrentPatientRequests, WaitingOfflineDoctorRequests);
When executed ,the line "New Patient with name..." is displayed however the line Yes/No is not displayed and the program sort of crashes because if another message is sent, then the execution of that message will not occur. Is there a different way to solve this problem please?
There are a number of points I would like to make here:
The io:read/1 function reads a term, not just a line, so you have to terminate input with a '.' (like in the shell).
io:read/1 returns {ok,Term} or {error,Reason} or eof so you code should check for these values, for example with a case.
As #AlexeyRomanov mentioned, io:get_line/1 might be a better choice for input.
The if expression must handle all cases even the ones in which you don't want to do anything, otherwise you will get an error. This can be combined with the case testing the read value.
You spawn the function patient:newPatient/1 before you ask if the name is a new patient, this seems a little strange. What does the newpatient function do? Is it in anyway also doing io to the user which might be interfering with functions here?
The main problem seems to be work out what is being done where, and when.
This is very artificial problem. In erlang any communication is usually inter-process and exchanging strings wouldn't make any sense - you ask question in context of process A and you would like to post answer in context of process B (shell probably).
Anyways, consider asking question and waiting in receive block in order to get an answer.
When question pops out in shell, call a function which will send the answer to 'asking' process with your answer.
So:
io:format("*--- New Patient with name - ~w~n", [PatientName]),
receive
{answer, yes} -> do_something();
{answer, no} -> do_something()
end
The 'answering' function would look like that:
answer(PatientName, Answer) ->
PatientName ! {answer, Answer}.
And shell action:
$> *--- New Patient with name - User1036032
$> somemodule:answer('User1036032', yes).
It is possible to create some dialog with shell (even unix shell), but to be honest it is used so rare that I do not remember those I/O tricks with read and write. http://trapexit.com/ used to have cookbook with this stuff.
Use io:get_line("Yes/No> ") instead.

How to create a parser which tokenizes a list of words taken from a file?

I am trying to do a syntax text corrector for my compilers' class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
Ok, so first I have to tokenize the input "Ruby is great". So I have a text file "verbs", with a lot of verbs, one by line. Then I have one text "adjectives", one "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
%%{
machine test;
subject = <open-the-subjects-file-and-accept-each-one-of-them>;
verb = <open-the-verbs-file-and-accept-each-one-of-them>;
adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
main = subject verb adjective # { print "Valid phrase!" } ;
}%%
I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.
Does anyone knows how I could do this? There's no problem if it isn't with Ragel, I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.
Thanks.
If you want to read the files at compile time .. make them be of the format:
subject = \
ruby|\
python|\
c++
then use ragel's 'include' or 'import' statement (I forget which .. must check the manual) to import it.
If you want to check the list of subjects at run time, maybe just make ragel read 3 words, then have an action associated with each word. The action can read the file and lookup if the word is good or not at runtime.
The action reads the text file and compares the word's contents.
%%{
machine test
action startWord {
lastWordStart = p;
}
action checkSubject {
word = input[lastWordStart:p+1]
for possible in open('subjects.txt'):
if possible == word:
fgoto verb
# If we get here do whatever ragel does to go to an error or just raise a python exception
raise Exception("Invalid subject '%s'" % word)
}
action checkVerb { .. exercise for reader .. ;) }
action checkAdjective { .. put adjective checking code here .. }
subject = ws*.(alnum*)>startWord%checkSubject
verb := : ws*.(alnum*)>startWord%checkVerb
adjective := ws*.)alnum*)>startWord%checkAdjective
main := subject;
}%%
With bison I would write the lexer by hand, which lookup the words in the predefined dictionary.

Resources