Scala Parser, set reserved words - parsing

I am writing a simple programming language with Scala's parser combinators. So far I have had no trouble, but I am worried about function and variable names clashing with reserved words.
I've already added some special functions like "floor" ~ gexp or "top" ~ gexp, and I don't want anybody using this language to be able to give a function or a variable one of those names. I have not yet found a way to check this.
In Ruby I would write something like
rule varname
lowerid &{ |id| id[0].is_not_reserved } <VarNameNode>
but I don't know how I would write this in Scala:
def varName : Parser[StringValue] = lowerid

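You can use the ^? operator, which applies a partial function to the parse result and fails with the given error message wherever that function is not defined: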
def varName: Parser[StringValue] = lowerid ^? ({
case id if !isReserved(id) => id
}, { id => s"Error: $id is reserved." })


Check if entered text is valid in Xtext

Let's say we have a grammar like this:
'Hello' name=ID '!';
I would like to check whether the text entered for name is valid.
All the valid words are stored in an array, which should be filled with words from a given file.
Is it possible to check this at runtime, and maybe also offer these words as suggestions?
For this purpose you can use a validator.
A simple video tutorial about it can be found here
In your case the function in the validator could look like this:
public static val INVALID_NAME = "greeting_InvalidName"

@Check
def nameIsValid(Greeting grt) {
    val name = grt.name // same as grt.getName()
    val validNames = newArrayList
    // add all valid names to this list
    if (!validNames.contains(name)) {
        val errorMsg = "Name is not valid"
        error(errorMsg, GreetingsPackage.Literals.GREETING__NAME, INVALID_NAME)
    }
}
You might have to replace the "GreetingsPackage" if your DSL isn't named Greetings.
The static String passed to the error method identifies the error. This becomes important when you want to implement quick fixes, the second thing you asked for, since they let you offer the programmer a few suggestions for fixing this particular problem.
Because I don't have any experience with implementing quickfixes myself I can just give you this as a reference.

How to have gsub handle multiple patterns and replacements

A while ago I created a function in PHP to "twitterize" the text of tweets pulled via Twitter's API.
Here's what it looked like:
function twitterize($tweet){
$patterns = array ( "/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)/",
$replacements = array ("<a href='\\0' target='_blank'>\\0</a>", "<a href='\\1' target='_blank'>\\0</a>", "<a href='\\1&src=hash' target='_blank'>\\0</a>");
return preg_replace($patterns, $replacements, $tweet);
Now I'm a little stuck with Ruby's gsub, I tried:
def twitterize(text)
patterns = ["/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)/", "/(?<=^|(?<=[^a-zA-Z0-9-\.]))#([A-Za-z_]+[A-Za-z0-9_]+)/", "/(?<=^|(?<=[^a-zA-Z0-9-\.]))#([A-Za-z_]+[A-Za-z0-9_]+)/"]
replacements = ["<a href='\\0' target='_blank'>\\0</a>",
"<a href='\\1' target='_blank'>\\0</a>",
"<a href='\\1&src=hash' target='_blank'>\\0</a>"]
return text.gsub(patterns, replacements)
end
Which obviously didn't work and returned an error:
No implicit conversion of Array into String
And after looking at the Ruby documentation for gsub and exploring a few of the examples they were providing, I still couldn't find a solution to my problem: How can I have gsub handle multiple patterns and multiple replacements at once?
Well, as you can read in the docs, gsub does not handle multiple patterns and replacements at once. That's what is causing your error, which is otherwise quite explicit (you can read it as "give me a String, not an Array!").
You can write that like this:
def twitterize(text)
  patterns = [/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)/, /(?<=^|(?<=[^a-zA-Z0-9-\.]))#([A-Za-z_]+[A-Za-z0-9_]+)/, /(?<=^|(?<=[^a-zA-Z0-9-\.]))#([A-Za-z_]+[A-Za-z0-9_]+)/]
  replacements = ["<a href='\\0' target='_blank'>\\0</a>",
                  "<a href='\\1' target='_blank'>\\0</a>",
                  "<a href='\\1&src=hash' target='_blank'>\\0</a>"]
  # apply each pattern in turn with the replacement at the same index
  patterns.each_with_index do |pattern, i|
    text.gsub!(pattern, replacements[i])
  end
  text
end
This can be refactored into more elegant rubyish code, but I think it'll do the job.
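One possible refactoring, as a minimal sketch: pair each pattern with its replacement and fold the text through the pairs with reduce, using the non-destructive gsub so the caller's string is left untouched. The two patterns and the /user/ and /tag/ link targets below are simplified, hypothetical stand-ins for the ones in the question, not the real Twitter patterns.

```ruby
# Hypothetical, simplified rules: [pattern, replacement] pairs applied in order.
RULES = [
  [/@(\w+)/, "<a href='/user/\\1'>\\0</a>"],
  [/#(\w+)/, "<a href='/tag/\\1'>\\0</a>"]
].freeze

def twitterize(text)
  # Feed the result of each substitution into the next one.
  RULES.reduce(text) do |result, (pattern, replacement)|
    result.gsub(pattern, replacement)
  end
end

puts twitterize("hi @bob #ruby")
```

Because the substitutions run one after another, a later pattern also sees the markup inserted by an earlier one, so the order of the rules matters.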
The error occurred because you passed an array where gsub expects a single pattern and a single replacement string.
You need to do something like this:
replaced_text = text.gsub(pattern1, replacement1)
replaced_text = replaced_text.gsub(pattern2, replacement2)
and so on, where pattern1 is one of your matching patterns and replacement1 is the replacement text you want for it.

Difference between 'root :to =>' and 'root to:' in Rails Routes [duplicate]

Is there any difference between :key => "value" (hashrocket) and key: "value" (Ruby 1.9) notations?
If not, then I would like to use key: "value" notation. Is there a gem that helps me to convert from :x => to x: notations?
Yes, there is a difference. These are legal:
h = { :$in => array }
h = { :'a.b' => 'c' }
h[:s] = 42
but these are not:
h = { $in: array }
h = { 'a.b': 'c' } # but this is okay in Ruby 2.2+
h[s:] = 42
You can also use anything as a key with => so you can do this:
h = { Object.new => 11 }
h = { 23 => 'pancakes house?' }
but you can't do this:
h = { Object.new: 11 }
h = { 23: 'pancakes house?' }
The JavaScript style (key: value) is only useful if all of your Hash keys are "simple" symbols (more or less something that matches /\A[a-z_]\w*\z/i, AFAIK the parser uses its label pattern for these keys).
The :$in style symbols show up a fair bit when using MongoDB so you'll end up mixing Hash styles if you use MongoDB. And, if you ever work with specific keys of Hashes (h[:k]) rather than just whole hashes (h = { ... }), you'll still have to use the colon-first style for symbols; you'll also have to use the leading-colon style for symbols that you use outside of Hashes. I prefer to be consistent so I don't bother with the JavaScript style at all.
Some of the problems with the JavaScript-style have been fixed in Ruby 2.2. You can now use quotes if you have symbols that aren't valid labels, for example:
h = { 'where is': 'pancakes house?', '$set': { a: 11 } }
But you still need the hashrocket if your keys are not symbols.
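A short sketch of the points above (the variable names are only illustrative): both notations build the same Hash when the keys are simple symbols, quoted labels produce symbols in Ruby 2.2+, and non-symbol keys still need the hashrocket.

```ruby
# Simple symbol keys: the two notations are interchangeable.
rocket = { :name => 'pancakes', :count => 2 }
label  = { name: 'pancakes', count: 2 }
puts rocket == label

# Ruby 2.2+: a quoted label still produces a Symbol key.
quoted = { 'a.b': 1 }
puts quoted.keys.first.inspect

# Keys that are not symbols (String, Integer, nil, ...) require =>.
mixed = { 'a.b' => 1, 23 => 'pancakes house?', nil => 0 }
puts mixed.keys.map(&:class).inspect
```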
key: "value" is a convenience feature of Ruby 1.9; so long as you know your environment will support it, I see no reason not to use it. It's just much easier to type a colon than a rocket, and I think it looks much cleaner. As for there being a gem to do the conversion, probably not, but it seems like an ideal learning experience for you, if you don't already know file manipulation and regular expressions.
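As a rough illustration of the regular-expression approach, here is a minimal sketch; the method name to_label_syntax and the pattern are assumptions made for this example. It only rewrites keys that are simple symbols and leaves everything else alone. It works on plain text, so it will also rewrite matches inside string literals and comments; review the result before committing it.

```ruby
# Matches :key => where key is a simple symbol, not preceded by ':' or a
# word character (so Foo::bar and quoted symbols like :'a.b' are skipped).
ROCKET_SYMBOL_KEY = /(?<![:\w]):([a-z_]\w*)\s*=>\s*/i

def to_label_syntax(source)
  source.gsub(ROCKET_SYMBOL_KEY, '\1: ')
end

puts to_label_syntax("{ :a => 1, :'a.b' => 2, 'x' => 3 }")
```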
With hashrockets, hash keys can be strings (e.g. 's' => x), whereas keys written in symbol notation (e.g. key: "value" or :key => "value") are always symbols, never strings. Although hashrockets give you more freedom, specifically by allowing strings as keys, application performance may be slower than if the hashes were built with symbols as keys. The following resources may help clarify the differences between string and symbol keys:
Ryan Sobol's Symbols in Ruby
Ruby Hashes Explained by Erik Trautman
The key: value JSON-style assignments are a part of the new Ruby 1.9 hash syntax, so bear in mind that this syntax will not work with older versions of Ruby. Also, the keys are going to be symbols. If you can live with those two constraints, new hashes work just like the old hashes; there's no reason (other than style, perhaps) to convert them.
Writing :key => value is the same as writing key: value, and the latter is really just a convenience. I haven't seen other languages that use the =>, but others like JavaScript use key: value in their Hash-equivalent datatypes.
As for a gem to convert the way you wrote out your hashes, I would just stick with the way you are doing it for your current project.
*Note that in using key: value the key will be a symbol, and to access the value stored at that key in a foo hash would still be foo[:key].

How to create a parser which tokenizes a list of words taken from a file?

I am trying to do a syntax text corrector for my compilers' class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
OK, so first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have one text file "adjectives", one "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
machine test;
subject = <open-the-subjects-file-and-accept-each-one-of-them>;
verb = <open-the-verbs-file-and-accept-each-one-of-them>;
adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
main = subject verb adjective # { print "Valid phrase!" } ;
I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.
Does anyone know how I could do this? It doesn't have to be Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.
If you want to read the files at compile time, put them in this format:
subject = \
then use Ragel's 'include' or 'import' statement (I forget which; check the manual) to pull it in.
If you want to check the list of subjects at run time, have Ragel read three words and attach an action to each word. The action can read the file and look up whether the word is valid at run time.
The action reads the text file and compares each entry with the word.
machine test;
action startWord {
    lastWordStart = p
}
action checkSubject {
    # in a leaving action, p already points just past the word
    word = input[lastWordStart:p]
    for possible in open('subjects.txt'):
        # strip the trailing newline before comparing
        if possible.strip() == word:
            fgoto verb;
    # If we get here do whatever ragel does to go to an error or just raise a python exception
    raise Exception("Invalid subject '%s'" % word)
}
action checkVerb { .. exercise for reader .. ;) }
action checkAdjective { .. put adjective checking code here .. }
subject = ws*.(alnum*)>startWord%checkSubject;
verb := ws*.(alnum*)>startWord%checkVerb;
adjective := ws*.(alnum*)>startWord%checkAdjective;
main := subject;
With bison I would write the lexer by hand and have it look up the words in the predefined dictionary.
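The hand-written, dictionary-based lexer idea can be sketched in plain Ruby without any parser generator. Everything here is an assumption made for illustration: the names load_words, LEXICON, RULE, tokenize and valid_phrase? are hypothetical, and the word lists are read from in-memory StringIO objects so the sketch is self-contained; in practice each list would come from something like File.open('subjects.txt').

```ruby
require 'set'
require 'stringio'

# Reads one word per line from any IO-like object into a Set.
def load_words(io)
  io.each_line.map { |line| line.strip.downcase }.reject(&:empty?).to_set
end

# In-memory stand-ins for the "subjects", "verbs" and "adjectives" files.
LEXICON = {
  subject:   load_words(StringIO.new("Ruby\nPython\n")),
  verb:      load_words(StringIO.new("is\nwas\n")),
  adjective: load_words(StringIO.new("great\nslow\n"))
}.freeze

# The single grammar rule from the question: SUBJECT VERB ADJECTIVE.
RULE = %i[subject verb adjective].freeze

# Tags each word with the first category whose word list contains it.
def tokenize(phrase)
  phrase.split.map do |word|
    category, _words = LEXICON.find { |_, words| words.include?(word.downcase) }
    [category || :unknown, word]
  end
end

def valid_phrase?(phrase)
  tokenize(phrase).map(&:first) == RULE
end

puts valid_phrase?("Ruby is great")
```

A word that appears in more than one list is tagged with the first matching category only, so a real corrector would need to keep all candidate categories and try each of them against the rules.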

Ruby gsub function

I'm trying to create a BBcode [code] tag for my rails forum, and I have a problem with the expression:
param_string.gsub!( /\[code\](.*?)\[\/code\]/im, '<pre>\1</pre>' )
How do I get what the regex match captures (the text in between the [code][/code] tags), and escape all the HTML and some other characters in it?
I've tried this:
param_string.gsub!( /\[code\](.*?)\[\/code\]/im, '<pre>' + my_escape_function('\1') + '</pre>' )
but it didn't work. It just passes "\1" as a string to the function.
You should take care of the greedy behavior of the regular expressions. So the correct code looks like this:
html.gsub!(/\[(\S*?)\](.*?)\[\/\1\]/) { |m| escape_method($1, $2) }
The escape_method then looks like this:
def escape_method( type, string )
  case type.downcase
  when 'code'
    # return the escaped string for [code] here
  when 'bold'
    # return the converted string for [bold] here
  end
end
Someone here posted an answer, but they've deleted it.
I've tried their suggestion, and made it work with a small change. Whoever you are, thanks! :)
Here it is
param_string.gsub!( /\[code\](.*?)\[\/code\]/im ) { '<pre>' + my_escape_function($1) + '</pre>' }
In the block form of gsub you can simply use "<pre>#{$1}</pre>" as the block's return value; the block runs once per match, so $1 holds the captured text each time.
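Putting the pieces together, here is a minimal sketch of the block form. The original attempt failed because '\1' is an ordinary two-character string by the time my_escape_function receives it; gsub only expands backreferences in the final replacement string. A block is evaluated once per match, so the captured text is available inside it. The method name render_code_tags is hypothetical, and ERB::Util.html_escape from the standard library stands in for the asker's own escaping function.

```ruby
require 'erb'

def render_code_tags(text)
  text.gsub(/\[code\](.*?)\[\/code\]/im) do
    # Regexp.last_match(1) is the same value as $1: the text between the tags.
    "<pre>#{ERB::Util.html_escape(Regexp.last_match(1))}</pre>"
  end
end

puts render_code_tags("a [code]<b>1 & 2</b>[/code] z")
```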
