Rebol: how to split a string into characters - parsing

Using Rebol, how do I split this string into characters (without using a C-like approach with loops)? I'm using version 2.7.8.2.5, which does not have the split function.
str: "Today is Monday"
I want to split the above into:
[ 'T' 'o' 'd' 'a' 'y' ' ' 'i' 's' ' ' 'M' 'o' 'n' 'd' 'a' 'y']
Parse method seems to only split a sentence into constituent words.
Thank you.

If you don't want to use loops, there's one nifty trick:
>> head extract/into str 1 []
== [#"T" #"o" #"d" #"a" #"y" #" " #"i" #"s" #" " #"M" #"o" #"n" #"d" #"a" #"y"]
OTOH, string! is already a series of char! values, so breaking it up into characters like that doesn't provide any clear benefit.

In some Rebols (not Rebol2) you could use MAP-EACH to do this, e.g. map-each ch str [ch].
In Rebol2, COLLECT and KEEP are fairly general and powerful ways of building up blocks:
>> collect [foreach c str [keep c]]
== [#"T" #"o" #"d" #"a" #"y" #" " #"i" #"s" #" " #"M" #"o" #"n" #"d" #"a" #"y"]
I'll give you that one and let others list out the infinity of faster ways. :-)

Depending on whether you want single characters or strings of length one, you can also use parse with the following rules:
>> str: "Today is Monday"
== "Today is Monday"
>> collect [parse/all str [some [copy x skip (keep x) ] ]]
== ["T" "o" "d" "a" "y" " " "i" "s" " " "M" "o" "n" "d" "a" "y"]
>> collect [parse/all str [some [x: skip (keep x/1)]]]
== [#"T" #"o" #"d" #"a" #"y" #" " #"i" #"s" #" " #"M" #"o" #"n" #"d" #"a" #"y"]
Red allows a tighter version
>> parse str [collect [some [keep skip]]]
== [#"T" #"o" #"d" #"a" #"y" #" " #"i" #"s" #" " #"M" #"o" #"n" #"d" #"a" #"y"]

Related

Ruby compare 2 arrays of integer

Let's say I have 2 arrays of the same length filled with integers:
a = [1,2,3,4]
b = [1,3,2,5]
I want to compare these arrays and get the output in a new array (c), so that it looks like this:
c = ["+","-","-"," "]
An empty space indicates that the current digit does not appear in the first array.
A - indicates a number match: one of the numbers in the second array is the same as one of the numbers in the first array but in a different position
Currently I have this comparison method and need to improve it:
a.each_with_index.map {|x, i| b[i] == x ? '+' : '-'}
Something like this would work (assuming unique elements):
a.zip(b).map do |i, j|
  if i == j
    '+'
  elsif b.include?(i)
    '-'
  else
    ' '
  end
end
#=> ["+", "-", "-", " "]
zip combines the arrays in an element-wise manner and then maps the pairs as follows:
'+' if they are identical
'-' if the element from a is included in b
' ' otherwise
Note that the result still reflects the element order i.e. the position of '+' exactly tells you which elements are identical. You might want to sort / shuffle the result.
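As a minimal sketch of the note above, the zip-based comparison can be wrapped in a method and the feedback sorted so that it no longer reveals positions. The method name `feedback` and the choice of a plain sort are assumptions made for illustration, and the code still assumes unique elements.

```ruby
# Hypothetical helper wrapping the zip-based comparison (assumes unique elements).
def feedback(a, b)
  a.zip(b).map do |i, j|
    if i == j
      '+'   # same value at the same position
    elsif b.include?(i)
      '-'   # value from a occurs elsewhere in b
    else
      ' '   # value from a does not occur in b
    end
  end
end

a = [1, 2, 3, 4]
b = [1, 3, 2, 5]

positional = feedback(a, b)  # order mirrors the positions in a
hidden = positional.sort     # sorted copy no longer reveals positions
```

Replacing `sort` with `shuffle` would hide the positions just as well, at the cost of a non-deterministic order.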
appearance = a.each_with_index.to_h
b.map.with_index { |v, i|
  appearance[v].nil? ? " " : (v == a[i] ? '+' : '-')
}
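The snippet above builds a value-to-index hash from a and then walks b, so each membership check is a hash lookup rather than a scan with include?. Here is a commented sketch of the same logic. The method name `feedback_by_lookup` is made up for illustration.

```ruby
# Hypothetical wrapper around the hash-lookup comparison.
def feedback_by_lookup(a, b)
  appearance = a.each_with_index.to_h  # value => index in a
  b.map.with_index do |v, i|
    if appearance[v].nil?
      ' '   # v does not occur in a
    elsif v == a[i]
      '+'   # same value at the same position
    else
      '-'   # v occurs in a, but at another position
    end
  end
end

sample  = feedback_by_lookup([1, 2, 3, 4], [1, 3, 2, 5])
shifted = feedback_by_lookup([1, 2], [3, 1])
```

This version reports on the elements of b, while the zip-based version reports on the elements of a. The two agree on the arrays from the question but can differ elsewhere: for [1, 2] and [3, 1] this sketch yields [" ", "-"], whereas the zip-based version would yield ["-", " "].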

Lua gsub chars '(' and ')' fails

For some reason only the opening and closing parentheses won't work; all the others are fine.
RequestEncoded = string.gsub(RequestEncoded, '<', ' ')
RequestEncoded = string.gsub(RequestEncoded, '>', ' ')
RequestEncoded = string.gsub(RequestEncoded, '"', ' ')
RequestEncoded = string.gsub(RequestEncoded, '\'', ' ')
RequestEncoded = string.gsub(RequestEncoded, '\\', ' ')
-- RequestEncoded = string.gsub(RequestEncoded, '(', ' ') keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, ')', ' ')
-- RequestEncoded = string.gsub(RequestEncoded, "\x28", " ") --keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, "\x29", ' ')
-- RequestEncoded = string.gsub(RequestEncoded, '\050', ' ') --keeps failing
-- RequestEncoded = string.gsub(RequestEncoded, '\051', ' ')
( and ) are special characters that form a capturing group in a Lua pattern.
To match literal parentheses outside of square brackets, [...], you need to escape them with %:
RequestEncoded = string.gsub(RequestEncoded, '%(', ' ')
RequestEncoded = string.gsub(RequestEncoded, '%)', ' ')
However, since you are using the same replacement string in all of these gsub calls, you can simplify your code to a single call:
RequestEncoded = string.gsub(RequestEncoded, '[<>"\'\\()]', ' ')
Note that here, () are inside a bracket expression and do not need escaping.
See Lua patterns docs:
Some characters, called magic characters, have special meanings when used in a pattern. The magic characters are
( ) . % + - * ? [ ^ $
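The escape-versus-bracket distinction is not specific to Lua. As a rough comparison sketch in Ruby (chosen only for illustration; the question is about Lua, where the escape character is % rather than a backslash), the two forms look like this. The sample string `request` is made up.

```ruby
# Made-up input containing every character from the question.
request = 'a(b)<c>"d"\'e\'\\f'

# Parentheses escaped outside a character class.
step = request.gsub(/\(/, ' ').gsub(/\)/, ' ')

# One character class for all of the characters;
# parentheses need no escaping inside the brackets.
cleaned = request.gsub(/[<>"'\\()]/, ' ')
```

As in the Lua answer, the single character class replaces the whole chain of calls.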

Formatting a table in Lua with string.gsub

The following code formats a table into a printable string, but I feel like this can be done much more simply.
function printFormat(table)
  local str = ""
  for key, value in pairs(table) do
    if value == 1 then
      str = str .. string.gsub(value, 1, "A, ") -- Replaces 1 with A
    elseif value == 2 then
      str = str .. string.gsub(value, 2, "B, ") -- Replaces 2 with B
    elseif value == 3 then
      str = str .. string.gsub(value, 3, "C, ") -- Replaces 3 with C
    elseif value == 4 then
      str = str .. string.gsub(value, 4, "D, ") -- Replaces 4 with D
    end
  end
  str = string.sub(str, 1, #str - 2) -- Removes useless chars at the end (last comma and last whitespace)
  str = "<font color=\"#FFFFFF\">" .. str .. "</font>" -- colors the string
  print(str)
end
local myTable = {1,4,3,2,3,2,1,3,4,2,2,...}
printFormat(myTable)
Is there a way to use a oneliner instead of having to loop through every index and make changes?
Or make the code more compact?
You can use a helper table to replace the chain of if statements:
local chars = {"A", "B", "C", "D"}
for _, v in ipairs(t) do
  str = str .. chars[v] .. ", "
end
Or, if the values go beyond 1 to 4, compute the letter from its character code (1 maps to A, so subtract 1):
for _, v in ipairs(t) do
  str = str .. string.char(string.byte('A') + v - 1) .. ", "
end
Use table.concat.
string.gsub can perform replacement using a replacement table.
Don't use names like table for your own variables.
Therefore:
function printFormat( tInput )
  local sReturn = table.concat( tInput, ', ' )
  sReturn = sReturn:gsub( '%d', {['1'] = 'A', ...} ) -- update table accordingly
  return '<font color="#FFFFFF">' .. sReturn .. "</font>"
end
and, as a one-liner:
return '<font color="#FFFFFF">' .. ( table.concat(tInput, ', ') ):gsub( '%d', {['1'] = 'A', ...} ) .. '</font>'
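The lookup-table-plus-join idea carries over to other languages unchanged. As a hedged sketch in Ruby rather than Lua (the constant `LETTERS` and the method `format_values` are hypothetical names, and only the four values from the question are mapped):

```ruby
# Hypothetical mapping mirroring the answer's lookup table.
LETTERS = { 1 => 'A', 2 => 'B', 3 => 'C', 4 => 'D' }.freeze

def format_values(values)
  # Map each number to its letter, then join once instead of
  # appending ", " in a loop and trimming the tail afterwards.
  body = values.map { |v| LETTERS.fetch(v) }.join(', ')
  "<font color=\"#FFFFFF\">#{body}</font>"
end

formatted = format_values([1, 4, 3, 2])
```

Joining once avoids the trailing-separator cleanup that the original loop needs.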

ANTLR: Parser rule sensitive to whitespace

I have the following input data:
Valid string: "123A"
Invalid string: "123 A"
Valid string: "111A <= 5 AND 222A"
Invalid string: "111 A <= 5 AND 222A"
Below you can see the grammar I'm using (antlr 3.4).
my_id: INT ('A'|'a') -> INT;
fragment DIGIT: '0' .. '9';
INT : DIGIT+ ;
WS : (' '|'\t'|'\n'|'\r')+ {$channel=HIDDEN;} ;
The problem is that my_id matches both 123 A and 123A. How can I throw a parsing error when detecting 123 A?
Any help is gladly appreciated.

Mandatory and optional spaces

I need to parse strings like this:
"qqq www eee" -> "qqq", "www", "eee" (case A)
"qqq www eee" -> "qqq", "www", "eee" (case B, with several spaces between the names)
Here's the grammar I currently have:
grammar Query;
SHORT_NAME : ('a'..'z')+ ;
name returns [String s]: SHORT_NAME { $s = $SHORT_NAME.text; };
names
  returns [List<String> v]
  @init { $v = new ArrayList<String>(); }
  : name1 = name { $v.add($name1.s); }
    (' ' name2 = name { $v.add($name2.s); })*;
It works fine for case A, but fails for case B:
line 1:4 missing SHORT_NAME at ' '
line 1:5 extraneous input ' ' expecting SHORT_NAME
line 1:10 extraneous input ' ' expecting SHORT_NAME
Any ideas how to make it work?
Remove the literal ' ' from your names rule and replace it with a SPACES token:
grammar Query;
SPACES
  : (' ' | '\t')+
  ;
SHORT_NAME
  : ('a'..'z')+
  ;
name returns [String s]
  : SHORT_NAME { $s = $SHORT_NAME.text; }
  ;
names returns [List<String> v]
@init { $v = new ArrayList<String>(); }
  : a=name { $v.add($a.s); } (SPACES b=name { $v.add($b.s); })*
  ;
Or simply discard the spaces at the lexer-level so that you don't need to put them in your parser rules:
grammar Query;
SPACES
  : (' ' | '\t')+ {skip();}
  ;
SHORT_NAME
  : ('a'..'z')+
  ;
name returns [String s]
  : SHORT_NAME { $s = $SHORT_NAME.text; }
  ;
names returns [List<String> v]
@init { $v = new ArrayList<String>(); }
  : (b=name { $v.add($b.s); })+
  ;
